Examples on POWER

In the definitions/power/examples directory of the Microprobe distribution (if you installed the microprobe_target_power package), you will find several examples showing how to use Microprobe for the POWER architecture. Although the examples are split by architecture, the concepts they introduce are common to all architectures.

We recommend that users go through the code of these examples to understand the specific details of how to use the framework.

Contents:

isa_power_v206_info.py
power_v206_power7_ppc64_linux_gcc_profile.py
power_v206_power7_ppc64_linux_gcc_fu_stress.py
power_v206_power7_ppc64_linux_gcc_memory.py
power_v206_power7_ppc64_linux_gcc_random.py
power_v206_power7_ppc64_linux_gcc_custom.py
power_v206_power7_ppc64_linux_gcc_genetic.py

isa_power_v206_info.py

The first example is isa_power_v206_info.py. It shows how to search for architecture definitions (e.g. the ISA properties), how to import a definition, and how to dump its contents. If you execute the following command:

> ./isa_power_v206_info.py

the following output is generated, showing all the details of the POWER v2.06 architecture (only the first and last 20 lines are shown, for brevity):

--------------------------------------------------------------------------------
ISA Name: power_v206
ISA Description: power_v206
--------------------------------------------------------------------------------
Register Types:
     GPR: General Register (bit size: 64)
    VSCR: Vector Status and Control Register (bit size: 32)
     FPR: Floating-Point Register (bit size: 64)
     SPR: Special Purpose Register (64 bits) (bit size: 64)
      VR: Vector Register (bit size: 128)
     MSR: Machine State Register (bit size: 64)
   SPR32: Special Purpose Register (32 bits) (bit size: 32)
     VSR: Vector Scalar Register (bit size: 128)
   FPSCR: Floating-Point Status and Control Register (bit size: 32)
      CR: Condition Register (bit size: 4)
--------------------------------------------------------------------------------
Architected registers:
    AESR : AESR Register (Type: SPR)
    AMOR : AMOR Register (Type: SPR)
     AMR : Authority Mask Register (Type: SPR)
...
	access_storage              :	False	(Boolean indicating if the instruction has storage operands                                                          )
	access_storage_with_update  :	False	(Boolean indicating if the instruction accesses to storage and updates the source register with the generated address)
	algebraic                   :	False	(Boolean indicating if operation uses algebraic rules to keep values                                                 )
	branch                      :	False	(Boolean indicating if the instruction is a branch                                                                   )
	branch_conditional          :	False	(Boolean indicating if the instruction is a branch conditional                                                       )
	branch_relative             :	False	(Boolean indicating if the instruction is a relative branch                                                          )
	category                    :	VSX  	(String indicating if the instruction the instruction category                                                       )
	decimal                     :	False	(Boolean indication if the instruction requires inputs in decimal format                                             )
	disable_asm                 :	False	(Boolean indicating if ASM generation is disabled for the instruction. If so, binary codification is used.           )
	hypervisor                  :	False	(Boolean indicating if the instruction need hypervisor mode                                                          )
	privileged                  :	False	(Boolean indicating if the instruction is privileged                                                                 )
	privileged_optional         :	False	(Boolean indicating the instrucion is priviledged or not depending on the input values                               )
	switching                   :	None 	(Input values required to maximize the computational switching                                                       )
	syscall                     :	False	(Boolean indicating if the instruction is a syscall or return from one                                               )
	trap                        :	False	(Boolean indicating if the instruction is a trap                                                                     )


 Instructions defined: 938 
 Variants defined: 964 
--------------------------------------------------------------------------------

This is the code that was executed:

#!/usr/bin/env python
# Copyright 2011-2021 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
isa_power_v206_info.py

Example module to show how to access ISA definitions.
"""

# Futures
from __future__ import absolute_import, print_function

# Built-in modules
import os

# Own modules
from microprobe.target.isa import find_isa_definitions, import_isa_definition

__author__ = "Ramon Bertran"
__copyright__ = "Copyright 2011-2021 IBM Corporation"
__credits__ = []
__license__ = "IBM (c) 2011-2021 All rights reserved"
__version__ = "0.5"
__maintainer__ = "Ramon Bertran"
__email__ = "rbertra@us.ibm.com"
__status__ = "Development"  # "Prototype", "Development", or "Production"

# Constants
ISANAME = "power_v206"

# Functions

# Classes

# Main

# Search and import definition
ISADEF = import_isa_definition(
    os.path.dirname([
        isa for isa in find_isa_definitions() if isa.name == ISANAME
    ][0].filename))

# Print definition
print(ISADEF.full_report())
exit(0)

In this simple code, the find_isa_definitions and import_isa_definition functions are first imported from the microprobe.target.isa module. Then, find_isa_definitions is used to look up the available architecture definitions; the returned list is filtered to keep only the definition named power_v206, and the directory containing its file is passed to import_isa_definition to import it. Finally, the full report of the resulting ISADEF object is printed to standard output.
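
The list-comprehension-plus-[0] idiom used in the script raises a bare IndexError if no definition named power_v206 is found. Below is a minimal sketch of a more defensive lookup. It assumes only what the example itself relies on, namely that each object returned by find_isa_definitions() exposes name and filename attributes. The helper find_definition_dir is hypothetical, and the definitions in the usage are made-up stand-ins, not real Microprobe objects.

```python
import os
from types import SimpleNamespace


def find_definition_dir(definitions, name):
    """Return the directory of the definition called `name`.

    `definitions` is any iterable of objects with `name` and `filename`
    attributes (assumed to match what find_isa_definitions() returns).
    Raises ValueError listing the available names if there is no match.
    """
    definitions = list(definitions)
    match = next((d for d in definitions if d.name == name), None)
    if match is None:
        available = sorted(d.name for d in definitions)
        raise ValueError("ISA %r not found. Available: %s"
                         % (name, ", ".join(available)))
    return os.path.dirname(match.filename)


# Hypothetical stand-in definitions (paths are made up for illustration)
defs = [
    SimpleNamespace(name="power_v206", filename="/defs/power_v206/isa.yaml"),
    SimpleNamespace(name="power_v300", filename="/defs/power_v300/isa.yaml"),
]
print(find_definition_dir(defs, "power_v206"))  # /defs/power_v206
```

In the real script, the returned directory would then be passed to import_isa_definition, as the original code does.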

In this case, the full report is printed, but the user can query any particular piece of information about the imported ISA through the microprobe.target.isa.ISA API.
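
As an illustration of querying an ISA instead of dumping it, the sketch below counts instructions per category and lists the privileged ones. It assumes, as the profile example below does with target.isa.instructions, that the ISA object has an instructions dictionary mapping names to instruction objects, and that those objects expose the category and privileged properties shown in the report above. The function summarize_isa and the fake ISA entries are hypothetical stand-ins so the sketch runs without Microprobe installed.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_isa(isa):
    """Count instructions per category and list the privileged ones.

    Assumes `isa.instructions` maps names to objects with `category`
    and `privileged` attributes (as listed in the full report).
    """
    per_category = Counter(
        instr.category for instr in isa.instructions.values())
    privileged = sorted(
        name for name, instr in isa.instructions.items() if instr.privileged)
    return per_category, privileged


# Hypothetical stand-in ISA with made-up instruction entries
fake_isa = SimpleNamespace(instructions={
    "ADD_V0": SimpleNamespace(category="B", privileged=False),
    "XVADDDP_V0": SimpleNamespace(category="VSX", privileged=False),
    "MTMSR_V0": SimpleNamespace(category="S", privileged=True),
})
counts, priv = summarize_isa(fake_isa)
print(counts["VSX"], priv)  # 1 ['MTMSR_V0']
```

With a real ISADEF object, the same function could be called directly, provided the attribute names match your Microprobe version.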

power_v206_power7_ppc64_linux_gcc_profile.py

The aim of this example is to show how code generation works in Microprobe. In particular, this example shows how to generate, for each instruction of the ISA, an endless loop containing that instruction. The size of the loop and the dependency distance between the instructions of the loop can be specified as parameters. Using Microprobe, you can generate thousands of microbenchmarks in a few minutes. Let’s start with the command line interface. Executing:

> ./power_v206_power7_ppc64_linux_gcc_profile.py --help

will generate the following output:

power_v206_power7_ppc64_linux_gcc_profile.py: INFO: Processing input arguments...
usage: power_v206_power7_ppc64_linux_gcc_profile.py [-h]
                                                    [-P SEARCH_PATH [SEARCH_PATH ...]]
                                                    [-V] [-v] [-d]
                                                    [-i INSTRUCTION_NAME [INSTRUCTION_NAME ...]]
                                                    [--output_prefix PREFIX]
                                                    [-O PATH] [-p NUM_JOBS]
                                                    [-S BENCHMARK_SIZE]
                                                    [-D DEPENDECY_DISTANCE]

ISA power v206 profile example

optional arguments:
  -h, --help            show this help message and exit
  -P SEARCH_PATH [SEARCH_PATH ...], --default_paths SEARCH_PATH [SEARCH_PATH ...]
                        Default search paths for microprobe target definitions
  -V, --version         Show Microprobe version and exit
  -v, --verbosity       Verbosity level (Values: [0,1,2,3,4]). Each time this
                        argument is specified the verbosity level is
                        increased. By default, no logging messages are shown.
                        These are the four levels available:
                        
                          -v (1): critical messages
                          -v -v (2): critical and error messages
                          -v -v -v (3): critical, error and warning messages
                          -v -v -v -v (4): critical, error, warning and info messages
                        
                        Specifying more than four verbosity flags, will
                        default to the maximum of four. If you need extra
                        information, enable the debug mode (--debug or -d
                        flags).
  -d, --debug           Enable debug mode in Microprobe framework. Lots of
                        output messages will be generated
  -i INSTRUCTION_NAME [INSTRUCTION_NAME ...], --instruction INSTRUCTION_NAME [INSTRUCTION_NAME ...]
                        Instruction names to generate. Default: All
                        instructions
  --output_prefix PREFIX
                        Output prefix of the generated files. Default:
                        POWER_V206_PROFILE
  -O PATH, --output_path PATH
                        Output path. Default: current path
  -p NUM_JOBS, --parallel NUM_JOBS
                        Number of parallel jobs. Default: number of CPUs
                        available (80). Valid values: 1, 2, 3, 4, 5, 6, 7, 8,
                        9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
                        23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
                        36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48,
                        49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
                        62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74,
                        75, 76, 77, 78, 79, 80
  -S BENCHMARK_SIZE, --size BENCHMARK_SIZE
                        Benchmark size (number of instructions in the endless
                        loop). Default: 64 instructions
  -D DEPENDECY_DISTANCE, --dependency_distance DEPENDECY_DISTANCE
                        Average dependency distance between the instructions.
                        Default: 1000 (no dependencies)

Environment variables:

  MICROPROBETEMPLATES    Default path for microprobe templates
  MICROPROBEDEBUG        If set, enable debug
  MICROPROBEDEBUGPASSES  If set, enable debug during passes
  MICROPROBEASMHEXFMT    Assembly hexadecimal format. Options:
                         'all' -> All immediates in hex format
                         'address' -> Address immediates in hex format (default)
                         'none' -> All immediate in integer format
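
The -D option above sets the average dependency distance between the instructions of the loop (the default, 1000, is described as meaning no dependencies). To make the notion concrete, the toy model below assigns registers round-robin over a pool of size `distance`, so every instruction reads the register last written `distance` instructions earlier, wrapping around the endless loop. This is only an illustration of the concept under that assumption; it is not the algorithm used by Microprobe's DefaultRegisterAllocationPass, and both functions are hypothetical.

```python
def round_robin_registers(size, distance, num_regs=32):
    """Toy allocator: instruction i writes and reads register i % distance.

    Reusing the same register every `distance` instructions makes each
    instruction consume the value produced `distance` instructions earlier.
    In this toy model `distance` cannot exceed `num_regs`.
    """
    if not 1 <= distance <= num_regs:
        raise ValueError("distance must be in [1, %d]" % num_regs)
    return ["r%d" % (i % distance) for i in range(size)]


def measured_distances(regs):
    """For each instruction of the endless loop, return how many
    instructions back the register it reads was last written
    (wrapping around the loop body)."""
    size = len(regs)
    result = []
    for i, reg in enumerate(regs):
        for back in range(1, size + 1):
            if regs[(i - back) % size] == reg:
                result.append(back)
                break
    return result


regs = round_robin_registers(size=64, distance=4)
print(regs[:6])                       # ['r0', 'r1', 'r2', 'r3', 'r0', 'r1']
print(set(measured_distances(regs)))  # {4}
```

Note that when the loop size is not a multiple of the distance, the wrap-around makes some distances shorter, which is one reason a generator would talk about an average distance.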

Let's look at the code to see how this command line tool is implemented. This is the complete code of the script:

#!/usr/bin/env python
# Copyright 2011-2021 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
power_v206_power7_ppc64_linux_gcc_profile.py

Example module to show how to generate a benchmark for each instruction
of the ISA
"""

# Futures
from __future__ import absolute_import

# Built-in modules
import multiprocessing as mp
import os
import sys
import traceback

# Third party modules

# Own modules
import microprobe.code.ins
import microprobe.passes.address
import microprobe.passes.branch
import microprobe.passes.decimal
import microprobe.passes.float
import microprobe.passes.ilp
import microprobe.passes.initialization
import microprobe.passes.instruction
import microprobe.passes.memory
import microprobe.passes.register
import microprobe.passes.structure
import microprobe.utils.cmdline
from microprobe import MICROPROBE_RC
from microprobe.exceptions import MicroprobeException
from microprobe.target import import_definition
from microprobe.utils.cmdline import existing_dir, \
    int_type, print_error, print_info, print_warning
from microprobe.utils.logger import get_logger

__author__ = "Ramon Bertran"
__copyright__ = "Copyright 2011-2021 IBM Corporation"
__credits__ = []
__license__ = "IBM (c) 2011-2021 All rights reserved"
__version__ = "0.5"
__maintainer__ = "Ramon Bertran"
__email__ = "rbertra@us.ibm.com"
__status__ = "Development"  # "Prototype", "Development", or "Production"

# Constants
LOG = get_logger(__name__)  # Get the generic logging interface


# Functions
def main_setup():
    """
    Set up the command line interface (CLI) with the arguments required by
    this command line tool.
    """

    args = sys.argv[1:]

    # Create the CLI interface object
    cmdline = microprobe.utils.cmdline.CLI("ISA power v206 profile example",
                                           config_options=False,
                                           target_options=False,
                                           debug_options=False)

    # Add the different parameters for this particular tool
    cmdline.add_option(
        "instruction",
        "i",
        None,
        "Instruction names to generate. Default: All instructions",
        required=False,
        nargs="+",
        metavar="INSTRUCTION_NAME")

    cmdline.add_option(
        "output_prefix",
        None,
        "POWER_V206_PROFILE",
        "Output prefix of the generated files. Default: POWER_V206_PROFILE",
        opt_type=str,
        required=False,
        metavar="PREFIX")

    cmdline.add_option("output_path",
                       "O",
                       "./",
                       "Output path. Default: current path",
                       opt_type=existing_dir,
                       metavar="PATH")

    cmdline.add_option(
        "parallel",
        "p",
        MICROPROBE_RC['cpus'],
        "Number of parallel jobs. Default: number of CPUs available (%s)" %
        mp.cpu_count(),
        opt_type=int,
        choices=list(range(1, MICROPROBE_RC['cpus'] + 1)),
        metavar="NUM_JOBS")

    cmdline.add_option(
        "size",
        "S",
        64, "Benchmark size (number of instructions in the endless loop). "
        "Default: 64 instructions",
        opt_type=int_type(1, 2**20),
        metavar="BENCHMARK_SIZE")

    cmdline.add_option("dependency_distance",
                       "D",
                       1000,
                       "Average dependency distance between the instructions. "
                       "Default: 1000 (no dependencies)",
                       opt_type=int_type(1, 1000),
                       metavar="DEPENDECY_DISTANCE")

    # Start the main
    print_info("Processing input arguments...")
    cmdline.main(args, _main)


def _main(arguments):
    """
    Main program. Called after the arguments from the CLI interface have
    been processed.
    """

    print_info("Arguments processed!")

    print_info("Importing target definition "
               "'power_v206-power7-ppc64_linux_gcc'...")
    target = import_definition("power_v206-power7-ppc64_linux_gcc")

    # Get the arguments
    instructions = arguments.get("instruction", None)
    prefix = arguments["output_prefix"]
    output_path = arguments["output_path"]
    parallel_jobs = arguments["parallel"]
    size = arguments["size"]
    distance = arguments["dependency_distance"]

    # Process the arguments
    if instructions is not None:

        # If the user has provided some instructions, make sure they
        # exist and then call the generation function

        instructions = _validate_instructions(instructions, target)

        if len(instructions) == 0:
            print_error("No valid instructions defined.")
            exit(-1)

        # Set more verbose level
        # set_log_level(10)
        #
        list(
            map(_generate_benchmark,
                [(instruction, prefix, output_path, target, size, distance)
                 for instruction in instructions]))

    else:

        # If the user has not provided any instruction, go for all of them
        # and then call he generation function

        instructions = _generate_instructions(target, output_path, prefix)

        # Since several benchmarks will be generated, reduce verbose level
        # and call the generation function in parallel

        # set_log_level(30)

        if parallel_jobs > 1:
            pool = mp.Pool(processes=parallel_jobs)
            pool.map(
                _generate_benchmark,
                [(instruction, prefix, output_path, target, size, distance)
                 for instruction in instructions], 1)
        else:
            list(
                map(_generate_benchmark,
                    [(instruction, prefix, output_path, target, size, distance)
                     for instruction in instructions]))


def _validate_instructions(instructions, target):
    """
    Validate the provided instructions for a given target
    """

    nins = []
    for instruction in instructions:

        if instruction not in list(target.isa.instructions.keys()):
            print_warning("'%s' not defined in the ISA. Skipping..." %
                          instruction)
            continue
        nins.append(instruction)
    return nins


def _generate_instructions(target, path, prefix):
    """
    Generate the list of instructions to be generated for a given target
    """

    instructions = []
    for name, instr in target.instructions.items():

        if instr.privileged or instr.hypervisor:
            # Skip priv/hyper instructions
            continue

        if instr.branch and not instr.branch_relative:
            # Skip branch absolute due to relocation problems
            continue

        if instr.category in ['LMA', 'LMV', 'DS', 'EC']:
            # Skip some instruction categories
            continue

        if name in [
                'LSWI_V0', 'LSWX_V0', 'LMW_V0', 'STSWX_V0', 'LD_V1', 'LWZ_V1',
                'STW_V1'
        ]:
            # Some instructions are not completely supported yet
            # String-related instructions and load multiple

            continue

        # Skip if the files already exist

        fname = "%s/%s_%s.c" % (path, prefix, name)
        ffname = "%s/%s_%s.c.fail" % (path, prefix, name)

        if os.path.isfile(fname):
            print_warning("Skip %s. '%s' already generated" % (name, fname))
            continue

        if os.path.isfile(ffname):
            print_warning("Skip %s. '%s' already generated (failed)" %
                          (name, ffname))
            continue

        instructions.append(name)

    return instructions


def _generate_benchmark(args):
    """
    Actual benchmark generation policy. This is the function that defines
    how the microbenchmarks are going to be generated
    """

    instr_name, prefix, output_path, target, size, distance = args

    try:

        # Name of the output file
        fname = "%s/%s_%s" % (output_path, prefix, instr_name)

        # Name of the fail output file (generated in case of exception)
        ffname = "%s.c.fail" % (fname)

        print_info("Generating %s ..." % (fname))

        instruction = microprobe.code.ins.Instruction()
        instruction.set_arch_type(target.instructions[instr_name])
        sequence = [target.instructions[instr_name]]

        # Get the wrapper object. The wrapper object is in charge of
        # translating the internal representation of the microbenchmark
        # to the final output format.
        #
        # In this case, we obtain the 'CInfGen' wrapper, which embeds
        # the generated code within an infinite loop using C plus
        # in-line assembly statements.
        cwrapper = microprobe.code.get_wrapper("CInfGen")

        # Create the synthesizer object, which is in charge of driving the
        # generation of the microbenchmark, given a set of passes
        # (a.k.a. transformations) to apply to an empty internal
        # representation of the microbenchmark
        synth = microprobe.code.Synthesizer(target,
                                            cwrapper(),
                                            value=0b01010101)

        # Add the transformation passes

        #######################################################################
        # Pass 1: Init integer registers to a given value                     #
        #######################################################################
        synth.add_pass(
            microprobe.passes.initialization.InitializeRegistersPass(
                value=_init_value()))
        floating = False
        vector = False

        for operand in instruction.operands():
            if operand.type.immediate:
                continue

            if operand.type.float:
                floating = True

            if operand.type.vector:
                vector = True

        if vector and floating:
            ###################################################################
            # Pass 1.A: if instruction uses vector floats, init vector        #
            #           registers to float values                             #
            ###################################################################
            synth.add_pass(
                microprobe.passes.initialization.InitializeRegistersPass(
                    v_value=(1.000000000000001, 64)))
        elif vector:
            ###################################################################
            # Pass 1.B: if instruction uses vector but not floats, init       #
            #           vector registers to integer value                     #
            ###################################################################
            synth.add_pass(
                microprobe.passes.initialization.InitializeRegistersPass(
                    v_value=(_init_value(), 64)))
        elif floating:
            ###################################################################
            # Pass 1.C: if instruction uses floats, init float                #
            #           registers to float values                             #
            ###################################################################
            synth.add_pass(
                microprobe.passes.initialization.InitializeRegistersPass(
                    fp_value=1.000000000000001))

        #######################################################################
        # Pass 2: Add a building block of size 'size'                         #
        #######################################################################
        synth.add_pass(
            microprobe.passes.structure.SimpleBuildingBlockPass(size))

        #######################################################################
        # Pass 3: Fill the building block with the instruction sequence       #
        #######################################################################
        synth.add_pass(
            microprobe.passes.instruction.SetInstructionTypeBySequencePass(
                sequence))

        #######################################################################
        # Pass 4: Compute addresses of instructions (this pass is needed to   #
        #         update the internal representation information so that in   #
        #         case addresses are required, they are up to date).          #
        #######################################################################
        synth.add_pass(
            microprobe.passes.address.UpdateInstructionAddressesPass())

        #######################################################################
        # Pass 5: Set target of branches to be the next instruction in the    #
        #         instruction stream                                          #
        #######################################################################
        synth.add_pass(microprobe.passes.branch.BranchNextPass())

        #######################################################################
        # Pass 6: Set memory-related operands to access 16 storage locations  #
        #         in a round-robin fashion in stride 256 bytes.               #
        #         The pattern would be: 0, 256, 512, .... 3840, 0, 256, ...   #
        #######################################################################
        synth.add_pass(microprobe.passes.memory.SingleMemoryStreamPass(
            16, 256))

        #######################################################################
        # Pass 7.A: Initialize the storage locations accessed by floating     #
        #           point instructions to have a valid floating point value   #
        #######################################################################
        synth.add_pass(
            microprobe.passes.float.InitializeMemoryFloatPass(
                value=1.000000000000001))

        #######################################################################
        # Pass 7.B: Initialize the storage locations accessed by decimal      #
        #           instructions to have a valid decimal value                #
        #######################################################################
        synth.add_pass(
            microprobe.passes.decimal.InitializeMemoryDecimalPass(value=1))

        #######################################################################
        # Pass 8: Set the remaining instructions operands (if not set)        #
        #         (Required to set remaining immediate operands)              #
        #######################################################################
        synth.add_pass(
            microprobe.passes.register.DefaultRegisterAllocationPass(
                dd=distance))

        # Synthesize the microbenchmark. The synthesizer applies the set of
        # transformation passes added before and returns an object representing
        # the microbenchmark
        bench = synth.synthesize()

        # Save the microbenchmark to the file 'fname'
        synth.save(fname, bench=bench)

        print_info("%s generated!" % (fname))

        # Remove fail file if exists
        if os.path.isfile(ffname):
            os.remove(ffname)

    except MicroprobeException:

        # In case of exception during the generation of the microbenchmark,
        # print the error, write the fail file and exit
        print_error(traceback.format_exc())
        open(ffname, 'a').close()
        exit(-1)


def _init_value():
    """ Return a init value """
    return 0b0101010101010101010101010101010101010101010101010101010101010101


# Main
if __name__ == '__main__':
    # run main if executed from the command line
    # and the main method exists

    if callable(locals().get('main_setup')):
        main_setup()
        exit(0)
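
Pass 6 in the script describes the memory access pattern produced by SingleMemoryStreamPass(16, 256): 16 storage locations visited round-robin with a 256-byte stride. The short sketch below only reproduces that offset sequence as described in the code comment (the helper stream_offsets is hypothetical); it does not show how the pass actually assigns base registers or displacements to the memory operands.

```python
from itertools import cycle, islice


def stream_offsets(locations, stride):
    """Offsets of a round-robin stream: 0, stride, ..., (locations-1)*stride, 0, ..."""
    return cycle(i * stride for i in range(locations))


offsets = list(islice(stream_offsets(16, 256), 18))
print(offsets[:3], offsets[15], offsets[16:])  # [0, 256, 512] 3840 [0, 256]
```

The offsets range from 0 to 3840, so the 16 locations fall within a 4096-byte (16 × 256) region before the pattern repeats.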

The code is self-documented, so reading it is a good way to understand the basic concepts of code generation in Microprobe. To help readers, we summarize and elaborate on the explanations given in the code. The following are the suggested steps to implement a command line tool that generates microbenchmarks using Microprobe:

Define the command line interface and parameters (main_setup() function in the example). This includes:
1. Create a command line interface object
2. Define parameters using the add_option interface
3. Call the actual main with the arguments
Define the function to process the input parameters (_main() function in the example). This includes:
1. Import target definition
2. Get processed arguments
3. Validate and use the arguments to call the actual microbenchmark generation function
Define the function that generates the microbenchmark (_generate_benchmark() function in the example). The main elements are the following:
1. Get the wrapper object. The wrapper object defines the general characteristics of the code being generated, i.e. how the internal representation is translated into the final output file. Such characteristics include, for instance, code prologs (e.g. #include <header.h> directives), the main function declaration, epilogs, etc. In this case, the selected wrapper is CInfGen, which generates C code with an infinite loop of instructions. The resulting code has the following structure:
```
#include <stdio.h>
#include <string.h>

// <declaration of variables>

int main(int argc, char** argv, char** envp) {

    // <initialization_code>

    while(1) {

        // <generated_code>

    } // end while
}
```
  The user can subclass or define their own wrappers to fulfill their needs. See microprobe.code.wrapper.Wrapper for more details.
2. Instantiate the synthesizer. The benchmark synthesizer object is in charge of driving the code generation by applying the set of transformation passes defined by the user.
3. Define the transformation passes. The transformation passes fill the <declaration of variables>, <initialization_code> and <generated_code> sections of the previous code block. The generated code differs depending on the order and the type of the passes applied. Many transformation passes are available; see microprobe.passes and all its submodules for further details. Users can also define their own passes by subclassing the microprobe.passes.Pass class.
4. Finally, once the generation policy is defined, the user only has to synthesize the benchmark and save it to a file.
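
The synthesizer and pass model described in the steps above can be illustrated with a deliberately tiny toy. The names below (ToySynthesizer, building_block, fill_with, init_registers) are hypothetical and are not part of Microprobe; they only mimic the idea that a synthesizer starts from an empty internal representation and applies the passes in the order they were added, so changing the order changes the result.

```python
class ToySynthesizer:
    """Toy stand-in for a synthesizer: applies passes in insertion order."""

    def __init__(self):
        self._passes = []

    def add_pass(self, pass_fn):
        self._passes.append(pass_fn)

    def synthesize(self):
        # Each run starts from an empty "internal representation"
        bench = {"init": [], "body": []}
        for pass_fn in self._passes:
            pass_fn(bench)
        return bench


def building_block(size):
    """Pass that creates `size` empty instruction slots."""
    def apply(bench):
        bench["body"] = [None] * size
    return apply


def fill_with(sequence):
    """Pass that fills the existing slots by cycling through `sequence`."""
    def apply(bench):
        for i in range(len(bench["body"])):
            bench["body"][i] = sequence[i % len(sequence)]
    return apply


def init_registers(value):
    """Pass that records a register initialization step."""
    def apply(bench):
        bench["init"].append("set all GPRs to %#x" % value)
    return apply


synth = ToySynthesizer()
synth.add_pass(init_registers(0x5555))
synth.add_pass(building_block(4))
synth.add_pass(fill_with(["add", "mul"]))
print(synth.synthesize()["body"])  # ['add', 'mul', 'add', 'mul']

# Swapping the order: filling before the block exists leaves nothing to fill
bad = ToySynthesizer()
bad.add_pass(fill_with(["add", "mul"]))
bad.add_pass(building_block(4))
print(bad.synthesize()["body"])  # [None, None, None, None]
```

This mirrors why, in the profile example, the building block pass is added before the pass that fills it with the instruction sequence.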

power_v206_power7_ppc64_linux_gcc_fu_stress.py

The following example shows how to generate microbenchmarks that stress a particular functional unit of the microarchitecture. The code is self-explanatory:

#!/usr/bin/env python
# Copyright 2011-2021 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
power_v206_power7_ppc64_linux_gcc_fu_stress.py

Example module to show how to generate a benchmark stressing a particular
functional unit of the microarchitecture at different rates using the
average latency of instructions as well as the average dependency distance
between the instructions
"""

# Futures
from __future__ import absolute_import

# Built-in modules
import os
import sys
import traceback

# Own modules
import microprobe.code.ins
import microprobe.passes.address
import microprobe.passes.branch
import microprobe.passes.decimal
import microprobe.passes.float
import microprobe.passes.ilp
import microprobe.passes.initialization
import microprobe.passes.instruction
import microprobe.passes.memory
import microprobe.passes.register
import microprobe.passes.structure
import microprobe.utils.cmdline
from microprobe.exceptions import MicroprobeException, \
    MicroprobeTargetDefinitionError
from microprobe.target import import_definition
from microprobe.utils.cmdline import dict_key, existing_dir, \
    float_type, int_type, print_error, print_info
from microprobe.utils.logger import get_logger

__author__ = "Ramon Bertran"
__copyright__ = "Copyright 2011-2021 IBM Corporation"
__credits__ = []
__license__ = "IBM (c) 2011-2021 All rights reserved"
__version__ = "0.5"
__maintainer__ = "Ramon Bertran"
__email__ = "rbertra@us.ibm.com"
__status__ = "Development"  # "Prototype", "Development", or "Production"

# Constants
LOG = get_logger(__name__)  # Get the generic logging interface


# Functions
def main_setup():
    """
    Set up the command line interface (CLI) with the arguments required by
    this command line tool.
    """

    args = sys.argv[1:]

    # Get the target definition
    try:
        target = import_definition("power_v206-power7-ppc64_linux_gcc")
    except MicroprobeTargetDefinitionError as exc:
        print_error("Unable to import target definition")
        print_error("Exception message: %s" % str(exc))
        exit(-1)

    func_units = {}
    valid_units = [elem.name for elem in target.elements.values()]

    for instr in target.isa.instructions.values():
        if instr.execution_units == "None":
            LOG.debug("Execution units for: '%s' not defined", instr.name)
            continue

        for unit in instr.execution_units:
            if unit not in valid_units:
                continue

            if unit not in func_units:
                func_units[unit] = [
                    elem for elem in target.elements.values()
                    if elem.name == unit
                ][0]

    # Create the CLI interface object
    cmdline = microprobe.utils.cmdline.CLI("ISA power v206 FU stress example",
                                           config_options=False,
                                           target_options=False,
                                           debug_options=False)

    # Add the different parameters for this particular tool
    cmdline.add_option("functional_unit",
                       "f", [func_units['ALU']],
                       "Functional units to stress. Default: ALU",
                       required=False,
                       nargs="+",
                       choices=func_units,
                       opt_type=dict_key(func_units),
                       metavar="FUNCTIONAL_UNIT_NAME")

    cmdline.add_option(
        "output_prefix",
        None,
        "POWER_V206_FU_STRESS",
        "Output prefix of the generated files. Default: POWER_V206_FU_STRESS",
        opt_type=str,
        required=False,
        metavar="PREFIX")

    cmdline.add_option("output_path",
                       "O",
                       "./",
                       "Output path. Default: current path",
                       opt_type=existing_dir,
                       metavar="PATH")

    cmdline.add_option(
        "size",
        "S",
        64, "Benchmark size (number of instructions in the endless loop). "
        "Default: 64 instructions",
        opt_type=int_type(1, 2**20),
        metavar="BENCHMARK_SIZE")

    cmdline.add_option("dependency_distance",
                       "D",
                       1000,
                       "Average dependency distance between the instructions. "
                       "Default: 1000 (no dependencies)",
                       opt_type=int_type(1, 1000),
                       metavar="DEPENDENCY_DISTANCE")

    cmdline.add_option("average_latency",
                       "L",
                       2, "Average latency of the selected instructions. "
                       "Default: 2 cycles",
                       opt_type=float_type(1, 1000),
                       metavar="AVERAGE_LATENCY")

    # Start the main
    print_info("Processing input arguments...")
    cmdline.main(args, _main)


def _main(arguments):
    """
    Main program. Called after the arguments from the CLI interface have
    been processed.
    """

    print_info("Arguments processed!")

    print_info("Importing target definition "
               "'power_v206-power7-ppc64_linux_gcc'...")
    target = import_definition("power_v206-power7-ppc64_linux_gcc")

    # Get the arguments
    functional_units = arguments["functional_unit"]
    prefix = arguments["output_prefix"]
    output_path = arguments["output_path"]
    size = arguments["size"]
    latency = arguments["average_latency"]
    distance = arguments["dependency_distance"]

    if functional_units is None:
        functional_units = ["ALL"]

    _generate_benchmark(target, "%s/%s_" % (output_path, prefix),
                        (functional_units, size, latency, distance))


def _generate_benchmark(target, output_prefix, args):
    """
    Actual benchmark generation policy. This is the function that defines
    how the microbenchmarks are going to be generated
    """

    functional_units, size, latency, distance = args

    try:

        # Name of the output file
        func_unit_names = [unit.name for unit in functional_units]
        fname = "%s%s" % (output_prefix, "_".join(func_unit_names))
        fname = "%s_LAT_%s" % (fname, latency)
        fname = "%s_DEP_%s" % (fname, distance)

        # Name of the fail output file (generated in case of exception)
        ffname = "%s.c.fail" % (fname)

        print_info("Generating %s ..." % (fname))

        # Get the wrapper object. The wrapper object is in charge of
        # translating the internal representation of the microbenchmark
        # to the final output format.
        #
        # In this case, we obtain the 'CInfGen' wrapper, which embeds
        # the generated code within an infinite loop using C plus
        # in-line assembly statements.
        cwrapper = microprobe.code.get_wrapper("CInfGen")

        # Create the synthesizer object, which is in charge of driving the
        # generation of the microbenchmark, given a set of passes
        # (a.k.a. transformations) to apply to an empty internal
        # representation of the microbenchmark
        synth = microprobe.code.Synthesizer(target,
                                            cwrapper(),
                                            value=0b01010101)

        # Add the transformation passes

        #######################################################################
        # Pass 1: Init integer registers to a given value                     #
        #######################################################################
        synth.add_pass(
            microprobe.passes.initialization.InitializeRegistersPass(
                value=_init_value()))

        #######################################################################
        # Pass 2: Add a building block of size 'size'                         #
        #######################################################################
        synth.add_pass(
            microprobe.passes.structure.SimpleBuildingBlockPass(size))

        #######################################################################
        # Pass 3: Fill the building block with the instruction sequence       #
        #######################################################################
        synth.add_pass(
            microprobe.passes.instruction.SetInstructionTypeByElementPass(
                target, functional_units, {}))

        #######################################################################
        # Pass 4: Compute addresses of instructions (this pass is needed to   #
        #         update the internal representation information so that in   #
        #         case addresses are required, they are up to date).          #
        #######################################################################
        synth.add_pass(
            microprobe.passes.address.UpdateInstructionAddressesPass())

        #######################################################################
        # Pass 5: Set target of branches to be the next instruction in the    #
        #         instruction stream                                          #
        #######################################################################
        synth.add_pass(microprobe.passes.branch.BranchNextPass())

        #######################################################################
        # Pass 6: Set memory-related operands to access 16 storage locations  #
        #         in a round-robin fashion in stride 256 bytes.               #
        #         The pattern would be: 0, 256, 512, .... 3840, 0, 256, ...   #
        #######################################################################
        synth.add_pass(microprobe.passes.memory.SingleMemoryStreamPass(
            16, 256))

        #######################################################################
        # Pass 7.A: Initialize the storage locations accessed by floating     #
        #           point instructions to have a valid floating point value   #
        #######################################################################
        synth.add_pass(
            microprobe.passes.float.InitializeMemoryFloatPass(
                value=1.000000000000001))

        #######################################################################
        # Pass 7.B: Initialize the storage locations accessed by decimal      #
        #           instructions to have a valid decimal value                #
        #######################################################################
        synth.add_pass(
            microprobe.passes.decimal.InitializeMemoryDecimalPass(value=1))

        #######################################################################
        # Pass 8: Set the remaining instructions operands (if not set)        #
        #         (Required to set remaining immediate operands)              #
        #######################################################################
        synth.add_pass(
            microprobe.passes.register.DefaultRegisterAllocationPass(
                dd=distance))

        # Synthesize the microbenchmark. The synthesizer applies the set of
        # transformation passes added before and returns an object
        # representing the microbenchmark
        bench = synth.synthesize()

        # Save the microbenchmark to the file 'fname'
        synth.save(fname, bench=bench)

        print_info("%s generated!" % (fname))

        # Remove fail file if exists
        if os.path.isfile(ffname):
            os.remove(ffname)

    except MicroprobeException:

        # In case of exception during the generation of the microbenchmark,
        # print the error, write the fail file and exit
        print_error(traceback.format_exc())
        open(ffname, 'a').close()
        exit(-1)


def _init_value():
    """ Return an init value """
    return 0b0101010101010101010101010101010101010101010101010101010101010101


# Main
if __name__ == '__main__':
    # run main if executed from the command line
    # and the main method exists

    if callable(locals().get('main_setup')):
        main_setup()
        exit(0)
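Pass 6 in the script above is documented as accessing 16 storage locations in a round-robin fashion with a 256-byte stride. The plain-Python helper below reproduces that offset sequence so the pattern is explicit. The helper `round_robin_offsets` is hypothetical and does not call Microprobe. It only mirrors the pattern described in the comment, not the internals of `SingleMemoryStreamPass`.

```python
from itertools import cycle, islice
from typing import List


def round_robin_offsets(locations: int, stride: int, count: int) -> List[int]:
    """Byte offsets of the first `count` accesses that cycle over
    `locations` slots spaced `stride` bytes apart."""
    slots = [slot * stride for slot in range(locations)]
    return list(islice(cycle(slots), count))


# Parameters used by Pass 6 in the script: 16 locations, stride 256 bytes
offsets = round_robin_offsets(16, 256, 18)
print(offsets[:3])   # [0, 256, 512]
print(offsets[14:])  # [3584, 3840, 0, 256]
print(16 * 256)      # 4096 bytes spanned by the 16 locations
```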

power_v206_power7_ppc64_linux_gcc_memory.py

The following example shows how to create microbenchmarks with different levels of activity (stress) on the different levels of the cache hierarchy. As the example also shows, it is not necessary to use the built-in command line interface provided by Microprobe.

#!/usr/bin/env python
# Copyright 2011-2021 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
power_v206_power7_ppc64_linux_gcc_memory.py

Example python script to show how to generate microbenchmarks with particular
levels of activity in the memory hierarchy.
"""

# Futures
from __future__ import absolute_import

# Built-in modules
import multiprocessing as mp
import os
import random
import sys
from typing import List, Tuple

# Own modules
import microprobe.code
import microprobe.passes.address
import microprobe.passes.ilp
import microprobe.passes.initialization
import microprobe.passes.instruction
import microprobe.passes.memory
import microprobe.passes.register
import microprobe.passes.structure
from microprobe import MICROPROBE_RC
from microprobe.exceptions import MicroprobeTargetDefinitionError
from microprobe.model.memory import EndlessLoopDataMemoryModel
from microprobe.target import import_definition
from microprobe.target.isa.instruction import InstructionType
from microprobe.target.uarch.cache import SetAssociativeCache
from microprobe.utils.cmdline import print_error, print_info
from microprobe.utils.typeguard_decorator import typeguard_testsuite

__author__ = "Ramon Bertran"
__copyright__ = "Copyright 2011-2021 IBM Corporation"
__credits__ = []
__license__ = "IBM (c) 2011-2021 All rights reserved"
__version__ = "0.5"
__maintainer__ = "Ramon Bertran"
__email__ = "rbertra@us.ibm.com"
__status__ = "Development"  # "Prototype", "Development", or "Production"

# Get the target definition
try:
    TARGET = import_definition("power_v206-power7-ppc64_linux_gcc")
except MicroprobeTargetDefinitionError as exc:
    print_error("Unable to import target definition")
    print_error("Exception message: %s" % str(exc))
    exit(-1)

assert TARGET.microarchitecture is not None, \
    "Target must have a defined microarchitecture"

BASE_ELEMENT = [
    element for element in TARGET.microarchitecture.elements.values()
    if element.name == 'L1D'
][0]
CACHE_HIERARCHY: List[SetAssociativeCache] = \
    TARGET.microarchitecture.cache_hierarchy.get_data_hierarchy_from_element(
        BASE_ELEMENT)

# Benchmark size
BENCHMARK_SIZE = 8 * 1024

# Fill a list of the models to be generated

MEMORY_MODELS: List[Tuple[str, List[SetAssociativeCache], List[int]]] = []

#
# Due to performance issues (long exec. time) this
# model is disabled
#
# MEMORY_MODELS.append(
#    (
#        "ALL", CACHE_HIERARCHY, [
#            25, 25, 25, 25]))

MEMORY_MODELS.append(("L1", CACHE_HIERARCHY, [100, 0, 0, 0]))
MEMORY_MODELS.append(("L2", CACHE_HIERARCHY, [0, 100, 0, 0]))
MEMORY_MODELS.append(("L3", CACHE_HIERARCHY, [0, 0, 100, 0]))
MEMORY_MODELS.append(("L1L3", CACHE_HIERARCHY, [50, 0, 50, 0]))
MEMORY_MODELS.append(("L1L2", CACHE_HIERARCHY, [50, 50, 0, 0]))
MEMORY_MODELS.append(("L2L3", CACHE_HIERARCHY, [0, 50, 50, 0]))
MEMORY_MODELS.append(("CACHES", CACHE_HIERARCHY, [33, 33, 34, 0]))
MEMORY_MODELS.append(("MEM", CACHE_HIERARCHY, [0, 0, 0, 100]))

# Enable parallel generation
PARALLEL = False

DIRECTORY = None


@typeguard_testsuite
def main():
    """Main function. """
    # call the generate method for each model in the memory model list

    if PARALLEL:
        print_info("Start parallel execution...")
        pool = mp.Pool(processes=MICROPROBE_RC['cpus'])
        pool.map(generate, MEMORY_MODELS, 1)
    else:
        print_info("Start sequential execution...")
        list(map(generate, MEMORY_MODELS))

    exit(0)


@typeguard_testsuite
def generate(model: Tuple[str, List[SetAssociativeCache], List[int]]):
    """Benchmark generation policy function. """

    assert DIRECTORY is not None, "DIRECTORY variable cannot be None"

    print_info(f"Creating memory model '{model[0]}' ...")
    memmodel = EndlessLoopDataMemoryModel(*model)

    modelname = memmodel.name

    print_info(f"Generating Benchmark mem-{modelname} ...")

    # Get the architecture
    garch = TARGET

    # Collect all the supported instructions that access storage
    sequence: List[InstructionType] = []
    for instr_name in sorted(garch.isa.instructions.keys()):

        instr = garch.isa.instructions[instr_name]

        if not instr.access_storage:
            continue
        if instr.privileged:  # Skip privileged
            continue
        if instr.hypervisor:  # Skip hypervisor
            continue
        if instr.trap:  # Skip traps
            continue
        if "String" in instr.description:  # Skip unsupported string instr.
            continue
        if "Multiple" in instr.description:  # Skip unsupported mult. ld/sts
            continue
        if instr.category in ['LMA', 'LMV', 'DS', 'EC',
                              'WT']:  # Skip unsupported categories
            continue
        if instr.access_storage_with_update:  # Not supported by mem. model
            continue
        if "Reserve Indexed" in instr.description:  # Skip (illegal intr.)
            continue
        if "Conditional Indexed" in instr.description:  # Skip (illegal intr.)
            continue
        if instr.name in ['LD_V1', 'LWZ_V1', 'STW_V1']:
            continue

        sequence.append(instr)

    # Get the loop wrapper. In this case we take the 'CInfPpc', which
    # generates an infinite loop in C using PowerPC embedded assembly.
    cwrapper = microprobe.code.get_wrapper("CInfPpc")

    # Define function to return random numbers (used afterwards)
    def rnd():
        """Return a random value. """
        return random.randrange(0, (1 << 64) - 1)

    # Create the benchmark synthesizer
    synth = microprobe.code.Synthesizer(garch, cwrapper())

    ##########################################################################
    # Add the passes we want to apply to synthesize benchmarks               #
    ##########################################################################

    # --> Init registers to random values
    synth.add_pass(
        microprobe.passes.initialization.InitializeRegistersPass(value=rnd))

    # --> Add a single basic block of size 'size'
    if memmodel.name in ['MEM']:
        synth.add_pass(
            microprobe.passes.structure.SimpleBuildingBlockPass(
                BENCHMARK_SIZE * 4))
    else:
        synth.add_pass(
            microprobe.passes.structure.SimpleBuildingBlockPass(
                BENCHMARK_SIZE))

    # --> Fill the basic block using the sequence of instructions provided
    synth.add_pass(
        microprobe.passes.instruction.SetInstructionTypeBySequencePass(
            sequence))

    # --> Set the memory operations parameters to fulfill the given model
    synth.add_pass(microprobe.passes.memory.GenericMemoryModelPass(memmodel))

    # --> Set the dependency distance and the default allocation. Sets the
    # remaining undefined instruction operands (register allocation,...)
    synth.add_pass(microprobe.passes.register.NoHazardsAllocationPass())
    synth.add_pass(
        microprobe.passes.register.DefaultRegisterAllocationPass(dd=0))

    # Generate the benchmark (applies the passes).
    bench = synth.synthesize()

    print_info(f"Benchmark mem-{modelname} saving to disk...")

    # Save the benchmark
    synth.save(f"{DIRECTORY}/mem-{modelname}", bench=bench)

    print_info(f"Benchmark mem-{modelname} generated")
    return True


if __name__ == '__main__':
    # run main if executed from the command line
    # and the main method exists

    if len(sys.argv) != 2:
        print_info("Usage:")
        print_info("%s output_dir" % (sys.argv[0]))
        exit(-1)

    DIRECTORY = sys.argv[1]

    if not os.path.isdir(DIRECTORY):
        print_error(f"Output directory '{DIRECTORY}' does not exist")
        exit(-1)

    main()
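Each entry in MEMORY_MODELS above gives a percentage weight per level of the data hierarchy. The sketch below illustrates what those percentages mean for a fixed number of memory accesses. It rests on two assumptions. First, the weight lists are taken to be ordered L1, L2, L3, main memory, as the model names suggest. Second, the helper `accesses_per_level` is hypothetical: it is not how `EndlessLoopDataMemoryModel` computes its access pattern, it only converts the weights into counts.

```python
from typing import Dict, List, Tuple

# Assumed level order of each weight list, as suggested by the model names
LEVELS = ["L1", "L2", "L3", "MEM"]

# Same (name, weights) pairs as MEMORY_MODELS in the script above
MODELS: List[Tuple[str, List[int]]] = [
    ("L1", [100, 0, 0, 0]),
    ("L2", [0, 100, 0, 0]),
    ("L3", [0, 0, 100, 0]),
    ("L1L3", [50, 0, 50, 0]),
    ("L1L2", [50, 50, 0, 0]),
    ("L2L3", [0, 50, 50, 0]),
    ("CACHES", [33, 33, 34, 0]),
    ("MEM", [0, 0, 0, 100]),
]


def accesses_per_level(weights: List[int], total: int) -> Dict[str, int]:
    """Hypothetical helper: turn percentage weights into access counts."""
    if len(weights) != len(LEVELS) or sum(weights) != 100:
        raise ValueError("expected one weight per level, summing to 100")
    counts = [total * weight // 100 for weight in weights]
    # Give the rounding leftover to the most heavily weighted level
    counts[weights.index(max(weights))] += total - sum(counts)
    return dict(zip(LEVELS, counts))


for name, weights in MODELS:
    print(name, accesses_per_level(weights, 1000))
# The CACHES line prints {'L1': 330, 'L2': 330, 'L3': 340, 'MEM': 0}
```

The CACHES model uses 33/33/34 rather than 33/33/33 so that every model's weights add up to exactly 100.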

power_v206_power7_ppc64_linux_gcc_random.py

The following example generates random microbenchmarks:

#!/usr/bin/env python
# Copyright 2011-2021 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
power_v206_power7_ppc64_linux_gcc_random.py

Example python script to show how to generate random microbenchmarks.
"""

# Futures
from __future__ import absolute_import

# Built-in modules
import multiprocessing as mp
import os
import random
import sys
from typing import List

# Own modules
import microprobe.code
import microprobe.passes.address
import microprobe.passes.branch
import microprobe.passes.ilp
import microprobe.passes.initialization
import microprobe.passes.instruction
import microprobe.passes.memory
import microprobe.passes.register
import microprobe.passes.structure
from microprobe import MICROPROBE_RC
from microprobe.exceptions import MicroprobeError, \
    MicroprobeTargetDefinitionError
from microprobe.model.memory import EndlessLoopDataMemoryModel
from microprobe.target import import_definition
from microprobe.target.isa.instruction import InstructionType
from microprobe.utils.cmdline import print_error, print_info
from microprobe.utils.typeguard_decorator import typeguard_testsuite

__author__ = "Ramon Bertran"
__copyright__ = "Copyright 2011-2021 IBM Corporation"
__credits__ = []
__license__ = "IBM (c) 2011-2021 All rights reserved"
__version__ = "0.5"
__maintainer__ = "Ramon Bertran"
__email__ = "rbertra@us.ibm.com"
__status__ = "Development"  # "Prototype", "Development", or "Production"

# Benchmark size
BENCHMARK_SIZE = 8 * 1024

# Get the target definition
try:
    TARGET = import_definition("power_v206-power7-ppc64_linux_gcc")
except MicroprobeTargetDefinitionError as exc:
    print_error("Unable to import target definition")
    print_error("Exception message: %s" % str(exc))
    exit(-1)

assert TARGET.microarchitecture is not None, \
    "Target must have a defined microarchitecture"
BASE_ELEMENT = [
    element for element in TARGET.microarchitecture.elements.values()
    if element.name == 'L1D'
][0]
CACHE_HIERARCHY = \
    TARGET.microarchitecture.cache_hierarchy.get_data_hierarchy_from_element(
        BASE_ELEMENT)

PARALLEL = True

DIRECTORY = None


@typeguard_testsuite
def main():
    """ Main program. """
    if PARALLEL:
        pool = mp.Pool(processes=MICROPROBE_RC['cpus'])
        pool.map(generate, list(range(0, 100)), 1)
    else:
        list(map(generate, list(range(0, 100))))


@typeguard_testsuite
def generate(name: int):
    """ Benchmark generation policy. """

    assert DIRECTORY is not None, "DIRECTORY variable cannot be None"

    if os.path.isfile(f"{DIRECTORY}/random-{name}.c"):
        print_info(f"Skip {name}")
        return

    print_info(f"Generating {name}...")

    # Seed the randomness
    rand = random.Random()
    rand.seed(64)  # My favorite number ;)

    # Generate a random memory model (used afterwards)
    model: List[int] = []
    total = 100
    for mcomp in CACHE_HIERARCHY[0:-1]:
        weight = rand.randint(0, total)
        model.append(weight)
        print_info("%s: %d%%" % (mcomp, weight))
        total = total - weight

    # Fix remaining
    level = rand.randint(0, len(CACHE_HIERARCHY[0:-1]) - 1)
    model[level] += total

    # Last level always zero
    model.append(0)

    # Sanity check
    psum = 0
    for elem in model:
        psum += elem
    assert psum == 100

    modelobj = EndlessLoopDataMemoryModel("random-%s" % name, CACHE_HIERARCHY,
                                          model)

    # Get the loop wrapper. In this case we take the 'CInfPpc', which
    # generates an infinite loop in C using PowerPC embedded assembly.
    cwrapper = microprobe.code.get_wrapper("CInfPpc")

    # Define function to return random numbers (used afterwards)
    def rnd():
        """Return a random value. """
        return rand.randrange(0, (1 << 64) - 1)

    # Create the benchmark synthesizer
    synth = microprobe.code.Synthesizer(TARGET, cwrapper())

    ##########################################################################
    # Add the passes we want to apply to synthesize benchmarks               #
    ##########################################################################

    # --> Init registers to random values
    synth.add_pass(
        microprobe.passes.initialization.InitializeRegistersPass(value=rnd))

    # --> Add a single basic block of size size
    synth.add_pass(
        microprobe.passes.structure.SimpleBuildingBlockPass(BENCHMARK_SIZE))

    # --> Fill the basic block with instructions picked randomly from the list
    #     provided

    instructions: List[InstructionType] = []
    for instr in TARGET.isa.instructions.values():

        if instr.privileged:  # Skip privileged
            continue
        if instr.hypervisor:  # Skip hypervisor
            continue
        if instr.trap:  # Skip traps
            continue
        if instr.syscall:  # Skip syscall
            continue
        if "String" in instr.description:  # Skip unsupported string instr.
            continue
        if "Multiple" in instr.description:  # Skip unsupported mult. ld/sts
            continue
        if instr.category in ['LMA', 'LMV', 'DS', 'EC',
                              'WT']:  # Skip unsupported categories
            continue
        if instr.access_storage_with_update:  # Not supported by mem. model
            continue
        if instr.branch and not instr.branch_relative:  # Skip branches
            continue
        if "Reserve Indexed" in instr.description:  # Skip (illegal intr.)
            continue
        if "Conditional Indexed" in instr.description:  # Skip (illegal intr.)
            continue
        if instr.name in [
                'LD_V1',
                'LWZ_V1',
                'STW_V1',
        ]:
            continue

        instructions.append(instr)

    synth.add_pass(
        microprobe.passes.instruction.SetRandomInstructionTypePass(
            instructions, rand))

    # --> Set the memory operations parameters to fulfill the given model
    synth.add_pass(microprobe.passes.memory.GenericMemoryModelPass(modelobj))

    # --> Set target of branches to next instruction (first compute addresses)
    synth.add_pass(microprobe.passes.address.UpdateInstructionAddressesPass())
    synth.add_pass(microprobe.passes.branch.BranchNextPass())

    # --> Set the dependency distance and the default allocation. Dependency
    #     distance is randomly picked
    synth.add_pass(
        microprobe.passes.register.DefaultRegisterAllocationPass(
            dd=rand.randint(1, 20)))

    # Generate the benchmark (applies the passes)
    # Since it is a randomly generated code, the generation might fail
    # (e.g. not enough access to fulfill the requested memory model, etc.)
    # Because of that, we handle the exception accordingly.
    try:
        print_info(f"Synthesizing {name}...")
        bench = synth.synthesize()
        print_info(f"Synthesized {name}!")
        # Save the benchmark
        synth.save(f"{DIRECTORY}/random-{name}", bench=bench)
    except MicroprobeError:
        print_info(f"Synthesizing error in '{name}'. This is Ok.")

    return True


if __name__ == '__main__':
    # run main if executed from the command line
    # and the main method exists

    if len(sys.argv) != 2:
        print_info("Usage:")
        print_info("%s output_dir" % (sys.argv[0]))
        exit(-1)

    DIRECTORY = sys.argv[1]

    if not os.path.isdir(DIRECTORY):
        print_error(f"Output directory '{DIRECTORY}' does not exist")
        exit(-1)

    if callable(locals().get('main')):
        main()
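The random memory model built inside `generate()` works as follows. It draws a weight for each cache level from whatever percentage is still unassigned. It then gives the remainder to one randomly chosen cache level and forces the last level (main memory) to zero. The standalone sketch below reproduces that logic so it can be checked in isolation. The function name `random_memory_model` is hypothetical. The hierarchy is assumed to have four levels (L1, L2, L3, memory), matching the memory example.

```python
import random
from typing import List


def random_memory_model(rand: random.Random, levels: int) -> List[int]:
    """Randomly split 100% over the first `levels - 1` entries; the last
    entry (main memory) is always 0, mirroring generate() above."""
    model: List[int] = []
    total = 100
    for _ in range(levels - 1):
        weight = rand.randint(0, total)
        model.append(weight)
        total -= weight
    # Whatever is left goes to one randomly chosen cache level
    model[rand.randint(0, levels - 2)] += total
    model.append(0)
    return model


for seed in (1, 2, 64):
    weights = random_memory_model(random.Random(seed), 4)
    print(seed, weights, sum(weights))  # the sum is always 100
```

A generator created with the same seed always yields the same split. Because `generate()` reseeds its generator with the constant 64 on every call, each call draws the same sequence of random numbers.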

power_v206_power7_ppc64_linux_gcc_custom.py

The following example shows several ways to customize the generation of microbenchmarks:

#!/usr/bin/env python
# Copyright 2011-2021 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
power_v206_power7_ppc64_linux_gcc_custom.py

Example python script to show different ways of customizing the generation
of microbenchmarks.
"""

# Futures
from __future__ import absolute_import

# Built-in modules
import os
import sys

# Own modules
import microprobe.code
import microprobe.passes.initialization
import microprobe.passes.instruction
import microprobe.passes.memory
import microprobe.passes.register
import microprobe.passes.structure
from microprobe.exceptions import MicroprobeTargetDefinitionError
from microprobe.model.memory import EndlessLoopDataMemoryModel
from microprobe.target import import_definition
from microprobe.utils.cmdline import print_error, print_info
from microprobe.utils.misc import RNDINT

__author__ = "Ramon Bertran"
__copyright__ = "Copyright 2011-2021 IBM Corporation"
__credits__ = []
__license__ = "IBM (c) 2011-2021 All rights reserved"
__version__ = "0.5"
__maintainer__ = "Ramon Bertran"
__email__ = "rbertra@us.ibm.com"
__status__ = "Development"  # "Prototype", "Development", or "Production"

# Benchmark size
BENCHMARK_SIZE = 8 * 1024

if len(sys.argv) != 2:
    print_info("Usage:")
    print_info("%s output_dir" % (sys.argv[0]))
    exit(-1)

DIRECTORY = sys.argv[1]

if not os.path.isdir(DIRECTORY):
    print_error("Output directory '%s' does not exist" % (DIRECTORY))
    exit(-1)

# Get the target definition
try:
    TARGET = import_definition("power_v206-power7-ppc64_linux_gcc")
except MicroprobeTargetDefinitionError as exc:
    print_error("Unable to import target definition")
    print_error("Exception message: %s" % str(exc))
    exit(-1)


###############################################################################
# Example 1: loop with instructions accessing storage, hitting the first      #
#            level of cache and with dependency distance of 3                 #
###############################################################################
def example_1():
    """ Example 1 """
    name = "L1-LOADS"

    base_element = [
        element for element in TARGET.elements.values()
        if element.name == 'L1D'
    ][0]
    cache_hierarchy = TARGET.cache_hierarchy.get_data_hierarchy_from_element(
        base_element)

    model = [0] * len(cache_hierarchy)
    model[0] = 100

    mmodel = EndlessLoopDataMemoryModel("random-%s" % name, cache_hierarchy,
                                        model)

    profile = {}
    for instr_name in sorted(TARGET.instructions.keys()):
        instr = TARGET.instructions[instr_name]
        if not instr.access_storage:
            continue
        if instr.privileged:  # Skip privileged
            continue
        if instr.hypervisor:  # Skip hypervisor
            continue
        if "String" in instr.description:  # Skip unsupported string instr.
            continue
        if "ultiple" in instr.description:  # Skip unsupported mult. ld/sts
            continue
        if instr.category in ['DS', 'LMA', 'LMV',
                              'EC']:  # Skip unsupported categories
            continue
        if instr.access_storage_with_update:  # Not supported
            continue

        if instr.name in [
                'LD_V1',
                'LWZ_V1',
                'STW_V1',
        ]:
            continue

        if (any([moper.is_load for moper in instr.memory_operand_descriptors])
                and all([
                    not moper.is_store
                    for moper in instr.memory_operand_descriptors
                ])):
            profile[instr] = 1

    cwrapper = microprobe.code.get_wrapper("CInfPpc")
    synth = microprobe.code.Synthesizer(TARGET, cwrapper())

    synth.add_pass(
        microprobe.passes.structure.SimpleBuildingBlockPass(BENCHMARK_SIZE))
    synth.add_pass(
        microprobe.passes.initialization.InitializeRegistersPass(value=RNDINT))
    synth.add_pass(
        microprobe.passes.initialization.InitializeRegisterPass("GPR1",
                                                                0,
                                                                force=True,
                                                                reserve=True))
    synth.add_pass(
        microprobe.passes.instruction.SetInstructionTypeByProfilePass(profile))
    synth.add_pass(microprobe.passes.memory.GenericMemoryModelPass(mmodel))
    synth.add_pass(
        microprobe.passes.register.DefaultRegisterAllocationPass(dd=3))

    print_info("Generating %s..." % name)
    bench = synth.synthesize()
    print_info("%s Generated!" % name)
    synth.save("%s/%s" % (DIRECTORY, name), bench=bench)  # Save the benchmark


###############################################################################
# Example 2: loop with instructions using the MUL unit and with dependency    #
#            distance of 4                                                    #
###############################################################################
def example_2():
    """ Example 2 """
    name = "FXU-MUL"

    cwrapper = microprobe.code.get_wrapper("CInfPpc")
    synth = microprobe.code.Synthesizer(TARGET, cwrapper())

    synth.add_pass(
        microprobe.passes.initialization.InitializeRegistersPass(value=RNDINT))
    synth.add_pass(
        microprobe.passes.structure.SimpleBuildingBlockPass(BENCHMARK_SIZE))
    synth.add_pass(
        microprobe.passes.instruction.SetInstructionTypeByElementPass(
            TARGET, [TARGET.elements['MUL_FXU0_Core0_SCM_Processor']], {}))
    synth.add_pass(
        microprobe.passes.register.DefaultRegisterAllocationPass(dd=4))

    print_info("Generating %s..." % name)
    bench = synth.synthesize()
    print_info("%s Generated!" % name)
    synth.save("%s/%s" % (DIRECTORY, name), bench=bench)  # Save the benchmark


###############################################################################
# Example 3: loop with instructions using the ALU unit and with dependency    #
#            distance of 1                                                    #
###############################################################################
def example_3():
    """ Example 3 """
    name = "FXU-ALU"

    cwrapper = microprobe.code.get_wrapper("CInfPpc")
    synth = microprobe.code.Synthesizer(TARGET, cwrapper())

    synth.add_pass(
        microprobe.passes.initialization.InitializeRegistersPass(value=RNDINT))
    synth.add_pass(
        microprobe.passes.structure.SimpleBuildingBlockPass(BENCHMARK_SIZE))
    synth.add_pass(
        microprobe.passes.instruction.SetInstructionTypeByElementPass(
            TARGET, [TARGET.elements['ALU_FXU0_Core0_SCM_Processor']], {}))
    synth.add_pass(
        microprobe.passes.register.DefaultRegisterAllocationPass(dd=1))

    print_info("Generating %s..." % name)
    bench = synth.synthesize()
    print_info("%s Generated!" % name)
    synth.save("%s/%s" % (DIRECTORY, name), bench=bench)  # Save the benchmark


###############################################################################
# Example 4: loop with FMUL* instructions with different weights and with     #
#            dependency distance 10                                           #
###############################################################################
def example_4():
    """ Example 4 """
    name = "VSU-FMUL"

    profile = {}
    profile[TARGET.instructions['FMUL_V0']] = 4
    profile[TARGET.instructions['FMULS_V0']] = 3
    profile[TARGET.instructions['FMULx_V0']] = 2
    profile[TARGET.instructions['FMULSx_V0']] = 1

    cwrapper = microprobe.code.get_wrapper("CInfPpc")
    synth = microprobe.code.Synthesizer(TARGET, cwrapper())

    synth.add_pass(
        microprobe.passes.initialization.InitializeRegistersPass(value=RNDINT))
    synth.add_pass(
        microprobe.passes.structure.SimpleBuildingBlockPass(BENCHMARK_SIZE))
    synth.add_pass(
        microprobe.passes.instruction.SetInstructionTypeByProfilePass(profile))
    synth.add_pass(
        microprobe.passes.register.DefaultRegisterAllocationPass(dd=10))

    print_info("Generating %s..." % name)
    bench = synth.synthesize()
    print_info("%s Generated!" % name)
    synth.save("%s/%s" % (DIRECTORY, name), bench=bench)  # Save the benchmark


###############################################################################
# Example 5: loop with FADD* instructions with different weights and with     #
#            dependency distance 1                                            #
###############################################################################
def example_5():
    """ Example 5 """
    name = "VSU-FADD"

    profile = {}
    profile[TARGET.instructions['FADD_V0']] = 100
    profile[TARGET.instructions['FADDx_V0']] = 1
    profile[TARGET.instructions['FADDS_V0']] = 10
    profile[TARGET.instructions['FADDSx_V0']] = 1

    cwrapper = microprobe.code.get_wrapper("CInfPpc")
    synth = microprobe.code.Synthesizer(TARGET, cwrapper())

    synth.add_pass(
        microprobe.passes.initialization.InitializeRegistersPass(value=RNDINT))
    synth.add_pass(
        microprobe.passes.structure.SimpleBuildingBlockPass(BENCHMARK_SIZE))
    synth.add_pass(
        microprobe.passes.instruction.SetInstructionTypeByProfilePass(profile))
    synth.add_pass(
        microprobe.passes.register.DefaultRegisterAllocationPass(dd=1))

    print_info("Generating %s..." % name)
    bench = synth.synthesize()
    print_info("%s Generated!" % name)
    synth.save("%s/%s" % (DIRECTORY, name), bench=bench)  # Save the benchmark


###############################################################################
# Call the examples                                                           #
###############################################################################
example_1()
example_2()
example_3()
example_4()
example_5()
exit(0)
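The custom examples above each combine three passes: a building block of fixed size, an instruction mix taken from a weighted profile (Examples 1, 4 and 5) or from a functional unit (Examples 2 and 3), and a register allocation with a fixed dependency distance (`dd`). The following plain-Python sketch shows one way the profile and dependency-distance ideas can fit together. It is not Microprobe's implementation. It assumes that profile weights are relative frequencies split proportionally over the block (largest-remainder rounding) and that instruction types are interleaved round-robin. It also uses shortened instruction names and hypothetical register names `r0`, `r1`, and so on. Microprobe's actual selection and allocation policies may differ.

```python
from typing import Dict, List, Tuple


def apportion(profile: Dict[str, int], size: int) -> Dict[str, int]:
    """Split `size` slots among instruction names in proportion to their
    weights, using largest-remainder rounding (an assumed policy)."""
    total = sum(profile.values())
    exact = {name: size * weight / total for name, weight in profile.items()}
    counts = {name: int(value) for name, value in exact.items()}
    leftover = size - sum(counts.values())
    # Give the remaining slots to the names with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n],
                          reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


def build_sequence(counts: Dict[str, int]) -> List[str]:
    """Interleave instruction types round-robin so each one is spread out."""
    remaining = dict(counts)
    sequence: List[str] = []
    while any(remaining.values()):
        for name in counts:
            if remaining[name] > 0:
                sequence.append(name)
                remaining[name] -= 1
    return sequence


def allocate(sequence: List[str], dd: int) -> List[Tuple[str, str, str]]:
    """Return (instruction, destination, source) triples in which every
    instruction reads the value produced exactly `dd` instructions earlier.

    Instruction i writes and reads hypothetical register r(i mod dd), which
    was last written by instruction i - dd (the first dd instructions read
    the initial register values).
    """
    if dd < 1:
        raise ValueError("dependency distance must be >= 1")
    triples = []
    for i, name in enumerate(sequence):
        reg = "r%d" % (i % dd)
        triples.append((name, reg, reg))
    return triples


if __name__ == "__main__":
    # Weights and distance taken from Example 4; block size 20 is illustrative.
    profile = {"FMUL": 4, "FMULS": 3, "FMULx": 2, "FMULSx": 1}
    block = allocate(build_sequence(apportion(profile, 20)), dd=10)
    for name, dest, src in block[:12]:
        print("%-7s %s <- %s" % (name, dest, src))
```

With the Example 4 weights (4:3:2:1) and a 20-instruction block, this sketch yields 8, 6, 4 and 2 instances of each type. With `dd=3`, instruction i reads the register last written by instruction i - 3, while `dd=1` turns the block into a single serial dependency chain.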

power_v206_power7_ppc64_linux_gcc_genetic.py

Deprecated since version 0.5: Support for PyEvolve and genetic-algorithm-based searches has been discontinued.

The following example shows how to use the design exploration module and genetic-algorithm-based searches to look for a solution. In particular, for each functional unit (or combination of units) of the architecture and a range of IPCs (instructions per cycle), the example searches for a microbenchmark that stresses that unit at the given IPC. External commands (not included) are needed to evaluate the generated microbenchmarks on the target platform.
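The script below delegates the search to `microprobe.driver.genetic.ExecCmdDriver`, whose internals are not shown. To make the idea concrete, here is a minimal, self-contained sketch of a genetic search over the same two knobs, using the ranges that appear in `ga_params`: dependency distance in [0, 20] and average latency in [2, 8]. It is a stand-in under stated assumptions, not the PyEvolve-based driver. It uses truncation selection, uniform crossover, Gaussian mutation, and elitism, and it receives the score as a Python callable. The third element of each `ga_params` tuple is not explained in the source, so the sketch ignores it. In the real flow, scoring an individual means generating a benchmark with those knob values and running the external evaluation command; the toy objective in the demo is hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_search(score: Callable[[List[float]], float],
                   bounds: Sequence[Tuple[float, float]],
                   population: int = 20, generations: int = 30,
                   mutation_rate: float = 0.2,
                   seed: int = 0) -> Tuple[List[float], float]:
    """Maximize score(knobs) with a small real-valued genetic algorithm."""
    rng = random.Random(seed)

    def mutate(child: List[float]) -> None:
        # Gaussian step scaled to 10% of each knob's range, clamped to bounds.
        for k, (low, high) in enumerate(bounds):
            if rng.random() < mutation_rate:
                step = rng.gauss(0.0, 0.1 * (high - low))
                child[k] = max(low, min(high, child[k] + step))

    pop = [[rng.uniform(low, high) for low, high in bounds]
           for _ in range(population)]
    best: List[float] = []
    best_score = float("-inf")

    for gen in range(generations):
        scored = sorted(((score(ind), ind) for ind in pop),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best = scored[0][0], list(scored[0][1])
        if gen == generations - 1:
            break
        # Truncation selection: the better half become parents.
        parents = [ind for _, ind in scored[:max(2, population // 2)]]
        children = [list(best)]  # elitism: keep the best individual so far
        while len(children) < population:
            mom, dad = rng.sample(parents, 2)
            # Uniform crossover: each knob is inherited from either parent.
            child = [rng.choice(genes) for genes in zip(mom, dad)]
            mutate(child)
            children.append(child)
        pop = children
    return best, best_score


if __name__ == "__main__":
    # Hypothetical toy objective standing in for "generate + run eval_cmd":
    # it peaks at dependency distance 7 and average latency 5.
    def toy_score(knobs: List[float]) -> float:
        dist, latency = knobs
        return -((dist - 7.0) ** 2 + (latency - 5.0) ** 2)

    knobs, value = genetic_search(toy_score, [(0.0, 20.0), (2.0, 8.0)])
    print("dist=%.2f latency=%.2f score=%.4f" % (knobs[0], knobs[1], value))
```

Elitism guarantees that the best score never decreases between generations, which matters when each evaluation is an expensive compile-and-run of a microbenchmark.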

#!/usr/bin/env python
# Copyright 2011-2021 IBM Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
power_v206_power7_ppc64_linux_gcc_genetic.py

Example Python script to show how to generate a set of microbenchmarks
stressing a particular unit at different IPC ratios using a genetic
search algorithm to play with two knobs: average latency and dependency
distance.

An IPC evaluation and scoring script is required. For instance:

.. code:: bash

   #!/bin/bash
   # ARGS: $1 is the target IPC
   #       $2 is the name of the generated benchmark
   target_ipc=$1
   source_bench=$2

   # Compile the benchmark
   gcc -O0 -mcpu=power7 -mtune=power7 -std=c99 $source_bench.c -o $source_bench

   # Evaluate the ipc
   ipc=< your preferred commands to evaluate the IPC >

   # Compute the score (the closer to the target IPC, the higher the score)
   score=$(echo "(1/($ipc-$target_ipc))^2" | bc -l)

   echo $score

Use the script above as a template for your own GA-based search.
"""

# Futures
from __future__ import absolute_import, division

# Built-in modules
import datetime
import os
import sys
import time as runtime
from typing import List, Tuple

# Own modules
import microprobe.code
import microprobe.driver.genetic
import microprobe.passes.ilp
import microprobe.passes.initialization
import microprobe.passes.instruction
import microprobe.passes.register
import microprobe.passes.structure
from microprobe.exceptions import MicroprobeTargetDefinitionError
from microprobe.target import import_definition
from microprobe.utils.cmdline import print_error, print_info, print_warning
from microprobe.utils.misc import RNDINT
from microprobe.utils.typeguard_decorator import typeguard_testsuite

__author__ = "Ramon Bertran"
__copyright__ = "Copyright 2011-2021 IBM Corporation"
__credits__ = []
__license__ = "IBM (c) 2011-2021 All rights reserved"
__version__ = "0.5"
__maintainer__ = "Ramon Bertran"
__email__ = "rbertra@us.ibm.com"
__status__ = "Development"  # "Prototype", "Development", or "Production"

# Benchmark size
BENCHMARK_SIZE = 20

COMMAND = None
DIRECTORY = None

# Get the target definition
try:
    TARGET = import_definition("power_v206-power7-ppc64_linux_gcc")
except MicroprobeTargetDefinitionError as exc:
    print_error("Unable to import target definition")
    print_error("Exception message: %s" % str(exc))
    exit(-1)


@typeguard_testsuite
def main():
    """Main function."""

    component_list = ["FXU", "FXU-noLSU", "FXU-LSU", "VSU", "VSU-FXU"]
    ipcs = [float(x) / 10 for x in range(1, 41)]
    ipcs = ipcs[5:] + ipcs[:5]

    for name in component_list:
        for ipc in ipcs:
            generate_genetic(name, ipc)


@typeguard_testsuite
def generate_genetic(compname: str, ipc: float):
    """Generate a microbenchmark stressing compname at the given ipc."""

    assert COMMAND is not None, "COMMAND variable cannot be None"
    assert DIRECTORY is not None, "DIRECTORY variable cannot be None"

    comps = []
    bcomps = []
    any_comp: bool = False

    assert TARGET.microarchitecture is not None, \
        "Target must have a defined microarchitecture"

    if compname.find("FXU") >= 0:
        comps.append(
            TARGET.microarchitecture.elements["FXU0_Core0_SCM_Processor"])

    if compname.find("VSU") >= 0:
        comps.append(
            TARGET.microarchitecture.elements["VSU0_Core0_SCM_Processor"])

    if len(comps) == 2:
        any_comp = True
    elif compname.find("noLSU") >= 0:
        bcomps.append(
            TARGET.microarchitecture.elements["LSU0_Core0_SCM_Processor"])
    elif compname.find("LSU") >= 0:
        comps.append(
            TARGET.microarchitecture.elements["LSU0_Core0_SCM_Processor"])

    if (len(comps) == 1 and ipc > 2) or (len(comps) == 2 and ipc > 4):
        return True

    for elem in os.listdir(DIRECTORY):
        if not elem.endswith(".c"):
            continue
        if elem.startswith("%s:IPC:%.2f:DIST" % (compname, ipc)):
            print_info("Already generated: %s %.2f" % (compname, ipc))
            return True

    print_info(f"Going for IPC: {ipc} and Element: {compname}")

    def generate(name: str, dist: float, latency: float):
        """Benchmark generation function.

        First argument is name, second the dependency distance and the
        third is the average instruction latency.
        """
        wrapper = microprobe.code.get_wrapper("CInfPpc")
        synth = microprobe.code.Synthesizer(TARGET, wrapper())
        synth.add_pass(
            microprobe.passes.initialization.InitializeRegistersPass(
                value=RNDINT))
        synth.add_pass(
            microprobe.passes.structure.SimpleBuildingBlockPass(
                BENCHMARK_SIZE))
        synth.add_pass(
            microprobe.passes.instruction.SetInstructionTypeByElementPass(
                TARGET,
                comps, {},
                block=bcomps,
                avelatency=latency,
                any_comp=any_comp))
        synth.add_pass(
            microprobe.passes.register.DefaultRegisterAllocationPass(dd=dist))
        bench = synth.synthesize()
        synth.save(name, bench=bench)

    # Set the genetic algorithm parameters
    ga_params: List[Tuple[int, int, float]] = []
    ga_params.append((0, 20, 0.05))  # Average dependency distance design space
    ga_params.append((2, 8, 0.05))  # Average instruction latency design space

    # Set up the search driver
    driver = microprobe.driver.genetic.ExecCmdDriver(
        generate, 20, 30, 30, f"'{COMMAND}' {ipc} ", ga_params)

    starttime = runtime.time()
    print_info("Start search...")
    driver.run(1)
    print_info("Search end")
    endtime = runtime.time()

    print_info("Genetic time: "
               f"{datetime.timedelta(seconds=endtime - starttime)}")

    # Check if we found a solution
    ga_sol_params: Tuple[float, float] = driver.solution()
    score = driver.score()

    print_info(f"Target IPC: {ipc}, best score: {score}")

    if score < 20:
        print_warning(f"Unable to find an optimal solution for IPC {ipc}")
        print_info("Generating the closest solution...")
        generate(
            f"{DIRECTORY}/{compname}:IPC:{ipc:.2f}:"
            f"DIST:{ga_sol_params[0]:.2f}:LAT:{ga_sol_params[1]:.2f}-check",
            ga_sol_params[0], ga_sol_params[1])
        print_info("Closest solution generated")
    else:
        print_info("Solution found for %s and IPC %f -> dist: %f, "
                   "latency: %f" %
                   (compname, ipc, ga_sol_params[0], ga_sol_params[1]))
        print_info("Generating solution...")
        generate(
            f"{DIRECTORY}/{compname}:IPC:{ipc:.2f}:"
            f"DIST:{ga_sol_params[0]:.2f}:LAT:{ga_sol_params[1]:.2f}",
            ga_sol_params[0], ga_sol_params[1])
        print_info("Solution generated")
    return True


if __name__ == '__main__':
    # Run main if executed from the command line
    # and the main method exists

    if len(sys.argv) != 3:
        print_info("Usage:")
        print_info("%s output_dir eval_cmd" % (sys.argv[0]))
        print_info("")
        print_info("Output dir: output directory for the generated benchmarks")
        print_info("eval_cmd: command accepting 2 parameters: the target IPC")
        print_info("          and the filename of the generated benchmark.")
        print_info("          Output: the score used for the GA search, e.g.")
        print_info("          the closer the IPC of the generated benchmark")
        print_info("          is to the target IPC, the higher the score the")
        print_info("          command should output.")
        exit(-1)

    DIRECTORY = sys.argv[1]
    COMMAND = sys.argv[2]

    if not os.path.isdir(DIRECTORY):
        print_info("Output directory '%s' does not exist" % (DIRECTORY))
        exit(-1)

    if not os.path.isfile(COMMAND):
        print_info("The command '%s' does not exist" % (COMMAND))
        exit(-1)

    if callable(locals().get('main')):
        main()
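The script requires `eval_cmd` to be an existing file that receives the target IPC and the benchmark name and prints a score. A minimal Python counterpart of the bash template in the docstring might look as follows. The compile flags are copied from that template, and, as in the template, the benchmark name is assumed to come without the `.c` extension. `measure_ipc` is a hypothetical placeholder, because how IPC is measured (for example, with hardware performance counters) depends on your platform. Unlike the template, the sketch clamps the IPC difference to a small epsilon so that an exact match does not divide by zero.

```python
#!/usr/bin/env python3
"""Sketch of an eval_cmd for the GA search: eval_cmd <target_ipc> <bench>."""
import subprocess
import sys


def ipc_score(measured_ipc: float, target_ipc: float,
              eps: float = 1e-6) -> float:
    """Inverse squared distance to the target IPC (higher is better)."""
    diff = max(abs(measured_ipc - target_ipc), eps)  # avoid division by zero
    return (1.0 / diff) ** 2


def compile_bench(bench: str) -> None:
    """Compile <bench>.c into <bench> with the flags from the bash template."""
    subprocess.run(["gcc", "-O0", "-mcpu=power7", "-mtune=power7",
                    "-std=c99", bench + ".c", "-o", bench], check=True)


def measure_ipc(bench: str) -> float:
    """Hypothetical placeholder: run ./<bench> and return its measured IPC."""
    raise NotImplementedError("plug in your platform's IPC measurement")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Invoked by the search driver as: eval_cmd <target_ipc> <bench_name>
    target = float(sys.argv[1])
    bench_name = sys.argv[2]
    compile_bench(bench_name)
    print(ipc_score(measure_ipc(bench_name), target))
```

With this scoring, the `score < 20` check in `generate_genetic` accepts a solution only when the measured IPC is within 1/sqrt(20) ≈ 0.22 of the target. Because the driver runs the file as a command, it presumably also needs to be executable.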