Tool: mp_seqtune

Overview

The mp_seqtune tool provides an interface to generate stressmarks around a base instruction sequence. Given a base instruction sequence, the tool generates variations of it according to the user input parameters. One can modify the memory access patterns, periodically add instructions, periodically replace instructions, maximize the data switching factors, model the branch behavior, etc. These transformations are typically useful when searching for maximum power stressmarks.

Basic usage

> mp_seqtune -T TARGET -D OUTPUT_DIR -seq INSTRUCTION_SEQUENCE

where:

Flag/Argument	Description
`-seq INSTRS`, `--sequence INSTRS`	Comma-separated list of instructions that define the base sequence.
`-T TARGET`, `--target TARGET`	Target definition string. See: Command line target definition scheme.
`-D OUTPUT_DIR`, `--seq-output-dir OUTPUT_DIR`	Output directory.

There are other parameters to tune the code being generated. Check the rest of this document for details. The example section provides a detailed use case scenario that uses some of the extra parameters.

Full usage

mp_seqtune.py: INFO: Processing input arguments...
usage: mp_seqtune.py [-h] [-P SEARCH_PATH [SEARCH_PATH ...]] [-V] [-v] [-d]
                     [-c CONFIG_FILE [CONFIG_FILE ...]] [-C FORCE_CONFIG_FILE]
                     [--dump-configuration-file OUTPUT_CONFIG_FILE]
                     [--dump-full-configuration-file OUTPUT_CONFIG_FILE]
                     [-A ARCHITECTURE_PATHS] [-M MICROARCHITECTURE_PATHS]
                     [-E ENVIRONMENT_PATHS] -T TARGET [--list-architectures]
                     [--list-microarchitectures] [--list-environments]
                     [--traceback] [--profile PROFILE_OUTPUT] -D
                     SEQ_OUTPUT_DIR -seq SEQUENCE [SEQUENCE ...] [-r REPEAT]
                     [-re REPLACE_EVERY] [-ae ADD_EVERY] [-be BRANCH_EVERY]
                     [-bp BRANCH_PATTERN] [-ds] [-sb] [-ms]
                     [-me MEMORY_STREAM] [-B BENCHMARK_SIZE]
                     [-dd DEPENDENCY_DISTANCE] [-R] [-e] [-p]
                     [-bn BATCH_NUMBER] [-nb NUM_BATCHES] [-s] [-sn] [-CC]
                     [-N] [-Mm MAX_MEMORY] [-mm MIN_MEMORY] [-Mr MAX_REGSETS]
                     [-ie]

Microprobe seqtune tool

optional arguments:
  -h, --help            show this help message and exit
  -P SEARCH_PATH [SEARCH_PATH ...], --default_paths SEARCH_PATH [SEARCH_PATH ...]
                        Default search paths for microprobe target definitions
  -V, --version         Show Microprobe version and exit
  -v, --verbosity       Verbosity level (Values: [0,1,2,3,4]). Each time this
                        argument is specified the verbosity level is
                        increased. By default, no logging messages are shown.
                        These are the four levels available:
                        
                          -v (1): critical messages
                          -v -v (2): critical and error messages
                          -v -v -v (3): critical, error and warning messages
                          -v -v -v -v (4): critical, error, warning and info messages
                        
                        Specifying more than four verbosity flags will
                        default to the maximum of four. If you need extra
                        information, enable the debug mode (--debug or -d
                        flags).
  -d, --debug           Enable debug mode in Microprobe framework. Lots of
                        output messages will be generated

Configuration arguments:

  Command arguments related to configuration file handling

  -c CONFIG_FILE [CONFIG_FILE ...], --configuration CONFIG_FILE [CONFIG_FILE ...]
                        Configuration file. The configuration files will be
                        read in order of appearance. Values are reset by the
                        last configuration file in case of non-list values.
                        List values will be appended (not reset)
  -C FORCE_CONFIG_FILE, --force-configuration FORCE_CONFIG_FILE
                        Force configuration file. Use this configuration file
                        as the default start configuration. This disables any
                        system-wide, or user-provided configuration.
  --dump-configuration-file OUTPUT_CONFIG_FILE
                        Dump a configuration file with the actual
                        configuration used
  --dump-full-configuration-file OUTPUT_CONFIG_FILE
                        Dump a configuration file with the actual
                        configuration used plus all the configuration options
                        not set

Target path arguments:

  Command arguments related to target paths

  -A ARCHITECTURE_PATHS, --architecture-paths ARCHITECTURE_PATHS
                        Search path for architecture definitions. Microprobe
                        will search in these paths for architecture
                        definitions
  -M MICROARCHITECTURE_PATHS, --microarchitecture-paths MICROARCHITECTURE_PATHS
                        Search path for microarchitecture definitions.
                        Microprobe will search in these paths for
                        microarchitecture definitions
  -E ENVIRONMENT_PATHS, --environment-paths ENVIRONMENT_PATHS
                        Search path for environment definitions. Microprobe
                        will search in these paths for environment definitions

Target arguments:

  Command arguments related to target specification and queries

  -T TARGET, --target TARGET
                        Target tuple. Microprobe follows a GCC-like target
                        definition scheme, where a target is defined by a
                        tuple as follows:
                        
                          <arch-name>-<uarch-name>-<env-name>
                        
                        where:
                        
                          <arch-name>: is the name of the architecture
                          <uarch-name>: is the name of the microarchitecture
                          <env-name>: is the name of the environment
                        
                        One can use --list-* options to get the list of
                        definitions available in the default search paths or
                        the paths specified by the different --*-paths options
  --list-architectures  Generate a list of architectures available in the
                        defined search paths and exit
  --list-microarchitectures
                        Generate a list of microarchitectures available in the
                        defined search paths and exit
  --list-environments   Generate a list of environments available in the
                        defined search paths and exit

Debug arguments:

  Command arguments related to debugging facilities

  --traceback           show a traceback and start a python debugger (pdb)
                        when an error occurs. 'pdb' is an interactive python
                        shell that facilitates the debugging of errors
  --profile PROFILE_OUTPUT
                        dump profiling information into given file (see
                        'pstats' module)

SEQTUNE arguments:

  Command arguments related to Sequence tuner generator

  -D SEQ_OUTPUT_DIR, --seq-output-dir SEQ_OUTPUT_DIR
                        Output directory name
  -seq SEQUENCE [SEQUENCE ...], --sequence SEQUENCE [SEQUENCE ...]
                        Base instruction sequence to modify (comma separated
                        list of instructions). If multiple sequences are
                        provided (separated by a space) they are combined
                        (product).
  -r REPEAT, --repeat REPEAT
                        If multiple sequences are provided in --sequence, this
                        parameter specifies how many of them are concatenated
                        to generate the final sequence.
  -re REPLACE_EVERY, --replace-every REPLACE_EVERY
                        Replace every. String with the format
                        'INSTR1:INSTR2:RANGE' to specify that INSTR1 will be
                        replaced by INSTR2 every RANGE instructions. Range can
                        be just an integer or a RANGE specifier of the form:
                        #1:#2 to generate a range from #1 to #2, or #1:#2:#3
                        to generate a range between #1 to #2 with step #3.
                        E.g. 10:20 generates 10, 11, 12 ... 19, 20 and 10:20:2
                        generates 10, 12, 14, ... 18, 20.
  -ae ADD_EVERY, --add-every ADD_EVERY
                        Add every. String with the format 'INSTR1:RANGE' to
                        specify that INSTR1 will be added to the sequence every
                        RANGE instructions. Range can be just an integer or a
                        RANGE specifier of the form: #1-#2 to generate a range
                        from #1 to #2, or #1-#2-#3 to generate a range between
                        #1 to #2 with step #3. E.g. 10-20 generates 10, 11, 12
                        ... 19, 20 and 10-20-2 generates 10, 12, 14, ... 18,
                        20.
  -be BRANCH_EVERY, --branch-every BRANCH_EVERY
                        Conditional branches are modeled not taken by default.
                        Using this parameter, every Nth branch will be taken.
  -bp BRANCH_PATTERN, --branch-pattern BRANCH_PATTERN
                        Branch pattern (in binary) to be generated. E.g 0010
                        will model the conditional branches as NotTaken
                        NotTaken Taken NotTaken in a round robin fashion. One
                        can use 'L<range>' to generate all the patterns
                        possible of length #. E.g. 'L2-5' will generate all
                        unique possible branch patterns of length 2,3,4 and 5.
  -ds, --data-switch    Enable data switching for instructions. It tries to
                        maximize the data switching factor on inputs and
                        outputs of instructions.
  -sb, --switch-branch  Switch branch pattern in each iteration.
  -ms, --memory-switch  Enable data switching for memory operations. It tries
                        to maximize the data switching factor on loads and
                        store operations.
  -me MEMORY_STREAM, --memory-stream MEMORY_STREAM
                        Memory stream definition. String with the format
                        NUM:SIZE:WEIGHT:STRIDE:REGS:RND:LOC1:LOC2 where NUM is
                        the number of streams of this type, SIZE is the
                        working set size of the stream in bytes, WEIGHT is the
                        probability of the stream. E.g. streams with the same
                        weight will have the same probability of being
                        generated. STRIDE is the stride access pattern between
                        the elements of the stream and REGS is the number of
                        register sets (address base + address index) to be used
                        for each stream. RND controls the randomness of the
                        generated memory access stream. -1 is full randomness,
                        0 is no randomness, and any value above 0 controls the
                        randomness range. E.g. a value of 1024 randomizes the
                        accesses within 1024 bytes memory ranges. LOC1 and LOC2
                        control the temporal locality of the memory access
                        stream in the following way: the last LOC1 accesses are
                        going to be accessed again LOC2 times before moving
                        forward to the next addresses. If LOC2 is 0, no
                        temporal locality is generated besides the implicit one
                        from the memory access stream definition. All the
                        elements of this format string can be just a number or
                        a range specification (start-end) or (start-end-step).
                        This flag can be specified multiple times
  -B BENCHMARK_SIZE, --benchmark-size BENCHMARK_SIZE
                        Size in instructions of the microbenchmark main loop.
  -dd DEPENDENCY_DISTANCE, --dependency-distance DEPENDENCY_DISTANCE
                        Average dependency distance between instructions. A
                        value below 1 means no dependency between
                        instructions. A value of 1 means a chain of dependent
                        instructions.
  -R, --reset           Reset the register contents on each loop iteration
  -e, --endless         Some backends allow the control to wrap the sequence
                        generated in an endless loop. Depending on the target
                        specified, this flag will force to generate sequences
                        in an endless loop (some targets might ignore it)
  -p, --parallel        Generate benchmarks in parallel
  -bn BATCH_NUMBER, --batch-number BATCH_NUMBER
                        Batch number to generate. Check --num-batches option
                        for more details
  -nb NUM_BATCHES, --num-batches NUM_BATCHES
                        Number of batches. The number of microbenchmarks to
                        generate is divided by this number, and only the batch
                        specified using the -bn option is generated. This is
                        useful to split the generation of many test cases in
                        various batches.
  -s, --skip            Skip benchmarks already generated
  -sn, --shortnames     Use short output names
  -CC, --compress       Compress output files
  -N, --count           Only count the number of sequences to generate. Do not
                        generate anything
  -Mm MAX_MEMORY, --max-memory MAX_MEMORY
                        Maximum memory for all the streams generated
  -mm MIN_MEMORY, --min-memory MIN_MEMORY
                        Minimum memory for all the streams generated
  -Mr MAX_REGSETS, --max-regsets MAX_REGSETS
                        Maximum number of regsets for all the streams
                        generated
  -ie, --ignore-errors  Ignore errors during code generation (continue in case
                        of an error in a particular parameter combination)

Environment variables:

  MICROPROBETEMPLATES    Default path for microprobe templates
  MICROPROBEDEBUG        If set, enable debug
  MICROPROBEDEBUGPASSES  If set, enable debug during passes
  MICROPROBEASMHEXFMT    Assembly hexadecimal format. Options:
                         'all' -> All immediates in hex format
                         'address' -> Address immediates in hex format (default)
                         'none' -> All immediates in integer format
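
The --branch-pattern semantics described in the help text above can be illustrated with a short Python sketch. It only models the documented behavior (a binary pattern applied to conditional branches in a round-robin fashion, with 1 meaning Taken and 0 meaning NotTaken); the function name branch_outcomes is hypothetical and is not part of Microprobe.

```python
from itertools import cycle, islice


def branch_outcomes(pattern, count):
    """Return the first `count` outcomes for a binary branch pattern.

    '1' is Taken and '0' is NotTaken; the pattern repeats round robin.
    Hypothetical helper for illustration, not a Microprobe API.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError(f"not a binary pattern: {pattern!r}")
    names = {"0": "NotTaken", "1": "Taken"}
    return [names[bit] for bit in islice(cycle(pattern), count)]


# Pattern 0010 from the help text, applied to six consecutive branches.
print(branch_outcomes("0010", 6))
# ['NotTaken', 'NotTaken', 'Taken', 'NotTaken', 'NotTaken', 'NotTaken']
```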

Example use case

This use case uses the power_v300-power9-ppc64_linux_gcc target for illustrative purposes. The same can be done on other targets.

Let’s assume that you have analyzed different instruction sequences (check Tool: mp_seq) on the target and you have settled on the following base instruction sequence:

> SUBFIC_V0,LVXL_V0,LWA_V0,SUBFIC_V0,LXVW4X_V0,VMHADDSHS_V0

Then, you want to generate variations of that base sequence with different memory access patterns that meet the following requirements:

Four memory access patterns, each accessing a memory range whose size goes from 2K to 32K in steps of 1K.
Each memory access pattern accesses its own memory range in a round-robin fashion using a minimum stride of 144 bytes.
Each memory access pattern has the same probability of being used.
Each memory access pattern uses a single set of base/index registers.
No added randomness in the memory access pattern.
No added temporal locality in the memory access pattern.

To do so, we need to issue the following command:

> mp_seqtune -T power_v300-power9-ppc64_linux_gcc -D . -seq SUBFIC_V0,LVXL_V0,LWA_V0,SUBFIC_V0,LXVW4X_V0,VMHADDSHS_V0 -me 4:2048-32768-1024:1:144:1:0:1:0

This will generate 31 microbenchmarks in the current directory: one with 4 streams each accessing a 2K memory region, one with 4 streams each accessing a 3K memory region, etc., up to a 32K memory region.
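
The count of 31 follows from expanding the 2048-32768-1024 size range with an inclusive end value. A minimal Python sketch of that expansion is shown below; expand_range is a hypothetical helper written for this illustration, and it does not claim to reproduce how mp_seqtune parses ranges internally.

```python
def expand_range(spec):
    """Expand 'N', 'start-end' or 'start-end-step' into a list of integers.

    The end value is inclusive, matching the 2K to 32K example.
    Hypothetical helper for illustration, not a Microprobe API.
    """
    try:
        # Plain integers, including negative ones such as -1.
        return [int(spec)]
    except ValueError:
        pass
    parts = [int(part) for part in spec.split("-")]
    if len(parts) == 2:
        start, end, step = parts[0], parts[1], 1
    elif len(parts) == 3:
        start, end, step = parts
    else:
        raise ValueError(f"unsupported range specifier: {spec!r}")
    return list(range(start, end + 1, step))


sizes = expand_range("2048-32768-1024")
print(len(sizes), sizes[0], sizes[-1])  # 31 2048 32768
```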

In the command above, we used the -me parameter to specify the variations to be generated around the memory behavior. The parameter value is split into 8 fields using the : symbol. The meaning of the fields is the following:

4 : number of memory streams.
2048-32768-1024 : memory sizes for each stream. This tuple is interpreted as <start>-<end>-<step>. Note that the <step> field is optional. This format can be used in other parameters and fields.
1 : weight of these streams. This directly translates to the probability of a given stream being used. E.g. if we define 3 memory streams, one with a weight of 2 and the other two with a weight of 1, the probabilities will be 50%, 25% and 25%, respectively. As a result, out of every 4 memory accesses, on average 2 will use stream 1 and the other 2 will use streams 2 and 3, respectively.
144 : stream stride in bytes. Stride between consecutive accesses in the same stream. In this case, the stream will access positions 0, 144, 288, etc. until the maximum size is reached. Then, it will start from zero again.
1 : number of register sets to be used for the stream. Streams require the reservation of base/index registers for address computations. If there are enough registers available, one might want to increase this number to increase the ILP between address computation and usage.
0 : no randomness. Memory accesses will be performed sequentially using the specified stride. If the value is set to -1, the memory access stream is completely random. If set to a value > 0, the memory access stream is random within the specified range.
1:0 : no added temporal locality, since the last 1 memory access will be repeated 0 times before moving to the next memory address.
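
The field breakdown above can be sketched in Python. The snippet splits a -me value into its 8 fields, turns stream weights into probabilities, and lists the offsets touched by a strided stream. All names (StreamSpec, parse_stream, stream_probabilities, stride_offsets) are hypothetical helpers for this illustration, and the wrap-around rule in stride_offsets is an assumption based on the description of the stride field, not the Microprobe implementation.

```python
from collections import namedtuple

# Field names mirror NUM:SIZE:WEIGHT:STRIDE:REGS:RND:LOC1:LOC2 from the help
# text; the StreamSpec type itself is a hypothetical helper for this sketch.
StreamSpec = namedtuple(
    "StreamSpec", "num size weight stride regs rnd loc1 loc2"
)


def parse_stream(spec):
    """Split a -me value into its 8 fields (kept as strings, since each
    field may itself be a range such as 2048-32768-1024)."""
    fields = spec.split(":")
    if len(fields) != 8:
        raise ValueError(f"expected 8 ':'-separated fields, got {len(fields)}")
    return StreamSpec(*fields)


def stream_probabilities(weights):
    """Normalize stream weights into selection probabilities."""
    total = sum(weights)
    return [weight / total for weight in weights]


def stride_offsets(size, stride, count):
    """First `count` offsets of a strided stream that restarts from zero
    once the next offset would reach `size` (assumed wrap-around rule)."""
    one_pass = range(0, size, stride)
    return [one_pass[i % len(one_pass)] for i in range(count)]


spec = parse_stream("4:2048-32768-1024:1:144:1:0:1:0")
print(spec.num, spec.size, spec.stride)   # 4 2048-32768-1024 144
print(stream_probabilities([2, 1, 1]))    # [0.5, 0.25, 0.25]
print(stride_offsets(2048, 144, 16)[-3:])  # [1872, 2016, 0]
```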

Note

One can use the -N flag to check the sequence definition and the number of sequences that are going to be generated before starting the generation process.
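
As a sketch of that workflow, the Python snippet below builds the example command line and optionally appends -N, so that the same arguments can be checked first and then reused for the actual generation. The seqtune_command helper is hypothetical; only the flags already shown in this document are used.

```python
import shlex


def seqtune_command(target, output_dir, sequence, memory_stream,
                    count_only=False):
    """Build an mp_seqtune argument list (hypothetical helper).

    With count_only=True the -N flag is appended, so the tool only counts
    the sequences instead of generating them.
    """
    cmd = [
        "mp_seqtune",
        "-T", target,
        "-D", output_dir,
        "-seq", ",".join(sequence),
        "-me", memory_stream,
    ]
    if count_only:
        cmd.append("-N")
    return cmd


base_sequence = [
    "SUBFIC_V0", "LVXL_V0", "LWA_V0",
    "SUBFIC_V0", "LXVW4X_V0", "VMHADDSHS_V0",
]
cmd = seqtune_command(
    "power_v300-power9-ppc64_linux_gcc", ".", base_sequence,
    "4:2048-32768-1024:1:144:1:0:1:0", count_only=True,
)
# Print a shell-ready command line; requires Python 3.8+ for shlex.join.
print(shlex.join(cmd))
```

The resulting list can be passed as-is to subprocess.run once mp_seqtune is available on the PATH; dropping count_only then launches the real generation with identical parameters.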