Tool: mp_seqtune
Overview
The mp_seqtune tool provides an interface to generate stressmarks around a base instruction sequence. Given a base instruction sequence, the tool generates variations of it based on the user input parameters. One can modify the memory access patterns, periodically add instructions, periodically replace instructions, maximize the data switching factors, model the branch behavior, etc. These transformations are typically useful when searching for maximum power stressmarks.
Basic usage
> mp_seqtune -T TARGET -D OUTPUT_DIR -seq INSTRUCTION_SEQUENCE
where:
Flag/Argument | Description |
---|---|
-seq INSTRUCTION_SEQUENCE | Comma separated list of instructions that define the base sequence. |
-T TARGET | Target definition string. Check: Command line target definition scheme. |
-D OUTPUT_DIR | Output directory. |
There are other parameters to tune the code being generated. Check the rest of this document for details. The example section provides a detailed use case scenario that uses some of the extra parameters.
Full usage
mp_seqtune.py: INFO: Processing input arguments...
usage: mp_seqtune.py [-h] [-P SEARCH_PATH [SEARCH_PATH ...]] [-V] [-v] [-d]
[-c CONFIG_FILE [CONFIG_FILE ...]] [-C FORCE_CONFIG_FILE]
[--dump-configuration-file OUTPUT_CONFIG_FILE]
[--dump-full-configuration-file OUTPUT_CONFIG_FILE]
[-A ARCHITECTURE_PATHS] [-M MICROARCHITECTURE_PATHS]
[-E ENVIRONMENT_PATHS] -T TARGET [--list-architectures]
[--list-microarchitectures] [--list-environments]
[--traceback] [--profile PROFILE_OUTPUT] -D
SEQ_OUTPUT_DIR -seq SEQUENCE [SEQUENCE ...] [-r REPEAT]
[-re REPLACE_EVERY] [-ae ADD_EVERY] [-be BRANCH_EVERY]
[-bp BRANCH_PATTERN] [-ds] [-sb] [-ms]
[-me MEMORY_STREAM] [-B BENCHMARK_SIZE]
[-dd DEPENDENCY_DISTANCE] [-R] [-e] [-p]
[-bn BATCH_NUMBER] [-nb NUM_BATCHES] [-s] [-sn] [-CC]
[-N] [-Mm MAX_MEMORY] [-mm MIN_MEMORY] [-Mr MAX_REGSETS]
[-ie]
Microprobe seqtune tool
optional arguments:
-h, --help show this help message and exit
-P SEARCH_PATH [SEARCH_PATH ...], --default_paths SEARCH_PATH [SEARCH_PATH ...]
Default search paths for microprobe target definitions
-V, --version Show Microprobe version and exit
-v, --verbosity Verbosity level (Values: [0,1,2,3,4]). Each time this
argument is specified the verbosity level is
increased. By default, no logging messages are shown.
These are the four levels available:
-v (1): critical messages
-v -v (2): critical and error messages
-v -v -v (3): critical, error and warning messages
-v -v -v -v (4): critical, error, warning and info messages
Specifying more than four verbosity flags, will
default to the maximum of four. If you need extra
information, enable the debug mode (--debug or -d
flags).
-d, --debug Enable debug mode in Microprobe framework. Lots of
output messages will be generated
Configuration arguments:
Command arguments related to configuration file handling
-c CONFIG_FILE [CONFIG_FILE ...], --configuration CONFIG_FILE [CONFIG_FILE ...]
Configuration file. The configuration files will be
readed in order of appearance. Values are reset by the
last configuration file in case of non-list values.
List values will be appended (not reset)
-C FORCE_CONFIG_FILE, --force-configuration FORCE_CONFIG_FILE
Force configuration file. Use this configuration file
as the default start configuration. This disables any
system-wide, or user-provided configuration.
--dump-configuration-file OUTPUT_CONFIG_FILE
Dump a configuration file with the actual
configuration used
--dump-full-configuration-file OUTPUT_CONFIG_FILE
Dump a configuration file with the actual
configuration used plus all the configuration options
not set
Target path arguments:
Command arguments related to target paths
-A ARCHITECTURE_PATHS, --architecture-paths ARCHITECTURE_PATHS
Search path for architecture definitions. Microprobe
will search in these paths for architecture
definitions
-M MICROARCHITECTURE_PATHS, --microarchitecture-paths MICROARCHITECTURE_PATHS
Search path for microarchitecture definitions.
Microprobe will search in these paths for
microarchitecture definitions
-E ENVIRONMENT_PATHS, --environment-paths ENVIRONMENT_PATHS
Search path for environment definitions. Microprobe
will search in these paths for environment definitions
Target arguments:
Command arguments related to target specification and queries
-T TARGET, --target TARGET
Target tuple. Microprobe follows a GCC-like target
definition scheme, where a target is defined by a
tuple as following:
<arch-name>-<uarch-name>-<env-name>
where:
<arch-name>: is the name of the architecture
<uarch-name>: is the name of the microarchitecture
<env-name>: is the name of the environment
One can use --list-* options to get the list of
definitions available in the default search paths or
the paths specified by the different --*-paths options
--list-architectures Generate a list of architectures available in the
defined search paths and exit
--list-microarchitectures
Generate a list of microarchitectures available in the
defined search paths and exit
--list-environments Generate a list of environments available in the
defined search paths and exit
Debug arguments:
Command arguments related to debugging facilities
--traceback show a traceback and starts a python debugger (pdb)
when an error occurs. 'pdb' is an interactive python
shell that facilitates the debugging of errors
--profile PROFILE_OUTPUT
dump profiling information into given file (see
'pstats' module)
SEQTUNE arguments:
Command arguments related to Sequence tuner generator
-D SEQ_OUTPUT_DIR, --seq-output-dir SEQ_OUTPUT_DIR
Output directory name
-seq SEQUENCE [SEQUENCE ...], --sequence SEQUENCE [SEQUENCE ...]
Base instruction sequence to modify (command separated
list of instructions). If multiple sequences are
provided (separated by a space) they are combined
(product).
-r REPEAT, --repeat REPEAT
If multiple sequences are provided in --sequence, this
parameter specifies how many of them are concated to
generate the final sequence.
-re REPLACE_EVERY, --replace-every REPLACE_EVERY
Replace every. String with the format
'INSTR1:INSTR2:RANGE' to specfy that INSTR1 will be
replaced by INSTR2 every RANGE instructions. Range can
be just an integer or a RANGE specifier of the form:
#1:#2 to generate a range from #1 to #2, or #1:#2:#3
to generate a range between #1 to #2 with step #3.
E.g. 10:20 generates 10, 11, 12 ... 19, 20 and 10:20:2
generates 10, 12, 14, ... 18, 20.
-ae ADD_EVERY, --add-every ADD_EVERY
Add every. String with the format 'INSTR1:RANGE' to
specfy that INSTR1 will be added to the sequence every
RANGE instructions. Range can be just an integer or a
RANGE specifier of the form: #1-#2 to generate a range
from #1 to #2, or #1-#2-#3 to generate a range between
#1 to #2 with step #3. E.g. 10:20 generates 10, 11, 12
... 19, 20 and 10:20:2 generates 10, 12, 14, ... 18,
20.
-be BRANCH_EVERY, --branch-every BRANCH_EVERY
Conditional branches are modeled not taken by default.
Using this paratemeter, every N will be taken.
-bp BRANCH_PATTERN, --branch-pattern BRANCH_PATTERN
Branch pattern (in binary) to be generated. E.g 0010
will model the conditional branches as NotTaken
NotTaken Taken NotTaken in a round robin fashion. One
can use 'L<range>' to generate all the patterns
possible of length #. E.g. 'L2-5' will generate all
unique possible branch patterns of length 2,3,4 and 5.
-ds, --data-switch Enable data switching for instruction. It tries to
maximize the data switching factor on inputs and
outputs of instructions.
-sb, --switch-branch Switch branch pattern in each iteration.
-ms, --memory-switch Enable data switching for memory operations. It tries
to maximize the data switching factor on loads and
store operations.
-me MEMORY_STREAM, --memory-stream MEMORY_STREAM
Memory stream definition. String with the format
NUM:SIZE:WEIGHT:STRIDE:REGS:RND:LOC1:LOC2 where NUM is
the number of streams of this type, SIZE is the
working set size of the stream in bytes, WEIGHT is the
probability of the stream. E.g. streams with the
sameweight will have same probability to be generated.
STRIDE is the stride access patern between the
elements of the stream and REGS is the number of
register sets (address base + address index) to be
used for each stream. RND controls the randomess of
the generated memory access stream. -1 is full
randomness, 0 is not randomness, and any value above 0
control the randomness range. E.g. a value of 1024
randomizes the accesses within 1024 bytes memory
ranges.LOC1 and LOC2 control the temporal locality of
the memory access stream in the following way: the
last LOC1 accesses are going to be accessed again LOC2
times before moving forward to the next addresses. If
LOC2 is 0 not temporal locality is generated besides
the implicit one from the memory access stream
definition. All the elements of this format stream can
be just a number or a range specfication (start-end)
or (start-end-step). This flag can be specified
multiple times
-B BENCHMARK_SIZE, --benchmark-size BENCHMARK_SIZE
Size in instructions of the microbenchmark main loop.
-dd DEPENDENCY_DISTANCE, --dependency-distance DEPENDENCY_DISTANCE
Average dependency distance between instructions. A
value below 1 means not dependency between
instructions. A value of 1 means a chain of dependent
instructions.
-R, --reset Reset the register contents on each loop iteration
-e, --endless Some backends allow the control to wrap the sequence
generated in an endless loop. Depending on the target
specified, this flag will force to generate sequences
in an endless loop (some targets might ignore it)
-p, --parallel Generate benchmarks in parallel
-bn BATCH_NUMBER, --batch-number BATCH_NUMBER
Batch number to generate. Check --num-batches option
for more details
-nb NUM_BATCHES, --num-batches NUM_BATCHES
Number of batches. The number of microbenchmark to
generate is divided by this number, and the number the
batch number specified using -bn option is generated.
This is useful to split the generation of many test
cases in various batches.
-s, --skip Skip benchmarks already generated
-sn, --shortnames Use short output names
-CC, --compress Compress output files
-N, --count Only count the number of sequence to generate. Do not
generate anything
-Mm MAX_MEMORY, --max-memory MAX_MEMORY
Maximum memory for all the streams generated
-mm MIN_MEMORY, --min-memory MIN_MEMORY
Minimum memory for all the streams generated
-Mr MAX_REGSETS, --max-regsets MAX_REGSETS
Maximum number of regsets for all the streams
generated
-ie, --ignore-errors Ignore error during code generation (continue in case
of an error in a particular parameter combination)
Environment variables:
MICROPROBETEMPLATES Default path for microprobe templates
MICROPROBEDEBUG If set, enable debug
MICROPROBEDEBUGPASSES If set, enable debug during passes
MICROPROBEASMHEXFMT Assembly hexadecimal format. Options:
'all' -> All immediates in hex format
'address' -> Address immediates in hex format (default)
'none' -> All immediate in integer format
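To make the --branch-pattern description in the help text above more concrete, the following is a minimal Python sketch of how a binary pattern such as 0010 could map to a round-robin sequence of branch outcomes. The function name branch_outcomes is hypothetical and is not part of Microprobe; the sketch only illustrates the semantics stated in the help text, not the tool's implementation.

```python
from itertools import cycle, islice


def branch_outcomes(pattern, count):
    """Return the first `count` outcomes (True = taken) for a binary pattern.

    Hypothetical helper: the pattern is reused in a round-robin fashion,
    as the --branch-pattern help text describes.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must be a non-empty string of 0s and 1s")
    return [bit == "1" for bit in islice(cycle(pattern), count)]


# 0010 -> NotTaken NotTaken Taken NotTaken, then the pattern repeats.
print(branch_outcomes("0010", 6))
# [False, False, True, False, False, False]
```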
Example use case
This use case uses the power_v300-power9-ppc64_linux_gcc target for illustrative purposes. The same can be done on other targets.
Let’s assume that you have analyzed different instruction sequences (check Tool: mp_seq) on the target and you have settled on the following base instruction sequence:
> SUBFIC_V0,LVXL_V0,LWA_V0,SUBFIC_V0,LXVW4X_V0,VMHADDSHS_V0
Next, you want to generate variations of that base sequence whose memory access patterns have the following properties:
Four memory access patterns, each accessing its own memory range, whose size varies from 2K to 32K in steps of 1K.
Each memory access pattern accesses its own memory range in a round-robin fashion using a minimum stride of 144 bytes.
Each memory access pattern has the same probability of being used.
Each memory access pattern uses a single set of base/index registers.
No added randomness in the memory access pattern.
No added temporal locality in the memory access pattern.
To do so, we need to issue the following command:
> mp_seqtune -T power_v300-power9-ppc64_linux_gcc -D . -seq SUBFIC_V0,LVXL_V0,LWA_V0,SUBFIC_V0,LXVW4X_V0,VMHADDSHS_V0 -me 4:2048-32768-1024:1:144:1:0:1:0
This will generate 31 microbenchmarks in the current directory: one with 4 streams each accessing a 2K memory region, one with 4 streams each accessing a 3K memory region, and so on, up to a 32K memory region.
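The count of 31 follows from expanding the size range 2048-32768-1024 with an inclusive end value. A minimal Python sketch of that expansion is shown below. The helper expand_range is hypothetical (it is not part of Microprobe), and it assumes non-negative values and an inclusive end, consistent with the range examples in the help text.

```python
def expand_range(spec):
    """Expand 'N', 'START-END' or 'START-END-STEP' into a list of ints.

    Hypothetical helper: assumes non-negative values and an inclusive END.
    """
    parts = [int(p) for p in spec.split("-")]
    if len(parts) == 1:
        return parts
    start, end = parts[0], parts[1]
    step = parts[2] if len(parts) == 3 else 1
    return list(range(start, end + 1, step))


sizes = expand_range("2048-32768-1024")
# One microbenchmark per working set size: 2048, 3072, ..., 32768.
print(len(sizes), sizes[0], sizes[-1])
# 31 2048 32768
```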
In the command above, we used the -me parameter to specify the variations to be generated around the memory behavior. The parameter value is split into 8 fields using the : symbol. The meaning of the fields is the following:
4 : number of memory streams
2048-32768-1024 : memory sizes for each stream. This tuple is interpreted as <start>-<end>-<step>. Note that the <step> field is optional. This format can be used in other parameters and fields.
1 : weight of these streams. This directly translates to the probability of a given stream being used. E.g. if we define 3 memory streams, one with a weight of 2 and the other two with a weight of 1, the probabilities will be 50%, 25% and 25%, respectively. As a result, out of every 4 memory accesses, 2 will use stream 1, and the other 2 will use streams 2 and 3, respectively.
144 : stream stride in bytes. Stride between consecutive accesses in the same stream. In this case, the stream will access positions 0, 144, 288, etc. until the maximum size is reached. Then, it will start from zero again.
1 : number of register sets to be used for the stream. Streams require the reservation of base/index registers for address computations. If there are enough registers available, one might want to increase this number to increase the ILP between address computation and usage.
0 : no randomness. Memory accesses will be performed sequentially using the specified stride. If the value is set to -1, the memory access stream is completely random. If set to a value > 0, the memory access stream is random within the specified range.
1:0 : no added temporal locality, since the last 1 memory access will be repeated 0 times before moving to the next memory address.
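The field descriptions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Microprobe's implementation: the names MemoryStream, parse_me, stream_probabilities and stride_offsets are hypothetical, the fields are kept as raw strings, and the wrap-around rule (restart at zero once the next offset reaches the stream size) is one reading of the stride description.

```python
from collections import namedtuple

# Field names follow the help text: NUM:SIZE:WEIGHT:STRIDE:REGS:RND:LOC1:LOC2
MemoryStream = namedtuple(
    "MemoryStream", "num size weight stride regs rnd loc1 loc2"
)


def parse_me(value):
    """Split a -me value into its 8 colon-separated fields (kept as strings)."""
    fields = value.split(":")
    if len(fields) != 8:
        raise ValueError("expected 8 fields, got %d" % len(fields))
    return MemoryStream(*fields)


def stream_probabilities(weights):
    """Normalize stream weights into selection probabilities."""
    total = sum(weights)
    return [w / total for w in weights]


def stride_offsets(size, stride, count):
    """First `count` offsets of a strided stream that restarts at zero.

    Assumption: the stream wraps once the next offset would reach `size`.
    """
    offsets = []
    pos = 0
    for _ in range(count):
        offsets.append(pos)
        pos += stride
        if pos >= size:
            pos = 0
    return offsets


stream = parse_me("4:2048-32768-1024:1:144:1:0:1:0")
print(stream.size, stream.stride)          # 2048-32768-1024 144
print(stream_probabilities([2, 1, 1]))     # [0.5, 0.25, 0.25]
print(stride_offsets(2048, 144, 16)[-2:])  # [2016, 0]
```

With a 2048-byte stream and a 144-byte stride, the sketch visits 15 distinct offsets (0 to 2016) before starting from zero again.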
Note
One can use the -N flag to check the sequence definition and the number of sequences that are going to be generated before starting the generation process.