Tool: mp_seq
Overview
The mp_seq tool provides an interface to generate stressmarks that execute sequences combining the provided instructions in a way that complies with the given requirements. Sequence stressmarks are loops in which a particular instruction sequence is repeated several times. They are well suited to core characterization during the first steps of the maximum power generation process (i.e. finding the sequence that maximizes the power consumption of the cores). Users can control the level of instruction-level parallelism and the loop size via configurable parameters.
Basic usage
> mp_seq -T TARGET -D OUTPUT_DIR -ig group1 group2
where:
Flag/Argument | Description |
---|---|
-ig group1 group2 | Comma-separated list of instruction candidates per group. E.g. -ig ins1,ins2 ins3,ins4 defines two groups of instruction candidates: group 1: ins1,ins2 and group 2: ins3,ins4. |
-T TARGET | Target definition string. Check: Command line target definition scheme. |
-D OUTPUT_DIR | Output directory. |
There are other parameters to tune the code being generated. Check the rest of this document for details. The example section walks through a detailed use case that combines several of them.
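When mp_seq is driven from a script, the basic invocation above can be assembled programmatically. The following is a minimal sketch, not part of Microprobe: the helper name `build_mp_seq_command` and the `./output` directory are hypothetical, and only the `-T`, `-D` and `-ig` flags documented above are used.

```python
import shlex


def build_mp_seq_command(target, output_dir, groups):
    """Return the argv list for a basic mp_seq invocation.

    Hypothetical helper: it only formats arguments, it does not run mp_seq.
    """
    if not groups:
        raise ValueError("at least one instruction group is required")
    argv = ["mp_seq", "-T", target, "-D", output_dir, "-ig"]
    # Each group becomes one comma-separated argument after -ig.
    argv.extend(",".join(group) for group in groups)
    return argv


cmd = build_mp_seq_command(
    "power_v300-power9-ppc64_linux_gcc",
    "./output",  # assumed output directory, for illustration only
    [["ADDI_V0", "ORI_V0"], ["LD_V0", "LDX_V0"]],
)
# Print a copy-pasteable shell command line.
print(shlex.join(cmd))
```

Keeping the command as a list avoids quoting problems if it is later passed to `subprocess.run`.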
Full usage
mp_seq.py: INFO: Processing input arguments...
usage: mp_seq.py [-h] [-P SEARCH_PATH [SEARCH_PATH ...]] [-V] [-v] [-d]
[-c CONFIG_FILE [CONFIG_FILE ...]] [-C FORCE_CONFIG_FILE]
[--dump-configuration-file OUTPUT_CONFIG_FILE]
[--dump-full-configuration-file OUTPUT_CONFIG_FILE]
[-A ARCHITECTURE_PATHS] [-M MICROARCHITECTURE_PATHS]
[-E ENVIRONMENT_PATHS] -T TARGET [--list-architectures]
[--list-microarchitectures] [--list-environments]
[--traceback] [--profile PROFILE_OUTPUT] -D SEQ_OUTPUT_DIR
[-is INSTRUCTION_SLOTS]
[-ig INSTRUCTION_GROUPS [INSTRUCTION_GROUPS ...]]
[-im INSTRUCTION_MAP [INSTRUCTION_MAP ...]] [-bs BASE_SEQ]
[-gM GROUP_MAX [GROUP_MAX ...]]
[-gm GROUP_MIN [GROUP_MIN ...]] [-B BENCHMARK_SIZE]
[-dd DEPENDENCY_DISTANCE] [-fs] [-e] [-R] [-p]
[-bn BATCH_NUMBER] [-nb NUM_BATCHES] [-s] [-combinations]
[-sn] [-CC] [-N]
Microprobe seq tool
optional arguments:
-h, --help show this help message and exit
-P SEARCH_PATH [SEARCH_PATH ...], --default_paths SEARCH_PATH [SEARCH_PATH ...]
Default search paths for microprobe target definitions
-V, --version Show Microprobe version and exit
-v, --verbosity Verbosity level (Values: [0,1,2,3,4]). Each time this
argument is specified the verbosity level is
increased. By default, no logging messages are shown.
These are the four levels available:
-v (1): critical messages
-v -v (2): critical and error messages
-v -v -v (3): critical, error and warning messages
-v -v -v -v (4): critical, error, warning and info messages
Specifying more than four verbosity flags, will
default to the maximum of four. If you need extra
information, enable the debug mode (--debug or -d
flags).
-d, --debug Enable debug mode in Microprobe framework. Lots of
output messages will be generated
Configuration arguments:
Command arguments related to configuration file handling
-c CONFIG_FILE [CONFIG_FILE ...], --configuration CONFIG_FILE [CONFIG_FILE ...]
Configuration file. The configuration files will be
readed in order of appearance. Values are reset by the
last configuration file in case of non-list values.
List values will be appended (not reset)
-C FORCE_CONFIG_FILE, --force-configuration FORCE_CONFIG_FILE
Force configuration file. Use this configuration file
as the default start configuration. This disables any
system-wide, or user-provided configuration.
--dump-configuration-file OUTPUT_CONFIG_FILE
Dump a configuration file with the actual
configuration used
--dump-full-configuration-file OUTPUT_CONFIG_FILE
Dump a configuration file with the actual
configuration used plus all the configuration options
not set
Target path arguments:
Command arguments related to target paths
-A ARCHITECTURE_PATHS, --architecture-paths ARCHITECTURE_PATHS
Search path for architecture definitions. Microprobe
will search in these paths for architecture
definitions
-M MICROARCHITECTURE_PATHS, --microarchitecture-paths MICROARCHITECTURE_PATHS
Search path for microarchitecture definitions.
Microprobe will search in these paths for
microarchitecture definitions
-E ENVIRONMENT_PATHS, --environment-paths ENVIRONMENT_PATHS
Search path for environment definitions. Microprobe
will search in these paths for environment definitions
Target arguments:
Command arguments related to target specification and queries
-T TARGET, --target TARGET
Target tuple. Microprobe follows a GCC-like target
definition scheme, where a target is defined by a
tuple as following:
<arch-name>-<uarch-name>-<env-name>
where:
<arch-name>: is the name of the architecture
<uarch-name>: is the name of the microarchitecture
<env-name>: is the name of the environment
One can use --list-* options to get the list of
definitions available in the default search paths or
the paths specified by the different --*-paths options
--list-architectures Generate a list of architectures available in the
defined search paths and exit
--list-microarchitectures
Generate a list of microarchitectures available in the
defined search paths and exit
--list-environments Generate a list of environments available in the
defined search paths and exit
Debug arguments:
Command arguments related to debugging facilities
--traceback show a traceback and starts a python debugger (pdb)
when an error occurs. 'pdb' is an interactive python
shell that facilitates the debugging of errors
--profile PROFILE_OUTPUT
dump profiling information into given file (see
'pstats' module)
SEQ arguments:
Command arguments related to Sequence generation
-D SEQ_OUTPUT_DIR, --seq-output-dir SEQ_OUTPUT_DIR
Output directory name
-is INSTRUCTION_SLOTS, --instruction-slots INSTRUCTION_SLOTS
Number of instructions slots in the sequence. E.g.
'-is 4' will generate sequences of length 4.
-ig INSTRUCTION_GROUPS [INSTRUCTION_GROUPS ...], --instruction-groups INSTRUCTION_GROUPS [INSTRUCTION_GROUPS ...]
Comma separated list of instruction candidates per
group. E.g. -ins ins1,ins2 ins3,ins4. Defines two
groups of instruction candidates: group 1: ins1,ins2
and group 2: ins3,ins4.
-im INSTRUCTION_MAP [INSTRUCTION_MAP ...], --instruction-map INSTRUCTION_MAP [INSTRUCTION_MAP ...]
Comma separated list specifying groups instruction
candidate groups to be used on each instruction slot.
The list length should match the number of instruction
slots defined. A -1 value means all groups can be used
for that slot. E.g. -im 1 2,3 will generate sequences
containing in slot 1 instructions from group 1, and in
slot 2 instructions from groups 2 and 3.
-bs BASE_SEQ, --base-seq BASE_SEQ
Comma separated list specifying the base instruction
sequence
-gM GROUP_MAX [GROUP_MAX ...], --group-max GROUP_MAX [GROUP_MAX ...]
Comma separated list specifying the maximum number of
instructions of each group to be used. E.g. -gM 1,3
will generate sequences containing at most 1
instruction of group 1 and 3 instructions of group 2.
A -1 value means no maximum. The list length should
match the number of instruction groups defined.
-gm GROUP_MIN [GROUP_MIN ...], --group-min GROUP_MIN [GROUP_MIN ...]
Comma separated list specifying the minimum number of
instructions of each group to be used. E.g. -gm 1,3
will generate sequences containing at least 1
instruction of group 1 and 3 instructions of group 2.
A -1 value means no minimum. The list length should
match the number of instruction groups defined.
-B BENCHMARK_SIZE, --benchmark-size BENCHMARK_SIZE
Size in instructions of the microbenchmark. If more
instruction are needed, nested loops are automatically
generated
-dd DEPENDENCY_DISTANCE, --dependency-distance DEPENDENCY_DISTANCE
Average dependency distance between instructions. A
value below 1 means not dependency between
instructions. A value of 1 means a chain of dependent
instructions.
-fs, --force-switch Force data switching in all instructions, fail if not
supported.
-e, --endless Some backends allow the control to wrap the sequence
generated in an endless loop. Depending on the target
specified, this flag will force to generate sequences
in an endless loop (some targets might ignore it)
-R, --reset Reset the register contents on each loop iteration
-p, --parallel Generate benchmarks in parallel
-bn BATCH_NUMBER, --batch-number BATCH_NUMBER
Batch number to generate. Check --num-batches option
for more details
-nb NUM_BATCHES, --num-batches NUM_BATCHES
Number of batches. The number of microbenchmark to
generate is divided by this number, and the number the
batch number specified using -bn option is generated.
This is useful to split the generation of many test
cases in various batches.
-s, --skip Skip benchmarks already generated
-combinations, --combinations
Only generate combinations of the given instructions
-sn, --shortnames Use short output names
-CC, --compress Compress output files
-N, --count Only count the number of sequence to generate. Do not
generate anything
Environment variables:
MICROPROBETEMPLATES Default path for microprobe templates
MICROPROBEDEBUG If set, enable debug
MICROPROBEDEBUGPASSES If set, enable debug during passes
MICROPROBEASMHEXFMT Assembly hexadecimal format. Options:
'all' -> All immediates in hex format
'address' -> Address immediates in hex format (default)
'none' -> All immediate in integer format
Example use case
This use case uses the power_v300-power9-ppc64_linux_gcc target for illustrative purposes. The same can be done on other targets.
Let’s assume that you have analyzed the ISA of the target and you have selected some instruction candidates you want to use for your exploration. For instance the following:
Fixed-point instructions: ADDI_V0, ORI_V0, AND_V0
Vector instructions: XVMULDP_V0, XVDIVDP_V0, XVMADDADP_V0
Load instructions: LD_V0, LDX_V0, LBZ_V0
Branch instructions: B_V0, BC_V0
Then, you have to decide the instruction sequence length. In this case we pick 6 because we know that the target machine can commit up to 6 instructions per cycle. You can increase that number further, but the number of combinations grows exponentially with the sequence length.
To generate all the unique combinations of length 6 of the instructions above, issue the following command:
> mp_seq -T power_v300-power9-ppc64_linux_gcc -p -s -D <output_dir> -is 6 -ig ADDI_V0,ORI_V0,AND_V0 XVMULDP_V0,XVDIVDP_V0,XVMADDADP_V0 LD_V0,LDX_V0,LBZ_V0 B_V0,BC_V0
In the command above, we used the -ig parameter to specify 4 instruction groups (4 groups of comma-separated instruction names). Without any other restrictions, the number of combinations to generate is 1771561 (11 candidate instructions for each of the 6 slots).
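The unconstrained figure can be checked with a few lines of Python. This is a small sketch that assumes every slot may hold any of the candidate instructions, which is consistent with the number quoted above; the dictionary keys are illustrative labels only.

```python
# Instruction candidates from the example, one entry per -ig group.
groups = {
    "fixed_point": ["ADDI_V0", "ORI_V0", "AND_V0"],
    "vector": ["XVMULDP_V0", "XVDIVDP_V0", "XVMADDADP_V0"],
    "load": ["LD_V0", "LDX_V0", "LBZ_V0"],
    "branch": ["B_V0", "BC_V0"],
}
slots = 6  # value passed to -is

# Any candidate can occupy any slot, so the count is candidates ** slots.
candidates = sum(len(names) for names in groups.values())
total = candidates ** slots
print(candidates, total)  # 11 1771561
```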
It is better to constrain the design space and discard the sequences that we know will not be useful for our study. First, let’s set a maximum number of instructions of each group that we want in the sequence. Let’s say that we want a maximum of 3 fixed-point instructions, 2 vector instructions, 2 load instructions and 1 branch instruction per sequence. We can specify that using the following command:
> mp_seq -T power_v300-power9-ppc64_linux_gcc -p -s -D <output_dir> -is 6 -ig ADDI_V0,ORI_V0,AND_V0 XVMULDP_V0,XVDIVDP_V0,XVMADDADP_V0 LD_V0,LDX_V0,LBZ_V0 B_V0,BC_V0 -gM 3 2 2 1
That will reduce the number of combinations to 532170. We used the -gM parameter to specify a list of maximum instructions per group (3 2 2 1).
Similarly, we can further constrain the design space by specifying that we need at least one fixed-point, one vector, one load and one branch instruction per sequence. We can specify that using the following command:
> mp_seq -T power_v300-power9-ppc64_linux_gcc -p -s -D <output_dir> -is 6 -ig ADDI_V0,ORI_V0,AND_V0 XVMULDP_V0,XVDIVDP_V0,XVMADDADP_V0 LD_V0,LDX_V0,LBZ_V0 B_V0,BC_V0 -gM 3 2 2 1 -gm 1 1 1 1
This reduces the number of combinations to 320760. We used the -gm parameter to specify a list of minimum instructions per group (1 1 1 1).
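The effect of the per-group limits can be reproduced by counting. The sketch below is an independent calculation, not Microprobe's own code: it assumes a sequence is an ordered choice of one instruction per slot, and the function name `count_sequences` is hypothetical. For each admissible split of the slots among the groups, it multiplies the number of slot arrangements (a multinomial coefficient) by the number of instruction choices.

```python
from itertools import product
from math import factorial, prod


def count_sequences(group_sizes, slots, group_max=None, group_min=None):
    """Count ordered sequences under per-group min/max limits.

    None means "no limit" here (the tool itself uses -1 for that).
    """
    n = len(group_sizes)
    group_max = group_max or [slots] * n
    group_min = group_min or [0] * n
    ranges = [range(lo, min(hi, slots) + 1)
              for lo, hi in zip(group_min, group_max)]
    total = 0
    for counts in product(*ranges):
        if sum(counts) != slots:
            continue
        # Ways to place the groups on the slots: slots! / (c1! c2! ...).
        arrangements = factorial(slots)
        for c in counts:
            arrangements //= factorial(c)
        # Ways to pick a concrete instruction for every slot.
        choices = prod(size ** c for size, c in zip(group_sizes, counts))
        total += arrangements * choices
    return total


sizes = [3, 3, 3, 2]  # fixed-point, vector, load, branch candidates
print(count_sequences(sizes, 6))                              # 1771561
print(count_sequences(sizes, 6, group_max=[3, 2, 2, 1]))      # 532170
print(count_sequences(sizes, 6, [3, 2, 2, 1], [1, 1, 1, 1]))  # 320760
```

The three results match the figures quoted for the unconstrained, -gM and -gM plus -gm commands.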
Finally, we can further control the number of combinations to generate. Let’s say that we want the branches to be placed at the end of the sequence (position 6) and we want the vector instructions to be placed only at positions 1 or 4 of the sequence. We can specify that using the following command:
> mp_seq -T power_v300-power9-ppc64_linux_gcc -p -s -D <output_dir> -is 6 -ig ADDI_V0,ORI_V0,AND_V0 XVMULDP_V0,XVDIVDP_V0,XVMADDADP_V0 LD_V0,LDX_V0,LBZ_V0 B_V0,BC_V0 -gM 3 2 2 1 -gm 1 1 1 1 -im 1,2,3 1,3 1,3 1,2,3 1,3 4
We used the -im parameter to specify a list of the instruction groups allowed on each sequence slot (1,2,3 1,3 1,3 1,2,3 1,3 4). Notice that group 1 (fixed-point instructions) is allowed at positions 1, 2, 3, 4 and 5; group 2 (vector instructions) is allowed at positions 1 and 4; and so on. This instruction map reduces the number of sequences to 12636.
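The slot restrictions can also be checked by brute force. The following sketch enumerates every assignment of groups to slots allowed by the -im list, keeps those that satisfy the -gM and -gm limits, and counts the instruction choices for each. It is an independent check under the assumption that one instruction is chosen per slot; `count_mapped_sequences` is a hypothetical name, not a Microprobe function.

```python
from itertools import product
from math import prod


def count_mapped_sequences(group_sizes, slot_map, group_max, group_min):
    """Count sequences where slot i may only use the groups in slot_map[i].

    Groups are numbered from 1, as in the -im argument.
    """
    total = 0
    for pattern in product(*slot_map):
        # Number of slots assigned to each group in this pattern.
        counts = [pattern.count(g) for g in range(1, len(group_sizes) + 1)]
        if all(lo <= c <= hi
               for c, lo, hi in zip(counts, group_min, group_max)):
            # Every slot picks one instruction from its assigned group.
            total += prod(group_sizes[g - 1] for g in pattern)
    return total


sizes = [3, 3, 3, 2]  # fixed-point, vector, load, branch candidates
slot_map = [(1, 2, 3), (1, 3), (1, 3), (1, 2, 3), (1, 3), (4,)]
mapped = count_mapped_sequences(sizes, slot_map, [3, 2, 2, 1], [1, 1, 1, 1])
print(mapped)  # 12636
```

Only 26 group patterns survive the constraints, and each of them allows 3^5 x 2 = 486 concrete instruction choices.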
This example showed you how to constrain the number of combinations to your needs. Notice that we use the -p flag to generate the microbenchmarks in parallel, and the -s flag to skip the benchmarks if they are already generated.
Note
One can use the -N flag to check the sequence definition and the number of sequences that are going to be generated before starting the generation process.