MergeKit
aisteer360.algorithms.structural_control.wrappers.mergekit
MergeKit
Bases: StructuralControl
Wrapper for merging models via MergeKit (https://github.com/arcee-ai/mergekit).
MergeKit combines multiple language models using various merge strategies like linear interpolation, SLERP, and TIES. This wrapper integrates MergeKit's functionality to enable structural control through model composition.
The process involves loading a merge configuration (from a YAML file or a dict), executing the merge operation, and optionally loading the resulting merged model. If a merged model already exists at `out_path`, the merge is skipped unless `force_remerge` is set, which avoids redundant operations.
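As a rough illustration of what a dictionary configuration might look like, the sketch below describes a linear-interpolation merge of two models using top-level keys from MergeKit's YAML schema (`merge_method`, `models`, `dtype`). The model identifiers are placeholders, the `total_weight` helper is hypothetical, and passing this exact dictionary as `config_dict` is an assumption about how the wrapper forwards it to MergeKit.

```python
# Hypothetical merge configuration; the model names are placeholders.
merge_config = {
    "merge_method": "linear",
    "models": [
        {"model": "org/model-a", "parameters": {"weight": 0.6}},
        {"model": "org/model-b", "parameters": {"weight": 0.4}},
    ],
    "dtype": "float16",
}


def total_weight(config: dict) -> float:
    """Sum the per-model interpolation weights of a linear merge config."""
    return sum(entry["parameters"]["weight"] for entry in config["models"])


# Weights that sum to 1 make the merge read as a plain weighted average.
assert abs(total_weight(merge_config) - 1.0) < 1e-9
```

The same structure written as YAML and saved to disk would be the kind of file `config_path` points to.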
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`config_path` | `str` | Path to YAML merge configuration file. | `None` |
`config_dict` | `dict` | Dictionary merge configuration. | `None` |
`out_path` | `str` | Output directory for merged model. | required |
`load_merged` | `bool` | Whether to load merged model after merging. | `True` |
`force_remerge` | `bool` | Force remerge even if output exists. | `False` |
`allow_cuda` | `bool` | Use CUDA acceleration if available. | `True` |
`device_map` | `str \| dict` | Device mapping for model loading. | `None` |
`trust_remote_code` | `bool` | Trust remote code when loading. | `False` |
`dtype` | `str` | PyTorch dtype for loading. | `"float16"` |
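To make the defaults above concrete, here is a minimal sketch of an argument container that mirrors the documented parameters. `MergeKitArgsSketch` is a hypothetical stand-in, not the library's `Args` class, and the rule that exactly one of `config_path` or `config_dict` must be given is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class MergeKitArgsSketch:
    """Hypothetical stand-in mirroring the documented parameters."""

    out_path: str
    config_path: Optional[str] = None
    config_dict: Optional[dict] = None
    load_merged: bool = True
    force_remerge: bool = False
    allow_cuda: bool = True
    device_map: Union[str, dict, None] = None
    trust_remote_code: bool = False
    dtype: str = "float16"

    def __post_init__(self) -> None:
        # Assumption: exactly one configuration source should be supplied.
        if (self.config_path is None) == (self.config_dict is None):
            raise ValueError("Provide exactly one of config_path or config_dict.")


# Only out_path and one configuration source are given; the rest use defaults.
args = MergeKitArgsSketch(
    out_path="./merged-model",
    config_dict={"merge_method": "linear", "models": []},
)
```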
Reference:
- "Arcee's MergeKit: A Toolkit for Merging Large Language Models." Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict, Mark McQuade, Jacob Solawetz. https://aclanthology.org/2024.emnlp-industry.36
Source code in aisteer360/algorithms/structural_control/wrappers/mergekit/control.py
`args = self.Args.validate(*args, **kwargs)` (instance-attribute)
`enabled = True` (class-attribute, instance-attribute)
steer(model, tokenizer=None, **_)
Execute model merging via MergeKit and optionally return the merged model.
Performs structural steering by merging multiple models according to a configuration file or dictionary. Supports caching to avoid redundant merge operations and can either return the merged model or the original model based on configuration.
The method follows this logic:
- Load merge configuration from YAML file or dictionary
- Check if merged model already exists (skip if `force_remerge=False`)
- Execute merge if needed using MergeKit
- Optionally load and return the merged model
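The steps above can be sketched as a small control-flow function. This is an illustration of the documented decisions, not the wrapper's actual code: `steer_sketch`, `run_merge`, and `load_model` are hypothetical names, and treating a non-empty output directory as a cached merge is an assumption.

```python
from pathlib import Path
from typing import Any, Callable


def steer_sketch(
    model: Any,
    out_path: str,
    run_merge: Callable[[str], None],
    load_model: Callable[[str], Any],
    load_merged: bool = True,
    force_remerge: bool = False,
) -> Any:
    """Mimic the documented merge, cache, and load decisions."""
    out_dir = Path(out_path)
    # Assumption: a non-empty output directory counts as a cached merge.
    cached = out_dir.is_dir() and any(out_dir.iterdir())
    if force_remerge or not cached:
        run_merge(out_path)  # stands in for the MergeKit call
    # Return the merged model only when requested; otherwise pass the input through.
    return load_model(out_path) if load_merged else model
```

Injecting `run_merge` and `load_model` as callables keeps the sketch independent of MergeKit and Transformers, so the caching and return logic can be checked in isolation.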
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`model` | `PreTrainedModel` | The base model (potentially unused depending on the method). | required |
`tokenizer` | `PreTrainedTokenizer` | Base tokenizer (currently unused). | `None` |
`**_` | | Additional arguments (ignored). | `{}` |
Returns:
Type | Description |
---|---|
`PreTrainedModel` | Either the merged model (if `load_merged=True`) or the original model (if `load_merged=False`). When returning the merged model, attempts to attach a new tokenizer if one was created during merging. |
Note:
- If `out_path` exists and `force_remerge=False`, skips merging and loads cached result
- Merged model saved to `out_path` directory with full weights and config
- If `load_merged=False`, performs merge but returns original model
Source code in aisteer360/algorithms/structural_control/wrappers/mergekit/control.py