MergeKit

aisteer360.algorithms.structural_control.wrappers.mergekit

MergeKit

Bases: StructuralControl

Wrapper for merging models via MergeKit (https://github.com/arcee-ai/mergekit).

MergeKit combines multiple language models using various merge strategies like linear interpolation, SLERP, and TIES. This wrapper integrates MergeKit's functionality to enable structural control through model composition.

The process involves loading a merge configuration (from YAML or dict), executing the merge operation, and optionally loading the resulting merged model. Supports caching to avoid redundant operations.
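
For orientation, a merge configuration passed as config_dict might look like the following. This is a minimal illustrative sketch of a linear merge; the checkpoint names and weights are placeholders, and the full configuration schema is documented by MergeKit itself.

merge_config = {
    "merge_method": "linear",   # MergeKit also supports e.g. slerp and ties
    "dtype": "float16",
    "models": [
        {"model": "org/model-a", "parameters": {"weight": 0.5}},  # placeholder checkpoint
        {"model": "org/model-b", "parameters": {"weight": 0.5}},  # placeholder checkpoint
    ],
}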

Parameters:

  • config_path (str, optional): Path to YAML merge configuration file. Defaults to None.
  • config_dict (dict, optional): Dictionary merge configuration. Defaults to None.
  • out_path (str): Output directory for merged model. Required.
  • load_merged (bool): Whether to load the merged model after merging. Defaults to True.
  • force_remerge (bool): Force remerge even if output exists. Defaults to False.
  • allow_cuda (bool): Use CUDA acceleration if available. Defaults to True.
  • device_map (str | dict, optional): Device mapping for model loading. Defaults to None.
  • trust_remote_code (bool): Trust remote code when loading. Defaults to False.
  • dtype (str): PyTorch dtype for loading. Defaults to "float16".

Either config_path or config_dict must be provided.
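
A hypothetical end-to-end usage sketch is shown below. It assumes the class is importable from the control module listed above and that keyword arguments are forwarded to MergeKitArgs (as suggested by args = self.Args.validate(*args, **kwargs) further down); the model names and paths are placeholders.

from transformers import AutoModelForCausalLM, AutoTokenizer

from aisteer360.algorithms.structural_control.wrappers.mergekit.control import MergeKit

# Configure the merge; config_dict={...} may be passed instead of config_path.
control = MergeKit(
    config_path="merge_config.yaml",  # placeholder path
    out_path="./merged-model",        # placeholder output directory
    load_merged=True,
    dtype="float16",
)

base_model = AutoModelForCausalLM.from_pretrained("org/model-a")  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained("org/model-a")

# Runs the merge (or reuses a cached result) and returns the merged model
# because load_merged=True.
merged_model = control.steer(base_model, tokenizer)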

Reference:

  • "Arcee's MergeKit: A Toolkit for Merging Large Language Models" Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict, Mark McQuade, Jacob Solawetz https://aclanthology.org/2024.emnlp-industry.36
Source code in aisteer360/algorithms/structural_control/wrappers/mergekit/control.py
class MergeKit(StructuralControl):
    """
    Wrapper for merging models via MergeKit [https://github.com/arcee-ai/mergekit](https://github.com/arcee-ai/mergekit).

    MergeKit combines multiple language models using various merge strategies like linear interpolation, SLERP, and
    TIES. This wrapper integrates MergeKit's functionality to enable structural control through model composition.

    The process involves loading a merge configuration (from YAML or dict), executing the merge operation, and
    optionally loading the resulting merged model. Supports caching to avoid redundant operations.

    Args:
        config_path (str, optional): Path to YAML merge configuration file. Defaults to None.
        config_dict (dict, optional): Dictionary merge configuration. Defaults to None.
        out_path (str): Output directory for merged model.
        load_merged (bool): Whether to load merged model after merging. Defaults to True.
        force_remerge (bool): Force remerge even if output exists. Defaults to False.
        allow_cuda (bool): Use CUDA acceleration if available. Defaults to True.
        device_map (str | dict, optional): Device mapping for model loading. Defaults to None.
        trust_remote_code (bool): Trust remote code when loading. Defaults to False.
        dtype (str): PyTorch dtype for loading. Defaults to "float16".

    Reference:

    - "Arcee's MergeKit: A Toolkit for Merging Large Language Models"
      Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict,
      Mark McQuade, Jacob Solawetz
      [https://aclanthology.org/2024.emnlp-industry.36](https://aclanthology.org/2024.emnlp-industry.36)
    """

    Args = MergeKitArgs

    def steer(
            self,
            model: PreTrainedModel,
            tokenizer: PreTrainedTokenizer = None,
            **_
    ):
        """Execute model merging via MergeKit and optionally return the merged model.

        Performs structural steering by merging multiple models according to a configuration file or dictionary.
        Supports caching to avoid redundant merge operations and can either return the merged model or the original
        model based on configuration.

        The method follows this logic:

        1. Load merge configuration from YAML file or dictionary
        2. Check if merged model already exists (skip if `force_remerge=False`)
        3. Execute merge if needed using MergeKit
        4. Optionally load and return the merged model

        Args:
            model (PreTrainedModel): The base model (potentially unused depending on the method).
            tokenizer (PreTrainedTokenizer, optional): Base tokenizer (currently unused).
            **_: Additional arguments (ignored).

        Returns:
            PreTrainedModel: Either the merged model (if `load_merged=True`) or the original model. When returning
            merged model, attempts to attach a new tokenizer if one was created during merging.

        Note:

        - If out_path exists and `force_remerge=False`, skips merging and loads cached result
        - Merged model saved to `out_path` directory with full weights and config
        - If `load_merged=False`, performs merge but returns original model
        """
        args: MergeKitArgs = self.args

        if args.config_path:
            config = mk_config.MergeConfiguration.from_yaml(args.config_path)
        else:
            config = mk_config.MergeConfiguration(**args.config_dict)

        # find merged weights
        out_path = Path(args.out_path)
        if out_path.exists() and not args.force_remerge:
            if args.load_merged:
                merged = AutoModelForCausalLM.from_pretrained(
                    pretrained_model_name_or_path=str(out_path),
                    device_map=args.device_map,
                    trust_remote_code=args.trust_remote_code,
                    torch_dtype=getattr(torch, args.dtype)
                )
                return merged
            return model

        # merge
        # with FileLock(str(out_path) + ".lock"):
        mk_merge.run_merge(
            merge_config=config,
            out_path=str(out_path),
            options=mk_merge.MergeOptions(
                use_cuda=args.allow_cuda,
                trust_remote_code=args.trust_remote_code,
            )
        )

        # load merged checkpoint (and check if merge returned new tokenizer)
        if args.load_merged:
            merged = AutoModelForCausalLM.from_pretrained(
                out_path,
                torch_dtype=getattr(torch, args.dtype),
                device_map=args.device_map,
                trust_remote_code=args.trust_remote_code,
            )
            try:
                merged.tokenizer = AutoTokenizer.from_pretrained(
                    out_path,
                    trust_remote_code=args.trust_remote_code
                )
            except Exception:
                pass
            return merged

        return model
Attributes:

  • args = self.Args.validate(*args, **kwargs) (instance attribute)
  • enabled = True (class attribute)
steer(model, tokenizer=None, **_)

Execute model merging via MergeKit and optionally return the merged model.

Performs structural steering by merging multiple models according to a configuration file or dictionary. Supports caching to avoid redundant merge operations and can either return the merged model or the original model based on configuration.

The method follows this logic:

  1. Load merge configuration from YAML file or dictionary
  2. Check if merged model already exists (skip if force_remerge=False)
  3. Execute merge if needed using MergeKit
  4. Optionally load and return the merged model

Parameters:

  • model (PreTrainedModel): The base model (potentially unused depending on the method). Required.
  • tokenizer (PreTrainedTokenizer, optional): Base tokenizer (currently unused). Defaults to None.
  • **_: Additional arguments (ignored).

Returns:

  PreTrainedModel: Either the merged model (if load_merged=True) or the original model. When returning the merged model, attempts to attach a new tokenizer if one was created during merging.

Note:

  • If out_path exists and force_remerge=False, skips merging and loads cached result
  • Merged model saved to out_path directory with full weights and config
  • If load_merged=False, performs merge but returns original model
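
The sketch below illustrates these notes from the caller's side. It is a hypothetical example with placeholder paths and checkpoint names; the import path and constructor behaviour are assumed as in the usage sketch earlier on this page.

from transformers import AutoModelForCausalLM

from aisteer360.algorithms.structural_control.wrappers.mergekit.control import MergeKit

base_model = AutoModelForCausalLM.from_pretrained("org/model-a")  # placeholder checkpoint

control = MergeKit(
    config_path="merge_config.yaml",  # placeholder path
    out_path="./merged-model",
    load_merged=False,    # run the merge, but keep working with the base model
    force_remerge=False,  # if ./merged-model already exists, the merge step is skipped
)

returned = control.steer(base_model)
assert returned is base_model  # with load_merged=False the original model is returned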
Source code in aisteer360/algorithms/structural_control/wrappers/mergekit/control.py
def steer(
        self,
        model: PreTrainedModel,
        tokenizer: PreTrainedTokenizer = None,
        **_
):
    """Execute model merging via MergeKit and optionally return the merged model.

    Performs structural steering by merging multiple models according to a configuration file or dictionary.
    Supports caching to avoid redundant merge operations and can either return the merged model or the original
    model based on configuration.

    The method follows this logic:

    1. Load merge configuration from YAML file or dictionary
    2. Check if merged model already exists (skip if `force_remerge=False`)
    3. Execute merge if needed using MergeKit
    4. Optionally load and return the merged model

    Args:
        model (PreTrainedModel): The base model (potentially unused depending on the method).
        tokenizer (PreTrainedTokenizer, optional): Base tokenizer (currently unused).
        **_: Additional arguments (ignored).

    Returns:
        PreTrainedModel: Either the merged model (if `load_merged=True`) or the original model. When returning
        merged model, attempts to attach a new tokenizer if one was created during merging.

    Note:

    - If out_path exists and `force_remerge=False`, skips merging and loads cached result
    - Merged model saved to `out_path` directory with full weights and config
    - If `load_merged=False`, performs merge but returns original model
    """
    args: MergeKitArgs = self.args

    if args.config_path:
        config = mk_config.MergeConfiguration.from_yaml(args.config_path)
    else:
        config = mk_config.MergeConfiguration(**args.config_dict)

    # find merged weights
    out_path = Path(args.out_path)
    if out_path.exists() and not args.force_remerge:
        if args.load_merged:
            merged = AutoModelForCausalLM.from_pretrained(
                pretrained_model_name_or_path=str(out_path),
                device_map=args.device_map,
                trust_remote_code=args.trust_remote_code,
                torch_dtype=getattr(torch, args.dtype)
            )
            return merged
        return model

    # merge
    # with FileLock(str(out_path) + ".lock"):
    mk_merge.run_merge(
        merge_config=config,
        out_path=str(out_path),
        options=mk_merge.MergeOptions(
            use_cuda=args.allow_cuda,
            trust_remote_code=args.trust_remote_code,
        )
    )

    # load merged checkpoint (and check if merge returned new tokenizer)
    if args.load_merged:
        merged = AutoModelForCausalLM.from_pretrained(
            out_path,
            torch_dtype=getattr(torch, args.dtype),
            device_map=args.device_map,
            trust_remote_code=args.trust_remote_code,
        )
        try:
            merged.tokenizer = AutoTokenizer.from_pretrained(
                out_path,
                trust_remote_code=args.trust_remote_code
            )
        except Exception:
            pass
        return merged

    return model