MergeKit

aisteer360.algorithms.structural_control.wrappers.mergekit

MergeKit

Bases: StructuralControl

Wrapper for merging models via MergeKit (https://github.com/arcee-ai/mergekit).

MergeKit combines multiple language models using various merge strategies like linear interpolation, SLERP, and TIES. This wrapper integrates MergeKit's functionality to enable structural control through model composition.

The process involves loading a merge configuration (from YAML or dict), executing the merge operation, and optionally loading the resulting merged model. Supports caching to avoid redundant operations.
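
For orientation, a merge configuration passed as config_dict might look like the following. This is a minimal illustrative sketch of a linear merge; the checkpoint names and weights are placeholders, and the full configuration schema is documented by MergeKit itself.

merge_config = {
    "merge_method": "linear",   # MergeKit also supports e.g. slerp and ties
    "dtype": "float16",
    "models": [
        {"model": "org/model-a", "parameters": {"weight": 0.5}},  # placeholder checkpoint
        {"model": "org/model-b", "parameters": {"weight": 0.5}},  # placeholder checkpoint
    ],
}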

Parameters:

  • config_path (str, optional): Path to YAML merge configuration file. Defaults to None.
  • config_dict (dict, optional): Dictionary merge configuration. Defaults to None.
  • out_path (str): Output directory for merged model. Required.
  • load_merged (bool): Whether to load the merged model after merging. Defaults to True.
  • force_remerge (bool): Force remerge even if output exists. Defaults to False.
  • allow_cuda (bool): Use CUDA acceleration if available. Defaults to True.
  • device_map (str | dict, optional): Device mapping for model loading. Defaults to None.
  • trust_remote_code (bool): Trust remote code when loading. Defaults to False.
  • dtype (str): PyTorch dtype for loading. Defaults to "float16".

Either config_path or config_dict must be provided.
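
A hypothetical end-to-end usage sketch is shown below. It assumes the class is importable from the control module listed above and that keyword arguments are forwarded to MergeKitArgs (as suggested by args = self.Args.validate(*args, **kwargs) further down); the model names and paths are placeholders.

from transformers import AutoModelForCausalLM, AutoTokenizer

from aisteer360.algorithms.structural_control.wrappers.mergekit.control import MergeKit

# Configure the merge; config_dict={...} may be passed instead of config_path.
control = MergeKit(
    config_path="merge_config.yaml",  # placeholder path
    out_path="./merged-model",        # placeholder output directory
    load_merged=True,
    dtype="float16",
)

base_model = AutoModelForCausalLM.from_pretrained("org/model-a")  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained("org/model-a")

# Runs the merge (or reuses a cached result) and returns the merged model
# because load_merged=True.
merged_model = control.steer(base_model, tokenizer)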

Reference:

  • "Arcee's MergeKit: A Toolkit for Merging Large Language Models" Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict, Mark McQuade, Jacob Solawetz https://aclanthology.org/2024.emnlp-industry.36
Source code in aisteer360/algorithms/structural_control/wrappers/mergekit/control.py
class MergeKit(StructuralControl):
    """
    Wrapper for merging models via MergeKit [https://github.com/arcee-ai/mergekit](https://github.com/arcee-ai/mergekit).

    MergeKit combines multiple language models using various merge strategies like linear interpolation, SLERP, and
    TIES. This wrapper integrates MergeKit's functionality to enable structural control through model composition.

    The process involves loading a merge configuration (from YAML or dict), executing the merge operation, and
    optionally loading the resulting merged model. Supports caching to avoid redundant operations.

    Args:
        config_path (str, optional): Path to YAML merge configuration file. Defaults to None.
        config_dict (dict, optional): Dictionary merge configuration. Defaults to None.
        out_path (str): Output directory for merged model.
        load_merged (bool): Whether to load merged model after merging. Defaults to True.
        force_remerge (bool): Force remerge even if output exists. Defaults to False.
        allow_cuda (bool): Use CUDA acceleration if available. Defaults to True.
        device_map (str | dict, optional): Device mapping for model loading. Defaults to None.
        trust_remote_code (bool): Trust remote code when loading. Defaults to False.
        dtype (str): PyTorch dtype for loading. Defaults to "float16".

    Reference:

    - "Arcee's MergeKit: A Toolkit for Merging Large Language Models"
      Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vladimir Karpukhin, Brian Benedict,
      Mark McQuade, Jacob Solawetz
      [https://aclanthology.org/2024.emnlp-industry.36](https://aclanthology.org/2024.emnlp-industry.36)
    """

    Args = MergeKitArgs

    def steer(
            self,
            model: PreTrainedModel,
            tokenizer: PreTrainedTokenizer = None,
            **_
    ):
        """Execute model merging via MergeKit and optionally return the merged model.

        Performs structural steering by merging multiple models according to a configuration file or dictionary.
        Supports caching to avoid redundant merge operations and can either return the merged model or the original
        model based on configuration.

        The method follows this logic:

        1. Load merge configuration from YAML file or dictionary
        2. Check if merged model already exists (skip if `force_remerge=False`)
        3. Execute merge if needed using MergeKit
        4. Optionally load and return the merged model

        Args:
            model (PreTrainedModel): The base model (potentially unused depending on the method).
            tokenizer (PreTrainedTokenizer, optional): Base tokenizer (currently unused).
            **_: Additional arguments (ignored).

        Returns:
            PreTrainedModel: Either the merged model (if `load_merged=True`) or the original model. When returning
            merged model, attempts to attach a new tokenizer if one was created during merging.

        Note:

        - If out_path exists and `force_remerge=False`, skips merging and loads cached result
        - Merged model saved to `out_path` directory with full weights and config
        - If `load_merged=False`, performs merge but returns original model
        """
        args: MergeKitArgs = self.args

        if args.config_path:
            config = mk_config.MergeConfiguration.from_yaml(args.config_path)
        else:
            config = mk_config.MergeConfiguration(**args.config_dict)

        # find merged weights
        out_path = Path(args.out_path)
        if out_path.exists() and not args.force_remerge:
            if args.load_merged:
                merged = AutoModelForCausalLM.from_pretrained(
                    pretrained_model_name_or_path=str(out_path),
                    device_map=args.device_map,
                    trust_remote_code=args.trust_remote_code,
                    torch_dtype=getattr(torch, args.dtype)
                )
                return merged
            return model

        # merge
        # with FileLock(str(out_path) + ".lock"):
        mk_merge.run_merge(
            merge_config=config,
            out_path=str(out_path),
            options=mk_merge.MergeOptions(
                use_cuda=args.allow_cuda,
                trust_remote_code=args.trust_remote_code,
            )
        )

        # load merged checkpoint (and check if merge returned new tokenizer)
        if args.load_merged:
            merged = AutoModelForCausalLM.from_pretrained(
                out_path,
                torch_dtype=getattr(torch, args.dtype),
                device_map=args.device_map,
                trust_remote_code=args.trust_remote_code,
            )
            try:
                merged.tokenizer = AutoTokenizer.from_pretrained(
                    out_path,
                    trust_remote_code=args.trust_remote_code
                )
            except Exception:
                pass
            return merged

        return model
Attributes:

  • args = self.Args.validate(*args, **kwargs) (instance attribute)
  • enabled = True (class attribute)
steer(model, tokenizer=None, **_)

Execute model merging via MergeKit and optionally return the merged model.

Performs structural steering by merging multiple models according to a configuration file or dictionary. Supports caching to avoid redundant merge operations and can either return the merged model or the original model based on configuration.

The method follows this logic:

  1. Load merge configuration from YAML file or dictionary
  2. Check if merged model already exists (skip if force_remerge=False)
  3. Execute merge if needed using MergeKit
  4. Optionally load and return the merged model

Parameters:

  • model (PreTrainedModel): The base model (potentially unused depending on the method). Required.
  • tokenizer (PreTrainedTokenizer, optional): Base tokenizer (currently unused). Defaults to None.
  • **_: Additional arguments (ignored).

Returns:

  PreTrainedModel: Either the merged model (if load_merged=True) or the original model. When returning the merged model, attempts to attach a new tokenizer if one was created during merging.

Note:

  • If out_path exists and force_remerge=False, skips merging and loads cached result
  • Merged model saved to out_path directory with full weights and config
  • If load_merged=False, performs merge but returns original model
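
The sketch below illustrates these notes from the caller's side. It is a hypothetical example with placeholder paths and checkpoint names; the import path and constructor behaviour are assumed as in the usage sketch earlier on this page.

from transformers import AutoModelForCausalLM

from aisteer360.algorithms.structural_control.wrappers.mergekit.control import MergeKit

base_model = AutoModelForCausalLM.from_pretrained("org/model-a")  # placeholder checkpoint

control = MergeKit(
    config_path="merge_config.yaml",  # placeholder path
    out_path="./merged-model",
    load_merged=False,    # run the merge, but keep working with the base model
    force_remerge=False,  # if ./merged-model already exists, the merge step is skipped
)

returned = control.steer(base_model)
assert returned is base_model  # with load_merged=False the original model is returned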
Source code in aisteer360/algorithms/structural_control/wrappers/mergekit/control.py
def steer(
        self,
        model: PreTrainedModel,
        tokenizer: PreTrainedTokenizer = None,
        **_
):
    """Execute model merging via MergeKit and optionally return the merged model.

    Performs structural steering by merging multiple models according to a configuration file or dictionary.
    Supports caching to avoid redundant merge operations and can either return the merged model or the original
    model based on configuration.

    The method follows this logic:

    1. Load merge configuration from YAML file or dictionary
    2. Check if merged model already exists (skip if `force_remerge=False`)
    3. Execute merge if needed using MergeKit
    4. Optionally load and return the merged model

    Args:
        model (PreTrainedModel): The base model (potentially unused depending on the method).
        tokenizer (PreTrainedTokenizer, optional): Base tokenizer (currently unused).
        **_: Additional arguments (ignored).

    Returns:
        PreTrainedModel: Either the merged model (if `load_merged=True`) or the original model. When returning
        merged model, attempts to attach a new tokenizer if one was created during merging.

    Note:

    - If out_path exists and `force_remerge=False`, skips merging and loads cached result
    - Merged model saved to `out_path` directory with full weights and config
    - If `load_merged=False`, performs merge but returns original model
    """
    args: MergeKitArgs = self.args

    if args.config_path:
        config = mk_config.MergeConfiguration.from_yaml(args.config_path)
    else:
        config = mk_config.MergeConfiguration(**args.config_dict)

    # find merged weights
    out_path = Path(args.out_path)
    if out_path.exists() and not args.force_remerge:
        if args.load_merged:
            merged = AutoModelForCausalLM.from_pretrained(
                pretrained_model_name_or_path=str(out_path),
                device_map=args.device_map,
                trust_remote_code=args.trust_remote_code,
                torch_dtype=getattr(torch, args.dtype)
            )
            return merged
        return model

    # merge
    # with FileLock(str(out_path) + ".lock"):
    mk_merge.run_merge(
        merge_config=config,
        out_path=str(out_path),
        options=mk_merge.MergeOptions(
            use_cuda=args.allow_cuda,
            trust_remote_code=args.trust_remote_code,
        )
    )

    # load merged checkpoint (and check if merge returned new tokenizer)
    if args.load_merged:
        merged = AutoModelForCausalLM.from_pretrained(
            out_path,
            torch_dtype=getattr(torch, args.dtype),
            device_map=args.device_map,
            trust_remote_code=args.trust_remote_code,
        )
        try:
            merged.tokenizer = AutoTokenizer.from_pretrained(
                out_path,
                trust_remote_code=args.trust_remote_code
            )
        except Exception:
            pass
        return merged

    return model