src.llm.pattern_detection.aho_corasick_normalized.AhoCorasickAutomatonNormalized
A wrapper for normalized pattern matching using the Aho-Corasick algorithm.
This class normalizes patterns by removing whitespace variations before building the underlying Aho-Corasick automaton. This allows for pattern matching that is insensitive to whitespace differences.
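A minimal sketch of the idea, assuming normalization means stripping all whitespace (this matches the `'hello world'` / `'helloworld'` example on this page, but the class's real normalization routine is not shown here). The helper name `normalize_whitespace` is hypothetical and is not part of the documented API.

```python
def normalize_whitespace(text: str) -> str:
    """Hypothetical helper: drop every whitespace character from `text`."""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs, and newlines alike.
    return "".join(text.split())


# Normalizing both the pattern and the text makes a plain substring
# check insensitive to how the whitespace was laid out.
pattern = normalize_whitespace("hello world")
text = normalize_whitespace("say hello\n   world again")
found = pattern in text
```

Because both sides go through the same function, a pattern written with a single space still matches text where the words are separated by a newline and indentation.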
Attributes:
Name | Type | Description |
---|---|---|
`normalized_patterns` | | Dictionary mapping pattern names to their normalized forms. |
`pattern_lengths` | | Dictionary storing the lengths of normalized patterns. |
`automaton` | | The underlying AhoCorasickAutomaton instance. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`patterns` | `Dict[str, str]` | Dictionary mapping pattern names to their original string patterns. | required |
Source code in src/llm/pattern_detection/aho_corasick_normalized.py
get_pattern_length(pattern_name)
Returns the length of a normalized pattern.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`pattern_name` | | The name of the pattern whose length is required. | required |
Returns:
Type | Description |
---|---|
`int` | The length of the normalized pattern as an integer. |
Raises:
Type | Description |
---|---|
`KeyError` | Raised if the pattern name is not found. |
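The lookup described above can be sketched with plain dictionaries. This is an illustration under assumptions, not the class's source: the `normalize` helper and the module-level dictionaries are hypothetical stand-ins for the `normalized_patterns` and `pattern_lengths` attributes, and normalization is assumed to strip all whitespace.

```python
from typing import Dict


def normalize(text: str) -> str:
    """Hypothetical normalizer: remove all whitespace."""
    return "".join(text.split())


patterns: Dict[str, str] = {"pat1": "hello world", "pat2": "foo\tbar baz"}

# Stand-ins for the documented attributes of the same names.
normalized_patterns = {name: normalize(p) for name, p in patterns.items()}
pattern_lengths = {name: len(p) for name, p in normalized_patterns.items()}


def get_pattern_length(pattern_name: str) -> int:
    """Return the normalized length; unknown names raise KeyError."""
    return pattern_lengths[pattern_name]
```

Note that the length refers to the normalized form, so `"hello world"` (11 characters) is reported as 10. Callers that cannot guarantee the name exists would wrap the call in `try`/`except KeyError`.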
Source code in src/llm/pattern_detection/aho_corasick_normalized.py
reset_state()
Resets the automaton to its initial state.
Should be called before starting a new search if the automaton has been used previously.
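To illustrate why a reset matters, here is a toy stand-in, not the real automaton: `ToyStreamMatcher` is a hypothetical single-pattern matcher that carries leftover text between chunks so matches can span chunk boundaries. It assumes, as the description above suggests, that search state persists across calls until it is reset.

```python
class ToyStreamMatcher:
    """Hypothetical stand-in for a stateful, chunk-by-chunk matcher."""

    def __init__(self, pattern: str) -> None:
        self.pattern = pattern
        self._tail = ""  # text carried over from earlier chunks

    def reset_state(self) -> None:
        """Forget everything seen so far."""
        self._tail = ""

    def search_chunk(self, chunk: str) -> int:
        """Return the number of matches that complete inside `chunk`."""
        window = self._tail + chunk
        count = 0
        start = window.find(self.pattern)
        while start != -1:
            count += 1
            start = window.find(self.pattern, start + 1)
        # Keep just enough text to complete a match in the next chunk.
        keep = len(self.pattern) - 1
        self._tail = window[-keep:] if keep > 0 else ""
        return count


matcher = ToyStreamMatcher("helloworld")
first = matcher.search_chunk("hello")    # no complete match yet
second = matcher.search_chunk("world")   # match completes across chunks
matcher.reset_state()                    # start an unrelated search
third = matcher.search_chunk("world")    # no leftover "hello" to combine with
```

Without the `reset_state()` call, leftover text from one input could combine with the start of an unrelated input and report a match that exists in neither.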
Source code in src/llm/pattern_detection/aho_corasick_normalized.py
search_chunk(norm_chunk)
Searches for pattern matches in normalized text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`norm_chunk` | | The normalized text chunk to search in. Should be pre-normalized before calling this method. | required |
Returns:
Type | Description |
---|---|
`List[Tuple[int, str]]` | A list of `(int, str)` tuples, one per match found. |
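Example:

```python
from src.llm.pattern_detection.aho_corasick_normalized import AhoCorasickAutomatonNormalized

# The pattern contains a space; the searched text is already normalized.
automaton = AhoCorasickAutomatonNormalized({'pat1': 'hello world'})
matches = automaton.search_chunk('helloworld')
print(len(matches))  # 1
```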
Source code in src/llm/pattern_detection/aho_corasick_normalized.py