MentionsExtractor

The mentions extractor detects candidate entities (a.k.a. mentions), which the linker then links to a data source (e.g. Wikidata).

Seven mentions extractors are currently supported: two based on spaCy, two based on Flair, and one each based on TARS, SMXM and GLiNER. The spaCy and Flair extractors each come in two variants: one based on NERC (named entity recognition and classification) and one based on linguistic features (part-of-speech tags and dependency parsing). The TARS and SMXM models can be used when the user wants to specify which mentions to extract.

The NERC approach uses a NERC model to detect the entities to be linked. Its results depend on the model used and on the entity types that model was trained on. Depending on the use case and the target entities, it may not be the best choice: entities that the NERC model does not recognize will not be linked.

The linguistic approach relies on the idea that a mention is usually a noun or a syntagma (phrase). It therefore detects nouns that belong to a syntagma and act as subjects, objects, etc. This approach does not depend on the entity types a model was trained on: a noun in a text is a noun regardless of the training dataset. Performance still depends on the quality of the underlying model.
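The idea behind the linguistic approach can be sketched without any model at all. The snippet below is only an illustration, not the library's implementation: the `Token` class, the `linguistic_mentions` function and the two tag sets are hypothetical names introduced here, and the tags are written by hand where a real pipeline would take them from a tagger and a dependency parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical stand-in for a parsed token (not a spaCy or Flair type)."""
    text: str
    pos: str  # coarse part-of-speech tag, e.g. "NOUN", "VERB"
    dep: str  # dependency relation to the head, e.g. "nsubj", "dobj"


# Illustrative choices: which tags count as nouns and which
# dependency roles count as "subject, object, etc.".
NOUN_TAGS = {"NOUN", "PROPN"}
MENTION_DEPS = {"nsubj", "nsubjpass", "dobj", "iobj", "pobj"}


def linguistic_mentions(tokens: List[Token]) -> List[str]:
    """Keep the nouns that fill a subject-like or object-like role."""
    return [t.text for t in tokens if t.pos in NOUN_TAGS and t.dep in MENTION_DEPS]


# Hand-annotated sentence: "Marie studied physics in Paris"
sentence = [
    Token("Marie", "PROPN", "nsubj"),
    Token("studied", "VERB", "ROOT"),
    Token("physics", "NOUN", "dobj"),
    Token("in", "ADP", "prep"),
    Token("Paris", "PROPN", "pobj"),
]

print(linguistic_mentions(sentence))  # ['Marie', 'physics', 'Paris']
```

Note that "physics" is selected even though a typical NERC label set has no category for it, which is the case the linguistic variants are meant to cover.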

The SMXM model uses the description of the mentions to give the model information about them.

The TARS model uses the labels of the mentions to detect them.

The GLiNER model also uses the labels of the mentions to detect them.

Bases: ABC

`extract_mentions(docs, batch_size=None)`

Perform the mentions extraction: call the `predict` function and add the resulting mentions to each spaCy `Doc`.

Parameters:

Name	Type	Description	Default
`docs`	`Iterator[Doc]`	An iterator of spaCy `Doc` objects	required
`batch_size`		The batch size	`None`

Returns:

Type	Description
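The relationship between `extract_mentions` and the abstract `predict` can be pictured as a template method: the base class drives the extraction, and each concrete extractor only supplies `predict`. The sketch below is a minimal illustration under that assumption and does not reproduce the library's code. `ToyDoc`, `ToyMentionsExtractor` and `CapitalisedWordExtractor` are hypothetical stand-ins, mentions are stored as plain `(start, end)` character offsets instead of spaCy `Span` objects, and `batch_size` is accepted but ignored.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List, Optional, Tuple

Offsets = Tuple[int, int]  # (start, end) character offsets, standing in for a Span


class ToyDoc:
    """Hypothetical stand-in for a spaCy Doc: raw text plus a mentions slot."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.mentions: List[Offsets] = []


class ToyMentionsExtractor(ABC):
    """Toy base class mirroring the documented interface (assumption)."""

    @abstractmethod
    def predict(self, docs: Iterator[ToyDoc],
                batch_size: Optional[int] = None) -> List[List[Offsets]]:
        """Return one list of mentions per document."""

    def extract_mentions(self, docs: Iterable[ToyDoc],
                         batch_size: Optional[int] = None) -> None:
        docs = list(docs)  # materialise so the docs can be traversed twice
        predictions = self.predict(iter(docs), batch_size)
        for doc, spans in zip(docs, predictions):
            doc.mentions = spans  # attach the predicted mentions to the doc


class CapitalisedWordExtractor(ToyMentionsExtractor):
    """Toy rule: every capitalised, space-separated word is a mention."""

    def predict(self, docs, batch_size=None):
        results = []
        for doc in docs:
            spans, offset = [], 0
            for word in doc.text.split(" "):
                if word[:1].isupper():
                    spans.append((offset, offset + len(word)))
                offset += len(word) + 1  # +1 for the separating space
            results.append(spans)
        return results


docs = [ToyDoc("Madrid is in Spain"), ToyDoc("no names here")]
CapitalisedWordExtractor().extract_mentions(docs)
print(docs[0].mentions)  # [(0, 6), (13, 18)]
print(docs[1].mentions)  # []
```

Because `predict` is abstract, instantiating `ToyMentionsExtractor` directly raises a `TypeError`; only subclasses that implement it can be used.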

`load_models()`

Load the model.

Returns:

Type	Description

`predict(docs, batch_size=None)` `abstractmethod`

Perform the mentions prediction.

Parameters:

Name	Type	Description	Default
`docs`	`Iterator[Doc]`	An iterator of spaCy `Doc` objects	required
`batch_size`		The batch size	`None`

Returns:

Type	Description
`List[List[Span]]`

`set_device(device)`

Set the device to use.

Parameters:

Name	Type	Description	Default
`device`	`Union[str, device]`		required

Returns:

Type	Description

`set_kg(mentions)`

Set the entities that the mentions extractor can use.

Parameters:

Name	Type	Description	Default
`mentions`	`Iterator[Entity]`	An iterator of the entities the extractor can use	required
