KnowGL Knowledge Extractor

The knowgl-large model is trained by combining Wikidata with an extended version of the training data in the REBEL dataset. Given a sentence, KnowGL generates triple(s) in the following format:

[(subject mention # subject label # subject type) | relation label | (object mention # object label # object type)]
If more than one triple is generated, the triples are separated by $ in the output. The model achieves state-of-the-art results for relation extraction on the REBEL dataset. The generated labels (for the subject, relation, and object) and their types can be mapped directly to their associated Wikidata IDs.
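
The output format above can be split back into structured triples with a few lines of string handling. The following is a minimal sketch, not the extractor's own `parse_result` implementation: the names `Entity`, `Triple`, and `parse_knowgl_output` are hypothetical, and the sample string is hand-written to match the documented format rather than real model output.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    """One side of a triple: (mention # label # type)."""
    mention: str
    label: str
    type: str


class Triple(NamedTuple):
    subject: Entity
    relation: str
    object: Entity


# Matches one "[(...) | relation | (...)]" chunk of the documented format.
_TRIPLE_RE = re.compile(
    r"^\[\s*\((.*?)\)\s*\|(.*?)\|\s*\((.*?)\)\s*\]$"
)


def _parse_entity(text: str) -> Entity:
    """Split 'mention # label # type' into its three fields."""
    parts = [part.strip() for part in text.split("#")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 '#'-separated fields, got: {text!r}")
    return Entity(*parts)


def parse_knowgl_output(output: str) -> List[Triple]:
    """Parse a '$'-separated string of triples; skip chunks that do not match."""
    triples = []
    for chunk in output.split("$"):
        match = _TRIPLE_RE.match(chunk.strip())
        if match is None:
            continue  # empty or malformed chunk
        subject, relation, obj = match.groups()
        triples.append(
            Triple(_parse_entity(subject), relation.strip(), _parse_entity(obj))
        )
    return triples


# Hand-written sample in the documented format (not actual model output).
sample = (
    "[(Paris # Paris # city) | capital of | (France # France # country)]"
    "$[(France # France # country) | capital | (Paris # Paris # city)]"
)
triples = parse_knowgl_output(sample)
```

This sketch assumes that `#`, `|`, and `$` never appear inside a mention, label, or type; text that breaks this assumption would need a more careful parser.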

This KnowledgeExtractor does not rely on any predefined entities or relations.

Bases: KnowledgeExtractor

Instantiate the KnowGL Knowledge Extractor

load_models()

Load KnowGL model

parse_result(result, doc, encodings)

Parse the text result into a list of triples

Parameters:

Name       Type      Description                            Default
result     str       Text generated by the KnowGL model     required
doc        Doc       spaCy doc                              required
encodings  Encoding  Encodings result of the tokenization   required

Returns:

Type                                   Description
List[Tuple[Span, RelationSpan, Span]]  List of triples (subject, relation, object)

predict(docs, batch_size=None)

Extract triples from docs

Parameters:

Name        Type                        Description                Default
docs        Iterator[Doc]               spaCy Docs to process      required
batch_size  Optional[Union[int, None]]  Batch size for processing  None

Returns:

Type                                         Description
List[List[Tuple[Span, RelationSpan, Span]]]  Triples (subject, relation, object) extracted for each document
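
To illustrate what an optional `batch_size` typically means for an iterator of documents, here is a minimal sketch of batching. It is not taken from the extractor's source: the helper name `iter_batches` is hypothetical, and treating `batch_size=None` as "process everything in a single batch" is an assumption, since the documentation above does not say how `None` is handled.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def iter_batches(
    items: Iterable[T], batch_size: Optional[int] = None
) -> Iterator[List[T]]:
    """Yield lists of at most `batch_size` items, preserving order."""
    iterator = iter(items)
    if batch_size is None:
        # Assumption: no batch size means a single batch with all items.
        batch = list(iterator)
        if batch:
            yield batch
        return
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    while True:
        # islice consumes up to batch_size items from the shared iterator.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


batches = list(iter_batches(range(5), batch_size=2))
```

Because the helper only pulls `batch_size` items at a time, it also works with generators of documents that should not be materialised all at once.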