
KnowGL Knowledge Extractor

The knowgl-large model is trained by combining Wikidata with an extended version of the training data in the REBEL dataset. Given a sentence, KnowGL generates one or more triples in the following format:

[(subject mention # subject label # subject type) | relation label | (object mention # object label # object type)]
If more than one triple is generated, the triples are separated by $ in the output. The model achieves state-of-the-art results for relation extraction on the REBEL dataset. The generated labels (for the subject, relation, and object) and their types can be mapped directly to their associated Wikidata IDs.
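To illustrate the output format above, here is a minimal Python sketch that splits a generated string into triples. It is not the library's parse_result: the names parse_knowgl_output, Element and Triple are hypothetical, the sample string is made up rather than real model output, and the sketch assumes that mentions, labels and types never contain the delimiter characters $, | or #.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    """One side of a triple: (mention # label # type)."""
    mention: str
    label: str
    entity_type: str


class Triple(NamedTuple):
    subject: Element
    relation: str
    obj: Element


def _parse_element(text: str) -> Element:
    # Strip the surrounding parentheses, then split on the '#' separator.
    inner = text.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    parts = [part.strip() for part in inner.split("#")]
    if len(parts) != 3:
        raise ValueError(f"expected 'mention # label # type', got {text!r}")
    return Element(*parts)


def parse_knowgl_output(generated: str) -> List[Triple]:
    """Hypothetical parser for '[(s # s # s) | rel | (o # o # o)]$[...]' strings."""
    triples: List[Triple] = []
    # Assumption: '$' only ever appears as the separator between triples.
    for chunk in generated.split("$"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # Drop the square brackets enclosing a single triple.
        if chunk.startswith("[") and chunk.endswith("]"):
            chunk = chunk[1:-1]
        parts = chunk.split("|")
        if len(parts) != 3:
            raise ValueError(f"expected 'subject | relation | object', got {chunk!r}")
        subject, relation, obj = parts
        triples.append(Triple(_parse_element(subject), relation.strip(), _parse_element(obj)))
    return triples


if __name__ == "__main__":
    # Made-up example string, not actual model output.
    sample = (
        "[(Obama # Barack Obama # human) | spouse | (Michelle # Michelle Obama # human)]"
        "$[(Obama # Barack Obama # human) | place of birth | (Honolulu # Honolulu # city)]"
    )
    for triple in parse_knowgl_output(sample):
        print(triple.subject.label, "->", triple.relation, "->", triple.obj.label)
```

Splitting on the raw delimiter characters keeps the sketch short; a more robust parser would also have to cope with mentions such as "$5 million" that contain one of the delimiters.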

This KnowledgeExtractor does not use any predefined entities or relations.

Bases: KnowledgeExtractor

Instantiate the KnowGL Knowledge Extractor


Load KnowGL model

parse_result(result, doc, encodings)

Parse the text result into a list of triples


Parameters:

Name        Type        Description                                  Default
result      str         Text generated by the KnowGL model           required
doc         Doc         spaCy Doc                                    required
encodings   Encoding    Encodings resulting from the tokenization    required

Returns:

Type                                     Description
List[Tuple[Span, RelationSpan, Span]]    List of triples (subject, relation, object)
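Because parse_result receives the spaCy Doc and the tokenizer encodings and returns Span objects, each generated mention has to be tied back to a position in the input text. The following is a simplified sketch of that idea, assuming plain exact string matching on the document text; the library may align mentions through the encodings instead, and CharSpan, RawTriple, locate and to_char_spans are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class CharSpan(NamedTuple):
    start: int  # inclusive character offset into the text
    end: int    # exclusive character offset into the text
    label: str  # entity label generated by the model


# (subject mention, subject label, relation, object mention, object label)
RawTriple = Tuple[str, str, str, str, str]


def locate(text: str, mention: str, label: str) -> Optional[CharSpan]:
    """Return the first exact occurrence of `mention` in `text`, if any."""
    start = text.find(mention)
    if start == -1 or not mention:
        return None
    return CharSpan(start, start + len(mention), label)


def to_char_spans(
    text: str, triples: Iterable[RawTriple]
) -> List[Tuple[CharSpan, str, CharSpan]]:
    """Hypothetical helper: anchor generated mentions to character offsets."""
    located = []
    for subj_mention, subj_label, relation, obj_mention, obj_label in triples:
        subj = locate(text, subj_mention, subj_label)
        obj = locate(text, obj_mention, obj_label)
        # Generated mentions are not guaranteed to appear verbatim; skip those.
        if subj is None or obj is None:
            continue
        located.append((subj, relation, obj))
    return located
```

Only the first occurrence of each mention is used, which is a deliberate simplification. With spaCy, offsets like these can be turned into a Span via doc.char_span(start, end, label=...), which returns None when the offsets do not fall on token boundaries.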

predict(docs, batch_size=None)

Extract triples from docs


Parameters:

Name         Type                          Description                  Default
docs         Iterator[Doc]                 spaCy Docs to process        required
batch_size   Optional[Union[int, None]]    Batch size for processing    None

Returns:

Type                                           Description
List[List[Tuple[Span, RelationSpan, Span]]]    Triples (subject, relation, object) extracted for each document
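predict takes an iterator of Docs and an optional batch_size, and returns one list of triples per document. The sketch below shows one way to batch such an iterator while keeping the results aligned with the input order. It is not the library's implementation: batched, predict_in_batches and extract_batch are hypothetical names, and treating batch_size=None as "one single batch" is an assumption, not documented behavior.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Optional, TypeVar

DocT = TypeVar("DocT")
ResultT = TypeVar("ResultT")


def batched(items: Iterable[DocT], batch_size: Optional[int]) -> Iterator[List[DocT]]:
    """Yield lists of at most `batch_size` items; None means one batch (assumption)."""
    iterator = iter(items)
    if batch_size is None:
        batch = list(iterator)
        if batch:
            yield batch
        return
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    while True:
        # islice pulls at most batch_size items without materializing the rest.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def predict_in_batches(
    docs: Iterable[DocT],
    extract_batch: Callable[[List[DocT]], List[ResultT]],
    batch_size: Optional[int] = None,
) -> List[ResultT]:
    """Run `extract_batch` batch by batch, keeping one result per input doc, in order."""
    results: List[ResultT] = []
    for batch in batched(docs, batch_size):
        batch_results = extract_batch(batch)
        if len(batch_results) != len(batch):
            raise ValueError("extract_batch must return exactly one result per document")
        results.extend(batch_results)
    return results
```

Consuming the iterator lazily with itertools.islice means that, when a batch size is given, only one batch of Docs has to be held in memory at a time, while the length check guarantees that the i-th result list still belongs to the i-th document.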