KnowGL Knowledge Extractor

The knowgl-large model is trained by combining Wikidata with an extended version of the training data in the REBEL dataset. Given a sentence, KnowGL generates triple(s) in the following format:

[(subject mention # subject label # subject type) | relation label | (object mention # object label # object type)]

If more than one triple is generated, the triples are separated by `$` in the output. The model achieves state-of-the-art results for relation extraction on the REBEL dataset. The generated labels (for the subject, relation, and object) and their types can be mapped directly to their associated Wikidata IDs.
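The output format described above can be split back into structured triples with a small amount of string handling. The following is a minimal sketch, not the parser used by this extractor: the names `Entity`, `Triple`, and `parse_knowgl_output` are hypothetical, and it assumes that mentions and labels contain no parentheses, `|`, `#`, or `$` characters.

```python
import re
from typing import List, NamedTuple


class Entity(NamedTuple):
    """Hypothetical container for one side of a triple."""
    mention: str
    label: str
    entity_type: str


class Triple(NamedTuple):
    """Hypothetical container for one generated triple."""
    subject: Entity
    relation: str
    obj: Entity


# Matches one "[(... # ... # ...) | relation | (... # ... # ...)]" chunk.
_TRIPLE_RE = re.compile(
    r"\[\s*\((?P<subj>[^()]*)\)\s*\|\s*(?P<rel>[^|]*?)\s*\|\s*\((?P<obj>[^()]*)\)\s*\]"
)


def _parse_entity(text: str) -> Entity:
    # An entity is "mention # label # type"; anything else is rejected.
    parts = [part.strip() for part in text.split("#")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 '#'-separated fields, got {len(parts)}")
    return Entity(*parts)


def parse_knowgl_output(output: str) -> List[Triple]:
    """Split generated text on '$' and keep only well-formed triples."""
    triples = []
    for chunk in output.split("$"):
        match = _TRIPLE_RE.fullmatch(chunk.strip())
        if match is None:
            continue  # skip malformed or empty chunks
        try:
            subject = _parse_entity(match.group("subj"))
            obj = _parse_entity(match.group("obj"))
        except ValueError:
            continue
        triples.append(Triple(subject, match.group("rel").strip(), obj))
    return triples
```

Malformed chunks are skipped rather than raising, on the assumption that generated text may occasionally be truncated; a stricter caller might prefer to log or raise instead.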

This KnowledgeExtractor does not rely on any predefined entities or relations.

Bases: KnowledgeExtractor

Instantiate the KnowGL Knowledge Extractor

`load_models()`

Load KnowGL model

`parse_result(result, doc, encodings)`

Parse the text result into a list of triples

Parameters:

Name	Type	Description	Default
`result`	`str`	Text generated by the KnowGL model	required
`doc`	`Doc`	spaCy `Doc`	required
`encodings`	`Encoding`	Encodings result of the tokenization	required

Returns:

Type	Description
`List[Tuple[Span, RelationSpan, Span]]`	List of triples (subject, relation, object)
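Turning a generated mention into a span requires locating it in the source text. How `parse_result` does this with `doc` and `encodings` is not described here, so the following is only a sketch of one simple approach: an exact substring search that returns character offsets. The helper name `find_mention_offsets` is hypothetical.

```python
from typing import Optional, Tuple


def find_mention_offsets(
    text: str, mention: str, start: int = 0
) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the first exact match, or None."""
    mention = mention.strip()
    if not mention:
        return None
    index = text.find(mention, start)
    if index == -1:
        return None  # the model may have paraphrased the mention
    return index, index + len(mention)
```

Offsets of this kind could then be passed to spaCy's `Doc.char_span(start, end)`, which returns `None` when the offsets do not align with token boundaries.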

`predict(docs, batch_size=None)`

Extract triples from docs

Parameters:

Name	Type	Description	Default
`docs`	`Iterator[Doc]`	spaCy Docs to process	required
`batch_size`	`Optional[Union[int, None]]`	Batch size for processing	`None`

Returns:

Type	Description
`List[List[Tuple[Span, RelationSpan, Span]]]`	Triples (subject, relation, object) extracted for each document
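Because `docs` is an iterator and `batch_size` is optional, a caller or implementation typically needs to group documents into batches lazily. The sketch below illustrates that general pattern only; the helper `iter_batches` is hypothetical, and treating `batch_size=None` as "one batch containing everything" is an assumption, not documented behavior of `predict`.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def iter_batches(
    items: Iterable[T], batch_size: Optional[int] = None
) -> Iterator[List[T]]:
    """Yield lists of at most batch_size items without materializing the input."""
    iterator = iter(items)
    if batch_size is None:
        # Assumption: no batch size means a single batch with all items.
        batch = list(iterator)
        if batch:
            yield batch
        return
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer or None")
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Results produced per batch can be concatenated in order, so that the output list has one entry per input document.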
