Tutorial

Welcome to the KIF tutorial!

Before we start, let's use the command-line to create and activate a fresh Python virtual environment:

$ python -m venv tutorial
$ source tutorial/bin/activate
(tutorial) $

Let's now use pip to install the latest release of kif-lib (with KIF CLI support) and then start an interactive Python session:

(tutorial) $ pip install kif-lib[cli]
(tutorial) $ python
>>>

The rest of the tutorial will take place inside this interactive session. From now on, we will omit the session prompt string (>>>).

1 First steps

We start by importing the Store constructor from the KIF library:

from kif_lib import Store

We'll also need the Wikidata vocabulary module wd:

from kif_lib.vocabulary import wd

Now, let's create a KIF store pointing to the official Wikidata query service:

kb = Store('wikidata')

Note

Wikidata is a general-purpose open knowledge graph maintained by the Wikimedia Foundation. It can be thought of as a structured version of Wikipedia.

A KIF store is an interface to a knowledge source. It allows us to query the source as if it were Wikidata and obtain Wikidata-like statements as a result.

The store kb (short for knowledge base) we just created is an interface to Wikidata itself. We can use it, for example, to fetch from Wikidata three statements about Brazil (Q155):

it = kb.filter(subject=wd.Brazil, limit=3)
for stmt in it:
    print(stmt)
# -- output --
# Statement(Item(IRI('http://www.wikidata.org/entity/Q155')), ValueSnak(Property(IRI('http://www.wikidata.org/entity/P47'), ItemDatatype()), Item(IRI('http://www.wikidata.org/entity/Q414'))))
# Statement(Item(IRI('http://www.wikidata.org/entity/Q155')), ValueSnak(Property(IRI('http://www.wikidata.org/entity/P571'), TimeDatatype()), Time(datetime.datetime(1822, 9, 7, 0, 0, tzinfo=datetime.timezone.utc), 11, 0, Item(IRI('http://www.wikidata.org/entity/Q1985727')))))
# Statement(Item(IRI('http://www.wikidata.org/entity/Q155')), ValueSnak(Property(IRI('http://www.wikidata.org/entity/P37'), ItemDatatype()), Item(IRI('http://www.wikidata.org/entity/Q5146'))))
$ kif filter --store=wikidata --subject=wd.Brazil --limit=3
(Statement (Item Brazil) (ValueSnak (Property shares border with) (Item Argentina)))
(Statement (Item Brazil) (ValueSnak (Property inception) 7 September 1822))
(Statement (Item Brazil) (ValueSnak (Property official language) (Item Portuguese)))

Note

Most Python examples shown in this tutorial can be reproduced on the command-line using KIF CLI. The equivalent KIF CLI shell invocation (the lines starting with $) is listed right after each Python example.

If you run the previous Python code in a Jupyter notebook and use display() instead of print(), then the three statements will be pretty-printed in S-expression format as follows:

(Statement (Item Brazil) (ValueSnak (Property shares border with) (Item Argentina)))
(Statement (Item Brazil) (ValueSnak (Property inception) 7 September 1822))
(Statement (Item Brazil) (ValueSnak (Property official language) (Item Portuguese)))

From now on, we'll use this pretty-printed format to display statements and their components. (Note that KIF CLI uses this format by default.)

A KIF statement represents an assertion and consists of two parts: a subject and a snak. The subject is the entity about which the assertion is made, while the snak (or predication) is what is asserted about the subject. The snak associates a property with a specific value, some value, or no value.

Consider the first statement obtained from the store kb above:

(Statement (Item Brazil) (ValueSnak (Property shares border with) (Item Argentina)))

This statement stands for the assertion "Brazil shares border with Argentina". Its subject is the item Brazil (Q155) and its snak is a value snak which associates property shares border with (P47) with value Argentina (Q414).

2 Data model

KIF data-model objects, such as statements and their components, are built using the data-model object constructors (see Data Model). For instance, we can construct the statement "Brazil shares border with Argentina" as follows:

from kif_lib import Item, Property, Statement, ValueSnak

Brazil = Item('http://www.wikidata.org/entity/Q155')
shares_border_with = Property('http://www.wikidata.org/entity/P47')
Argentina = Item('http://www.wikidata.org/entity/Q414')

stmt = Statement(Brazil, ValueSnak(shares_border_with, Argentina))
print(stmt)

(Statement (Item Brazil) (ValueSnak (Property shares border with) (Item Argentina)))

Alternatively, we can apply the property object shares_border_with as if it were a Python function to the arguments Brazil and Argentina to obtain exactly the same statement:

stmt_alt = shares_border_with(Brazil, Argentina)
print(stmt_alt)

(Statement (Item Brazil) (ValueSnak (Property shares border with) (Item Argentina)))

Note

KIF data-model objects are immutable: once constructed, they cannot be changed. Also, data-model object identity is completely determined by the object contents. This means that two data-model objects constructed from the same arguments will always test equal. For example, (stmt == stmt_alt) == True above.
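
The value semantics described in this note can be illustrated in plain Python. The sketch below uses frozen dataclasses as hypothetical stand-ins (ToyItem and ToyStatement are not KIF classes, and this is not a claim about how KIF implements its objects); it only shows what "immutable" and "identity determined by contents" mean in practice:

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical stand-ins for illustration only; these are NOT KIF classes.
@dataclass(frozen=True)
class ToyItem:
    iri: str

@dataclass(frozen=True)
class ToyStatement:
    subject: ToyItem
    property_iri: str
    value: ToyItem

# Two objects built independently from the same arguments.
a = ToyStatement(ToyItem('Q155'), 'P47', ToyItem('Q414'))
b = ToyStatement(ToyItem('Q155'), 'P47', ToyItem('Q414'))

print(a == b)  # same contents, so the objects test equal
print(a is b)  # yet they remain two distinct Python objects

# Any attempt to change a frozen object is rejected.
try:
    a.subject = ToyItem('Q77')
    modified = True
except FrozenInstanceError:
    modified = False
print(modified)
```

Because equal objects also hash equally, such values can be used safely as dictionary keys or set members.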

To access the contents of statement objects, we can use the fields subject and snak:

print(stmt.subject, stmt.snak)

(Item Brazil)
(ValueSnak (Property shares border with) (Item Argentina))

Other data-model objects, such as entities, data values, and snaks, have analogous fields (see Data Model):

print(stmt.subject.iri, stmt.snak.property, stmt.snak.value)

http://www.wikidata.org/entity/Q155
(Property shares border with)
(Item Argentina)

2.1 Vocabulary

KIF comes with built-in vocabulary modules that ease the construction of entities in certain namespaces. For instance, instead of writing the full IRI of Wikidata entities, we can use the convenience functions wd.Q and wd.P from the Wikidata vocabulary module wd to construct the items Brazil (Q155) and Argentina (Q414) and the property shares border with (P47):

from kif_lib.vocabulary import wd

Brazil = wd.Q(155)
Argentina = wd.Q(414)
shares_border_with = wd.P(47)

print(Brazil, Argentina, shares_border_with)

(Item Brazil)
(Item Argentina)
(Property shares border with)

As before, we can apply shares_border_with to Brazil and Argentina to obtain the statement "Brazil shares border with Argentina":

stmt = shares_border_with(Brazil, Argentina)
print(stmt)

(Statement (Item Brazil) (ValueSnak (Property shares border with) (Item Argentina)))

The wd vocabulary module also defines symbolic aliases for popular entities. Instead of writing wd.Q(155) and wd.Q(414) for Brazil and Argentina, we can write wd.Brazil and wd.Argentina. Similarly, instead of writing wd.P(47) for "shares border with", we can write wd.shares_border_with. Most Wikidata properties have symbolic aliases defined in wd.

print(wd.Brazil, wd.Argentina, wd.shares_border_with, wd.capital)

(Item Brazil)
(Item Argentina)
(Property shares border with)
(Property capital)

Note

Besides wd, KIF comes with the vocabulary modules db for DBpedia, fg for FactGrid, pc for PubChem, up for UniProt, among others.

3 Store

Let's now turn to the Store API.

As we said earlier, a KIF store is an interface to a knowledge source, typically but not necessarily a knowledge graph. A store is created using the Store constructor which takes as arguments the name of the store plugin to instantiate followed by zero or more arguments to be passed to the plugin. For instance:

kb = Store('wikidata')

This instantiates and assigns to kb a new store using the "wikidata" plugin. This plugin creates a SPARQL store, loads it with the Wikidata SPARQL mappings, and points it at the official Wikidata SPARQL endpoint. (SPARQL is the query language of RDF, a standard format for knowledge graphs; see RDF.)

Alternatively, we could have specified the target SPARQL endpoint explicitly, as the second argument to the Store() call:

kb = Store('wikidata', 'https://query.wikidata.org/sparql')

Note

The available store plugins can be shown using KIF CLI:

$ kif show-plugins --store
...
dbpedia         : DBpedia SPARQL store
europa          : Europa (data.europa.eu) SPARQL store
factgrid        : FactGrid SPARQL store
pubchem         : PubChem SPARQL store
uniprot         : UniProt SPARQL store
wikidata        : Wikidata query service store
...

4 Filter

The basic store operation is the filter.

The call kb.filter(...) searches for statements in kb matching the constraints .... The result is a (lazy) iterator which when advanced produces the matched statements. For example:

kb = Store('wikidata')
it = kb.filter(subject=wd.Alan_Turing)
print(next(it))
$ kif filter --store=wikidata --subject=wd.Alan_Turing

# Note: We can omit --store=wikidata, as it is the default.

(Statement (Item Alan Turing) (ValueSnak (Property doctoral advisor) (Item Alonzo Church)))

If no limit argument is given to filter(), the returned iterator will eventually produce all matching statements. For instance, iterator it above will produce every statement with subject Alan Turing (Q7251) in Wikidata before it is exhausted.
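
Because the iterator is lazy, results can also be truncated on the client side. A minimal sketch, using a hypothetical generator toy_filter() as a stand-in for a store iterator (it is not a KIF call), shows how itertools.islice consumes only the first few elements of an otherwise unbounded iterator:

```python
from itertools import islice

def toy_filter():
    """Hypothetical stand-in for a lazy result iterator (not a KIF call)."""
    n = 0
    while True:  # conceptually unbounded, like a filter without a limit
        yield f'statement {n}'
        n += 1

# Only three elements are ever produced; the generator is never exhausted.
first_three = list(islice(toy_filter(), 3))
print(first_three)
```

When the bound is known in advance, passing limit to filter(), as in the first example of this tutorial, is the more direct option.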

4.1 Basic filters

We can filter statements by specifying any combination of subject, property, value (or snak) to match. None of these are required though. For example:

# (1) Match any statement whatsoever:
it = kb.filter()
print(next(it))

# (2) Match statements with subject "water":
it = kb.filter(subject=wd.water)
print(next(it))

# (3) Match statements with snak "place of birth is Athens":
it = kb.filter(snak=wd.place_of_birth(wd.Athens))
print(next(it))

# (4) Match statements with property "official language":
it = kb.filter(property=wd.official_language)
print(next(it))

# (5) Match statements with value "733 kilograms":
it = kb.filter(value=733@wd.kilogram)
print(next(it))

# (6) Match statements with subject "Brazil" and
#     snak "shares border with Argentina":
it = kb.filter(subject=wd.Brazil, snak=wd.shares_border_with(wd.Argentina))
print(next(it))

# (7) Match statements with subject "Brazil" and
#     snak "shares border with Chile":
it = kb.filter(subject=wd.Brazil, snak=wd.shares_border_with(wd.Chile))
print(next(it)) # *** ERROR: iterator is empty (no such statement) ***
# (1) Match any statement whatsoever:
$ kif filter --limit=1

# (2) Match statements with subject "water":
$ kif filter --subject=wd.water --limit=1

# (3) Match statements with snak "place of birth is Athens":
$ kif filter --snak="wd.place_of_birth(wd.Athens)" --limit=1

# (4) Match statements with property "official language":
$ kif filter --property=wd.official_language --limit=1

# (5) Match statements with value "733 kilograms":
$ kif filter --value=733@wd.kilogram --limit=1

# (6) Match statements with subject "Brazil" and
#     snak "shares border with Argentina":
$ kif filter --subject=wd.Brazil\
    --snak="wd.shares_border_with(wd.Argentina)" --limit=1

# (7) Match statements with subject "Brazil" and
#     snak "shares border with Chile":
$ kif filter --subject=wd.Brazil\
     --snak="wd.shares_border_with(wd.Chile)" --limit=1
# *** no output ***

(1) (Statement (Item lion) (ValueSnak (Property parent taxon) (Item Panthera)))
(2) (Statement (Item water) (ValueSnak (Property chemical formula) "H₂O"))
(3) (Statement (Item Socrates) (ValueSnak (Property place of birth) (Item Athens)))
(4) (Statement (Item Peru) (ValueSnak (Property official language) (Item Spanish)))
(5) (Statement (Item Voyager 1) (ValueSnak (Property mass) 733 kilogram))
(6) (Statement (Item Brazil) (ValueSnak (Property shares border with) (Item Argentina)))

Note

In example (3) above, wd.place_of_birth(wd.Athens) is another way of constructing the snak ValueSnak(wd.place_of_birth, wd.Athens), while in example (5), 733@wd.kilogram is another way of constructing the quantity value Quantity(733, wd.kilogram) (see Data Model).

In example (7), the filter failed to match any statement in Wikidata, as Brazil does not share a border with Chile. So, the returned iterator is empty and we get a StopIteration exception when we try to advance it.
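
If an empty result is an expected outcome rather than an error, Python's built-in next() accepts a default that is returned instead of raising StopIteration. The sketch below uses an empty Python iterator as a stand-in for the result of a filter with no matches:

```python
# Stand-in for the iterator returned by a filter that matches nothing.
empty = iter([])

# With a default, next() returns it instead of raising StopIteration.
first = next(empty, None)

if first is None:
    message = 'no matching statement'
else:
    message = str(first)
print(message)
```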

Alternative subjects, properties, and values can be specified using Python's bitwise "or" operator (|):

# (1) Match statements with subject "Socrates" or "Plato":
it = kb.filter(subject=wd.Socrates|wd.Plato)
print(next(it))

# (2) Match statements with subject "caffeine" and
#     property "density" or "mass" or "pKa":
it = kb.filter(subject=wd.caffeine, property=wd.density|wd.mass|wd.pKa)
print(next(it))

# (3) Match statements with subject "IBM" and
#     value "16 June 1911" or "https://www.ibm.com/":
from kif_lib import IRI, Time
it = kb.filter(
    subject=wd.IBM, value=Time('1911-06-16')|IRI('https://www.ibm.com/'))
for stmt in it:
    print(stmt)
# (1) Match statements with subject "Socrates" or "Plato":
$ kif filter --subject="wd.Socrates|wd.Plato" --limit=1

# (2) Match statements with subject "caffeine" and
#     property "density" or "mass" or "pKa":
$ kif filter --subject=wd.caffeine\
    --property="wd.density|wd.mass|wd.pKa" --limit=1

# (3) Match statements with subject "IBM" and
#     value "16 June 1911" or "https://www.ibm.com/":
$ kif filter --subject=wd.IBM\
    --value="Time('1911-06-16')|IRI('https://www.ibm.com/')" --limit=2

(1 ) (Statement (Item Plato) (ValueSnak (Property notable work) (Item The Republic)))
(2 ) (Statement (Item caffeine) (ValueSnak (Property mass) 194.08037556 dalton))
(3a) (Statement (Item IBM) (ValueSnak (Property official website) https://www.ibm.com/))
(3b) (Statement (Item IBM) (ValueSnak (Property inception) 16 June 1911))

The Or constructor can be used to construct "or" compositions more conveniently from a collection of values. For example:

from kif_lib import Or

south_america_countries = [wd.Brazil, wd.Argentina, wd.Uruguay, ...]
it = kb.filter(subject=Or(*south_america_countries), property=wd.capital)
for stmt in it:
    print(stmt)

(Statement (Item Argentina) (ValueSnak (Property capital) (Item Buenos Aires)))
(Statement (Item Uruguay) (ValueSnak (Property capital) (Item Montevideo)))
(Statement (Item Brazil) (ValueSnak (Property capital) (Item Brasília)))

4.2 More complex filters

Suppose we want to match statements whose subjects are any items that "share border with Argentina". If we know the subjects beforehand, we can specify them explicitly using the | operator:

it = kb.filter(subject=wd.Brazil|wd.Uruguay|wd.Chile|...)

Sometimes, however, we do not know the subjects beforehand. In such cases, we can use a snak that captures the desired constraint, such as:

snak = wd.shares_border_with(wd.Argentina)
print(snak)

(ValueSnak (Property shares border with) (Item Argentina))

The filter below matches statements such that the subject is any item that "shares border with Argentina", that is, any item x such that there is a statement wd.shares_border_with(x, wd.Argentina) in the store kb.

it = kb.filter(subject=wd.shares_border_with(wd.Argentina))
for stmt in it:
    print(stmt)
$ kif filter --subject="wd.shares_border_with(wd.Argentina)"

# Note: The double quotes (") prevent the shell from interpreting the
#       parentheses in the argument of --subject as a subshell invocation.

(Statement (Item Uruguay) (ValueSnak (Property public holiday) (Item Tourism Week)))
(Statement (Item Bolivia) (ValueSnak (Property highest point) (Item Nevado Sajama)))
(Statement (Item Paraguay) (ValueSnak (Property electrical plug type) (Item Europlug)))

These statements may seem random at first but they all have subjects matching the constraint "shares border with Argentina".
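
The semantics of a snak constraint used as subject can be sketched in plain Python. The example below is only an illustration: a list of (subject, property, value) triples, taken from the outputs shown in this tutorial, stands in for the store, and the helper names are hypothetical (they are not part of KIF):

```python
# Toy triples (subject, property, value) taken from the examples in this
# tutorial; a plain list stands in for the store and is NOT how KIF works.
toy_store = [
    ('Uruguay', 'shares border with', 'Argentina'),
    ('Bolivia', 'shares border with', 'Argentina'),
    ('Paraguay', 'shares border with', 'Argentina'),
    ('Uruguay', 'public holiday', 'Tourism Week'),
    ('Bolivia', 'highest point', 'Nevado Sajama'),
    ('Paraguay', 'electrical plug type', 'Europlug'),
    ('Peru', 'official language', 'Spanish'),
]

def subjects_with_snak(prop, value):
    """Return every x such that the triple (x, prop, value) is in the store."""
    return {s for (s, p, v) in toy_store if p == prop and v == value}

def filter_by_subject_constraint(prop, value):
    """Return all triples whose subject satisfies the snak (prop, value)."""
    subjects = subjects_with_snak(prop, value)
    return [t for t in toy_store if t[0] in subjects]

neighbours = subjects_with_snak('shares border with', 'Argentina')
matches = filter_by_subject_constraint('shares border with', 'Argentina')
for triple in matches:
    print(triple)
```

Every triple about Uruguay, Bolivia, or Paraguay is matched, whatever its property, while the triple about Peru is not, since Peru does not satisfy the constraint.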

Snak constraints such as wd.shares_border_with(wd.Argentina) can be given as subject, property, or value arguments to filter(). Moreover, they can be combined with other constraints using the bitwise "and" (&) and "or" (|) operators (or their prefixed versions And and Or). For example:

# (1) Subject's "anthem is La Marseillaise" and property is "capital":
it = kb.filter(subject=wd.anthem(wd.La_Marseillaise), property=wd.capital)
print(next(it))

# (2) Subject is "water" and property is any "property related to chemistry":
it = kb.filter(subject=wd.water,
    property=wd.instance_of(wd.Wikidata_property_related_to_chemistry))
print(next(it))

# (3) Property is "place of birth" and value is the "capital of Poland":
it = kb.filter(property=wd.place_of_birth, value=wd.capital_of(wd.Poland))
print(next(it))

# (4) Subject's "language is Portuguese" & "shares border with Argentina";
#     Property is "highest point" | "driving side":
it = kb.filter(
    subject=(wd.official_language(wd.Portuguese)&
             wd.shares_border_with(wd.Argentina)),
    property=wd.highest_point|wd.driving_side)
for stmt in it:
    print(stmt)
# (1) Subject's "anthem is La Marseillaise" and property is "capital":
$ kif filter --subject="wd.anthem(wd.La_Marseillaise)"\
    --property=wd.capital --limit=1

# (2) Subject is "water" and property is a "property related to chemistry":
$ kif filter --subject=wd.water\
    --property="wd.instance_of(wd.Wikidata_property_related_to_chemistry)"\
    --limit=1

# (3) Property is "place of birth" and value is the "capital of Poland":
$ kif filter --property=wd.place_of_birth --value="wd.capital_of(wd.Poland)"

# (4) Subject's "language is Portuguese" & "shares border with Argentina";
#     Property is "highest point" | "driving side":
$ kif filter --subject="wd.official_language(wd.Portuguese)&\
                        wd.shares_border_with(wd.Argentina)"\
    --property="wd.highest_point|wd.driving_side"

(1 ) (Statement (Item France) (ValueSnak (Property capital) (Item Paris)))
(2 ) (Statement (Item water) (ValueSnak (Property chemical formula) "H₂O"))
(3 ) (Statement (Item Marie Curie) (ValueSnak (Property place of birth) (Item Warsaw)))
(4a) (Statement (Item Brazil) (ValueSnak (Property highest point) (Item Pico da Neblina)))
(4b) (Statement (Item Brazil) (ValueSnak (Property driving side) (Item right)))

Constraints that require traversing property paths of length greater than one can be specified using the sequencing operator /. For example, the filter below matches statements such that:

  • the subject "has some notable work in a collection which is part of the Louvre"; and

  • the property is "handedness".

it = kb.filter(
    subject=(wd.notable_work/wd.collection/wd.part_of)(wd.Louvre_Museum),
    property=wd.handedness)
print(next(it))
$ kif filter\
    --subject="(wd.notable_work/wd.collection/wd.part_of)(wd.Louvre_Museum)"\
    --property=wd.handedness

(Statement (Item Leonardo da Vinci) (ValueSnak (Property handedness) (Item left-handedness)))
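
To make the meaning of a path constraint concrete, here is a minimal sketch in plain Python. The triples are toy data chosen for illustration (they were not retrieved from Wikidata), and subjects_reaching() is a hypothetical helper, not a KIF function. It finds every subject connected to a target value through a sequence of properties:

```python
# Toy triples chosen for illustration only; they were not retrieved from
# Wikidata, and the list is a hypothetical stand-in for a store.
toy_store = [
    ('Leonardo da Vinci', 'notable work', 'Mona Lisa'),
    ('Mona Lisa', 'collection', 'Department of Paintings'),
    ('Department of Paintings', 'part of', 'Louvre Museum'),
    ('Leonardo da Vinci', 'handedness', 'left-handedness'),
    ('Plato', 'notable work', 'The Republic'),
]

def subjects_reaching(path, target):
    """Return every x connected to target through the properties in path."""
    current = {target}
    # Walk the path backwards, from the target towards the subject.
    for prop in reversed(path):
        current = {s for (s, p, v) in toy_store
                   if p == prop and v in current}
    return current

subjects = subjects_reaching(
    ['notable work', 'collection', 'part of'], 'Louvre Museum')
print(subjects)
```

Only subjects for which the whole chain exists are kept, which is why Plato, whose notable work is not linked to the target, is left out.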

4.3 Masks and language

The parameters subject_mask, property_mask, and value_mask of filter() can be used to restrict the kinds of entities, properties, and values to be matched. For example:

from kif_lib import Filter

# (1) Subject is "Louvre Museum" and value is an IRI:
it = kb.filter(subject=wd.Louvre_Museum, value_mask=Filter.IRI)
print(next(it))

# (2) Subject is a property and value is a property:
it = kb.filter(subject_mask=Filter.PROPERTY, value_mask=Filter.PROPERTY)
print(next(it))

# (3) Subject is "El Capitan" and value is an external id or quantity:
it = kb.filter(
    subject=wd.El_Capitan, value_mask=Filter.EXTERNAL_ID|Filter.QUANTITY)
print(next(it))
# (1) Subject is "Louvre Museum" and value is an IRI:
$ kif filter --subject=wd.Louvre_Museum --value-mask=Filter.IRI

# (2) Subject is a property and value is a property:
$ kif filter --subject-mask=Filter.PROPERTY --value-mask=Filter.PROPERTY

# (3) Subject is "El Capitan" and value is an external id or quantity:
$ kif filter --subject=wd.El_Capitan\
    --value-mask="Filter.EXTERNAL_ID|Filter.QUANTITY"

(1 ) (Statement (Item Louvre Museum) (ValueSnak (Property official website) https://www.louvre.fr/))
(2 ) (Statement (Property part of the series) (ValueSnak (Property subproperty of) (Property part of)))
(3a) (Statement (Item El Capitan) (ValueSnak (Property GeoNames ID) "5334090"))
(3b) (Statement (Item El Capitan) (ValueSnak (Property elevation above sea level) 2307 metre))

Note

As illustrated in example (3) above, masks can be operated through the usual bitwise operators. See Filter for the available mask types and values.
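
For readers unfamiliar with bitwise masks, the sketch below shows the general idea using Python's standard enum.Flag. ToyMask is a hypothetical stand-in with three made-up members; it is not kif_lib.Filter and says nothing about how KIF defines its masks:

```python
from enum import Flag, auto

class ToyMask(Flag):
    """Hypothetical stand-in for a mask type (not kif_lib.Filter)."""
    IRI = auto()
    EXTERNAL_ID = auto()
    QUANTITY = auto()

# "or" (|) builds a mask that accepts either kind of value.
mask = ToyMask.EXTERNAL_ID | ToyMask.QUANTITY

accepts_quantity = ToyMask.QUANTITY in mask  # member is part of the mask
accepts_iri = ToyMask.IRI in mask            # member is not part of it

# "and" (&) keeps only the kinds common to both masks.
common = mask & (ToyMask.QUANTITY | ToyMask.IRI)

print(accepts_quantity, accepts_iri, common == ToyMask.QUANTITY)
```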

Another mask parameter of filter() is snak_mask, which determines the kinds of snaks to be matched. So far, we have dealt only with value snaks (ValueSnak), which are essentially property-value pairs. But KIF also supports some-value snaks (SomeValueSnak) and no-value snaks (NoValueSnak), which carry only the predicated property.

Some-value snaks represent predications with an unknown value, while no-value snaks represent predications with an absent value. For instance, the fact that the Greek poet Homer's place of birth is unknown is represented in Wikidata by the some-value statement:

(Statement (Item Homer) (SomeValueSnak (Property place of birth)))

Similarly, the fact that the natural number 1 has no prime factor is represented by the no-value statement:

(Statement (Item 1) (NoValueSnak (Property prime factor)))

The kinds of snaks to be matched in filters are determined by the parameter snak_mask:

# (1) Subject is "Homer" and snak is some-value:
it = kb.filter(subject=wd.Homer, snak_mask=Filter.SOME_VALUE_SNAK)
print(next(it))

# (2) Subject is "1" and snak is no-value:
it = kb.filter(subject=wd._1, snak_mask=Filter.NO_VALUE_SNAK)
print(next(it))

# (3) Subject is "Adam" and snak is some- or no-value:
it = kb.filter(subject=wd.Adam,
    snak_mask=Filter.SOME_VALUE_SNAK|Filter.NO_VALUE_SNAK, limit=2)
for stmt in it:
    print(stmt)
# (1) Subject is "Homer" and snak is some-value:
$ kif filter --subject=wd.Homer --snak-mask=Filter.SOME_VALUE_SNAK

# (2) Subject is "1" and snak is no-value:
$ kif filter --subject=wd._1 --snak-mask=Filter.NO_VALUE_SNAK

# (3) Subject is "Adam" and snak is some- or no-value:
$ kif filter --subject=wd.Adam\
    --snak-mask="Filter.SOME_VALUE_SNAK|Filter.NO_VALUE_SNAK" --limit=2

(1 ) (Statement (Item Homer) (SomeValueSnak (Property place of birth)))
(2 ) (Statement (Item 1) (NoValueSnak (Property prime factor)))
(3a) (Statement (Item Adam) (SomeValueSnak (Property date of birth)))
(3b) (Statement (Item Adam) (NoValueSnak (Property father)))

The last filter parameter we want to mention is language, which controls the language of the returned text values. If language is not given, filter() returns statements with text values in any language:

# Subject is "Mario" and property is "catchphrase":
it = kb.filter(subject=wd.Mario, property=wd.catchphrase)
for stmt in it:
    print(stmt)
$ kif filter --subject=wd.Mario --property=wd.catchphrase

(Statement (Item Mario) (ValueSnak (Property catchphrase) "It’s-a me, Mario!"@en))
(Statement (Item Mario) (ValueSnak (Property catchphrase) "Let’s-a go!"@en-us))
(Statement (Item Mario) (ValueSnak (Property catchphrase) "Mamma mia!"@it))

However, we can set the language parameter to a language tag ("en", "it", "fr", "pt", etc.) to restrict the result to statements with text values in the desired language:

# (1) Subject is "Mario", property is "catchphrase", value is in English:
it = kb.filter(subject=wd.Mario, property=wd.catchphrase, language='en')
print(next(it))

# (2) Subject is "Mario", property is "catchphrase", value is in Italian:
it = kb.filter(subject=wd.Mario, property=wd.catchphrase, language='it')
print(next(it))
# (1) Subject is "Mario", property is "catchphrase", value is in English:
$ kif filter --subject=wd.Mario --property=wd.catchphrase --language=en

# (2) Subject is "Mario", property is "catchphrase", value is in Italian:
$ kif filter --subject=wd.Mario --property=wd.catchphrase --language=it

(1) (Statement (Item Mario) (ValueSnak (Property catchphrase) "It’s-a me, Mario!"@en))
(2) (Statement (Item Mario) (ValueSnak (Property catchphrase) "Mamma mia!"@it))

4.4 Projection

Sometimes we are interested not in the full statements themselves but in their components. For instance, suppose we want to list "the capital cities of all US states", that is, the entities representing the cities, not statements about them. We can do this by first obtaining all has-capital statements whose subjects are US states and then printing only their value component:

it = kb.filter(subject=wd.instance_of(wd.US_state), property=wd.capital)
for (subj, (prop, value)) in it:
    print(value)

(Item Indianapolis)
(Item Phoenix)
(Item Columbus)
(Item Honolulu)

Note

Every KIF data-model object can be decomposed into a sequence of components. The tuple pattern (subj, (prop, value)) above decomposes the statement into a subject part subj and a value-snak part, which itself is decomposed into a property part prop and a value part value. We could also have written:

for stmt in it:
    print(stmt[1][1])

Or, using the Statement access fields:

for stmt in it:
    print(stmt.snak.value)

See Data Model for details.
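
The three access styles mentioned in this note can be compared side by side in plain Python. The sketch below uses named tuples as hypothetical stand-ins (ToySnak and ToyStatement are not KIF classes); it only illustrates how a nested tuple pattern, positional indexing, and named fields reach the same component:

```python
from typing import NamedTuple

class ToySnak(NamedTuple):
    """Hypothetical stand-in for a value snak: (property, value)."""
    property: str
    value: str

class ToyStatement(NamedTuple):
    """Hypothetical stand-in for a statement: (subject, snak)."""
    subject: str
    snak: ToySnak

stmt = ToyStatement('Indiana', ToySnak('capital', 'Indianapolis'))

# (a) Nested tuple pattern: subject first, then (property, value).
subj, (prop, value) = stmt

# (b) Positional indexing: component 1 is the snak, whose component 1
#     is the value (component 0 would be the property).
by_index = stmt[1][1]

# (c) Named fields.
by_field = stmt.snak.value

print(value, by_index, by_field)
```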

A more convenient (and often more efficient) way of achieving the same thing is to use one of the projected variants of filter(). In this case, we can use the variant filter_v(), which selects the value of the matched statements:

it = kb.filter_v(subject=wd.instance_of(wd.US_state), property=wd.capital)
for v in it:
    print(v)
$ kif filter --subject="wd.instance_of(wd.US_state)"\
    --property=wd.capital --select=v

# Note: The --select=v option instructs KIF CLI to use the filter_v()
#       variant, that is, select the value of the returned statements.

The full list of projected variants of filter() is shown in the table below. All of these variants accept exactly the same arguments as filter(). The only difference is their return type.

filter variant  selects            returns
filter_s()      subject            Entity
filter_p()      property           Property
filter_v()      value              Value
filter_sp()     subject, property  ValuePair
filter_sv()     subject, value     ValuePair
filter_pv()     property, value    ValueSnak

5 Pseudo-properties

KIF extends the Wikidata data-model with the notion of pseudo-properties. These are property-like entities which are not represented as properties in Wikidata. For instance, labels, aliases, and descriptions are not represented as properties in Wikidata but are made available in KIF through the pseudo-properties LabelProperty, AliasProperty, DescriptionProperty.

We can use pseudo-properties in filters as if they were regular properties:

from kif_lib import LabelProperty, AliasProperty, DescriptionProperty

# (1) Get the Spanish label of Mars:
it = kb.filter(subject=wd.Mars, property=LabelProperty(), language='es')
print(next(it))

# (2) Get a French alias of Mars:
it = kb.filter(subject=wd.Mars, property=AliasProperty(), language='fr')
print(next(it))

# (3) Get the English description of Mars:
it = kb.filter(subject=wd.Mars, property=DescriptionProperty(), language='en')
print(next(it))
# (1) Get the Spanish label of Mars:
$ kif filter --subject=wd.Mars --property="LabelProperty()" --language=es

# (2) Get a French alias of Mars:
$ kif filter --subject=wd.Mars --property="AliasProperty()" --language=fr

# (3) Get the English description of Mars:
$ kif filter --subject=wd.Mars --property="DescriptionProperty()" --language=en

(1) (Statement (Item Mars) (ValueSnak LabelProperty "Marte"@es))
(2) (Statement (Item Mars) (ValueSnak AliasProperty "Planète rouge"@fr))
(3) (Statement (Item Mars) (ValueSnak DescriptionProperty "fourth planet in the Solar System from the Sun"@en))

The Wikidata vocabulary module wd defines the aliases wd.label, wd.alias, wd.description for the pseudo-properties LabelProperty(), AliasProperty(), DescriptionProperty(). These can be used to write less verbose filter calls:

# Get the label, aliases, and description of Mars in Portuguese:
it = kb.filter(
    subject=wd.Mars, property=wd.label|wd.alias|wd.description, language='pt')
for stmt in it:
    print(stmt)
# Get the label, aliases, and description of Mars in Portuguese:
$ kif filter --subject=wd.Mars\
    --property="wd.label|wd.alias|wd.description" --language='pt'

(Statement (Item Mars) (ValueSnak LabelProperty "Marte"@pt))
(Statement (Item Mars) (ValueSnak AliasProperty "planeta Marte"@pt))
(Statement (Item Mars) (ValueSnak DescriptionProperty "quarto planeta a partir do Sol no Sistema Solar"@pt))

Note

In KIF, labels, aliases, and descriptions are referred to collectively as descriptors. Descriptor values can be registered (saved) in the current KIF context so that, for example, they don't need to be retrieved every time an entity is pretty-printed (see Context). Most vocabulary modules register in the KIF context the English label of the entities they define. This cached label can be conveniently accessed through the label field of item and property objects:

print(wd.Q(111).label, wd.P(2583).label)

"Mars"@en
"distance from Earth"@en

The aliases and description can be accessed through the fields aliases and description, which are usually left undefined by the vocabulary modules:

print(wd.Q(111).aliases, wd.P(2583).aliases)          # None None
print(wd.Q(111).description, wd.P(2583).description)  # None None

KIF can be instructed to retrieve aliases and descriptions automatically using the vocabulary module's resolver. This is done by setting option "entities.resolve" to True in the KIF context:

# Enable automatic descriptor resolution:
from kif_lib import Context
Context.top().options.entities.resolve = True

# (1)
print(wd.Q(111).aliases, wd.P(2583).aliases)

# (2)
print(wd.Q(111).description, wd.P(2583).description)

(1a) {"Planet Mars"@en, "Red Planet"@en}
(1b) {"angular diameter distance"@en, "proper distance"@en, ...}
(2a) "fourth planet in the Solar System from the Sun"@en
(2b) "estimated distance to astronomical objects"@en

Automatic descriptor resolution can also be enabled by setting the value of the environment variable KIF_RESOLVE_ENTITIES to 1 (or any other literal that evaluates to True in Python). See Context for details.
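
The environment variable can also be set from within Python. The sketch below assumes that KIF reads KIF_RESOLVE_ENTITIES when its context is first created, which is why the variable is set before the library is imported; this ordering is an assumption and is not confirmed by this tutorial:

```python
import os

# Assumption: KIF reads KIF_RESOLVE_ENTITIES when its context is first
# created, so the variable is set before kif_lib is imported.
os.environ['KIF_RESOLVE_ENTITIES'] = '1'

# import kif_lib  # import only after the variable has been set

print(os.environ['KIF_RESOLVE_ENTITIES'])
```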

Some KIF pseudo-properties have no direct counterpart in Wikidata. This is the case of the pseudo-properties TypeProperty and SubtypeProperty, whose wd aliases are wd.a and wd.subtype. These pseudo-properties stand for the ontological relations "is a" and "subclass of", respectively, and can be seen as more powerful (transitivity-enabled) versions of the Wikidata properties instance of (P31) and subclass of (P279).

To see the difference between wd.a and wd.instance_of consider the filter below. This filter gets statements that assert the classes of which rabbit (Q9394) is an instance:

# Get the classes such that "rabbit" is an instance of:
it = kb.filter(subject=wd.rabbit, property=wd.instance_of)
for stmt in it:
    print(stmt)
# Get the classes such that "rabbit" is an instance of:
$ kif filter --subject=wd.rabbit --property=wd.instance_of

(Statement (Item rabbit) (ValueSnak (Property instance of) (Item organisms known by a particular common name)))
(Statement (Item rabbit) (ValueSnak (Property instance of) (Item taxon)))

If we now replace wd.instance_of by wd.a, we get three times as many results:

# Get the classes such that "rabbit" is an instance of (with transitivity):
it = kb.filter(subject=wd.rabbit, property=wd.a)
for stmt in it:
    print(stmt)
# Get the classes such that "rabbit" is an instance of (with transitivity):
$ kif filter --subject=wd.rabbit --property=wd.a

(Statement (Item rabbit) (ValueSnak TypeProperty (Item group or class of living things)))
(Statement (Item rabbit) (ValueSnak TypeProperty (Item organisms known by a particular common name)))
(Statement (Item rabbit) (ValueSnak TypeProperty (Item group or class of physical objects)))
(Statement (Item rabbit) (ValueSnak TypeProperty (Item collective entity)))
(Statement (Item rabbit) (ValueSnak TypeProperty (Item taxon)))
(Statement (Item rabbit) (ValueSnak TypeProperty (Item entity)))

What is happening here is that the latter version treats as classes of rabbit not only organisms known by a particular common name (Q55983715) and taxon (Q16521) but also all superclasses of these two. In other words, while wd.instance_of looks only at the immediate classes, wd.a traverses the whole class hierarchy.
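
The difference between looking at the immediate classes and traversing the hierarchy can be sketched in plain Python. In the example below, the two direct classes of rabbit come from the output above, while the superclass edges are made up for illustration (they are not the actual Wikidata hierarchy); direct_types() and all_types() are hypothetical helpers, not KIF functions:

```python
from collections import deque

# Direct classes of "rabbit", as in the output above.
instance_of = {
    'rabbit': ['taxon', 'organisms known by a particular common name'],
}

# Made-up superclass edges, for illustration only (not actual Wikidata).
subclass_of = {
    'taxon': ['toy superclass A'],
    'organisms known by a particular common name': ['toy superclass A'],
    'toy superclass A': ['toy superclass B'],
}

def direct_types(entity):
    """Immediate classes only."""
    return set(instance_of.get(entity, []))

def all_types(entity):
    """Immediate classes plus all of their superclasses."""
    seen = set()
    queue = deque(instance_of.get(entity, []))
    while queue:
        cls = queue.popleft()
        if cls in seen:
            continue  # already visited through another branch
        seen.add(cls)
        queue.extend(subclass_of.get(cls, []))
    return seen

print(sorted(direct_types('rabbit')))
print(sorted(all_types('rabbit')))
```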

The pseudo-property wd.subtype, which is the transitive counterpart of wd.subclass_of, behaves similarly:

# (1) Get the classes such that "mammal" is a subclass of:
it = kb.filter(subject=wd.mammal, property=wd.subclass_of)
for stmt in it:
    print(stmt)

# (2) Get the classes such that "mammal" is a subclass of (with transitivity):
it = kb.filter(subject=wd.mammal, property=wd.subtype)
for stmt in it:
    print(stmt)
# (1) Get the classes such that "mammal" is a subclass of:
$ kif filter --subject=wd.mammal --property=wd.subclass_of

# (2) Get the classes such that "mammal" is a subclass of (with transitivity):
$ kif filter --subject=wd.mammal --property=wd.subtype

(1 ) (Statement (Item mammal) (ValueSnak (Property subclass of) (Item Vertebrata)))
(2a) (Statement (Item mammal) (ValueSnak SubtypeProperty (Item animal)))
(2b) (Statement (Item mammal) (ValueSnak SubtypeProperty (Item Vertebrata)))
(2c) (Statement (Item mammal) (ValueSnak SubtypeProperty (Item organism)))

We remark that pseudo-properties can occur in any place where a regular property is expected. That is, they can be used to construct snak constraints, paths, etc. For example:

# (1) Get the "breeds" of all "fictional dogs":
it = kb.filter_v(subject=(wd.a/wd.fictional_or_mythical_analog_of)(wd.dog),
    property=wd.animal_breed)
for value in it:
    print(value)

# (2) Get the subject and value of "instance of" statements such that
#     the subject has the Portuguese label "laranja":
it = kb.filter_sv(subject=wd.label(Text("laranja", "pt")),
    property=wd.instance_of)
for pair in it:
    print(pair)
# (1) Get the breeds of all fictional dogs:
$ kif filter --subject="(wd.a/wd.fictional_or_mythical_analog_of)(wd.dog)"\
    --property=wd.animal_breed --select=v

# (2) Get the subject and value of instance-of statements such that
#     the subject has label "laranja"@pt:
$ kif filter --subject="wd.label(Text('laranja', 'pt'))"\
    --property=wd.instance_of  --select=sv

(1a) (Item Welsh Corgi)
(1b) (Item pit bull)
(1c) (Item collie)

(2a) (ValuePair (Item orange) (Item secondary color))
(2b) (ValuePair (Item orange) (Item web color))
(2c) (ValuePair (Item orange) (Item spectral color))

6 Statement annotations

Up to now, we have dealt only with plain statements (Statement). These are statements consisting of a subject, a snak, and nothing else. In KIF, statements can also carry extra information, referred to collectively as annotations. Annotated statements (see AnnotatedStatement) behave exactly like plain statements but, besides a subject and a snak, also carry a set of qualifiers, a set of reference records, and a rank. The qualifiers qualify the statement's assertion, the reference records contain provenance information, and the rank indicates the quality of the statement.

The boolean parameter annotated instructs the filter() method to obtain the annotations associated with each returned statement. The variant filter_annotated() can also be used to filter annotated statements. It behaves exactly as filter() with the annotated flag set to True but its return type is AnnotatedStatement instead of Statement.

The difference between filter() and filter_annotated() is illustrated in examples (1) and (2) below:

# (1) Get the "density" of "benzene":
it = kb.filter(subject=wd.benzene, property=wd.density)
print(next(it))

# (2) Get the "density" of "benzene" (with annotations):
it = kb.filter_annotated(subject=wd.benzene, property=wd.density)
print(next(it))
# (1) Get the "density" of "benzene":
$ kif filter --subject=wd.benzene --property=wd.density

# (2) Get the "density" of "benzene" (with annotations):
$ kif filter --subject=wd.benzene --property=wd.density --annotated

# Note: The --annotated flag instructs KIF CLI to fetch annotations.

(1) (Statement (Item benzene) (ValueSnak (Property density) 0.88 ±0.01 gram per cubic centimetre))
(2) (AnnotatedStatement (Item benzene) (ValueSnak (Property density) 0.88 ±0.01 gram per cubic centimetre)
 (QualifierRecord
  (ValueSnak (Property temperature) 20 ±1 degree Celsius)
  (ValueSnak (Property phase of matter) (Item liquid)))
 (ReferenceRecordSet
   (ReferenceRecord
    (ValueSnak (Property HSDB ID) "35#section=TSCA-Test-Submissions")
    (ValueSnak (Property stated in) (Item Hazardous Substances Data Bank))))
NormalRank)

Both statements, (1) and (2), assert that "benzene's density is 0.88±0.01 g/cm³". But statement (2), which is annotated, carries more information. Its qualifier record qualifies the assertion, i.e., says in addition that this is the case when "the temperature is 20±1 ℃" and "the phase of matter is liquid". Its reference record set contains a single reference record which indicates the provenance of the statement, namely, the entry with the given HSDB ID in the Hazardous Substances Data Bank. Finally, its rank is "normal" which is the default one and means that its status is neutral (neither preferred nor deprecated).

Note

The QualifierRecord is essentially a set of snaks, while the reference record set is a set of ReferenceRecord objects, each of which is itself a snak set. Element repetition and ordering within these set objects are immaterial. See Data Model for details.
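To make the set semantics concrete, here is a minimal sketch that models records with Python's frozenset. The Snak class and the record() helper are hypothetical stand-ins for illustration; they are not KIF's data-model classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snak:
    """Hypothetical stand-in for a value snak (property plus value)."""
    prop: str
    value: str


def record(*snaks):
    """Model a record as a set of snaks: order and repetition do not matter."""
    return frozenset(snaks)


temperature = Snak("temperature", "20 ±1 degree Celsius")
phase = Snak("phase of matter", "liquid")

q1 = record(temperature, phase)
q2 = record(phase, temperature, phase)  # reordered, with a repeated snak

print(q1 == q2)  # True

# A reference record set is, in turn, a set of such records.
refs1 = frozenset([record(Snak("stated in", "Hazardous Substances Data Bank"))])
refs2 = frozenset([record(Snak("stated in", "Hazardous Substances Data Bank"))])
print(refs1 == refs2)  # True
```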

Note

The filter parameter rank_mask can be used to match statements with a given rank (NormalRank, PreferredRank, DeprecatedRank; see Data Model).

Like plain statements, annotated statements can be constructed directly using data-model object constructors. For instance, stmt2a and stmt2b below correspond exactly to statement (2) above.

from kif_lib import (AnnotatedStatement, NormalRank,
                     QualifierRecord, Quantity,
                     ReferenceRecord, ReferenceRecordSet, ValueSnak)

stmt2a = AnnotatedStatement(
    wd.benzene,
    ValueSnak(
        wd.density,
        Quantity('.88', wd.gram_per_cubic_centimetre, '.87', '.89')),
    QualifierRecord(
        wd.temperature(Quantity(20, wd.degree_Celsius, 19, 21)),
        wd.phase_of_matter(wd.liquid)),
    ReferenceRecordSet(
        ReferenceRecord(
            wd.HSDB_ID('35#section=TSCA-Test-Submissions'),
            wd.stated_in(wd.Hazardous_Substances_Data_Bank))),
    NormalRank())

stmt2b = wd.density(
    wd.benzene, Quantity('.88', wd.gram_per_cubic_centimetre, '.87', '.89'),
    qualifiers=[
        wd.temperature(Quantity(20, wd.degree_Celsius, 19, 21)),
        wd.phase_of_matter(wd.liquid)],
    references=[[
        wd.HSDB_ID('35#section=TSCA-Test-Submissions'),
        wd.stated_in(wd.Hazardous_Substances_Data_Bank)]],
    rank=NormalRank())

print(stmt2a == stmt2b)           # True

If we already have a statement, then we can use the method annotate() to create an annotated version of it. For example:

stmt3 = wd.density(
    wd.benzene, Quantity('.88', wd.gram_per_cubic_centimetre, '.87', '.89'))

stmt4 = stmt3.annotate(
    qualifiers=[
        wd.temperature(Quantity(20, wd.degree_Celsius, 19, 21)),
        wd.phase_of_matter(wd.liquid)],
    references=[[
        wd.HSDB_ID('35#section=TSCA-Test-Submissions'),
        wd.stated_in(wd.Hazardous_Substances_Data_Bank)]],
    rank=NormalRank())

print(stmt2a == stmt2b == stmt4)  # True
print(stmt3 == stmt4)             # False

Conversely, if we have an annotated statement, we can use the method unannotate() to obtain its plain version:

print(stmt3 == stmt4.unannotate())  # True

7 Ask, count, mix

We now turn to other query methods available in the Store API.

Two variants of the filter call are kb.ask(...) and kb.count(...). The former tests whether some (at least one) statement in kb matches the constraints ..., while the latter counts the number of statements in kb matching .... Both variants accept exactly the same constraint arguments as filter().

Here are some examples of ask() and count():

# Ask whether there are statements with subject "caffeine":
b = kb.ask(subject=wd.caffeine)
print(b)  # True

# Count statements with snak "place of death is New York City":
n = kb.count(snak=wd.place_of_death(wd.New_York_City))
print(n)  # 13491

# Ask whether there are some- or no-value statements whose
# subject is "a singer buried in Paris":
b = kb.ask(subject=wd.occupation(wd.singer)&wd.place_of_burial(wd.Paris),
        snak_mask=Filter.SOME_VALUE_SNAK|Filter.NO_VALUE_SNAK)
print(b)  # False

# Count statements whose subject is "a dish originating in Italy",
# property is "has part(s)", and value is "tomato":
n = kb.count(subject=wd.subtype(wd.dish)&wd.country_of_origin(wd.Italy),
        property=wd.has_parts, value=wd.tomato)
print(n)  # 17

# [Ask whether there are / count] statements "Brazil shares border with Chile":
b = kb.ask(subject=wd.Brazil, property=wd.shares_border_with, value=wd.Chile)
n = kb.count(subject=wd.Brazil, property=wd.shares_border_with, value=wd.Chile)
print(b, n)  # False 0
# Ask whether there are statements with subject "caffeine":
$ kif ask --subject=wd.caffeine; echo $?
0

# Count statements with snak "place of death is New York City":
$ kif count --snak="wd.place_of_death(wd.New_York_City)"
13491

# Ask whether there are some- or no-value statements whose
# subject is "a singer buried in Paris":
$ kif ask --subject="wd.occupation(wd.singer)&wd.place_of_burial(wd.Paris)"\
    --snak-mask="Filter.SOME_VALUE_SNAK|Filter.NO_VALUE_SNAK"; echo $?
1

# Count statements whose subject is "a dish originating in Italy",
# property is "has part(s)", and value is "tomato":
$ kif count --subject="wd.subtype(wd.dish)&wd.country_of_origin(wd.Italy)"\
    --property=wd.has_parts --value=wd.tomato
17

# [Ask whether there are / count] statements "Brazil shares border with Chile":
$ kif ask --subject=wd.Brazil\
    --property=wd.shares_border_with --value=wd.Chile; echo $?
1

$ kif count --subject=wd.Brazil\
    --property=wd.shares_border_with --value=wd.Chile
0

# Note: The ask command exits with status 0 if any matching statements
# were found; otherwise it exits with status 1.  We use "echo $?" above
# to display its exit status.

Note

One important difference between filter() and the methods ask() and count() is that filter() returns a lazy iterator, so the underlying filter operation is only performed when the iterator is advanced, whereas the operations underlying ask() and count() are executed immediately.
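The lazy versus immediate distinction can be illustrated with a plain Python generator. This is only a sketch of the general mechanism: run_query, lazy_filter, and eager_count are hypothetical names, and the list calls merely records when the toy backend is queried.

```python
calls = []  # records each time the toy backend is actually queried


def run_query(label):
    """Hypothetical backend call; stands in for a remote query."""
    calls.append(label)
    return ["stmt-1", "stmt-2", "stmt-3"]


def lazy_filter():
    """Generator: the body does not run until the iterator is advanced."""
    for stmt in run_query("filter"):
        yield stmt


def eager_count():
    """Ordinary function: the query runs as soon as it is called."""
    return len(run_query("count"))


it = lazy_filter()
calls_after_filter = list(calls)  # nothing has been queried yet

n = eager_count()
calls_after_count = list(calls)   # only the count query has run

first = next(it)
calls_after_next = list(calls)    # now the filter query has run too

print(calls_after_filter, calls_after_count, calls_after_next)
```

Creating the iterator costs nothing; the query is paid for on the first next() call (or the first loop iteration).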

The method count() also comes with projected variants (s, p, v, sp, sv, pv) which can be used to count the number of distinct statement components matching the given constraints. For example:

# Count the number of distinct properties used in statements with
# subject "cat":
n = kb.count_p(subject=wd.cat)
print(n)  # 123
# Count the number of distinct properties used in statements with
# subject "cat":
$ kif count --subject=wd.cat --select=p
123
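The difference between counting statements and counting distinct components can be sketched over a small in-memory list of (subject, property, value) triples. The data is invented for illustration, and the functions count and count_p below are plain Python helpers that only mimic the idea; they are not the Store API methods.

```python
# Toy statements as (subject, property, value) triples; illustrative data only.
statements = [
    ("cat", "instance of", "taxon"),
    ("cat", "instance of", "organisms known by a particular common name"),
    ("cat", "taxon name", "Felis catus"),
    ("dog", "instance of", "taxon"),
]


def count(stmts, subject=None):
    """Count the statements matching the (optional) subject constraint."""
    return sum(1 for s, p, v in stmts if subject in (None, s))


def count_p(stmts, subject=None):
    """Count the distinct properties among the matching statements."""
    return len({p for s, p, v in stmts if subject in (None, s)})


print(count(statements, subject="cat"))    # 3
print(count_p(statements, subject="cat"))  # 2
```

Three statements match the subject, but they use only two distinct properties, so the projected count is smaller.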

Another method of the Store API is mix(). It is used to run multiple filters at once, combining the resulting statements into a single output stream. For example, suppose we want to match statements such that either:

  • the subject is "Brazil" and the value is "Argentina"; or
  • the subject is "France" and the value is "United Kingdom".

Note that in this case we want to match alternative subject-value pairs. This cannot be written as a single filter, but using mix(), we can write two separate filters whose results are combined into a single output stream:

from kif_lib import Filter

it = kb.mix(
    Filter(subject=wd.Brazil, value=wd.Argentina),
    Filter(subject=wd.France, value=wd.United_Kingdom))
for stmt in it:
    print(stmt)

(Statement (Item Brazil) (ValueSnak (Property shares border with) (Item Argentina)))
(Statement (Item France) (ValueSnak (Property diplomatic relation) (Item United Kingdom)))
(Statement (Item Brazil) (ValueSnak (Property diplomatic relation) (Item Argentina)))

The Filter constructor used above builds a data-model representation of a filter pattern. It takes as arguments exactly the same constraints as filter(). Alternatively, we could have used kb.filter() calls directly:

it = kb.mix(
    kb.filter(subject=wd.Brazil, value=wd.Argentina),
    kb.filter(subject=wd.France, value=wd.United_Kingdom))
for stmt in it:
    print(stmt)

Note

The mix() call evaluates its filter arguments in the order they are given and interleaves their results. There is no parallelism though, as each filter evaluation causes the calling thread to block. One way to avoid blocking the calling thread during each filter evaluation is to use Python's async mechanism. To this end, the Store API provides the async versions afilter, aask, acount, and amix. These behave exactly like their sync counterparts but can be awaited within an asyncio event-loop. See Async for details.
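As a rough illustration of what interleaving result streams means, here is a minimal round-robin sketch in plain Python. The function interleave is hypothetical, and the actual scheduling used by mix() may differ; the sketch only shows one simple way to merge several iterators into one.

```python
from itertools import zip_longest


def interleave(*iterables):
    """Yield items from the given iterables in round-robin order."""
    sentinel = object()  # marks exhausted iterables
    for group in zip_longest(*iterables, fillvalue=sentinel):
        for item in group:
            if item is not sentinel:
                yield item


stream_a = iter(["a1", "a2", "a3"])
stream_b = iter(["b1"])

merged = list(interleave(stream_a, stream_b))
print(merged)  # ['a1', 'b1', 'a2', 'a3']
```

When one stream runs out, the remaining items of the other streams keep flowing.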

8 Beyond Wikidata

We begin this section using DBpedia to demonstrate KIF's ability to query knowledge sources other than Wikidata. We chose DBpedia mainly because, besides being supported by KIF out-of-the-box, like Wikidata, it allows us to write mundane examples which can be understood by everybody. Besides Wikidata and DBpedia, KIF comes with builtin support for FactGrid, PubChem, UniProt, among other sources. Most of what is illustrated below using DBpedia can be easily adapted to work with these other sources.

Note

DBpedia is a popular knowledge graph of general-purpose information extracted from Wikipedia. FactGrid is a knowledge graph of historical data maintained by Gotha Research Centre (Germany). PubChem is a database of chemical data maintained by the US National Institutes of Health (NIH). UniProt is a database of protein data maintained by the UniProt consortium.

8.1 DBpedia

To query a knowledge source other than Wikidata, all we need to do is create a new store using a different plugin. Here is how we create a store targeting DBpedia:

kb_dbp = Store('dbpedia')

The "dbpedia" plugin creates a SPARQL store, loads it with the DBpedia SPARQL mappings, and points it at the official DBpedia SPARQL endpoint. The result is a new store kb_dbp to which we can apply filters to obtain DBpedia statements:

# Match any statement whatsoever:
it = kb_dbp.filter(limit=3)
for stmt in it:
    print(stmt)
# Match any statement whatsoever:
$ kif filter --store=dbpedia --limit=3

# Note: The --store=dbpedia option instructs KIF CLI to use as target
#       a store instantiated with the "dbpedia" plugin, instead of the
#       default one (--store=wikidata).

(Statement (Item Mark Twain) (ValueSnak (Property birth name) "Samuel Langhorne Clemens"@en))
(Statement (Item Garfield) (ValueSnak (Property author) (Item Jim Davis (cartoonist))))
(Statement (Item Camelot) (ValueSnak (Property ruler) (Item King Arthur)))

The result of the previous filter is a stream of statements following the Wikidata syntax but with entities in the DBpedia namespace.

Note

KIF's DBpedia SPARQL mappings do not attempt to convert entities in the DBpedia namespace into that of Wikidata. The exception is Wikidata properties, which, as we'll see in a moment, are converted by the DBpedia SPARQL mappings into equivalent DBpedia properties whenever possible.

We can use the DBpedia vocabulary module db to write filters referring to DBpedia entities. For example:

from kif_lib.vocabulary import db

# (1) Match statements with subject "Banana":
it = kb_dbp.filter(subject=db.r('Banana'))
print(next(it))

# (2) Match statements with snak "place of birth is Sicily":
it = kb_dbp.filter(snak=db.op('birthPlace')(db.r('Sicily')))
print(next(it))

# (3) Match statements with property "official language":
it = kb_dbp.filter(property=db.op('officialLanguage'))
print(next(it))

# (4) Match statements with value 733:
it = kb_dbp.filter(value=733)
print(next(it))

# (5) Match statements with subject "Brazil" and
#     snak "capital is Brasília":
it = kb_dbp.filter(subject=db.r('Brazil'),
    snak=db.op('capital')(db.r('Brasília')))
print(next(it))

# (6) Match statements with subject "Brazil" and
#     snak "capital is São Paulo":
it = kb_dbp.filter(subject=db.r('Brazil'),
    snak=db.op('capital')(db.r('São_Paulo')))
print(next(it)) # *** ERROR: iterator is empty (no such statement) ***
# (1) Match statements with subject "Banana":
$ kif filter -s dbpedia --subject="db.r('Banana')" --limit=1

# (2) Match statements with snak "birth place is Sicily":
$ kif filter -s dbpedia --snak="db.op('birthPlace')(db.r('Sicily'))" --limit=1

# (3) Match statements with property "official language":
$ kif filter -s dbpedia --property="db.op('officialLanguage')" --limit=1

# (4) Match statements with value 733:
$ kif filter -s dbpedia --value=733 --limit=1

# (5) Match statements with subject "Brazil" and
#     snak "capital is Brasília":
$ kif filter -s dbpedia --subject="db.r('Brazil')"\
    --snak="db.op('capital')(db.r('Brasília'))" --limit=1

# (6) Match statements with subject "Brazil" and
#     snak "capital is São Paulo":
$ kif filter -s dbpedia --subject="db.r('Brazil')"\
    --snak="db.op('capital')(db.r('São_Paulo'))" --limit=1
# *** no output ***

# Note: "-s dbpedia" is an alias for "--store=dbpedia".

(1) (Statement (Item Banana) (ValueSnak (Property genus) (Item Musa (genus))))
(2) (Statement (Item Archimedes) (ValueSnak (Property birth place) (Item Sicily)))
(3) (Statement (Item Cameroon) (ValueSnak (Property official language) (Item French language)))
(4) (Statement (Item Mosquito County, Florida) (ValueSnak (Property population total) 733))
(5) (Statement (Item Brazil) (ValueSnak (Property capital) (Item Brasília)))

Unlike Wikidata, DBpedia uses symbolic names for identifying entities in its namespace. Also, it distinguishes between resources (db.r), ontology concepts (db.oc), ontology properties (db.op), and properties (db.p). The first two, resources and ontology concepts, are interpreted by the DBpedia SPARQL mappings as KIF items (Item), while the last two, ontology properties and properties, are interpreted as KIF properties (Property). The DBpedia vocabulary module db defines aliases for some of the ontology properties. So, for example, we can write db.birthPlace for db.op('birthPlace'), db.capital for db.op('capital'), and so on.
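To make the four kinds of names more tangible, the sketch below expands a symbolic name into a full IRI using the standard DBpedia namespace prefixes (dbr:, dbo:, dbp:). The function expand and the two tables are hypothetical; that the db.* helpers resolve names this way internally is an assumption made for illustration.

```python
# Standard DBpedia namespace IRIs, keyed by the short kind used in the text.
DBPEDIA_NAMESPACES = {
    "r": "http://dbpedia.org/resource/",   # dbr: resources
    "oc": "http://dbpedia.org/ontology/",  # dbo: ontology concepts
    "op": "http://dbpedia.org/ontology/",  # dbo: ontology properties
    "p": "http://dbpedia.org/property/",   # dbp: properties
}

# How each kind is interpreted, following the description in the text.
KIND = {"r": "Item", "oc": "Item", "op": "Property", "p": "Property"}


def expand(kind, name):
    """Return the (entity kind, full IRI) pair for a symbolic DBpedia name."""
    return KIND[kind], DBPEDIA_NAMESPACES[kind] + name


print(expand("r", "Banana"))
print(expand("op", "birthPlace"))
print(expand("p", "influenced"))
```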

By default, the DBpedia SPARQL mappings attempt to convert Wikidata properties into DBpedia properties whenever such mapping information is available in DBpedia. This means that they support the use of Wikidata properties directly in filters. For example:

from kif_lib.vocabulary import wd

# Get the "place of birth" of "Jim Morrison":
it = kb_dbp.filter(subject=db.r('Jim_Morrison'), property=wd.place_of_birth)
print(next(it))
# Get the "place of birth" of "Jim Morrison":
$ kif filter -s dbpedia "db.r('Jim_Morrison')" wd.place_of_birth --limit=1

(Statement (Item Jim Morrison) (ValueSnak (Property place of birth) (Item Melbourne, Florida)))

Notice that we used the Wikidata property place of birth (P19) instead of dbo:birthPlace above. This works because these two properties are declared "equivalent" in DBpedia. Similarly, if we ask for all properties leaving the node dbr:Jim_Morrison, we get as results not only DBpedia properties but also Wikidata properties, including place of birth (P19):

it = kb_dbp.filter_p(subject=db.r('Jim_Morrison'))
for prop in it:
    print(prop)
$ kif filter -s dbpedia --subject="db.r('Jim_Morrison')" --select=p

DBpedia properties:
(Property burial place)
(Property birth place)
(Property parents)
(Property occupation)

Wikidata properties:
(Property educated at)
(Property place of birth)
(Property occupation)
(Property place of death)

The DBpedia store kb_dbp supports all features discussed in the previous sections, including complex filter constraints, masks, projections, pseudo-properties, and the ask(), count(), and mix() operations. Here are some examples:

# (1) Subject's "capital is Paris" and property is "anthem":
it = kb_dbp.filter(subject=wd.capital(db.r('Paris')), property=wd.anthem)
print(next(it))

# (2) Subject was "influenced by Bertrand Russell" &
#     "educated at University of Vienna"; property is "description"; and
#     value is in English:
it = kb_dbp.filter(
    subject=(-db.p('influenced')(db.r('Bertrand_Russell'))&
             wd.educated_at(db.r('University_of_Vienna'))),
    property=wd.description, language='en')
print(next(it))

# (3) Get the subject and value of statements with property "doctoral student"
#     and value which "died of a condition caused by Cyanide":
it = kb_dbp.filter_sv(property=db.doctoralStudent,
    value=(wd.cause_of_death/db.op('medicalCause'))(db.r('Cyanide')))
print(next(it))

# (4) Same as (3) but count the number of matches:
n = kb_dbp.count_sv(property=db.doctoralStudent,
    value=(wd.cause_of_death/db.op('medicalCause'))(db.r('Cyanide')))
print(n)  # 1
# (1) Subject's "capital is Paris" and property is "anthem":
$ kif filter -s dbpedia --subject="wd.capital(db.r('Paris'))"\
    --property=wd.anthem

# (2) Subject was "influenced by Bertrand Russell" &
#     "educated at University of Vienna"; property is "description"; and
#     value is in English:
$ kif filter -s dbpedia\
    --subject="-db.p('influenced')(db.r('Bertrand_Russell'))&\
               wd.educated_at(db.r('University_of_Vienna'))"\
    --property=wd.description --language=en --limit=1

# (3) Get the subject and value of statements with property "doctoral student"
#     and value which "died of a condition caused by Cyanide":
$ kif filter -s dbpedia --property=db.doctoralStudent\
    --value="(wd.cause_of_death/db.op('medicalCause'))(db.r('Cyanide'))"\
    --select=sv --limit=1

# (4) Same as (3) but count the number of matches:
$ kif count -s dbpedia --property=db.doctoralStudent\
    --value="(wd.cause_of_death/db.op('medicalCause'))(db.r('Cyanide'))"\
    --select=sv
1

(1) (Statement (Item France) (ValueSnak (Property anthem) (Item La Marseillaise)))
(2) (Statement (Item Kurt Gödel) (ValueSnak DescriptionProperty "Kurt Friedrich Gödel (/ˈɡɜːrdəl/ GUR-dəl, German: [kʊʁt ˈɡøːdl̩]; April 28, 1906 – January 14, 1978) was a logician, mathematician, and philosopher. Considered along with Aristotle and Gottlob Frege to be one of the most significant logicians in history, Gödel had an immense effect upon scientific and philosophical thinking in the 20th century, a time when others such as Bertrand Russell, Alfred North Whitehead, and David Hilbert were using logic and set theory to investigate the foundations of mathematics, building on earlier work by the likes of Richard Dedekind, Georg Cantor and Frege."@en))
(3) (ValuePair (Item Alonzo Church) (Item Alan Turing))

Note

In example (2) above, the unary minus operator - is used in the subject constraint to invert the direction of the relation dbp:influenced. That is, by writing -db.p('influenced')(db.r('Bertrand_Russell')) we match the x such that there is a dbp:influenced edge in the graph with source dbr:Bertrand_Russell and target x. Still in example (2), notice that we use in the same filter the DBpedia property dbp:influenced, the Wikidata property educated at (P69), and the pseudo-property wd.description.
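The effect of inverting a relation can be sketched over a tiny edge list. The edges below are toy data chosen for illustration, and the helpers forward and inverse are hypothetical; they only mimic the direction of the lookup, not KIF's constraint machinery.

```python
# Toy graph as (source, property, target) edges; illustrative data only.
EDGES = [
    ("Bertrand_Russell", "influenced", "Kurt_Gödel"),
    ("Bertrand_Russell", "influenced", "Ludwig_Wittgenstein"),
    ("Kurt_Gödel", "educated_at", "University_of_Vienna"),
]


def forward(prop, value):
    """Entities x with an edge x --prop--> value."""
    return {s for s, p, o in EDGES if p == prop and o == value}


def inverse(prop, value):
    """Entities x with an edge value --prop--> x (inverted direction)."""
    return {o for s, p, o in EDGES if p == prop and s == value}


# Analogue of the conjunction in example (2): influenced by Russell
# (inverted edge) and educated at the University of Vienna (forward edge).
matches = (inverse("influenced", "Bertrand_Russell")
           & forward("educated_at", "University_of_Vienna"))
print(matches)  # {'Kurt_Gödel'}
```

Without the inversion, the first constraint would look for entities that influenced Bertrand Russell, which in this toy graph is an empty set.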

8.2 Mixer store

We've seen how to construct stores targeting individual knowledge sources. KIF also comes with the "mixer" store plugin which allows us to combine multiple stores into a new store. For example:

mx = Store('mixer', [Store('wikidata'), Store('dbpedia'), Store('factgrid')])

This instantiates and assigns to mx a new mixer store with three child stores, namely, SPARQL stores targeting Wikidata, DBpedia, and FactGrid. The mixer store acts as a proxy to the child stores. It provides a unified interface for querying them as if they were a single knowledge source.

When we evaluate a filter() over mx, we get all statements from the child stores that match the filter. For example:

it = mx.filter(subject=wd.label('Joan of Arc'), value_mask=Filter.TIME)
for stmt in it:
    print(stmt)
$ kif filter -s wikidata -s dbpedia -s factgrid\
    --subject="wd.label('Joan of Arc')" --value-mask=Filter.TIME

# Note: When multiple stores are given using option "-s", KIF CLI adds
#       all of them to a new mixer store, which becomes the target store.

Wikidata:
(Statement (Item Joan of Arc) (ValueSnak (Property date of death) 8 June 1431))
DBpedia:
(Statement (Item Joan of Arc) (ValueSnak (Property death date) 30 May 1431))
FactGrid:
(Statement (Item Joan of Arc) (ValueSnak (Property Date of death) 8 June 1431))

The filter above matches statements whose subject has label "Joan of Arc" and whose value is a time value. The result is a stream of statements obtained by combining (by default, interleaving) the streams produced by mx's children. The first statement shown above comes from Wikidata, the second from DBpedia, and the third from FactGrid. The three concern the date of death of Joan of Arc but, because they come from distinct sources, they use different IRIs for the item "Joan of Arc" and the property "date of death". Also, while Wikidata and FactGrid agree on the date, 8th June 1431, DBpedia asserts that the event took place on 30th May 1431.

The mixer store tends to be more useful when the child stores adopt the same namespace, or at least can handle entities in the namespaces of the other children. This is the case, for example, of the DBpedia SPARQL mappings, which can handle Wikidata properties. In contrast, the FactGrid SPARQL mappings use a namespace that is completely separate from Wikidata's and cannot handle Wikidata entities (at least for now).

KIF also comes with SPARQL mappings for PubChem, which is one of the largest open databases of chemical data. KIF's PubChem SPARQL mappings support Wikidata properties natively. This includes properties for universal chemical identifiers which can be used for matching compounds across knowledge sources. In the example below, we create a mixer store kb_wd_pc combining the Wikidata and PubChem SPARQL stores, and then use it to obtain annotated statements about the molecular mass of a given chemical (benzene) from these sources:

kb_wd = Store('wikidata', extra_references=[[wd.stated_in(wd.Wikidata)]])
kb_pc = Store('pubchem', 'http://localhost:1234/sparql',
    extra_references=[[wd.stated_in(wd.PubChem)]])
kb_wd_pc = Store('mixer', [kb_wd, kb_pc])

it = kb_wd_pc.filter(subject=wd.InChIKey('UHOVQNZJYSORNB-UHFFFAOYSA-N'),
    property=wd.mass, annotated=True)
for stmt in it:
    print(stmt)

(AnnotatedStatement (Item benzene) (ValueSnak (Property mass) 78.046950192 dalton)
 (QualifierRecord)
 (ReferenceRecordSet
  (ReferenceRecord
   (ValueSnak (Property stated in) (Item Wikidata)))
  (ReferenceRecord
   (ValueSnak (Property based on heuristic) (Item inferred from InChI))))
NormalRank)

(AnnotatedStatement (Item [6]annulene) (ValueSnak (Property mass) 78.0469970703125 dalton)
 (QualifierRecord)
 (ReferenceRecordSet
  (ReferenceRecord
   (ValueSnak (Property stated in) (Item PubChem))))
NormalRank)

There are a couple of things to notice here.

  1. PubChem does not provide a public SPARQL endpoint, only RDF dumps. The code above (line 2) assumes that a SPARQL endpoint loaded with PubChem data is available at the address http://localhost:1234/sparql.

  2. We use the extra_references parameter of the Store() constructor (lines 1–3) to associate to the Wikidata and PubChem stores extra reference records. These extra records will be attached by the stores to every annotated statement they produce, allowing us to tell which store generated each statement.

  3. To avoid using a source-dependent identifier for benzene, we use its InChIKey string "UHOVQNZJYSORNB-UHFFFAOYSA-N" (line 6), which is a universal identifier. This works because property wd.InChIKey is recognized by both the Wikidata and the PubChem SPARQL mappings. Similarly, wd.mass (line 7) is also recognized by both.
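Points 2 and 3 above can be sketched with plain dictionaries. The helper with_extra_reference is hypothetical and only mimics the idea of attaching a provenance record to every statement a source produces. The two toy records reuse the subject names and mass values from the output above.

```python
INCHIKEY = "UHOVQNZJYSORNB-UHFFFAOYSA-N"  # universal identifier for benzene


def with_extra_reference(statements, source):
    """Attach a ('stated in', source) reference to every statement."""
    for stmt in statements:
        tagged = dict(stmt)
        tagged["references"] = stmt.get("references", []) + [("stated in", source)]
        yield tagged


# Toy per-source results: same compound, different source-specific names.
wikidata_like = [
    {"inchikey": INCHIKEY, "subject": "benzene", "mass": "78.046950192"}]
pubchem_like = [
    {"inchikey": INCHIKEY, "subject": "[6]annulene", "mass": "78.0469970703125"}]

combined = (list(with_extra_reference(wikidata_like, "Wikidata"))
            + list(with_extra_reference(pubchem_like, "PubChem")))

# Matching on the universal identifier finds the compound in both sources,
# and the extra reference tells us where each statement came from.
hits = [s for s in combined if s["inchikey"] == INCHIKEY]
sources = [s["references"][-1][1] for s in hits]
print(sources)  # ['Wikidata', 'PubChem']
```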

Besides stores, the other kind of engine in KIF that produces data-model objects is the searcher. A KIF searcher is an interface to a similarity search method within a namespace. KIF searchers follow the Search API, which is implemented using a plugin architecture. For example, given a search string, the "wikidata" search plugin uses Wikidata's MediaWiki REST API to look in the Wikidata namespace for entities with a similar label, alias, or description:

from kif_lib import Search

sr = Search('wikidata')
it = sr.item('pizza', limit=3)
for item in it:
    print(item)
$ kif search --search=wikidata 'pizza' --item --limit=3

# Note: We can omit --search=wikidata, as it is the default

(Item pizza)
(Item Mariagrazia Pizza)
(Item Pizza Hut)

The Search() constructor creates a new searcher using the given plugin. In the example above, we used plugin "wikidata" to look for at most three items with descriptors matching the search string "pizza". The items found are returned in order of relevance, from most relevant to least relevant. The top three items found above refer to the food item, to a person (a pharmaceutical chemist), and to the American restaurant chain.
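The general shape of a ranked, limited similarity search can be sketched with the standard library. This toy uses difflib's string similarity purely for illustration; it is not how the "wikidata" plugin ranks its results (the remote API does the ranking), and the function search and the label list are hypothetical.

```python
from difflib import SequenceMatcher

# Toy label index; illustrative data only.
LABELS = ["Pizza Hut", "pizza", "Mariagrazia Pizza", "pasta"]


def search(query, labels, limit):
    """Return at most `limit` labels, most similar to `query` first."""
    def score(label):
        # Case-insensitive similarity ratio in the range [0, 1].
        return SequenceMatcher(None, query.lower(), label.lower()).ratio()
    return sorted(labels, key=score, reverse=True)[:limit]


hits = search("pizza", LABELS, limit=3)
for label in hits:
    print(label)
```

The limit is applied after ranking, so the results kept are the best-scoring ones rather than the first ones encountered.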

The sr.item() call we used above searches for and returns items. A related call in the Search API is item_descriptor() which in addition to the items returns any available descriptors. For example:

it = sr.item_descriptor('pizza', limit=3)
for item, desc in it:
    print(item, desc)
$ kif search --search=wikidata 'pizza' --item-descriptor --limit=3

# Note: The "--item-descriptor" option instructs the searcher to obtain
#       any available descriptors.

(Item pizza)
{'labels': {'en': "pizza"@en}, 'descriptions': {'en': "Italian universal popular dish with a flat dough-based base and toppings"@en}}
(Item Mariagrazia Pizza)
{'labels': {'en': "Mariagrazia Pizza"@en}, 'descriptions': {'en': "pharmaceutical chemist"@en}, 'aliases': {'en': {"Pizza"@en}}}
(Item Pizza Hut)
{'labels': {'en': "Pizza Hut"@en}, 'descriptions': {'en': "American restaurant chain and international franchise"@en}}

The Search API also provides the methods property() and property_descriptor() for searching for properties.

Note

The Search API provides the async versions aitem, aitem_descriptor, aproperty, and aproperty_descriptor. As in the case of stores, the async versions behave exactly like their sync counterparts but can be awaited within an asyncio event-loop. See Async for details.
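The benefit of awaitable calls can be sketched with asyncio alone. The coroutine toy_search below is a hypothetical stand-in for a network-bound search; it is not one of the Search API methods, and the delays merely simulate waiting on a remote service.

```python
import asyncio


async def toy_search(source, query, delay):
    """Hypothetical search coroutine; the sleep stands in for network wait."""
    await asyncio.sleep(delay)
    return f"{source}:{query}"


async def main():
    # Both searches wait concurrently inside the same event loop;
    # gather() returns the results in the order the awaitables were given.
    return await asyncio.gather(
        toy_search("wikidata", "pizza", 0.02),
        toy_search("dbpedia", "pizza", 0.01))


results = asyncio.run(main())
print(results)  # ['wikidata:pizza', 'dbpedia:pizza']
```

While one coroutine is waiting, the event loop is free to advance the other, so the total wait is close to the longest individual delay rather than the sum.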

KIF comes with built-in plugins to search for entities in the namespaces of Wikidata, DBpedia, and PubChem. There is also the general "ddgs" plugin, based on the DDGS library, which can be used to search for entities in any namespace reachable through the public Internet.

Note

The available search plugins can be shown using KIF CLI:

$ kif show-plugins --search
...
dbpedia             : DBpedia Lookup API search
dbpedia-ddgs        : DBpedia DDGS search
ddgs                : DDGS search
pubchem             : PubChem PUG search
pubchem-ddgs        : PubChem DDGS search
wikidata            : Wikidata Wikibase API search
wikidata-ddgs       : Wikidata DDGS search
wikidata-rest       : Wikidata REST search
wikidata-wapi-query : Wikidata Wikibase API search ("query" action)
...

10 Final remarks

This concludes the KIF tutorial.

Check out the guides and the API reference for a detailed description of KIF features, including those which were left out of the tutorial.

Have fun!