Quickstart
KIF is a Wikidata-based framework for integrating knowledge sources.
This quickstart guide presents the basic API of KIF.
Hello world!
We start by importing the kif_lib namespace:
from kif_lib import *
We’ll also need the Wikidata vocabulary module wd:
from kif_lib.vocabulary import wd
Let us now create a SPARQL store pointing to the official Wikidata query service:
kb = Store('sparql', 'https://query.wikidata.org/sparql')
A KIF store is an interface to a knowledge source. It allows us to view the source as a set of Wikidata-like statements.
The store kb we just created is an interface to Wikidata itself. We can use it, for example, to fetch three statements about Brazil from Wikidata:
it = kb.filter(subject=wd.Brazil, limit=3)
for stmt in it:
    display(stmt)
(Statement (Item Brazil) (ValueSnak (Property country) (Item Brazil)))
(Statement (Item Brazil) (ValueSnak (Property head of state) (Item Luiz Inácio Lula da Silva)))
(Statement (Item Brazil) (ValueSnak (Property instance of) (Item wd:Q512187)))
Filters
The kb.filter(...) call searches for statements in kb matching the restrictions given in place of "...".
The result of a filter call is a (lazy) iterator it of statements:
it = kb.filter(subject=wd.Brazil)
We can advance it with next() to obtain statements one at a time.
If no limit argument is given to kb.filter(), the returned iterator yields all matching statements.
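The lazy behavior described above can be pictured with a plain Python generator. This is only a toy sketch under stated assumptions: toy_filter, STATEMENTS, and produced are hypothetical names invented for illustration, the data is made up, and nothing here reflects how KIF implements kb.filter().

```python
from itertools import islice

# Toy data: (subject, property, value) triples standing in for statements.
STATEMENTS = [
    ("Brazil", "continent", "South America"),
    ("Brazil", "official language", "Portuguese"),
    ("Brazil", "capital", "Brasília"),
    ("Argentina", "continent", "South America"),
]

produced = []  # records which statements the generator has produced so far


def toy_filter(subject=None, limit=None):
    """Lazily yield matching triples; `limit` caps how many are yielded."""
    def matches():
        for stmt in STATEMENTS:
            if subject is None or stmt[0] == subject:
                produced.append(stmt)
                yield stmt
    # islice(..., None) imposes no cap, mirroring a missing `limit`.
    return islice(matches(), limit)


it = toy_filter(subject="Brazil")
first = next(it)                     # work happens only when we advance
produced_after_first = len(produced)
second = next(it)

limited = list(toy_filter(subject="Brazil", limit=2))
unlimited = list(toy_filter(subject="Brazil"))
```

In this sketch, advancing the iterator once produces exactly one statement, which is the sense in which the iterator is lazy.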
Basic filters
We can filter statements by any combination of subject, property, and value.
For example:
###
# match any statement
###
next(kb.filter())
(Statement (Item Sudan) (ValueSnak (Property head of government) (Item wd:Q27654948)))
###
# match statements with subject "Brazil" and property "official website"
###
next(kb.filter(subject=wd.Brazil, property=wd.official_website))
(Statement (Item Brazil) (ValueSnak (Property official website) https://www.gov.br))
###
# match statements with property "official website" and value "https://www.ibm.com/"
###
next(kb.filter(property=wd.official_website, value=IRI('https://www.ibm.com/')))
(Statement (Item IBM) (ValueSnak (Property official website) https://www.ibm.com/))
###
# match statements with value "78.046950192 dalton"
###
next(kb.filter(value=Quantity('78.046950192', unit=wd.dalton)))
(Statement (Item Claus’ benzene) (ValueSnak (Property mass) (Quantity 78.046950192 (Item dalton))))
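To make "any combination of subject, property, and value" concrete, here is a minimal in-memory sketch in which every omitted restriction acts as a wildcard. The names Stmt, DATA, and toy_filter are hypothetical, the data is illustrative, and the matching logic is an assumption about the observable behavior, not KIF's implementation.

```python
from typing import Iterator, NamedTuple, Optional


class Stmt(NamedTuple):
    subject: str
    prop: str
    value: str


# Illustrative data only.
DATA = [
    Stmt("Brazil", "official website", "https://www.gov.br"),
    Stmt("IBM", "official website", "https://www.ibm.com/"),
    Stmt("Brazil", "continent", "South America"),
]


def toy_filter(subject: Optional[str] = None,
               prop: Optional[str] = None,
               value: Optional[str] = None) -> Iterator[Stmt]:
    """Yield statements matching every restriction that is not None."""
    for s in DATA:
        if subject is not None and s.subject != subject:
            continue
        if prop is not None and s.prop != prop:
            continue
        if value is not None and s.value != value:
            continue
        yield s


any_stmt = next(toy_filter())  # no restrictions: matches anything
brazil_site = next(toy_filter(subject="Brazil", prop="official website"))
ibm = next(toy_filter(prop="official website", value="https://www.ibm.com/"))
```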
We can also match statements having some (unknown) value:
next(kb.filter(snak=wd.date_of_birth.some_value()))
(Statement (Item Aemilius Macer) (SomeValueSnak (Property date of birth)))
Or no value:
next(kb.filter(snak=wd.date_of_death.no_value()))
(Statement (Item wd:Q120334706) (NoValueSnak (Property date of death)))
Fingerprints (indirect ids)
So far, we have been using the symbolic aliases defined in the wd module, such as wd.Brazil, to specify entities in filters.
Alternatively, we can use their numeric Wikidata ids:
###
# match statements with subject Q155 (Brazil) and property P30 (continent)
###
next(kb.filter(subject=wd.Q(155), property=wd.P(30)))
(Statement (Item Brazil) (ValueSnak (Property continent) (Item South America)))
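For orientation, numeric ids correspond to entity IRIs in Wikidata's standard scheme, where items start with Q and properties with P. The helpers below are hypothetical and only spell out that scheme; whether wd.Q and wd.P build their results this way internally is an assumption.

```python
# Standard Wikidata entity namespace.
WD_ENTITY = "http://www.wikidata.org/entity/"


def item_iri(n: int) -> str:
    """Return the IRI of the Wikidata item with numeric id n."""
    return f"{WD_ENTITY}Q{n}"


def property_iri(n: int) -> str:
    """Return the IRI of the Wikidata property with numeric id n."""
    return f"{WD_ENTITY}P{n}"


brazil = item_iri(155)       # Q155 (Brazil)
continent = property_iri(30)  # P30 (continent)
```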
Sometimes, however, ids are not enough. We might need to specify an entity indirectly by giving not its id but a property it satisfies.
In cases like this, we can use a fingerprint:
###
# match statements whose subject "is a dog" and value "is a human"
###
next(kb.filter(subject=wd.instance_of(wd.dog), value=wd.instance_of(wd.human)))
(Statement (Item Abuwtiyuw) (ValueSnak (Property discoverer or inventor) (Item George Andrew Reisner)))
Properties themselves can also be specified using fingerprints:
###
# match statements whose property is "equivalent to Schema.org's 'weight'"
###
next(kb.filter(property=wd.equivalent_property('https://schema.org/weight')))
(Statement (Item Zuzana Ondrášková) (ValueSnak (Property mass) (Quantity 56 (Item kilogram))))
The - (unary minus) operator can be used to invert the direction of the property used in the fingerprint:
###
# match statements whose subject is "the continent of Brazil"
###
next(kb.filter(subject=-(wd.continent(wd.Brazil))))
(Statement (Item South America) (NoValueSnak (Property country)))
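One way to think about fingerprints and their inversion is as set comprehensions over triples. The sketch below is a toy model under stated assumptions: TRIPLES, fingerprint, and inverse_fingerprint are hypothetical names, and the semantics shown are inferred from the examples above rather than taken from KIF's code.

```python
# Illustrative (subject, property, value) triples.
TRIPLES = {
    ("Brazil", "continent", "South America"),
    ("Argentina", "continent", "South America"),
    ("Abuwtiyuw", "instance of", "dog"),
    ("George Andrew Reisner", "instance of", "human"),
}


def fingerprint(prop, value):
    """Entities x such that (x, prop, value) holds."""
    return {s for (s, p, v) in TRIPLES if p == prop and v == value}


def inverse_fingerprint(prop, subject):
    """Entities x such that (subject, prop, x) holds (direction inverted)."""
    return {v for (s, p, v) in TRIPLES if p == prop and s == subject}


dogs = fingerprint("instance of", "dog")
in_south_america = fingerprint("continent", "South America")
continent_of_brazil = inverse_fingerprint("continent", "Brazil")
```

Here fingerprint("continent", "South America") plays the role of "has continent South America", while the inverted form plays the role of "the continent of Brazil".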
And-ing and or-ing fingerprints
Entity ids and fingerprints can be combined using the operators & (and) and | (or).
For example:
###
# match three statements such that:
# - subject is "Brazil" or "Argentina"
# - property is "continent" or "highest point"
###
it = kb.filter(
    subject=wd.Brazil | wd.Argentina,
    property=wd.continent | wd.highest_point,
    limit=3)
for stmt in it:
    display(stmt)
(Statement (Item Brazil) (ValueSnak (Property continent) (Item South America)))
(Statement (Item Brazil) (ValueSnak (Property highest point) (Item Pico da Neblina)))
(Statement (Item Argentina) (ValueSnak (Property continent) (Item South America)))
###
# match three statements such that:
# - subject "has continent South America" and "official language is Portuguese"
# - value "is a river" or "is a mountain"
###
it = kb.filter(
    subject=wd.continent(wd.South_America) & wd.official_language(wd.Portuguese),
    value=wd.instance_of(wd.river) | wd.instance_of(wd.mountain),
    limit=3)
for stmt in it:
    display(stmt)
(Statement (Item Brazil) (ValueSnak (Property located in or next to body of water) (Item Paraná River)))
(Statement (Item Brazil) (ValueSnak (Property located in or next to body of water) (Item Amazon)))
(Statement (Item Brazil) (ValueSnak (Property located in or next to body of water) (Item São Francisco River)))
###
# match three statements such that:
# - subject "is a female" and ("was born in NYC" or "was born in Rio")
# - property is "field of work" or "is equivalent to Schema.org's 'hasOccupation'"
###
it = kb.filter(
    subject=wd.sex_or_gender(wd.female)
    & (wd.place_of_birth(wd.New_York_City) | wd.place_of_birth(wd.Rio_de_Janeiro)),
    property=wd.field_of_work
    | wd.equivalent_property(IRI('https://schema.org/hasOccupation')),
    limit=3)
for stmt in it:
    display(stmt)
(Statement (Item Pauline Newman) (ValueSnak (Property occupation) (Item judge)))
(Statement (Item Pauline Newman) (ValueSnak (Property occupation) (Item chemist)))
(Statement (Item Jane Stafford) (ValueSnak (Property occupation) (Item chemical technologist)))
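The way & and | compose restrictions can be sketched with operator overloading on a small predicate wrapper. Everything here is hypothetical: the class Fp, the helpers has and is_, and the FACTS data are illustrative, and the sketch says nothing about how KIF represents fingerprints internally.

```python
class Fp:
    """Toy fingerprint: wraps a predicate over entity names."""

    def __init__(self, test):
        self.test = test

    def __and__(self, other):
        # Both restrictions must hold.
        return Fp(lambda e: self.test(e) and other.test(e))

    def __or__(self, other):
        # At least one restriction must hold.
        return Fp(lambda e: self.test(e) or other.test(e))

    def __call__(self, entity):
        return self.test(entity)


# Illustrative facts only.
FACTS = {
    ("Brazil", "continent", "South America"),
    ("Argentina", "continent", "South America"),
    ("Brazil", "official language", "Portuguese"),
    ("Portugal", "official language", "Portuguese"),
    ("Argentina", "official language", "Spanish"),
}


def has(prop, value):
    """Fingerprint: entities e such that (e, prop, value) is a fact."""
    return Fp(lambda e: (e, prop, value) in FACTS)


def is_(name):
    """Direct id: matches exactly one entity."""
    return Fp(lambda e: e == name)


entities = ["Argentina", "Brazil", "Portugal"]
sa = has("continent", "South America")
pt = has("official language", "Portuguese")

both = [e for e in entities if (sa & pt)(e)]
either = [e for e in entities if (sa | pt)(e)]
by_id = [e for e in entities if (is_("Brazil") | is_("Argentina"))(e)]
```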
Count and contains
A variant of the filter call is kb.count(...) which, instead of returning statements, counts the statements matching the restrictions given in place of "...":
kb.count(subject=wd.Brazil, property=wd.population | wd.official_language)
2
The kb.contains() call tests whether a given statement occurs in kb.
stmt1 = wd.official_language(wd.Brazil, wd.Portuguese)
kb.contains(stmt1)
True
stmt2 = wd.official_language(wd.Brazil, wd.Spanish)
kb.contains(stmt2)
False
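Conceptually, counting and membership testing can both be expressed in terms of filtering. The sketch below shows that relationship on toy data; toy_filter, toy_count, and toy_contains are hypothetical names, the population value is a placeholder, and a real store may well compute these differently (for example, by delegating to the backend).

```python
# Illustrative statements; "<quantity>" is a placeholder, not real data.
STATEMENTS = [
    ("Brazil", "official language", "Portuguese"),
    ("Brazil", "population", "<quantity>"),
    ("Brazil", "continent", "South America"),
]


def toy_filter(subject=None, properties=None):
    """Yield statements whose subject and property satisfy the restrictions."""
    for s, p, v in STATEMENTS:
        if subject is not None and s != subject:
            continue
        if properties is not None and p not in properties:
            continue
        yield (s, p, v)


def toy_count(**restrictions):
    """Count matching statements without materializing them in a list."""
    return sum(1 for _ in toy_filter(**restrictions))


def toy_contains(stmt):
    """Test whether an exact statement occurs in the toy store."""
    return stmt in STATEMENTS


n = toy_count(subject="Brazil", properties={"population", "official language"})
has_pt = toy_contains(("Brazil", "official language", "Portuguese"))
has_es = toy_contains(("Brazil", "official language", "Spanish"))
```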
Final remarks
This concludes the quickstart guide.
There are many other calls in the Store API of KIF. For more information, see the API Reference.