Quickstart

KIF is a Wikidata-based framework for integrating knowledge sources.

This quickstart guide presents the basic API of KIF.

Hello world!

We start by importing the kif_lib namespace:

from kif_lib import *

We’ll also need the Wikidata vocabulary module wd:

from kif_lib.vocabulary import wd

Let us now create a SPARQL store pointing to the official Wikidata query service:

kb = Store('wdqs')

A KIF store is an interface to a knowledge source. It allows us to view the source as a set of Wikidata-like statements.

The store kb we just created is an interface to Wikidata itself. We can use it, for example, to fetch three statements about Brazil from Wikidata:

it = kb.filter(subject=wd.Brazil, limit=3)
for stmt in it:
    display(stmt)

(Statement (Item Brazil) (ValueSnak (Property Mastodon instance URL) https://mastodon.com.br))

(Statement (Item Brazil) (ValueSnak (Property Mastodon instance URL) https://masto.donte.com.br))

(Statement (Item Brazil) (ValueSnak (Property external data available at URL) http://dados.gov.br))

Filters

The kb.filter(...) call searches kb for statements matching the restrictions given as arguments (written ... here).

The result of a filter call is a (lazy) iterator it of statements:

it = kb.filter(subject=wd.Brazil)

We can advance it to obtain statements:

next(it)

(Statement (Item Brazil) (ValueSnak LabelProperty “巴西”@zh-sg))

If no limit argument is given to kb.filter(), the returned iterator contains all matching statements.
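
To make the lazy behavior concrete, here is a minimal stand-alone sketch in plain Python. It does not use KIF and says nothing about how KIF is implemented: toy_source and toy_filter are hypothetical names, and itertools.islice merely plays the role of the limit argument.

```python
from itertools import islice
from typing import Iterator, List, Optional


def toy_source(log: List[int]) -> Iterator[str]:
    """Hypothetical statement source; records each item it actually produces."""
    for i in range(5):
        log.append(i)
        yield f"statement-{i}"


def toy_filter(log: List[int], limit: Optional[int] = None) -> Iterator[str]:
    """Return a lazy iterator; limit=None means no cap on the results."""
    return islice(toy_source(log), limit)


log: List[int] = []
it = toy_filter(log)
first = next(it)                 # only now is the first statement produced
produced_after_one = list(log)   # snapshot of what the source has done so far

limited = list(toy_filter([], limit=3))   # stops after three statements
everything = list(toy_filter([]))         # no limit: all five statements
```

In this toy model nothing is computed until next() is called, which is the property that makes it reasonable to call a filter without a limit and stop consuming whenever we have seen enough.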

Basic filters

We can filter statements by any combination of subject, property, and value.

For example:

match any statement

next(kb.filter())

(Statement (Lexeme wd:L691226) (ValueSnak LanguageProperty (Item Japanese)))

match statements with subject “Brazil” and property “official website”

next(kb.filter(subject=wd.Brazil, property=wd.official_website))

(Statement (Item Brazil) (ValueSnak (Property official website) https://www.gov.br))

match statements with property “official website” and value “https://www.ibm.com/”

next(kb.filter(property=wd.official_website, value=IRI('https://www.ibm.com/')))

(Statement (Item IBM) (ValueSnak (Property official website) https://www.ibm.com/))

match statements with value “78.046950192 dalton”

next(kb.filter(value=Quantity('78.046950192', unit=wd.dalton)))

(Statement (Item Cyclopropane, tris(methylene)-) (ValueSnak (Property mass) (Quantity 78.046950192 (Item dalton))))

We can also match statements having some (unknown) value:

next(kb.filter(snak=wd.date_of_birth.some_value()))

(Statement (Item Aemilius Macer) (SomeValueSnak (Property date of birth)))

Or no value:

next(kb.filter(snak=wd.date_of_death.no_value()))

(Statement (Item wd:Q123038537) (NoValueSnak (Property date of death)))
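
The idea of matching on "any combination of subject, property, and value" can be sketched with a small in-memory model. This is only an illustration under stated assumptions: Triple, TRIPLES, and toy_filter are hypothetical names that are not part of KIF, strings stand in for entities, and the parameter is called prop rather than property to avoid shadowing the Python built-in.

```python
from typing import Iterator, List, NamedTuple, Optional


class Triple(NamedTuple):
    """Toy stand-in for a statement: (subject, property, value)."""
    subject: str
    prop: str
    value: str


# Illustrative data echoing the examples in this guide.
TRIPLES: List[Triple] = [
    Triple("Brazil", "continent", "South America"),
    Triple("Brazil", "official website", "https://www.gov.br"),
    Triple("IBM", "official website", "https://www.ibm.com/"),
]


def toy_filter(
    subject: Optional[str] = None,
    prop: Optional[str] = None,
    value: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching every restriction that is not None."""
    for triple in TRIPLES:
        if subject is not None and triple.subject != subject:
            continue
        if prop is not None and triple.prop != prop:
            continue
        if value is not None and triple.value != value:
            continue
        yield triple


all_triples = list(toy_filter())                       # no restriction at all
brazil_site = list(toy_filter(subject="Brazil", prop="official website"))
ibm_owner = [t.subject for t in toy_filter(value="https://www.ibm.com/")]
```

A restriction left as None acts as a wildcard, so omitting all three arguments matches every statement, just as kb.filter() with no arguments does above.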

Fingerprints (indirect ids)

So far, we have been using the symbolic aliases defined in the wd module to specify entities in filters:

display(wd.Brazil)
display(wd.continent)

(Item Brazil)

(Property continent)

Alternatively, we can use their numeric Wikidata ids:

match statements with subject Q155 (Brazil) and property P30 (continent)

next(kb.filter(subject=wd.Q(155), property=wd.P(30)))

(Statement (Item Brazil) (ValueSnak (Property continent) (Item South America)))

Sometimes, however, ids are not enough. We might need to specify an entity indirectly by giving not its id but a property it satisfies.

In cases like this, we can use a fingerprint:

match statements whose subject “is a dog” and value “is a human”

next(kb.filter(subject=wd.instance_of(wd.dog), value=wd.instance_of(wd.human)))

(Statement (Item Decoy Ohtani) (ValueSnak (Property owned by) (Item Shohei Ohtani)))

Properties themselves can also be specified using fingerprints:

match statements whose property is “equivalent to Schema.org’s ‘weight’”

next(kb.filter(property=wd.equivalent_property('https://schema.org/weight')))

The - (unary minus) operator can be used to invert the direction of the property used in the fingerprint:

match statements whose subject is “the continent of Brazil”

next(kb.filter(subject=-(wd.continent(wd.Brazil))))
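
One way to picture a fingerprint and its inversion is the following stand-alone sketch. It is a toy model, not KIF's implementation: ToyFingerprint, FACTS, ENTITIES, and resolve are hypothetical names, and plain strings stand in for items and properties.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Illustrative facts as (subject, property, value) triples.
FACTS: Set[Tuple[str, str, str]] = {
    ("Brazil", "continent", "South America"),
    ("Argentina", "continent", "South America"),
}
ENTITIES: List[str] = ["Argentina", "Brazil", "South America"]


@dataclass(frozen=True)
class ToyFingerprint:
    """Describes entities x such that (x, prop, other) is a known fact."""
    prop: str
    other: str
    inverted: bool = False

    def __neg__(self) -> "ToyFingerprint":
        # Unary minus flips the direction: (other, prop, x) instead.
        return ToyFingerprint(self.prop, self.other, not self.inverted)

    def matches(self, entity: str) -> bool:
        if self.inverted:
            return (self.other, self.prop, entity) in FACTS
        return (entity, self.prop, self.other) in FACTS


def resolve(fp: ToyFingerprint) -> List[str]:
    """List the entities that satisfy the fingerprint."""
    return [e for e in ENTITIES if fp.matches(e)]


in_south_america = resolve(ToyFingerprint("continent", "South America"))
continent_of_brazil = resolve(-ToyFingerprint("continent", "Brazil"))
```

In this model the plain fingerprint selects entities whose continent is South America, while the negated one selects whatever is the continent of Brazil, mirroring the reading of -(wd.continent(wd.Brazil)) given above.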

And-ing and or-ing fingerprints

Entity ids and fingerprints can be combined using the operators & (and) and | (or).

For example:

match three statements such that:

  • subject is “Brazil” or “Argentina”

  • property is “continent” or “highest point”

it = kb.filter(
        subject=wd.Brazil | wd.Argentina,
        property=wd.continent | wd.highest_point,
        limit=3)
for stmt in it:
    display(stmt)

match three statements such that:

  • subject “has continent South America” and “official language is Portuguese”

  • value “is a river” or “is a mountain”

it = kb.filter(
        subject=wd.continent(wd.South_America) & wd.official_language(wd.Portuguese),
        value=wd.instance_of(wd.river) | wd.instance_of(wd.mountain),
        limit=3)
for stmt in it:
    display(stmt)

match three statements such that:

  • subject “is a female” and (“was born in NYC” or “was born in Rio”)

  • property is “field of work” or “is equivalent to Schema.org’s ‘hasOccupation’”

it = kb.filter(
        subject=wd.sex_or_gender(wd.female)
        & (wd.place_of_birth(wd.New_York_City) | wd.place_of_birth(wd.Rio_de_Janeiro)),
        property=wd.field_of_work
        | wd.equivalent_property(IRI('https://schema.org/hasOccupation')),
        limit=3)
for stmt in it:
    display(stmt)
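
The combination of conditions with & and | can be illustrated by a small predicate-based sketch in plain Python. Everything here is an assumption made for illustration: ToyFingerprint, has, select, FACTS, and ENTITIES are hypothetical names, and KIF's actual fingerprint classes may work quite differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

# Illustrative facts as (subject, property, value) triples.
FACTS: Set[Tuple[str, str, str]] = {
    ("Brazil", "continent", "South America"),
    ("Argentina", "continent", "South America"),
    ("Brazil", "official language", "Portuguese"),
    ("Argentina", "official language", "Spanish"),
}
ENTITIES: List[str] = ["Argentina", "Brazil"]


@dataclass(frozen=True)
class ToyFingerprint:
    """Wraps a predicate over entities and supports & and |."""
    test: Callable[[str], bool]

    def __and__(self, other: "ToyFingerprint") -> "ToyFingerprint":
        return ToyFingerprint(lambda e: self.test(e) and other.test(e))

    def __or__(self, other: "ToyFingerprint") -> "ToyFingerprint":
        return ToyFingerprint(lambda e: self.test(e) or other.test(e))


def has(prop: str, value: str) -> ToyFingerprint:
    """Fingerprint of entities e such that (e, prop, value) is a fact."""
    return ToyFingerprint(lambda e: (e, prop, value) in FACTS)


def select(fp: ToyFingerprint) -> List[str]:
    """List the entities that satisfy the fingerprint."""
    return [e for e in ENTITIES if fp.test(e)]


portuguese_in_sa = select(
    has("continent", "South America") & has("official language", "Portuguese"))
either_language = select(
    has("official language", "Portuguese") | has("official language", "Spanish"))
```

Note that in Python & binds more tightly than |, which is why the last KIF example above parenthesizes the or-ed place_of_birth conditions before and-ing them with sex_or_gender.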

Count and contains

A variant of the filter call is kb.count(...), which returns the number of statements matching the restrictions ... instead of the statements themselves:

kb.count(subject=wd.Brazil, property=wd.population | wd.official_language)

The kb.contains() call tests whether a given statement occurs in kb:

stmt1 = wd.official_language(wd.Brazil, wd.Portuguese)
kb.contains(stmt1)
stmt2 = wd.official_language(wd.Brazil, wd.Spanish)
kb.contains(stmt2)
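
As a rough mental model, counting is filtering without materializing the matches, and containment is a membership test. The sketch below shows this over an in-memory set; toy_count, toy_contains, and STATEMENTS are hypothetical names, not KIF calls, and the data is illustrative only.

```python
from typing import Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)

# Illustrative data only.
STATEMENTS: Set[Triple] = {
    ("Brazil", "official language", "Portuguese"),
    ("Brazil", "continent", "South America"),
}


def toy_count(subject: Optional[str] = None, prop: Optional[str] = None) -> int:
    """Count statements matching every restriction that is not None."""
    return sum(
        1
        for s, p, _ in STATEMENTS
        if (subject is None or s == subject) and (prop is None or p == prop)
    )


def toy_contains(stmt: Triple) -> bool:
    """Test whether the exact statement is present."""
    return stmt in STATEMENTS


n_brazil = toy_count(subject="Brazil")
has_portuguese = toy_contains(("Brazil", "official language", "Portuguese"))
has_spanish = toy_contains(("Brazil", "official language", "Spanish"))
```

In this toy data the first containment test succeeds and the second fails, because only the Portuguese statement was put into the set.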

Final remarks

This concludes the quickstart guide.

There are many other calls in the Store API of KIF. For more information, see the API Reference.