Quickstart
KIF is a Wikidata-based framework for integrating knowledge sources.
This quickstart guide presents the basic API of KIF.
Hello world!
We start by importing the kif_lib namespace:
from kif_lib import *
We’ll also need the Wikidata vocabulary module wd:
from kif_lib.vocabulary import wd
Let us now create a SPARQL store pointing to the official Wikidata query service:
kb = Store('wdqs')
A KIF store is an interface to a knowledge source. It allows us to view the source as a set of Wikidata-like statements.
The store kb we just created is an interface to Wikidata itself. We can use it, for example, to fetch three statements about Brazil from Wikidata:
it = kb.filter(subject=wd.Brazil, limit=3)
for stmt in it:
    display(stmt)
(Statement (Item Brazil) (ValueSnak (Property Mastodon instance URL) https://mastodon.com.br))
(Statement (Item Brazil) (ValueSnak (Property Mastodon instance URL) https://masto.donte.com.br))
(Statement (Item Brazil) (ValueSnak (Property external data available at URL) http://dados.gov.br))
Filters
The kb.filter(...) call searches for statements in kb matching the restrictions ... passed as arguments.
The result of a filter call is a (lazy) iterator it of statements:
it = kb.filter(subject=wd.Brazil)
We can advance it to obtain statements:
next(it)
(Statement (Item Brazil) (ValueSnak LabelProperty “巴西”@zh-sg))
If no limit argument is given to kb.filter(), the returned iterator contains all matching statements.
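The lazy behavior described above can be illustrated without contacting Wikidata. The following is a minimal plain-Python sketch, not KIF's implementation: fake_filter is a hypothetical stand-in for kb.filter(), and the statement-N strings are made-up placeholders. It assumes only that a limit caps how many items the iterator yields.

```python
from itertools import islice


def fake_filter(limit=None):
    """Hypothetical stand-in for kb.filter(); not part of KIF."""
    def generate():
        n = 0
        while True:  # pretend the source has unboundedly many matches
            n += 1
            yield f'statement-{n}'  # produced only when requested
    # islice() stops after `limit` items; limit=None means no cap.
    return islice(generate(), limit)


it = fake_filter(limit=3)
first = next(it)   # advances the iterator by one item
rest = list(it)    # consumes the two remaining items
print(first, rest)
```

Because items are produced only on demand, calling next() on an unlimited iterator stays cheap, while exhausting it costs as much as the number of matches.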
Basic filters
We can filter statements by any combination of subject, property, and value.
For example:
# match any statement
next(kb.filter())
(Statement (Lexeme wd:L691226) (ValueSnak LanguageProperty (Item Japanese)))
# match statements with subject “Brazil” and property “official website”
next(kb.filter(subject=wd.Brazil, property=wd.official_website))
(Statement (Item Brazil) (ValueSnak (Property official website) https://www.gov.br))
# match statements with property “official website” and value “https://www.ibm.com/”
next(kb.filter(property=wd.official_website, value=IRI('https://www.ibm.com/')))
(Statement (Item IBM) (ValueSnak (Property official website) https://www.ibm.com/))
# match statements with value “78.046950192 dalton”
next(kb.filter(value=Quantity('78.046950192', unit=wd.dalton)))
(Statement (Item Cyclopropane, tris(methylene)-) (ValueSnak (Property mass) (Quantity 78.046950192 (Item dalton))))
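To show what "any combination of subject, property, and value" means operationally, here is a small in-memory sketch. It is an analogy only: ToyStatement, TOY_DATA, and toy_filter are hypothetical names, the data is made up, and the sketch assumes that an omitted restriction acts as a wildcard.

```python
from typing import Iterator, NamedTuple, Optional


class ToyStatement(NamedTuple):
    subject: str
    prop: str
    value: str


# Made-up sample data, loosely modeled on the examples in this guide.
TOY_DATA = [
    ToyStatement('Brazil', 'continent', 'South America'),
    ToyStatement('Brazil', 'official website', 'https://www.gov.br'),
    ToyStatement('IBM', 'official website', 'https://www.ibm.com/'),
]


def toy_filter(
        subject: Optional[str] = None,
        prop: Optional[str] = None,
        value: Optional[str] = None,
) -> Iterator[ToyStatement]:
    """Yield statements matching every restriction that is not None."""
    for stmt in TOY_DATA:
        if subject is not None and stmt.subject != subject:
            continue
        if prop is not None and stmt.prop != prop:
            continue
        if value is not None and stmt.value != value:
            continue
        yield stmt


# No restrictions: every statement matches.
print(next(toy_filter()))
# Subject and property fixed, value left open.
print(next(toy_filter(subject='Brazil', prop='official website')))
# Property and value fixed, subject left open.
print(next(toy_filter(prop='official website', value='https://www.ibm.com/')))
```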
We can also match statements having some (unknown) value:
next(kb.filter(snak=wd.date_of_birth.some_value()))
(Statement (Item Aemilius Macer) (SomeValueSnak (Property date of birth)))
Or no value:
next(kb.filter(snak=wd.date_of_death.no_value()))
(Statement (Item wd:Q123038537) (NoValueSnak (Property date of death)))
Fingerprints (indirect ids)
So far, we have been using the symbolic aliases defined in the wd module to specify entities in filters.
Alternatively, we can use their numeric Wikidata ids:
# match statements with subject Q155 (Brazil) and property P30 (continent)
next(kb.filter(subject=wd.Q(155), property=wd.P(30)))
(Statement (Item Brazil) (ValueSnak (Property continent) (Item South America)))
Sometimes, however, ids are not enough. We might need to specify an entity indirectly by giving not its id but a property it satisfies.
In cases like this, we can use a fingerprint:
# match statements whose subject “is a dog” and value “is a human”
next(kb.filter(subject=wd.instance_of(wd.dog), value=wd.instance_of(wd.human)))
(Statement (Item Decoy Ohtani) (ValueSnak (Property owned by) (Item Shohei Ohtani)))
Properties themselves can also be specified using fingerprints:
# match statements whose property is “equivalent to Schema.org’s ‘weight’”
next(kb.filter(property=wd.equivalent_property('https://schema.org/weight')))
The - (unary minus) operator can be used to invert the direction of the property used in the fingerprint:
# match statements whose subject is “the continent of Brazil”
next(kb.filter(subject=-(wd.continent(wd.Brazil))))
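The difference between a fingerprint and its inverted form can be sketched over a handful of made-up triples. This is an illustration of the idea rather than KIF's machinery: TRIPLES, fingerprint, and inverse_fingerprint are hypothetical names, and the sketch assumes a fingerprint denotes the set of entities standing in a given relation.

```python
# Made-up (subject, property, value) triples for illustration.
TRIPLES = {
    ('Brazil', 'continent', 'South America'),
    ('Argentina', 'continent', 'South America'),
    ('Rex', 'instance of', 'dog'),
}


def fingerprint(prop, value):
    """Entities x such that (x, prop, value) holds."""
    return {s for (s, p, v) in TRIPLES if p == prop and v == value}


def inverse_fingerprint(prop, entity):
    """Entities x such that (entity, prop, x) holds (direction inverted)."""
    return {v for (s, p, v) in TRIPLES if p == prop and s == entity}


# "has continent South America": the entities located in that continent.
print(sorted(fingerprint('continent', 'South America')))
# Inverted direction: "the continent of Brazil".
print(sorted(inverse_fingerprint('continent', 'Brazil')))
```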
And-ing and or-ing fingerprints
Entity ids and fingerprints can be combined using the operators & (and) and | (or).
For example:
# match three statements such that:
# - subject is “Brazil” or “Argentina”
# - property is “continent” or “highest point”
it = kb.filter(
    subject=wd.Brazil | wd.Argentina,
    property=wd.continent | wd.highest_point,
    limit=3)
for stmt in it:
    display(stmt)
# match three statements such that:
# - subject “has continent South America” and “official language is Portuguese”
# - value “is a river” or “is a mountain”
it = kb.filter(
    subject=wd.continent(wd.South_America) & wd.official_language(wd.Portuguese),
    value=wd.instance_of(wd.river) | wd.instance_of(wd.mountain),
    limit=3)
for stmt in it:
    display(stmt)
# match three statements such that:
# - subject “is a female” and (“was born in NYC” or “was born in Rio”)
# - property is “field of work” or “is equivalent to Schema.org’s ‘hasOccupation’”
it = kb.filter(
    subject=wd.sex_or_gender(wd.female)
    & (wd.place_of_birth(wd.New_York_City) | wd.place_of_birth(wd.Rio_de_Janeiro)),
    property=wd.field_of_work
    | wd.equivalent_property(IRI('https://schema.org/hasOccupation')),
    limit=3)
for stmt in it:
    display(stmt)
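The way & and | compose restrictions can be mimicked in plain Python through operator overloading. The sketch below is not how KIF implements fingerprints; Pred, has, and the PEOPLE records are hypothetical and made up for illustration. It shows only that and-ing and or-ing build a larger test out of smaller ones.

```python
class Pred:
    """Hypothetical predicate wrapper supporting & (and) and | (or)."""

    def __init__(self, test):
        self.test = test

    def __call__(self, x):
        return self.test(x)

    def __and__(self, other):
        # Both predicates must hold.
        return Pred(lambda x: self(x) and other(x))

    def __or__(self, other):
        # At least one predicate must hold.
        return Pred(lambda x: self(x) or other(x))


def has(key, value):
    """Predicate: the record has `value` stored under `key`."""
    return Pred(lambda record: record.get(key) == value)


# Made-up records for illustration.
PEOPLE = [
    {'name': 'Ana', 'gender': 'female', 'born': 'Rio de Janeiro'},
    {'name': 'Bob', 'gender': 'male', 'born': 'New York City'},
    {'name': 'Cleo', 'gender': 'female', 'born': 'Paris'},
]

# "is a female" and ("was born in NYC" or "was born in Rio")
wanted = has('gender', 'female') & (
    has('born', 'New York City') | has('born', 'Rio de Janeiro'))
matches = [p['name'] for p in PEOPLE if wanted(p)]
print(matches)  # ['Ana']
```

The parentheses matter here for readability, although Python already gives & a higher precedence than |.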
Count and contains
A variant of the filter call is kb.count(...) which, instead of returning statements, counts the statements matching the restrictions ...:
kb.count(subject=wd.Brazil, property=wd.population | wd.official_language)
The kb.contains() call tests whether a given statement occurs in kb.
stmt1 = wd.official_language(wd.Brazil, wd.Portuguese)
kb.contains(stmt1)
stmt2 = wd.official_language(wd.Brazil, wd.Spanish)
kb.contains(stmt2)
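One way to think about these two calls is as thin wrappers around a filter: counting tallies the matches, and containment asks whether a fully specified statement has at least one match. The sketch below expresses that relationship over made-up data. All names (TOY_STATEMENTS, toy_filter, toy_count, toy_contains) are hypothetical, and whether KIF actually evaluates the calls this way is an assumption.

```python
# Made-up (subject, property, value) triples for illustration.
TOY_STATEMENTS = [
    ('Brazil', 'official language', 'Portuguese'),
    ('Brazil', 'continent', 'South America'),
    ('Argentina', 'official language', 'Spanish'),
]


def toy_filter(subject=None, prop=None, value=None):
    """Lazily yield the triples that match every non-None restriction."""
    for s, p, v in TOY_STATEMENTS:
        if subject in (None, s) and prop in (None, p) and value in (None, v):
            yield (s, p, v)


def toy_count(**restrictions):
    """Count the matches instead of returning them."""
    return sum(1 for _ in toy_filter(**restrictions))


def toy_contains(stmt):
    """Test whether one fully specified statement occurs in the data."""
    s, p, v = stmt
    return next(toy_filter(subject=s, prop=p, value=v), None) is not None


print(toy_count(subject='Brazil'))
print(toy_contains(('Brazil', 'official language', 'Portuguese')))
print(toy_contains(('Brazil', 'official language', 'Spanish')))
```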
Final remarks
This concludes the quickstart guide.
There are many other calls in the Store API of KIF. For more information, see the API Reference.