PAW

IBM PAIRS RESTful API wrapper: A Python module to access PAIRS’s core API to load data into Python compatible data formats.

Copyright 2019-2021 Physical Analytics, IBM Research All Rights Reserved.

SPDX-License-Identifier: BSD-3-Clause

class paw.MockSubmitResponse(queryID, status_code=200)

Helper class for mocking a PAIRS query submit response.

It is useful to simulate a query submission when reloading a previously submitted query based on a given PAIRS query ID.

class paw.PAIRSQuery(query, pairsHost=None, auth=None, port=None, overwriteExisting=True, deleteDownload=False, downloadDir='./downloads', baseURI=None, verifySSL=True, vectorFormat=None, inMemory=False, guiURL=None, publish2GUI=None, guiPassword=None, authType='password')

Representation of a PAIRS query.

Parameters:
  • query (dict or str) – dictionary equivalent to the PAIRS JSON load that defines a query, or a path that references a ZIP file associated with a PAIRS query, or the ID of an existing (submitted) query

  • pairsHost (str) – base URL + scheme of PAIRS host to connect to, e.g. ‘https://pairs.res.ibm.com’; note: the initialization tries its best to autodetect even the baseURI and port if contained in pairsHost already

  • auth ((str, str) or authentication.OAuth2) – user name and password as tuple for access to pairsHost

  • port (str) – port to use for pairsHost

  • overwriteExisting (bool) – if True, destroy locally cached data, if existing; otherwise grab the latest locally cached data, where latest is defined by alphanumerical ordering of the PAIRS query ID; note: ignored if a file path (string) is provided as query

  • deleteDownload (bool) – destroy downloaded data with destruction of class instance

  • downloadDir (str) – directory where to store downloaded data; note: ignored if the query is a string representing the PAIRS query ZIP directory

  • baseURI (str) – PAIRS API base URI to append to the base URL (cf. pairsHost)

  • verifySSL (bool) – if SSL connections get verified

  • vectorFormat (str) – data format of the vector data

  • inMemory (bool) – triggers storing files directly in memory; note: ignored if query is loaded from existing ZIP file

  • guiURL (str) – URL of PAIRS GUI to be used for publishing query result (if any)

  • publish2GUI (bool) – determines whether or not the query result is automatically published to the PAIRS GUI

  • guiPassword (str) – password to be used when PAIRS GUI password is different from PAIRS API password, note: the user is the same as for the PAIRS API (typically the user’s e-mail address)

  • authType (str) – authentication type, either ‘password’ or ‘api-key’

Raises:

Exception – if an invalid URL was specified, if the query definition is not understood, or if a manually set PAIRS query ZIP directory does not exist
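The pairsHost note above says that the port and base URI are detected when they are already part of the host string. The following is a minimal sketch of how such a decomposition could look using only the standard library. It is not the wrapper's actual detection logic: the function name split_pairs_host and the example path /v2 are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def split_pairs_host(pairs_host):
    """Split a host string into (base URL with scheme, port, base URI).

    Hypothetical helper, not part of paw. Port and base URI are None
    when the host string does not contain them.
    """
    parts = urlsplit(pairs_host)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"invalid PAIRS host URL: {pairs_host!r}")
    base_url = f"{parts.scheme}://{parts.hostname}"
    # The documented port parameter is a string, so convert the parsed integer.
    port = str(parts.port) if parts.port is not None else None
    # An empty path means that no base URI was embedded in the host string.
    base_uri = parts.path.rstrip("/") or None
    return base_url, port, base_uri


# "/v2" is only an illustrative base URI.
base_url, port, base_uri = split_pairs_host("https://pairs.res.ibm.com:443/v2")
```

A host string without a scheme, such as pairs.res.ibm.com, is rejected by this sketch with a ValueError, which mirrors the documented "invalid URL" failure only loosely.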

create_layer(fileName, layerMeta, defaultExtension='')

Load layer data such as raster or vector data.

Parameters:
  • fileName (str) – the key to identify a data layer, associated with the corresponding file’s name

  • layerMeta (dict) – meta information of layer to load

  • defaultExtension (str) – sets default extension for data layer types not specified

Raises:

Exception – if layer data cannot be loaded from query ZIP file

create_layers(defaultExtension='')

From PAIRS query ZIP file generate Python data structures for layers in memory.

Parameters:

defaultExtension (str) – sets default extension for data layer types not specified

download(cosInfoJSON=None, cosPollIntSec=None, cosTimeout=-1, printStatus=False)

Get the data previously queried and save the ZIP file.

Parameters:
  • cosInfoJSON (dict) –

    IBM Cloud Object Storage bucket information for IBM PAIRS, like

    ```JSON
    {
        "provider": "ibm",
        "endpoint": "https://s3.us.cloud-object-storage.appdomain.cloud",
        "bucket": "<your bucket name>",
        "token": "<your secret token for bucket>"
    }
    ```

    If set, the query result is published in the cloud and not stored locally on your machine. It is a useful feature in combination with IBM Watson Studio notebooks.

  • cosPollIntSec (float) – seconds to idle between polls to IBM COS

  • printStatus (bool) – triggers printing the poll status information

  • cosTimeout (int) – maximum (positive) time in seconds allowed to poll till finished; the default (-1) polls indefinitely
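As a sketch of how the cosInfoJSON dictionary might be assembled without hard-coding the bucket token, the snippet below builds it from environment variables. The four keys and the endpoint come from the example above. The helper name build_cos_info and the variable names COS_BUCKET and COS_TOKEN are assumptions, not part of paw.

```python
import os

# Endpoint as shown in the documented cosInfoJSON example.
DEFAULT_ENDPOINT = "https://s3.us.cloud-object-storage.appdomain.cloud"


def build_cos_info(bucket, token, endpoint=DEFAULT_ENDPOINT):
    """Return a dictionary with the four documented cosInfoJSON keys.

    Hypothetical helper, not part of paw.
    """
    if not bucket or not token:
        raise ValueError("bucket and token must be non-empty")
    return {
        "provider": "ibm",
        "endpoint": endpoint,
        "bucket": bucket,
        "token": token,
    }


# COS_BUCKET and COS_TOKEN are hypothetical environment variable names;
# the placeholders from the documentation are used when they are unset.
cos_info = build_cos_info(
    bucket=os.environ.get("COS_BUCKET") or "<your bucket name>",
    token=os.environ.get("COS_TOKEN") or "<your secret token for bucket>",
)
```

The resulting dictionary would then be handed to download() through its cosInfoJSON parameter.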

classmethod from_query_result_dir(queryDir, pairsHost=None, queryID=None, baseURI=None)

Generates a PAIRS query object from a native PAIRS query directory.

Note: Used for PAIRS query Jupyter notebook service.

Parameters:
  • queryDir (str) – query directory from which to load raster and vector data

  • pairsHost (str) – PAIRS host to be used for PAIRS API calls

  • queryID (str) – PAIRS query ID associated with the data folder queryDir

  • baseURI (str) – base URI to use for PAIRS API calls

Returns:

PAIRS API wrapper query object

Return type:

api_wrapper.PAIRSQuery

get_query_JSON(queryID, reloadData=False)

Obtain the JSON load that defines the PAIRS query with a given ID.

Parameters:
  • queryID (str) – PAIRS query ID for which to obtain the defining query JSON load

  • reloadData (bool) – triggers usage of already retrieved/cached data

Returns:

PAIRS query defining JSON load

Return type:

dict

Raises:

Exception – if the data cannot be obtained from PAIRS through an API call, or if cached data do not contain valid JSON load information

get_query_dir_name()

Compute the query directory name and store it in self.queryDir.

Raises:

Exception – if required information is missing, or if the query hash cannot be constructed

get_vector_polygon_table(includeGeometry=False)

For vector data obtain polygon geometry information from PAIRS.

Parameters:

includeGeometry (bool) – whether or not to include the geometries of the polygons

Raises:

Exception – if there is no polygon geometry specified, if it fails to retrieve vector geometries or info from PAIRS

list_layers(defaultExtension='', refresh=False)

Get general metadata information for data of the query.

Parameters:
  • defaultExtension (str) – sets default extension for data layer types not specified

  • refresh (bool) – triggers the reload of the meta data from scratch

Raises:

Exception – if no PAIRS meta data can be found to list layer information, or if there is an issue reading the meta data information

poll(passNonSubmitted=False)

Polls the status a single time and updates self.queryStatus.

Parameters:

passNonSubmitted (bool) – allow the method to pass although no query has been submitted (this is used for locally cached data)

Raises:

Exception – if query submit response is unsuccessful, if no query or query ID is defined, if the provided credentials are incorrect

poll_till_finished(pollIntSec=None, printStatus=False, timeout=-1)

Polls the status until the query is no longer running. If successful, the result is published in the PAIRS GUI (given the user specified this on query generation).

Parameters:
  • pollIntSec (float) – seconds to idle between polls

  • printStatus (bool) – triggers printing the poll status information

  • timeout (int) – maximum (positive) time in seconds allowed to poll till finished; the default (-1) polls indefinitely

Raises:

Exception – if query submit response is unsuccessful or not existing, if no query or query ID is defined, or if a user-set timeout has been reached
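The interplay of pollIntSec, printStatus and timeout can be illustrated with a generic polling loop. This is a sketch under stated assumptions rather than the wrapper's implementation: the function poll_until_finished, the poll_once callable and the status strings "running" and "succeeded" are placeholders chosen for the example.

```python
import time


def poll_until_finished(poll_once, poll_int_sec=1.0, timeout=-1, print_status=False):
    """Call poll_once() repeatedly until it no longer reports "running".

    Hypothetical helper, not part of paw. A non-positive timeout means
    polling without a time limit, matching the documented default of -1.
    """
    start = time.monotonic()
    while True:
        status = poll_once()
        if print_status:
            print(f"status: {status}")
        if status != "running":
            return status
        # Only enforce the limit when a positive timeout was given.
        if timeout > 0 and time.monotonic() - start > timeout:
            raise TimeoutError(f"query still running after {timeout} s")
        time.sleep(poll_int_sec)


# Simulated status sequence standing in for real status requests.
fake_statuses = iter(["running", "running", "succeeded"])
final_status = poll_until_finished(lambda: next(fake_statuses), poll_int_sec=0)
```

Using a monotonic clock keeps the timeout check unaffected by system clock adjustments during long-running queries.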

print_data_acknowledgement()

Print the data acknowledgement statement.

query_pairs_polygon(polyID)

Uses PAIRS API to obtain the polygon that corresponds to a given AoI ID.

Parameters:

polyID (int) – PAIRS AoI ID to query data for

Returns:

shapely.geometry.shape of the polygon associated with polyID; if there is an error on retrieval, None is returned

query_pairs_polygon_meta(polyID)

Uses PAIRS API to obtain polygon meta-information that corresponds to a given AoI ID.

Parameters:

polyID (int) – PAIRS AoI ID to query data for

Returns:

dict of polygon meta-data associated with polyID; if there is an error on retrieval, None is returned

read_data_acknowledgement()

Extracts the data acknowledgement statement from the PAIRS query result ZIP file.

Raises:

Exception – if no acknowledgement is found

set_geometry_id_column(regionName)

Set geometry column for vector data.

Parameters:

regionName (str) – pandas dataframe column name of data with region information

set_lat_lon_columns(latColName, lonColName, geomColName)

Set latitude and longitude columns in order to generate a GeoPandas dataframe.

Parameters:
  • latColName (str) – self.vdf Pandas data frame column name for latitude coordinate

  • lonColName (str) – self.vdf Pandas data frame column name for longitude coordinate

  • geomColName (str) – self.vdf GeoPandas data frame column name for point geometry

Raises:

Exception – if it fails to generate a GeoPandas dataframe

set_timestamp_column(timeName)

Set timestamp column for vector data and try to convert it to datetime objects.

Parameters:

timeName (str) – pandas dataframe column name of data with timestamps

Raises:

Exception – if it fails to convert timestamps

split_property_string_column()

Split the property string into multiple pandas vector dataframe columns.

note: Applies with CSV vector data import only.

Raises:

Exception – if existing columns clash with the generation of property columns produced here

submit()

Submit query to PAIRS (if defined).

Raises:

Exception – if no query is defined, if no local cache is available although its use is requested, or if no PAIRS query ID can be identified from the return of the PAIRS server

class paw.PAIRSTimeSeries(querySpecs)

Assemble time series data from PAIRS data layers.

Parameters:

querySpecs (dict) – defines layers and spatio-temporal intervals to pull from PAIRS, the JSON schema is defined by PAIRSTimeSeries.QUERY_INPUT_JSON_SCHEMA

Raises:

jsonschema.exceptions.ValidationError – in case the input querySpecs does not conform to the required format

get_dataframe(pairsBaseURL=None, verifySSL=True, auth=None, spatioTemporalIndex=False, authType='password')

Function to query point data from PAIRS.

Parameters:
  • pairsBaseURL (str) – PAIRS base URL to be used for API endpoint, example: https://pairs.res.ibm.com:443

  • verifySSL (bool) – if SSL connections get verified

  • auth ((str, str)) – PAIRS API credentials in (user, password) format

  • spatioTemporalIndex (bool) – whether or not to spatio-temporally index the Pandas dataframe to be returned, note: temporal index comes last for time series optimization

  • authType (str) – ‘password’ or ‘api-key’

Returns:

table with PAIRS data

Return type:

pandas.DataFrame

Raises:
  • urllib3.exceptions.MaxRetryError – in case PAIRS is unreachable

  • requests.HTTPError – in case the PAIRS HTTP response code is not 200

paw.get_pairs_api_password(server, user, passFile=None)

Tries to obtain the PAIRS API password for a given user on a given server.

Parameters:
  • server (str) – PAIRS API server name, e.g. ‘pairs.res.ibm.com’

  • user (str) – user name for which to obtain the corresponding password

  • passFile (str) – path to the password file, which is expected to have the format <server>:<user>:<password>; colons in passwords need to be escaped

Returns:

corresponding password if available, None otherwise; note: if either user or server is None, no password is searched for, and None is returned

Return type:

str

Raises:

Exception – if password file does not exist, or if password was not found
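The <server>:<user>:<password> format with escaped colons can be parsed as sketched below. The documentation does not state the escape character, so a backslash (\:) is assumed here. The function find_password is hypothetical and works on an iterable of lines instead of a file path. Unlike the documented function, this sketch returns None rather than raising when no entry matches.

```python
import re

# A colon that is not preceded by a backslash separates fields.
# Assumption: colons are escaped as "\:"; the source does not specify this.
_UNESCAPED_COLON = re.compile(r"(?<!\\):")


def find_password(lines, server, user):
    """Return the password stored for (server, user), or None.

    Hypothetical helper, not part of paw.
    """
    if server is None or user is None:
        # Mirrors the documented note: nothing is searched for in this case.
        return None
    for line in lines:
        fields = _UNESCAPED_COLON.split(line.rstrip("\n"))
        if len(fields) != 3:
            continue  # skip lines that do not have exactly three fields
        entry_server, entry_user, password = (
            field.replace("\\:", ":") for field in fields
        )
        if (entry_server, entry_user) == (server, user):
            return password
    return None


# Made-up sample entries for illustration only.
sample_lines = [
    "pairs.res.ibm.com:alice@example.com:s3cr\\:et\n",
    "other.host:bob:pw\n",
]
password = find_password(sample_lines, "pairs.res.ibm.com", "alice@example.com")
```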

paw.load_environment_variables()

Some settings of this module can be set by environment variables. This function loads them.

In particular, server credentials and connection details are set via paw.ENVIRONMENT_VARIABLES prefixed by paw.PAW_ENV_BASE_NAME+'_', e.g. by starting your Python shell with:

PAW_PAIRS_DEFAULT_USER='<your PAIRS user name>' PAW_PAIRS_DEFAULT_BASE_URI python
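A minimal sketch of the prefix scheme described above: each setting name is combined with the base name and an underscore, and looked up in the environment. The values of ENV_BASE_NAME and ENV_VARIABLES below are assumptions inferred from the shell example. The actual contents of paw.PAW_ENV_BASE_NAME and paw.ENVIRONMENT_VARIABLES may differ, and load_prefixed_settings is a hypothetical helper.

```python
import os

# Assumed stand-ins for paw.PAW_ENV_BASE_NAME and paw.ENVIRONMENT_VARIABLES,
# inferred from the variable names in the shell example.
ENV_BASE_NAME = "PAW"
ENV_VARIABLES = ("PAIRS_DEFAULT_USER", "PAIRS_DEFAULT_BASE_URI")


def load_prefixed_settings(environ=None):
    """Collect the settings that are present as <base name>_<setting>.

    Hypothetical helper, not part of paw.
    """
    if environ is None:
        environ = os.environ
    settings = {}
    for name in ENV_VARIABLES:
        value = environ.get(f"{ENV_BASE_NAME}_{name}")
        if value is not None:
            settings[name] = value
    return settings


# Settings found in the current process environment (possibly none).
settings = load_prefixed_settings()
```

Passing a plain dictionary as environ makes the lookup easy to exercise without touching the real process environment.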