PAW¶
IBM PAIRS RESTful API wrapper: A Python module to access PAIRS’s core API to load data into Python compatible data formats.
Copyright 2019-2021 Physical Analytics, IBM Research All Rights Reserved.
SPDX-License-Identifier: BSD-3-Clause
- class paw.MockSubmitResponse(queryID, status_code=200)¶
Helper class for mocking a PAIRS query submit response.
It is useful to simulate a query submission when reloading a previously submitted query based on a given PAIRS query ID.
- class paw.PAIRSQuery(query, pairsHost=None, auth=None, port=None, overwriteExisting=True, deleteDownload=False, downloadDir='./downloads', baseURI=None, verifySSL=True, vectorFormat=None, inMemory=False, guiURL=None, publish2GUI=None, guiPassword=None, authType='password')¶
Representation of a PAIRS query.
- Parameters:
query (dict or str) – one of: a dictionary equivalent to the PAIRS JSON load that defines a query, a path to a ZIP file associated with a PAIRS query, or the ID of an existing (submitted) query
pairsHost (str) – base URL + scheme of PAIRS host to connect to, e.g. ‘https://pairs.res.ibm.com’; note: the initialization tries to autodetect baseURI and port if they are already contained in pairsHost
auth ((str, str) or authentication.OAuth2) – user name and password as a tuple, or an authentication.OAuth2 object, for access to pairsHost
port (str) – port to use for pairsHost
overwriteExisting (bool) – destroy locally cached data, if existing; otherwise use the latest locally cached data, where latest is defined by alphanumerical ordering of the PAIRS query ID; note: ignored if a file path (string) is provided as query
deleteDownload (bool) – destroy downloaded data with destruction of class instance
downloadDir (str) – directory in which to store downloaded data; note: ignored if the query is a string representing the PAIRS query ZIP directory
baseURI (str) – PAIRS API base URI to append to the base URL (cf. pairsHost)
verifySSL (bool) – whether SSL connections get verified
vectorFormat (str) – data format of the vector data
inMemory (bool) – triggers storing files directly in memory; note: ignored if query is loaded from existing ZIP file
guiURL (str) – URL of PAIRS GUI to be used for publishing query result (if any)
publish2GUI (bool) – determines whether or not the query result is automatically published to the PAIRS GUI
guiPassword (str) – password to be used when PAIRS GUI password is different from PAIRS API password, note: the user is the same as for the PAIRS API (typically the user’s e-mail address)
authType (str) – ‘password’ or ‘api-key’
- Raises:
Exception – if an invalid URL was specified, if the query definition is not understood, or if a manually set PAIRS query ZIP directory does not exist
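A minimal sketch of how the constructor arguments above might be put together. The dictionary follows the general shape of a PAIRS point query, but that schema is not spelled out in this reference, so the keys, the layer ID, the coordinates, the dates, and the helper name build_point_query are all illustrative assumptions. The call to paw.PAIRSQuery is left as a comment because it needs the package and valid credentials.

```python
import json


def build_point_query(layer_id, lat, lon, start, end):
    """Return a dict shaped like a PAIRS point-query JSON load (assumed schema)."""
    return {
        "layers": [{"id": str(layer_id)}],
        "spatial": {"type": "point", "coordinates": [str(lat), str(lon)]},
        "temporal": {"intervals": [{"start": start, "end": end}]},
    }


# Hypothetical layer ID and location, chosen only for illustration.
query = build_point_query("12345", 40.0, -100.0, "2019-01-01", "2019-01-31")
query_json = json.dumps(query, indent=2)
print(query_json)

# With paw installed, the dictionary could then be passed to the constructor:
# import paw
# pq = paw.PAIRSQuery(
#     query,
#     pairsHost="https://pairs.res.ibm.com",
#     auth=("<your PAIRS user name>", "<your PAIRS password>"),
#     downloadDir="./downloads",
# )
```

Passing a string instead of a dictionary selects one of the other two modes described for query: a path to a query ZIP file or the ID of an already submitted query.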
- create_layer(fileName, layerMeta, defaultExtension='')¶
Load layer data such as raster or vector data.
- Parameters:
fileName (str) – the key to identify a data layer, associated with the corresponding file’s name
layerMeta (dict) – meta information of layer to load
defaultExtension (str) – sets default extension for data layer types not specified
- Raises:
Exception – if layer data cannot be loaded from query ZIP file
- create_layers(defaultExtension='')¶
From PAIRS query ZIP file generate Python data structures for layers in memory.
- Parameters:
defaultExtension (str) – sets default extension for data layer types not specified
- download(cosInfoJSON=None, cosPollIntSec=None, cosTimeout=-1, printStatus=False)¶
Get the data previously queried and save the ZIP file.
- Parameters:
cosInfoJSON (dict) – IBM PAIRS with Cloud Object Storage bucket information like
```JSON
{
    "provider": "ibm",
    "endpoint": "https://s3.us.cloud-object-storage.appdomain.cloud",
    "bucket": "<your bucket name>",
    "token": "<your secret token for bucket>"
}
```
If set, the query result is published in the cloud and not stored locally on your machine. It is a useful feature in combination with IBM Watson Studio notebooks.
cosPollIntSec (float) – seconds to idle between polls to IBM COS
printStatus (bool) – triggers printing the poll status information
cosTimeout (int) – maximum (positive) time in seconds allowed to poll till finished; the default (-1) polls indefinitely
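The cosInfoJSON dictionary can be assembled in code rather than typed by hand. The following is a small sketch: the helper name build_cos_info and its validation are assumptions for illustration, while the four keys and the default endpoint are the ones listed above.

```python
def build_cos_info(bucket, token,
                   endpoint="https://s3.us.cloud-object-storage.appdomain.cloud",
                   provider="ibm"):
    """Assemble a dictionary with the four keys listed for cosInfoJSON."""
    if not bucket or not token:
        # Fail early instead of sending an incomplete bucket description.
        raise ValueError("bucket and token must be non-empty")
    return {
        "provider": provider,
        "endpoint": endpoint,
        "bucket": bucket,
        "token": token,
    }


cos_info = build_cos_info("<your bucket name>", "<your secret token for bucket>")
print(sorted(cos_info))

# With a submitted and finished paw.PAIRSQuery object pq, this would be used as:
# pq.download(cosInfoJSON=cos_info, cosPollIntSec=10, cosTimeout=600, printStatus=True)
```

The polling interval and timeout in the commented call are arbitrary example values.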
- classmethod from_query_result_dir(queryDir, pairsHost=None, queryID=None, baseURI=None)¶
Generates a PAIRS query object from a native PAIRS query directory.
Note: Used for PAIRS query Jupyter notebook service.
- Parameters:
queryDir (str) – query directory from which to load raster and vector data
pairsHost (str) – PAIRS host to be used for PAIRS API calls
queryID (str) – PAIRS query ID associated with the data folder queryDir
baseURI (str) – base URI to use for PAIRS API calls
- Returns:
PAIRS API wrapper query object
- Return type:
api_wrapper.PAIRSQuery
- get_query_JSON(queryID, reloadData=False)¶
Obtain JSON load that defines a PAIRS query assigned a given ID.
- Parameters:
queryID (str) – PAIRS query ID for which to obtain the defining query JSON load
reloadData (bool) – triggers usage of already retrieved/cached data
- Returns:
PAIRS query defining JSON load
- Return type:
dict
- Raises:
Exception – if the data cannot be obtained from PAIRS through an API call, or if cached data do not contain valid JSON load information
- get_query_dir_name()¶
Compute query directory name by setting self.queryDir.
- Raises:
Exception – if required information is missing, or if the query hash cannot be constructed
- get_vector_polygon_table(includeGeometry=False)¶
For vector data obtain polygon geometry information from PAIRS.
- Parameters:
includeGeometry (bool) – whether to include the geometries of the polygons
- Raises:
Exception – if there is no polygon geometry specified, or if it fails to retrieve vector geometries or info from PAIRS
- list_layers(defaultExtension='', refresh=False)¶
Get general metadata information for data of the query.
- Parameters:
defaultExtension (str) – sets default extension for data layer types not specified
refresh (bool) – triggers the reload of the meta data from scratch
- Raises:
Exception – if no PAIRS meta data can be found to list layer information, or if there is an issue reading the meta data information
- poll(passNonSubmitted=False)¶
Polls the status a single time and updates self.queryStatus.
- Parameters:
passNonSubmitted (bool) – allow the method to pass although no query has been submitted (this is used for locally cached data)
- Raises:
Exception – if query submit response is unsuccessful, if no query or query ID is defined, or if the provided credentials are incorrect
- poll_till_finished(pollIntSec=None, printStatus=False, timeout=-1)¶
Polls the status until the query is no longer running. If successful, the result is published in the PAIRS GUI (provided the user requested this when creating the query).
- Parameters:
pollIntSec (float) – seconds to idle between polls
printStatus (bool) – triggers printing the poll status information
timeout (int) – maximum (positive) time in seconds allowed to poll till finished; the default (-1) polls indefinitely
- Raises:
Exception – if query submit response is unsuccessful or not existing, if no query or query ID is defined, or if a user-set timeout has been reached
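The behaviour described for poll_till_finished (poll, idle for pollIntSec, stop on completion or on a positive timeout) can be illustrated with a generic loop. This is a sketch of the pattern only, not the module's implementation: the function name poll_until_finished, the injected sleep and clock callables, and the status strings "Running" and "Succeeded" are all assumptions.

```python
import time


def poll_until_finished(poll_once, is_running, poll_int_sec=1.0, timeout=-1,
                        sleep=time.sleep, clock=time.monotonic):
    """Call poll_once() until is_running(status) is False.

    A non-positive timeout means polling continues indefinitely.
    """
    start = clock()
    while True:
        status = poll_once()
        if not is_running(status):
            return status
        if timeout > 0 and clock() - start >= timeout:
            raise TimeoutError("query still running after %s seconds" % timeout)
        sleep(poll_int_sec)


# Dry run with canned statuses instead of real API calls.
canned = iter(["Running", "Running", "Succeeded"])
final_status = poll_until_finished(
    poll_once=lambda: next(canned),
    is_running=lambda status: status == "Running",
    poll_int_sec=0,
    sleep=lambda seconds: None,  # skip real waiting in the dry run
)
print(final_status)
```

Injecting sleep and clock keeps the loop easy to exercise without waiting, which is the only reason they appear as parameters here.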
- print_data_acknowledgement()¶
Simply print out data acknowledgement statement.
- query_pairs_polygon(polyID)¶
Uses PAIRS API to obtain the polygon that corresponds to a given AoI ID.
- Parameters:
polyID (int) – PAIRS AoI ID to query data for
- Returns:
shapely.geometry.shape of the polygon associated with polyID; if there is an error on retrieval, None is returned
- query_pairs_polygon_meta(polyID)¶
Uses PAIRS API to obtain polygon meta-information that corresponds to a given AoI ID.
- Parameters:
polyID (int) – PAIRS AoI ID to query data for
- Returns:
dict of polygon meta-data associated with polyID; if there is an error on retrieval, None is returned
- read_data_acknowledgement()¶
Extracts data acknowledgement statement from PAIRS query result ZIP file.
- Raises:
Exception – if no acknowledgement is found
- set_geometry_id_column(regionName)¶
Set geometry column for vector data.
- Parameters:
regionName (str) – pandas dataframe column name of data with region information
- set_lat_lon_columns(latColName, lonColName, geomColName)¶
Set latitude and longitude columns in order to generate a GeoPandas dataframe.
- Parameters:
latColName (str) – self.vdf Pandas data frame column name for latitude coordinate
lonColName (str) – self.vdf Pandas data frame column name for longitude coordinate
geomColName (str) – self.vdf GeoPandas data frame column name for point geometry
- Raises:
Exception – if it fails to generate a GeoPandas dataframe
- set_timestamp_column(timeName)¶
Set timestamp column for vector data and try to convert it to datetime objects.
- Parameters:
timeName (str) – pandas dataframe column name of data with timestamps
- Raises:
Exception – if it fails to convert timestamps
- split_property_string_column()¶
Split the property string into multiple pandas vector dataframe columns.
note: Applies with CSV vector data import only.
- Raises:
Exception – if existing columns clash with the generation of property columns produced here
- submit()¶
Submit query to PAIRS (if defined).
- Raises:
Exception – if no query is defined, if no local cache is available although its use was requested, or if no PAIRS query ID can be identified from the return of the PAIRS server
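The PAIRSQuery methods documented above are typically used in sequence: submit, wait, download, load. The sketch below shows that order using only method names from this reference. The wrapper run_query and the RecordingQuery stand-in are hypothetical; the stand-in exists only so the sequence can be run without network access or credentials.

```python
def run_query(pq, print_status=False, timeout=-1):
    """Run a query object through submit, polling, download and layer loading."""
    pq.submit()  # send the query definition to PAIRS
    pq.poll_till_finished(printStatus=print_status, timeout=timeout)
    pq.download()  # fetch the result and save the ZIP file
    pq.create_layers()  # build in-memory data structures from the ZIP file
    return pq


class RecordingQuery:
    """Hypothetical stand-in that only records which methods were called."""

    def __init__(self):
        self.calls = []

    def submit(self):
        self.calls.append("submit")

    def poll_till_finished(self, pollIntSec=None, printStatus=False, timeout=-1):
        self.calls.append("poll_till_finished")

    def download(self):
        self.calls.append("download")

    def create_layers(self, defaultExtension=""):
        self.calls.append("create_layers")


dry_run = run_query(RecordingQuery())
print(dry_run.calls)
```

With the package installed, a paw.PAIRSQuery instance would take the place of RecordingQuery; each step can raise the exceptions listed for the corresponding method.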
- class paw.PAIRSTimeSeries(querySpecs)¶
Assemble time series data from PAIRS data layers.
- Parameters:
querySpecs (dict) – defines layers and spatio-temporal intervals to pull from PAIRS, the JSON schema is defined by PAIRSTimeSeries.QUERY_INPUT_JSON_SCHEMA
- Raises:
jsonschema.exceptions.ValidationError – in case the input querySpecs does not conform to the required format
- get_dataframe(pairsBaseURL=None, verifySSL=True, auth=None, spatioTemporalIndex=False, authType='password')¶
Function to query point data from PAIRS.
- Parameters:
pairsBaseURL (str) – PAIRS base URL to be used for API endpoint, example: https://pairs.res.ibm.com:443
verifySSL (bool) – whether SSL connections get verified
auth ((str, str)) – PAIRS API credentials in (user, password) format
spatioTemporalIndex (bool) – whether or not to spatio-temporally index the Pandas dataframe to be returned, note: temporal index comes last for time series optimization
authType (str) – ‘password’ or ‘api-key’
- Returns:
table with PAIRS data
- Return type:
pandas.DataFrame
- Raises:
urllib3.exceptions.MaxRetryError – in case PAIRS is unreachable
requests.HTTPError – in case the PAIRS HTTP response code is not 200
- paw.get_pairs_api_password(server, user, passFile=None)¶
Tries to obtain the PAIRS API password for a given user on a given server.
- Parameters:
server (str) – PAIRS API server name, e.g. ‘pairs.res.ibm.com’
user (str) – user name for which to obtain the corresponding password
passFile (str) – path to file with password, it is expected to have the format <server>:<user>:<password>, colons in passwords need to be escaped
- Returns:
corresponding password if available, None otherwise; note: if either user or server is None, no password is searched for, and None is returned
- Return type:
str
- Raises:
Exception – if password file does not exist, or if password was not found
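The password file format <server>:<user>:<password> with escaped colons can be parsed as sketched below. This is an illustration rather than the module's own parser: the function name find_password is hypothetical, and a backslash is assumed as the escape character, which this reference does not state explicitly.

```python
import re


def find_password(lines, server, user):
    """Return the password for (server, user) from password-file lines, else None."""
    if server is None or user is None:
        return None
    for line in lines:
        line = line.rstrip("\n")
        # Split on colons that are not preceded by a backslash (assumed escape).
        fields = [f.replace("\\:", ":") for f in re.split(r"(?<!\\):", line)]
        if len(fields) == 3 and fields[0] == server and fields[1] == user:
            return fields[2]
    return None


# Made-up credentials for illustration only.
example_lines = [
    "pairs.res.ibm.com:alice@example.com:pa\\:ss\n",
    "pairs.res.ibm.com:bob@example.com:secret\n",
]
print(find_password(example_lines, "pairs.res.ibm.com", "alice@example.com"))
```

Unlike the documented function, this sketch returns None instead of raising when no matching entry exists.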
- paw.load_environment_variables()¶
Some settings of this module can be set by environment variables. This function loads them.
In particular, server credentials and connection details are set via paw.ENVIRONMENT_VARIABLES prefixed by paw.PAW_ENV_BASE_NAME+'_', e.g. by starting your Python shell with: PAW_PAIRS_DEFAULT_USER='<your PAIRS user name>' PAW_PAIRS_DEFAULT_BASE_URI python
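The prefix convention can be illustrated by collecting all environment variables that start with the base name plus an underscore. This is a sketch under assumptions: the base name "PAW" is inferred from the example variable PAW_PAIRS_DEFAULT_USER, and the helper collect_prefixed_settings is hypothetical and does not reproduce what load_environment_variables does internally.

```python
import os


def collect_prefixed_settings(environ, base_name="PAW"):
    """Return variables starting with base_name + '_' with the prefix stripped."""
    prefix = base_name + "_"
    return {
        name[len(prefix):]: value
        for name, value in environ.items()
        if name.startswith(prefix)
    }


# Example with an explicit mapping, so the result does not depend on the shell.
example_env = {
    "PAW_PAIRS_DEFAULT_USER": "<your PAIRS user name>",
    "HOME": "/home/user",
}
settings = collect_prefixed_settings(example_env)
print(settings)

# The same helper applied to the real process environment:
live_settings = collect_prefixed_settings(os.environ)
```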