Quick Start - Point Query

Your First Query

The most basic Geospatial APIs query is the point query, so we will use it to get you started with the Geospatial APIs SDK:

[1]:
import os
import pandas as pd
import ibmpairs.authentication as authentication
import ibmpairs.client as client
import ibmpairs.query as query

# It is best practice not to include secrets in source code so
# we read an api key, tenant id and org id from operating system
# environment variables.
EI_API_KEY   = os.environ.get('EI_API_KEY')
EI_TENANT_ID = os.environ.get('EI_TENANT_ID')
EI_ORG_ID    = os.environ.get('EI_ORG_ID')

# Authenticate and get a client object.
ei_client = client.get_client(api_key   = EI_API_KEY,
                               tenant_id = EI_TENANT_ID,
                               org_id    = EI_ORG_ID)

# The Geospatial APIs query expressed as a JSON structure
query_json = {
      "layers" : [
          {"type" : "raster", "id" : "49464"}
      ],
      "spatial" : {
          "type" : "point",
          "coordinates" : ["50.92163290389907", "-1.4837586747526244"]
      },
      "temporal" : {"intervals" : [
          {"start" : "2023-01-01T00:00:00Z", "end" : "2023-06-30T00:00:00Z"}
      ]}
  }

# Submit the query
query_result = query.submit(query_json)

# Convert the results to a dataframe
point_df = query_result.point_data_as_dataframe()
# Convert the timestamp to a human readable format
point_df['datetime'] = pd.to_datetime(point_df['timestamp'] * 1e6, errors = 'coerce')
point_df
2023-08-15 09:11:40 - paw - INFO - The client authentication method is assumed to be OAuth2.
2023-08-15 09:11:40 - paw - INFO - Legacy Environment is False
2023-08-15 09:11:43 - paw - INFO - Authentication success.
2023-08-15 09:11:43 - paw - INFO - HOST: https://api.ibm.com/geospatial/run/na/core/v3
2023-08-15 09:11:43 - paw - INFO - TASK: submit STARTING.
2023-08-15 09:11:58 - paw - INFO - TASK: submit COMPLETED.
[1]:
  | layer_id | layer_name | dataset | timestamp | longitude | latitude | value | datetime
0 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1672531200000 | -1.483759 | 50.921633 | 0.0464 | 2023-01-01
1 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1672790400000 | -1.483759 | 50.921633 | 0.03069999999999995 | 2023-01-04
2 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1672963200000 | -1.483759 | 50.921633 | 0.16830000000000012 | 2023-01-06
3 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1673222400000 | -1.483759 | 50.921633 | 0.0746 | 2023-01-09
4 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1673395200000 | -1.483759 | 50.921633 | 0.34330000000000016 | 2023-01-11
... | ... | ... | ... | ... | ... | ... | ... | ...
65 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1687046400000 | -1.483759 | 50.921633 | 0.02200000000000002 | 2023-06-18
66 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1687219200000 | -1.483759 | 50.921633 | 0.04310000000000014 | 2023-06-20
67 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1687651200000 | -1.483759 | 50.921633 | 0.5886 | 2023-06-25
68 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1687910400000 | -1.483759 | 50.921633 | 0.4386000000000001 | 2023-06-28
69 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | 1688083200000 | -1.483759 | 50.921633 | -0.01739999999999997 | 2023-06-30

70 rows × 8 columns

The above query requests NDVI (normalized difference vegetation index) values from layer 49464 of the High res imagery (ESA Sentinel 2) dataset for a location in Southampton, UK, at latitude 50.92 and longitude -1.48.

Geospatial APIs returns 70 rows of data, which are now stored in the point_df data frame.

Point queries such as the one above are special in that they are answered synchronously: the result is returned directly from the submit call. This makes them particularly suited to testing, exploration and experimentation. If you are unsure about the data you are interested in (its spatial coverage, frequency or temporal extent), start with a point query. Note, however, that some advanced features, most notably user-defined functions, are not available for point queries.

Time intervals such as:

{"start" : "2023-01-01T00:00:00Z", "end" : "2023-06-30T00:00:00Z"}

are interpreted as follows: both the start time and the end time are included. In other words, the interval is closed at both ends: 2023-01-01T00:00:00Z <= t <= 2023-06-30T00:00:00Z. The first example is consistent with this: its results include both 2023-01-01 and 2023-06-30.
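To make the inclusive bounds concrete, here is a small, self-contained sketch that checks whether a timestamp in epoch milliseconds (the format of the timestamp column above) falls inside such an interval. The helpers `parse_iso` and `in_interval` are hypothetical and not part of the SDK; the check runs locally and only mirrors the rule stated above.

```python
from datetime import datetime, timezone


def parse_iso(ts: str) -> datetime:
    """Parse an ISO 8601 UTC string such as '2023-01-01T00:00:00Z'."""
    # datetime.fromisoformat() accepts a trailing 'Z' only from Python 3.11,
    # so replace it with an explicit UTC offset for older versions.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def in_interval(epoch_ms: int, interval: dict) -> bool:
    """Hypothetical helper: True if epoch_ms lies in [start, end], both ends included."""
    start = parse_iso(interval["start"])
    end = parse_iso(interval["end"])
    t = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
    return start <= t <= end


interval = {"start": "2023-01-01T00:00:00Z", "end": "2023-06-30T00:00:00Z"}
print(in_interval(1672531200000, interval))  # first timestamp above (2023-01-01): True
print(in_interval(1688083200000, interval))  # last timestamp above (2023-06-30): True
print(in_interval(1688169600000, interval))  # 2023-07-01, outside the interval: False
```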

Understanding the Example

We start with various import statements:

import os                                        # used to read environment variables
import pandas as pd                              # used to work with the results as a data frame
import ibmpairs.authentication as authentication # deals with Geospatial APIs authentication
import ibmpairs.client as client                 # represents an authenticated HTTP client
import ibmpairs.query as query                   # manages the submission of queries and retrieval of results

After the imports, we use an API key, a tenant ID (or client ID) and an org ID, read from environment variables, to create an authenticated HTTP client:

ei_client = client.get_client(api_key   = EI_API_KEY,
                              tenant_id = EI_TENANT_ID,
                              org_id    = EI_ORG_ID)

This step is required before you submit any queries, but you only need to do it once.

The most interesting part of the above example is the definition of the actual query JSON that we send to Geospatial APIs.

query_json = {
    "layers" : [
      {"type" : "raster", "id" : "49464"}                                  # What - the data layer
    ],
    "spatial" : {"type" : "point", "coordinates" : ["50.92163290389907", "-1.4837586747526244"]},     # Where - the spatial location
    "temporal" : {"intervals" : [
      {"start" : "2023-01-01T00:00:00Z", "end" : "2023-06-30T00:00:00Z"}   # When - the temporal range
    ]}
  }

In general, the query_json object answers three questions: what, where and when. What we are requesting is specified by the value of the layers key. Here, we request a single raster layer with ID 49464, the NDVI layer in the High res imagery (ESA Sentinel 2) dataset. Next, the spatial key defines the spatial coverage of the query. In the above, we request data for a single point, given in the format [latitude, longitude]. Note that longitudes in Geospatial APIs range from -180 to +180 degrees; values larger than +180 lead to error messages. Similarly, latitudes range from -90 to +90 degrees. Finally, the temporal key defines a single time range.
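A quick local check can catch out-of-range coordinates before a query is submitted. The sketch below assumes the flat [latitude, longitude, ...] layout used in this document, with values passed as strings; `check_point_coordinates` is a hypothetical helper, not an SDK function. Note that it cannot detect swapped values that happen to fall within both ranges.

```python
def check_point_coordinates(coordinates: list) -> list:
    """Hypothetical helper: validate a flat [lat, lon, lat, lon, ...] list.

    Returns a list of (latitude, longitude) float pairs, or raises ValueError.
    """
    if len(coordinates) % 2 != 0:
        raise ValueError("coordinates must contain latitude/longitude pairs")
    pairs = []
    for i in range(0, len(coordinates), 2):
        lat, lon = float(coordinates[i]), float(coordinates[i + 1])
        if not -90 <= lat <= 90:
            raise ValueError(f"latitude {lat} is outside [-90, 90]")
        if not -180 <= lon <= 180:
            raise ValueError(f"longitude {lon} is outside [-180, 180]")
        pairs.append((lat, lon))
    return pairs


# The Southampton point from the first example passes the check.
print(check_point_coordinates(["50.92163290389907", "-1.4837586747526244"]))

# Swapping latitude and longitude for San Francisco is caught.
try:
    check_point_coordinates(["-122.4194", "37.7749"])
except ValueError as err:
    print(err)
```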

Subsequently we submit the query to Geospatial APIs. As this is a point query, the result is returned directly from the submit method call:

query_result = query.submit(query_json)

Note that we don’t explicitly need to tell the query object to use the authenticated client we created previously as it finds it automatically.

Geospatial APIs returns the result of a point query as JSON data. We use a helper method to turn this data into a local data frame:

point_df = query_result.point_data_as_dataframe()

From this point on all the data is in a local data frame and we can operate on it as we would any other data frame.
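As an illustration, the sketch below rebuilds a small data frame from ten of the rows printed in the first example (so it runs without credentials), converts the millisecond timestamps with `unit="ms"` (an alternative to the `* 1e6` conversion used earlier) and takes the maximum NDVI per month. In a live session you would apply the same pandas calls to point_df directly; the monthly-maximum step is just an example of local analysis, not something the SDK requires.

```python
import pandas as pd

# Ten (timestamp, value) rows copied from the first example's output:
# rows 0-4 (January 2023) and rows 65-69 (June 2023).
rows = [
    (1672531200000, 0.0464),
    (1672790400000, 0.03069999999999995),
    (1672963200000, 0.16830000000000012),
    (1673222400000, 0.0746),
    (1673395200000, 0.34330000000000016),
    (1687046400000, 0.02200000000000002),
    (1687219200000, 0.04310000000000014),
    (1687651200000, 0.5886),
    (1687910400000, 0.4386000000000001),
    (1688083200000, -0.01739999999999997),
]
df = pd.DataFrame(rows, columns=["timestamp", "value"])

# Timestamps are milliseconds since the Unix epoch (UTC).
df["datetime"] = pd.to_datetime(df["timestamp"], unit="ms")

# Maximum NDVI per calendar month, indexed by month number (1 = January).
monthly_max = df.groupby(df["datetime"].dt.month)["value"].max()
print(monthly_max)
```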

A Not So Minimal Working Example

Most of this documentation is concerned with extensions to the query_json object. Once again, let's jump straight into a working example.

The layer IDs used here can be found using the catalogue sub-module.

[2]:
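query_json = {
    "layers" : [
        # Layer 91: daily precipitation, averaged (Mean) over its own time range
        {
            "type" : "raster", "id" : "91",
            "temporal" : {"intervals" : [
                {"start" : "2022-12-01T00:00:00Z", "end" : "2023-06-30T00:00:00Z"}
            ]},
            "aggregation" : "Mean"
        },
        # Layer 49464: NDVI, maximum (Max) over its own time range
        {
            "type" : "raster", "id" : "49464",
            "temporal" : {"intervals" : [
                {"start" : "2023-01-01T00:00:00Z", "end" : "2023-06-30T00:00:00Z"}
            ]},
            "aggregation" : "Max"
        }
    ],
    # Two points: New York (40.7128, -74.006) and San Francisco (37.7749, -122.4194)
    "spatial" : {"type" : "point", "coordinates" : ["40.7128", "-74.006", "37.7749", "-122.4194"]},
    # Default time range for layers without their own temporal block
    "temporal" : {"intervals" : [
        {"start" : "2023-01-01T00:00:00Z", "end" : "2023-06-30T00:00:00Z"}
    ]}
}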

query_result = query.submit(query_json)

point_df = query_result.point_data_as_dataframe()

point_df
2023-08-15 09:11:58 - paw - INFO - TASK: submit STARTING.
2023-08-15 09:13:53 - paw - INFO - TASK: submit COMPLETED.
[2]:
  | layer_id | layer_name | dataset | longitude | latitude | value | aggregation | alias
0 | 49464 | Normalized difference vegetation index | High res imagery (ESA Sentinel 2) | -122.4194 | 37.7749 | 0.08440000000000003 | Max | 49464.1672531200000>1688083200000
1 | 91 | Daily precipitation | Daily US weather (PRISM) | -122.4194 | 37.7749 | 4.467446735076344 | Mean | 91.1669852800000>1688083200000
2 | 91 | Daily precipitation | Daily US weather (PRISM) | -74.0060 | 40.7128 | 3.035630824596353 | Mean | 91.1669852800000>1688083200000

There is quite a lot going on in the above example. To begin, we are requesting data from two different layers:

91 - Daily precipitation from the Daily US weather (PRISM) dataset

{
    "type" : "raster", "id" : "91",
    "temporal" : {"intervals" : [
        {"start" : "2022-12-01T00:00:00Z", "end" : "2023-06-30T00:00:00Z"}
    ]},
    "aggregation" : "Mean"
}

49464 - NDVI from the High res imagery (ESA Sentinel 2) dataset

{
    "type" : "raster", "id" : "49464",
    "temporal" : {"intervals" : [
        {"start" : "2023-01-01T00:00:00Z", "end" : "2023-06-30T00:00:00Z"}
    ]},
    "aggregation" : "Max"
}

For each of these we use a different temporal range, and we aggregate both layers over their respective time ranges: Mean in the case of 91 and Max in the case of 49464. A layer can appear multiple times, for example once with Mean aggregation, once with Sum aggregation and once without; the results will then reflect the three different requests. The temporal aggregation functions currently supported are Mean, Max, Min and Sum.
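Because a layer entry is just a dictionary, repeated entries like this are easy to generate. The sketch below assumes only the layer keys shown above (`type`, `id`, `temporal`, `aggregation`); `layer_spec` is a hypothetical helper that builds the entries, here for layer 91 requested once with Mean, once with Sum and once without aggregation.

```python
SUPPORTED_AGGREGATIONS = {"Mean", "Max", "Min", "Sum"}


def layer_spec(layer_id, start, end, aggregation=None):
    """Hypothetical helper: build one entry for the "layers" list of a query."""
    spec = {
        "type": "raster",
        "id": str(layer_id),
        "temporal": {"intervals": [{"start": start, "end": end}]},
    }
    if aggregation is not None:
        if aggregation not in SUPPORTED_AGGREGATIONS:
            raise ValueError(f"unsupported aggregation: {aggregation}")
        spec["aggregation"] = aggregation
    return spec


start, end = "2022-12-01T00:00:00Z", "2023-06-30T00:00:00Z"
layers = [
    layer_spec(91, start, end, "Mean"),  # one averaged value over the interval
    layer_spec(91, start, end, "Sum"),   # one summed value over the interval
    layer_spec(91, start, end),          # no aggregation: one value per timestamp
]

query_json = {
    "layers": layers,
    "spatial": {"type": "point", "coordinates": ["40.7128", "-74.006"]},
    "temporal": {"intervals": [{"start": start, "end": end}]},
}
print(len(query_json["layers"]))
```

The resulting query_json could then be passed to query.submit exactly like the hand-written version above.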

The spatial specification describes two points using an array:

"spatial" : {"type" : "point",  "coordinates" : ["40.7128", "-74.006", "37.7749", "-122.4194"]},

The format is [lat-point-1, lon-point-1, lat-point-2, lon-point-2]. In general, data is returned for each layer, for each timestamp (or once per aggregation) and for each point. Note that the output above has only three rows: no NDVI value was returned for the New York point.
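When working with several locations, it can be convenient to keep them as (latitude, longitude) pairs and flatten them into the list expected by the spatial block. `flatten_points` is a hypothetical helper; it converts the numbers to strings because the examples above pass coordinates as strings.

```python
def flatten_points(points):
    """Hypothetical helper: [(lat, lon), ...] -> ["lat1", "lon1", "lat2", "lon2", ...]."""
    return [str(value) for lat, lon in points for value in (lat, lon)]


points = [
    (40.7128, -74.006),    # New York
    (37.7749, -122.4194),  # San Francisco
]
spatial = {"type": "point", "coordinates": flatten_points(points)}
print(spatial["coordinates"])  # ['40.7128', '-74.006', '37.7749', '-122.4194']
```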

The temporal section at the end of the above query, outside the layers block, gives a default time range that is used if an element of the layers block has no time range of its own. In the above example it is redundant; however, the current implementation requires its presence even if the information is not used.
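The fallback rule can be expressed in a few lines. The sketch below is a local illustration of the behaviour described above, not the service's implementation; `effective_intervals` is a hypothetical helper. The query keeps its top-level temporal block, since the current implementation requires it.

```python
def effective_intervals(query_json):
    """Hypothetical helper: pair each layer ID with the time intervals it will use.

    A layer's own "temporal" block takes precedence; otherwise the query-level
    "temporal" block acts as the default.
    """
    default = query_json["temporal"]["intervals"]
    return [
        (layer["id"], layer.get("temporal", {}).get("intervals", default))
        for layer in query_json["layers"]
    ]


query_json = {
    "layers": [
        {"type": "raster", "id": "91",
         "temporal": {"intervals": [{"start": "2022-12-01T00:00:00Z", "end": "2023-06-30T00:00:00Z"}]}},
        {"type": "raster", "id": "49464"},  # no temporal block: falls back to the default
    ],
    "spatial": {"type": "point", "coordinates": ["40.7128", "-74.006"]},
    "temporal": {"intervals": [{"start": "2023-01-01T00:00:00Z", "end": "2023-06-30T00:00:00Z"}]},
}

for layer_id, intervals in effective_intervals(query_json):
    print(layer_id, intervals[0]["start"], intervals[0]["end"])
```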