PAIRS User’s Manual & API Documentation

Table of Contents
1. Introduction
2. PAIRS access
   1) Web address & Log in
3. PAIRS web menu
   1) Query
      3.1.1. Submit new
      3.1.2. Area of interest
   2) JOBS
   3) Metadata
      3.3.1. Dataset
      3.3.2. Data layers
      3.3.3. Data regions
   4) Administration
   5) Logout
4. Join Datasets and Filter with Conditions
   1) Temporal data retrieval from PAIRS
   2) Data filtering and joining principles
   3) Add condition (filtering and joining)
   4) Aggregate
5. Examples of web query
6. PAIRS API
   1) Dataset (/ws/datasets)
   2) Datalayer (/ws/datalayers)
   3) Submitting a Query (/ws/query/submit)
      6.3.1. Spatial Coverage
      6.3.2. Temporal Coverage
      6.3.3. Data Coverage
      6.3.4. Filtering
   4) Query Job (/ws/queryjobs)
   5) Area of Interest (/ws/queryaois)
   6) Query Examples
      6.6.1. A single point query on DataLayer 111
      6.6.2. Rectangular area query
      6.6.3. Examples with GFS and MODIS
      6.6.4. Using wget
Appendix A. Details of Datasets
   1) High resolution satellite biweekly Landsat 8
   2) High resolution satellite biweekly Landsat 8 (SR)
   3) Medium resolution satellite daily: Aqua (13), Aqua (09 SR), Terra (13), Terra (09 SR)
   4) Prism Climate Data
   5) USA weather forecast
   6) California weather condition measurements
   7) Global weather forecast
   8) ECMWF (European Center for Medium-Range Weather Forecasting)
   9) Historical crop planting map
   10) Elevation
   11) Soil properties
   12) Reference Evapotranspiration
   13) SMT – IBM’s cognitive forecast in USA
   14) SMT-IBM’s Long Term Forecast Globally
Acknowledgement

1. Introduction

Physical Analytics Integrated Data Repository and Services (PAIRS) is a big data analytics platform coupled with a massive store of aligned pre-processed geo-spatial data for macroscopic analytics.
The spatial pre-alignment of disparate data layers is the key differentiator of PAIRS: it drastically accelerates analytics workflows by minimizing data discovery and processing for large-scale analytics. Complex multilayer queries can be completed orders of magnitude faster than through conventional data services. PAIRS is based on open source Hadoop/HBase distributed data technology, and the PAIRS API uses a RESTful Web Service implementation.

PAIRS accepts spatial queries in the form of physical boundaries (polygons, rectangles) or single points, combined with temporal queries in the form of time intervals or single dates. Spatial queries can also be based on physical characteristics of reference data layers (e.g., satellite, weather, soil type), in which case only those areas are returned where the reference data layer displays certain values. These cross-layer queries can be extremely powerful, allowing users to leverage the big-data platform while downloading data only from the areas they are ultimately interested in.

Current datasets include global satellite imagery, weather data, topography, census data, soil properties, land use data, and one-of-a-kind analytics datasets which are updated daily. The analytics layers currently available on PAIRS are (1) a significantly improved short term weather forecast (SMT, approximately 30 % better than other publicly available forecasts) based on machine learning, (2) a significantly improved long term weather forecast (6 months ahead daily forecast) based on machine learning, and (3) global reference evapotranspiration forecasts.

The datasets can be accessed through a web interface (sections 2-5) or an API (section 6). Queried data is returned in standard file formats (e.g., geotiff, csv).

An introduction video to PAIRS is available here (1): https://www.youtube.com/watch?v=Nxwi6x0ObT0

2. PAIRS access

1) Web address & Log in

PAIRS web access is provided through (2) http://pairs.mmthub.com (to get the most recent updates, refresh the browser periodically).
PAIRS access for IBM internal users is provided through (3) http://pairs.watson.ibm.com:8080/pairs/
Fig. 1 shows the log in page.

Fig. 1 Log in page

A user account and password are required to use PAIRS. The signup process is self-explanatory. Once a user account has been granted, the above log-in page appears. Log in with your account and password, after which the main menu will appear. There are 5 items on the main menu: Query, Jobs, Metadata, Administration, and Logout. They are introduced one by one in the next section.

3. PAIRS web menu

1) Query

The Query tab leads to a page where query parameters can be entered. There are two modes: “Submit new” issues a query, and “Area of Interest” lets users upload their own polygons, after which the polygons are available for new queries.

3.1.1. Submit new

A “Submit new” query requires 3 types of inputs: Spatial coverage, Temporal coverage, and Data selection, as seen in Fig. 2 of the Query - Submit New page. Spatial coverage can be defined as “Single Point”, “Polygon”, or “Area”. Temporal coverage can be an “interval” of days or a single “date” of interest. Data selection defines the datasets and parameters of interest for the query. There are currently 4 categories of data available: Satellite, Weather, Survey, and Analytics.

Single point is most convenient for retrieving data for a single location or a set of point locations. Fig. 2 shows the Single point query interface.
A detailed tutorial video of the Point Query is available here (4): http://pairs.mmthub.com/manual/videos/point_query.webm

Please note the sign convention for coordinates: Latitude is positive in the Northern Hemisphere and negative in the Southern Hemisphere, while Longitude is positive in the Eastern Hemisphere and negative in the Western Hemisphere. For example, in the continental USA the Latitude is positive and the Longitude is negative. In Australia, the Latitude is negative and the Longitude is positive. The new interface lets the user click on a point of the map to define the latitude and longitude.

Fig. 3 shows the Polygon query interface. Polygons are predefined in the “Area of Interest” page under the Query menu. Under Polygon query, the list of available polygons is shown, for example “USA-KS” for the state of Kansas. Currently only kml file upload of polygons is supported. Polygons uploaded by the user are shown under the Personal tab, while polygons shared by the administrator and other users are shown under the Group tab.

A rectangular Area query can be defined using the Latitude/Longitude (from SW), the south west corner, and the Latitude/Longitude (to NE), the north east corner of the rectangular area of interest, as shown in Fig. 4. The rectangular area can also be defined by clicking and dragging the mouse on the map.

Fig. 3 Polygon query

Fig. 4 Rectangular Area query

Interval, as shown in Fig. 2, defines the start date and end date of the temporal range of the query. For example, an interval from 2015-07-01 to 2015-07-02 will retrieve data with timestamp >= “2015-07-01 00:00:00” and <= “2015-07-02 00:00:00”.

Date is used to pick a single timestamp for the query. For example, 2015-07-01 00:00:00 will retrieve data from the closest available temporal point before 2015-07-01 00:00:00.

Datasets are grouped into 4 main categories: Satellite, Weather, Survey, and Analytics.
The parameters or bands of a dataset are called data layers. For example, the different bands of satellite images are the layers of a satellite dataset, while the different weather parameters are the layers of a specific weather model. The ECMWF Weather Forecast, for example, has tens of parameters. To date the key weather parameters have been ingested into PAIRS, including temperature, wind, pressure, precipitation, and solar irradiance. The details of the available datasets are listed in the appendix.

Once a dataset is chosen, the corresponding available layers will populate the datalayer field. The user can add datalayers to the selection by highlighting them and clicking the double right arrow. Multiple Datasets and multiple Layers can be selected for the same query.

Click SUBMIT to submit the request to the server; the window will switch to the JOBS page automatically. Each request is logged and saved in the user's account.

Once the data has been retrieved, you can find it under “JOBS” on the main menu. Small datasets should be available quickly, but large datasets can take some time. Unlike a typical web search, you do not need to wait for the result to show up in the JOBS menu: data retrieval is done in the background, and the results will still be there after you log out and log in again.
Here are some empirical guidelines for queries:
- elevation has a very high resolution of ~10 meter, so it is recommended to query an area at the level of a county, typically around 100 square miles or less
- crop planting map and landsat data have a resolution of 30 meter, so it is recommended to query at the level of a medium size state, around 1,000 square miles or less
- MODIS satellite data has a resolution of ~250 meter; this dataset can be reliably queried at the medium size state level, ~250,000 square miles or less
- all weather data have relatively coarse resolution, so they can be queried at the country level, ~millions of square miles

Basic aggregation can be performed by checking the Min, Max, or Mean fields. This aggregates the data to its Minimum, Maximum, or Mean over the selected temporal period.

Additional filtering and join operations can be added using the “Add condition” button, as explained in section 4.

3.1.2. Area of interest

Areas of interest are predefined regions that can be used to submit polygon queries as shown in Fig. 3. They are specified using GIS shape formats. Currently only KML is supported. Fig. 5 shows the Area of Interest page, where users can upload their own kml polygon files.

Fig. 5 Area of Interest

Clicking the Add sign above will bring you to the polygon upload page shown in Fig. 6.

Fig. 6 Upload polygon file in kml format to define Area of Interest

Specify the Key and Name, then click Browse… to choose a kml file on your computer. In this case we choose KS.kml. You can share your polygon files with other users by checking the box next to “Share with group?”. Once the polygon is uploaded, it will show up under “Submit new” queries in the Personal Polygon list, and also in other people’s Group Polygon list if you chose to share it with the group.

2) JOBS

Jobs are queries submitted to PAIRS. Once a query is submitted, you will automatically be directed to the JOBS page.
The JOBS page shows current jobs that are still in progress, with progress percentages, and lists all completed jobs from previous queries. The QueryJobs page is shown in Fig. 7 with its download and visualize functions.

Fig. 7 JOBS page

For a completed job, the user can download all the files in geotiff format by clicking the download button next to the job name. Users are encouraged to download the results and carry out further analytics for their study.

3) Metadata

Metadata contains overviews of available data in terms of Datasets, Layers, and Regions. This menu item is only available for users with administrative privileges.

3.3.1. Dataset

The first item in the drop-down menu of the METADATA menu is Dataset. After clicking the Filter button, a complete list of all the datasets currently available is shown (Fig. 8). This list can be narrowed by filtering on a string entered in the Name field (press Filter again after entering a Name). Clicking the file Size icon in the last row will show the dataset size in MB.

Fig. 8 Available datasets under METADATA, shown after pressing the Filter button

3.3.2. Data layers

The second item in the METADATA drop-down menu is data layer. As discussed above (3.1.1.), a datalayer represents a parameter or band of a dataset. For a chosen dataset, clicking the Filter button will show all the associated datalayers (Fig. 9). Here the layer name is shown together with a Column Family and Column Qualifier, which can be ignored for now.

Fig. 9 Example of available data layers in the case of PRISM data, accessed through METADATA and Data Layers.

3.3.3. Data regions

The third item in the METADATA drop-down menu is data region (Fig. 10). A datalayer is associated with a spatial coverage and a temporal coverage. Data region shows the detailed temporal availability of each region. For example, MODIS satellite data comes in tiles which are defined by a horizontal/vertical index.

Fig. 10 Data Region view

4) Administration

Password change is done through the Administration page, as seen in Fig. 11.

Fig. 11 Password change under Administration tab

5) Logout

Logout will bring you back to the log in page.

4. Join Datasets and Filter with Conditions

1) Temporal data retrieval from PAIRS

It is very easy to join datasets in PAIRS in both spatial and temporal terms. PAIRS supports two types of temporal query: snapshot of a single day (Date) and Interval.

• Snapshot (Date): In this mode, PAIRS will return a single snapshot for each of the selected datalayers. Only one timestamp is entered in the query, and for each data layer the closest date at or before the snapshot is returned.

Fig. 12 Illustration of snapshot temporal data query

In this case, if all 3 datalayers were selected, PAIRS would return the 01/01/14 timestamp of A, the 07/01/14 timestamp of B, and the 01/01/14 timestamp of C. The timestamp chosen is the closest timestamp at or before the snapshot.

• Interval: In this mode, PAIRS will return all the data that falls between the two timestamps entered in the query. This can be zero, one, or many timestamps for each chosen data layer.

Fig. 13 Illustration of interval temporal data query

In this query, if all three datalayers are selected, PAIRS would return: the 01/01/14 timestamp of A; the 12/01/13, 01/05/14, 07/01/14, and 12/01/14 timestamps of B; and the 01/01/14 timestamp of C.

2) Data filtering and joining principles

PAIRS allows different filters to be applied during a query, so that only the filtered data is returned for use in your analytics. Filtering works together with the spatial selection functions described under the Query menu, and only for polygon or rectangular regions, not for single point queries. Here are some examples.

Query 1, simple filter on the same layer selected:

Fig. 14 Filtering on single data layer on a chosen spatial area

Here the filter (Data Layer A EQ 8) is applied on the same layer that is selected in the data coverage. The result looks like the raster on the right side of Fig. 14.

Query 2, filter applied on a different layer with the same resolution:

Here the layer used in the filter (Data Layer B EQ 4) is not among the selected layers. PAIRS will apply the filter to find the spatial coverage and return the data for the selected layers with the filter applied, as shown in Fig. 15.

Fig. 15 Filtering on different datalayers with the same spatial grid resolution

Query 3, filter applied on a different layer with a different resolution:

In this case the filter (Data Layer C EQ 5) is applied on a different layer that also has a different (lower) resolution. PAIRS will align all the layers using the higher resolution and return the filtered data, see Fig. 16.

Fig. 16 Filtering and joining datalayers with different spatial grid resolution

3) Add condition (filtering and joining)

The filtering and joining functions introduced in 4.2 are realized by using the “Add Condition” button at the bottom of the query page. The Add condition button lets users add operations that filter the data for the selected layers in polygon or rectangular spatial queries, as shown in Fig. 17.

OPERATION defines the operation that is applied. The options are:
• Equals to: value needs to be equal to VALUE
• Greater than: value needs to be greater than VALUE
• Less than: value needs to be lower than VALUE
• Between: value needs to be in between two values
• Among: value needs to be among the list of values

VALUE is the value that is applied to the condition.

Multiple conditions can be connected with logical operators. Currently there are only two operators available: AND and OR. “AND” will only return true if all the conditions are true. “OR” will return true if any of the conditions is true.
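The five operations and the two connectors can be pictured as a per-pixel test. The following Python sketch is only an illustration and not PAIRS code: the function name `matches`, the dictionary of layer values, and the choice to treat “Between” as excluding its boundary values (as the API section states for BT) are all assumptions. It handles a single connector for the whole list of conditions.

```python
# Illustrative model of "Add condition" filters; not the PAIRS implementation.
OPERATIONS = {
    "Equals to": lambda v, arg: v == arg,
    "Greater than": lambda v, arg: v > arg,
    "Less than": lambda v, arg: v < arg,
    # Assumption: boundaries excluded, as described for BT in the API section.
    "Between": lambda v, arg: arg[0] < v < arg[1],
    "Among": lambda v, arg: v in arg,
}


def matches(pixel, conditions, connector="AND"):
    """Check one pixel against (layer, operation, value) conditions.

    pixel maps a layer name to that layer's value at this location.
    connector is "AND" (all conditions must hold) or "OR" (any one holds).
    """
    results = [OPERATIONS[op](pixel[layer], arg) for layer, op, arg in conditions]
    return all(results) if connector == "AND" else any(results)


# Hypothetical pixel with values for two layers named A and B.
pixel = {"A": 8, "B": 4}
conditions = [("A", "Equals to", 8), ("B", "Greater than", 5)]
print(matches(pixel, conditions, "AND"))  # False: the condition on B fails
print(matches(pixel, conditions, "OR"))   # True: the condition on A holds
```

A pixel would be kept in the query result only when the test returns True.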
In cases where multiple conditions are connected with both AND and OR, AND takes precedence over OR. For example, a filter entered as A OR B AND C, where A, B, and C are conditions, will be executed as A OR (B AND C).

Fig. 17 Data filtering & joining

4) Aggregate

In cases where a temporal aggregate of a data layer is of interest, PAIRS offers the option of downloading only the calculated aggregate rather than the complete set of raw data timestamps. The aggregation functions currently offered are min, max, and mean; they are chosen by checking the corresponding box. Multiple aggregation functions can be chosen in the same query. Fig. 18 shows such an example. The functions are applied over the selected temporal period.

Fig. 18 Aggregate retrieved data

5. Examples of web query

Here are a few video demos showing examples of cross joins of different datalayers.
(5) https://www.youtube.com/watch?v=aDlHsxyRlys (Orange Farms in Florida)
(6) https://www.youtube.com/watch?v=Bx_c1pykelQ (Wild Fire Potential)
(7) https://www.youtube.com/watch?v=igJcm6uWFcQ (Multiple demos)

6. PAIRS API

PAIRS provides an API that lets users write scripts to perform queries. Use your PAIRS access URL <PAIRS URL>: http://pairs.mmthub.com/ (IBM internal users can use http://pairs.watson.ibm.com:8080/pairs/ instead) as a prefix to the /ws/… paths described below, i.e. <PAIRS URL>/ws/..., e.g. http://pairs.mmthub.com/pairs/ws/datasets/list

Returned data will be in JSON format for point queries, and in GeoTiff format for area/polygon queries. Table 1 in Appendix A shows the current list of datasets with their id, key, name, level, and status.
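As a minimal sketch of the prefix rule above, the following Python helper joins a base URL with a /ws/ path and optional query parameters. The helper name `ws_url` is hypothetical, and the default base is simply the first access URL quoted above; pass whichever base URL applies to your deployment, since the text shows the example link with an extra /pairs/ segment.

```python
from urllib.parse import urlencode

# Base URL quoted in this section; replace it with your own PAIRS access URL.
PAIRS_URL = "http://pairs.mmthub.com/"


def ws_url(path, base=PAIRS_URL, **params):
    """Prefix a /ws/... path with the base URL and append query parameters."""
    url = base.rstrip("/") + "/" + path.lstrip("/")
    if params:
        # Keep commas and slashes readable, as in the URLs shown in this manual.
        url += "?" + urlencode(params, safe=",/")
    return url


print(ws_url("/ws/datasets/list"))
print(ws_url("/ws/datasets/list", base="http://pairs.watson.ibm.com:8080/pairs/"))
```

The sketch only builds URL strings; sending the request and authenticating are left to the HTTP client of your choice.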
1) Dataset (/ws/datasets)

A dataset object is defined by the following properties:
• id (numeric): unique ID of a dataset
• key (text): unique string key
• name (text): the dataset name
• level (numeric): the PAIRS resolution level associated with the dataset
• layers (list<Datalayer>): list of all datalayers of this dataset

These are the operations available for datasets:
• Get (/ws/datasets/<dataset ID>): returns a full description of the dataset with the ID provided, e.g. /ws/datasets/5
• List (/ws/datasets/list): returns a list of all datasets available
• Search (/ws/datasets/search): returns a list of all datasets that satisfy the filter; any dataset property can be used to filter this list, e.g. /ws/datasets/search?name=satellite

2) Datalayer (/ws/datalayers)

The PAIRS datalayer represents a layer of data in raster format. A datalayer has a spatial as well as a temporal coverage. These are the properties associated with datalayers:
• id (numeric): unique ID of a datalayer
• name (text): the datalayer's name
• dataset (text): the parent dataset of this datalayer
• type (text): the datatype; the options available are
   bt: byte (integer), 1 byte
   sh: short (integer), 2 bytes
   in: integer (integer), 4 bytes
   fl: float (floating point number), 4 bytes
   db: double (floating point number), 8 bytes

These are the operations available for datalayers:
• Get (/ws/datalayers/<datalayer ID>): returns a full description of the datalayer with the ID provided, e.g. /ws/datalayers/111
• List (/ws/datalayers/list): returns a list of all datalayers available
• Search (/ws/datalayers/search): returns a list of all datalayers that satisfy the filter, e.g. /ws/datalayers/search?name=NDVI

3) Submitting a Query (/ws/query/submit)

The PAIRS Web Service interface can be used to submit any kind of query. The URL to submit a query is: /ws/query/submit

Currently three types of queries are supported on PAIRS: rectangular, polygon, and point.
They have a lot in common, except for the spatial coverage specification. The parameters required to submit any kind of query can be divided into 4 major categories: spatial coverage, temporal coverage, data selection, and filtering conditions. Here are some examples together with the definitions:

6.3.1. Spatial Coverage

There are three kinds of Area of Interest (AoI): rectangular area, polygon, and point.

Rectangular Area (spatial area/bounding box, type=square): To perform a query on a rectangular region, two coordinates need to be provided: the lower left (SW) corner and the upper right (NE) corner. Each coordinate is given as latitude followed by longitude, separated by a comma (,). Here are some examples:

/ws/query/submit?type=square&coordinates=38,-122,39,-121&datalayers=26015&intervals=03/31/16
This will query a rectangular region with bounding box from 38N, 122W to 39N, 121W for cloud cover from ECMWF for the date 03/31/2016

/ws/query/submit?type=square&coordinates=-40.55,105.2,-40,105.6&datalayers=26015&intervals=03/31/16
This will query a rectangular region with bounding box from 40.55S, 105.2E to 40S, 105.6E

The response from these queries will be similar to the following:

{
  "id": "1456870865483_69044",
  "status": "Running",
  "start": 1458833191762,
  "pql": null,
  "swLat": 38,
  "swLon": -122,
  "neLat": 39,
  "neLon": -121,
  "exPercent": 0,
  "flag": false
}

The id field above is the job ID you need in order to download the data or check the status of the query.

Polygon (spatial area/polygon): PAIRS supports the submission of queries using a predefined area of interest (AoI). The AoI has to be pre-loaded and associated with the user profile before it can be used to submit new queries. To list the AoIs associated with your profile, see section 6.5 of this document (/ws/queryaois/ lists all your AoIs with their IDs). Once you have the AoI, you can specify it during query submission using the aoi parameter.
Here are some examples:

/ws/query/submit?type=poly&aoi=22&datalayers=25005&intervals=05/01/16
This will use the AoI with ID 22 (this is the Central Valley in California; 25005 is the Rain forecast from CFS)

/ws/query/submit?type=poly&aoi=kansas&datalayers=25005&intervals=05/01/16
This will use the AoI with key equal to kansas

Point (point location): The third query type supported by PAIRS is the point type. In this case you get the data from all selected layers for all the points. Unlike the previous two types, this returns the data in JSON format. Here are some examples:

/ws/query/submit?type=point&coordinates=38,-122&datalayers=111&intervals=02/01/15
This will return data for point 38N, 122W (111 is for crop)

/ws/query/submit?type=point&coordinates=38,122,-20,-121&datalayers=51&intervals=03/31/16
This will return data for multiple points, 38N, 122E and 20S, 121W (51 is MODIS_aqua satellite NDVI)

6.3.2. Temporal Coverage

PAIRS supports two different types of temporal coverage when it comes to querying: snapshot and interval.

Snapshot (date): In this mode, PAIRS will return a snapshot of all the selected datalayers for the given timestamp. Only one timestamp is provided. Here is an example of a snapshot time query:

/ws/query/submit?intervals=02/01/15
This returns the data available in PAIRS closest to and before/on Feb 1, 2015

Interval (time frame): In this mode, PAIRS will return all versions of the data in between the two timestamps defined. Here is an example of a time interval query:

/ws/query/submit?intervals=02/01/14,02/01/15
This returns the data available within the time frame from Feb 1, 2014 to Feb 1, 2015

6.3.3. Data Coverage

Data coverage uses the datalayer ID, which can be retrieved from the metadata API of datalayers (section 6.2). Here are two examples:

/ws/query/submit?datalayers=10
returns datalayer 10 only

/ws/query/submit?datalayers=10,80,90
returns datalayers 10, 80 and 90

6.3.4. Filtering

PAIRS provides different kinds of filters that can be applied during a query. The parameter to specify the filter is filter.pql. Here are some examples:

/ws/query/submit?filter.pql=10 EQ 8
simple filter on the same layer selected: layer ID 10 equals 8

/ws/query/submit?filter.pql=10 GT 8 AND 20 EQ 100
filter applied on a different layer: layer ID 10 greater than 8 and layer 20 equals 100

The filter can be a combination of multiple expressions connected by a logical operator. Each expression has 3 elements: <LAYER> <OPERATOR> <VALUE>.

LAYER is the ID of the layer that this filter should be applied to.

OPERATOR defines the operation that should be applied. The options are:
   EQ (Equals): value needs to be equal to <VALUE>
   GT (Greater than): value needs to be greater than <VALUE>
   LT (Lower than): value needs to be lower than <VALUE>
   BT (Between): value needs to be in between the two comma separated values in <VALUE>, e.g. 10 BT 1,4.6 for the values of the datalayer with ID 10 in between 1 and 4.6 (boundary values not included!); if the first value is greater than the second one, an error is thrown
   AM (Among): value needs to be among a comma separated list <VALUE>, e.g. 8 AM 1,5.3,2.12 for values matching 1, 5.3 or 2.12 in the layer with ID 8

VALUE is the value(s) that should be applied in the expression.

Expressions are connected to each other by a logical operator. There are two options available right now:
   AND
   OR
with logical “and” having precedence over logical “or”. The former will only return true if all the expressions are true, and the latter will return true if any of the expressions is true.

4) Query Job (/ws/queryjobs)

A QueryJob represents any query submitted to PAIRS. These objects are used to retrieve the status of a submitted query, as well as to get the data back from a finished query. These are the properties of a QueryJob object.
• id (text): the unique ID of a query job
• status (text): the current status of the job; the options are:
   Running: Job not finished yet.
   Succeeded: Job successfully finished. Data ready for download.
   Failed: Job failed due to a technical issue.
   NoDataFound: Job finished, but no data found in the area requested.
• start (date): the start time of the current query job
• pql (text): description of all filters (PAIRS Query Language) used on this query job
• folder (text): the folder from which the data can be downloaded through FTP

These are the operations available for a QueryJob:
• Get (/ws/queryjobs/<job ID>): returns a full description of the query job with the ID provided. Below is an example:

{
  "id": "1456870865483_69044",
  "status": "Succeeded",
  "start": 1458833191762,
  "pql": null,
  "swLat": 38,
  "swLon": -122,
  "neLat": 39,
  "neLon": -121,
  "exPercent": 0,
  "flag": false
}

• Download (/ws/queryjobs/download/<job ID>): if the job is done and data has been retrieved, this command will write all the GeoTiff files zipped into one file.

5) Area of Interest (/ws/queryaois)

An area of interest (AoI) is a pre-defined region that can be used to submit queries. It is specified using a GIS shape format. Currently only KML with a single polygon is supported. Section 3.1.2 describes how to load your own AoI. Here are the properties of an AoI object:
• id (text): the unique ID of an area of interest
• key (text): a unique key defined by the user that can be used to submit queries
• name (text): the area of interest name

6) Query Examples

6.6.1. A single point query on DataLayer 111

Note: You can use the PAIRS metadata API to translate names to a corresponding ID.
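The note above can be sketched in Python as follows. This is an assumption-laden illustration: the manual does not show the body returned by /ws/datalayers/search, so the sketch assumes a JSON list of objects carrying the documented datalayer properties id and name. The sample text and the helper name `layer_ids_by_name` are made up for the example; only the IDs 51 and 71 for the MODIS NDVI layers are taken from this manual.

```python
import json


def layer_ids_by_name(search_response_text, name):
    """Return the IDs of all datalayers whose name equals `name` (case-insensitive).

    Assumes the response is a JSON list of objects with "id" and "name" keys.
    """
    layers = json.loads(search_response_text)
    wanted = name.lower()
    return [layer["id"] for layer in layers if layer.get("name", "").lower() == wanted]


# Hypothetical response body for /ws/datalayers/search?name=NDVI
sample = (
    '[{"id": 51, "name": "NDVI", "dataset": "MODIS Aqua"},'
    ' {"id": 71, "name": "NDVI", "dataset": "MODIS Terra"}]'
)
print(layer_ids_by_name(sample, "ndvi"))  # [51, 71]
```

The returned IDs are the values to pass in the datalayers parameter of a query.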
<PAIRS URL>/ws/query/submit?type=point&coordinates=38,-122&datalayers=111&intervals=02/01/15
RESULT:
[{"dataset": {"id": 11, "key": "cropscape-prs", "name": "Historical crop planting map (USA)"}, "datalayer": {"id": 111, "name": "Crop"}, "lat": 38, "lon": -122, "timestamp": 1388534400000, "value": 176, "group": null}]
This is the JSON representation of the data. In particular, it gives the value of the “cropscape” layer at the geo-location 38,-122, namely 176, which the crop index in Appendix A lists as Grassland/Pasture.
6.6.2. Rectangular area query
<PAIRS URL>/ws/query/submit?type=square&coordinates=38,-122,38.5,-121.5&datalayers=111&intervals=02/01/15
RESULT:
{"id": "1448399768880_3736", "status": "Running", "start": 1448471857213, "pql": null, "swLat": 38, "swLon": -122, "neLat": 38.5, "neLon": -121.5, "exPercent": 0}
All area queries on PAIRS run offline: when you submit your query, a job with a corresponding job ID is created to extract the data. The result of your query submission is the job information. As the previous example shows, the status of the job is “Running”. You can then monitor the job using another URL:
<PAIRS URL>/ws/queryjobs/1448399768880_3736
RESULT:
{"id": "1448399768880_3736", "status": "Succeeded", "start": 1448471857213, "pql": null, "swLat": 38, "swLon": -122, "neLat": 38.5, "neLon": -121.5, "exPercent": 0}
Now the job is finished (“Succeeded”), and you can download the data using this URL:
<PAIRS URL>/ws/queryjobs/download/1448399768880_3736
In contrast, <PAIRS URL>/ws/queryjobs/download/1454563001483_0006 will return an error. The last part of the link is the job ID that you received when you submitted your query. This URL lets you download a ZIP file with one or more GeoTIFF images containing the data requested in your query.
These are two very simple queries that you can submit to check that PAIRS works for you. You will have to authenticate to submit them.
PAIRS outputs standard formats.
The first query returns JSON (text), which can be parsed by many languages. The second query returns a GeoTIFF image, a special image format with geo-location information that is readable by most GIS software tools, e.g. QGIS.
6.6.3. Examples with GFS and MODIS
GFS Temperature
<PAIRS URL>/ws/query/submit?type=point&coordinates=51.506,0.114&datalayers=16100&intervals=02/01/16
MODIS Aqua NDVI
<PAIRS URL>/ws/query/submit?type=point&coordinates=51.506,0.114&datalayers=51&intervals=02/01/16
MODIS Terra NDVI
<PAIRS URL>/ws/query/submit?type=point&coordinates=51.506,0.114&datalayers=71&intervals=02/01/16
6.6.4. Using wget
Most Linux distributions ship with the GNU Wget command line tool. You can use it directly to submit queries and download results from the PAIRS web server. For example, submit a one-by-one degree square area query and save the JSON response, which contains the job ID (say 1448399768897_0783), in response.txt:
wget -O response.txt --user=xxxxxx --password=xxxxxxxx "<PAIRS URL>/ws/query/submit?type=square&coordinates=38,-122,39,-121&datalayers=51&intervals=02/01/16"
Once the job has finished, you can download the result as QueryResult.zip using
wget -O QueryResult.zip --user=xxxxxx --password=xxxxxxxx "<PAIRS URL>/ws/queryjobs/download/1448399768897_0783"
For a point query it is even simpler, since you get the result directly in JSON format (saved as PointQueryResult.txt in this example):
wget -O PointQueryResult.txt --user=xxxxxx --password=xxxxxxxx "<PAIRS URL>/ws/query/submit?type=point&coordinates=38,-122&datalayers=51&intervals=02/01/16"
1. Appendix A. Details of Datasets
Table 1.
List of Datasets
ID | Key | Name | Level | Status
1 | lsat7-etm | Landsat 7 (USGS and NASA satellite imagery) | 21 | beta
2 | lsat8-lev1 | High resolution satellite biweekly Landsat8 | 21 | beta
3 | lsat8-lev2 | High resolution satellite biweekly Landsat8(SR) | 21 | beta
5 | modis-aqua-13prs | Medium resolution satellite daily Aqua (13) | 18 | product
6 | modis-aqua-09-q1 | Medium resolution satellite daily Aqua (09 SR) | 18 | product
7 | modis-terra-13prs | Medium resolution satellite daily Terra (13) | 18 | product
8 | modis-terra-09-q1 | Medium resolution satellite daily Terra (09 SR) | 18 | product
9 | prism-daily-prs | PRISM Climate Data | 14 | product
11 | cropscape-prs | Historical crop planting map (USA) | 21 | product
12 | nam-forecast | USA Weather Forecast | 14 | product
13 | cimis-raster | California weather condition measurements | 15 | product
14 | ned-elevation | Elevation | 23 | product
15 | ibm-analytics | Reference Evapotranspiration | 14 | product
16 | gfs-forecast | Global Weather Forecast | 11 | product
17 | blend2d-forecast | SMT (Self-learning weather modeling and forecasting technology) | 14 | product
18 | soil-gssurgo | Soil properties (USA) | 23 | beta
24 | daymet | Daymet | 16 | beta
25 | cfs-forecast | CFS | 11 | product
26 | ecmwf | ECMWF | 13 | product
1) High resolution satellite biweekly Landsat 8
Worldwide 30m resolution satellite data from Landsat8 every 16 days.
It has the following bands (datalayers) on PAIRS:
Lsat8 bands | PAIRS Name | Unit | Column Family | Resolution | ID
Band 1 - Coastal aerosol | Coastal aerosol | a.u.(-1.2 to 1.2) | c1 | 0.000256 | 201
Band 2 - Blue | Blue | a.u.(-1.2 to 1.2) | c2 | 0.000256 | 202
Band 3 - Green | Green | a.u.(-1.2 to 1.2) | c3 | 0.000256 | 203
Band 4 - Red | Red | a.u.(-1.2 to 1.2) | c4 | 0.000256 | 204
Band 5 - Near Infrared (NIR) | Near Infrared (NIR) | a.u.(-1.2 to 1.2) | c5 | 0.000256 | 205
Band 6 - SWIR 1 | SWIR 1 | a.u.(-1.2 to 1.2) | c6 | 0.000256 | 206
Band 7 - SWIR 2 | SWIR 2 | a.u.(-1.2 to 1.2) | c7 | 0.000256 | 207
Band 8 - Panchromatic | Panchromatic | a.u.(-1.2 to 1.2) | c8 | 0.000256 | 208
Band 9 - Cirrus | Cirrus | a.u.(-1.2 to 1.2) | c9 | 0.000256 | 209
Band 10 - Thermal Infrared (TIRS) 1 | TIRS 1 | K | c10 | 0.000256 | 210
Band 11 - Thermal Infrared (TIRS) 2 | TIRS 2 | K | c11 | 0.000256 | 211
2) High resolution satellite biweekly Landsat 8 (SR)
Landsat8 Level2 is a surface reflectance corrected dataset. It has the same resolution as Landsat8 Level1 and is post-processed by NASA. It has the following datalayers:
Lsat8 SR bands | PAIRS Name | Unit | Column Family | Resolution | ID
Band 1 - Coastal aerosol | Coastal aerosol | a.u.[-2e3,1.6e4] | c1 | 0.000256 | 301
Band 2 - Blue | Blue | a.u.[-2e3,1.6e4] | c2 | 0.000256 | 302
Band 3 - Green | Green | a.u.[-2e3,1.6e4] | c3 | 0.000256 | 303
Band 4 - Red | Red | a.u.[-2e3,1.6e4] | c4 | 0.000256 | 304
Band 5 - Near Infrared (NIR) | Near Infrared (NIR) | a.u.[-2e3,1.6e4] | c5 | 0.000256 | 305
Band 6 - SWIR 1 | SWIR 1 | a.u.[-2e3,1.6e4] | c6 | 0.000256 | 306
Band 7 - SWIR 2 | SWIR 2 | a.u.[-2e3,1.6e4] | c7 | 0.000256 | 307
NDVI | NDVI | a.u.[-1,1] | c8 | 0.000256 | 308
SAVI | SAVI | a.u.[-1,1] | c9 | 0.000256 | 309
MSAVI | MSAVI | a.u.[-1,1] | c10 | 0.000256 | 310
EVI | EVI | a.u.[-1,1] | c11 | 0.000256 | 311
CLOUD | CLOUD | a.u.[0,7] | c12 | 0.000256 | 312
NDMI | NDMI | a.u.[-1,1] | c13 | 0.000256 | 313
NBR | NBR | a.u.[-1,1] | c14 | 0.000256 | 314
NBR 2 | NBR 2 | a.u.[-1,1] | c15 | 0.000256 | 315
Cloud Mask | Cloud Mask | a.u. | c16 | 0.000256 | 316
Cloud Mask Confidence | Cloud Mask Confidence | a.u. | c17 | 0.000256 | 317
3) Medium resolution satellite daily: Aqua (13), Aqua (09 SR), Terra (13), Terra (09 SR)
The MODIS instrument flies on two satellites, Aqua and Terra. MODIS Aqua (13) / MODIS Terra (13) have the following datalayers:
MODIS 13 bands | PAIRS Name | Unit | Column Family | Resolution | ID (Aqua / Terra)
250m 16 days NDVI | NDVI | a.u.[-2e2,1e4] | b0 | 0.002048 | 51/71
250m 16 days red | Red reflectance (Band 1) | a.u.[0,1e4] | b1 | 0.002048 | 52/72
250m 16 days NIR | NIR reflectance (Band 2) | a.u.[0,1e4] | b2 | 0.002048 | 53/73
250m 16 days blue | Blue reflectance (Band 3) | a.u.[0,1e4] | b3 | 0.002048 | 54/74
250m 16 days MIR | MIR reflectance (Band 7) | a.u.[0,1e4] | b4 | 0.002048 | 55/75
MODIS Aqua (09) / MODIS Terra (09) have the following datalayers:
MODIS SR bands | PAIRS Name | Unit | Column Family | Resolution | ID (Aqua / Terra)
250m Surface Reflectance Band 1 (620–670 nm) | Band 1 | K | c0 | 0.002048 | 61 / 81
250m Surface Reflectance Band 2 (841–876 nm) | Band 2 | K | c1 | 0.002048 | 62 / 82
4) Prism Climate Data
Prism data consists of historical daily weather condition measurements in the USA. It has the following datalayers:
PRISM parameters | PAIRS Name | Units | Column Family | Resolution | ID
Daily total precipitation (rain+melted snow) | Precipitation | Inch -> mm | b0 | 0.032768 | 91
Daily maximum temperature | Temperature Max | F -> kelvin | b1 | 0.032768 | 92
Daily minimum temperature | Temperature Min | F -> kelvin | b2 | 0.032768 | 93
Daily mean temperature, calculated as (tmax+tmin)/2 | Temperature Mean | F -> kelvin | b3 | 0.032768 | 94
5) USA weather forecast
USA weather forecast is a 3km resolution weather forecast with historical data.
It has the following layers in PAIRS:
Parameters | PAIRS Name | Units | Column Family | Resolution | ID
Ground temperature | Ground temperature | K | c1 | 0.032768 | 1200
Ground relative humidity | Ground relative humidity | % | c2 | 0.032768 | 1300
Solar irradiance | Solar irradiance | W/m2 | c3 | 0.032768 | 1400
Wind toward east | Wind toward east | m/s | c4 | 0.032768 | 1500
Wind toward north | Wind toward north | m/s | c5 | 0.032768 | 1600
Pressure_GND | Pressure_GND | Pa | c6 | 0.032768 | 1700
Precipitation (mm/s) | precip | mm/s | c7 | 0.032768 | 1800
6) California weather condition measurements
The California weather condition measurements dataset provides gridded data for the state of California. It has the following datalayers:
CIMIS parameters | PAIRS Name | Units | Column Family | Unit Conversion | Resolution | ID
Reference evapotranspiration | Reference evapotranspiration | mm | c00 | mm | 0.016384 | 130
Net radiation | Net radiation | W/m2 | c01 | (MJ/m2)/11.57 | 0.016384 | 131
Net long-wave radiation | Net long-wave radiation | W/m2 | c02 | (MJ/m2)/11.57 | 0.016384 | 132
Clear sky solar radiation | Clear sky solar radiation | W/m2 | c03 | (MJ/m2)/11.57 | 0.016384 | 133
Clearness factor | Clearness factor | No unit | c04 | | 0.016384 | 134
Daily minimum air temperature | Daily minimum air temperature | K | c05 | 273.15+C | 0.016384 | 135
Daily maximum air temperature | Daily maximum air temperature | K | c06 | 273.15+C | 0.016384 | 136
Dew point temperature | Dew point temperature | K | c07 | 273.15+C | 0.016384 | 137
Wind speed | Wind speed | m/s | c08 | m/s | 0.016384 | 138
7) Global weather forecast
The Global weather forecast dataset is a worldwide forecast model from NOAA with 0.5 degree spatial resolution. A 10-day forecast is ingested into PAIRS for weather forecasting around the world. All the parameters follow the same conventions as the USA weather forecast, except that the precipitation is a precipitation rate averaged over 3 hours.
Global weather forecast has the following datalayers available on PAIRS:
Parameters | PAIRS Name | Units | Column Family | Resolution | ID
Temp_2m_Gnd | Ground temperature | K | c1 | 0.262144 | 16100
RH_2m_Gnd | Ground relative humidity | % | c2 | 0.262144 | 16200
Total_Sh_Dw_inline | Solar irradiance | W/m2 | c3 | 0.262144 | 16300
Wind_u_10m_Gnd | Wind toward east | m/s | c4 | 0.262144 | 16400
Wind_v_10m_Gnd | Wind toward north | m/s | c5 | 0.262144 | 16500
Pres_GND | Pressure_GND | Pa | c6 | 0.262144 | 16600
precip | precip | mm/s | c7 | 0.262144 | 16700
8) ECMWF (European Centre for Medium-Range Weather Forecasts)
ECMWF issues a global weather forecast 10 days ahead with 0.125 degree spatial resolution, at 3-hourly intervals for the first 6 days and 6-hourly intervals for the remaining 4 days. We have ingested 15 surface parameters into PAIRS, spatially interpolated onto a PAIRS grid of 0.065536 degree. In addition, the accumulated solar radiation parameters have been converted to instantaneous values using a clear sky profile. Accumulated total precipitation and convective precipitation have been converted to the average precipitation rate for each interval.
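To make the precipitation conversion concrete, here is a minimal Python sketch of turning accumulated precipitation into an interval-averaged rate. It assumes that the accumulated totals are given in mm since the start of the forecast at each forecast step; this is a common convention for accumulated forecast fields but is not stated in this manual. The function name and the sample numbers are purely illustrative and are not taken from PAIRS.

```python
def accumulated_to_rates(hours, accumulated_mm):
    """Convert accumulated precipitation (mm since forecast start) into the
    average rate (mm/hour) over each interval between consecutive steps."""
    if len(hours) != len(accumulated_mm):
        raise ValueError("hours and accumulated_mm must have the same length")
    rates = []
    for i in range(1, len(hours)):
        dt = hours[i] - hours[i - 1]
        if dt <= 0:
            raise ValueError("forecast hours must be strictly increasing")
        # Precipitation that fell during the interval, divided by its length.
        rates.append((accumulated_mm[i] - accumulated_mm[i - 1]) / dt)
    return rates


if __name__ == "__main__":
    # Illustrative values only: two 3-hourly steps followed by a 6-hourly step.
    print(accumulated_to_rates([0, 3, 6, 12], [0.0, 1.5, 1.5, 4.5]))
```

Dividing by the actual step length rather than a fixed number of hours lets the same function handle both the 3-hourly and the 6-hourly part of the forecast.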
Parameter | PAIRS Name | Units | Column Family | Resolution | ID
Ground temperature | Ground temperature | K | c1 | 0.065536 | 26001
Solar irradiance_GHI | Global Horizontal Irradiance | W m-2 | c2 | 0.065536 | 26002
Solar irradiance_DNI | Direct Normal Irradiance | W m-2 | c3 | 0.065536 | 26003
Wind toward east_10m | Wind toward east_10m | m s-1 | c4 | 0.065536 | 26004
Wind toward north_10m | Wind toward north_10m | m s-1 | c5 | 0.065536 | 26005
Wind toward east_100m | Wind toward east_100m | m s-1 | c6 | 0.065536 | 26006
Wind toward north_100m | Wind toward north_100m | m s-1 | c7 | 0.065536 | 26007
Dewpoint | Dewpoint | K | c8 | 0.065536 | 26008
Surface Albedo | Surface Albedo | No unit (0-1) | c9 | 0.065536 | 26009
Max_precip_rate | Max precipitation rate | mm/hour | c10 | 0.065536 | 26010
Min_precip_rate | Min precipitation rate | mm/hour | c11 | 0.065536 | 26011
Total_precipitation_rate_avg | Total precipitation | mm/hour | c12 | 0.065536 | 26012
Convective_precip_rate_avg | Convective precipitation | mm/hour | c13 | 0.065536 | 26013
Ground Pressure | Ground Pressure | Pa | c14 | 0.065536 | 26014
Cloud Cover | Cloud Cover | No unit (0-1) | c15 | 0.065536 | 26015
9) Historical crop planting map
USDA issues crop information yearly in 30m resolution. PAIRS has 2014 and 2015 data uploaded.
Details are on the following website: (8) http://nassgeodata.gmu.edu/CropScape/
Name | PAIRS Name | Units | Column Family | Resolution | ID
CROP | Crop | none | b0 | 0.000256 | 111
The crop index is listed here for lookup purposes:
Value | Category | Value | Category | Value | Category
1 | Corn | 55 | Caneberries | 206 | Carrots
2 | Cotton | 56 | Hops | 207 | Asparagus
3 | Rice | 57 | Herbs | 208 | Garlic
4 | Sorghum | 58 | Clover/Wildflowers | 209 | Cantaloupes
5 | Soybeans | 59 | Sod/Grass Seed | 210 | Prunes
6 | Sunflower | 60 | Switchgrass | 211 | Olives
10 | Peanuts | 61 | Fallow/Idle Cropland | 212 | Oranges
11 | Tobacco | 62 | Pasture/Grass | 213 | Honeydew Melons
12 | Sweet Corn | 63 | Forest | 214 | Broccoli
13 | Pop or Orn Corn | 64 | Shrubland | 216 | Peppers
14 | Mint | 65 | Barren | 217 | Pomegranates
21 | Barley | 66 | Cherries | 218 | Nectarines
22 | Durum Wheat | 67 | Peaches | 219 | Greens
23 | Spring Wheat | 68 | Apples | 220 | Plums
24 | Winter Wheat | 69 | Grapes | 221 | Strawberries
25 | Other Small Grains | 70 | Christmas Trees | 222 | Squash
26 | Dbl Crop WinWht/Soybeans | 71 | Other Tree Crops | 223 | Apricots
27 | Rye | 72 | Citrus | 224 | Vetch
28 | Oats | 74 | Pecans | 225 | Dbl Crop WinWht/Corn
29 | Millet | 75 | Almonds | 226 | Dbl Crop Oats/Corn
30 | Speltz | 76 | Walnuts | 227 | Lettuce
31 | Canola | 77 | Pears | 229 | Pumpkins
32 | Flaxseed | 81 | Clouds/No Data | 230 | Dbl Crop Lettuce/Durum Wht
33 | Safflower | 82 | Developed | 231 | Dbl Crop Lettuce/Cantaloupe
34 | Rape Seed | 83 | Water | 232 | Dbl Crop Lettuce/Cotton
35 | Mustard | 87 | Wetlands | 233 | Dbl Crop Lettuce/Barley
36 | Alfalfa | 88 | Nonag/Undefined | 234 | Dbl Crop Durum Wht/Sorghum
37 | Other Hay/Non Alfalfa | 92 | Aquaculture | 235 | Dbl Crop Barley/Sorghum
38 | Camelina | 111 | Open Water | 236 | Dbl Crop WinWht/Sorghum
39 | Buckwheat | 112 | Perennial Ice/Snow | 237 | Dbl Crop Barley/Corn
41 | Sugarbeets | 121 | Developed/Open Space | 238 | Dbl Crop WinWht/Cotton
42 | Dry Beans | 122 | Developed/Low Intensity | 239 | Dbl Crop Soybeans/Cotton
43 | Potatoes | 123 | Developed/Med Intensity | 240 | Dbl Crop Soybeans/Oats
44 | Other Crops | 124 | Developed/High Intensity | 241 | Dbl Crop Corn/Soybeans
45 | Sugarcane | 131 | Barren | 242 | Blueberries
46 | Sweet Potatoes | 141 | Deciduous Forest | 243 | Cabbage
47 | Misc Vegs & Fruits | 142 | Evergreen Forest | 244 | Cauliflower
48 | Watermelons | 143 | Mixed Forest | 245 | Celery
49 | Onions | 152 | Shrubland | 246 | Radishes
50 | Cucumbers | 176 | Grassland/Pasture | 247 | Turnips
51 | Chick Peas | 190 | Woody Wetlands | 248 | Eggplants
52 | Lentils | 195 | Herbaceous Wetlands | 249 | Gourds
53 | Peas | 204 | Pistachios | 250 | Cranberries
54 | Tomatoes | 205 | Triticale | 254 | Dbl Crop Barley/Soybeans
10) Elevation
There is a 10-m resolution dataset for elevation for the USA.
Name | PAIRS Name | Units | Column Family | Resolution | ID
Elevation | Elevation | Meter | c1 | 0.000008 | 140
11) Soil properties
We are in the process of ingesting soil property survey data into PAIRS.
Soil | PAIRS Name | Units | Column Family | Resolution | ID
gssurgo_slope_1356998400 | Slope | % | C1 | 0.000256 | 18001
gssurgo_runoff_1356998400 | RunOff | n.u. | C2 | 0.000256 | 18002
gssurgo_component_1356998400 | Component | n.u. | C3 | 0.000256 | 18003
gssurgo_ec_1356998400 | Electrical Conductivity | dS/m | C4 | 0.000256 | 18004
gssurgo_cec_1356998400 | Cation Exchange Capacity | meq/100g | C5 | 0.000256 | 18005
gssurgo_ph_1356998400 | pH | pH | C6 | 0.000256 | 18006
gssurgo_silt_1356998400 | Silt total | % | C7 | 0.000256 | 18007
gssurgo_sand_1356998400 | Sand total | % | C8 | 0.000256 | 18008
gssurgo_clay_1356998400 | Clay total | % | C9 | 0.000256 | 18009
gssurgo_om_1356998400 | Organic Matter | % | C10 | 0.000256 | 18010
gssurgo_bd_1356998400 | BulkDensity (1/3 bar) | g/cm3 | C11 | 0.000256 | 18011
gssurgo_awc_1356998400 | Available Water Holding Capacity | n.u. | C12 | 0.000256 | 18012
gssurgo_sar_1356998400 | Sodium Adsorption Ratio | n.u. | C13 | 0.000256 | 18013
gssurgo_horizondep_1356998400 | Horizon Depth | cm | C14 | 0.000256 | 18014
gssurgo_dep-restrictlayer_1356998400 | Depth to a Restrictive Layer | cm | C15 | 0.000256 | 18015
gssurgo_drainage_1356998400 | Drainage | n.u. | C16 | 0.000256 | 18016
gssurgo_horizon_1356998400 | Horizon | n.u. | C17 | 0.000256 | 18017
gssurgo_surfalbedo_1356998400 | Surface Albedo | n.u. | C18 | 0.000256 | 18018
12) Reference Evapotranspiration
We have multiple one-of-a-kind analytics on PAIRS.
Two of them are in the Weather category: SMT (self-learning weather modeling and forecast) and SMT (long term seasonal forecast). The Evapotranspiration model is hosted under the Analytics category. Once a model has been developed from other datasets on PAIRS and validated, we ingest the derived analytical layers back into PAIRS as a separate dataset. Currently, daily reference evapotranspiration is available for the continental USA as well as on a global scale (at coarser resolution than the USA data layer). Reference evapotranspiration is critical for irrigation forecasting and decision making.
Analytics Layers | PAIRS Name | Units | Comments | Resolution | ID
GFS based evapotranspiration | ET0 | mm/day | ET0 for global scale | 0.262144 | 15200-10
NAM based evapotranspiration | ET0 | mm/day | ET0 for USA | 0.032768 | 15100-10
ECMWF based evapotranspiration | ET0 | mm/day | ET0 for global scale | 0.065536 | 15300-10
13) SMT – IBM’s cognitive forecast in USA
An improved weather forecast based on a model-blending machine learning algorithm is generated daily for the continental USA. The resolution is the same as that of the USA forecast. The solar irradiance and wind speed parameters are particularly important for the renewable energy industry. We deliver the forecast to renewable energy utility customers daily. It has the following datalayers:
Parameters | PAIRS Name | Units | Column Family | Resolution | ID
Temp_2m_Gnd | Ground Temperature | K | c1 | 0.032768 | 17100
RH_2m_Gnd | Ground relative humidity | % | c2 | 0.032768 | 17200
Total_Sh_Dw_inline | Solar irradiance | W/m2 | c3 | 0.032768 | 17300
Wind_speed | Wind speed | m/s | c4 | 0.032768 | 17400
14) SMT – IBM’s Long Term Forecast Globally
A seasonal forecast projecting 6 months ahead is issued by NOAA daily. Based on NOAA’s forecast, we built an improved model using machine learning. The new analytics layers are in the weather category under the name SMT (Long Term Forecast).
It has the following data layers:
Parameters | PAIRS Name | Units | Column Family | Resolution | ID
Ground temperature | Ground temperature | K | C1 | 0.262144 | 25001
Solar irradiance | Solar irradiance | w/m^2 | C2 | 0.262144 | 25002
Wind toward east | Wind toward east | m/s | C3 | 0.262144 | 25003
Wind toward north | Wind toward north | m/s | C4 | 0.262144 | 25004
Categorical Rain | rain_or_not | n.u. | C5 | 0.262144 | 25005
Precip Rate* | precip_rate | mm/hour | C6 | 0.262144 | 25006
Precipitable water* | precip_water | kg/m^2 | C7 | 0.262144 | 25007
Links
1) PAIRS introduction video: https://www.youtube.com/watch?v=Nxwi6x0ObT0
2) PAIRS web address: http://pairs.mmthub.com/
3) PAIRS web address within IBM intranet: http://pairs.watson.ibm.com:8080/pairs/
4) Point query tutorial video: http://pairs.mmthub.com/manual/videos/point_query.webm
5) Cross layer join demo on orange farm: https://www.youtube.com/watch?v=aDlHsxyRlys
6) Cross layer join demo on wildfire potential: https://www.youtube.com/watch?v=Bx_c1pykelQ
7) General demo with multiple examples: https://www.youtube.com/watch?v=MlPhTKE189s
8) Cropscape web address: http://nassgeodata.gmu.edu/CropScape/
2. Acknowledgement
- Landsat 7 and Landsat 8 datasets are derived from the U.S. Geological Survey (USGS)/NASA Landsat Program. The USGS home page is http://www.usgs.gov. The NASA home page is http://www.nasa.gov
- MODIS datasets are derived from USGS MODIS program datasets.
- NED dataset is derived from data available from the USGS. See USGS Visual Identity System Guidance http://www.usgs.gov/visual-id/ for further details.
- NED dataset is distributed by the Land Processes Distributed Active Archive Center (LP DAAC). It is located at USGS/EROS, Sioux Falls, SD.
http://lpdaac.usgs.gov
- Global Forecast System (GFS), North American Mesoscale (NAM) and Climate Forecast System (CFS) datasets are products derived from NOAA datasets. The NOAA home page is http://www.noaa.gov/
- Soil data is derived from SSURGO datasets distributed by USDA under a Creative Commons License. The web page of USDA is http://www.usda.gov/
- ECMWF datasets are derived Type B and Type C products from data and products of the European Centre for Medium-Range Weather Forecasts (copyright © 2016 ECMWF).
- PRISM dataset is derived from PRISM Climate Group, Oregon State University.
- Cropscape data is from the USDA National Agricultural Statistics Service. The web page of NASS is http://nassgeodata.gmu.edu/CropScape/
- Daymet historical weather dataset is derived from the Daymet dataset distributed by Oak Ridge National Laboratory, which is under NASA's EarthData license policy https://earthdata.nasa.gov/
Citation to Daymet data is on this web page: https://daac.ornl.gov/DAYMET/guides/Daymet_mosaics.html#Daymet_m_citation
Thornton, P.E., Running, S.W., White, M.A. 1997. Generating surfaces of daily meteorological variables over large regions of complex terrain. Journal of Hydrology 190: 214-251. http://dx.doi.org/10.1016/S0022-1694(96)03128-9