As a Geographic Information System (GIS) professional, I have worked across a range of GIS environments and utilised thousands of datasets, both spatial and non-spatial.
A common theme I noticed while working with spatial data was how much traditional GIS tools struggled to process very large datasets. Splitting datasets, simplifying geometries and upgrading hardware in the hope of smooth rendering and rapid analysis were, and in many workflows still are, commonplace.
Meanwhile, non-spatial data processes rarely suffered this problem, thanks mostly to years of collective progress across the broader IT industry (and partly because geospatial data is inherently more complex).
With a growing amount of data containing location components, the GIS industry is in a great position to utilise cloud computing to enable geospatial workflows that can scale according to increasing demands.
In this blog, we’ll take a look at how cloud infrastructure is being leveraged by the geospatial community, and some awesome technology coming out of the collaboration between Google Cloud Platform (GCP) and CARTO.
Most traditional data analysis processes are made up of four key components, and these remain more or less the same for geospatial analysis workloads too:

- Ingestion
- Storage
- Analysis
- Visualisation
Until recently, traditional cloud infrastructure did not readily support the geospatial data types and functions needed to carry out these four components.
In 2018, Google announced support for geospatial data types and functions in Google Cloud BigQuery. With this, they solved the first three components of the data analysis equation for geospatial workloads in the cloud.
The first component is ingestion. BigQuery supports both streaming inserts and batch loading of very large geospatial datasets, with geometries supplied as WKT or GeoJSON strings (formats such as KML or shapefiles can be converted beforehand) and stored in native GEOGRAPHY columns. As data is imported, BigQuery's storage layer scales automatically to handle petabytes of spatial data.
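As a minimal sketch of what this looks like in practice (the dataset, table and column names below are hypothetical), a batch-loaded table holding geometry as a plain string can be converted into a native GEOGRAPHY column with a single SQL statement:

```sql
-- Hypothetical names: assumes places_raw was batch-loaded with its
-- geometry held as a WKT string column.
CREATE OR REPLACE TABLE my_dataset.places AS
SELECT
  name,
  ST_GEOGFROMTEXT(wkt_geometry) AS geog  -- parse WKT into a GEOGRAPHY value
FROM my_dataset.places_raw;
```

For GeoJSON-formatted geometry strings, `ST_GEOGFROMGEOJSON` serves the same purpose.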
Running analysis on geospatial data is possible in BigQuery thanks to native support for geography data types and out-of-the-box spatial (ST_) functions. ST_ functions can be used to analyse spatial data, determine spatial relationships between geographical features, and create or edit the geometry of spatial datasets. By combining these functions with the computational power of BigQuery, we can run all of these standard spatial processes on massive datasets in a matter of seconds.
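For instance (the dataset, table and column names here are made up for illustration), a proximity analysis between two large tables becomes a single query:

```sql
-- Hypothetical tables: count points of interest within 1 km of each suburb.
SELECT
  s.suburb_name,
  COUNT(p.poi_id) AS nearby_pois
FROM my_dataset.suburbs AS s
JOIN my_dataset.pois AS p
  ON ST_DWITHIN(s.geog, p.geog, 1000)  -- distance threshold in metres
GROUP BY s.suburb_name;
```

BigQuery distributes the join and aggregation across its compute layer, so the same query scales from thousands to billions of rows without any change.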
This leaves us with the final component of the data analysis process: visualising the results of our geospatial processing and analysis. This has proven one of the hardest components to get right once a dataset reaches millions of rows.
The CARTO BigQuery Tiler enables rendering of your petabyte-scale spatial data by creating and serving tilesets directly out of BigQuery tables.
The diagram above demonstrates how the BigQuery Tiler and CARTO interact: from your BigQuery table (1), the BigQuery Tiler generates a tileset (2); your app requests the map tiles (3), which are served via the CARTO Maps API (4).
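CARTO exposes the Tiler as stored procedures called from within BigQuery itself. The sketch below shows the general shape of such a call; the exact procedure name, hosting project and option keys vary between CARTO releases and regions, so treat this as illustrative and consult CARTO's documentation for the current interface:

```sql
-- Illustrative only: procedure name, project and options differ by
-- CARTO release. The Tiler runs inside BigQuery and writes the
-- generated tileset to a regular BigQuery table.
CALL carto.CREATE_POINT_AGGREGATION_TILESET(
  'my-project.my_dataset.species_locations',  -- source table
  'my-project.my_dataset.species_tileset',    -- output tileset table
  '{"zoom_min": 0, "zoom_max": 12}'           -- tileset options (JSON)
);
```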
This removes the limitations of client-side rendering and opens up a world of possibilities for visualising very large datasets. For example, the screenshot below shows 1.4 billion species location records from the Global Biodiversity Information Facility (GBIF).
If you are interested in learning more about the next generation of spatial data infrastructure, get in touch with the NGIS team.
In addition, if you’d like to see how BigQuery and CARTO would handle your own spatial datasets, you can sign up for a free CARTO account and get started using the BigQuery Tiler right away.
About the author: Dion Fleming
Dion Fleming is a Customer Engineer and the CARTO team lead at Liveli.