Coming from a software engineering discipline with no background in geographic information systems (GIS), I have been picking up all sorts of mapping libraries and terms since I joined NGIS over six months ago.
In today's Geotech Friday blog, I will be sharing some beginner concepts for handling geospatial data and some of the useful data types you may encounter along the way.
Geospatial data is data with location information attached to it: for example, customers linked to their residential postcode, or the latitude and longitude of places on Earth.
Most of the time this data is collected, cleansed, structured and stored in databases, then later retrieved through queries for further analysis or reporting. In software terms, this is called an ETL (Extract, Transform, Load) pipeline.
To make storage and transmission possible, there needs to be an agreed-upon representation format so that there is no misunderstanding between people and software. This is where geolocation file formats and a variety of specifications come into play.
(source: https://xkcd.com/927/ )
All jokes aside, a specification defines a specific way to encode geometry, and sometimes its associated attributes, by reducing features to something that can be described in text. The text is then translated into bytes that computers can understand.
Before diving into some of the different encoding schemes, we first have to understand the building blocks of geospatial shapes. Although they might appear complex at first glance, spatial objects can actually be constructed from two simple math concepts: points and vectors.
The term “vector” in the next section refers to the mathematical vector, not the GIS vector; see the diagram below for a comparison.
(source – left: Ducksters. (2021). Physics for Kids: Basic Vector Math. Ducksters. Retrieved from https://www.ducksters.com/science/physics/vector_math.php)
(source – right: Chapter2_GIS_Fundamentals by University of Massachusetts Amherst http://www.geo.umass.edu/courses/geo494a/Chapter2_GIS_Fundamentals.pdf)
2D points, or coordinates, are typically represented by an (x, y) pair, or equivalently a (longitude, latitude) pair. Vectors are instructions for how the points should be connected to form a shape.
With points and vectors we can derive shapes such as:
These two primitive shapes can be further combined to form:
(Image credit: the geometry images come from Wikipedia, https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry )
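To make the building blocks a little more concrete, here is a minimal sketch in plain Python. It assumes nothing beyond tuples and lists; the names Point, line, ring and is_closed are my own choices for illustration, and the city coordinates are approximate.

```python
# Illustrative only: plain tuples and lists, not any particular library's types.
Point = tuple[float, float]  # (x, y), i.e. (longitude, latitude)

perth: Point = (115.86, -31.95)   # approximate
sydney: Point = (151.21, -33.87)  # approximate

# A line is an ordered sequence of points; the order is the "instruction"
# for how the dots are connected.
line: list[Point] = [perth, sydney]

# A polygon ring is a line whose last point repeats the first, closing the shape.
ring: list[Point] = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 0.0)]


def is_closed(points: list[Point]) -> bool:
    """Return True when the sequence has at least 4 points and ends where it starts."""
    return len(points) >= 4 and points[0] == points[-1]
```

A triangle needs four entries because the starting point is listed twice, once at each end of the ring.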
The most commonly used standards or representations include, but are not limited to:
Well Known Text
WKT is the string representation of a geometry object. It is a long-standing standard published by the OGC (Open Geospatial Consortium) and ISO (International Organization for Standardization).
WKT defines a shape by stating its type, followed by the coordinates. A more compact counterpart is Well Known Binary (WKB), which sacrifices readability by encoding the same data as bytes (usually displayed as a hexadecimal string) in exchange for being quicker for a computer to process. It is worth noting that WKT only contains the location information of the shape, not the attributes related to it.
Example of WKT:
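As a small illustration, here is a minimal Python sketch that builds WKT strings by hand and packs a point into WKB using only the standard struct module. The helper names (point_wkt, polygon_wkt, point_wkb_hex) are my own and not part of any library; in practice a geometry library would do this for you.

```python
import struct


def point_wkt(x: float, y: float) -> str:
    """Format a point as WKT: the type name first, then the coordinates."""
    return f"POINT ({x} {y})"


def polygon_wkt(ring: list[tuple[float, float]]) -> str:
    """Format one closed ring as a WKT polygon (no holes)."""
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON (({coords}))"


def point_wkb_hex(x: float, y: float) -> str:
    """Encode a 2D point as little-endian WKB and show it as a hex string.

    Layout: 1 byte for byte order (1 = little-endian), a 4-byte geometry
    type (1 = Point), then x and y as 8-byte doubles.
    """
    return struct.pack("<BIdd", 1, 1, x, y).hex()


print(point_wkt(30, 10))    # POINT (30 10)
print(polygon_wkt([(30, 10), (40, 40), (20, 40), (30, 10)]))
print(point_wkb_hex(30, 10))
```

The WKB version is always 21 bytes for a 2D point, which is why it is cheaper to process than parsing text, but it is not something you would want to read by eye.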
GeoJSON
GeoJSON is a format built on JSON (JavaScript Object Notation), which usually denotes an object. The "properties" key makes it possible to store related attributes along with the shape, making it a feature rather than pure geometry. GeoJSON is popular within the GIS industry, as well as the wider software industry, and is widely supported by databases and mapping libraries.
Example of a GeoJSON:
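As a rough illustration, the sketch below builds a single GeoJSON Feature as a Python dictionary and serialises it with the standard json module. The attribute values under "properties" are made up for this example, and the coordinates are approximate.

```python
import json

feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON positions are [longitude, latitude], in that order.
        "coordinates": [115.86, -31.95],
    },
    # Hypothetical attributes, made up for this example.
    "properties": {"name": "Perth", "state": "WA"},
}

text = json.dumps(feature, indent=2)
print(text)

# Round-trip: parsing the text gives back the same structure.
assert json.loads(text) == feature
```

Because it is just JSON, any language with a JSON parser can read it, which goes a long way towards explaining its popularity.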
Apart from geodatabases, a Shapefile is another way to store geolocation features. The Shapefile format was developed by Esri (the software company that makes ArcGIS) in the early 1990s. A Shapefile is actually split across multiple files with different extensions, such as .shp (geometry), .shx (index) and .dbf (attributes).
When zipping one up for transfer, it is important to include all of these pieces, not just the file that ends with .shp.
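One way to avoid leaving pieces behind is to zip everything that shares the Shapefile's base name. The following is a minimal sketch using only the Python standard library; zip_shapefile is a hypothetical helper name, and it assumes all the parts sit in the same folder as the .shp file.

```python
import zipfile
from pathlib import Path


def zip_shapefile(shp_path: str, zip_path: str) -> list[str]:
    """Zip a .shp file together with every sibling file sharing its base name."""
    shp = Path(shp_path)
    target = Path(zip_path).resolve()
    # Sidecar files share the base name: roads.shp, roads.shx, roads.dbf, ...
    parts = sorted(
        p for p in shp.parent.iterdir()
        if p.is_file()
        and p.name.startswith(shp.stem + ".")
        and p.resolve() != target  # never zip the archive into itself
    )
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for part in parts:
            archive.write(part, arcname=part.name)
    return [p.name for p in parts]
```

Calling zip_shapefile("data/roads.shp", "roads.zip") on a hypothetical data folder would return the names of the files it bundled, which makes it easy to check that nothing is missing.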
Example of a Shapefile that appears in ArcCatalog:
In comparison to the long-lived Shapefile, GeoPackage (GPKG) is a rising star in the open source community. GPKG is an open, platform-independent data format that relies on SQLite (an on-disk, lightweight database) as its storage container. It supports both raster and vector data, and has the benefits of being portable, easy to read and write, efficient to transfer (one ready-to-use single file) and suitable for mobile applications.
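Since a GPKG file is a SQLite database underneath, Python's built-in sqlite3 module can peek inside one without any GIS library. The sketch below reads the gpkg_contents table, which the GeoPackage specification uses as its catalogue of layers; list_layers is my own helper name and places.gpkg is a hypothetical file.

```python
import sqlite3


def list_layers(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (table_name, data_type) for each layer registered in a GeoPackage."""
    # gpkg_contents is the catalogue table defined by the GeoPackage spec.
    rows = conn.execute(
        "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
    )
    return [(name, kind) for name, kind in rows]


# Usage, assuming a hypothetical file called places.gpkg:
# conn = sqlite3.connect("places.gpkg")
# print(list_layers(conn))
# conn.close()
```

Decoding the geometry column itself takes more work, so for anything beyond a quick look a dedicated library or QGIS is the better tool.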
If you’d like to see GPKG in action, here is a video by Klas Karlsson on basic file operation and creating shapes and layers in QGIS with GPKG format.
So far we’ve walked through four common GIS standards. If you’re interested in how these formats compare, Terramonitor, a Finnish GIS company, has a great comparison grid in their article Shapefile vs. GeoJSON vs. GeoPackage.
With these formats on hand, the next question is: how are we going to manipulate them to make these shapes do something useful? There are a couple of tools we can use to do this.
To display points, lines and polygons on the map, we first need to choose a mapping library. Then we need to find a converter library (or write one ourselves) to translate between what we receive (e.g. from a database or an open dataset) and what we need to feed to the map.
I have worked with the combination of Google Maps and WKT. Since Google Maps doesn't provide native support for WKT, we used a library called Wicket to do the conversion for us. With a couple of lines of code, we pass in a WKT string and the shape is drawn on the map.
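To give a feel for what such a converter does, here is a minimal pure-Python sketch of the "write it ourselves" route. It is not Wicket's API (Wicket is a JavaScript library); wkt_to_geojson is a hypothetical helper that only handles 2D POINT, LINESTRING and POLYGON, and turns them into GeoJSON-style dictionaries.

```python
import re


def _coords(text: str) -> list[list[float]]:
    """Turn 'x y, x y, ...' into [[x, y], [x, y], ...]."""
    return [[float(n) for n in pair.split()] for pair in text.split(",")]


def wkt_to_geojson(wkt: str) -> dict:
    """Convert a 2D POINT, LINESTRING or POLYGON from WKT to a GeoJSON-style dict."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*\((.*)\)\s*", wkt, re.DOTALL)
    if not match:
        raise ValueError(f"not valid WKT: {wkt!r}")
    kind, body = match.group(1).upper(), match.group(2)
    if kind == "POINT":
        return {"type": "Point", "coordinates": _coords(body)[0]}
    if kind == "LINESTRING":
        return {"type": "LineString", "coordinates": _coords(body)}
    if kind == "POLYGON":
        # A polygon holds a list of rings; each ring sits in its own parentheses.
        rings = re.findall(r"\(([^()]*)\)", body)
        return {"type": "Polygon", "coordinates": [_coords(r) for r in rings]}
    raise ValueError(f"unsupported geometry type: {kind}")


print(wkt_to_geojson("POINT (30 10)"))
```

A real converter also has to deal with multi-geometries, empty geometries and extra dimensions, which is a good argument for reaching for an existing library first.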
Analyse geospatial data and shapes
Sometimes we need to do more complicated operations with the geometry and shape attributes, such as computing the centre of a polygon or finding the places within a few kilometres of a given point.
These computationally intensive tasks are best performed in the database directly. Here are a few options:
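For a sense of what those two operations involve, here is a rough pure-Python sketch. It assumes a spherical Earth with a mean radius of 6371 km for distances (the haversine formula) and flat, planar coordinates for the centroid (the shoelace formula); the function names and sample places are my own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_km(origin: tuple[float, float], places: dict, radius_km: float) -> list[str]:
    """Keep the names of places that lie within radius_km of origin."""
    return [
        name for name, (lon, lat) in places.items()
        if haversine_km(origin[0], origin[1], lon, lat) <= radius_km
    ]


def polygon_centroid(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Area-weighted centroid of a closed ring in planar coordinates."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0  # twice the signed area of this edge's triangle
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)
```

Looping over every place like this works for a handful of points, but it checks each one in turn, so it gets slow quickly as the dataset grows.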
PostgreSQL with the PostGIS extension is our go-to when it comes to relational databases. It comes with heaps of spatial query functions. Read the spatial function documentation here.
MongoDB is a popular NoSQL, document-based database with the advantage of being flexible and easy to scale. Watch this short tutorial on DEV Community to get started with MongoDB spatial queries.
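As a taste of what a spatial query looks like on the MongoDB side, the sketch below builds a filter document for "everything within N km of a point" using the $nearSphere operator. It only constructs the dictionary and does not connect to a database; near_query and the "location" field name are assumptions of mine, and the field would need to hold GeoJSON and carry a 2dsphere index.

```python
def near_query(lon: float, lat: float, max_km: float, field: str = "location") -> dict:
    """Build a MongoDB filter for documents within max_km of a point.

    `field` is a hypothetical field name; it must store GeoJSON geometry
    and have a 2dsphere index for $nearSphere to work.
    """
    return {
        field: {
            "$nearSphere": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_km * 1000,  # MongoDB expects metres here
            }
        }
    }


print(near_query(115.86, -31.95, 5))
```

With a driver such as PyMongo, a filter like this would typically be passed to a collection's find method, and the database returns the matches sorted from nearest to farthest.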
If you are into programming like myself and want to experiment, there are a couple of great Python libraries to try out (list kindly suggested by my colleague, Stafford):
About the author: Daphne Yu
Daphne is a Graduate Developer at NGIS.