1  Geographic data in Julia

using Pkg
Pkg.status()
Status `~/work/geocompjl/geocompjl/Project.toml`
  [cbdf2221] AlgebraOfGraphics v0.11.7
  [c9ce4bd3] ArchGDAL v0.10.10
  [13f3f980] CairoMakie v0.15.6
  [a93c6f00] DataFrames v1.8.0
  [0703355e] DimensionalData v0.29.24
  [e37f2e79] FlexiJoins v0.1.38
  [62cb38b5] GeoDataFrames v0.4.0
  [68eda718] GeoFormatTypes v0.4.4
  [cf35fbd7] GeoInterface v1.6.0
  [61d90e0f] GeoJSON v0.8.4
  [db073c08] GeoMakie v0.7.15
  [3251bfac] GeometryOps v0.1.29
  [a90b1aa1] LibGEOS v0.9.6
  [ee78f7c6] Makie v0.24.6
  [20f20a25] MakieCore v0.10.0
  [436b0209] NaturalEarth v0.1.0
  [c94c279d] Proj v1.9.0
  [a3a2b9e3] Rasters v0.14.6
  [10745b16] Statistics v1.11.1
  [2913bbd2] StatsBase v0.34.6
  [e170d443] Tyler v0.2.4

1.1 Introduction

This chapter outlines two fundamental geographic data models — vector and raster — and introduces the main Julia packages for working with them. Before demonstrating their implementation in Julia, we will introduce the theory behind each data model and the disciplines in which they predominate.

The vector data model (Section 1.2) represents the world using points, lines, and polygons. These have discrete, well-defined borders, meaning that vector datasets usually have a high level of precision (but not necessarily accuracy). The raster data model (Section 1.3), on the other hand, divides the surface up into cells of constant size. Raster datasets are the basis of background images used in web-mapping and have been a vital source of geographic data since the origins of aerial photography and satellite-based remote sensing devices. Rasters aggregate spatially specific features to a given resolution, meaning that they are consistent over space and scalable, with many worldwide raster datasets available.

Which to use? The answer likely depends on your domain of application, and the datasets you have access to:

  • Vector datasets and methods dominate the social sciences because human settlements and and processes (e.g., transport infrastructure) tend to have discrete borders.
  • Raster datasets and methods dominate many environmental sciences because of the reliance on remote sensing data.

Julia has strong support for both data models. We will focus on GeoDataFrames.jl and the GeoInterface.jl ecosystem for working with vector data, including the packages GeometryOps.jl and LibGEOS.jl. We will focus on the Rasters.jl package for working with rasters.

There is much overlap in some fields and raster and vector datasets can be used together: ecologists and demographers, for example, commonly use both vector and raster data. Furthermore, it is possible to convert between the two forms (see Chapter 5). Whether your work involves more use of vector or raster datasets, it is worth understanding the underlying data models before using them, as discussed in subsequent chapters.

Important

The distinction between Vector and Raster data models is rather arbitrary, and historically based on (GIS) storage formats. Rasters are structured vector datasets, and depending on the sampling can be seen as a collection of measurement points, or fields (polygons, or faces in a mesh). Modern developments, like discrete global grid systems (hierarchical meshes) or vector data cubes do not follow either data models.

1.2 Vector data

The geographic vector data model is based on points located within a coordinate reference system (CRS). Points can represent self-standing features (e.g., the location of a bus stop), or they can be linked together to form more complex geometries such as lines and polygons. Most point geometries contain only two dimensions (3-dimensional CRSs may contain an additional \(z\) value, typically representing height above sea level).

In this system, London, for example, can be represented by the coordinates (-0.1,51.5). This means that its location is -0.1 degrees east and 51.5 degrees north of the origin. The origin, in this case, is at 0 degrees longitude (a prime meridian located at Greenwich) and 0 degrees latitude (the Equator) in a geographic (‘lon/lat’) CRS (Figure 1.1, left panel). The same point could also be approximated in a projected CRS with ‘Easting/Northing’ values of (530000,180000) in the British National Grid, meaning that London is located 530 \(km\) East and 180 \(km\) North of the origin of the CRS (Figure 1.1, right panel). The location of National Grid’s origin, in the sea beyond South West Peninsular, ensures that most locations in the UK have positive Easting and Northing values.

Tip

The Dutch national grid not only ensures positive values for Easting and Northing, but has shifted its origin so far south that the ranges of Easting and Northing coordinates do not overlap (i.e. the origin is farther south than the Netherlands is wide).

(a) A geographic CRS with an origin at 0° longitude and latitude
(b) A projected CRS with an origin located in the sea west of the South West Peninsula.
Figure 1.1: Illustration of vector (point) data in which location of London (the red X) is represented with reference to an origin (the blue circle).

Illustration of vector (point) data in which location of London (the red X) is represented with reference to an origin (the blue circle). The left plot represents a geographic CRS with an origin at 0° longitude and latitude. The right plot represents a projected CRS with an origin located in the sea west of the South West Peninsula. :::

There is more to CRSs, as described in Section 1.4 and Chapter 6 but, for the purposes of this section, it is sufficient to know that coordinates consist of two numbers representing the distance from an origin, usually in \(x\) and \(y\) dimensions.

TODO: explain the JuliaGeo ecosystem like they explain geopandas E.g GeoInterface defines how to access any geometry, then LibGEOS (wrapping GEOS), GeometryOps, Proj, etc consume such geometries.

1.2.1 Vector data classes

Julia’s geographic vector data model is based on the Simple Features standard, which is an ISO standard for representing vector data. Simple Features defines types of geometries (points, lines, polygons, multipolygons, etc.), as well as “feature collections” that are basically tables of geometries associated with some data.

Starting with the highest level class, feature collections come in two flavours: - Loaded from file (Shapefile.Table, GeoJSON.FeatureCollection, …) - Tables with geometry columns (e.g. DataFrame, but can be any Julia table)

One can easily convert from feature collections to

using GeoDataFrames
df = GeoDataFrames.read("data/world.gpkg")
177×11 DataFrame
171 rows omitted
Row geometry iso_a2 name_long continent region_un subregion type area_km2 pop lifeExp gdpPercap
IGeometr… String? String String String String String Float64 Float64? Float64? Float64?
1 Geometry: wkbMultiPolygon FJ Fiji Oceania Oceania Melanesia Sovereign country 19290.0 885806.0 69.96 8222.25
2 Geometry: wkbMultiPolygon TZ Tanzania Africa Africa Eastern Africa Sovereign country 9.32746e5 5.22349e7 64.163 2402.1
3 Geometry: wkbMultiPolygon EH Western Sahara Africa Africa Northern Africa Indeterminate 96270.6 missing missing missing
175 Geometry: wkbMultiPolygon XK Kosovo Europe Europe Southern Europe Sovereign country 11230.3 1.8218e6 71.0976 8698.29
176 Geometry: wkbMultiPolygon TT Trinidad and Tobago North America Americas Caribbean Sovereign country 7737.81 1.35449e6 70.426 31181.8
177 Geometry: wkbMultiPolygon SS South Sudan Africa Africa Eastern Africa Sovereign country 6.24909e5 1.1531e7 55.817 1935.88
using CairoMakie

plot(df.geometry)

1.2.2 Vector from scratch

1.3 Raster data

1.3.1 Raster from scratch

In this section, we are going to demonstrate the creation of rasters from scratch. We will construct two small rasters, elev and grain, which we will use in examples later in the book. Unlike creating a vector layer (see Section 1.2.2), creating a raster from scratch is rarely needed in practice because aligning a raster with the proper spatial extent is challenging to do programmatically (“georeferencing” tools in GIS software are a better fit for the job). Nevertheless, the examples will be helpful to become more familiar with the Rasters.jl data structures.

using Rasters
import GeoFormatTypes as GFT

Conceptually, a raster is an array combined with georeferencing information, whereas the latter comprises:

  • Lookup vectors for the axes, encoding the spatial coordinates for each grid cell. These take the form of the X and Y dimensions in the raster that you’ll see below.
  • A coordinate reference system (CRS) definition, specifying the association of the raster’s coordinates with the surface of the earth.

Therefore, to create a raster, we first need to have an array with the values, and then supplement it with the georeferencing information. Let’s create the arrays elev and grain. The elev array is a \(6 \times 6\) array with sequential values from 1 to 36. It can be created as follows using base Julia functions.

elev = reshape(UInt8(1):UInt8(36), (6, 6))
6×6 reshape(::UnitRange{UInt8}, 6, 6) with eltype UInt8:
 0x01  0x07  0x0d  0x13  0x19  0x1f
 0x02  0x08  0x0e  0x14  0x1a  0x20
 0x03  0x09  0x0f  0x15  0x1b  0x21
 0x04  0x0a  0x10  0x16  0x1c  0x22
 0x05  0x0b  0x11  0x17  0x1d  0x23
 0x06  0x0c  0x12  0x18  0x1e  0x24

The grain array represents a categorical raster with values 0, 1, 2, corresponding to categories “clay”, “silt”, “sand”, respectively. We will create it from a specific arrangement of pixel values, using reshape.

v = UInt8[
  1, 0, 1, 2, 2, 2, 
  0, 2, 0, 0, 2, 1, 
  0, 2, 2, 0, 0, 2, 
  0, 0, 1, 1, 1, 1, 
  1, 1, 1, 2, 1, 1, 
  2, 1, 2, 2, 0, 2
]
grain = reshape(v, (6, 6))
6×6 Matrix{UInt8}:
 0x01  0x00  0x00  0x00  0x01  0x02
 0x00  0x02  0x02  0x00  0x01  0x01
 0x01  0x00  0x02  0x01  0x01  0x02
 0x02  0x00  0x00  0x01  0x02  0x02
 0x02  0x02  0x00  0x01  0x01  0x00
 0x02  0x01  0x02  0x01  0x01  0x02

Note that in both cases, we are using the uint8 (unsigned integer in 8 bits, i.e., 0-255) data type, which is sufficient to represent all possible values of the given rasters (see ?tbl-numpy-data-types). This is the recommended approach for a minimal memory footprint.

What is missing now is the georeferencing information (see ?sec-using-rasters-jl). In this case, since the rasters are arbitrary, we also set up arbitrary dimension lookups for the x and y axes, where:

  • The origin (\(x_{min}\), \(y_{max}\)) is at -1.5,1.5
  • The raster resolution (\(delta_{x}\), \(delta_{y}\)) is 0.5,-0.5

We can add this information using rasterio.transform.from_origin, and specifying west, north, xsize, and ysize parameters. The resulting transformation matrix object is hereby named new_transform.

new_x = X(range(-1.5, step=0.5, length=6))
new_y = Y(range(1.0, step=-0.5, length=6))
Y 1.0:-0.5:-1.5

We can now construct a Raster object, from the elev array and the dimensions new_x and new_y.
We assign to it a CRS of EPSG:4326 (which encodes that the coordinate system is longitude/latitude on the “standard” WGS84 definition of the Earth’s curvature).

Here, we use the GFT.EPSG(code) constructor to create an object that encodes a reference code under the European Petroleum Survey Group (EPSG) authority’s database of coordinate reference systems.

elev_raster = Raster(elev, (new_x, new_y); crs = GFT.EPSG(4326))
6×6 Raster{UInt8, 2}
├──────────────────────┴──────────────────────────────────────── dims ┐
  X Projected{Float64} -1.5:0.5:1.0 ForwardOrdered Regular Points,
  Y Projected{Float64} 1.0:-0.5:-1.5 ReverseOrdered Regular Points
├───────────────────────────────────────────────────────────── raster ┤
  extent: Extent(X = (-1.5, 1.0), Y = (-1.5, 1.0))
  crs: EPSG:4326
└─────────────────────────────────────────────────────────────────────┘
        1.0     0.5     0.0    -0.5    -1.0    -1.5
 -1.5  0x01    0x07    0x0d    0x13    0x19    0x1f
 -1.0  0x02    0x08    0x0e    0x14    0x1a    0x20
 -0.5  0x03    0x09    0x0f    0x15    0x1b    0x21
  0.0  0x04    0x0a    0x10    0x16    0x1c    0x22
  0.5  0x05    0x0b    0x11    0x17    0x1d    0x23
  1.0  0x06    0x0c    0x12    0x18    0x1e    0x24

The raster can now be plotted in its coordinate system, passing the array elev along with the transformation matrix new_transform to rasterio.plot.show (Figure 1.2).

plot(elev_raster)
Figure 1.2: Plot of the elev raster, a minimal example of a continuous raster, created from scratch

The grain raster can be plotted the same way, as we are going to use the same transformation matrix for it as well (Figure 1.3).

plot(Raster(grain, (new_x, new_y); crs = GFT.EPSG(4326)))
Figure 1.3: Plot of the grain raster, a minimal example of a categorical raster, created from scratch

At this point, we have two rasters, each composed of an array and related dimension lookups. We can work with the raster using Rasters.jl by:

  • Keeping in mind that any other layer we use in the analysis is in the same CRS

Finally, to export the raster for permanent storage, along with the spatial metadata, we need to go through the following steps:

  1. Create a raster file (where we set the lookups and the CRS, among other settings)
  2. Write the array with raster values into the connection
  3. Close the connection

Don’t worry if the code below is unclear; the concepts related to writing raster data to file will be explained in ?sec-data-output-raster. For now, for completeness, and also to use these rasters in subsequent chapters without having to re-create them from scratch, we just provide the code for exporting the elev and grain rasters into the output directory. In the case of elev, we do it as follows with the Rasters.write functions and methods of the Rasters.jl package.

write("output/elev.tif", elev_raster; force = true)
"output/elev.tif"

Note that the CRS we (arbitrarily) set for the elev raster is WGS84, defined using crs=4326 according to the EPSG code.

Exporting the grain raster is done in the same way, with the only differences being the file name and the array we write into the connection.

write("output/grain.tif", Raster(grain, (new_x, new_y); crs = GFT.EPSG(4326)); force = true)
"output/grain.tif"

As a result, the files elev.tif and grain.tif are written into the output directory. We are going to use these small raster files later on in the examples (for example, Section 5.5.1).

Note that the transform matrices and dimensions of elev and grain are identical. This means that the rasters are overlapping, and can be combined into one two-band raster, processed in raster algebra operations (?sec-map-algebra), etc.

1.4 Coordinate reference systems