This chapter outlines two fundamental geographic data models — vector and raster — and introduces the main Julia packages for working with them. Before demonstrating their implementation in Julia, we will introduce the theory behind each data model and the disciplines in which they predominate.
The vector data model (Section 1.2) represents the world using points, lines, and polygons. These have discrete, well-defined borders, meaning that vector datasets usually have a high level of precision (but not necessarily accuracy). The raster data model (Section 1.3), on the other hand, divides the surface up into cells of constant size. Raster datasets are the basis of background images used in web-mapping and have been a vital source of geographic data since the origins of aerial photography and satellite-based remote sensing devices. Rasters aggregate spatially specific features to a given resolution, meaning that they are consistent over space and scalable, with many worldwide raster datasets available.
Which to use? The answer likely depends on your domain of application, and the datasets you have access to:
Vector datasets and methods dominate the social sciences because human settlements and and processes (e.g., transport infrastructure) tend to have discrete borders.
Raster datasets and methods dominate many environmental sciences because of the reliance on remote sensing data.
Julia has strong support for both data models. We will focus on GeoDataFrames.jl and the GeoInterface.jl ecosystem for working with vector data, including the packages GeometryOps.jl and LibGEOS.jl. We will focus on the Rasters.jl package for working with rasters.
There is much overlap in some fields and raster and vector datasets can be used together: ecologists and demographers, for example, commonly use both vector and raster data. Furthermore, it is possible to convert between the two forms (see Chapter 5). Whether your work involves more use of vector or raster datasets, it is worth understanding the underlying data models before using them, as discussed in subsequent chapters.
Important
The distinction between Vector and Raster data models is rather arbitrary, and historically based on (GIS) storage formats. Rasters are structured vector datasets, and depending on the sampling can be seen as a collection of measurement points, or fields (polygons, or faces in a mesh). Modern developments, like discrete global grid systems (hierarchical meshes) or vector data cubes do not follow either data models.
1.2 Vector data
The geographic vector data model is based on points located within a coordinate reference system (CRS). Points can represent self-standing features (e.g., the location of a bus stop), or they can be linked together to form more complex geometries such as lines and polygons. Most point geometries contain only two dimensions (3-dimensional CRSs may contain an additional \(z\) value, typically representing height above sea level).
In this system, London, for example, can be represented by the coordinates (-0.1,51.5). This means that its location is -0.1 degrees east and 51.5 degrees north of the origin. The origin, in this case, is at 0 degrees longitude (a prime meridian located at Greenwich) and 0 degrees latitude (the Equator) in a geographic (‘lon/lat’) CRS (Figure 1.1, left panel). The same point could also be approximated in a projected CRS with ‘Easting/Northing’ values of (530000,180000) in the British National Grid, meaning that London is located 530 \(km\) East and 180 \(km\) North of the origin of the CRS (Figure 1.1, right panel). The location of National Grid’s origin, in the sea beyond South West Peninsular, ensures that most locations in the UK have positive Easting and Northing values.
Tip
The Dutch national grid not only ensures positive values for Easting and Northing, but has shifted its origin so far south that the ranges of Easting and Northing coordinates do not overlap (i.e. the origin is farther south than the Netherlands is wide).
(a) A geographic CRS with an origin at 0° longitude and latitude
(b) A projected CRS with an origin located in the sea west of the South West Peninsula.
Figure 1.1: Illustration of vector (point) data in which location of London (the red X) is represented with reference to an origin (the blue circle).
Illustration of vector (point) data in which location of London (the red X) is represented with reference to an origin (the blue circle). The left plot represents a geographic CRS with an origin at 0° longitude and latitude. The right plot represents a projected CRS with an origin located in the sea west of the South West Peninsula. :::
There is more to CRSs, as described in Section 1.4 and Chapter 6 but, for the purposes of this section, it is sufficient to know that coordinates consist of two numbers representing the distance from an origin, usually in \(x\) and \(y\) dimensions.
TODO: explain the JuliaGeo ecosystem like they explain geopandas E.g GeoInterface defines how to access any geometry, then LibGEOS (wrapping GEOS), GeometryOps, Proj, etc consume such geometries.
1.2.1 Vector data classes
Julia’s geographic vector data model is based on the Simple Features standard, which is an ISO standard for representing vector data. Simple Features defines types of geometries (points, lines, polygons, multipolygons, etc.), as well as “feature collections” that are basically tables of geometries associated with some data.
Starting with the highest level class, feature collections come in two flavours: - Loaded from file (Shapefile.Table, GeoJSON.FeatureCollection, …) - Tables with geometry columns (e.g. DataFrame, but can be any Julia table)
One can easily convert from feature collections to
In this section, we are going to demonstrate the creation of rasters from scratch. We will construct two small rasters, elev and grain, which we will use in examples later in the book. Unlike creating a vector layer (see Section 1.2.2), creating a raster from scratch is rarely needed in practice because aligning a raster with the proper spatial extent is challenging to do programmatically (“georeferencing” tools in GIS software are a better fit for the job). Nevertheless, the examples will be helpful to become more familiar with the Rasters.jl data structures.
usingRastersimportGeoFormatTypes as GFT
Conceptually, a raster is an array combined with georeferencing information, whereas the latter comprises:
Lookup vectors for the axes, encoding the spatial coordinates for each grid cell. These take the form of the X and Y dimensions in the raster that you’ll see below.
A coordinate reference system (CRS) definition, specifying the association of the raster’s coordinates with the surface of the earth.
Therefore, to create a raster, we first need to have an array with the values, and then supplement it with the georeferencing information. Let’s create the arrays elev and grain. The elev array is a \(6 \times 6\) array with sequential values from 1 to 36. It can be created as follows using base Julia functions.
The grain array represents a categorical raster with values 0, 1, 2, corresponding to categories “clay”, “silt”, “sand”, respectively. We will create it from a specific arrangement of pixel values, using reshape.
Note that in both cases, we are using the uint8 (unsigned integer in 8 bits, i.e., 0-255) data type, which is sufficient to represent all possible values of the given rasters (see ?tbl-numpy-data-types). This is the recommended approach for a minimal memory footprint.
What is missing now is the georeferencing information (see ?sec-using-rasters-jl). In this case, since the rasters are arbitrary, we also set up arbitrary dimension lookups for the x and y axes, where:
The origin (\(x_{min}\), \(y_{max}\)) is at -1.5,1.5
The raster resolution (\(delta_{x}\), \(delta_{y}\)) is 0.5,-0.5
We can add this information using rasterio.transform.from_origin, and specifying west, north, xsize, and ysize parameters. The resulting transformation matrix object is hereby named new_transform.
We can now construct a Raster object, from the elev array and the dimensions new_x and new_y.
We assign to it a CRS of EPSG:4326 (which encodes that the coordinate system is longitude/latitude on the “standard” WGS84 definition of the Earth’s curvature).
Here, we use the GFT.EPSG(code) constructor to create an object that encodes a reference code under the European Petroleum Survey Group (EPSG) authority’s database of coordinate reference systems.
The raster can now be plotted in its coordinate system, passing the array elev along with the transformation matrix new_transform to rasterio.plot.show (Figure 1.2).
plot(elev_raster)
Figure 1.2: Plot of the elev raster, a minimal example of a continuous raster, created from scratch
The grain raster can be plotted the same way, as we are going to use the same transformation matrix for it as well (Figure 1.3).
Figure 1.3: Plot of the grain raster, a minimal example of a categorical raster, created from scratch
At this point, we have two rasters, each composed of an array and related dimension lookups. We can work with the raster using Rasters.jl by:
Keeping in mind that any other layer we use in the analysis is in the same CRS
Finally, to export the raster for permanent storage, along with the spatial metadata, we need to go through the following steps:
Create a raster file (where we set the lookups and the CRS, among other settings)
Write the array with raster values into the connection
Close the connection
Don’t worry if the code below is unclear; the concepts related to writing raster data to file will be explained in ?sec-data-output-raster. For now, for completeness, and also to use these rasters in subsequent chapters without having to re-create them from scratch, we just provide the code for exporting the elev and grain rasters into the output directory. In the case of elev, we do it as follows with the Rasters.write functions and methods of the Rasters.jl package.
write("output/elev.tif", elev_raster; force =true)
"output/elev.tif"
Note that the CRS we (arbitrarily) set for the elev raster is WGS84, defined using crs=4326 according to the EPSG code.
Exporting the grain raster is done in the same way, with the only differences being the file name and the array we write into the connection.
write("output/grain.tif", Raster(grain, (new_x, new_y); crs = GFT.EPSG(4326)); force =true)
"output/grain.tif"
As a result, the files elev.tif and grain.tif are written into the output directory. We are going to use these small raster files later on in the examples (for example, Section 5.5.1).
Note that the transform matrices and dimensions of elev and grain are identical. This means that the rasters are overlapping, and can be combined into one two-band raster, processed in raster algebra operations (?sec-map-algebra), etc.