1 Meuse River Data

In this worked example, we’ll take a look at the Meuse river data set from the sp package¹.

You will need to download and install the package first if it’s not already instaled. Then, you can make the data set available using data(meuse).

library(sp)
data(meuse)

This map data is currently in the form of a simple data.frame.HOwever, since we will be working with simple features, we need to convert this data set accordingly. First, however, it’s important to note that if you examine the x and y values in this data set, it becomes clear that they do not represent longitude and latitude measurements. By looking at the documentation for meuse, we find that the coordinates are actually Rijksdriehoek (RDH) coordinates, used in the Netherlands’ topographical mapping. To ensure everything works out correctly later on, it’s essential to make sure that our simple features data correctly interprets this information.

To convert the data into a simple features object, we use the st_as_sf() function, using the coords argument to specify which variables represent our x and y coordinates, and then we use the crs argument to denote which projection system they are measured in. RDH coordinates have the EPSG code 28992. So we use that as input.

library(sf)

meuse_sf <-
  meuse %>%
  st_as_sf(coords = c("x", "y"), crs = 28992) %>%
  st_transform(crs = 4326)

Before we start analyzing this data set, let’s first set up the ggplot2 theme for mapping. This time, we will use the theme_map() from the ggthemes package, instead of theme_void(), which we used previously.

library(tidyverse)
library(ggthemes)

theme_set(theme_map(base_size = 11))

Let’s start with a simple plot, looking at lead, creating a bubble plot by mapping the presence of lead to size of the points.

ggplot(meuse_sf) +
  geom_sf(aes(size = lead)) +
  theme(legend.position = "right")

Figure 1: Concentration of lead around the river Meuse.

2 Adding the River

The meuse data set is is named after its association with pollution data around the river Meuse. Therefore, it would be logical to include the location of this river in our plot as well.

To achieve this, we need some map data representing the rier Meuse. Our trusty rnaturalearth package can assist us once more. However, this time, the necessary data is not included directly in the package. Instead, it must be downloaded using the ne_download() function.

Let’s begin by downloading the data.

library(rnaturalearth)

rivers <- ne_download(
  scale = 10,
  type = "rivers_lake_centerlines",
  category = "physical",
  returnclass = "sf"
)

## Reading layer `ne_10m_rivers_lake_centerlines' from data source 
##   `C:\Users\Bezo\AppData\Local\Temp\RtmpQZ733B\ne_10m_rivers_lake_centerlines.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1473 features and 38 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: -164.9035 ymin: -52.15775 xmax: 177.5204 ymax: 75.79348
## Geodetic CRS:  WGS 84

rivers includes rivers and lakes from around the world, which means that plotting it would result in an excess of unnecessary information. There are two ways to address this, but in either case, we first need to figure out a bounding box that includes the data set’s coordinates, ideally with some extra margin. As you may have begun to notice, there often exists a st_*() function for nearly any mapping related issue you might think of. This situation is o exception: st_bbox() is the function we can use here.

bbox <- st_bbox(meuse_sf)

bbox

##     xmin     ymin     xmax     ymax 
##  5.72319 50.95661  5.76304 50.99156

The first and simplest method for plotting the data involves using the coord_sf() function with our bbox object to set the limits. While there’s nothing inherently wrong with this approach, and we’ll use it here, it’s worth nothing that a more refined method would be to use st_crop(), which directly interacts with our bounding box.

# Alternative solution: river_meuse <- st_crop(rivers, bbox)

ggplot(meuse_sf) +
  geom_sf(aes(size = lead)) +
  geom_sf(data = rivers) +
  coord_sf(
    xlim = c(bbox[["xmin"]], bbox[["xmax"]]),
    ylim = c(bbox[["ymin"]], bbox[["ymax"]])
  ) +
  theme(legend.position = "right")

Unfortunately, this doesn’t quite achieve our desired result. The resolution of this vector raster is too low, causing the river to appear merely a line. Let’s try a different approach.

3 Raster Charts

Instead, we will plot a raster map that includes the river its surrounding area. To do this, we’ll download the map from stamen maps, using the ggmap package.

library(ggmap)

To download a map, we need to call get_stadiamap() and specify a bounding box. Fortunately for us, we already computed the bounding box of the points in the last step, so let’s use that.

# extend the ranges slightly
xrng <- extendrange(c(bbox$xmin, bbox$xmax))
yrng <- extendrange(c(bbox$ymin, bbox$ymax))

meuse_raster <- get_stadiamap(
  c(
    left   = xrng[1],
    bottom = yrng[1],
    right  = xrng[2],
    top    = yrng[2]
  ),
  zoom = 14,
  maptype = "stamen_terrain"
)

To construct the map, we will use the ggmap() function instead of ggplot(). Now, however, we encounter a challenge: ggmap and sf unfortunately do not play so well together². Given the current situation, it is therefore better to use the standard map features when working with raster maps from ggmap. To do so, we convert our meuse coordinates from simple features back to a standard data frame.

meuse_df <- cbind(
  st_drop_geometry(meuse_sf),
  st_coordinates(meuse_sf)
)

Now, we can combine the raster map with our points by simply overlaying them using the geom_point() function.

ggmap(meuse_raster) +
  geom_point(aes(X, Y, size = lead), data = meuse_df) +
  theme(legend.position = "right")

Figure 2: Concentration of lead around Meuse.

4 More Heavy Metal

Let’s make the plot slightly more interesting by including also data on other heavy metals. One approach is to use facets, but for this to effective, our data must be in a long format. This means having one column for the variable names and another for their respective values.

meuse_df_long <-
  meuse_df %>%
  pivot_longer(
    cadmium:zinc,
    names_to = "metal",
    values_to = "concentration"
  )

We construct our plot as before, faceting on the type of metal.

ggmap(meuse_raster) +
  geom_point(
    aes(X, Y, size = concentration),
    alpha = 0.5,
    data = meuse_df_long
  ) +
  facet_wrap("metal") +
  theme(legend.position = "right")

Figure 3: Concentration of different heavy metals around the river Meuse in the Netherlands.

5 Geocoding

Purely for instructional purposes, let’s assume we are considering a cleanup of this heavy metal waste and want to assess its potential impact on the nearby ruins of Kasteel Stein. We can illustrate this by displaying the ruins on the map.

We don’t have the coordinates of the castle readily available, so we’ll need to determine them through geocoding.

We use tidygeocoder, which features a host of geocoding services alternatives. A couple of these are free, for instance the Nominatim API.³

Start by generating a tibble (or data.frame) with the names and addresses that you want to geocode.

library(tidygeocoder)

addresses <- tribble(
  ~name,          ~address,
  "Kasteel Stein", "Kasteel Stein, Netherlands"
)

Next, pipe these addresses on to tidygeocoder::geocode().⁴

kasteel_stein <-
  addresses %>%
  tidygeocoder::geocode(
    address, # map to address
    method = "osm" # for Nominatim API which is the default value
  )

kasteel_stein

## # A tibble: 1 × 4
##   name          address                      lat  long
##   <chr>         <chr>                      <dbl> <dbl>
## 1 Kasteel Stein Kasteel Stein, Netherlands  51.0  5.76

To complete our map, we will enhance our previous plot by adding two layers for our newly geocoded data. These layers will display both a label and a dot to represnt the castle ruins.

ggmap(meuse_raster) +
  geom_point(
    aes(X, Y, size = concentration),
    alpha = 0.5,
    data = meuse_df_long
  ) +
  geom_point(
    aes(long, lat),
    col = "red",
    data = kasteel_stein
  ) +
  geom_label(
    aes(long, lat, label = name),
    vjust = -0.5,
    data = kasteel_stein
  ) +
  facet_wrap("metal") +
  theme(legend.position = "right")

Figure 4: Concentration of different heavy metals around the river Meuse in the Netherlands.

6 Source Code

The source code for this document is available at https://github.com/stat-lu/dataviz/blob/main/worked-examples/heavy-metal.Rmd.

The sp package is a full-featured solution for visualization data and can be used in its own right to create maps in R.↩︎
In this case, the issue might be hardly noticeable due to the small scale of the map. The problem arises from the bounding box of the raster being in a different projection, but it results in no noticeable distortion.↩︎
The API is free, but it is also rate-limited at one request per second and not as powerful as some of the other available APIs. If you have a project where you need more serious geocoding, you should probably get a key for one of the other APIs.↩︎
There is a namespace clash here with ggmap, which has its own ggmap::geocode() function, so we need to specify which function to use with :: here.↩︎

Worked Example: Heavy Metal

Johan Larsson