In this worked example, we’ll take a look at the Meuse river data set from the sp package1.
You will need to download and install the package first if it’s not already instaled. Then, you can make the data set available using data(meuse)
.
This map data is currently in the form of a simple data.frame
.HOwever, since we will be working with simple features, we need to convert this data set accordingly. First, however, it’s important to note that if you examine the x
and y
values in this data set, it becomes clear that they do not represent longitude and latitude measurements.
By looking at the documentation for meuse
, we find that the coordinates are actually Rijksdriehoek (RDH) coordinates, used in the Netherlands’ topographical mapping. To ensure everything works out correctly later on, it’s essential to make
sure that our simple features data correctly interprets this information.
To convert the data into a simple features object, we use the
st_as_sf()
function, using the coords
argument to specify which
variables represent our x and y coordinates, and then we use the crs
argument to denote which projection system they are measured in.
RDH coordinates have the EPSG code 28992. So we use that as input.
library(sf)
meuse_sf <-
meuse %>%
st_as_sf(coords = c("x", "y"), crs = 28992) %>%
st_transform(crs = 4326)
Before we start analyzing this data set, let’s first set up the ggplot2 theme for mapping.
This time, we will use the theme_map()
from the ggthemes
package, instead of theme_void()
, which we used previously.
Let’s start with a simple plot, looking at lead, creating a bubble plot by mapping the presence of lead to size of the points.
Figure 1: Concentration of lead around the river Meuse.
The meuse
data set is is named after its association with pollution data
around the river Meuse. Therefore, it would be logical to include the location of this river in our plot as well.
To achieve this, we need some map data representing the rier Meuse. Our trusty
rnaturalearth package can assist us once more. However, this
time, the necessary data is not included directly in the package. Instead, it must be downloaded using the ne_download()
function.
Let’s begin by downloading the data.
library(rnaturalearth)
rivers <- ne_download(
scale = 10,
type = "rivers_lake_centerlines",
category = "physical",
returnclass = "sf"
)
## Reading layer `ne_10m_rivers_lake_centerlines' from data source
## `C:\Users\Bezo\AppData\Local\Temp\RtmpQZ733B\ne_10m_rivers_lake_centerlines.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1473 features and 38 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: -164.9035 ymin: -52.15775 xmax: 177.5204 ymax: 75.79348
## Geodetic CRS: WGS 84
rivers
includes rivers and lakes from around the world, which means that plotting it would result in an excess of unnecessary information.
There are two ways to address this, but in either case, we first need to figure out
a bounding box that includes the data set’s coordinates, ideally
with some extra margin. As you may have begun to notice, there often exists a st_*()
function for nearly any mapping related issue you might think of. This situation is o exception: st_bbox()
is the function we can use here.
## xmin ymin xmax ymax
## 5.72319 50.95661 5.76304 50.99156
The first and simplest method for plotting the data involves using the coord_sf()
function with our bbox
object to set the limits. While there’s nothing inherently wrong with this approach, and we’ll use it here, it’s worth nothing that a more refined method would be to use st_crop()
, which directly interacts with our bounding box.
# Alternative solution: river_meuse <- st_crop(rivers, bbox)
ggplot(meuse_sf) +
geom_sf(aes(size = lead)) +
geom_sf(data = rivers) +
coord_sf(
xlim = c(bbox[["xmin"]], bbox[["xmax"]]),
ylim = c(bbox[["ymin"]], bbox[["ymax"]])
) +
theme(legend.position = "right")
Unfortunately, this doesn’t quite achieve our desired result. The resolution of this vector raster is too low, causing the river to appear merely a line. Let’s try a different approach.
Instead, we will plot a raster map that includes the river its surrounding area. To do this, we’ll download the map from stamen maps, using the ggmap package.
To download a map, we need to call get_stadiamap()
and specify a bounding
box. Fortunately for us, we already computed the bounding box of the points in
the last step, so let’s use that.
# extend the ranges slightly
xrng <- extendrange(c(bbox$xmin, bbox$xmax))
yrng <- extendrange(c(bbox$ymin, bbox$ymax))
meuse_raster <- get_stadiamap(
c(
left = xrng[1],
bottom = yrng[1],
right = xrng[2],
top = yrng[2]
),
zoom = 14,
maptype = "stamen_terrain"
)
To construct the map, we will use the ggmap()
function instead of ggplot()
.
Now, however, we encounter a challenge: ggmap and sf unfortunately
do not play so well together2. Given the current situation, it is therefore better to use the standard map features when working with raster maps from ggmap. To do so, we convert our meuse coordinates from simple features back to a standard data frame.
Now, we can combine the raster map with our points by simply overlaying them using the geom_point()
function.
ggmap(meuse_raster) +
geom_point(aes(X, Y, size = lead), data = meuse_df) +
theme(legend.position = "right")
Figure 2: Concentration of lead around Meuse.
Let’s make the plot slightly more interesting by including also data on other heavy metals. One approach is to use facets, but for this to effective, our data must be in a long format. This means having one column for the variable names and another for their respective values.
meuse_df_long <-
meuse_df %>%
pivot_longer(
cadmium:zinc,
names_to = "metal",
values_to = "concentration"
)
We construct our plot as before, faceting on the type of metal.
ggmap(meuse_raster) +
geom_point(
aes(X, Y, size = concentration),
alpha = 0.5,
data = meuse_df_long
) +
facet_wrap("metal") +
theme(legend.position = "right")
Figure 3: Concentration of different heavy metals around the river Meuse in the Netherlands.
Purely for instructional purposes, let’s assume we are considering a cleanup of this heavy metal waste and want to assess its potential impact on the nearby ruins of Kasteel Stein. We can illustrate this by displaying the ruins on the map.
We don’t have the coordinates of the castle readily available, so we’ll need to determine them through geocoding.
We use tidygeocoder, which features a host of geocoding services alternatives. A couple of these are free, for instance the Nominatim API.3
Start by generating a tibble
(or data.frame
) with the names
and addresses that you want to geocode.
library(tidygeocoder)
addresses <- tribble(
~name, ~address,
"Kasteel Stein", "Kasteel Stein, Netherlands"
)
Next, pipe these addresses on to tidygeocoder::geocode()
.4
kasteel_stein <-
addresses %>%
tidygeocoder::geocode(
address, # map to address
method = "osm" # for Nominatim API which is the default value
)
kasteel_stein
## # A tibble: 1 × 4
## name address lat long
## <chr> <chr> <dbl> <dbl>
## 1 Kasteel Stein Kasteel Stein, Netherlands 51.0 5.76
To complete our map, we will enhance our previous plot by adding two layers for our newly geocoded data. These layers will display both a label and a dot to represnt the castle ruins.
ggmap(meuse_raster) +
geom_point(
aes(X, Y, size = concentration),
alpha = 0.5,
data = meuse_df_long
) +
geom_point(
aes(long, lat),
col = "red",
data = kasteel_stein
) +
geom_label(
aes(long, lat, label = name),
vjust = -0.5,
data = kasteel_stein
) +
facet_wrap("metal") +
theme(legend.position = "right")
Figure 4: Concentration of different heavy metals around the river Meuse in the Netherlands.
The source code for this document is available at https://github.com/stat-lu/dataviz/blob/main/worked-examples/heavy-metal.Rmd.
The sp
package
is a full-featured solution for visualization data and can be used in its
own right to create maps in R.↩︎
In this case, the issue might be hardly noticeable due to the small scale of the map. The problem arises from the bounding box of the raster being in a different projection, but it results in no noticeable distortion.↩︎
The API is free, but it is also rate-limited at one request per second and not as powerful as some of the other available APIs. If you have a project where you need more serious geocoding, you should probably get a key for one of the other APIs.↩︎
There is a
namespace clash here with ggmap, which has its own ggmap::geocode()
function, so we need to specify which function to use with ::
here.↩︎