Do you remember the animated plots we produced in the introductory lecture for this course based on the Gapminder Hans Rosling animated visualization?
In this worked example, we’ll work out how to reproduce that plot as both an animated and an interactive visualization.
The dataset that we’ll use is available via the gapminder package. So go ahead and install that package.
install.packages("gapminder")
This makes the dataset available in an object called gapminder. These are the first few rows of the dataset.
library(gapminder)
head(gapminder)
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
The variables should be self-explanatory.
Let’s jump right in and create a bubble plot faceted on year (which we cut into groups), with population mapped to the size of the bubbles, and GDP per capita and life expectancy on the x and y axes respectively.
library(tidyverse)

gapminder %>%
  mutate(years = cut_interval(year, length = 5)) %>%
  ggplot(aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap("years") +
  labs(y = "Life Expectancy",
       x = "GDP per Capita",
       size = "Population")
Life Expectancy and GDP per capita from 1950 to 2010.
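To see what the grouping step does, here is a minimal sketch that applies cut_interval() to the survey years alone. The vector survey_years is a name introduced for illustration, and its values assume that gapminder holds one observation every fifth year from 1952 to 2007.

```r
library(ggplot2)

# Survey years in gapminder (assumed here: every fifth year from 1952 to 2007)
survey_years <- seq(1952, 2007, by = 5)

# Cut the range of years into intervals of width 5
year_groups <- cut_interval(survey_years, length = 5)

# Each survey year lands in its own interval, so faceting on the
# grouped variable gives one panel per survey year
table(year_groups)
```

With a larger length, several survey years would share a panel, which is one way to reduce the number of facets.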
Just as we said in the first lecture, this visualization is not (yet) working out so well for us. Let’s make it animated instead. For this, we’ll use the gganimate package. First install the package.
install.packages("gganimate")
To use the gganimate package you also need a renderer to produce animated images. You can use either gifski or ImageMagick. We recommend the former (and gganimate defaults to gifski if it is installed), but either will work just fine. Run one (or both) of the following lines to install a renderer.
install.packages("gifski")
install.packages("magick")
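If you are unsure whether a renderer is already present, a quick check along these lines may help. This is only a sketch in base R; the names renderers and installed are introduced here for illustration.

```r
# Check which renderer packages are installed, without attaching them
renderers <- c("gifski", "magick")
installed <- vapply(renderers, requireNamespace, logical(1), quietly = TRUE)

# A named logical vector: TRUE means the package can be loaded
installed
```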
We build the plot as before, but now animate it by adding transition_time() to the plot and using the title label to show the current year.
library(gganimate)
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}", # special glue syntax
       y = "Life Expectancy",
       x = "GDP per Capita",
       size = "Population") +
  transition_time(year)
GDP per capita and life expectancy for some of the countries of the world.
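The braces in the title are filled in by gganimate for every frame. The sketch below only mimics that substitution with the glue package, outside of any plot; the variable frame_time is set by hand here as a stand-in for the value gganimate would supply.

```r
library(glue)

# Stand-in for the value gganimate supplies while rendering a frame
frame_time <- 1952L

# Expressions inside braces are evaluated and pasted into the string
title <- glue("Year: {frame_time}")
title
```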
If you think the plot is still crowded, we could alternatively use facets to separate continents. Here we also make use of the country_colors object that is included in the gapminder package.
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(alpha = 0.5) +
  # guide = "none" hides the legend (guide = FALSE is deprecated in recent ggplot2)
  scale_colour_manual(values = country_colors, guide = "none") +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(title = "Year: {frame_time}",
       x = "GDP per capita",
       y = "Life expectancy") +
  transition_time(year)
GDP per capita and life expectancy; now with facets!
So far our plot does a good job of showing the trends among the various continents of the world but is hard to use if we are interested in one specific country. A remedy for this can be to use labels to let us identify which bubble belongs to which country. The large number of countries, however, means that it’s not a frightfully good idea to label all of them.
Instead, we’ll pick out the two largest countries (at the latest time stamp) on each continent and label those. First, we store the names of the countries in a vector, large_country_names.
The following steps first filter the dataset so that only observations from the latest year (max(year)) are kept, then group the dataset by continent, then slice it so that the two countries with the largest populations (pop) in each group (continent) are kept, and finally pull out (pull()) the country names.
large_country_names <- gapminder %>%
filter(year == max(year)) %>%
group_by(continent) %>%
slice_max(pop, n = 2) %>%
pull(country)
large_country_names
## [1] Nigeria Egypt United States Brazil China
## [6] India Germany Turkey Australia New Zealand
## 142 Levels: Afghanistan Albania Algeria Angola Argentina Australia ... Zimbabwe
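The same pipeline can be traced by hand on a small made-up dataset. Everything in toy (the countries A to E, the continents X and Y, and the population figures) is hypothetical and only meant to illustrate how the four steps behave.

```r
library(dplyr)

# Hypothetical data: five countries on two continents, observed twice
toy <- tibble(
  country   = rep(c("A", "B", "C", "D", "E"), times = 2),
  continent = rep(c("X", "X", "X", "Y", "Y"), times = 2),
  year      = rep(c(2002L, 2007L), each = 5),
  pop       = c(10, 50, 30, 5, 8, 12, 40, 45, 9, 7)
)

toy_large <- toy %>%
  filter(year == max(year)) %>%  # keep only the 2007 rows
  group_by(continent) %>%        # treat X and Y separately
  slice_max(pop, n = 2) %>%      # two largest populations per continent
  pull(country)                  # return the names as a vector

toy_large
```

Note that B is the largest country on continent X in 2002 but is overtaken by C in 2007, so filtering on the latest year before slicing changes the order in which the two are returned.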
Then we filter the original dataset to create a separate dataset for our labels.
large_countries <- filter(gapminder, country %in% large_country_names)
Now we put everything together; this time we also change the easing of aesthetics from linear to cubic in-and-out using ease_aes(), to show more clearly that we only have data at 5-year intervals here. We label the countries with geom_label_repel() from the ggrepel package, in order to avoid overlapping labels.[1]
library(ggrepel)

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(alpha = 0.5) +
  geom_label_repel(
    aes(gdpPercap, lifeExp, label = country),
    inherit.aes = FALSE,
    seed = 1, # important when animating
    nudge_x = 5,
    nudge_y = -10,
    data = large_countries
  ) +
  # guide = "none" hides the legend (guide = FALSE is deprecated in recent ggplot2)
  scale_colour_manual(values = country_colors, guide = "none") +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(title = "Year: {frame_time}",
       x = "GDP per capita",
       y = "Life expectancy") +
  transition_time(year) +
  ease_aes("cubic-in-out")
Life expectancy and GDP per capita with country labels. The two most populous countries on each continent in the latest year have been labeled.
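To get a feel for what cubic in-and-out easing means, the sketch below writes out the textbook form of that curve by hand. The function cubic_in_out() is a name introduced here for illustration; it is assumed to have the same shape as the easing gganimate applies, but it is not taken from the package.

```r
# Textbook cubic in-and-out easing: slow start, fast middle, slow end.
# t is the progress between two time points, from 0 to 1.
cubic_in_out <- function(t) {
  ifelse(t < 0.5, 4 * t^3, 1 - (-2 * t + 2)^3 / 2)
}

progress <- seq(0, 1, by = 0.25)

# Linear easing would return `progress` unchanged; the cubic version
# stays close to 0 and 1 for longer and moves quickly through the middle
data.frame(linear = progress, cubic_in_out = cubic_in_out(progress))
```

Because the bubbles linger near each observed year and then move quickly to the next one, the 5-year steps in the data become easier to see.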
Interactive visualizations are often effective, particularly when we want to visualize a complicated dataset such as this one. Here we’ll use the plotly package to do so, which, as you may recall from the lecture, works well in tandem with ggplot. First install the package.
install.packages("plotly")
Then load the package.
library(plotly)
Now we redraw the plot, adding an interactive slider to select the year using plotly. Make note of the additional mapping that we’ve added to geom_point(), namely frame, which is a special mapping that will let plotly know which variable to use to separate the visualization into frames.
p <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(aes(frame = year), alpha = 0.5) +
  # guide = "none" hides the legend (guide = FALSE is deprecated in recent ggplot2)
  scale_colour_manual(values = country_colors, guide = "none") +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(x = "GDP per capita",
       y = "Life expectancy")

ggplotly(p)
An interactive visualization using plotly for the Gapminder data.
Notice how seamless the conversion of ggplots into interactive plots can be with the help of plotly.
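The playback speed and the slider label can be adjusted after the conversion. The following is a minimal sketch on a made-up dataset; toy, p_toy, and widget are names introduced here, and the timing values are arbitrary choices rather than recommendations.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data: two points observed in three years
toy <- data.frame(
  id   = rep(c("a", "b"), times = 3),
  year = rep(c(2000, 2005, 2010), each = 2),
  x    = c(1, 2, 2, 3, 3, 4),
  y    = c(1, 3, 2, 2, 3, 1)
)

# `frame` is not a ggplot2 aesthetic, so ggplot2 may warn that it is
# unknown; ggplotly() picks it up when converting the plot
p_toy <- ggplot(toy, aes(x, y, color = id)) +
  geom_point(aes(frame = year))

widget <- ggplotly(p_toy) %>%
  animation_opts(frame = 1000, transition = 500) %>%  # milliseconds
  animation_slider(currentvalue = list(prefix = "Year: "))

widget
```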
The source code for this document is available at https://github.com/stat-lu/STAE04/blob/master/worked-example-gapminder.Rmd.
[1] Working with labels and animated visualizations is something of a challenge. Here I’ve had to tweak the settings (mostly nudge_x and nudge_y) several times in order to get something that looks good.