Gapminder

Do you remember the animated plots from the introductory lecture for this course, the ones based on Hans Rosling’s animated Gapminder visualization?

In this worked example, we’ll work out how to reproduce that plot as both an animated and an interactive visualization.

Animated Visualization

The dataset that we’ll use is available via the gapminder package. So go ahead and install that package.

install.packages("gapminder")

This makes the dataset available in an object called gapminder. These are the first few rows of the dataset.

library(gapminder)

head(gapminder)
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

The variables should be self-explanatory.
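
In case they are not: lifeExp is life expectancy at birth in years, pop is total population, and gdpPercap is GDP per capita in inflation-adjusted US dollars. A quick structural check can also help. The sketch below assumes only that the gapminder package is installed, and confirms that every country is observed once in each of twelve years, five years apart.

```r
library(gapminder)

# Years covered by the dataset
years <- sort(unique(gapminder$year))
years

# Number of distinct countries
n_countries <- nlevels(gapminder$country)
n_countries

# Every country should appear exactly once per year
obs_per_country <- table(gapminder$country)
all(obs_per_country == length(years))
```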

Let’s jump right in and create a bubble plot faceted on year (which we cut into groups), with population mapped to the size of the bubbles, and GDP per capita and life expectancy on the x and y axes respectively.

library(tidyverse)

gapminder %>%
  mutate(years = cut_interval(year, length = 5)) %>%
  ggplot(aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap("years") +
  labs(y = "Life Expectancy",
       x = "GDP per Capita",
       size = "population")

Life Expectancy and GDP per capita from 1950 to 2010.
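
Because the data are recorded every five years, each five-year bin produced by cut_interval() should contain exactly one survey year. The following sketch checks that assumption. If it holds, faceting directly on year would give the same panels with plainer labels.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Pair each original year with the bin that cut_interval() assigns to it
bins <- gapminder %>%
  mutate(years = cut_interval(year, length = 5)) %>%
  distinct(year, years)

bins

# One bin per year means the two groupings are equivalent
n_distinct(bins$years) == n_distinct(bins$year)
```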

Just as we said in the first lecture, this visualization is not (yet) working out so well for us. Let’s make it animated instead. For this, we’ll use the gganimate package. First install the package.

install.packages("gganimate")

To use the gganimate package you also need a renderer to produce animated images. You can use either gifski or ImageMagick. We recommend the former (and gganimate defaults to gifski if it is installed), but either will work just fine. Run one (or both) of the following lines to install a renderer.

install.packages("gifski")
install.packages("magick")

We build the plot as before, but now animate it by adding transition_time() to the plot, and we use the title to show the current year.

library(gganimate)

ggplot(gapminder,  aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}", # special glue syntax
       y = "Life Expectancy",
       x = "GDP per Capita",
       size = "population") +
  transition_time(year)

GDP per capita and life expectancy for some of the countries of the world.
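
Printing a gganimate plot renders it with default settings. For more control, the plot can be stored and passed to animate(), and the result written to disk with anim_save(). The sketch below assumes that gifski is installed; the frame count, frame rate, and output path are illustrative choices, not values from the lecture.

```r
library(gapminder)
library(ggplot2)
library(gganimate)

anim <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  labs(title = "Year: {frame_time}") +
  transition_time(year)

# Render explicitly: 60 frames at 10 frames per second, using gifski
rendered <- animate(anim, nframes = 60, fps = 10, renderer = gifski_renderer())

# Illustrative output path; replace with a location of your choice
out_file <- file.path(tempdir(), "gapminder.gif")
anim_save(out_file, animation = rendered)
```

Fewer frames render faster, which is convenient while you are still tweaking the plot.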

If you think the plot is still crowded, we could alternatively use facets to separate continents. Here we also make use of the country_colors object that is included in the gapminder package.

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(alpha = 0.5) +
  scale_colour_manual(values = country_colors, guide = "none") + # guide = FALSE is deprecated
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(title = "Year: {frame_time}", 
       x = "GDP per capita", 
       y = "Life expectancy") +
  transition_time(year)

GDP per capita and life expectancy; now with facets!
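
The manual colour scale works here because country_colors is a named character vector: scale_colour_manual() matches its names against the levels of country. A quick check, assuming nothing beyond the gapminder package:

```r
library(gapminder)

# A few entries: names are countries, values are hex colour codes
head(country_colors)

# Every country in the data should have a matching colour
all_matched <- all(levels(gapminder$country) %in% names(country_colors))
all_matched
```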

So far our plot does a good job of showing the trends among the various continents of the world but is hard to use if we are interested in one specific country. A remedy for this can be to use labels to let us identify which bubble belongs to which country. The large number of countries, however, means that it’s not a frightfully good idea to label all of them.

Instead, we’ll pick out the largest two countries (at the latest time stamp) on each continent and label those. First, we store the names of the countries in a vector, large_country_names.

The following steps first filter the dataset so that only observations from the latest year (max(year)) are kept, then group the dataset by continent, then slice it so that the two observations (countries) with the largest populations (pop) in each group (continent) are kept, and finally pull out (pull()) the country names.

large_country_names <- 
  gapminder %>%
  filter(year == max(year)) %>%
  group_by(continent) %>%
  slice_max(pop, n = 2) %>%
  pull(country)

large_country_names
##  [1] Nigeria       Egypt         United States Brazil        China        
##  [6] India         Germany       Turkey        Australia     New Zealand  
## 142 Levels: Afghanistan Albania Algeria Angola Argentina Australia ... Zimbabwe
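
If the combination of group_by() and slice_max() is new to you, it may help to see it on a dataset small enough to check by hand. The toy data below are made up for illustration and are not part of gapminder.

```r
library(dplyr)

# Hypothetical toy data: two continents with three countries each
toy <- tibble(
  continent = c("A", "A", "A", "B", "B", "B"),
  country   = c("a1", "a2", "a3", "b1", "b2", "b3"),
  pop       = c(10, 30, 20, 5, 1, 7)
)

# Keep the two most populous countries within each continent
toy_top2 <- toy %>%
  group_by(continent) %>%
  slice_max(pop, n = 2) %>%
  pull(country)

toy_top2
## [1] "a2" "a3" "b3" "b1"
```

Within each group, the rows come back ordered from largest to smallest population.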

Then we filter the original dataset to create a separate dataset for our labels.

large_countries <-
  filter(gapminder, country %in% large_country_names)
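
Note that large_countries keeps every year for the selected countries, not just the latest one, which is what the animation needs. As a sanity check (a sketch that repeats the steps above so that it runs on its own), ten countries observed in twelve years should give 120 rows.

```r
library(gapminder)
library(dplyr)

large_country_names <- gapminder %>%
  filter(year == max(year)) %>%
  group_by(continent) %>%
  slice_max(pop, n = 2) %>%
  pull(country)

large_countries <- filter(gapminder, country %in% large_country_names)

# One row per selected country and year
nrow(large_countries)

# Rows per continent: two countries times twelve years
count(large_countries, continent)
```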

Now we put everything together; this time we also change the easing of aesthetics from linear to cubic in-and-out using ease_aes(), to more clearly show that we actually only have data on a 5-year interval here. We label the countries with geom_label_repel() from the ggrepel package, in order to avoid overlapping labels (see footnote 1).

library(ggrepel)

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(alpha = 0.5) +
  geom_label_repel(
    aes(gdpPercap, lifeExp, label = country),
    inherit.aes = FALSE,
    seed = 1, # important when animating
    nudge_x = 5,
    nudge_y = -10,
    data = large_countries
  ) +
  scale_colour_manual(values = country_colors, guide = "none") + # guide = FALSE is deprecated
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(title = "Year: {frame_time}", 
       x = "GDP per capita", 
       y = "Life expectancy") +
  transition_time(year) +
  ease_aes("cubic-in-out")

Life expectancy and GDP per capita with countries. On each continent, the two largest countries in terms of population in the latest year have been labeled.
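
To see why cubic in-and-out easing emphasizes the five-year spacing, it helps to compare it with linear easing. The function below is the textbook cubic-in-out curve, written out by hand for illustration; it is assumed, not verified, to match what gganimate applies internally.

```r
# Textbook cubic-in-out easing on the unit interval (assumed form):
# slow start, fast middle, slow end
cubic_in_out <- function(t) {
  ifelse(t < 0.5, 4 * t^3, 1 - (-2 * t + 2)^3 / 2)
}

# Progress between two observed years under each easing
t <- seq(0, 1, by = 0.25)
data.frame(t = t, linear = t, cubic_in_out = cubic_in_out(t))
```

A quarter of the way through a transition, the eased value has covered only about 6% of the distance, so the bubbles linger near the observed values and move quickly in between.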

Interactive Visualization

Interactive visualizations are often effective, particularly when we want to visualize a complicated dataset such as this one. Here we’ll use the plotly package to do so, which, as you may recall from the lecture, works well in tandem with ggplot2. First install the package.

install.packages("plotly")

Then load the package.

library(plotly)

Now we redraw the plot, adding an interactive slider to select the year using plotly. Make note of the additional mapping that we’ve added to geom_point(), namely frame, which is a special mapping that will let plotly know which variable to use to separate the visualization into frames.

p <- ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(aes(frame = year), alpha = 0.5) +
  scale_colour_manual(values = country_colors, guide = "none") + # guide = FALSE is deprecated
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(x = "GDP per capita", 
       y = "Life expectancy")

ggplotly(p)

An interactive visualization using plotly for the Gapminder data.

Notice how seamless the conversion of ggplots into interactive plots can be with the help of plotly.
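
The animation controls can be adjusted after the conversion. The sketch below is one possible variation rather than part of the original example: it adds an ids mapping so that plotly can track each country between frames, slows the transitions with animation_opts(), and prefixes the slider label with animation_slider(). The timing and the label text are illustrative choices, and ggplot2 will warn that frame and ids are unknown aesthetics.

```r
library(gapminder)
library(ggplot2)
library(plotly)

p_simple <- ggplot(gapminder, aes(gdpPercap, lifeExp, color = continent)) +
  # frame splits the data into animation frames; ids links points across frames
  geom_point(aes(size = pop, frame = year, ids = country), alpha = 0.5) +
  scale_x_log10() +
  labs(x = "GDP per capita",
       y = "Life expectancy")

fig <- ggplotly(p_simple) %>%
  # 1000 ms per frame, with eased movement between frames
  animation_opts(frame = 1000, easing = "cubic-in-out", redraw = FALSE) %>%
  # Show "Year: <value>" above the slider
  animation_slider(currentvalue = list(prefix = "Year: "))

fig
```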

Source Code

The source code for this document is available at https://github.com/stat-lu/STAE04/blob/master/worked-example-gapminder.Rmd.


  1. Working with labels and animated visualizations is something of a challenge. Here I’ve had to tweak the settings (mostly nudge_x and nudge_y) several times in order to get something that looks good.