Gapminder

Do you remember the animated plots we produced in the introductory lecture for this course based on the Gapminder Hans Rosling animated visualization?

In this worked example, we’ll figure out how to reproduce that plot as both an animated and an interactive visualization.

Animated Visualization

The dataset that we’ll use is available via the gapminder package. So go ahead and install that package.

install.packages("gapminder")

This makes the dataset available in an object called gapminder. These are the first few rows of the dataset.

library(gapminder)

head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

The variables are self-explanatory.

Let’s jump right in and create a bubble plot, faceted by year (which we have grouped), with population size mapped to the size of the bubbles, and GDP per capita and life expectancy mapped to the x and y axes, respectively.

library(tidyverse)

gapminder %>%
  mutate(years = cut_interval(year, length = 5)) %>%
  ggplot(aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap("years") +
  labs(
    y = "Life Expectancy",
    x = "GDP per Capita",
    size = "population"
  )
Life Expectancy and GDP per capita from 1950 to 2010.

Life Expectancy and GDP per capita from 1950 to 2010.

Just as we metioned in the first lecture, this visualization is not working out so well for us yet. Let’s make it animated instead. For this, we’ll use the gganimate package. First install the package.

install.packages("gganimate")

To use the gganimate package you also need a renderer to produce animated images. You can use either gifski or ImageMagick. We recommend the former, as gganimate defaults to gifski if it is installed, but either option will work just fine. Run one of the following lines, or both, to install a renderer.

install.packages("gifski")
install.packages("magick")

We build the plot as before, but now we make it animated by adding the transition_time() function to the plot and using the title label to show the current year.

library(gganimate)

ggplot(gapminder,  aes(gdpPercap, lifeExp, size = pop, color = continent)) +
  geom_point(alpha = 0.5) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  labs(
    title = "Year: {frame_time}", # special glue syntax,
    y = "Life Expectancy",
    x = "GDP per Capita",
    size = "population"
  ) +
  transition_time(year)
GDP per capita and life expectancy for some of the countries of the world.

GDP per capita and life expectancy for some of the countries of the world.

If you think the plot is still crowded, we could alternatively use facets to separate continents. Here we also make use of the country_colors object that is included in the gapminder package.

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(alpha = 0.5) +
  scale_colour_manual(values = country_colors, guide = FALSE) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(
    title = "Year: {frame_time}",
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  transition_time(year)
GDP per capita and life expectancy; now with facets!

GDP per capita and life expectancy; now with facets!

So far, our plot effectively shows the trends across the world’s continents, but it becomes difficult to use when focusing on a specific country. A remedy for this issue could be to use labels to identify which bubble corresponds to which country. However, the large number of countries means that labelling all of them may not be a particularly good idea.

Instead, we’ll select the two largest countries (at the most recent timestamp) on each continent to label. First, we will store the names of these countries in a vector named, large_country_names.

The following steps first filter the dataset to retain only observations from the most recent year, identified by max(year). Next, the dataset is grouped by continent. It is then slices to keep only the countries with the largest and second-largest population (pop) values within each group (continent). Finally the country names are extracted using pull() function.

large_country_names <-
  gapminder %>%
  filter(year == max(year)) %>%
  group_by(continent) %>%
  slice_max(pop, n = 2) %>%
  pull(country)

large_country_names
##  [1] Nigeria       Egypt         United States Brazil        China        
##  [6] India         Germany       Turkey        Australia     New Zealand  
## 142 Levels: Afghanistan Albania Algeria Angola Argentina Australia ... Zimbabwe

Then, we filter the original dataset to create a separate dataset specifically for our labels.

large_countries <- filter(gapminder, country %in% large_country_names)

Now, we put everything together; we also adjust the easing of aesthetics from linear to cubic in-and-out using ease_aes(), to highlight that our data spans only 5-year intervals. We use geom_label_repel() from the ggrepel package for labeling countries, which helps prevent label overlap.1

library(ggrepel)
library(scales)

ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
  geom_point(alpha = 0.5) +
  geom_label_repel(
    aes(gdpPercap, lifeExp, label = country),
    inherit.aes = FALSE,
    seed = 1, # important when animating
    nudge_x = 6,
    nudge_y = -10,
    data = large_countries
  ) +
  scale_colour_manual(values = country_colors, guide = FALSE) +
  scale_size(range = c(2, 12), labels = scientific) +
  scale_x_log10(labels = scientific) +
  facet_wrap(~continent) +
  labs(
    title = "Year: {frame_time}",
    x = "GDP per capita",
    y = "Life expectancy"
  ) +
  transition_time(year) +
  ease_aes("cubic-in-out")
Life expectancy and GDP per capita with countries. The two largest countries by population at the start have been labeled.

Life expectancy and GDP per capita with countries. The two largest countries by population at the start have been labeled.

Note that I used the scales package and set labels = scientific for the legend and the x-axis to shorten large numbers.

Interactive Visualization

Interactive visualizations are particularly effective for exploring complex datasets like this one. We’ll use the plotly package, which integrates well with ggplot, as discussed in the lecture. Begin by installing the package.

install.packages("plotly")

Then, we load the package.

library(plotly)

Now, let’s redraw the plot and add an interactive slider for year selection with plotly. Note the new frame mapping added to geom_point(). This special mapping informs plotly about the variable to use for separating the visualization into frames.

p <- ggplot(gapminder, 
            aes(gdpPercap, lifeExp, size = pop, color = country))+
  geom_point(aes(frame = year), alpha = 0.5) +
  scale_colour_manual(values = country_colors, guide = FALSE) +
  scale_size(range = c(2, 12)) +
  scale_x_log10() +
  facet_wrap(~continent) +
  labs(x = "GDP per capita", y = "Life expectancy")

ggplotly(p)

An interactive visualization using plotly for the Gapminder data.

Notice how seamless the transition from ggplot to interactive plots facilitated by plotly.

Source Code

The source code for this document is available at https://github.com/stat-lu/dataviz/blob/main/worked-examples/gapminder.Rmd. 9


  1. Working with labels and animated visualizations presents a challenge. Here, I had to fine-tune the settings, particularly nudge_x and nudge_y, several times in order to get something that looks good.↩︎