Do you remember the animated plots we produced in the introductory lecture for this course based on the Gapminder Hans Rosling animated visualization?
In this worked example, we’ll figure out how to reproduce that plot as both an animated and an interactive visualization.
The dataset that we’ll use is available via the gapminder package. So go ahead and install that package.
This makes the dataset available in an object called
gapminder
. These are the first few rows of the dataset.
## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
The variables are self-explanatory.
Let’s jump right in and create a bubble plot, faceted by year (which we have grouped), with population size mapped to the size of the bubbles, and GDP per capita and life expectancy mapped to the x and y axes, respectively.
library(tidyverse)
gapminder %>%
mutate(years = cut_interval(year, length = 5)) %>%
ggplot(aes(gdpPercap, lifeExp, size = pop, color = continent)) +
geom_point(alpha = 0.5) +
scale_size(range = c(2, 12)) +
scale_x_log10() +
facet_wrap("years") +
labs(
y = "Life Expectancy",
x = "GDP per Capita",
size = "population"
)
Life Expectancy and GDP per capita from 1950 to 2010.
Just as we metioned in the first lecture, this visualization is not working out so well for us yet. Let’s make it animated instead. For this, we’ll use the gganimate package. First install the package.
To use the gganimate
package you also need a
renderer to produce animated images. You can use either gifski or ImageMagick. We recommend the
former, as gganimate
defaults to gifski if it is installed,
but either option will work just fine. Run one of the following lines,
or both, to install a renderer.
We build the plot as before, but now we make it animated by adding
the transition_time()
function to the plot and using the
title label to show the current year.
library(gganimate)
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = continent)) +
geom_point(alpha = 0.5) +
scale_size(range = c(2, 12)) +
scale_x_log10() +
labs(
title = "Year: {frame_time}", # special glue syntax,
y = "Life Expectancy",
x = "GDP per Capita",
size = "population"
) +
transition_time(year)
GDP per capita and life expectancy for some of the countries of the world.
If you think the plot is still crowded, we could alternatively use
facets to separate continents. Here we also make use of the
country_colors
object that is included in the
gapminder package.
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
geom_point(alpha = 0.5) +
scale_colour_manual(values = country_colors, guide = FALSE) +
scale_size(range = c(2, 12)) +
scale_x_log10() +
facet_wrap(~continent) +
labs(
title = "Year: {frame_time}",
x = "GDP per capita",
y = "Life expectancy"
) +
transition_time(year)
GDP per capita and life expectancy; now with facets!
So far, our plot effectively shows the trends across the world’s continents, but it becomes difficult to use when focusing on a specific country. A remedy for this issue could be to use labels to identify which bubble corresponds to which country. However, the large number of countries means that labelling all of them may not be a particularly good idea.
Instead, we’ll select the two largest countries (at the most recent
timestamp) on each continent to label. First, we will store the names of
these countries in a vector named, large_country_names
.
The following steps first filter the dataset to retain only
observations from the most recent year, identified by
max(year)
. Next, the dataset is grouped by continent. It is
then slices to keep only the countries with the largest and
second-largest population (pop
) values within each group
(continent). Finally the country names are extracted using
pull()
function.
large_country_names <-
gapminder %>%
filter(year == max(year)) %>%
group_by(continent) %>%
slice_max(pop, n = 2) %>%
pull(country)
large_country_names
## [1] Nigeria Egypt United States Brazil China
## [6] India Germany Turkey Australia New Zealand
## 142 Levels: Afghanistan Albania Algeria Angola Argentina Australia ... Zimbabwe
Then, we filter the original dataset to create a separate dataset specifically for our labels.
Now, we put everything together; we also adjust the easing
of aesthetics from linear to cubic in-and-out using
ease_aes()
, to highlight that our data spans only 5-year
intervals. We use geom_label_repel()
from the
ggrepel
package for labeling countries, which helps prevent
label overlap.1
library(ggrepel)
library(scales)
ggplot(gapminder, aes(gdpPercap, lifeExp, size = pop, color = country)) +
geom_point(alpha = 0.5) +
geom_label_repel(
aes(gdpPercap, lifeExp, label = country),
inherit.aes = FALSE,
seed = 1, # important when animating
nudge_x = 6,
nudge_y = -10,
data = large_countries
) +
scale_colour_manual(values = country_colors, guide = FALSE) +
scale_size(range = c(2, 12), labels = scientific) +
scale_x_log10(labels = scientific) +
facet_wrap(~continent) +
labs(
title = "Year: {frame_time}",
x = "GDP per capita",
y = "Life expectancy"
) +
transition_time(year) +
ease_aes("cubic-in-out")
Life expectancy and GDP per capita with countries. The two largest countries by population at the start have been labeled.
Note that I used the scales
package and set
labels = scientific
for the legend and the
x-axis
to shorten large numbers.
Interactive visualizations are particularly effective for exploring
complex datasets like this one. We’ll use the plotly package,
which integrates well with ggplot
, as discussed in the
lecture. Begin by installing the package.
Then, we load the package.
Now, let’s redraw the plot and add an interactive slider for year
selection with plotly
. Note the new frame
mapping added to geom_point()
. This special mapping informs
plotly
about the variable to use for separating the
visualization into frames.
p <- ggplot(gapminder,
aes(gdpPercap, lifeExp, size = pop, color = country))+
geom_point(aes(frame = year), alpha = 0.5) +
scale_colour_manual(values = country_colors, guide = FALSE) +
scale_size(range = c(2, 12)) +
scale_x_log10() +
facet_wrap(~continent) +
labs(x = "GDP per capita", y = "Life expectancy")
ggplotly(p)
An interactive visualization using plotly for the Gapminder data.
Notice how seamless the transition from ggplot
to
interactive plots facilitated by plotly
.
The source code for this document is available at https://github.com/stat-lu/dataviz/blob/main/worked-examples/gapminder.Rmd. 9
Working with labels and animated visualizations presents
a challenge. Here, I had to fine-tune the settings, particularly
nudge_x
and nudge_y
, several times in order to
get something that looks good.↩︎