In this example, we’ll take a look at an experiment on cold tolerance of the grass species Echinochloa crus-galli.
Figure 1: A specimen of Echinochloa crus-galli.
The data we are going to look at is available in the CO2 data set from datasets
(a package that is part of the base R distribution). There are five variables:
Plant: an ordered factor giving a unique identifier for each plant
Type: a factor with levels giving the origin of the plant (Quebec or Mississippi)
Treatment: a factor indicating if the sample was chilled or not
conc: a numeric vector of ambient carbon dioxide concentrations (mL/L)
uptake: a numeric vector of carbon dioxide uptake rates (umol/m^2 sec)
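To check the variable descriptions above against the data itself, a quick inspection along the following lines may help. This is only a sketch using base R functions; nothing here is part of the original analysis.

```r
# CO2 ships with the datasets package, which is attached by default
str(CO2)               # 84 observations of 5 variables

# Factor levels for the two experimental conditions
levels(CO2$Type)       # origin of the plant
levels(CO2$Treatment)  # chilled or not

# Number of measurements recorded for each plant
table(CO2$Plant)
```

The table shows that every plant was measured repeatedly, which matters for how the data should be plotted.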
The goal of the experiment was to investigate how carbon dioxide uptake is
affected by the ambient carbon dioxide concentration (conc), the origin
of the plant (Type), and Treatment (whether the plant was chilled overnight
before the experiment started or not).
Let’s start with the basics and look at a scatter plot of uptake against ambient
concentration. We’ll map uptake to the y axis and concentration to
the x axis. Before starting, we’ll rename some variables for clearer default axis labels and convert Plant
from an ordered to an unordered factor, as it’s not ordinal in nature. We’ll call the new data set co2.
library(tidyverse)

co2 <-
  CO2 %>%
  as_tibble() %>%
  rename(Concentration = conc, Origin = Type, Uptake = uptake) %>%
  mutate(Plant = factor(Plant, ordered = FALSE))

ggplot(co2, aes(Concentration, Uptake)) +
  geom_point()
Figure 2: CO2 uptake and ambient CO2 concentration.
However, using only the point geom here is not a good design choice, since our data consists of repeated measurements: the observations belong to specific plants, and each plant was measured several times. There are several ways to show this. For instance, we could use color, facets, or simply a line geom with the group aesthetic mapped to the plant. In this case, we will try the last option, but feel free to experiment with the other ideas as well.
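One way to write the line-geom version is sketched below. The code behind the next figure is not shown in this document, so treat this as an assumption about how it could be drawn rather than the exact original; the co2 data set is rebuilt here only so that the block runs on its own, and the object name p is just for illustration.

```r
library(tidyverse)

# Rebuild the renamed data set so this block is self-contained
co2 <-
  CO2 %>%
  as_tibble() %>%
  rename(Concentration = conc, Origin = Type, Uptake = uptake) %>%
  mutate(Plant = factor(Plant, ordered = FALSE))

# group = Plant makes geom_line() draw one line per plant
# instead of a single line through all observations
p <- ggplot(co2, aes(Concentration, Uptake, group = Plant)) +
  geom_point() +
  geom_line()

p
```

Without the group mapping, geom_line() would connect all points in order of concentration, which would hide the per-plant structure.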
Figure 3: Ambient CO2 concentration and uptake for 12 plants in different conditions and from different origins.
Higher ambient concentrations of CO2 evidently go along with higher uptake, which is not surprising. How would you describe this relationship? Take a moment to think about it.
Until now, we haven’t included the other aspects of the experiment in our visualization. Let’s do that now. We will start by incorporating the chilling treatment by mapping it to color. It’s also worth noting that we have a natural color association here, with blue representing the chilled condition, which we will certainly use.
ggplot(co2, aes(Concentration, Uptake, group = Plant, color = Treatment)) +
  geom_point() +
  geom_line() +
  scale_color_manual(values = c("dark orange", "steel blue"))
Figure 4: Ambient CO2 concentration and uptake for 12 plants under two different treatments (chilled overnight before the experiment or not).
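The manual color scale above assigns colors by position, so blue ends up on the chilled group only because chilled is the second level of Treatment. A slightly more defensive variant, sketched here with the illustrative names treatment_colors and scale_treatment (not from the original code), uses a named vector so the association does not depend on level order.

```r
library(ggplot2)

# Level order decides which color an unnamed vector would assign
levels(CO2$Treatment)

# Named values tie each color to a specific treatment level
treatment_colors <- c(nonchilled = "dark orange", chilled = "steel blue")
scale_treatment <- scale_color_manual(values = treatment_colors)
```

Adding scale_treatment to the plot in place of the scale_color_manual() call should give the same figure, assuming the level names are unchanged.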
It appears that uptake is generally higher in the nonchilled arm of the experiment, but there are two clear clusters of curves here. Let’s explore whether the origin of the plants might help explain this bimodal pattern.
ggplot(co2, aes(Concentration, Uptake, group = Plant, color = Treatment)) +
  geom_point() +
  geom_line() +
  facet_wrap("Origin") + # or vars(Origin) or ~ Origin
  scale_color_manual(values = c("dark orange", "steel blue"))
Figure 5: Ambient CO2 concentration and uptake for 12 plants under two different treatments (chilled overnight before the experiment or not) and from two different origins.
Yes, indeed, the origin of the plants appears to explain the two clusters: the plants from Quebec show higher uptake overall and seem to be less affected by chilling. Mississippi has a much warmer climate than Quebec, so it is plausible that the Quebec plants are better adapted to cold.
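The visual impression can be cross-checked with a small numeric summary. The sketch below is not part of the original analysis; it works on the original CO2 variable names, and uptake_summary is just an illustrative name. It computes the mean uptake for each combination of origin and treatment.

```r
library(tidyverse)

# Mean uptake for every combination of origin (Type) and Treatment
uptake_summary <-
  CO2 %>%
  as_tibble() %>%
  group_by(Type, Treatment) %>%
  summarise(mean_uptake = mean(uptake), .groups = "drop")

uptake_summary
```

Averaging over all concentrations is a crude summary, since it ignores the shape of the curves, but it is enough to see whether the group differences in the plot also show up in the numbers.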
This example demonstrates the importance of exploring multivariate relationships by effectively conditioning your visualizations on certain variables.
The source code for this document is available at https://github.com/stat-lu/dataviz/blob/main/worked-examples/plants.Rmd.