ggplot2

ggplot2 is part of the tidyverse so you can simply call

library(tidyverse) 

to load ggplot2.

ggplot()

The main function of ggplot is the ggplot() function. It has two main arguments that you need to be aware of:

data
the (default) dataset to use for the plot
mapping
the aesthetic mappings

When calling ggplot() we typically don’t call these arguments by their name and instead simply supply them in order.

data

The data argument obviously needs a dataset and this should usually come in the form of a data.frame or a tibble. Let’s starts with a very simple data set, cars, that contains only two variables: speed and stopping distance of 50 different cars.

Before doing anything else, always take a little time to look at bit of the data. If the dataset has relatively few variables (less than ten), the head() function is usually your best option; it simply displays the first few rows of the data.

head(cars)
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

The second argument of heads() decides how many rows to show. Modify this value to show the first 10 cars of the dataset.

glimpse() is very useful when you have more variables (since it is more compact). Call glimpse() on cars and notice that it lists the variables in a table along with the first values of each variable.

mapping

The mapping argument should be supplied with a call to the aes() function that lists the mappings between variables in your data and aesthetics of the plot. Which aesthetics you choose to map to depends on what geoms you intend to use. But for now, let’s assume that we want to map speed (speed) to the x axis, and stopping distance (dist) to the y axis. To do so, we simply use aes(x = speed, y = dist) as the input to the mapping argument. Because the first two arguments of aes() are x and y, we can do without calling them explicitly.

Let’s put this together:

ggplot(cars, aes(speed, dist))

Why does this result in a blank canvas? After all, we did tell ggplot to map speed to the x axis and stopping distance to the y axis?

The answer is that we still haven’t told ggplot2 which layers we want to use.

Layers

The aesthetic mapping is not enough by itself—we also need a layer to display our data. In ggplot2, we accomplish this by adding (stacking) layers to the output of ggplot(). There are several ways to construct layers in ggplot, but the simplest and most common approach is to use the functions named geom_*() (such as geom_point()).

The names of these geom_*() functions are actually misnomers, given that the they in fact set up entire layers and not just the geoms.

To provide a layer with the point geom to the plot, you simply need to call the following lines.

ggplot(cars, aes(speed, dist)) +
  geom_point()

Notice the use of + here to add layers to the canvas that we started out with. Also observe that we didn’t need to specify what data to use or which mappings we wanted in the call to geom_point()—they were inherited from the ggplot() call. If you want to avoid this type of inheritance (which you will want when constructing plots from different data sources), you can do so by specifying inherit.aes = FALSE in the call to geom_point().

Try to switch geom for your plot. Replace geom_point() by geom_line().

Now let’s try using geom_rect() instead.

ggplot(cars, aes(speed, dist)) +
  geom_rect()

Well that didn’t work, but why not? Well, the reason is simply that geom_rect() uses the rectangle geom and a rectangle needs more aesthetics than just x and y coordinates.

Aesthetics

Look at the documentation of the geom_point() function. Under Aesthetics we can see that this layer understands the following aesthetics:

  • x
  • y
  • alpha
  • colour
  • fill
  • group
  • shape
  • size
  • stroke

and that only the two first are in fact required. The remaining aesthetics have been supplied defaults, such as alpha = 1 (no transparency) and colour = "black" (black points). But these aesthetics can also be changed, either by mapping them to another variable in our data or by manually adjusting them directly.

When you change an aesthetic by specifying a value (and not a mapping) you simply specify that aesthetic in the call to the geom you would like to change. Let’s change the color of the points to navy blue, increase the size, and change shape. Doing so is a breeze with ggplot2:

ggplot(cars, aes(speed, dist)) +
  geom_point(colour = "navy",
             size = 1.8,
             shape = 4)

Take some time to play around with the various aesthetics. Switch geom and try to flip the axes. Can you figure out what the stroke aesthetic does?

It might be worth noting that some of the aesthetics used here have synonyms that you will likely come across every now and then, for instance cex = size, col = color = colour, and pch = shape, which are there mostly for reasons that are historical or have to do with compatibility.

Source Code

The source code for this document is available at https://github.com/stat-lu/STAE04/blob/master/worked-example-first-steps-with-ggplot2.Rmd.