class: middle, center, title-slide

# The Grammar of Graphics and ggplot2

## STAE04: Data Visualization

### Johan Larsson

### The Department of Statistics, Lund University

---

## The Grammar of Graphics

.pull-left[
Visualizations can be complicated and difficult to describe.

Leland Wilkinson's **The Grammar of Graphics** <a name=cite-wilkinson2005></a>([Wilkinson, 2005](#bib-wilkinson2005)) is an attempt to formalize the basic principles of visualizations.

We will use Hadley Wickham's *Layered Grammar of Graphics* <a name=cite-wickham2010></a>([Wickham, 2010](http://www.tandfonline.com/doi/abs/10.1198/jcgs.2009.07098)).
]

.pull-right[
<img src="images/wilkinson.jpg" width="200px" style="display: block; margin: auto;" />

<img src="images/wickham.jpg" width="200px" style="display: block; margin: auto;" />
]

---

## The Layered Grammar of Graphics

The layered grammar of graphics defines the components of a plot as

* layers,
* scales,
* a coordinate system, and
* facets.

It also includes a **hierarchy of defaults**.

.pull-left-60[
### ggplot2

The grammar of graphics is central to the R package ggplot2 (part of the tidyverse), which is the focus of this course.
]

.pull-right-40[
<img src="images/ggplot2.png" width="75%" style="display: block; margin: auto;" />
]

---

## Layers

.pull-left[
A layer consists of

* data and aesthetic mapping,
* a statistical transformation (stat),
* a geometric object (geom), and
* a position adjustment.

The plot to the right uses two layers: a density estimate and points.

```r
mpg %>%
  ggplot(aes(cty, displ)) +
  geom_point() +
  geom_density_2d()
```
]

.pull-right[.vcenter[
<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-4-1.png" width="360" style="display: block; margin: auto;" />
]]

---

## Layers: Data and Mappings

.pull-left[
Any visualization needs a dataset (here `mpg` from ggplot2).
We **map** variables (in the dataset) to aesthetics (in the plot), such as

* city miles per gallon (`cty`) to the x axis,
* engine displacement (`displ`) to the y axis, and
* car class (`class`) to color.

```r
library(tidyverse)

mpg %>%
  ggplot(aes(x = cty, y = displ, color = class)) +
  geom_point()
```
]

.pull-right[.vcenter[
<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-5-1.png" width="360" style="display: block; margin: auto;" />
]]

---

## Layers: Stats

.pull-left-40[
Statistical transformations (stats) can be used to smooth, summarize, or otherwise modify data.

A stat can add **new** variables, such as a count or a density.

Stats have names of the form `stat_*`.

A stat can also be specified directly in a `geom_*` function.
]

.pull-right-60[
```r
ggplot(faithful, aes(waiting, eruptions)) +
  geom_point() +
  stat_density_2d()
```

<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-6-1.png" width="360" style="display: block; margin: auto;" />
]

---

## Layers: Geoms

.pull-left[
**Geoms** decide what geometrical objects are used when plotting.

Geoms have names of the form `geom_*`.
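
Geoms and stats come in pairs: every geom has a default stat, and every stat a default geom. As a minimal sketch (the object names `p0`, `p_geom`, `p_stat`, and `p_point` are only illustrative), a bar layer can be written from either side, and the geom can be swapped while the stat stays the same:

```r
library(ggplot2)

# shared data and mapping
p0 <- ggplot(mpg, aes(class))

# geom_bar() uses stat = "count" unless told otherwise
p_geom <- p0 + geom_bar()

# the same kind of layer, written from the stat side
p_stat <- p0 + stat_count(geom = "bar")

# keep the stat, swap the geom: counts drawn as points instead of bars
p_point <- p0 + stat_count(geom = "point")

p_geom
p_stat
p_point
```
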
```r
# construct base plot
p <- economics %>%
  head(25) %>%
  ggplot(aes(date, pce))

# three different geoms
p + geom_line()
p + geom_point()
p + geom_area()
```
]

.pull-right[
<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-7-1.png" width="360" style="display: block; margin: auto;" /><img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-7-2.png" width="360" style="display: block; margin: auto;" /><img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-7-3.png" width="360" style="display: block; margin: auto;" />
]

---

## Layers: Position Adjustments

Sometimes we need to modify the positions of geoms, such as stacking bars, placing them side by side, or jittering points that overlap.

.pull-left[
```r
# points overlap
ggplot(mpg, aes(hwy, drv)) +
  geom_point()
```

<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-8-1.png" width="360" style="display: block; margin: auto;" />
]

--

.pull-right[
```r
# jitter to avoid overlap
ggplot(mpg, aes(hwy, drv)) +
  geom_point(
*   position = position_jitter(
*     width = 0,
*     height = 0.2)
  )
```

<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-9-1.png" width="360" style="display: block; margin: auto;" />
]

---

## Scales

A scale controls **how** variables are mapped to aesthetics.

A **guide** (an axis or a legend) is the inverse of a scale, showing how to read the scale.
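
As a rough sketch of how a scale and its guide belong together (the object names and the legend title are only illustrative, and `mpg` is used as example data), changing a scale also changes the legend or axis that explains it:

```r
library(ggplot2)

# hwy is mapped to color; the color scale decides which colors are used
p_col <- ggplot(mpg, aes(cty, displ, color = hwy)) +
  geom_point()

# a different color scale; the legend (the guide) follows along,
# and name sets the legend title
p_viridis <- p_col + scale_color_viridis_c(name = "Highway mpg")

# the x axis is the guide for the x scale:
# reversing the scale reverses the axis
p_reversed <- p_col + scale_x_reverse()

p_viridis
p_reversed
```
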
```r
p <- ggplot(msleep, aes(brainwt, sleep_total)) +
  geom_point()
```

.pull-left[
```r
p # + scale_x_continuous()
```

<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-11-1.png" width="316.8" style="display: block; margin: auto;" />
]

--

.pull-right[
```r
p + scale_x_log10()
```

<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-12-1.png" width="316.8" style="display: block; margin: auto;" />
]

---

## Coordinate Systems (coord)

.pull-left-60[
Coordinate systems control the position of objects on the plot.

```r
# cartesian coordinates (bar chart)
mutate(mpg, cyl = factor(cyl)) %>%
  ggplot(aes(x = cyl, fill = cyl)) +
  geom_bar(show.legend = FALSE) +
* coord_cartesian() +
  ggtitle("cartesian")
```

```r
# polar coordinates (pie chart)
mutate(mpg, cyl = factor(cyl)) %>%
  ggplot(aes(x = "1", fill = cyl)) +
  geom_bar(width = 1) +
* coord_polar(theta = "y") +
  ggtitle("polar")
```
]

.pull-right-40[
<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-15-1.png" width="252" style="display: block; margin: auto;" />

<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-16-1.png" width="252" style="display: block; margin: auto;" />
]

---

## Faceting

Faceting is a powerful tool that divides a visualization into small multiples.

ggplot2 provides `facet_grid()` and `facet_wrap()`.

```r
d <- as_tibble(Titanic) %>%
  pivot_wider(names_from = Survived, values_from = n) %>%
  group_by(Class, Sex) %>%
  summarise(survival_rate = sum(Yes) / sum(Yes + No))
```

.pull-left[
```r
ggplot(
  d,
  aes(x = Class, y = survival_rate)
) +
  geom_col() +
* facet_wrap(vars(Sex))
```
]

.pull-right[
<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-18-1.png" width="360" style="display: block; margin: auto;" />
]

---

## A Hierarchy of Defaults

.pull-left[
Having to supply all the parts of the grammar with each plot would be **very** tiresome.

Thankfully, the grammar of graphics (and ggplot2)
comes with a **hierarchy of defaults**.

<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-19-1.png" width="360" style="display: block; margin: auto;" />
]

--

.pull-right[
```r
ggplot(diamonds, aes(carat, price)) +
  geom_point()
```

is equivalent to

```r
ggplot(diamonds, aes(carat, price)) +
  layer(
    data = diamonds,
    mapping = aes(x = carat, y = price),
    geom = "point",
    stat = "identity",
    position = "identity"
  ) +
  scale_y_continuous() +
  scale_x_continuous() +
  coord_cartesian()
```
]

---

## Using the Grammar

Understanding the grammar of graphics is the key to understanding ggplot2.

Avoid thinking about visualizations as a bag of tricks.

The grammar makes it easy to make large changes to a plot.

--

### Taking a Step Back

.pull-left[
We have introduced many new concepts; don't panic!

You are not expected to understand all of the code on these slides.

In the next section, we will take a step back and begin visualizing data with a single variable.
]

.pull-right[
<img src="04-the-grammar-of-graphics-and-ggplot2_files/figure-html/unnamed-chunk-22-1.png" width="360" style="display: block; margin: auto;" />
]

---

## References

<a name=bib-wickham2010></a>[Wickham, H.](#cite-wickham2010) (2010). "A Layered Grammar of Graphics". In: _Journal of Computational and Graphical Statistics_ 19.1, pp. 3-28. ISSN: 1061-8600. DOI: [10.1198/jcgs.2009.07098](https://doi.org/10.1198%2Fjcgs.2009.07098). URL: [http://www.tandfonline.com/doi/abs/10.1198/jcgs.2009.07098](http://www.tandfonline.com/doi/abs/10.1198/jcgs.2009.07098) (visited on Mar. 13, 2020).

<a name=bib-wilkinson2005></a>[Wilkinson, L.](#cite-wilkinson2005) (2005). _The Grammar of Graphics_. 2nd edition. New York: Springer. ISBN: 978-0-387-24544-7.