class: middle, center, title-slide

# Data-Ink

## STAE04: Data Visualization

### Johan Larsson

### The Department of Statistics, Lund University

---

## Data-Ink

Consider how you are spending the **ink** in a visualization.

**data-ink** <a name=cite-tufte2001></a>([Tufte, 2001](#bib-tufte2001)): the ink used to draw data

$$
`\begin{aligned}
\text{data-ink ratio} &= \frac{\text{data-ink}}{\text{total ink used in visualization}}\\
&= 1 - \text{proportion of a graphic that can be erased without loss of data-information}
\end{aligned}`
$$

--

### Tufte's Principles

.pull-left[
* Above all else show the data.
* Maximize the data-ink ratio.
* Erase non-data-ink.
* Erase redundant data-ink.
* Revise and edit.
]

.pull-right[
<img src="images/tufte.png" width="80%" style="display: block; margin: auto;" />
]

---

## Above All Else Show the Data

<div class="figure" style="text-align: center">
<img src="images/playfair-england-usa.jpg" alt="William Playfair, The Commercial and Political Atlas (1786). Public domain." width="80%" />
<p class="caption">William Playfair, The Commercial and Political Atlas (1786). Public domain.</p>
</div>

---

## Above All Else Show the Data

<div class="figure" style="text-align: center">
<img src="images/playfair-nordic-england.jpg" alt="William Playfair, The Commercial and Political Atlas (1786). Public domain." width="80%" />
<p class="caption">William Playfair, The Commercial and Political Atlas (1786). Public domain.</p>
</div>

---

## Maximize the Data-Ink Ratio

Non-data-ink may distract your audience from what really matters: the data. Maximizing the data-ink ratio (within reason) is a good rule of thumb.
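
A minimal sketch of erasing non-data-ink in ggplot2, assuming the built-in `mpg` data (also used later in these slides); the variables and the particular elements removed are illustrative choices, not necessarily how the figures on this slide were made:

```r
library(ggplot2)

# Default ggplot2 look: a grey panel background with white grid lines,
# neither of which encodes any data
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Erase that non-data-ink; the points, axis ticks, and axis labels remain
p_lean <- p +
  theme(
    panel.background = element_blank(),
    panel.grid = element_blank()
  )

p_lean
```

Ready-made themes such as `theme_classic()` (ggplot2) or `theme_tufte()` (ggthemes) remove similar elements in one call.
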
.pull-left[
<div class="figure" style="text-align: center">
<img src="06-data-ink_files/figure-html/unnamed-chunk-4-1.png" alt="low data-ink ratio" width="360" />
<p class="caption">low data-ink ratio</p>
</div>
]

--

.pull-right[
<div class="figure" style="text-align: center">
<img src="06-data-ink_files/figure-html/unnamed-chunk-5-1.png" alt="high data-ink ratio" width="360" />
<p class="caption">high data-ink ratio</p>
</div>
]

---

## Redundant Data-Ink

**redundant** data-ink: ink displaying information already shown by other ink

.pull-left[
<div class="figure" style="text-align: center">
<img src="06-data-ink_files/figure-html/unnamed-chunk-6-1.png" alt="lots of redundant ink" width="360" />
<p class="caption">lots of redundant ink</p>
</div>
]

--

.pull-right[
<div class="figure" style="text-align: center">
<img src="06-data-ink_files/figure-html/unnamed-chunk-7-1.png" alt="no redundancy" width="360" />
<p class="caption">no redundancy</p>
</div>
]

**note:** redundancy is not always bad

---

## ggthemes

**ggthemes** <a name=cite-arnold2019></a>([Arnold, 2019](https://CRAN.R-project.org/package=ggthemes)) provides extra themes, scales, and geoms for ggplot2, including the Tufte-inspired `theme_tufte()` and `geom_tufteboxplot()`

.pull-left[
```r
library(ggplot2)
library(ggthemes)

ggplot(mpg, aes(drv, cty)) +
* geom_tufteboxplot() +
* theme_tufte(base_size = 16)
```

<img src="06-data-ink_files/figure-html/unnamed-chunk-8-1.png" width="360" style="display: block; margin: auto;" />
]

.pull-right[
```r
ggplot(mpg, aes(drv, cty)) +
  geom_boxplot(
    stat = "fivenumber"
  )
```

<img src="06-data-ink_files/figure-html/unnamed-chunk-9-1.png" width="360" style="display: block; margin: auto;" />
]

---

## References

<a name=bib-arnold2019></a>[Arnold, J. B.](#cite-arnold2019) (2019). _ggthemes: Extra Themes, Scales and Geoms for 'ggplot2'_. R package version 4.2.0. URL: [https://CRAN.R-project.org/package=ggthemes](https://CRAN.R-project.org/package=ggthemes).

<a name=bib-tufte2001></a>[Tufte, E. R.](#cite-tufte2001) (2001). _The Visual Display of Quantitative Information_. 2nd ed. Cheshire, CT, USA: Graphics Press. ISBN: 978-1-930824-13-3.