class: middle, center, title-slide

# One Variable
## STAE04: Data Visualization
### Johan Larsson
### The Department of Statistics, Lund University

---

## Visualizing a Single Variable

best place to start: continuous or discrete one-dimensional data

often need **statistical transformation** or **position adjustment**

--

Dotplots are great for small and discrete data.

```r
library(tidyverse)

ggplot(faithful, aes(waiting)) +
* geom_dotplot(binwidth = 1)
```

<div class="figure" style="text-align: center">
<img src="05-one-variable_files/figure-html/unnamed-chunk-1-1.png" alt="Duration between eruptions of the Old Faithful geyser." width="504" />
<p class="caption">Duration between eruptions of the Old Faithful geyser.</p>
</div>

---

## Histograms

.pull-left[
For larger datasets it is often better to use a statistical transformation.

Histograms accomplish this by **binning** and **summarizing** (counting).

**caution:** choice of bins may introduce bias
]

.pull-right[
```r
faithful %>%
  ggplot(aes(waiting)) +
* geom_histogram(bins = 15)
```

<div class="figure" style="text-align: center">
<img src="05-one-variable_files/figure-html/unnamed-chunk-2-1.png" alt="Histogram of Old Faithful data."
width="360" />
<p class="caption">Histogram of Old Faithful data.</p>
</div>
]

---

## Density Plots

attractive for continuous variables, but sensitive to settings (type of kernel, bandwidth, and more)

.pull-left[
```r
faithful %>%
  ggplot(aes(waiting)) +
  # narrower bandwidth
* geom_density(bw = 5)
```

<img src="05-one-variable_files/figure-html/unnamed-chunk-3-1.png" width="360" style="display: block; margin: auto;" />
]

.pull-right[
```r
faithful %>%
  ggplot(aes(waiting)) +
  # wider bandwidth
* geom_density(bw = 10)
```

<img src="05-one-variable_files/figure-html/unnamed-chunk-4-1.png" width="360" style="display: block; margin: auto;" />
]

often a good idea to add a **rug** layer (`geom_rug()`) to density plots

---

## Boxplots

.pull-left[
most common type:

- **median** as middle bar,
- 1st and 3rd **quartiles** as edges of box,
- whiskers to the most extreme observation within 1.5 times the interquartile range of the box edges, and
- points (or stars) for observations beyond the whiskers

useful when visualizing categories or many variables at once

more abstraction than a histogram or density plot

not suitable for data with multiple modes
]

.pull-right[
```r
faithful %>%
  ggplot(aes(waiting)) +
* geom_boxplot()
```

<div class="figure" style="text-align: center">
<img src="05-one-variable_files/figure-html/unnamed-chunk-5-1.png" alt="Boxplot of the Old Faithful data, which fails completely to describe the distribution accurately." width="360" />
<p class="caption">Boxplot of the Old Faithful data, which fails completely to describe the distribution accurately.</p>
</div>
]

---

## Violin Plots

a type of density plot, often used instead of a boxplot

same pitfalls as with other density plots

```r
faithful %>%
  ggplot(aes(waiting, y = 1)) +
* geom_violin()
```

<div class="figure" style="text-align: center">
<img src="05-one-variable_files/figure-html/unnamed-chunk-6-1.png" alt="geom_violin() does not work with a single variable, so we need to use the trick y = 1 here."
width="468" />
<p class="caption">geom_violin() does not work with a single variable, so we need to use the trick y = 1 here.</p>
</div>

---

## Several Variables at Once

boxplots and violin plots are useful for visualizing multiple variables at once

need to first transform data into **long** format

```r
*pivot_longer(mpg, c(displ, cty, hwy)) %>%
  ggplot(aes(name, value)) +
  geom_boxplot()
```

<img src="05-one-variable_files/figure-html/unnamed-chunk-7-1.png" width="360" style="display: block; margin: auto;" />

---

### Facets

mostly used to visualize **small multiples** by splitting the plot into facets on a categorical variable

but can also be used to visualize multiple variables at once

for this use case, it is normally good to set `scales = "free_x"` (or sometimes `"free"`)

```r
pivot_longer(mpg, c(displ, cty, hwy)) %>%
  ggplot(aes(value)) +
  geom_histogram(bins = 10) +
* facet_wrap("name", scales = "free_x")
```

<img src="05-one-variable_files/figure-html/unnamed-chunk-8-1.png" width="648" style="display: block; margin: auto;" />
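
---

### Facets: Checking the Ranges

why free scales help when facetting on several variables: a minimal sketch, assuming the default `name` and `value` columns that `pivot_longer()` creates. The object names `long` and `ranges` are illustrative and not part of the original slides. It reshapes `mpg` to long format and summarizes the range of each variable.

```r
library(tidyverse)

# reshape to long format: one row per (car, variable) pair
long <- pivot_longer(mpg, c(displ, cty, hwy))

# smallest and largest value of each variable
ranges <- long %>%
  group_by(name) %>%
  summarize(min = min(value), max = max(value))

ranges
```

if the ranges differ a lot, a shared x axis squeezes some facets into a narrow strip, which is when `scales = "free_x"` tends to pay off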