class: middle, center, title-slide

# Multivariate Data

## STAE04: Data Visualization

### Johan Larsson

### The Department of Statistics, Lund University

---

## Visualizing Multivariate Data

visualizations are often most intriguing when they show multiple variables

<div class="figure" style="text-align: center">
<img src="images/minard-migration.jpg" alt="The Emigrants of the World, Charles Joseph Minard, 1858." width="78%" />
<p class="caption">The Emigrants of the World, Charles Joseph Minard, 1858.</p>
</div>

---

## 3D

.pull-left-40[

seems like a logical choice for a third continuous variable

unfortunately, 3D visualizations are notoriously hard to read

as a rule of thumb, **avoid 3D visualizations**

]

.pull-right-60[
<div class="figure" style="text-align: center">
<img src="10-multivariate-data_files/figure-html/unnamed-chunk-2-1.png" alt="A 3D scatter plot." width="504" />
<p class="caption">A 3D scatter plot.</p>
</div>
]

---

<div class="figure" style="text-align: center">
<img src="10-multivariate-data_files/figure-html/unnamed-chunk-3-1.png" alt="A particularly bad example of a 3D plot. 3D bar charts are never a good idea. Data shows death rates in Virginia in 1940." width="576" />
<p class="caption">A particularly bad example of a 3D plot. 3D bar charts are never a good idea. Data shows death rates in Virginia in 1940.</p>
</div>

---

### 3D Visualizations Are Not All Bad!

3D visualizations may be useful in a few situations:

1. interactive or animated 3D plots
2. multiple plots showing multiple perspectives
3. cases where only the general pattern matters and reading individual points is not important
4. data that actually represents three-dimensional locations, such as topography

--

<div class="figure" style="text-align: center">
<img src="10-multivariate-data_files/figure-html/unnamed-chunk-4-1.png" alt="The topography of the Maunga Whau Volcano through different perspectives."
width="720" />
<p class="caption">The topography of the Maunga Whau Volcano through different perspectives.</p>
</div>

---

## Color

Color is often a good choice, especially with categorical variables.

```r
# tidyverse provides rownames_to_column(), mutate_at(), %>% and ggplot2
library(tidyverse)

# keep the car names as a column and treat cyl and gear as categorical
mtcars <- rownames_to_column(mtcars, "name") %>%
  mutate_at(vars(cyl, gear), as.factor)

*ggplot(mtcars, aes(disp, hp, color = cyl)) +
  geom_point()
```

<img src="10-multivariate-data_files/figure-html/unnamed-chunk-5-1.png" width="345.6" style="display: block; margin: auto;" />

---

### Color and Continuous Variables

it is sometimes fine to map color to continuous variables, particularly when the data is in a grid format

```r
# see source code for dataset
ggplot(volcano_long, aes(longitude, latitude, fill = height)) +
* geom_tile() +
  coord_fixed() # makes sense for latitude, longitude
```

<img src="10-multivariate-data_files/figure-html/unnamed-chunk-7-1.png" width="540" style="display: block; margin: auto;" />

---

### Scatter Plots

Mapping a continuous variable to color in a scatter plot is less appealing.

```r
ggplot(mtcars, aes(disp, hp, color = mpg)) +
  geom_point()
```

<img src="10-multivariate-data_files/figure-html/unnamed-chunk-8-1.png" width="432" style="display: block; margin: auto;" />

---

## Size

Mapping to size only makes sense for continuous variables!

.pull-left-40[

### Area

most common use case: bubble plots

caveat: comparing areas is **hard**

]

.pull-right-60[

```r
ggplot(mtcars, aes(disp, hp, size = mpg)) +
  geom_point()
```

<img src="10-multivariate-data_files/figure-html/unnamed-chunk-9-1.png" width="396" style="display: block; margin: auto;" />
]

---

### Width

mapping to width can be useful but has few use cases

```r
# see source for dataset
# note: ggplot2 >= 3.4.0 prefers linewidth = survivors for lines
ggplot(troops, aes(long,
                   lat,
                   group = group,
                   color = direction,
                   size = survivors)) +
  geom_path(lineend = "round")
```

<div class="figure" style="text-align: center">
<img src="10-multivariate-data_files/figure-html/unnamed-chunk-11-1.png" alt="Basic reproduction of Minard's Napoleon chart."
width="720" />
<p class="caption">Basic reproduction of Minard's Napoleon chart.</p>
</div>

---

## Shape

works quite well for categorical variables, but is usually not preferable to color

mapping to line shapes generally works better than mapping to point shapes

```r
*ggplot(mtcars, aes(disp, hp, shape = cyl)) +
  geom_point()
```

<img src="10-multivariate-data_files/figure-html/unnamed-chunk-12-1.png" width="432" style="display: block; margin: auto;" />

---

## Text

often makes sense when each observation has a meaningful identity

[ggrepel](https://CRAN.R-project.org/package=ggrepel) is a useful package when mapping to text (`geom_text_repel()`)

.pull-left[

```r
library(ggrepel)

head(msleep, 15) %>%
  ggplot(aes(brainwt, sleep_total, label = name)) +
  geom_point() +
* geom_text_repel() +
  scale_x_log10()
```

]

.pull-right[
<img src="10-multivariate-data_files/figure-html/unnamed-chunk-13-1.png" width="345.6" style="display: block; margin: auto;" />
]

---

## Facets

facets split a dataset and plot the parts as small multiples

### Wrap

Use `facet_wrap()` when you have **one** variable to facet on.

```r
ggplot(mtcars, aes(disp, hp)) +
  geom_point() +
* facet_wrap(vars(cyl))
```

<img src="10-multivariate-data_files/figure-html/unnamed-chunk-14-1.png" width="720" style="display: block; margin: auto;" />

---

### Grid

Use `facet_grid()` when you have **two** variables to facet on.

```r
ggplot(mtcars, aes(disp, hp)) +
  geom_point() +
* facet_grid(vars(cyl), vars(gear))
```

<img src="10-multivariate-data_files/figure-html/unnamed-chunk-15-1.png" width="576" style="display: block; margin: auto;" />

---

### Facets and Continuous Variables

If sacrificing detail is OK, you can facet by transforming a continuous variable into an ordinal one.
```r
mtcars %>%
* mutate(hp_cat = cut_interval(hp, 3)) %>%
  ggplot(aes(mpg, disp)) +
  geom_point() +
  facet_wrap(vars(hp_cat))
```

<img src="10-multivariate-data_files/figure-html/unnamed-chunk-16-1.png" width="720" style="display: block; margin: auto;" />

---

## Combining Our Tools

Putting together these building blocks allows us to create interesting and complex visualizations.

```r
tail(mtcars, 15) %>%
  ggplot(aes(hp, wt, color = cyl, label = name)) +
  geom_point() +
  geom_text_repel() +
  facet_wrap(vars(gear))
```

<img src="10-multivariate-data_files/figure-html/unnamed-chunk-17-1.png" width="720" style="display: block; margin: auto;" />
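
---

### Facets Instead of Continuous Color

A minimal sketch, not part of the original slides, that combines two of the ideas from earlier in the deck: instead of mapping the continuous `mpg` to color, it is binned and used for faceting, while color is kept for the categorical `cyl`. The names `cars`, `mpg_cat`, and `p` are made up for illustration, and ggplot2's `cut_number()` (equal-count bins) is swapped in for `cut_interval()` purely as an assumption about what a reader might want to try.

```r
library(ggplot2)

# assumption: start from the untouched built-in data so the sketch is self-contained
cars <- transform(datasets::mtcars, cyl = factor(cyl))

# bin mpg into 3 groups with roughly equal numbers of cars
# (cut_interval() would give equal-width bins instead)
cars$mpg_cat <- cut_number(cars$mpg, 3)

# color carries the categorical variable, facets carry the binned continuous one
p <- ggplot(cars, aes(disp, hp, color = cyl)) +
  geom_point() +
  facet_wrap(vars(mpg_cat))

p
```

Equal-count bins keep the number of points per panel similar, at the cost of bin widths that differ from panel to panel; which trade-off is preferable depends on the data.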