class: middle, center, title-slide

.title[
# Multivariate Data
]
.subtitle[
## Data Visualization
]
.author[
### Johan Larsson
]
.author[
### Behnaz Pirzamanbein
]
.institute[
### The Department of Statistics, Lund University
]

---

## Visualizing Multivariate Data

Visualizations are often most intriguing when they show multiple variables.

<div class="figure" style="text-align: center">
<img src="images/minard-migration.jpg" alt="The Emigrants of the World, Charles Joseph Minard, 1858." width="78%" />
<p class="caption">The Emigrants of the World, Charles Joseph Minard, 1858.</p>
</div>

---

## 3D

.pull-left-40[

3D seems like a logical choice for a third continuous variable.

Unfortunately, 3D visualizations are notoriously hard to read.

As a rule of thumb, **avoid 3D visualizations**.

]

.pull-right-60[

<div class="figure" style="text-align: center">
<img src="multivariate-data_files/figure-html/unnamed-chunk-2-1.png" alt="A 3D scatter plot." width="504" />
<p class="caption">A 3D scatter plot.</p>
</div>

]

---

<div class="figure" style="text-align: center">
<img src="multivariate-data_files/figure-html/unnamed-chunk-3-1.png" alt="A particularly bad example of a 3D plot. 3D bar charts are never a good idea. Data shows death rates in Virginia in 1940." width="576" />
<p class="caption">A particularly bad example of a 3D plot. 3D bar charts are never a good idea. Data shows death rates in Virginia in 1940.</p>
</div>

---

### 3D Visualizations Are Not All Bad!

3D visualizations may be useful in a few situations:

1. interactive or animated 3D plots
2. multiple plots showing multiple perspectives
3. only the general pattern matters: reading individual points is not important
4. the data actually represent three-dimensional locations (topography)

--

<div class="figure" style="text-align: center">
<img src="multivariate-data_files/figure-html/unnamed-chunk-4-1.png" alt="The topography of the Maunga Whau Volcano through different perspectives."
width="720" />
<p class="caption">The topography of the Maunga Whau Volcano through different perspectives.</p>
</div>

---

## Color

Color is often a good choice, especially with categorical variables.

```r
mtcars <- rownames_to_column(mtcars, "name") %>%
  mutate_at(vars(cyl, gear), as.factor)

*ggplot(mtcars, aes(disp, hp, color = cyl)) +
  geom_point()
```

<img src="multivariate-data_files/figure-html/unnamed-chunk-5-1.png" width="345.6" style="display: block; margin: auto;" />

---

### Color and Continuous Variables

Sometimes, it is fine to map color to continuous variables, particularly when the data are in a grid format.

```r
# see source code for dataset
ggplot(volcano_long, aes(longitude, latitude, fill = height)) +
  geom_tile() +
  coord_fixed() # makes sense for latitude, longitude
```

<img src="multivariate-data_files/figure-html/unnamed-chunk-7-1.png" width="540" style="display: block; margin: auto;" />

---

### Scatter Plots

Mapping a continuous variable to color in a scatter plot is less appealing.

```r
ggplot(mtcars, aes(disp, hp, color = mpg)) +
  geom_point()
```

<img src="multivariate-data_files/figure-html/unnamed-chunk-8-1.png" width="432" style="display: block; margin: auto;" />

---

## Size

Mapping to size only makes sense for continuous variables!

.pull-left-40[

### 1. Area

Most common use case:

- bubble plots

Caveat: comparing areas is **hard**

]

.pull-right-60[

```r
ggplot(mtcars, aes(disp, hp, size = mpg)) +
  geom_point()
```

<img src="multivariate-data_files/figure-html/unnamed-chunk-9-1.png" width="396" style="display: block; margin: auto;" />

]

---

### 2. Width

Mapping to width can be useful but has few use cases.

```r
# see source for dataset
ggplot(troops, aes(long, lat, group = group,
                   color = direction, size = survivors)) +
  geom_path(lineend = "round")
```

<div class="figure" style="text-align: center">
<img src="multivariate-data_files/figure-html/unnamed-chunk-11-1.png" alt="Basic reproduction of Minard's Napoleon chart."
width="720" />
<p class="caption">Basic reproduction of Minard's Napoleon chart.</p>
</div>

---

## Shape

Shape works OK for categorical variables but is usually not preferable to color.

Mapping to line shapes generally works better than mapping to point shapes.

```r
*ggplot(mtcars, aes(disp, hp, shape = cyl)) +
  geom_point()
```

<img src="multivariate-data_files/figure-html/unnamed-chunk-12-1.png" width="432" style="display: block; margin: auto;" />

---

## Text

Text often makes more sense when each observation has a meaningful identity.

[ggrepel](https://CRAN.R-project.org/package=ggrepel) is a useful package when mapping to text (`geom_text_repel()`).

.pull-left[

```r
library(ggrepel)

head(msleep, 15) %>%
  ggplot(aes(brainwt, sleep_total, label = name)) +
  geom_point() +
* geom_text_repel() +
  scale_x_log10()
```

]

.pull-right[

<img src="multivariate-data_files/figure-html/unnamed-chunk-13-1.png" width="345.6" style="display: block; margin: auto;" />

]

---

## Facets

Facets split a dataset into subsets and plot them as small multiples.

### 1. Wrap

Use `facet_wrap()` when you have **one** variable to facet on.

```r
ggplot(mtcars, aes(disp, hp)) +
  geom_point() +
* facet_wrap(vars(cyl))
```

<img src="multivariate-data_files/figure-html/unnamed-chunk-14-1.png" width="576" style="display: block; margin: auto;" />

---

### 2. Grid

Use `facet_grid()` when you have **two** variables to facet on.

```r
ggplot(mtcars, aes(disp, hp)) +
  geom_point() +
* facet_grid(vars(cyl), vars(gear))
```

<img src="multivariate-data_files/figure-html/unnamed-chunk-15-1.png" width="576" style="display: block; margin: auto;" />

---

### Facets and Continuous Variables

If sacrificing detail is OK, you can facet by transforming a continuous variable into an ordinal one.
```r
mtcars %>%
* mutate(hp_cat = cut_interval(hp, 3)) %>%
  ggplot(aes(mpg, disp)) +
  geom_point() +
  facet_wrap(vars(hp_cat))
```

<img src="multivariate-data_files/figure-html/unnamed-chunk-16-1.png" width="720" style="display: block; margin: auto;" />

---

## Combining Our Tools

Putting together these building blocks allows us to create interesting and complex visualizations.

```r
tail(mtcars, 15) %>%
  ggplot(aes(hp, wt, color = cyl, label = name)) +
  geom_point() +
  geom_text_repel(show.legend = FALSE) + # Prevent text labels from appearing in the legend
  facet_wrap(vars(gear))
```

<img src="multivariate-data_files/figure-html/unnamed-chunk-17-1.png" width="720" style="display: block; margin: auto;" />
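
---

### Combining Our Tools: A Further Sketch

As one more illustration of combining these building blocks, the sketch below shows five variables at once: position (`disp`, `mpg`), color (`cyl`), shape (`gear`), and facets built from a binned `hp`. It is only one possible combination, not a recommendation. The names `cars` and `binned` are introduced here for illustration, and the data preparation is repeated (using `across()`) so that the chunk runs on its own.

```r
library(ggplot2)
library(dplyr)
library(tibble)

# Rebuild the modified mtcars from the Color slide so this chunk is
# self-contained; `cars` is a name chosen for this sketch only
cars <- datasets::mtcars %>%
  rownames_to_column("name") %>%
  mutate(across(c(cyl, gear), as.factor))

# Bin the continuous hp into three equal-width intervals for faceting
binned <- cars %>%
  mutate(hp_cat = cut_interval(hp, 3))

# Position, color, shape, and facets each carry a different variable
ggplot(binned, aes(disp, mpg, color = cyl, shape = gear)) +
  geom_point(size = 2) +
  facet_wrap(vars(hp_cat))
```

Every added mapping costs readability, so it is worth checking whether each one still helps the reader.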