class: middle, center, title-slide

# Categorical Data
## STAE04: Data Visualization
### Johan Larsson
### The Department of Statistics, Lund University

---

## Visualizing Categorical Data

visualizing categorical data usually comes down to visualizing **proportions**

<div class="figure" style="text-align: center">
<img src="09-categorical-data_files/figure-html/unnamed-chunk-1-1.png" alt="A 'scatter plot' of categorical data; not very appealing." width="2000" />
<p class="caption">A 'scatter plot' of categorical data; not very appealing.</p>
</div>

ggplot2 has **limited** capabilities for visualizing categorical data

---

## Barcharts

simple and readable

```r
library(tidyverse)     # ggplot2 and tidyr (for drop_na)
library(productplots)  # for happy data set

happy <- drop_na(happy)

ggplot(happy, aes(happy)) +
* geom_bar()
```

<div class="figure" style="text-align: center">
<img src="09-categorical-data_files/figure-html/unnamed-chunk-2-1.png" alt="Happiness ratings from the GSS in the US." width="432" />
<p class="caption">Happiness ratings from the GSS in the US.</p>
</div>

---

## Grouped Barchart

good when counts are what matters and group sizes are important, but hard to judge within categories

```r
ggplot(happy, aes(degree, fill = happy)) +
* geom_bar(position = "dodge")
```

<img src="09-categorical-data_files/figure-html/unnamed-chunk-3-1.png" width="720" style="display: block; margin: auto;" />

---

## Stacked Barchart

same use case as grouped barcharts, but easier to compare within categories and harder to compare between them

```r
ggplot(happy, aes(degree, fill = happy)) +
* geom_bar() # position = "stack" is the default
```

<img src="09-categorical-data_files/figure-html/unnamed-chunk-4-1.png" width="720" style="display: block; margin: auto;" />

---

## Proportional Stacked Barchart

easy to compare both between and within categories, but loses information on category size

```r
ggplot(happy, aes(degree, fill = happy)) +
* geom_bar(position = "fill") +
  ylab("Proportion")
```

<img src="09-categorical-data_files/figure-html/unnamed-chunk-5-1.png" width="720" style="display: block; margin: auto;" />

---

## Mosaic Plots

a type of stacked barchart that maps category size to the width of bars

need to use another package for this: [productplots](https://CRAN.R-project.org/package=productplots), [ggmosaic](https://CRAN.R-project.org/package=ggmosaic), or [vcd](https://CRAN.R-project.org/package=vcd)

```r
library(productplots)

*prodplot(happy, ~ happy + degree) +
  aes(fill = happy) +
  theme(legend.position = "none") # remove legend
```

<img src="09-categorical-data_files/figure-html/unnamed-chunk-6-1.png" width="720" style="display: block; margin: auto;" />

---

## Mappings

choice of mappings with categorical data is important

```r
*ggplot(happy, aes(happy, fill = degree)) +
  geom_bar(position = "fill", col = 1) +
  ylab("Proportion")
```

<div class="figure" style="text-align: center">
<img src="09-categorical-data_files/figure-html/unnamed-chunk-7-1.png" alt="The proportional stacked bar chart with a different mapping." width="720" />
<p class="caption">The proportional stacked bar chart with a different mapping.</p>
</div>

pay attention to **which relationship you want to display**

---

## Waffle Plots

suitable when there are large differences between categories

again need to use another package: [waffle](https://CRAN.R-project.org/package=waffle)

<div class="figure" style="text-align: center">
<img src="09-categorical-data_files/figure-html/unnamed-chunk-8-1.png" alt="A waffle plot of the happiness data. Every square represents 100 people. The code is quite complicated (see the source code)." width="720" />
<p class="caption">A waffle plot of the happiness data. Every square represents 100 people. The code is quite complicated (see the source code).</p>
</div>

---

## Euler and Venn Diagrams

useful when relationships between categorical variables are of interest

**proportional** diagrams are best, but not always possible for more than two categories

.pull-left[

```r
library(eulerr)

combo <- c(
  "happy" = 100,
  "graduated" = 40,
  "happy&graduated" = 35
)

# fit and inspect
r <- euler(combo)

# plot
plot(r)
```

]

.pull-right[

<img src="09-categorical-data_files/figure-html/unnamed-chunk-9-1.png" width="345.6" style="display: block; margin: auto;" />

]

.footnote[
Disclaimer: Johan Larsson is the author of [eulerr](https://CRAN.R-project.org/package=eulerr).
]

---

## More on Visualizing Categorical Data

there is much more to learn about visualizing categorical data

packages [vcd](https://CRAN.R-project.org/package=vcd) and [vcdExtra](https://CRAN.R-project.org/package=vcdExtra) offer lots of functionality

<div class="figure" style="text-align: center">
<img src="09-categorical-data_files/figure-html/unnamed-chunk-10-1.png" alt="A plot from the vcd package." width="338.4" />
<p class="caption">A plot from the vcd package.</p>
</div>
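
---

## Example: Mosaic Plot with vcd

as a minimal sketch of what vcd offers, the code below cross-tabulates the happiness data and draws a shaded mosaic plot. It assumes that the vcd and productplots packages are installed; the object name `tab` and the choice of `shade = TRUE` are illustrative and not necessarily how the figure on the previous slide was produced.

```r
library(vcd)           # mosaic()
library(productplots)  # for happy data set

# cross-tabulate degree and happiness
# (missing values are excluded from the counts by default)
tab <- xtabs(~ degree + happy, data = happy)

# proportions of happiness ratings within each degree (rows sum to 1)
round(prop.table(tab, margin = 1), 2)

# mosaic plot of the table; shade = TRUE shades tiles by residuals
# from the independence model (an illustrative choice)
mosaic(tab, shade = TRUE)
```

the row proportions carry the same information as the proportional stacked barchart, while the mosaic plot also shows category sizes through the tile areas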