Counting "Select All That Apply" Questions in Qualtrics

Qualtrics Messy Data

My friend Devon Cantwell reached out with an interesting messy-data problem caused by how Qualtrics exports “select all that apply” variables. For example, in her (mock) survey, she asks students to select all the colors that they personally find attractive from a list. When downloaded from Qualtrics, we get a dataframe that looks like this:

glimpse(dat)
## Rows: 940
## Columns: 4
## $ color_1 <fct> Sparkle, Blue, Blue, Sparkle, Blue, Sparkle, Sparkle, Green, B~
## $ color_2 <fct> NA, Moldy Book, NA, Moldy Book, Moldy Book, Honey Bee, Moldy B~
## $ color_3 <fct> NA, Apple Core Brown, NA, Apple Core Brown, NA, NA, NA, NA, NA~
## $ color_4 <fct> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~

So in the preview, every student picks at least one color, some pick two, but relatively few pick three or four. One thing we might want to know is the first color selected by each respondent. That’s relatively easy:

dat %>% count(color_1)
## # A tibble: 8 x 2
##   color_1              n
##   <fct>            <int>
## 1 Blue               233
## 2 Green              134
## 3 Yellow              14
## 4 Sparkle            189
## 5 Apple Core Brown     6
## 6 Honey Bee           13
## 7 Moldy Book          42
## 8 <NA>               309

But this only tells us the first color selected, not how many times each color was selected. What if we want to count all the instances where “Moldy Book” was selected, across columns? Or get a succinct answer for all colors at once? Because the columns are not ordered in any meaningful way, and the respondent wasn’t asked for an ordered preference, we need to count across the variables.
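For a single color, one option is to count matching cells directly in base R. The sketch below is only an illustration: `toy` is a hypothetical four-row stand-in for `dat` (it is not the real survey data), and `moldy_total` is a name made up for this example.

```r
# Hypothetical toy data shaped like the Qualtrics export (not the real survey).
toy <- data.frame(
  color_1 = factor(c("Sparkle", "Blue", "Blue", NA)),
  color_2 = factor(c(NA, "Moldy Book", NA, "Moldy Book")),
  color_3 = factor(c(NA, "Apple Core Brown", NA, NA)),
  color_4 = factor(c(NA, NA, NA, NA))
)

# Convert each factor column to character so differing factor levels
# cannot interfere, then flatten all cells into one vector.
cells <- unlist(lapply(toy, as.character), use.names = FALSE)

# Count the cells equal to the color of interest; NA cells are skipped.
moldy_total <- sum(cells == "Moldy Book", na.rm = TRUE)
moldy_total
```

This handles one color at a time. Counting every color at once is easier after reshaping the data.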

We can use tidyr for a quick solution:

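library(dplyr)  # provides %>% and count()
library(tidyr)  # provides gather()

dat %>%
  # Stack all four color columns into one long "value" column, dropping NAs
  gather(key, value, na.rm = TRUE) %>%
  # Tally how often each color appears in any column
  count(value)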
## Warning: attributes are not identical across measure variables;
## they will be dropped
## # A tibble: 7 x 2
##   value                n
##   <chr>            <int>
## 1 Apple Core Brown    78
## 2 Blue               233
## 3 Green              134
## 4 Honey Bee           32
## 5 Moldy Book         222
## 6 Sparkle            230
## 7 Yellow              38

Good thing we checked! It turns out that Sparkle (230) and Moldy Book (222) are basically just as popular as Blue (233)! If we had stopped with just checking the first color picked, our inference for color preference would have been way off.
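A note on the warning above: `gather()` complains because the four columns are factors with different levels, so it drops the factor attributes and returns character values. `gather()` is also superseded in current tidyr by `pivot_longer()`. Here is a rough sketch of the same count with `pivot_longer()`, again using a hypothetical `toy` data frame rather than the real `dat`. Converting the columns to character first is my own choice to keep the types explicit, and `toy_counts` is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data shaped like the Qualtrics export (not the real survey).
toy <- tibble(
  color_1 = factor(c("Sparkle", "Blue", "Blue", NA)),
  color_2 = factor(c(NA, "Moldy Book", NA, "Moldy Book")),
  color_3 = factor(c(NA, "Apple Core Brown", NA, NA)),
  color_4 = factor(c(NA, NA, NA, NA))
)

toy_counts <- toy %>%
  # Make every column plain character so the columns share one type.
  mutate(across(everything(), as.character)) %>%
  # Stack all columns into key/value pairs, dropping unselected (NA) cells.
  pivot_longer(everything(), names_to = "key", values_to = "value",
               values_drop_na = TRUE) %>%
  # Tally each color across all columns, most frequent first.
  count(value, sort = TRUE)

toy_counts
```

On the real data, swapping `toy` for `dat` should give the same totals as the `gather()` version, just sorted by frequency.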

Ian T. Adams
Ph.D. Candidate and Instructor

My research interests include public workplace surveillance, policing, and emotional labor.
