What is Qualtrics?

Per Qualtrics: “We build technology that closes experience gaps.”

Basically, it is an online survey/data collection software platform.

Their customer base includes:


Getting CSV Data from Qualtrics

There are basically two paths to getting csv data from Qualtrics: Indirect and Direct.


Research Problem:

How does one reduce potential error and the amount of steps to export CSV files from Qualtrics to R?

Historically, indirectly exporting .csv data from Qualtrics and uploading to R has been more complicated than it needs to be.

Problem with Indirect Export

The .csv files includes rows with extraneous information.

By adding a third row (when using Qualtrics’s current standard way of exporting .csv data), Qualtrics’s .csv files cannot be read properly by standard import functions in R (e.g., read.csv()).

R interprets the whole data frame as text/string because the second row includes characters rather than numbers.

These extra rows invites more errors importing data into R.


Benefits of Using a Qualtrics API

Automate repetitive processes and pass information in and out of Qualtrics.

Reduce error.

Setting up a Survey

Insert Survey Page — —

Viewing all Surveys


Generating a Token


How to Get Started with Qualtrics in R

To use the API feature, the package “qualtrRics” is necessary. If not using the API, it is not needed. I also loaded tidyverse to be able to transform the data.

The Qualtric API Reference Homepage has a Qualtrics API Extension Overview: https://www.qualtrics.com/support/integrations/api-integration/overview/

Load Libraries

library(qualtRics)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.4     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(leaflet)

INDIRECT METHOD

Downloading CSV file to GitHub and uploading it in R

SurveyDownload <- read.csv("https://raw.githubusercontent.com/professorfoy/DATA-607/main/DATA%20607%20Qualtrics%20Usage%20Survey_November%206%2C%202021_09.00.csv")

By default, Qualtrics will generate a number of extraneous columns:

view(SurveyDownload)
names(SurveyDownload)
##  [1] "StartDate"             "EndDate"               "Status"               
##  [4] "IPAddress"             "Progress"              "Duration..in.seconds."
##  [7] "Finished"              "RecordedDate"          "ResponseId"           
## [10] "RecipientLastName"     "RecipientFirstName"    "RecipientEmail"       
## [13] "ExternalReference"     "LocationLatitude"      "LocationLongitude"    
## [16] "DistributionChannel"   "UserLanguage"          "Q2"

Note below that R interprets the data frame as text/string. Note that the Progress, Duration, Longitude and Latitude of IP Addresses are all recognized as characters.

str(SurveyDownload)
## 'data.frame':    16 obs. of  18 variables:
##  $ StartDate            : chr  "Start Date" "{\"ImportId\":\"startDate\",\"timeZone\":\"America/New_York\"}" "11/4/2021 16:46" "11/4/2021 16:45" ...
##  $ EndDate              : chr  "End Date" "{\"ImportId\":\"endDate\",\"timeZone\":\"America/New_York\"}" "11/4/2021 16:46" "11/4/2021 16:47" ...
##  $ Status               : chr  "Response Type" "{\"ImportId\":\"status\"}" "IP Address" "IP Address" ...
##  $ IPAddress            : chr  "IP Address" "{\"ImportId\":\"ipAddress\"}" "172.58.225.90" "99.35.202.103" ...
##  $ Progress             : chr  "Progress" "{\"ImportId\":\"progress\"}" "100" "100" ...
##  $ Duration..in.seconds.: chr  "Duration (in seconds)" "{\"ImportId\":\"duration\"}" "14" "79" ...
##  $ Finished             : chr  "Finished" "{\"ImportId\":\"finished\"}" "TRUE" "TRUE" ...
##  $ RecordedDate         : chr  "Recorded Date" "{\"ImportId\":\"recordedDate\",\"timeZone\":\"America/New_York\"}" "11/4/2021 16:46" "11/4/2021 16:47" ...
##  $ ResponseId           : chr  "Response ID" "{\"ImportId\":\"_recordId\"}" "R_1LHzJS2E8VU54sP" "R_3qWMlpOe7pKhzQf" ...
##  $ RecipientLastName    : chr  "Recipient Last Name" "{\"ImportId\":\"recipientLastName\"}" "" "" ...
##  $ RecipientFirstName   : chr  "Recipient First Name" "{\"ImportId\":\"recipientFirstName\"}" "" "" ...
##  $ RecipientEmail       : chr  "Recipient Email" "{\"ImportId\":\"recipientEmail\"}" "" "" ...
##  $ ExternalReference    : chr  "External Data Reference" "{\"ImportId\":\"externalDataReference\"}" "" "" ...
##  $ LocationLatitude     : chr  "Location Latitude" "{\"ImportId\":\"locationLatitude\"}" "40.67129517" "34.87750244" ...
##  $ LocationLongitude    : chr  "Location Longitude" "{\"ImportId\":\"locationLongitude\"}" "-73.89430237" "-92.22650146" ...
##  $ DistributionChannel  : chr  "Distribution Channel" "{\"ImportId\":\"distributionChannel\"}" "anonymous" "anonymous" ...
##  $ UserLanguage         : chr  "User Language" "{\"ImportId\":\"userLanguage\"}" "EN" "EN" ...
##  $ Q2                   : chr  "Dear Classmate:\n\nAs part of my Data Science in Context Assignment to present next Wednesday, I need to gather"| __truncated__ "{\"ImportId\":\"QID30\"}" "Never heard of it." "Never heard of it." ...
view(SurveyDownload)

DIRECT METHOD

The “qualtrics_api_credentials()” function stores your API key and base url in environment variables. The .Renviron message informs us that the API key is being stored for repeated use across different sessions.

qualtrics_api_credentials(api_key = "C7w9SmEv0Hg4tsAZWAhSldwCq6la4ou0NWyjbhN0", 
                          base_url = "baruch.sjc1.qualtrics.com",
                          install = TRUE, overwrite = TRUE)
## Your original .Renviron will be backed up and stored in your R HOME directory if needed.
## Your Qualtrics key and base URL have been stored in your .Renviron.  
## To use now, restart R or run `readRenviron("~/.Renviron")`

The “all_surveys()” function fetches a list of all surveys that you own or have access to from Qualtrics.

When listing all the surveys, by default they are sorted by “id”. The“id” field is randomly generated so the list will not likely be in the most workable order. To identify the survey that one would like to work on, the list number will need to be used. Right now, the Qualtrics Usage Survey is number 58 on the list.

surveys <- all_surveys()
surveys
## # A tibble: 67 x 6
##    id                 name          ownerId   lastModified creationDate isActive
##    <chr>              <chr>         <chr>     <chr>        <chr>        <lgl>   
##  1 SV_0lnv9NvzeER2SRo BUS 640 Su21~ UR_a3IUV~ 2021-07-15T~ 2021-07-15T~ TRUE    
##  2 SV_0lySYN9fuKSkDdP ACC_4501.NET~ UR_a3IUV~ 2021-02-22T~ 2020-08-27T~ TRUE    
##  3 SV_0TFBIV9ambqb11X ACC_3021TR2_~ UR_a3IUV~ 2020-05-16T~ 2020-05-09T~ TRUE    
##  4 SV_1YLSbLJ5xilYbJk DATA 607 Mov~ UR_a3IUV~ 2021-11-04T~ 2021-08-31T~ TRUE    
##  5 SV_2hky7K8m23pKQo5 BUS_640.01_S~ UR_a3IUV~ 2020-08-23T~ 2020-06-01T~ TRUE    
##  6 SV_2t2DLuh19rjWq3Q Carrot vs. S~ UR_a3IUV~ 2021-07-28T~ 2021-07-28T~ FALSE   
##  7 SV_3CLlR5BHgP4qdxz ACCT_1101HD0~ UR_a3IUV~ 2020-05-09T~ 2020-02-02T~ TRUE    
##  8 SV_3DEjcE1KpM3DaHb Course_#_Sec~ UR_a3IUV~ 2020-05-09T~ 2019-12-05T~ FALSE   
##  9 SV_3F9b815PxcelDg1 ACCT_1101_OL~ UR_a3IUV~ 2020-09-27T~ 2020-08-27T~ TRUE    
## 10 SV_3gDh6TUeAwz9ebr Daily_Deal_P~ UR_a3IUV~ 2021-09-25T~ 2020-01-05T~ FALSE   
## # ... with 57 more rows

To modify the list, we use dplyr from the tidyverse package and arrange the list descending by the “lastModified” date column that Qualtrics generated.

surveys <- all_surveys() %>% arrange(desc(lastModified))
surveys
## # A tibble: 67 x 6
##    id                 name         ownerId   lastModified  creationDate isActive
##    <chr>              <chr>        <chr>     <chr>         <chr>        <lgl>   
##  1 SV_b8ecSWwygwVSqbA DATA 607 Qu~ UR_a3IUV~ 2021-11-05T1~ 2021-11-04T~ TRUE    
##  2 SV_1YLSbLJ5xilYbJk DATA 607 Mo~ UR_a3IUV~ 2021-11-04T1~ 2021-08-31T~ TRUE    
##  3 SV_cYiASYlHXGeCw7Q Carrot v. S~ UR_a3IUV~ 2021-10-29T1~ 2021-10-03T~ TRUE    
##  4 SV_9MpSKnVaDighJPg ACC_1101_OL~ UR_a3IUV~ 2021-10-04T0~ 2021-08-10T~ TRUE    
##  5 SV_aa4ibRqwE8IZgx0 Dissertatio~ UR_a3IUV~ 2021-10-03T1~ 2021-10-03T~ FALSE   
##  6 SV_3gDh6TUeAwz9ebr Daily_Deal_~ UR_a3IUV~ 2021-09-25T2~ 2020-01-05T~ FALSE   
##  7 SV_5cqUYxziRjAf4yy ACC4501.TY7~ UR_a3IUV~ 2021-09-20T2~ 2021-08-30T~ TRUE    
##  8 SV_9npgqO9RIRqTIUe ACC4501.TY9~ UR_a3IUV~ 2021-09-20T2~ 2021-08-30T~ TRUE    
##  9 SV_416B4w9pzadueB8 BUS_640.02_~ UR_a3IUV~ 2021-08-30T2~ 2021-08-10T~ TRUE    
## 10 SV_dnkpOE5dwVXarZA BUS640.01_F~ UR_a3IUV~ 2021-08-10T2~ 2021-08-10T~ TRUE    
## # ... with 57 more rows

The “fetch_survey()” function downloads a survey from Qualtrics and loads it into R.

SurveyAPI <- fetch_survey(surveyID = surveys$id[1], 
                         verbose = TRUE)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |======================================================================| 100%
## 
## -- Column specification --------------------------------------------------------
## cols(
##   StartDate = col_datetime(format = ""),
##   EndDate = col_datetime(format = ""),
##   Status = col_character(),
##   IPAddress = col_character(),
##   Progress = col_double(),
##   `Duration (in seconds)` = col_double(),
##   Finished = col_logical(),
##   RecordedDate = col_datetime(format = ""),
##   ResponseId = col_character(),
##   RecipientLastName = col_logical(),
##   RecipientFirstName = col_logical(),
##   RecipientEmail = col_logical(),
##   ExternalReference = col_logical(),
##   LocationLatitude = col_double(),
##   LocationLongitude = col_double(),
##   DistributionChannel = col_character(),
##   UserLanguage = col_character(),
##   Q2 = col_character()
## )
SurveyAPI
## # A tibble: 17 x 18
##    StartDate           EndDate             Status     IPAddress      Progress
##    <dttm>              <dttm>              <chr>      <chr>             <dbl>
##  1 2021-11-04 20:46:13 2021-11-04 20:46:28 IP Address 172.58.225.90       100
##  2 2021-11-04 20:45:56 2021-11-04 20:47:15 IP Address 99.35.202.103       100
##  3 2021-11-04 22:40:10 2021-11-04 22:40:52 IP Address 67.241.4.199        100
##  4 2021-11-05 02:24:06 2021-11-05 02:24:26 IP Address 24.104.229.250      100
##  5 2021-11-05 13:57:00 2021-11-05 13:57:09 IP Address 74.108.219.115      100
##  6 2021-11-05 14:12:41 2021-11-05 14:13:12 IP Address 172.58.228.118      100
##  7 2021-11-05 18:07:30 2021-11-05 18:07:37 IP Address 172.58.231.33       100
##  8 2021-11-05 18:07:29 2021-11-05 18:07:40 IP Address 173.77.213.108      100
##  9 2021-11-05 18:07:27 2021-11-05 18:07:56 IP Address 24.60.166.221       100
## 10 2021-11-05 18:08:19 2021-11-05 18:08:29 IP Address 24.246.105.130      100
## 11 2021-11-05 18:11:07 2021-11-05 18:11:28 IP Address 108.70.160.200      100
## 12 2021-11-05 18:39:26 2021-11-05 18:39:51 IP Address 208.98.222.65       100
## 13 2021-11-05 18:53:55 2021-11-05 18:54:11 IP Address 216.236.150.50      100
## 14 2021-11-06 04:03:08 2021-11-06 04:03:38 IP Address 173.56.93.241       100
## 15 2021-11-06 16:17:43 2021-11-06 16:17:51 IP Address 209.122.210.99      100
## 16 2021-11-08 02:29:03 2021-11-08 02:29:12 IP Address 24.47.184.26        100
## 17 2021-11-09 20:04:42 2021-11-09 20:05:21 IP Address 172.58.228.182      100
## # ... with 13 more variables: Duration (in seconds) <dbl>, Finished <lgl>,
## #   RecordedDate <dttm>, ResponseId <chr>, RecipientLastName <lgl>,
## #   RecipientFirstName <lgl>, RecipientEmail <lgl>, ExternalReference <lgl>,
## #   LocationLatitude <dbl>, LocationLongitude <dbl>, DistributionChannel <chr>,
## #   UserLanguage <chr>, Q2 <ord>
view(SurveyAPI)

Note below that R now interprets the data frame as text/string and nums. Note that the Progress, Duration, Longitude and Latitude of IP Addresses are all recognized as numeric values.

str(SurveyAPI)
## spec_tbl_df [17 x 18] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ StartDate            : POSIXct[1:17], format: "2021-11-04 20:46:13" "2021-11-04 20:45:56" ...
##  $ EndDate              : POSIXct[1:17], format: "2021-11-04 20:46:28" "2021-11-04 20:47:15" ...
##  $ Status               : chr [1:17] "IP Address" "IP Address" "IP Address" "IP Address" ...
##   ..- attr(*, "label")= Named chr "Response Type"
##   .. ..- attr(*, "names")= chr "Status"
##  $ IPAddress            : chr [1:17] "172.58.225.90" "99.35.202.103" "67.241.4.199" "24.104.229.250" ...
##   ..- attr(*, "label")= Named chr "IP Address"
##   .. ..- attr(*, "names")= chr "IPAddress"
##  $ Progress             : num [1:17] 100 100 100 100 100 100 100 100 100 100 ...
##   ..- attr(*, "label")= Named chr "Progress"
##   .. ..- attr(*, "names")= chr "Progress"
##  $ Duration (in seconds): num [1:17] 14 79 42 20 8 31 7 10 28 9 ...
##   ..- attr(*, "label")= Named chr "Duration (in seconds)"
##   .. ..- attr(*, "names")= chr "Duration (in seconds)"
##  $ Finished             : logi [1:17] TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..- attr(*, "label")= Named chr "Finished"
##   .. ..- attr(*, "names")= chr "Finished"
##  $ RecordedDate         : POSIXct[1:17], format: "2021-11-04 20:46:28" "2021-11-04 20:47:16" ...
##  $ ResponseId           : chr [1:17] "R_1LHzJS2E8VU54sP" "R_3qWMlpOe7pKhzQf" "R_1pFjhjJOIZUeCVj" "R_3fOeKdeXIhvWKui" ...
##   ..- attr(*, "label")= Named chr "Response ID"
##   .. ..- attr(*, "names")= chr "ResponseId"
##  $ RecipientLastName    : logi [1:17] NA NA NA NA NA NA ...
##   ..- attr(*, "label")= Named chr "Recipient Last Name"
##   .. ..- attr(*, "names")= chr "RecipientLastName"
##  $ RecipientFirstName   : logi [1:17] NA NA NA NA NA NA ...
##   ..- attr(*, "label")= Named chr "Recipient First Name"
##   .. ..- attr(*, "names")= chr "RecipientFirstName"
##  $ RecipientEmail       : logi [1:17] NA NA NA NA NA NA ...
##   ..- attr(*, "label")= Named chr "Recipient Email"
##   .. ..- attr(*, "names")= chr "RecipientEmail"
##  $ ExternalReference    : logi [1:17] NA NA NA NA NA NA ...
##   ..- attr(*, "label")= Named chr "External Data Reference"
##   .. ..- attr(*, "names")= chr "ExternalReference"
##  $ LocationLatitude     : num [1:17] 40.7 34.9 43.1 40.7 40.6 ...
##   ..- attr(*, "label")= Named chr "Location Latitude"
##   .. ..- attr(*, "names")= chr "LocationLatitude"
##  $ LocationLongitude    : num [1:17] -73.9 -92.2 -76.1 -73.9 -74 ...
##   ..- attr(*, "label")= Named chr "Location Longitude"
##   .. ..- attr(*, "names")= chr "LocationLongitude"
##  $ DistributionChannel  : chr [1:17] "anonymous" "anonymous" "anonymous" "anonymous" ...
##   ..- attr(*, "label")= Named chr "Distribution Channel"
##   .. ..- attr(*, "names")= chr "DistributionChannel"
##  $ UserLanguage         : chr [1:17] "EN" "EN" "EN" "EN" ...
##   ..- attr(*, "label")= Named chr "User Language"
##   .. ..- attr(*, "names")= chr "UserLanguage"
##  $ Q2                   : Ord.factor w/ 6 levels "Never heard of it."<..: 1 1 1 1 1 6 1 2 1 1 ...
##   ..- attr(*, "label")= Named chr "Dear Classmate:\n\nAs part of my Data Science in Context Assignment to present next Wednesday, I need to gather"| __truncated__
##   .. ..- attr(*, "names")= chr "Q2"
##  - attr(*, "problems")=<externalptr> 
##  - attr(*, "column_map")= tibble [18 x 7] (S3: tbl_df/tbl/data.frame)
##   ..$ qname      : chr [1:18] "StartDate" "EndDate" "Status" "IPAddress" ...
##   ..$ description: chr [1:18] "Start Date" "End Date" "Response Type" "IP Address" ...
##   ..$ main       : chr [1:18] "Start Date" "End Date" "Response Type" "IP Address" ...
##   ..$ sub        : chr [1:18] "" "" "" "" ...
##   ..$ ImportId   : chr [1:18] "startDate" "endDate" "status" "ipAddress" ...
##   ..$ timeZone   : chr [1:18] "Z" "Z" NA NA ...
##   ..$ choiceId   : logi [1:18] NA NA NA NA NA NA ...
head(SurveyAPI)
## # A tibble: 6 x 18
##   StartDate           EndDate             Status     IPAddress      Progress
##   <dttm>              <dttm>              <chr>      <chr>             <dbl>
## 1 2021-11-04 20:46:13 2021-11-04 20:46:28 IP Address 172.58.225.90       100
## 2 2021-11-04 20:45:56 2021-11-04 20:47:15 IP Address 99.35.202.103       100
## 3 2021-11-04 22:40:10 2021-11-04 22:40:52 IP Address 67.241.4.199        100
## 4 2021-11-05 02:24:06 2021-11-05 02:24:26 IP Address 24.104.229.250      100
## 5 2021-11-05 13:57:00 2021-11-05 13:57:09 IP Address 74.108.219.115      100
## 6 2021-11-05 14:12:41 2021-11-05 14:13:12 IP Address 172.58.228.118      100
## # ... with 13 more variables: Duration (in seconds) <dbl>, Finished <lgl>,
## #   RecordedDate <dttm>, ResponseId <chr>, RecipientLastName <lgl>,
## #   RecipientFirstName <lgl>, RecipientEmail <lgl>, ExternalReference <lgl>,
## #   LocationLatitude <dbl>, LocationLongitude <dbl>, DistributionChannel <chr>,
## #   UserLanguage <chr>, Q2 <ord>

Leaflet to Plot Longitude and Latitude of IP Addresses

Using Leaflet, a new object labeled SurveyMap was created for the dataframe. To use Leaflet, the first step is to create an centered area to plot the latitude and longitude coordinates. The addTiles command creates the Leaflet map and the setView command using latitude and longitude coordinates for the center of Brooklyn (found by trial and error) is used as the center of the map. The addCircles command was used to plot blue circles at each location using the latitude and longitude data from the SurveyAPI dataframe.

# Create Leaflet map centered on Brooklyn.
SurveyMap <- SurveyAPI %>% leaflet() %>%
  addTiles() %>%
  setView(-73.95, 40.635, zoom = 11) %>%
  addCircles(lat = ~ LocationLatitude, lng = ~ LocationLongitude) 
SurveyMap

CountResponses <- SurveyAPI %>% count(SurveyAPI$Q2)
CountResponses
## # A tibble: 4 x 2
##   `SurveyAPI$Q2`                                           n
##   <ord>                                                <int>
## 1 Never heard of it.                                      11
## 2 I have heard of it, but not familiar with it at all.     4
## 3 I tried it here and there and I am familiar with it.     1
## 4 <NA>                                                     1

Distribution of choices.

ggplot(SurveyAPI) +
  geom_bar(aes(y = Q2),fill="blue")+
  labs(title="Qualtrics Usage Results") 

References

Gadient-Bruegger, A. (2020). How to import Qualtrics csv files into R. Retrieved from https://www.adrianbruegger.com/post/import-qualtrics-csv-files/.

Silge, J., & Ginn, J. (2021). Introduction to Qualtrics. Retrieved from https://cran.r-project.org/web/packages/qualtRics/vignettes/qualtRics.html.

StackOverflow. (2013). read.csv, header on first line, skip second line [duplicate]. Retrieved from https://stackoverflow.com/questions/15860071/read-csv-header-on-first-line-skip-second-line.

StackOverflow. (2017). How to skip second line is csv file while maintaining first line as column names with read_csv? [duplicate]. Retrieved from https://stackoverflow.com/questions/44273235/how-to-skip-second-line-is-csv-file-while-maintaining-first-line-as-column-names.

Charlton, A. (2016). Bringing in Qualtrics (and other data). Retrieved from https://blogs.uoregon.edu/rclub/2016/05/18/bringing-in-qualtrics-and-other-data/.