Introduction

This is the result report of the Chinese DS Training Initiative Survey. The raw file can be downloaded here. There were 120 participants in total.

This document will use R, as well as several libraries (knitr, rCharts), to give a quick example of how to summarize and visualize the result.

Table Summary

Load the result table and show the first six rows.

library(knitr)
result <- read.csv("survey.csv", stringsAsFactors = FALSE, comment.char = "", colClasses = "character")
kable(result[1:6,], digits = 2, align = 'c', caption = "Survey Results")
Survey Results

| ID | Type | Experience | Tools | Contents | Style | Time | Location |
|----|------|------------|-------|----------|-------|------|----------|
| 1 | Marketing | <1 years | R, Python, SAS | Theory+Method, Language Tools, Popular Softwares | Project | 1-2hrs | 10001 |
| 2 | Student | <1 years | R, Python | Theory+Method, Language Tools | Project | 1-2hrs | 11501 |
| 3 | Research/Development | 5-10 years | R, Python, SAS | Theory+Method, Language Tools | Project | 1-2hrs | 10002 |
| 4 | Other | 1-5 years | R, Python, SAS | Theory+Method, Language Tools | Series | 1-2hrs | 22182 |
| 5 | Research/Development | <1 years | R, Python | Theory+Method, Language Tools, Popular Softwares | Project | 1-2hrs | 10001 |
| 6 | Student | <1 years | R, Python | Theory+Method, Language Tools, Popular Softwares | Series | <1hrs,, 1-2hrs | 10001 |

Results Visualization

Where do trainees come from?

library(rCharts)
p1 <- nPlot(~ Type, data = result, type = 'pieChart')

Set up the options to embed the plot:

library(knitr)
opts_chunk$set(comment = NA, results = 'asis', tidy = FALSE)

Then show the plot:

library(rCharts)
p1$show('inline', include_assets = TRUE, cdn = TRUE)

The chart shows that the top four sectors are Research/Development, IT/IT Service, Health Care, and Student.

Their experience?

library(rCharts)
p2 <- nPlot(~ Experience, data = result, type = 'pieChart')
p2$chart(donut = TRUE)
p2$show('inline', include_assets = TRUE, cdn = TRUE)

Most respondents have less than five years of experience; 18 of 120 have more than five years. The 3 with more than ten years of experience should give a lecture sometime.
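The counts behind a pie chart like this one can also be checked numerically. The following is a minimal sketch in base R that uses only the Experience values from the six preview rows shown in the table summary, not the full 120-row file, so its numbers are illustrative and do not reproduce the survey totals.

```r
# Experience answers copied from the six preview rows shown earlier
# (illustrative subset only, not the full survey)
experience <- c("<1 years", "<1 years", "5-10 years",
                "1-5 years", "<1 years", "<1 years")

# Count respondents per experience band
exp_counts <- table(experience)

# Convert the counts to percentages of all respondents in the subset
exp_share <- round(100 * prop.table(exp_counts), 1)

print(exp_counts)
print(exp_share)
```

With the real data, the same two calls on result$Experience would give the counts and shares that the donut chart displays.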

Which training style do they want?

library(rCharts)
p3 <- nPlot(~ Style, data = result, type = 'pieChart')
p3$show('inline', include_assets = TRUE, cdn = TRUE)

About half of the participants (59) prefer project-style lectures, over a quarter prefer themed lectures, and the rest prefer a lecture series.

What tools are they using?

library(rCharts)
p4 <- nPlot(~ Tools, data = result, type = 'pieChart')
p4$chart(donut = TRUE)
p4$show('inline', include_assets = TRUE, cdn = TRUE)

Python, R, and SAS are the leading development tools in the community.
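Because Tools is a multi-select column (one respondent can list "R, Python, SAS"), a pie chart of the raw column groups people by combination of tools. Counting each tool separately may make the ranking easier to read. Below is a minimal sketch in base R, assuming answers are separated by commas as in the preview rows; the sample vector holds only those six rows, and the names tool_vec, tool_counts, and tool_df are illustrative.

```r
# Tools answers copied from the six preview rows (illustrative subset)
tools <- c("R, Python, SAS", "R, Python", "R, Python, SAS",
           "R, Python, SAS", "R, Python", "R, Python")

# Split each answer on commas, flatten, and trim surrounding whitespace
tool_vec <- trimws(unlist(strsplit(tools, ",", fixed = TRUE)))

# Drop empty strings left by stray commas
tool_vec <- tool_vec[tool_vec != ""]

# Count how many respondents mentioned each tool
tool_counts <- table(tool_vec)

# A data frame in a shape that a plotting function could consume
tool_df <- data.frame(Tool = names(tool_counts),
                      count = as.integer(tool_counts),
                      stringsAsFactors = FALSE)
print(tool_df)
```

A data frame like tool_df could then be passed to a bar chart, which tends to suit overlapping categories better than a pie chart.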

Subgroup analysis example

We would like to check whether style preference differs between groups with different levels of experience.

library(dplyr)
data1 = result %>% group_by(Experience,Style) %>% summarise(count=n())
p5 = nPlot(count ~ Experience, group = "Style", data = data1, type = "multiBarChart")
p5$show('inline', include_assets = TRUE, cdn = TRUE)

Or present the results as the percentage in each group.

data2 = result %>% group_by(Experience,Style) %>% summarise(count=n()) %>% group_by(Experience) %>% mutate(percent = count/sum(count)*100)
p6 = nPlot(percent ~ Experience, group = "Style", data = data2, type = "multiBarChart")
p6$chart(stacked = FALSE, showControls=FALSE, forceY = 100)
p6$show('inline', include_assets = TRUE, cdn = TRUE)

It shows that the less experience a participant has, the more they prefer project-style training.
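The within-group percentages can be cross-checked without dplyr. The sketch below uses a base R contingency table with row-wise proportions; it relies only on the six preview rows from the table summary, so the figures are illustrative and do not reproduce the chart above.

```r
# Experience and Style copied from the six preview rows (illustrative subset)
experience <- c("<1 years", "<1 years", "5-10 years",
                "1-5 years", "<1 years", "<1 years")
style <- c("Project", "Project", "Project",
           "Series", "Project", "Series")

# Cross-tabulate: one row per experience band, one column per style
counts <- table(Experience = experience, Style = style)

# margin = 1 normalises within each row, so every experience band sums to 100
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

print(counts)
print(row_pct)
```

Normalising by row rather than over the whole table is what makes groups of different sizes comparable.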

Interactive Table

We showed at the beginning of the document that the knitr library can present a dataset in a nice table format. Here we show how to use the rCharts library to present an interactive table in a web page.

rCharts integrates DataTables, a highly flexible table plug-in for the jQuery JavaScript library, through its dTable function. Here we show only two styles as examples.

library(rCharts)
rt = dTable(result,
            sScrollY = "500px",
            width = "1000px",
            sPaginationType = "full_numbers")
rt$show('inline', include_assets = TRUE, cdn=TRUE)


Or show it this way, with infinite scrolling instead of pagination:

library(rCharts)
rt = dTable(result,
            bScrollInfinite = TRUE,
            bScrollCollapse = TRUE,
            sScrollY = "200px",
            width = "1000px")
rt$show('inline', include_assets = TRUE, cdn=TRUE)


Interactive Map

Many datasets nowadays include location information, and presenting such data on an interactive map is very useful. Here we use the leaflet library, which is one of the most popular open-source JavaScript libraries for interactive maps.

R also has a very nice library zipcode which supplies geo-locations for all the US zip codes.

Load libraries and data.

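library(zipcode)
library(leaflet)
library(dplyr)  # provides %>% and filter()
data("zipcode")
result <- read.csv("survey.csv", stringsAsFactors = FALSE, comment.char = "", colClasses = "character")
# Left join: keep every survey row and attach coordinates where the zip code matches
mapdata <- merge(result, zipcode, by.x = "Location", by.y = "zip", all.x = TRUE)
# Drop rows whose zip code had no match, i.e. missing coordinates
mapdata <- as_tibble(mapdata) %>% filter(!is.na(longitude))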
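To see what the join and the filter do, here is a small self-contained sketch in base R. Everything in it is hypothetical: the survey and zips data frames are toy stand-ins for result and the zipcode dataset, the "00000" row is invented to show an unmatched code, and the coordinates are rough illustrative values rather than entries from the zipcode package.

```r
# Toy survey rows; zip codes are kept as character, as in the read.csv call
survey <- data.frame(ID = c("1", "4", "7"),
                     Location = c("10001", "22182", "00000"),
                     stringsAsFactors = FALSE)

# Hypothetical lookup table with approximate, illustrative coordinates
zips <- data.frame(zip = c("10001", "22182"),
                   latitude = c(40.75, 38.93),
                   longitude = c(-73.99, -77.26),
                   stringsAsFactors = FALSE)

# all.x = TRUE keeps every survey row; unmatched zip codes get NA coordinates
merged <- merge(survey, zips, by.x = "Location", by.y = "zip", all.x = TRUE)

# Keep only the rows that can actually be placed on a map
mappable <- merged[!is.na(merged$longitude), ]

print(merged)
print(mappable)
```

Reading the zip codes as character matters here: a numeric column would drop leading zeros and fail to match the lookup keys.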

Compile the popup contents for each marker on the map.

content <- paste(sep = "<br/>",
  paste("<b>", mapdata$Type, "</b>"),
  mapdata$Experience,
  mapdata$Tools,
  mapdata$Contents
)
leaflet(data = mapdata) %>% addTiles() %>%
  addMarkers(~longitude, ~latitude, popup = content, clusterOptions = markerClusterOptions())

Last updated, 2015-12-14