This is the result report of the Chinese DS Training Initiative Survey. The raw file can be downloaded here. There are 120 participants in total.
This document uses R and several libraries (knitr, rCharts) to give a quick example of how to summarize and visualize the results.
Load the result table and show the first six rows.
library(knitr)
result <- read.csv("survey.csv", stringsAsFactors = FALSE, comment.char = "", colClasses = "character")
kable(result[1:6,], digits = 2, align = 'c', caption = "Survey Results")
ID | Type | Experience | Tools | Contents | Style | Time | Location |
---|---|---|---|---|---|---|---|
1 | Marketing | <1 years | R, Python, SAS | Theory+Method, Language Tools, Popular Softwares | Project | 1-2hrs | 10001 |
2 | Student | <1 years | R, Python | Theory+Method, Language Tools | Project | 1-2hrs | 11501 |
3 | Research/Development | 5-10 years | R, Python, SAS | Theory+Method, Language Tools | Project | 1-2hrs | 10002 |
4 | Other | 1-5 years | R, Python, SAS | Theory+Method, Language Tools | Series | 1-2hrs | 22182 |
5 | Research/Development | <1 years | R, Python | Theory+Method, Language Tools, Popular Softwares | Project | 1-2hrs | 10001 |
6 | Student | <1 years | R, Python | Theory+Method, Language Tools, Popular Softwares | Series | <1hrs,, 1-2hrs | 10001 |
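Reading every column as character matters most for the Location column, because zip codes parsed as numbers lose their leading zeros. A minimal sketch of the effect, using a made-up two-row CSV rather than the survey file:

```r
# Hypothetical two-row CSV; the zip codes are illustrative, not survey data.
csv_text <- "ID,Location\n1,02134\n2,10001"

# Default type guessing turns zip codes into integers and drops leading zeros.
as_number <- read.csv(text = csv_text)

# Forcing every column to character keeps the codes exactly as written.
as_text <- read.csv(text = csv_text, colClasses = "character")

as_number$Location  # integer values 2134 and 10001
as_text$Location    # character values "02134" and "10001"
```

Keeping the codes as text also makes a later join against a zip code lookup table straightforward, since such tables typically store zip codes as strings.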
library(rCharts)
p1 <- nPlot(~ Type, data = result, type = 'pieChart')
Set up the options to embed the plot:
library(knitr)
opts_chunk$set(comment = NA, results = 'asis', tidy = FALSE)
Then show the plot:
library(rCharts)
p1$show('inline', include_assets = TRUE, cdn = TRUE)
The chart shows that the four most common participant types are Research/Development, IT/IT Service, Health Care, and Student.
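The ranking read off the pie chart can also be checked numerically. The sketch below uses a small made-up vector in place of `result$Type`; on the real data the same calls would be applied to that column.

```r
# Hypothetical Type values, for illustration only (not the survey data).
type <- c("Student", "Research/Development", "Student", "Marketing",
          "Research/Development", "Student")

# Count each category and sort from most to least common.
type_counts <- sort(table(type), decreasing = TRUE)

# Keep the top entries; on the real data this would be head(type_counts, 4).
head(type_counts, 2)
```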
library(rCharts)
p2 <- nPlot(~ Experience, data = result, type = 'pieChart')
p2$chart(donut = TRUE)
p2$show('inline', include_assets = TRUE, cdn = TRUE)
Most participants have less than five years of experience; 18 of 120 have more than five years. The 3 participants with more than ten years of experience should give a lecture sometime.
library(rCharts)
p3 <- nPlot(~ Style, data = result, type = 'pieChart')
p3$show('inline', include_assets = TRUE, cdn = TRUE)
About half of the participants (59) prefer project-style lectures, over a quarter prefer themed lectures, and the rest prefer a lecture series.
library(rCharts)
p4 <- nPlot(~ Tools, data = result, type = 'pieChart')
p4$chart(donut = TRUE)
p4$show('inline', include_assets = TRUE, cdn = TRUE)
Python, R, and SAS are the leading development tools in the community.
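Because each Tools answer can list several tools separated by commas, the pie chart counts combinations rather than individual tools. One way to count each tool on its own is to split the answers first. This is a sketch on made-up answers in the same format, not the method used for the chart above:

```r
# Hypothetical Tools answers in the comma-separated format of the survey.
tools <- c("R, Python, SAS", "R, Python", "R, Python, SAS", "R, Python")

# Split each answer on commas, trim whitespace, and flatten to one vector.
each_tool <- trimws(unlist(strsplit(tools, ",")))

# Count how many respondents mentioned each tool.
tool_counts <- sort(table(each_tool), decreasing = TRUE)
tool_counts
```

On the real data, `tools` would be replaced by `result$Tools`.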
We would like to check whether style preference differs between groups with different levels of experience.
library(dplyr)
data1 = result %>% group_by(Experience,Style) %>% summarise(count=n())
p5 = nPlot(count ~ Experience, group = "Style", data = data1, type = "multiBarChart")
p5$show('inline', include_assets = TRUE, cdn = TRUE)
Or present the results as the percentage in each group.
data2 = result %>%
  group_by(Experience, Style) %>%
  summarise(count = n()) %>%
  group_by(Experience) %>%
  mutate(percent = count / sum(count) * 100)
p6 = nPlot(percent ~ Experience, group = "Style", data = data2, type = "multiBarChart")
p6$chart(stacked = FALSE, showControls=FALSE, forceY = 100)
p6$show('inline', include_assets = TRUE, cdn = TRUE)
It shows that the less experience participants have, the more they prefer project-style training.
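The within-group percentages can be cross-checked in base R with `table` and `prop.table`. The data frame below is made up for illustration; the column names match the survey, but the rows do not come from it.

```r
# Hypothetical responses, for illustration only (not the survey data).
survey <- data.frame(
  Experience = c("<1 years", "<1 years", "<1 years", "1-5 years", "1-5 years"),
  Style      = c("Project", "Project", "Series", "Project", "Series"),
  stringsAsFactors = FALSE
)

# Cross-tabulate experience against style.
counts <- table(survey$Experience, survey$Style)

# margin = 1 normalizes each row, so every experience group sums to 100.
percent <- round(prop.table(counts, margin = 1) * 100, 1)
percent
```

Normalizing by row rather than by the grand total is what makes groups of different sizes comparable.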
We showed at the beginning of the document that the knitr library can present a dataset in a nice table format. Here we show how to use the rCharts library to present an interactive table in a web page.
rCharts wraps DataTables, a highly flexible plug-in for the jQuery JavaScript library, through its dTable function. Here we show only two styles as examples.
library(rCharts)
rt = dTable(result,
            sScrollY = "500px",
            width = "1000px",
            sPaginationType = "full_numbers")
rt$show('inline', include_assets = TRUE, cdn=TRUE)
Or show it this way, with scrolling instead of pagination:
library(rCharts)
rt = dTable(result,
            bScrollInfinite = TRUE,
            bScrollCollapse = TRUE,
            sScrollY = "200px",
            width = "1000px")
rt$show('inline', include_assets = TRUE, cdn=TRUE)
Many datasets nowadays include location information, and presenting data on a map in an interactive way is very useful. Here we use the leaflet library, which is one of the most popular open-source JavaScript libraries for interactive maps.
R also has a very nice package, zipcode, which supplies geolocations for US zip codes.
Load libraries and data.
library(zipcode)
library(leaflet)
data("zipcode")
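result <- read.csv("survey.csv", stringsAsFactors = FALSE, comment.char = "", colClasses = "character")
# Left join: keep every survey row, adding coordinates where the zip code matches
mapdata = merge(result, zipcode, by.x = "Location", by.y = "zip", all.x = TRUE)
# Drop rows whose zip code had no match (coordinates are NA)
mapdata = as_tibble(mapdata) %>% filter(!is.na(longitude))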
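The merge keeps every survey row and leaves the coordinates missing when a zip code is not found in the lookup table, so those rows have to be dropped before mapping. A minimal sketch of that behavior, with a made-up answer table and a made-up lookup table standing in for `result` and `zipcode` (the coordinates are rough placeholders):

```r
# Hypothetical survey rows and zip lookup, for illustration only.
answers <- data.frame(Location = c("10001", "99999"),
                      Type = c("Student", "Other"),
                      stringsAsFactors = FALSE)
lookup <- data.frame(zip = c("10001", "11501"),
                     latitude = c(40.75, 40.74),
                     longitude = c(-73.99, -73.64),
                     stringsAsFactors = FALSE)

# all.x = TRUE keeps every survey row; unmatched zips get NA coordinates.
merged <- merge(answers, lookup, by.x = "Location", by.y = "zip", all.x = TRUE)

# Rows that cannot be placed on the map.
unmatched <- merged[is.na(merged$longitude), ]

# Rows that can be plotted.
mappable <- merged[!is.na(merged$longitude), ]
```

Inspecting `unmatched` before discarding it is a cheap way to spot typos or non-US locations in the answers.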
Compile the popup content for each marker on the map.
content <- paste(sep = "<br/>",
paste("<b>", mapdata$Type, "</b>"),
mapdata$Experience,
mapdata$Tools,
mapdata$Contents
)
leaflet(data = mapdata) %>% addTiles() %>%
addMarkers(~longitude, ~latitude, popup = content,clusterOptions = markerClusterOptions())
Last updated, 2015-12-14