Week 3 Assignment
This document presents the work done to fulfill the Week 3 assignment of the Developing Data Products course on Coursera, part of the Data Science specialization.
July 24th, 2020
Let’s read and inspect our data
d <- read.csv("data.csv")
summary(d)
##                                company          province      employees
##  2002 PERLINDUSTRIA SL             :   1   Barcelona:2497          :2091
##  3ABC LASURES SL                   :   1   Girona   : 361   <10    :  71
##  8F MIRABOLANO S.L.                :   1   Lleida   : 141   100-500: 163
##  ABASIC SL                         :   1   Tarragona: 200   10-50  : 651
##  ABB ELECTRIFICATION SOLUTIONS S.L.:   1                    >500   :  39
##  AB-BIOTICS, S.A.                  :   1                    50-100 : 184
##  (Other)                           :3193
##       turnover
##            :2094
##  0-2 M €   : 178
##  >100 M €  :  53
##  10-50 M € : 253
##  2-5M €    : 340
##  50-100 M €:  64
##  5-10 M €  : 217
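The factor-style summary above depends on how `read.csv` treats text columns. Since R 4.0.0 the default is `stringsAsFactors = FALSE`, so on a recent R the same call would give character columns and a less informative `summary()`. A minimal sketch of requesting factors explicitly, using a made-up three-row data set rather than the assignment's `data.csv`:

```r
# Toy CSV text standing in for data.csv (hypothetical rows, for illustration only)
csv_text <- "company,province,employees
A SL,Barcelona,<10
B SL,Girona,
C SL,Barcelona,10-50"

# Ask for factors explicitly so the result does not depend on the R version
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)

is.factor(toy$employees)   # TRUE
levels(toy$employees)      # includes the empty string "" for the blank field
summary(toy$province)      # per-province counts, as in the summary above
```

The blank field is read as an empty string, not `NA`, which is why the summary shows an unnamed level.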
Let’s clean our factor columns (employees, turnover): blank values become "Unknown" and the levels are put in ascending size order
d$employees <- as.character(d$employees)
d$turnover <- as.character(d$turnover)
d$employees[d$employees==""] <- "Unknown"
d$turnover[d$turnover==""] <- "Unknown"
d$employees <- factor(d$employees, levels=c("Unknown","<10",
                                            "10-50","50-100",
                                            "100-500",">500"))
d$turnover <- factor(d$turnover, levels=c("Unknown","0-2 M €",
                                          "2-5M €","5-10 M €",
                                          "10-50 M €",
                                          "50-100 M €",">100 M €"))
summary(d)
##                                company          province      employees
##  2002 PERLINDUSTRIA SL             :   1   Barcelona:2497   Unknown:2091
##  3ABC LASURES SL                   :   1   Girona   : 361   <10    :  71
##  8F MIRABOLANO S.L.                :   1   Lleida   : 141   10-50  : 651
##  ABASIC SL                         :   1   Tarragona: 200   50-100 : 184
##  ABB ELECTRIFICATION SOLUTIONS S.L.:   1                    100-500: 163
##  AB-BIOTICS, S.A.                  :   1                    >500   :  39
##  (Other)                           :3193
##       turnover
##  Unknown   :2094
##  0-2 M €   : 178
##  2-5M €    : 340
##  5-10 M €  : 217
##  10-50 M € : 253
##  50-100 M €:  64
##  >100 M €  :  53
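The recoding above can be tried on a small vector to see why the levels are listed by hand. This is only a sketch: the values in `x` are invented, and `size` is a hypothetical name.

```r
# Toy vector with the same size bands, including a blank (hypothetical values)
x <- c("10-50", "", "<10", ">500", "10-50")

# Default factor(): levels are sorted as strings, which is not size order
levels(factor(x))

# Recode blanks, then fix the level order by hand, as in the code above
x[x == ""] <- "Unknown"
size <- factor(x, levels = c("Unknown", "<10", "10-50",
                             "50-100", "100-500", ">500"))

levels(size)   # the six levels, in the order given
table(size)    # counts follow that order; unused levels show as 0
```

Any value that is missing from `levels` becomes `NA`, so a misspelled level label would silently drop rows from later counts.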
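library(dplyr)   # group_by(), summarize(), filter(), arrange(), %>%
library(plotly)  # plot_ly(), add_trace(), layout()
# Count companies per province and employee band
by_p <- d %>% group_by(province,employees) %>% summarize(count=n())
# Start an empty bar chart and add one trace per employee band
p <- plot_ly(type="bar")
for (e in levels(by_p$employees)) {
  tmp <- filter(by_p,employees==e) %>% arrange(province)
  p <- add_trace(p, data=tmp, x=~province, y=~count, name=e)
}
p <- layout(p, title="Company Size by Employees/Province",
            yaxis = list(title = '# companies'),
            xaxis = list(title = "Province"),
            legend = list(
              title = list(text = "<b>Employees</b>")),
            barmode = 'group')
p  # print the plot object so the chart is rendered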
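The `group_by()`/`summarize()` step that feeds the chart is a count of companies per province and employee band. As a cross-check that needs no extra packages, the same kind of count can be sketched with base R's `table()`. The data frame below is invented for illustration, and `toy` and `counts` are hypothetical names.

```r
# Invented rows standing in for the cleaned data frame d
toy <- data.frame(
  province  = c("Barcelona", "Barcelona", "Girona", "Barcelona"),
  employees = factor(c("<10", "10-50", "<10", "<10"),
                     levels = c("<10", "10-50"))
)

# Cross-tabulate, then reshape to one row per (province, employees) pair
counts <- as.data.frame(table(toy$province, toy$employees),
                        responseName = "count")
names(counts)[1:2] <- c("province", "employees")

# table() keeps empty combinations as 0; drop them to mirror the grouped summary
counts <- counts[counts$count > 0, ]
counts
```

If the per-pair counts from this approach and from `by_p` disagree on the real data, the factor recoding is the first place to look.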