July 24th, 2020

Developing Data Products

Week 3 Assignment

This document presents the work done to fulfill the week 3 assignment of the Developing Data Products Coursera course, part of the Data Science Specialization.

Loading data

Let’s read and inspect our data

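# Read text columns as factors (no longer the default since R 4.0.0) so summary() shows level counts
d <- read.csv("data.csv", stringsAsFactors = TRUE)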

Loaded data

summary(d)
##                                company          province      employees   
##  2002 PERLINDUSTRIA SL             :   1   Barcelona:2497          :2091  
##  3ABC LASURES SL                   :   1   Girona   : 361   <10    :  71  
##  8F MIRABOLANO S.L.                :   1   Lleida   : 141   100-500: 163  
##  ABASIC SL                         :   1   Tarragona: 200   10-50  : 651  
##  ABB ELECTRIFICATION SOLUTIONS S.L.:   1                    >500   :  39  
##  AB-BIOTICS, S.A.                  :   1                    50-100 : 184  
##  (Other)                           :3193                                  
##        turnover   
##            :2094  
##  0-2 M €   : 178  
##  >100 M €  :  53  
##  10-50 M € : 253  
##  2-5M €    : 340  
##  50-100 M €:  64  
##  5-10 M €  : 217
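
The summary shows a blank level in both employees and turnover. A minimal sketch of how such blanks could be counted per column is given here; the toy data frame and its values are made up for illustration and are not taken from data.csv.

```r
# Toy data frame standing in for the real data (values are invented).
toy <- data.frame(
  province  = c("Barcelona", "Girona", "Lleida", "Barcelona"),
  employees = c("", "<10", "", "10-50"),
  turnover  = c("", "0-2 M", "", ""),
  stringsAsFactors = FALSE
)

# Count empty strings in each column; as.character() makes this
# work for factor columns as well as character columns.
blank_counts <- sapply(toy, function(col) sum(as.character(col) == ""))
print(blank_counts)
```

Running the same sapply() call on d should reproduce the blank counts reported by summary(d).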

Transforming data

Let’s clean and reorder our factor columns (employees, turnover): blank values become "Unknown" and the levels are sorted from smallest to largest range.

d$employees <- as.character(d$employees)
d$turnover <- as.character(d$turnover)
d$employees[d$employees==""] <- "Unknown"
d$turnover[d$turnover==""] <- "Unknown"
d$employees <- factor(d$employees, levels=c("Unknown","<10",
                                            "10-50","50-100",
                                            "100-500",">500"))
d$turnover <- factor(d$turnover, levels=c("Unknown","0-2 M €",
                                          "2-5M €","5-10 M €",
                                          "10-50 M €",
                                          "50-100 M €",">100 M €"))

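The same recoding steps could be wrapped in a small helper so that both columns are handled the same way. This is only a sketch: recode_ordered() is a hypothetical name, not part of the original analysis. It also guards against a pitfall of factor(), which silently turns any value missing from the levels vector into NA.

```r
# Hypothetical helper: replace blanks with "Unknown" and apply an
# explicit level order, with "Unknown" placed first.
recode_ordered <- function(x, levels, unknown = "Unknown") {
  x <- as.character(x)
  x[!is.na(x) & x == ""] <- unknown
  f <- factor(x, levels = c(unknown, levels))
  # factor() maps values that are not listed in `levels` to NA,
  # so stop here if a label was mistyped or a value is missing.
  if (anyNA(f)) stop("Found values that are not in `levels`")
  f
}

# Example with made-up values
sizes <- c("10-50", "", "<10", ">500", "")
f <- recode_ordered(sizes, levels = c("<10", "10-50", "50-100",
                                      "100-500", ">500"))
print(table(f))
```

With a helper like this, a typo in a level label fails immediately instead of producing NA entries that only show up later in the plot.
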
Transformed data

summary(d)
##                                company          province      employees   
##  2002 PERLINDUSTRIA SL             :   1   Barcelona:2497   Unknown:2091  
##  3ABC LASURES SL                   :   1   Girona   : 361   <10    :  71  
##  8F MIRABOLANO S.L.                :   1   Lleida   : 141   10-50  : 651  
##  ABASIC SL                         :   1   Tarragona: 200   50-100 : 184  
##  ABB ELECTRIFICATION SOLUTIONS S.L.:   1                    100-500: 163  
##  AB-BIOTICS, S.A.                  :   1                    >500   :  39  
##  (Other)                           :3193                                  
##        turnover   
##  Unknown   :2094  
##  0-2 M €   : 178  
##  2-5M €    : 340  
##  5-10 M €  : 217  
##  10-50 M € : 253  
##  50-100 M €:  64  
##  >100 M €  :  53

Plot generation: Companies/employees/province

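library(dplyr)   # group_by(), summarize(), filter(), arrange(), %>%
library(plotly)  # plot_ly(), add_trace(), layout()

# Count companies for each province / employee-range combination
by_p <- d %>% group_by(province,employees) %>% summarize(count=n())
# Start from an empty bar chart and add one trace per employee range
p <- plot_ly(type="bar")
for (e in levels(by_p$employees)) {
    tmp <- filter(by_p,employees==e) %>% arrange(province)
    p <- add_trace(p, data=tmp, x=~province, y=~count, name = e)
}
# Show the traces side by side within each province
p <- layout(p,title="Company Size by Employees/Province",
            yaxis = list(title = '# companies'),
            xaxis = list(title="Province"),
            legend = list(
                title = list(text = "<b>Employees</b>")),
            barmode = 'group')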
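
The grouped counts behind the bars can be cross-checked with a base R contingency table, where each cell corresponds to one bar in the chart. The sketch below uses a made-up toy data frame rather than the real data.

```r
# Toy data standing in for d (values are invented).
toy <- data.frame(
  province  = c("Barcelona", "Barcelona", "Girona", "Barcelona", "Lleida"),
  employees = c("<10", "10-50", "<10", "<10", "Unknown"),
  stringsAsFactors = TRUE
)

# Rows are provinces, columns are employee ranges; each cell is the
# number of companies in that combination.
counts <- table(toy$province, toy$employees)
print(counts)
```

Calling table(d$province, d$employees) on the real data should give the same numbers as the count column of by_p, with zeros for combinations that do not occur.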

Plot: Companies/employees/province