Company Size versus Job Satisfaction
require(ggplot2)
dsdata$JobSat <- as.factor(dsdata$JobSat) # change to factor so each score gets its own bar on the x-axis
# Remove single-person companies (lots of variation); %in% still works when nothing matches, unlike -which()
company_data <- dsdata[!(dsdata$OrgSize %in% "Just me - I am a freelancer, sole proprietor, etc."), ]
ggplot(company_data, aes(x = JobSat)) +
  geom_bar() +
  facet_wrap(~OrgSize) + # one panel per company size
  labs(title = 'Job Satisfaction by Company Size', x = 'Job Satisfaction', y = 'Count')

# Change back to numeric for the violin plot (JobSat is the response variable on the y-axis).
# Note: as.numeric() on a factor returns the level codes (1, 2, ...), not the original labels.
dsdata$JobSat <- as.numeric(dsdata$JobSat)
company_data <- dsdata[!(dsdata$OrgSize %in% "Just me - I am a freelancer, sole proprietor, etc."), ]
ggplot(company_data, aes(x = OrgSize, y = JobSat)) +
  geom_violin() + # widths of the violins reflect densities
  theme(axis.text.x = element_text(angle = 15)) +
  labs(title = 'Job Satisfaction by Company Size', x = 'Company Size', y = 'Job Satisfaction')
Variables used in analysis: JobSat (job satisfaction, plotted on a 2-8 scale in the bar graphs and on a 1-5 scale in the violin plots; the lowest value means not satisfied and the highest, 8 or 5, means very satisfied) and OrgSize (size of the organization).
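The two ranges quoted for JobSat are consistent with how R converts factors: calling as.numeric() on a factor returns the integer level codes rather than the original labels. The sketch below illustrates this with a handful of made-up scores (the values are hypothetical and are not taken from the survey data).

```r
# Hypothetical scores standing in for JobSat (not the survey data)
scores <- c(2, 4, 6, 8, 8, 2)
f <- as.factor(scores)

# as.numeric() on a factor gives the level codes, here 1 to 4
codes <- as.numeric(f)

# Going through as.character() recovers the original values
restored <- as.numeric(as.character(f))

print(codes)
print(restored)
```

If the violin plots should use the same scale as the bar graphs, converting with as.numeric(as.character(...)) is one way to do it.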
Argument 1 of the article concerns Company Size versus Job Satisfaction. The article argues that, while it is hard to draw overall conclusions about the relationship between company size and job satisfaction, smaller companies have higher job satisfaction ratings than larger ones. To investigate, I first removed the “freelancer” level of the OrgSize variable, because freelance and single-person positions have many external factors, beyond the size of the company alone, that may affect job satisfaction. For the remaining company sizes, the plots support the article's claim.
The bar graphs show this trend. Overall, the graphs share the same shape: left-skewed and heavy-tailed, with most responses falling in the higher job satisfaction scores. Only the smaller companies (2-9 employees, 10-19 employees, 20-99 employees) have a taller bar at a score of 8 than at 6. Larger companies, by contrast, have bars at 6 that are as tall as or taller than those at 8, indicating that a score of “very satisfied” is less common at large companies.
The general trend is the same for all violin plots: the density of points is small at lower job satisfaction scores and increases toward the higher scores. Additionally, for smaller companies (2-9 employees, 10-19 employees, 20-99 employees) the plot is wider at a score of 5 than at a score of 4. For large companies (5,000-9,999 or 10,000 or more), by contrast, the highest job satisfaction scores often have slimmer densities than more moderate scores.
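The comparison of bar heights can also be checked numerically by computing, for each company size, the share of responses at each score. This is a minimal sketch on a made-up data frame: toy_sat and its values are hypothetical stand-ins for company_data, not the survey results.

```r
# Hypothetical toy data standing in for company_data (not the survey results)
toy_sat <- data.frame(
  OrgSize = c("2-9", "2-9", "2-9", "2-9",
              "10,000+", "10,000+", "10,000+", "10,000+"),
  JobSat  = c(8, 8, 6, 4, 8, 6, 6, 4)
)

# Cross-tabulate, then convert to row-wise proportions (each company size sums to 1)
sat_share <- prop.table(table(toy_sat$OrgSize, toy_sat$JobSat), margin = 1)

# Share of responses at the top score (8) for each company size
top_share <- sat_share[, "8"]
print(round(sat_share, 2))
```

Proportions make company sizes with very different respondent counts directly comparable, which raw bar heights do not.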
Job Availability by Country and Year
#require(ggplot2)
country_7 <- c('United States', 'Spain', 'Germany',
  'Australia', 'Ireland', 'United Kingdom', 'India') # a sample of 7 countries from the Country column
# Subset the data frame to only these countries and to only the observations with a '1' for the job title category
dsdata_subset <- dsdata[(dsdata$Country %in% country_7 & dsdata$Data.scientist.or.machine.learning.specialist == '1'), ]
# Change the levels of the Country variable to shorter names
dsdata_subset$Country <- factor(dsdata_subset$Country, levels = country_7,
  labels = c('US', 'Spain', 'Germany', 'Australia', 'Ireland', 'UK', 'India'))
ggplot(dsdata_subset, aes(x=Data.scientist.or.machine.learning.specialist))+
geom_bar(aes(fill=Country))+
facet_grid(Year~Country)+
labs(title='Data Scientist and ML Specialist Jobs by Year and Country', x='Data Scientist and ML Specialist Jobs', y='Count') #facet_grid() on both Year and Country

Variables used in analysis: Data.scientist.or.machine.learning.specialist (1 if yes, 0 if not); Country (a subset of 7 major countries); Year (2019 or 2020).
Argument 2 of the article concerns the availability of Data Scientist and ML Specialist jobs by year and country. The article claims that, when all countries are taken together, the number of Data Scientist and ML Specialist jobs decreased noticeably from 2019 to 2020.
The article is correct, and the exploratory data analysis above gives strong support for the claim. For the 7 countries analyzed, facet_grid() was used to lay out the data with Country as the columns and Year as the rows. From 2019 to 2020, every country showed a decline in the number (count) of Data Scientist or ML Specialist employees except Ireland, which stayed about the same. Thus, even when the data are grouped into smaller categories, the overall trend of job reduction remains.
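The year-over-year comparison read off the facet grid can also be tabulated directly. The sketch below uses a made-up data frame: toy_jobs and its counts are hypothetical stand-ins for dsdata_subset, not the survey results.

```r
# Hypothetical toy data standing in for dsdata_subset (not the survey results)
toy_jobs <- data.frame(
  Country = c("US", "US", "US", "UK", "UK", "US", "US", "UK"),
  Year    = c(2019, 2019, 2019, 2019, 2019, 2020, 2020, 2020)
)

# Count respondents per Year (rows) and Country (columns)
counts <- table(toy_jobs$Year, toy_jobs$Country)

# Change from 2019 to 2020 for each country; negative values mean a decline
change <- counts["2020", ] - counts["2019", ]
print(counts)
print(change)
```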
This result is surprising considering the rise of data science as a field over the past decade. There are 2 possible reasons for the observed trend, both of which can inform future analysis for this project:
1) There is a simultaneous rise in specialty-related data science jobs, careers in analytics, and careers in machine learning that are not classified under the title “Data Scientist” or “ML Specialist” as encoded in the data. The rise in these other job titles may account for the observed decline.
2) While grouping by country may not reveal a different trend, grouping by YearsCodePro (years of professional coding), EdLevel (educational level), etc. may show different trends in the availability of these 2 job titles over 2019-2020.