When job seekers create their profiles on Stack Overflow Careers, they can specify programming languages and technologies that they like or dislike. This data offers a unique insight into which languages and technologies IT professionals favor and which they avoid.
The dataset used in this report was provided by Dave Robinson from Stack Overflow and contains around 20,000 records with six variables. The variables are: tag, state, likes, dislikes, total and X.
We used this data to answer the following research questions:
The analysis was done using R and the results were mapped using Tableau.
# Reading the data
MyData_df <- read.csv(file="C:\\Users\\jgarg\\OneDrive\\Documents\\like_dislike_by_state.csv", header=TRUE, sep=",")
# Displaying the data
head(MyData_df)
## tag state likes dislikes total X
## 1 python AK 15 1 16 93.75000
## 2 javascript AK 14 0 14 100.00000
## 3 linux AK 13 1 14 92.85714
## 4 c# AK 11 2 13 84.61538
## 5 mysql AK 12 0 12 100.00000
## 6 jquery AK 11 0 11 100.00000
# Looking at the structure of the data
str(MyData_df)
## 'data.frame': 19962 obs. of 6 variables:
## $ tag : Factor w/ 721 levels ".net","3d","access-vba",..: 478 297 340 90 393 308 454 294 141 234 ...
## $ state : Factor w/ 50 levels "AK","AL","AR",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ likes : int 15 14 13 11 12 11 10 6 7 7 ...
## $ dislikes: int 1 0 1 2 0 0 1 3 0 0 ...
## $ total : int 16 14 14 13 12 11 11 9 7 7 ...
## $ X : num 93.8 100 92.9 84.6 100 ...
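The X column in the output above appears to be the share of likes in total, expressed as a percentage. The following is a minimal sketch that checks this reading on the six rows printed by head(); sample_df is a hypothetical stand-in rebuilt by hand from that output, not the original file.

```r
# Hypothetical stand-in for the six rows shown by head(MyData_df)
sample_df <- data.frame(
  tag      = c("python", "javascript", "linux", "c#", "mysql", "jquery"),
  state    = "AK",
  likes    = c(15, 14, 13, 11, 12, 11),
  dislikes = c(1, 0, 1, 2, 0, 0),
  total    = c(16, 14, 14, 13, 12, 11),
  X        = c(93.75, 100, 92.85714, 84.61538, 100, 100)
)

# Recompute the percentage of likes from the raw counts
recomputed <- 100 * sample_df$likes / sample_df$total

# Compare with X, allowing for the rounding in the printed output
x_matches <- isTRUE(all.equal(recomputed, sample_df$X, tolerance = 1e-6))
print(x_matches)
```

If the check holds on the full data as well, X carries the same information as the likes_percentage column used later in this report.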
# Loading data.table package
library(data.table)
## Warning: package 'data.table' was built under R version 3.2.3
# Converting data to data.table with key=state
dt1<-data.table(MyData_df,key="state")
# Grouping the data by state
# Then finding maximum likes per state
# Finally, subsetting the data.
likes<-dt1[, .SD[likes %in% max(likes)], by=state]
# Checking the class of likes
class(likes)
## [1] "data.table" "data.frame"
# Converting the data.table to a dataframe
likes_df<-as.data.frame(likes)
# Checking the class of likes (again)
class(likes_df)
## [1] "data.frame"
# Obtaining the relevant columns
likes_df<-likes_df[,c("state","tag","likes")]
# Displaying top 6 rows of the data
head(likes_df)
## state tag likes
## 1 AK python 15
## 2 AL c# 78
## 3 AR java 40
## 4 AZ javascript 183
## 5 CA javascript 2131
## 6 CO javascript 320
# Writing the csv file
write.csv(likes_df, "likes_by_states.csv")
The most liked programming languages are JavaScript, C#, Python and Java. California has the most likes for any programming language with 2131 likes for JavaScript.
In Wyoming, two programming languages tie for the highest number of likes (C# = 10, JavaScript = 10), so the state's color on the map is a mix representing both C# and JavaScript.
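Ties like the one in Wyoming come straight out of the `likes %in% max(likes)` filter, which keeps every row that reaches the state maximum. A minimal sketch on toy data: the Wyoming tie and the Alaska counts are taken from this report, while the python row for WY and the name toy are made up for illustration.

```r
library(data.table)

# Toy data: WY tie (c# = 10, javascript = 10) and AK counts come from the
# report; the WY python row is a hypothetical filler value
toy <- data.table(
  state = c("WY", "WY", "WY", "AK", "AK"),
  tag   = c("c#", "javascript", "python", "python", "javascript"),
  likes = c(10, 10, 4, 15, 14)
)

# Keep every row whose likes equal the state's maximum; ties are all retained
top_likes <- toy[, .SD[likes %in% max(likes)], by = state]

# Count the winners per state; more than one row means a tie
tie_counts <- top_likes[, .N, by = state]
print(top_likes)
print(tie_counts)
```

Because nothing breaks the tie, a state can contribute several rows to the exported file, which is why the map needs a mixed color.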
# Grouping the data by state
# Then finding maximum dislikes per state
# Finally, subsetting the data.
dislikes<-dt1[, .SD[dislikes %in% max(dislikes)], by=state]
# Checking the class of dislikes
class(dislikes)
## [1] "data.table" "data.frame"
# Converting the data.table to a dataframe
dislikes_df<-as.data.frame(dislikes)
# Checking the class of dislikes (again)
class(dislikes_df)
## [1] "data.frame"
# Obtaining the relevant columns
dislikes_df<-dislikes_df[,c("state","tag","dislikes")]
# Writing the csv file
write.csv(dislikes_df, "dislikes_by_states.csv")
The most disliked programming language/technology is PHP. California has the most dislikes for any programming language, with 312 dislikes for PHP.
For some states, more than one tag shares the maximum number of dislikes, so those states are shown in a mixed color. These states are:
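One way to list the tied states is to count the rows each state keeps after the maximum filter. This is a sketch only: dislikes_toy is a hypothetical stand-in for the dislikes table, filled with a handful of figures quoted elsewhere in this report, so its result is not the full list of tied states.

```r
library(data.table)

# Stand-in for the `dislikes` table, using counts quoted in this report
dislikes_toy <- data.table(
  state    = c("AK", "AK", "AK", "AL", "AL", "CA"),
  tag      = c("java", "windows", "asp-classic", "java", "vb.net", "php"),
  dislikes = c(3, 3, 3, 8, 8, 312)
)

# One row per state: how many tags share the maximum, and which ones
tied_states <- dislikes_toy[
  , .(n_tags = .N, tags = paste(tag, collapse = ", ")), by = state
][n_tags > 1]
print(tied_states)
```

Run against the real dislikes table, the same expression would return only the states that need a mixed color.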
Two new columns were added to the data – likes_percentage (likes/total, expressed as a percentage) and dislikes_percentage (dislikes/total, expressed as a percentage) – to facilitate the percentage analysis.
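The report does not say how these columns were produced. The following is one possible way to derive them in R with data.table, assuming values are rounded to two decimals as in the printed output; the table counts is a hypothetical sample built from rows shown earlier.

```r
library(data.table)

# A few rows taken from the head() output earlier in this report
counts <- data.table(
  tag      = c("python", "javascript", "c#"),
  state    = "AK",
  likes    = c(15, 14, 11),
  dislikes = c(1, 0, 2),
  total    = c(16, 14, 13)
)

# Add both percentage columns by reference (rounding is an assumption)
counts[, `:=`(
  likes_percentage    = round(100 * likes / total, 2),
  dislikes_percentage = round(100 * dislikes / total, 2)
)]
print(counts)
```

Deriving the columns in the script instead of in a separate file would also remove the need to read a second CSV.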
# Reading csv file
Data_df <- read.csv(file="C:\\Users\\jgarg\\OneDrive\\Documents\\like_dislike_by_state_with_percent.csv", header=TRUE, sep=",")
# Converting data from file to a data.table
dt1<-data.table(Data_df,key="state")
# Grouping the data by state
# Then finding maximum number of likes per state
# Finally, subsetting the data.
likes_percentage<-dt1[, .SD[likes%in% max(likes)], by=state]
# Converting the data.table to a dataframe
likes_percentage_df<-as.data.frame(likes_percentage)
# Obtaining the relevant columns
likes_percentage_df<-likes_percentage_df[,c("state","tag","likes_percentage","likes","dislikes")]
# Displaying first six rows
head(likes_percentage_df)
## state tag likes_percentage likes dislikes
## 1 AK python 93.75 15 1
## 2 AL c# 96.30 78 3
## 3 AR java 95.24 40 2
## 4 AZ javascript 99.46 183 1
## 5 CA javascript 98.29 2131 37
## 6 CO javascript 97.26 320 9
# Writing a *.csv file (it only contains the tags that have the maximum number of likes)
write.csv(likes_percentage_df, "likes_percentage_by_states.csv")
# Obtain tags that have max percentages
likes_percentage_comp<-dt1[, .SD[likes_percentage%in% max(likes_percentage)], by=state]
# Convert data to a dataframe
likes_percentage_df_comp<-as.data.frame(likes_percentage_comp)
# Obtaining the relevant columns
likes_percentage_df_comp<-likes_percentage_df_comp[,c("state","tag","likes_percentage","likes","dislikes")]
# Converting the dataframe to a data.table
dt2<-data.table(likes_percentage_df_comp)
# Obtaining the maximums from the likes column
likes_percentage1<-dt2[, .SD[likes %in% max(likes)], by=state]
# Displaying first six rows
head(likes_percentage1)
## state tag likes_percentage likes dislikes
## 1: AK javascript 100 14 0
## 2: AL javascript 100 70 0
## 3: AR jquery 100 31 0
## 4: AZ jquery 100 118 0
## 5: CA html5 100 624 0
## 6: CO jquery 100 172 0
# Writing the *.csv file
write.csv(likes_percentage1,"likes_compared_percentage_by_states.csv")
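The two filters above (highest like percentage first, then most likes among those) can also be chained in a single expression. A minimal sketch on the Alaska rows printed earlier; the table name ak is hypothetical and the percentages are rounded to two decimals.

```r
library(data.table)

# Alaska rows from the head() output earlier in this report
ak <- data.table(
  state = "AK",
  tag   = c("python", "javascript", "linux", "c#", "mysql", "jquery"),
  likes = c(15, 14, 13, 11, 12, 11),
  dislikes = c(1, 0, 1, 2, 0, 0),
  likes_percentage = c(93.75, 100, 92.86, 84.62, 100, 100)
)

# Step 1: keep tags at the state's maximum like percentage
# Step 2: among those, keep the tag(s) with the most likes
best <- ak[, .SD[likes_percentage %in% max(likes_percentage)], by = state][
  , .SD[likes %in% max(likes)], by = state]
print(best)
```

On these rows the percentage-first ranking selects javascript, whereas ranking by raw likes alone selects python, which matches the difference between the two result tables in this section.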
# Reading the *.csv file
Data_df <- read.csv(file="C:\\Users\\jgarg\\OneDrive\\Documents\\like_dislike_by_state_with_percent.csv", header=TRUE, sep=",")
# Converting file data to a data.table
dt3<-data.table(Data_df,key="state")
# Grouping the data by state
# Then finding maximum number of dislikes per state
# Finally, subsetting the data.
dislikes_percentage<-dt3[, .SD[dislikes%in% max(dislikes)], by=state]
# Converting the data.table to a dataframe
dislikes_percentage_df<-as.data.frame(dislikes_percentage)
# Obtaining the relevant columns
dislikes_percentage_df<-dislikes_percentage_df[,c("state","tag","dislikes_percentage","likes","dislikes")]
# Displaying first six rows
head(dislikes_percentage_df)
## state tag dislikes_percentage likes dislikes
## 1 AK java 33.33 6 3
## 2 AK windows 75.00 1 3
## 3 AK asp-classic 100.00 0 3
## 4 AL java 13.56 51 8
## 5 AL vb.net 44.44 10 8
## 6 AR windows 62.50 3 5
# Writing the *.csv file (it only contains the tags with the maximum number of dislikes)
write.csv(dislikes_percentage_df, "dislikes_percentage_by_states.csv")
# Only take tags with max percentages
dislikes_percentage_comp<-dt3[, .SD[dislikes_percentage%in% max(dislikes_percentage)], by=state]
# Converting data to dataframe
dislikes_percentage_df_comp<-as.data.frame(dislikes_percentage_comp)
# Obtaining the relevant columns
dislikes_percentage_df_comp<-dislikes_percentage_df_comp[,c("state","tag","dislikes_percentage","likes","dislikes")]
# Converting dataframe to data.table
dt4<-data.table(dislikes_percentage_df_comp)
# Obtaining maximums from dislikes column
dislikes_percentage1<-dt4[, .SD[dislikes %in% max(dislikes)], by=state]
# Displaying first six rows
head(dislikes_percentage1)
## state tag dislikes_percentage likes dislikes
## 1: AK asp-classic 100 0 3
## 2: AL vb6 100 0 5
## 3: AR asp-classic 100 0 3
## 4: AZ cobol 100 0 6
## 5: CA internet-explorer 100 0 43
## 6: CO cobol 100 0 8
# Writing the *.csv file
write.csv(dislikes_percentage1,"dislikes_compared_percentage_by_states.csv")