Users of Stack Overflow Careers can create customized profiles (“CVs”) to show to prospective employers. As part of these profiles, they can list specific technologies they like or dislike.
There are many ways to find out which technologies people tend to like, but this provides a rare way to find out which technologies people tend to dislike.
The dataset used in this report was provided by Dave Robinson of Stack Overflow and contains 19,962 records with six variables: tag, state, likes, dislikes, total, and X (likes as a percentage of total).
We tried to answer the following questions:
The analysis is done using R and results are mapped in Tableau.
# reading data
MyData_df <- read.csv(file="like_dislike_by_state.csv", header=TRUE, sep=",")
# displaying data
head(MyData_df)
## tag state likes dislikes total X
## 1 python AK 15 1 16 93.75000
## 2 javascript AK 14 0 14 100.00000
## 3 linux AK 13 1 14 92.85714
## 4 c# AK 11 2 13 84.61538
## 5 mysql AK 12 0 12 100.00000
## 6 jquery AK 11 0 11 100.00000
# looking at the structure of the data
str(MyData_df)
## 'data.frame': 19962 obs. of 6 variables:
## $ tag : Factor w/ 721 levels ".net","3d","access-vba",..: 478 297 340 90 393 308 454 294 141 234 ...
## $ state : Factor w/ 50 levels "AK","AL","AR",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ likes : int 15 14 13 11 12 11 10 6 7 7 ...
## $ dislikes: int 1 0 1 2 0 0 1 3 0 0 ...
## $ total : int 16 14 14 13 12 11 11 9 7 7 ...
## $ X : num 93.8 100 92.9 84.6 100 ...
# loading data.table package
library(data.table)
# convert to data.table with key=state
dt1<-data.table(MyData_df,key="state")
# group the data by same state and then apply the condition in which it
# takes all the likes of that state and matches it with the maximum likes
# of that particular state and subset that data.
likes<-dt1[, .SD[likes %in% max(likes)], by=state]
# checking class
class(likes)
## [1] "data.table" "data.frame"
# converting datatable to data frame
likes_df<-as.data.frame(likes)
# checking class
class(likes_df)
## [1] "data.frame"
# taking some relevant columns
likes_df<-likes_df[,c("state","tag","likes")]
# displaying top 6 rows of final data
head(likes_df)
## state tag likes
## 1 AK python 15
## 2 AL c# 78
## 3 AR java 40
## 4 AZ javascript 183
## 5 CA javascript 2131
## 6 CO javascript 320
# writing a csv file
write.csv(likes_df, "likes_by_states.csv")
The most liked technologies are JavaScript, C#, Python, and Java, with California on top at 2,131 likes for JavaScript.
In Wyoming, two tags tie for the maximum number of likes (C# = 10, JavaScript = 10), so the state is shown in a mixed color for C# and JavaScript.
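The tie in Wyoming follows from how the per-state maximum is selected: every row whose likes equal the state maximum is kept, so all tied tags survive. A minimal sketch on toy data illustrates this. The Wyoming and Alaska counts for the top tags come from the results above, while the python row for WY and the names `toy`, `top_likes`, and `tie_counts` are illustrative assumptions.

```r
library(data.table)

# Toy data: the WY python row is hypothetical, the other counts are from the report
toy <- data.table(
  state = c("WY", "WY", "WY", "AK", "AK"),
  tag   = c("c#", "javascript", "python", "python", "javascript"),
  likes = c(10L, 10L, 7L, 15L, 14L)
)

# Keep every row whose likes equal the state's maximum; tied rows are all retained
top_likes <- toy[, .SD[likes == max(likes)], by = state]
print(top_likes)

# Count how many tags share the top spot in each state
tie_counts <- top_likes[, .(n_top_tags = .N), by = state]

# States with more than one top tag (the ones that get a mixed color on the map)
print(tie_counts[n_top_tags > 1])
```

Filtering `tie_counts` this way is one possible way to list the states that need a mixed color before the data is handed to Tableau.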
# group the data by same state and then apply the condition in which it
# takes all the dislikes of that state and matches it with the maximum dislikes
# of that particular state and subset that data.
dislikes<-dt1[, .SD[dislikes %in% max(dislikes)], by=state]
# checking class
class(dislikes)
## [1] "data.table" "data.frame"
# converting datatable to dataframe
dislikes_df<-as.data.frame(dislikes)
# checking class
class(dislikes_df)
## [1] "data.frame"
# taking some columns
dislikes_df<-dislikes_df[,c("state","tag","dislikes")]
# writing a csv file
write.csv(dislikes_df, "dislikes_by_states.csv")
The most disliked technology is PHP, with California on top at 312 dislikes.
In some states, more than one tag shares the maximum number of dislikes, so those states are shown in a mixed color. These states are:
Two new columns have been added to the dataset, likes_percentage (likes/total) and dislikes_percentage (dislikes/total), both expressed as percentages, for the percentage analysis.
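The two percentage columns can be derived directly from the counts. The sketch below assumes each percentage is 100 times the count divided by total, rounded to two decimals (the rounding is a guess based on the displayed values). The rows are the first Alaska rows shown earlier, and `dt` is an illustrative name.

```r
library(data.table)

# First rows of the dataset as displayed earlier in this report
dt <- data.table(
  tag      = c("python", "javascript", "linux", "c#"),
  state    = "AK",
  likes    = c(15L, 14L, 13L, 11L),
  dislikes = c(1L, 0L, 1L, 2L),
  total    = c(16L, 14L, 14L, 13L)
)

# Add both percentage columns by reference (assumed formula: 100 * count / total)
dt[, `:=`(
  likes_percentage    = round(100 * likes / total, 2),
  dislikes_percentage = round(100 * dislikes / total, 2)
)]

print(dt)
```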
# reading csv file
Data_df <- read.csv(file="like_dislike_by_state_with_percent.csv", header=TRUE, sep=",")
# converting it to data.table
dt1<-data.table(Data_df,key="state")
# group the data by state, keep the rows whose likes equal the maximum likes
# of that state, and carry their likes_percentage along so that the most liked
# tags can be compared on a percentage basis.
likes_percentage<-dt1[, .SD[likes%in% max(likes)], by=state]
# converting it to data frame
likes_percentage_df<-as.data.frame(likes_percentage)
# taking some columns
likes_percentage_df<-likes_percentage_df[,c("state","tag","likes_percentage","likes","dislikes")]
#display some
head(likes_percentage_df)
## state tag likes_percentage likes dislikes
## 1 AK python 93.75 15 1
## 2 AL c# 96.30 78 3
## 3 AR java 95.24 40 2
## 4 AZ javascript 99.46 183 1
## 5 CA javascript 98.29 2131 37
## 6 CO javascript 97.26 320 9
# writing a csv file ( It will show the tags which have got maximum likes only)
write.csv(likes_percentage_df, "likes_percentage_by_states.csv")
# it will take only those tags which have max percentage
likes_percentage_comp<-dt1[, .SD[likes_percentage%in% max(likes_percentage)], by=state]
# converting to data frame
likes_percentage_df_comp<-as.data.frame(likes_percentage_comp)
#taking some columns
likes_percentage_df_comp<-likes_percentage_df_comp[,c("state","tag","likes_percentage","likes","dislikes")]
# converting to data.table
dt2<-data.table(likes_percentage_df_comp)
# taking out the maximum from likes column
likes_percentage1<-dt2[, .SD[likes %in% max(likes)], by=state]
# display some
head(likes_percentage1)
## state tag likes_percentage likes dislikes
## 1: AK javascript 100 14 0
## 2: AL javascript 100 70 0
## 3: AR jquery 100 31 0
## 4: AZ jquery 100 118 0
## 5: CA html5 100 624 0
## 6: CO jquery 100 172 0
# writing csv
write.csv(likes_percentage1,"likes_compared_percentage_by_states.csv")
# reading csv file
Data_df <- read.csv(file="like_dislike_by_state_with_percent.csv", header=TRUE, sep=",")
# converting it to data.table
dt3<-data.table(Data_df,key="state")
# group the data by state, keep the rows whose dislikes equal the maximum dislikes
# of that state, and carry their dislikes_percentage along so that the most disliked
# tags can be compared on a percentage basis.
dislikes_percentage<-dt3[, .SD[dislikes%in% max(dislikes)], by=state]
# converting it to data frame
dislikes_percentage_df<-as.data.frame(dislikes_percentage)
# taking some columns
dislikes_percentage_df<-dislikes_percentage_df[,c("state","tag","dislikes_percentage","likes","dislikes")]
#display some
head(dislikes_percentage_df)
## state tag dislikes_percentage likes dislikes
## 1 AK java 33.33 6 3
## 2 AK windows 75.00 1 3
## 3 AK asp-classic 100.00 0 3
## 4 AL java 13.56 51 8
## 5 AL vb.net 44.44 10 8
## 6 AR windows 62.50 3 5
# writing a csv file ( It will show the tags which have got maximum dislikes only)
write.csv(dislikes_percentage_df, "dislikes_percentage_by_states.csv")
# it will take only those tags which have max percentage
dislikes_percentage_comp<-dt3[, .SD[dislikes_percentage%in% max(dislikes_percentage)], by=state]
# converting to data frame
dislikes_percentage_df_comp<-as.data.frame(dislikes_percentage_comp)
#taking some columns
dislikes_percentage_df_comp<-dislikes_percentage_df_comp[,c("state","tag","dislikes_percentage","likes","dislikes")]
# converting to data.table
dt4<-data.table(dislikes_percentage_df_comp)
# taking out the maximum from dislikes column
dislikes_percentage1<-dt4[, .SD[dislikes %in% max(dislikes)], by=state]
# display some
head(dislikes_percentage1)
## state tag dislikes_percentage likes dislikes
## 1: AK asp-classic 100 0 3
## 2: AL vb6 100 0 5
## 3: AR asp-classic 100 0 3
## 4: AZ cobol 100 0 6
## 5: CA internet-explorer 100 0 43
## 6: CO cobol 100 0 8
# writing csv
write.csv(dislikes_percentage1,"dislikes_compared_percentage_by_states.csv")
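The two-stage filter used above for both likes and dislikes (first the highest percentage per state, then the highest count among those rows) can also be written without the round trip through a data frame. Below is a minimal sketch for the likes case, using rows that appear in the outputs above. The names `toy`, `step1`, and `best` are illustrative.

```r
library(data.table)

# Rows taken from the outputs shown in this report
toy <- data.table(
  state            = c("AK", "AK", "AK", "CA", "CA"),
  tag              = c("python", "javascript", "mysql", "javascript", "html5"),
  likes_percentage = c(93.75, 100, 100, 98.29, 100),
  likes            = c(15L, 14L, 12L, 2131L, 624L),
  dislikes         = c(1L, 0L, 0L, 37L, 0L)
)

# Stage 1: per state, keep the tags with the highest likes_percentage
step1 <- toy[, .SD[likes_percentage == max(likes_percentage)], by = state]

# Stage 2: among those, keep the tag(s) with the most likes
best <- step1[, .SD[likes == max(likes)], by = state]

print(best)
```

The same pattern should apply to dislikes by swapping in dislikes_percentage and dislikes.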