Users on Stack Overflow Careers can create customized profiles (“CVs”) to show to prospective employers. As part of these profiles, they have the option of listing technologies they like or dislike.

There are many ways to learn which technologies people tend to like, but this provides a rare way to find out which technologies people tend to dislike.

The dataset used in this report was provided by Dave Robinson of Stack Overflow and contains around 20,000 records with six variables. The variables include: tag, state, likes, dislikes, total, and X (the percentage of likes out of total).

We tried to answer the following questions:

  1. Which tag/technology is most liked and disliked in each state?
  2. Many tags/technologies have zero dislikes, but because their total number of likes is small they never show up as the “most liked” tag in a state. To account for this, we calculated a like percentage to find which tag is most liked relative to its dislikes. Similarly, we calculated a dislike percentage to find which tag is most disliked relative to its likes.

The analysis is done using R and results are mapped in Tableau.

Answer-1(a): Finding the most liked tag/technology in each state:

# reading data
MyData_df <- read.csv(file="like_dislike_by_state.csv", header=TRUE, sep=",")

# displaying data
head(MyData_df)
##          tag state likes dislikes total         X
## 1     python    AK    15        1    16  93.75000
## 2 javascript    AK    14        0    14 100.00000
## 3      linux    AK    13        1    14  92.85714
## 4         c#    AK    11        2    13  84.61538
## 5      mysql    AK    12        0    12 100.00000
## 6     jquery    AK    11        0    11 100.00000
# looking at the structure of the data
str(MyData_df)
## 'data.frame':    19962 obs. of  6 variables:
##  $ tag     : Factor w/ 721 levels ".net","3d","access-vba",..: 478 297 340 90 393 308 454 294 141 234 ...
##  $ state   : Factor w/ 50 levels "AK","AL","AR",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ likes   : int  15 14 13 11 12 11 10 6 7 7 ...
##  $ dislikes: int  1 0 1 2 0 0 1 3 0 0 ...
##  $ total   : int  16 14 14 13 12 11 11 9 7 7 ...
##  $ X       : num  93.8 100 92.9 84.6 100 ...
# loading data.table package
library(data.table)

# convert to data.table with key=state
dt1<-data.table(MyData_df,key="state")

# group the data by same state and then apply the condition in which it 
# takes all the likes of that state and matches it with the maximum likes 
# of that particular state and subset that data.
likes<-dt1[, .SD[likes %in% max(likes)], by=state]

# checking class 
class(likes)
## [1] "data.table" "data.frame"
# converting datatable to data frame
likes_df<-as.data.frame(likes)

# checking class
class(likes_df)
## [1] "data.frame"
# taking some relevant columns
likes_df<-likes_df[,c("state","tag","likes")]

# displaying top 6 rows of final data
head(likes_df)
##   state        tag likes
## 1    AK     python    15
## 2    AL         c#    78
## 3    AR       java    40
## 4    AZ javascript   183
## 5    CA javascript  2131
## 6    CO javascript   320
# writing a csv file
write.csv(likes_df, "likes_by_states.csv")

Map-1 : Map showing maximum likes for each state

Highlights from the Map:

  1. The most liked technologies are javascript, c#, python and java. California tops the list with 2131 likes for javascript.

  2. In Wyoming, two tags tie for the maximum number of likes (c# = 10, javascript = 10), so the state is shown in a mixed color for c# and javascript.
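Ties like the one in Wyoming survive because `likes %in% max(likes)` keeps every row that equals the state maximum, not just the first one. The sketch below illustrates this on made-up toy data (the rows are hypothetical, not taken from the real dataset) and shows one possible way to list the states that have more than one top tag.

```r
library(data.table)

# Toy data, invented for illustration only
toy <- data.table(
  state = c("WY", "WY", "WY", "AK", "AK"),
  tag   = c("c#", "javascript", "php", "python", "linux"),
  likes = c(10L, 10L, 4L, 15L, 13L)
)

# Same idiom as in the report: keep every row whose likes equal the
# maximum likes of its state, so tied tags are all retained
top_liked <- toy[, .SD[likes %in% max(likes)], by = state]

# Count the surviving rows per state; more than one row means a tie
tied_states <- top_liked[, .N, by = state][N > 1, state]

print(top_liked)
print(tied_states)
```

Running the same count on the real `likes` table would give the states that need a mixed color on the map.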

Answer-1(b): Finding the most disliked tag/technology in each state:

# group the data by same state and then apply the condition in which it 
# takes all the dislikes of that state and matches it with the maximum dislikes 
# of that particular state and subset that data.
dislikes<-dt1[, .SD[dislikes %in% max(dislikes)], by=state]


# checking class
class(dislikes)
## [1] "data.table" "data.frame"
# converting datatable to dataframe
dislikes_df<-as.data.frame(dislikes)

# checking class
class(dislikes_df)
## [1] "data.frame"
# taking some columns
dislikes_df<-dislikes_df[,c("state","tag","dislikes")]

# writing a csv file
write.csv(dislikes_df, "dislikes_by_states.csv")

Map-2: Map showing maximum dislikes for each state

Highlights from the Map:

  1. The most disliked technology is PHP, with California on top at 312 dislikes.

  2. In some states, more than one tag shares the maximum number of dislikes, so those states are shown in a mixed color. These states are:

Answer-2(a)

Data Manipulation for percentage calculation:

Two new columns have been added to the data set for the percentage analysis: likes_percentage (likes/total) and dislikes_percentage (dislikes/total), both expressed on a 0 to 100 scale.
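The report does not show how the two percentage columns were produced. As a rough sketch, assuming `total` is simply `likes + dislikes` and that the values are rounded to two decimals (both are assumptions, although they agree with the printed outputs), the columns could be derived in data.table like this. The three rows are copied from the `head()` output shown earlier.

```r
library(data.table)

# Three Alaska rows taken from the head() output earlier in the report
toy <- data.table(
  tag      = c("python", "javascript", "c#"),
  state    = rep("AK", 3),
  likes    = c(15L, 14L, 11L),
  dislikes = c(1L, 0L, 2L)
)

# Assumption: total is the sum of likes and dislikes
toy[, total := likes + dislikes]

# Percentages on a 0-100 scale, rounded to two decimals
toy[, likes_percentage    := round(100 * likes / total, 2)]
toy[, dislikes_percentage := round(100 * dislikes / total, 2)]

print(toy)
```

For each row the two percentages sum to 100 (up to rounding), so ranking by one is the reverse of ranking by the other.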

# reading csv file   
Data_df <- read.csv(file="like_dislike_by_state_with_percent.csv", header=TRUE, sep=",")

# converting it to data.table   
dt1<-data.table(Data_df,key="state")

# group the data by state and, within each state, keep the rows whose
# likes equal that state's maximum likes; likes_percentage is carried
# along so it can be shown next to the most liked tag.
likes_percentage<-dt1[, .SD[likes %in% max(likes)], by=state]

# converting it to data frame
likes_percentage_df<-as.data.frame(likes_percentage)

# taking some columns
likes_percentage_df<-likes_percentage_df[,c("state","tag","likes_percentage","likes","dislikes")]

#display some
head(likes_percentage_df)
##   state        tag likes_percentage likes dislikes
## 1    AK     python            93.75    15        1
## 2    AL         c#            96.30    78        3
## 3    AR       java            95.24    40        2
## 4    AZ javascript            99.46   183        1
## 5    CA javascript            98.29  2131       37
## 6    CO javascript            97.26   320        9
# writing a csv file ( It will show the tags which have got maximum likes only)
write.csv(likes_percentage_df, "likes_percentage_by_states.csv")

Map-3: Map showing maximum likes along with their percents for each state

# it will take only those tags which have max percentage
likes_percentage_comp<-dt1[, .SD[likes_percentage%in% max(likes_percentage)], by=state]

# converting to data frame
likes_percentage_df_comp<-as.data.frame(likes_percentage_comp)

#taking some columns
likes_percentage_df_comp<-likes_percentage_df_comp[,c("state","tag","likes_percentage","likes","dislikes")]

# converting to data.table
dt2<-data.table(likes_percentage_df_comp)

# taking out the maximum from likes column
likes_percentage1<-dt2[, .SD[likes %in% max(likes)], by=state]

# display some
head(likes_percentage1)
##    state        tag likes_percentage likes dislikes
## 1:    AK javascript              100    14        0
## 2:    AL javascript              100    70        0
## 3:    AR     jquery              100    31        0
## 4:    AZ     jquery              100   118        0
## 5:    CA      html5              100   624        0
## 6:    CO     jquery              100   172        0
# writing csv
write.csv(likes_percentage1,"likes_compared_percentage_by_states.csv")

Map-4: Map showing maximum likes on the basis of their percent for each state

Highlights from the above two maps:

  1. Map-3 shows that in California javascript has the maximum number of likes (2131). When percentages are considered, however, Map-4 puts html5 on top: it has 0 dislikes, which gives it a like percentage of 100, and among the tags at 100 percent it has the most likes (624). The same pattern holds for other states.
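The difference between the two rankings can be reproduced on a small example. The sketch below uses the California figures for javascript and html5 from the outputs above, plus one invented tag (`hypothetical-tag`, not in the dataset) to show why the second step, picking the most likes among the tags tied at the top percentage, is needed.

```r
library(data.table)

toy <- data.table(
  state    = rep("CA", 3),
  tag      = c("javascript", "html5", "hypothetical-tag"),
  likes    = c(2131L, 624L, 5L),   # last value is invented
  dislikes = c(37L, 0L, 0L)
)
toy[, likes_percentage := round(100 * likes / (likes + dislikes), 2)]

# Ranking by raw count: the tag with the most likes in the state
by_count <- toy[, .SD[likes %in% max(likes)], by = state]

# Ranking by percentage, step 1: tags at the state's top like percentage
best_pct <- toy[, .SD[likes_percentage %in% max(likes_percentage)], by = state]

# Step 2: among those, keep the tag with the most likes
winner <- best_pct[, .SD[likes %in% max(likes)], by = state]

print(by_count$tag)
print(winner$tag)
```

Without step 2, every tag with zero dislikes would tie at 100 percent, including tags liked by only a handful of users.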

Answer-2(b):

# reading csv file   
Data_df <- read.csv(file="like_dislike_by_state_with_percent.csv", header=TRUE, sep=",")

# converting it to data.table   
dt3<-data.table(Data_df,key="state")

# group the data by state and, within each state, keep the rows whose
# dislikes equal that state's maximum dislikes; dislikes_percentage is
# carried along so it can be shown next to the most disliked tag.
dislikes_percentage<-dt3[, .SD[dislikes %in% max(dislikes)], by=state]

# converting it to data frame
dislikes_percentage_df<-as.data.frame(dislikes_percentage)

# taking some columns
dislikes_percentage_df<-dislikes_percentage_df[,c("state","tag","dislikes_percentage","likes","dislikes")]

#display some
head(dislikes_percentage_df)
##   state         tag dislikes_percentage likes dislikes
## 1    AK        java               33.33     6        3
## 2    AK     windows               75.00     1        3
## 3    AK asp-classic              100.00     0        3
## 4    AL        java               13.56    51        8
## 5    AL      vb.net               44.44    10        8
## 6    AR     windows               62.50     3        5
# writing a csv file ( It will show the tags which have got maximum dislikes only)
write.csv(dislikes_percentage_df, "dislikes_percentage_by_states.csv")

Map-5: Map showing maximum dislikes along with their percents for each state

# it will take only those tags which have max percentage
dislikes_percentage_comp<-dt3[, .SD[dislikes_percentage%in% max(dislikes_percentage)], by=state]

# converting to data frame
dislikes_percentage_df_comp<-as.data.frame(dislikes_percentage_comp)

#taking some columns
dislikes_percentage_df_comp<-dislikes_percentage_df_comp[,c("state","tag","dislikes_percentage","likes","dislikes")]

# converting to data.table
dt4<-data.table(dislikes_percentage_df_comp)

# taking out the maximum from dislikes column
dislikes_percentage1<-dt4[, .SD[dislikes %in% max(dislikes)], by=state]


# display some
head(dislikes_percentage1)
##    state               tag dislikes_percentage likes dislikes
## 1:    AK       asp-classic                 100     0        3
## 2:    AL               vb6                 100     0        5
## 3:    AR       asp-classic                 100     0        3
## 4:    AZ             cobol                 100     0        6
## 5:    CA internet-explorer                 100     0       43
## 6:    CO             cobol                 100     0        8
# writing csv
write.csv(dislikes_percentage1,"dislikes_compared_percentage_by_states.csv")  

Map-6: Map showing maximum dislikes on the basis of their percent for each state

Highlights from the above two maps:

  1. Map-5 shows that in California php has the maximum number of dislikes (312). When percentages are considered, however, Map-6 puts internet-explorer on top: it has 0 likes, which gives it a dislike percentage of 100, and among the tags at 100 percent it has the most dislikes (43). The same pattern holds for other states.