When job seekers create their profiles on Stack Overflow Careers, they can list the programming languages and technologies that they like or dislike. Examining this data gives us a rare look at the preferences of IT professionals.

The dataset used in this report was provided by Dave Robinson from Stack Overflow and contains around 20,000 records (19,962 rows) with six variables. The variables include: tag, state, likes, dislikes, total, and X (likes as a percentage of total).

We used this data to answer the following research questions:

  1. Which programming languages are the most liked in each state?
  2. Which programming languages are the most disliked in each state?

Note: We used both raw counts and percentages in our analysis. Where the percentages were tied, we used the total number of likes or dislikes to determine the most liked or most disliked programming language.

The analysis was done using R and the results were mapped using Tableau.
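
The tie-breaking rule from the note above can be illustrated on a few made-up rows. This is only a minimal sketch in base R: the data frame `toy`, its values, and the helper `pick_most_liked` are hypothetical and are not part of the original analysis, which uses data.table further down.

```r
# Toy data with hypothetical values (not taken from the real dataset)
toy <- data.frame(
  state    = c("XX", "XX", "XX", "YY", "YY"),
  tag      = c("a", "b", "c", "a", "b"),
  likes    = c(10, 4, 7, 3, 5),
  dislikes = c(0, 0, 2, 1, 1),
  stringsAsFactors = FALSE
)
toy$likes_percentage <- 100 * toy$likes / (toy$likes + toy$dislikes)

# For one state's rows: keep the tags with the highest percentage of likes,
# then break percentage ties by the raw number of likes.
pick_most_liked <- function(d) {
  top_pct <- d[d$likes_percentage == max(d$likes_percentage), ]
  top_pct[top_pct$likes == max(top_pct$likes), ]
}

# Apply the rule state by state and stack the results
most_liked <- do.call(rbind, lapply(split(toy, toy$state), pick_most_liked))
rownames(most_liked) <- NULL
print(most_liked)
```

In state XX, tags a and b both reach 100 percent, so the higher like count decides in favor of a. In state YY there is no tie and b wins on percentage alone.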

Answer 1: Finding the most liked tag/programming language by state:

# Reading the data
MyData_df <- read.csv(file="C:\\Users\\jgarg\\OneDrive\\Documents\\like_dislike_by_state.csv", header=TRUE, sep=",")

# Displaying the data
head(MyData_df)
##          tag state likes dislikes total         X
## 1     python    AK    15        1    16  93.75000
## 2 javascript    AK    14        0    14 100.00000
## 3      linux    AK    13        1    14  92.85714
## 4         c#    AK    11        2    13  84.61538
## 5      mysql    AK    12        0    12 100.00000
## 6     jquery    AK    11        0    11 100.00000
# Looking at the structure of the data
str(MyData_df)
## 'data.frame':    19962 obs. of  6 variables:
##  $ tag     : Factor w/ 721 levels ".net","3d","access-vba",..: 478 297 340 90 393 308 454 294 141 234 ...
##  $ state   : Factor w/ 50 levels "AK","AL","AR",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ likes   : int  15 14 13 11 12 11 10 6 7 7 ...
##  $ dislikes: int  1 0 1 2 0 0 1 3 0 0 ...
##  $ total   : int  16 14 14 13 12 11 11 9 7 7 ...
##  $ X       : num  93.8 100 92.9 84.6 100 ...
# Loading data.table package
library(data.table)
# Converting data to data.table with key=state
dt1<-data.table(MyData_df,key="state")

# Grouping the data by state 
# Then finding maximum likes per state 
# Finally, subsetting the data.
likes<-dt1[, .SD[likes %in% max(likes)], by=state]

# Checking the class of likes
class(likes)
## [1] "data.table" "data.frame"
# Converting the datatable to a dataframe
likes_df<-as.data.frame(likes)

# Checking the class of likes (again)
class(likes_df)
## [1] "data.frame"
# Obtaining the relevant columns
likes_df<-likes_df[,c("state","tag","likes")]

# Displaying top 6 rows of the data
head(likes_df)
##   state        tag likes
## 1    AK     python    15
## 2    AL         c#    78
## 3    AR       java    40
## 4    AZ javascript   183
## 5    CA javascript  2131
## 6    CO javascript   320
# Writing the csv file
write.csv(likes_df, "likes_by_states.csv")

Map 1: Map showing maximum likes for each state

Highlights from the Map:

  1. The most liked programming languages are JavaScript, C#, Python and Java. California has the most likes for any programming language with 2131 likes for JavaScript.

  2. In Wyoming, two programming languages are tied for the highest number of likes (C# = 10, JavaScript = 10), so the state is shown in a mixed color representing both C# and JavaScript.
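
Ties such as Wyoming's appear on the map because the filter `likes %in% max(likes)` keeps every row that reaches the maximum. The base R sketch below contrasts this with `which.max()`, which returns only the first maximum. The data frame `wy` is a hypothetical stand-in: the two tied values come from the text above, while the third row is invented for illustration.

```r
# Hypothetical rows mirroring the Wyoming tie (the python row is made up)
wy <- data.frame(
  state = "WY",
  tag   = c("c#", "javascript", "python"),
  likes = c(10, 10, 8),
  stringsAsFactors = FALSE
)

# Logical filter: keeps every row that attains the maximum, so ties survive
tied <- wy[wy$likes %in% max(wy$likes), ]

# which.max(): returns the index of the first maximum only, so ties are lost
first_only <- wy[which.max(wy$likes), ]

print(tied$tag)
print(first_only$tag)
```

The first approach returns both C# and JavaScript, while the second returns only C#.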

Answer 2: Finding the most disliked tag/programming language by state:

# Grouping the data by state 
# Then finding maximum dislikes per state 
# Finally, subsetting the data.
dislikes<-dt1[, .SD[dislikes %in% max(dislikes)], by=state]


# Checking the class of dislikes
class(dislikes)
## [1] "data.table" "data.frame"
# Converting the data.table to a dataframe
dislikes_df<-as.data.frame(dislikes)

# Checking the class of dislikes (again)
class(dislikes_df)
## [1] "data.frame"
# Obtaining the relevant columns
dislikes_df<-dislikes_df[,c("state","tag","dislikes")]

# Writing the csv file
write.csv(dislikes_df, "dislikes_by_states.csv")

Map 2: Map showing maximum dislikes for each state

Highlights from the Map:

  1. The most disliked programming language/technology is PHP. California has the most dislikes for any programming language, with 312 dislikes for PHP.

  2. In some states, more than one tag has the same maximum number of dislikes, so those states are shown in a mixed color. These states are:
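
States with tied tags can be recovered by counting how many rows each state has after the maximum filter. The following is a minimal base R sketch under stated assumptions: `dislikes_demo` is a small stand-in for `dislikes_df`, built from the first six rows printed later in this report, so it is not the full result and AR may have further tied rows that are not shown there.

```r
# Stand-in for dislikes_df, limited to six rows (assumption: partial data)
dislikes_demo <- data.frame(
  state    = c("AK", "AK", "AK", "AL", "AL", "AR"),
  tag      = c("java", "windows", "asp-classic", "java", "vb.net", "windows"),
  dislikes = c(3, 3, 3, 8, 8, 5),
  stringsAsFactors = FALSE
)

# A state with more than one row after the max filter has tied tags
rows_per_state <- table(dislikes_demo$state)
tied_states <- names(rows_per_state[rows_per_state > 1])
print(tied_states)

# Collapse the tied tags of each state into one readable string
tied_tags <- tapply(dislikes_demo$tag, dislikes_demo$state, paste, collapse = ", ")
print(tied_tags[tied_states])
```

Running the same two steps on the full `dislikes_df` would give the complete list of mixed-color states.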

Answer 1.1:

Data Manipulation for percentage calculation:

Two new columns, likes_percentage (likes/total × 100) and dislikes_percentage (dislikes/total × 100), were added to the data to facilitate the percentage analysis.
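
The report does not show how the two columns were produced, so the snippet below is only a sketch of one way to compute them in base R. The data frame `demo` reuses the first four rows printed earlier. Rounding to two decimals is an assumption, chosen because the later output shows values such as 96.30.

```r
# First four rows of the dataset, as printed by head(MyData_df) above
demo <- data.frame(
  tag      = c("python", "javascript", "linux", "c#"),
  state    = "AK",
  likes    = c(15, 14, 13, 11),
  dislikes = c(1, 0, 1, 2),
  stringsAsFactors = FALSE
)
demo$total <- demo$likes + demo$dislikes

# Percentages of likes and dislikes per tag (rounding is an assumption)
demo$likes_percentage    <- round(100 * demo$likes / demo$total, 2)
demo$dislikes_percentage <- round(100 * demo$dislikes / demo$total, 2)

print(demo)
```

For the python row this gives 15/16 × 100 = 93.75, which matches the value in the X column of the original data.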

# Reading csv file   
Data_df <- read.csv(file="C:\\Users\\jgarg\\OneDrive\\Documents\\like_dislike_by_state_with_percent.csv", header=TRUE, sep=",")

# Converting data from file to a data.table   
dt1<-data.table(Data_df,key="state")

# Grouping the data by state 
# Then finding maximum number of likes per state 
# Finally, subsetting the data.
likes_percentage<-dt1[, .SD[likes%in% max(likes)], by=state]

# Converting the data.table to a dataframe
likes_percentage_df<-as.data.frame(likes_percentage)

# Obtaining the relevant columns
likes_percentage_df<-likes_percentage_df[,c("state","tag","likes_percentage","likes","dislikes")]

# Displaying first six rows
head(likes_percentage_df)
##   state        tag likes_percentage likes dislikes
## 1    AK     python            93.75    15        1
## 2    AL         c#            96.30    78        3
## 3    AR       java            95.24    40        2
## 4    AZ javascript            99.46   183        1
## 5    CA javascript            98.29  2131       37
## 6    CO javascript            97.26   320        9
# Writing the *.csv file (it contains only the tags with the maximum number of likes)
write.csv(likes_percentage_df, "likes_percentage_by_states.csv")

Map 3: Map showing maximum likes along with percentages by state

# Obtain tags that have max percentages
likes_percentage_comp<-dt1[, .SD[likes_percentage%in% max(likes_percentage)], by=state]

# Convert data to a dataframe
likes_percentage_df_comp<-as.data.frame(likes_percentage_comp)

# Obtaining the relevant columns
likes_percentage_df_comp<-likes_percentage_df_comp[,c("state","tag","likes_percentage","likes","dislikes")]

# Converting the dataframe to a data.table
dt2<-data.table(likes_percentage_df_comp)

# Obtaining the maximums from the likes column
likes_percentage1<-dt2[, .SD[likes %in% max(likes)], by=state]

# Displaying first six rows
head(likes_percentage1)
##    state        tag likes_percentage likes dislikes
## 1:    AK javascript              100    14        0
## 2:    AL javascript              100    70        0
## 3:    AR     jquery              100    31        0
## 4:    AZ     jquery              100   118        0
## 5:    CA      html5              100   624        0
## 6:    CO     jquery              100   172        0
# Writing the *.csv file
write.csv(likes_percentage1,"likes_compared_percentage_by_states.csv")

Map 4: Map showing maximum likes by percentage for each state

Highlight from the above two maps:

  1. Map 3 shows that in the state of California JavaScript received the maximum number of likes (2131). However, Map 4 shows that HTML5 has the highest percentage of likes. HTML5 has 0 dislikes, giving it 100 percent of likes. We find similar results for other states.

Answer 2.1:

# Reading the *.csv file   
Data_df <- read.csv(file="C:\\Users\\jgarg\\OneDrive\\Documents\\like_dislike_by_state_with_percent.csv", header=TRUE, sep=",")

# Converting file data to a data.table   
dt3<-data.table(Data_df,key="state")

# Grouping the data by state 
# Then finding the maximum number of dislikes (raw count) per state 
# Finally, subsetting the data.
dislikes_percentage<-dt3[, .SD[dislikes%in% max(dislikes)], by=state]

# Converting the data.table to a dataframe
dislikes_percentage_df<-as.data.frame(dislikes_percentage)

# Obtaining the relevant columns
dislikes_percentage_df<-dislikes_percentage_df[,c("state","tag","dislikes_percentage","likes","dislikes")]

# Displaying first six rows
head(dislikes_percentage_df)
##   state         tag dislikes_percentage likes dislikes
## 1    AK        java               33.33     6        3
## 2    AK     windows               75.00     1        3
## 3    AK asp-classic              100.00     0        3
## 4    AL        java               13.56    51        8
## 5    AL      vb.net               44.44    10        8
## 6    AR     windows               62.50     3        5
# Writing the *.csv file (it contains only the tags with the maximum number of dislikes)
write.csv(dislikes_percentage_df, "dislikes_percentage_by_states.csv")

Map 5: Map showing maximum dislikes along with percentages for each state

# Only take tags with max percentages
dislikes_percentage_comp<-dt3[, .SD[dislikes_percentage%in% max(dislikes_percentage)], by=state]

# Converting data to dataframe
dislikes_percentage_df_comp<-as.data.frame(dislikes_percentage_comp)

# Obtaining the relevant columns
dislikes_percentage_df_comp<-dislikes_percentage_df_comp[,c("state","tag","dislikes_percentage","likes","dislikes")]

# Converting the dataframe to a data.table
dt4<-data.table(dislikes_percentage_df_comp)

# Obtaining maximums from dislikes column
dislikes_percentage1<-dt4[, .SD[dislikes %in% max(dislikes)], by=state]

# Displaying first six rows
head(dislikes_percentage1)
##    state               tag dislikes_percentage likes dislikes
## 1:    AK       asp-classic                 100     0        3
## 2:    AL               vb6                 100     0        5
## 3:    AR       asp-classic                 100     0        3
## 4:    AZ             cobol                 100     0        6
## 5:    CA internet-explorer                 100     0       43
## 6:    CO             cobol                 100     0        8
# Writing the *.csv file
write.csv(dislikes_percentage1,"dislikes_compared_percentage_by_states.csv")  

Map 6: Map showing maximum dislikes by percentage for each state

Highlight from the above two maps:

  1. Map 5 shows that PHP received the maximum number of dislikes (312) in the state of California. However, Map 6 shows that Internet Explorer is actually the most disliked programming language/technology by percentage. Internet Explorer has 0 likes, giving it 100 percent dislikes. We find similar results for other states.