File Location “/Users/xutongzhang/Downloads/archive/Software_Professional_Salaries.csv”
#Intro
This project explores a dataset of software professional salaries from 2022, sourced from Glassdoor, containing over 22,700 records from India’s bustling tech industry. Through this exploration, I seek to understand the landscape of software salaries and the factors that might influence them, an endeavor that holds personal significance as I navigate my own future career path in tech.
#Libraries and Data cleaning
# Loading Libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
#Data cleaning
# Read the dataset
data <- read.csv("/Users/xutongzhang/Downloads/archive/Software_Professional_Salaries.csv")
# Handle missing values
data <- data %>%
  na.omit() %>%
  filter(Salary > 0) # Ensuring salary data is valid
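Because `na.omit()` and the `Salary > 0` filter drop rows silently, it can help to count what each step removes. The following is a minimal sketch on a small made-up data frame; the `toy` values are hypothetical and are not taken from the Glassdoor file.

```r
# Toy stand-in for the salary data; all values are invented for illustration.
toy <- data.frame(
  Rating = c(3.8, NA, 4.0, 4.2),
  Salary = c(400000, 500000, 0, 600000)
)

# Missing values per column before any cleaning
na_counts <- colSums(is.na(toy))

# Apply the same two cleaning steps and record the row count after each
after_na     <- na.omit(toy)
after_filter <- after_na[after_na$Salary > 0, ]

rows_dropped <- c(
  missing      = nrow(toy) - nrow(after_na),
  non_positive = nrow(after_na) - nrow(after_filter)
)

print(na_counts)
print(rows_dropped)
```

Running the same counts on the real `data` before and after cleaning would show whether the 22,774 remaining rows reflect a small or a large loss.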
#Exploring the data
summary(data)
## Rating Company.Name Job.Title Salary
## Min. :1.000 Length:22774 Length:22774 Min. : 2112
## 1st Qu.:3.700 Class :character Class :character 1st Qu.: 300000
## Median :3.900 Mode :character Mode :character Median : 500000
## Mean :3.918 Mean : 695361
## 3rd Qu.:4.200 3rd Qu.: 900000
## Max. :5.000 Max. :90000000
## Salaries.Reported Location
## Min. : 1.000 Length:22774
## 1st Qu.: 1.000 Class :character
## Median : 1.000 Mode :character
## Mean : 1.856
## 3rd Qu.: 1.000
## Max. :361.000
str(data)
## 'data.frame': 22774 obs. of 6 variables:
## $ Rating : num 3.8 4.5 4 3.8 4.4 4.2 3.7 3.1 3.7 3.6 ...
## $ Company.Name : chr "Sasken" "Advanced Millennium Technologies" "Unacademy" "SnapBizz Cloudtech" ...
## $ Job.Title : chr "Android Developer" "Android Developer" "Android Developer" "Android Developer" ...
## $ Salary : int 400000 400000 1000000 300000 600000 100000 192000 400000 300000 600000 ...
## $ Salaries.Reported: int 3 3 3 3 3 3 3 3 3 3 ...
## $ Location : chr "Bangalore" "Bangalore" "Bangalore" "Bangalore" ...
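The summary above shows a maximum salary of 90,000,000 against a third quartile of 900,000, and a mean (695,361) well above the median (500,000), which points to a long right tail. One way to see where that tail begins is to look at upper percentiles. This is a sketch only: the `salary` vector is a hypothetical stand-in for `data$Salary`.

```r
# Hypothetical salary vector standing in for data$Salary
salary <- c(300000, 400000, 500000, 500000, 600000, 900000, 1200000, 90000000)

# Upper percentiles show where the extreme values start
upper_tail <- quantile(salary, probs = c(0.50, 0.90, 0.99))

# A mean-to-median ratio well above 1 suggests right skew
mean_to_median <- mean(salary) / median(salary)

print(upper_tail)
print(mean_to_median)
```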
#Data manipulation
data_grouped <- data %>%
  group_by(Location) %>%
  summarize(Average_Salary = mean(Salary, na.rm = TRUE))
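A mean per location can be pulled upward by a handful of very large salaries, so it may be worth reporting the median and the group size next to it. The sketch below uses a hypothetical `toy` data frame; the column names `Median_Salary` and `n` are illustrative additions, not part of the original analysis.

```r
library(dplyr)

# Toy data; values are invented for illustration.
toy <- data.frame(
  Location = c("Bangalore", "Bangalore", "Bangalore", "Pune", "Pune"),
  Salary   = c(400000, 600000, 5000000, 300000, 500000)
)

# Mean, median, and count per location, sorted by median salary
toy_grouped <- toy %>%
  group_by(Location) %>%
  summarize(
    Average_Salary = mean(Salary),
    Median_Salary  = median(Salary),
    n              = n()
  ) %>%
  arrange(desc(Median_Salary))

print(toy_grouped)
```

In the toy data a single 5,000,000 salary lifts the Bangalore mean far above its median, which is the kind of gap this comparison is meant to expose.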
#Data analysis
# Histogram to see the distribution of salaries
ggplot(data, aes(x = Salary)) +
  geom_histogram(binwidth = 50000, fill = "blue", color = "black") +
  theme_minimal() +
  labs(title = "Distribution of Salaries", x = "Salary (INR)", y = "Frequency")
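With salaries ranging from a few thousand to 90,000,000, a fixed 50,000-wide bin squeezes most of the data into the left edge of the plot. A logarithmic x-axis is one possible alternative. The sketch below runs on simulated salaries (the `rlnorm()` parameters are arbitrary assumptions chosen only to produce a skewed sample), so it illustrates the technique rather than the real distribution.

```r
library(ggplot2)
library(scales)

# Simulated right-skewed salaries; parameters are arbitrary assumptions.
set.seed(1)
toy <- data.frame(Salary = round(rlnorm(500, meanlog = 13, sdlog = 0.8)))

# Histogram on a log10 x-axis so the long right tail does not dominate
p <- ggplot(toy, aes(x = Salary)) +
  geom_histogram(bins = 40, fill = "blue", color = "black") +
  scale_x_log10(labels = label_comma()) +
  theme_minimal() +
  labs(title = "Distribution of Salaries (log scale)",
       x = "Salary (INR, log scale)", y = "Frequency")

print(p)
```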
# Boxplot to identify outliers
ggplot(data, aes(x = Job.Title, y = Salary)) +
  geom_boxplot(fill = "orange", color = "darkred") +
  theme_minimal() +
  labs(title = "Salary Ranges by Job Title", x = "Job Title", y = "Salary (INR)")
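The boxplot marks outliers visually; to list them, the same 1.5 × IQR convention that `geom_boxplot()` uses by default for its whiskers can be applied directly. This is a minimal sketch on a hypothetical `salary` vector, not on the real data.

```r
# Hypothetical salary vector standing in for data$Salary
salary <- c(300000, 400000, 500000, 500000, 600000, 900000, 1200000, 90000000)

# Quartiles and interquartile range
q   <- quantile(salary, probs = c(0.25, 0.75))
iqr <- q[["75%"]] - q[["25%"]]

# Fences at 1.5 * IQR beyond the quartiles
lower_fence <- q[["25%"]] - 1.5 * iqr
upper_fence <- q[["75%"]] + 1.5 * iqr

# Flag values outside the fences
is_outlier <- salary < lower_fence | salary > upper_fence
print(salary[is_outlier])
```

On the real data the same rule could be applied within each `Job.Title` group, since a salary that is extreme for one role may be ordinary for another.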
#Visualizations
The scatter plot below relates company rating to salary.
library(scales)
# Scatter plot with a formatted salary axis
ggplot(data, aes(x = Rating, y = Salary, color = Rating)) +
  geom_point(alpha = 0.6) +
  scale_color_gradient(low = "red", high = "green") +
  # label_number_si() is deprecated since scales 1.2.0; use scale_cut instead
  scale_y_continuous(labels = label_number(scale_cut = cut_short_scale())) +
  theme_bw() +
  labs(title = "Company Rating vs. Salary", x = "Company Rating",
       y = "Salary (Indian Rupees)", caption = "Data source: Glassdoor 2022")
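A scatter plot shows the shape of the relationship, but a correlation coefficient puts a number on it. Given the heavy skew in salaries, a rank-based (Spearman) coefficient may be more informative than Pearson’s. The sketch below uses a hypothetical `toy` data frame, so the resulting values say nothing about the real dataset.

```r
# Toy ratings and salaries; values are invented for illustration.
toy <- data.frame(
  Rating = c(3.1, 3.6, 3.7, 3.8, 4.0, 4.2, 4.4, 4.5),
  Salary = c(400000, 600000, 192000, 300000, 1000000, 100000, 600000, 400000)
)

# Rank-based correlation, less sensitive to extreme salaries
rho <- cor(toy$Rating, toy$Salary, method = "spearman")

# Linear correlation, for comparison
r <- cor(toy$Rating, toy$Salary, method = "pearson")

print(c(spearman = rho, pearson = r))
```

Applied to the real data, the equivalent call would be `cor(data$Rating, data$Salary, method = "spearman")`.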
#Background Research
The Indian software industry has grown rapidly over the past few decades, becoming a global hub for IT services and software development. According to a report by NASSCOM, the trade association of the Indian IT-BPM industry, India’s tech industry generated $194 billion in revenue in 2021, a 2.3% year-on-year increase, and employed around 4.5 million people by March 2021. Salary trends within the sector are one indicator of its health and trajectory. As new technologies emerge and businesses worldwide undergo digital transformation, software professionals’ pay has become a topic of much discussion, reflecting both the demand for skilled labor and the economic realities of outsourcing.
#Conclusion and Reflection
This exploration of software professional salaries in India has provided valuable insights into the IT industry’s compensation landscape for 2022. From the histogram of salary distribution, it’s evident that a large proportion of the industry’s salaries are clustered at the lower end of the spectrum, with a steep drop-off as salaries increase. The summary statistics point the same way: the mean salary (695,361) sits well above the median (500,000), and the maximum (90,000,000) lies far beyond the third quartile (900,000). This right-skewed pattern could indicate a large entry-level population alongside a small number of very high earners.
The boxplot by job title revealed considerable variation in salaries within specific roles, likely reflecting factors such as company size, location, and individual experience levels. Certain job titles exhibit a wide range of salaries, suggesting that there is potential for upward mobility within those roles.
The scatter plot correlating company ratings with salaries painted a more nuanced picture. It appears that there isn’t a simple, linear relationship between company rating and salary. While higher-rated companies do offer high salaries, there are also high salaries reported within lower-rated companies. This might suggest that while a good company rating might be a factor, it’s not the sole determinant of salary, and other factors like specific skill sets, negotiation skills, and market demand play crucial roles.
Reflecting on the findings, it’s intriguing to see the dynamics at play within the software industry’s salary trends. As a student entering this field, I find that these visualizations underscore the importance of considering multiple factors when evaluating potential employers and job offers: not just the company’s market reputation but also the role’s growth potential and the individual skills I bring to the table.
This analysis has not only provided a clearer picture of the current state of software industry salaries but has also highlighted the complexity of factors influencing compensation. It serves as a foundation for further study, perhaps delving into the impact of education level, company size, or specific programming skills on salary, which could be potential avenues for future projects.