myData = read.csv("Test.csv")
First I need to convert the Salary in Rupees column from characters to a double. R cannot make histograms of columns that are not numeric.
myData$Salary_In_Rupees = as.numeric(gsub(",","", myData$Salary_In_Rupees))
Salary in Rupees will be the column I choose for the histogram to see the distribution of salaries.The graph seems to imply that most of the salaries are on the lower end.
hist(myData$Salary_In_Rupees, main = "Histogram of Salaries in Rupees", xlab="Salary in Rupees", ylab = "Frequency")
## Scatter plot I will check if remote working ratio and salary in
rupees have a correlation between each other. There is no apparent
correlation.
library(ggplot2)
ggplot(myData,aes(x=Remote_Working_Ratio,y=Salary_In_Rupees)) +
geom_point(color = "black")+
labs(title = " Remote Working Ratio vs Salary In Rupees", x = "Remote Working Ratio", y = "Salary in Rupees")
#Calculations I will use the average salary calculated as a standard for comparing salaries in different locations.
mean(myData$Salary_In_Rupees)
## [1] 8935485
The regression test oddly enough seems to imply there is a positive correlation since the line is rising as the remote working ratio increases. It could mean that people who work more remotely get paid more.
library(ggplot2)
ggplot(myData,aes(x=Remote_Working_Ratio,y=Salary_In_Rupees)) +
geom_point(color = "black")+
geom_smooth(method = "lm", se = FALSE, color = "blue")+
labs(title = " Remote Working Ratio vs Salary In Rupees", x = "Remote Working Ratio", y = "Salary in Rupees")
## `geom_smooth()` using formula = 'y ~ x'
# T Test I would like to see if Japan pays out better than other
countries, so I will compare the salary entries in all countries besides
Japan and all entries including Japan. There are much fewer entries than
I imagine and based on the P value, we fail to reject the null
hypothesis. We cannot support the hypothesis that japan pays out better
than other countries.
japan = subset(myData, Company_Location=="JP")
notJapan = subset(myData, Company_Location!="JP")
test_result = t.test(japan$Salary_In_Rupees, notJapan$Salary_In_Rupees)
print(test_result)
##
## Welch Two Sample t-test
##
## data: japan$Salary_In_Rupees and notJapan$Salary_In_Rupees
## t = 0.053955, df = 5.0721, p-value = 0.959
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -6827737 7121782
## sample estimates:
## mean of x mean of y
## 9081055 8934032