Loading Dataset

myData = read.csv("Test.csv")

Converting character to double

First I need to convert the Salary_In_Rupees column from character to double. The values contain commas, so R reads them as text, and hist() only accepts numeric input.

myData$Salary_In_Rupees = as.numeric(gsub(",","", myData$Salary_In_Rupees))
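To see what the conversion does, here is a minimal sketch on a made-up character vector (the values are hypothetical, not taken from Test.csv). It strips the commas with gsub() and then counts how many entries failed to parse, since as.numeric() returns NA for anything that is still not a number.

```r
# Hypothetical salary strings, standing in for the raw Salary_In_Rupees column
raw <- c("1,250,000", "8,935,485", "600000")

# Remove the commas, then convert to double
clean <- as.numeric(gsub(",", "", raw))

print(clean)
print(typeof(clean))       # "double"
print(sum(is.na(clean)))   # number of entries that could not be converted
```

A non-zero NA count would mean some entries contain characters other than digits and commas and need a closer look before plotting.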

Histogram

I chose Salary in Rupees for the histogram to see the distribution of salaries. The graph suggests that most salaries are at the lower end.

hist(myData$Salary_In_Rupees, main = "Histogram of Salaries in Rupees", xlab="Salary in Rupees", ylab = "Frequency")
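One way to back up the visual impression is to compare the mean with the median: in a distribution with a long right tail, the mean is usually pulled above the median. The sketch below uses made-up numbers in place of the real column, so the values are assumptions for illustration only.

```r
# Hypothetical salaries: many low values and a few very large ones
salaries <- c(300000, 400000, 450000, 500000, 600000, 700000, 5000000, 9000000)

salary_mean <- mean(salaries)
salary_median <- median(salaries)

# A mean well above the median is consistent with a right-skewed histogram
print(c(mean = salary_mean, median = salary_median))
print(salary_mean > salary_median)
```

The same two calls on myData$Salary_In_Rupees would show whether the real data follow this pattern.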

Scatter plot

I will check whether remote working ratio and salary in rupees are correlated. There is no apparent correlation in the plot.

library(ggplot2)
# Scatter plot of salary against remote working ratio
ggplot(myData, aes(x = Remote_Working_Ratio, y = Salary_In_Rupees)) +
  geom_point(color = "black") +
  labs(title = "Remote Working Ratio vs Salary In Rupees", x = "Remote Working Ratio", y = "Salary in Rupees")

Calculations

I will use the average salary as a baseline for comparing salaries in different locations.

mean(myData$Salary_In_Rupees)

## [1] 8935485
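As a rough sketch of how that comparison could look, the code below computes the mean salary per location and its difference from the overall mean. The data frame `toy` and the column `Diff_From_Mean` are hypothetical; only the column names Company_Location and Salary_In_Rupees come from the dataset.

```r
# Hypothetical data frame with the same column names as myData
toy <- data.frame(
  Company_Location = c("JP", "JP", "US", "US", "IN", "IN"),
  Salary_In_Rupees = c(9000000, 9200000, 12000000, 10000000, 3000000, 4000000)
)

overall_mean <- mean(toy$Salary_In_Rupees)

# Mean salary per location
by_location <- aggregate(Salary_In_Rupees ~ Company_Location, data = toy, FUN = mean)

# Difference between each location's mean and the overall mean
by_location$Diff_From_Mean <- by_location$Salary_In_Rupees - overall_mean
print(by_location)
```

Positive values in Diff_From_Mean mark locations that pay above the overall average, negative values those that pay below it.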

Regression

The fitted regression line, oddly enough, rises as the remote working ratio increases, which suggests a positive association. It could mean that people who work more remotely get paid more, although the line alone does not show whether the trend is statistically significant.

library(ggplot2)
# Same scatter plot, with a fitted linear regression line (no confidence band)
ggplot(myData, aes(x = Remote_Working_Ratio, y = Salary_In_Rupees)) +
  geom_point(color = "black") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Remote Working Ratio vs Salary In Rupees", x = "Remote Working Ratio", y = "Salary in Rupees")

## `geom_smooth()` using formula = 'y ~ x'
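To put a number on the slope of the line, one option is to fit the same model with lm() and check the correlation with cor.test(). This is only a sketch on a made-up data frame (`toy` and its values are assumptions); the formula matches the 'y ~ x' that geom_smooth() reports.

```r
# Hypothetical data with the same column names as myData
toy <- data.frame(
  Remote_Working_Ratio = c(0, 0, 50, 50, 100, 100),
  Salary_In_Rupees = c(5000000, 7000000, 6000000, 9000000, 8000000, 10000000)
)

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(Salary_In_Rupees ~ Remote_Working_Ratio, data = toy)
slope <- unname(coef(fit)["Remote_Working_Ratio"])
print(slope)

# Pearson correlation with a test of whether it differs from zero
ct <- cor.test(toy$Remote_Working_Ratio, toy$Salary_In_Rupees)
print(ct$estimate)
print(ct$p.value)
```

A positive slope with a large p-value would be consistent with both observations: the line rises, yet the scatter plot shows no apparent correlation.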

T Test

I would like to see if Japan pays better than other countries, so I will compare the salary entries for Japan against the entries for all other countries. There are far fewer entries for Japan than I expected, and based on the p-value (0.959), we fail to reject the null hypothesis. We cannot support the hypothesis that Japan pays better than other countries.

japan = subset(myData, Company_Location=="JP")
notJapan = subset(myData, Company_Location!="JP")
test_result = t.test(japan$Salary_In_Rupees, notJapan$Salary_In_Rupees)
print(test_result)

## 
##  Welch Two Sample t-test
## 
## data:  japan$Salary_In_Rupees and notJapan$Salary_In_Rupees
## t = 0.053955, df = 5.0721, p-value = 0.959
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -6827737  7121782
## sample estimates:
## mean of x mean of y 
##   9081055   8934032
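Because the question is directional (does Japan pay better?), a one-sided test is an alternative worth considering; the test above is two-sided. The sketch below shows the call on made-up vectors, with `group_a` and `group_b` as hypothetical stand-ins for the Japan and non-Japan salaries, and also prints the group sizes.

```r
# Hypothetical salaries for two groups (not the real Japan / non-Japan data)
group_a <- c(9000000, 9500000, 8700000, 9100000, 9300000, 8900000)
group_b <- c(8800000, 9200000, 9000000, 8600000, 9400000, 9100000, 8700000, 9300000)

# Group sizes: very small groups make the test less sensitive
print(c(n_a = length(group_a), n_b = length(group_b)))

# One-sided Welch test: is the mean of group_a greater than the mean of group_b?
one_sided <- t.test(group_a, group_b, alternative = "greater")
print(one_sided$p.value)
```

With the real data, the equivalent call would pass japan$Salary_In_Rupees and notJapan$Salary_In_Rupees as the two arguments.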

DataSalary

Kazuki

2024-09-03
