2024-03-20

INTRODUCTION

For this project, I am exploring the statistics topic of simple linear regression using R. The goal is to interpret the Salary data set retrieved from https://www.kaggle.com/, which has a sample size of 30 and relates an individual's years of experience in a certain field to their corresponding annual salary. I will plot the data and then add a trend line to see whether the relationship has a positive or negative slope.

IMPORTING THE NECESSARY LIBRARIES

Let’s load the libraries that are used throughout the project.

library(plotly)
library(ggplot2)
library(knitr)
library(dplyr)

IMPORTING THE DATASET

Now, let’s import the data set with the read.csv function.

data <- read.csv("Salary_dataset.csv")
data
##    X. Years_of_Experience Salary
## 1   0                 1.2  39344
## 2   1                 1.4  46206
## 3   2                 1.6  37732
## 4   3                 2.1  43526
## 5   4                 2.3  39892
## 6   5                 3.0  56643
## 7   6                 3.1  60151
## 8   7                 3.3  54446
## 9   8                 3.3  64446
## 10  9                 3.8  57190
## 11 10                 4.0  63219
## 12 11                 4.1  55795
## 13 12                 4.1  56958
## 14 13                 4.2  57082
## 15 14                 4.6  61112
## 16 15                 5.0  67939
## 17 16                 5.2  66030
## 18 17                 5.4  83089
## 19 18                 6.0  81364
## 20 19                 6.1  93941
## 21 20                 6.9  91739
## 22 21                 7.2  98274
## 23 22                 8.0 101303
## 24 23                 8.3 113813
## 25 24                 8.8 109432
## 26 25                 9.1 105583
## 27 26                 9.6 116970
## 28 27                 9.7 112636
## 29 28                10.4 122392
## 30 29                10.6 121873
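
Before plotting, it can help to confirm that the data match the description in the introduction. The block below is a minimal sketch: it rebuilds the 30 rows printed above as an inline data frame so that it runs without the CSV file. The name `salary_df` is a hypothetical one chosen for this example, and the index column `X.` is left out because it only repeats the row number.

```r
# Values copied from the printed data set above.
# `salary_df` is a hypothetical name; the index column X. is omitted.
salary_df <- data.frame(
  Years_of_Experience = c(1.2, 1.4, 1.6, 2.1, 2.3, 3.0, 3.1, 3.3, 3.3, 3.8,
                          4.0, 4.1, 4.1, 4.2, 4.6, 5.0, 5.2, 5.4, 6.0, 6.1,
                          6.9, 7.2, 8.0, 8.3, 8.8, 9.1, 9.6, 9.7, 10.4, 10.6),
  Salary = c(39344, 46206, 37732, 43526, 39892, 56643, 60151, 54446, 64446,
             57190, 63219, 55795, 56958, 57082, 61112, 67939, 66030, 83089,
             81364, 93941, 91739, 98274, 101303, 113813, 109432, 105583,
             116970, 112636, 122392, 121873)
)

n_rows <- nrow(salary_df)            # number of observations
n_missing <- sum(is.na(salary_df))   # number of missing cells

str(salary_df)                       # column names and types
print(summary(salary_df))            # range and quartiles of each column
```

With the real file, the same checks could be run on `data` directly after `read.csv`.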

SALARY VS YEARS OF EXPERIENCE PLOTLY PLOT CODE

This slide shows the code used to generate the 3D scatter plot on the next slide.

# z is fixed at 1, so every point lies on a single plane of the 3D scene
plot <- plot_ly(data, x = ~Years_of_Experience, y = ~Salary, z = ~1,
        type = "scatter3d", mode = "markers",
        marker = list(size = 5, color = "blue"))

SALARY VS YEARS OF EXPERIENCE 3D SCATTER PLOT

This graph shows a Salary vs. Years of Experience 3D scatter plot. The x-axis represents the years of experience an individual has, and the y-axis shows the corresponding annual salary. The z-axis is held constant at 1, so all points lie in a single plane.

plot

SALARY VS YEARS OF EXPERIENCE GGPLOT PLOT CODE

# Scatter plot of the raw data with a title and axis labels
plot2 <- ggplot(data, aes(x = Years_of_Experience, y = Salary)) +
  geom_point(color = 'blue') + theme_minimal() +
  labs(title = "Scatter plot of Salary vs. Years of Experience", x = "Years of Experience", y = "Salary")

SALARY VS YEARS OF EXPERIENCE SCATTER PLOT

plot2

SALARY VS YEARS OF EXPERIENCE TRENDLINE

plot3 <- ggplot(data, aes(x = Years_of_Experience, y = Salary)) +
  geom_point(color = 'red') +            
  geom_smooth(method = "lm", se = FALSE) + 
  theme_minimal() +                        
  labs(title = "Scatter plot of Salary vs. Years of Experience", 
       x = "Years of Experience", 
       y = "Salary") 

SALARY VS YEARS OF EXPERIENCE SCATTER PLOT WITH TRENDLINE

plot3
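
The trend line drawn by `geom_smooth(method = "lm")` is an ordinary least-squares fit of Salary on Years_of_Experience, so the sign of its slope can also be read from a model fitted with `lm()`. The following is a minimal sketch under the assumption that the 30 printed rows are the full data set. `salary_df`, `fit`, `slope`, and the other variable names are hypothetical names introduced for this example.

```r
# Values copied from the printed data set; `salary_df` is a hypothetical name.
salary_df <- data.frame(
  Years_of_Experience = c(1.2, 1.4, 1.6, 2.1, 2.3, 3.0, 3.1, 3.3, 3.3, 3.8,
                          4.0, 4.1, 4.1, 4.2, 4.6, 5.0, 5.2, 5.4, 6.0, 6.1,
                          6.9, 7.2, 8.0, 8.3, 8.8, 9.1, 9.6, 9.7, 10.4, 10.6),
  Salary = c(39344, 46206, 37732, 43526, 39892, 56643, 60151, 54446, 64446,
             57190, 63219, 55795, 56958, 57082, 61112, 67939, 66030, 83089,
             81364, 93941, 91739, 98274, 101303, 113813, 109432, 105583,
             116970, 112636, 122392, 121873)
)

# Fit the straight line Salary = b0 + b1 * Years_of_Experience
fit <- lm(Salary ~ Years_of_Experience, data = salary_df)

intercept <- unname(coef(fit)["(Intercept)"])
slope <- unname(coef(fit)["Years_of_Experience"])
r_squared <- summary(fit)$r.squared   # share of salary variance explained

slope_is_positive <- slope > 0

cat(sprintf("intercept: %.2f\n", intercept))
cat(sprintf("slope:     %.2f per additional year\n", slope))
cat(sprintf("R-squared: %.3f\n", r_squared))
cat("positive slope:", slope_is_positive, "\n")
```

The slope is the estimated change in annual salary for one additional year of experience, which gives a number to go with the direction seen in the plot.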

LINEAR REGRESSION FORMULA

\[ y = \beta_0 + \beta_1 x + \epsilon, \qquad \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]

EXPLAINING WHAT THE PARAMETERS MEAN

  • \(y\) : Observed value of the dependent variable (response variable).
  • \(\hat{y}\) : Predicted (estimated) value of the dependent variable, computed from the estimated coefficients \(\hat{\beta}_0\) and \(\hat{\beta}_1\).
  • \(\beta_0\) : Intercept, the expected value of the dependent variable when the independent variable is zero.
  • \(\beta_1\) : Slope coefficient, the expected change in the dependent variable for a one-unit increase in the independent variable.
  • \(x\) : Independent variable.
  • \(\epsilon\) : Error term, the difference between the observed value and the value given by the regression line \(\beta_0 + \beta_1 x\).
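
To make the parameters concrete, the intercept and slope can be estimated by hand with the standard least-squares formulas, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The sketch below applies them to the 30 printed observations and compares the result with `lm()`. All variable names (`x`, `y`, `b0`, `b1`, `y_hat`, `resid_hand`) are hypothetical names chosen for this illustration.

```r
# x = years of experience, y = annual salary (values from the printed data set)
x <- c(1.2, 1.4, 1.6, 2.1, 2.3, 3.0, 3.1, 3.3, 3.3, 3.8,
       4.0, 4.1, 4.1, 4.2, 4.6, 5.0, 5.2, 5.4, 6.0, 6.1,
       6.9, 7.2, 8.0, 8.3, 8.8, 9.1, 9.6, 9.7, 10.4, 10.6)
y <- c(39344, 46206, 37732, 43526, 39892, 56643, 60151, 54446, 64446,
       57190, 63219, 55795, 56958, 57082, 61112, 67939, 66030, 83089,
       81364, 93941, 91739, 98274, 101303, 113813, 109432, 105583,
       116970, 112636, 122392, 121873)

x_bar <- mean(x)
y_bar <- mean(y)

# Least-squares estimates of the slope and intercept
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
b0 <- y_bar - b1 * x_bar

y_hat <- b0 + b1 * x        # predicted salary for each observation
resid_hand <- y - y_hat     # observed minus predicted

# The hand-computed estimates should match the ones from lm()
fit <- lm(y ~ x)
agrees <- isTRUE(all.equal(unname(coef(fit)), c(b0, b1)))
cat("matches lm():", agrees, "\n")
```

Because the fitted line includes an intercept, the residuals sum to zero up to rounding error, which is a quick sanity check on the hand calculation.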

CONCLUSION

From the graphs plotted above, we can see that the fitted trend line has a positive slope. Hence, in this data set, individuals with more years of experience tend to earn a higher salary.