P2-Coefficient of Determination and Correlation

Data

Download data here.

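# read.csv() is part of base R, so no extra package is needed here
salary <- read.csv("D:/Salary_dataset.csv")
# drop the first column, keeping YearsExperience and Salary
salary <- salary[, -1]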
head(salary)
##   YearsExperience Salary
## 1             1.2  39344
## 2             1.4  46206
## 3             1.6  37732
## 4             2.1  43526
## 5             2.3  39892
## 6             3.0  56643

Data Exploration

plot(salary$YearsExperience, salary$Salary,xlab="Years Experience",ylab="Salary",pch=16)

Regression Equation

Regression Equation by Manual Least Squares (OLS)

# deviations of x and y from their means
salary$xdif <- salary$YearsExperience - mean(salary$YearsExperience)
salary$ydif <- salary$Salary - mean(salary$Salary)
# cross-products and squared x deviations
salary$crp <- salary$xdif * salary$ydif
salary$xsq <- salary$xdif^2
# slope estimate: b1 = Sxy / Sxx
b1 <- sum(salary$crp) / sum(salary$xsq)
b1
## [1] 9449.962
# intercept estimate: b0 = mean(y) - b1 * mean(x)
b0 <- mean(salary$Salary) - b1 * mean(salary$YearsExperience)
b0
## [1] 24848.2
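
The manual steps above amount to b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x). As a rough cross-check, the sketch below recomputes both and compares them with `cov()`/`var()` and with `lm()`. To stay self-contained it uses only the six rows printed by `head(salary)`, which is an assumption made for illustration, so its numbers differ from the full-data estimates.

```r
# Illustration only: the six rows shown by head(salary), not the full data set
x <- c(1.2, 1.4, 1.6, 2.1, 2.3, 3.0)              # YearsExperience
y <- c(39344, 46206, 37732, 43526, 39892, 56643)  # Salary

# Manual least squares: slope = Sxy / Sxx, intercept = ybar - b1 * xbar
b1_manual <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0_manual <- mean(y) - b1_manual * mean(x)

# Same slope from the sample covariance and variance (the n - 1 factors cancel)
b1_covvar <- cov(x, y) / var(x)

# Same estimates from lm()
fit <- lm(y ~ x)

print(all.equal(b1_manual, b1_covvar))
print(all.equal(unname(coef(fit)), c(b0_manual, b1_manual)))
```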

Regression Equation with an R Function

# or use the lm() function
model1<-lm(Salary~YearsExperience,data=salary)
summary(model1)
## 
## Call:
## lm(formula = Salary ~ YearsExperience, data = salary)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7958.0 -4088.5  -459.9  3372.6 11448.0 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      24848.2     2306.7   10.77 1.82e-11 ***
## YearsExperience   9450.0      378.8   24.95  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5788 on 28 degrees of freedom
## Multiple R-squared:  0.957,  Adjusted R-squared:  0.9554 
## F-statistic: 622.5 on 1 and 28 DF,  p-value: < 2.2e-16
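
In the output above the fitted line is Salary ≈ 24848.2 + 9450.0 × YearsExperience, so each additional year of experience is associated with roughly 9450 more in salary. The sketch below shows one way to pull the estimates out of a fitted model and use them for prediction. It is fitted on the six `head(salary)` rows only, and the 2.5-year input is a hypothetical value chosen for illustration.

```r
# Illustration only: the six rows shown by head(salary)
sample6 <- data.frame(
  YearsExperience = c(1.2, 1.4, 1.6, 2.1, 2.3, 3.0),
  Salary          = c(39344, 46206, 37732, 43526, 39892, 56643)
)
fit <- lm(Salary ~ YearsExperience, data = sample6)

# Named vector of estimates: "(Intercept)" and "YearsExperience"
b <- coef(fit)

# Prediction for a hypothetical 2.5 years of experience, two equivalent ways
new_x        <- data.frame(YearsExperience = 2.5)
pred_predict <- predict(fit, newdata = new_x)
pred_manual  <- b[["(Intercept)"]] + b[["YearsExperience"]] * 2.5

# R-squared can be read from the summary object instead of the printed text
r2 <- summary(fit)$r.squared

print(c(predict = unname(pred_predict), manual = pred_manual, r2 = r2))
```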

Regression Plot

plot(salary$YearsExperience, salary$Salary,xlab="YearsExperience",ylab="Salary",pch=16)
abline(model1)

Coefficient of Determination

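# JKG: error (residual) sum of squares, SSE
JKG <- sum((salary$Salary - model1$fitted.values)^2)
JKG
## [1] 938128552
# JKR: regression sum of squares, SSR
JKR <- sum((model1$fitted.values - mean(salary$Salary))^2)
JKR
## [1] 20856849300
# JKT: total sum of squares, SST
JKT <- sum((salary$Salary - mean(salary$Salary))^2)
JKT
## [1] 21794977852
# coefficient of determination
r_sqr <- (JKT - JKG) / JKT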
r_sqr
## [1] 0.9569567
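
The three sums of squares are tied together by JKT = JKR + JKG (total = regression + error), so R-squared can be written either as JKR / JKT or as 1 - JKG / JKT. A small check using only the values printed above:

```r
# Values copied from the output above
JKG <- 938128552     # error (residual) sum of squares
JKR <- 20856849300   # regression sum of squares
JKT <- 21794977852   # total sum of squares

# Decomposition: total = regression + error
print(JKR + JKG == JKT)

# Two equivalent forms of the coefficient of determination
r_sqr_a <- JKR / JKT
r_sqr_b <- 1 - JKG / JKT
print(c(r_sqr_a, r_sqr_b))
```

Both forms agree with the `Multiple R-squared` value reported by `summary(model1)`.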

Correlation

# correlation coefficient: sqrt(R^2) gives |r|; the sign of r is the sign of b1 (positive here)
sqrt(r_sqr)
## [1] 0.9782416
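
Taking the square root of R-squared only recovers the magnitude of the correlation. With a decreasing trend the sign has to come from the slope. The sketch below illustrates this on hypothetical toy data (not taken from the salary data set).

```r
# Hypothetical toy data with a decreasing trend (not from the salary data)
x <- c(1, 2, 3, 4, 5)
y <- c(10, 8, 7, 4, 1)

fit   <- lm(y ~ x)
r_sqr <- summary(fit)$r.squared

# sqrt() alone loses the sign; multiplying by the sign of the slope restores it
r_unsigned <- sqrt(r_sqr)
r_signed   <- sign(coef(fit)[["x"]]) * sqrt(r_sqr)

print(c(unsigned = r_unsigned, signed = r_signed, cor = cor(x, y)))
```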

ANOVA

anova(model1)
## Analysis of Variance Table
## 
## Response: Salary
##                 Df     Sum Sq    Mean Sq F value    Pr(>F)    
## YearsExperience  1 2.0857e+10 2.0857e+10  622.51 < 2.2e-16 ***
## Residuals       28 9.3813e+08 3.3505e+07                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
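
The ANOVA table can be rebuilt from the sums of squares computed earlier: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. A minimal sketch using the printed values of JKR and JKG:

```r
# Values copied from the earlier output
JKR <- 20856849300   # regression sum of squares
JKG <- 938128552     # error (residual) sum of squares
df_reg <- 1          # one predictor
df_err <- 28         # n - 2 with n = 30 observations

# Mean squares and the F statistic
MSR    <- JKR / df_reg
MSE    <- JKG / df_err
F_stat <- MSR / MSE

# Residual standard error reported by summary(model1) is sqrt(MSE)
rse <- sqrt(MSE)

# Upper-tail p-value of the F test
p_value <- pf(F_stat, df_reg, df_err, lower.tail = FALSE)

print(c(MSR = MSR, MSE = MSE, F = F_stat, rse = rse))
```

In simple linear regression the F statistic is also the square of the slope's t statistic from `summary(model1)`.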
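# correlation coefficient, computed manually
x1_x1bar <- salary$YearsExperience - mean(salary$YearsExperience)
x2_x2bar <- salary$Salary - mean(salary$Salary)
# A: sum of cross-products of the deviations
A <- sum(x1_x1bar * x2_x2bar)
# varx1, varx2: sums of squared deviations (not divided by n - 1)
varx1 <- sum((x1_x1bar)^2)
varx2 <- sum((x2_x2bar)^2)
B <- sqrt(varx1 * varx2)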
corr<-A/B
corr
## [1] 0.9782416
# or use the cor() function
cor(salary$YearsExperience,salary$Salary)
## [1] 0.9782416
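
The correlation and the regression slope carry the same information up to scaling: b1 = r × sd(y) / sd(x). The sketch below checks this identity, again on the six `head(salary)` rows only (an assumption made to keep the example self-contained).

```r
# Illustration only: the six rows shown by head(salary)
x <- c(1.2, 1.4, 1.6, 2.1, 2.3, 3.0)              # YearsExperience
y <- c(39344, 46206, 37732, 43526, 39892, 56643)  # Salary

# Manual correlation, following the same steps as above
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Slope recovered from the correlation and the two standard deviations
b1_from_r <- r_manual * sd(y) / sd(x)
b1_lm     <- coef(lm(y ~ x))[["x"]]

print(c(r_manual = r_manual, cor = cor(x, y)))
print(c(b1_from_r = b1_from_r, b1_lm = b1_lm))
```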

Significance Test for Correlation

# Pearson correlation test at alpha = 1% (99% confidence interval)
cor.test( ~ YearsExperience + Salary, data=salary, method = "pearson", continuity = FALSE, conf.level = 0.99)
## 
##  Pearson's product-moment correlation
## 
## data:  YearsExperience and Salary
## t = 24.95, df = 28, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 99 percent confidence interval:
##  0.9424207 0.9918711
## sample estimates:
##       cor 
## 0.9782416
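
The numbers reported by `cor.test()` can be reproduced by hand: the test statistic is t = r × sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom, and the confidence interval comes from Fisher's z transformation. A sketch using the printed r and n = 30 (28 degrees of freedom plus 2):

```r
# Values taken from the output above
r <- 0.9782416
n <- 30            # df = 28 implies n = 30

# t statistic for H0: true correlation = 0
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# 99% confidence interval via Fisher's z transformation
z         <- atanh(r)
half_width <- qnorm(0.995) / sqrt(n - 3)
ci        <- tanh(c(z - half_width, z + half_width))

print(t_stat)   # close to the t = 24.95 reported above
print(ci)       # close to the reported 99 percent interval
```

Since the p-value is far below 0.01, the null hypothesis of zero correlation is rejected at the 1% level.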

Exercises

  1. For a random sample of Indian states, the ANOVA table shown refers to hypothetical data on x = tax revenue in Indian rupees and y = agricultural subsidies in Indian rupees. Fill in the blanks in the table.

  2. The table here shows the ANOVA table for a regression analysis of y = the selling price (in thousands of dollars) and x = the size of house (in thousands of square feet). The prediction equation is \(\hat{y} = 9.2 + 77x\).

  • What was the sample size?

  • The sample mean house size was 1.53 thousand square feet. What was the sample mean selling price? (Hint: What does \(\hat{y}\) equal when \(x = \bar{x}\) ?)

  3. Fit a regression equation to the data in this link, interpret the equation, compute the coefficient of determination and the correlation coefficient, and test the correlation for significance.