mpg1 = read.csv("mpg1.csv", stringsAsFactors = FALSE)
t.test(data = mpg1, cty ~ trans) # t.test(mpg1$cty ~ mpg1$trans) gives the same result
##
## Welch Two Sample t-test
##
## data: cty by trans
## t = -4.5375, df = 132.32, p-value = 1.263e-05
## alternative hypothesis: true difference in means between group auto and group manual is not equal to 0
## 95 percent confidence interval:
## -3.887311 -1.527033
## sample estimates:
## mean in group auto mean in group manual
##           15.96815             18.67532
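The object returned by t.test() can also be read programmatically instead of from the printed report. A minimal sketch, assuming the built-in mtcars data as a stand-in for mpg1.csv (which is not available here), with am (0 = automatic, 1 = manual) in the role of trans and mpg in the role of cty:

```r
# Stand-in data: mtcars ships with R; mpg1.csv is not assumed to exist here.
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("auto", "manual"))

# Welch two-sample t-test (var.equal = FALSE is the default)
res <- t.test(mpg ~ am, data = cars)

res$statistic   # t value
res$parameter   # Welch-adjusted degrees of freedom
res$p.value     # two-sided p-value
res$estimate    # the two group means
res$conf.int    # 95% confidence interval for the difference in means

# Reject the null hypothesis of equal means at the 5% level?
reject_null <- res$p.value < 0.05
reject_null
```

Applied to the output above, the p-value of 1.263e-05 is far below 0.05, so the mean cty of auto and manual cars differs significantly.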
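Load mpg1.csv into mpg1
Test whether the proportions of the drv categories differ across the categories of the trans variable in mpg1
trans : transmission type
drv : drive type
Null hypothesis : the distribution of drv does not differ by trans
Alternative hypothesis : the distribution of drv differs by trans
Cross-tabulate with table() and prop.table() to get the frequencies and proportions of drv by trans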
mpg1 = read.csv("mpg1.csv", stringsAsFactors = FALSE)
table(mpg1$trans, mpg1$drv) # cross-tabulation of trans and drv
##
##           4  f  r
##   auto   75 65 17
##   manual 28 41  8
prop.table(table(mpg1$trans, mpg1$drv), 1) # drv proportions within auto and within manual (row proportions)
##
##                  4         f         r
##   auto   0.4777070 0.4140127 0.1082803
##   manual 0.3636364 0.5324675 0.1038961
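Among auto cars, four-wheel drive (4) is the most common at 47.8%
Among manual cars, front-wheel drive (f) is the most common at 53.2%
drv appears to differ by trans, but a test is needed to confirm it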
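The second argument of prop.table() decides which direction the proportions run in. A small sketch that rebuilds the cross-table from the counts printed above (so it runs without mpg1.csv) and compares row and column proportions; the object names are only illustrative:

```r
# Counts copied from the table(mpg1$trans, mpg1$drv) output above
tab <- as.table(matrix(c(75, 65, 17,
                         28, 41,  8),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(trans = c("auto", "manual"),
                                       drv   = c("4", "f", "r"))))

row_prop <- prop.table(tab, 1)  # each row sums to 1: drv shares within each trans
col_prop <- prop.table(tab, 2)  # each column sums to 1: trans shares within each drv

round(row_prop * 100, 1)        # row percentages
rowSums(row_prop)               # check: every row adds up to 1
colSums(col_prop)               # check: every column adds up to 1
addmargins(tab)                 # counts with row and column totals
```

Row proportions (margin 1) are the ones that answer "how is drv distributed within each trans group", which is why the analysis above uses 1.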
Besides chisq.test(), the same test can be obtained by combining summary() and table()
chisq.test() function
chisq.test(mpg1$trans, mpg1$drv)
##
## Pearson's Chi-squared test
##
## data: mpg1$trans and mpg1$drv
## X-squared = 3.1368, df = 2, p-value = 0.2084
chisq.test(table(mpg1$trans, mpg1$drv))
##
## Pearson's Chi-squared test
##
## data: table(mpg1$trans, mpg1$drv)
## X-squared = 3.1368, df = 2, p-value = 0.2084
summary(table(mpg1$trans, mpg1$drv))
## Number of cases in table: 234
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 3.1368, df = 2, p-value = 0.2084
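All three calls report the same Pearson chi-squared statistic. As a sketch of what they compute, the statistic can be rebuilt by hand from the observed counts printed earlier; the variable names below are only illustrative:

```r
# Observed counts copied from the table(mpg1$trans, mpg1$drv) output
tab <- as.table(matrix(c(75, 65, 17,
                         28, 41,  8),
                       nrow = 2, byrow = TRUE,
                       dimnames = list(trans = c("auto", "manual"),
                                       drv   = c("4", "f", "r"))))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic, degrees of freedom and upper-tail p-value
x2 <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p  <- pchisq(x2, df, lower.tail = FALSE)

# Compare with chisq.test() on the same table
res <- chisq.test(tab)
c(by_hand = x2, chisq_test = unname(res$statistic))
c(by_hand = p,  chisq_test = res$p.value)
```

With p-value = 0.2084, above the usual 0.05 threshold, the null hypothesis is not rejected: the data give no evidence that drv differs by trans, despite the apparent difference in the proportions.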
mpg1 = read.csv("mpg1.csv", stringsAsFactors = FALSE)
cor.test(mpg1$cty, mpg1$hwy) # correlation analysis
##
## Pearson's product-moment correlation
##
## data: mpg1$cty and mpg1$hwy
## t = 49.585, df = 232, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9433129 0.9657663
## sample estimates:
## cor
## 0.9559159
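The t value and degrees of freedom reported by cor.test() follow directly from the correlation coefficient: df = n - 2 and t = r * sqrt(df / (1 - r^2)). A minimal sketch that checks this relation, assuming mtcars (wt and mpg) as stand-in data because mpg1.csv is not available here:

```r
# Stand-in variables from the built-in mtcars data
x <- mtcars$wt
y <- mtcars$mpg

ct <- cor.test(x, y)          # Pearson correlation test (the default method)

r  <- cor(x, y)               # sample correlation coefficient
df <- length(x) - 2           # degrees of freedom used by the test
t_manual <- r * sqrt(df / (1 - r^2))

c(by_hand = t_manual, cor_test = unname(ct$statistic))
ct$p.value                    # two-sided p-value
ct$conf.int                   # 95% confidence interval for the correlation
```

Plugging the values from the output above (r = 0.9559159, df = 232) into the same formula reproduces the reported t = 49.585 up to rounding.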
lm(data = mtcars, mpg ~ disp)
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Coefficients:
## (Intercept)         disp
##    29.59985     -0.04122
# lm(mpg ~ disp, data = mtcars) and lm(mtcars$mpg ~ mtcars$disp) give the same coefficients
RA = lm(data = mtcars, mpg ~ disp) # store the regression result in RA
summary(RA) # print the detailed results
##
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.8922 -2.2022 -0.9631  1.6272  7.2305
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
## disp        -0.041215   0.004712  -8.747 9.38e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared: 0.7183, Adjusted R-squared: 0.709
## F-statistic: 76.51 on 1 and 30 DF, p-value: 9.38e-10
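The fitted line is mpg = 29.599855 - 0.041215 * disp, so the model can be used for prediction. A short sketch using the same mtcars fit; the disp value of 200 is a hypothetical input chosen only for illustration:

```r
RA <- lm(mpg ~ disp, data = mtcars)

coef(RA)                                   # intercept and slope

# Predict mpg for a hypothetical car with disp = 200
new_car <- data.frame(disp = 200)
pred    <- predict(RA, newdata = new_car)

# The same prediction from the coefficients by hand
by_hand <- coef(RA)[["(Intercept)"]] + coef(RA)[["disp"]] * 200

# In simple regression, R-squared equals the squared correlation
r2          <- summary(RA)$r.squared
r2_from_cor <- cor(mtcars$mpg, mtcars$disp)^2

c(predict = unname(pred), by_hand = by_hand)
c(r.squared = r2, cor_squared = r2_from_cor)
```

The Multiple R-squared of 0.7183 in the summary means that disp alone explains about 72% of the variance in mpg.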
lm(data = mtcars, mpg ~ disp + hp + wt)
##
## Call:
## lm(formula = mpg ~ disp + hp + wt, data = mtcars)
##
## Coefficients:
## (Intercept)         disp           hp           wt
##   37.105505    -0.000937    -0.031157    -3.800891
# lm(mtcars$mpg ~ mtcars$disp + mtcars$hp + mtcars$wt) gives the same coefficients
RA = lm(data = mtcars, mpg ~ disp + hp + wt) # store the regression result in RA
summary(RA)
##
## Call:
## lm(formula = mpg ~ disp + hp + wt, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.891 -1.640 -0.172  1.061  5.861
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.105505   2.110815  17.579  < 2e-16 ***
## disp        -0.000937   0.010350  -0.091  0.92851
## hp          -0.031157   0.011436  -2.724  0.01097 *
## wt          -3.800891   1.066191  -3.565  0.00133 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8083
## F-statistic: 44.57 on 3 and 28 DF, p-value: 8.65e-11
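The coefficient table printed by summary() is stored as a matrix, so the significant predictors can be picked out in code rather than by eye. A minimal sketch on the same mtcars model; the 0.05 cutoff is the conventional choice and is an assumption here:

```r
RA <- lm(mpg ~ disp + hp + wt, data = mtcars)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(RA)$coefficients
pvals <- coefs[, "Pr(>|t|)"]

# Predictors (excluding the intercept) with p-value below 0.05
predictors  <- setdiff(rownames(coefs), "(Intercept)")
significant <- predictors[pvals[predictors] < 0.05]
significant

# Adjusted R-squared, the fairer measure when comparing models
# with different numbers of predictors
adj_r2 <- summary(RA)$adj.r.squared
adj_r2
```

In this model hp and wt are significant while disp is not (p = 0.92851), even though disp was highly significant on its own in the simple regression. The adjusted R-squared rises from 0.709 to 0.8083 with the two extra predictors.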