Chapter 8 Statistical Analysis

3. Statistical Analysis Examples

1) Testing the difference in means between two groups - independent-samples t-test (t.test())

The independent variable must be nominal (defining the two groups), and the dependent variable must be on an interval or ratio scale.

Null hypothesis: there is no difference in mean cty between auto and manual cars.

mpg1 <- read.csv("mpg1.csv")
str(mpg1)

## 'data.frame':    234 obs. of  5 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ trans       : chr  "auto" "manual" "manual" "auto" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...

t.test(data=mpg1, cty~trans)

## 
##  Welch Two Sample t-test
## 
## data:  cty by trans
## t = -4.5375, df = 132.32, p-value = 1.263e-05
## alternative hypothesis: true difference in means between group auto and group manual is not equal to 0
## 95 percent confidence interval:
##  -3.887311 -1.527033
## sample estimates:
##   mean in group auto mean in group manual 
##             15.96815             18.67532

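p-value = 1.263e-05 < .05, so the null hypothesis is rejected: at the .05 significance level, mean cty differs significantly between auto (15.97) and manual (18.68) cars. Note that t.test() runs Welch's t-test by default (var.equal = FALSE), which does not assume equal variances in the two groups.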
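Welch's statistic divides the difference in means by the combined standard error and uses the Welch-Satterthwaite approximation for the degrees of freedom, which is why the output above reports a fractional df. Because mpg1.csv is not bundled with R, the minimal sketch below assumes the built-in mtcars data as a stand-in (mpg split by am, where 0 = automatic and 1 = manual); the helper welch_t() is hypothetical, written only to show the formula, and its result is checked against t.test().

```r
# Hypothetical helper: Welch's two-sample t statistic, df, and two-sided p-value
welch_t <- function(x, y) {
  se2_x <- var(x) / length(x)  # squared standard error of mean(x)
  se2_y <- var(y) / length(y)  # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)
  # Welch-Satterthwaite approximation for the degrees of freedom
  df <- (se2_x + se2_y)^2 /
    (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))
  p_value <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p = p_value)
}

# Stand-in data: mtcars mpg for automatic (am == 0) vs manual (am == 1)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

by_hand <- welch_t(auto, manual)
builtin <- t.test(auto, manual)  # Welch test, the same default used above

print(round(by_hand, 4))
print(c(t = unname(builtin$statistic), df = unname(builtin$parameter), p = builtin$p.value))
```

A negative t means the first group (automatic) has the lower mean, the same sign convention as in the mpg1 output.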

2) Cross-tabulation analysis - chi-square test (chisq.test())

Null hypothesis: drv does not differ by trans (the two variables are independent).

mpg1 <- read.csv("mpg1.csv")
str(mpg1)

## 'data.frame':    234 obs. of  5 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ trans       : chr  "auto" "manual" "manual" "auto" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...

table(mpg1$trans, mpg1$drv)

##         
##           4  f  r
##   auto   75 65 17
##   manual 28 41  8

prop.table(table(mpg1$trans, mpg1$drv), margin = 1)  # row proportions: share of each drv within auto and within manual

##         
##                  4         f         r
##   auto   0.4777070 0.4140127 0.1082803
##   manual 0.3636364 0.5324675 0.1038961

# Method 1: pass the two categorical vectors directly
chisq.test(mpg1$trans, mpg1$drv)

## 
##  Pearson's Chi-squared test
## 
## data:  mpg1$trans and mpg1$drv
## X-squared = 3.1368, df = 2, p-value = 0.2084

# Method 2: pass a contingency table
chisq.test(table(mpg1$trans, mpg1$drv))

## 
##  Pearson's Chi-squared test
## 
## data:  table(mpg1$trans, mpg1$drv)
## X-squared = 3.1368, df = 2, p-value = 0.2084

# Method 3: summary() of a table also reports the chi-square test of independence
summary(table(mpg1$trans, mpg1$drv))

## Number of cases in table: 234 
## Number of factors: 2 
## Test for independence of all factors:
##  Chisq = 3.1368, df = 2, p-value = 0.2084

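p-value = 0.2084 > .05, so we fail to reject the null hypothesis: at the .05 significance level, there is no significant association between trans and drv. (Failing to reject the null hypothesis does not prove that it is true.)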
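The statistic can be reproduced from the contingency table printed above. The sketch below re-enters those observed counts (copied from the table() output), computes the expected counts under independence as row total * column total / grand total, and then X-squared = sum((O - E)^2 / E) with (rows - 1) * (columns - 1) degrees of freedom.

```r
# Observed counts copied from table(mpg1$trans, mpg1$drv) above
obs <- matrix(c(75, 65, 17,
                28, 41,  8),
              nrow = 2, byrow = TRUE,
              dimnames = list(trans = c("auto", "manual"),
                              drv   = c("4", "f", "r")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

chi_sq  <- sum((obs - expected)^2 / expected)
df      <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Common rule of thumb: every expected count should be at least 5
print(all(expected >= 5))
print(round(c(X_squared = chi_sq, df = df, p = p_value), 4))

# chisq.test() applies no continuity correction to a 2 x 3 table
builtin <- chisq.test(obs)
```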

3) Correlation analysis - cor.test()

Null hypothesis: cty and hwy are not correlated.

mpg1 <- read.csv("mpg1.csv")
str(mpg1)

## 'data.frame':    234 obs. of  5 variables:
##  $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
##  $ trans       : chr  "auto" "manual" "manual" "auto" ...
##  $ drv         : chr  "f" "f" "f" "f" ...
##  $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...

cor.test(mpg1$cty, mpg1$hwy)

## 
##  Pearson's product-moment correlation
## 
## data:  mpg1$cty and mpg1$hwy
## t = 49.585, df = 232, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9433129 0.9657663
## sample estimates:
##       cor 
## 0.9559159

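p-value < 2.2e-16, so the null hypothesis is rejected (at the .05 significance level, cty and hwy are correlated).

Correlation coefficient r = 0.9559159 (a very strong positive correlation)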
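For Pearson's r, cor.test() tests H0: rho = 0 with t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom, and builds the confidence interval through Fisher's z-transformation. The minimal sketch below plugs in the values reported above (r = 0.9559159, n = 234 rows of mpg1) to reproduce t and the interval; because r is rounded to seven digits, small differences in the last decimals are expected.

```r
r <- 0.9559159  # sample correlation reported by cor.test() above
n <- 234        # number of rows in mpg1

# t statistic for H0: rho = 0, with n - 2 degrees of freedom
t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z-transformation
z          <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci         <- tanh(c(lower = z - half_width, upper = z + half_width))

print(round(t_stat, 3))
print(p_value < 2.2e-16)
print(round(ci, 4))
```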

4) Simple regression analysis - lm()

Both the independent and the dependent variable must be on an interval or ratio scale.

Null hypothesis: disp has no effect on mpg.

RA <- lm(data=mtcars, mpg~disp)
summary(RA)

## 
## Call:
## lm(formula = mpg ~ disp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8922 -2.2022 -0.9631  1.6272  7.2305 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.599855   1.229720  24.070  < 2e-16 ***
## disp        -0.041215   0.004712  -8.747 9.38e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.251 on 30 degrees of freedom
## Multiple R-squared:  0.7183, Adjusted R-squared:  0.709 
## F-statistic: 76.51 on 1 and 30 DF,  p-value: 9.38e-10

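p-value = 9.38e-10 < .05, so the null hypothesis is rejected (at the .05 significance level, the regression model is statistically significant).

Intercept = 29.599855 (significant at the .05 level)

Regression coefficient (Estimate) of disp = -0.041215 (significant at the .05 level)

Regression equation: mpg = -0.041215 * disp + 29.599855

Adjusted R-squared = .709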
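With a single predictor, the least-squares estimates have a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). Adjusted R-squared is 1 - (1 - R^2) * (n - 1) / (n - p - 1), with p predictors. The sketch below computes these by hand on mtcars, checks them against lm(), and predicts mpg for a hypothetical car with disp = 200.

```r
x <- mtcars$disp
y <- mtcars$mpg
n <- length(y)
p <- 1  # one predictor

# Closed-form least-squares estimates for a single predictor
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# R-squared and adjusted R-squared from the residuals
fitted_y  <- intercept + slope * x
r_squared <- 1 - sum((y - fitted_y)^2) / sum((y - mean(y))^2)
adj_r_sq  <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

RA <- lm(mpg ~ disp, data = mtcars)
print(c(intercept = intercept, slope = slope))
print(coef(RA))
print(round(c(r_squared = r_squared, adj_r_squared = adj_r_sq), 4))

# Predicted mpg for a hypothetical car with disp = 200
print(predict(RA, newdata = data.frame(disp = 200)))
```

For simple regression, r_squared also equals cor(x, y)^2.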

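5) Multiple regression analysis - lm()

Null hypothesis: disp, hp, and wt have no effect on mpg (all slope coefficients are zero).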

RA <- lm(data=mtcars, mpg~disp+hp+wt)
summary(RA)

## 
## Call:
## lm(formula = mpg ~ disp + hp + wt, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.891 -1.640 -0.172  1.061  5.861 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.105505   2.110815  17.579  < 2e-16 ***
## disp        -0.000937   0.010350  -0.091  0.92851    
## hp          -0.031157   0.011436  -2.724  0.01097 *  
## wt          -3.800891   1.066191  -3.565  0.00133 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8083 
## F-statistic: 44.57 on 3 and 28 DF,  p-value: 8.65e-11

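p-value = 8.65e-11 < .05, so the null hypothesis is rejected (at the .05 significance level, the regression model is statistically significant).

Intercept = 37.105505 (significant at the .05 level)

Coefficient of disp = -0.000937 (not statistically significant at the .05 level, p = 0.92851)

Coefficient of hp = -0.031157 (significant at the .05 level)

Coefficient of wt = -3.800891 (significant at the .05 level)

Regression equation: mpg = 37.105505 - 0.000937 * disp - 0.031157 * hp - 3.800891 * wt

Adjusted R-squared = .8083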
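The least-squares coefficients solve the normal equations (X'X) b = X'y, where X is the design matrix with a leading column of 1s for the intercept. The minimal sketch below solves them directly with base R's model.matrix() and solve() only because this mirrors the textbook formula; lm() itself uses a QR decomposition, which is numerically more stable. It also plugs the first car in mtcars (Mazda RX4) into the rounded regression equation.

```r
# Design matrix: intercept column plus disp, hp, wt
X <- model.matrix(mpg ~ disp + hp + wt, data = mtcars)
y <- mtcars$mpg
n <- nrow(X)
p <- ncol(X) - 1  # number of predictors, excluding the intercept

# Solve the normal equations (X'X) b = X'y
beta <- drop(solve(crossprod(X), crossprod(X, y)))

# Adjusted R-squared penalizes R-squared for the number of predictors
fitted_y  <- drop(X %*% beta)
r_squared <- 1 - sum((y - fitted_y)^2) / sum((y - mean(y))^2)
adj_r_sq  <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

RA <- lm(mpg ~ disp + hp + wt, data = mtcars)
print(round(beta, 6))
print(round(c(r_squared = r_squared, adj_r_squared = adj_r_sq), 4))

# Plug the first car (Mazda RX4) into the rounded regression equation
car <- mtcars[1, ]
eq_pred <- 37.105505 - 0.000937 * car$disp - 0.031157 * car$hp - 3.800891 * car$wt
print(c(equation = eq_pred, lm = unname(fitted(RA)[1])))
```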

Chapter 8 Statistical Analysis

강지원

2022-03-15

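1. Analysis Methods

1) Descriptive statistics

2) Inferential statistics

(1) Mean difference test: tests whether group means actually differ

(2) Cross-tabulation analysis: tests whether groups defined by categorical variables are associated

(3) Correlation analysis: measures the correlation between variables

(4) Regression analysis: analyzes the hypothesized causal relationship in which independent variables influence a dependent variable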
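Descriptive statistics summarize data without testing a hypothesis. As a minimal sketch, assuming the built-in mtcars data as an illustrative stand-in, the base-R code below gives a numeric summary of mpg (a ratio-scale variable) and a frequency table of am (a nominal variable, 0 = automatic, 1 = manual).

```r
# Numeric summary of a ratio-scale variable
x <- mtcars$mpg
desc <- c(
  n      = length(x),
  mean   = mean(x),
  sd     = sd(x),
  median = median(x),
  min    = min(x),
  max    = max(x)
)
print(round(desc, 3))

# Frequency and proportion table of a nominal variable
am_counts <- table(mtcars$am)
print(am_counts)
print(prop.table(am_counts))
```

summary(mtcars$mpg) gives the quartile-based version of the same summary in one call.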

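2. Statistical Testing

1) Hypothesis

2) Significance level (α) and p-value

3) Scale (level of measurement)

(1) Nominal scale: classifies the characteristics or categories of the objects being measured

(2) Ordinal scale: expresses the rank order of the objects being measured

(3) Interval scale: measures objects at equal intervals (differences are meaningful, but there is no true zero)

(4) Ratio scale: measures objects on a scale where ratios are meaningful (there is a true zero)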
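In R, the four scales are commonly represented as an unordered factor, an ordered factor, and numeric vectors. The sketch below uses hypothetical values for the nominal, ordinal, and interval examples (assumptions for illustration only) and the first four mpg values of mtcars for the ratio example.

```r
# Nominal: categories with no order (hypothetical values)
nominal <- factor(c("auto", "manual", "manual", "auto"))

# Ordinal: ordered categories (hypothetical ratings)
ordinal <- factor(c("low", "high", "mid", "low"),
                  levels = c("low", "mid", "high"), ordered = TRUE)

# Interval: equal spacing but no true zero, e.g. temperature in Celsius (hypothetical)
interval <- c(10, 20, 15, 25)

# Ratio: true zero, so ratios are meaningful (miles per gallon)
ratio <- mtcars$mpg[1:4]

print(levels(nominal))
print(ordinal < "high")    # order comparisons are defined only for ordered factors
print(diff(interval))      # differences are meaningful on an interval scale
print(ratio[3] / ratio[1]) # ratios are meaningful on a ratio scale
```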

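3. Statistical Analysis Examples

1) Testing the difference in means between two groups - independent-samples t-test (t.test())

2) Cross-tabulation analysis - chi-square test (chisq.test())

3) Correlation analysis - cor.test()

4) Simple regression analysis - lm()

5) Multiple regression analysis - lm()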