w5homework

1. Load the data and inspect it (Hint: use functions such as head, str, and summary)

knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

dat <- ToothGrowth
head(dat)

##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5

str(dat)

## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

summary(dat)

##       len        supp         dose      
##  Min.   : 4.20   OJ:30   Min.   :0.500  
##  1st Qu.:13.07   VC:30   1st Qu.:0.500  
##  Median :19.25           Median :1.000  
##  Mean   :18.81           Mean   :1.167  
##  3rd Qu.:25.27           3rd Qu.:2.000  
##  Max.   :33.90           Max.   :2.000
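
As a complement to the summaries above, the following is a minimal sketch of one more inspection step: cross-tabulating `supp` against `dose` to see how the 60 observations are distributed across the design. The object name `counts` is introduced here for illustration only and is not part of the original homework.

```r
# Cross-tabulate supplement type against dose level (built-in ToothGrowth data).
# `counts` is an illustrative name, not an object from the original script.
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
counts

# Total number of observations across all cells
sum(counts)
```

This only counts rows per combination; it does not change the data or affect the test carried out later.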

2. Draw a boxplot and a histogram of len to inspect it visually

knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(dplyr)

#boxplot
boxplot(ToothGrowth$len, main = "Boxplot of Tooth Length",
        ylab = "Tooth Length", col = "lightblue")

#histogram
hist(ToothGrowth$len, main = "Histogram of Tooth Length",
     xlab = "Tooth Length", col = "lightgreen", breaks=10)
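
The chunk above loads ggplot2 but draws both plots with base graphics. As a hedged alternative, the same two plots could be sketched with ggplot2 roughly as follows. The object names `p_box` and `p_hist` are illustrative, and `bins = 10` is an assumed choice that will not necessarily produce the same bin edges as `hist(..., breaks = 10)`.

```r
library(ggplot2)

# Boxplot of len; x is set to a constant because there is only one group.
# `p_box` is an illustrative name.
p_box <- ggplot(ToothGrowth, aes(x = "", y = len)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Boxplot of Tooth Length", x = NULL, y = "Tooth Length")

# Histogram of len; bins = 10 is an assumption, not a match for base hist().
# `p_hist` is an illustrative name.
p_hist <- ggplot(ToothGrowth, aes(x = len)) +
  geom_histogram(bins = 10, fill = "lightgreen", colour = "black") +
  labs(title = "Histogram of Tooth Length", x = "Tooth Length", y = "Count")

print(p_box)
print(p_hist)
```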

## 3-8. We want to test, at a significance level of 0.05, whether the tooth length of guinea pigs is at least 17. Answer the questions below.

knitr::opts_chunk$set(echo = TRUE)
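#Q3. What are the null and alternative hypotheses? H0: mu = 17, H1: mu > 17. The null hypothesis is "the mean tooth length of guinea pigs is 17", and the alternative hypothesis is "the mean tooth length of guinea pigs is greater than 17".
#Q4. Should a one-sided or a two-sided test be used? (Give the reason): A one-sided (right-tailed) test. The alternative hypothesis H1 is "mu > 17", so only one direction is tested.
#Q5. Should a Z-value or a t-value be computed?: A t-value. The population standard deviation is unknown and has to be estimated from the sample, so a t-test is appropriate.
#Q6. Store the sample mean, sample standard deviation, population mean, and sample size of tooth length in objects named sample_mean, sample_sd, pop_mean, and sample_size.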

data("ToothGrowth")
tooth_len <- ToothGrowth$len
sample_mean <- mean(tooth_len)
sample_sd <- sd(tooth_len)
pop_mean <- 17
sample_size <- length(tooth_len)

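#Q7. Use the pt() function to compute the p-value.
# t statistic: (sample mean - hypothesized mean) / standard error
t_value <- (sample_mean - pop_mean) / (sample_sd / sqrt(sample_size))
df <- sample_size - 1
# Right-tailed test, so take the upper-tail probability
p_value <- pt(t_value, df = df, lower.tail = FALSE)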
t_value

## [1] 1.836245

p_value

## [1] 0.0356806
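
As a cross-check on the manual calculation, the same right-tailed one-sample test can be run with base R's `t.test()`. This is a minimal sketch; the object name `res` is illustrative and not part of the original homework.

```r
# One-sample, right-tailed t-test of H0: mu = 17 against H1: mu > 17.
# `res` is an illustrative name.
res <- t.test(ToothGrowth$len, mu = 17, alternative = "greater")

res$statistic  # t statistic
res$parameter  # degrees of freedom
res$p.value    # one-sided p-value
```

The statistic and p-value reported here should agree with `t_value` and `p_value` computed by hand with `pt()`.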

#Q8. Can the null hypothesis be rejected? What can be concluded about tooth length?: The p-value is about 0.0357, which is smaller than the significance level of 0.05, so we reject the null hypothesis. At the 0.05 significance level, the mean tooth length of guinea pigs is significantly greater than 17. (Interpretation: if the null hypothesis were true, the probability of observing a sample mean at least as extreme as the one observed would be about 0.0357.)
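
The same decision can also be reached by comparing the t statistic with a critical value instead of comparing the p-value with 0.05. The sketch below assumes the same hypotheses (H0: mu = 17, H1: mu > 17) and significance level; the names `alpha`, `t_crit`, and `reject_h0` are introduced here for illustration only.

```r
# Critical-value approach for the right-tailed test.
# `n`, `alpha`, `t_crit`, and `reject_h0` are illustrative names.
tooth_len <- ToothGrowth$len
n <- length(tooth_len)
alpha <- 0.05

# Same t statistic as in Q7
t_value <- (mean(tooth_len) - 17) / (sd(tooth_len) / sqrt(n))

# Upper-tail critical value of the t distribution with n - 1 degrees of freedom
t_crit <- qt(1 - alpha, df = n - 1)

# Reject H0 when the statistic exceeds the critical value
reject_h0 <- t_value > t_crit
reject_h0
```

Both approaches are equivalent for this test: the statistic exceeds the critical value exactly when the one-sided p-value is below `alpha`.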

2025-07-02
