lab6answers

Use ANOVA to compare whether snowfall differs among the four cities in the dataset. State your hypotheses, test statistic, p-value and a sentence interpreting your result. You can use a built-in function or the manual method; either is OK. (out of 5)

Students need to use the CSV file that was already provided to them. According to the lab, these are the steps for loading it:

library(stringr)

x <- read.csv("https://raw.githubusercontent.com/am2222/GESC258/master/lab6/snow.csv",fileEncoding="UTF-8-BOM")
#now a little stringr magic:
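# keep the text before the first space (the number) and trim surrounding whitespace
x$snowcm <- str_trim(str_sub(str_trim(x$Snow), start = 1, end = str_locate(str_trim(x$Snow), " ")[,1]))
# convert the extracted text to numeric values
x$snowcm <- as.numeric(x$snowcm)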

The question asks whether snowfall differs among all four cities. Because more than two cities are being compared, students need to use ANOVA rather than a t-test. The hypotheses for the ANOVA with “Toronto” “Hamilton” “Montreal” “Ottawa” are:

\[ H_0 : \mu_{Toronto} = \mu_{Hamilton} = \mu_{Montreal} = \mu_{Ottawa}\\ H_a : \text{not all of } \mu_{Toronto}, \mu_{Hamilton}, \mu_{Montreal}, \mu_{Ottawa} \text{ are equal} \]

Using a significance level of 0.05, we run the ANOVA. If the p-value is less than 0.05, we reject the null hypothesis: based on our sample, mean snowfall in at least one of the four cities differs from the others.

If the p-value is greater than 0.05, we fail to reject the null hypothesis: the sample does not provide evidence that mean snowfall differs among the cities (which is not the same as showing that the means are equal).

anova(lm(snowcm~City, data=x))

## Analysis of Variance Table
## 
## Response: snowcm
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## City       3 218232   72744  23.776 1.364e-11 ***
## Residuals 96 293721    3060                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value is 1.364e-11, which is essentially 0 and well below 0.05, so we reject the null hypothesis. As a result, mean snowfall in at least one of the cities is statistically different from the others.
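The F value in the table is the ratio of the two mean squares, so graders can check it by hand from the printed sums of squares and degrees of freedom:

\[ F = \frac{SS_{City}/df_{City}}{SS_{Residuals}/df_{Residuals}} = \frac{218232/3}{293721/96} \approx \frac{72744}{3059.6} \approx 23.776 \]

The p-value is the upper-tail probability \( P(F_{3,96} \geq 23.776) \), which R reports as 1.364e-11; for example, pf(23.776, 3, 96, lower.tail = FALSE) should give approximately that value.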

The test statistic is F = 23.776, with 3 and 96 degrees of freedom.

Marking breakdown:

State hypothesis -> 1 mark
Test statistic -> 1 mark
p-value -> 1 mark
A sentence interpreting your result -> 1 mark
Code -> 1 mark
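The question also accepts the manual method. The following is a minimal sketch of a one-way ANOVA computed directly from its definitions (between-group and within-group sums of squares). `manual_anova` is a hypothetical helper name, and it drops missing values the way lm() does by default. Applied as `manual_anova(x$snowcm, x$City)`, it should match the F value and p-value in the table above.

```r
# Manual one-way ANOVA (a sketch): F statistic, degrees of freedom and p-value
manual_anova <- function(y, group) {
  keep <- !is.na(y) & !is.na(group)   # drop missing values, as lm() does by default
  y <- y[keep]
  group <- factor(group[keep])
  k <- nlevels(group)                 # number of groups (cities)
  n <- length(y)                      # total number of observations
  grand_mean <- mean(y)
  group_means <- tapply(y, group, mean)
  group_sizes <- tapply(y, group, length)
  # Between-group sum of squares: how far each group mean sits from the grand mean
  ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
  # Within-group sum of squares: spread of each observation around its own group mean
  ss_within <- sum((y - ave(y, group))^2)
  df_between <- k - 1
  df_within <- n - k
  f_stat <- (ss_between / df_between) / (ss_within / df_within)
  # Upper-tail probability of the F distribution gives the p-value
  p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
  list(F = f_stat, df = c(df_between, df_within), p_value = p_value)
}
```

Writing it out this way makes the link to the ANOVA table explicit: `ss_between` and `ss_within` correspond to the "City" and "Residuals" rows of the Sum Sq column.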

Use ANOVA to compare whether snowfall differs among (any) three of the four cities in the dataset. State your hypotheses, test statistic, p-value and a sentence interpreting your result. You can use a built-in function or the manual method; either is OK. (out of 5)

This question is the same, except that students repeat the test above for every combination of three cities (four combinations in total, one for each city left out). The first one is “Toronto” “Hamilton” “Montreal”:

\[ H_0 : \mu_{Toronto} = \mu_{Hamilton} = \mu_{Montreal}\\ H_a : \text{not all of } \mu_{Toronto}, \mu_{Hamilton}, \mu_{Montreal} \text{ are equal} \]

Students need to filter the data first:

thm <- x[x$City!="Ottawa",] # keep only records whose City is not Ottawa

Now thm only includes data for “Toronto”, “Hamilton” and “Montreal”.

anova(lm(snowcm~City, data=thm))

## Analysis of Variance Table
## 
## Response: snowcm
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## City       2 121995   60997  22.958 1.937e-08 ***
## Residuals 72 191294    2657                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

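Students interpret the result the same way as in question one: if the p-value is less than 0.05, reject the null hypothesis; otherwise, fail to reject it. Here F = 22.958 and the p-value is 1.937e-08, well below 0.05, so we reject the null hypothesis: mean snowfall differs for at least one of Toronto, Hamilton and Montreal.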

“Toronto” “Hamilton” “Ottawa” \[ H_0 : \mu_{Toronto} = \mu_{Hamilton} = \mu_{Ottawa}\\ H_a : \text{not all of } \mu_{Toronto}, \mu_{Hamilton}, \mu_{Ottawa} \text{ are equal} \]

tho <- x[x$City!="Montreal",] 
anova(lm(snowcm~City, data=tho))

## Analysis of Variance Table
## 
## Response: snowcm
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## City       2 173355   86678   31.17 1.773e-10 ***
## Residuals 72 200220    2781                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

“Toronto” “Montreal” “Ottawa” \[ H_0 : \mu_{Toronto} = \mu_{Montreal} = \mu_{Ottawa}\\ H_a : \text{not all of } \mu_{Toronto}, \mu_{Montreal}, \mu_{Ottawa} \text{ are equal} \]

tmo <- x[x$City!="Hamilton",] 
anova(lm(snowcm~City, data=tmo))

## Analysis of Variance Table
## 
## Response: snowcm
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## City       2 186331   93166   27.98 1.021e-09 ***
## Residuals 72 239737    3330                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

“Hamilton” “Montreal” “Ottawa” \[ H_0 : \mu_{Hamilton} = \mu_{Montreal} = \mu_{Ottawa}\\ H_a : \text{not all of } \mu_{Hamilton}, \mu_{Montreal}, \mu_{Ottawa} \text{ are equal} \]

hmo <- x[x$City!="Toronto",] 
anova(lm(snowcm~City, data=hmo))

## Analysis of Variance Table
## 
## Response: snowcm
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## City       2 100271   50136  14.444 5.317e-06 ***
## Residuals 72 249911    3471                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
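Instead of filtering and re-running by hand four times, students could generate all four three-city tests in one go. The sketch below assumes the cleaned data frame x with columns City and snowcm. `run_three_city_anovas` is a hypothetical helper name. combn() enumerates every choice of three cities, which for four cities gives the same four subsets as thm, tho, tmo and hmo.

```r
# Run the same one-way ANOVA on every combination of three cities (a sketch)
run_three_city_anovas <- function(df, alpha = 0.05) {
  cities <- sort(unique(as.character(df$City)))
  # Every way of choosing 3 cities (4 combinations when there are 4 cities)
  combos <- combn(cities, 3, simplify = FALSE)
  rows <- lapply(combos, function(cc) {
    sub <- df[df$City %in% cc, ]                  # keep only the three chosen cities
    tab <- anova(lm(snowcm ~ City, data = sub))   # same call as in the answers above
    p <- tab[["Pr(>F)"]][1]
    data.frame(cities = paste(cc, collapse = ", "),
               F_value = tab[["F value"]][1],
               p_value = p,
               decision = if (p < alpha) "reject H0" else "fail to reject H0")
  })
  do.call(rbind, rows)
}
```

With the lab data, `run_three_city_anovas(x)` should return one row per combination, with F values and p-values matching the four tables above.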

Marking breakdown:

State hypothesis -> 1 mark in total (0.25 for each of the four combinations)
Test statistic -> 1 mark in total (0.25 for each of the four combinations)
p-value -> 1 mark in total (0.25 for each of the four combinations)
A sentence interpreting your result -> 1 mark in total (0.25 for each of the four combinations)
Code -> 1 mark

Conduct some analysis of your choosing on the snow dataset to explore a question you find interesting. You can incorporate external data if you want. You can use any approach you want, including but not limited to hypothesis testing (t-test), graphs, confidence intervals, etc. Write up a paragraph interpreting your analysis and discuss any limitations. (out of 10)

This question has many possible answers. Students can include graphs, run t-tests, build confidence intervals, and so on. Some example t-tests they could run:
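Each of the example hypotheses that follow maps onto one call to base R's t.test(). The sketch below assumes the cleaned data frame x from the loading step, with columns City and snowcm. The wrapper name `example_t_tests` is hypothetical, and the reference value mu0 = 122 comes from the example one-sample hypotheses. t.test() runs a Welch test by default (var.equal = FALSE).

```r
# One t.test() call per example hypothesis (a sketch)
example_t_tests <- function(df, mu0 = 122) {
  ham <- df$snowcm[df$City == "Hamilton"]
  mtl <- df$snowcm[df$City == "Montreal"]
  tor <- df$snowcm[df$City == "Toronto"]
  list(
    # H_a: mu_Hamilton != mu_Montreal (two-sample, two-sided)
    ham_vs_mtl_two_sided = t.test(ham, mtl),
    # H_a: mu_Hamilton < mu_Montreal (two-sample, one-sided)
    ham_less_than_mtl = t.test(ham, mtl, alternative = "less"),
    # H_a: mu_Toronto < mu0 (one-sample, one-sided)
    tor_less_than_mu0 = t.test(tor, mu = mu0, alternative = "less"),
    # H_a: mu_Hamilton != mu0 (one-sample, two-sided)
    ham_vs_mu0_two_sided = t.test(ham, mu = mu0)
  )
}
```

For example, `example_t_tests(x)$ham_less_than_mtl` prints the test statistic, degrees of freedom and p-value for the one-sided Hamilton vs Montreal comparison. Putting the Hamilton values first matters, because alternative = "less" refers to the first sample's mean minus the second's.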

\[ H_0 : \mu_{Hamilton} = \mu_{Montreal} \\ H_a : \mu_{Hamilton} \ne \mu_{Montreal} \]

\[ H_0 : \mu_{Hamilton} \geq \mu_{Montreal}\\ H_a : \mu_{Hamilton} \lt \mu_{Montreal} \]

\[ H_0 : \mu_{Toronto} \geq 122\\ H_a : \mu_{Toronto} \lt 122 \]

\[ H_0 : \mu_{Hamilton} = 122\\ H_a : \mu_{Hamilton} \ne 122 \]

4. One of the foundational assumptions of classical statistical inference is that observations are independent. Can you think of any potential violations of this assumption in the snow dataset? (out of 5)

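This is a bit of a tricky question. For example, students can argue that snowfall is produced by weather systems (fronts) that often pass over several of these cities. Snowfall recorded in different cities during the same period may therefore be correlated rather than independent. They could also point out that observations from the same city in consecutive periods may be related to each other.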
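One way students could probe this is to check whether snowfall in different cities moves together over time. The sketch below assumes the dataset has a column identifying the period each value belongs to. The name passed as `time_col` is hypothetical, so check the actual column names with names(x) first. `cross_city_correlation` is also a hypothetical helper name.

```r
# Correlation of snowfall between cities across periods (a sketch).
# `time_col` is a hypothetical column name identifying the period (e.g. a year).
cross_city_correlation <- function(df, time_col) {
  # Rows = periods, columns = cities, cells = snowfall (mean if a period repeats)
  wide <- tapply(df$snowcm, list(df[[time_col]], df$City), mean)
  cor(wide, use = "pairwise.complete.obs")
}
```

If the period column turned out to be called, say, Year, the call would be `cross_city_correlation(x, "Year")`. A strong positive correlation between two cities is consistent with shared weather systems. With only 25 values per city, however, it is suggestive rather than conclusive evidence of dependence.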

Bonus question: Explain in detail (i.e., what each function is doing) what is happening in this step below

#now a little stringr magic:
x$snowcm <- str_trim(str_sub(str_trim(x$Snow), start = 1, end = str_locate(str_trim(x$Snow), " ")[,1]))
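Graders may find it easier to check answers against a step-by-step version of the one-liner. Below, it is run on a few hypothetical strings in the "number followed by cm" format that the code appears to assume; the real values in snow.csv may look different. Each step shows one function's role:

- str_trim() strips leading and trailing whitespace.
- str_locate(..., " ")[, 1] finds the position of the first space.
- str_sub() keeps the characters from 1 up to that position.
- The outer str_trim() drops the trailing space left behind.

as.numeric() then converts the text to numbers, as in the next line of the lab. A value with no space at all would get NA from str_locate(), so it should end up missing after conversion.

```r
library(stringr)

# Hypothetical raw values in an assumed "<number> cm" format (not taken from snow.csv)
snow_raw <- c(" 123.4 cm", "87 cm ", "205.1 cm")

# 1. Remove leading/trailing whitespace so the first space is the one after the number
trimmed <- str_trim(snow_raw)                                # "123.4 cm" "87 cm" "205.1 cm"
# 2. str_locate() returns a matrix with "start" and "end" columns; [, 1] takes the start
space_pos <- str_locate(trimmed, " ")[, 1]                   # 6 3 6
# 3. Keep characters 1..space_pos (inclusive), which still ends with the space
number_txt <- str_sub(trimmed, start = 1, end = space_pos)   # "123.4 " "87 " "205.1 "
# 4. Trim again to drop that trailing space, then convert to numbers
snowcm <- as.numeric(str_trim(number_txt))                   # 123.4 87.0 205.1

# The original one-liner (plus as.numeric) gives the same result
one_liner <- as.numeric(str_trim(str_sub(str_trim(snow_raw), start = 1,
                                         end = str_locate(str_trim(snow_raw), " ")[, 1])))
```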


2022-03-23