GEOG 6000: Advanced Geographic Data Analysis
Date: 9/16/21

Exercise 1

The dataset on body temperatures includes a column coding male (1) and female (2) subjects. Use a t-test to look for differences in temperature between males and females. Hint: you can do this either by splitting the temperature values into two new vectors or by using the formula syntax (see the ANOVA example).

State the null and alternate hypotheses

Null H0: The true difference in mean body temperature between males (1) and females (2) is 0.

Alternative Ha: The true difference in mean body temperature between males and females is not equal to 0.

Carry out the t-test and report the t-statistic and the p-value obtained

Set the working directory, read normtemp.csv into a data frame, and create an fsex column that uses factor() to convert the numeric sex codes into labeled categories:

setwd("N:/Projects/geog6000/lab02")
normtemp = read.csv("../datafiles/normtemp.csv")
normtemp$fsex = factor(normtemp$sex,
                       levels = c(1,2),
                       labels = c("male", "female"))
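To check what factor() does with the numeric codes, here is a minimal sketch on a small hypothetical vector (`sex_codes` is illustrative only, not taken from normtemp.csv). Each value is matched against `levels` and replaced by the label in the same position. Any code that is not listed in `levels` becomes NA.

```r
# Hypothetical codes using the same scheme as normtemp$sex (1 = male, 2 = female)
sex_codes <- c(1, 2, 2, 1, 2)

# Each value is matched against `levels` and given the label in the same position
fsex <- factor(sex_codes, levels = c(1, 2), labels = c("male", "female"))
print(fsex)

# Counts per group, a quick check that the coding came out as intended
print(table(fsex))

# A code outside `levels` (here 3) is not matched and becomes NA
unmatched <- factor(3, levels = c(1, 2), labels = c("male", "female"))
print(unmatched)
```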

Run the t-test using the same formula syntax as the ANOVA example from Lab 02:

t.test(normtemp$temp ~ normtemp$fsex, data = normtemp)
## 
##  Welch Two Sample t-test
## 
## data:  normtemp$temp by normtemp$fsex
## t = -2.2854, df = 127.51, p-value = 0.02394
## alternative hypothesis: true difference in means between group male and group female is not equal to 0
## 95 percent confidence interval:
##  -0.53964856 -0.03881298
## sample estimates:
##   mean in group male mean in group female 
##             98.10462             98.39385
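The hint also mentions splitting the temperatures into two vectors. The sketch below uses a small hypothetical data frame `toy` in place of normtemp, with temperatures invented purely for illustration. It shows that the two-vector call and the formula call run the same Welch test. With `data =` supplied, the formula can refer to the columns by name (`temp ~ fsex`).

```r
# Hypothetical stand-in for normtemp; these temperatures are made up for illustration
toy <- data.frame(
  temp = c(97.8, 98.2, 98.0, 98.4, 98.1, 98.6, 98.3, 98.9, 98.5, 98.2),
  fsex = factor(rep(c("male", "female"), each = 5), levels = c("male", "female"))
)

# Approach 1: formula syntax, with column names looked up inside `data`
res_formula <- t.test(temp ~ fsex, data = toy)

# Approach 2: split the temperatures into two vectors first (male first, as in the formula)
male_temp   <- toy$temp[toy$fsex == "male"]
female_temp <- toy$temp[toy$fsex == "female"]
res_vectors <- t.test(male_temp, female_temp)

# Both default to the Welch (unequal-variance) test, so the statistics agree
print(c(formula = unname(res_formula$statistic),
        vectors = unname(res_vectors$statistic)))
```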

The t-statistic from this test is: -2.2854345
The p-value from this test is: 0.0239383
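The reported p-value follows directly from the t-statistic. It is the probability, under a t distribution with the Welch degrees of freedom, of a statistic at least as extreme as the observed one in either direction. A short sketch using the values printed above, where the degrees of freedom are the rounded value from the output:

```r
# Values copied from the Welch t-test output above
t_stat   <- -2.2854345
df_welch <- 127.51   # printed (rounded) Welch-Satterthwaite degrees of freedom

# Two-sided p-value: probability in both tails beyond |t|
p_two_sided <- 2 * pt(-abs(t_stat), df = df_welch)
print(p_two_sided)
```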

On the basis of this state whether or not you have evidence for a difference in body temperature between men and women

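Because the p-value for the t-test (0.024) is below the conventional 0.05 significance level, we reject the null hypothesis and conclude that there is evidence for a difference in mean body temperature between men and women. In this sample, women's mean temperature (98.39 °F) is about 0.29 °F higher than men's (98.10 °F), and the 95% confidence interval for the male minus female difference (-0.54 to -0.04 °F) does not include 0. Bear in mind that the p-value is not the probability that the null hypothesis is true. It means that if there were no real difference, a difference at least this large would arise from sampling variation alone about 2.4% of the time.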
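To make this interpretation concrete, the simulation below shows how often p-values under 0.05 appear when the null hypothesis is true. It is a minimal sketch that assumes both groups are drawn from the same normal distribution. The mean (98.25), standard deviation (0.73) and group size (65) are hypothetical values chosen only to resemble body temperatures. Because there is no real difference, roughly 5% of the tests should still come out "significant".

```r
set.seed(42)  # fixed seed so the simulation is reproducible

n_sims      <- 2000
n_per_group <- 65   # assumed group size (hypothetical)

# Both groups come from the SAME distribution, so H0 is true by construction
p_values <- replicate(n_sims, {
  a <- rnorm(n_per_group, mean = 98.25, sd = 0.73)
  b <- rnorm(n_per_group, mean = 98.25, sd = 0.73)
  t.test(a, b)$p.value
})

# Share of "significant" results produced by sampling variation alone
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)
```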

Exercise 2

The file gapC.csv contains socio-economic information for 173 countries from the GapMinder dataset. Each country has been assigned to one of seven geographical regions, roughly corresponding to the continents. Carry out a one-way analysis of variance of the life expectancy variable to look for differences across the different regions.

State the null and alternate hypotheses

Null H0: Mean life expectancy is the same in all seven regions (every pairwise difference in means is 0).

Alternative Ha: Mean life expectancy differs between at least two regions (at least one pairwise difference in means is not 0).

Make a boxplot of life expectancy per continent

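# Read the GapMinder country data (path relative to the working directory set above)
gapc = read.csv("../datafiles/gapC.csv")
# Pull out the columns used below as stand-alone vectors
LE = gapc$lifeexpectancy
country = gapc$country
continent = gapc$continent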

boxplot(LE ~ continent,
         data = gapc,
         xlab = "Continent",
         ylab = "Life Expectancy")

Carry out the ANOVA and give the F-statistic and the p-value obtained

aov(LE ~ continent, data = gapc)
## Call:
##    aov(formula = LE ~ continent, data = gapc)
## 
## Terms:
##                 continent Residuals
## Sum of Squares   9757.236  7141.470
## Deg. of Freedom         6       165
## 
## Residual standard error: 6.578878
## Estimated effects may be unbalanced
## 1 observation deleted due to missingness
summary(aov(LE ~ continent, data = gapc))
##              Df Sum Sq Mean Sq F value Pr(>F)    
## continent     6   9757  1626.2   37.57 <2e-16 ***
## Residuals   165   7141    43.3                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 1 observation deleted due to missingness
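The F-statistic in the summary table is the between-group mean square divided by the residual mean square. The sketch below recomputes it and its p-value from the sums of squares and degrees of freedom printed by aov(). It also confirms that the residual standard error equals the square root of the residual mean square.

```r
# Sums of squares and degrees of freedom copied from the aov() output above
ss_between <- 9757.236; df_between <- 6     # continent: 7 regions - 1
ss_within  <- 7141.470; df_within  <- 165   # residuals: 172 countries used - 7 regions

# Mean squares are sums of squares divided by their degrees of freedom
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within

# F-statistic and its upper-tail p-value
f_stat <- ms_between / ms_within
p_val  <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Residual standard error reported by aov() is sqrt of the residual mean square
rse <- sqrt(ms_within)

print(c(F = f_stat, p = p_val, RSE = rse))
```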

From this test, the F-statistic is 37.57 (with 6 and 165 degrees of freedom) and the p-value is < 2e-16.

On the basis of this state whether or not life expectancy varies across continents

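Yes. With a p-value far below 0.05, we reject the null hypothesis that mean life expectancy is the same in all seven regions, so there is strong evidence that life expectancy varies across continents. Note that the ANOVA only shows that at least one regional mean differs from the others. It does not say which regions differ.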
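A common follow-up is a post-hoc comparison of every pair of regions. The sketch below shows R's TukeyHSD() on the built-in PlantGrowth dataset as a stand-in, since the GapMinder file is not bundled with R. The same call could be applied to the life expectancy model fitted above, converting continent with factor() first to be safe. With seven regions that would give choose(7, 2) = 21 pairwise comparisons.

```r
# PlantGrowth (built into R): plant weight for a control and two treatment groups
fit <- aov(weight ~ group, data = PlantGrowth)

# Tukey's honest significant difference: every pairwise difference in group means,
# with confidence intervals and p-values adjusted for multiple comparisons
tukey <- TukeyHSD(fit)
print(tukey)
```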