Course Engagement 3

Practice Section:

In this engagement, we will practice how to perform the paired samples, independent samples t-test, ANOVA, and the Chi Square tests in R.

Scenario 1: Paired Samples t-test

A loco medical anthropologist professor, Dr. Bunny, has been struggling with their own aura points. After watching several TikToks of people admitting how they embarrassingly lost 100 aura pts in the most insane way possible, Dr. Bunny knew that they needed to find a medicine that could instantly restore aura. They hypothesized that this new medicine will greater improve peoples’ aura.

Dr. Bunny invited 50 of their students to try out the new pill called yuppie. After some initial “my professor is giving me lsd or something” ahh type reaction, they decided to “fck it” and give it a go. Dr. Bunny measured their aura pts before the medicine and after. The vector aura_pts are the student’s mean aura before consuming yuppie and aura_pts_yuppie are their aura points after consuming yuppie.

First Load Some Libraries then your data. Remember you need to reload these packages every time you open Posit Cloud!

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggthemes)
library(ggpubr)
library(corrplot)

## corrplot 0.92 loaded

library(car)

## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some

Now load in the dataset!

aura_pts = rnorm(50,0,10)

aura_pts_yuppie = aura_pts + rnorm(50, 3)

Do some exploration: Calculate the sample mean and standard deviation for the students’ aura before and after yippie

tibble(Mean_before = mean(aura_pts), sd_before = sd(aura_pts),
Mean_after = mean(aura_pts_yuppie), sd_after = sd(aura_pts_yuppie))

## # A tibble: 1 × 4
##   Mean_before sd_before Mean_after sd_after
##         <dbl>     <dbl>      <dbl>    <dbl>
## 1       -3.40      8.48     -0.656     8.73

State your null and alternate hypotheses

H₀: There is no difference in the students’ aura points before and after taking yuppie.

H₁: Students’ aura points greatly improved after taking yuppie.

Q1.4. Calculate your critical test values. Use an alpha level of 0.05

# remember we are dealing with the right tail so we actually want a positive t-value. The critical value of a right tail should be 1 - alpha
qt(0.95, 49)

## [1] 1.676551

Plot your critical region by modifying the plotting code given below.

```{r} ggplot(data = data.frame(x = c(-4, 4)), aes(x)) + stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1))+ geom_area(stat = ‘function’, fun = dnorm, args = list(mean = 0, sd = 1), fill = “red”, alpha = 0.4, xlim = c(4, —))+ # modify the xlim argument HERE YES HERE scale_x_continuous(breaks = seq(-4,4,1))+ labs(x = ““, y =”proportion”)+ ggpubr::theme_pubclean()+ ggtitle(“Red = region for rejection of null”)


#### Q1.7. Calculate the test statistic


``` r
differences = aura_pts_yuppie - aura_pts

t_value = mean(differences) / (sd(differences) / sqrt(length(differences)))
t_value # ~ 110.01

## [1] 20.38598

Plot the test statistic against your critical region Then, add a vertical line representing your test statistic.

```{r} ggplot(data = data.frame(x = c(-4, 4)), aes(x)) + stat_function(fun = dnorm, n = 101, args = list(mean = 0, sd = 1))+ geom_area(stat = ‘function’, fun = dnorm, args = list(mean = 0, sd = 1), fill = “red”, alpha = 0.4, xlim = c(4, —))+ # add the critical value here! geom_vline(xintercept = —) + # add the TEST value here! scale_x_continuous(breaks = seq(-4,4,1))+ labs(x = ““, y =”proportion”) + ggpubr::theme_pubclean()+ ggtitle(“Red = region for rejection of null”)


#### State your conclusions on the null hypothesis:

Well, Idk how it was possible, but yuppie seemed to work. What Dr. Bunny didn't know is they literally created the chemical compound for caffeine soo...that's what it was.

#### Scenario 2: Independent Samples t-test

The data stored in the `law_crime.csv` file contains information on violent crime rate (incidents per 100,000 people), murder rate, robery rate, incarceration rate, and some demographic information for each of the 50 US states. The observations were made in 1999 (quite a bit ago), and they also include the variable `law`, which is coded `yes` if the state had a shall-issue law in effect that year, and `no` if it did not. Shall-issue laws are laws that give residents of a state the right to carry guns - essentially, these force the state to give residents access to permits as long as the applicant meets the basic requirements set out by the law.

Some people argue that shall-issue laws affect crime because residents can stay strapped or get capped. We are interested in testing whether this is true, using violent crime rates given in `guns$violent`.

#### Read in the dataset as well as the necessary packages


``` r
library(tidyverse)
library(ggpubr)
library(ggthemes)
library(car)
library(corrplot)
guns <- read.csv("law_crime.csv")

There are two separate samples for this test: one from states without these specific gun laws and states with these specific gun laws.

State your null and alternate hypotheses

$H_0$: There is no difference in violent crime rates between states without shall-issue gun laws than states with these gun laws.

$H_1$: The violent crime rates of states with shall-issue gun laws is significantly lower than states without these gun laws.

Test the assumptions:

Independent observations

This is difficult to know just from the dataset alone. However, this dataset does not repeat states (the unit of observation), so we know that each observation is independent of each other.

Normality of the data

Try replacing --- with each group. remember we are interested in the variables law and violent . I will do the first one for you then you do the next one, okies?

qqPlot(x = guns$violent[guns$law == "yes"])

## [1]  4 20

qqPlot(x = guns$violent[guns$law == "no"])

## [1] 17 10

As you can see, the observations are within the error range of the normal distribution. This tells us that both variables are normally distributed. Let’s use another test just in case. I will do the first one and you will do the next by replacing the ---.

shapiro.test(guns$violent[guns$law == "no"])

## 
##  Shapiro-Wilk normality test
## 
## data:  guns$violent[guns$law == "no"]
## W = 0.92884, p-value = 0.1305

shapiro.test(guns$violent[guns$law == "yes"])

## 
##  Shapiro-Wilk normality test
## 
## data:  guns$violent[guns$law == "yes"]
## W = 0.96435, p-value = 0.4183

So we met the requirements.

Equal Variances Test

leveneTest(guns$violent ~ guns$law)

## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  0.2908 0.5922
##       48

With a non-significant p-value at the 0.05 alpha, we can be assured that both groups have equal variance.

Calculate the test using the appropriate version of the test.

test = t.test(guns$violent[guns$law == "yes"], 
       guns$violent[guns$law == "no"], 
       alternative = "less")

test

## 
##  Welch Two Sample t-test
## 
## data:  guns$violent[guns$law == "yes"] and guns$violent[guns$law == "no"]
## t = -1.1308, df = 46.719, p-value = 0.132
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##      -Inf 31.33241
## sample estimates:
## mean of x mean of y 
##  407.9207  472.6476

State your conclusions on the null hypothesis:

With a p-value of 0.868, we fail to reject the null hypothesis.

Scenario 3: ANOVA test

THEY ARE BACK! Yes, you already know which data to load in :D

library(palmerpenguins)
data(penguins)

Let’s say we are interested in understanding whether flipper length differs between penguin species. Since there are three penguin species, we cannot simply do a t-test so we need to perform an ANOVA!

First, we should visualize this somehow. Perhaps a boxplot. I know you know how to do this already so I will let you handle this.

library(ggplot2)
ggplot(penguins, aes(x = species, y = flipper_length_mm)) +
  geom_boxplot() +
  labs(title = "Flipper Length by Penguin Species")

## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

So yeah there are some differences. But are they significant?

Let’s make our hypotheses

$H_0$: There is no difference in the means of flipper lengths between penguin species.

$H_1$: There is A difference in the means of flipper lengths between penguin species.

Okay, we need to be sure we are meeting the basic assumptions of the test

Independence of observations

Since we dealt with these penguins for a long time, we know that each penguin is only recorded once and can only have 1 species name so we are goooood.

Normality

Remember that shapiro.wilks test is only good for up to 50 observations, then it tends to be a little bit biased (only a tiny bit). It would probably be okay for our purposes, but maybe in this case we should do qq (quantile) plots instead.

I also believe that you know how to do this too. Let’s see if you can make three qqplots for each penguin species.

library(car)
qqPlot(penguins$flipper_length_mm[penguins$species == "Adelie"])

## [1] 130  96

qqPlot(penguins$flipper_length_mm[penguins$species == "Chinstrap"])

## [1]  7 48

qqPlot(penguins$flipper_length_mm[penguins$species == "Gentoo"])

## [1] 99 64

Equal variances?

Well let’s plug in that levene test and find out! How about I help you here. Just replace my --- part.

{r} leveneTest(flipper_length_mm ~ species, data = penguins)

If you look at the p-value you will notice that we are in the clear! Let the ANOVA commence! Just don’t forget to replace my --- with something that is appropriate.

anova = aov(flipper_length_mm ~ species, data = penguins)
summary(anova)

##              Df Sum Sq Mean Sq F value Pr(>F)    
## species       2  52473   26237   594.8 <2e-16 ***
## Residuals   339  14953      44                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness

Since we met all of the requirements and can clearly see that 0.0000000000000002 is much less than 0.025, we should reject that null hypothesis.

But what if we wanted to know which groups are significant in this case? Let’s use the Tukey test!

TukeyHSD(anova)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = flipper_length_mm ~ species, data = penguins)
## 
## $species
##                       diff       lwr       upr p adj
## Chinstrap-Adelie  5.869887  3.586583  8.153191     0
## Gentoo-Adelie    27.233349 25.334376 29.132323     0
## Gentoo-Chinstrap 21.363462 19.000841 23.726084     0

Tukey has spoken. All of them are!

Scenario 4: Chi Square test

Anthropologists interested in communication skills between couples in managing household chores have surveyed 1744 couples to learn about who manages certain tasks in the household. The couples were asked about a specific task, and they could answer that in their household, the task is typically done by one partner, typically done by the other, typically done jointly, or that it was done sometimes by one and sometimes by the other (alternating).

The data is available online – I’ve written the code here for you:

housetasks <- read.csv("housetasks.csv")

Visualize the structure or the first few rows of the data frame.

Its important to see what a frequency (contingency) table looks like. These are counts for each column and row, not a data table.

head(housetasks)

##   Partner1 Alternating Partner2 Jointly
## 1      156          14        2       4
## 2      124          20        5       4
## 3       77          11        7      13
## 4       82          36       15       7
## 5       53          11        1      57
## 6       32          24        4      53

We are interested in whether the attribution of tasks to a person varies across the various housetasks. We can use a Chi square test for this. What is the null and alternate hypothesis?

$H_0$: The attribution of tasks does not differ by group.

$H_1$: The attribution of tasks does differ by groups.

Run the Chi squared test, and save the object as chi_test

```{r}chi_test <- chisq.test(housetasks) chi_test — # print it out by replacing the — and writing chi_test


Now, let's take a look at some posthoc analysis to see which groups and tasks were relevant to the chi square test. You should calculate the contribution scores for each group by task. Do that by replacing the `---` with something appropriate.

```{r}contribut_chisq = 100 * chi_test$residuals ^ 2 / chi_test$statistic

corrplot(contribut_chisq, is.corr = FALSE)

Check the assumption that none of the expected counts should be less than 5:

{r}chi_test$expected[chi_test$expected < 6]

Q1.7. Given your test results and your assumption check, conclude on your null hypothesis:

We reject the null hypothesis.

Graded Section

In the following section, you will be presented with three scenarios to test your amazing statistical skills that you have been practicing! Each of them will be focusing on a different type of statistical test we have discussed in this unit.

Got More Bone?

Have you ever wondered what happens to your bones as you habitually work out or perform the same daily tasks? Well, one prominent hypothesis is that your bone reacts to the stress you place on it and,overtime, will produce more bone to reinforce the regions you use more frequently. The actual process is a bit more nuanced and complex depending on the bone we are looking at, the this is the general idea.

Biological anthropologists were interested to know if living an active lifestyle increases your bone density. Therefore, they collected data on two groups of women: 35 who were not very active and 35 who were very active. They calculated the bone density for each person and now want to see if the group means are different.

They hypothesize that women who are more active will have more bone density, on average, than women who are more inactive.

Question1 What type of statistical test should these anthropologists use to help answer their research question?

Set up the hypotheses

Question 2 Based on the information you were presented with, please select on Brightspace which hypotheses make the most sense. Here, fill in the hypotheses.

$H_0$: There is no difference in bone density between active and inactive women.

$H_1$: Active women have higher bone density than inactive women.

Select an alpha value for significance

For this example, let’s choose a 0.05 alpha.

Load the proper libraries and the data

library(tidyverse)
library(ggthemes)
library(ggpubr)
library(BSDA)

## Loading required package: lattice

## 
## Attaching package: 'BSDA'

## The following objects are masked from 'package:carData':
## 
##     Vocab, Wool

## The following object is masked from 'package:datasets':
## 
##     Orange

data(Bones)
# Make sure to run this next code. I didn't like the values so I changed them.
Bones = Bones %>% mutate(group = ifelse(group == "active", "Active", "Inactive"))

Visualize the data somehow

Question3 What type of plot will you use to visualize the distribution of data by group. Select the closest option on Brightspace.

Make your visualization in the code space below. Remember that the facet_wrap function can make two plots and split them up by group.

ggplot(Bones, aes(x = group, y = density)) +
  geom_boxplot(fill = "skyblue") +
  labs(title = "Bone Density by Activity Level", x = "Group", y = "Bone Density") +
  theme_minimal()

Question4 You should accompany this plot by summary statistics. What summary statistics would be relevant based on the plot you chose? Select the best statistic pairs on Brightspace.

In the code below, make a table using the summarise function from the dplyr package that outputs the relevant summary statistics.

 Bones %>%
  group_by(group) %>%
  summarise(
    mean_density = mean(density),
    sd_density = sd(density),
    count = n()
  )

## # A tibble: 2 × 4
##   group    mean_density sd_density count
##   <chr>           <dbl>      <dbl> <int>
## 1 Active           212.       18.7    35
## 2 Inactive         208.       21.4    35

Question5 Based on this initial summary statistics and data visualization, what would you hypothesize about bone density by active and inactive women? Select the most appropriate option in Brightspace.

Test the Assumptions! Independence of Observations

Question6 Does this data, as it has been described to you, meet this requirement? Input TRUE if your answer is yes, and input FALSE if your answer is no on Brightspace.

Test the Assumptions! Normality

In the following code block, conduct your very own normality test!

library(car)
qqPlot(Bones$density[Bones$group == "Active"])

## [1]  5 20

qqPlot(Bones$density[Bones$group == "Inactive"])

## [1] 21  6

Question7 Did your data meet the assumptions for normality? Input TRUE if your answer is yes, and input FALSE if your answer is no on Brightspace.

Test the Assumptions! Equal Variances

In the following code chunk, conduct your very own homogeneity of variances test!

leveneTest(density ~ group, data = Bones)

## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  0.1954 0.6599
##       68

Question8 Did your data meet the assumptions for equal variances? Input TRUE if your answer is yes, and input FALSE if your answer is no on Brightspace.

Conduct the test!

Question9 Based on all of this information, what test will you conduct if you were to meet all of the assumptions?

In the following code chunk, conduct the amazing t.test function. Don’t forget the argument var.equal = and indicate if that is true or false. Additionally, make sure to change the argument alternative = to one of these options

“lesser” - lower tail test

“greater” - upper tail test

“two-sided” - non-directional test

t.test(density ~ group, data = Bones, alternative = "greater", var.equal = TRUE )

## 
##  Two Sample t-test
## 
## data:  density by group
## t = 0.83732, df = 68, p-value = 0.2027
## alternative hypothesis: true difference in means between group Active and group Inactive is greater than 0
## 95 percent confidence interval:
##  -3.99458      Inf
## sample estimates:
##   mean in group Active mean in group Inactive 
##               211.6286               207.6000

Question10 How would you conclude based on the results of this statistical test. Select the best option on Brightspace.

Are the Neolithic Pottery Types the Same or Different? Example: Neolithic Pottery Shape from Denmark

$Photo of a Neolithic ceramic vessel I saw at a Museum in Denmark this September.$

Photo of a Neolithic ceramic vessel I saw at a Museum in Denmark this September.

Archaeologists could visually assess that beakers, bowls, and flasks were all different than each other (on average) at a Neolithic site in Denmark. However, they knew they needed to test this hypothesis first.

To assess this, they took several 2D coordinate points to draw the shape outline of these ceramic vessels.

To conduct their test, they took the center point of all 2D points called the geometric mean which is more of a mean measurement of all of the variables within a dataset by summing all of the values, taking the log of that, and then taking the exponent of that value.

Are the Neolithic Pottery Types the Same or Different?

$H_0$: The geometric mean of the ceramic vessel types have the same mean.

$H_1$: The geometric mean of the ceramic vessel types do not have the same mean.

Load in the necessary packages and the Data

library(archdata)
library(tidyverse)
library(ggpubr)
library(ggthemes)
library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

data(TRBPottery)

We will need to modify this dataset so that it has the proper variables

geometric_mean <- function(x, zero_replace = 1e-10) { 
  x[x == 0] <- zero_replace 
  # Replace zero values with a small positive number 
  return(exp(mean(log(x)))) 
} 

Pottery = data.frame(Type = TRBPottery$Form, 
                     GM = apply(TRBPottery[,-1],1, geometric_mean))

Let’s Outline the Relationships Between Groups

Does our data meet the necessary assumptions?

Test the Assumptions! Independence of Observations

Question11 Does this data, as it has been described to you, meet this requirement? Input TRUE if your answer is yes, and input FALSE if your answer is no on Brightspace.

Test the Assumptions! Normality

In the following code block, conduct both a shapiro.wilks test and a qqPlot test. You will see why….

Question12 Did your data meet the assumptions for normality? Input TRUE if your answer is yes, and input FALSE if your answer is no on Brightspace.

Test the Assumptions! Equal Variances

In the following code chunk, conduct your very own homogeneity of variances test!

Pottery %>% 
  levene_test(GM ~ Type)

## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     2   115      1.37 0.258

Question13 Did your data meet the assumptions for equal variances? Input TRUE if your answer is yes, and input FALSE if your answer is no on Brightspace.

Conduct the test!

Question14 Based on all of this information, what test will you conduct? Select the best option on Brightspace.

Note: Despite one of the groups having a p-value of 0.04846, the qqPlot shows that it is due to 1 single individual. If this happened while doing a data analysis, you might select to remove the individual. However, another option is to do two tests and compare the results: a regular one and one that can handle non-parametric (not normally distributed) groups. If both tests show the same result, then you have good evidence to show that your conclusions were not affected by the violation of this assumption.

We are conducting two ANOVA tests:

First, let’s conduct our typical anova test and view the results.

# Your first test
anova_result <- aov(GM ~ Type, data = Pottery)
summary(anova_result)

##              Df Sum Sq Mean Sq F value Pr(>F)
## Type          2   24.4   12.18   1.235  0.295
## Residuals   115 1133.9    9.86

Second, let’s compare that to an ANOVA that can handle non-normal data: the Kruskal-Wallis test.

# your second test
kruskal.test(GM ~ Type, data = Pottery)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  GM by Type
## Kruskal-Wallis chi-squared = 2.7143, df = 2, p-value = 0.2574

Question15 How would you describe the results of both of these tests? Select the best option on Brightspace.

What Do You Eat?

Primatologists Lim and colleagues (2021) published a large study on primate ecology and diet and shared their datasets, one of them being the primate_diet dataset. One of them gives us information on primate species such as apes (gorillas, us), monkeys (macaques), and much more. Additionally, they provided data on what types of plants they ate.

Primatologists want to focus in on one family, the ape family, and test whether or not different apes species have the same frequency of plant types that they consume (i.e., fruits, leaves, stems, nuts). They hypothesize that different ape species do have different frequencies of consuming various plant types.

Their Hypotheses

$H_0$: The frequencies of the parts of plants consumed is not different across ape species.

$H_1$: The frequencies of the parts of plants consumed is not different across ape species.

To answer this question, we will use a subset of their dataset called the primate_diet dataset. Let’s load it in as well as the packages we will use:

library(tidyverse)
library(ggthemes)
library(ggpubr)
library(corrplot)


primate_diet = read_csv("primate_diet.csv")

## Rows: 8981 Columns: 23
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (19): SpecName, PrimateSpecies, Food, PlantFamily, PlantGenus, PlantSpec...
## dbl  (4): RecordID, Study_ID, Percentage_Paper, Percentages_Recalculated
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Let’s take a look at our dataset. I believe you already know how to examine a dataset :D

As you can see, there are a lot of columns that aren’t of much use to us for this analysis. Let’s trim this dataset. We need to know our criteria first:

only ape species: Pan_troglodytes (chimpanzees), Gorilla_beringei (Gorillas), Pongo_pygmaeus (Orangutans),
only relevant variables: species name PrimateSpecies and plant type PartPlant. There is a value called “Unknown” in PartPlant. We really shouldn’t include something that is…unknown because its……unknown.

Filter the dataset so that only the above named ape species are included in a new dataset called ape_diet only includes the relevant variables for this analysis.

```{r} ape_diet <- primate_diet %>% filter(
PrimateSpecies %in% c(“Pan_troglodytes”, “Gorilla_beringei”, “Pongo_pygmaeus”), PartPlant != “Unknown” ) %>% select(PrimateSpecies, PartPlant)

Did you know we could visualize the frequencies of categorical data with a Bar Plot? We only need to change the y-axis to represent frequency counts. Here is how we can do it:

{r}ggplot(ape_diet, aes(x = PrimateSpecies, fill = PartPlant)) + geom_bar(position = "dodge") + labs(x = "Ape Species", y = "Frequency") + ggpubr::theme_pubclean()

Question16 Given that the y-axis of this plot indicates the frequency, what summary statistic does the tallest bar for each species represent? Select the best answer in Brightspace.

Now, transform the trimmed ape species dataset into a frequency table. Call it ape_freq. Make this in the following code chunk:

{r}apape_freq <- ape_diet %>% count(PrimateSpecies, PartPlant) %>% pivot_wider(names_from = PartPlant, values_from = n, values_fill = 0)

Question17 How many columns are in your frequency table? Input your answer on Brightspace.

Question18 Based on the variable types in our hypothesis test, what kind of test should we carry out? Select the best option on Brightspace.

Conduct the test

Conduct the necessary test in the following code block:

```{r}test <- chisq.test(table(ape_diet$PrimateSpecies, ape_diet$PartPlant))

Note: did you notice that when you ran the test that it told you that the approximation may be incorrect? Can you figure out why that might be?

Did your test meet both assumptions?

I will give you the first assumption and inform you that it did meet this assumption of independence of observations.

Another major assumption is that each cell in the expected frequencies are greater than 5. Check this assumption in the following code chunk:

Question19 If there were any, which assumptions were violated in this test? Select all that apply on Brightspace.

Conduct the PostHoc Test of Contribution Scores

Let’s examine which variables in PlantPart was most important in the Chi Square statistic. Replace the --- with the appropriate code.

{r}contribution = 100 * test$residuals^2/test$statistic corrplot::corrplot(contribution, is.cor = FALSE)

Question20 Which variable value had the highest contribution to the Chi Square Statistic? Select the most appropriate option on Brightspace.

Extra Credit Knit your Course Engagement and Submit it to Brightspace! It will be saved as a .html this time because CE 2 had a weird issue with PDF.

Course Engagement 3

Elodie Agath

2024-07-20

Practice Section:

Scenario 1: Paired Samples t-test

First Load Some Libraries then your data. Remember you need to reload these packages every time you open Posit Cloud!

Now load in the dataset!

Do some exploration: Calculate the sample mean and standard deviation for the students’ aura before and after yippie

State your null and alternate hypotheses

Q1.4. Calculate your critical test values. Use an alpha level of 0.05

Plot your critical region by modifying the plotting code given below.

Plot the test statistic against your critical region Then, add a vertical line representing your test statistic.

State your null and alternate hypotheses

Test the assumptions:

Calculate the test using the appropriate version of the test.

State your conclusions on the null hypothesis:

Scenario 3: ANOVA test

Scenario 4: Chi Square test

Visualize the structure or the first few rows of the data frame.

We are interested in whether the attribution of tasks to a person varies across the various housetasks. We can use a Chi square test for this. What is the null and alternate hypothesis?

Run the Chi squared test, and save the object as chi_test

Check the assumption that none of the expected counts should be less than 5:

Q1.7. Given your test results and your assumption check, conclude on your null hypothesis:

Graded Section

Got More Bone?

Set up the hypotheses

Select an alpha value for significance

Load the proper libraries and the data

Visualize the data somehow

Test the Assumptions! Independence of Observations

Test the Assumptions! Normality

Test the Assumptions! Equal Variances

Conduct the test!

Are the Neolithic Pottery Types the Same or Different? Example: Neolithic Pottery Shape from Denmark

Load in the necessary packages and the Data

We will need to modify this dataset so that it has the proper variables

Does our data meet the necessary assumptions?

Test the Assumptions! Independence of Observations

Test the Assumptions! Normality

Test the Assumptions! Equal Variances

Conduct the test!

We are conducting two ANOVA tests:

What Do You Eat?

Their Hypotheses

Conduct the test

Did your test meet both assumptions?

Conduct the PostHoc Test of Contribution Scores

Congrats!! You completed yet ANOTHER CE! :D