Total: 40 points

Instructions:
1. Rename this file by replacing “LASTNAME” with your last name. This can be done via the RStudio menu (File >> Rename).
2. Write your full name in the chunk above beside author:.
3. Before beginning, it is good practice to create a directory that contains your R scripts as well as any data you will need, and to set it as your working directory. The working directory can be set in the console directly with the setwd() function or via the RStudio menu (Session >> Set Working Directory).
4. Write R code to answer the questions below. The code should be written within the chunks provided for each question. These chunks begin with three back ticks and the letter r in curly brackets (```{r}) and end with three back ticks. You can add as much space as you need within the chunks but do not delete the back ticks or otherwise modify the chunks in any way or the file will cause errors when compiled.
5. When you have answered all of the questions, click the Knit button. This will create an HTML file in your working directory.
6. Upload the HTML file to Moodle.

Data description:
The stay-abroad.csv data set contains fictional data from British students studying French at university. A subset of the British students participated in a year abroad in France during their studies. Specifically, they could go to France for a year to do an internship at a French company, an Erasmus exchange at a French university, or a language assistantship, in which they would teach English to French high school students. In addition to the stay abroad, the British students could also choose to participate in a book club, where they would read and discuss French books. At the end of their studies, each student wrote one short argumentative essay and took a vocabulary test. The variables are described below:

  • ID: anonymous ID corresponding to the student who wrote the argumentative essay and took the vocabulary test
  • ABROAD: whether the student participated in a stay abroad in France
  • BOOKCLUB: whether the student participated in a voluntary French book club
  • PLACEMENT: the type of placement the student completed during their year abroad (internship, Erasmus exchange, or language assistantship)
  • LEX.SOPH: a measure of lexical sophistication
  • ECON.VOCAB: the sophistication of “economics-related” vocabulary in the text
  • VOCAB.TEST: the score on the vocabulary test

Hint:
While completing the exam, it may be helpful to keep the following questions in mind:

  • What kinds of variables are involved in your hypothesis (integer, ordinal, categorical, etc.) and how many?
  • Are data points in your data related such that you can associate them with each other in a meaningful way?
  • What is the statistic of the dependent variable in the statistical hypothesis?
  • What does the distribution of the data of your test statistic look like?
  • How big are the samples you collected?
  • What assumptions must be met before running a particular statistical test?
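
Several of the questions above can be explored directly in R. The following is a minimal sketch on a small made-up data frame (the object toy and its values are hypothetical and are not taken from stay-abroad.csv). It shows how one might inspect variable types, group sizes, and the normality assumption before choosing a test.

```r
# Hypothetical toy data: NOT the exam data, values are invented for illustration
set.seed(1)
toy <- data.frame(
  GROUP = factor(rep(c("no", "yes"), each = 20)),
  SCORE = c(rnorm(20, mean = 50, sd = 10), rnorm(20, mean = 55, sd = 10))
)

# What kinds of variables are involved, and how many?
str(toy)

# How big are the samples?
group_sizes <- table(toy$GROUP)
print(group_sizes)

# What does the distribution look like in each group?
# shapiro.test() tests the null hypothesis that the data are normally distributed
normality <- tapply(toy$SCORE, toy$GROUP, shapiro.test)
print(normality)
```

A non-significant Shapiro-Wilk result in both groups would be consistent with using a parametric test; otherwise a rank-based alternative may be the safer choice.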

1 General Questions (Total: 5 points)

1.1 Load any packages you will need and then load the data set (stay-abroad.csv) into a dataframe called “sa”. 1 point

setwd("~/Desktop/Statistics Exam")
library(readxl)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ lubridate 1.9.3     ✔ stringr   1.5.0
## ✔ purrr     1.0.2     ✔ tibble    3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:purrr':
## 
##     some
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
sa <- read.csv("stay-abroad.csv", header = TRUE, stringsAsFactors = TRUE)
# attach() lets later chunks refer to columns by bare name; sa$COLUMN is the safer alternative
attach(sa)

1.4 Which students received a score greater than 98 on the vocabulary test? 1 point

# "Greater than 98" excludes scores of exactly 98, so use > rather than >=
higher_scorev <- filter(sa, VOCAB.TEST > 98)
higher_scorev

2 You want to test whether learners who participated in a stay abroad (ABROAD = yes) were also more likely to participate in the book club (BOOKCLUB = yes) as compared to learners who did not participate in the stay abroad (ABROAD = no). (Total: 7 points)

2.1 Formulate hypotheses. 1 point

Answer:
H0: Students who participated in the stay abroad were as likely to join the book club as students who did not participate in the stay abroad (ABROAD and BOOKCLUB are independent).
H1: Students who participated in the stay abroad were more likely to join the book club than students who did not participate in the stay abroad.

2.2 Summarize the data numerically and represent the data graphically. 2 points

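# Cross-tabulate stay abroad (rows) against book club (columns)
sa_abroad_book <- table(ABROAD = sa$ABROAD, BOOKCLUB = sa$BOOKCLUB)
sa_abroad_book
# Row proportions: share of book club members within each ABROAD group
prop.table(sa_abroad_book, margin = 1)
mosaicplot(sa_abroad_book, main = "Book club participation by stay abroad")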

2.3 Test your hypothesis with analytical statistics and calculate an effect size if the test is significant. 2 points
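
One way to approach this question is a chi-squared test of independence on the ABROAD by BOOKCLUB contingency table, with phi (equivalent to Cramér's V for a 2 x 2 table) as the effect size. The sketch below runs on invented counts (the matrix toy_counts is hypothetical); with the exam data the table would instead be built from the sa dataframe. Note that chisq.test() is non-directional, so the direction of any association has to be read from the row proportions. Switching off the continuity correction is a judgment call made here so that phi follows directly from the reported statistic.

```r
# Hypothetical 2 x 2 counts: NOT the exam data
toy_counts <- matrix(
  c(30, 20,   # ABROAD = no:  BOOKCLUB = no, BOOKCLUB = yes
    15, 35),  # ABROAD = yes: BOOKCLUB = no, BOOKCLUB = yes
  nrow = 2, byrow = TRUE,
  dimnames = list(ABROAD = c("no", "yes"), BOOKCLUB = c("no", "yes"))
)

# Chi-squared test of independence (correct = FALSE disables the Yates correction)
chi_res <- chisq.test(toy_counts, correct = FALSE)
print(chi_res)

# Assumption check: expected frequencies should all be at least 5
print(chi_res$expected)

# Effect size: phi = sqrt(chi-squared / n)
phi <- sqrt(unname(chi_res$statistic) / sum(toy_counts))
print(phi)

# Row proportions show the direction of the association
print(prop.table(toy_counts, margin = 1))
```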

2.4 Summarize the result(s). 2 points

Answer:

3 A literature review indicates that reading has a positive impact on vocabulary. You therefore want to test whether students who participated in the book club (BOOKCLUB = “yes”) used a more sophisticated vocabulary (LEX.SOPH) as compared to students who did not participate in the book club (BOOKCLUB = “no”). (Total: 10 points)

3.1 Formulate hypotheses. 1 point

Answer:
H0: Students who participated in the book club do not use more sophisticated vocabulary than students who did not participate in the book club.
H1: Students who participated in the book club use more sophisticated vocabulary than students who did not participate in the book club.

3.2 Is your alternative hypothesis one-tailed or two-tailed? Explain. 1 point

Answer: The alternative hypothesis is one-tailed because it predicts a specific direction: students who took part in the book club are expected to show higher lexical sophistication than students who did not.
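
The practical consequence of this choice can be illustrated in R, where the alternative argument of t.test() sets the direction of the test. The sketch below uses invented scores (toy_yes and toy_no are hypothetical vectors, not the exam data). It shows that when the observed difference lies in the predicted direction, the one-tailed p-value is half the two-tailed one.

```r
# Hypothetical scores: NOT the exam data
toy_yes <- c(55, 58, 54, 57, 56, 59, 53, 56)
toy_no  <- c(48, 52, 50, 47, 53, 49, 51, 50)

# Two-tailed: H1 states that the two means differ in either direction
two_tailed <- t.test(toy_yes, toy_no, alternative = "two.sided")

# One-tailed: H1 states that the mean of toy_yes is greater than the mean of toy_no
one_tailed <- t.test(toy_yes, toy_no, alternative = "greater")

print(two_tailed$p.value)
print(one_tailed$p.value)
```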

3.3 Are the samples dependent or independent? Explain. 1 point

Answer: The samples are independent. Each student wrote one essay and belongs to only one group (book club or no book club), so no observation in one group is paired with an observation in the other. Attending the same university does not make the observations dependent.

3.4 Calculate descriptive statistics and represent the data graphically. 2 points

# LEX.SOPH is numeric, so summarize it per group rather than cross-tabulating it
sa %>%
  group_by(BOOKCLUB) %>%
  summarise(n = n(), mean = mean(LEX.SOPH), sd = sd(LEX.SOPH), median = median(LEX.SOPH))
# Boxplot of lexical sophistication by book club participation
boxplot(LEX.SOPH ~ BOOKCLUB, data = sa, ylab = "LEX.SOPH")

3.5 Test your hypothesis with analytical statistics and calculate an effect size if the test is significant. 3 points

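# LEX.SOPH is the numeric response and BOOKCLUB the grouping factor, not the reverse
# Check normality within each group before choosing a test
tapply(sa$LEX.SOPH, sa$BOOKCLUB, shapiro.test)
# Check homogeneity of variances (leveneTest() comes from the car package)
leveneTest(LEX.SOPH ~ BOOKCLUB, data = sa)
# One-tailed t-test, assuming the levels are ordered "no", "yes": H1 (yes > no) means no - yes < 0
# If normality is violated, wilcox.test() with the same formula and alternative is an option
t.test(LEX.SOPH ~ BOOKCLUB, data = sa, alternative = "less")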
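
If a two-group comparison of means turns out to be significant, an effect size such as Cohen's d can be reported alongside it. Base R has no built-in function for it, so the sketch below computes it by hand with a pooled standard deviation. The helper cohens_d() and the toy vectors are hypothetical names introduced for illustration and do not use the exam data.

```r
# Hypothetical helper: Cohen's d with a pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Hypothetical scores: NOT the exam data
toy_yes <- c(55, 58, 54, 57, 56, 59, 53, 56)
toy_no  <- c(48, 52, 50, 47, 53, 49, 51, 50)

d <- cohens_d(toy_yes, toy_no)
print(d)
```

By Cohen's conventional benchmarks, values around 0.2, 0.5, and 0.8 are read as small, medium, and large effects respectively.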

3.6 Summarize the result(s). 2 points

Answer: