Total: 40 points
Instructions:
1. Rename this file by replacing “LASTNAME” with your last name. This
can be done via the RStudio menu (File >> Rename).
2. Write your full name in the chunk above beside
author:.
3. Before beginning, it is good practice to create a directory that
contains your R scripts as well as any data you will need, and to set
it as your working directory. The working directory can be set in the
console directly with the setwd() function or
via the RStudio menu (Session >> Set Working Directory).
4. Write R code to answer the questions below. The code should be
written within the chunks provided for each question. These chunks begin
with three back ticks and the letter r in curly brackets
(```{r}) and end with three back ticks. You can add as much
space as you need within the chunks but do not delete the back ticks or
otherwise modify the chunks in any way or the file will cause errors
when compiled.
5. When you have answered all of the questions, click the
Knit button. This will create an HTML file in your working
directory.
6. Upload the HTML file to Moodle.
Data description:
The stay-abroad.csv data set contains fictional data from
British students studying French at university. A subset of the British
students participated in a year abroad to France during their studies.
Specifically, they could go to France for a year to do either an
internship at a French company, an Erasmus exchange at a French
university, or a language assistantship, where they would teach English
to French high school students. In addition to the stay abroad, the
British students could also choose to participate in a book club, where
they would read and discuss French books. At the end of their studies,
each student wrote one short argumentative essay and took a vocabulary
test. The variables are described below:
ID: anonymous ID corresponding to the student who wrote the argumentative essay and took the vocabulary test
ABROAD: whether the student participated in a stay abroad in France
BOOKCLUB: whether the student participated in a voluntary French book club
PLACEMENT: the type of stay of the student in their year abroad
LEX.SOPH: a measure of lexical sophistication
ECON.VOCAB: the sophistication of “economics-related” vocabulary in the text
VOCAB.TEST: the score on the vocabulary test
Hint: While completing the exam, it may be helpful to keep the following
questions in mind:
stay-abroad.csv) into
a dataframe called “sa”. 1 point
setwd("~/Desktop/Statistics Exam")
library(readxl)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ lubridate 1.9.3 ✔ stringr 1.5.0
## ✔ purrr 1.0.2 ✔ tibble 3.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:purrr':
##
## some
##
## The following object is masked from 'package:dplyr':
##
## recode
sa <- read.csv("stay-abroad.csv", header = TRUE, stringsAsFactors = TRUE)
attach(sa)
PLACEMENT? 1 point
str(sa)
## 'data.frame': 169 obs. of 7 variables:
## $ ID : Factor w/ 169 levels "S101","S102",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ ABROAD : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
## $ BOOKCLUB : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ PLACEMENT : Factor w/ 3 levels "business","erasmus",..: NA 2 1 2 NA 3 1 NA 3 1 ...
## $ LEX.SOPH : num 0.714 2.488 0.97 1.717 1.123 ...
## $ ECON.VOCAB: num 245 274 215 325 245 ...
## $ VOCAB.TEST: num 87.7 86 80.6 67.6 44.3 ...
Answer: PLACEMENT is a categorical (nominal) variable. R stores it as a factor with three levels (business, erasmus, lang_assist).
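The type of a single column can also be checked directly, without reading the full str() output. The following is only a minimal sketch: the data frame toy and its values are invented for illustration and are not taken from stay-abroad.csv.

```r
# Hypothetical miniature data set; the values are invented for illustration.
toy <- data.frame(
  ID = c("S1", "S2", "S3", "S4", "S5"),
  PLACEMENT = factor(c(NA, "erasmus", "business", "erasmus", "lang_assist"))
)

placement_class  <- class(toy$PLACEMENT)       # storage class of the column
placement_levels <- levels(toy$PLACEMENT)      # distinct categories (NA is not a level)
placement_na     <- sum(is.na(toy$PLACEMENT))  # number of missing values

print(placement_class)
print(placement_levels)
print(placement_na)
```

A column of class "factor" with a small set of unordered levels is what R uses for a categorical variable; the NA count shows how many rows have no placement recorded.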
summary(sa)
##        ID      ABROAD   BOOKCLUB        PLACEMENT      LEX.SOPH
##  S101   :  1   no :91   no :141   business   :24   Min.   :0.7144
##  S102   :  1   yes:78   yes: 28   erasmus    :27   1st Qu.:1.8628
##  S103   :  1                      lang_assist:27   Median :2.6117
##  S104   :  1                      NA's       :91   Mean   :2.5689
##  S105   :  1                                       3rd Qu.:3.2213
##  S106   :  1                                       Max.   :4.8240
##  (Other):163
##    ECON.VOCAB      VOCAB.TEST
##  Min.   :159.4   Min.   :25.61
##  1st Qu.:245.5   1st Qu.:59.07
##  Median :275.2   Median :71.33
##  Mean   :276.4   Mean   :70.06
##  3rd Qu.:303.9   3rd Qu.:81.32
##  Max.   :420.8   Max.   :99.54
##
Answer: According to the summary, 28 of the 169 students participated in the book club.
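The count for a single factor can also be obtained without printing the whole summary. This is a small sketch on invented data; the data frame toy is hypothetical and only mirrors the BOOKCLUB column name.

```r
# Hypothetical data; the values are invented for illustration.
toy <- data.frame(BOOKCLUB = factor(c("no", "no", "yes", "no", "yes", "no")))

bookclub_counts <- table(toy$BOOKCLUB)      # frequency of each level
n_bookclub      <- sum(toy$BOOKCLUB == "yes")  # participants only

print(bookclub_counts)
print(n_bookclub)
```

Both approaches give the same number; table() is convenient when all levels are of interest, while sum() on a logical condition returns a single count.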
higher_scorev <- filter(sa, VOCAB.TEST > 98)  # strictly greater than 98
View(higher_scorev)
print(higher_scorev)
## ID ABROAD BOOKCLUB PLACEMENT LEX.SOPH ECON.VOCAB VOCAB.TEST
## 1 S111 yes no lang_assist 1.643452 303.9004 99.24295
## 2 S122 yes no lang_assist 1.723281 275.1969 99.28448
## 3 S222 no no <NA> 3.214286 253.2977 99.36443
## 4 S230 no no <NA> 3.158168 219.6031 99.54392
Answer: The filtered data show that, of the four students who obtained a score greater than 98, only two (S222 and S230) did not participate in the stay abroad.
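The same count can be obtained in one step by combining both conditions instead of reading the rows off the printed table. The sketch below uses base R subsetting on invented data; the data frame toy and its values are hypothetical.

```r
# Hypothetical data; the values are invented for illustration.
toy <- data.frame(
  ID         = c("S1", "S2", "S3", "S4", "S5"),
  ABROAD     = factor(c("yes", "yes", "no", "no", "no")),
  VOCAB.TEST = c(99.2, 85.0, 99.4, 98.0, 70.1)
)

# Keep rows that satisfy both conditions: score above 98 AND no stay abroad.
high_not_abroad <- toy[toy$VOCAB.TEST > 98 & toy$ABROAD == "no", ]
n_high_not_abroad <- nrow(high_not_abroad)

print(high_not_abroad)
print(n_high_not_abroad)
```

Note that a score of exactly 98 is excluded by the strict comparison >; whether >= or > is appropriate depends on how the question is worded.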
Answer:
H0: Students who did not participate in the stay abroad were as likely to join the book club as their counterparts who participated in the stay abroad.
H1: Students who did not participate in the stay abroad were not as likely to join the book club as their counterparts who participated in the stay abroad.
sa_abroad_book <- table(ABROAD,BOOKCLUB)
View(sa_abroad_book)
plot(sa_abroad_book)
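A contingency table of two categorical variables is commonly followed by a chi-squared test of independence. The sketch below is an assumption about how such a test could be run, not the exam's prescribed solution; the cell counts are invented for illustration and are not the counts in stay-abroad.csv.

```r
# Hypothetical 2 x 2 table of counts; the numbers are invented for illustration.
counts <- matrix(
  c(40, 10,
    30, 20),
  nrow = 2, byrow = TRUE,
  dimnames = list(ABROAD = c("no", "yes"), BOOKCLUB = c("no", "yes"))
)

# Chi-squared test of independence (Yates' continuity correction is applied
# by default for 2 x 2 tables).
test <- chisq.test(counts)

print(test$expected)  # expected counts under independence
print(test$p.value)   # p-value of the test
```

The expected counts should be checked before interpreting the p-value; when they are small, the chi-squared approximation may be unreliable and fisher.test() is a common alternative.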
Answer:
Answer:
H0: Students who participated in the book club do not use more sophisticated vocabulary than students who did not participate in the book club.
H1: Students who participated in the book club use more sophisticated vocabulary than students who did not participate in the book club.
Answer: The alternative hypothesis is one-tailed because a higher level of sophistication is expected from the students who were part of the book club.
Answer: The samples are independent because they come from different students with different learning experiences. Each student appears in only one group, so even if the students belong to the same university, the two groups are considered independent of each other.
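A one-tailed comparison of a numeric outcome between two independent groups, as described above, can be sketched with a two-sample t-test. This is only an illustration under stated assumptions: the vectors club_yes and club_no are invented values, not the LEX.SOPH scores from the data set, and the t-test assumes the scores are approximately normally distributed in each group (a rank-based test such as wilcox.test() is an alternative if that is doubtful).

```r
# Hypothetical lexical sophistication scores; the values are invented for illustration.
club_yes <- c(3.1, 2.9, 3.4, 3.8, 3.0, 3.3)  # book club participants
club_no  <- c(2.4, 2.6, 2.1, 2.9, 2.5, 2.2)  # non-participants

# One-tailed Welch t-test for independent samples.
# alternative = "greater" tests H1: mean(club_yes) > mean(club_no).
result <- t.test(club_yes, club_no, alternative = "greater")

print(result$statistic)
print(result$p.value)
```

Passing the two groups as separate vectors makes the direction of the one-tailed test explicit; with the formula interface, the direction depends on the order of the factor levels.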
BookVSVoc <- table(BOOKCLUB, LEX.SOPH)
summary(BookVSVoc)
## Number of cases in table: 169
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 169, df = 164, p-value = 0.3783
## Chi-squared approximation may be incorrect
# The numeric outcome goes on the left of the formula and the grouping factor on the right
NSL.aov <- aov(LEX.SOPH ~ BOOKCLUB, data = sa)
Answer: