Total: 40 points

Instructions:
1. Rename this file by replacing “LASTNAME” with your last name. This can be done via the RStudio menu (File >> Rename).
2. Write your full name in the chunk above beside author:.
3. Before beginning, it is good practice to create a directory that contains your R scripts as well as any data you will need, and to set it as your working directory. The working directory can be set in the console directly with the setwd() function or via the RStudio menu (Session >> Set Working Directory).
4. Write R code to answer the questions below. The code should be written within the chunks provided for each question. These chunks begin with three back ticks and the letter r in curly brackets (```{r}) and end with three back ticks. You can add as much space as you need within the chunks but do not delete the back ticks or otherwise modify the chunks in any way or the file will cause errors when compiled.
5. When you have answered all of the questions, click the Knit button. This will create an HTML file in your working directory.
6. Upload the HTML file to Moodle.

Data description:
The stay-abroad.csv data set contains fictional data from British students studying French at university. A subset of the British students participated in a year abroad in France during their studies. Specifically, they could go to France for a year to do either an internship at a French company, an Erasmus exchange at a French university, or a language assistantship, where they would teach English to French high school students. In addition to the stay abroad, the British students could also choose to participate in a book club, where they would read and discuss French books. At the end of their studies, each student wrote one short argumentative essay and took a vocabulary test. The variables are described below:

ID: anonymous ID corresponding to the student who wrote the argumentative essay and took the vocabulary test
ABROAD: whether the student participated in a stay abroad in France
BOOKCLUB: whether the student participated in a voluntary French book club
PLACEMENT: the type of placement the student completed during their year abroad
LEX.SOPH: a measure of lexical sophistication
ECON.VOCAB: the sophistication of “economics-related” vocabulary in the text
VOCAB.TEST: the score on the vocabulary test

Hint:
While completing the exam, it may be helpful to keep the following questions in mind:

What kinds of variables are involved in your hypothesis (integer, ordinal, categorical etc.) and how many?
Are data points in your data related such that you can associate them to each other in a meaningful way?
What is the statistic of the dependent variable in the statistical hypothesis?
What does the distribution of the data of your test statistic look like?
How big are the samples you collected?
What assumptions must be met before running a particular statistical test?

1 General Questions (Total: 5 points)

1.1 Load any packages you will need and then load the data set (`stay-abroad.csv`) into a dataframe called “sa”. 1 point

setwd("~/Desktop/Statistics Exam")
library(readxl)
library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ lubridate 1.9.3     ✔ stringr   1.5.0
## ✔ purrr     1.0.2     ✔ tibble    3.2.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(car)

## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:purrr':
## 
##     some
## 
## The following object is masked from 'package:dplyr':
## 
##     recode

sa <- read.csv("stay-abroad.csv", header = TRUE, stringsAsFactors = TRUE)
attach(sa)

1.2 Print an overview of the data frame to check if all variables have been imported properly. What type of variable is `PLACEMENT`? 1 point

str(sa)

## 'data.frame':    169 obs. of  7 variables:
##  $ ID        : Factor w/ 169 levels "S101","S102",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ ABROAD    : Factor w/ 2 levels "no","yes": 1 2 2 2 1 2 2 1 2 2 ...
##  $ BOOKCLUB  : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ PLACEMENT : Factor w/ 3 levels "business","erasmus",..: NA 2 1 2 NA 3 1 NA 3 1 ...
##  $ LEX.SOPH  : num  0.714 2.488 0.97 1.717 1.123 ...
##  $ ECON.VOCAB: num  245 274 215 325 245 ...
##  $ VOCAB.TEST: num  87.7 86 80.6 67.6 44.3 ...

Answer: `PLACEMENT` is a categorical (nominal) variable. R stores it as a factor with three levels (business, erasmus, lang_assist).

1.3 Print a summary of the data frame. How many students participated in the book club? 1 point

summary(sa)

##        ID      ABROAD   BOOKCLUB        PLACEMENT     LEX.SOPH     
##  S101   :  1   no :91   no :141   business   :24   Min.   :0.7144  
##  S102   :  1   yes:78   yes: 28   erasmus    :27   1st Qu.:1.8628  
##  S103   :  1                      lang_assist:27   Median :2.6117  
##  S104   :  1                      NA's       :91   Mean   :2.5689  
##  S105   :  1                                       3rd Qu.:3.2213  
##  S106   :  1                                       Max.   :4.8240  
##  (Other):163                                                       
##    ECON.VOCAB      VOCAB.TEST   
##  Min.   :159.4   Min.   :25.61  
##  1st Qu.:245.5   1st Qu.:59.07  
##  Median :275.2   Median :71.33  
##  Mean   :276.4   Mean   :70.06  
##  3rd Qu.:303.9   3rd Qu.:81.32  
##  Max.   :420.8   Max.   :99.54  
##

Answer: According to the summary, 28 students participated in the book club.

1.4 Which students received a score greater than 98 on the vocabulary test? 1 point

# Keep only students who scored strictly above 98 on the vocabulary test
higher_scorev <- filter(sa, VOCAB.TEST > 98)
View(higher_scorev)

1.5 Print all the data for those students only. Which of those students did not participate in the stay abroad? 1 point

print(higher_scorev)

##     ID ABROAD BOOKCLUB   PLACEMENT LEX.SOPH ECON.VOCAB VOCAB.TEST
## 1 S111    yes       no lang_assist 1.643452   303.9004   99.24295
## 2 S122    yes       no lang_assist 1.723281   275.1969   99.28448
## 3 S222     no       no        <NA> 3.214286   253.2977   99.36443
## 4 S230     no       no        <NA> 3.158168   219.6031   99.54392

Answer: According to the filtered data, two of the students who obtained a score greater than 98 (S222 and S230) did not participate in the stay abroad.

2 You want to test whether learners who participated in a stay abroad (ABROAD = yes) were also more likely to participate in the book club (BOOKCLUB = yes) as compared to learners who did not participate in the stay abroad (ABROAD = no). (Total: 7 points)

2.1 Formulate hypotheses. 1 point

Answer:
H0: Students who participated in the stay abroad were as likely to join the book club as students who did not participate in the stay abroad.
H1: Students who participated in the stay abroad were more likely to join the book club than students who did not participate in the stay abroad.

2.2 Summarize the data numerically and represent the data graphically. 2 points

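# Cross-tabulate stay abroad against book club participation
sa_abroad_book <- table(ABROAD, BOOKCLUB)
View(sa_abroad_book)
# Mosaic plot of the contingency table
plot(sa_abroad_book)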

2.3 Test your hypothesis with analytical statistics and calculate an effect size if the test is significant. 2 points
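
No code was submitted for this question. As a minimal sketch of one common approach for two categorical variables, the block below runs a chi-squared test of independence and computes phi as the effect size. The counts in `toy_counts` are invented for illustration only and are not the exam data; the names `toy_counts`, `toy_test` and `phi` are likewise hypothetical. With the real data, the table built in 2.2 would take the place of `toy_counts`. Note that the chi-squared test itself is non-directional, so the direction of any association still has to be read from the table.

```r
# Hypothetical 2x2 counts, invented for illustration only (NOT the exam data)
toy_counts <- matrix(c(40, 10,
                       30, 20),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(ABROAD = c("no", "yes"),
                                     BOOKCLUB = c("no", "yes")))

# Pearson chi-squared test of independence, without continuity correction
toy_test <- chisq.test(toy_counts, correct = FALSE)

# The approximation is usually considered acceptable when all expected counts are >= 5
toy_test$expected

# Effect size for a 2x2 table: phi = sqrt(chi-squared / N)
phi <- sqrt(unname(toy_test$statistic) / sum(toy_counts))

toy_test
phi
```

If any expected count falls below 5, `fisher.test()` on the same table is a commonly used alternative.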

2.4 Summarize the result(s). 2 points

Answer:

3 A literature review indicates that reading has a positive impact on vocabulary. You therefore want to test whether students who participated in the book club (BOOKCLUB = “yes”) used a more sophisticated vocabulary (LEX.SOPH) as compared to students who did not participate in the book club (BOOKCLUB = “no”). (Total: 10 points)

3.1 Formulate hypotheses. 1 point

Answer:
H0: Students who participated in the book club do not use more sophisticated vocabulary than students who did not participate in the book club.
H1: Students who participated in the book club use more sophisticated vocabulary than students who did not participate in the book club.

3.2 Is your alternative hypothesis one-tailed or two-tailed? Explain. 1 point

Answer: The alternative hypothesis is one-tailed because it predicts a direction: students who were part of the book club are expected to show a higher level of lexical sophistication.

3.3 Are the samples dependent or independent? Explain. 1 point

Answer: The samples are independent. Each student belongs to exactly one group (book club or no book club) and wrote a single essay, so no observation in one group is paired with an observation in the other, even though the students may attend the same university.

3.4 Calculate descriptive statistics and represent the data graphically. 2 points

# LEX.SOPH is numeric, so it is summarised per group rather than cross-tabulated
BookVSVoc <- sa %>%
  group_by(BOOKCLUB) %>%
  summarise(n = n(),
            mean = mean(LEX.SOPH),
            sd = sd(LEX.SOPH),
            median = median(LEX.SOPH))
BookVSVoc

# Boxplot of lexical sophistication by book club participation
boxplot(LEX.SOPH ~ BOOKCLUB, data = sa)

3.5 Test your hypothesis with analytical statistics and calculate an effect size if the test is significant. 3 points

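# The numeric outcome goes on the left of the formula and the grouping factor on the right.
# With the reverse order, R warns that a factor cannot be used as the response.
# One-tailed Welch t-test, assuming LEX.SOPH is roughly normal in both groups.
# BOOKCLUB has levels "no" and "yes", so H1 ("yes" > "no") corresponds to alternative = "less".
# If normality does not hold, wilcox.test() accepts the same formula and alternative.
book_lex_t <- t.test(LEX.SOPH ~ BOOKCLUB, data = sa, alternative = "less")
book_lex_t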
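
The question also asks for an effect size if the test is significant. For a comparison of two independent group means, Cohen's d is one common choice. The block below is a minimal sketch on invented numbers: `no_club` and `yes_club` are hypothetical scores, not values from the exam data, and the pooled-standard-deviation formula is one of several variants of d.

```r
# Hypothetical lexical sophistication scores, invented for illustration only
no_club  <- c(1.8, 2.1, 2.4, 2.6, 2.9, 3.0)
yes_club <- c(2.5, 2.8, 3.1, 3.3, 3.6, 3.9)

# One-tailed Welch t-test, H1: mean(yes_club) > mean(no_club)
toy_t <- t.test(yes_club, no_club, alternative = "greater")

# Cohen's d using a pooled standard deviation
n1 <- length(yes_club)
n2 <- length(no_club)
sd_pooled <- sqrt(((n1 - 1) * var(yes_club) + (n2 - 1) * var(no_club)) /
                  (n1 + n2 - 2))
cohens_d <- (mean(yes_club) - mean(no_club)) / sd_pooled

toy_t$p.value
cohens_d
```

By Cohen's rule of thumb, values of d around 0.2, 0.5 and 0.8 are read as small, medium and large effects.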

3.6 Summarize the result(s). 2 points

Answer:

LFIAL2260 Resit Exam 2024

Mejia Vargas Gabriel Alfonso

2024-08-23

4.1 Formulate hypotheses. 1 point

4.2 Are the samples dependent or independent? Explain. 1 point

4.3 Calculate descriptive statistics and represent the data graphically. 2 points

4.4 Test your hypothesis with analytical statistics and calculate an effect size if the test is significant. 4 points

4.5 Summarize the result(s). 3 points

5 You want to test whether the sophistication of economics-related vocabulary (ECON.VOCAB) in the argumentative essay is somehow correlated with the scores that students received on their vocabulary test (VOCAB.TEST). (Total: 7 points)

5.1 Formulate hypotheses. 1 point

5.2 Is your alternative hypothesis one-tailed or two-tailed? Explain. 1 point

5.3 Represent the data graphically. 1 point

5.4 Test your hypothesis with analytical statistics. 3 points
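
No code was submitted for this question. As a minimal sketch of how a correlation between two numeric variables might be tested, the block below checks normality and then runs `cor.test()` with both Pearson's r and Kendall's tau. The data frame `toy_scores` is invented for illustration only and is not the exam data; the names `toy_scores`, `r_test` and `tau_test` are hypothetical. With the real data, `sa$ECON.VOCAB` and `sa$VOCAB.TEST` would take the place of the toy columns.

```r
# Hypothetical scores, invented for illustration only (NOT the exam data)
toy_scores <- data.frame(
  ECON.VOCAB = c(210, 245, 260, 275, 290, 310, 335, 360),
  VOCAB.TEST = c(55, 62, 58, 70, 75, 72, 85, 90)
)

# Check normality of each variable before choosing the coefficient
shapiro.test(toy_scores$ECON.VOCAB)
shapiro.test(toy_scores$VOCAB.TEST)

# Pearson's r if both variables look normal and the relation is linear
r_test <- cor.test(toy_scores$ECON.VOCAB, toy_scores$VOCAB.TEST,
                   method = "pearson")

# Kendall's tau as a rank-based alternative when those assumptions fail
tau_test <- cor.test(toy_scores$ECON.VOCAB, toy_scores$VOCAB.TEST,
                     method = "kendall")

r_test$estimate
tau_test$estimate
```

Both calls default to a two-sided test, which matches a hypothesis that the variables are "somehow correlated" without predicting a direction.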

5.5 Summarize the result(s). 1 point
