You should submit the R file (R script or Rmd file) via email by the end of the midterm.
Note 1: you can use either base R plotting functions or ggplot2 for creating graphs. If your graphs look pretty (accurate titles, labels, colors, etc.), you can earn 1-2 bonus points for each problem: 1 for pretty base R graphs, 2 for ggplot2 ones.
Note 2: you can use either base R functions or dplyr functions for data handling. If you use dplyr, you can get 1 extra point.
Data stored in rep_Cowles.csv contains the results of a survey conducted in a bachelor's programme in psychology. First-year students were asked to answer a set of questions, and then their levels of extraversion and neuroticism were computed.
Variables:
sex: student’s sex (female and male);
volunteer: whether a student volunteers regularly or not (no, yes);
extra: index of extraversion;
neuro: index of neuroticism;
lie: index of insincerity in answers.
[...] surv.
[...] surv.
Keep only those rows that correspond to female students. Save changes to surv.
Exclude rows that correspond to students with lie greater than 2. Save changes to surv.
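As a reminder of the mechanics only, the two filtering steps above can be sketched in base R on a tiny made-up data frame. The name surv_demo and all values in it are hypothetical; they are not taken from rep_Cowles.csv.

```r
# Made-up stand-in for the survey data; the real file is rep_Cowles.csv.
surv_demo <- data.frame(
  sex       = c("female", "male", "female", "female", "male", "female"),
  volunteer = c("no", "yes", "yes", "no", "no", "yes"),
  extra     = c(12, 15, 9, 14, 11, 17),
  neuro     = c(10, 8, 16, 13, 7, 12),
  lie       = c(1, 3, 4, 2, 0, 1)
)

# Keep only female students, then drop rows with lie greater than 2.
surv_demo <- surv_demo[surv_demo$sex == "female", ]
surv_demo <- surv_demo[surv_demo$lie <= 2, ]

print(surv_demo)
```

With dplyr, the same two conditions could be passed to filter(), for example filter(surv_demo, sex == "female", lie <= 2).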
Create a histogram of neuro. Describe this distribution in words: say whether it is symmetric or not, and if it is not symmetric, state whether it is right-skewed or left-skewed.
Judging by this histogram, can we say that there are outliers in this sample of students? Explain your answer.
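A minimal sketch of a labelled base R histogram, drawn on simulated values rather than on the survey data. The vector neuro_demo and the gamma distribution used to generate it are assumptions made purely for illustration.

```r
set.seed(1)
# Simulated right-skewed scores, used only to illustrate the plot.
neuro_demo <- round(rgamma(200, shape = 2, scale = 3))

# hist() draws the plot and also returns the bin counts.
h <- hist(neuro_demo,
          main = "Distribution of a simulated index",
          xlab = "Index value",
          col  = "lightblue")

# A mean noticeably above the median is one sign of right skew.
print(c(mean = mean(neuro_demo), median = median(neuro_demo)))
```

Comparing the mean and the median is only a rough numeric companion to the visual description the task asks for.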
Create a boxplot of extra. Are there outliers in the data? If yes, are they ‘natural’ (really extreme values), or might they have occurred as a result of a mistake?
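The sketch below shows one way to draw a boxplot and list the points it marks as outliers. The vector extra_demo is made up for illustration and contains one deliberately extreme value.

```r
# Made-up scores with one value far from the rest, for illustration only.
extra_demo <- c(8, 10, 11, 12, 12, 13, 13, 14, 15, 16, 40)

boxplot(extra_demo,
        main = "Boxplot of a made-up index",
        ylab = "Index value")

# Values that boxplot() draws as separate points
# (further than 1.5 times the box length from the box).
outliers <- boxplot.stats(extra_demo)$out
print(outliers)
```

Whether such a point is a genuine extreme or a data entry mistake cannot be read off the plot alone; it has to be judged against the possible range of the index.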
Imagine that you are asked to check whether the level of neuroticism is different for students who volunteer and those who do not. You are going to perform formal hypothesis testing.
Formulate the null hypothesis you are going to test. Write it as a comment.
Choose a suitable test and perform it in R. Report the R code and the output.
Make a statistical conclusion: decide whether your null hypothesis is rejected at the 1% significance level. Write it as a comment.
Make a substantive conclusion: decide whether the level of neuroticism differs between volunteers and non-volunteers. Write it as a comment.
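For comparing a numeric index between two groups, one candidate is Welch's two-sample t-test, which is what t.test() runs by default. The sketch below uses simulated data (the data frame demo is hypothetical). Which test is suitable for the real data depends on the distribution you observed; wilcox.test() is a common rank-based alternative.

```r
set.seed(2)
# Simulated data, for illustration only; not the survey results.
demo <- data.frame(
  volunteer = rep(c("no", "yes"), each = 30),
  neuro     = c(rnorm(30, mean = 11, sd = 5), rnorm(30, mean = 12, sd = 5))
)

# H0: the mean index is the same in both groups.
res <- t.test(neuro ~ volunteer, data = demo)
print(res)

# Statistical conclusion at the 1% level: reject H0 only if p < 0.01.
reject_h0 <- res$p.value < 0.01
print(reject_h0)
```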
Calculate a 99% confidence interval for the mean value of extra.
Provide an interpretation of this confidence interval. Calculate its length and report it. Write it as a comment.
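A t-based interval for a mean can be taken from the output of t.test(). The sketch below does this on simulated scores; extra_demo is a hypothetical vector, not the survey variable.

```r
set.seed(3)
# Simulated scores, for illustration only.
extra_demo <- rnorm(100, mean = 12, sd = 4)

# t-based 99% confidence interval for the mean.
ci <- t.test(extra_demo, conf.level = 0.99)$conf.int
print(ci)

# Length of the interval: upper bound minus lower bound.
ci_length <- diff(ci)
print(ci_length)
```

The same length can be checked by hand as 2 * qt(0.995, df = n - 1) * sd / sqrt(n), where n is the sample size and sd is the sample standard deviation.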
Create a scatterplot that will show the association between extra and neuro. Comment on the direction of association between variables and its strength. Write it as a comment.
Which correlation coefficient seems to be suitable to measure the association between extra and neuro? Choose an appropriate coefficient, calculate it and test its significance. Report the R code you used.
Can you conclude that these two variables are associated? Explain your answer. Write it as a comment.
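A scatterplot and a correlation test can be sketched as follows on simulated values. The vectors x and y and the weak negative link between them are assumptions for illustration. Pearson's coefficient is shown only because it is the default of cor.test(); the appropriate choice for the real data is part of the task.

```r
set.seed(4)
# Simulated pair of indices with a weak negative link, for illustration only.
x <- rnorm(80)
y <- -0.3 * x + rnorm(80)

plot(x, y,
     main = "Simulated indices",
     xlab = "First index",
     ylab = "Second index")

# Pearson is the default; method = "spearman" gives a rank-based alternative.
res <- cor.test(x, y, method = "pearson")
print(res)
```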
Imagine you have to check whether participation in volunteering depends on a person’s sex.
Choose an appropriate test to check this, perform it and make a statistical conclusion (reject/do not reject) and a substantive conclusion (depends/does not depend).
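For two categorical variables, one common option is a chi-squared test of independence on their contingency table. The sketch below uses made-up counts; the data frame demo is hypothetical and does not reflect the survey.

```r
# Made-up answers, for illustration only.
demo <- data.frame(
  sex       = rep(c("female", "male"), times = c(60, 40)),
  volunteer = c(rep(c("no", "yes"), times = c(30, 30)),
                rep(c("no", "yes"), times = c(28, 12)))
)

# Contingency table of the two categorical variables.
tab <- table(demo$sex, demo$volunteer)
print(tab)

# H0: volunteering and sex are independent.
res <- chisq.test(tab)
print(res)
```

For a 2x2 table chisq.test() applies a continuity correction by default; fisher.test() is an alternative when expected counts are small.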