Submission

Submit your R file (an R script or an Rmd file) by email before the end of the midterm.

General notes

Note 1: you can create graphs with either base R plotting functions or ggplot2. Well-designed graphs (accurate titles, axis labels, colors, etc.) earn 1-2 bonus points per problem: 1 point for base R graphs, 2 points for ggplot2 graphs.

Note 2: you can handle the data with either base R functions or dplyr functions. Using dplyr earns 1 extra point.

Data description

The file rep_Cowles.csv contains the results of a survey conducted in the bachelor's programme in psychology. First-year students answered a set of questions, and their levels of extraversion and neuroticism were then computed.

Variables:

Problem 1 (2 points)

  1. Load the data using this link and save it as surv.
  2. Get descriptive statistics for all variables in the dataset. Choose one quantitative variable and interpret all of the statistics for it.
  3. Delete the rows that contain missing values and save the changes to surv.
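The three steps of Problem 1 map onto read.csv(), summary() and na.omit(). The sketch below is only an illustration: the real link is not reproduced here, so it writes a tiny made-up CSV to a temporary file instead. The column names come from the problems; every value is hypothetical.

```r
# Hypothetical stand-in for rep_Cowles.csv (all values are invented)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "neuro,extra,sex,volunteer,lie",
  "12,14,female,no,1",
  "8,,male,yes,3",
  "15,10,female,yes,2",
  "NA,16,female,no,0"
), tmp)

# 1. Load the data (with the real file, pass the link to read.csv instead)
surv <- read.csv(tmp)

# 2. Descriptive statistics: min, quartiles, median, mean and NA count
#    for numeric columns; summary() does not report the standard deviation
summary(surv)
sd(surv$neuro, na.rm = TRUE)

# 3. Drop every row that has at least one missing value
surv <- na.omit(surv)
nrow(surv)
```

Blank fields in numeric columns are read as NA, so both incomplete rows are removed here.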

Problem 2 (3 points)

  1. Keep only the rows that correspond to female students. Save the changes to surv.

  2. Exclude the rows that correspond to students whose value of lie is greater than 2. Save the changes to surv.
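Both filters can be written in base R or in dplyr. In this minimal sketch, the coding of sex as "female"/"male" and all the data values are assumptions; check the real coding with unique(surv$sex) first. The dplyr branch only runs if the package is installed.

```r
# Hypothetical data; the real coding of sex may differ
surv <- data.frame(
  neuro     = c(12, 8, 15, 9, 11),
  extra     = c(14, 11, 10, 16, 13),
  sex       = c("female", "male", "female", "female", "male"),
  volunteer = c("no", "yes", "yes", "no", "no"),
  lie       = c(1, 3, 2, 4, 0)
)

# Base R: keep female students, then drop those with lie > 2
surv_base <- surv[surv$sex == "female", ]
surv_base <- surv_base[surv_base$lie <= 2, ]

# dplyr: both conditions in one filter() call
if (requireNamespace("dplyr", quietly = TRUE)) {
  surv <- dplyr::filter(surv, sex == "female", lie <= 2)
} else {
  surv <- surv_base
}
surv
```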

Problem 3 (6 points)

  1. Create a histogram of neuro. Describe the distribution in words: say whether it is symmetric and, if it is not, whether it is right-skewed or left-skewed.

  2. Judging by this histogram, can we say that there are outliers in this sample of students? Explain your answer.

  3. Create a boxplot of extra. Are there outliers in the data? If so, are they ‘natural’ (genuinely extreme values), or might they have occurred as a result of a mistake?
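A sketch of the graphs in Problem 3 using base R, on invented vectors. Comparing the mean with the median is only a rough check of skewness; the histogram itself is what you should describe. boxplot.stats() returns the points lying more than 1.5 times the IQR beyond the box hinges, which are exactly the points that boxplot() draws individually.

```r
# Hypothetical values for illustration only
neuro <- c(3, 4, 5, 5, 6, 7, 9, 12, 18)
extra <- c(12, 13, 14, 15, 15, 16, 17, 18, 40)

hist(neuro, main = "Histogram of neuro", xlab = "neuro",
     col = "lightblue", border = "white")
# Rough heuristic: a mean well above the median suggests right skew
mean(neuro) > median(neuro)

boxplot(extra, main = "Boxplot of extra", ylab = "extra")
# Values flagged as outliers by the 1.5 * IQR rule
out <- boxplot.stats(extra)$out
out
```

Whether a flagged value is ‘natural’ or a mistake is a judgement call: compare it with the possible range of the scale.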

Problem 4 (8 points)

Imagine that you are asked to check whether the level of neuroticism differs between students who volunteer and those who do not. You will perform a formal hypothesis test.

  1. Formulate the null hypothesis you are going to test. Write it as a comment.

  2. Choose a suitable test and perform it in R. Report the R code and the output.

  3. Make a statistical conclusion: decide whether your null hypothesis is rejected at the 1% significance level. Write it as a comment.

  4. Make a substantive conclusion: decide whether the level of neuroticism differs between volunteers and non-volunteers. Write it as a comment.
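One possible choice for Problem 4 is a two-sample t-test comparing mean neuro across the two volunteer groups; t.test() runs the Welch version (unequal variances) by default. This is a sketch on simulated data, and the "no"/"yes" coding of volunteer is an assumption. If neuro is clearly non-normal in small groups, wilcox.test() is a common non-parametric alternative.

```r
set.seed(1)
# Simulated data: 40 non-volunteers and 40 volunteers
surv <- data.frame(
  neuro     = c(rnorm(40, mean = 11, sd = 4), rnorm(40, mean = 12, sd = 4)),
  volunteer = rep(c("no", "yes"), each = 40)
)

# H0: mean neuro is the same for volunteers and non-volunteers
res <- t.test(neuro ~ volunteer, data = surv)
res

# TRUE means H0 is rejected at the 1% significance level
res$p.value < 0.01
```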

Problem 5 (4 points)

  1. Calculate a 99% confidence interval for the mean value of extra.

  2. Interpret the confidence interval. Calculate its length and report it. Write it as a comment.
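For a mean, the 99% interval is mean ± t(0.995, n − 1) · sd / √n, and its length is twice the half-width. The sketch below, on simulated values of extra, gets the interval from t.test() and checks it against that formula.

```r
set.seed(2)
extra <- rnorm(50, mean = 13, sd = 4)   # simulated values

# 99% confidence interval from t.test()
ci <- t.test(extra, conf.level = 0.99)$conf.int
ci
len <- as.numeric(diff(ci))   # length of the interval
len

# The same interval computed by hand
n    <- length(extra)
half <- qt(0.995, df = n - 1) * sd(extra) / sqrt(n)
c(mean(extra) - half, mean(extra) + half)
```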

Problem 6 (7 points)

  1. Create a scatterplot that shows the association between extra and neuro. Comment on the direction and strength of the association. Write it as a comment.

  2. Which correlation coefficient seems suitable for measuring the association between extra and neuro? Choose an appropriate coefficient, calculate it and test its significance. Report the R code you used.

  3. Can you conclude that these two variables are associated? Explain your answer. Write it as a comment.
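A sketch of Problem 6 on simulated data. Pearson's r suits a roughly linear relationship without strong outliers; Spearman's rho works on ranks, so it is more robust to skewness and outliers and captures any monotone trend. Because the simulated scores are rounded and contain ties, exact = FALSE asks for the approximate p-value for Spearman.

```r
set.seed(3)
# Simulated integer scores with a weak negative relationship
extra <- round(rnorm(60, mean = 13, sd = 4))
neuro <- round(10 - 0.2 * extra + rnorm(60, sd = 4))

plot(extra, neuro, main = "extra vs neuro",
     xlab = "extra", ylab = "neuro", pch = 19)

# Both tests use H0: the population correlation is zero
res_p <- cor.test(extra, neuro, method = "pearson")
res_s <- cor.test(extra, neuro, method = "spearman", exact = FALSE)
res_p
res_s
```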

Problem 7 (5 points)

Imagine you have to check whether participation in volunteering depends on a person’s sex.

Choose an appropriate test, perform it, and make both a statistical conclusion (reject/do not reject) and a substantive conclusion (depends/does not depend).
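With two categorical variables, a common choice is the chi-squared test of independence on their contingency table. Below is a sketch on simulated data. The codings of sex and volunteer are assumptions, and the 5% level is also an assumption because the problem does not state one. For a 2×2 table, chisq.test() applies Yates' continuity correction by default. If some expected counts fall below 5, fisher.test() is the usual alternative.

```r
set.seed(4)
sex       <- sample(c("female", "male"), 200, replace = TRUE)
volunteer <- sample(c("no", "yes"), 200, replace = TRUE)

tab <- table(sex, volunteer)
tab

# H0: volunteering is independent of sex
res <- chisq.test(tab)
res
res$expected        # check that all expected counts are at least 5
res$p.value < 0.05  # TRUE would mean rejecting H0 at the assumed 5% level
```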