Submission

Submit your R file (an R script or an Rmd file) by the end of the midterm via this link.

General notes

Note 1: you can use either base R plotting functions or ggplot2 to create graphs. If your graphs are well designed (accurate titles, axis labels, colors, etc.), you can earn 1-2 bonus points for each problem: 1 point for well-designed base R graphs, 2 points for ggplot2 ones.

Note 2: you can use either base R functions or dplyr functions for data handling. If you use dplyr, you can earn 1 extra point.

Data description

The file survey01.csv contains the results of a survey conducted in the bachelor's programme in psychology. First-year students were asked to provide some personal information (but not their names) and to take part in a small experiment. Participants had to estimate the length of an interval and the size of an angle shown on the whiteboard. The absolute deviations of their answers from the correct values (for length and for angle) were then recorded.

Variables:

Problem 1 (2 points)

  1. Load the data using this link and save it as surv.
  2. Get descriptive statistics for all variables in the dataset. Choose one quantitative variable and interpret all the statistics for this variable.
  3. Delete the rows that contain missing values and save the changes to surv.
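A minimal base R sketch of the three steps in Problem 1. Because the data link is not reproduced here, it reads a small made-up CSV from a string; the column names height, length and angle are assumptions for illustration. With the real file you would pass the downloaded path to read.csv() instead.

```r
# Hypothetical stand-in for survey01.csv (made-up values, assumed column names);
# with the real data: surv <- read.csv("survey01.csv")
surv <- read.csv(text = "height,length,angle
170,2,5
165,NA,3
180,1,0
158,4,NA
175,3,2")

# Descriptive statistics for every column: min, quartiles, median, mean, max, NA count
summary(surv)

# summary() does not report spread directly, so add the standard deviation
sd(surv$height, na.rm = TRUE)

# Keep only complete rows and overwrite surv
surv <- na.omit(surv)
surv
```

na.omit() drops every row with at least one NA; surv[complete.cases(surv), ] is an equivalent base R form, and tidyr::drop_na() is a tidyverse alternative.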

Problem 2 (3 points)

  1. Keep only the rows for students who are either from Moscow or not from Moscow (that is, drop rows with any other, invalid answer to this question). Save the changes to surv.

  2. Keep only the rows for students who entered a valid value for sex (only 1 or 2, not 3 or 22). Save the changes to surv.

  3. Exclude the rows for students who claimed that both the length of the interval and the size of the angle were equal to 0. Save the changes to surv.
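The three filters above can be sketched in base R with logical indexing. The data frame, its column names (city, sex, length, angle) and the labels "Moscow" / "not Moscow" are hypothetical. The survey's actual coding may differ, and step 3 assumes that the answers in question are stored as zeros in both columns.

```r
# Hypothetical data; column names and labels are assumptions, not from survey01.csv
surv <- data.frame(
  city   = c("Moscow", "not Moscow", "Moscow", "abroad", "not Moscow", "not Moscow"),
  sex    = c(1, 2, 3, 1, 22, 1),
  length = c(2, 0, 1, 3, 4, 0),
  angle  = c(5, 0, 1, 2, 4, 3)
)

# 1. Keep only the two valid answers to the Moscow question
surv <- surv[surv$city %in% c("Moscow", "not Moscow"), ]

# 2. Keep only the valid sex codes 1 and 2
surv <- surv[surv$sex %in% c(1, 2), ]

# 3. Drop rows where length AND angle are both 0
#    (assumes such answers are stored as zeros in these two columns)
surv <- surv[!(surv$length == 0 & surv$angle == 0), ]
surv
```

With dplyr, the same steps can be combined as dplyr::filter(surv, city %in% c("Moscow", "not Moscow"), sex %in% c(1, 2), !(length == 0 & angle == 0)). Conditions separated by commas in filter() are combined with AND.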

Problem 3 (6 points)

  1. Create a histogram of height and describe the distribution in words: is it symmetric? If not, is it right-skewed or left-skewed?

  2. Judging by this histogram, can we say that there are outliers in this sample of students? Explain your answer.

  3. Create a boxplot of length. Are there outliers in the data? If so, are they ‘natural’ (genuinely extreme values), or might they have occurred as a result of a mistake?
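A base R sketch of the plots in Problem 3, using made-up vectors in place of the real height and length columns. The length values are stored in a vector called len to avoid confusion with the base function length(). The titles and labels are placeholders. boxplot.stats() returns the points that boxplot() draws individually, which are those lying more than 1.5 box lengths beyond the hinges.

```r
# Hypothetical values standing in for surv$height and surv$length
height <- c(158, 160, 162, 163, 165, 166, 168, 170, 172, 175, 178, 185)
len    <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 40)

# Histogram of height with a title and axis labels
hist(height, main = "Height of first-year students",
     xlab = "Height", col = "lightblue")

# Boxplot of length; points beyond the whiskers are drawn individually
boxplot(len, main = "Deviation in the interval-length task",
        ylab = "Absolute deviation")

# Values flagged as outliers (more than 1.5 box lengths beyond the hinges)
boxplot.stats(len)$out
```

Whether a flagged value is "natural" or a data-entry mistake cannot be decided by the boxplot alone. You need to judge whether the value is plausible for the task.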

Problem 4 (8 points)

Imagine that you are asked to check whether the deviation from the correct answer to the interval-length question differs between students who chose R and those who chose SPSS. You are going to perform a formal hypothesis test.

  1. Formulate the null hypothesis you are going to test. Write it as a comment.

  2. Choose a suitable test and perform it in R. Report the R code and the output.

  3. Make a statistical conclusion: decide whether your null hypothesis is rejected at the 5% significance level. Write it as a comment.

  4. Make a substantive conclusion: decide whether the deviation from the correct answer differs between R users and SPSS users. Write it as a comment.
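A sketch of how two candidate tests could be run on made-up data. The column names soft and length are assumptions. A Welch two-sample t-test compares group means. Absolute deviations are non-negative and often right-skewed, so the Wilcoxon rank-sum test is a common rank-based alternative. Which test is suitable for the real data is for you to justify.

```r
# Hypothetical data: soft = preferred software, length = absolute deviation
surv <- data.frame(
  soft   = rep(c("R", "SPSS"), each = 8),
  length = c(1.5, 2.0, 0.5, 3.0, 2.5, 1.0, 4.0, 2.2,
             3.5, 2.8, 5.0, 4.1, 1.8, 6.0, 3.3, 4.6)
)

# Welch t-test; H0: mean deviation is equal for R users and SPSS users
t_res <- t.test(length ~ soft, data = surv)   # var.equal = FALSE by default
t_res

# Wilcoxon rank-sum test; H0: no location shift between the two groups
w_res <- wilcox.test(length ~ soft, data = surv, exact = FALSE)
w_res

# Statistical decision at the 5% significance level (TRUE means reject H0)
t_res$p.value < 0.05
```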

Problem 5 (4 points)

  1. Calculate a 90% confidence interval for the mean value of length.

  2. Interpret this confidence interval. Calculate its length and report it. Write your answers as comments.
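A sketch of a t-based 90% confidence interval for a mean, computed with t.test() and again by hand as mean ± t(0.95, n − 1) · sd / √n, so the formula is visible. The values are made up and stand in for the length column.

```r
# Hypothetical values standing in for surv$length
len <- c(1.5, 2.0, 0.5, 3.0, 2.5, 1.0, 4.0, 2.2, 3.5, 2.8)

# 90% t-based confidence interval for the mean
ci <- t.test(len, conf.level = 0.90)$conf.int
ci

# The same interval by hand: mean +/- t quantile * standard error
n  <- length(len)
se <- sd(len) / sqrt(n)
ci_manual <- mean(len) + c(-1, 1) * qt(0.95, df = n - 1) * se
ci_manual

# Length of the interval: upper bound minus lower bound
ci_length <- diff(ci)
ci_length
```

The 0.95 quantile is used because a 90% two-sided interval leaves 5% in each tail.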

Problem 6 (7 points)

  1. Create a scatterplot that shows the association between length and angle. Comment on the direction and strength of the association. Write it as a comment.

  2. Which correlation coefficient seems suitable for measuring the association between length and angle? Choose an appropriate coefficient, calculate it and test its significance. Report the R code you used.

  3. Can you conclude that these two variables are associated? Explain your answer. Write it as a comment.
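A sketch of the scatterplot and of both common correlation tests on made-up paired values standing in for length and angle. Pearson's r measures linear association and is sensitive to outliers and skewness. Spearman's rho works on ranks, so it is often preferred for skewed measures such as absolute deviations. exact = FALSE is set because the made-up angle values contain ties.

```r
# Hypothetical paired values standing in for surv$length and surv$angle
len   <- c(1.5, 2.0, 0.5, 3.0, 2.5, 1.0, 4.0, 2.2, 3.5, 2.8)
angle <- c(3, 5, 2, 6, 4, 2, 9, 5, 7, 6)

# Scatterplot of the two deviations
plot(len, angle, pch = 19,
     main = "Deviation in length vs deviation in angle",
     xlab = "Absolute deviation, length", ylab = "Absolute deviation, angle")

# Pearson: linear association; H0: correlation = 0
pearson <- cor.test(len, angle, method = "pearson")
pearson

# Spearman: rank-based association; H0: rho = 0
spearman <- cor.test(len, angle, method = "spearman", exact = FALSE)
spearman
```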

Problem 7 (5 points)

Imagine you have to check whether the preferred software (R or SPSS) depends on the student's favourite school subject.
Choose an appropriate test for this, perform it, and state both a statistical conclusion (reject/do not reject) and a substantive conclusion (depends/does not depend).
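For two categorical variables, a standard choice is Pearson's chi-squared test of independence on their contingency table. The sketch below uses made-up counts, and the variable names soft and subject are assumptions. It also prints the expected counts, because the chi-squared approximation is usually trusted only when they are not too small (a common rule of thumb is at least 5 per cell). Fisher's exact test is shown as the usual fallback.

```r
# Hypothetical responses; soft and subject are assumed variable names
counts  <- c(12, 8, 10, 6, 14, 10)
soft    <- rep(c("R", "SPSS", "R", "SPSS", "R", "SPSS"), times = counts)
subject <- rep(c("Math", "Math", "Biology", "Biology", "History", "History"),
               times = counts)

# Contingency table of preferred software by favourite subject
tab <- table(soft, subject)
tab

# Chi-squared test of independence; H0: soft and subject are independent
res <- chisq.test(tab)
res

# Expected counts under H0, to check the rule of thumb
res$expected

# Fallback when expected counts are small
fisher.test(tab)
```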