Dear students,

For the following tasks we will use household data adapted from the Russian Longitudinal Monitoring Survey (RLMS-HSE). The full dataset can be downloaded from https://www.hse.ru/rlms/spss

The data set “potatoes.csv” (in the Data folder) contains data on each household’s harvest of six crops: potatoes, tomatoes, cucumbers, beetroots, carrots, and cabbages. Another variable records whether the household bought any potatoes in the week before the survey. The last variable shows whether the household is located in a village (selo), a town (pgt, an urban-type settlement), a city (gorod), or the regional capital (obl center).

Solve the data problems below, write down your interpretations, and submit them via the following form.

Problem Set

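library(tidyverse)  # data manipulation and the %>% pipe
library(sjPlot)     # view_df() for a quick overview of the variables
df <- read.csv("potatoes.csv")
df <- df[ , -1]     # drop the first column, which is not one of the eight variables listed below
view_df(df)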
Data frame: df

ID  Name                       Values
1   potato                     range: 0.5-1250.0
2   tomato                     range: 0.5-1200.0
3   cucumber                   range: 0.5-1200.0
4   beetroot                   range: 0.5-2048.0
5   carrot                     range: 0.5-1030.0
6   cabbage                    range: 1.0-20000.0
7   bought_potatoes_last_week  No, Yes
8   settlement_type            gorod, obl center, pgt, selo

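library(psych)  # describe()
# Descriptive statistics for all variables, printed as a table.
# Variables marked * are categorical and coded 1, 2, ... in alphabetical order
# (e.g. No = 1, Yes = 2), so their mean and sd only summarize these codes.
df %>%
  describe() %>%
  knitr::kable(digits = 2)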
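|                            | vars |     n |   mean |     sd | median | trimmed |    mad | min |   max |   range |  skew | kurtosis |   se |
|:---------------------------|-----:|------:|-------:|-------:|-------:|--------:|-------:|----:|------:|--------:|------:|---------:|-----:|
| potato                     |    1 | 23608 | 415.12 | 314.29 |    320 |  381.60 | 296.52 | 0.5 |  1250 |  1249.5 |  0.78 |    -0.36 | 2.05 |
| tomato                     |    2 | 23608 |  75.78 |  72.22 |     50 |   63.35 |  44.48 | 0.5 |  1200 |  1199.5 |  3.44 |    24.57 | 0.47 |
| cucumber                   |    3 | 23608 |  57.71 |  52.49 |     49 |   49.31 |  31.13 | 0.5 |  1200 |  1199.5 |  4.28 |    41.32 | 0.34 |
| beetroot                   |    4 | 23608 |  31.37 |  53.42 |     20 |   24.66 |  14.83 | 0.5 |  2048 |  2047.5 | 17.76 |   477.57 | 0.35 |
| carrot                     |    5 | 23608 |  40.40 |  36.41 |     30 |   34.84 |  22.24 | 0.5 |  1030 |  1029.5 |  5.07 |    68.74 | 0.24 |
| cabbage                    |    6 | 23608 |  76.15 | 152.62 |     50 |   62.31 |  44.48 | 1.0 | 20000 | 19999.0 | 95.29 | 12311.00 | 0.99 |
| bought_potatoes_last_week* |    7 | 23608 |   1.02 |   0.15 |      1 |    1.00 |   0.00 | 1.0 |     2 |     1.0 |  6.59 |    41.40 | 0.00 |
| settlement_type*           |    8 | 23608 |   2.70 |   1.27 |      3 |    2.75 |   1.48 | 1.0 |     4 |     3.0 | -0.19 |    -1.66 | 0.01 |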

Task 1

Did households that bought potatoes in the week before the survey have a smaller average potato harvest? Compare the average harvest of households that had bought potatoes with that of households that had not.

Visualize the relationship, run the formal test(s), and interpret the results. If the test is statistically significant, report the effect size.
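A minimal base-R sketch of one way to approach Task 1, assuming `df` has been read as above and that `bought_potatoes_last_week` holds the labels "No" and "Yes" shown by `view_df()`. Because the harvest is skewed, it runs both Welch's t-test and the Wilcoxon rank-sum (Mann-Whitney) test; `cohens_d()` is a hypothetical helper written for this example, not a package function.

```r
# Cohen's d for two independent samples, standardized by the pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Run only when the survey data frame `df` is loaded
# (otherwise `df` is the base R F-density function and this part is skipped)
if (is.data.frame(df)) {
  # Mean potato harvest in each group
  print(tapply(df$potato, df$bought_potatoes_last_week, mean))
  # Boxplots of the two harvest distributions
  boxplot(potato ~ bought_potatoes_last_week, data = df,
          xlab = "Bought potatoes last week", ylab = "Potato harvest")
  # Welch two-sample t-test (does not assume equal variances)
  print(t.test(potato ~ bought_potatoes_last_week, data = df))
  # Wilcoxon rank-sum (Mann-Whitney) test, a rank-based alternative for skewed data
  print(wilcox.test(potato ~ bought_potatoes_last_week, data = df))
  # Effect size: households that bought ("Yes") minus those that did not ("No")
  yes <- df$potato[df$bought_potatoes_last_week == "Yes"]
  no  <- df$potato[df$bought_potatoes_last_week == "No"]
  print(cohens_d(yes, no))
}
```

The descriptive table gives a mean of 1.02 for the 1/2-coded purchase variable, which suggests that only about 2% of households bought potatoes, so the two groups differ greatly in size. Welch's test does not assume equal variances, which makes it a safer default for such unbalanced groups.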

Task 2

Does a household’s potato harvest depend on where it is located? Compare the potato harvest (potato) across settlement types (settlement_type).

Calculate the mean harvest by settlement type, visualize the differences in a plot, then run and interpret a formal test. If the test is significant, run pairwise comparisons between the groups, calculate the effect size, and draw your conclusions.
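The steps of Task 2 can be sketched in base R as follows, assuming `df` as above and the four settlement labels exactly as `view_df()` lists them. The village-to-regional-capital level order, the hypothetical `eta_squared()` helper, and the pairing of Tukey's HSD (after ANOVA) with Holm-adjusted pairwise Wilcoxon tests (after Kruskal-Wallis) are choices made for this sketch.

```r
# Eta squared for a one-way design: between-group sum of squares / total sum of squares
eta_squared <- function(y, g) {
  g <- factor(g)                       # also drops unused levels
  grand_mean <- mean(y)
  group_means <- tapply(y, g, mean)
  group_n <- tapply(y, g, length)
  ss_between <- sum(group_n * (group_means - grand_mean)^2)
  ss_total <- sum((y - grand_mean)^2)
  ss_between / ss_total
}

# Run only when the survey data frame `df` is loaded
# (otherwise `df` is the base R F-density function and this part is skipped)
if (is.data.frame(df)) {
  # Order settlement types from smallest to largest (an editorial choice)
  d <- data.frame(
    potato = df$potato,
    settlement = factor(df$settlement_type,
                        levels = c("selo", "pgt", "gorod", "obl center"))
  )
  print(tapply(d$potato, d$settlement, mean))         # mean harvest per settlement type
  boxplot(potato ~ settlement, data = d,
          xlab = "Settlement type", ylab = "Potato harvest")
  fit <- aov(potato ~ settlement, data = d)
  print(summary(fit))                                 # one-way ANOVA
  print(kruskal.test(potato ~ settlement, data = d))  # rank-based alternative
  print(TukeyHSD(fit))                                # pairwise comparisons after ANOVA
  print(pairwise.wilcox.test(d$potato, d$settlement,
                             p.adjust.method = "holm"))  # pairwise comparisons, rank-based
  print(eta_squared(d$potato, d$settlement))          # share of variance explained
}
```

Eta squared here is the proportion of the total variance in the potato harvest that is associated with settlement type.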

Task 3

Do households tend to grow certain vegetables together? Check the relationships between the harvests of all six vegetables.

To perform this task, consider whether you need the whole data set or whether it is easier to select only the six variables involved.

First, summarize the data in a table to estimate the skew and kurtosis of all six vegetable harvests. Are they normally distributed? Decide which correlation coefficient is more appropriate here, then compute a correlation matrix and report your results. Show a scatterplot of the strongest correlation in the data.
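A base-R sketch for Task 3, assuming `df` as above. It keeps only the six harvest columns, computes skewness and excess kurtosis, and uses Spearman's rank correlation as one defensible choice for strongly skewed variables; `skew_kurt()` and `strongest_pair()` are hypothetical helpers written for this example.

```r
# Moment-based sample skewness (g1) and excess kurtosis (g2); psych::describe()
# uses a small-sample variant by default, so its values can differ slightly
skew_kurt <- function(x) {
  m <- mean(x)
  m2 <- mean((x - m)^2)
  c(skew = mean((x - m)^3) / m2^1.5,
    kurtosis = mean((x - m)^4) / m2^2 - 3)
}

# Return the names of the variable pair with the largest absolute correlation
strongest_pair <- function(r) {
  r[lower.tri(r, diag = TRUE)] <- NA   # keep each pair once, ignore the diagonal
  idx <- which(abs(r) == max(abs(r), na.rm = TRUE), arr.ind = TRUE)[1, ]
  c(rownames(r)[idx[1]], colnames(r)[idx[2]])
}

# Run only when the survey data frame `df` is loaded
# (otherwise `df` is the base R F-density function and this part is skipped)
if (is.data.frame(df)) {
  # Keep only the six harvest variables
  veg <- df[, c("potato", "tomato", "cucumber", "beetroot", "carrot", "cabbage")]
  print(round(t(sapply(veg, skew_kurt)), 2))
  # Spearman's rho works on ranks, so it is less affected by the heavy skew
  r <- cor(veg, method = "spearman")
  print(round(r, 2))
  pair <- strongest_pair(r)
  plot(veg[[pair[1]]], veg[[pair[2]]], pch = ".",
       xlab = pair[1], ylab = pair[2])
}
```

`psych::corr.test(veg, method = "spearman")` additionally reports p-values for every pair. With the tidyverse loaded, `select(df, potato:cabbage)` is an equivalent way to keep the six columns, since they are adjacent in the data.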

Task 4

Finally, are city households more likely to buy potatoes? Check whether households that did and did not buy potatoes (bought_potatoes_last_week) are evenly distributed across settlement types (settlement_type). If the two variables are not independent, identify the settlement types where households are more likely to buy potatoes.
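For Task 4, a chi-squared test of independence is one standard choice. The base-R sketch below assumes `df` as above; `cramers_v()` is a hypothetical helper that computes Cramér's V from the uncorrected chi-squared statistic.

```r
# Cramér's V from a contingency table: sqrt(chi^2 / (n * (min(rows, cols) - 1)))
cramers_v <- function(tab) {
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Run only when the survey data frame `df` is loaded
# (otherwise `df` is the base R F-density function and this part is skipped)
if (is.data.frame(df)) {
  tab <- table(df$settlement_type, df$bought_potatoes_last_week)
  print(tab)
  print(round(prop.table(tab, margin = 1), 3))  # share of buyers within each settlement type
  test <- chisq.test(tab)                       # Pearson's chi-squared test of independence
  print(test)
  print(round(test$stdres, 2))                  # standardized residuals for each cell
  print(cramers_v(tab))                         # effect size
}
```

Cells with standardized residuals above roughly 2 (about the 5% level) contain more households than independence would predict; large negative values mark fewer.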

The end.