Dear students,

For the following tasks we will use household data adapted from the Russian Longitudinal Monitoring Survey (RLMS-HSE). The full dataset can be downloaded here: https://www.hse.ru/rlms/spss

The data set “potatoes.csv” (in the Data folder) contains data on the households’ crop harvests: potatoes, tomatoes, cucumbers, beetroots, carrots, and cabbages. There is also a variable showing whether the household bought any potatoes in the week before the survey. The last variable shows whether the household is located in a village, a town, a city, or the regional capital.

Problem Set

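# Load the packages used below
library(tidyverse)
library(sjPlot)
# Read the data (adjust the path to where potatoes.csv is stored on your machine)
df <- read.csv("/Users/olesyavolchenko/Downloads/potatoes.csv")
# Drop the first column, which is not used in this problem set
df <- df[ , -1]
# Show an overview of the variables
view_df(df)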
Data frame: df
ID  Name                       Label  Values               Value Labels
1   potato                            range: 0.5-1250.0
2   tomato                            range: 0.5-1200.0
3   cucumber                          range: 0.5-1200.0
4   beetroot                          range: 0.5-2048.0
5   carrot                            range: 0.5-1030.0
6   cabbage                           range: 1.0-20000.0
7   bought_potatoes_last_week         <output omitted>
8   settlement_type                   <output omitted>
# Preview the first six rows
df %>%
  head() %>%
  knitr::kable()
potato  tomato  cucumber  beetroot  carrot  cabbage  bought_potatoes_last_week  settlement_type
100     8       50        100       20      70       Yes                        selo
30      35      2         2         7       5        Yes                        obl center
100     300     200       10        10      100      Yes                        obl center
40      10      10        5         5       20       Yes                        obl center
50      40      20        12        12      6        Yes                        gorod
50      20      60        20        1       60       Yes                        obl center

Task 1

Are households more likely to grow some vegetables together? Check the relationships among all six harvested vegetables (potato, tomato, cucumber, beetroot, carrot, cabbage).

To perform this task, consider whether you need all the data or whether it is easier to select only the six variables involved in the task.
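As a hint, here is a minimal sketch of selecting columns by name in base R. The data frame `toy` is a hypothetical stand-in built from the first three rows of the preview above, and the names `toy`, `veg_names`, and `veg` are illustrative only; with the real data you would apply the same idea to `df`.

```r
# Toy data frame standing in for the real file
# (first three rows of the preview; not the full data set).
toy <- data.frame(
  potato   = c(100, 30, 100),
  tomato   = c(8, 35, 300),
  cucumber = c(50, 2, 200),
  beetroot = c(100, 2, 10),
  carrot   = c(20, 7, 10),
  cabbage  = c(70, 5, 100),
  bought_potatoes_last_week = c("Yes", "Yes", "Yes"),
  settlement_type = c("selo", "obl center", "obl center")
)

# Names of the six harvest variables
veg_names <- c("potato", "tomato", "cucumber", "beetroot", "carrot", "cabbage")

# Keep only those columns
veg <- toy[, veg_names]
str(veg)
```

The tidyverse equivalent would be `select()` with the same six names; either approach gives a smaller data frame that is easier to pass to correlation functions.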

First, assess the normality of all six variables. Are they normally distributed? Decide which correlation coefficient is more appropriate here, then compute a correlation matrix and report your results. Show a scatterplot for the strongest correlation in the data.
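The workflow could look roughly like the sketch below. It runs on simulated, right-skewed data rather than on the survey file, and the choices of the Shapiro-Wilk test and the Spearman coefficient are assumptions made for illustration, not the required answer. All object names (`veg`, `shapiro_p`, `rho`, `idx`) are hypothetical.

```r
set.seed(1)
n <- 200

# Simulated skewed harvests: potato and tomato share a common component,
# cucumber is generated independently (made-up data, not RLMS values).
common <- rlnorm(n, meanlog = 3, sdlog = 1)
veg <- data.frame(
  potato   = common * rlnorm(n, 0, 0.5),
  tomato   = common * rlnorm(n, 0, 0.5),
  cucumber = rlnorm(n, 3, 1)
)

# Normality check: Shapiro-Wilk p-value for each variable
shapiro_p <- sapply(veg, function(x) shapiro.test(x)$p.value)
print(shapiro_p)

# Rank-based correlation matrix, a common choice when normality is rejected
rho <- cor(veg, method = "spearman")
print(round(rho, 2))

# Locate the strongest off-diagonal correlation
off <- rho
diag(off) <- NA
idx <- which(abs(off) == max(abs(off), na.rm = TRUE), arr.ind = TRUE)[1, ]

# Scatterplot of that pair (log axes because the toy data are skewed)
plot(veg[[idx["col"]]], veg[[idx["row"]]], log = "xy",
     xlab = names(veg)[idx["col"]], ylab = names(veg)[idx["row"]])
```

Histograms or Q-Q plots (`hist()`, `qqnorm()`) are a useful visual complement to the formal normality test.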

Task 2

Now, let’s see whether households harvest more tomatoes or more cucumbers. Perform a test that shows which of the two crops respondents harvest more of. Make a plot, report the result, and estimate the effect size.
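Because both harvests are measured on the same household, a paired comparison is one option. The sketch below uses simulated data and a Wilcoxon signed-rank test; the effect size r = |Z| / sqrt(N) is recovered from the two-sided p-value, which is a common approximation and an assumption of this sketch. The variable names and the simulated values are hypothetical.

```r
set.seed(2)
n <- 100

# Simulated paired harvests for the same households (made-up data);
# cucumbers are generated to be larger on average.
tomato   <- rlnorm(n, 3, 1)
cucumber <- tomato * rlnorm(n, 0.3, 0.5)

# Plot both distributions side by side (log axis because of the skew)
boxplot(list(tomato = tomato, cucumber = cucumber), log = "y",
        ylab = "Harvest (toy units)")

# Paired Wilcoxon signed-rank test, normal approximation
res <- wilcox.test(tomato, cucumber, paired = TRUE, exact = FALSE)
print(res)

# Approximate effect size: r = |Z| / sqrt(N), with Z taken from the p-value
z <- qnorm(res$p.value / 2)
r_effect <- abs(z) / sqrt(n)
print(r_effect)
```

If the differences between the two harvests look roughly normal, a paired t-test (`t.test(..., paired = TRUE)`) would be the parametric alternative.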

Task 3

Did households that bought potatoes in the week before the survey have a smaller average potato harvest? Compare the average harvest between those who bought potatoes and those who did not.

Visualize the relationship, run the formal test(s), and interpret the results. If the test is statistically significant, report the effect size.
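One possible shape for such a two-group comparison is sketched below on simulated data. It uses Welch's t-test (the default of `t.test()`) and Cohen's d with a pooled standard deviation; both are illustrative choices, and the data frame `toy` with its column `bought` is hypothetical.

```r
set.seed(3)

# Simulated data: households that did not buy potatoes harvest more
# on average (made-up numbers, not survey results).
toy <- data.frame(
  bought = rep(c("No", "Yes"), each = 80),
  potato = c(rnorm(80, 300, 80), rnorm(80, 220, 80))
)

# Visualize the two groups
boxplot(potato ~ bought, data = toy, ylab = "Potato harvest (toy units)")

# Welch two-sample t-test
res <- t.test(potato ~ bought, data = toy)
print(res)

# Cohen's d with a pooled standard deviation
means <- tapply(toy$potato, toy$bought, mean)
sds   <- tapply(toy$potato, toy$bought, sd)
ns    <- tapply(toy$potato, toy$bought, length)
pooled_sd <- sqrt(sum((ns - 1) * sds^2) / (sum(ns) - 2))
cohens_d  <- unname((means["No"] - means["Yes"]) / pooled_sd)
print(cohens_d)
```

With strongly skewed harvests, a rank-based test such as `wilcox.test(potato ~ bought, data = toy)` may be the safer option; check the distributions before deciding.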

Task 4

Does a household’s potato harvest depend on where the household resides? Compare the potato harvest (potato) across settlement types (settlement_type).

Calculate the mean harvest by settlement type, visualize the differences in a plot, then run and interpret a formal test. If the test is significant, run pairwise comparisons between groups, calculate the effect size, and state your conclusions.
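The steps listed above could be sketched as follows, again on simulated data. One-way ANOVA, Tukey's HSD, and eta squared are assumed here for illustration; the group labels `type_a` to `type_d` are placeholders and not the labels used in the survey.

```r
set.seed(4)

# Simulated harvests for four placeholder settlement types (made-up data)
toy <- data.frame(
  settlement = factor(rep(c("type_a", "type_b", "type_c", "type_d"), each = 50)),
  potato = c(rnorm(50, 200, 60), rnorm(50, 210, 60),
             rnorm(50, 260, 60), rnorm(50, 330, 60))
)

# Mean harvest by settlement type
group_means <- aggregate(potato ~ settlement, data = toy, FUN = mean)
print(group_means)

# Visualize the groups
boxplot(potato ~ settlement, data = toy, ylab = "Potato harvest (toy units)")

# One-way ANOVA
fit <- aov(potato ~ settlement, data = toy)
tab <- summary(fit)[[1]]
p_value <- tab[["Pr(>F)"]][1]

# Effect size: eta squared = SS_between / SS_total
eta_sq <- tab[["Sum Sq"]][1] / sum(tab[["Sum Sq"]])

# Pairwise comparisons with Tukey's HSD
pairs <- TukeyHSD(fit)
print(pairs)
```

If the ANOVA assumptions do not hold, `kruskal.test()` followed by `pairwise.wilcox.test()` with a p-value adjustment is a rank-based alternative.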

Task 5

Lastly, are city households more likely to buy potatoes? Check whether households that bought and did not buy potatoes (bought_potatoes_last_week) are evenly distributed across the types of settlement (settlement_type). If the two variables are not independent, state in which settlement types households are more likely to buy potatoes.
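A test of independence for two categorical variables could look like the sketch below. The counts are invented and the settlement labels are placeholders; the chi-squared test and standardized residuals are one common approach, assumed here for illustration.

```r
# Invented contingency table: rows = bought potatoes, columns = settlement type
toy_counts <- matrix(
  c(30, 70,   # type_a: No, Yes
    50, 50,   # type_b: No, Yes
    75, 25),  # type_c: No, Yes
  nrow = 2,
  dimnames = list(bought = c("No", "Yes"),
                  settlement = c("type_a", "type_b", "type_c"))
)

# Chi-squared test of independence
res <- chisq.test(toy_counts)
print(res)

# Standardized residuals: values above about 2 mark cells with more
# households than expected under independence, below -2 fewer
print(round(res$stdres, 2))

# Share of buyers within each settlement type
print(round(prop.table(toy_counts, margin = 2), 2))
```

With the real data, the contingency table can be built with `table(df$bought_potatoes_last_week, df$settlement_type)` and passed to `chisq.test()` in the same way.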

The end.