For the following tasks we will use household data adapted from the Russian Longitudinal Monitoring Survey (RLMS-HSE). The full dataset can be downloaded from https://www.hse.ru/rlms/spss

The data set “potatoes.csv” (in the Data folder) contains data on households’ crop harvests: potatoes, tomatoes, cucumbers, beetroots, carrots, and cabbages. There is also a variable indicating whether the household bought any potatoes in the week before the survey. The last variable shows whether the household is located in a village, town, city, or regional capital.

Problem Set

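# Load the packages first: tidyverse provides %>%, sjPlot provides view_df()
library(tidyverse)
library(sjPlot)
# Adjust the path to wherever potatoes.csv is stored on your machine
df <- read.csv("/Users/olesyavolchenko/Downloads/potatoes.csv")
df <- df[ , -1]  # drop the first column
view_df(df)      # overview of variable names and value ranges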

Data frame: df
ID	Name	Values	Value Labels
1	potato	range: 0.5-1250.0
2	tomato	range: 0.5-1200.0
3	cucumber	range: 0.5-1200.0
4	beetroot	range: 0.5-2048.0
5	carrot	range: 0.5-1030.0
6	cabbage	range: 1.0-20000.0
7	bought_potatoes_last_week		<output omitted>
8	settlement_type		<output omitted>

 df %>% 
   head() %>% 
   knitr::kable()

potato	tomato	cucumber	beetroot	carrot	cabbage	bought_potatoes_last_week	settlement_type
100	8	50	100	20	70	Yes	selo
30	35	2	2	7	5	Yes	obl center
100	300	200	10	10	100	Yes	obl center
40	10	10	5	5	20	Yes	obl center
50	40	20	12	12	6	Yes	gorod
50	20	60	20	1	60	Yes	obl center

Task 1

Are households more likely to grow some vegetables together? Check the relationships among all six harvested vegetables (potato, tomato, cucumber, beetroot, carrot, cabbage).

Before you start, consider whether you need all the data or whether it is easier to select only the six variables involved in the task.
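One way to keep only the six harvest variables is sketched below. The `toy` data frame is a hypothetical stand-in with made-up values; only the column names follow the handout, and the "No" level and the set of settlement labels are assumptions.

```r
# Hypothetical stand-in for df: column names as in the handout, values made up
set.seed(1)
toy <- data.frame(
  potato   = rlnorm(20, 4), tomato = rlnorm(20, 3), cucumber = rlnorm(20, 3),
  beetroot = rlnorm(20, 2), carrot = rlnorm(20, 2), cabbage  = rlnorm(20, 3),
  bought_potatoes_last_week = sample(c("Yes", "No"), 20, replace = TRUE),
  settlement_type = sample(c("selo", "gorod", "obl center"), 20, replace = TRUE)
)

# Keep only the six harvest columns by name
veg_cols <- c("potato", "tomato", "cucumber", "beetroot", "carrot", "cabbage")
veg <- toy[, veg_cols]
str(veg)
```

Selecting by name rather than by position keeps the code valid if the column order changes.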

First, assess the normality of all six variables. Are they normally distributed? Decide which correlation coefficient is more appropriate here, then compute a correlation matrix and report your results. Show a scatterplot of the strongest correlation in the data.
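A possible workflow for this task is sketched below on simulated data, not on the real file. It assumes a Shapiro-Wilk test for normality and, because the simulated harvests are skewed, a Spearman rank correlation; the `veg` data frame and the shared `garden` factor are purely illustrative.

```r
# Simulated, right-skewed harvests sharing a common factor (illustrative only)
set.seed(42)
n <- 200
garden <- rlnorm(n, meanlog = 3, sdlog = 1)
veg <- data.frame(
  potato   = garden * rlnorm(n, 1.0, 0.5),
  tomato   = garden * rlnorm(n, 0.0, 0.5),
  cucumber = garden * rlnorm(n, 0.0, 0.5),
  beetroot = garden * rlnorm(n, -1.0, 0.5),
  carrot   = garden * rlnorm(n, -1.0, 0.5),
  cabbage  = garden * rlnorm(n, 0.0, 0.5)
)

# Shapiro-Wilk test per variable: small p-values indicate non-normality
shapiro_p <- sapply(veg, function(x) shapiro.test(x)$p.value)
print(signif(shapiro_p, 3))

# Spearman rank correlation does not assume normally distributed variables
rho <- cor(veg, method = "spearman", use = "pairwise.complete.obs")
print(round(rho, 2))

# Find the strongest off-diagonal correlation
rho_off <- rho
diag(rho_off) <- NA
idx <- which(abs(rho_off) == max(abs(rho_off), na.rm = TRUE), arr.ind = TRUE)[1, ]
x_var <- colnames(rho)[idx["col"]]
y_var <- rownames(rho)[idx["row"]]

# Scatterplot of that pair; log axes help with the skewed, positive values
plot(veg[[x_var]], veg[[y_var]], log = "xy", xlab = x_var, ylab = y_var,
     main = paste("Spearman rho =", round(rho[y_var, x_var], 2)))
```

Histograms or Q-Q plots (`hist()`, `qqnorm()`) are a useful visual complement to the formal normality test.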

Task 2

Now, let’s see whether people are growing more tomatoes or cucumbers. Perform a test that shows whether respondents harvest more tomatoes or more cucumbers. Make a plot, report the result, and estimate the effect size.
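Since both harvests are reported by the same household, one reasonable reading is a paired comparison. The sketch below assumes a Wilcoxon signed-rank test (chosen because the simulated data are skewed) and the effect size r = |Z| / sqrt(N), with N taken as the number of pairs. The data are simulated, and this is not necessarily the intended solution.

```r
# Simulated paired harvests for the same households (illustrative only)
set.seed(7)
n <- 150
tomato   <- rlnorm(n, meanlog = 3.2, sdlog = 0.8)
cucumber <- rlnorm(n, meanlog = 3.0, sdlog = 0.8)

# Side-by-side boxplots on a log scale
boxplot(list(tomato = tomato, cucumber = cucumber), log = "y",
        ylab = "Harvest (units as in the data)")

# Paired Wilcoxon signed-rank test
wt <- wilcox.test(tomato, cucumber, paired = TRUE)
print(wt)

# Effect size r = |Z| / sqrt(N); Z is recovered from the two-sided p-value
z <- qnorm(wt$p.value / 2)
r_effect <- abs(z) / sqrt(n)
print(round(r_effect, 2))
```

A common rule of thumb treats r around 0.1 as small, 0.3 as medium, and 0.5 as large.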

Task 3

Did households that bought potatoes in the week before the survey have a smaller average potato harvest? Compare the average harvest between those who bought potatoes and those who did not.

Visualize the relationship, run the formal test(s), and interpret the results. If the test is statistically significant, report the effect size.
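The comparison of two independent groups might look like the sketch below. It runs both a Welch t-test and a Wilcoxon rank-sum test on simulated data; the group sizes, the "No" level, and the choice of r = |Z| / sqrt(N) as the effect size are assumptions made for illustration.

```r
# Simulated potato harvests for buyers and non-buyers (illustrative only)
set.seed(11)
toy <- data.frame(
  potato = c(rlnorm(80, 4.5, 0.9), rlnorm(120, 5.0, 0.9)),
  bought_potatoes_last_week = rep(c("Yes", "No"), times = c(80, 120))
)

# Group means and a boxplot on a log scale
group_means <- aggregate(potato ~ bought_potatoes_last_week, data = toy, FUN = mean)
print(group_means)
boxplot(potato ~ bought_potatoes_last_week, data = toy, log = "y",
        xlab = "Bought potatoes last week", ylab = "Potato harvest")

# Welch t-test (unequal variances) and its rank-based alternative
welch <- t.test(potato ~ bought_potatoes_last_week, data = toy)
mw <- wilcox.test(potato ~ bought_potatoes_last_week, data = toy)
print(welch)
print(mw)

# Effect size for the rank-sum test: r = |Z| / sqrt(N)
r_effect <- abs(qnorm(mw$p.value / 2)) / sqrt(nrow(toy))
print(round(r_effect, 2))
```

Which of the two tests to report depends on the normality check for the real data.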

Task 4

Does a household’s potato harvest depend on where the household resides? Compare the potato harvest (potato) by where the household lives (settlement_type).

Calculate the mean harvest by settlement type, visualize the difference in a plot, then run and interpret a formal test. If the test is significant, run pairwise comparisons between groups, calculate the effect size, and state your conclusions.
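A sketch of these steps on simulated data follows. It assumes a Kruskal-Wallis test with Holm-adjusted pairwise Wilcoxon tests and the H-based eta squared, (H - k + 1) / (n - k), as the effect size. The three settlement labels are the ones visible in the data preview; the real variable may have more levels.

```r
# Simulated potato harvests in three settlement types (illustrative only)
set.seed(3)
toy <- data.frame(
  potato = c(rlnorm(60, 5.2, 0.8), rlnorm(60, 4.6, 0.8), rlnorm(60, 4.4, 0.8)),
  settlement_type = rep(c("selo", "gorod", "obl center"), each = 60)
)

# Mean harvest by settlement type and a boxplot on a log scale
group_means <- aggregate(potato ~ settlement_type, data = toy, FUN = mean)
print(group_means)
boxplot(potato ~ settlement_type, data = toy, log = "y",
        xlab = "Settlement type", ylab = "Potato harvest")

# Omnibus test across all groups
kw <- kruskal.test(potato ~ settlement_type, data = toy)
print(kw)

# Pairwise comparisons with Holm correction for multiple testing
pw <- pairwise.wilcox.test(toy$potato, toy$settlement_type,
                           p.adjust.method = "holm")
print(pw)

# Effect size: eta squared based on the Kruskal-Wallis H statistic
H <- unname(kw$statistic)
k <- length(unique(toy$settlement_type))
n <- nrow(toy)
eta2_h <- (H - k + 1) / (n - k)
print(round(eta2_h, 3))
```

If the assumptions of a one-way ANOVA hold for the real data, `aov()` with `TukeyHSD()` is the parametric counterpart of this workflow.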

Task 5

Lastly, are city households more likely to buy potatoes? Check whether households that bought and did not buy potatoes (bought_potatoes_last_week) are evenly distributed across the types of settlements (settlement_type). If the two variables are not independent, identify the settlement types where households are more likely to buy potatoes.
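For two categorical variables, a chi-squared test of independence on the contingency table is a natural candidate. The sketch below uses simulated data with made-up purchase probabilities; the standardized residuals show which cells deviate most from what independence would predict.

```r
# Simulated settlement types and purchases (probabilities are made up)
set.seed(5)
settlement <- sample(c("selo", "gorod", "obl center"), 300, replace = TRUE)
p_buy <- c("selo" = 0.2, "gorod" = 0.5, "obl center" = 0.6)
bought <- ifelse(runif(300) < p_buy[settlement], "Yes", "No")

# Contingency table and row proportions (share of buyers per settlement type)
tab <- table(settlement_type = settlement, bought_potatoes_last_week = bought)
print(tab)
print(round(prop.table(tab, margin = 1), 2))

# Chi-squared test of independence
chi <- chisq.test(tab)
print(chi)

# Standardized residuals: values beyond about +/-2 mark cells that
# contribute most to the departure from independence
print(round(chi$stdres, 2))
```

A positive residual in the "Yes" column of a given row means that households in that settlement type bought potatoes more often than expected under independence.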

Practice Session 5

Olesya Volchenko, Anna Shirokanova