Dear students,
For the following tasks we will use household data adapted from the Russian Longitudinal Monitoring Survey (RLMS-HSE). The full dataset can be downloaded from https://www.hse.ru/rlms/spss
The data set “potatoes.csv” (in the Data folder) contains data on household crop harvests: potatoes, tomatoes, cucumbers, beetroots, carrots, and cabbages. It also includes a variable indicating whether the household bought any potatoes in the week before the survey. The last variable shows whether the household is located in a village, town, city, or regional capital.
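# Load the packages used below
library(tidyverse)
library(sjPlot)

# Read the data (adjust the path to where potatoes.csv is stored on your machine)
df <- read.csv("/Users/olesyavolchenko/Downloads/potatoes.csv")
# Drop the first column
df <- df[ ,-1]
# Show a codebook-style overview of the variables
view_df(df)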
ID | Name | Label | Values | Value Labels |
---|---|---|---|---|
1 | potato | | range: 0.5-1250.0 | |
2 | tomato | | range: 0.5-1200.0 | |
3 | cucumber | | range: 0.5-1200.0 | |
4 | beetroot | | range: 0.5-2048.0 | |
5 | carrot | | range: 0.5-1030.0 | |
6 | cabbage | | range: 1.0-20000.0 | |
7 | bought_potatoes_last_week | | <output omitted> | |
8 | settlement_type | | <output omitted> | |
df %>%
head() %>%
knitr::kable()
potato | tomato | cucumber | beetroot | carrot | cabbage | bought_potatoes_last_week | settlement_type |
---|---|---|---|---|---|---|---|
100 | 8 | 50 | 100 | 20 | 70 | Yes | selo |
30 | 35 | 2 | 2 | 7 | 5 | Yes | obl center |
100 | 300 | 200 | 10 | 10 | 100 | Yes | obl center |
40 | 10 | 10 | 5 | 5 | 20 | Yes | obl center |
50 | 40 | 20 | 12 | 12 | 6 | Yes | gorod |
50 | 20 | 60 | 20 | 1 | 60 | Yes | obl center |
Are households more likely to grow some vegetables together? Check the relationships among all six harvested vegetables (potato, tomato, cucumber, beetroot, carrot, cabbage).
To perform this task, consider whether you need all the data or whether it would be easier to select only the six variables involved.
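Selecting the six harvest columns can be sketched as follows. This is a minimal illustration, not the required approach: the data frame `toy` is a hypothetical stand-in that copies only the first two rows printed above, and `veg_cols` and `veg` are names introduced here for the example.

```r
# Hypothetical stand-in for df: the first two rows shown in the table above.
toy <- data.frame(
  potato = c(100, 30), tomato = c(8, 35), cucumber = c(50, 2),
  beetroot = c(100, 2), carrot = c(20, 7), cabbage = c(70, 5),
  bought_potatoes_last_week = c("Yes", "Yes"),
  settlement_type = c("selo", "obl center")
)

# Keep only the six harvest variables by name.
veg_cols <- c("potato", "tomato", "cucumber", "beetroot", "carrot", "cabbage")
veg <- toy[, veg_cols]

str(veg)
```

Selecting by name rather than by position keeps the code valid if the column order in the file changes.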
First, assess the normality of all six variables: are they normally distributed? Decide which correlation coefficient is more appropriate here, then compute a correlation matrix and report your results. Show a scatterplot of the strongest correlation in the data.
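One possible workflow for this step is sketched below, using base R only. It assumes simulated, right-skewed toy data (three columns instead of six, and not the RLMS values), and it assumes that a rank-based coefficient is chosen. Whether that choice fits the real data is for you to decide from your own normality checks.

```r
# Toy data only: simulated right-skewed harvests, NOT the values in potatoes.csv.
set.seed(1)
n <- 200
base <- rlnorm(n, meanlog = 3, sdlog = 1)
toy <- data.frame(
  potato   = base * rlnorm(n, 0, 0.5),
  tomato   = base * rlnorm(n, 0, 0.8),
  cucumber = base * rlnorm(n, 0, 0.8)
)

# Shapiro-Wilk test per column; small p-values suggest a departure from normality.
normality_p <- sapply(toy, function(x) shapiro.test(x)$p.value)

# Spearman (rank-based) correlation matrix, which does not assume normality.
rho <- cor(toy, method = "spearman", use = "pairwise.complete.obs")

# Find the strongest off-diagonal correlation and plot that pair.
off_diag <- rho
diag(off_diag) <- NA
strongest <- which(abs(off_diag) == max(abs(off_diag), na.rm = TRUE),
                   arr.ind = TRUE)[1, ]
x_name <- rownames(rho)[strongest["row"]]
y_name <- colnames(rho)[strongest["col"]]
plot(toy[[x_name]], toy[[y_name]], log = "xy", xlab = x_name, ylab = y_name)

print(normality_p)
print(round(rho, 2))
```

Histograms or Q-Q plots (`hist()`, `qqnorm()`) are a useful visual complement to the formal test, especially in large samples where even small deviations become significant.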
Now, let’s see whether households harvest more tomatoes or more cucumbers. Perform a test that shows which of the two is harvested in larger amounts. Make a plot, report the result, and report the effect size.
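Because both harvests are reported by the same household, a paired comparison is one natural option. The sketch below assumes simulated toy data and assumes a Wilcoxon signed-rank test is appropriate. With approximately normal differences, a paired t-test (`t.test(..., paired = TRUE)`) would be the parametric counterpart. The effect size shown is the common approximation r = |Z| / sqrt(N), with Z recovered from the two-sided p-value.

```r
# Toy data only: simulated harvests for the same 150 hypothetical households.
set.seed(2)
n <- 150
tomato   <- rlnorm(n, meanlog = 3.2, sdlog = 1)
cucumber <- rlnorm(n, meanlog = 3.0, sdlog = 1)

# Side-by-side boxplots on a log scale, since the values are right-skewed.
boxplot(list(tomato = tomato, cucumber = cucumber), log = "y", ylab = "harvest")

# Paired Wilcoxon signed-rank test on the within-household differences.
wt <- wilcox.test(tomato, cucumber, paired = TRUE)

# Direction of the difference: median of (tomato - cucumber).
median_diff <- median(tomato - cucumber)

# Approximate effect size r = |Z| / sqrt(N), with Z taken from the p-value.
z <- qnorm(wt$p.value / 2)
r_effect <- abs(z) / sqrt(n)

print(wt)
print(c(median_diff = median_diff, r = r_effect))
```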
Did households that bought potatoes in the week before the survey have a smaller average potato harvest? Compare the average harvest between those who bought and those who did not.
Visualize the relationship, run the formal test(s), and interpret the results. If the test is statistically significant, report the effect size.
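A comparison of two independent groups might look like the sketch below. It assumes simulated toy data, a hypothetical column name `bought`, and a non-parametric Wilcoxon rank-sum (Mann-Whitney) test. If your own checks support normality, `t.test(potato ~ bought, data = ...)` would be the parametric alternative.

```r
# Toy data only: 80 hypothetical buyers and 120 hypothetical non-buyers.
set.seed(3)
toy <- data.frame(
  potato = c(rlnorm(80, meanlog = 4.0, sdlog = 1),
             rlnorm(120, meanlog = 4.5, sdlog = 1)),
  bought = factor(rep(c("Yes", "No"), times = c(80, 120)))
)

# Visualise the two distributions on a log scale.
boxplot(potato ~ bought, data = toy, log = "y", ylab = "potato harvest")

# Descriptive summary per group.
group_medians <- tapply(toy$potato, toy$bought, median)

# Wilcoxon rank-sum test for two independent groups.
mw <- wilcox.test(potato ~ bought, data = toy)

# Approximate effect size r = |Z| / sqrt(N), with Z taken from the p-value.
r_effect <- abs(qnorm(mw$p.value / 2)) / sqrt(nrow(toy))

print(group_medians)
print(mw)
print(r_effect)
```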
Does a household’s potato harvest depend on where the household resides? Compare the potato harvest (potato) by where the household lives (settlement_type).
Calculate the mean harvest by settlement type, visualise the difference in a plot, then calculate and interpret a formal test. If the test is significant, run pairwise comparisons between groups, calculate the effect size, and make your conclusions.
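The sequence of steps (group means, plot, omnibus test, pairwise comparisons, effect size) can be sketched as below. This assumes simulated toy data with only the three settlement labels visible in the printed rows, and it assumes the non-parametric route: a Kruskal-Wallis test with Holm-adjusted pairwise Wilcoxon tests. The effect size is the H-based eta squared, (H - k + 1) / (n - k). If the ANOVA assumptions hold in the real data, `aov()` followed by `TukeyHSD()` would be the parametric counterpart.

```r
# Toy data only: 60 hypothetical households per settlement type.
set.seed(4)
toy <- data.frame(
  potato = c(rlnorm(60, meanlog = 5.0, sdlog = 1),
             rlnorm(60, meanlog = 4.3, sdlog = 1),
             rlnorm(60, meanlog = 4.0, sdlog = 1)),
  settlement_type = factor(rep(c("selo", "gorod", "obl center"), each = 60))
)

# Mean harvest by settlement type.
group_means <- aggregate(potato ~ settlement_type, data = toy, FUN = mean)

# Visualise the groups on a log scale.
boxplot(potato ~ settlement_type, data = toy, log = "y", ylab = "potato harvest")

# Omnibus test: Kruskal-Wallis rank sum test.
kw <- kruskal.test(potato ~ settlement_type, data = toy)

# Pairwise comparisons with Holm correction (run only if the omnibus test is significant).
pw <- pairwise.wilcox.test(toy$potato, toy$settlement_type,
                           p.adjust.method = "holm")

# Effect size: eta squared based on the H statistic.
k <- nlevels(toy$settlement_type)
eta2_h <- (unname(kw$statistic) - k + 1) / (nrow(toy) - k)

print(group_means)
print(kw)
print(pw)
print(eta2_h)
```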
Lastly, are city households more likely to buy potatoes? Check whether households that bought and did not buy potatoes (bought_potatoes_last_week) are evenly spread across the types of settlement (settlement_type). If the two variables are not independent, name the settlement types where households are more likely to buy potatoes.
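For two categorical variables, a contingency table with a chi-squared test of independence is one common approach. The sketch below assumes simulated toy data. The purchase probabilities in `p_buy` are invented for illustration and say nothing about the real survey. Standardised residuals indicate which cells deviate from independence, and Cramér's V is used here as the effect size.

```r
# Toy data only: 100 hypothetical households per settlement type.
set.seed(5)
settlement <- factor(rep(c("selo", "gorod", "obl center"), times = c(100, 100, 100)))

# Invented purchase probabilities, for illustration only.
p_buy <- c("selo" = 0.2, "gorod" = 0.5, "obl center" = 0.6)
bought <- factor(ifelse(runif(300) < p_buy[as.character(settlement)], "Yes", "No"))

# Contingency table and the share of buyers within each settlement type.
tab <- table(settlement, bought)
row_shares <- prop.table(tab, margin = 1)

# Chi-squared test of independence.
chi <- chisq.test(tab)

# Standardised residuals: values beyond roughly +/-2 mark cells that
# occur more (positive) or less (negative) often than expected.
std_res <- chi$stdres

# Cramér's V as the effect size.
cramers_v <- sqrt(unname(chi$statistic) / (sum(tab) * (min(dim(tab)) - 1)))

print(round(row_shares, 2))
print(chi)
print(round(std_res, 2))
print(cramers_v)
```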