What are the mean and standard deviation of the quantity of pasta purchased per time unit per household?

q1 = c(mean(df$PASTA), sd(df$PASTA))
q1
## [1] 1.841545 1.025911

In which areas are (i) the poorest household and (ii) the wealthiest household located?

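# Sort rows by income (ascending): poorest household first, wealthiest last
order.scores = order(df$INCOME)
df = df[order.scores,]
# rank() averages ties, so the repeated rows of one household share a fractional rank
df$rank = rank(df$INCOME)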
head(df, n = 1)
##       HHID TIME    PASTA EXPOS      AGE   INCOME AREA rank
## 35241 1763    1 2.332987     0 32.21623 609.1072    2 10.5
tail(df, n = 1)
##       HHID TIME    PASTA EXPOS      AGE INCOME AREA    rank
## 19240  962   20 3.201276     0 28.54533 141067    5 39990.5
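
In the output above, the AREA column of the first and last rows gives the answer (area 2 and area 5). As an alternative to sorting and ranking, the two rows can be picked out directly. The sketch below is only an illustration on a small invented data frame (`toy` and its values are hypothetical; only the column names are borrowed from `df`), using `which.min()` and `which.max()`:

```r
# Toy data with invented values; column names mirror those of df
toy = data.frame(
  HHID   = c(1, 1, 2, 2, 3, 3),
  INCOME = c(5000, 5000, 900, 900, 72000, 72000),
  AREA   = c(4, 4, 2, 2, 5, 5)
)

# which.min()/which.max() return the index of the first minimum/maximum
poorest_area    = toy$AREA[which.min(toy$INCOME)]
wealthiest_area = toy$AREA[which.max(toy$INCOME)]

poorest_area     # area of the lowest-income household
wealthiest_area  # area of the highest-income household
```

This avoids reordering the data frame and adding a `rank` column, which may be preferable when later steps rely on the original row order.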

What is the maximum total quantity of pasta a single household has bought over the whole time period? (Sum the quantity of pasta by household over time and report the maximum.)

library(dplyr)  # provides %>%, group_by(), summarize(), filter(), distinct()
q3 = df %>% group_by(HHID) %>% summarize(pasta_qty = sum(PASTA))

max(q3$pasta_qty)
## [1] 55.36193

What is the average income of households living in area 4?

q4 = subset(df, AREA == 4, select = c(HHID, INCOME))
q41 = distinct(q4, HHID, .keep_all = TRUE)
mean(q41$INCOME)
## [1] 29260.13
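
The `distinct()` step keeps one row per household before averaging. This matters because `df` holds several rows per household, so a mean over all rows would weight each household by its number of rows. A minimal sketch on invented data (the `toy` data frame is hypothetical, and deliberately unbalanced) shows how the two averages can differ:

```r
library(dplyr)

# Toy data with invented values: household 1 has three rows, household 2 has one
toy = data.frame(
  HHID   = c(1, 1, 1, 2),
  INCOME = c(1000, 1000, 1000, 5000),
  AREA   = c(4, 4, 4, 4)
)

# Mean over all rows: household 1 is counted three times
row_mean = mean(toy$INCOME)

# Mean over households: keep the first row of each HHID, then average
hh = distinct(toy, HHID, .keep_all = TRUE)
hh_mean = mean(hh$INCOME)

c(row_mean, hh_mean)
```

When every household has the same number of rows, the two means coincide, but deduplicating first makes the per-household intent explicit.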

How many households live in area 2, earn more than 20k, and have purchased more than 30 units of pasta over the whole time period?

q5 = df %>% filter(AREA == 2, INCOME > 20000) %>%
      group_by(HHID) %>%
      summarize(total_qty = sum(PASTA)) %>%
      filter(total_qty > 30)
nrow(q5)
## [1] 218

What is the correlation between the purchases of pasta and the exposures?

cor(df$PASTA, df$EXPOS)
## [1] 0.3266174
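
By default, `cor()` computes the Pearson correlation, which is the covariance of the two variables divided by the product of their standard deviations. The sketch below checks this identity on two short invented vectors (`x` and `y` are hypothetical stand-ins for EXPOS and PASTA, not values from `df`):

```r
# Invented values standing in for exposures (x) and pasta purchases (y)
x = c(0, 1, 0, 2, 1, 3)
y = c(1.2, 2.0, 0.9, 3.1, 1.8, 3.9)

r_builtin = cor(x, y)                      # Pearson correlation (default method)
r_manual  = cov(x, y) / (sd(x) * sd(y))    # covariance over product of standard deviations

# Compare with a numerical tolerance rather than exact equality
all.equal(r_builtin, r_manual)
```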

Which of the following graphs reports the correct histogram by household of the total purchase of pasta made by the household over the whole period? (Sum the purchases by household and make a histogram.)

Note that the color or exact representation may be different in your version.

q7 = df %>% group_by(HHID) %>%
      summarize(total = sum(PASTA))
# summarize() already returns one row per HHID, so no distinct() step is needed
hist(q7$total, ylim = c(0,700), main = "Total purchase of pasta by Household", xlab = "Purchases", ylab = "Frequencies", col = "blue")

Which of the following graphs reports the correct time series of the overall total purchase of pasta? (Sum the purchases by time units and plot the quantity by time unit.)

q8 = df %>% group_by(TIME) %>% summarize(total_qty = sum(PASTA))
# Pass TIME explicitly as x so the axis shows the time unit rather than the row index
plot(q8$TIME, q8$total_qty, main = "Time series of overall total purchase of pasta", col = "blue", xlab = "Time", ylab = "Quantity")