---
title: "Problem Set 2 Part 2: R"
output: pdf_document
date: "2025-09-10"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This part of the problem set involves using R and submitting R code. Please fill in your R code and answers in the indicated code chunks/areas. When finished, please knit the document to PDF and submit the PDF file to Moodle.
In the case that your document does not knit to PDF, this is the time to debug and ask me questions before the review. You may try the following:
Before we begin, we will need to load the necessary packages:
```{r, message=FALSE}
library(tidyverse)
```
\newpage
When we talk about the dispersion of a data set, we are concerned with how spread out it is. Both the standard deviation and the IQR are measures of dispersion. However, they are not the only measures of dispersion.
One alternative statistic we can use is the mean absolute deviation. We can start from
$$ \bar x - x_i\,, $$
which is the deviation. Then the absolute deviation is
$$ |\bar x - x_i| $$
which takes the absolute value of $\bar x - x_i$. The mean absolute deviation is defined by taking the mean, which is
$$ \frac{\sum_{i=1}^n |\bar x - x_i|}{n}\,. $$
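As a quick worked illustration (using a small made-up data set, not one from this problem set), take the values 1, 2, 3, 6, so that $n = 4$ and $\bar x = 3$. The mean absolute deviation then works out as

$$
\frac{|3-1| + |3-2| + |3-3| + |3-6|}{4} = \frac{2 + 1 + 0 + 3}{4} = 1.5\,.
$$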
Suppose we have the following data set:
```{r}
x <- c(2, 0, 6, 28, 19, 6)
```
Write code to find the mean of x and save it in a
variable called mean_x.
```{r}
## Your code goes here
mean_x <- mean(x)
```
Find the vector $\bar x - \boldsymbol x$. (Note $\bar x$ is a scalar
number and $\boldsymbol x$ is a vector. In R, you can subtract a scalar
from a vector directly. Try x-1 and see what happens).
```{r}
mean_x - x
```
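The scalar-vector arithmetic mentioned above can be sketched on a small made-up vector (the name `v` is hypothetical and not part of the problem set). R recycles the scalar across every element, whichever side of the subtraction it is on:

```{r}
# Hypothetical toy vector, used only for illustration
v <- c(10, 20, 30)

v - 1        # subtracts 1 from each element: 9 19 29
mean(v) - v  # scalar minus vector also works elementwise: 10 0 -10
```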
Use your code from the previous question and take the absolute value
to obtain $|\bar x - \boldsymbol x|$. In R, you can find the absolute
value using the abs() function.
```{r}
abs(mean_x - x)
```
Use your code from the previous question to find
$$ \frac{\sum_{i=1}^n |\bar x - x_i|}{n}\,. $$
Note that the form $\frac{\sum_i y_i}{n}$ is simply the
arithmetic mean, so you may consider using the mean()
function.
```{r}
mean(abs(mean_x - x))
```
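One way to package these steps for reuse, as a minimal sketch: the helper name `mean_abs_dev` is hypothetical (it is not a base R function). Note that base R's `mad()` computes the median absolute deviation, which is a different statistic.

```{r}
# Hypothetical helper: mean absolute deviation around the mean
mean_abs_dev <- function(v) {
  mean(abs(mean(v) - v))
}

# Toy check on made-up values: the mean is 3, absolute deviations are 2, 1, 0, 3
mean_abs_dev(c(1, 2, 3, 6))  # 1.5
```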
In the datasets folder on Moodle, download a dataset named
food_access.csv. Load the dataset into R and name it
food.
If you are using Jupyter Hub, put the dataset in the same folder as
this notebook so that you can load the dataset directly by its name
without using file.choose(). If you need help with the
read_csv() function, use the ?read_csv
command.
```{r}
## Your code goes here
library(tidyverse)
food <- read.csv("food_access.csv")
```
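For reference, a minimal sketch of how `read_csv()` behaves, using a small temporary file so that it runs on its own. The file contents and the names `tmp_path` and `toy` are made up for illustration; `show_col_types = FALSE` only silences the column-type message.

```{r}
library(readr)

# Write a tiny made-up CSV to a temporary file
tmp_path <- tempfile(fileext = ".csv")
writeLines(c("State,Count", "North Carolina,3", "Ohio,5"), tmp_path)

# read_csv() returns a tibble; the first row of the file becomes the column names
toy <- read_csv(tmp_path, show_col_types = FALSE)
toy$Count  # 3 5
```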
Filter the data using the State variable to include only
observations from North Carolina. Name the output dataset
food again.
```{r}
food <- food %>% filter(State == "North Carolina")
```
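The filtering step can be sketched on a made-up data frame (the names `toy_df`, `nc_only`, and `Count` are hypothetical). `filter()` keeps the rows for which the condition is `TRUE`, and `$` then pulls a single column out as a vector:

```{r}
library(dplyr)

# Made-up data frame for illustration only
toy_df <- data.frame(
  State = c("North Carolina", "Ohio", "North Carolina"),
  Count = c(3, 5, 7)
)

# Keep only rows where State equals "North Carolina" (== tests equality)
nc_only <- toy_df %>% filter(State == "North Carolina")
nc_only$Count  # 3 7
```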
Use your code from Question 4 to find the mean
absolute deviation for the NUMGQTRS variable. (Note: To
extract a column as a vector, use the df$column
syntax.)
```{r}
mean(abs(mean(food$NUMGQTRS) - food$NUMGQTRS))
```
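If a column contains missing values, `mean()` returns `NA` unless `na.rm = TRUE` is supplied. Whether NUMGQTRS has any missing values is not stated here, so the following is only a sketch on a made-up vector (the name `w` is hypothetical):

```{r}
# Made-up vector with one missing value
w <- c(4, NA, 8)

mean(w)                # NA
mean(w, na.rm = TRUE)  # 6

# Mean absolute deviation that ignores the missing value
mean(abs(mean(w, na.rm = TRUE) - w), na.rm = TRUE)  # 2
```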
Use the data dictionary for the dataset (named
food_access_dictionary.csv on Moodle) to determine the
meaning of the NUMGQTRS variable.
Your answer:
The number of people residing in group quarters in each county of North Carolina.