The United Nations Food and Agriculture Organization publication The State of Food Security and Nutrition in the World 2022 (https://www.fao.org/documents/card/en/c/cc0639en) might lead one to conclude that hunger is an "elsewhere" problem: that the people suffering from malnutrition and starvation are somewhere else, not in our backyard. For this assignment you will need to take a closer look here at home (the US).
Notes:
You will need to locate and source data that reflects food security and nutrition by state, broken down by men, women, and children and by age group.
Your analysis should demonstrate correlations that exist between level of poverty and food insecurity, malnutrition and starvation.
Your data and analysis should also indicate what happens to the children as they mature into adults. Will they become fully functional citizens or will they require continued support?
Your data visualizations need to tell the story for a political audience that you are lobbying to address the issue of food insecurity in the US.
Loading household dataset
#Loading data from CSV file
data <- read.csv("C:/Users/aleja/Desktop/food-data.csv")
#View of the first few rows of dataset
head(data)
## Year Category Subcategory Sub.subcategory
## 1 2001 All households
## 2 2001 Household composition With children < 18 years
## 3 2001 Household composition With children < 18 years With children < 6 years
## 4 2001 Household composition With children < 18 years Married-couple families
## 5 2001 Household composition With children < 18 years Female head, no spouse
## 6 2001 Household composition With children < 18 years Male head, no spouse
## Total Food.secure.1.000 Food.secure.percent Food.insecure.1.000
## 1 107824 96303 89.3 11521
## 2 38330 32141 83.9 6189
## 3 16858 13920 82.6 2938
## 4 26182 23389 89.3 2793
## 5 9080 6185 68.1 2895
## 6 2389 2009 84.1 380
## Food.insecure.percent Low.food.security.1.000 Low.food.security.percent
## 1 10.7 8010 7.4
## 2 16.1 4744 12.4
## 3 17.4 2304 13.7
## 4 10.7 2247 8.6
## 5 31.9 2101 23.1
## 6 15.9 298 12.5
## Very.low.food.security.1.000 Very.low.food.security.percent
## 1 3511 3.3
## 2 1445 3.8
## 3 634 3.8
## 4 546 2.1
## 5 794 8.7
## 6 82 3.4
## 'data.frame': 660 obs. of 13 variables:
## $ Year : int 2001 2001 2001 2001 2001 2001 2001 2001 2001 2001 ...
## $ Category : chr "All households" "Household composition" "Household composition" "Household composition" ...
## $ Subcategory : chr "" "With children < 18 years" "With children < 18 years" "With children < 18 years" ...
## $ Sub.subcategory : chr "" "" "With children < 6 years" "Married-couple families" ...
## $ Total : int 107824 38330 16858 26182 9080 2389 678 69495 40791 16513 ...
## $ Food.secure.1.000 : int 96303 32141 13920 23389 6185 2009 555 64163 38328 14915 ...
## $ Food.secure.percent : num 89.3 83.9 82.6 89.3 68.1 84.1 81.9 92.3 94 90.3 ...
## $ Food.insecure.1.000 : int 11521 6189 2938 2793 2895 380 123 5332 2463 1598 ...
## $ Food.insecure.percent : num 10.7 16.1 17.4 10.7 31.9 15.9 18.1 7.7 6 9.7 ...
## $ Low.food.security.1.000 : chr "8010" "4744" "2304" "2247" ...
## $ Low.food.security.percent : num 7.4 12.4 13.7 8.6 23.1 12.5 14.6 4.7 3.9 5.8 ...
## $ Very.low.food.security.1.000 : chr "3511" "1445" "634" "546" ...
## $ Very.low.food.security.percent: num 3.3 3.8 3.8 2.1 8.7 3.4 3.5 3 2.1 3.9 ...
Loading state dataset
#Loading data from CSV file
state <- read.csv("C:/Users/aleja/Desktop/food by state.csv")
#View of the first few rows of dataset
head(state)
## Year State Food.insecurity.prevalence
## 1 2006–2008 U.S. total 12.2
## 2 2006–2008 AK 11.6
## 3 2006–2008 AL 13.3
## 4 2006–2008 AR 15.9
## 5 2006–2008 AZ 13.2
## 6 2006–2008 CA 12.0
## Food.insecurity.margin.of.error Very.low.food.security.prevalence
## 1 0.25 4.6
## 2 1.66 4.4
## 3 1.66 5.4
## 4 3.19 5.6
## 5 1.51 4.9
## 6 0.74 4.3
## Very.low.food.security.margin.of.error
## 1 0.18
## 2 1.31
## 3 1.02
## 4 1.50
## 5 0.84
## 6 0.48
## 'data.frame': 780 obs. of 6 variables:
## $ Year : chr "2006–2008" "2006–2008" "2006–2008" "2006–2008" ...
## $ State : chr "U.S. total" "AK" "AL" "AR" ...
## $ Food.insecurity.prevalence : num 12.2 11.6 13.3 15.9 13.2 12 11.6 11 12.4 9.4 ...
## $ Food.insecurity.margin.of.error : num 0.25 1.66 1.66 3.19 1.51 0.74 1.13 1.53 1.15 0.98 ...
## $ Very.low.food.security.prevalence : num 4.6 4.4 5.4 5.6 4.9 4.3 5 4.1 4.2 3.7 ...
## $ Very.low.food.security.margin.of.error: num 0.18 1.31 1.02 1.5 0.84 0.48 0.67 1.07 0.73 0.73 ...
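The margin-of-error columns listed above can be turned into intervals around each prevalence estimate. The sketch below is a minimal illustration using only the six rows printed by head(state); the column names Lower, Upper and Above_US and the data frame name sample_state are assumptions added for this example. Comparing non-overlapping intervals is a conservative rule of thumb, not a formal significance test.

```r
# Rows copied from the head(state) output (2006 to 2008 period)
sample_state <- data.frame(
  State = c("U.S. total", "AK", "AL", "AR", "AZ", "CA"),
  Food.insecurity.prevalence = c(12.2, 11.6, 13.3, 15.9, 13.2, 12.0),
  Food.insecurity.margin.of.error = c(0.25, 1.66, 1.66, 3.19, 1.51, 0.74)
)

# Interval bounds: estimate minus/plus the published margin of error
sample_state$Lower <- sample_state$Food.insecurity.prevalence -
  sample_state$Food.insecurity.margin.of.error
sample_state$Upper <- sample_state$Food.insecurity.prevalence +
  sample_state$Food.insecurity.margin.of.error

# Flag rows whose whole interval lies above the national interval
us_upper <- sample_state$Upper[sample_state$State == "U.S. total"]
sample_state$Above_US <- sample_state$Lower > us_upper

print(sample_state[, c("State", "Lower", "Upper", "Above_US")])
```

The same three lines of arithmetic could be applied to the full state data frame, one period at a time, to see which states sit clearly above the national figure.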
For household dataset
#Looking for missing values
missing_values <- sapply(data, function(x) sum(is.na(x)))
print(missing_values)
## Year Category
## 0 0
## Subcategory Sub.subcategory
## 0 0
## Total Food.secure.1.000
## 0 0
## Food.secure.percent Food.insecure.1.000
## 0 0
## Food.insecure.percent Low.food.security.1.000
## 0 6
## Low.food.security.percent Very.low.food.security.1.000
## 7 5
## Very.low.food.security.percent
## 7
#Dropping rows with missing values
cleaned_data <- na.omit(data)
#Checking the dimensions of the cleaned dataset
dim(cleaned_data)
## [1] 653 13
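The structure listing shows the two count columns, Low.food.security.1.000 and Very.low.food.security.1.000, stored as character rather than integer. A minimal sketch of converting such a column to numeric follows. It assumes the text holds digits plus possible thousands separators or stray spaces, which the source does not confirm; the helper name to_count and the sample values "1,234" and "n/a" are hypothetical.

```r
# Hypothetical helper: strip commas and whitespace, then convert to numeric.
# Entries that are still not numbers become NA (the coercion warning is suppressed).
to_count <- function(x) {
  cleaned <- gsub("[,[:space:]]", "", x)
  suppressWarnings(as.numeric(cleaned))
}

# "8010" and "4744" appear in the dataset; "1,234" and "n/a" are made-up cases
example_counts <- c("8010", "4744", "1,234", "n/a")
converted <- to_count(example_counts)
print(converted)
```

On the real data this could be applied as cleaned_data$Low.food.security.1.000 <- to_count(cleaned_data$Low.food.security.1.000), after checking which entries turn into NA.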
#Converting Subcategory and Sub.subcategory to lowercase for case-insensitive matching
cleaned_data$Subcategory <- tolower(cleaned_data$Subcategory)
cleaned_data$Sub.subcategory <- tolower(cleaned_data$Sub.subcategory)
#Function to determine gender from a lowercased label.
#The \\b word boundaries keep "male" from matching inside "female"
#and "men" from matching inside "women".
get_gender <- function(label) {
  if (grepl("\\b(female|woman|women|girls?)\\b", label)) {
    return("F")
  } else if (grepl("\\b(male|man|men|boys?)\\b", label)) {
    return("M")
  } else {
    return("Unknown")
  }
}
#Creating the Gender column from Subcategory
cleaned_data$Gender <- vapply(cleaned_data$Subcategory, get_gender, character(1), USE.NAMES = FALSE)
#Falling back to Sub.subcategory where Gender is still unknown
unknown_indices <- which(cleaned_data$Gender == "Unknown")
cleaned_data$Gender[unknown_indices] <- vapply(cleaned_data$Sub.subcategory[unknown_indices], get_gender, character(1), USE.NAMES = FALSE)
#Printing the first few rows to verify the Gender column
head(cleaned_data)
## Year Category Subcategory Sub.subcategory
## 1 2001 All households
## 2 2001 Household composition with children < 18 years
## 3 2001 Household composition with children < 18 years with children < 6 years
## 4 2001 Household composition with children < 18 years married-couple families
## 5 2001 Household composition with children < 18 years female head, no spouse
## 6 2001 Household composition with children < 18 years male head, no spouse
## Total Food.secure.1.000 Food.secure.percent Food.insecure.1.000
## 1 107824 96303 89.3 11521
## 2 38330 32141 83.9 6189
## 3 16858 13920 82.6 2938
## 4 26182 23389 89.3 2793
## 5 9080 6185 68.1 2895
## 6 2389 2009 84.1 380
## Food.insecure.percent Low.food.security.1.000 Low.food.security.percent
## 1 10.7 8010 7.4
## 2 16.1 4744 12.4
## 3 17.4 2304 13.7
## 4 10.7 2247 8.6
## 5 31.9 2101 23.1
## 6 15.9 298 12.5
## Very.low.food.security.1.000 Very.low.food.security.percent Gender
## 1 3511 3.3 Unknown
## 2 1445 3.8 Unknown
## 3 634 3.8 Unknown
## 4 546 2.1 Unknown
## 5 794 8.7 F
## 6 82 3.4 M
## Year Category Subcategory Sub.subcategory
## Min. :2001 Length:653 Length:653 Length:653
## 1st Qu.:2006 Class :character Class :character Class :character
## Median :2011 Mode :character Mode :character Mode :character
## Mean :2011
## 3rd Qu.:2017
## Max. :2022
## Total Food.secure.1.000 Food.secure.percent Food.insecure.1.000
## Min. : 496 Min. : 366 Min. :57.00 Min. : 105
## 1st Qu.: 15613 1st Qu.: 12000 1st Qu.:81.50 1st Qu.: 2426
## Median : 25180 Median : 20717 Median :87.30 Median : 3447
## Mean : 34568 Mean : 30091 Mean :84.26 Mean : 4476
## 3rd Qu.: 43842 3rd Qu.: 38929 3rd Qu.:90.20 3rd Qu.: 5944
## Max. :132730 Max. :118533 Max. :95.10 Max. :17853
## Food.insecure.percent Low.food.security.1.000 Low.food.security.percent
## Min. : 4.90 Length:653 Min. : 3.200
## 1st Qu.: 9.80 Class :character 1st Qu.: 5.900
## Median :12.70 Mode :character Median : 7.800
## Mean :15.74 Mean : 9.984
## 3rd Qu.:18.50 3rd Qu.:13.200
## Max. :43.00 Max. :25.300
## Very.low.food.security.1.000 Very.low.food.security.percent Gender
## Length:653 Min. : 0.800 Length:653
## Class :character 1st Qu.: 3.600 Class :character
## Mode :character Median : 4.700 Mode :character
## Mean : 5.752
## 3rd Qu.: 6.500
## Max. :19.300
For State dataset
#Looking for missing values
na_count <- sum(is.na(state))
cat("Number of missing values:", na_count, "\n")
## Number of missing values: 0
No missing values.
Creating regions for the analysis
#Defining the regions (U.S. Census Bureau regions, which place DC in the South)
northeast_states <- c("CT", "ME", "MA", "NH", "RI", "VT", "NJ", "NY", "PA")
midwest_states <- c("IL", "IN", "IA", "KS", "MI", "MN", "MO", "NE", "ND", "OH", "SD", "WI")
south_states <- c("AL", "AR", "DC", "DE", "FL", "GA", "KY", "LA", "MD", "MS", "NC", "OK", "SC", "TN", "TX", "VA", "WV")
west_states <- c("AK", "AZ", "CA", "CO", "HI", "ID", "MT", "NV", "NM", "OR", "UT", "WA", "WY")
#Assigning states to regions; rows such as "U.S. total" match no list and get NA
state$Region <- ifelse(state$State %in% northeast_states, "Northeast",
                ifelse(state$State %in% midwest_states, "Midwest",
                ifelse(state$State %in% south_states, "South",
                ifelse(state$State %in% west_states, "West", NA))))
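A quick check after this step is to list the State values that received no region. The sketch below uses a small made-up set of codes rather than the real state data frame; the names region_lookup and codes are assumptions for illustration, and the lookup covers only a handful of states.

```r
# Named lookup vector: state code -> region (partial, for illustration only)
region_lookup <- c(CT = "Northeast", NY = "Northeast",
                   IL = "Midwest", OH = "Midwest",
                   AL = "South", TX = "South",
                   AK = "West", CA = "West")

# Sample values as they appear in the State column, including the national row
codes <- c("U.S. total", "AK", "AL", "CA", "NY")

# Indexing by name returns NA for any code missing from the lookup
regions <- unname(region_lookup[codes])

# Codes that were not assigned a region
unassigned <- codes[is.na(regions)]
print(unassigned)
```

On the real data, table(state$Region, useNA = "ifany") gives the same kind of check; the "U.S. total" rows are expected to appear as NA and need to be left out of any regional average.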
#Loading package
library(ggplot2)
#Exploring trends over time for all households (one row per year)
all_households <- cleaned_data[cleaned_data$Category == "All households", ]
ggplot(all_households, aes(x = Year)) +
  geom_line(aes(y = Food.secure.percent, color = "Food Secure")) +
  geom_line(aes(y = Food.insecure.percent, color = "Food Insecure")) +
  labs(title = "Trends in Food Security Over Time",
       x = "Year",
       y = "Percentage",
       color = "Status") +
  scale_color_manual(values = c("Food Secure" = "blue", "Food Insecure" = "red")) +
  theme_minimal()
#Plotting average food insecurity by household composition (mean across years)
ggplot(cleaned_data, aes(x = Subcategory, y = Food.insecure.percent)) +
  geom_bar(stat = "summary", fun = "mean", fill = "skyblue") +
  labs(title = "Food Insecurity by Household Composition",
       x = "Household Composition",
       y = "Average Percentage Food Insecure") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
#Analyzing overall food security status
summary(cleaned_data[, c("Food.secure.percent", "Food.insecure.percent", "Low.food.security.percent", "Very.low.food.security.percent")])
## Food.secure.percent Food.insecure.percent Low.food.security.percent
## Min. :57.00 Min. : 4.90 Min. : 3.200
## 1st Qu.:81.50 1st Qu.: 9.80 1st Qu.: 5.900
## Median :87.30 Median :12.70 Median : 7.800
## Mean :84.26 Mean :15.74 Mean : 9.984
## 3rd Qu.:90.20 3rd Qu.:18.50 3rd Qu.:13.200
## Max. :95.10 Max. :43.00 Max. :25.300
## Very.low.food.security.percent
## Min. : 0.800
## 1st Qu.: 3.600
## Median : 4.700
## Mean : 5.752
## 3rd Qu.: 6.500
## Max. :19.300
The two plots above show the trend in food security over time and food insecurity by household composition. The summary table gives the distribution of each percentage across all rows of the cleaned data.
#Creating a scatter plot
ggplot(state, aes(x = Food.insecurity.prevalence, y = Very.low.food.security.prevalence)) +
geom_point() +
labs(title = "Food Insecurity vs. Very Low Food Security",
x = "Food Insecurity Prevalence (%)",
y = "Very Low Food Security Prevalence (%)") +
theme_minimal()
In this scatter plot we visualize the relationship between food insecurity and very low food security prevalence across states.
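The strength of that relationship can be summarized with a correlation coefficient. The sketch below uses only the five state rows printed by head(state) earlier (AK, AL, AR, AZ, CA for 2006 to 2008) rather than the full dataset, so the resulting value is illustrative only; the data frame name sample_states is an assumption for this example.

```r
# Five state rows copied from the head(state) output
sample_states <- data.frame(
  State = c("AK", "AL", "AR", "AZ", "CA"),
  Food.insecurity.prevalence = c(11.6, 13.3, 15.9, 13.2, 12.0),
  Very.low.food.security.prevalence = c(4.4, 5.4, 5.6, 4.9, 4.3)
)

# Pearson correlation between the two prevalence measures
r <- cor(sample_states$Food.insecurity.prevalence,
         sample_states$Very.low.food.security.prevalence)
print(round(r, 2))
```

A positive value is expected here, because households with very low food security are a subset of food-insecure households. On the full data the same call would be cor(state$Food.insecurity.prevalence, state$Very.low.food.security.prevalence), ideally after removing the "U.S. total" rows.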
#Installing dplyr if it is not already available, then loading it
if (!requireNamespace("dplyr", quietly = TRUE)) {
install.packages("dplyr")
}
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#Calculating average food insecurity prevalence by region
#(rows without a region, such as "U.S. total", are excluded)
average_food_insecurity <- state %>%
  filter(!is.na(Region)) %>%
  group_by(Region) %>%
  summarize(Average_Food_Insecurity_Prevalence = mean(Food.insecurity.prevalence, na.rm = TRUE))
#Visualizing the disparities
ggplot(average_food_insecurity, aes(x = Region, y = Average_Food_Insecurity_Prevalence)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(title = "Average Food Insecurity Prevalence by Region",
x = "Region",
y = "Average Food Insecurity Prevalence (%)") +
theme_minimal()
#Extracting starting year from the Year column
state$Start_Year <- as.numeric(substring(state$Year, 1, 4))
#Calculating average very low food security prevalence by region and starting year
#(rows without a region, such as "U.S. total", are excluded)
average_very_low_food_security_by_start_year <- state %>%
  filter(!is.na(Region)) %>%
  group_by(Region, Start_Year) %>%
  summarize(Average_Very_Low_Food_Security_Prevalence = mean(Very.low.food.security.prevalence, na.rm = TRUE))
## `summarise()` has grouped output by 'Region'. You can override using the
## `.groups` argument.
#Visualizing the trend by region
ggplot(average_very_low_food_security_by_start_year, aes(x = Start_Year, y = Average_Very_Low_Food_Security_Prevalence, color = Region)) +
geom_line() +
labs(title = "Trend of Very Low Food Security Prevalence by Region",
x = "Year",
y = "Average Very Low Food Security Prevalence (%)",
color = "Region") +
theme_minimal() +
theme(legend.position = "top")
The decrease in very low food security prevalence from 2015 to 2020 (the x-axis shows the starting year of each multi-year period) may reflect overall improvements in economic conditions during that time. Job growth, wage increases, and government assistance programs could each have contributed to lower food insecurity, although this analysis does not test those explanations.
Source: Economic Research Service, U.S. Department of Agriculture. https://www.ers.usda.gov/topics/food-nutrition-assistance/food-security-in-the-u-s/interactive-charts-and-highlights/