Project 2

library(RCurl)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(stringr)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::complete() masks RCurl::complete()
## ✖ dplyr::filter()   masks stats::filter()
## ✖ dplyr::lag()      masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

For my first dataset, I chose a file that contains participation and cost data for the Supplemental Nutrition Assistance Program (SNAP). The data are further divided into annual, state, and monthly levels and categorized by persons participating, households participating, benefits provided, average monthly benefits per person, and average monthly benefits per household. I am going to see if there is a relationship between the average benefit per person and the total number of participants. I found this data on data.world as an Excel file, which I downloaded and saved as a CSV file in my GitHub repository.

This information is meaningful to me because my wife is a dietitian and I often hear about the major impact finances have on people’s nutrition decisions.

snap <- read.csv("https://raw.githubusercontent.com/Shayaeng/Data607/main/Project2/SNAPsummary.csv")

First, I will give the columns their proper names and remove the extra rows.

#snap <- snap %>% mutate_all(~na_if(., ""))

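# Row 2 holds the column names; coerce the row to a character vector first.
colnames(snap) <- as.character(unlist(snap[2, ]))

# Row 3, columns 2 to 4, holds the text appended to those column names.
new_column_names <- as.character(unlist(snap[3, 2:4]))

# Strip the dashes and surrounding whitespace, then wrap in parentheses.
new_column_names <- gsub("-", "", new_column_names)
new_column_names <- paste0("(", str_trim(new_column_names), ")")

# Append the cleaned text to the matching column names.
colnames(snap)[2:4] <- paste0(colnames(snap)[2:4], new_column_names)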

snap <- snap |>
  slice(-(1:3))

snap$`Average Participation(Thousands)` <- as.numeric(gsub(",", "", snap$`Average Participation(Thousands)`))
snap$`Average Benefit Per Person(Dollars)` <- as.numeric(gsub(",", "", snap$`Average Benefit Per Person(Dollars)`))
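As an alternative to the gsub() and as.numeric() pair above, readr (already attached with tidyverse) provides parse_number(), which drops grouping commas under its default locale. This is only a minimal sketch: raw_values and parsed are hypothetical names, and the strings are made up for illustration, not taken from the SNAP file.

```r
library(readr)

# Illustrative strings only; these are not values from the SNAP file.
raw_values <- c("1,234", "56,789", "")

# parse_number() drops the grouping commas; the empty string becomes NA.
parsed <- parse_number(raw_values)
parsed
```

The same call could be applied to each of the two SNAP columns in place of the nested gsub() and as.numeric().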

ggplot(data = snap, mapping = aes(x = `Average Participation(Thousands)`, y = `Average Benefit Per Person(Dollars)`)) +
  geom_point() +
  scale_x_continuous(breaks = seq(0, 50000, by = 5000)) + 
  scale_y_continuous(breaks = seq(0, 150, by = 20)) + 
  labs(title = "SNAP: Average Benefit Per Person vs. Average Participation")

## Warning: Removed 7 rows containing missing values (`geom_point()`).
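To put a number on the relationship the scatter plot suggests, one option is a Pearson correlation computed over the rows where both columns are present. The sketch below runs on a small made-up data frame: toy_snap, n_complete, r, and all of the values are hypothetical and only mirror the column names used above. Swapping in snap would apply the same calls to the real data.

```r
# Hypothetical toy data that mirrors the two column names used above.
toy_snap <- data.frame(
  `Average Participation(Thousands)` = c(1000, 2000, NA, 4000, 5000),
  `Average Benefit Per Person(Dollars)` = c(20, 35, 60, NA, 90),
  check.names = FALSE
)

# Count the rows where both values are present.
n_complete <- sum(complete.cases(toy_snap))

# Pearson correlation, ignoring rows with a missing value in either column.
r <- cor(
  toy_snap$`Average Participation(Thousands)`,
  toy_snap$`Average Benefit Per Person(Dollars)`,
  use = "complete.obs"
)
r
```

Using use = "complete.obs" matches what geom_point() does when it drops rows with missing values, so the correlation and the plot would describe the same set of points.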

Shaya Engelman

2023-10-10