library(tidyverse)
library(lubridate)
K Johann - Assignment 7
NUMBEO Data- Cost of Living in the United States
Purpose & Proposed Question
The data set for my final project in this class covers recorded crime occurrences in the United States over the past few years. For this assignment, I chose to analyze the cost of living in U.S. cities, and I will later compare these results with the crime data set.
The question I will be answering is: Which factor has the strongest relationship with a city’s overall cost of living: rent, groceries, or restaurant prices? What is the correlation between them?
Load in the Packages and Data
Cost_of_Living_US <-
  read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/johannk_xavier_edu/IQC057mmAhcjR6XFc9S6oBsRAZcGpQ1Wd-jQpyBcbsRzVBY?download=1")
Understanding the Data
Now that we have successfully loaded in the data set, we should first make sure we understand what each variable means. This data set consists of 8 variables and 74 unique observations (in the U.S.).
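The counts above can be checked directly against the loaded data. Below is a minimal sketch using base R only; `describe_shape` is a hypothetical helper name introduced here for illustration, not part of the original analysis.

```r
# Hypothetical helper (not in the original report): summarise the shape of a
# data frame so the stated variable and observation counts can be verified.
describe_shape <- function(df) {
  list(
    n_rows    = nrow(df),   # number of observations (cities)
    n_cols    = ncol(df),   # number of variables
    col_names = names(df)   # exact column names, useful for later plotting code
  )
}

# Run only if the data set from the loading step exists in the session.
if (exists("Cost_of_Living_US")) {
  str(describe_shape(Cost_of_Living_US))
}
```

Printing the column names this way also confirms the exact spelling (for example `Cost.of.Living.Index`) that the plotting code depends on.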
Description of each variable:
- Rank: The city’s ranking based on overall cost of living. Lower rank numbers correspond to more expensive cities.
- City: The U.S. city being analyzed.
- Cost of Living Index: A measure of the overall cost of living in the city compared to New York, which is set to 100. This includes various expenses like groceries, transportation, utilities, and restaurants.
- Rent Index: Measures how expensive housing/rent prices are compared to New York. Higher values indicate more expensive rent.
- Cost of Living Plus Rent Index: Combines both general living costs and housing costs into a single measure. This is the “total cost” measure.
- Groceries Index: Compares grocery prices to New York prices. Higher values indicate more expensive groceries.
- Restaurant Price Index: Measures the average cost of dining out and restaurant meals compared to New York.
- Local Purchasing Power Index: Measures how much residents can afford with their average salaries in that city. Higher values indicate greater buying power once wages and prices are taken into account.
Analysis & Visualizations
1. Scatterplot: Rent vs. Overall Cost of Living
ggplot(Cost_of_Living_US, aes(x = `Rent.Index`, y = `Cost.of.Living.Index`)) +
  geom_point(alpha = 0.5) +
  labs(title = "Rent vs. Overall Cost of Living",
       x = "Rent Index",
       y = "Cost of Living Index")
As depicted in the above visual, there is a clear positive relationship between Rent Index and Cost of Living Index. This was expected, and indicates that cities with higher rent costs also tend to have higher overall living costs, plausibly because rent accounts for a large share of annual living costs.
2. Scatterplot: Groceries vs. Overall Cost of Living
# Groceries Index goes on the x-axis so the mapping matches the axis labels
ggplot(Cost_of_Living_US, aes(x = `Groceries.Index`, y = `Cost.of.Living.Index`)) +
  geom_point(alpha = 0.5) +
  labs(title = "Groceries vs. Overall Cost of Living",
       x = "Groceries Index",
       y = "Cost of Living Index")
Similar to the results of the first visual, groceries and overall cost of living are also positively correlated. This indicates that food costs move in the same direction as broader living expenses, reinforcing the idea that necessities such as groceries contribute meaningfully to cost of living in various cities.
3. Scatterplot: Restaurant Prices vs. Overall Cost of Living
# Restaurant Price Index goes on the x-axis so the mapping matches the axis labels
ggplot(Cost_of_Living_US, aes(x = `Restaurant.Price.Index`, y = `Cost.of.Living.Index`)) +
  geom_point(alpha = 0.5) +
  labs(title = "Restaurant Prices vs. Overall Cost of Living",
       x = "Restaurant Price Index",
       y = "Cost of Living Index")
Restaurant prices and overall cost of living are also positively correlated.
4. Histogram: Cost of Living
This visual shows that most cities have a Cost of Living Index between 60 and 75. Compared with New York, most U.S. cities in the data set are moderately priced. The values are spread out, but most of them cluster in this range, which supports the idea that only a small number of cities are high cost of living areas.
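The code for this histogram does not appear in the report. Below is a minimal sketch of how such a plot could be produced, assuming the column name `Cost.of.Living.Index` used elsewhere in this analysis. The helper name `plot_col_histogram` and the bin width of 5 index points are my own assumptions, not the settings behind the original figure.

```r
library(ggplot2)

# Hypothetical helper: histogram of the Cost of Living Index.
# binwidth = 5 is an assumed choice; the original figure may have used another.
plot_col_histogram <- function(df, binwidth = 5) {
  ggplot(df, aes(x = Cost.of.Living.Index)) +
    geom_histogram(binwidth = binwidth, color = "white") +
    labs(title = "Distribution of Cost of Living Index",
         x = "Cost of Living Index",
         y = "Number of Cities")
}

# Draw the plot only if the data set from the loading step is available.
if (exists("Cost_of_Living_US")) {
  print(plot_col_histogram(Cost_of_Living_US))
}
```

A narrower bin width would show more detail within the 60 to 75 range, at the cost of a noisier shape with only 74 cities.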
5. Table: Top 10 Cities based on Cost of Living
top10_cities <- Cost_of_Living_US[1:10, c("City", "Cost.of.Living.Index")]
top10_cities
# A tibble: 10 × 2
   City              Cost.of.Living.Index
   <chr>                            <dbl>
 1 Honolulu, HI                     102.
 2 New York, NY                     100
 3 San Francisco, CA                 95.2
 4 Seattle, WA                       88.6
 5 Washington, DC                    88.5
 6 San Jose, CA                      86.5
 7 Boston, MA                        84.7
 8 Oakland, CA                       83.2
 9 Miami, FL                         83.1
10 Anchorage, AK                     82.7
I included this table because it makes it easy to examine the top 10 cities by Cost of Living Index. It was interesting to see that Honolulu’s index is higher than New York’s (the baseline, set to 100). This indicates that New York is not the most expensive city in the data set.
The results show that only a small number of cities have very high cost of living values: even Anchorage, AK (Rank 10) is already down to 82.7.
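The table above takes the first ten rows of the data, which works only if the file is already sorted by rank. A sketch that sorts explicitly before selecting might look like the following; `top_n_by_index` is a hypothetical helper name, and the column names are assumed to match those in the table.

```r
# Hypothetical helper: return the n cities with the highest value of an index,
# sorting explicitly instead of relying on the row order of the file.
top_n_by_index <- function(df, n = 10, index_col = "Cost.of.Living.Index") {
  # order() with decreasing = TRUE puts the most expensive cities first
  sorted <- df[order(df[[index_col]], decreasing = TRUE), c("City", index_col)]
  head(sorted, n)
}

# Apply to the real data only if it has been loaded in this session.
if (exists("Cost_of_Living_US")) {
  print(top_n_by_index(Cost_of_Living_US))
}
```

If the file is sorted by rank, this should return the same ten cities as the indexing approach.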
Final Conclusions
Overall, the analysis shows that rent, groceries, and restaurant prices all have a positive relationship with the cost of living across the listed U.S. cities.
Among these factors, groceries appear, based on the scatterplots, to have the strongest and most consistent relationship with cost of living. Although rent and restaurant prices also show positive relationships, they vary more across cities. This suggests that their relationship with overall cost of living is less consistent than that of groceries.
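The proposed question also asks for the correlation itself, which the scatterplots only show visually. Below is a minimal sketch of how the three relationships could be quantified with Pearson correlation coefficients. `index_correlations` is a hypothetical helper name, and the column names are assumed to match those used in the plotting code. No values are reported here because they depend on running the code against the data.

```r
# Hypothetical helper: Pearson correlation of each candidate index with the
# Cost of Living Index, sorted from strongest positive to weakest.
index_correlations <- function(df,
                               target  = "Cost.of.Living.Index",
                               factors = c("Rent.Index",
                                           "Groceries.Index",
                                           "Restaurant.Price.Index")) {
  r <- vapply(
    factors,
    # use = "complete.obs" drops rows with missing values in either column
    function(col) cor(df[[col]], df[[target]], use = "complete.obs"),
    numeric(1)
  )
  sort(r, decreasing = TRUE)
}

# Compute on the real data only if it has been loaded in this session.
if (exists("Cost_of_Living_US")) {
  print(round(index_correlations(Cost_of_Living_US), 3))
}
```

One caveat when reading the output: according to the variable descriptions, the Cost of Living Index already includes groceries and restaurants, so some of their correlation with it is built in by construction.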