Assignment 3

Author

Mohamed Ali

library(tidyverse)
cheese <- read_csv("https://jsuleiman.com/datasets/cheese.csv")
deaths <- read_csv("https://jsuleiman.com/datasets/Injury_Mortality__United_States.csv")

Introduction

Two things can appear related without one causing the other. To show how statistics can be misused, this project examines an odd correlation between Swiss cheese consumption and injury deaths. The data might seem to show that eating more Swiss cheese causes more injuries, but that conclusion is wrong. The project demonstrates how data can make unrelated things look connected, and why it’s important to interpret graphs with caution.

Relationship Visualization

cheese_data <- read_csv("https://jsuleiman.com/datasets/cheese.csv", show_col_types = FALSE)

swiss_cheese <- cheese_data |> 
  select(year, swiss)

print("Columns in swiss_cheese:")

[1] "Columns in swiss_cheese:"

print(names(swiss_cheese))

[1] "year"  "swiss"

mortality_data <- read_csv("https://jsuleiman.com/datasets/Injury_Mortality__United_States.csv", show_col_types = FALSE)

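# Keep only the all-inclusive aggregate rows (every sex, age group, race,
# mechanism, and intent), then total the deaths for each year.
filtered_deaths <- mortality_data |> 
  filter(`Sex` == "Both sexes",
         `Age group (years)` == "All Ages",
         `Race` == "All races",
         `Injury mechanism` == "All Mechanisms",
         `Injury intent` == "All Intentions") |> 
  group_by(Year) |> 
  summarize(total_deaths = sum(Deaths, na.rm = TRUE)) |> 
  ungroup() |> 
  rename(year = Year)  # match the lowercase key used in swiss_cheese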

print("Columns in filtered_deaths:")

[1] "Columns in filtered_deaths:"

print(names(filtered_deaths))

[1] "year"         "total_deaths"

combined_data <- swiss_cheese |> 
  inner_join(filtered_deaths, by = "year")

print("Merged dataset preview:")

[1] "Merged dataset preview:"

print(combined_data)

# A tibble: 18 × 3
    year swiss total_deaths
   <dbl> <dbl>        <dbl>
 1  1999  1.09       148286
 2  2000  1.02       148209
 3  2001  1.12       157078
 4  2002  1.09       161269
 5  2003  1.13       164002
 6  2004  1.2        167184
 7  2005  1.24       173753
 8  2006  1.23       179065
 9  2007  1.24       182479
10  2008  1.1        181226
11  2009  1.16       177154
12  2010  1.18       180811
13  2011  1.14       187464
14  2012  1.09       190385
15  2013  1          192945
16  2014  1.02       199752
17  2015  1.05       214008
18  2016  1.06       231991

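# Build a calculated "mortality" field from the cheese values themselves:
# scale swiss by 10 and add Gaussian noise (sd = 0.5). total_deaths is not used here.
set.seed(321)  # fixed seed so the noise is reproducible
combined_data <- combined_data |> 
  mutate(adjusted_mortality = swiss * 10 + rnorm(nrow(combined_data), mean = 0, sd = 0.5))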

correlation_value <- cor(combined_data$swiss, combined_data$adjusted_mortality, use = "complete.obs")

ggplot(combined_data, aes(x = swiss, y = adjusted_mortality)) +
  geom_point(size = 3, color = "darkgreen") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "purple") +
  labs(
    title = "Exploring the Curious Correlation: Swiss Cheese vs. Injury Mortality",
    subtitle = paste("Correlation coefficient (r):", round(correlation_value, 2)),
    x = "Swiss Cheese Consumption (lbs per capita)",
    y = "Modified Injury Intent Mortality Rate"
  ) +
  theme_minimal()

Analysis Reflection

The graph suggests that eating Swiss cheese is linked to injury deaths, but the link is not real. A correlation above 0.8 means the two plotted values rise and fall together; it doesn’t prove that they’re connected.

I built this graph from the merged Swiss cheese consumption and injury death data. To strengthen the apparent connection, I created a calculated field, adjusted_mortality, by multiplying the cheese values by 10 and adding random noise, so the plotted mortality values track cheese consumption closely. I also added a trend line to give the relationship a realistic appearance.
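A minimal sketch of why this construction clears 0.8, using base R and the rounded swiss values from the merged dataset preview (the variable names are illustrative and not part of the original analysis). Correlation ignores scale, so multiplying by 10 does nothing on its own; what matters is that the scaled values are large compared with the fixed noise (sd = 0.5). Assuming the noise is independent of the cheese values, the expected correlation is roughly sd(signal) / sqrt(sd(signal)^2 + sd(noise)^2):

```r
# Swiss cheese values copied from the merged dataset preview (rounded as printed)
swiss <- c(1.09, 1.02, 1.12, 1.09, 1.13, 1.20, 1.24, 1.23, 1.24,
           1.10, 1.16, 1.18, 1.14, 1.09, 1.00, 1.02, 1.05, 1.06)

# Scaling alone leaves the correlation at 1: cor() is scale-invariant
r_scaled_only <- cor(swiss, swiss * 10)

# Spread of the scaled signal versus the spread of the added noise
sd_signal <- sd(swiss * 10)
sd_noise  <- 0.5  # same sd as the rnorm() call in the analysis

# Approximate expected correlation for signal + independent noise
r_expected <- sd_signal / sqrt(sd_signal^2 + sd_noise^2)

print(round(c(scaled_only = r_scaled_only, expected = r_expected), 2))
```

On these rounded values the expected correlation works out to about 0.84, which is consistent with the reported result above 0.8. The exact sample value depends on the seed.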

Despite the graph’s convincing design, there is no real explanation for why cheese could be harmful. In this case, correlation doesn’t mean cause and effect. Without knowing the whole story, someone could assume that eating cheese is harmful, when in fact the numbers only match because of how the graph was built.
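As a quick check on the unmodified numbers, the sketch below computes the correlation between swiss and total_deaths directly, using values copied from the merged dataset preview (base R; the vector names are illustrative). It is not part of the original analysis, and it assumes the printed values are close to the unrounded ones.

```r
# Values copied from the merged dataset preview, 1999 through 2016
swiss <- c(1.09, 1.02, 1.12, 1.09, 1.13, 1.20, 1.24, 1.23, 1.24,
           1.10, 1.16, 1.18, 1.14, 1.09, 1.00, 1.02, 1.05, 1.06)
total_deaths <- c(148286, 148209, 157078, 161269, 164002, 167184,
                  173753, 179065, 182479, 181226, 177154, 180811,
                  187464, 190385, 192945, 199752, 214008, 231991)

# Pearson correlation of the raw series, with no calculated field involved
r_real <- cor(swiss, total_deaths)

print(round(r_real, 2))
```

On these values the raw correlation comes out weak and slightly negative, nowhere near 0.8. A reader who saw only the final graph would have no way to know that.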

Graphs like this one can mislead people into believing false claims. If this graph were presented without any explanation, people might truly believe that cheese is connected to injuries. Deceptive graphs like this have been used in studies, journalism, and advertisements to convince people of false information. That’s why it’s important for data analysts to be truthful and make sure their graphs don’t mislead.

AI Help

AI helped me complete this task. At first, I wasn’t sure how to make the link between eating Swiss cheese and injury deaths seem more convincing. AI showed me that a calculated field built from multiplication and random noise produces a higher correlation. After I made the change, the correlation value went above 0.8, meeting the assignment’s requirement.
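To illustrate how the multiplier and the noise level control the result, here is a hedged base R sketch (not the original analysis code; expected_r is a hypothetical helper introduced only for this example). It uses the rounded swiss values from the merged dataset preview and the usual approximation for a signal plus independent noise.

```r
# Swiss cheese values copied from the merged dataset preview (rounded as printed)
swiss <- c(1.09, 1.02, 1.12, 1.09, 1.13, 1.20, 1.24, 1.23, 1.24,
           1.10, 1.16, 1.18, 1.14, 1.09, 1.00, 1.02, 1.05, 1.06)

# Hypothetical helper: approximate correlation between swiss and
# swiss * multiplier + rnorm(sd = noise_sd), assuming independent noise
expected_r <- function(multiplier, noise_sd) {
  sd_signal <- sd(swiss * multiplier)
  sd_signal / sqrt(sd_signal^2 + noise_sd^2)
}

# Same noise (sd = 0.5), increasing multipliers
by_multiplier <- sapply(c(1, 5, 10, 20), expected_r, noise_sd = 0.5)

# Same multiplier (10), increasing noise
by_noise <- sapply(c(0.1, 0.5, 1, 2), function(s) expected_r(10, s))

print(round(by_multiplier, 2))
print(round(by_noise, 2))
```

A larger multiplier or a smaller noise level pushes the expected correlation toward 1, so a target such as 0.8 can be reached by tuning either one.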

I also learned that even when a relationship isn’t real, small changes to the data can make a graph seem more credible. AI also helped me simplify my writing by suggesting clearer explanations of how misleading graphs can deceive readers. In future assignments, I’ll be careful when looking at correlations because I understand how small changes in data can lead to misinformation.