Assignment 3

Author

Abdilraouf Mohamed

Code
library(tidyverse)
cheese <- read_csv("https://jsuleiman.com/datasets/cheese.csv")
deaths <- read_csv("https://jsuleiman.com/datasets/Injury_Mortality__United_States.csv")

Introduction

This report examines the difference between correlation and causation through a spurious association between Swiss cheese consumption and injury mortality. I join per capita Swiss cheese consumption data to a U.S. injury mortality dataset by year, after filtering the mortality data to the all-inclusive categories for sex, age group, race, injury mechanism, and injury intent, so that two unrelated variables are aligned on the same time axis.

I then apply a small data manipulation that produces a high correlation even though no causal relationship exists, and display it in a scatter plot with a linear trend line. The graphic shows how a strong correlation can mislead when it is read as proof of causality, which is why critical analysis and ethical data reporting matter.

Relationship Visualization

Code
cheese <- read_csv("https://jsuleiman.com/datasets/cheese.csv", show_col_types = FALSE)
swiss <- cheese |>
  select(year, swiss)
print("swiss columns:")
[1] "swiss columns:"
Code
print(names(swiss))
[1] "year"  "swiss"
Code
deaths <- read_csv("https://jsuleiman.com/datasets/Injury_Mortality__United_States.csv", show_col_types = FALSE)
deaths_filtered <- deaths |>
  filter(
    `Sex` == "Both sexes",
    `Age group (years)` == "All Ages",
    `Race` == "All races",
    `Injury mechanism` == "All Mechanisms",
    `Injury intent` == "All Intentions"
  ) |>
  group_by(Year) |>
  summarize(deaths_total = sum(Deaths, na.rm = TRUE)) |>
  ungroup() |>
  rename(year = Year)
print("deaths_filtered columns:")
[1] "deaths_filtered columns:"
Code
print(names(deaths_filtered))
[1] "year"         "deaths_total"
Code
merged <- swiss |>
  inner_join(deaths_filtered, by = "year")
print("Merged data:")
[1] "Merged data:"
Code
print(merged)
# A tibble: 18 × 3
    year swiss deaths_total
   <dbl> <dbl>        <dbl>
 1  1999  1.09       148286
 2  2000  1.02       148209
 3  2001  1.12       157078
 4  2002  1.09       161269
 5  2003  1.13       164002
 6  2004  1.2        167184
 7  2005  1.24       173753
 8  2006  1.23       179065
 9  2007  1.24       182479
10  2008  1.1        181226
11  2009  1.16       177154
12  2010  1.18       180811
13  2011  1.14       187464
14  2012  1.09       190385
15  2013  1          192945
16  2014  1.02       199752
17  2015  1.05       214008
18  2016  1.06       231991
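Before any simulated variable is added, it can help to check how the two observed series relate on their own. The following is a minimal sketch in base R, assuming the rounded values printed in the tibble above are close enough to the underlying data; the vector names mirror the column names and `r_observed` is a name introduced here for illustration.

```r
# Values copied from the merged tibble printed above (1999 to 2016).
# Assumption: the tibble display rounds swiss to three significant digits.
swiss <- c(1.09, 1.02, 1.12, 1.09, 1.13, 1.20, 1.24, 1.23, 1.24,
           1.10, 1.16, 1.18, 1.14, 1.09, 1.00, 1.02, 1.05, 1.06)
deaths_total <- c(148286, 148209, 157078, 161269, 164002, 167184,
                  173753, 179065, 182479, 181226, 177154, 180811,
                  187464, 190385, 192945, 199752, 214008, 231991)

# Pearson correlation between the two observed series.
r_observed <- cor(swiss, deaths_total)
print(round(r_observed, 2))
```

Comparing this value with the correlation reported in the plot subtitle shows how much of the apparent link comes from the simulated variable rather than from the observed data.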
Code
set.seed(123)
# injury_int is simulated from swiss plus random noise; it does not use deaths_total
merged <- merged |>
  mutate(injury_int = swiss * 10 + rnorm(nrow(merged), mean = 0, sd = 0.5))
Code
corr <- cor(merged$swiss, merged$injury_int, use = "complete.obs")

ggplot(merged, aes(x = swiss, y = injury_int)) +
  geom_point(size = 3, color = "darkblue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(
    title = "Uncovering the Surprising Link: Swiss Cheese and Injury Intent Mortality",
    subtitle = paste("Correlation coefficient (r):", round(corr, 2)),
    x = "Per Capita Swiss Cheese Consumption (lbs/year)",
    y = "Adjusted Injury Intent Mortality"
  ) +
  theme_minimal()
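Because `injury_int` is built as `swiss * 10` plus random noise, the strength of the correlation in the plot is controlled almost entirely by the noise level chosen in `rnorm()`. The sketch below illustrates this in base R. It assumes the rounded `swiss` values from the printed tibble, and `simulated_r` and `noise_levels` are hypothetical names introduced here, not part of the original analysis.

```r
# Swiss cheese consumption, copied from the merged tibble (rounded values).
swiss <- c(1.09, 1.02, 1.12, 1.09, 1.13, 1.20, 1.24, 1.23, 1.24,
           1.10, 1.16, 1.18, 1.14, 1.09, 1.00, 1.02, 1.05, 1.06)

# Hypothetical helper: rebuild the simulated variable with a chosen noise
# level and return its Pearson correlation with swiss.
simulated_r <- function(noise_sd, seed = 123) {
  set.seed(seed)
  injury_int <- swiss * 10 + rnorm(length(swiss), mean = 0, sd = noise_sd)
  cor(swiss, injury_int)
}

# Illustrative noise levels; 0.5 is the value used in the report.
noise_levels <- c(0, 0.5, 2, 5)
r_by_noise <- sapply(noise_levels, simulated_r)
print(data.frame(noise_sd = noise_levels, r = round(r_by_noise, 2)))
```

With no noise the correlation is exactly 1, since `injury_int` is then just a rescaled copy of `swiss`. For independent noise with standard deviation s, the expected correlation is 10 * sd(swiss) / sqrt(100 * var(swiss) + s^2), so any target value of r can be reached by adjusting s. The mortality data play no part in the result.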

Analysis Reflection

I used the tidyverse package to import, filter, and clean the mortality and Swiss cheese consumption data. I joined the two datasets on the ‘year’ column and created the ‘injury_int’ variable, which is simulated from Swiss cheese consumption plus random noise and therefore correlates with it by construction. I then computed the correlation coefficient and displayed it in a scatterplot with a red linear regression line.

The image appears to indicate a connection, but the association is misleading. It is manufactured, and without explanation it could deceive viewers into believing that Swiss cheese consumption influences injury fatality rates. From an ethical standpoint, it is essential to state clearly that no causation exists. I would rate my work a 7 out of 10, recognizing its technical accuracy but also noting the possibility of misinterpretation.

AI Help

The link below records how I used AI to help complete this assignment.

https://chatgpt.com/share/67d49d0c-ac78-800f-8bed-71447e741454