DS HW

Load required packages

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
setwd("C:/Users/carla/OneDrive/Desktop/Data 110/DS data")

Load the dslabs package

library(dslabs)
# ggplot2 is already attached by library(tidyverse) above

Load data

data("divorce_margarine")

Identify variables

head(divorce_margarine)
  divorce_rate_maine margarine_consumption_per_capita year
1                5.0                              8.2 2000
2                4.7                              7.0 2001
3                4.6                              6.5 2002
4                4.4                              5.3 2003
5                4.3                              5.2 2004
6                4.1                              4.0 2005
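
head() shows the values but not how R stores each column. As a quick sketch (assuming the dslabs package is installed), the column classes can be checked like this, which helps later when deciding how to map year to color. The name col_classes is just a helper made up for this example.

```r
library(dslabs)
data("divorce_margarine")

# Class of each column (col_classes is a hypothetical helper name)
col_classes <- sapply(divorce_margarine, class)
print(col_classes)

# Compact overview of the whole data frame
str(divorce_margarine)
```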

Create Plot

ggplot(divorce_margarine, aes(x = margarine_consumption_per_capita, y = divorce_rate_maine)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "pink") +
  labs(title = "Relationship between margarine consumption and divorce rate",
       x = "Margarine consumption (pounds per capita)",
       y = "Divorce rate in Maine") +
  theme_dark()
`geom_smooth()` using formula = 'y ~ x'
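
The message above says the line was fit with the formula y ~ x. As a rough sketch, the same kind of straight-line fit can be looked at directly with lm() from base R. The object name fit is only an illustration, and this is meant as a way to read the intercept and slope, not as part of the plot code.

```r
library(dslabs)
data("divorce_margarine")

# Simple linear regression of divorce rate on margarine consumption
fit <- lm(divorce_rate_maine ~ margarine_consumption_per_capita,
          data = divorce_margarine)

# Intercept and slope of the fitted line
print(coef(fit))
```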

Add a third variable and color :D

ggplot(divorce_margarine, aes(x = margarine_consumption_per_capita, y = divorce_rate_maine, color = as.factor(year))) +
  geom_point() +
  # a fixed color here replaces the year mapping for this layer, so one line is drawn
  geom_smooth(method = "lm", se = FALSE, color = "pink") +
  labs(title = "Relationship between margarine consumption and divorce rate",
       x = "Margarine consumption (pounds per capita)",
       y = "Divorce rate in Maine",
       color = "Year") +
  theme_dark()
`geom_smooth()` using formula = 'y ~ x'

In my data visualization I generated a scatter plot showing the relationship between margarine consumption per capita and the divorce rate in Maine. The ggplot() function creates the plot from the divorce_margarine data set, and aes() maps the variables to the x-axis and y-axis. geom_point() adds one point for each observation. I added a linear trend line with geom_smooth(method = "lm") and made it pink instead of red just because I found it more appealing. I used labs() to set the plot title and axis labels, and lastly I changed the theme from minimal to dark because I believe a dark background makes the colors pop more. To wrap up my scatter plot I added the variable year, which was the only variable left in my data set. I included it in the aes() call of ggplot(), assigning it to the color aesthetic. Wrapping year in as.factor() ensures that it is treated as a categorical variable, so each year gets its own color.
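
To show why as.factor() matters, here is a small sketch comparing the two ways of mapping year to color. It assumes year is stored as a number in the data set (which is what head() suggests). The names p_numeric, p_factor and year_factor are made up for this example.

```r
library(ggplot2)
library(dslabs)
data("divorce_margarine")

# year left as a number: ggplot2 treats it as continuous and uses a color gradient
p_numeric <- ggplot(divorce_margarine,
                    aes(x = margarine_consumption_per_capita,
                        y = divorce_rate_maine,
                        color = year)) +
  geom_point()

# year converted to a factor: ggplot2 gives each year its own discrete color
p_factor <- ggplot(divorce_margarine,
                   aes(x = margarine_consumption_per_capita,
                       y = divorce_rate_maine,
                       color = as.factor(year))) +
  geom_point()

# One factor level per distinct year in the data
year_factor <- as.factor(divorce_margarine$year)
print(nlevels(year_factor))
```

Printing p_numeric and p_factor side by side should show a gradient legend for the first and a separate legend entry per year for the second.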