Aline Menezes Assignment Using DSLabs Datasets.

Introduction

For this assignment we had to choose a dataset from dslabs. My first choice was trump_tweets. However, there was a lot going on in that dataset, even more than I had imagined.

So the divorce_margarine dataset got my attention. I started to ponder why divorce and margarine were put together. This dataset has 10 observations and 3 variables.

The question I have in mind is: what is the relationship between margarine and divorce in Maine?

Loading the Packages

In this first chunk I loaded the tidyverse package, which includes ggplot2, the package I need to create my plots. I also loaded ggrepel in order to create the labels in the sp4 plot.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggrepel)

Loading the dslabs Dataset

I named the second chunk “divorce”. In it I read the divorce_margarine dataset from the dslabs package, which has to be installed already, and stored it as divorce.

divorce <- dslabs::divorce_margarine
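
A quick way to confirm the size of the dataset mentioned in the introduction is to inspect the object right after reading it. This is only a minimal sketch, assuming the dslabs package is installed; it uses base R functions and nothing beyond the data already loaded above.

```r
# Read the dataset directly from the dslabs package (assumed to be installed)
divorce <- dslabs::divorce_margarine

# Number of rows (observations) and columns (variables)
dim(divorce)

# Column names and types, useful before mapping them in ggplot
str(divorce)
```

Checking the column names this way also helps avoid spelling mistakes when they are used later in aes() and filter().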

First Plot

I named this chunk Plot1. In this plot I took three variables from my data “divorce”: margarine consumption per capita, the divorce rate in Maine, and the year. I used ggplot and geom_point to create my plot.

plot1 <- divorce %>%
  ggplot(aes(x = 'margarine_consumption_per_capita', y = 'divorce_rate_maine', color = 'year')) +
  geom_point() +
  scale_color_brewer(palette = "Set1") +
  theme_classic() +
  ggtitle("What Does Margarine Have to Do With the Divorce Rate in Maine?")
plot1

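As a result, I got a single dot, which was not very helpful. Because the variable names are in quotes inside aes(), ggplot reads each one as a constant string instead of a column of the data, so there is only one position to draw.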
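
If the column names really are stored as strings, one option is the .data pronoun that ggplot2 supports inside aes(). The sketch below is only an illustration, assuming dslabs and ggplot2 are installed; the names x_col, y_col, and plot1_fixed are hypothetical and do not appear in the original chunks.

```r
library(ggplot2)

divorce <- dslabs::divorce_margarine

# Hypothetical helper variables holding the column names as strings
x_col <- "margarine_consumption_per_capita"
y_col <- "divorce_rate_maine"

# .data[[name]] looks the string up as a column instead of using it as a constant
plot1_fixed <- ggplot(divorce, aes(x = .data[[x_col]], y = .data[[y_col]])) +
  geom_point() +
  theme_classic()

plot1_fixed
```

When the names are typed directly, writing them without quotes inside aes() is the simpler choice.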

Second Plot

I wanted to create a graph with all the information in the dataset to get an idea of what I want to analyze.

# Unquoted names refer to columns; each layer, including the title, is joined with `+`
ggplot(divorce, aes(x = margarine_consumption_per_capita, y = divorce_rate_maine, color = year)) +
  geom_point() +
  ggtitle("What Does Margarine Have to Do With the Divorce Rate in Maine?")

Using the basic form of a ggplot call, I was able to visualize the information in my data.

Filtering Dataset First Try

In the chunk “divorce0” I tried to filter the divorce_margarine dataset by year.

divorce0 <- dslabs::divorce_margarine %>%
  filter('year' %in% c("2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009"))

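The result was zero observations, which means something went wrong. Because 'year' is in quotes, filter() compares the literal string "year" with the list of years instead of using the year column, so the condition is false for every row.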
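
For comparison, a filter on the year column itself could look like the sketch below. It assumes dslabs and dplyr are installed; the name divorce0_fixed is hypothetical and is not used anywhere else in this report.

```r
library(dplyr)

# Unquoted `year` refers to the column, and the years are given as numbers
divorce0_fixed <- dslabs::divorce_margarine %>%
  filter(year %in% 2000:2009)

# How many rows passed the filter
nrow(divorce0_fixed)
```

A narrower range, for example 2000:2004, would keep only those years.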

Filtering Dataset Second Try

This time I named my chunk “divorce2”, and I filtered for the rows where margarine consumption per capita was above five.

divorce2 <- divorce %>%
  filter(margarine_consumption_per_capita > 5) 

I got five observations, which was a good start for analyzing the relationship between margarine and divorce in Maine.

Third Plot

For Plot3 I used my dataset divorce2, with margarine consumption per capita, divorce rate, and year as my variables. With this plot I was able to create a more informative graph. I used xlab and ylab to display my x and y axis labels in a cleaner and more organized way.

plot3 <- divorce2 %>%
  ggplot(aes(x = margarine_consumption_per_capita, y = divorce_rate_maine, label = year)) +
  geom_point(aes(color= year)) +
  xlab("Margarine Consumption Per Capita") +
  ylab("Divorce Rate Maine") +
  theme_classic() +
  ggtitle("Does Margarine Impact Divorce Rates in Maine?")
plot3

This time the result is a more complete and informative graph. I like the transition from light blue to dark blue across the years, but I will try to add different colors to better differentiate the years. Here I have the years in which margarine consumption was above five per capita, together with their divorce rates.
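
One way to give each year its own distinct color, instead of shades of one color, is to map year as a factor so that ggplot uses a discrete scale. The following is only a sketch of that idea, assuming dslabs, dplyr, and ggplot2 are installed; the name plot3_discrete and the legend title are hypothetical additions.

```r
library(dplyr)
library(ggplot2)

divorce2 <- dslabs::divorce_margarine %>%
  filter(margarine_consumption_per_capita > 5)

# factor(year) makes the color scale discrete, so each year gets a separate color
plot3_discrete <- ggplot(divorce2,
                         aes(x = margarine_consumption_per_capita,
                             y = divorce_rate_maine,
                             color = factor(year))) +
  geom_point(size = 3) +
  scale_color_brewer(palette = "Set1", name = "Year") +
  xlab("Margarine Consumption Per Capita") +
  ylab("Divorce Rate Maine") +
  theme_classic()

plot3_discrete
```

The "Set1" palette is the same one tried in Plot1; it works here because the color variable is now discrete.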

Fourth Plot SP4

For this plot, sp4, I played around with moving the pieces into different positions. I used ggplot and kept the same variables as in the previous graphs. I used geom_point and set the size to five so that the dots are more visible. I changed the color from blue to a rainbow gradient so that each year has a clearly different color. The seven I passed to rainbow() is the number of colors that make up the gradient. I added geom_label_repel to put a box with the year next to each dot; box.padding and point.padding set the space around the boxes and the dots, and a grey segment connects each box to its dot.

sp4 <- ggplot(divorce2,aes(x= margarine_consumption_per_capita, y= divorce_rate_maine, color= year)) +
  geom_point(size = 5) +
  scale_color_gradientn(colours = rainbow(7)) +
  theme_classic() +
  xlab("Margarine Consumption Per Capita") +
  ylab("Divorce Rate Maine") +
  geom_label_repel(aes(label = year),
                   box.padding = 0.35,
                   point.padding = 0.5,
                   segment.color = 'grey50') +
  ggtitle("Consumption of Margarine May\nIncrease the Risk of Getting Divorced in Maine")
sp4

Conclusion

The result of plot sp4 is that the years in which margarine consumption per capita was higher than five were 2000, 2001, 2002, 2003, and 2004. The highest year for both divorce and margarine consumption was 2000, which appears in red. In 2000 the consumption of margarine was above 8.00 per capita and the divorce rate was close to five. In contrast, 2004 was the lowest of these years in both margarine consumption and divorce. One possible interpretation of these results is that people may use buying margarine as an excuse in order to cheat on their partners. Another interpretation is that people who have divorced may tend to eat more, so they may end up buying more food and consuming more margarine.
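
To put a number on how closely the two variables move together, the correlation coefficient can be computed with base R. This is only a sketch, assuming dslabs is installed; the name r_all is hypothetical. A high correlation shows that the two series rise and fall together, and it does not by itself show which explanation, if any, is correct.

```r
divorce <- dslabs::divorce_margarine

# Pearson correlation between margarine consumption and the divorce rate,
# using all years in the dataset
r_all <- cor(divorce$margarine_consumption_per_capita,
             divorce$divorce_rate_maine)

r_all
```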