Project2a

DATASET: POPULATION vs MARRIAGE

Found here: https://raw.githubusercontent.com/theoracley/Data607/master/Project2/national_marriage_divorce_rates.csv

In this dataset, we examine the relationship between the number of marriages and the rate per 1,000 population between 2000 and 2016 as we analyse their growth. We also look at what may have driven that growth in these years.

#Load all required packages
library(DT)
library(tidyr)
library(dplyr)    
library(ggplot2) 
library(tidyverse)

Load the data first

#Reading our data from csv file
data <- read.csv("https://raw.githubusercontent.com/theoracley/Data607/master/Project2/national_marriage_divorce_rates.csv", header=FALSE, sep=",")

#using tibble, convert and check out the data
as_tibble(data)

## # A tibble: 61 x 10
##    V1             V2     V3    V4       V5    V6    V7    V8    V9    V10  
##    <fct>          <fct>  <fct> <fct>    <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
##  1 ï»¿Provisiona~ ""     ""    ""       NA    NA    NA    NA    NA    NA   
##  2 ""             ""     ""    ""       NA    NA    NA    NA    NA    NA   
##  3 Year           Marri~ Popu~ Rate pe~ NA    NA    NA    NA    NA    NA   
##  4 2016           2,245~ 323,~ 6.9      NA    NA    NA    NA    NA    NA   
##  5 2015           2,221~ 321,~ 6.9      NA    NA    NA    NA    NA    NA   
##  6 2014/1         2,140~ 308,~ 6.9      NA    NA    NA    NA    NA    NA   
##  7 2013/1         2,081~ 306,~ 6.8      NA    NA    NA    NA    NA    NA   
##  8 2012           2,131~ 313,~ 6.8      NA    NA    NA    NA    NA    NA   
##  9 2011           2,118~ 311,~ 6.8      NA    NA    NA    NA    NA    NA   
## 10 2010           2,096~ 308,~ 6.8      NA    NA    NA    NA    NA    NA   
## # ... with 51 more rows

Time for cleanup

#check out the columns
colnames(data)

##  [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10"

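# name the four data columns; V5-V10 get placeholder names and are dropped below
# note: population_rate actually holds the marriage rate (marriages per 1,000 total population)
colnames(data) <- c("years","marriage","population","population_rate","X.3","X.4","X.5","X.6","X.7","X.8")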

# keep only rows 4-20, the 2000-2016 figures: rows 1-3 hold the title, a blank line
# and the column headers, and rows 21-61 are dropped
data <- data[4:20, ]

# keep only the four columns we need (this drops X.3 to X.8)
data_cols <- c("years","marriage","population","population_rate")

# get the cleaned data
clean_data <- data[data_cols]

# strip footnote markers such as "/1" from the years ("2014/1" becomes "2014")
clean_data$years <- gsub("/\\d", "", clean_data$years)

# reset the row names so they run from 1 to nrow(clean_data)
rownames(clean_data) <- 1:nrow(clean_data)

#finally we get it
clean_data

##    years  marriage  population population_rate
## 1   2016 2,245,404 323,127,513             6.9
## 2   2015 2,221,579 321,418,820             6.9
## 3   2014 2,140,272 308,759,713             6.9
## 4   2013 2,081,301 306,136,672             6.8
## 5   2012 2,131,000 313,914,040             6.8
## 6   2011 2,118,000 311,591,917             6.8
## 7   2010 2,096,000 308,745,538             6.8
## 8   2009 2,080,000 306,771,529             6.8
## 9   2008 2,157,000 304,093,966             7.1
## 10  2007 2,197,000 301,231,207             7.3
## 11  2006 2,193,000 294,077,247             7.5
## 12  2005 2,249,000 295,516,599             7.6
## 13  2004 2,279,000 292,805,298             7.8
## 14  2003 2,245,000 290,107,933             7.7
## 15  2002 2,290,000 287,625,193             8.0
## 16  2001 2,326,000 284,968,955             8.2
## 17  2000 2,315,000 281,421,906             8.2
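The year labels in the table above no longer carry the "/1" suffixes, which appear to be footnote markers in the raw CSV. A quick check of the `gsub("/\\d", "", ...)` pattern from the cleanup step, run on a few of the raw labels. The stricter `sub("/\\d+$", ...)` variant and the two-digit marker "2014/12" are hypothetical. They are included only to show where the original pattern would break.

```r
# a few raw year labels as they appear in the CSV
years_raw <- c("2016", "2014/1", "2013/1", "2000")

# the pattern used in the cleanup step: delete "/" followed by a single digit
years <- as.integer(gsub("/\\d", "", years_raw))
print(years)

# hypothetical stricter variant: delete a trailing "/" plus one or more digits
years_strict <- as.integer(sub("/\\d+$", "", years_raw))
print(identical(years, years_strict))

# hypothetical two-digit marker, to show where the two patterns differ
gsub("/\\d", "", "2014/12")    # leaves "20142"
sub("/\\d+$", "", "2014/12")   # gives "2014"
```

Both patterns give the same result for this dataset. The stricter one only matters if a label ever carries a multi-digit footnote.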

Let’s plot

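# columns are still text such as "2,245,404"; convert to numbers so the axes are numeric, not categorical
to_number <- function(x) as.numeric(gsub(",", "", x))
plot_data <- transform(clean_data, years = as.integer(years), marriage = to_number(marriage),
                       population = to_number(population), population_rate = to_number(population_rate))

qplot(data=plot_data, x=years, y=population, size=I(3), color=I("#388e3c"), main="Population vs Years")

qplot(data=plot_data, x=years, y=population_rate, size=I(3), color=I("#4dd0e1"), main="Population Rate vs Years")

qplot(data=plot_data, x=years, y=marriage, size=I(3), color=I("#ff6d00"), main="Marriage vs Years")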
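The "Population Rate vs Years" plot is easier to interpret once the rate column is checked against the other two columns. This is a minimal sketch. It assumes the rate is marriages per 1,000 total population, and its values are copied from the cleaned table above rather than re-read from the CSV. If the assumption holds, recomputing marriages / population × 1000 and rounding to one decimal should reproduce every reported value.

```r
# yearly values copied from the cleaned table above, 2000 to 2016
year <- 2000:2016
marriage <- c(2315000, 2326000, 2290000, 2245000, 2279000, 2249000, 2193000,
              2197000, 2157000, 2080000, 2096000, 2118000, 2131000, 2081301,
              2140272, 2221579, 2245404)
population <- c(281421906, 284968955, 287625193, 290107933, 292805298,
                295516599, 294077247, 301231207, 304093966, 306771529,
                308745538, 311591917, 313914040, 306136672, 308759713,
                321418820, 323127513)
reported_rate <- c(8.2, 8.2, 8.0, 7.7, 7.8, 7.6, 7.5, 7.3, 7.1, 6.8,
                   6.8, 6.8, 6.8, 6.8, 6.9, 6.9, 6.9)

# marriages per 1,000 people, rounded to one decimal like the source column
computed_rate <- round(marriage / population * 1000, 1)
print(data.frame(year, reported_rate, computed_rate))

# TRUE if every year's reported rate equals marriages per 1,000 population
rate_matches <- isTRUE(all.equal(computed_rate, reported_rate))
print(rate_matches)
```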

Conclusion

As one would expect, the population grows over the period, although not strictly in this table: it dips in 2006 and again in 2013 and 2014 (the two years flagged with “/1” in the raw data). The plots above show that the number of marriages generally declined from 2000 to 2009 (“Marriage vs Years”) and then rose again after 2009, apart from a dip in 2013. Over the same period (“Population Rate vs Years”), the rate column fell from 8.2 in 2000 to 6.8 in 2009, held at 6.8 through 2013, and then edged up to 6.9 from 2014 to 2016. This column is the number of marriages per 1,000 total population, not a population growth rate. The rate and the marriage count therefore show a strong positive correlation, although part of that link is built in, because the rate is computed from the marriage count.

The economy offers a plausible explanation for this pattern. After 9/11, many businesses were lost, which led to many layoffs, and families losing their jobs may have discouraged couples from getting married. Add to that the housing crisis, Enron, bank fraud, etc., which led into a big recession. To get the economy back on its feet, the government passed many new laws and regulations to stop the housing madness. It also added stimulus packages to boost the economy and create jobs. After that happened in 2009, the economy started to pick up and people regained confidence in it. As things returned to normal, couples who felt good about the economy started thinking about marriage and starting families.
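The correlation claimed in the conclusion can be measured directly. This is a minimal sketch using Pearson's correlation from base R, with values copied from the cleaned table. Because there are only 17 yearly points, and the rate is derived from the marriage count, the coefficient describes how closely the two series move together. It is not evidence for any particular cause.

```r
# yearly values copied from the cleaned table, 2000 to 2016
marriage <- c(2315000, 2326000, 2290000, 2245000, 2279000, 2249000, 2193000,
              2197000, 2157000, 2080000, 2096000, 2118000, 2131000, 2081301,
              2140272, 2221579, 2245404)
rate <- c(8.2, 8.2, 8.0, 7.7, 7.8, 7.6, 7.5, 7.3, 7.1, 6.8,
          6.8, 6.8, 6.8, 6.8, 6.9, 6.9, 6.9)

# Pearson correlation between the marriage count and the rate per 1,000 population
r <- cor(marriage, rate)
print(round(r, 2))
```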

Abdelmalek Hajjam

10/2/2019
