1). Look at reported Lyme disease cases in the U.S. to see if we can find any geographical, time-based, or other patterns.
2). Identify Lyme disease patterns that point to possible prevention methods and show where to focus prevention efforts.
3). Find out which areas are most affected by Lyme disease.
Our data were taken from data.cdc.gov (NNDSS Table II, Lyme disease to Meningococcal).
It can be found here: https://data.cdc.gov/NNDSS/NNDSS-Table-II-Lyme-disease-to-Meningococcal/y6uv-t34t
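Variables Used:
1). Reporting.Area: area where the case was reported (the whole U.S., a region, or a state)
2). MMWR.Week: MMWR week number (1 to 53) in which the case was reported
3). MMWR.Year: year reported
4). Lyme.disease..Current.week: number of cases reported in the current week
5). Lyme.disease..Current.week..flag: flag for the current-week count ("-" where the count is missing)
6). Lyme.disease..Cum.2014: cumulative cases reported in 2014 through the current week
7). Lyme.disease..Cum.2014..flag: flag for the 2014 cumulative count
8). Lyme.disease..Cum.2013: cumulative cases reported in 2013, for comparison with 2014
9). Lyme.disease..Cum.2013..flag: flag for the 2013 cumulative count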
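The double dots in these names come from `read.csv()`, which by default (`check.names = TRUE`) passes each header through `make.names()`, replacing every space and comma with a dot. A quick check, assuming the raw CDC headers read like `Lyme disease, Cum 2014, flag`; the raw header text is not shown in this notebook, so the strings below are illustrative:

```r
# Illustrative raw headers (assumed, not copied from the CDC file)
raw_headers <- c("Reporting Area", "MMWR Week",
                 "Lyme disease, Current week",
                 "Lyme disease, Cum 2014, flag")

# make.names() turns each invalid character (space, comma) into a dot,
# so ", " becomes ".." and a single space becomes "."
clean_headers <- make.names(raw_headers)
print(clean_headers)
# → "Reporting.Area" "MMWR.Week" "Lyme.disease..Current.week" "Lyme.disease..Cum.2014..flag"
```

This is why a column such as the 2014 cumulative flag is spelled with `..flag` when referenced in code.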
# setwd to GitHub repository
setwd("/Users/alex/Documents/GitHub/Data110")
# check to see you're in the right directory
getwd()
## [1] "/Users/alex/Documents/GitHub/Data110"
# reading in our packages
# Rmisc attaches plyr; loading it before the tidyverse keeps dplyr's verbs
# (filter, mutate, summarise, arrange, ...) from being masked by plyr's versions
library(Rmisc)
library(tidyverse)  # attaches ggplot2, dplyr, tibble, tidyr, readr, purrr, stringr, forcats
library(gridExtra)
library(ggpubr)
library(ggthemes)
library(plotly)
#reading in the data
data <- read.csv("LymeDisease.csv")
Our dataset also has information on Malaria and on Meningococcal disease. It would be interesting to see whether reporting of Lyme disease is related to reporting of Malaria and Meningococcal disease; however, for this project we will focus only on Lyme disease and drop the rest of the data.
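# subset our data for only Lyme disease data:
# the first 3,551 rows (67 reporting areas x 53 weeks) and the first 13 columns
# (Reporting.Area, MMWR.Year, MMWR.Week and the ten Lyme disease columns)
lyme_disease <- data[1:3551, 1:13]
# as_tibble() replaces the deprecated tbl_df()
lyme_disease <- as_tibble(lyme_disease)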
lyme_disease
## # A tibble: 3,551 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 UNITED STATES 2014 1 10 ""
## 2 NEW ENGLAND 2014 1 NA -
## 3 MID. ATLANTIC 2014 1 7 ""
## 4 E.N. CENTRAL 2014 1 1 ""
## 5 W.N. CENTRAL 2014 1 NA -
## 6 S. ATLANTIC 2014 1 2 ""
## 7 E.S. CENTRAL 2014 1 NA -
## 8 W.S. CENTRAL 2014 1 NA -
## 9 MOUNTAIN 2014 1 NA -
## 10 PACIFIC 2014 1 NA -
## # … with 3,541 more rows, and 8 more variables:
## # Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
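Selecting columns by position, as the subset above does with the first 13 columns, works for this particular file but would silently pick up the wrong columns if the CDC reordered or added any. A sketch of a name-based alternative, using a small made-up data frame in place of the real CSV; the helper `keep_lyme_columns()` and the `Malaria..Current.week` column are hypothetical:

```r
# Keep the identifier columns plus every column whose name starts with
# "Lyme.disease", instead of relying on column positions.
keep_lyme_columns <- function(df) {
  id_cols   <- c("Reporting.Area", "MMWR.Year", "MMWR.Week")
  lyme_cols <- grep("^Lyme\\.disease", names(df), value = TRUE)
  df[, c(id_cols, lyme_cols), drop = FALSE]
}

# Toy frame with made-up values, standing in for the real data
toy <- data.frame(
  Reporting.Area = c("UNITED STATES", "MARYLAND"),
  MMWR.Year = c(2014L, 2014L),
  MMWR.Week = c(1L, 1L),
  Lyme.disease..Current.week = c(10L, 1L),
  Malaria..Current.week = c(3L, 0L)
)
names(keep_lyme_columns(toy))
# → "Reporting.Area" "MMWR.Year" "MMWR.Week" "Lyme.disease..Current.week"
```

On the real data, `keep_lyme_columns(data)` would keep the same 13 columns as long as only the Lyme disease columns start with `Lyme.disease`.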
lyme_disease_region <- subset(lyme_disease, Reporting.Area == "UNITED STATES")
lyme_disease_region
## # A tibble: 53 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 UNITED STATES 2014 1 10 ""
## 2 UNITED STATES 2014 2 16 ""
## 3 UNITED STATES 2014 3 21 ""
## 4 UNITED STATES 2014 4 16 ""
## 5 UNITED STATES 2014 5 31 ""
## 6 UNITED STATES 2014 6 32 ""
## 7 UNITED STATES 2014 7 27 ""
## 8 UNITED STATES 2014 8 24 ""
## 9 UNITED STATES 2014 9 44 ""
## 10 UNITED STATES 2014 10 45 ""
## # … with 43 more rows, and 8 more variables:
## # Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
# one line per reporting area (the national total, each region and each state);
# na.rm is a geom argument, not an aesthetic, so it goes in geom_*() rather than aes()
test <- ggplot(data = lyme_disease, mapping = aes(x = MMWR.Week, y = Lyme.disease..Current.week, group = Reporting.Area)) +
geom_line(colour = "red", na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week by reporting area, 2014")
test2 <- ggplot(data = lyme_disease, mapping = aes(x = MMWR.Week, y = Lyme.disease..Current.week, group = Reporting.Area)) +
geom_smooth(colour = "red", na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week by reporting area, 2014")
multiplot(test, test2, cols = 1)
## Warning: Removed 1999 rows containing missing values (geom_path).
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 2672 rows containing non-finite values (stat_smooth).
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 17.845
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 24.155
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 51.194
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : span too small.
## fewer data values than degrees of freedom.
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
## at 17.845
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 24.155
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal
## condition number 0
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other
## near singularities as well. 51.194
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 13.87
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 16.13
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 102.62
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : span too small.
## fewer data values than degrees of freedom.
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
## at 13.87
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 16.13
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal
## condition number 0
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other
## near singularities as well. 102.62
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 2.85
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 3.15
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 737.12
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : span too small.
## fewer data values than degrees of freedom.
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
## at 2.85
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 3.15
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal
## condition number 0
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : There are other
## near singularities as well. 737.12
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : Chernobyl! trL>n 6
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : Chernobyl! trL>n 6
## Warning in sqrt(sum.squares/one.delta): NaNs produced
## Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 18
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 15
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : Chernobyl! trL>n 6
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : Chernobyl! trL>n 6
## Warning in sqrt(sum.squares/one.delta): NaNs produced
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : pseudoinverse used
## at 18
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : neighborhood radius
## 15
## Warning in predLoess(object$y, object$x, newx = if
## (is.null(newdata)) object$x else if (is.data.frame(newdata))
## as.matrix(model.frame(delete.response(terms(object)), : reciprocal
## condition number 0
## Warning in stats::qt(level/2 + 0.5, pred$df): NaNs produced
test
## Warning: Removed 1999 rows containing missing values (geom_path).
test2
USA <- ggplot(data = lyme_disease_region, mapping = aes(x = MMWR.Week, y = Lyme.disease..Current.week)) +
geom_line(na.rm = TRUE) +
geom_point(na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week in the U.S. 2014")
USA2 <- ggplot(data = lyme_disease_region, mapping = aes(x = MMWR.Week, y = Lyme.disease..Current.week)) +
geom_point(na.rm = TRUE) +
geom_smooth(na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week in the U.S. 2014")
USA
USA2
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
multiplot(USA, USA2, cols = 2)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Now that we have a much smaller and more manageable dataset of only Lyme disease data, we can continue.
I also want to compare Lyme disease reporting by region, so we are going to subset our data by state. The dataset also includes rows that summarize the data by region. There is a lot we can do with this data, but for this project we will focus primarily on Maryland vs. Virginia Lyme disease reporting.
# VA subset
VA_lyme_disease <- subset(lyme_disease, Reporting.Area == "VIRGINIA")
VA_lyme_disease
## # A tibble: 53 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 VIRGINIA 2014 1 NA -
## 2 VIRGINIA 2014 2 NA -
## 3 VIRGINIA 2014 3 NA -
## 4 VIRGINIA 2014 4 1 ""
## 5 VIRGINIA 2014 5 1 ""
## 6 VIRGINIA 2014 6 6 ""
## 7 VIRGINIA 2014 7 1 ""
## 8 VIRGINIA 2014 8 1 ""
## 9 VIRGINIA 2014 9 2 ""
## 10 VIRGINIA 2014 10 1 ""
## # … with 43 more rows, and 8 more variables:
## # Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
# MD subset
MD_lyme_disease <- subset(lyme_disease, Reporting.Area == "MARYLAND")
MD_lyme_disease
## # A tibble: 53 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 MARYLAND 2014 1 1 ""
## 2 MARYLAND 2014 2 NA -
## 3 MARYLAND 2014 3 NA -
## 4 MARYLAND 2014 4 1 ""
## 5 MARYLAND 2014 5 NA -
## 6 MARYLAND 2014 6 2 ""
## 7 MARYLAND 2014 7 4 ""
## 8 MARYLAND 2014 8 NA -
## 9 MARYLAND 2014 9 4 ""
## 10 MARYLAND 2014 10 1 ""
## # … with 43 more rows, and 8 more variables:
## # Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
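Subsetting one state at a time works for two states, but the code becomes repetitive if more states are added later. A sketch of a hypothetical helper, `subset_areas()`, that returns one data frame per requested reporting area, shown on a made-up toy frame:

```r
# Return a named list with one data frame per requested reporting area.
# which() drops rows where the comparison is NA, matching subset()'s behavior.
subset_areas <- function(df, areas) {
  out <- lapply(areas, function(a) df[which(df$Reporting.Area == a), , drop = FALSE])
  names(out) <- areas
  out
}

# Toy frame with made-up counts; Reporting.Area is a factor, as in the real data
toy <- data.frame(
  Reporting.Area = c("MARYLAND", "VIRGINIA", "MARYLAND", "VIRGINIA", "DELAWARE"),
  MMWR.Week = c(1L, 1L, 2L, 2L, 1L),
  Lyme.disease..Current.week = c(1L, NA, NA, NA, 0L),
  stringsAsFactors = TRUE
)
by_state <- subset_areas(toy, c("MARYLAND", "VIRGINIA"))
sapply(by_state, nrow)
# → MARYLAND 2, VIRGINIA 2
```

On the real data, `subset_areas(lyme_disease, c("MARYLAND", "VIRGINIA"))` would be expected to return the same rows as `MD_lyme_disease` and `VA_lyme_disease`, collected in one named list.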
MD <- ggplot(data = MD_lyme_disease, mapping = aes(x = MMWR.Week, y = Lyme.disease..Current.week)) +
geom_line(na.rm = TRUE) +
geom_point(na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week in MD 2014")
MD2 <- ggplot(data = MD_lyme_disease, mapping = aes(x = MMWR.Week, y = Lyme.disease..Current.week)) +
geom_point(na.rm = TRUE) +
geom_smooth(na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week in MD 2014")
MD3 <- plot_ly(data = MD_lyme_disease, x = ~MMWR.Week, y = ~Lyme.disease..Current.week, type = 'scatter', mode = 'lines')
MD
## Warning: Removed 6 rows containing missing values (geom_point).
MD2
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 6 rows containing non-finite values (stat_smooth).
## Warning: Removed 6 rows containing missing values (geom_point).
MD3
multiplot(MD, MD2, cols = 1)
## Warning: Removed 6 rows containing missing values (geom_point).
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 6 rows containing non-finite values (stat_smooth).
## Warning: Removed 6 rows containing missing values (geom_point).
These line plots are a good way to see how reported Lyme disease cases rise during certain parts of the year. Specifically, we can see a clear rise in cases between weeks 20 and 40, followed by a decrease from week 40 to week 53. MMWR weeks 20 to 40 of 2014 run from roughly mid-May to early October, generally when the weather in MD is nicer and more people are active outside; counts are lower in the colder months, from October through April, when most people spend less time outdoors.
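To check which calendar dates an MMWR week number corresponds to, the start date of each week can be computed directly. CDC defines MMWR weeks as running Sunday to Saturday, with week 1 being the first week that has at least four days in the new calendar year, which is the same as the week containing January 4. A minimal sketch in base R; the helper name `mmwr_week_start()` is hypothetical:

```r
# Start date (a Sunday) of a given MMWR week.
# Week 1 is the Sunday-to-Saturday week that contains January 4.
mmwr_week_start <- function(year, week) {
  jan4 <- as.Date(sprintf("%d-01-04", year))
  # as.POSIXlt()$wday: 0 = Sunday, ..., 6 = Saturday
  week1_start <- jan4 - as.POSIXlt(jan4)$wday
  week1_start + 7 * (week - 1)
}

mmwr_week_start(2014, c(1, 20, 40, 53))
# → "2013-12-29" "2014-05-11" "2014-09-28" "2014-12-28"
```

By this reckoning, weeks 20 through 40 of 2014 cover about May 11 to October 4 (week 40 ends six days after it starts).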
VA <- ggplot(data = VA_lyme_disease, mapping = aes(x = MMWR.Week, y = Lyme.disease..Current.week)) +
geom_line(na.rm = TRUE) +
geom_point(na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week in VA 2014")
VA2 <- ggplot(data = VA_lyme_disease, mapping = aes(x = MMWR.Week, y = Lyme.disease..Current.week)) +
geom_point(na.rm = TRUE) +
geom_smooth(na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week in VA 2014")
VA3 <- plot_ly(data = VA_lyme_disease, x = ~MMWR.Week, y = ~Lyme.disease..Current.week, type = 'scatter', mode = 'lines')
VA3
VA
## Warning: Removed 3 rows containing missing values (geom_path).
## Warning: Removed 6 rows containing missing values (geom_point).
VA2
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 6 rows containing non-finite values (stat_smooth).
## Warning: Removed 6 rows containing missing values (geom_point).
multiplot(VA, VA2, cols = 1)
## Warning: Removed 3 rows containing missing values (geom_path).
## Warning: Removed 6 rows containing missing values (geom_point).
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 6 rows containing non-finite values (stat_smooth).
## Warning: Removed 6 rows containing missing values (geom_point).
These line plots show how reported Lyme disease cases in Virginia change over the year. We see the same rise between weeks 20 and 40 as in Maryland, and a decrease from week 40 to week 53. Virginia borders Maryland, so a similar seasonal pattern makes sense.
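To compare the two states with a number rather than by eye, one option is the share of each state's reported cases that fall in weeks 20 to 40. A minimal sketch with a hypothetical helper, `seasonal_share()`; it assumes weeks with a missing count (NA, flagged "-") can simply be dropped, and it uses made-up toy counts:

```r
# Share of reported cases falling inside a window of MMWR weeks.
# NA counts are dropped (na.rm = TRUE) rather than treated as zero.
seasonal_share <- function(df, weeks = 20:40) {
  cases  <- df$Lyme.disease..Current.week
  in_win <- df$MMWR.Week %in% weeks
  sum(cases[in_win], na.rm = TRUE) / sum(cases, na.rm = TRUE)
}

# Toy data with made-up counts, only to show the calculation
toy_state <- data.frame(
  MMWR.Week = c(5L, 20L, 30L, 40L, 50L),
  Lyme.disease..Current.week = c(1L, 4L, NA, 3L, 2L)
)
seasonal_share(toy_state)
# → 0.7
```

On the real data this would be `sapply(list(MD = MD_lyme_disease, VA = VA_lyme_disease), seasonal_share)`, giving one share per state.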
# end-of-year subset: week 53 for every reporting area
USA_lyme_disease <- subset(lyme_disease, MMWR.Week == 53)
USA_lyme_disease
## # A tibble: 67 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 UNITED STATES 2014 53 47 ""
## 2 NEW ENGLAND 2014 53 2 ""
## 3 MID. ATLANTIC 2014 53 24 ""
## 4 E.N. CENTRAL 2014 53 1 ""
## 5 W.N. CENTRAL 2014 53 NA -
## 6 S. ATLANTIC 2014 53 18 ""
## 7 DIST. OF COL. 2014 53 NA -
## 8 E.S. CENTRAL 2014 53 1 ""
## 9 W.S. CENTRAL 2014 53 NA -
## 10 MOUNTAIN 2014 53 1 ""
## # … with 57 more rows, and 8 more variables:
## # Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
# end of year USA per region subset: keep only the nine regional summary rows
regions <- c("NEW ENGLAND", "MID. ATLANTIC", "E.N. CENTRAL", "W.N. CENTRAL", "S. ATLANTIC", "E.S. CENTRAL", "W.S. CENTRAL", "MOUNTAIN", "PACIFIC")
USA_region_lyme_disease <- subset(USA_lyme_disease, Reporting.Area %in% regions)
USA_region_lyme_disease
## # A tibble: 9 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 NEW ENGLAND 2014 53 2 ""
## 2 MID. ATLANTIC 2014 53 24 ""
## 3 E.N. CENTRAL 2014 53 1 ""
## 4 W.N. CENTRAL 2014 53 NA -
## 5 S. ATLANTIC 2014 53 18 ""
## 6 E.S. CENTRAL 2014 53 1 ""
## 7 W.S. CENTRAL 2014 53 NA -
## 8 MOUNTAIN 2014 53 1 ""
## 9 PACIFIC 2014 53 NA -
## # … with 8 more variables: Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
# end of year per region condensed: group the nine divisions into the four
# U.S. Census regions (NE = Northeast, S = South, MW = Midwest, W = West)
NE_lyme_disease <- subset(USA_region_lyme_disease, Reporting.Area %in% c("NEW ENGLAND", "MID. ATLANTIC"))
S_lyme_disease <- subset(USA_region_lyme_disease, Reporting.Area %in% c("W.S. CENTRAL", "E.S. CENTRAL", "S. ATLANTIC"))
MW_lyme_disease <- subset(USA_region_lyme_disease, Reporting.Area %in% c("E.N. CENTRAL", "W.N. CENTRAL"))
W_lyme_disease <- subset(USA_region_lyme_disease, Reporting.Area %in% c("PACIFIC", "MOUNTAIN"))
NE_lyme_disease
## # A tibble: 2 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 NEW ENGLAND 2014 53 2 ""
## 2 MID. ATLANTIC 2014 53 24 ""
## # … with 8 more variables: Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
S_lyme_disease
## # A tibble: 3 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 S. ATLANTIC 2014 53 18 ""
## 2 E.S. CENTRAL 2014 53 1 ""
## 3 W.S. CENTRAL 2014 53 NA -
## # … with 8 more variables: Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
MW_lyme_disease
## # A tibble: 2 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 E.N. CENTRAL 2014 53 1 ""
## 2 W.N. CENTRAL 2014 53 NA -
## # … with 8 more variables: Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
W_lyme_disease
## # A tibble: 2 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 MOUNTAIN 2014 53 1 ""
## 2 PACIFIC 2014 53 NA -
## # … with 8 more variables: Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
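The Mid. Atlantic and New England divisions, in the Northeastern United States, are generally the areas most affected by Lyme disease. In the week-53 snapshot above, the Mid. Atlantic reported the most cases (24), followed by the S. Atlantic (18), while New England reported only 2; one week's count is a thin basis for ranking regions, so the cumulative 2014 column (Lyme.disease..Cum.2014) gives a fuller picture.
In this table, New England covers Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island and Vermont, and the Mid. Atlantic division covers New Jersey, New York and Pennsylvania. Maryland and Virginia, along with Delaware, Washington D.C. and West Virginia, are reported under the S. Atlantic division rather than the Mid. Atlantic.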
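To rank regions on the full year instead of a single week, the week-53 rows can be totalled by census region. A sketch, assuming `Lyme.disease..Cum.2014` holds the year-to-date count (as its name suggests); the lookup follows the standard U.S. Census grouping of the nine divisions into four regions, the same grouping used for `NE_lyme_disease`, `S_lyme_disease`, `MW_lyme_disease` and `W_lyme_disease` above. The helper name `region_totals()` and the toy numbers are made up:

```r
# Map each census division (as named in the NNDSS table) to its census region
division_to_region <- c(
  "NEW ENGLAND"  = "Northeast", "MID. ATLANTIC" = "Northeast",
  "E.N. CENTRAL" = "Midwest",   "W.N. CENTRAL"  = "Midwest",
  "S. ATLANTIC"  = "South",     "E.S. CENTRAL"  = "South",
  "W.S. CENTRAL" = "South",
  "MOUNTAIN"     = "West",      "PACIFIC"       = "West"
)

# Sum a count column by region, largest first; rows that are not divisions
# (the national total and individual states) map to NA and are dropped
region_totals <- function(df, count_col = "Lyme.disease..Cum.2014") {
  region <- division_to_region[as.character(df$Reporting.Area)]
  keep   <- !is.na(region)
  totals <- tapply(df[[count_col]][keep], region[keep], sum, na.rm = TRUE)
  totals <- setNames(as.vector(totals), names(totals))
  sort(totals, decreasing = TRUE)
}

# Toy rows with made-up cumulative counts
toy <- data.frame(
  Reporting.Area = c("UNITED STATES", "NEW ENGLAND", "MID. ATLANTIC",
                     "S. ATLANTIC", "MARYLAND", "PACIFIC"),
  Lyme.disease..Cum.2014 = c(100L, 40L, 30L, 20L, 5L, NA)
)
region_totals(toy)
# → Northeast 70, South 20, West 0
```

On the real data, `region_totals(USA_lyme_disease)` would rank the four regions by their year-end cumulative counts.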
lyme_disease_MA <- filter(lyme_disease, Reporting.Area == "MID. ATLANTIC")
head(lyme_disease_MA)
## # A tibble: 6 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 MID. ATLANTIC 2014 1 7 ""
## 2 MID. ATLANTIC 2014 2 14 ""
## 3 MID. ATLANTIC 2014 3 17 ""
## 4 MID. ATLANTIC 2014 4 11 ""
## 5 MID. ATLANTIC 2014 5 22 ""
## 6 MID. ATLANTIC 2014 6 21 ""
## # … with 8 more variables: Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
lyme_disease_NE <- filter(lyme_disease, Reporting.Area == "NEW ENGLAND")
head(lyme_disease_NE)
## # A tibble: 6 x 13
## Reporting.Area MMWR.Year MMWR.Week Lyme.disease..C… Lyme.disease..C…
## <fct> <int> <int> <int> <fct>
## 1 NEW ENGLAND 2014 1 NA -
## 2 NEW ENGLAND 2014 2 NA -
## 3 NEW ENGLAND 2014 3 NA -
## 4 NEW ENGLAND 2014 4 NA -
## 5 NEW ENGLAND 2014 5 NA -
## 6 NEW ENGLAND 2014 6 NA -
## # … with 8 more variables: Lyme.disease..Previous.52.weeks.Med <int>,
## # Lyme.disease..Previous.52.weeks.Med..flag <fct>,
## # Lyme.disease..Previous.52.weeks.Max <int>,
## # Lyme.disease..Previous.52.weeks.Max..flag <fct>,
## # Lyme.disease..Cum.2014 <int>, Lyme.disease..Cum.2014..flag <fct>,
## # Lyme.disease..Cum.2013 <int>, Lyme.disease..Cum.2013..flag <fct>
# theme_economist() and scale_fill_economist() come from the ggthemes package
library(ggthemes)
# na.rm is a layer argument rather than an aesthetic, so it belongs in the geoms
MA <- ggplot(data = lyme_disease_MA, aes(x = MMWR.Week, y = Lyme.disease..Current.week))+
geom_point(na.rm = TRUE) +
geom_smooth(na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week in Mid. Atlantic")
NE <- ggplot(data = lyme_disease_NE, aes(x = MMWR.Week, y = Lyme.disease..Current.week))+
geom_point(na.rm = TRUE) +
geom_smooth(na.rm = TRUE) +
theme_economist() + scale_fill_economist() +
xlab("Week") + ylab("Number of Cases") +
ggtitle("Cases per week in New England Area")
NE
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 19 rows containing non-finite values (stat_smooth).
## Warning: Removed 19 rows containing missing values (geom_point).
MA
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
multiplot(NE, MA, cols=1)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 19 rows containing non-finite values (stat_smooth).
## Warning: Removed 19 rows containing missing values (geom_point).
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
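Analysis: We can see the same spike during weeks 20 to 40 in both New England and the Mid Atlantic, which is consistent with our other findings. The New England curve rests on fewer points, because 19 of its weekly rows had no reported count and were dropped from the plot.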
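One way to put a number on the spike is to compute the share of a region's weekly reports that fall in weeks 20 to 40. The sketch below assumes a data frame with the same Reporting.Area, MMWR.Week, and Lyme.disease..Current.week columns as the subsets above. The toy counts are placeholders, not CDC figures, and missing weeks are skipped with na.rm = TRUE.

```r
library(dplyr)

# Hypothetical weekly counts for one region (placeholders, not CDC data):
# 30 cases in each of weeks 20-40 and 5 cases in every other week
weekly <- data.frame(
  Reporting.Area = "MID. ATLANTIC",
  MMWR.Week = 1:52,
  Lyme.disease..Current.week = ifelse(1:52 >= 20 & 1:52 <= 40, 30L, 5L)
)

# Per region: all reported cases, cases in weeks 20-40, and their ratio
summer_share <- weekly %>%
  group_by(Reporting.Area) %>%
  summarise(
    total = sum(Lyme.disease..Current.week, na.rm = TRUE),
    weeks_20_40 = sum(Lyme.disease..Current.week[MMWR.Week >= 20 & MMWR.Week <= 40],
                      na.rm = TRUE),
    share = weeks_20_40 / total
  )
summer_share
```

On the real data, the same pipeline could be run on lyme_disease_MA and lyme_disease_NE to compare the two regions directly.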
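# One bar chart of 2014 cumulative totals per regional subset (this reuses, and overwrites, the names NE, S, MW, and W)
NE <- ggplot(data = NE_lyme_disease, mapping = aes(x = Reporting.Area, y = Lyme.disease..Cum.2014)) +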
geom_bar(stat="identity") +
xlab("") + ylab("Total") +
ggtitle("Northeastern U.S") +
coord_flip()+
theme_economist() + scale_fill_economist() +
theme(plot.title = element_text(hjust=0.5))
S <- ggplot(data = S_lyme_disease, mapping = aes(x = Reporting.Area, y = Lyme.disease..Cum.2014)) +
geom_bar(stat="identity") +
xlab("") + ylab("Total") +
ggtitle("Southern U.S") +
theme_economist() + scale_fill_economist() +
coord_flip()+
theme(plot.title = element_text(hjust=0.5))
MW <- ggplot(data = MW_lyme_disease, mapping = aes(x = Reporting.Area, y = Lyme.disease..Cum.2014)) +
geom_bar(stat="identity") +
xlab("") + ylab("Total") +
ggtitle("Midwest U.S") +
theme_economist() + scale_fill_economist() +
coord_flip()+
theme(plot.title = element_text(hjust=0.5))
W <- ggplot(data = W_lyme_disease, mapping = aes(x = Reporting.Area, y = Lyme.disease..Cum.2014)) +
geom_bar(stat="identity") +
xlab("") + ylab("Total") +
ggtitle("Western U.S") +
coord_flip()+
theme_economist() + scale_fill_economist() +
theme(plot.title = element_text(hjust=0.5))
multiplot(NE, S, MW, W, cols = 2)
Region <- ggplot(data = USA_region_lyme_disease, mapping = aes(x = Reporting.Area, y = Lyme.disease..Cum.2014, fill = Reporting.Area)) +
geom_bar(stat="identity") +
theme_economist() + scale_fill_economist() +
xlab("Reporting Area") + ylab("Number of Cases") +
ggtitle("Number of Total Cases in 2014") +
coord_flip()
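# plot_ly() comes from the plotly package. `text` must be a vector of hover labels defined earlier in the analysis; otherwise it picks up the base graphics function text()
library(plotly)
Region2 <- plot_ly(data = USA_lyme_disease, x = ~Reporting.Area, y = ~Lyme.disease..Cum.2014, type = 'bar', text = text)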
Region
Region2
## Warning: Ignoring 10 observations
My data covered the reporting of Lyme disease, malaria, and meningococcal disease in the United States.
I chose this topic for two reasons.
1). I personally know two great people who have been affected by Lyme disease. It is a life-altering disease that affects many Americans every year, but with the right initiatives we can reduce the damage it causes in the future.
2). There have been interesting recent theories that the Pentagon researched how to use ticks and Lyme disease as a biological weapon.
Cleanup: The data had no typos, so I was lucky in that respect. However, it was a large data set and I didn’t need all of it, so I relied heavily on subsetting and filtering to get manageable pieces of data for the visualizations.
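The subsetting and filtering described above can be combined in one dplyr pipeline: select() keeps only the needed columns, and filter() keeps only the needed rows. The sketch below runs on a made-up stand-in for the CDC table. The Malaria..Current.week column is a hypothetical example of a column that gets dropped, and all counts are placeholders.

```r
library(dplyr)

# Hypothetical stand-in for the full CDC table (values are placeholders)
raw <- data.frame(
  Reporting.Area = c("NEW ENGLAND", "MID. ATLANTIC", "PACIFIC"),
  MMWR.Year = 2014L,
  MMWR.Week = c(30L, 30L, 30L),
  Lyme.disease..Current.week = c(NA, 40L, 2L),
  Malaria..Current.week = c(1L, 4L, 3L)  # hypothetical column we do not need
)

# Keep the ID columns plus every Lyme disease column, then only the rows of interest
lyme_subset <- raw %>%
  select(Reporting.Area, MMWR.Year, MMWR.Week, starts_with("Lyme.disease")) %>%
  filter(Reporting.Area %in% c("NEW ENGLAND", "MID. ATLANTIC"))
lyme_subset
```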
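Variables Used:
1). Reporting.Area: area where the case was reported
2). MMWR.Week: MMWR week of the year in which the case was reported
3). MMWR.Year: year reported
4). Lyme.disease..Current.week: number of cases reported in the current week
5). Lyme.disease..Current.week..flag: flag accompanying the weekly count (for example, "-" where the count is missing)
6). Lyme.disease..Cum.2014: cumulative (year-to-date) cases for 2014
7). Lyme.disease..Cum.2014..flag
8). Lyme.disease..Cum.2013: cumulative cases for 2013 up to the same week
9). Lyme.disease..Cum.2013..flag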
Apart from Reporting.Area and the flag columns, which are categorical, all the variables I used were numerical. I combined Reporting.Area with the numerical counts to see how Lyme disease reporting differs across regions.
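Lyme.disease..Cum.2014 is a running year-to-date total, so a region's 2014 total is its last reported value (week 53 in the regional outputs above), not the sum of its weekly rows. The sketch below combines Reporting.Area with that column to rank regions, using made-up placeholder counts. Taking the last non-missing week is an assumption about how to handle a region whose final week is NA.

```r
library(dplyr)

# Hypothetical running totals for two regions over three weeks
# (placeholders, not CDC data); PACIFIC is missing its final week
cum <- data.frame(
  Reporting.Area = rep(c("NEW ENGLAND", "PACIFIC"), each = 3),
  MMWR.Week = rep(51:53, times = 2),
  Lyme.disease..Cum.2014 = c(100L, 110L, 120L, 5L, 6L, NA)
)

# Yearly total per region = latest non-missing running total, ranked high to low
totals <- cum %>%
  filter(!is.na(Lyme.disease..Cum.2014)) %>%
  group_by(Reporting.Area) %>%
  filter(MMWR.Week == max(MMWR.Week)) %>%
  ungroup() %>%
  arrange(desc(Lyme.disease..Cum.2014))
totals
```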
My visualizations show an increase in Lyme disease reporting during weeks 20 to 40 of the year. In 2014 these weeks ran from roughly mid-May to early October, which coincides with summer, so education efforts could be timed to come just before this period.
They also show that the highest counts of reported Lyme disease cases are in the northeastern United States. These are raw counts rather than per-capita rates, but they could still be used to target prevention efforts at those high-risk areas.
Things I wanted to do but couldn’t get to work:
1). Interactive geo heat map: an interactive heat map of the U.S. where states with high reporting are highlighted in red, and you can hover over a state to see how many Lyme disease cases it has reported.
2). Interactive line graphs with a drop-down menu for selecting which states or regions to look at.
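For the first item, plotly's plot_geo() with locationmode = "USA-states" is one possible route. The sketch below assumes a state-level table keyed by two-letter postal codes, which the CDC data would first need to be mapped to. The three counts are placeholders, not CDC data.

```r
library(plotly)

# Placeholder counts for three states (NOT CDC data); real use needs
# state-level totals joined to two-letter postal codes
state_counts <- data.frame(
  code = c("PA", "NY", "CA"),
  cases = c(100, 80, 5)
)

# U.S. choropleth: darker red = more reported cases; hover text shows the count
heat_map <- plot_geo(state_counts, locationmode = "USA-states") %>%
  add_trace(z = ~cases, locations = ~code,
            color = ~cases, colors = "Reds",
            text = ~paste0(code, ": ", cases, " reported cases"),
            hoverinfo = "text") %>%
  layout(title = "Reported Lyme disease cases by state",
         geo = list(scope = "usa"))
```

Printing heat_map in RStudio or in a knitted HTML document should render the interactive map.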
1). Education: One way to reduce the number of Lyme disease cases is to add a Lyme disease unit to students’ health classes. Adults are usually more aware of the need to check for ticks. Children, however, play outside more often and are more likely to be bitten, yet they often do not know to check themselves for ticks. The chance of contracting Lyme disease is greatly reduced if an attached tick is removed within 24 hours.
2). Physician outreach: We could also run a physician outreach program that asks pediatricians to inform parents about Lyme disease and the dangers a tick bite poses to their children. Pediatricians could also teach parents how to check their children and themselves for ticks, reducing the harm Lyme disease could do to them or their family members.