Alzheimer’s Disease and Healthy Aging Data

Introduction

The topic I chose is Alzheimer’s disease and healthy aging. I found the file “Alzheimer_s_Disease_and_Healthy_Aging_Data.csv” on data.gov. The data set is maintained by DPH (Department of Public Health) Public Inquiries and was published by the Centers for Disease Control and Prevention. It does not come with a ReadMe file. The data set has 39 variables. RowId uniquely identifies each row. The other variables include YearStart, YearEnd, LocationAbbr, LocationDesc, Datasource, Class, Topic, Question, Response, Data_Value_Unit, DataValueTypeID, Data_Value_Type, Data_Value, Data_Value_Alt, Data_Value_Footnote_Symbol, Data_Value_Footnote, Low_Confidence_Limit, High_Confidence_Limit, Sample_Size, the StratificationCategory and Stratification columns, Geolocation, ClassID, TopicID, QuestionID, ResponseID, and LocationID.

The variables I will use are YearEnd, the year in which data collection ended; Data_Value_Type, restricted to percentages; Stratification2, to split the data by gender (female and male); LocationDesc, the state or other location where the data were collected; and Data_Value, the percentage reported for each location. My goal is to analyze the percentages for females and for males separately in each state from 2015 to 2021, to see which gender had more diagnoses and which state had the most cases.

I chose this topic because my father is getting older and we have been noticing signs that his mind may not be doing well. Alzheimer’s is one of the things we want to have him tested for.

Alzheimer’s Disease

Loading Tidyverse and Other Necessary Packages.

The tidyverse and the other packages below were used to create the plots.

library(tidyverse)  
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.2.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)  
## Warning: package 'leaflet' was built under R version 4.2.3
library(leaflet.extras)  
## Warning: package 'leaflet.extras' was built under R version 4.2.3
library(dplyr)  

library(gganimate) 
## Warning: package 'gganimate' was built under R version 4.2.3
library(gifski) 
## Warning: package 'gifski' was built under R version 4.2.3
library(plotly) 
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(ggthemes) 
## Warning: package 'ggthemes' was built under R version 4.2.3
library(stringr) 

library(DataExplorer) 

library(ggplot2) 

library(ggfortify) 

Creating a Path and Reading the CSV File.

This chunk sets the working directory so that R can find the file, then reads “Alzheimer_s_Disease_and_Healthy_Aging_Data.csv” into an object named alzheimer. The file has 250,937 observations and 39 variables, and it covers most of the states in the United States.

setwd("C:/Users/aline/Downloads/Data Final Project")
alzheimer <- read_csv("Alzheimer_s_Disease_and_Healthy_Aging_Data.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 250937 Columns: 39
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (25): RowId, LocationAbbr, LocationDesc, Datasource, Class, Topic, Quest...
## dbl  (6): YearStart, YearEnd, Data_Value, Data_Value_Alt, Low_Confidence_Lim...
## lgl  (8): Response, Sample_Size, StratificationCategory3, Stratification3, R...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
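
The parsing warning and the guessed column types above (for example, Sample_Size read as logical) suggest that declaring the types explicitly may help. The following is a minimal sketch, not the code used in this report: it writes a small toy CSV to a temporary file (the rows are made up for illustration) and reads it back with `col_types`, then calls `problems()` to list any rows that failed to parse.

```r
library(readr)

# Toy CSV standing in for the real file; the values are made up for illustration.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "YearEnd,LocationDesc,Data_Value,Sample_Size",
  "2015,Alabama,36.8,",
  "2021,Oregon,23.5,"
), toy_path)

# Declare the column types instead of letting readr guess them.
toy <- read_csv(
  toy_path,
  col_types = cols(
    YearEnd      = col_integer(),
    LocationDesc = col_character(),
    Data_Value   = col_double(),
    Sample_Size  = col_double()  # an all-empty column is otherwise guessed as logical
  )
)

# List any rows that could not be parsed with the declared types.
problems(toy)
```

With the real file, the same `cols()` call would need one entry per column of interest (or a `.default`), and `problems(alzheimer)` would show which rows triggered the warning.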

File Summary

Second chunk: a summary of the file. Among other things, it shows that the median year of the observations is 2018.

summary(alzheimer)
##     RowId             YearStart       YearEnd     LocationAbbr      
##  Length:250937      Min.   :2015   Min.   :2015   Length:250937     
##  Class :character   1st Qu.:2016   1st Qu.:2016   Class :character  
##  Mode  :character   Median :2018   Median :2018   Mode  :character  
##                     Mean   :2018   Mean   :2018                     
##                     3rd Qu.:2020   3rd Qu.:2020                     
##                     Max.   :2021   Max.   :2021                     
##                                                                     
##  LocationDesc        Datasource           Class              Topic          
##  Length:250937      Length:250937      Length:250937      Length:250937     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    Question         Response       Data_Value_Unit    DataValueTypeID   
##  Length:250937      Mode:logical   Length:250937      Length:250937     
##  Class :character   NA's:250937    Class :character   Class :character  
##  Mode  :character                  Mode  :character   Mode  :character  
##                                                                         
##                                                                         
##                                                                         
##                                                                         
##  Data_Value_Type      Data_Value     Data_Value_Alt  
##  Length:250937      Min.   :  0.00   Min.   :  0.00  
##  Class :character   1st Qu.: 15.70   1st Qu.: 15.70  
##  Mode  :character   Median : 32.30   Median : 32.30  
##                     Mean   : 37.33   Mean   : 37.33  
##                     3rd Qu.: 56.00   3rd Qu.: 56.00  
##                     Max.   :100.00   Max.   :100.00  
##                     NA's   :81635    NA's   :81635   
##  Data_Value_Footnote_Symbol Data_Value_Footnote Low_Confidence_Limit
##  Length:250937              Length:250937       Min.   :-0.7        
##  Class :character           Class :character    1st Qu.:12.4        
##  Mode  :character           Mode  :character    Median :26.6        
##                                                 Mean   :32.7        
##                                                 3rd Qu.:48.4        
##                                                 Max.   :99.6        
##                                                 NA's   :81811       
##  High_Confidence_Limit Sample_Size    StratificationCategory1
##  Min.   :  1.40        Mode:logical   Length:250937          
##  1st Qu.: 19.40        NA's:250937    Class :character       
##  Median : 38.30                       Mode  :character       
##  Mean   : 42.24                                              
##  3rd Qu.: 64.00                                              
##  Max.   :100.00                                              
##  NA's   :81811                                               
##  Stratification1    StratificationCategory2 Stratification2   
##  Length:250937      Length:250937           Length:250937     
##  Class :character   Class :character        Class :character  
##  Mode  :character   Mode  :character        Mode  :character  
##                                                               
##                                                               
##                                                               
##                                                               
##  StratificationCategory3 Stratification3 Geolocation          ClassID         
##  Mode:logical            Mode:logical    Length:250937      Length:250937     
##  NA's:250937             NA's:250937     Class :character   Class :character  
##                                          Mode  :character   Mode  :character  
##                                                                               
##                                                                               
##                                                                               
##                                                                               
##    TopicID           QuestionID        ResponseID      LocationID       
##  Length:250937      Length:250937      Mode:logical   Length:250937     
##  Class :character   Class :character   NA's:250937    Class :character  
##  Mode  :character   Mode  :character                  Mode  :character  
##                                                                         
##                                                                         
##                                                                         
##                                                                         
##  StratificationCategoryID1 StratificationID1  StratificationCategoryID2
##  Length:250937             Length:250937      Length:250937            
##  Class :character          Class :character   Class :character         
##  Mode  :character          Mode  :character   Mode  :character         
##                                                                        
##                                                                        
##                                                                        
##                                                                        
##  StratificationID2  StratificationCategoryID3 StratificationID3  Report       
##  Length:250937      Mode:logical              Mode:logical      Mode:logical  
##  Class :character   NA's:250937               NA's:250937       NA's:250937   
##  Mode  :character                                                             
##                                                                               
##                                                                               
##                                                                               
## 
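
The summary shows several columns that contain only missing values (for example Response, Sample_Size, and Report, each with 250,937 NA's). A small sketch of how such columns could be dropped before further work is given here. It uses a toy tibble with made-up values, so the object names `toy` and `toy_trimmed` are illustrative only.

```r
library(dplyr)

# Toy data: two informative columns and two columns that are entirely NA.
toy <- tibble(
  LocationDesc = c("Alabama", "Oregon"),
  Data_Value   = c(15.5, NA),
  Response     = c(NA, NA),
  Sample_Size  = c(NA, NA)
)

# Keep only the columns that have at least one non-missing value.
toy_trimmed <- toy %>%
  select(where(~ !all(is.na(.x))))

# Count the remaining missing values per column.
na_counts <- toy_trimmed %>%
  summarise(across(everything(), ~ sum(is.na(.x))))
na_counts
```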

Glimpse

Third chunk: checking how many columns and rows the data set has, and which variables are quantitative and which are qualitative.

glimpse(alzheimer)
## Rows: 250,937
## Columns: 39
## $ RowId                      <chr> "BRFSS~2015~2015~9003~Q43~TOC11~AGE~OVERALL…
## $ YearStart                  <dbl> 2015, 2021, 2021, 2021, 2021, 2021, 2021, 2…
## $ YearEnd                    <dbl> 2015, 2021, 2021, 2021, 2021, 2021, 2021, 2…
## $ LocationAbbr               <chr> "SOU", "AL", "OR", "NE", "IN", "AZ", "OH", …
## $ LocationDesc               <chr> "South", "Alabama", "Oregon", "Nebraska", "…
## $ Datasource                 <chr> "BRFSS", "BRFSS", "BRFSS", "BRFSS", "BRFSS"…
## $ Class                      <chr> "Overall Health", "Mental Health", "Mental …
## $ Topic                      <chr> "Arthritis among older adults", "Frequent m…
## $ Question                   <chr> "Percentage of older adults ever told they …
## $ Response                   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Unit            <chr> "%", "%", "%", "%", "%", "%", "%", "%", "%"…
## $ DataValueTypeID            <chr> "PRCTG", "PRCTG", "PRCTG", "PRCTG", "PRCTG"…
## $ Data_Value_Type            <chr> "Percentage", "Percentage", "Percentage", "…
## $ Data_Value                 <dbl> 36.8, 15.5, 23.5, 13.6, 25.5, 9.1, 22.2, 16…
## $ Data_Value_Alt             <dbl> 36.8, 15.5, 23.5, 13.6, 25.5, 9.1, 22.2, 16…
## $ Data_Value_Footnote_Symbol <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Data_Value_Footnote        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Low_Confidence_Limit       <dbl> 35.9, 13.4, 16.0, 12.6, 23.9, 7.3, 20.4, 14…
## $ High_Confidence_Limit      <dbl> 37.7, 17.9, 33.2, 14.6, 27.3, 11.1, 24.0, 1…
## $ Sample_Size                <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ StratificationCategory1    <chr> "Age Group", "Age Group", "Age Group", "Age…
## $ Stratification1            <chr> "50-64 years", "Overall", "Overall", "Overa…
## $ StratificationCategory2    <chr> NA, "Gender", "Race/Ethnicity", NA, "Gender…
## $ Stratification2            <chr> NA, "Female", "Hispanic", NA, "Female", "Ma…
## $ StratificationCategory3    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Stratification3            <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Geolocation                <chr> NA, "POINT (-86.63186076199969 32.840571122…
## $ ClassID                    <chr> "C01", "C05", "C05", "C05", "C05", "C05", "…
## $ TopicID                    <chr> "TOC11", "TMC01", "TMC03", "TMC03", "TMC03"…
## $ QuestionID                 <chr> "Q43", "Q03", "Q27", "Q27", "Q27", "Q27", "…
## $ ResponseID                 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LocationID                 <chr> "9003", "01", "41", "31", "18", "04", "39",…
## $ StratificationCategoryID1  <chr> "AGE", "AGE", "AGE", "AGE", "AGE", "AGE", "…
## $ StratificationID1          <chr> "5064", "AGE_OVERALL", "AGE_OVERALL", "AGE_…
## $ StratificationCategoryID2  <chr> "OVERALL", "GENDER", "RACE", "OVERALL", "GE…
## $ StratificationID2          <chr> "OVERALL", "FEMALE", "HIS", "OVERALL", "FEM…
## $ StratificationCategoryID3  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ StratificationID3          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Report                     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Loading Extra Packages.

Loading extra packages in order to create different visualizations.

library(devtools)
## Loading required package: usethis
## Warning: package 'usethis' was built under R version 4.2.3
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.2.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use

Filtering the File by Data Value Type

This chunk creates “alzheimerpercetage” from the “alzheimer” data set. It filters on “Data_Value_Type” so that only rows reported as a Percentage are kept. These are the values analyzed for females and males in the following chunks.

alzheimerpercetage <- alzheimer %>%
  filter(Data_Value_Type == "Percentage")
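
The glimpse output earlier lists topics such as “Arthritis among older adults”, so the percentage filter by itself keeps rows for every topic in the file. One way to check this, and to narrow the data to a single topic, is sketched here on a toy tibble. The second topic label and the "cognitive" search pattern are assumptions made for illustration, not values confirmed from the real file.

```r
library(dplyr)
library(stringr)

# Toy rows; the second Topic label is a placeholder, not copied from the real file.
toy <- tibble(
  Topic = c("Arthritis among older adults",
            "Hypothetical cognitive decline topic",
            "Hypothetical cognitive decline topic"),
  Data_Value_Type = c("Percentage", "Percentage", "Mean"),
  Data_Value = c(36.8, 11.2, 4.1)
)

toy_pct <- toy %>%
  filter(Data_Value_Type == "Percentage")

# See which topics remain after the percentage filter.
topic_counts <- toy_pct %>%
  count(Topic, sort = TRUE)
topic_counts

# Narrow to one topic; the "cognitive" pattern is an assumption for illustration.
toy_cog <- toy_pct %>%
  filter(str_detect(Topic, regex("cognitive", ignore_case = TRUE)))
toy_cog
```

Running `count(Topic)` on the real filtered data would show which topics the percentages in the plots actually describe.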

Filtering the File by Gender

This chunk creates “alzheimerfemale” from the “alzheimerpercetage” data set. It filters on “Stratification2” to keep only the rows for females.

alzheimerfemale <- alzheimerpercetage %>%
  filter(Stratification2 == "Female")

First Plot

This chunk creates “fem_plot” from the “alzheimerfemale” data set. It plots Data_Value against YearEnd and colors the points by LocationDesc. The goal is to analyze the percentage of females with Alzheimer’s in each state for each year.

fem_plot <- ggplot(alzheimerfemale, aes(x = YearEnd, y = Data_Value, color = LocationDesc)) +
  # "none" replaces the deprecated FALSE for hiding legends (ggplot2 >= 3.3.4)
  guides(color = "none", fill = "none") +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  labs(title = "Female Alzheimer Percentage in Each Year") +
  xlab("Year") +
  ylab("Percentage") +
  theme_minimal()
fem_plot <- ggplotly(fem_plot)
## Warning: Removed 344 rows containing non-finite values (`stat_smooth()`).
fem_plot

Second Plot

This second plot has the same goal as the first plot, but I was trying to visualize the data in a clearer way.

ggplot(alzheimerfemale, aes(x = LocationDesc, y = Data_Value, color = LocationDesc)) +
  geom_boxplot() +
  # "none" replaces the deprecated FALSE for hiding the legend
  guides(color = "none") +
  labs(title = "Female Alzheimer Percentage", y = "Percentage", x = "States") +
  coord_flip() +
  theme(axis.text.y = element_text(size = 5))
## Warning: Removed 344 rows containing non-finite values (`stat_boxplot()`).
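
With this many locations on one axis, a boxplot can be easier to read when the locations are sorted by their median value. The sketch below shows one possible way to do that with `fct_reorder()` from forcats. The tibble `toy` and its state names and values are made up for illustration; this is not the plot produced in this report.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Toy data with made-up locations and values.
toy <- tibble(
  LocationDesc = rep(c("State A", "State B", "State C"), each = 3),
  Data_Value   = c(10, 12, NA, 30, 28, 35, 20, 22, 19)
)

# Drop missing values, then order the locations by their median percentage.
toy_ordered <- toy %>%
  filter(!is.na(Data_Value)) %>%
  mutate(LocationDesc = fct_reorder(LocationDesc, Data_Value, .fun = median))

p <- ggplot(toy_ordered, aes(x = LocationDesc, y = Data_Value)) +
  geom_boxplot() +
  coord_flip() +
  labs(x = "States", y = "Percentage")
p
```

Removing the missing values first also avoids the "Removed ... rows" warning from `stat_boxplot()`.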

Filtering the File by Gender

This chunk creates “alzheimermale” from the “alzheimerpercetage” data set. It filters on “Stratification2” to keep only the rows for males.

alzheimermale <- alzheimerpercetage %>%
  filter(Stratification2 == "Male")

Third Plot

This chunk creates “m_plot” from the “alzheimermale” data set. It plots Data_Value against YearEnd and colors the points by LocationDesc. The goal is to analyze the percentage of males with Alzheimer’s in each state for each year.

m_plot <- ggplot(alzheimermale, aes(x = YearEnd, y = Data_Value, color = LocationDesc)) +
  # "none" replaces the deprecated FALSE for hiding legends (ggplot2 >= 3.3.4)
  guides(color = "none", fill = "none") +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "blue") +
  labs(title = "Male Alzheimer Percentage in Each Year") +
  xlab("Year") +
  ylab("Percentage") +
  theme_minimal()
m_plot <- ggplotly(m_plot)
## Warning: Removed 624 rows containing non-finite values (`stat_smooth()`).
m_plot

Fourth Plot

This fourth plot has the same goal as the third plot, but I was trying to visualize the data in a clearer way.

ggplot(alzheimermale, aes(x = LocationDesc, y = Data_Value, color = LocationDesc)) +
  geom_boxplot() +
  # "none" replaces the deprecated FALSE for hiding the legend
  guides(color = "none") +
  labs(title = "Male Alzheimer Percentage", y = "Percentage", x = "States") +
  coord_flip() +
  theme(axis.text.y = element_text(size = 5))
## Warning: Removed 624 rows containing non-finite values (`stat_boxplot()`).

The visualizations show the data set filtered by percentage and by gender. My goal was to analyze the percentage of females with Alzheimer’s in each state over the years, and to do the same for males. I wanted to see whether one gender is more inclined to develop Alzheimer’s than the other.
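
A direct way to compare the two genders is to put them side by side in one table instead of two separate plots. The following is a rough sketch under the assumption that averaging Data_Value per location and gender is a reasonable summary. The tibble `toy` and its values are made up for illustration, and `diff_female_minus_male` is a hypothetical column name.

```r
library(dplyr)
library(tidyr)

# Toy data with made-up locations and percentages.
toy <- tibble(
  LocationDesc    = rep(c("State A", "State B"), each = 4),
  Stratification2 = rep(c("Female", "Male"), times = 4),
  Data_Value      = c(12, 10, 14, NA, 20, 23, 22, 21)
)

by_gender <- toy %>%
  filter(Stratification2 %in% c("Female", "Male")) %>%
  group_by(LocationDesc, Stratification2) %>%
  # Average percentage per location and gender, ignoring missing values.
  summarise(mean_pct = mean(Data_Value, na.rm = TRUE), .groups = "drop") %>%
  # One row per location, one column per gender.
  pivot_wider(names_from = Stratification2, values_from = mean_pct) %>%
  mutate(diff_female_minus_male = Female - Male)
by_gender
```

A positive difference means the female average is higher in that location, and a negative one means the male average is higher.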

An interesting surprise in the Female Alzheimer Percentage and Male Alzheimer Percentage visualizations is that for females the state with the highest percentage was South Dakota, while for males it was Virginia. In the other two visualizations, Male Alzheimer Percentage in Each Year and Female Alzheimer Percentage in Each Year, Puerto Rico was at the top for males in 2018, 2020, and 2021, and at the top for females in 2016, 2018, 2019, and 2020. I was not able to tell whether gender plays a role in developing Alzheimer’s, but seeing Puerto Rico appear more than once for each gender and across multiple years makes me wonder whether lifestyle and environment could contribute to the disease.
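
Reading the top location off an interactive plot can be checked with a small table. This sketch finds the location with the highest value for each gender and year using `slice_max()`. It runs on a toy tibble with made-up locations and values, so the result says nothing about the real data.

```r
library(dplyr)

# Toy data with made-up locations and percentages.
toy <- tibble(
  YearEnd         = c(2018, 2018, 2018, 2020, 2020, 2020),
  LocationDesc    = c("State A", "State B", "State C", "State A", "State B", "State C"),
  Stratification2 = "Female",
  Data_Value      = c(40, 55, NA, 61, 47, 52)
)

top_by_year <- toy %>%
  filter(!is.na(Data_Value)) %>%
  group_by(Stratification2, YearEnd) %>%
  # Keep the single highest row in each gender and year.
  slice_max(Data_Value, n = 1, with_ties = FALSE) %>%
  ungroup()
top_by_year
```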

What I wanted to show but could not get to work were maps. I need to practice building maps in R so that I can create them with any data set. I tried to follow a set of notes, but I got stuck.
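
As a possible starting point for the maps, the Geolocation column shown in the glimpse output stores text of the form "POINT (longitude latitude)". The sketch below, which is only one way this could be approached, pulls the two numbers out with a regular expression and passes them to leaflet. The tibble `toy`, its place names, and its coordinates are made up for illustration, and the marker radius scaling is an arbitrary choice.

```r
library(dplyr)
library(stringr)
library(leaflet)

# Toy data; the coordinates are made up but follow the "POINT (lon lat)" layout.
toy <- tibble(
  LocationDesc = c("Place A", "Place B"),
  Geolocation  = c("POINT (-86.6 32.8)", "POINT (-120.5 44.0)"),
  Data_Value   = c(15.5, 23.5)
)

# Capture longitude (group 1) and latitude (group 2) from the text.
coords <- str_match(toy$Geolocation, "POINT \\((-?[0-9.]+) (-?[0-9.]+)\\)")

toy_geo <- toy %>%
  mutate(lon = as.numeric(coords[, 2]),
         lat = as.numeric(coords[, 3])) %>%
  filter(!is.na(lon), !is.na(lat))

# One circle per location, sized by the percentage.
alz_map <- leaflet(toy_geo) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat,
                   radius = ~Data_Value / 3,
                   label = ~LocationDesc)
alz_map
```

Rows without a Geolocation value (such as regional rows) produce NA coordinates and are dropped by the final `filter()`.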