Homework 2 draft

Research Questions

Pull in Data from World Health Organization (WHO)

The tables will be pulled from the indicators page, specifically for the assistive technology indicator, on the WHO website.

Pull in Prevalence of AT use Data

library(tidyverse) # provides read_csv(), the dplyr verbs, and ggplot2 used below
prevATUse <- read_csv("Final Project/Prev_Use_AT.csv")
## Rows: 1430 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (15): IndicatorCode, Indicator, ValueType, ParentLocationCode, ParentLo...
## dbl   (3): Period, FactValueNumeric, FactValueTranslationID
## lgl  (15): IsLatestYear, Dim2 type, Dim2, Dim2ValueCode, Dim3 type, Dim3, Di...
## dttm  (1): DateModified
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(prevATUse)
## # A tibble: 6 × 34
##   IndicatorCode            Indicator ValueType ParentLocationCode ParentLocation
##   <chr>                    <chr>     <chr>     <chr>              <chr>         
## 1 ASSISTIVETECH_USEPREVAL… Prevalen… numeric   EUR                Europe        
## 2 ASSISTIVETECH_USEPREVAL… Prevalen… numeric   EUR                Europe        
## 3 ASSISTIVETECH_USEPREVAL… Prevalen… numeric   EUR                Europe        
## 4 ASSISTIVETECH_USEPREVAL… Prevalen… numeric   EUR                Europe        
## 5 ASSISTIVETECH_USEPREVAL… Prevalen… numeric   EUR                Europe        
## 6 ASSISTIVETECH_USEPREVAL… Prevalen… numeric   EUR                Europe        
## # ℹ 29 more variables: `Location type` <chr>, SpatialDimValueCode <chr>,
## #   Location <chr>, `Period type` <chr>, Period <dbl>, IsLatestYear <lgl>,
## #   `Dim1 type` <chr>, Dim1 <chr>, Dim1ValueCode <chr>, `Dim2 type` <lgl>,
## #   Dim2 <lgl>, Dim2ValueCode <lgl>, `Dim3 type` <lgl>, Dim3 <lgl>,
## #   Dim3ValueCode <lgl>, DataSourceDimValueCode <lgl>, DataSource <lgl>,
## #   FactValueNumericPrefix <lgl>, FactValueNumeric <dbl>, FactValueUoM <lgl>,
## #   FactValueNumericLowPrefix <lgl>, FactValueNumericLow <lgl>, …

Pull in Prevalence of AT need Data

prevATNeed <- read_csv("Final Project/Prev_Need_AT.csv")
## Rows: 1435 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (15): IndicatorCode, Indicator, ValueType, ParentLocationCode, ParentLo...
## dbl   (3): Period, FactValueNumeric, FactValueTranslationID
## lgl  (15): IsLatestYear, Dim2 type, Dim2, Dim2ValueCode, Dim3 type, Dim3, Di...
## dttm  (1): DateModified
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(prevATNeed)
## # A tibble: 6 × 34
##   IndicatorCode           Indicator  ValueType ParentLocationCode ParentLocation
##   <chr>                   <chr>      <chr>     <chr>              <chr>         
## 1 ASSISTIVETECH_TOTALNEED Prevalenc… numeric   AFR                Africa        
## 2 ASSISTIVETECH_TOTALNEED Prevalenc… numeric   AFR                Africa        
## 3 ASSISTIVETECH_TOTALNEED Prevalenc… numeric   AFR                Africa        
## 4 ASSISTIVETECH_TOTALNEED Prevalenc… numeric   AFR                Africa        
## 5 ASSISTIVETECH_TOTALNEED Prevalenc… numeric   AFR                Africa        
## 6 ASSISTIVETECH_TOTALNEED Prevalenc… numeric   AFR                Africa        
## # ℹ 29 more variables: `Location type` <chr>, SpatialDimValueCode <chr>,
## #   Location <chr>, `Period type` <chr>, Period <dbl>, IsLatestYear <lgl>,
## #   `Dim1 type` <chr>, Dim1 <chr>, Dim1ValueCode <chr>, `Dim2 type` <lgl>,
## #   Dim2 <lgl>, Dim2ValueCode <lgl>, `Dim3 type` <lgl>, Dim3 <lgl>,
## #   Dim3ValueCode <lgl>, DataSourceDimValueCode <lgl>, DataSource <lgl>,
## #   FactValueNumericPrefix <lgl>, FactValueNumeric <dbl>, FactValueUoM <lgl>,
## #   FactValueNumericLowPrefix <lgl>, FactValueNumericLow <lgl>, …

Pull in Educational resources for AT Data

eduAT <- read_csv("Final Project/Ava_Edu_AT.csv")
## Rows: 490 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (14): IndicatorCode, Indicator, ValueType, ParentLocationCode, ParentLo...
## dbl   (2): Period, FactValueTranslationID
## lgl  (17): IsLatestYear, Dim2 type, Dim2, Dim2ValueCode, Dim3 type, Dim3, Di...
## dttm  (1): DateModified
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(eduAT)
## # A tibble: 6 × 34
##   IndicatorCode     Indicator        ValueType ParentLocationCode ParentLocation
##   <chr>             <chr>            <chr>     <chr>              <chr>         
## 1 ASSISTIVETECH_Q07 Availability of… text      EMR                Eastern Medit…
## 2 ASSISTIVETECH_Q07 Availability of… text      EMR                Eastern Medit…
## 3 ASSISTIVETECH_Q07 Availability of… text      EMR                Eastern Medit…
## 4 ASSISTIVETECH_Q07 Availability of… text      EMR                Eastern Medit…
## 5 ASSISTIVETECH_Q07 Availability of… text      EMR                Eastern Medit…
## 6 ASSISTIVETECH_Q07 Availability of… text      EMR                Eastern Medit…
## # ℹ 29 more variables: `Location type` <chr>, SpatialDimValueCode <chr>,
## #   Location <chr>, `Period type` <chr>, Period <dbl>, IsLatestYear <lgl>,
## #   `Dim1 type` <chr>, Dim1 <chr>, Dim1ValueCode <chr>, `Dim2 type` <lgl>,
## #   Dim2 <lgl>, Dim2ValueCode <lgl>, `Dim3 type` <lgl>, Dim3 <lgl>,
## #   Dim3ValueCode <lgl>, DataSourceDimValueCode <lgl>, DataSource <lgl>,
## #   FactValueNumericPrefix <lgl>, FactValueNumeric <lgl>, FactValueUoM <lgl>,
## #   FactValueNumericLowPrefix <lgl>, FactValueNumericLow <lgl>, …

Clean up the three data tables before joining them

I checked different columns using the distinct() function in the console to find and remove any columns that did not contain any data. I also checked whether columns hold repetitive data (e.g. `Dim1 type` has the same information found in Dim1 and Dim1ValueCode).
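The empty-column check described above could also be done in code rather than by eye. This is a minimal sketch, assuming a small made-up table (`toy`) in place of the real WHO data; the names `toy`, `emptyCols`, and `toyNoEmpty` are hypothetical.

```r
library(dplyr)

# Toy stand-in for one of the WHO tables (made-up values)
toy <- tibble(
  Location = c("Sweden", "Poland"),
  Dim2     = c(NA, NA),      # entirely empty, like Dim2 in the real tables
  Value    = c("0.1", "0.3")
)

# Names of the columns in which every entry is NA
emptyCols <- names(toy)[colSums(!is.na(toy)) == 0]
emptyCols

# Keep only the columns that contain at least one non-missing value
toyNoEmpty <- toy %>%
  select(where(~ !all(is.na(.x))))
names(toyNoEmpty)
```

Running the same two steps on `prevATUse`, `prevATNeed`, and `eduAT` would list which columns are empty in each table before deciding what to drop.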

I will remove all these columns from all the data sets with the intention to join them when we learn how to do that next week.

#remove `IndicatorCode`, `ValueType`, `ParentLocationCode`, `Location type`, `SpatialDimValueCode`, `Period type`, `IsLatestYear`, `Dim1 type`, `Dim2 type`, `Dim3 type`, `DataSource`, `FactValueNumericPrefix`, `FactValueUoM`, `FactValueNumericLowPrefix`, `FactValueNumericHigh`, `Language` from all the data sets

prevATUseClean <- prevATUse %>% 
  select(-`IndicatorCode`, -`ValueType`, -`ParentLocationCode`, 
         -`Location type`, -`SpatialDimValueCode`, -`Period type`, 
         -`IsLatestYear`, -`Dim1 type`, -`Dim2 type`, -`Dim3 type`,
         -`DataSource`, -`FactValueNumericPrefix`, -`FactValueUoM`, 
         -`FactValueNumericLowPrefix`, -`FactValueNumericHigh`, -`Language`)

prevATNeedClean <- prevATNeed %>% 
  select(-`IndicatorCode`, -`ValueType`, -`ParentLocationCode`, 
         -`Location type`, -`SpatialDimValueCode`, -`Period type`, 
         -`IsLatestYear`, -`Dim1 type`, -`Dim2 type`, -`Dim3 type`,
         -`DataSource`, -`FactValueNumericPrefix`, -`FactValueUoM`, 
         -`FactValueNumericLowPrefix`, -`FactValueNumericHigh`, -`Language`)

eduATClean <- eduAT %>% 
  select(-`IndicatorCode`, -`ValueType`, -`ParentLocationCode`, 
         -`Location type`, -`SpatialDimValueCode`, -`Period type`, 
         -`IsLatestYear`, -`Dim1 type`, -`Dim2 type`, -`Dim3 type`,
         -`DataSource`, -`FactValueNumericPrefix`, -`FactValueUoM`, 
         -`FactValueNumericLowPrefix`, -`FactValueNumericHigh`, -`Language`)
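Since the same columns are dropped from all three tables, the list could be stored once and reused. This is a minimal sketch under that assumption; `dropCols`, `dropUnused`, and `toy` are hypothetical names, and the toy table only stands in for the real data.

```r
library(dplyr)

# Columns to drop, copied from the select() calls above
dropCols <- c(
  "IndicatorCode", "ValueType", "ParentLocationCode", "Location type",
  "SpatialDimValueCode", "Period type", "IsLatestYear", "Dim1 type",
  "Dim2 type", "Dim3 type", "DataSource", "FactValueNumericPrefix",
  "FactValueUoM", "FactValueNumericLowPrefix", "FactValueNumericHigh",
  "Language"
)

# Hypothetical helper: drop those columns from whichever table is passed in
dropUnused <- function(data) {
  data %>% select(-any_of(dropCols))
}

# Toy table with two droppable columns and two columns to keep
toy <- tibble(
  IndicatorCode = "X",
  Location      = "Sweden",
  Language      = "EN",
  Value         = "0.1"
)
names(dropUnused(toy))
```

`any_of()` silently skips names that are missing from a table; swapping in `all_of()` would raise an error instead, which is closer to how the original `select()` calls behave.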

Graph prevATNeedClean data

I want to understand the different assistive technologies and their prevalence globally. The data set already reports the calculated prevalence of each technology in the Value column.

Help at this point

How do I reorder from highest to lowest? My first attempt with reorder(Dim1, -Value) ran without an error but did not reorder the bars. The first plot still uses reorder().

I also organized with facet wrap. Any suggestions on how to make it easier to read? Do I just need to shorten the AT names?
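On the two questions above, one possible approach is to map `reorder()` directly onto the x aesthetic and to wrap the long AT names instead of shortening them by hand. This is a minimal sketch, assuming a made-up three-row table (`toy`); the wrap width of 20 is an arbitrary choice.

```r
library(ggplot2)

# Toy summary table (made-up values)
toy <- data.frame(
  Dim1  = c("White canes", "Magnifiers, optical", "Screen readers"),
  Value = c(0.7, 7.4, 0.8)
)

# reorder() returns a factor whose levels are sorted by the second argument,
# so -Value puts the largest prevalence first
orderedDim1 <- reorder(toy$Dim1, -toy$Value)
levels(orderedDim1)

# The reordered factor has to be the x aesthetic itself
p <- ggplot(toy, aes(x = reorder(Dim1, -Value), y = Value)) +
  geom_col() +
  # wrap long axis labels onto several lines (width is an assumption)
  scale_x_discrete(labels = scales::label_wrap(20)) +
  labs(x = "Assistive Technology", y = "Prevalence")
```

Printing `p` draws the plot. With facets, every panel shares the same x axis order, so the ordering reflects the whole table rather than each country.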

#make `Value` numeric rather than character, since it holds the prevalence of each assistive technology in each country
class(prevATNeedClean$Value)
## [1] "character"
#change Value from character to numeric
prevATNeedClean$Value <- as.numeric(gsub("[^0-9.]", "", prevATNeedClean$Value))

# Check the data type again
class(prevATNeedClean$Value)
## [1] "numeric"
#filter so it is only Europe
prevATNeedCleanEuro <- prevATNeedClean %>% 
  filter(ParentLocation == 'Europe')

#create function to filter for only vision AT
visionAT <- function(data) {
  filteredForVisionAT <- data %>%
    filter(Dim1 %in% c(
      "Braille displays (note takers)",
      "Braille writing equipment/braillers",
      "Deafblind communicators (for seeing/vision)",
      "Magnifiers, digital handheld",
      "Smart phones/tablets/PDA (for seeing/vision)",
      "White canes",
      "Screen readers",
      "Magnifiers, optical",
      "Spectacles; low-vision, short/long distance/filters etc"
    ))
  
  return(filteredForVisionAT)
}

#applying the function so the new data frame is filtered to vision AT only
prevATNeedCleanEuroVision <- visionAT(prevATNeedCleanEuro)
head(prevATNeedCleanEuroVision)
## # A tibble: 6 × 18
##   Indicator             ParentLocation Location Period Dim1  Dim1ValueCode Dim2 
##   <chr>                 <chr>          <chr>     <dbl> <chr> <chr>         <lgl>
## 1 Prevalence of need o… Europe         Poland     2021 Brai… ASSISTIVETEC… NA   
## 2 Prevalence of need o… Europe         Poland     2021 Brai… ASSISTIVETEC… NA   
## 3 Prevalence of need o… Europe         Sweden     2021 Brai… ASSISTIVETEC… NA   
## 4 Prevalence of need o… Europe         Tajikis…   2021 Brai… ASSISTIVETEC… NA   
## 5 Prevalence of need o… Europe         Tajikis…   2021 Brai… ASSISTIVETEC… NA   
## 6 Prevalence of need o… Europe         Tajikis…   2021 Magn… ASSISTIVETEC… NA   
## # ℹ 11 more variables: Dim2ValueCode <lgl>, Dim3 <lgl>, Dim3ValueCode <lgl>,
## #   DataSourceDimValueCode <lgl>, FactValueNumeric <dbl>,
## #   FactValueNumericLow <lgl>, FactValueNumericHighPrefix <lgl>, Value <dbl>,
## #   FactValueTranslationID <dbl>, FactComments <chr>, DateModified <dttm>
#testing what the sum would be for prevalence
prevATNeedCleanEuroVision %>% 
  group_by(Dim1)%>%
  summarise(Value = sum(Value))%>%
  ungroup()
## # A tibble: 9 × 2
##   Dim1                                                    Value
##   <chr>                                                   <dbl>
## 1 Braille displays (note takers)                            1.2
## 2 Braille writing equipment/braillers                       0.5
## 3 Deafblind communicators (for seeing/vision)               0.7
## 4 Magnifiers, digital handheld                              2.4
## 5 Magnifiers, optical                                       7.4
## 6 Screen readers                                            0.8
## 7 Smart phones/tablets/PDA (for seeing/vision)              1.8
## 8 Spectacles; low-vision, short/long distance/filters etc 261. 
## 9 White canes                                               0.7
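The summed value for spectacles above (261) is larger than 100, so a sum across countries cannot be read as a percentage. One alternative is to average the country values instead. This is a minimal sketch on made-up numbers; `toy` and `toySummary` are hypothetical names, and an unweighted mean treats every country equally regardless of population, which is an assumption.

```r
library(dplyr)

# Toy stand-in: two countries, two AT types (made-up values)
toy <- tibble(
  Location = c("Sweden", "Poland", "Sweden", "Poland"),
  Dim1     = c("White canes", "White canes", "Screen readers", "Screen readers"),
  Value    = c(0.1, 0.3, 0.2, 0.4)
)

toySummary <- toy %>%
  group_by(Dim1) %>%
  summarise(
    sumValue   = sum(Value),                 # grows with the number of countries
    meanValue  = mean(Value, na.rm = TRUE),  # stays on the percentage scale
    nCountries = n_distinct(Location)
  )
toySummary
```

Keeping `nCountries` alongside the mean shows how many countries each average is based on.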
# Create the plot. 
prevATNeedCleanEuroVision %>%
  group_by(Dim1) %>%
  summarise(Value = sum(Value)) %>%
  ungroup() %>%
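  # reorder() has to wrap the x aesthetic; as an extra unnamed aes() argument it does not change the bar order
  ggplot(aes(x = reorder(Dim1, -Value), y = Value)) +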
  geom_bar(stat = "identity", fill = 'black') +
  geom_text(aes(label = Value), vjust = -0.5, color = "white", size = 3) +
  theme_dark() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) +
  labs(title = "Prevalence of Vision Assistive Tech in Europe", x = "Assistive Technology") +
  scale_y_continuous(name = "Prevalence", labels = scales::percent_format(scale = 1))

#aim to organize by location
prevATNeedCleanEuroVision %>% 
  ggplot(aes(x = Dim1, y = Value)) +
  geom_bar(stat = "identity", fill='black') +
  geom_text(aes(label = Value), vjust = -0.5, color = "white", size = 3) +
  theme_dark() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) + 
  labs(title = "Prevalence of Vision Assistive Tech in Europe", x = "Assistive Technology") +
  scale_y_continuous(name = "Prevalence", 
                     labels = scales::percent_format(scale = 1)) + 
  facet_wrap(vars(Location))

Only looking at Sweden

The Europe region does not actually include data for every European country, and the facet-wrapped graphs are hard to read, so I will look at just one country for vision assistive tech.

#filter so it is only Sweden
prevATNeedCleanSweden <- prevATNeedClean %>% 
  filter(Location == 'Sweden')

#filter for vision AT within the Sweden data frame
prevATNeedCleanSwedenVision <- visionAT(prevATNeedCleanSweden)

#testing what the sum would be for prevalence
prevATNeedCleanSwedenVision %>% 
  group_by(Dim1)%>%
  summarise(Value = sum(Value))%>%
  ungroup()
## # A tibble: 9 × 2
##   Dim1                                                    Value
##   <chr>                                                   <dbl>
## 1 Braille displays (note takers)                            0.1
## 2 Braille writing equipment/braillers                       0  
## 3 Deafblind communicators (for seeing/vision)               0.1
## 4 Magnifiers, digital handheld                              0.1
## 5 Magnifiers, optical                                       0.3
## 6 Screen readers                                            0.2
## 7 Smart phones/tablets/PDA (for seeing/vision)              0.1
## 8 Spectacles; low-vision, short/long distance/filters etc  65.1
## 9 White canes                                               0.1
# Create the plot
prevATNeedCleanSwedenVision %>% 
  group_by(Dim1)%>%
  summarise(Value = sum(Value))%>%
  ungroup() %>% 
  ggplot(aes(x = Dim1, y = Value)) +
  geom_bar(stat = "identity", fill='black') +
  geom_text(aes(label = Value), vjust = -0.5, color = "white", size = 3) +
  theme_dark() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) + 
  labs(title = "Prevalence of Vision Assistive Tech in Sweden", x = "Assistive Technology") +
  scale_y_continuous(name = "Prevalence", 
                     labels = scales::percent_format(scale = 1), limits = c(0, 100))

Repeating this without filtering by AT type

This is messy. I am not sure how to make it cleaner other than narrowing down the type of AT.

#filter so it is only Sweden
prevATNeedCleanSweden <- prevATNeedClean %>% 
  filter(Location == 'Sweden')

#testing what the sum would be for prevalence
prevATNeedCleanSweden %>% 
  group_by(Dim1)%>%
  summarise(Value = sum(Value))%>%
  ungroup()
## # A tibble: 55 × 2
##    Dim1                                       Value
##    <chr>                                      <dbl>
##  1 Alarm signalers with light/sound/vibration   0.1
##  2 Audio-players with DAISY capability          0  
##  3 Axillary / Elbow crutches                    1.3
##  4 Braille displays (note takers)               0.1
##  5 Braille writing equipment/braillers          0  
##  6 Canes/sticks, tripod and quadripod           1.3
##  7 Chairs for shower/bath/toilet                0  
##  8 Closed captioning displays                   0  
##  9 Club foot braces                             0.1
## 10 Communication boards/books/cards             0.1
## # ℹ 45 more rows
# Create the plot
prevATNeedCleanSweden %>% 
  group_by(Dim1)%>%
  summarise(Value = sum(Value))%>%
  ungroup() %>% 
  ggplot(aes(x = Dim1, y = Value)) +
  geom_bar(stat = "identity", fill='black') +
  geom_text(aes(label = Value), vjust = -0.5, color = "white", size = 3) +
  theme_dark() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) + 
  labs(title = "Prevalence of Assistive Tech in Sweden", x = "Assistive Technology") +
  scale_y_continuous(name = "Prevalence", 
                     labels = scales::percent_format(scale = 1), limits = c(0, 100))

Joining data

Do I need to filter out any column I don’t want duplicates of? Can you help me through how I may compare use versus need in a line graph?

#first clean up the use data and filter so it is only Sweden
prevATUseCleanSweden <- prevATUseClean %>% 
  filter(Location == 'Sweden')

#testing what the sum would be for prevalence
prevATUseCleanSweden %>%
  group_by(Dim1) %>%
  summarise(Value = sum(as.numeric(Value))) %>%
  ungroup()
## # A tibble: 55 × 2
##    Dim1                                       Value
##    <chr>                                      <dbl>
##  1 Alarm signalers with light/sound/vibration   0  
##  2 Audio-players with DAISY capability          0  
##  3 Axillary / Elbow crutches                    1.3
##  4 Braille displays (note takers)               0.1
##  5 Braille writing equipment/braillers          0  
##  6 Canes/sticks, tripod and quadripod           1.1
##  7 Chairs for shower/bath/toilet                0  
##  8 Closed captioning displays                   0  
##  9 Club foot braces                             0.1
## 10 Communication boards/books/cards             0  
## # ℹ 45 more rows
SwedenNeedVsUse <- inner_join(prevATUseCleanSweden, prevATNeedCleanSweden, by = c("Dim1"))
head(SwedenNeedVsUse)
## # A tibble: 6 × 35
##   Indicator.x  ParentLocation.x Location.x Period.x Dim1  Dim1ValueCode.x Dim2.x
##   <chr>        <chr>            <chr>         <dbl> <chr> <chr>           <lgl> 
## 1 Prevalence … Europe           Sweden         2021 Alar… ASSISTIVETECHP… NA    
## 2 Prevalence … Europe           Sweden         2021 Audi… ASSISTIVETECHP… NA    
## 3 Prevalence … Europe           Sweden         2021 Grab… ASSISTIVETECHP… NA    
## 4 Prevalence … Europe           Sweden         2021 Brai… ASSISTIVETECHP… NA    
## 5 Prevalence … Europe           Sweden         2021 Chai… ASSISTIVETECHP… NA    
## 6 Prevalence … Europe           Sweden         2021 Clos… ASSISTIVETECHP… NA    
## # ℹ 28 more variables: Dim2ValueCode.x <lgl>, Dim3.x <lgl>,
## #   Dim3ValueCode.x <lgl>, DataSourceDimValueCode.x <lgl>,
## #   FactValueNumeric.x <dbl>, FactValueNumericLow.x <lgl>,
## #   FactValueNumericHighPrefix.x <lgl>, Value.x <chr>,
## #   FactValueTranslationID.x <dbl>, FactComments.x <chr>,
## #   DateModified.x <dttm>, Indicator.y <chr>, ParentLocation.y <chr>,
## #   Location.y <chr>, Period.y <dbl>, Dim1ValueCode.y <chr>, Dim2.y <lgl>, …

Prevalence is already calculated as a percentage/decimal. I will pull in the education opportunities data for these descriptive statistics, which can really only be the mode since the data is categorical. However, there are only 6 education opportunities, so the mode doesn’t really tell us more than filtering by coverage.

#Locations with the most "Yes" answers (full coverage). More locations have full education access than have none.
eduATClean %>% 
  group_by(Location) %>% 
  filter(Value == "Yes") %>% 
  summarise(n = n()) %>%
  ungroup() %>% 
  filter(n == max(n))
## # A tibble: 15 × 2
##    Location                     n
##    <chr>                    <int>
##  1 Australia                    6
##  2 Azerbaijan                   6
##  3 Canada                       6
##  4 Chile                        6
##  5 Costa Rica                   6
##  6 Estonia                      6
##  7 Kenya                        6
##  8 New Zealand                  6
##  9 Poland                       6
## 10 Portugal                     6
## 11 Qatar                        6
## 12 Republic of Korea            6
## 13 Sweden                       6
## 14 United Arab Emirates         6
## 15 United States of America     6
#Locations with the most "No" answers (no coverage). Fewer locations have no education access than have full access.
eduATClean %>% 
  group_by(Location) %>% 
  filter(Value == "No") %>% 
  summarise(n = n()) %>%
  ungroup() %>% 
  filter(n == max(n))
## # A tibble: 10 × 2
##    Location         n
##    <chr>        <int>
##  1 Benin            6
##  2 Burkina Faso     6
##  3 Burundi          6
##  4 Cabo Verde       6
##  5 Chad             6
##  6 Eswatini         6
##  7 Georgia          6
##  8 Maldives         6
##  9 San Marino       6
## 10 Tajikistan       6
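For the frequencies and mode of a categorical column, a frequency table is one direct route. This is a minimal sketch on made-up answers; `eduToy`, `valueFreq`, and `modeValue` are hypothetical names.

```r
library(dplyr)

# Toy stand-in for the education table (made-up answers)
eduToy <- tibble(
  Location = c("A", "A", "B", "B", "C", "C"),
  Value    = c("Yes", "Yes", "Yes", "No", "No", "Yes")
)

# Frequency of each answer, most common first
valueFreq <- eduToy %>% count(Value, sort = TRUE)
valueFreq

# The mode is the answer with the highest frequency
# (slice_max() keeps ties, so more than one value can come back)
modeValue <- valueFreq %>%
  slice_max(n, n = 1) %>%
  pull(Value)
modeValue
```

Adding `Location` or `Dim1` to the `count()` call would give the same frequencies broken down by country or by type of education.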
#counting how many types of AT education each location answers 'Yes' for
#(Value == "Yes" already excludes "Information not available"; count() gives one row per location with the total in n)
eduATCleanCoverageType <- eduATClean %>% 
  filter(Dim1 != "Summary", Value == "Yes") %>% 
  count(Location)

#graphing each location by how many different kinds of AT education it has
eduATCleanCoverageType %>% 
  group_by(Location, n) %>% 
  summarise(totalCount=sum(n)) %>% 
  ungroup() %>% 
  ggplot(aes(x = reorder(Location, -totalCount), y = totalCount)) +
  geom_bar(stat = "identity", fill='black') +
  geom_text(aes(label = n), vjust = -0.5, color = "white", size = 3) +
  theme_dark() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) + 
  labs(title = "TBD", x = "Location") + 
  facet_wrap(vars(totalCount))
## `summarise()` has grouped output by 'Location'. You can override using the
## `.groups` argument.

#filter out the rows with no information and count the different values (full, partial or no coverage)
eduCoverage <- eduATClean %>% 
  filter(Dim1 == "Summary" & Value != "No Information") %>% 
  count(Value)

#graphing how many countries fall under each type of coverage
eduCoverage %>% 
  ggplot(aes(x = reorder(Value, -n), y = n)) +
  geom_bar(stat = "identity", fill='black') +
  geom_text(aes(label = n), vjust = -0.5, color = "white", size = 3) +
  theme_dark() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), plot.title = element_text(hjust = 0.5)) + 
  scale_y_continuous(limits = c(0, 40)) +
  labs(title = "Coverage", x = "Type of Coverage", y = "Number of Countries")

Instructions so I don’t forget to cover these aspects of the assignment

Include descriptive statistics (e.g., mean, median, and standard deviation for numerical variables, and frequencies and/or mode for categorical variables).

Include relevant visualizations using ggplot2 to complement these descriptive statistics. Be sure to use faceting, coloring, and titles as needed. Each visualization should be accompanied by descriptive text that highlights:

     the variable(s) used

     what questions might be answered with the visualizations

     what conclusions you can draw

Identify limitations of your visualization, such as:

    What questions are left unanswered with your visualizations

    What about the visualizations may be unclear to a naive viewer

    How could you improve the visualizations for the final project