R Bridge Week 2

I took a CSV file from the list provided, downloaded it, re-uploaded it to my GitHub, and accessed it from RStudio.

I selected the data set regarding UN development goals from 1998.

unDevURL <- "https://raw.githubusercontent.com/iscostello/Rbridgedata/master/UN98.csv"
unDev <- read.csv(file = unDevURL, header = TRUE, sep = ",")

head(unDev)
##                X region  tfr contraception educationMale educationFemale
## 1    Afghanistan   Asia 6.90            NA            NA              NA
## 2        Albania Europe 2.60            NA            NA              NA
## 3        Algeria Africa 3.81            52          11.1             9.9
## 4 American.Samoa   Asia   NA            NA            NA              NA
## 5        Andorra Europe   NA            NA            NA              NA
## 6         Angola Africa 6.69            NA            NA              NA
##   lifeMale lifeFemale infantMortality GDPperCapita economicActivityMale
## 1     45.0       46.0             154         2848                 87.5
## 2     68.0       74.0              32          863                   NA
## 3     67.5       70.3              44         1531                 76.4
## 4     68.0       73.0              11           NA                 58.8
## 5       NA         NA              NA           NA                   NA
## 6     44.9       48.1             124          355                   NA
##   economicActivityFemale illiteracyMale illiteracyFemale
## 1                    7.2         52.800            85.00
## 2                     NA             NA               NA
## 3                    7.8         26.100            51.00
## 4                   42.4          0.264             0.36
## 5                     NA             NA               NA
## 6                     NA             NA               NA
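The first column shows up as `X` because its header field in the CSV is blank. A minimal sketch of that behavior, using a few rows and columns copied from the `head()` output above as inline text so it runs without a network connection. The replacement name `country` is my own choice for illustration, not something in the original file.

```r
# Inline stand-in for the top of UN98.csv (values copied from the head() output).
# The first header field is intentionally blank, as in the original file.
csvText <- '"","region","tfr","lifeMale","lifeFemale"
"Afghanistan","Asia",6.90,45.0,46.0
"Albania","Europe",2.60,68.0,74.0
"Algeria","Africa",3.81,67.5,70.3
"American.Samoa","Asia",NA,68.0,73.0'

demo <- read.csv(text = csvText, header = TRUE)

# read.csv fills in "X" for the blank header field.
originalNames <- names(demo)
print(originalNames)

# Optional: give the key column a more descriptive (hypothetical) name.
names(demo)[1] <- "country"
str(demo)
```

Renaming is optional. The rest of this report keeps the default name `X`.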

Summarizing the data.

summary(unDev)
##       X                region               tfr        contraception  
##  Length:207         Length:207         Min.   :1.190   Min.   : 2.00  
##  Class :character   Class :character   1st Qu.:1.950   1st Qu.:21.00  
##  Mode  :character   Mode  :character   Median :3.070   Median :47.00  
##                                        Mean   :3.529   Mean   :43.43  
##                                        3rd Qu.:4.980   3rd Qu.:64.00  
##                                        Max.   :8.000   Max.   :86.00  
##                                        NA's   :10      NA's   :63     
##  educationMale   educationFemale     lifeMale       lifeFemale   
##  Min.   : 3.30   Min.   : 2.000   Min.   :36.00   Min.   :39.10  
##  1st Qu.: 9.75   1st Qu.: 9.325   1st Qu.:57.38   1st Qu.:59.60  
##  Median :11.25   Median :11.650   Median :66.50   Median :72.15  
##  Mean   :11.41   Mean   :11.275   Mean   :63.63   Mean   :68.39  
##  3rd Qu.:13.90   3rd Qu.:13.650   3rd Qu.:70.90   3rd Qu.:76.42  
##  Max.   :17.20   Max.   :17.800   Max.   :77.40   Max.   :82.90  
##  NA's   :131     NA's   :131      NA's   :11      NA's   :11     
##  infantMortality   GDPperCapita   economicActivityMale economicActivityFemale
##  Min.   :  2.00   Min.   :   36   Min.   :51.20        Min.   : 1.90         
##  1st Qu.: 12.00   1st Qu.:  442   1st Qu.:72.30        1st Qu.:37.00         
##  Median : 30.00   Median : 1779   Median :76.80        Median :48.40         
##  Mean   : 43.48   Mean   : 6262   Mean   :76.46        Mean   :46.79         
##  3rd Qu.: 66.00   3rd Qu.: 7272   3rd Qu.:81.20        3rd Qu.:56.40         
##  Max.   :169.00   Max.   :42416   Max.   :93.00        Max.   :90.60         
##  NA's   :6        NA's   :10      NA's   :42           NA's   :42            
##  illiteracyMale   illiteracyFemale
##  Min.   : 0.200   Min.   : 0.200  
##  1st Qu.: 2.952   1st Qu.: 4.847  
##  Median :10.829   Median :20.100  
##  Mean   :17.555   Mean   :27.906  
##  3rd Qu.:27.575   3rd Qu.:48.025  
##  Max.   :79.100   Max.   :93.400  
##  NA's   :47       NA's   :47
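The `NA's` rows in the summary are worth a closer look, since some columns are mostly missing (for example, `educationMale` has 131 missing values out of 207 rows). A minimal sketch of counting missing values per column, using the first six rows of three columns copied from the `head()` output rather than the full data set:

```r
# First six rows of three columns, copied from the head() output.
firstRows <- data.frame(
  tfr           = c(6.90, 2.60, 3.81, NA, NA, 6.69),
  contraception = c(NA, NA, 52, NA, NA, NA),
  lifeMale      = c(45.0, 68.0, 67.5, 68.0, NA, 44.9)
)

# is.na() gives a TRUE/FALSE matrix; colSums() counts the TRUEs per column.
naCounts <- colSums(is.na(firstRows))
print(naCounts)

# Share of rows that are missing in each column.
naShare <- naCounts / nrow(firstRows)
print(round(naShare, 2))
```

The same two calls should work on `unDev` directly, assuming it has been loaded as above.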

Mean and median of metric 1, male life expectancy (lifeMale).

mean(unDev$lifeMale, na.rm = TRUE)
## [1] 63.62551
median(unDev$lifeMale, na.rm = TRUE)
## [1] 66.5

Mean and median of metric 2, female life expectancy (lifeFemale).

mean(unDev$lifeFemale, na.rm = TRUE)
## [1] 68.39184
median(unDev$lifeFemale, na.rm = TRUE)
## [1] 72.15

These two metrics are life expectancy for men and for women. In keeping with other research, both the mean and the median are higher for women (mean 68.4 vs. 63.6 years, median 72.15 vs. 66.5 years).
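To make the comparison concrete, here is a small sketch of the female-minus-male gap. The six rows are copied from the `head()` output, and the two overall means are the values printed above. The column name `gap` is my own for illustration. It also shows why `na.rm = TRUE` is needed.

```r
# First six countries, values copied from the head() output.
life <- data.frame(
  country    = c("Afghanistan", "Albania", "Algeria",
                 "American.Samoa", "Andorra", "Angola"),
  lifeMale   = c(45.0, 68.0, 67.5, 68.0, NA, 44.9),
  lifeFemale = c(46.0, 74.0, 70.3, 73.0, NA, 48.1)
)

# Without na.rm = TRUE a single missing value makes the mean NA.
meanWithNA <- mean(life$lifeMale)
print(meanWithNA)

# Per-country gap (hypothetical column name), then its mean ignoring NAs.
life$gap <- life$lifeFemale - life$lifeMale
meanGap <- mean(life$gap, na.rm = TRUE)
print(meanGap)  # about 3.6 years for these six rows

# Gap between the overall means reported above.
fullMeanGap <- 68.39184 - 63.62551
print(fullMeanGap)
```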

Subset of Data

I took an arbitrary subset of 31 countries (rows 20 through 50), keeping the key column, region, and life expectancy for both men and women.

dfSub <- unDev[20:50,c(1,2,7,8)]
summary(dfSub)
##       X                region             lifeMale       lifeFemale   
##  Length:31          Length:31          Min.   :45.10   Min.   :47.00  
##  Class :character   Class :character   1st Qu.:51.38   1st Qu.:54.60  
##  Mode  :character   Mode  :character   Median :64.45   Median :69.35  
##                                        Mean   :61.20   Mean   :65.58  
##                                        3rd Qu.:70.33   3rd Qu.:76.38  
##                                        Max.   :76.10   Max.   :81.80  
##                                        NA's   :1       NA's   :1
mean(dfSub$lifeMale, na.rm = TRUE)
## [1] 61.20333
median(dfSub$lifeMale, na.rm = TRUE)
## [1] 64.45
mean(dfSub$lifeFemale, na.rm = TRUE)
## [1] 65.58333
median(dfSub$lifeFemale, na.rm = TRUE)
## [1] 69.35
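A note on the subset itself: `20:50` includes both endpoints, so it selects 31 rows, and a block of consecutive rows is alphabetical rather than random. A minimal sketch of two alternatives, selecting columns by name and drawing rows with `sample()`. The data frame `toy` is made up purely for illustration and is not the UN data.

```r
# Made-up stand-in data frame with 60 rows (not the UN data).
toy <- data.frame(
  X          = paste0("Country", 1:60),
  region     = rep(c("Africa", "Asia", "Europe"), times = 20),
  tfr        = NA_real_,
  lifeMale   = seq(45, 74.5, by = 0.5),
  lifeFemale = seq(47, 76.5, by = 0.5)
)

# Selecting columns by name is easier to read than c(1, 2, 7, 8).
keep <- c("X", "region", "lifeMale", "lifeFemale")

# Same style of positional row slice as in the report.
byPosition <- toy[20:50, keep]
print(nrow(byPosition))  # 31, because both 20 and 50 are included

# A random sample of 30 rows; the seed is arbitrary and only makes it repeatable.
set.seed(1)
randomRows <- toy[sample(nrow(toy), 30), keep]
print(nrow(randomRows))
```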

For both men and women, my selection of countries showed a lower life expectancy than the full data set. The countries in the subset were mostly developing nations, which tend to have less infrastructure and less access to health care, and that may lead to these outcomes.
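One way to check the makeup of the subset is to tabulate its regions. The sketch below types in the `region` values for rows 20 through 50 as they appear in the printed subset in this report, so it runs on its own. With the data loaded, `table(dfSub$region)` should give the same counts. It also computes how far the subset means fall below the full-data means, using the values printed above.

```r
# Region for each of the 31 subset rows, in printed order (Benin ... Dominica).
subRegion <- c(
  "Africa", "Asia", "America", "Europe", "Africa", "America", "Asia", "Europe",
  "Africa", "Africa", "Asia", "Africa", "America", "Africa", "Africa", "Africa",
  "America", "Asia", "America", "Africa", "Africa", "Oceania", "America",
  "Europe", "America", "Europe", "Europe", "Africa", "Europe", "Africa",
  "America"
)

# Count of countries per region, largest first.
regionCounts <- sort(table(subRegion), decreasing = TRUE)
print(regionCounts)

# Full-data mean minus subset mean, from the output above.
maleDrop   <- 63.62551 - 61.20333
femaleDrop <- 68.39184 - 65.58333
print(c(male = maleDrop, female = femaleDrop))
```

Region is only a rough proxy for development status, so this supports the explanation above without proving it.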

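# Compare against a numeric copy: the first assignment turns lifeMale into text,
# so later comparisons on the column itself would be string comparisons.
lifeMaleYears <- dfSub$lifeMale
dfSub$lifeMale[lifeMaleYears >= 70] <- "High"
dfSub$lifeMale[lifeMaleYears >= 50 & lifeMaleYears < 70] <- "Medium"
dfSub$lifeMale[lifeMaleYears < 50] <- "Low"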
print(dfSub)
##                       X  region lifeMale lifeFemale
## 20                Benin  Africa   Medium       57.2
## 21               Bhutan    Asia   Medium       54.9
## 22              Bolivia America   Medium       63.2
## 23               Bosnia  Europe     High       75.9
## 24             Botswana  Africa      Low       51.7
## 25               Brazil America   Medium       71.2
## 26               Brunei    Asia     High       78.1
## 27             Bulgaria  Europe   Medium       74.9
## 28         Burkina.Faso  Africa      Low       47.0
## 29              Burundi  Africa      Low       48.8
## 30             Cambodia    Asia   Medium       55.4
## 31             Cameroon  Africa   Medium       57.2
## 32               Canada America     High       81.8
## 33           Cape.Verde  Africa   Medium       67.5
## 34  Central.African.Rep  Africa      Low       51.0
## 35                 Chad  Africa      Low       49.3
## 36                Chile America     High       78.3
## 37                China    Asia   Medium       71.7
## 38             Colombia America   Medium       73.7
## 39              Comoros  Africa   Medium       58.0
## 40                Congo  Africa      Low       53.4
## 41         Cook.Islands Oceania   Medium       73.0
## 42           Costa.Rica America     High       79.2
## 43              Croatia  Europe   Medium       76.5
## 44                 Cuba America     High       78.0
## 45               Cyprus  Europe     High       79.8
## 46       Czech.Republic  Europe   Medium       76.0
## 47 Dem.Rep.of.the.Congo  Africa   Medium       54.5
## 48              Denmark  Europe     High       78.3
## 49             Djibouti  Africa      Low       52.0
## 50             Dominica America     <NA>         NA

I had some trouble renaming these values. I thought the most straightforward way was to make them conditional: high, medium, or low, based on life expectancy in the country. I got two to rename okay, but adding a third somehow broke the code and made them all revert to “high.” I’m not sure what was going on, and my approach may be more complex than it needs to be.
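A likely source of trouble with this kind of recoding is type coercion: once a text label is assigned into a numeric vector, the whole vector becomes character, and later comparisons are made on strings rather than numbers. The sketch below shows that effect, then one simpler alternative using base R's `cut()`. The values in `years` and the name `band` are made up for illustration, and the breakpoints of 50 and 70 follow the thresholds used above.

```r
# Made-up life expectancies, including boundary values and a missing value.
years <- c(45.0, 68.0, 67.5, 70.0, 50.0, 49.5, NA)

# Assigning a label into a numeric vector converts every element to text.
coerced <- years
coerced[coerced >= 70] <- "High"
print(class(coerced))  # "character"

# cut() bins the numeric values in one step and leaves the original untouched.
# right = FALSE makes each interval closed on the left: [50, 70) is "Medium".
band <- cut(
  years,
  breaks = c(-Inf, 50, 70, Inf),
  labels = c("Low", "Medium", "High"),
  right  = FALSE
)
print(data.frame(years, band))
```

Storing the result in a new column (for example a hypothetical `dfSub$lifeMaleBand`) keeps the numeric `lifeMale` values available for later summaries.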