I was able to take a CSV file from the list provided, download it, re-upload it to my GitHub, and access it from RStudio.
I selected the data set regarding UN development goals from 1998.
unDevURL <- "https://raw.githubusercontent.com/iscostello/Rbridgedata/master/UN98.csv"
unDev <- read.csv(file = unDevURL, header = TRUE, sep = ",")
head(unDev)
## X region tfr contraception educationMale educationFemale
## 1 Afghanistan Asia 6.90 NA NA NA
## 2 Albania Europe 2.60 NA NA NA
## 3 Algeria Africa 3.81 52 11.1 9.9
## 4 American.Samoa Asia NA NA NA NA
## 5 Andorra Europe NA NA NA NA
## 6 Angola Africa 6.69 NA NA NA
## lifeMale lifeFemale infantMortality GDPperCapita economicActivityMale
## 1 45.0 46.0 154 2848 87.5
## 2 68.0 74.0 32 863 NA
## 3 67.5 70.3 44 1531 76.4
## 4 68.0 73.0 11 NA 58.8
## 5 NA NA NA NA NA
## 6 44.9 48.1 124 355 NA
## economicActivityFemale illiteracyMale illiteracyFemale
## 1 7.2 52.800 85.00
## 2 NA NA NA
## 3 7.8 26.100 51.00
## 4 42.4 0.264 0.36
## 5 NA NA NA
## 6 NA NA NA
Summarizing the data.
summary(unDev)
## X region tfr contraception
## Length:207 Length:207 Min. :1.190 Min. : 2.00
## Class :character Class :character 1st Qu.:1.950 1st Qu.:21.00
## Mode :character Mode :character Median :3.070 Median :47.00
## Mean :3.529 Mean :43.43
## 3rd Qu.:4.980 3rd Qu.:64.00
## Max. :8.000 Max. :86.00
## NA's :10 NA's :63
## educationMale educationFemale lifeMale lifeFemale
## Min. : 3.30 Min. : 2.000 Min. :36.00 Min. :39.10
## 1st Qu.: 9.75 1st Qu.: 9.325 1st Qu.:57.38 1st Qu.:59.60
## Median :11.25 Median :11.650 Median :66.50 Median :72.15
## Mean :11.41 Mean :11.275 Mean :63.63 Mean :68.39
## 3rd Qu.:13.90 3rd Qu.:13.650 3rd Qu.:70.90 3rd Qu.:76.42
## Max. :17.20 Max. :17.800 Max. :77.40 Max. :82.90
## NA's :131 NA's :131 NA's :11 NA's :11
## infantMortality GDPperCapita economicActivityMale economicActivityFemale
## Min. : 2.00 Min. : 36 Min. :51.20 Min. : 1.90
## 1st Qu.: 12.00 1st Qu.: 442 1st Qu.:72.30 1st Qu.:37.00
## Median : 30.00 Median : 1779 Median :76.80 Median :48.40
## Mean : 43.48 Mean : 6262 Mean :76.46 Mean :46.79
## 3rd Qu.: 66.00 3rd Qu.: 7272 3rd Qu.:81.20 3rd Qu.:56.40
## Max. :169.00 Max. :42416 Max. :93.00 Max. :90.60
## NA's :6 NA's :10 NA's :42 NA's :42
## illiteracyMale illiteracyFemale
## Min. : 0.200 Min. : 0.200
## 1st Qu.: 2.952 1st Qu.: 4.847
## Median :10.829 Median :20.100
## Mean :17.555 Mean :27.906
## 3rd Qu.:27.575 3rd Qu.:48.025
## Max. :79.100 Max. :93.400
## NA's :47 NA's :47
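The NA's rows above show how much data is missing in each column, which matters when choosing metrics. Here is a minimal sketch for getting those counts as one named vector with base R's is.na() and colSums(). So that it runs on its own, it rebuilds a small toy data frame from the six rows printed by head(unDev) earlier. The name toy is only for illustration; with the real data, pass unDev instead.

```r
# Toy copy of the six rows shown by head(unDev); "toy" is an illustrative name
toy <- data.frame(
  X = c("Afghanistan", "Albania", "Algeria", "American.Samoa", "Andorra", "Angola"),
  region = c("Asia", "Europe", "Africa", "Asia", "Europe", "Africa"),
  lifeMale = c(45.0, 68.0, 67.5, 68.0, NA, 44.9),
  lifeFemale = c(46.0, 74.0, 70.3, 73.0, NA, 48.1)
)

# is.na() on a data frame gives a logical matrix; colSums() counts the TRUEs per column
naCounts <- colSums(is.na(toy))
print(naCounts)

# Number of rows with no missing value in any of these columns
completeRows <- sum(complete.cases(toy))
print(completeRows)
```

With the full data set, colSums(is.na(unDev)) should agree with the NA's counts in the summary (for example, 11 for lifeMale).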
Observing the mean and median of metric 1, male life expectancy (lifeMale).
mean(unDev$lifeMale, na.rm = TRUE)
## [1] 63.62551
median(unDev$lifeMale, na.rm = TRUE)
## [1] 66.5
Observing the mean and median of metric 2, female life expectancy (lifeFemale).
mean(unDev$lifeFemale, na.rm = TRUE)
## [1] 68.39184
median(unDev$lifeFemale, na.rm = TRUE)
## [1] 72.15
These two metrics are life expectancy for men and for women. Consistent with other research, both the mean (68.39) and the median (72.15) for women are higher than those for men (63.63 and 66.5).
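Comparing the two overall means treats men and women as separate groups. Since both figures come from the same country, a more direct check is the per-country difference, lifeFemale minus lifeMale. A sketch on the same six rows printed by head(unDev), rebuilt as a toy data frame so it runs on its own; with the full data, substitute unDev.

```r
# Toy copy of lifeMale and lifeFemale from the six rows shown by head(unDev)
toy <- data.frame(
  X = c("Afghanistan", "Albania", "Algeria", "American.Samoa", "Andorra", "Angola"),
  lifeMale = c(45.0, 68.0, 67.5, 68.0, NA, 44.9),
  lifeFemale = c(46.0, 74.0, 70.3, 73.0, NA, 48.1)
)

# Female minus male life expectancy for each country (NA where either value is missing)
gap <- toy$lifeFemale - toy$lifeMale

meanGap <- mean(gap, na.rm = TRUE)        # average per-country gap
womenHigher <- sum(gap > 0, na.rm = TRUE) # countries where women's figure is higher
nCompared <- sum(!is.na(gap))             # countries with both values present
print(c(meanGap = meanGap, womenHigher = womenHigher, nCompared = nCompared))
```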
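I created an arbitrary (not random) subset of 31 countries, rows 20 through 50 of the data set (alphabetically, Benin through Dominica), keeping the key column (X, the country name), region, and life expectancy for men and women.
dfSub <- unDev[20:50, c("X", "region", "lifeMale", "lifeFemale")]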
summary(dfSub)
## X region lifeMale lifeFemale
## Length:31 Length:31 Min. :45.10 Min. :47.00
## Class :character Class :character 1st Qu.:51.38 1st Qu.:54.60
## Mode :character Mode :character Median :64.45 Median :69.35
## Mean :61.20 Mean :65.58
## 3rd Qu.:70.33 3rd Qu.:76.38
## Max. :76.10 Max. :81.80
## NA's :1 NA's :1
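Rows 20 to 50 are consecutive in an alphabetical list, so this subset is not random. If a random draw is wanted, base R's sample() can pick the row numbers, and set.seed() makes the draw reproducible. A sketch on the toy rows from head(unDev), drawing 3 of the 6; with the real data the equivalent would be unDev[sample(nrow(unDev), 30), c("X", "region", "lifeMale", "lifeFemale")]. The seed value is arbitrary.

```r
# Toy copy of the six rows shown by head(unDev)
toy <- data.frame(
  X = c("Afghanistan", "Albania", "Algeria", "American.Samoa", "Andorra", "Angola"),
  region = c("Asia", "Europe", "Africa", "Asia", "Europe", "Africa"),
  lifeMale = c(45.0, 68.0, 67.5, 68.0, NA, 44.9),
  lifeFemale = c(46.0, 74.0, 70.3, 73.0, NA, 48.1)
)

set.seed(98)                  # arbitrary seed; any fixed value makes the draw repeatable
rows <- sample(nrow(toy), 3)  # 3 distinct row numbers drawn from 1:6
toySample <- toy[rows, c("X", "region", "lifeMale", "lifeFemale")]
print(toySample)
```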
mean(dfSub$lifeMale, na.rm = TRUE)
## [1] 61.20333
median(dfSub$lifeMale, na.rm = TRUE)
## [1] 64.45
mean(dfSub$lifeFemale, na.rm = TRUE)
## [1] 65.58333
median(dfSub$lifeFemale, na.rm = TRUE)
## [1] 69.35
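The four mean() and median() calls above repeat the same pattern. One way to compute both statistics for both columns at once is sapply() over the two columns, which returns a small matrix. Shown here on the toy rows from head(unDev); with the subset, the input would be dfSub[c("lifeMale", "lifeFemale")], run before lifeMale is recoded into text.

```r
# Toy copy of the two life expectancy columns from the rows shown by head(unDev)
toy <- data.frame(
  lifeMale = c(45.0, 68.0, 67.5, 68.0, NA, 44.9),
  lifeFemale = c(46.0, 74.0, 70.3, 73.0, NA, 48.1)
)

# Apply the same summary function to each column; sapply() binds the results into a matrix
stats <- sapply(toy[c("lifeMale", "lifeFemale")], function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
})
print(stats)  # rows: mean, median; columns: lifeMale, lifeFemale
```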
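For both men and women, my selection of countries showed a lower life expectancy than the full data set: the subset's mean and median were 61.20 and 64.45 for men (versus 63.63 and 66.5 overall) and 65.58 and 69.35 for women (versus 68.39 and 72.15 overall). The countries in the list were overwhelmingly developing nations and so would likely have less infrastructural capacity and access to health care, which may lead to these outcomes.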
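One rough way to probe this explanation is to compare average life expectancy by region across the full data set. Region is only a crude proxy for development (GDPperCapita would be another). This sketch uses aggregate() with a formula on the toy rows from head(unDev); with the real data, pass data = unDev. Note that the formula interface drops any row with an NA in the listed columns (Andorra here).

```r
# Toy copy of the region and life expectancy columns from the rows shown by head(unDev)
toy <- data.frame(
  region = c("Asia", "Europe", "Africa", "Asia", "Europe", "Africa"),
  lifeMale = c(45.0, 68.0, 67.5, 68.0, NA, 44.9),
  lifeFemale = c(46.0, 74.0, 70.3, 73.0, NA, 48.1)
)

# Mean male and female life expectancy per region; rows with NAs are dropped (na.action = na.omit)
byRegion <- aggregate(cbind(lifeMale, lifeFemale) ~ region, data = toy, FUN = mean)
print(byRegion)
```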
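# Compare against a numeric copy: the first string assignment turns the column into character
lifeMaleNum <- dfSub$lifeMale
dfSub$lifeMale[lifeMaleNum >= 70] <- "High"
dfSub$lifeMale[lifeMaleNum >= 50 & lifeMaleNum < 70] <- "Medium"
dfSub$lifeMale[lifeMaleNum < 50] <- "Low"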
print(dfSub)
## X region lifeMale lifeFemale
## 20 Benin Africa Medium 57.2
## 21 Bhutan Asia Medium 54.9
## 22 Bolivia America Medium 63.2
## 23 Bosnia Europe High 75.9
## 24 Botswana Africa Low 51.7
## 25 Brazil America Medium 71.2
## 26 Brunei Asia High 78.1
## 27 Bulgaria Europe Medium 74.9
## 28 Burkina.Faso Africa Low 47.0
## 29 Burundi Africa Low 48.8
## 30 Cambodia Asia Medium 55.4
## 31 Cameroon Africa Medium 57.2
## 32 Canada America High 81.8
## 33 Cape.Verde Africa Medium 67.5
## 34 Central.African.Rep Africa Low 51.0
## 35 Chad Africa Low 49.3
## 36 Chile America High 78.3
## 37 China Asia Medium 71.7
## 38 Colombia America Medium 73.7
## 39 Comoros Africa Medium 58.0
## 40 Congo Africa Low 53.4
## 41 Cook.Islands Oceania Medium 73.0
## 42 Costa.Rica America High 79.2
## 43 Croatia Europe Medium 76.5
## 44 Cuba America High 78.0
## 45 Cyprus Europe High 79.8
## 46 Czech.Republic Europe Medium 76.0
## 47 Dem.Rep.of.the.Congo Africa Medium 54.5
## 48 Denmark Europe High 78.3
## 49 Djibouti Africa Low 52.0
## 50 Dominica America <NA> NA
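I had some trouble recoding these values. The most straightforward approach seemed to be conditional assignment: High, Medium, or Low based on male life expectancy in each country. Two of the conditions worked, but adding a third seemed to break the code and revert the values to "High." The likely cause is type coercion. In my first attempt, the first assignment wrote the string "High" into the numeric lifeMale column, so R converted the whole column to character. Every later comparison against 50, 70, or 49 was then an alphabetical comparison of text rather than a numeric one, which happens to give sensible answers for two-digit ages but is fragile (in typical locales digits sort before letters, so "High" >= "50" is TRUE). The thresholds also left a gap: a value between 49 and 50, such as 49.5, matched none of the three conditions. Comparing against a saved numeric copy of the column, or binning the numbers in a single step, avoids both problems.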
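A minimal sketch of both the problem and a simpler alternative, base R's cut(), which bins numbers into labelled intervals in one step. The ages are made up to sit on and around the cut points; they are not UN98 values. With the real data, the same call would be applied to the numeric dfSub$lifeMale before any recoding, ideally storing the result in a new column (for example a hypothetical dfSub$lifeMaleLevel) so the original numbers are kept.

```r
# Part 1: why conditional string assignment misbehaves
x <- c(45.1, 64.45, 72.0)
x[x >= 70] <- "High"  # one string in a numeric vector turns the whole vector into character
print(class(x))       # "character"
print(x >= "50")      # now a text comparison, so it follows alphabetical, not numeric, order

# Part 2: recode with cut() instead (made-up ages, not values from UN98)
lifeMale <- c(45.0, 49.5, 50, 64.45, 70, NA)
# right = FALSE closes each interval on the left: [-Inf, 50), [50, 70), [70, Inf)
lifeMaleLevel <- cut(lifeMale,
                     breaks = c(-Inf, 50, 70, Inf),
                     labels = c("Low", "Medium", "High"),
                     right = FALSE)
print(lifeMaleLevel)
```

Because cut() returns a factor with its levels in the order Low, Medium, High, functions such as table() will list the categories in that order rather than alphabetically, and missing ages simply stay NA.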