Chapter: Vector

Create a vector that represents the age of at least four different family members or friends. You can name it whatever you want.

friends_age <- c(25,26,21,32)
  1. What is the mean age of the people in your vector? Find out in two ways, with and without using the mean() command.
mean(friends_age)
## [1] 26
sum(friends_age) / length(friends_age)
## [1] 26
  1. How old is the youngest person in your vector? (Use an R command to find out.)
min(friends_age)
## [1] 21
  1. What is the age gap between the youngest person and the oldest person in your vector? (Again use R to find out, and try to be as general as possible in the sense that your code should work even if the elements in your vector, or their order, change.)
max(friends_age)-min(friends_age)
## [1] 11
  1. How many people in your vector are above age 25? (Again, try to make your code work even in the case that your vector changes.)
sum(friends_age > 25)
## [1] 2
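A comparison such as friends_age > 25 returns one TRUE or FALSE per element, so length() of that result is always the size of the vector, while sum() counts only the TRUE values. A small sketch of the difference (the name ages is illustrative and simply repeats the vector defined earlier):

```r
# Same ages as the vector defined at the start of this chapter
ages <- c(25, 26, 21, 32)

# Element-wise comparison gives a logical vector of the same length
above_25 <- ages > 25

# length() counts every element, TRUE or FALSE
n_elements <- length(above_25)

# sum() treats TRUE as 1 and FALSE as 0, so it counts the matches
n_above <- sum(above_25)
n_above
```

Because the count comes from the comparison itself, it keeps working if the vector gains, loses, or reorders elements.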
  1. Replace the age of the oldest person in your vector with the age of someone else you know.
friends_age[which.max(friends_age)] <- 23
friends_age
## [1] 25 26 21 23
  1. Create a new vector that indicates how old each person in your vector will be in 10 years.
age_10yrs <- friends_age + 10
age_10yrs
## [1] 35 36 31 33
  1. Create a new vector that indicates what year each person in your vector will turn 100 years old.
year_friends_age100 <- (100-friends_age)+2022
print(year_friends_age100)
## [1] 2097 2096 2101 2099
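The calculation above hard-codes the year 2022. One possible way to keep it current is to read the year from the system clock; this is only a sketch, and the names current_year and year_turns_100 are illustrative:

```r
# Ages after the replacement step above
ages <- c(25, 26, 21, 23)

# Extract the four-digit year from today's date
current_year <- as.integer(format(Sys.Date(), "%Y"))

# Years remaining until 100, added to the current year
year_turns_100 <- current_year + (100 - ages)
year_turns_100
```

The result depends on when the code is run, which is the intent here.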
  1. Create a new vector with a random sample of 3 individuals from your original vector. What is the mean age of the people in this new vector?
sample_age <- friends_age[sample(1:length(friends_age), size=3)]

mean(sample_age)
## [1] 24.66667
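Because sample() draws at random, the mean above will differ from run to run. If a repeatable answer is wanted, the random number generator can be seeded first. A minimal sketch, assuming an arbitrary seed of 42:

```r
ages <- c(25, 26, 21, 23)

# Seeding makes the draw repeatable; the value 42 is arbitrary
set.seed(42)
sample_a <- sample(ages, size = 3)

# Resetting the same seed reproduces the same draw
set.seed(42)
sample_b <- sample(ages, size = 3)

identical(sample_a, sample_b)
mean(sample_a)
```

sample() draws without replacement by default, so the three ages always come from three different people.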

Chapter: Dataset Basics

  1. Read the world-small.csv data into R and store it in an object called world. Set your working directory using code first.
getwd()
## [1] "C:/Users/FieldCamp17/OneDrive - The University of Memphis/Desktop/Biogeography"
setwd('C:/Users/FieldCamp17/R_WD')

world <- read.csv("world-small.csv")

head(world) 
##     country       region gdppcap08 polityIV
## 1   Albania   C&E Europe      7715     17.8
## 2   Algeria       Africa      8033     10.0
## 3    Angola       Africa      5899      8.0
## 4 Argentina   S. America     14333     18.0
## 5   Armenia   C&E Europe      6070     15.0
## 6 Australia Asia-Pacific     35677     20.0
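An alternative to changing the working directory is to pass a full path to read.csv(). The sketch below uses a temporary file as a stand-in for world-small.csv; the file name world-toy.csv is hypothetical and the two rows are copied from the output above:

```r
# Build a small stand-in file in the session's temporary directory
csv_path <- file.path(tempdir(), "world-toy.csv")
toy_rows <- data.frame(
  country   = c("Albania", "Algeria"),
  region    = c("C&E Europe", "Africa"),
  gdppcap08 = c(7715, 8033),
  polityIV  = c(17.8, 10.0),
  stringsAsFactors = FALSE
)
write.csv(toy_rows, csv_path, row.names = FALSE)

# Read by full path, so no setwd() call is needed
world_toy <- read.csv(csv_path, stringsAsFactors = FALSE)
dim(world_toy)
```

Building the path with file.path() avoids hand-typed separators, which differ between operating systems.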
  1. (Conceptual) What is the unit of analysis in the dataset? What’s the name of the dataset’s id variable?
# Country is the unit of analysis in this dataset, and country is also the id variable.
  1. How many observations does world have? How many variables? Use an R command to find out.
dim(world)
## [1] 145   4
# The dataset has 145 observations and 4 variables.
  1. Use brackets and a logical statement to inspect all the values for Nigeria and United States. That is, your code should return two entire rows of the dataset.
world[world$country == "Algeria", ]  # Algeria is shown here; substitute "Nigeria" to match the prompt
##   country region gdppcap08 polityIV
## 2 Algeria Africa      8033       10
world [world$country == "United States", ]
##           country     region gdppcap08 polityIV
## 138 United States N. America     46716       20
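Both rows can also be returned by a single logical statement using %in%, which tests each country name against a set of values. A sketch on a three-row stand-in data frame (toy is a hypothetical name; the values are copied from the output in this chapter):

```r
toy <- data.frame(
  country   = c("Albania", "Algeria", "United States"),
  region    = c("C&E Europe", "Africa", "N. America"),
  gdppcap08 = c(7715, 8033, 46716),
  polityIV  = c(17.8, 10, 20),
  stringsAsFactors = FALSE
)

# TRUE for every row whose country appears in the lookup vector
wanted <- toy$country %in% c("Algeria", "United States")

# Empty column index keeps all columns, so whole rows are returned
two_rows <- toy[wanted, ]
two_rows
```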
  1. Use R to return China’s Polity IV score. As in question 4, use a logical statement and brackets, but don’t return the entire row. Rather, return a single value with the Polity IV score.
world [world$country == "China", ]
##    country       region gdppcap08 polityIV
## 26   China Asia-Pacific      5962        3
world[world$country == "China", "polityIV"]
## [1] 3
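The general pattern is a logical statement in the row position and a column name in the column position. A minimal sketch on a stand-in data frame (toy is hypothetical; the scores are taken from the output above):

```r
toy <- data.frame(
  country  = c("Albania", "Algeria", "China"),
  polityIV = c(17.8, 10, 3),
  stringsAsFactors = FALSE
)

# Row condition selects China; the column name selects one variable
china_score <- toy[toy$country == "China", "polityIV"]

# Equivalent form: pick the column first, then subset it
china_score_alt <- toy$polityIV[toy$country == "China"]
china_score
```

Naming the column rather than using its position keeps the code correct if the columns are later reordered.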
  1. What is the lowest GDP per capita in the dataset? (Use R to return only the value.)
min(world$gdppcap08)
## [1] 188
  1. What country has the lowest GDP per capita?
world [world$gdppcap08== min(world$gdppcap08),1 ]
## [1] "Zimbabwe"

Chapter: Modifying Data

# dplyr supplies filter(), mutate(), select(), arrange(), count(), and %>%
library(dplyr)
  1. Read the world-small.csv dataset into R and store it in an object called world.
world <- read.csv("world-small.csv")
  1. Subset world to European countries. Save this subset as a new data frame called europe.
europe <- world %>% filter(region == "C&E Europe" | region == "W. Europe" | region == "Scandinavia")
head (europe)
##      country     region gdppcap08 polityIV
## 1    Albania C&E Europe      7715     17.8
## 2    Armenia C&E Europe      6070     15.0
## 3    Austria  W. Europe     38152     20.0
## 4 Azerbaijan C&E Europe      8765      3.0
## 5    Belarus C&E Europe     12261      3.0
## 6    Belgium  W. Europe     34493     20.0
summary(europe)
##    country             region            gdppcap08        polityIV    
##  Length:41          Length:41          Min.   : 1906   Min.   : 1.00  
##  Class :character   Class :character   1st Qu.:10041   1st Qu.:16.00  
##  Mode  :character   Mode  :character   Median :19330   Median :19.00  
##                                        Mean   :22007   Mean   :16.41  
##                                        3rd Qu.:34493   3rd Qu.:20.00  
##                                        Max.   :58138   Max.   :20.00
  1. Add two variables to europe:
     a. A variable that recodes polityIV from a 0-20 scale to a -10 to 10 scale.
     b. A variable that categorizes a country as “rich” or “poor” based on some cutoff of gdppcap08 you think is reasonable.
europe<- europe %>% mutate (polityIV_A = ifelse (polityIV<=10,1,2)) 
head(europe)
##      country     region gdppcap08 polityIV polityIV_A
## 1    Albania C&E Europe      7715     17.8          2
## 2    Armenia C&E Europe      6070     15.0          2
## 3    Austria  W. Europe     38152     20.0          2
## 4 Azerbaijan C&E Europe      8765      3.0          1
## 5    Belarus C&E Europe     12261      3.0          1
## 6    Belgium  W. Europe     34493     20.0          2
europe<- europe %>% mutate(status= ifelse(gdppcap08 > 25000, "rich", "poor"))
head(europe)
##      country     region gdppcap08 polityIV polityIV_A status
## 1    Albania C&E Europe      7715     17.8          2   poor
## 2    Armenia C&E Europe      6070     15.0          2   poor
## 3    Austria  W. Europe     38152     20.0          2   rich
## 4 Azerbaijan C&E Europe      8765      3.0          1   poor
## 5    Belarus C&E Europe     12261      3.0          1   poor
## 6    Belgium  W. Europe     34493     20.0          2   rich
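The ifelse() call above produces a two-level indicator (1 or 2) rather than a shift of the scale. Moving a 0-20 score onto a -10 to 10 scale only requires subtracting 10. A sketch in base R on four rows copied from the output above (toy and polity_centered are hypothetical names, and the 25000 cutoff is the one already used in this answer):

```r
toy <- data.frame(
  country   = c("Albania", "Armenia", "Austria", "Azerbaijan"),
  gdppcap08 = c(7715, 6070, 38152, 8765),
  polityIV  = c(17.8, 15, 20, 3),
  stringsAsFactors = FALSE
)

# Linear recode: 0 maps to -10, 10 maps to 0, 20 maps to 10
toy$polity_centered <- toy$polityIV - 10

# Two-category label based on the GDP per capita cutoff
toy$status <- ifelse(toy$gdppcap08 > 25000, "rich", "poor")
toy
```

With dplyr the same recode would be written inside mutate(); base R is used here only to keep the sketch free of package dependencies.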
  1. Drop the region variable in europe (keep the rest).
europe %>% select(-region)
##           country gdppcap08  polityIV polityIV_A status
## 1         Albania      7715 17.800000          2   poor
## 2         Armenia      6070 15.000000          2   poor
## 3         Austria     38152 20.000000          2   rich
## 4      Azerbaijan      8765  3.000000          1   poor
## 5         Belarus     12261  3.000000          1   poor
## 6         Belgium     34493 20.000000          2   rich
## 7        Bulgaria     12393 19.000000          2   poor
## 8         Croatia     19084 18.400000          2   poor
## 9  Czech Republic     24712 20.000000          2   poor
## 10        Denmark     36607 20.000000          2   rich
## 11        Estonia     20662 16.000000          2   poor
## 12        Finland     35427 20.000000          2   rich
## 13         France     34045 19.000000          2   rich
## 14        Georgia      4896 15.666667          2   poor
## 15        Germany     35613 20.000000          2   rich
## 16         Greece     29361 20.000000          2   rich
## 17        Hungary     19330 20.000000          2   poor
## 18        Ireland     44200 20.000000          2   rich
## 19          Italy     30756 20.000000          2   rich
## 20     Kazakhstan     11315  4.000000          1   poor
## 21     Kyrgyzstan      2188  7.000000          1   poor
## 22         Latvia     17100 18.000000          2   poor
## 23      Lithuania     18824 20.000000          2   poor
## 24      Macedonia     10041 19.000000          2   poor
## 25        Moldova      2925 18.000000          2   poor
## 26    Netherlands     40849 20.000000          2   rich
## 27         Norway     58138 20.000000          2   rich
## 28         Poland     17625 20.000000          2   poor
## 29       Portugal     23074 20.000000          2   poor
## 30        Romania     14065 18.333333          2   poor
## 31         Russia     16139 17.000000          2   poor
## 32       Slovakia     22081 19.000000          2   poor
## 33       Slovenia     27605 20.000000          2   rich
## 34          Spain     31954 20.000000          2   rich
## 35         Sweden     37383 20.000000          2   rich
## 36    Switzerland     42536 20.000000          2   rich
## 37     Tajikistan      1906  7.666667          1   poor
## 38   Turkmenistan      6641  1.000000          1   poor
## 39        Ukraine      7271 16.000000          2   poor
## 40 United Kingdom     35445 20.000000          2   rich
## 41     Uzbekistan      2656  1.000000          1   poor
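Because the result of select() above is printed rather than assigned, europe itself still contains region, which is why the column reappears in the sorted output of the next exercise. The same distinction, sketched in base R on a two-row stand-in (toy is a hypothetical name):

```r
toy <- data.frame(
  country   = c("Albania", "Austria"),
  region    = c("C&E Europe", "W. Europe"),
  gdppcap08 = c(7715, 38152),
  stringsAsFactors = FALSE
)

# Computing the subset without assigning it leaves toy untouched
dropped_copy <- toy[, names(toy) != "region"]
still_there <- "region" %in% names(toy)

# Assigning the result back is what actually drops the column
toy <- toy[, names(toy) != "region"]
names(toy)
```

In the dplyr version, the equivalent step is to assign the piped result back to europe.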
  1. Sort europe based on Polity IV.
europe %>% arrange(polityIV)
##           country      region gdppcap08  polityIV polityIV_A status
## 1    Turkmenistan  C&E Europe      6641  1.000000          1   poor
## 2      Uzbekistan  C&E Europe      2656  1.000000          1   poor
## 3      Azerbaijan  C&E Europe      8765  3.000000          1   poor
## 4         Belarus  C&E Europe     12261  3.000000          1   poor
## 5      Kazakhstan  C&E Europe     11315  4.000000          1   poor
## 6      Kyrgyzstan  C&E Europe      2188  7.000000          1   poor
## 7      Tajikistan  C&E Europe      1906  7.666667          1   poor
## 8         Armenia  C&E Europe      6070 15.000000          2   poor
## 9         Georgia  C&E Europe      4896 15.666667          2   poor
## 10        Estonia  C&E Europe     20662 16.000000          2   poor
## 11        Ukraine  C&E Europe      7271 16.000000          2   poor
## 12         Russia  C&E Europe     16139 17.000000          2   poor
## 13        Albania  C&E Europe      7715 17.800000          2   poor
## 14         Latvia  C&E Europe     17100 18.000000          2   poor
## 15        Moldova  C&E Europe      2925 18.000000          2   poor
## 16        Romania  C&E Europe     14065 18.333333          2   poor
## 17        Croatia  C&E Europe     19084 18.400000          2   poor
## 18       Bulgaria  C&E Europe     12393 19.000000          2   poor
## 19         France   W. Europe     34045 19.000000          2   rich
## 20      Macedonia  C&E Europe     10041 19.000000          2   poor
## 21       Slovakia  C&E Europe     22081 19.000000          2   poor
## 22        Austria   W. Europe     38152 20.000000          2   rich
## 23        Belgium   W. Europe     34493 20.000000          2   rich
## 24 Czech Republic  C&E Europe     24712 20.000000          2   poor
## 25        Denmark Scandinavia     36607 20.000000          2   rich
## 26        Finland Scandinavia     35427 20.000000          2   rich
## 27        Germany   W. Europe     35613 20.000000          2   rich
## 28         Greece   W. Europe     29361 20.000000          2   rich
## 29        Hungary  C&E Europe     19330 20.000000          2   poor
## 30        Ireland   W. Europe     44200 20.000000          2   rich
## 31          Italy   W. Europe     30756 20.000000          2   rich
## 32      Lithuania  C&E Europe     18824 20.000000          2   poor
## 33    Netherlands   W. Europe     40849 20.000000          2   rich
## 34         Norway Scandinavia     58138 20.000000          2   rich
## 35         Poland  C&E Europe     17625 20.000000          2   poor
## 36       Portugal   W. Europe     23074 20.000000          2   poor
## 37       Slovenia  C&E Europe     27605 20.000000          2   rich
## 38          Spain   W. Europe     31954 20.000000          2   rich
## 39         Sweden Scandinavia     37383 20.000000          2   rich
## 40    Switzerland   W. Europe     42536 20.000000          2   rich
## 41 United Kingdom   W. Europe     35445 20.000000          2   rich
  1. Repeat Exercises 2-5 using chaining.
# Same conditions as in the step-by-step answers: polityIV <= 10 and gdppcap08 > 25000
europe <- world %>%
  filter(region == "C&E Europe" | region == "W. Europe" | region == "Scandinavia") %>%
  mutate(polityIV_A = ifelse(polityIV <= 10, 1, 2)) %>%
  mutate(status = ifelse(gdppcap08 > 25000, "rich", "poor")) %>%
  select(-region) %>%
  arrange(polityIV)
europe
##           country gdppcap08  polityIV polityIV_A status
## 1    Turkmenistan      6641  1.000000          1   poor
## 2      Uzbekistan      2656  1.000000          1   poor
## 3      Azerbaijan      8765  3.000000          1   poor
## 4         Belarus     12261  3.000000          1   poor
## 5      Kazakhstan     11315  4.000000          1   poor
## 6      Kyrgyzstan      2188  7.000000          1   poor
## 7      Tajikistan      1906  7.666667          1   poor
## 8         Armenia      6070 15.000000          2   poor
## 9         Georgia      4896 15.666667          2   poor
## 10        Estonia     20662 16.000000          2   poor
## 11        Ukraine      7271 16.000000          2   poor
## 12         Russia     16139 17.000000          2   poor
## 13        Albania      7715 17.800000          2   poor
## 14         Latvia     17100 18.000000          2   poor
## 15        Moldova      2925 18.000000          2   poor
## 16        Romania     14065 18.333333          2   poor
## 17        Croatia     19084 18.400000          2   poor
## 18       Bulgaria     12393 19.000000          2   poor
## 19         France     34045 19.000000          2   rich
## 20      Macedonia     10041 19.000000          2   poor
## 21       Slovakia     22081 19.000000          2   poor
## 22        Austria     38152 20.000000          2   rich
## 23        Belgium     34493 20.000000          2   rich
## 24 Czech Republic     24712 20.000000          2   poor
## 25        Denmark     36607 20.000000          2   rich
## 26        Finland     35427 20.000000          2   rich
## 27        Germany     35613 20.000000          2   rich
## 28         Greece     29361 20.000000          2   rich
## 29        Hungary     19330 20.000000          2   poor
## 30        Ireland     44200 20.000000          2   rich
## 31          Italy     30756 20.000000          2   rich
## 32      Lithuania     18824 20.000000          2   poor
## 33    Netherlands     40849 20.000000          2   rich
## 34         Norway     58138 20.000000          2   rich
## 35         Poland     17625 20.000000          2   poor
## 36       Portugal     23074 20.000000          2   poor
## 37       Slovenia     27605 20.000000          2   rich
## 38          Spain     31954 20.000000          2   rich
## 39         Sweden     37383 20.000000          2   rich
## 40    Switzerland     42536 20.000000          2   rich
## 41 United Kingdom     35445 20.000000          2   rich
  1. What was the world’s mean GDP per capita in 2008? Polity IV score?
world<- world%>% mutate(gdppcap08 =as.numeric (as.character(gdppcap08)))
world<- world%>% mutate(polityIV =as.numeric (as.character(polityIV)))
mean(world$gdppcap08)
## [1] 13251.99
mean(world$polityIV)
## [1] 13.40782
  1. What was Africa’s mean GDP per capita and Polity IV score?
africa<- filter(world, region=="Africa")
mean(africa$gdppcap08)
## [1] 3613.095
mean(africa$polityIV)
## [1] 11.44921
  1. What was the poorest country in the world in 2008? Richest?

Poorest in terms of lowest GDP per capita

world [world$gdppcap08== min(world$gdppcap08),1 ]
## [1] "Zimbabwe"

Richest in terms of highest GDP per capita

world [world$gdppcap08== max(world$gdppcap08),1 ]
## [1] "Qatar"
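Both answers can also be read off a single sorted copy of the data. A sketch using order() on the six countries from the head() output earlier (toy, ranked, poorest, and richest are hypothetical names, and the result applies to these six rows only, not to the full dataset):

```r
toy <- data.frame(
  country   = c("Albania", "Algeria", "Angola", "Argentina", "Armenia", "Australia"),
  gdppcap08 = c(7715, 8033, 5899, 14333, 6070, 35677),
  stringsAsFactors = FALSE
)

# order() returns the row positions that sort GDP per capita ascending
ranked <- toy[order(toy$gdppcap08), ]

# First row is the lowest value, last row is the highest
poorest <- ranked$country[1]
richest <- ranked$country[nrow(ranked)]
c(poorest = poorest, richest = richest)
```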
  1. How many countries in Europe are “rich” according to your coding? How many are poor? What percentage have Polity IV scores of at least 18?
rich<- filter(europe, status=="rich")
count(rich)
##    n
## 1 16
poor<- filter(europe, status=="poor")
count(poor)
##    n
## 1 25
# "At least 18" means >= 18; the mean of a logical vector is the share of TRUE values
mean(europe$polityIV >= 18) * 100
## [1] 68.29268
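Counts for both categories can be produced in one step with table(), and prop.table() turns those counts into shares. A minimal sketch on the status values of the first six rows of europe shown earlier (status and the result names are illustrative):

```r
# Status labels of the first six europe rows printed above
status <- c("poor", "poor", "rich", "poor", "poor", "rich")

# One count per category
counts <- table(status)

# Shares of the total, expressed as percentages
shares <- prop.table(counts) * 100

counts
shares
```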

Chapter: Collapsing Data

  1. Read the world-small.csv dataset into R. Get to know the structure of this dataset using functions like dim(), head(), and summary().

world <- read.csv("world-small.csv")
dim(world)
## [1] 145   4
head(world)
##     country       region gdppcap08 polityIV
## 1   Albania   C&E Europe      7715     17.8
## 2   Algeria       Africa      8033     10.0
## 3    Angola       Africa      5899      8.0
## 4 Argentina   S. America     14333     18.0
## 5   Armenia   C&E Europe      6070     15.0
## 6 Australia Asia-Pacific     35677     20.0
summary(world)
##    country             region            gdppcap08        polityIV     
##  Length:145         Length:145         Min.   :  188   Min.   : 0.000  
##  Class :character   Class :character   1st Qu.: 2153   1st Qu.: 7.667  
##  Mode  :character   Mode  :character   Median : 7271   Median :16.000  
##                                        Mean   :13252   Mean   :13.408  
##                                        3rd Qu.:19330   3rd Qu.:19.000  
##                                        Max.   :85868   Max.   :20.000
  1. Find the mean and median GDP per capita and Polity IV score, by region (that is, for each region in the dataset) Also find the number of countries by region.
region_summary<- world %>%
        group_by(region) %>%
           summarize (mean_gdppcap08= mean(gdppcap08),
                      median_gdppcap08= median(gdppcap08),
                      mean_polityIV= mean(polityIV),
                      median_polityIV = median(polityIV),
                      number_country= n())
    
region_summary
## # A tibble: 8 x 6
##   region       mean_gdppcap08 median_gdppcap08 mean_polityIV median_polityIV
##   <chr>                 <dbl>            <dbl>         <dbl>           <dbl>
## 1 Africa                3613.            1408.         11.4             11  
## 2 Asia-Pacific         11552.            4178.         13.6             16.3
## 3 C&E Europe           12571.           12261          14.2             17.8
## 4 Middle East          21450.           14661           5.69             3.5
## 5 N. America           32552.           36444          19.3             20  
## 6 S. America            7862.            8009          16.6             18  
## 7 Scandinavia          41889.           36995          20               20  
## 8 W. Europe            35040.           34969          19.9             20  
## # ... with 1 more variable: number_country <int>
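A grouped summary like the one above can be cross-checked in base R with tapply() and table(). The sketch below runs on the six rows from the head() output only, so its numbers are not the full-data results (toy and the result names are hypothetical):

```r
toy <- data.frame(
  country   = c("Albania", "Algeria", "Angola", "Argentina", "Armenia", "Australia"),
  region    = c("C&E Europe", "Africa", "Africa", "S. America", "C&E Europe", "Asia-Pacific"),
  gdppcap08 = c(7715, 8033, 5899, 14333, 6070, 35677),
  stringsAsFactors = FALSE
)

# Apply a function to gdppcap08 within each region
mean_gdp   <- tapply(toy$gdppcap08, toy$region, mean)
median_gdp <- tapply(toy$gdppcap08, toy$region, median)

# Number of countries per region
n_countries <- table(toy$region)

mean_gdp
n_countries
```

The printed tibble above hides number_country because the console was too narrow; print(region_summary, width = Inf) is the tibble print option that shows every column.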
  1. Find the mean and median GDP per capita, by region and whether a country is a “democracy” or not. For the purpose of this exercise, a country is a “democracy” if it has a Polity IV score of 15 or higher.
world <- world %>% mutate(democracy_stat = ifelse(polityIV >= 15, "Democracy", "Autocracy"))

region_sum<- world %>%
        group_by(region, democracy_stat) %>%
        summarize (mean_gdppcap08_D= mean(gdppcap08),
                  median_gdppcap08_D= median(gdppcap08),
                  number_dem= n())
region_sum
## # A tibble: 13 x 5
## # Groups:   region [8]
##    region       democracy_stat mean_gdppcap08_D median_gdppcap08_D number_dem
##    <chr>        <chr>                     <dbl>              <dbl>      <int>
##  1 Africa       Autocracy                 3908               1409          24
##  2 Africa       Democracy                 3220.              1404          18
##  3 Asia-Pacific Autocracy                 8918.              3584.         10
##  4 Asia-Pacific Democracy                13433.              4268.         14
##  5 C&E Europe   Autocracy                 6533.              6641           7
##  6 C&E Europe   Democracy                14919.             16620.         18
##  7 Middle East  Autocracy                21553.             13534          14
##  8 Middle East  Democracy                20734              20734           2
##  9 N. America   Democracy                32552.             36444           3
## 10 S. America   Autocracy                 5338.              5338.          2
## 11 S. America   Democracy                 8159.              8009          17
## 12 Scandinavia  Democracy                41889.             36995           4
## 13 W. Europe    Democracy                35040.             34969          12
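Grouping by two variables means every observed combination of region and democracy status gets its own row. A sketch of the same idea with base R's aggregate(), again on the six head() rows only (toy and by_group are hypothetical names; the cutoff of 15 is the one given in the exercise):

```r
toy <- data.frame(
  region    = c("C&E Europe", "Africa", "Africa", "S. America", "C&E Europe", "Asia-Pacific"),
  gdppcap08 = c(7715, 8033, 5899, 14333, 6070, 35677),
  polityIV  = c(17.8, 10, 8, 18, 15, 20),
  stringsAsFactors = FALSE
)

# Label each country using the exercise's cutoff of 15
toy$democracy_stat <- ifelse(toy$polityIV >= 15, "Democracy", "Autocracy")

# One row per observed region x democracy_stat combination
by_group <- aggregate(gdppcap08 ~ region + democracy_stat, data = toy, FUN = mean)
by_group
```

Combinations that never occur in the data, such as a region with no autocracies, simply produce no row, which is why the tibble above has 13 rows rather than 16.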
