#SECTION 1

Question 1

For each of the following, come up with a variable name that would be appropriate to use in R for the listed variable:

a. Body temperature in Celsius: celsiusbodytemp
b. How much aspirin is given per dose for a patient: aspirinpatient
c. Number of televisions per person: TVpperson
d. Height (including neck and extended legs) of giraffes: giraffeheight

Question 2

Use R to calculate the following:

a. 13^3
b. The log of 14 using the natural log
c. The log of 100 using base 10
d. The square root of 81

13^3
## [1] 2197
log(14)
## [1] 2.639057
log10(100)
## [1] 2
sqrt(81)
## [1] 9
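
As a cross-check on parts b and c, log() also takes a base argument, and exp() reverses the natural log. The lines below are a minimal sketch using only base R; the name natural_log_14 is made up for illustration.

```r
# log() uses the natural log (base e) unless a base is supplied
natural_log_14 <- log(14)

# exp() reverses the natural log, so this should return 14 again
exp(natural_log_14)

# Two equivalent ways to take a base-10 log
log10(100)
log(100, base = 10)
```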

#SECTION 2

Question 1

People are notoriously dishonest about revealing how often they perform antisocial behaviors like peeing in swimming pools. (Besides being disgusting, this is harmful: the nitrogenous chemicals in urine combine with the pool’s chlorine to produce toxic chemicals like trichloramine, the source of most skin irritations for swimmers.) A group of researchers (Jmaiff Blackstock et al. 2017) recently realized that an artificial sweetener called ACE passes out in urine unmetabolized and in known average quantities, so measuring ACE concentrations lets us estimate the amount of urine in a pool.

Here is a list of measurements, each from a different pool, of the concentration of ACE (measured in ng/L) for 23 different pools in Canada.

640, 1070, 780, 70, 160, 130, 60, 50, 2110, 70, 350, 30, 210, 90, 470, 580, 250, 310, 460, 430, 140, 1070, 130

Question 1a - In R, create a vector of these data, and name it appropriately.

ACEconcentration<-c(640, 1070, 780, 70, 160, 130, 60, 50, 2110, 70, 350, 30, 210, 90, 470, 580, 250, 310, 460, 430, 140, 1070, 130)

Question 1b - What is the mean ACE concentration of these 23 pools?

mean(ACEconcentration)
## [1] 420

Question 1c - Urine on average has 4000 ng ACE / ml. Therefore, to convert these measurements of ng ACE / L pool water to ml urine / L pool water, we need to divide each by 4000. Make a new vector showing the concentration of urine per liter in these 23 pools. Give it a suitable name.

Urineconcentration<-ACEconcentration/4000

Question 1d - What is the mean concentration of urine per liter? How did this change relative to the mean measurement of ng ACE / L?

mean(Urineconcentration)
## [1] 0.105
#the mean is 4000 times smaller (420/4000 = 0.105) because every value was divided by 4000
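
The change in the mean follows from a general rule: dividing every value by a constant divides the mean by that same constant. A small sketch that re-creates the vectors so it runs on its own:

```r
# ACE concentrations (ng/L) from the 23 pools listed above
ACEconcentration <- c(640, 1070, 780, 70, 160, 130, 60, 50, 2110, 70, 350, 30,
                      210, 90, 470, 580, 250, 310, 460, 430, 140, 1070, 130)

# Convert to ml urine per L of pool water
Urineconcentration <- ACEconcentration / 4000

# Ratio of the two means; this should recover the conversion factor
mean(ACEconcentration) / mean(Urineconcentration)

# all.equal() compares with a small tolerance for floating-point rounding
all.equal(mean(Urineconcentration), mean(ACEconcentration) / 4000)
```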

Question 1e - The arithmetic mean is calculated by adding up all the numbers and dividing by how many numbers there are. Calculate the mean of these numbers using sum() and length(). Did you get the same answer as with using mean()?

sum(Urineconcentration)/length(Urineconcentration)
## [1] 0.105
#yes
sum(ACEconcentration)/length(ACEconcentration)
## [1] 420
#yes

Question 1f - Use R to calculate the average amount of urine (in ml) in a 500,000 L pool.

mean(Urineconcentration)*500000
## [1] 52500

Question 2

Weddell seals live in Antarctic waters and take long strenuous dives in order to find fish to feed upon. Researchers (Williams et al. 2004) wanted to know whether these feeding dives were more energetically expensive than regular dives (perhaps because they are deeper, or the seal has to swim further or faster). They measured the metabolic costs of dives using the oxygen consumption of 10 animals (in ml O2 / kg) during a feeding dive. (Photo above by Giuseppe Zibordi, NOAA Photo Library) Here are the data:

71.0, 77.3, 82.6, 96.1, 106.6, 112.8, 121.2, 126.4, 127.5, 143.1

For the same 10 animals, they also measured the oxygen consumption in non-feeding dives. With the 10 animals in the same order as before, here are those data:

42.2, 51.7, 59.8, 66.5, 81.9, 82.0, 81.3, 81.3, 96.0, 104.1

Question 2a - Make a vector for each of these lists, and give them appropriate names.

FeedDive<-c(71.0, 77.3, 82.6, 96.1, 106.6, 112.8, 121.2, 126.4, 127.5, 143.1)
NormDive<-c(42.2, 51.7, 59.8, 66.5, 81.9, 82.0, 81.3, 81.3, 96.0, 104.1)

Question 2b - Confirm (using R) that both of your vectors have the same number of individuals in them.

length(FeedDive)
## [1] 10
length(NormDive)
## [1] 10

Question 2c - Create a vector called MetabolismDifference by calculating the difference in oxygen consumption between feeding dives and nonfeeding dives for each animal.

MetabolismDifference<-FeedDive-NormDive
MetabolismDifference
##  [1] 28.8 25.6 22.8 29.6 24.7 30.8 39.9 45.1 31.5 39.0

Question 2d - What is the average difference between feeding dives and nonfeeding dives in oxygen consumption?

AvgFeedDive<-mean(FeedDive)
AvgNormDive<-mean(NormDive)
AvgDiffDive<-AvgFeedDive-AvgNormDive
AvgDiffDive
## [1] 31.78
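
There are two routes to this average: the mean of the per-animal differences, or the difference between the two means. Because the same 10 animals appear in both vectors, the two should agree. A short check, with the vectors re-created so it runs on its own (mean_of_differences and difference_of_means are illustrative names):

```r
# Oxygen consumption (ml O2 / kg) for the same 10 seals, in the same order
FeedDive <- c(71.0, 77.3, 82.6, 96.1, 106.6, 112.8, 121.2, 126.4, 127.5, 143.1)
NormDive <- c(42.2, 51.7, 59.8, 66.5, 81.9, 82.0, 81.3, 81.3, 96.0, 104.1)

# Route 1: average the per-animal differences
mean_of_differences <- mean(FeedDive - NormDive)

# Route 2: subtract the two averages
difference_of_means <- mean(FeedDive) - mean(NormDive)

# The two routes agree up to floating-point rounding
all.equal(mean_of_differences, difference_of_means)
```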

Question 2e - Another appropriate way to represent the relationship between these two numbers would be to take the ratio of O2 consumption for feeding dives over the O2 consumption of nonfeeding dives. Make a vector which gives this ratio for each seal.

ratioFeedtoNormDive<-FeedDive/NormDive

Question 2f - Sometimes ratios are easier to analyze when we look at the log of the ratio. Create a vector which gives the log of the ratios from the previous step. (Use the natural log.) What is the mean of this log-ratio?

logRatioDive<-log(ratioFeedtoNormDive)
mean(logRatioDive)
## [1] 0.363873
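
Two properties make the log-ratio convenient: the log of a ratio equals the difference of the logs, and back-transforming the mean log-ratio with exp() gives the geometric mean of the ratios. The sketch below checks both in base R; geometric_mean_ratio is an illustrative name.

```r
# Oxygen consumption (ml O2 / kg) for the same 10 seals, in the same order
FeedDive <- c(71.0, 77.3, 82.6, 96.1, 106.6, 112.8, 121.2, 126.4, 127.5, 143.1)
NormDive <- c(42.2, 51.7, 59.8, 66.5, 81.9, 82.0, 81.3, 81.3, 96.0, 104.1)

ratioFeedtoNormDive <- FeedDive / NormDive
logRatioDive <- log(ratioFeedtoNormDive)

# log(a / b) is the same as log(a) - log(b)
all.equal(logRatioDive, log(FeedDive) - log(NormDive))

# exp() of the mean log-ratio is the geometric mean of the ratios
geometric_mean_ratio <- exp(mean(logRatioDive))
all.equal(geometric_mean_ratio,
          prod(ratioFeedtoNormDive)^(1 / length(ratioFeedtoNormDive)))
```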

Question 3

The data file called “countries.csv” in the Data folder contains information about all the countries on Earth. Each row is a country, and each column contains a variable.

Question 3a - Use read.csv() to read the data from this file into a data frame called countries.

countries<-read.csv("countries.csv")

Question 3b - Use str() to get a quick description of this data set. What are the first three variables?

str(countries)
## 'data.frame':    196 obs. of  18 variables:
##  $ country                                     : chr  "Afghanistan" "Albania" "Algeria" "Andorra" ...
##  $ total_population_in_thousands_2015          : num  32526.6 2896.7 39666.5 70.5 25022 ...
##  $ gross_national_income_per_capita_2013       : int  2000 10520 12990 NA 6770 20070 NA 8140 42540 43840 ...
##  $ life_expectancy_at_birth_female             : num  61.3 80.6 77.3 NA 53.1 78.4 79.8 77.5 84.6 83.8 ...
##  $ life_expectancy_at_birth_male               : num  58.6 74.7 73.6 NA 50.2 73.9 72.5 71.4 80.7 78.9 ...
##  $ life_expectancy_at_age_60_female            : num  16.6 23.7 22.4 NA 16.2 22.9 23.7 21.2 26.7 25.8 ...
##  $ life_expectancy_at_age_60_male              : num  15.2 19.5 21 NA 15 20.1 18.7 17.5 23.9 22.1 ...
##  $ physicians_density_per_1000                 : num  0.304 NA NA NA NA ...
##  $ number_neonatal_deaths_in_thousands_2014    : int  37 0 15 0 53 0 5 0 1 0 ...
##  $ measles_immunization_oneyearolds            : int  66 98 95 96 60 98 95 97 93 96 ...
##  $ dpt2_vaccination_oneyearolds                : int  75 98 95 97 64 99 94 93 92 98 ...
##  $ fines_for_tobacco_advertising_2014          : chr  "No" "Yes" "No" "No" ...
##  $ mortality_rate_cancer_2012                  : num  123.6 123.1 80.6 NA 89.6 ...
##  $ cigarette_price_2014                        : num  NA NA 1.58 NA NA NA 1.8 0.94 15.4 6.23 ...
##  $ continent                                   : chr  "Asia" "Europe" "Africa" "Europe" ...
##  $ ecological_footprint_2000                   : num  NA 1.86 1.79 NA 0.82 NA 3.79 1.16 8.49 5.45 ...
##  $ ecological_footprint_2012                   : num  NA 1.8 1.6 NA NA NA 2.7 NA NA 5.3 ...
##  $ cell_phone_subscriptions_per_100_people_2012: num  53.9 108.5 103.3 74.3 48.6 ...
#the first three variables are country, total_population_in_thousands_2015, and gross_national_income_per_capita_2013

Question 3c - Using the output of str(), how many countries are from Africa in this data set?

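AfricaCountries<-subset(countries,continent=="Africa")
length(AfricaCountries)
## [1] 18
#length() on a data frame counts columns, so this 18 is the number of variables (the same 18 reported by str()), not the number of countries
#the number of African countries is the number of rows in the subset:
nrow(AfricaCountries)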
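
On a data frame, length() returns the number of columns, while nrow() returns the number of rows, so a row count needs nrow(). The sketch below uses a small made-up data frame (toy and its values are hypothetical, not taken from countries.csv) in which the two numbers differ.

```r
# Hypothetical toy data: 5 rows, 3 columns, 2 of the rows in Africa
toy <- data.frame(
  country    = c("A", "B", "C", "D", "E"),
  continent  = c("Africa", "Asia", "Europe", "Africa", "Asia"),
  population = c(10, 20, 30, 40, 50)
)

toyAfrica <- subset(toy, continent == "Africa")

length(toyAfrica)  # number of columns (variables)
nrow(toyAfrica)    # number of rows (countries in Africa)

# The same row count without building a subset first
sum(toy$continent == "Africa")

# Counts for every continent at once
table(toy$continent)
```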

Question 3d - What kinds of variables (i.e., categorical or numerical) are continent, cell_phone_subscriptions_per_100_people_2012, total_population_in_thousands_2015 and fines_for_tobacco_advertising_2014? (Don’t go by their variable names – look at the data in the summary results to decide.)

summary(countries)
##    country          total_population_in_thousands_2015
##  Length:196         Min.   :      1.6                 
##  Class :character   1st Qu.:   1875.8                 
##  Mode  :character   Median :   8069.6                 
##                     Mean   :  37721.9                 
##                     3rd Qu.:  26413.0                 
##                     Max.   :1400000.0                 
##                     NA's   :2                         
##  gross_national_income_per_capita_2013 life_expectancy_at_birth_female
##  Min.   :   600                        Min.   :48.80                  
##  1st Qu.:  3070                        1st Qu.:67.05                  
##  Median :  9800                        Median :75.90                  
##  Mean   : 14792                        Mean   :73.42                  
##  3rd Qu.: 20370                        3rd Qu.:79.25                  
##  Max.   :123860                        Max.   :86.70                  
##  NA's   :27                            NA's   :13                     
##  life_expectancy_at_birth_male life_expectancy_at_age_60_female
##  Min.   :47.40                 Min.   :12.70                   
##  1st Qu.:62.90                 1st Qu.:18.00                   
##  Median :69.80                 Median :20.40                   
##  Mean   :68.53                 Mean   :20.81                   
##  3rd Qu.:73.95                 3rd Qu.:23.40                   
##  Max.   :81.10                 Max.   :28.60                   
##  NA's   :13                    NA's   :13                      
##  life_expectancy_at_age_60_male physicians_density_per_1000
##  Min.   :12.50                  Min.   :0.029              
##  1st Qu.:15.80                  1st Qu.:1.681              
##  Median :17.50                  Median :2.765              
##  Mean   :18.07                  Mean   :2.725              
##  3rd Qu.:20.20                  3rd Qu.:3.510              
##  Max.   :23.90                  Max.   :7.519              
##  NA's   :13                     NA's   :125                
##  number_neonatal_deaths_in_thousands_2014 measles_immunization_oneyearolds
##  Min.   :  0.00                           Min.   :22.00                   
##  1st Qu.:  0.00                           1st Qu.:83.25                   
##  Median :  1.00                           Median :93.00                   
##  Mean   : 14.11                           Mean   :87.28                   
##  3rd Qu.:  9.50                           3rd Qu.:97.00                   
##  Max.   :722.00                           Max.   :99.00                   
##  NA's   :2                                NA's   :2                       
##  dpt2_vaccination_oneyearolds fines_for_tobacco_advertising_2014
##  Min.   :20.00                Length:196                        
##  1st Qu.:84.25                Class :character                  
##  Median :94.00                Mode  :character                  
##  Mean   :87.91                                                  
##  3rd Qu.:97.00                                                  
##  Max.   :99.00                                                  
##  NA's   :2                                                      
##  mortality_rate_cancer_2012 cigarette_price_2014  continent        
##  Min.   : 54.00             Min.   : 0.360       Length:196        
##  1st Qu.: 88.62             1st Qu.: 1.320       Class :character  
##  Median :108.00             Median : 2.620       Mode  :character  
##  Mean   :109.64             Mean   : 3.798                         
##  3rd Qu.:124.53             3rd Qu.: 4.965                         
##  Max.   :223.00             Max.   :16.140                         
##  NA's   :24                 NA's   :89                             
##  ecological_footprint_2000 ecological_footprint_2012
##  Min.   : 0.600            Min.   :0.700            
##  1st Qu.: 1.097            1st Qu.:1.400            
##  Median : 2.140            Median :2.000            
##  Mean   : 3.147            Mean   :2.353            
##  3rd Qu.: 4.872            3rd Qu.:3.000            
##  Max.   :15.990            Max.   :5.300            
##  NA's   :58                NA's   :147              
##  cell_phone_subscriptions_per_100_people_2012
##  Min.   :  5.47                              
##  1st Qu.: 69.83                              
##  Median :103.25                              
##  Mean   : 99.90                              
##  3rd Qu.:126.10                              
##  Max.   :198.62                              
##  NA's   :10
#continent is categorical
#cell_phone_subscriptions_per_100_people_2012 is numerical
#total_population_in_thousands_2015 is numerical
#fines_for_tobacco_advertising_2014 is categorical
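
One way to double-check these classifications is to ask R for the class of each column. The sketch below uses a three-row excerpt typed in from the str() output above; the data frame name toy and its shortened column names are made up for illustration.

```r
# First three rows of three columns, copied from the str() output above
toy <- data.frame(
  continent            = c("Asia", "Europe", "Africa"),
  fines                = c("No", "Yes", "No"),
  population_thousands = c(32526.6, 2896.7, 39666.5),
  stringsAsFactors     = FALSE
)

# Character columns hold categories; numeric columns hold measurements
sapply(toy, class)
sapply(toy, is.numeric)

# table() summarizes a categorical column by counting each category
table(toy$fines)
```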

Question 3e - Add a new column to your countries data frame that has the difference in ecological footprint between 2012 and 2000. What is the mean of this difference? (Note: this variable will have “missing data”, which means that some of the countries do not have data in this file for one or the other of the years of ecological footprint. By default, R doesn’t calculate a mean unless all the data are present. To tell R to ignore the missing data, add an option to the mean() command that says na.rm=TRUE. We’ll learn more about this later.)

countries$diffEcoFt<-countries$ecological_footprint_2012-countries$ecological_footprint_2000
mean(countries$diffEcoFt, na.rm = TRUE)
## [1] -0.4169565
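
To see what na.rm=TRUE is doing, it helps to try it on a few values. The sketch below uses the first five footprint values shown in the str() output above; the vector names are made up for illustration.

```r
# First five values of each footprint column, from the str() output above
footprint_2000 <- c(NA, 1.86, 1.79, NA, 0.82)
footprint_2012 <- c(NA, 1.8, 1.6, NA, NA)

# Any calculation involving NA gives NA, so only two differences are usable
footprint_change <- footprint_2012 - footprint_2000
footprint_change

# Without na.rm the missing values make the whole mean NA
mean(footprint_change)

# With na.rm = TRUE the NAs are dropped before averaging
mean(footprint_change, na.rm = TRUE)

# How many values actually went into that mean
sum(!is.na(footprint_change))
```

Checking the count of non-missing values is a useful habit, since a mean based on only a few rows describes far fewer countries than the full data frame.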

Question 4

Using the countries data again, create a new data frame called AfricaData, that only includes data for countries in Africa. What is the sum of the total_population_in_thousands_2015 for this new data frame?

AfricaData<-subset(countries, continent=="Africa")
sum(AfricaData$total_population_in_thousands_2015)
## [1] 1184501
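
The summary() output above reports 2 missing values in total_population_in_thousands_2015, so sum() would return NA for any subset that happens to include one of them. The sketch below shows that behavior on a small made-up data frame (toy and its values are hypothetical), along with an equivalent way to select rows using square brackets.

```r
# Hypothetical toy data with one missing continent and one missing population
toy <- data.frame(
  continent            = c("Africa", "Asia", "Africa", NA),
  population_thousands = c(100, 200, NA, 400)
)

# subset() keeps only the rows where the condition is TRUE
viaSubset <- subset(toy, continent == "Africa")

# Bracket indexing with which() selects the same rows;
# without which(), the NA in continent would produce an extra all-NA row
viaBrackets <- toy[which(toy$continent == "Africa"), ]

nrow(viaSubset)
nrow(viaBrackets)

# One missing population makes the plain sum NA
sum(viaSubset$population_thousands)

# na.rm = TRUE adds up only the values that are present
sum(viaSubset$population_thousands, na.rm = TRUE)
```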