R Markdown

EXERCISE 3.11

For these exercises we will use the US murders dataset. Make sure you load it prior to starting.

library(dslabs)
data("murders")

Q1: Use the $ operator to access the population size data and store it as the object pop. Then use the sort function to redefine pop so that it is sorted. Finally, use the [ operator to report the smallest population size.

Use the $ operator to access the population size

pop <- murders$population
pop
##  [1]  4779736   710231  6392017  2915918 37253956  5029196  3574097   897934
##  [9]   601723 19687653  9920000  1360301  1567582 12830632  6483802  3046355
## [17]  2853118  4339367  4533372  1328361  5773552  6547629  9883640  5303925
## [25]  2967297  5988927   989415  1826341  2700551  1316470  8791894  2059179
## [33] 19378102  9535483   672591 11536504  3751351  3831074 12702379  1052567
## [41]  4625364   814180  6346105 25145561  2763885   625741  8001024  6724540
## [49]  1852994  5686986   563626

Use the sort function to redefine pop

pop <- sort(pop)
pop
##  [1]   563626   601723   625741   672591   710231   814180   897934   989415
##  [9]  1052567  1316470  1328361  1360301  1567582  1826341  1852994  2059179
## [17]  2700551  2763885  2853118  2915918  2967297  3046355  3574097  3751351
## [25]  3831074  4339367  4533372  4625364  4779736  5029196  5303925  5686986
## [33]  5773552  5988927  6346105  6392017  6483802  6547629  6724540  8001024
## [41]  8791894  9535483  9883640  9920000 11536504 12702379 12830632 19378102
## [49] 19687653 25145561 37253956

Use the [ operator to report the smallest population size

pop[1]
## [1] 563626
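The key point in Q1 is that sort() returns a sorted copy rather than changing its argument, so the result has to be assigned back before [1] picks out the smallest value. A minimal sketch on a small made-up vector (x is a stand-in for pop, not the real data):

```r
# Toy vector standing in for pop (values are illustrative only)
x <- c(30, 10, 20)
sort(x)        # returns a sorted copy: 10 20 30
x              # x itself is unchanged: 30 10 20
x <- sort(x)   # reassign to actually redefine x
x[1]           # smallest value after sorting: 10
```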

Q2: Now, instead of the smallest population size, find the index of the entry with the smallest population size. Hint: use order instead of sort.

Indexes that put the population sizes in increasing order

x <- murders$population
order(x)
##  [1] 51  9 46 35  2 42  8 27 40 30 20 12 13 28 49 32 29 45 17  4 25 16  7 37 38
## [26] 18 19 41  1  6 24 50 21 26 43  3 15 22 48 47 31 34 23 11 36 39 14 33 10 44
## [51]  5

index of the entry with the smallest population size

order(x)[1]
## [1] 51
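order() returns positions rather than values. On a toy vector (illustrative values only), indexing by order() reproduces what sort() returns:

```r
x <- c(30, 10, 20)
o <- order(x)              # positions of the smallest, next smallest, ...: 2 3 1
x[o]                       # indexing by order() gives the sorted values: 10 20 30
identical(x[o], sort(x))   # TRUE
```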

Q3: We can actually perform the same operation as in the previous exercise using the function which.min. Write one line of code that does this.

which.min(murders$population)
## [1] 51
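which.min() returns the position of the first minimum, which is why it agrees with order(x)[1] (order() leaves ties in their original order). A small sketch with a deliberate tie (made-up values):

```r
x <- c(30, 10, 20, 10)
which.min(x)      # 2: the first position holding the minimum
order(x)[1]       # also 2, because ties keep their original order
x[which.min(x)]   # the minimum value itself: 10
```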

Q4: Now we know how small the smallest state is and we know which row represents it. Which state is it? Define a variable states to be the state names from the murders data frame. Report the name of the state with the smallest population.

Define a variable states to be the state names from the murders data frame

states <- murders$state
states
##  [1] "Alabama"              "Alaska"               "Arizona"             
##  [4] "Arkansas"             "California"           "Colorado"            
##  [7] "Connecticut"          "Delaware"             "District of Columbia"
## [10] "Florida"              "Georgia"              "Hawaii"              
## [13] "Idaho"                "Illinois"             "Indiana"             
## [16] "Iowa"                 "Kansas"               "Kentucky"            
## [19] "Louisiana"            "Maine"                "Maryland"            
## [22] "Massachusetts"        "Michigan"             "Minnesota"           
## [25] "Mississippi"          "Missouri"             "Montana"             
## [28] "Nebraska"             "Nevada"               "New Hampshire"       
## [31] "New Jersey"           "New Mexico"           "New York"            
## [34] "North Carolina"       "North Dakota"         "Ohio"                
## [37] "Oklahoma"             "Oregon"               "Pennsylvania"        
## [40] "Rhode Island"         "South Carolina"       "South Dakota"        
## [43] "Tennessee"            "Texas"                "Utah"                
## [46] "Vermont"              "Virginia"             "Washington"          
## [49] "West Virginia"        "Wisconsin"            "Wyoming"

Report the name of the state with the smallest population.

states[which.min(murders$population)]
## [1] "Wyoming"
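The same index can be reused on any vector that is aligned row by row with the population column. A sketch with hypothetical parallel vectors (state and population here are toy stand-ins, not columns of murders):

```r
# Hypothetical parallel vectors: element i of each refers to the same state
state <- c("A", "B", "C")
population <- c(300, 100, 200)
i <- which.min(population)   # row index of the smallest population: 2
state[i]                     # name in that same row: "B"
```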

Q5: You can create a data frame using the data.frame function. Here is a quick example:

temp <- c(35, 88, 42, 84, 81, 30)

city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)

Use the rank function to determine the population rank of each state from smallest population size to biggest. Save these ranks in an object called ranks, then create a data frame with the state name and its rank. Call the data frame my_df

Use the rank function to determine the population rank of each state from smallest population size to biggest.

pop <- murders$population   # reset pop to the original, unsorted population values
pop
##  [1]  4779736   710231  6392017  2915918 37253956  5029196  3574097   897934
##  [9]   601723 19687653  9920000  1360301  1567582 12830632  6483802  3046355
## [17]  2853118  4339367  4533372  1328361  5773552  6547629  9883640  5303925
## [25]  2967297  5988927   989415  1826341  2700551  1316470  8791894  2059179
## [33] 19378102  9535483   672591 11536504  3751351  3831074 12702379  1052567
## [41]  4625364   814180  6346105 25145561  2763885   625741  8001024  6724540
## [49]  1852994  5686986   563626

Save these ranks in an object called ranks

ranks<-rank(pop)

ranks
##  [1] 29  5 36 20 51 30 23  7  2 49 44 12 13 47 37 22 19 26 27 11 33 38 43 31 21
## [26] 34  8 14 17 10 41 16 48 42  4 45 24 25 46  9 28  6 35 50 18  3 40 39 15 32
## [51]  1
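rank() and order() are easy to confuse. rank() says where each element would land after sorting, while order() says which element goes in each sorted position. A sketch on a toy vector with no ties, showing that the two undo each other:

```r
x <- c(30, 10, 20)
r <- rank(x)    # 3 1 2: sorted position of each element
o <- order(x)   # 2 3 1: which element goes first, second, third
# Without ties, rank() and order() are inverse permutations of each other
r[o]            # 1 2 3
order(o)        # 3 1 2, the same as rank(x)
```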

Create a data frame with the state name and its rank. Call the data frame my_df

my_df <- data.frame(rank = ranks, states = states)
my_df
##    rank               states
## 1    29              Alabama
## 2     5               Alaska
## 3    36              Arizona
## 4    20             Arkansas
## 5    51           California
## 6    30             Colorado
## 7    23          Connecticut
## 8     7             Delaware
## 9     2 District of Columbia
## 10   49              Florida
## 11   44              Georgia
## 12   12               Hawaii
## 13   13                Idaho
## 14   47             Illinois
## 15   37              Indiana
## 16   22                 Iowa
## 17   19               Kansas
## 18   26             Kentucky
## 19   27            Louisiana
## 20   11                Maine
## 21   33             Maryland
## 22   38        Massachusetts
## 23   43             Michigan
## 24   31            Minnesota
## 25   21          Mississippi
## 26   34             Missouri
## 27    8              Montana
## 28   14             Nebraska
## 29   17               Nevada
## 30   10        New Hampshire
## 31   41           New Jersey
## 32   16           New Mexico
## 33   48             New York
## 34   42       North Carolina
## 35    4         North Dakota
## 36   45                 Ohio
## 37   24             Oklahoma
## 38   25               Oregon
## 39   46         Pennsylvania
## 40    9         Rhode Island
## 41   28       South Carolina
## 42    6         South Dakota
## 43   35            Tennessee
## 44   50                Texas
## 45   18                 Utah
## 46    3              Vermont
## 47   40             Virginia
## 48   39           Washington
## 49   15        West Virginia
## 50   32            Wisconsin
## 51    1              Wyoming

Q6: Repeat the previous exercise, but this time order my_df so that the states are ordered from least populous to most populous. Hint: create an object ind that stores the indexes needed to order the population values. Then use the bracket operator [ to re-order each column in the data frame.

Create an object ind that stores the indexes needed to order the population values

ind <- order(pop)
ind
##  [1] 51  9 46 35  2 42  8 27 40 30 20 12 13 28 49 32 29 45 17  4 25 16  7 37 38
## [26] 18 19 41  1  6 24 50 21 26 43  3 15 22 48 47 31 34 23 11 36 39 14 33 10 44
## [51]  5

use the bracket operator [ to re-order each column in the data frame

my_df <- my_df[ind,]
my_df
##    rank               states
## 51    1              Wyoming
## 9     2 District of Columbia
## 46    3              Vermont
## 35    4         North Dakota
## 2     5               Alaska
## 42    6         South Dakota
## 8     7             Delaware
## 27    8              Montana
## 40    9         Rhode Island
## 30   10        New Hampshire
## 20   11                Maine
## 12   12               Hawaii
## 13   13                Idaho
## 28   14             Nebraska
## 49   15        West Virginia
## 32   16           New Mexico
## 29   17               Nevada
## 45   18                 Utah
## 17   19               Kansas
## 4    20             Arkansas
## 25   21          Mississippi
## 16   22                 Iowa
## 7    23          Connecticut
## 37   24             Oklahoma
## 38   25               Oregon
## 18   26             Kentucky
## 19   27            Louisiana
## 41   28       South Carolina
## 1    29              Alabama
## 6    30             Colorado
## 24   31            Minnesota
## 50   32            Wisconsin
## 21   33             Maryland
## 26   34             Missouri
## 43   35            Tennessee
## 3    36              Arizona
## 15   37              Indiana
## 22   38        Massachusetts
## 48   39           Washington
## 47   40             Virginia
## 31   41           New Jersey
## 34   42       North Carolina
## 23   43             Michigan
## 11   44              Georgia
## 36   45                 Ohio
## 39   46         Pennsylvania
## 14   47             Illinois
## 33   48             New York
## 10   49              Florida
## 44   50                Texas
## 5    51           California
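The same reordering works on any data frame: index the rows with order() and leave the column slot empty to keep every column. The row names keep the original positions, as in the output above (51, 9, 46, ...). A sketch on a small hypothetical data frame, df, with the same columns as my_df:

```r
# Hypothetical data frame with the same columns as my_df
df <- data.frame(rank = c(3, 1, 2), states = c("X", "Y", "Z"))
ind <- order(df$rank)        # row positions from smallest to largest rank
df_sorted <- df[ind, ]       # empty column index keeps every column
df_sorted$states             # "Y" "Z" "X"
rownames(df_sorted)          # "2" "3" "1": original row positions
rownames(df_sorted) <- NULL  # optional: reset row names to 1, 2, 3
```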

Q7: The na_example vector represents a series of counts. You can quickly examine the object using:

data("na_example")

str(na_example)

#> int [1:1000] 2 1 3 2 1 3 1 4 3 2 ...

However, when we compute the average with the function mean, we obtain an NA:

mean(na_example)

#> [1] NA

The is.na function returns a logical vector that tells us which entries are NA. Assign this logical vector to an object called ind and determine how many NAs na_example has.
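The counting step works because sum() treats TRUE as 1 and FALSE as 0, so mean() of the same logical vector gives the proportion missing. A sketch on a short made-up vector v (not na_example):

```r
v <- c(2L, NA, 3L, NA, 1L)   # toy counts with two missing values
is_missing <- is.na(v)       # FALSE TRUE FALSE TRUE FALSE
sum(is_missing)              # number of NAs: 2
mean(is_missing)             # proportion of NAs: 0.4
```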

data("na_example")
str(na_example)
##  int [1:1000] 2 1 3 2 1 3 1 4 3 2 ...
ind <- is.na(na_example)
head(ind)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE
num_na <- sum(ind)
num_na
## [1] 145

Q8: Now compute the average again, but only for the entries that are not NA. Hint: remember the ! operator.

mean_no_na <- mean(na_example[!ind])

mean_no_na
## [1] 2.301754
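Base R's mean() also has an na.rm argument that drops missing values before averaging, which should give the same answer as the logical-indexing approach. A sketch on a toy vector:

```r
v <- c(2L, NA, 3L, NA, 1L)
mean(v)                 # NA: a single missing value propagates
mean(v[!is.na(v)])      # keep only non-missing entries: 2
mean(v, na.rm = TRUE)   # same result via the built-in argument: 2
```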

EXERCISE 3.13

Q1: Previously we created this data frame:

temp <- c(35, 88, 42, 84, 81, 30)

city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)

Remake the data frame using the code above, but add a line that converts the temperature from Fahrenheit to Celsius. The conversion is C = 5/9 × (F − 32).

temp <- c(35, 88, 42, 84, 81, 30)
# convert the temperature from Fahrenheit to Celsius
temp_C<-5/9*(temp-32)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp, centigrade=temp_C)
city_temps
##             name temperature centigrade
## 1        Beijing          35   1.666667
## 2          Lagos          88  31.111111
## 3          Paris          42   5.555556
## 4 Rio de Janeiro          84  28.888889
## 5       San Juan          81  27.222222
## 6        Toronto          30  -1.111111
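Because arithmetic in R is vectorized, the conversion is applied to all six temperatures at once. One way to sanity-check the formula is to invert it, F = 9/5 × C + 32, and confirm that the original values come back (up to floating-point tolerance):

```r
temp <- c(35, 88, 42, 84, 81, 30)
temp_C <- 5/9 * (temp - 32)   # vectorized: every element converted at once
temp_F <- 9/5 * temp_C + 32   # inverse conversion back to Fahrenheit
all.equal(temp_F, temp)       # TRUE, up to floating-point tolerance
round(temp_C, 1)              # 1.7 31.1 5.6 28.9 27.2 -1.1
```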

Q2: What is the sum $1 + 1/2^2 + 1/3^2 + \dots + 1/100^2$? Hint: thanks to Euler, we know it should be close to $\pi^2/6$.

terms <- 1/(1:100)^2
sum_result <- sum(terms)
sum_result
## [1] 1.634984
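Two details matter here: ^ binds more tightly than /, so 1/(1:100)^2 squares before dividing, and the partial sums approach π²/6 slowly. Comparing the tail of the series with an integral suggests the gap after n terms lies between 1/(n + 1) and 1/n. A sketch that checks this for a few illustrative values of n:

```r
# Partial sums of 1/k^2 for a few illustrative values of n
n <- c(10, 100, 1000)
partial <- sapply(n, function(m) sum(1 / (1:m)^2))
gap <- pi^2 / 6 - partial   # distance from Euler's limit
data.frame(n = n, partial_sum = partial, gap = gap)
# the gap should fall between 1/(n + 1) and 1/n
all(gap > 1 / (n + 1) & gap < 1 / n)
```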

Q3: Compute the per 100,000 murder rate for each state and store it in the object murder_rate. Then compute the average murder rate for the US using the function mean. What is the average?

Rate for each state, stored in the object murder_rate

murder_rate <- murders$total / murders$population * 100000
murder_rate
##  [1]  2.8244238  2.6751860  3.6295273  3.1893901  3.3741383  1.2924531
##  [7]  2.7139722  4.2319369 16.4527532  3.3980688  3.7903226  0.5145920
## [13]  0.7655102  2.8369608  2.1900730  0.6893484  2.2081106  2.6732010
## [19]  7.7425810  0.8280881  5.0748655  1.8021791  4.1786225  0.9992600
## [25]  4.0440846  5.3598917  1.2128379  1.7521372  3.1104763  0.3798036
## [31]  2.7980319  3.2537239  2.6679599  2.9993237  0.5947151  2.6871225
## [37]  2.9589340  0.9396843  3.5977513  1.5200933  4.4753235  0.9825837
## [43]  3.4509357  3.2013603  0.7959810  0.3196211  3.1246001  1.3829942
## [49]  1.4571013  1.7056487  0.8871131

Average

mean(murder_rate)
## [1] 2.779125
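Note that mean(murder_rate) weights every state equally, regardless of its population. The overall rate for a combined population is instead total murders divided by total population. A sketch with two hypothetical states (made-up numbers) shows how far the two can differ:

```r
# Two hypothetical states: one small, one large (numbers are made up)
total <- c(10, 100)
population <- c(1e5, 1e7)
rate <- total / population * 100000   # per-100,000 rates: 10 and 1
mean(rate)                            # unweighted average of the rates: 5.5
overall <- sum(total) / sum(population) * 100000
overall                               # about 1.09, dominated by the large state
```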