For these exercises we will use the US murders dataset. Make sure you load it prior to starting.
library(dslabs)
data("murders")
Q1: Use the $ operator to access the population size data and store it as the object pop. Then use the sort function to redefine pop so that it is sorted. Finally, use the [ operator to report the smallest population size.
Use the $ operator to access the population size
pop <- murders$population
pop
## [1] 4779736 710231 6392017 2915918 37253956 5029196 3574097 897934
## [9] 601723 19687653 9920000 1360301 1567582 12830632 6483802 3046355
## [17] 2853118 4339367 4533372 1328361 5773552 6547629 9883640 5303925
## [25] 2967297 5988927 989415 1826341 2700551 1316470 8791894 2059179
## [33] 19378102 9535483 672591 11536504 3751351 3831074 12702379 1052567
## [41] 4625364 814180 6346105 25145561 2763885 625741 8001024 6724540
## [49] 1852994 5686986 563626
Use the sort function to redefine pop
pop <- sort(pop)
pop
## [1] 563626 601723 625741 672591 710231 814180 897934 989415
## [9] 1052567 1316470 1328361 1360301 1567582 1826341 1852994 2059179
## [17] 2700551 2763885 2853118 2915918 2967297 3046355 3574097 3751351
## [25] 3831074 4339367 4533372 4625364 4779736 5029196 5303925 5686986
## [33] 5773552 5988927 6346105 6392017 6483802 6547629 6724540 8001024
## [41] 8791894 9535483 9883640 9920000 11536504 12702379 12830632 19378102
## [49] 19687653 25145561 37253956
Use the [ operator to report the smallest population size
pop[1]
## [1] 563626
Q2: Now instead of the smallest population size, find the index of the entry with the smallest population size. Hint: use order instead of sort.
Indexes that put the population sizes in increasing order
x <- murders$population
order(x)
## [1] 51 9 46 35 2 42 8 27 40 30 20 12 13 28 49 32 29 45 17 4 25 16 7 37 38
## [26] 18 19 41 1 6 24 50 21 26 43 3 15 22 48 47 31 34 23 11 36 39 14 33 10 44
## [51] 5
Index of the entry with the smallest population size
order(x)[1]
## [1] 51
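How sort and order relate can be checked on a small made-up vector. This is only a sketch; the values below are illustrative and do not come from the murders data.

```r
# Toy vector (hypothetical values, not from the murders data)
x <- c(31, 4, 15, 92, 65)

sorted <- sort(x)    # the values themselves, in increasing order
idx <- order(x)      # the positions those values occupy in x

sorted                      # 4 15 31 65 92
idx                         # 2 3 1 5 4
identical(x[idx], sorted)   # TRUE: indexing x by order(x) reproduces sort(x)
idx[1]                      # 2: position of the smallest value
```

Indexing by order(x) is what makes it possible to sort one vector by the values of another, which sort alone cannot do.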
Q3: We can actually perform the same operation as in the previous exercise using the function which.min. Write one line of code that does this.
which.min(murders$population)
## [1] 51
Q4: Now we know how small the smallest state is and we know which row represents it. Which state is it? Define a variable states to be the state names from the murders data frame. Report the name of the state with the smallest population.
Define a variable states to be the state names from the murders data frame
states <- murders$state
states
## [1] "Alabama" "Alaska" "Arizona"
## [4] "Arkansas" "California" "Colorado"
## [7] "Connecticut" "Delaware" "District of Columbia"
## [10] "Florida" "Georgia" "Hawaii"
## [13] "Idaho" "Illinois" "Indiana"
## [16] "Iowa" "Kansas" "Kentucky"
## [19] "Louisiana" "Maine" "Maryland"
## [22] "Massachusetts" "Michigan" "Minnesota"
## [25] "Mississippi" "Missouri" "Montana"
## [28] "Nebraska" "Nevada" "New Hampshire"
## [31] "New Jersey" "New Mexico" "New York"
## [34] "North Carolina" "North Dakota" "Ohio"
## [37] "Oklahoma" "Oregon" "Pennsylvania"
## [40] "Rhode Island" "South Carolina" "South Dakota"
## [43] "Tennessee" "Texas" "Utah"
## [46] "Vermont" "Virginia" "Washington"
## [49] "West Virginia" "Wisconsin" "Wyoming"
Report the name of the state with the smallest population.
murders$state[which.min(murders$population)]
## [1] "Wyoming"
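The same lookup pattern works for any pair of parallel vectors: find an index in one and use it to subset the other. A minimal sketch, assuming two hypothetical vectors sizes and labels that are not part of dslabs:

```r
# Hypothetical parallel vectors: one label per value
sizes <- c(120, 45, 300, 78)
labels <- c("a", "b", "c", "d")

i_min <- which.min(sizes)   # index of the smallest value
i_max <- which.max(sizes)   # index of the largest value

labels[i_min]   # "b"
labels[i_max]   # "c"
```

Both functions return only the first matching index, so ties would need separate handling.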
Q5: You can create a data frame using the data.frame function. Here is a quick example:
temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)
Use the rank function to determine the population rank of each state from smallest population size to biggest. Save these ranks in an object called ranks, then create a data frame with the state name and its rank. Call the data frame my_df.
Use the rank function to determine the population rank of each state from smallest population size to biggest.
pop<-(murders$population)
pop
## [1] 4779736 710231 6392017 2915918 37253956 5029196 3574097 897934
## [9] 601723 19687653 9920000 1360301 1567582 12830632 6483802 3046355
## [17] 2853118 4339367 4533372 1328361 5773552 6547629 9883640 5303925
## [25] 2967297 5988927 989415 1826341 2700551 1316470 8791894 2059179
## [33] 19378102 9535483 672591 11536504 3751351 3831074 12702379 1052567
## [41] 4625364 814180 6346105 25145561 2763885 625741 8001024 6724540
## [49] 1852994 5686986 563626
Save these ranks in an object called ranks
ranks<-rank(pop)
ranks
## [1] 29 5 36 20 51 30 23 7 2 49 44 12 13 47 37 22 19 26 27 11 33 38 43 31 21
## [26] 34 8 14 17 10 41 16 48 42 4 45 24 25 46 9 28 6 35 50 18 3 40 39 15 32
## [51] 1
Create a data frame with the state name and its rank. Call the data frame my_df
my_df<-data.frame(rank=ranks, states=states)
my_df
## rank states
## 1 29 Alabama
## 2 5 Alaska
## 3 36 Arizona
## 4 20 Arkansas
## 5 51 California
## 6 30 Colorado
## 7 23 Connecticut
## 8 7 Delaware
## 9 2 District of Columbia
## 10 49 Florida
## 11 44 Georgia
## 12 12 Hawaii
## 13 13 Idaho
## 14 47 Illinois
## 15 37 Indiana
## 16 22 Iowa
## 17 19 Kansas
## 18 26 Kentucky
## 19 27 Louisiana
## 20 11 Maine
## 21 33 Maryland
## 22 38 Massachusetts
## 23 43 Michigan
## 24 31 Minnesota
## 25 21 Mississippi
## 26 34 Missouri
## 27 8 Montana
## 28 14 Nebraska
## 29 17 Nevada
## 30 10 New Hampshire
## 31 41 New Jersey
## 32 16 New Mexico
## 33 48 New York
## 34 42 North Carolina
## 35 4 North Dakota
## 36 45 Ohio
## 37 24 Oklahoma
## 38 25 Oregon
## 39 46 Pennsylvania
## 40 9 Rhode Island
## 41 28 South Carolina
## 42 6 South Dakota
## 43 35 Tennessee
## 44 50 Texas
## 45 18 Utah
## 46 3 Vermont
## 47 40 Virginia
## 48 39 Washington
## 49 15 West Virginia
## 50 32 Wisconsin
## 51 1 Wyoming
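It can help to see rank next to order on a vector small enough to check by hand. The values below are made up for illustration:

```r
# Toy vector (hypothetical values)
v <- c(31, 4, 15, 92, 65)

r <- rank(v)    # for each entry, its position in the sorted vector
o <- order(v)   # for each sorted position, the index of the entry that goes there

r      # 3 1 2 5 4
o      # 2 3 1 5 4
r[o]   # 1 2 3 4 5: reading the ranks in sorted order counts up from 1
```

With no ties, rank and order are inverse permutations of each other, which is why ordering by the ranks gives the same result as ordering by the original values.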
Q6: Repeat the previous exercise, but this time order my_df so that the states are ordered from least populous to most populous. Hint: create an object ind that stores the indexes needed to order the population values. Then use the bracket operator [ to re-order each column in the data frame.
Create an object ind that stores the indexes needed to order the population values
ind <- order(murders$population)
ind
## [1] 51 9 46 35 2 42 8 27 40 30 20 12 13 28 49 32 29 45 17 4 25 16 7 37 38
## [26] 18 19 41 1 6 24 50 21 26 43 3 15 22 48 47 31 34 23 11 36 39 14 33 10 44
## [51] 5
use the bracket operator [ to re-order each column in the data frame
my_df <- my_df[ind,]
my_df
## rank states
## 51 1 Wyoming
## 9 2 District of Columbia
## 46 3 Vermont
## 35 4 North Dakota
## 2 5 Alaska
## 42 6 South Dakota
## 8 7 Delaware
## 27 8 Montana
## 40 9 Rhode Island
## 30 10 New Hampshire
## 20 11 Maine
## 12 12 Hawaii
## 13 13 Idaho
## 28 14 Nebraska
## 49 15 West Virginia
## 32 16 New Mexico
## 29 17 Nevada
## 45 18 Utah
## 17 19 Kansas
## 4 20 Arkansas
## 25 21 Mississippi
## 16 22 Iowa
## 7 23 Connecticut
## 37 24 Oklahoma
## 38 25 Oregon
## 18 26 Kentucky
## 19 27 Louisiana
## 41 28 South Carolina
## 1 29 Alabama
## 6 30 Colorado
## 24 31 Minnesota
## 50 32 Wisconsin
## 21 33 Maryland
## 26 34 Missouri
## 43 35 Tennessee
## 3 36 Arizona
## 15 37 Indiana
## 22 38 Massachusetts
## 48 39 Washington
## 47 40 Virginia
## 31 41 New Jersey
## 34 42 North Carolina
## 23 43 Michigan
## 11 44 Georgia
## 36 45 Ohio
## 39 46 Pennsylvania
## 14 47 Illinois
## 33 48 New York
## 10 49 Florida
## 44 50 Texas
## 5 51 California
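The same row re-ordering can be tried on the small city_temps data frame from the earlier example. This is a sketch; the name sorted_temps is introduced here only for illustration.

```r
temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)

# Row indexes that put the temperatures in increasing order
ind <- order(city_temps$temperature)

# Leaving the column slot empty after the comma keeps every column
sorted_temps <- city_temps[ind, ]

as.character(sorted_temps$name)
# "Toronto" "Beijing" "Paris" "San Juan" "Rio de Janeiro" "Lagos"
```

Passing decreasing = TRUE to order would list the warmest city first instead.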
Q7: The na_example vector represents a series of counts. You can quickly examine the object using:
data("na_example")
str(na_example)
#> int [1:1000] 2 1 3 2 1 3 1 4 3 2 ...
However, when we compute the average with the function mean, we obtain an NA:
mean(na_example)
#> [1] NA
The is.na function returns a logical vector that tells us which entries are NA. Assign this logical vector to an object called ind and determine how many NAs na_example has.
data("na_example")
str(na_example)
## int [1:1000] 2 1 3 2 1 3 1 4 3 2 ...
ind <- is.na(na_example)
head(ind)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE
num_na <- sum(ind)
num_na
## [1] 145
Q8: Now compute the average again, but only for the entries that are not NA. Hint: remember the ! operator.
mean_no_na <- mean(na_example[!ind])
mean_no_na
## [1] 2.301754
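As a cross-check, mean also accepts an na.rm argument that drops missing values before averaging. The sketch below uses a short made-up vector (counts is a hypothetical name) so the arithmetic can be verified by hand.

```r
# Small made-up vector with two missing values
counts <- c(2, NA, 3, 1, NA, 4)

missing <- is.na(counts)     # TRUE where the entry is NA
sum(missing)                 # 2: TRUE counts as 1 when summed

mean(counts)                 # NA: one missing value makes the whole mean NA
mean(counts[!missing])       # 2.5: keep only the non-missing entries
mean(counts, na.rm = TRUE)   # 2.5: same result using the built-in argument
```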
Q1: Previously we created this data frame:
temp <- c(35, 88, 42, 84, 81, 30)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp)
Remake the data frame using the code above, but add a line that converts the temperature from Fahrenheit to Celsius. The conversion is C = 5/9 × (F − 32).
temp <- c(35, 88, 42, 84, 81, 30)
# Convert the temperature from Fahrenheit to Celsius
temp_C <- 5/9 * (temp - 32)
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")
city_temps <- data.frame(name = city, temperature = temp, centigrade=temp_C)
city_temps
## name temperature centigrade
## 1 Beijing 35 1.666667
## 2 Lagos 88 31.111111
## 3 Paris 42 5.555556
## 4 Rio de Janeiro 84 28.888889
## 5 San Juan 81 27.222222
## 6 Toronto 30 -1.111111
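Because arithmetic in R is vectorized, the conversion formula applies to every element at once. One way to sanity-check it is a round trip back to Fahrenheit. The helper names f_to_c and c_to_f below are hypothetical, introduced only for this sketch.

```r
# Hypothetical helpers; the formulas are C = 5/9 * (F - 32) and its inverse
f_to_c <- function(fahrenheit) 5/9 * (fahrenheit - 32)
c_to_f <- function(celsius) 9/5 * celsius + 32

temp <- c(35, 88, 42, 84, 81, 30)
temp_c <- f_to_c(temp)    # applied to all six values in one call

round(temp_c, 1)                   # 1.7 31.1 5.6 28.9 27.2 -1.1
all.equal(c_to_f(temp_c), temp)    # TRUE: converting back recovers the input
```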
Q2: What is the following sum: 1 + 1/2^2 + 1/3^2 + … + 1/100^2? Hint: thanks to Euler, we know it should be close to π^2/6.
terms <- 1/(1:100)^2
sum_result <- sum(terms)
sum_result
## [1] 1.634984
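The hint can be checked directly by comparing the 100-term partial sum with π^2/6. This is a sketch; the names partial, target and gap are introduced here for illustration.

```r
n <- 1:100
partial <- sum(1 / n^2)   # 1 + 1/2^2 + ... + 1/100^2
target <- pi^2 / 6        # Euler's limit for the infinite sum

partial   # about 1.634984
target    # about 1.644934

gap <- target - partial
gap       # about 0.00995, slightly under 1/100
```

The gap is the tail of the series from the 101st term onward, which is why the partial sum falls a little short of the limit.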
Q3: Compute the per 100,000 murder rate for each state and store it in the object murder_rate. Then compute the average murder rate for the US using the function mean. What is the average?
Rate for each state, stored in the object murder_rate
murder_rate <- murders$total / murders$population * 100000
murder_rate
## [1] 2.8244238 2.6751860 3.6295273 3.1893901 3.3741383 1.2924531
## [7] 2.7139722 4.2319369 16.4527532 3.3980688 3.7903226 0.5145920
## [13] 0.7655102 2.8369608 2.1900730 0.6893484 2.2081106 2.6732010
## [19] 7.7425810 0.8280881 5.0748655 1.8021791 4.1786225 0.9992600
## [25] 4.0440846 5.3598917 1.2128379 1.7521372 3.1104763 0.3798036
## [31] 2.7980319 3.2537239 2.6679599 2.9993237 0.5947151 2.6871225
## [37] 2.9589340 0.9396843 3.5977513 1.5200933 4.4753235 0.9825837
## [43] 3.4509357 3.2013603 0.7959810 0.3196211 3.1246001 1.3829942
## [49] 1.4571013 1.7056487 0.8871131
Average
mean(murder_rate)
## [1] 2.779125
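Note that the mean of the state rates weighs every state equally, so it generally differs from the rate for the country as a whole. The sketch below uses made-up totals and populations for three hypothetical regions to show the difference.

```r
# Hypothetical totals and populations for three regions (not real data)
total <- c(10, 200, 30)
population <- c(500000, 4000000, 1500000)

# Per 100,000 rate for each region
rate <- total / population * 100000
rate                      # 2 5 2

# Unweighted average of the regional rates
mean_of_rates <- mean(rate)
mean_of_rates             # 3

# Overall rate: all cases divided by all people
overall_rate <- sum(total) / sum(population) * 100000
overall_rate              # 4
```

The overall rate is higher here because the most populous region also has the highest rate, and the unweighted mean ignores its size.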