R Markdown

EXERCISE 3.6

Q:1 Load the US murders dataset.

Use the function str to examine the structure of the murders object. We can see that this object is a data frame with 51 rows and five columns. Which of the following best describes the variables represented in this data frame?

A. The 51 states.
B. The murder rates for all 50 states and DC.
C. The state name, the abbreviation of the state name, the state's region, and the state's population and total number of murders for 2010.
D. str shows no relevant information.

library(dslabs)

data(murders)
head(murders)
##        state abb region population total
## 1    Alabama  AL  South    4779736   135
## 2     Alaska  AK   West     710231    19
## 3    Arizona  AZ   West    6392017   232
## 4   Arkansas  AR  South    2915918    93
## 5 California  CA   West   37253956  1257
## 6   Colorado  CO   West    5029196    65
str(murders)
## 'data.frame':    51 obs. of  5 variables:
##  $ state     : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ abb       : chr  "AL" "AK" "AZ" "AR" ...
##  $ region    : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
##  $ population: num  4779736 710231 6392017 2915918 37253956 ...
##  $ total     : num  135 19 232 93 1257 ...

Option C is the best answer. The output of str shows five variables: the state name, the abbreviation of the state name, the state's region, and the state's population and total number of murders for 2010.

Q:2 What are the column names used by the data frame for these five variables?

colnames(murders)
## [1] "state"      "abb"        "region"     "population" "total"

Q:3 Use the accessor $ to extract the state abbreviations and assign them to the object a. What is the class of this object?

a <- murders$abb
a
##  [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
## [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
## [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
## [46] "VT" "VA" "WA" "WV" "WI" "WY"
class(a)
## [1] "character"

Q:4 Now use the square brackets to extract the state abbreviations and assign them to the object b. Use the identical function to determine if a and b are the same.

b<-murders[["abb"]]
b
##  [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
## [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
## [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
## [46] "VT" "VA" "WA" "WV" "WI" "WY"
identical(a,b)
## [1] TRUE
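
A note on the bracket choice: double square brackets return the column itself, while single square brackets return a one-column data frame, so identical reports FALSE when comparing against the latter. A minimal sketch, assuming a small data frame df built by hand from the first three rows shown above (df and d are names introduced here for illustration, so the block runs without dslabs):

```r
# Toy stand-in for murders: first three rows, typed in by hand (assumption)
df <- data.frame(abb = c("AL", "AK", "AZ"),
                 total = c(135, 19, 232),
                 stringsAsFactors = FALSE)

a <- df$abb        # accessor: character vector
b <- df[["abb"]]   # double brackets: the same character vector
d <- df["abb"]     # single brackets: a data frame with one column

identical(a, b)    # TRUE
identical(a, d)    # FALSE
class(d)           # "data.frame"
```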

Q:5 We saw that the region column stores a factor. You can corroborate this by typing class(murders$region). With one line of code, use the functions levels and length to determine the number of regions defined by this dataset.

class(murders$region)
## [1] "factor"
length(levels(murders$region))
## [1] 4
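
Base R also has nlevels, which should give the same count in one call. A small sketch on a hand-built factor (the vector region below is made up for illustration and is not the murders column):

```r
# Hypothetical factor with three distinct levels
region <- factor(c("South", "West", "West", "South", "Northeast"))

length(levels(region))   # 3
nlevels(region)          # 3, same count without nesting two calls
```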

Q:6 The function table takes a vector and returns the frequency of each element. You can quickly see how many states are in each region by applying this function. Use this function in one line of code to create a table of states per region.

table(murders$region)
## 
##     Northeast         South North Central          West 
##             9            17            12            13
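
The result of table can be stored and queried by name, which is handy when only one count is needed. A minimal sketch on a made-up character vector (region and tab are illustrative names, not part of the exercise):

```r
# Hypothetical vector of region labels
region <- c("South", "West", "West", "South", "Northeast", "West")

tab <- table(region)          # frequency of each distinct value
tab[["West"]]                 # 3
names(tab)[which.max(tab)]    # "West", the most frequent label
```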

EXERCISE 3.9

Q:1 Use the function c to create a vector with the average high temperatures in January for Beijing, Lagos, Paris, Rio de Janeiro, San Juan and Toronto, which are 35, 88, 42, 84, 81, and 30 degrees Fahrenheit. Call the object temp.

temp<-c('Beijing'=35,'Lagos'=88,'Paris'=42,'Rio de Janeiro'=84, 'San Juan'=81, 'Toronto'=30)
temp
##        Beijing          Lagos          Paris Rio de Janeiro       San Juan 
##             35             88             42             84             81 
##        Toronto 
##             30

Q:2 Now create a vector with the city names and call the object city.

city<-c( 'Beijing', 'Lagos','Paris', 'Rio de Janeiro', 'San Juan', 'Toronto')
city
## [1] "Beijing"        "Lagos"          "Paris"          "Rio de Janeiro"
## [5] "San Juan"       "Toronto"

Q:3 Use the names function and the objects defined in the previous exercises to associate the temperature data with its corresponding city.

city<-c( 'Beijing', 'Lagos','Paris', 'Rio de Janeiro', 'San Juan', 'Toronto')
temp<-c(35, 88, 42, 84, 81,30 )
names(temp)<-city
temp
##        Beijing          Lagos          Paris Rio de Janeiro       San Juan 
##             35             88             42             84             81 
##        Toronto 
##             30
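
The same association can be written as a single expression with setNames (from the stats package, which is attached by default). A sketch for comparison; temp2 is a name introduced here for illustration:

```r
city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")

# Approach used above: assign the names after creating the vector
temp <- c(35, 88, 42, 84, 81, 30)
names(temp) <- city

# Alternative: build the named vector in one step
temp2 <- setNames(c(35, 88, 42, 84, 81, 30), city)

identical(temp, temp2)   # TRUE
```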

Q:4 Use the [ and : operators to access the temperature of the first three cities on the list.

temp[1:3]
## Beijing   Lagos   Paris 
##      35      88      42

Q:5 Use the [ operator to access the temperature of Paris and San Juan.

temp['Paris']
## Paris 
##    42
temp['San Juan']
## San Juan 
##       81
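
Both values can also be pulled out with a single use of [ by passing a vector of names, or the equivalent positions. A minimal sketch that rebuilds temp so it runs on its own:

```r
temp <- c(Beijing = 35, Lagos = 88, Paris = 42,
          "Rio de Janeiro" = 84, "San Juan" = 81, Toronto = 30)

by_name  <- temp[c("Paris", "San Juan")]  # index with a character vector
by_index <- temp[c(3, 5)]                 # same elements by position

identical(by_name, by_index)              # TRUE
```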

Q:6 Use the : operator to create a sequence of numbers 12, 13, 14,..., 73

12:73
##  [1] 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [26] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
## [51] 62 63 64 65 66 67 68 69 70 71 72 73

Q:7 Create a vector containing all the positive odd numbers smaller than 100.

positive_odd_num<-seq(1,100,2)
positive_odd_num
##  [1]  1  3  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49
## [26] 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89 91 93 95 97 99
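
As a cross-check, the same numbers can be obtained by filtering 1:99 with the modulo operator. A sketch, with n and odd as illustrative names:

```r
n <- 1:99
odd <- n[n %% 2 == 1]          # keep values with remainder 1 when divided by 2

length(odd)                    # 50
all(odd == seq(1, 100, 2))     # TRUE
```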

Q:8 Create a vector of numbers that starts at 6, does not pass 55, and adds numbers in increments of 4/7: 6, 6 + 4/7, 6 + 8/7, etc. How many numbers does the list have? Hint: use seq and length.

my_sequence <- seq(from = 6, by = 4/7, to = 55)
my_sequence
##  [1]  6.000000  6.571429  7.142857  7.714286  8.285714  8.857143  9.428571
##  [8] 10.000000 10.571429 11.142857 11.714286 12.285714 12.857143 13.428571
## [15] 14.000000 14.571429 15.142857 15.714286 16.285714 16.857143 17.428571
## [22] 18.000000 18.571429 19.142857 19.714286 20.285714 20.857143 21.428571
## [29] 22.000000 22.571429 23.142857 23.714286 24.285714 24.857143 25.428571
## [36] 26.000000 26.571429 27.142857 27.714286 28.285714 28.857143 29.428571
## [43] 30.000000 30.571429 31.142857 31.714286 32.285714 32.857143 33.428571
## [50] 34.000000 34.571429 35.142857 35.714286 36.285714 36.857143 37.428571
## [57] 38.000000 38.571429 39.142857 39.714286 40.285714 40.857143 41.428571
## [64] 42.000000 42.571429 43.142857 43.714286 44.285714 44.857143 45.428571
## [71] 46.000000 46.571429 47.142857 47.714286 48.285714 48.857143 49.428571
## [78] 50.000000 50.571429 51.142857 51.714286 52.285714 52.857143 53.428571
## [85] 54.000000 54.571429
length(my_sequence)
## [1] 86
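
The count can be checked by hand: the number of steps that fit between 6 and 55 is floor((55 - 6) / (4/7)) = floor(85.75) = 85, plus one for the starting value. A short sketch of that arithmetic (variable names are illustrative):

```r
from <- 6
to   <- 55
by   <- 4/7

n_expected <- floor((to - from) / by) + 1   # 85 full steps + the start value
n_expected                                  # 86
length(seq(from, to, by = by))              # 86
```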

Q:9 What is the class of the following object a <- seq(1, 10, 0.5)?

a <- seq(1, 10, 0.5)
class(a)
## [1] "numeric"

Q:10 What is the class of the following object a <- seq(1, 10)?

a <- seq(1, 10)
class(a)
## [1] "integer"

Q:11 The class of a <- 1 is numeric, not integer. R defaults to numeric; to force an integer, you need to add the letter L. Confirm that the class of 1L is integer.

class(a<-1)
## [1] "numeric"
class(a<-1L) 
## [1] "integer"
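
The two objects hold the same value but differ in storage type, which matters for strict comparisons. A small sketch (x and y are illustrative names):

```r
x <- 1     # stored as double
y <- 1L    # stored as integer

x == y            # TRUE: the values compare equal
identical(x, y)   # FALSE: identical also compares the storage type
typeof(x)         # "double"
typeof(y)         # "integer"
```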

Q:12 Define the following vector: x <- c("1", "3", "5") and coerce it to get integers.

x <- c("1", "3", "5")
x<-as.integer(x)
x
## [1] 1 3 5
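
Coercion only works for strings that look like numbers. An entry that cannot be parsed becomes NA, and R issues a warning. A sketch with a made-up vector y (the warning is suppressed here only to keep the example quiet):

```r
y <- c("1", "b", "3")                   # "b" cannot be read as a number
z <- suppressWarnings(as.integer(y))    # without suppressWarnings, R warns about NAs

z           # 1 NA 3
is.na(z)    # FALSE TRUE FALSE
```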