Assignment 1

Questions 1 - 4 in Section 3.3 Exercises on page 52

3.3.1 What is the sum of the first 100 positive integers? The formula for the sum of integers 1 through n is n(n + 1)/2. Define n = 100 and then use R to compute the sum of 1 through 100 using the formula.

What is the sum?

# Sum of the integers 1 through n, using the closed-form formula
sum_n <- function(n) {
  n * (n + 1) / 2
}

sum_n(100)

## [1] 5050
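As a quick sanity check, the closed-form result can be compared with a direct sum. This is only a sketch: `n` follows the exercise text, while `closed_form` and `direct` are illustrative names that do not appear in the original.

```r
# Define n as the exercise suggests, then apply the formula directly.
n <- 100
closed_form <- n * (n + 1) / 2

# Add the integers 1 through n explicitly for comparison.
direct <- sum(1:n)

# TRUE when the formula and the explicit sum agree.
closed_form == direct
```

Both approaches should agree for any positive integer n; the formula simply avoids building the vector.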

3.3.2 Now use the same formula to compute the sum of the integers from 1 through 1,000.

# Question 2
sum_n(1000)

## [1] 500500

3.3.3

Look at the result of typing the following code into R: n <- 1000; x <- seq(1, n); sum(x)

Based on the result, what do you think the functions seq and sum do? You can use the help system.

A. sum creates a list of numbers and seq adds them up.
B. seq creates a list of numbers and sum adds them up.
C. seq computes the difference between two arguments and sum computes the sum of 1 through 1000.
D. sum always returns the same number.

n <- 1000
x <- seq(1, n)
sum(x)

## [1] 500500

# Answer: B. seq creates a list of numbers and sum adds them up.

3.3.4

In math and programming, we say that we evaluate a function when we replace the argument with a given number. So if we type sqrt(4), we evaluate the sqrt function. In R, you can evaluate a function inside another function. The evaluations happen from the inside out. Use one line of code to compute the log, in base 10, of the square root of 100.

log(sqrt(100), 10)

## [1] 1
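The inside-out evaluation described above can be made explicit by storing the intermediate value. This is a minimal sketch; the names `inner`, `stepwise`, and `nested` are illustrative and not part of the original answer.

```r
# Step by step: the inner call runs first, then the outer one.
inner <- sqrt(100)
stepwise <- log(inner, base = 10)

# The same computation as a single nested call.
# log10(x) is a shorthand for log(x, base = 10).
nested <- log10(sqrt(100))

# TRUE when both routes give the same value.
isTRUE(all.equal(stepwise, nested))
```

Naming the `base` argument, or using `log10`, makes the intent clearer than passing 10 by position.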

Questions 1 & 3 in Section 3.5 Exercises on page 58

library(dslabs)
data(murders)

str(murders)

## 'data.frame':    51 obs. of  5 variables:
##  $ state     : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ abb       : chr  "AL" "AK" "AZ" "AR" ...
##  $ region    : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
##  $ population: num  4779736 710231 6392017 2915918 37253956 ...
##  $ total     : num  135 19 232 93 1257 ...

3.5.1

Which of the following best describes the variables represented in this data frame?

C. The state name, the abbreviation of the state name, the state’s region, and the state’s population and total number of murders for 2010.

3.5.3 Use the accessor $ to extract the state abbreviations and assign them to the object a. What is the class of this object?

a <- murders$abb

class(a)

## [1] "character"
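The `$` accessor has an equivalent double-bracket form that returns the same vector. The sketch below uses a small stand-in data frame called `toy` (a hypothetical name, with just two rows copied from the `str` output above) so that it runs without the dslabs package.

```r
# Toy data frame standing in for murders (two rows only).
toy <- data.frame(state = c("Alabama", "Alaska"),
                  abb = c("AL", "AK"),
                  stringsAsFactors = FALSE)

by_dollar <- toy$abb         # accessor used in the exercise
by_bracket <- toy[["abb"]]   # equivalent double-bracket form

identical(by_dollar, by_bracket)
class(by_dollar)
```

The double-bracket form is handy when the column name is stored in a variable, since `$` takes the name literally.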

Questions 1 - 6, 9 & 12 in Section 3.8 Exercises on page 63

Use the function c to create a vector with the average high temperatures in January for Beijing, Lagos, Paris, Rio de Janeiro, San Juan and Toronto, which are 35, 88, 42, 84, 81, and 30 degrees Fahrenheit. Call the object temp.

temp <- c(35, 88, 42, 84, 81, 30)

Now create a vector with the city names and call the object city.

city <- c("Beijing", "Lagos", "Paris", "Rio de Janeiro", "San Juan", "Toronto")

Use the names function and the objects defined in the previous exercises to associate the temperature data with its corresponding city.

names(temp) <- city
temp

##        Beijing          Lagos          Paris Rio de Janeiro       San Juan 
##             35             88             42             84             81 
##        Toronto 
##             30

Use the [ and : operators to access the temperature of the first three cities on the list.

temp[1:3]

## Beijing   Lagos   Paris 
##      35      88      42

Use the [ operator to access the temperature of Paris and San Juan.

temp[c("Paris", "San Juan")]

##    Paris San Juan 
##       42       81

Use the : operator to create a sequence of numbers 12, 13, 14, . . . , 73.

12:73

##  [1] 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## [26] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
## [51] 62 63 64 65 66 67 68 69 70 71 72 73

What is the class of the following object a <- seq(1, 10, 0.5)?

a <- seq(1, 10, 0.5)
class(a)

## [1] "numeric"
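For contrast, a sequence built with the `:` operator, or a literal written with the `L` suffix, is stored as an integer, while a sequence with a fractional step must be stored as numeric. A short sketch with illustrative variable names:

```r
half_steps <- seq(1, 10, 0.5)  # contains values such as 1.5
whole_steps <- 1:10            # built with the : operator

class(half_steps)
class(whole_steps)

# A single literal is numeric unless it carries the L suffix.
class(1)
class(1L)
```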

Define the following vector: x <- c("1", "3", "5") and coerce it to get integers.

x <- c("1", "3", "5")
as.integer(x)

## [1] 1 3 5
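The printed result looks the same whichever coercion function is used, so `class` is the way to tell the two apart. The sketch below compares them and also shows what happens to text that is not a number; the names `as_num`, `as_int`, and `bad` are illustrative.

```r
x <- c("1", "3", "5")

as_num <- as.numeric(x)  # double-precision values
as_int <- as.integer(x)  # integer values

class(as_num)
class(as_int)

# Text that cannot be parsed as a number becomes NA.
# R normally warns about this; the warning is silenced here.
bad <- suppressWarnings(as.integer(c("1", "b", "3")))
bad
```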

Questions 1, 2, 5 & 7 in Section 3.10 Exercises on page 66

library(dslabs)
data("murders")

Use the $ operator to access the population size data and store it as the object pop. Then use the sort function to redefine pop so that it is sorted. Finally, use the [ operator to report the smallest population size.

pop <- murders$population

sort(pop)[1]

## [1] 563626

Now instead of the smallest population size, find the index of the entry with the smallest population size. Hint: use order instead of sort.

order(pop)[1]

## [1] 51
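The relationship between `sort` and `order` can be checked on a small vector: indexing by the result of `order` reproduces the sorted vector, and the first index matches `which.min`. The vector `pop_toy` is a hypothetical stand-in holding four population values taken from the output in this document.

```r
# Four population values (Alabama, Alaska, Arizona, Wyoming).
pop_toy <- c(4779736, 710231, 6392017, 563626)

sorted <- sort(pop_toy)   # the values, smallest first
idx <- order(pop_toy)     # the positions that would sort the vector

# Indexing with the ordering reproduces the sorted vector.
identical(pop_toy[idx], sorted)

# The first position in the ordering is the position of the minimum.
idx[1] == which.min(pop_toy)
```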

Questions 1 - 6 in Section 3.14 Exercises on page 71

Start by loading the library and data.

Compute the per 100,000 murder rate for each state and store it in an object called murder_rate. Then use logical operators to create a logical vector named low that tells us which entries of murder_rate are lower than 1.

# Store the rate and the logical indicator as columns of murders
murders$murder_rate <- murders$total / murders$population * 100000
murders$low <- murders$murder_rate < 1
murders

##                   state abb        region population total murder_rate   low
## 1               Alabama  AL         South    4779736   135   2.8244238 FALSE
## 2                Alaska  AK          West     710231    19   2.6751860 FALSE
## 3               Arizona  AZ          West    6392017   232   3.6295273 FALSE
## 4              Arkansas  AR         South    2915918    93   3.1893901 FALSE
## 5            California  CA          West   37253956  1257   3.3741383 FALSE
## 6              Colorado  CO          West    5029196    65   1.2924531 FALSE
## 7           Connecticut  CT     Northeast    3574097    97   2.7139722 FALSE
## 8              Delaware  DE         South     897934    38   4.2319369 FALSE
## 9  District of Columbia  DC         South     601723    99  16.4527532 FALSE
## 10              Florida  FL         South   19687653   669   3.3980688 FALSE
## 11              Georgia  GA         South    9920000   376   3.7903226 FALSE
## 12               Hawaii  HI          West    1360301     7   0.5145920  TRUE
## 13                Idaho  ID          West    1567582    12   0.7655102  TRUE
## 14             Illinois  IL North Central   12830632   364   2.8369608 FALSE
## 15              Indiana  IN North Central    6483802   142   2.1900730 FALSE
## 16                 Iowa  IA North Central    3046355    21   0.6893484  TRUE
## 17               Kansas  KS North Central    2853118    63   2.2081106 FALSE
## 18             Kentucky  KY         South    4339367   116   2.6732010 FALSE
## 19            Louisiana  LA         South    4533372   351   7.7425810 FALSE
## 20                Maine  ME     Northeast    1328361    11   0.8280881  TRUE
## 21             Maryland  MD         South    5773552   293   5.0748655 FALSE
## 22        Massachusetts  MA     Northeast    6547629   118   1.8021791 FALSE
## 23             Michigan  MI North Central    9883640   413   4.1786225 FALSE
## 24            Minnesota  MN North Central    5303925    53   0.9992600  TRUE
## 25          Mississippi  MS         South    2967297   120   4.0440846 FALSE
## 26             Missouri  MO North Central    5988927   321   5.3598917 FALSE
## 27              Montana  MT          West     989415    12   1.2128379 FALSE
## 28             Nebraska  NE North Central    1826341    32   1.7521372 FALSE
## 29               Nevada  NV          West    2700551    84   3.1104763 FALSE
## 30        New Hampshire  NH     Northeast    1316470     5   0.3798036  TRUE
## 31           New Jersey  NJ     Northeast    8791894   246   2.7980319 FALSE
## 32           New Mexico  NM          West    2059179    67   3.2537239 FALSE
## 33             New York  NY     Northeast   19378102   517   2.6679599 FALSE
## 34       North Carolina  NC         South    9535483   286   2.9993237 FALSE
## 35         North Dakota  ND North Central     672591     4   0.5947151  TRUE
## 36                 Ohio  OH North Central   11536504   310   2.6871225 FALSE
## 37             Oklahoma  OK         South    3751351   111   2.9589340 FALSE
## 38               Oregon  OR          West    3831074    36   0.9396843  TRUE
## 39         Pennsylvania  PA     Northeast   12702379   457   3.5977513 FALSE
## 40         Rhode Island  RI     Northeast    1052567    16   1.5200933 FALSE
## 41       South Carolina  SC         South    4625364   207   4.4753235 FALSE
## 42         South Dakota  SD North Central     814180     8   0.9825837  TRUE
## 43            Tennessee  TN         South    6346105   219   3.4509357 FALSE
## 44                Texas  TX         South   25145561   805   3.2013603 FALSE
## 45                 Utah  UT          West    2763885    22   0.7959810  TRUE
## 46              Vermont  VT     Northeast     625741     2   0.3196211  TRUE
## 47             Virginia  VA         South    8001024   250   3.1246001 FALSE
## 48           Washington  WA          West    6724540    93   1.3829942 FALSE
## 49        West Virginia  WV         South    1852994    27   1.4571013 FALSE
## 50            Wisconsin  WI North Central    5686986    97   1.7056487 FALSE
## 51              Wyoming  WY          West     563626     5   0.8871131  TRUE

Now use the results from the previous exercise and the function which to determine the indices of murder_rate associated with values lower than 1.

which(murders$low)

##  [1] 12 13 16 20 24 30 35 38 42 45 46 51

Use the results from the previous exercise to report the names of the states with murder rates lower than 1.

murders$state[which(murders$low)]

##  [1] "Hawaii"        "Idaho"         "Iowa"          "Maine"        
##  [5] "Minnesota"     "New Hampshire" "North Dakota"  "Oregon"       
##  [9] "South Dakota"  "Utah"          "Vermont"       "Wyoming"

Now extend the code from exercises 2 and 3 to report the states in the Northeast with murder rates lower than 1. Hint: use the previously defined logical vector low and the logical operator &.

murders$state[which(murders$low & murders$region == "Northeast")]

## [1] "Maine"         "New Hampshire" "Vermont"
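A logical vector can also be used as an index directly, without `which`. The sketch below shows both forms on a hypothetical four-row data frame named `toy`, with rates rounded from the table above, so it runs without the dslabs package.

```r
# Four rows with rates rounded from the table above.
toy <- data.frame(
  state = c("Maine", "Texas", "Vermont", "Hawaii"),
  region = c("Northeast", "South", "Northeast", "West"),
  rate = c(0.83, 3.20, 0.32, 0.51),
  stringsAsFactors = FALSE
)

low <- toy$rate < 1
northeast <- toy$region == "Northeast"

via_which <- toy$state[which(low & northeast)]   # index by position
via_logical <- toy$state[low & northeast]        # index by logical vector

identical(via_which, via_logical)
```

The two forms differ only when the condition contains NA: `which` drops those entries, whereas a logical index returns NA for them.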

Questions 1 - 3 in Section 3.16 Exercises on page 74

We made a plot of total murders versus population and noted a strong relationship. Not surprisingly, states with larger populations had more murders.

library(dslabs)
data(murders)
population_in_millions <- murders$population/10^6
total_gun_murders <- murders$total
plot(population_in_millions, total_gun_murders)

Keep in mind that many states have populations below 5 million and are bunched up. We may gain further insights from making this plot in the log scale. Transform the variables using the log10 transformation and then plot them.

plot(log10(population_in_millions), log10(total_gun_murders))
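An alternative to transforming the variables is to leave the data alone and ask `plot` for log-scaled axes through its `log` argument. The sketch below uses a few illustrative values (`pop_m` and `total` are hypothetical names, with numbers loosely based on rows of the table earlier in this document) and draws to a null device so that it also runs non-interactively.

```r
# Illustrative values only, not the full murders data.
pop_m <- c(0.6, 1.3, 5.0, 19.4, 37.3)   # population in millions
total <- c(5, 11, 65, 517, 1257)        # total murders

pdf(NULL)  # null device: nothing is written to disk

# Option 1: transform the data, keep linear axes.
plot(log10(pop_m), log10(total))

# Option 2: keep the data, use log-scaled axes.
plot(pop_m, total, log = "xy")

invisible(dev.off())
```

With the second option the axis labels stay in the original units. Either way, a value of zero cannot be shown on a log scale, since `log10(0)` is `-Inf`.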

Create a histogram of the state populations.

hist(murders$population)

hist(with(murders, log10(population)))

Generate boxplots of the state populations by region.

# Population in millions, grouped by region
boxplot((population / 10^6) ~ region, data = murders)