3.3 Exercises 3 & 4 on page 52 (Questions 1 - 4 in Section 3.3 were covered in Assignment 1)

3.5 Exercises 1, 3 & 5 on pages 58 & 59 (Questions 1 & 3 in Section 3.5 were covered in Assignment 1. Question 5 follows.)

3.5.5.
We saw that the region column stores a factor.
You can corroborate this by typing:
class(murders$region)
With one line of code, use the functions levels and length
to determine the number of regions defined by this dataset.
A: 4 regions (code follows)
library(dslabs)
data(murders)

with(murders, length(levels(region)))
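The same levels/length pattern can be tried without the dslabs package on a small made-up factor. This is only a sketch: `toy_region` is a hypothetical vector, not the murders data, and base R's `nlevels()` is shown as an equivalent shortcut.

```r
# Hypothetical factor standing in for murders$region (values invented for illustration)
toy_region <- factor(c("South", "West", "West", "Northeast", "South", "North Central"))

# Same one-liner as the answer: count the distinct levels
n_levels <- length(levels(toy_region))

# Base R shortcut that should return the same count
n_levels_short <- nlevels(toy_region)

print(n_levels)
print(n_levels_short)
```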

3.8 Exercises 1, 8 & 12 on page 63 (Questions 1 - 6, 9 & 12 in Section 3.8 were covered in Assignment 1. Question 8 follows.)

3.8.8. Create a vector of numbers that starts at 6,
does not pass 55, and adds numbers in increments of 4/7:
6, 6+4/7, 6+8/7, etc. How many numbers does the list have?
Hint: use seq and length.
A: 86 elements (code follows)
my_vector = seq(6, 55, 4/7)
length(my_vector)
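The count can also be checked by hand: the range is 55 - 6 = 49, and 49 / (4/7) = 85.75, so 85 whole steps fit, plus one for the starting value. A minimal sketch of that arithmetic (the names `from`, `to`, `by` and `expected_length` are introduced here for illustration):

```r
from <- 6
to <- 55
by <- 4 / 7

# Number of whole steps that fit in the range, plus one for the starting value
expected_length <- floor((to - from) / by) + 1

my_vector <- seq(from, to, by)

print(expected_length)
print(length(my_vector))
print(max(my_vector))   # last element; should not exceed 55
```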

3.10 Exercises 5 & 7 on page 66 (Questions 1, 2, 5 & 7 in Section 3.10 were covered in assignment 1)

3.12 Exercises 2 & 3 on page 68
3.12.2. What is the sum 1 + 1/2^2 +
1/3^2 + ... + 1/100^2?
Hint: thanks to Euler, we know it should be
close to (pi^2)/6.
A: 1.634984 (code follows)

sum(1 / seq(1, 100) ^2)
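To see how close the 100-term sum is to the limit mentioned in the hint, the two values can be compared directly. This is a small sketch; the variable names are chosen here for illustration.

```r
partial_sum <- sum(1 / seq(1, 100)^2)
euler_limit <- pi^2 / 6

# Gap between the 100-term partial sum and the limit
gap <- euler_limit - partial_sum

print(partial_sum)
print(euler_limit)
print(gap)
```

The gap should come out a little under 1/100, which is consistent with the remaining tail of the series being roughly 1/n after n terms.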
3.12.3 Compute the per 100,000 murder rate for
each state and store it in the object murder_rate.
Then compute the average murder rate for the US
using the function mean. What is the average?
A: The average is 2.779125 (code follows)
murder_rate <- with(murders, total / population * 10^5)

mean(murder_rate)

3.14 Exercises 7 & 8 on page 71

3.14.7 Use the %in% operator to create a logical
vector that answers the question: which of the
following are actual abbreviations: MA, ME, MI, MO, MU?
A: MU is not a valid abbreviation (code follows)
abbreviations = c("MA", "ME", "MI", "MO", "MU")
abbreviations %in% murders$abb
3.14.8. Extend the code you used in exercise 7
to report the one entry that is not an actual
abbreviation.
Hint: use the ! operator, which turns FALSE into TRUE and
vice versa, then which to obtain an index.
A: (see code)
abbreviations[which(!abbreviations %in% murders$abb)]
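The same logic can be tried without dslabs by swapping in base R's built-in `state.abb` vector. That swap is an assumption made for this sketch: `state.abb` holds the 50 state abbreviations, whereas `murders$abb` also includes DC. The sketch also shows that subsetting with the negated logical vector gives the same entry as going through `which`.

```r
abbreviations <- c("MA", "ME", "MI", "MO", "MU")

# state.abb ships with base R (datasets package); used here in place of murders$abb
is_valid <- abbreviations %in% state.abb

# With which(): positions of the FALSE entries, then subset by index
not_valid_by_index <- abbreviations[which(!is_valid)]

# Without which(): subset directly with the negated logical vector
not_valid_by_logical <- abbreviations[!is_valid]

print(is_valid)
print(not_valid_by_index)
print(not_valid_by_logical)
```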

3.16 Exercises 1 - 3 on page 74 (Questions 1 - 3 in Section 3.16 were covered in Assignment 1)

4.6 Exercises 1, 2 & 4 on pages 81 & 82

4.6.6 After running the code below, what is the
value of x?
x <- 3
my_func <- function(y){
  x <- 5
  y+5
}

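A: The value of x remains 3. The function is defined but
never called, and even if it were called, the assignment
x <- 5 would only create a local variable inside the function.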
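A quick way to confirm this is to inspect x after defining the function and again after calling it. In the sketch below, the helper names `x_after_definition`, `result` and `x_after_call` are introduced only for illustration.

```r
x <- 3

my_func <- function(y) {
  x <- 5   # creates a local x inside the function; the global x is untouched
  y + 5
}

# Defining the function does not run its body
x_after_definition <- x

# Even calling it leaves the global x alone
result <- my_func(1)
x_after_call <- x

print(x_after_definition)
print(result)
print(x_after_call)
```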
5.15 Exercise 4 on page 103
Write tidyverse code that is equivalent to this code:
exp(mean(log(murders$population))). Write
it using the pipe so that each function is called
without arguments. Use the dot operator to access the
population.
Hint: The code should start with murders %>%.
A: Code follows
# install.packages("tidyverse")
library(tidyverse)
murders %>% 
  .$population %>%
  log  %>% 
  mean %>% 
  exp 
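The expression exp(mean(log(x))) is the geometric mean of x. A minimal base R sketch on made-up values (the vector `toy_population` is hypothetical, not the murders data) checks it against the direct definition:

```r
# Hypothetical values, chosen so the answer is easy to verify
toy_population <- c(1, 10, 100)

# Same nesting as the exercise: exp of the mean of the logs
via_logs <- exp(mean(log(toy_population)))

# Direct definition of the geometric mean: n-th root of the product
via_product <- prod(toy_population)^(1 / length(toy_population))

print(via_logs)
print(via_product)
```

Both should agree up to floating-point rounding. For long vectors of large values such as state populations, the log form is the safer choice because the raw product can overflow.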
# install.packages("NHANES")
library("NHANES")
data("na_example")  # na_example comes from dslabs (loaded earlier), not from NHANES
mean(na_example, na.rm = TRUE)
sd(na_example, na.rm = TRUE)
5.9.1. We will provide some basic facts about blood
pressure. First let’s select a group to set the
standard.
We will use 20-29 year old females. AgeDecade is
a categorical variable with these ages. Note that
the category is coded like " 20-29", with a space
in front! What are the average and standard deviation
of systolic blood pressure as saved in the BPSysAve
variable? Save it to a variable called ref.
Hint: Use filter and summarize and use the na.rm = TRUE
argument when computing the average and standard
deviation. You can also filter the NA values using filter.
data("NHANES")

ref = NHANES %>%
  filter(AgeDecade == " 20-29" & Gender == 'female') %>% 
  summarize(average=mean(BPSysAve, na.rm=TRUE), std_dev=sd(BPSysAve, na.rm=TRUE) )
  
ref
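The leading space in the category label matters: a comparison against "20-29" without the space matches nothing. The base R sketch below illustrates this on a made-up factor (`toy_decade` is hypothetical, not the NHANES column) and shows `trimws()` as one possible way to strip the padding.

```r
# Hypothetical category labels mimicking the leading-space coding of AgeDecade
toy_decade <- factor(c(" 20-29", " 30-39", " 20-29"))

# Comparing without the space matches nothing
matches_without_space <- toy_decade == "20-29"

# Comparing with the space matches as intended
matches_with_space <- toy_decade == " 20-29"

# trimws() strips the padding if cleaner labels are preferred
trimmed_matches <- trimws(as.character(toy_decade)) == "20-29"

print(levels(toy_decade))   # shows the stored labels, including the space
print(sum(matches_without_space))
print(sum(matches_with_space))
print(sum(trimmed_matches))
```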
5.9.2. Using a pipe, assign the average to a numeric
variable ref_avg.
Hint: Use code similar to the above and then pull.
data("NHANES")

ref_avg = ref %>% pull(average)
  
ref_avg
5.9.3. Now report the min and max values for the same group.
NHANES %>%
  filter(AgeDecade == " 20-29" & Gender == 'female') %>% 
  summarize(min=min(BPSysAve, na.rm=TRUE), max=max(BPSysAve, na.rm=TRUE) )
5.9.4. Compute the average and standard deviation for females,
but for each age group separately rather than a selected decade
as in question 1.
Note that the age groups are defined by AgeDecade.
Hint: rather than filtering by age and gender, filter by Gender
and then use group_by.
NHANES %>% 
  filter(Gender=='female') %>%
  group_by(AgeDecade) %>%
  summarize(average=mean(BPSysAve, na.rm=TRUE), 
            std_dev=sd(BPSysAve, na.rm=TRUE) )
  
5.9.5. Repeat exercise 4 for males.
NHANES %>% 
  filter(Gender=='male') %>%
  group_by(AgeDecade) %>%
  summarize(average=mean(BPSysAve, na.rm=TRUE), 
            std_dev=sd(BPSysAve, na.rm=TRUE) )
  
5.9.6. We can actually combine both summaries for exercises 4 and 5
into one line of code. This is because group_by permits us to
group by more than one variable. Obtain one big summary table
using group_by(AgeDecade, Gender).
NHANES %>%
  group_by(AgeDecade, Gender) %>%
  summarize(average=mean(BPSysAve, na.rm=TRUE), 
            std_dev=sd(BPSysAve, na.rm=TRUE) )
  
5.9.7. For males between the ages of 40-49, compare systolic
blood pressure across race as reported in the Race1 variable.
Order the resulting table from lowest to highest average
systolic blood pressure.

NHANES %>% 
  filter(Gender=='male', AgeDecade==' 40-49')%>% 
  group_by(Race1) %>%
  summarize(average=mean(BPSysAve, na.rm=TRUE), 
            std_dev=sd(BPSysAve, na.rm=TRUE) ) %>%
  arrange(average)
  
  
