---
title: "R Notebook"
output: html_notebook
---





3.3 Exercises 3 & 4 on page 52 
(Questions 1 - 4 in Section 3.3 were covered in Assignment 1)

3.5 Exercises 1, 3 & 5 on pages 58 & 59 
(Questions 1 & 3 in Section 3.5 were covered in Assignment 1. Question 5 follows.)



##### 3.5.5. 
##### We saw that the region column stores a factor. 
##### You can corroborate this by typing:
##### class(murders$region)
##### With one line of code, use the functions levels and length
##### to determine the number of regions defined by this dataset.

##### A: 4 regions (code follows)

```{r}
library(dslabs)
data(murders)

with(murders, length(levels(region)))
```
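As a quick cross-check, base R's `nlevels()` should return the same count as `length(levels(...))`. The sketch below uses a small made-up factor (`toy_region` is a hypothetical name, not part of the dataset) so it runs without any packages.

```{r}
# Hypothetical toy factor standing in for murders$region
toy_region <- factor(c("South", "West", "West", "Northeast", "North Central", "South"))

# levels() lists the distinct categories; length() counts them
n_by_length <- length(levels(toy_region))

# nlevels() is a base R shortcut for the same count
n_by_nlevels <- nlevels(toy_region)

n_by_length == n_by_nlevels
```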


3.8 Exercises 1, 8 & 12 on page 63 
(Questions 1 - 6, 9 & 12 in Section 3.8 were covered in Assignment 1. Question 8 follows.)


##### 3.8.8. Create a vector of numbers that starts at 6,
##### does not pass 55, and adds numbers in increments of 4/7:
##### 6, 6+4/7, 6+8/7, etc. How many numbers does the list have?
##### Hint: use seq and length.

##### A: 86 elements (code follows)

```{r}
# Sequence from 6 up to at most 55 in steps of 4/7
my_vector <- seq(6, 55, by = 4/7)
length(my_vector)
```
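The count can also be checked by hand: the number of full steps of size 4/7 that fit between 6 and 55, plus one for the starting value. The names `n_expected` and `last_value` below are introduced only for this illustration.

```{r}
# Full steps of size 4/7 that fit between 6 and 55, plus the starting value
n_expected <- floor((55 - 6) / (4 / 7)) + 1
n_expected

# The last element stays at or below the upper bound of 55
last_value <- 6 + (n_expected - 1) * (4 / 7)
last_value <= 55
```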

3.10 Exercises 5 & 7 on page 66 
(Questions 1, 2, 5 & 7 in Section 3.10 were covered in Assignment 1)

3.12 Exercises 2 & 3 on page 68

##### 3.12.2. What is the following sum: 1 + 1/2^2 + 
##### 1/3^2 + ... + 1/100^2? 
##### Hint: thanks to Euler, we know it should be
##### close to (pi^2)/6.

##### A: 1.634984 (code follows)

 
```{r}

# Partial sum of 1/k^2 for k = 1, ..., 100
sum(1 / seq(1, 100)^2)
```
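To see how close the partial sum is to the value in the hint, it can be compared with pi^2/6 directly. This is a small base R sketch; `partial_sum` and `euler_limit` are names chosen here for illustration.

```{r}
partial_sum <- sum(1 / (1:100)^2)  # same sum as in the answer
euler_limit <- pi^2 / 6            # limit of the infinite series

# The gap is the tail of the series beyond the 100th term, a little under 1/100
euler_limit - partial_sum
```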

##### 3.12.3 Compute the per 100,000 murder rate for 
##### each state and store it in the object murder_rate.
##### Then compute the average murder rate for the US 
##### using the function mean. What is the average?

##### A: The average is 2.779125 (code follows)
```{r}
# Murders per 100,000 residents, stored as a new column
murders$murder_rate <- with(murders, total / population * 10^5)

# Unweighted average of the state rates
mean(murders$murder_rate)
```
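One point worth keeping in mind: `mean()` of per-state rates weights every state equally, so it is generally not the same as the rate for the combined population. The sketch below illustrates this with made-up numbers (`toy` and its values are hypothetical, not taken from the murders data).

```{r}
# Hypothetical toy data: two states with very different populations
toy <- data.frame(
  total      = c(10, 1000),
  population = c(1e5, 2e6)
)

# Per-state rates per 100,000 residents
toy_rate <- toy$total / toy$population * 10^5

# Unweighted mean of the state rates
unweighted <- mean(toy_rate)

# Rate for the combined population, which weights each state by its size
combined <- sum(toy$total) / sum(toy$population) * 10^5

c(unweighted = unweighted, combined = combined)
```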


3.14 Exercises 7 & 8 on page 71

##### 3.14.7 Use the %in% operator to create a logical 
##### vector that answers the question: which of the 
##### following are actual abbreviations: MA, ME, MI, MO, MU?

##### A: MU is not a valid abbreviation. (Code follows)
```{r}
abbreviations = c("MA", "ME", "MI", "MO", "MU")
abbreviations %in% murders$abb

```

##### 3.14.8. Extend the code you used in exercise 7 
##### to report the one entry that is not an actual 
##### abbreviation.
##### Hint: use the ! operator, which turns FALSE into TRUE and
##### vice versa, then which to obtain an index.

##### A: (see code)

```{r}
abbreviations[which(!abbreviations %in% murders$abb)]

```
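Two alternative ways to get the same entry are sketched below: logical indexing without `which()`, and `setdiff()`. To keep the chunk runnable without dslabs, it assumes base R's built-in `state.abb` as a stand-in for `murders$abb`; `is_actual` is a name introduced here.

```{r}
abbreviations <- c("MA", "ME", "MI", "MO", "MU")

# Assumption: base R's built-in state.abb stands in for murders$abb here
is_actual <- abbreviations %in% state.abb

# Logical indexing works directly, without which()
abbreviations[!is_actual]

# setdiff() returns the elements of the first vector missing from the second
setdiff(abbreviations, state.abb)
```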

3.16 Exercises 1 - 3 on page 74
(Questions 1 - 3 in Section 3.16 were covered in Assignment 1)

4.6 Exercises 1, 2 & 4 on pages 81 & 82
##### 4.6.6 After running the code below, what is the
##### value of x?
##### x <- 3
#####  my_func <- function(y){
##### x <- 5
##### y+5
##### }

##### A: The value of x remains 3. The function is never called, 
##### and even if it were, x <- 5 would only create a local 
##### variable inside the function.
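The scoping behaviour can be checked by actually calling the function. The sketch below repeats the code from the question and adds one call; `result` is a name introduced here for illustration.

```{r}
x <- 3
my_func <- function(y) {
  x <- 5  # creates a local x that exists only during the call
  y + 5
}

result <- my_func(1)  # calling the function this time
result                # 6
x                     # still 3
```

Assignment with `<-` inside a function body binds the name in the function's own environment, so the global `x` is left untouched.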


5.15 Exercise 4 on page 103

##### Write tidyverse code that is equivalent to this code: 
##### exp(mean(log(murders$population))). Write
##### it using the pipe so that each function is called 
##### without arguments. Use the dot operator to access the
##### population.

##### Hint: The code should start with murders %>%.

##### A: Code follows
```{r}
# install.packages("tidyverse")  # run once if the package is missing
library(tidyverse)
murders %>% 
  .$population %>%
  log  %>% 
  mean %>% 
  exp 
```
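The expression `exp(mean(log(x)))` computes the geometric mean of `x`. A minimal base R sketch on made-up values (`toy_pop` is hypothetical, not taken from the murders dataset) compares it with the direct definition, the n-th root of the product.

```{r}
# Hypothetical toy values, not taken from the murders dataset
toy_pop <- c(1, 10, 100)

# Same nested form as in the exercise
via_logs <- exp(mean(log(toy_pop)))

# Direct definition of the geometric mean: n-th root of the product
via_product <- prod(toy_pop)^(1 / length(toy_pop))

all.equal(via_logs, via_product)
```

The log form is usually preferable for large values such as populations, since the raw product can overflow.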



```{r}
# install.packages("NHANES")  # run once if the package is missing
library("NHANES")
# na_example ships with dslabs (loaded above), not with NHANES
data("na_example")
```
```{r}
mean(na_example, na.rm = TRUE)
```


```{r}
sd(na_example, na.rm = TRUE)
```
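The role of `na.rm = TRUE` is easiest to see on a tiny vector. The sketch below uses made-up values (`toy_values` is hypothetical) rather than `na_example`.

```{r}
# Hypothetical toy vector with one missing value
toy_values <- c(2, 4, NA, 6)

mean(toy_values)                # NA, because the missing value propagates
mean(toy_values, na.rm = TRUE)  # 4, the mean of 2, 4 and 6
sum(is.na(toy_values))          # 1 missing entry
```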
##### 5.9.1. We will provide some basic facts about blood 
##### pressure. First let’s select a group to set the 
##### standard.

##### We will use 20-29 year old females. AgeDecade is 
##### a categorical variable with these ages. Note that 
##### the category is coded like " 20-29", with a space 
##### in front! What are the average and standard deviation 
##### of systolic blood pressure as saved in the BPSysAve 
##### variable? Save it to a variable called ref.

##### Hint: Use filter and summarize and use the na.rm = TRUE 
##### argument when computing the average and standard 
##### deviation. You can also filter the NA values using filter.

```{r}
data("NHANES")

ref = NHANES %>%
  filter(AgeDecade == " 20-29" & Gender == 'female') %>% 
  summarize(average=mean(BPSysAve, na.rm=TRUE), std_dev=sd(BPSysAve, na.rm=TRUE) )
  
ref

```
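The leading space in the category label is easy to miss, and a comparison without it silently matches nothing. The sketch below shows the effect on a made-up factor (`toy_decade` is hypothetical and only mimics how the labels are coded).

```{r}
# Hypothetical toy factor mimicking how AgeDecade labels are stored
toy_decade <- factor(c(" 20-29", " 30-39", " 20-29"))

n_no_space  <- sum(toy_decade == "20-29")   # no label matches without the space
n_exact     <- sum(toy_decade == " 20-29")  # exact label, space included
n_trimmed   <- sum(trimws(as.character(toy_decade)) == "20-29")  # padding stripped first

c(n_no_space, n_exact, n_trimmed)
```

An empty filter result produces `NaN` or `NA` summaries rather than an error, so checking the row count after filtering is a cheap safeguard.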


##### 5.9.2. Using a pipe, assign the average to a numeric 
##### variable ref_avg. 
##### Hint: Use code similar to the above and then pull.

```{r}
data("NHANES")

ref_avg = ref %>% pull(average)
  
ref_avg

```
##### 5.9.3. Now report the min and max values for the same group.

```{r}
NHANES %>%
  filter(AgeDecade == " 20-29" & Gender == 'female') %>% 
  summarize(min=min(BPSysAve, na.rm=TRUE), max=max(BPSysAve, na.rm=TRUE) )
```


##### 5.9.4. Compute the average and standard deviation for females, 
##### but for each age group separately rather than a selected decade
##### as in question 1. 

##### Note that the age groups are defined by AgeDecade. 

##### Hint: rather than filtering by age and gender, filter by Gender 
##### and then use group_by.

```{r}
NHANES %>% 
  filter(Gender=='female') %>%
  group_by(AgeDecade) %>%
  summarize(average=mean(BPSysAve, na.rm=TRUE), 
            std_dev=sd(BPSysAve, na.rm=TRUE) )
  
```

##### 5.9.5. Repeat exercise 4 for males.
```{r}
NHANES %>% 
  filter(Gender=='male') %>%
  group_by(AgeDecade) %>%
  summarize(average=mean(BPSysAve, na.rm=TRUE), 
            std_dev=sd(BPSysAve, na.rm=TRUE) )
  
```
##### 5.9.6. We can actually combine both summaries for exercises 4 and 5 
##### into one line of code. This is because group_by permits us to 
##### group by more than one variable. Obtain one big summary table 
##### using group_by(AgeDecade, Gender).
```{r}
NHANES %>% 
  group_by(AgeDecade, Gender) %>%
  summarize(average=mean(BPSysAve, na.rm=TRUE), 
            std_dev=sd(BPSysAve, na.rm=TRUE) )
  
```

##### 5.9.7. For males between the ages of 40-49, compare systolic 
##### blood pressure across race as reported in the Race1 variable. 
##### Order the resulting table from lowest to highest average 
##### systolic blood pressure.

```{r}

NHANES %>% 
  filter(Gender=='male', AgeDecade==' 40-49')%>% 
  group_by(Race1) %>%
  summarize(average=mean(BPSysAve, na.rm=TRUE), 
            std_dev=sd(BPSysAve, na.rm=TRUE) ) %>%
  arrange(average)
  
  
```


