library(dplyr)
library(stats)

The following dataset represents the amount of time (in seconds) that it takes 21 heavy smokers to fall off a treadmill at the fastest setting:

smoker_fall_time_s <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
34, 34, 36, 36, 43, 42, 49, 46, 46, 57)
smoker_fall_time_s
##  [1] 18 16 18 24 23 22 22 23 26 29 32 34 34 36 36 43 42 49 46 46 57

Calculate

a) The mean

#a)
mean(smoker_fall_time_s)
## [1] 32.19048
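As a cross-check, the mean is the sum of the times divided by the number of smokers. This sketch recomputes it by hand in base R. `n` and `mean_by_hand` are my own names, not part of the assignment.

```r
# Re-create the data so this block runs on its own
smoker_fall_time_s <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
                        34, 34, 36, 36, 43, 42, 49, 46, 46, 57)

n <- length(smoker_fall_time_s)             # number of smokers (21)
mean_by_hand <- sum(smoker_fall_time_s) / n # total seconds / number of smokers
mean_by_hand
## [1] 32.19048
```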

b) The median

#b)
median(smoker_fall_time_s)
## [1] 32
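The median is the middle value of the sorted times. With 21 values (an odd count), that is the 11th sorted value. The sketch below uses a hypothetical helper, `median_by_hand`, which also covers the even case by averaging the two middle values. It is only meant to show what `median()` does and is not a replacement for it.

```r
smoker_fall_time_s <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
                        34, 34, 36, 36, 43, 42, 49, 46, 46, 57)

# Hypothetical helper: median computed from the sorted values
median_by_hand <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                    # odd n: the single middle value
  } else {
    sum(x[c(n / 2, n / 2 + 1)]) / 2   # even n: average of the two middle values
  }
}

median_by_hand(smoker_fall_time_s)
## [1] 32
```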

c) The range

#c)
range(smoker_fall_time_s)
## [1] 16 57
57-16
## [1] 41

The range is 57 - 16 = 41 seconds.
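R's `range()` returns the minimum and maximum, not their difference, which is why I subtracted by hand above. Wrapping it in `diff()` gives the single-number range directly. This is a small sketch, and `range_width` is my own name.

```r
smoker_fall_time_s <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
                        34, 34, 36, 36, 43, 42, 49, 46, 46, 57)

# range() gives c(min, max); diff() subtracts the first from the second
range_width <- diff(range(smoker_fall_time_s))
range_width
## [1] 41
```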

d) Standard deviation

#d)
sd(smoker_fall_time_s)
## [1] 11.58714
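`sd()` returns the *sample* standard deviation. It divides the sum of squared deviations by n - 1 rather than n and then takes the square root. The by-hand version below should reproduce the value above. The variable names are my own.

```r
smoker_fall_time_s <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
                        34, 34, 36, 36, 43, 42, 49, 46, 46, 57)

deviations <- smoker_fall_time_s - mean(smoker_fall_time_s)  # distance from the mean
sd_by_hand <- sqrt(sum(deviations^2) / (length(smoker_fall_time_s) - 1))
sd_by_hand
## [1] 11.58714
```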

e) Variance

#e)
var(smoker_fall_time_s)
## [1] 134.2619
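The variance is the square of the standard deviation, and `var()` also uses the n - 1 (sample) denominator. If the 21 smokers were treated as the whole population rather than a sample, the variance would divide by n instead. This sketch compares the two, assuming that sample-versus-population comparison is the one of interest. `pop_var` is my own name.

```r
smoker_fall_time_s <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
                        34, 34, 36, 36, 43, 42, 49, 46, 46, 57)
n <- length(smoker_fall_time_s)

sample_var <- var(smoker_fall_time_s)   # divides by n - 1
pop_var <- sample_var * (n - 1) / n     # rescaled to divide by n instead
pop_var
## [1] 127.8685
```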

f) Why not? A full summary (minimum, quartiles, median, mean, maximum)

#why not
summary(smoker_fall_time_s)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   16.00   23.00   32.00   32.19   42.00   57.00
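`summary()` takes its quartiles from `quantile()`, using R's default type 7 interpolation. Requesting them directly, together with the interquartile range (Q3 - Q1), could look like this. It is a small sketch for comparison with the table above.

```r
smoker_fall_time_s <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
                        34, 34, 36, 36, 43, 42, 49, 46, 46, 57)

# First quartile, median, third quartile (type 7 is the default)
quartiles <- quantile(smoker_fall_time_s, probs = c(0.25, 0.5, 0.75))
quartiles

# Interquartile range: Q3 - Q1
IQR(smoker_fall_time_s)
## [1] 19
```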

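A frequency distribution in which there is a cluster of frequent scores at the low (more negative) end would be described as what?

Positive skew (also called right skew).

A positively skewed distribution has its scores clustered at the low end, with a long tail stretching toward the high values.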
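The smoker data above lean this way too, since the mean (32.19) sits slightly above the median (32). Base R has no built-in skewness function, so this sketch computes the moment coefficient of skewness by hand. A positive value indicates positive skew. Using population moments with no small-sample correction is my own choice; packages such as e1071 offer other variants. `skewness_g1` is my own name.

```r
smoker_fall_time_s <- c(18, 16, 18, 24, 23, 22, 22, 23, 26, 29, 32,
                        34, 34, 36, 36, 43, 42, 49, 46, 46, 57)

x <- smoker_fall_time_s
m <- mean(x)
n <- length(x)

# Third central moment divided by the second central moment to the 1.5 power
skewness_g1 <- (sum((x - m)^3) / n) / (sum((x - m)^2) / n)^1.5
skewness_g1   # positive here: the long tail is on the high side
```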

If the scores on a test have a mean of 26 with a standard deviation of 4 (and, as the calculations below assume, are normally distributed),

mean <- 26
sd <- 4

a) what is the z-score of a score of 18?

First I will draw it.

Because 18 is below the mean of 26, it falls in the left tail, so its z-score should be negative.
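The drawing can also be done in R. This sketch plots the normal curve for the test scores and shades the area below 18. It assumes the scores are normally distributed. I use my own names `test_mean` and `test_sd` so that the plot does not depend on the `mean` and `sd` variables defined above.

```r
test_mean <- 26
test_sd <- 4
lower <- test_mean - 4 * test_sd   # left edge of the plot (10)
upper <- test_mean + 4 * test_sd   # right edge of the plot (42)

# Normal density curve for the test scores
curve(dnorm(x, mean = test_mean, sd = test_sd), from = lower, to = upper,
      xlab = "Test score", ylab = "Density")

# Shade the left tail below a score of 18
shade_x <- seq(lower, 18, length.out = 200)
polygon(c(lower, shade_x, 18),
        c(0, dnorm(shade_x, mean = test_mean, sd = test_sd), 0),
        col = "grey80")

# Dashed line at the mean
abline(v = test_mean, lty = 2)

# Where 18 sits, in standard deviations from the mean
z_18 <- (18 - test_mean) / test_sd
```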

xi_a <- 18
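# z-score: how many standard deviations xi lies from the mean
# (uses the mean and sd values defined above)
z_score_fun <- function(xi) {
  (xi - mean) / sd
}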
z_score_a <- z_score_fun(xi_a)
z_score_a
## [1] -2

The z-score is -2.

b) what is the probability that a student scores less than 18 on an exam?

below_18 <- pnorm(z_score_a)
below_18
## [1] 0.02275013
pnorm(xi_a, mean, sd) * 100
## [1] 2.275013

There is a 2.28% chance that a student scores less than 18 on the exam.
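There are two equivalent ways to get this probability. You can call `pnorm()` on the z-score (standard normal), or pass the raw score along with the mean and standard deviation. `pnorm()` also accepts `lower.tail = FALSE` to return the upper tail, which avoids the manual `1 -` step. Here is a quick check. The names are my own, and the scores are assumed to be normal.

```r
test_mean <- 26
test_sd <- 4

p_z   <- pnorm(-2)                                 # standard normal, via the z-score
p_raw <- pnorm(18, mean = test_mean, sd = test_sd) # raw score with the test's parameters
p_upper <- pnorm(18, mean = test_mean, sd = test_sd,
                 lower.tail = FALSE)               # P(score > 18)

p_upper
```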

c) what is the probability that a student scores between 18 and 28 points on the test?

First I will calculate the z-score for 28. Because 28 is above the mean of 26, its z-score should be positive, and more than 50% of the scores should lie to its left.

xi_b <- 28
z_score_b <- z_score_fun(xi_b)
z_score_b
## [1] 0.5
below_28 <- pnorm(z_score_b)
below_28
## [1] 0.6914625

The z-score for 28 is 0.5, and 69.1% of the scores fall below 28.

Next I need to determine the proportion of scores above 18.

above_18 <- 1 - below_18
above_18
## [1] 0.9772499

97.7% of the data is above 18.

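At last, I am ready to determine the percentage of scores between 18 and 28. The share between 18 and 28 is the share above 18 minus the share above 28, and the share above 28 is 1 - below_28. This is equivalent to below_28 - below_18.

between_18_28 <- above_18 - (1 - below_28)
between_18_28 * 100
## [1] 66.87123

There is a 66.9% chance that a student scores between 18 and 28 on the test.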
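As a direct cross-check, the probability of scoring between two values is the difference of the cumulative probabilities at those values, P(X < 28) - P(X < 18). Calling `diff()` on a two-element `pnorm()` result computes exactly that. This assumes normally distributed scores, and `test_mean`, `test_sd`, and `between_direct` are my own names.

```r
test_mean <- 26
test_sd <- 4

# Cumulative probabilities at 18 and 28
cdf_at <- pnorm(c(18, 28), mean = test_mean, sd = test_sd)

# P(X < 28) - P(X < 18)
between_direct <- diff(cdf_at)
between_direct * 100
## [1] 66.87123
```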

Using the murders data frame (from the dslabs package), list the variables in this dataset and describe each variable as a nominal, ordinal, interval or ratio variable.

library(dslabs)
data(murders)
head(murders)
##        state abb region population total
## 1    Alabama  AL  South    4779736   135
## 2     Alaska  AK   West     710231    19
## 3    Arizona  AZ   West    6392017   232
## 4   Arkansas  AR  South    2915918    93
## 5 California  CA   West   37253956  1257
## 6   Colorado  CO   West    5029196    65

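state is nominal because each state name is a category label with no inherent order.

abb is nominal because the two-letter abbreviations are labels for the same categories as state, with no inherent order.

region is nominal because the regions are categories with no inherent order.

population is ratio because it is a count with a true zero, so ratios are meaningful (a state with twice the population has twice as many people).

total is ratio because it is a count of murders with a true zero, so ratios are meaningful.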
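The levels of measurement also show up in how the columns can be handled in R. This sketch rebuilds only the six rows printed by `head(murders)` above, so it runs without dslabs. It treats `region` as an unordered factor (nominal), where only counting categories is meaningful. The ratio-scale columns support meaningful division, shown here as murders per 100,000 people. `murders_head` and `rate_per_100k` are my own names.

```r
# First six rows as printed by head(murders) above
murders_head <- data.frame(
  state = c("Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado"),
  abb = c("AL", "AK", "AZ", "AR", "CA", "CO"),
  region = factor(c("South", "West", "West", "South", "West", "West")),  # unordered = nominal
  population = c(4779736, 710231, 6392017, 2915918, 37253956, 5029196),
  total = c(135, 19, 232, 93, 1257, 65)
)

# Nominal: counting how often each category occurs is meaningful
table(murders_head$region)

# Ratio: a true zero makes division meaningful, e.g. murders per 100,000 people
murders_head$rate_per_100k <- murders_head$total / murders_head$population * 1e5
murders_head[, c("state", "rate_per_100k")]
```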