ModernDive Chapter 9

Learning Check 9.1

# A tibble: 2 x 2
  gender `median(n)`
  <fct>        <dbl>
1 male            12
2 female          12

The median of `n` is 12 for both genders, so the medians show no difference between male and female.

Learning Check 9.2

I identified the variables of interest, then replicated the data many times to get a non-biased result. On each replicate I calculated the summary statistic I was looking for. Finally, I made a bootstrap distribution and confidence interval to visually identify the relationship between men's and women's promotion rates.

Learning Check 9.3

Because they have been replicated many times.

Learning Check 9.4

The p-value is the probability of obtaining a test statistic just as extreme or more extreme than the observed test statistic, assuming the null hypothesis is true. Here, a small p-value means that a difference in promotion rates as large as the one observed, with men promoted more often than women, would be unlikely if the null hypothesis of no difference were true.
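
The definition above can be sketched as a count over simulated statistics. This is a minimal Python illustration, not the code that produced the R output in these answers: the function name `p_value`, the `direction` argument, and the five example statistics are all hypothetical. The two-sided rule used here (comparing absolute values) is one common convention and is an assumption, since other tools define it differently.

```python
def p_value(null_stats, observed, direction="greater"):
    """Share of simulated statistics at least as extreme as the observed one.

    null_stats: statistics simulated under the null hypothesis
    observed:   the statistic computed from the real sample
    direction:  "greater", "less", or "two-sided" (hypothetical argument)
    """
    if not null_stats:
        raise ValueError("null_stats must not be empty")
    if direction == "greater":
        extreme = [s for s in null_stats if s >= observed]
    elif direction == "less":
        extreme = [s for s in null_stats if s <= observed]
    elif direction == "two-sided":
        # Assumption: "extreme" means at least as far from zero as observed.
        extreme = [s for s in null_stats if abs(s) >= abs(observed)]
    else:
        raise ValueError("unknown direction: " + direction)
    return len(extreme) / len(null_stats)


# Hypothetical null distribution with only five replicates, for illustration.
simulated = [-0.2, 0.0, 0.1, 0.3, 0.5]
print(p_value(simulated, 0.3, "greater"))
print(p_value(simulated, -0.2, "two-sided"))
```

In practice the list would hold something like the 1,000 replicates shown in the tibbles, and the observed statistic would come from the original sample.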

Learning Check 9.5

# A tibble: 1,000 x 2
   replicate    stat
       <int>   <dbl>
 1         1  0.0153
 2         2  0.405 
 3         3  0.564 
 4         4  0.122 
 5         5 -0.262 
 6         6 -0.333 
 7         7 -0.333 
 8         8 -0.0615
 9         9  0.670 
10        10  0.0448
# ... with 990 more rows

It can be interpreted as both groups having extremely high and extremely low ratings.

Learning Check 9.6

The two possible errors are that a truly innocent person is found guilty (a Type I error) or that a truly guilty person is found not guilty (a Type II error).

Learning Check 9.7

To see whether the data are consistent with a hypothesis, that is, whether there is enough evidence to reject the null hypothesis.

Learning Check 9.8

A flaw is that hypothesis tests are always conducted assuming the null hypothesis is true, and the null hypothesis could be wrong.

Learning Check 9.9

The 0.1 level would be more liberal: it rejects the null hypothesis more easily, so it would lead to more Type I errors (false rejections) than a 0.01 test.
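
The effect of the significance level can be sketched in a few lines of Python. The first part applies both levels to the three p-values reported in these answers (0.004, 0.006, 0.016). The second part is a rough simulation that assumes the null hypothesis is true, in which case p-values from a continuous test statistic are uniformly distributed between 0 and 1. The function names and the seed are hypothetical choices made for this illustration.

```python
import random


def reject_null(p_value, alpha):
    """Decision rule: reject the null hypothesis when p_value < alpha."""
    return p_value < alpha


def false_rejection_rate(alpha, n_tests=10_000, seed=1):
    """Share of tests that reject a null hypothesis that is actually true.

    Assumption: under a true null, each p-value is a uniform draw on [0, 1).
    """
    rng = random.Random(seed)
    rejections = sum(reject_null(rng.random(), alpha) for _ in range(n_tests))
    return rejections / n_tests


# p-values reported elsewhere in these answers
reported = [0.004, 0.006, 0.016]
for alpha in (0.1, 0.01):
    decisions = [reject_null(p, alpha) for p in reported]
    print(alpha, decisions, false_rejection_rate(alpha))
```

The 0.016 result is rejected at 0.1 but not at 0.01, and the simulated false rejection rate is close to whichever level is chosen.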

Learning Check 9.10

# A tibble: 1,000 x 2
   replicate    stat
       <int>   <dbl>
 1         1  0.0500
 2         2 -0.0500
 3         3  0.3   
 4         4  0.4   
 5         5  0.7   
 6         6  0.850 
 7         7  0.4   
 8         8  0.0500
 9         9  0.25  
10        10 -1.20  
# ... with 990 more rows

# A tibble: 1 x 1
   stat
  <dbl>
1 -1.55

# A tibble: 1 x 1
  p_value
    <dbl>
1   0.004

The observed difference in medians was negative (-1.55), but the median and mean are closely related. The p-value for the median was also very small (0.004).

Learning Check 9.11

I am able to see the extremes.

Learning Check 9.12

We used each step of the Allen Downey diagram to find the values. We got the data set, picked out the values we wanted to test, and calculated the test statistic from them. We then simulated that statistic under the null hypothesis, showed the results in a data set, and compared the observed statistic against them. Using this method, we were able to analyze the data with a high degree of confidence.
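
The steps above can be sketched as a small permutation test. This is a Python illustration under stated assumptions, not the R workflow that produced the tibbles in these answers: the ratings are made-up numbers, the function names are hypothetical, and the p-value uses a two-sided absolute-value comparison.

```python
import random
from statistics import mean


def diff_in_means(a, b):
    """Test statistic: mean of the first group minus mean of the second."""
    return mean(a) - mean(b)


def permutation_test(group_a, group_b, reps=1000, seed=0):
    """Return (observed statistic, two-sided p-value) from shuffled labels."""
    rng = random.Random(seed)
    observed = diff_in_means(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    null_stats = []
    for _ in range(reps):
        # Model of the null hypothesis: group labels do not matter,
        # so any split of the pooled values is equally likely.
        rng.shuffle(pooled)
        null_stats.append(diff_in_means(pooled[:n_a], pooled[n_a:]))
    extreme = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return observed, extreme / reps


# Hypothetical ratings, invented for this sketch only.
romance = [6.1, 7.3, 5.8, 6.9, 7.0, 6.4]
action = [5.2, 4.9, 6.0, 5.5, 4.7, 5.1]
observed, p = permutation_test(romance, action)
print(observed, p)
```

Swapping `mean` for `median` in `diff_in_means` would give the median version of the same test.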

Learning Check 9.13

Because the p-value is very small (0.006).

Learning Check 9.14

The smaller the p-value, the stronger the evidence against the null hypothesis of no difference between the two ratings.

Learning Check 9.15

0.016

Learning Check 9.16

Yes, the graphs match up, but the test using medians gives higher confidence. The graphs match up because we see an even distribution across the two graphs.

Learning Check 9.17

# A tibble: 0 x 19
# ... with 19 variables: year <int>, month <int>, day <int>,
#   dep_time <int>, sched_dep_time <int>, dep_delay <dbl>, arr_time <int>,
#   sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,
#   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
#   distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>

Warning: Removed 5 rows containing non-finite values (stat_boxplot).

# A tibble: 2 x 4
# Groups:   carrier [2]
  carrier dest      n mean_time
  <chr>   <chr> <int>     <dbl>
1 AS      SEA     714      326.
2 HA      HNL     342      623.

Because SFO is not in the data set, no mean air time or standard deviation can be computed for it, so it cannot be shown to be greater by any number of minutes.