# A tibble: 2 x 2
  gender `median(n)`
  <fct>        <dbl>
1 male            12
2 female          12
The medians are the same (12 for both male and female), so the median shows no variation between the promoted and the not promoted.
I identified the variables of interest, replicated the data to get an unbiased result, and calculated a summary statistic to get the summary of the data I was looking for. I then made a bootstrap distribution and confidence interval to visually assess the relationship between men's and women's promotion rates.
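The resampling steps described above can be sketched in base R. This is only an illustration under stated assumptions: the `male` and `female` vectors are made-up data, not the data set used in this lab, and the percentile method is one common way to build the interval, not necessarily the one used here.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Hypothetical data (NOT the lab's data): 1 = promoted, 0 = not promoted
male   <- c(rep(1, 18), rep(0, 6))
female <- c(rep(1, 12), rep(0, 12))

# Bootstrap distribution: resample each group with replacement and
# recompute the difference in promotion proportions each time
boot_diff <- replicate(1000, {
  mean(sample(male, replace = TRUE)) - mean(sample(female, replace = TRUE))
})

# 95% percentile confidence interval from the bootstrap distribution
ci <- quantile(boot_diff, probs = c(0.025, 0.975))
ci

# Visual check of the bootstrap distribution with the interval marked
hist(boot_diff, main = "Bootstrap distribution", xlab = "Difference in proportions")
abline(v = ci, lty = 2)
```

If the interval excludes 0, that is consistent with a real difference in promotion rates between the two groups in this made-up example.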
Because they have been replicated many times.
The p-value is the probability of obtaining a test statistic just as extreme as or more extreme than the observed test statistic, assuming the null hypothesis is true. Here the p-value suggests that men tend to get more promotions than women: the observed difference would be unusual if the null hypothesis were true.
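To make the definition concrete, here is a minimal permutation-test sketch in base R. The `gender` and `promoted` vectors are hypothetical values invented for illustration, and `diff_in_props` is a helper name I made up. The tibbles in this report may come from a different workflow.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Hypothetical data (NOT the lab's data): 24 men and 24 women
gender   <- rep(c("male", "female"), each = 24)
promoted <- c(rep(1, 18), rep(0, 6),    # men:   18 of 24 promoted
              rep(1, 12), rep(0, 12))   # women: 12 of 24 promoted

# Test statistic: promotion rate of men minus promotion rate of women
diff_in_props <- function(outcome, group) {
  mean(outcome[group == "male"]) - mean(outcome[group == "female"])
}

obs_stat <- diff_in_props(promoted, gender)

# Null distribution: shuffle the gender labels, which breaks any link
# between gender and promotion, and recompute the statistic each time
null_stats <- replicate(1000, diff_in_props(promoted, sample(gender)))

# One-sided p-value: share of shuffled statistics at least as large
# as the observed one
p_value <- mean(null_stats >= obs_stat)
p_value
```

The p-value here is just a proportion: how often a world with no gender effect produces a difference as large as the one observed.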
# A tibble: 1,000 x 2
   replicate    stat
       <int>   <dbl>
 1         1  0.0153
 2         2  0.405
 3         3  0.564
 4         4  0.122
 5         5 -0.262
 6         6 -0.333
 7         7 -0.333
 8         8 -0.0615
 9         9  0.670
10        10  0.0448
# ... with 990 more rows
It can be interpreted as showing that both have extremely high and extremely low ratings.
The two possible errors are that a truly innocent person is found guilty (a Type I error) or that a truly guilty person is found not guilty (a Type II error).
To see whether the data are consistent with a hypothesis or give evidence against it.
A flaw is that hypothesis tests are always conducted assuming the null hypothesis is true, and the null hypothesis could be wrong.
A significance level of 0.1 would be more liberal and would lead to more Type I errors (rejecting a null hypothesis that is actually true) than a significance level of 0.01.
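A small simulation can illustrate this comparison. The setup below is an assumption chosen for illustration (two normal samples of size 30 drawn from the same distribution, compared with `t.test`) and is not taken from the lab data. Because the null hypothesis is true in every simulated test, every rejection is a Type I error.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Simulate 2000 tests in which the null hypothesis is true:
# both samples are drawn from the same normal distribution
p_values <- replicate(2000, {
  x <- rnorm(30)
  y <- rnorm(30)
  t.test(x, y)$p.value
})

# Share of tests that (wrongly) reject the null at each significance level
rate_10 <- mean(p_values < 0.10)
rate_01 <- mean(p_values < 0.01)
c(alpha_0.10 = rate_10, alpha_0.01 = rate_01)
```

The rejection rate at 0.10 can never be lower than the rate at 0.01, since any p-value below 0.01 is also below 0.10. The price of the stricter level is that real effects become harder to detect.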
# A tibble: 1,000 x 2
   replicate    stat
       <int>   <dbl>
 1         1  0.0500
 2         2 -0.0500
 3         3  0.3
 4         4  0.4
 5         5  0.7
 6         6  0.850
 7         7  0.4
 8         8  0.0500
 9         9  0.25
10        10 -1.20
# ... with 990 more rows
# A tibble: 1 x 1
   stat
  <dbl>
1 -1.55
# A tibble: 1 x 1
  p_value
    <dbl>
1   0.004
The median was negative, but the median and mean are closely related. The median also has a more accurate p-value.
I am able to see the extremes.
We used each step of the Allen Downey diagram to find the values. We got the data set, picked out the values we wanted to test, calculated the statistic from them, and showed the results in a data set. Using this method, we were able to analyze the data with high confidence.
Because the p-value is very small (0.006).
The smaller the p-value, the stronger the evidence against the null hypothesis of no difference between the two ratings.
0.016
Yes, the graphs match up, but the p-value for medians has a higher confidence. The graphs match up because we see an even distribution between the two graphs.
# A tibble: 0 x 19
# ... with 19 variables: year <int>, month <int>, day <int>,
# dep_time <int>, sched_dep_time <int>, dep_delay <dbl>, arr_time <int>,
# sched_arr_time <int>, arr_delay <dbl>, carrier <chr>, flight <int>,
# tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>,
# distance <dbl>, hour <dbl>, minute <dbl>, time_hour <dttm>
Warning: Removed 5 rows containing non-finite values (stat_boxplot).
# A tibble: 2 x 4
# Groups:   carrier [2]
  carrier dest      n mean_time
  <chr>   <chr> <int>     <dbl>
1 AS      SEA     714      326.
2 HA      HNL     342      623.
Because SFO is not in the data set, there are no SFO flights to summarize, so it cannot have a greater mean air time or standard deviation by any number of minutes.