Tilon Bobb, Mikhail Broomes
To begin our analysis of the “world_salary” dataset, we’ll start by loading the necessary R libraries for data manipulation and visualization:
knitr::opts_chunk$set(echo = TRUE)
library(RMySQL)
## Loading required package: DBI
library(yaml)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ✔ readr 2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(infer)
library(rvest)
##
## Attaching package: 'rvest'
##
## The following object is masked from 'package:readr':
##
## guess_encoding
Establishing a connection to the SQL database:
config <- yaml::read_yaml("config.yaml")
con <- dbConnect(
RMySQL::MySQL(),
dbname = config$dbname,
host = config$host,
port = config$port,
user = config$user,
password = config$password
)
query <- "SELECT * FROM project3.salary_data"
world_salary <- dbGetQuery(con, query)
Let’s take a look at the dataset to get a preliminary understanding:
head(world_salary)
## country_name continent_name wage_span median_salary average_salary
## 1 Afghanistan Asia Monthly 853.74 1001.15
## 2 Aland Islands Europe Monthly 3319.24 3858.35
## 3 Albania Europe Monthly 832.84 956.92
## 4 Algeria Africa Monthly 1148.84 1308.81
## 5 American Samoa Oceania Monthly 1390.00 1570.00
## 6 Andorra Europe Monthly 3668.08 4069.77
## lowest_salary highest_salary
## 1 252.53 4460.97
## 2 972.52 17124.74
## 3 241.22 4258.49
## 4 330.11 5824.18
## 5 400.00 6980.00
## 6 1120.51 17653.28
This snippet provides an overview of the dataset by showing the first few rows, including column names and sample data.
Next, let’s examine the dataset’s structure to understand the data types and column names:
str(world_salary)
## 'data.frame': 221 obs. of 7 variables:
## $ country_name : chr "Afghanistan" "Aland Islands" "Albania" "Algeria" ...
## $ continent_name: chr "Asia" "Europe" "Europe" "Africa" ...
## $ wage_span : chr "Monthly" "Monthly" "Monthly" "Monthly" ...
## $ median_salary : num 854 3319 833 1149 1390 ...
## $ average_salary: num 1001 3858 957 1309 1570 ...
## $ lowest_salary : num 253 973 241 330 400 ...
## $ highest_salary: num 4461 17125 4258 5824 6980 ...
To get a more detailed description of the columns you can always go to the source here
Now let’s check for any missing values to see whether any cleanup is necessary before the analysis:
# Check for missing values in the entire dataset
missing_values <- is.na(world_salary)
# Summarize the number of missing values in each column
col_missing_count <- colSums(missing_values)
# Display columns with missing values
colnames(world_salary)[col_missing_count > 0]
## character(0)
As we can see, there are no missing values in the dataset. There is, however, a problem with the column name for the regions, which is currently continent_name: because it includes places like the Caribbean and distinguishes between Northern America and North America, we will rename it to geographical_region. Since we know that all salaries are monthly, we can also remove the wage_span column.
colnames(world_salary)[colnames(world_salary) == "continent_name"] <- "geographical_region"
world_salary <- world_salary %>% select(-wage_span)
We will calculate and display summary statistics for the numerical columns in our dataset, which are “median_salary,” “average_salary,” “lowest_salary,” and “highest_salary.” This provides an overview of central tendencies and data distribution:
summary(world_salary[, c("median_salary", "average_salary", "lowest_salary", "highest_salary")])
## median_salary average_salary lowest_salary highest_salary
## Min. : 0.261 Min. : 0.286 Min. : 0.0721 Min. : 1.27
## 1st Qu.: 567.210 1st Qu.: 651.000 1st Qu.: 163.9300 1st Qu.: 2900.48
## Median :1227.460 Median : 1344.230 Median : 339.4500 Median : 5974.36
## Mean :1762.632 Mean : 1982.340 Mean : 502.7832 Mean : 8802.17
## 3rd Qu.:2389.010 3rd Qu.: 2740.000 3rd Qu.: 690.0000 3rd Qu.:12050.74
## Max. :9836.070 Max. :11292.900 Max. :2850.2700 Max. :50363.93
Based on this summary, the first thing I noticed is that the lowest salary in the dataset is only about $0.07 a month (0.0721). The mean of the average_salary column is $1,982 a month, which suggests that the world’s average salary may be around $1,982 a month, although this is an unweighted mean across countries.
I hypothesize that Northern America has a higher average salary than the rest of the world.
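One way to follow up on the minimum of lowest_salary in the summary is to pull out the row that produces it. The sketch below uses a small hypothetical data frame, `salary_sample`, holding three values copied from the data printed later in this document; on the full data the same call would be run against `world_salary`.

```r
# Three rows copied from the printed data; `salary_sample` is a
# hypothetical stand-in for the full `world_salary` data frame.
salary_sample <- data.frame(
  country_name  = c("Afghanistan", "Syria", "Zambia"),
  lowest_salary = c(252.53, 2.90, 0.07209242),
  stringsAsFactors = FALSE
)

# which.min() returns the index of the smallest value
lowest_row <- salary_sample[which.min(salary_sample$lowest_salary), ]
print(lowest_row)
```

A value this far below the rest is worth checking against the source before it is included in any averages.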
Sampling distribution
world_salary <- world_salary %>%
mutate(Northern_America = ifelse(geographical_region == "Northern America", "Yes", "No"))
ggplot(world_salary, aes(x=average_salary, y=Northern_America)) + geom_boxplot() + theme_bw()
In the box plot, Northern America, which in this dataset comprises Bermuda, Canada, Greenland, and the United States, has the higher median of average salaries, along with a wider interquartile range. This is plausible, since the United States and Canada are known for having diverse income distributions.
yes_group <- world_salary %>% filter(Northern_America == "Yes")
no_group <- world_salary %>% filter(Northern_America == "No")
# Perform t-tests for 'Yes' and 'No' groups
t_test_yes <- t.test(yes_group$average_salary)
t_test_no <- t.test(no_group$average_salary)
# Get the confidence intervals
conf_interval_yes <- t_test_yes$conf.int
conf_interval_no <- t_test_no$conf.int
# Print the confidence intervals
cat("95% Confidence Interval for 'Yes' (Northern America) Average Salary:", conf_interval_yes, "\n")
## 95% Confidence Interval for 'Yes' (Northern America) Average Salary: 497.8315 9945.388
cat("95% Confidence Interval for 'No' (Other Regions) Average Salary:", conf_interval_no, "\n")
## 95% Confidence Interval for 'No' (Other Regions) Average Salary: 1686.636 2158.624
Northern America (Yes): With 95% confidence, we estimate that the mean of the average salary in Northern America falls between approximately $497.83 and $9,945.39. The interval is very wide because the group contains only four observations.
Other Regions (No): Similarly, with 95% confidence, we estimate that the mean of the average salary in other regions (outside Northern America) falls between approximately $1,686.64 and $2,158.62. This interval lies entirely inside the Northern America interval, so the two intervals overlap and do not by themselves confirm the hypothesis.
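To make the interval concrete, the Northern America bounds printed by `cat()` above (497.8315 to 9945.388) can be reproduced by hand from the four rows flagged `Yes` in the data (Bermuda, Canada, Greenland, and the United States). This is a minimal sketch of the standard one-sample t interval, mean ± t × s/√n; `na_avg` and the other variable names are introduced here for illustration.

```r
# Average salaries of the four rows with Northern_America == "Yes",
# copied from the printed data (Bermuda, Canada, Greenland, United States)
na_avg <- c(1600, 7352.94, 4008.50, 7925)

n      <- length(na_avg)
se     <- sd(na_avg) / sqrt(n)     # standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # two-sided 95% critical value
ci_manual <- mean(na_avg) + c(-1, 1) * t_crit * se

# t.test() computes the same interval
ci_ttest <- as.numeric(t.test(na_avg)$conf.int)
print(ci_manual)
print(ci_ttest)
```

With only three degrees of freedom the critical value is about 3.18 rather than the familiar 1.96, which is a large part of why the interval is so wide.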
ggplot(data = world_salary, aes(x = geographical_region, y = median_salary)) +
geom_boxplot() +
labs(title = "Distribution of Median Salaries by Continent")
Checking whether the median salary is affected by the CPI (consumer price index)
Looking at the data to check the distinct values in the region column:
unique(world_salary$geographical_region)
## [1] "Asia" "Europe" "Africa" "Oceania"
## [5] "Caribbean" "South America" "North America" "Central America"
## [9] "Northern America"
Changing Northern America to North America so it doesn’t interfere with our analysis:
world_salary <- world_salary %>%
mutate(geographical_region = ifelse(geographical_region == "Northern America", "North America", geographical_region))
continent_stats <- world_salary %>%
group_by(geographical_region) %>%
summarize(
mean_salary = mean(median_salary),
median_salary = median(median_salary)
)
print(continent_stats)
## # A tibble: 8 × 3
## geographical_region mean_salary median_salary
## <chr> <dbl> <dbl>
## 1 Africa 680. 525.
## 2 Asia 1537. 946.
## 3 Caribbean 1436. 1125.
## 4 Central America 1685. 1576.
## 5 Europe 3196. 3021.
## 6 North America 3247. 2733.
## 7 Oceania 1725. 1330
## 8 South America 1348. 1132.
ggplot(world_salary, aes(x = reorder(geographical_region, -average_salary), y = average_salary)) +
# Plot the mean per region; stat = "identity" would stack (sum) every country's value
geom_bar(stat = "summary", fun = "mean") +
labs(title = "Average Salaries by Continent",
x = "Continent", y = "Average Salary") +
theme_minimal() +
coord_flip()
Now adding the CPI to our data frame, using rvest to scrape it from online data:
link <- "https://www.economy.com/indicators/consumer-price-index-cpi"
page <- read_html(link)
name <- page %>%
html_nodes("#table-IALL a") %>%
html_text()
cpi <- page %>%
html_nodes("#table-IALL td:nth-child(3) .pull-right") %>%
html_text()
cpi_data <- data.frame(Name = name, CPI = cpi)
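The two selectors return plain character vectors, and `data.frame()` will fail if they differ in length. As a rough illustration of how they pair up, the sketch below runs the same selectors against a tiny inline HTML table. The markup is invented for the example and is only an assumption about how the real page is structured; the two CPI values are copied from the output in this document.

```r
library(rvest)

# Hypothetical markup loosely imitating the scraped table; the real page may differ
sample_page <- minimal_html('
  <table id="table-IALL">
    <tr><td><a href="#">Albania</a></td><td>placeholder</td>
        <td><span class="pull-right">138.01</span></td></tr>
    <tr><td><a href="#">Argentina</a></td><td>placeholder</td>
        <td><span class="pull-right">2,496</span></td></tr>
  </table>
')

sample_name <- sample_page %>%
  html_nodes("#table-IALL a") %>%
  html_text()
sample_cpi <- sample_page %>%
  html_nodes("#table-IALL td:nth-child(3) .pull-right") %>%
  html_text()

# Guard against mismatched lengths before building the data frame
stopifnot(length(sample_name) == length(sample_cpi))
sample_cpi_data <- data.frame(Name = sample_name, CPI = sample_cpi)
print(sample_cpi_data)
```

Note that the CPI column comes back as text, including thousands separators such as "2,496".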
Now joining the data to our world_salary data:
world_salary2 <- world_salary
world_salary2 <- world_salary2 %>%
mutate(country_name = trimws(tolower(country_name)))
# Clean the names in 'cpi_data' as well
cpi_data <- cpi_data %>%
mutate(Name = trimws(tolower(Name)))
# Merge the data frames
world_salary2 <- world_salary2 %>%
left_join(cpi_data, by = c("country_name" = "Name"))
# Print the first few rows of 'world_salary2'
head(world_salary2)
## country_name geographical_region median_salary average_salary lowest_salary
## 1 afghanistan Asia 853.74 1001.15 252.53
## 2 aland islands Europe 3319.24 3858.35 972.52
## 3 albania Europe 832.84 956.92 241.22
## 4 algeria Africa 1148.84 1308.81 330.11
## 5 american samoa Oceania 1390.00 1570.00 400.00
## 6 andorra Europe 3668.08 4069.77 1120.51
## highest_salary Northern_America CPI
## 1 4460.97 No <NA>
## 2 17124.74 No <NA>
## 3 4258.49 No 138.01
## 4 5824.18 No <NA>
## 5 6980.00 No <NA>
## 6 17653.28 No <NA>
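Most of the first six rows come back with `NA` for CPI, which means those country names had no exact match in the scraped table. A quick way to list unmatched names is an anti-join. The sketch below uses two small hypothetical data frames, `salary_keys` and `cpi_keys`, built from names and values that appear in this document.

```r
library(dplyr)

# Hypothetical subsets of the two tables being joined
salary_keys <- data.frame(country_name = c("afghanistan", "albania", "algeria"))
cpi_keys    <- data.frame(Name = c("albania", "argentina"),
                          CPI  = c("138.01", "2,496"))

# Rows of salary_keys with no matching Name in cpi_keys
unmatched <- anti_join(salary_keys, cpi_keys, by = c("country_name" = "Name"))
print(unmatched$country_name)
```

Reviewing this list on the full data would show whether the gaps come from countries that are missing from the CPI source or from spelling differences that could be reconciled.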
world_salary2 <- world_salary2 %>%
mutate( cpi_130 = ifelse(world_salary2$CPI > 130, "yes", "no"))
world_salary2
## country_name geographical_region median_salary
## 1 afghanistan Asia 853.740000
## 2 aland islands Europe 3319.240000
## 3 albania Europe 832.840000
## 4 algeria Africa 1148.840000
## 5 american samoa Oceania 1390.000000
## 6 andorra Europe 3668.080000
## 7 angola Africa 284.390000
## 8 antigua and barbuda Caribbean 1548.150000
## 9 argentina South America 110.280000
## 10 armenia Asia 1700.250000
## 11 aruba Caribbean 1106.150000
## 12 australia Oceania 4306.450000
## 13 austria Europe 3572.940000
## 14 azerbaijan Asia 1558.820000
## 15 bahamas North America 3541.000000
## 16 bahrain Asia 3617.020000
## 17 bangladesh Asia 218.570000
## 18 barbados Caribbean 1395.000000
## 19 belarus Europe 849.500000
## 20 belgium Europe 5729.390000
## 21 belize Central America 1730.000000
## 22 benin Africa 488.250000
## 23 bermuda North America 1440.000000
## 24 bhutan Asia 407.260000
## 25 bolivia South America 1131.500000
## 26 bosnia and herzegovina Europe 1151.350000
## 27 botswana Africa 729.390000
## 28 brazil South America 1490.040000
## 29 british indian ocean territory Africa 2360.000000
## 30 brunei Asia 2110.290000
## 31 bulgaria Europe 1605.410000
## 32 burkina faso Africa 485.020000
## 33 burundi Africa 384.780000
## 34 cambodia Asia 745.190000
## 35 cameroon Africa 633.270000
## 36 canada North America 6311.030000
## 37 cape verde Africa 1706.290000
## 38 cayman islands Caribbean 3430.970000
## 39 central african republic Africa 621.990000
## 40 chad Africa 707.390000
## 41 chile South America 1890.400000
## 42 china Asia 3684.930000
## 43 colombia South America 995.600000
## 44 comoros Africa 567.210000
## 45 congo Africa 1058.670000
## 46 congo democratic republic Africa 170.390000
## 47 cook islands Oceania 2538.920000
## 48 costa rica Central America 4016.060000
## 49 cote divoire Africa 497.910000
## 50 croatia Europe 1935.480000
## 51 cuba Caribbean 783.330000
## 52 cyprus Europe 1976.740000
## 53 czech republic Europe 2310.030000
## 54 denmark Europe 5084.990000
## 55 djibouti Africa 1378.570000
## 56 dominica Caribbean 496.300000
## 57 dominican republic Caribbean 318.890000
## 58 east timor Asia 2030.000000
## 59 ecuador South America 1260.000000
## 60 egypt Africa 254.530000
## 61 el salvador Central America 1470.000000
## 62 equatorial guinea Africa 671.940000
## 63 eritrea Africa 402.000000
## 64 estonia Europe 2579.280000
## 65 ethiopia Africa 143.880000
## 66 faroe islands Europe 3484.420000
## 67 fiji Oceania 1939.130000
## 68 finland Europe 4238.900000
## 69 france Europe 3769.560000
## 70 french guiana South America 2896.410000
## 71 french polynesia Oceania 1133.750000
## 72 gabon Africa 800.850000
## 73 gambia Africa 223.700000
## 74 georgia Asia 2309.700000
## 75 germany Europe 3731.500000
## 76 ghana Africa 373.820000
## 77 gibraltar Europe 3567.070000
## 78 greece Europe 2241.010000
## 79 greenland North America 3526.910000
## 80 grenada Caribbean 2066.670000
## 81 guadeloupe Caribbean 3985.200000
## 82 guam Oceania 1270.000000
## 83 guatemala Central America 1222.650000
## 84 guernsey Europe 8689.020000
## 85 guinea Africa 696.710000
## 86 guinea-bissau Africa 475.350000
## 87 guyana South America 731.320000
## 88 haiti Caribbean 444.950000
## 89 honduras Central America 1022.220000
## 90 hong kong Asia 4252.870000
## 91 hungary Europe 1227.460000
## 92 iceland Europe 4661.200000
## 93 india Asia 327.970000
## 94 indonesia Asia 678.910000
## 95 iran Asia 932.630000
## 96 iraq Asia 1382.380000
## 97 ireland Europe 3021.140000
## 98 italy Europe 3467.230000
## 99 jamaica Caribbean 565.780000
## 100 japan Asia 3158.670000
## 101 jersey Europe 5817.070000
## 102 jordan Asia 1946.400000
## 103 kazakhstan Asia 701.000000
## 104 kenya Africa 863.290000
## 105 kiribati Oceania 2206.450000
## 106 korea (north) Asia 192.225750
## 107 korea (south) Asia 2593.420000
## 108 kyrgyzstan Asia 201.870000
## 109 laos Asia 206.420000
## 110 latvia Europe 1654.072700
## 111 lebanon Asia 753.330000
## 112 lesotho Africa 540.540000
## 113 liberia Africa 334.390000
## 114 libya Africa 417.180000
## 115 liechtenstein Europe 5224.040000
## 116 lithuania Europe 903.874660
## 117 luxembourg Europe 4767.440000
## 118 macao Asia 856.260000
## 119 macedonia Europe 677.000000
## 120 madagascar Africa 250.870000
## 121 malawi Africa 132.380000
## 122 malaysia Asia 1236.170000
## 123 maldives Asia 1099.610000
## 124 mali Africa 478.580000
## 125 malta Europe 4439.750000
## 126 marshall islands Oceania 2070.000000
## 127 martinique Caribbean 2949.260000
## 128 mauritania Africa 44.623043
## 129 mauritius Africa 901.530000
## 130 mayotte Africa 2389.010000
## 131 mexico Central America 1681.970000
## 132 micronesia Oceania 1250.000000
## 133 moldova Europe 1318.680000
## 134 monaco Europe 4112.050000
## 135 mongolia Asia 515.760000
## 136 montenegro Europe 2621.560000
## 137 montserrat Caribbean 859.260000
## 138 morocco Africa 1634.240000
## 139 mozambique Africa 549.810000
## 140 myanmar Asia 228.460000
## 141 namibia Africa 821.410000
## 142 nepal Asia 551.130000
## 143 netherlands Europe 4756.870000
## 144 netherlands antilles North America 2268.160000
## 145 new caledonia Oceania 664.300000
## 146 new zealand Oceania 4196.410000
## 147 nicaragua Central America 449.100000
## 148 niger Africa 480.190000
## 149 nigeria Africa 389.850000
## 150 northern mariana islands Oceania 1820.000000
## 151 norway Europe 4420.020000
## 152 oman Asia 3932.290000
## 153 pakistan Asia 245.340000
## 154 palau Oceania 2380.000000
## 155 palestine Asia 1510.000000
## 156 panama Central America 1890.000000
## 157 papua new guinea Oceania 1010.960000
## 158 paraguay South America 1019.690000
## 159 peru South America 1825.860000
## 160 philippines Asia 728.780000
## 161 poland Europe 1496.570000
## 162 portugal Europe 2537.000000
## 163 puerto rico Caribbean 1483.000000
## 164 qatar Asia 3846.150000
## 165 reunion Africa 1966.170000
## 166 romania Europe 1739.870000
## 167 russia Europe 975.510000
## 168 rwanda Africa 525.390000
## 169 saint kitts and nevis Caribbean 1092.590000
## 170 saint lucia Caribbean 922.220000
## 171 saint martin North America 3086.680000
## 172 saint vincent and the grenadines Caribbean 1144.440000
## 173 samoa Oceania 808.660000
## 174 san marino Europe 4450.320000
## 175 sao tome and principe Africa 254.881630
## 176 saudi arabia Asia 3840.000000
## 177 senegal Africa 493.080000
## 178 serbia Europe 1120.040000
## 179 seychelles Africa 1246.440000
## 180 sierra leone Africa 249.100000
## 181 singapore Asia 5647.060000
## 182 slovakia Europe 2114.160000
## 183 slovenia Europe 1934.460000
## 184 solomon islands Oceania 644.890000
## 185 somalia Africa 392.160000
## 186 south africa Africa 1441.440000
## 187 spain Europe 2579.280000
## 188 sri lanka Asia 249.680000
## 189 sudan Africa 67.610000
## 190 suriname South America 122.180000
## 191 swaziland Africa 204.560000
## 192 sweden Europe 3568.160000
## 193 switzerland Europe 9836.070000
## 194 syria Asia 10.120000
## 195 taiwan Asia 3571.430000
## 196 tajikistan Asia 959.780000
## 197 tanzania Africa 457.770000
## 198 thailand Asia 2432.280000
## 199 togo Africa 787.960000
## 200 tonga Oceania 668.090000
## 201 trinidad and tobago Caribbean 1258.110000
## 202 tunisia Africa 1088.330000
## 203 turkey Asia 254.100000
## 204 turkmenistan Asia 1342.860000
## 205 turks and caicos islands North America 1350.000000
## 206 uganda Africa 645.210000
## 207 ukraine Europe 530.730000
## 208 united arab emirates Asia 3324.250000
## 209 united kingdom Europe 6300.000000
## 210 united states North America 6966.000000
## 211 uruguay South America 773.640000
## 212 uzbekistan Asia 97.250000
## 213 vanuatu Oceania 750.740000
## 214 venezuela South America 3282.020000
## 215 vietnam Asia 612.570000
## 216 virgin islands (british) North America 1600.000000
## 217 virgin islands (us) North America 2380.000000
## 218 western sahara Africa 908.560000
## 219 yemen Asia 120.980000
## 220 zambia Africa 0.261335
## 221 zimbabwe Africa 555.402040
## average_salary lowest_salary highest_salary Northern_America
## 1 1.001150e+03 2.525300e+02 4460.970000 No
## 2 3.858350e+03 9.725200e+02 17124.740000 No
## 3 9.569200e+02 2.412200e+02 4258.490000 No
## 4 1.308810e+03 3.301100e+02 5824.180000 No
## 5 1.570000e+03 4.000000e+02 6980.000000 No
## 6 4.069770e+03 1.120510e+03 17653.280000 No
## 7 3.143900e+02 7.932000e+01 1403.960000 No
## 8 1.677780e+03 4.222200e+02 7444.440000 No
## 9 1.294200e+02 3.257000e+01 577.130000 No
## 10 1.974320e+03 4.973900e+02 8780.390000 No
## 11 1.268160e+03 3.184400e+02 5642.460000 No
## 12 4.903230e+03 1.236130e+03 21774.190000 No
## 13 4.016910e+03 1.014800e+03 17864.690000 No
## 14 1.741180e+03 4.411800e+02 7764.710000 No
## 15 3.908000e+03 9.830000e+02 17416.000000 No
## 16 3.936170e+03 9.840400e+02 17553.190000 No
## 17 2.367100e+02 5.968000e+01 1052.060000 No
## 18 1.635000e+03 4.100000e+02 7250.000000 No
## 19 9.832800e+02 2.474900e+02 4381.270000 No
## 20 6.522200e+03 1.997890e+03 27378.440000 No
## 21 1.965000e+03 4.950000e+02 8750.000000 No
## 22 5.494800e+02 1.385800e+02 2449.280000 No
## 23 1.600000e+03 4.000000e+02 7120.000000 Yes
## 24 4.493000e+02 1.131700e+02 1994.230000 No
## 25 1.236990e+03 3.121400e+02 5505.780000 No
## 26 1.248650e+03 3.135100e+02 5567.570000 No
## 27 8.606900e+02 2.166300e+02 3822.030000 No
## 28 1.711160e+03 4.322700e+02 7609.560000 No
## 29 2.690000e+03 6.800000e+02 12000.000000 No
## 30 2.375000e+03 5.955900e+02 10588.240000 No
## 31 1.794590e+03 4.540500e+02 7945.950000 No
## 32 5.349700e+02 1.350300e+02 2384.830000 No
## 33 4.200800e+02 1.055500e+02 1863.890000 No
## 34 8.083000e+02 2.034100e+02 3592.440000 No
## 35 7.444500e+02 1.869200e+02 3303.310000 No
## 36 7.352940e+03 1.850000e+03 32720.590000 Yes
## 37 1.965110e+03 4.955900e+02 8742.330000 No
## 38 3.901560e+03 9.843900e+02 17406.960000 No
## 39 6.993300e+02 1.756400e+02 3109.940000 No
## 40 7.879600e+02 1.982000e+02 3512.790000 No
## 41 2.090560e+03 5.259800e+02 9274.090000 No
## 42 4.027400e+03 1.015070e+03 17945.210000 No
## 43 1.157850e+03 2.925300e+02 5137.790000 No
## 44 6.510000e+02 1.639300e+02 2900.480000 No
## 45 1.205300e+03 3.029400e+02 5365.860000 No
## 46 1.917400e+02 4.834000e+01 853.950000 No
## 47 2.832340e+03 7.125700e+02 12574.850000 No
## 48 4.427010e+03 1.115160e+03 19613.340000 No
## 49 5.446400e+02 1.374500e+02 2417.050000 No
## 50 2.089760e+03 5.273500e+02 9298.740000 No
## 51 9.166700e+02 2.312500e+02 4079.170000 No
## 52 2.293870e+03 5.814000e+02 10211.420000 No
## 53 2.653060e+03 6.686900e+02 11810.680000 No
## 54 5.779040e+03 1.458920e+03 25637.390000 No
## 55 1.553000e+03 3.916300e+02 6921.000000 No
## 56 5.555600e+02 1.407400e+02 2470.370000 No
## 57 3.506000e+02 8.862000e+01 1562.720000 No
## 58 2.220000e+03 5.600000e+02 9860.000000 No
## 59 1.370000e+03 3.400000e+02 6080.000000 No
## 60 2.985100e+02 7.536000e+01 1329.240000 No
## 61 1.710000e+03 4.300000e+02 7610.000000 No
## 62 7.734600e+02 1.949800e+02 3432.220000 No
## 63 4.573300e+02 1.153300e+02 2033.330000 No
## 64 2.906980e+03 7.293900e+02 12896.410000 No
## 65 1.604100e+02 4.042000e+01 713.130000 No
## 66 3.810200e+03 9.589200e+02 16855.520000 No
## 67 2.100000e+03 5.304300e+02 9347.830000 No
## 68 4.978860e+03 1.257930e+03 22093.020000 No
## 69 4.377380e+03 1.100420e+03 19467.230000 No
## 70 3.319240e+03 8.351000e+02 14799.150000 No
## 71 1.293180e+03 3.259500e+02 5757.310000 No
## 72 8.943100e+02 2.255900e+02 3980.080000 No
## 73 2.468400e+02 6.202000e+01 1095.340000 No
## 74 2.526120e+03 6.380600e+02 11231.340000 No
## 75 4.048630e+03 1.014800e+03 17970.400000 No
## 76 4.384200e+02 1.102500e+02 1946.600000 No
## 77 4.135370e+03 1.046340e+03 18393.900000 No
## 78 2.579280e+03 6.553900e+02 11522.200000 No
## 79 4.008500e+03 1.011330e+03 17847.030000 Yes
## 80 2.329630e+03 5.888900e+02 10370.370000 No
## 81 4.439750e+03 1.120510e+03 19767.440000 No
## 82 1.400000e+03 3.500000e+02 6230.000000 No
## 83 1.335880e+03 3.371500e+02 5954.200000 No
## 84 9.409760e+03 2.367070e+03 41869.510000 No
## 85 8.178700e+02 2.062200e+02 3634.990000 No
## 86 5.510900e+02 1.390600e+02 2449.280000 No
## 87 8.412600e+02 2.117500e+02 3737.870000 No
## 88 5.062000e+02 1.276600e+02 2250.590000 No
## 89 1.139390e+03 2.872700e+02 5090.910000 No
## 90 4.687100e+03 1.182630e+03 20817.370000 No
## 91 1.344230e+03 3.394500e+02 5974.360000 No
## 92 5.049030e+03 1.273230e+03 22464.510000 No
## 93 3.844300e+02 9.707000e+01 1717.920000 No
## 94 7.888300e+02 1.985000e+02 3504.470000 No
## 95 1.070970e+03 2.695300e+02 4770.470000 No
## 96 1.573310e+03 3.956200e+02 6988.250000 No
## 97 3.399580e+03 8.562400e+02 15151.160000 No
## 98 3.868920e+03 9.725200e+02 17230.440000 No
## 99 6.245600e+02 1.575900e+02 2777.240000 No
## 100 3.453120e+03 8.699700e+02 15391.820000 No
## 101 6.304880e+03 1.585370e+03 28048.780000 No
## 102 2.284910e+03 5.782800e+02 10141.040000 No
## 103 8.139900e+02 2.048600e+02 3620.080000 No
## 104 9.914300e+02 2.502200e+02 4424.360000 No
## 105 2.509680e+03 6.322600e+02 11161.290000 No
## 106 2.166706e+02 5.455656e+01 963.350990 No
## 107 2.889810e+03 7.283800e+02 12893.000000 No
## 108 2.199200e+02 5.560000e+01 980.040000 No
## 109 2.235400e+02 5.625000e+01 992.990000 No
## 110 1.952104e+03 4.917514e+02 8657.804000 No
## 111 8.733300e+02 2.193300e+02 3873.330000 No
## 112 6.253300e+02 1.573900e+02 2776.890000 No
## 113 3.803500e+02 9.614000e+01 1690.440000 No
## 114 4.703500e+02 1.186100e+02 2085.890000 No
## 115 5.825140e+03 1.464480e+03 25901.640000 No
## 116 9.979019e+02 2.517503e+02 4428.379200 No
## 117 5.211420e+03 1.310780e+03 23150.110000 No
## 118 9.281300e+02 2.342000e+02 4126.390000 No
## 119 7.941400e+02 1.998300e+02 3531.440000 No
## 120 2.904800e+02 7.328000e+01 1291.740000 No
## 121 1.516900e+02 3.824000e+01 674.790000 No
## 122 1.406380e+03 3.553200e+02 6255.320000 No
## 123 1.241910e+03 3.130700e+02 5517.460000 No
## 124 5.333600e+02 1.345500e+02 2368.710000 No
## 125 4.904860e+03 1.236790e+03 21775.900000 No
## 126 2.260000e+03 5.700000e+02 10100.000000 No
## 127 3.192390e+03 8.033800e+02 14164.900000 No
## 128 5.249770e+01 1.320317e+01 232.827290 No
## 129 1.047660e+03 2.630400e+02 4653.780000 No
## 130 2.748410e+03 6.976700e+02 12262.160000 No
## 131 1.917340e+03 4.827800e+02 8495.980000 No
## 132 1.410000e+03 3.600000e+02 6270.000000 No
## 133 1.467030e+03 3.703300e+02 6538.460000 No
## 134 4.492600e+03 1.374210e+03 18921.780000 No
## 135 5.647400e+02 1.423400e+02 2509.620000 No
## 136 2.832980e+03 7.188200e+02 12579.280000 No
## 137 1.011110e+03 2.555600e+02 4481.480000 No
## 138 1.896890e+03 4.776300e+02 8433.850000 No
## 139 6.312700e+02 1.597700e+02 2803.880000 No
## 140 2.603500e+02 6.568000e+01 1156.590000 No
## 141 9.274000e+02 2.337000e+02 4128.250000 No
## 142 6.082000e+02 1.531800e+02 2703.110000 No
## 143 5.179700e+03 1.427060e+03 22515.860000 No
## 144 2.458100e+03 6.201100e+02 10893.850000 No
## 145 7.794500e+02 1.966300e+02 3463.240000 No
## 146 4.870060e+03 1.227540e+03 21656.290000 No
## 147 5.171500e+02 1.303800e+02 2299.950000 No
## 148 5.478700e+02 1.379300e+02 2433.170000 No
## 149 4.389100e+02 1.106300e+02 1949.270000 No
## 150 1.990000e+03 5.000000e+02 8870.000000 No
## 151 4.786340e+03 1.208230e+03 21281.570000 No
## 152 4.635420e+03 1.171880e+03 20572.920000 No
## 153 2.849000e+02 7.183000e+01 1266.610000 No
## 154 2.740000e+03 6.900000e+02 12200.000000 No
## 155 1.710000e+03 4.300000e+02 7620.000000 No
## 156 2.130000e+03 5.400000e+02 9460.000000 No
## 157 1.128770e+03 2.849300e+02 5013.700000 No
## 158 1.126170e+03 2.839300e+02 5009.740000 No
## 159 1.997360e+03 5.039600e+02 8891.820000 No
## 160 7.905400e+02 1.994000e+02 3511.560000 No
## 161 1.736840e+03 4.370700e+02 7734.550000 No
## 162 2.917550e+03 7.399600e+02 13002.110000 No
## 163 1.683000e+03 4.250000e+02 7491.000000 No
## 164 4.313190e+03 1.090660e+03 19230.770000 No
## 165 2.198730e+03 5.496800e+02 9767.440000 No
## 166 1.921110e+03 4.840100e+02 8550.110000 No
## 167 1.065680e+03 2.684700e+02 4744.340000 No
## 168 5.696500e+02 1.434400e+02 2532.680000 No
## 169 1.259260e+03 3.185200e+02 5592.590000 No
## 170 1.048150e+03 2.629600e+02 4666.670000 No
## 171 3.477800e+03 8.773800e+02 15433.400000 No
## 172 1.262960e+03 3.185200e+02 5629.630000 No
## 173 8.844800e+02 2.238300e+02 3935.020000 No
## 174 4.820300e+03 1.215640e+03 21458.770000 No
## 175 2.987826e+02 7.544155e+01 1329.817200 No
## 176 4.480000e+03 1.128000e+03 19893.330000 No
## 177 5.655900e+02 1.427700e+02 2513.740000 No
## 178 1.273600e+03 3.206600e+02 5654.410000 No
## 179 1.403130e+03 3.539900e+02 6246.440000 No
## 180 2.777500e+02 6.992000e+01 1238.230000 No
## 181 6.235290e+03 1.573530e+03 27720.590000 No
## 182 2.315010e+03 5.814000e+02 10295.980000 No
## 183 2.093020e+03 5.285400e+02 9302.330000 No
## 184 7.565300e+02 1.912100e+02 3361.050000 No
## 185 4.551900e+02 1.148500e+02 2030.850000 No
## 186 1.658720e+03 4.175900e+02 7366.190000 No
## 187 2.875260e+03 8.773800e+02 12050.740000 No
## 188 2.784100e+02 7.014000e+01 1239.110000 No
## 189 7.453000e+01 1.880000e+01 331.920000 No
## 190 1.335700e+02 3.365000e+01 592.800000 No
## 191 2.395300e+02 6.041000e+01 1065.180000 No
## 192 4.144560e+03 1.043000e+03 18389.750000 No
## 193 1.129290e+04 2.850270e+03 50363.930000 No
## 194 1.151000e+01 2.900000e+00 51.200000 No
## 195 4.037270e+03 1.015530e+03 17919.250000 No
## 196 1.069470e+03 2.696500e+02 4753.200000 No
## 197 5.055400e+02 1.269800e+02 2245.080000 No
## 198 2.662110e+03 6.703100e+02 11846.790000 No
## 199 8.524100e+02 2.143100e+02 3786.720000 No
## 200 7.787200e+02 1.957400e+02 3459.570000 No
## 201 1.445430e+03 3.643100e+02 6430.680000 No
## 202 1.239750e+03 3.123000e+02 5520.500000 No
## 203 2.861300e+02 7.208000e+01 1274.120000 No
## 204 1.500000e+03 3.771400e+02 6657.140000 No
## 205 1.490000e+03 3.800000e+02 6640.000000 No
## 206 6.983100e+02 1.760400e+02 3106.550000 No
## 207 6.228000e+02 1.573200e+02 2761.980000 No
## 208 3.896460e+03 7.084500e+02 18637.600000 No
## 209 7.235370e+03 1.829270e+03 32214.630000 No
## 210 7.925000e+03 2.000000e+03 35250.000000 Yes
## 211 8.622000e+02 2.172400e+02 3829.120000 No
## 212 1.069800e+02 2.707000e+01 477.350000 No
## 213 8.217700e+02 2.073000e+02 3650.480000 No
## 214 3.862910e+03 9.729900e+02 17165.260000 No
## 215 7.112400e+02 1.792500e+02 3161.520000 No
## 216 1.840000e+03 4.600000e+02 8180.000000 No
## 217 2.710000e+03 6.800000e+02 12000.000000 No
## 218 1.011670e+03 2.548600e+02 4503.890000 No
## 219 1.333600e+02 3.362000e+01 594.930000 No
## 220 2.855239e-01 7.209242e-02 1.271103 No
## 221 6.023764e+02 1.514230e+02 2674.772000 No
## CPI cpi_130
## 1 <NA> <NA>
## 2 <NA> <NA>
## 3 138.01 yes
## 4 <NA> <NA>
## 5 <NA> <NA>
## 6 <NA> <NA>
## 7 724.48 yes
## 8 110.42 no
## 9 2,496 yes
## 10 153.62 yes
## 11 112.23 no
## 12 135.3 yes
## 13 121.8 no
## 14 <NA> <NA>
## 15 131.05 yes
## 16 120.44 no
## 17 <NA> <NA>
## 18 172.8 yes
## 19 <NA> <NA>
## 20 128.89 no
## 21 122.46 no
## 22 121.47 no
## 23 <NA> <NA>
## 24 160.08 yes
## 25 158.33 yes
## 26 128.41 no
## 27 195.01 yes
## 28 6,716 yes
## 29 <NA> <NA>
## 30 <NA> <NA>
## 31 9,401 yes
## 32 133.06 yes
## 33 332.8 yes
## 34 150.29 yes
## 35 136.24 yes
## 36 158.5 yes
## 37 <NA> <NA>
## 38 <NA> <NA>
## 39 172.62 yes
## 40 143.52 yes
## 41 133.82 yes
## 42 99.8 yes
## 43 136.45 yes
## 44 103.62 no
## 45 <NA> <NA>
## 46 <NA> <NA>
## 47 <NA> <NA>
## 48 109.73 no
## 49 <NA> <NA>
## 50 128.2 no
## 51 <NA> <NA>
## 52 117.97 no
## 53 <NA> <NA>
## 54 117.4 no
## 55 130.53 yes
## 56 105.07 no
## 57 171.05 yes
## 58 <NA> <NA>
## 59 <NA> <NA>
## 60 <NA> <NA>
## 61 126.83 no
## 62 150.99 yes
## 63 <NA> <NA>
## 64 291.86 yes
## 65 849.71 yes
## 66 <NA> <NA>
## 67 138.87 yes
## 68 <NA> <NA>
## 69 118.45 no
## 70 <NA> <NA>
## 71 <NA> <NA>
## 72 135.43 yes
## 73 267.31 yes
## 74 177.91 yes
## 75 118.1 no
## 76 195.24 yes
## 77 <NA> <NA>
## 78 117 no
## 79 <NA> <NA>
## 80 112.55 no
## 81 <NA> <NA>
## 82 161.5 yes
## 83 178.63 yes
## 84 <NA> <NA>
## 85 218.71 yes
## 86 136.11 yes
## 87 135.23 yes
## 88 534.75 yes
## 89 191.86 yes
## 90 <NA> <NA>
## 91 235.99 yes
## 92 166.16 yes
## 93 <NA> <NA>
## 94 116.08 no
## 95 2,268 yes
## 96 <NA> <NA>
## 97 121.5 no
## 98 120.1 no
## 99 132.49 yes
## 100 <NA> <NA>
## 101 <NA> <NA>
## 102 135.99 yes
## 103 294.6 yes
## 104 <NA> <NA>
## 105 <NA> <NA>
## 106 <NA> <NA>
## 107 <NA> <NA>
## 108 229.78 yes
## 109 260.85 yes
## 110 143.2 yes
## 111 119 no
## 112 202.55 yes
## 113 180.58 yes
## 114 279.36 yes
## 115 <NA> <NA>
## 116 152.97 yes
## 117 <NA> <NA>
## 118 <NA> <NA>
## 119 <NA> <NA>
## 120 164.26 yes
## 121 740.05 yes
## 122 130.9 yes
## 123 142.51 yes
## 124 128.88 no
## 125 124.49 no
## 126 <NA> <NA>
## 127 <NA> <NA>
## 128 163.89 yes
## 129 131.52 yes
## 130 <NA> <NA>
## 131 130.61 yes
## 132 <NA> <NA>
## 133 267.07 yes
## 134 <NA> <NA>
## 135 279.47 yes
## 136 150.29 yes
## 137 <NA> <NA>
## 138 129.58 no
## 139 236.26 yes
## 140 <NA> <NA>
## 141 191.3 yes
## 142 246.03 yes
## 143 127.73 no
## 144 <NA> <NA>
## 145 <NA> <NA>
## 146 1,253 no
## 147 212.95 yes
## 148 132.48 yes
## 149 <NA> <NA>
## 150 <NA> <NA>
## 151 130.7 yes
## 152 <NA> <NA>
## 153 <NA> <NA>
## 154 <NA> <NA>
## 155 <NA> <NA>
## 156 109.77 no
## 157 182.49 yes
## 158 177.12 yes
## 159 111.6 no
## 160 123.9 no
## 161 247.6 yes
## 162 119.45 no
## 163 <NA> <NA>
## 164 <NA> <NA>
## 165 <NA> <NA>
## 166 <NA> <NA>
## 167 <NA> <NA>
## 168 244.21 yes
## 169 <NA> <NA>
## 170 122.11 no
## 171 <NA> <NA>
## 172 120.75 no
## 173 143.68 yes
## 174 113.09 no
## 175 171.58 yes
## 176 <NA> <NA>
## 177 136.65 yes
## 178 194.43 yes
## 179 159.6 yes
## 180 614.98 yes
## 181 115.11 no
## 182 <NA> <NA>
## 183 126.13 no
## 184 150.48 yes
## 185 <NA> <NA>
## 186 112.8 no
## 187 113.68 no
## 188 203.6 yes
## 189 47,954 yes
## 190 1,435 no
## 191 <NA> <NA>
## 192 409.07 yes
## 193 106.15 no
## 194 <NA> <NA>
## 195 <NA> <NA>
## 196 -0.84 no
## 197 112.18 no
## 198 107.72 no
## 199 134.97 yes
## 200 160.21 yes
## 201 163.07 yes
## 202 208.92 yes
## 203 <NA> <NA>
## 204 <NA> <NA>
## 205 <NA> <NA>
## 206 212.51 yes
## 207 235.3 yes
## 208 118.81 no
## 209 132 yes
## 210 307.62 yes
## 211 104.66 no
## 212 <NA> <NA>
## 213 143.7 yes
## 214 22,244,633,245,832 yes
## 215 <NA> <NA>
## 216 <NA> <NA>
## 217 <NA> <NA>
## 218 <NA> <NA>
## 219 206.54 yes
## 220 341.67 yes
## 221 23,599 yes
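Because the scraped CPI column is text, `CPI > 130` compares strings rather than numbers. That would explain why, in the output above, "99.8" is labelled `yes` while "1,253" and "1,435" are labelled `no`. One possible fix, sketched here under that assumption, is to strip the thousands separators and convert to numeric before comparing. The names `cpi_text`, `cpi_num`, and `cpi_flag` are illustrative, and the values are copied from the printed column.

```r
# A few CPI strings copied from the printed output
cpi_text <- c("99.8", "1,253", "138.01", "-0.84")

# Remove thousands separators, then convert to numeric
cpi_num <- as.numeric(gsub(",", "", cpi_text, fixed = TRUE))

# The comparison is now numeric
cpi_flag <- ifelse(cpi_num > 130, "yes", "no")
print(data.frame(cpi_text, cpi_num, cpi_flag))
```

Recomputing `cpi_130` this way would change which countries fall in each group, so the group means and the permutation test that follow would need to be rerun.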
boxplot( median_salary~cpi_130 , data = world_salary2)
world_salary2 %>%
filter(!is.na(cpi_130)) %>%
group_by(cpi_130) %>%
summarise(mean_median_salary = mean(median_salary, na.rm = TRUE))
## # A tibble: 2 × 2
## cpi_130 mean_median_salary
## <chr> <dbl>
## 1 no 2451.
## 2 yes 1239.
null_dist <- world_salary2 %>%
drop_na(cpi_130) %>%
specify(median_salary ~ cpi_130) %>%
hypothesize(null = "independence") %>%
generate(reps = 1000, type = "permute") %>%
calculate(stat = "diff in means", order = c("yes", "no"))
null_dist
## Response: median_salary (numeric)
## Explanatory: cpi_130 (factor)
## Null Hypothesis: independence
## # A tibble: 1,000 × 2
## replicate stat
## <int> <dbl>
## 1 1 74.7
## 2 2 9.46
## 3 3 -409.
## 4 4 59.5
## 5 5 262.
## 6 6 43.0
## 7 7 235.
## 8 8 -229.
## 9 9 -269.
## 10 10 -125.
## # ℹ 990 more rows
ggplot(data = null_dist, aes(x = stat)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
obs_diff_median <- world_salary2 %>%
filter(!is.na(median_salary), !is.na(cpi_130)) %>%
specify(median_salary ~ cpi_130) %>%
calculate(stat = "diff in means", order = c("yes", "no"))
obs_diff_median
## Response: median_salary (numeric)
## Explanatory: cpi_130 (factor)
## # A tibble: 1 × 1
## stat
## <dbl>
## 1 -1212.
null_dist %>%
get_p_value(obs_stat = obs_diff_median, direction = "two_sided")
## Warning: Please be cautious in reporting a p-value of 0. This result is an
## approximation based on the number of `reps` chosen in the `generate()` step.
## See `?get_p_value()` for more information.
## # A tibble: 1 × 1
## p_value
## <dbl>
## 1 0
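None of the 1,000 permutations produced a difference as extreme as the observed -1,212, so the p-value is reported as 0. As the warning notes, this is an approximation that depends on the number of reps and is better read as p < 0.001. On that basis we reject the null hypothesis that median salary is independent of whether the CPI exceeds 130.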
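For intuition about what `generate(type = "permute")` and `calculate(stat = "diff in means")` are doing, here is a minimal base R sketch of the same idea on made-up numbers. The data, the seed, and the names `salary`, `group`, and `perm_diffs` are all assumptions for illustration, not values from the analysis.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

# Made-up data: 10 "yes" and 10 "no" observations
salary <- c(rnorm(10, mean = 1200, sd = 400), rnorm(10, mean = 2400, sd = 400))
group  <- rep(c("yes", "no"), each = 10)

# Difference in group means, "yes" minus "no"
diff_in_means <- function(x, g) mean(x[g == "yes"]) - mean(x[g == "no"])
obs_diff <- diff_in_means(salary, group)

# Shuffle the labels to break any association, then recompute the statistic
reps <- 1000
perm_diffs <- replicate(reps, diff_in_means(salary, sample(group)))

# Two-sided p-value; adding 1 to numerator and denominator keeps it above zero
p_value <- (sum(abs(perm_diffs) >= abs(obs_diff)) + 1) / (reps + 1)
print(p_value)
```

The +1 correction is one common convention and is not what `get_p_value()` reports, which is why the infer output can show exactly 0.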