Which city or state is the ideal location for a MSBA student to work and live? Nowadays Big Data is a really hot topic. Accordingly, there is a big demand of data related jobs everywhere in USA, so it is difficult but necessary to choose the most suitable place. Wages should be considered at first, but other factors like Price Parity are also very important, because no matter how high a person’s wage is, his living quality can be reduced due to big living expenditure.
Why am I so interested in this problem? Because as a STEM program student, I plan to work in USA for a few years. Unlike many other Chinese students who go back to China right after graduation, I do believe that working and living in US will be one of the most valuable treasures in my life!
Packages Required
To reproduce the code and results throughout this project you will need to load the following packages.
library(tidyverse) #visualizeing, transforming, inputing, tidying and joining data
library(readxl) #inputing data from Excel
library(plotly) #showing data and plots at the same time
library(ggmap) #loading the function of maping
library(maps) #loading maps
library(mapdata) #loading map data
library(DT) #previewing the data sets
Data Preparation
I got data sets from different resources. I got Wage Data by State and by Job Characterisrics from Bureau of Labor Statistics(BLS), State Price Parity from Coursera, Metropolitan Area Price Parity from Bureau of Economic Analysis(BEA), State Longitude and Latitude from Ink Plant Code, Metropolitan Area Longitude and Latitude from StatCrunch.
Then I read these files into R.
excel_sheets("Rfinalprojectdata/mwe_2016complete.xlsx")
## [1] "File Information" "MWE_2016"
messwage <- read_excel("Rfinalprojectdata/mwe_2016complete.xlsx", sheet ="MWE_2016")
statelonglat <- read_excel("Rfinalprojectdata/longlat.xlsx")
metro <- read_excel("Rfinalprojectdata/metro.xlsx")
state_price_parity <- read_excel("Rfinalprojectdata/Coursera_price_parity.xlsx")
metro_price_parity <- read_excel("Rfinalprojectdata/metro_price_parity.xlsx")
The data is not very messy so I just need to select the variables I want to analyze.
Preview of the clean data sets.
library(DT)
datatable(head(wage,50))
Exploratory Data Analysis
1. the Wages of Data Related Jobs Differ on National Level
First, on national level, I draw some boxplots to analyze the differences of Average Hourly Wage. Of course Marketing Managers have the highest wages, and then Software and Information jobs, and then Analysts.
wage %>%
group_by(`Occupation Text`) %>%
filter(grepl('National',`Area Level`)) %>%
filter(grepl('Analysts|Data|Intelligence|Software|Machine Learning|Marketing Manager|
Solutions|Consultant',`Occupation Text`)) %>%
filter(!grepl('Keyers',`Occupation Text`)) %>%
ggplot(aes(reorder(`Occupation Text`, `Average Hourly Wage`), `Average Hourly Wage`)) +
labs(x = 'Occupation') +
geom_boxplot() +
scale_y_continuous() +
ggtitle("Wage for Data Jobs") +
theme(plot.title = element_text(lineheight=.8, face="bold", hjust=0.5))+
coord_flip()
2. the Wages of Data Related Jobs Differ on State Level
Computer System Analysts have obersavations with the biggest quantity of states, which will be easier to compare the differences, so I take this job title for example.
csAnalysts <-
wage %>%
group_by(`region`) %>%
filter(grepl('State',`Area Level`)) %>%
filter(grepl('Computer Systems Analysts',`Occupation Text`)) %>%
summarise(median_wage = median(`Average Hourly Wage`)) %>%
left_join(statelonglat, by = "region")
ggplot(data = states) +
geom_polygon(aes(x = long, y = lat, group = group), fill = "grey40", color = "white") +
geom_point( data=csAnalysts, aes(x = long, y = lat, size=median_wage, color = median_wage)) +
scale_size( name="median_wage") +
scale_colour_gradientn(colours=rainbow(4))+
geom_text(data = csAnalysts,aes(x = long, y = lat, label = region),
size = 1.8, vjust = 0, hjust = 0)+
coord_fixed(1.3) +
guides(fill=FALSE) +
labs(x="Longitute", y="Latitude") +
ggtitle("Computer System Analyst Wages Across States") +
theme(plot.title = element_text(lineheight=.8, face="bold", hjust=0.5))
It seems that Computer System Analysts in New Jersey State and New York State are really paid well, right? But now let me take the Price Parity into consideration.
adjustwage <-
wage %>%
select(`Average Hourly Wage`, "Area Level", "region", "Occupation Text") %>%
left_join(state_price_parity) %>%
mutate(newwage = `Average Hourly Wage`/`Price Parity`*100)
AdjustcsAnalysts <-
adjustwage %>%
group_by(`region`) %>%
filter(grepl('State',`Area Level`)) %>%
filter(grepl('Computer Systems Analysts',`Occupation Text`)) %>%
summarise(median_newwage = median(newwage)) %>%
left_join(statelonglat, by = "region")
ggplot(data = states) +
geom_polygon(aes(x = long, y = lat, group = group), fill = "grey40", color = "white") +
geom_point( data=AdjustcsAnalysts, aes(x = long, y = lat, size=median_newwage, color = median_newwage)) +
scale_size( name="median_newwage") +
scale_colour_gradientn(colours=rainbow(4))+
geom_text(data = AdjustcsAnalysts,aes(x = long, y = lat, label = region),
size = 1.8, vjust = 0, hjust = 0) +
coord_fixed(1.3) +
guides(fill=FALSE) +
labs(x="Longitute", y="Latitude") +
ggtitle("Adjusted Computer System Analyst Wages Across States") +
theme(plot.title = element_text(lineheight=.8, face="bold", hjust=0.5))
So it is a totally different story now. We found that the Computer System analysts in Missouri and Ohio are the luckiest guys, right?
3. the Wages of Data Related Jobs Differ on Metropolitan Area Level
So now let’s have a look at Metropolitan Area.
metroadjustwage <-
wage %>%
select(`Average Hourly Wage`, "Area Level", "region", "Occupation Text") %>%
filter(grepl('Metro area',`Area Level`)) %>%
left_join(metro_price_parity) %>%
mutate(metronewwage = `Average Hourly Wage`/`Price Parity`*100)
MetroAdjustcsAnalysts <-
metroadjustwage %>%
group_by(`region`, `ID`) %>%
filter(grepl('Metro area',`Area Level`)) %>%
filter(grepl('Computer Systems Analysts',`Occupation Text`)) %>%
summarise(median_metronewwage = median(metronewwage)) %>%
left_join(metro, by = "region")
ggplot(data = states) +
geom_polygon(aes(x = long, y = lat, group = group), fill = "grey40", color = "white") +
geom_point( data = MetroAdjustcsAnalysts, aes(x = long, y = lat, size=median_metronewwage, color = median_metronewwage)) +
scale_size( name="median_newwage") +
scale_colour_gradientn(colours=rainbow(4))+
geom_text(data = MetroAdjustcsAnalysts,aes(x = long, y = lat, label = ID),
size = 1.2, vjust = 0.5, hjust = 0.5) +
coord_fixed(1.3) +
guides(fill=FALSE) +
labs(x="Longitute", y="Latitude") +
ggtitle("Adjusted Computer System Analyst Wages Across Metro Areas") +
theme(plot.title = element_text(lineheight=.8, face="bold", hjust=0.5))
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_text).
So Computer System Analysts in St.Louis, MO-IL, and Cincinnati, OH-KY-IN are best paid. And here I attach the Metropolitan Area list with ID.
| ID | Metropolitan Area |
|---|---|
| 1 | Abilene, TX |
| 2 | Akron, OH |
| 3 | Albany, GA |
| 4 | Albany, OR |
| 5 | Albany-Schenectady-Troy, NY |
| 6 | Albuquerque, NM |
| 7 | Alexandria, LA |
| 8 | Allentown-Bethlehem-Easton, PA-NJ |
| 9 | Altoona, PA |
| 10 | Amarillo, TX |
| 11 | Ames, IA |
| 12 | Anchorage, AK |
| 13 | Ann Arbor, MI |
| 14 | Anniston-Oxford-Jacksonville, AL |
| 15 | Appleton, WI |
| 16 | Asheville, NC |
| 17 | Athens-Clarke County, GA |
| 18 | Atlanta-Sandy Springs-Roswell, GA |
| 19 | Atlantic City-Hammonton, NJ |
| 20 | Auburn-Opelika, AL |
| 21 | Augusta-Richmond County, GA-SC |
| 22 | Austin-Round Rock, TX |
| 23 | Bakersfield, CA |
| 24 | Baltimore-Columbia-Towson, MD |
| 25 | Bangor, ME |
| 26 | Barnstable Town, MA |
| 27 | Baton Rouge, LA |
| 28 | Battle Creek, MI |
| 29 | Bay City, MI |
| 30 | Beaumont-Port Arthur, TX |
| 31 | Beckley, WV |
| 32 | Bellingham, WA |
| 33 | Bend-Redmond, OR |
| 34 | Billings, MT |
| 35 | Binghamton, NY |
| 36 | Birmingham-Hoover, AL |
| 37 | Bismarck, ND |
| 38 | Blacksburg-Christiansburg-Radford, VA |
| 39 | Bloomington, IL |
| 40 | Bloomington, IN |
| 41 | Bloomsburg-Berwick, PA |
| 42 | Boise City, ID |
| 43 | Boston-Cambridge-Newton, MA-NH |
| 44 | Boulder, CO |
| 45 | Bowling Green, KY |
| 46 | Bremerton-Silverdale, WA |
| 47 | Bridgeport-Stamford-Norwalk, CT |
| 48 | Brownsville-Harlingen, TX |
| 49 | Brunswick, GA |
| 50 | Buffalo-Cheektowaga-Niagara Falls, NY |
| 51 | Burlington, NC |
| 52 | Burlington-South Burlington, VT |
| 53 | California-Lexington Park, MD |
| 54 | Canton-Massillon, OH |
| 55 | Cape Coral-Fort Myers, FL |
| 56 | Cape Girardeau, MO-IL |
| 57 | Carbondale-Marion, IL |
| 58 | Carson City, NV |
| 59 | Casper, WY |
| 60 | Cedar Rapids, IA |
| 61 | Chambersburg-Waynesboro, PA |
| 62 | Champaign-Urbana, IL |
| 63 | Charleston, WV |
| 64 | Charleston-North Charleston, SC |
| 65 | Charlotte-Concord-Gastonia, NC-SC |
| 66 | Charlottesville, VA |
| 67 | Chattanooga, TN-GA |
| 68 | Cheyenne, WY |
| 69 | Chicago-Naperville-Elgin, IL-IN-WI |
| 70 | Chico, CA |
| 71 | Cincinnati, OH-KY-IN |
| 72 | Clarksville, TN-KY |
| 73 | Cleveland, TN |
| 74 | Cleveland-Elyria, OH |
| 75 | Coeur d’Alene, ID |
| 76 | College Station-Bryan, TX |
| 77 | Colorado Springs, CO |
| 78 | Columbia, MO |
| 79 | Columbia, SC |
| 80 | Columbus, GA-AL |
| 81 | Columbus, IN |
| 82 | Columbus, OH |
| 83 | Corpus Christi, TX |
| 84 | Corvallis, OR |
| 85 | Crestview-Fort Walton Beach-Destin, FL |
| 86 | Cumberland, MD-WV |
| 87 | Dallas-Fort Worth-Arlington, TX |
| 88 | Dalton, GA |
| 89 | Danville, IL |
| 90 | Daphne-Fairhope-Foley, AL |
| 91 | Davenport-Moline-Rock Island, IA-IL |
| 92 | Dayton, OH |
| 93 | Decatur, AL |
| 94 | Decatur, IL |
| 95 | Deltona-Daytona Beach-Ormond Beach, FL |
| 96 | Denver-Aurora-Lakewood, CO |
| 97 | Des Moines-West Des Moines, IA |
| 98 | Detroit-Warren-Dearborn, MI |
| 99 | Dothan, AL |
| 100 | Dover, DE |
| 101 | Dubuque, IA |
| 102 | Duluth, MN-WI |
| 103 | Durham-Chapel Hill, NC |
| 104 | East Stroudsburg, PA |
| 105 | Eau Claire, WI |
| 106 | El Centro, CA |
| 107 | Elizabethtown-Fort Knox, KY |
| 108 | Elkhart-Goshen, IN |
| 109 | Elmira, NY |
| 110 | El Paso, TX |
| 111 | Enid, OK |
| 112 | Erie, PA |
| 113 | Eugene, OR |
| 114 | Evansville, IN-KY |
| 115 | Fairbanks, AK |
| 116 | Fargo, ND-MN |
| 117 | Farmington, NM |
| 118 | Fayetteville, NC |
| 119 | Fayetteville-Springdale-Rogers, AR-MO |
| 120 | Flagstaff, AZ |
| 121 | Flint, MI |
| 122 | Florence, SC |
| 123 | Florence-Muscle Shoals, AL |
| 124 | Fond du Lac, WI |
| 125 | Fort Collins, CO |
| 126 | Fort Smith, AR-OK |
| 127 | Fort Wayne, IN |
| 128 | Fresno, CA |
| 129 | Gadsden, AL |
| 130 | Gainesville, FL |
| 131 | Gainesville, GA |
| 132 | Gettysburg, PA |
| 133 | Glens Falls, NY |
| 134 | Goldsboro, NC |
| 135 | Grand Forks, ND-MN |
| 136 | Grand Island, NE |
| 137 | Grand Junction, CO |
| 138 | Grand Rapids-Wyoming, MI |
| 139 | Grants Pass, OR |
| 140 | Great Falls, MT |
| 141 | Greeley, CO |
| 142 | Green Bay, WI |
| 143 | Greensboro-High Point, NC |
| 144 | Greenville, NC |
| 145 | Greenville-Anderson-Mauldin, SC |
| 146 | Gulfport-Biloxi-Pascagoula, MS |
| 147 | Hagerstown-Martinsburg, MD-WV |
| 148 | Hammond, LA |
| 149 | Hanford-Corcoran, CA |
| 150 | Harrisburg-Carlisle, PA |
| 151 | Harrisonburg, VA |
| 152 | Hartford-West Hartford-East Hartford, CT |
| 153 | Hattiesburg, MS |
| 154 | Hickory-Lenoir-Morganton, NC |
| 155 | Hilton Head Island-Bluffton-Beaufort, SC |
| 156 | Hinesville, GA |
| 157 | Homosassa Springs, FL |
| 158 | Hot Springs, AR |
| 159 | Houma-Thibodaux, LA |
| 160 | Houston-The Woodlands-Sugar Land, TX |
| 161 | Huntington-Ashland, WV-KY-OH |
| 162 | Huntsville, AL |
| 163 | Idaho Falls, ID |
| 164 | Indianapolis-Carmel-Anderson, IN |
| 165 | Iowa City, IA |
| 166 | Ithaca, NY |
| 167 | Jackson, MI |
| 168 | Jackson, MS |
| 169 | Jackson, TN |
| 170 | Jacksonville, FL |
| 171 | Jacksonville, NC |
| 172 | Janesville-Beloit, WI |
| 173 | Jefferson City, MO |
| 174 | Johnson City, TN |
| 175 | Johnstown, PA |
| 176 | Jonesboro, AR |
| 177 | Joplin, MO |
| 178 | Kahului-Wailuku-Lahaina, HI |
| 179 | Kalamazoo-Portage, MI |
| 180 | Kankakee, IL |
| 181 | Kansas City, MO-KS |
| 182 | Kennewick-Richland, WA |
| 183 | Killeen-Temple, TX |
| 184 | Kingsport-Bristol-Bristol, TN-VA |
| 185 | Kingston, NY |
| 186 | Knoxville, TN |
| 187 | Kokomo, IN |
| 188 | La Crosse-Onalaska, WI-MN |
| 189 | Lafayette, LA |
| 190 | Lafayette-West Lafayette, IN |
| 191 | Lake Charles, LA |
| 192 | Lake Havasu City-Kingman, AZ |
| 193 | Lakeland-Winter Haven, FL |
| 194 | Lancaster, PA |
| 195 | Lansing-East Lansing, MI |
| 196 | Laredo, TX |
| 197 | Las Cruces, NM |
| 198 | Las Vegas-Henderson-Paradise, NV |
| 199 | Lawrence, KS |
| 200 | Lawton, OK |
| 201 | Lebanon, PA |
| 202 | Lewiston, ID-WA |
| 203 | Lewiston-Auburn, ME |
| 204 | Lexington-Fayette, KY |
| 205 | Lima, OH |
| 206 | Lincoln, NE |
| 207 | Little Rock-North Little Rock-Conway, AR |
| 208 | Logan, UT-ID |
| 209 | Longview, TX |
| 210 | Longview, WA |
| 211 | Los Angeles-Long Beach-Anaheim, CA |
| 212 | Louisville/Jefferson County, KY-IN |
| 213 | Lubbock, TX |
| 214 | Lynchburg, VA |
| 215 | Macon-Bibb County, GA |
| 216 | Madera, CA |
| 217 | Madison, WI |
| 218 | Manchester-Nashua, NH |
| 219 | Manhattan, KS |
| 220 | Mankato-North Mankato, MN |
| 221 | Mansfield, OH |
| 222 | McAllen-Edinburg-Mission, TX |
| 223 | Medford, OR |
| 224 | Memphis, TN-MS-AR |
| 225 | Merced, CA |
| 226 | Miami-Fort Lauderdale-West Palm Beach, FL |
| 227 | Michigan City-La Porte, IN |
| 228 | Midland, MI |
| 229 | Midland, TX |
| 230 | Milwaukee-Waukesha-West Allis, WI |
| 231 | Minneapolis-St. Paul-Bloomington, MN-WI |
| 232 | Missoula, MT |
| 233 | Mobile, AL |
| 234 | Modesto, CA |
| 235 | Monroe, LA |
| 236 | Monroe, MI |
| 237 | Montgomery, AL |
| 238 | Morgantown, WV |
| 239 | Morristown, TN |
| 240 | Mount Vernon-Anacortes, WA |
| 241 | Muncie, IN |
| 242 | Muskegon, MI |
| 243 | Myrtle Beach-Conway-North Myrtle Beach, SC-NC |
| 244 | Napa, CA |
| 245 | Naples-Immokalee-Marco Island, FL |
| 246 | Nashville-Davidson–Murfreesboro–Franklin, TN |
| 247 | New Bern, NC |
| 248 | New Haven-Milford, CT |
| 249 | New Orleans-Metairie, LA |
| 250 | New York-Newark-Jersey City, NY-NJ-PA |
| 251 | Niles-Benton Harbor, MI |
| 252 | North Port-Sarasota-Bradenton, FL |
| 253 | Norwich-New London, CT |
| 254 | Ocala, FL |
| 255 | Ocean City, NJ |
| 256 | Odessa, TX |
| 257 | Ogden-Clearfield, UT |
| 258 | Oklahoma City, OK |
| 259 | Olympia-Tumwater, WA |
| 260 | Omaha-Council Bluffs, NE-IA |
| 261 | Orlando-Kissimmee-Sanford, FL |
| 262 | Oshkosh-Neenah, WI |
| 263 | Owensboro, KY |
| 264 | Oxnard-Thousand Oaks-Ventura, CA |
| 265 | Palm Bay-Melbourne-Titusville, FL |
| 266 | Panama City, FL |
| 267 | Parkersburg-Vienna, WV |
| 268 | Pensacola-Ferry Pass-Brent, FL |
| 269 | Peoria, IL |
| 270 | Philadelphia-Camden-Wilmington, PA-NJ-DE-MD |
| 271 | Phoenix-Mesa-Scottsdale, AZ |
| 272 | Pine Bluff, AR |
| 273 | Pittsburgh, PA |
| 274 | Pittsfield, MA |
| 275 | Pocatello, ID |
| 276 | Portland-South Portland, ME |
| 277 | Portland-Vancouver-Hillsboro, OR-WA |
| 278 | Port St. Lucie, FL |
| 279 | Prescott, AZ |
| 280 | Providence-Warwick, RI-MA |
| 281 | Provo-Orem, UT |
| 282 | Pueblo, CO |
| 283 | Punta Gorda, FL |
| 284 | Racine, WI |
| 285 | Raleigh, NC |
| 286 | Rapid City, SD |
| 287 | Reading, PA |
| 288 | Redding, CA |
| 289 | Reno, NV |
| 290 | Richmond, VA |
| 291 | Riverside-San Bernardino-Ontario, CA |
| 292 | Roanoke, VA |
| 293 | Rochester, MN |
| 294 | Rochester, NY |
| 295 | Rockford, IL |
| 296 | Rocky Mount, NC |
| 297 | Rome, GA |
| 298 | Sacramento–Roseville–Arden-Arcade, CA |
| 299 | Saginaw, MI |
| 300 | St. Cloud, MN |
| 301 | St. George, UT |
| 302 | St. Joseph, MO-KS |
| 303 | St. Louis, MO-IL |
| 304 | Salem, OR |
| 305 | Salinas, CA |
| 306 | Salisbury, MD-DE |
| 307 | Salt Lake City, UT |
| 308 | San Angelo, TX |
| 309 | San Antonio-New Braunfels, TX |
| 310 | San Diego-Carlsbad, CA |
| 311 | San Francisco-Oakland-Hayward, CA |
| 312 | San Jose-Sunnyvale-Santa Clara, CA |
| 313 | San Luis Obispo-Paso Robles-Arroyo Grande, CA |
| 314 | Santa Cruz-Watsonville, CA |
| 315 | Santa Fe, NM |
| 316 | Santa Maria-Santa Barbara, CA |
| 317 | Santa Rosa, CA |
| 318 | Savannah, GA |
| 319 | Scranton–Wilkes-Barre–Hazleton, PA |
| 320 | Seattle-Tacoma-Bellevue, WA |
| 321 | Sebastian-Vero Beach, FL |
| 322 | Sebring, FL |
| 323 | Sheboygan, WI |
| 324 | Sherman-Denison, TX |
| 325 | Shreveport-Bossier City, LA |
| 326 | Sierra Vista-Douglas, AZ |
| 327 | Sioux City, IA-NE-SD |
| 328 | Sioux Falls, SD |
| 329 | South Bend-Mishawaka, IN-MI |
| 330 | Spartanburg, SC |
| 331 | Spokane-Spokane Valley, WA |
| 332 | Springfield, IL |
| 333 | Springfield, MA |
| 334 | Springfield, MO |
| 335 | Springfield, OH |
| 336 | State College, PA |
| 337 | Staunton-Waynesboro, VA |
| 338 | Stockton-Lodi, CA |
| 339 | Sumter, SC |
| 340 | Syracuse, NY |
| 341 | Tallahassee, FL |
| 342 | Tampa-St. Petersburg-Clearwater, FL |
| 343 | Terre Haute, IN |
| 344 | Texarkana, TX-AR |
| 345 | The Villages, FL |
| 346 | Toledo, OH |
| 347 | Topeka, KS |
| 348 | Trenton, NJ |
| 349 | Tucson, AZ |
| 350 | Tulsa, OK |
| 351 | Tuscaloosa, AL |
| 352 | Tyler, TX |
| 353 | Urban Honolulu, HI |
| 354 | Utica-Rome, NY |
| 355 | Valdosta, GA |
| 356 | Vallejo-Fairfield, CA |
| 357 | Victoria, TX |
| 358 | Vineland-Bridgeton, NJ |
| 359 | Virginia Beach-Norfolk-Newport News, VA-NC |
| 360 | Visalia-Porterville, CA |
| 361 | Waco, TX |
| 362 | Walla Walla, WA |
| 363 | Warner Robins, GA |
| 364 | Washington-Arlington-Alexandria, DC-VA-MD-WV |
| 365 | Waterloo-Cedar Falls, IA |
| 366 | Watertown-Fort Drum, NY |
| 367 | Wausau, WI |
| 368 | Weirton-Steubenville, WV-OH |
| 369 | Wenatchee, WA |
| 370 | Wheeling, WV-OH |
| 371 | Wichita, KS |
| 372 | Wichita Falls, TX |
| 373 | Williamsport, PA |
| 374 | Wilmington, NC |
| 375 | Winchester, VA-WV |
| 376 | Winston-Salem, NC |
| 377 | Worcester, MA-CT |
| 378 | Yakima, WA |
| 379 | York-Hanover, PA |
| 380 | Youngstown-Warren-Boardman, OH-PA |
| 381 | Yuba City, CA |
| 382 | Yuma, AZ |
Summary
TBD…