Welcome to the draft of my final project!

Introduction

Introduction

Which city or state is the ideal location for a MSBA student to work and live? Nowadays Big Data is a really hot topic. Accordingly, there is a big demand of data related jobs everywhere in USA, so it is difficult but necessary to choose the most suitable place. Wages should be considered at first, but other factors like Price Parity are also very important, because no matter how high a person’s wage is, his living quality can be reduced due to big living expenditure.

Why am I so interested in this problem? Because as a STEM program student, I plan to work in USA for a few years. Unlike many other Chinese students who go back to China right after graduation, I do believe that working and living in US will be one of the most valuable treasures in my life!

Packages Required

Packages Required

To reproduce the code and results throughout this project you will need to load the following packages.

library(tidyverse) #visualizeing, transforming, inputing, tidying and joining data
library(readxl)    #inputing data from Excel
library(plotly)    #showing data and plots at the same time
library(ggmap)     #loading the function of maping
library(maps)      #loading maps
library(mapdata)   #loading map data
library(DT)        #previewing the data sets

Data Preparation

Data Preparation

I got data sets from different resources. I got Wage Data by State and by Job Characterisrics from Bureau of Labor Statistics(BLS), State Price Parity from Coursera, Metropolitan Area Price Parity from Bureau of Economic Analysis(BEA), State Longitude and Latitude from Ink Plant Code, Metropolitan Area Longitude and Latitude from StatCrunch.

Then I read these files into R.

excel_sheets("Rfinalprojectdata/mwe_2016complete.xlsx")
## [1] "File Information" "MWE_2016"
messwage <- read_excel("Rfinalprojectdata/mwe_2016complete.xlsx", sheet ="MWE_2016")
statelonglat <- read_excel("Rfinalprojectdata/longlat.xlsx")
metro <- read_excel("Rfinalprojectdata/metro.xlsx")
state_price_parity <- read_excel("Rfinalprojectdata/Coursera_price_parity.xlsx")
metro_price_parity <- read_excel("Rfinalprojectdata/metro_price_parity.xlsx")

The data is not very messy so I just need to select the variables I want to analyze.

Preview of the clean data sets.

library(DT)
datatable(head(wage,50))

Exploratory Data Analysis

Exploratory Data Analysis

1. the Wages of Data Related Jobs Differ on National Level

First, on national level, I draw some boxplots to analyze the differences of Average Hourly Wage. Of course Marketing Managers have the highest wages, and then Software and Information jobs, and then Analysts.

wage %>%
  group_by(`Occupation Text`) %>%
  filter(grepl('National',`Area Level`)) %>%
  filter(grepl('Analysts|Data|Intelligence|Software|Machine Learning|Marketing Manager|
               Solutions|Consultant',`Occupation Text`)) %>%
  filter(!grepl('Keyers',`Occupation Text`)) %>%
  ggplot(aes(reorder(`Occupation Text`, `Average Hourly Wage`), `Average Hourly Wage`)) +
  labs(x = 'Occupation') +
  geom_boxplot() +
  scale_y_continuous() +
  ggtitle("Wage for Data Jobs") +
  theme(plot.title = element_text(lineheight=.8, face="bold", hjust=0.5))+
  coord_flip()

2. the Wages of Data Related Jobs Differ on State Level

Computer System Analysts have obersavations with the biggest quantity of states, which will be easier to compare the differences, so I take this job title for example.

csAnalysts <-
wage %>%
  group_by(`region`) %>%
  filter(grepl('State',`Area Level`)) %>%
  filter(grepl('Computer Systems Analysts',`Occupation Text`)) %>%
  summarise(median_wage = median(`Average Hourly Wage`)) %>%
  left_join(statelonglat, by = "region")

ggplot(data = states) + 
  geom_polygon(aes(x = long, y = lat, group = group), fill = "grey40", color = "white") +
  geom_point( data=csAnalysts, aes(x = long, y = lat, size=median_wage, color = median_wage)) +
  scale_size( name="median_wage") +
  scale_colour_gradientn(colours=rainbow(4))+
  geom_text(data = csAnalysts,aes(x = long, y = lat, label = region), 
            size = 1.8, vjust = 0, hjust = 0)+
  coord_fixed(1.3) +
  guides(fill=FALSE) +
  labs(x="Longitute", y="Latitude") +
  ggtitle("Computer System Analyst Wages Across States") +
  theme(plot.title = element_text(lineheight=.8, face="bold", hjust=0.5))

It seems that Computer System Analysts in New Jersey State and New York State are really paid well, right? But now let me take the Price Parity into consideration.

adjustwage <-
  wage %>%
  select(`Average Hourly Wage`, "Area Level", "region", "Occupation Text") %>%
  left_join(state_price_parity) %>%
  mutate(newwage = `Average Hourly Wage`/`Price Parity`*100)

AdjustcsAnalysts <-
adjustwage %>%
  group_by(`region`) %>%
  filter(grepl('State',`Area Level`)) %>%
  filter(grepl('Computer Systems Analysts',`Occupation Text`)) %>%
  summarise(median_newwage = median(newwage)) %>%
  left_join(statelonglat, by = "region")

ggplot(data = states) + 
  geom_polygon(aes(x = long, y = lat, group = group), fill = "grey40", color = "white") +
  geom_point( data=AdjustcsAnalysts, aes(x = long, y = lat, size=median_newwage, color = median_newwage)) +
  scale_size( name="median_newwage") +
  scale_colour_gradientn(colours=rainbow(4))+
  geom_text(data = AdjustcsAnalysts,aes(x = long, y = lat, label = region), 
            size = 1.8, vjust = 0, hjust = 0) +
  coord_fixed(1.3) +
  guides(fill=FALSE) +
  labs(x="Longitute", y="Latitude") +
  ggtitle("Adjusted Computer System Analyst Wages Across States") +
  theme(plot.title = element_text(lineheight=.8, face="bold", hjust=0.5))

So it is a totally different story now. We found that the Computer System analysts in Missouri and Ohio are the luckiest guys, right?

3. the Wages of Data Related Jobs Differ on Metropolitan Area Level

So now let’s have a look at Metropolitan Area.

metroadjustwage <-
  wage %>%
  select(`Average Hourly Wage`, "Area Level", "region", "Occupation Text") %>%
  filter(grepl('Metro area',`Area Level`)) %>%
  left_join(metro_price_parity) %>%
  mutate(metronewwage = `Average Hourly Wage`/`Price Parity`*100)

MetroAdjustcsAnalysts <-
metroadjustwage %>%
  group_by(`region`, `ID`) %>%
  filter(grepl('Metro area',`Area Level`)) %>%
  filter(grepl('Computer Systems Analysts',`Occupation Text`)) %>%
  summarise(median_metronewwage = median(metronewwage)) %>%
  left_join(metro, by = "region")

ggplot(data = states) + 
  geom_polygon(aes(x = long, y = lat, group = group), fill = "grey40", color = "white") +
  geom_point( data = MetroAdjustcsAnalysts, aes(x = long, y = lat, size=median_metronewwage, color = median_metronewwage)) +
  scale_size( name="median_newwage") +
  scale_colour_gradientn(colours=rainbow(4))+
  geom_text(data = MetroAdjustcsAnalysts,aes(x = long, y = lat, label =  ID), 
            size = 1.2, vjust = 0.5, hjust = 0.5) +
  coord_fixed(1.3) +
  guides(fill=FALSE) +
  labs(x="Longitute", y="Latitude") +
  ggtitle("Adjusted Computer System Analyst Wages Across Metro Areas") +
  theme(plot.title = element_text(lineheight=.8, face="bold", hjust=0.5))
## Warning: Removed 5 rows containing missing values (geom_point).
## Warning: Removed 5 rows containing missing values (geom_text).

So Computer System Analysts in St.Louis, MO-IL, and Cincinnati, OH-KY-IN are best paid. And here I attach the Metropolitan Area list with ID.

ID Metropolitan Area
1 Abilene, TX
2 Akron, OH
3 Albany, GA
4 Albany, OR
5 Albany-Schenectady-Troy, NY
6 Albuquerque, NM
7 Alexandria, LA
8 Allentown-Bethlehem-Easton, PA-NJ
9 Altoona, PA
10 Amarillo, TX
11 Ames, IA
12 Anchorage, AK
13 Ann Arbor, MI
14 Anniston-Oxford-Jacksonville, AL
15 Appleton, WI
16 Asheville, NC
17 Athens-Clarke County, GA
18 Atlanta-Sandy Springs-Roswell, GA
19 Atlantic City-Hammonton, NJ
20 Auburn-Opelika, AL
21 Augusta-Richmond County, GA-SC
22 Austin-Round Rock, TX
23 Bakersfield, CA
24 Baltimore-Columbia-Towson, MD
25 Bangor, ME
26 Barnstable Town, MA
27 Baton Rouge, LA
28 Battle Creek, MI
29 Bay City, MI
30 Beaumont-Port Arthur, TX
31 Beckley, WV
32 Bellingham, WA
33 Bend-Redmond, OR
34 Billings, MT
35 Binghamton, NY
36 Birmingham-Hoover, AL
37 Bismarck, ND
38 Blacksburg-Christiansburg-Radford, VA
39 Bloomington, IL
40 Bloomington, IN
41 Bloomsburg-Berwick, PA
42 Boise City, ID
43 Boston-Cambridge-Newton, MA-NH
44 Boulder, CO
45 Bowling Green, KY
46 Bremerton-Silverdale, WA
47 Bridgeport-Stamford-Norwalk, CT
48 Brownsville-Harlingen, TX
49 Brunswick, GA
50 Buffalo-Cheektowaga-Niagara Falls, NY
51 Burlington, NC
52 Burlington-South Burlington, VT
53 California-Lexington Park, MD
54 Canton-Massillon, OH
55 Cape Coral-Fort Myers, FL
56 Cape Girardeau, MO-IL
57 Carbondale-Marion, IL
58 Carson City, NV
59 Casper, WY
60 Cedar Rapids, IA
61 Chambersburg-Waynesboro, PA
62 Champaign-Urbana, IL
63 Charleston, WV
64 Charleston-North Charleston, SC
65 Charlotte-Concord-Gastonia, NC-SC
66 Charlottesville, VA
67 Chattanooga, TN-GA
68 Cheyenne, WY
69 Chicago-Naperville-Elgin, IL-IN-WI
70 Chico, CA
71 Cincinnati, OH-KY-IN
72 Clarksville, TN-KY
73 Cleveland, TN
74 Cleveland-Elyria, OH
75 Coeur d’Alene, ID
76 College Station-Bryan, TX
77 Colorado Springs, CO
78 Columbia, MO
79 Columbia, SC
80 Columbus, GA-AL
81 Columbus, IN
82 Columbus, OH
83 Corpus Christi, TX
84 Corvallis, OR
85 Crestview-Fort Walton Beach-Destin, FL
86 Cumberland, MD-WV
87 Dallas-Fort Worth-Arlington, TX
88 Dalton, GA
89 Danville, IL
90 Daphne-Fairhope-Foley, AL
91 Davenport-Moline-Rock Island, IA-IL
92 Dayton, OH
93 Decatur, AL
94 Decatur, IL
95 Deltona-Daytona Beach-Ormond Beach, FL
96 Denver-Aurora-Lakewood, CO
97 Des Moines-West Des Moines, IA
98 Detroit-Warren-Dearborn, MI
99 Dothan, AL
100 Dover, DE
101 Dubuque, IA
102 Duluth, MN-WI
103 Durham-Chapel Hill, NC
104 East Stroudsburg, PA
105 Eau Claire, WI
106 El Centro, CA
107 Elizabethtown-Fort Knox, KY
108 Elkhart-Goshen, IN
109 Elmira, NY
110 El Paso, TX
111 Enid, OK
112 Erie, PA
113 Eugene, OR
114 Evansville, IN-KY
115 Fairbanks, AK
116 Fargo, ND-MN
117 Farmington, NM
118 Fayetteville, NC
119 Fayetteville-Springdale-Rogers, AR-MO
120 Flagstaff, AZ
121 Flint, MI
122 Florence, SC
123 Florence-Muscle Shoals, AL
124 Fond du Lac, WI
125 Fort Collins, CO
126 Fort Smith, AR-OK
127 Fort Wayne, IN
128 Fresno, CA
129 Gadsden, AL
130 Gainesville, FL
131 Gainesville, GA
132 Gettysburg, PA
133 Glens Falls, NY
134 Goldsboro, NC
135 Grand Forks, ND-MN
136 Grand Island, NE
137 Grand Junction, CO
138 Grand Rapids-Wyoming, MI
139 Grants Pass, OR
140 Great Falls, MT
141 Greeley, CO
142 Green Bay, WI
143 Greensboro-High Point, NC
144 Greenville, NC
145 Greenville-Anderson-Mauldin, SC
146 Gulfport-Biloxi-Pascagoula, MS
147 Hagerstown-Martinsburg, MD-WV
148 Hammond, LA
149 Hanford-Corcoran, CA
150 Harrisburg-Carlisle, PA
151 Harrisonburg, VA
152 Hartford-West Hartford-East Hartford, CT
153 Hattiesburg, MS
154 Hickory-Lenoir-Morganton, NC
155 Hilton Head Island-Bluffton-Beaufort, SC
156 Hinesville, GA
157 Homosassa Springs, FL
158 Hot Springs, AR
159 Houma-Thibodaux, LA
160 Houston-The Woodlands-Sugar Land, TX
161 Huntington-Ashland, WV-KY-OH
162 Huntsville, AL
163 Idaho Falls, ID
164 Indianapolis-Carmel-Anderson, IN
165 Iowa City, IA
166 Ithaca, NY
167 Jackson, MI
168 Jackson, MS
169 Jackson, TN
170 Jacksonville, FL
171 Jacksonville, NC
172 Janesville-Beloit, WI
173 Jefferson City, MO
174 Johnson City, TN
175 Johnstown, PA
176 Jonesboro, AR
177 Joplin, MO
178 Kahului-Wailuku-Lahaina, HI
179 Kalamazoo-Portage, MI
180 Kankakee, IL
181 Kansas City, MO-KS
182 Kennewick-Richland, WA
183 Killeen-Temple, TX
184 Kingsport-Bristol-Bristol, TN-VA
185 Kingston, NY
186 Knoxville, TN
187 Kokomo, IN
188 La Crosse-Onalaska, WI-MN
189 Lafayette, LA
190 Lafayette-West Lafayette, IN
191 Lake Charles, LA
192 Lake Havasu City-Kingman, AZ
193 Lakeland-Winter Haven, FL
194 Lancaster, PA
195 Lansing-East Lansing, MI
196 Laredo, TX
197 Las Cruces, NM
198 Las Vegas-Henderson-Paradise, NV
199 Lawrence, KS
200 Lawton, OK
201 Lebanon, PA
202 Lewiston, ID-WA
203 Lewiston-Auburn, ME
204 Lexington-Fayette, KY
205 Lima, OH
206 Lincoln, NE
207 Little Rock-North Little Rock-Conway, AR
208 Logan, UT-ID
209 Longview, TX
210 Longview, WA
211 Los Angeles-Long Beach-Anaheim, CA
212 Louisville/Jefferson County, KY-IN
213 Lubbock, TX
214 Lynchburg, VA
215 Macon-Bibb County, GA
216 Madera, CA
217 Madison, WI
218 Manchester-Nashua, NH
219 Manhattan, KS
220 Mankato-North Mankato, MN
221 Mansfield, OH
222 McAllen-Edinburg-Mission, TX
223 Medford, OR
224 Memphis, TN-MS-AR
225 Merced, CA
226 Miami-Fort Lauderdale-West Palm Beach, FL
227 Michigan City-La Porte, IN
228 Midland, MI
229 Midland, TX
230 Milwaukee-Waukesha-West Allis, WI
231 Minneapolis-St. Paul-Bloomington, MN-WI
232 Missoula, MT
233 Mobile, AL
234 Modesto, CA
235 Monroe, LA
236 Monroe, MI
237 Montgomery, AL
238 Morgantown, WV
239 Morristown, TN
240 Mount Vernon-Anacortes, WA
241 Muncie, IN
242 Muskegon, MI
243 Myrtle Beach-Conway-North Myrtle Beach, SC-NC
244 Napa, CA
245 Naples-Immokalee-Marco Island, FL
246 Nashville-Davidson–Murfreesboro–Franklin, TN
247 New Bern, NC
248 New Haven-Milford, CT
249 New Orleans-Metairie, LA
250 New York-Newark-Jersey City, NY-NJ-PA
251 Niles-Benton Harbor, MI
252 North Port-Sarasota-Bradenton, FL
253 Norwich-New London, CT
254 Ocala, FL
255 Ocean City, NJ
256 Odessa, TX
257 Ogden-Clearfield, UT
258 Oklahoma City, OK
259 Olympia-Tumwater, WA
260 Omaha-Council Bluffs, NE-IA
261 Orlando-Kissimmee-Sanford, FL
262 Oshkosh-Neenah, WI
263 Owensboro, KY
264 Oxnard-Thousand Oaks-Ventura, CA
265 Palm Bay-Melbourne-Titusville, FL
266 Panama City, FL
267 Parkersburg-Vienna, WV
268 Pensacola-Ferry Pass-Brent, FL
269 Peoria, IL
270 Philadelphia-Camden-Wilmington, PA-NJ-DE-MD
271 Phoenix-Mesa-Scottsdale, AZ
272 Pine Bluff, AR
273 Pittsburgh, PA
274 Pittsfield, MA
275 Pocatello, ID
276 Portland-South Portland, ME
277 Portland-Vancouver-Hillsboro, OR-WA
278 Port St. Lucie, FL
279 Prescott, AZ
280 Providence-Warwick, RI-MA
281 Provo-Orem, UT
282 Pueblo, CO
283 Punta Gorda, FL
284 Racine, WI
285 Raleigh, NC
286 Rapid City, SD
287 Reading, PA
288 Redding, CA
289 Reno, NV
290 Richmond, VA
291 Riverside-San Bernardino-Ontario, CA
292 Roanoke, VA
293 Rochester, MN
294 Rochester, NY
295 Rockford, IL
296 Rocky Mount, NC
297 Rome, GA
298 Sacramento–Roseville–Arden-Arcade, CA
299 Saginaw, MI
300 St. Cloud, MN
301 St. George, UT
302 St. Joseph, MO-KS
303 St. Louis, MO-IL
304 Salem, OR
305 Salinas, CA
306 Salisbury, MD-DE
307 Salt Lake City, UT
308 San Angelo, TX
309 San Antonio-New Braunfels, TX
310 San Diego-Carlsbad, CA
311 San Francisco-Oakland-Hayward, CA
312 San Jose-Sunnyvale-Santa Clara, CA
313 San Luis Obispo-Paso Robles-Arroyo Grande, CA
314 Santa Cruz-Watsonville, CA
315 Santa Fe, NM
316 Santa Maria-Santa Barbara, CA
317 Santa Rosa, CA
318 Savannah, GA
319 Scranton–Wilkes-Barre–Hazleton, PA
320 Seattle-Tacoma-Bellevue, WA
321 Sebastian-Vero Beach, FL
322 Sebring, FL
323 Sheboygan, WI
324 Sherman-Denison, TX
325 Shreveport-Bossier City, LA
326 Sierra Vista-Douglas, AZ
327 Sioux City, IA-NE-SD
328 Sioux Falls, SD
329 South Bend-Mishawaka, IN-MI
330 Spartanburg, SC
331 Spokane-Spokane Valley, WA
332 Springfield, IL
333 Springfield, MA
334 Springfield, MO
335 Springfield, OH
336 State College, PA
337 Staunton-Waynesboro, VA
338 Stockton-Lodi, CA
339 Sumter, SC
340 Syracuse, NY
341 Tallahassee, FL
342 Tampa-St. Petersburg-Clearwater, FL
343 Terre Haute, IN
344 Texarkana, TX-AR
345 The Villages, FL
346 Toledo, OH
347 Topeka, KS
348 Trenton, NJ
349 Tucson, AZ
350 Tulsa, OK
351 Tuscaloosa, AL
352 Tyler, TX
353 Urban Honolulu, HI
354 Utica-Rome, NY
355 Valdosta, GA
356 Vallejo-Fairfield, CA
357 Victoria, TX
358 Vineland-Bridgeton, NJ
359 Virginia Beach-Norfolk-Newport News, VA-NC
360 Visalia-Porterville, CA
361 Waco, TX
362 Walla Walla, WA
363 Warner Robins, GA
364 Washington-Arlington-Alexandria, DC-VA-MD-WV
365 Waterloo-Cedar Falls, IA
366 Watertown-Fort Drum, NY
367 Wausau, WI
368 Weirton-Steubenville, WV-OH
369 Wenatchee, WA
370 Wheeling, WV-OH
371 Wichita, KS
372 Wichita Falls, TX
373 Williamsport, PA
374 Wilmington, NC
375 Winchester, VA-WV
376 Winston-Salem, NC
377 Worcester, MA-CT
378 Yakima, WA
379 York-Hanover, PA
380 Youngstown-Warren-Boardman, OH-PA
381 Yuba City, CA
382 Yuma, AZ

Summary

Summary

TBD…