There is a quote, “If you want to catch a fish, you have to go where the fish are!”
The question that my teammate Raj and I will attempt to answer is “What places are ideal for technology and management people with families to live?” There is strong demand for technology and management skills everywhere, but many factors go into making a place ideal. One key consideration is the availability of ample work opportunities together with strong wages. But stronger wages in some areas (like California, Boston or NY) also come with a higher cost of living, which can be gauged from the Consumer Price Index and from local taxes. Other factors, such as the number of other families in the area and its diversity, also play a big role in bringing up a family. In addition, we plan to search for and analyze further parameters, such as the availability of recreational facilities like parks in the area.
We will load the data through CSV files, web-scraping and APIs.
1. Wage Data: https://www.bls.gov/mwe/
2. Consumer Price Index Data: https://www.bls.gov/cpi/
3. Unemployment Data: https://www.bls.gov/web/laus/laumstrk.htm
4. State and National Parks: https://en.wikipedia.org/wiki/Lists_of_state_parks_by_U.S._state
5. Census Data: https://www.census.gov/data/datasets/2017/demo/popest/nation-detail.html#ds
6. Average House Prices: https://www.statisticbrain.com/home-sales-average-price/
7. Job Search Engines: https://www.indeed.com
Once the data is loaded, we will perform Exploratory Data Analysis (EDA) to narrow our hypothesis, look for missing values and outliers, and get an overall feel for the data. We are going to look at descriptive statistics as well as visualizations such as scatter plots and/or histograms.
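As a rough illustration of the kind of checks we have in mind for a single numeric column, here is a minimal sketch. It is written in Python using only the standard library; the function name `summarize`, the 1.5 × IQR outlier rule, and the wage values are all assumptions made up for the example, not real BLS figures or a final choice of method.

```python
import statistics


def summarize(values):
    """Summarize one numeric column: missing count, basic stats, IQR outliers.

    `None` stands in for a missing value (an assumption for this sketch).
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles using the statistics module's default ("exclusive") method.
    q1, median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1

    # Common rule of thumb: flag points more than 1.5 * IQR outside the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]

    return {
        "missing": missing,
        "mean": statistics.mean(present),
        "median": median,
        "stdev": statistics.stdev(present),
        "outliers": outliers,
    }


# Hypothetical hourly wages for a handful of areas (made-up numbers).
wages = [28.5, 31.0, None, 29.75, 30.25, 95.0, 27.0, None, 32.5]
report = summarize(wages)
print(report)
```

A flagged value is only a candidate for closer inspection; whether it is a data-entry error or a genuinely high-wage area would still need to be checked against the source.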
To clean, tidy and prepare the data for downstream analysis, we will use assorted R packages such as tidyr, tidyverse, dplyr, devtools, DSR, etc. Subsequently, by building a neural network model, we will attempt to examine the relationship between factors such as affordability, taxes, recreational parks, crime records, house and property values, and employment on the one hand, and preferred places to live on the other.
1. Acquiring the data (web-scraping, APIs)
2. Data verification for missing values
3. Building a neural network to model the relationships between variables
1. RPubs link
2. GitHub repo link containing a README file