Introduction
“A simple life is a happy life” is a saying you may or may not have heard before. It probably doesn’t apply to most people, since most of us are complex, dynamic individuals who would not be happy with a simple life. Personally, I am at the age where I should be settling down and starting a family, yet I have been unhappy with each area I have lived in for one reason or another. There are many things to consider when looking for a place to settle down, and a lot of thought should go into it. Even when you physically visit a place to scout it out, you often can’t see the things that matter most to you.
Use Interactive Maps!
There are two tools to explore below. The first is an interactive Tableau dashboard, which I think works best because it lets you adjust all the variables at once. The second is an application built with the R package Shiny, which is good for visually exploring one variable at a time.
Tableau Dashboard

Click [HERE] to use the Tableau Dashboard located in Tableau Public!
For mobile version, click [HERE]!
Note: If the map goes completely black during use, no states fit the selected ranges. Readjust the ranges until the map reappears.
Interactive Shiny Application
Select a variable in the map below for overall intensity.
Data Definitions
- GEOID: This gives each state a unique id number.
- NAME: Name of the state.
- population: Population per state.
- Income: Average income per state.
- geometry: A spatial feature that helps with mapping.
- labor_force: Number of people in the state who can work, both employed and unemployed.
- unempl: Number of people unemployed in the state.
- unempl_perc: Percentage of people unemployed in the state.
- avg_temp_f: Average temperature for each state in Fahrenheit.
- avg_snow_in: Average amount of snowfall in inches per year by state.
- cost_index: The cost of living index by state. The higher the index, the more expensive the state.
- air_quality: The air quality rating per state. The lower the number, the better.
- school_rank: The overall quality of public schools by state rankings. The lower the value, the better the school systems.
- crime_rate: The crime rate by state. The higher the number, the worse the crime.
- diversity: A derived score combining the number of races present in the state and their proportions. The higher the score, the more diverse the state is.
- median_age: The median age for each state.
- politic: The political stance in the previous presidential election. Republican is represented as “0” and Democratic as “1”.
- scenery: The ranking for each state’s scenery. The lower the number, the better the state scored for scenery.
Goal of This Project
The difficulty of judging a place in person led me to want to build a tool for examining states as a whole and getting the most insight into which areas are a good fit for me. I could not find a data set that fit my needs for this project, so I decided to build one from scratch using several sources via web scraping and APIs. By the end of this project, I will have a fully functioning interactive map on one or more platforms. You can explore the map, adjust variables, and set ranges you feel comfortable with; the states left highlighted are the best fit for you.
Here is a list of things people tend to look for in a place to live, which I will build my data around.
- Population: I think most people prefer an area that is not overcrowded.
- Diversity: People often want to be around a good mix of cultures.
- Similar Age: It is nice when nearby families are around your own age.
- Jobs: You need a source of income to support your lifestyle.
- Cost of living: You want a level of spending that justifies living there.
- Weather: A lot of decisions hinge on whether the destination gets snow.
- Cleanliness: People want a nice clean environment, including things like air and water quality.
- Schools: If you have children, you want them to get a good education.
- Crime Rate: You want a safe place to live.
- Political View: People often want to be around others who are like-minded with regard to political policy.
- Scenery: Some people want architecture, history, mountains, a large body of water, art, and some want all of these things combined.
Data
Data Sources
To capture these sought-after features, I will use several sources and techniques to obtain the data.
API
Web Scrape
Data Collection Process
I don’t want this project to read as just heavy code. For that reason, I have hidden most of the code and outputs and show only examples of the API and web scraping process. The code for the full project can be downloaded from the small box labeled “Code” in the top right of the page.
API
Get Diversity Percentages from Census API
library(tidycensus)
library(dplyr) # provides %>%, mutate(), and select()
# note: get_acs() needs a Census API key (set once with census_api_key())
# calculate race percentages by state
# establish race variables (ACS table B03002: Hispanic or Latino origin by race)
race_vars <- c(
  White = "B03002_003",
  Black = "B03002_004",
  Native = "B03002_005",
  Asian = "B03002_006",
  HIPI = "B03002_007",
  Hispanic = "B03002_012"
)
# retrieve census data from API; summary_var adds the total population per state
us_race <- get_acs(
  geography = "state",
  variables = race_vars,
  summary_var = "B03002_001"
)
# build a manageable dataframe
us_race_percent <- us_race %>%
  mutate(percent = 100 * (estimate / summary_est)) %>%
  select(NAME, variable, percent)
# change percent data type to integer
# (as.integer() truncates, so shares below 1% become 0)
us_race_percent$percent <- as.integer(us_race_percent$percent)
# view table
us_race_percent
This data breaks race down into 6 categories along with their respective percentages. “HIPI” stands for the Native Hawaiian and Other Pacific Islander population, and “Native” covers the American Indian and Alaska Native population. Here is a visual for each state.
As you can see, diversity is tricky to capture in a single number. We will try to build a column that represents the data as well as possible, taking into account both the number of races present and their percentages. I will attempt this by creating a score: the number of races present plus a points total based on each race’s share. The higher the score, the more diverse the state.
library(dplyr)
# eliminate all rows that have 0 percent
diversity <- us_race_percent[us_race_percent$percent > 0, ]
# count the number of categories that have a percentage more than 0 for each state
cnt <- diversity %>%
  group_by(NAME) %>%
  tally()
# create point system where higher percentages earn fewer points:
# 90-100% earns 1 point, 80-89% earns 2, ..., 10-19% earns 9, under 10% earns 10
# (integer division by 10 keeps band edges such as exactly 80% in their band)
diversity$points <- 10 - pmin(diversity$percent %/% 10, 9)
# get sum of points; a higher total means the state is less dominated by one group
diverse <- diversity %>%
  group_by(NAME) %>%
  summarise(points = sum(points))
# create score combining number of races + proportion score
# (cnt and diverse are both grouped by NAME, so their rows line up)
diverse$diversity <- cnt$n + diverse$points
# keep only the state name and the final score
diverse <- diverse[, c("NAME", "diversity")]
Here is how the newly established scoring system for diversity looks.
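To make the scoring concrete, here is a minimal worked sketch for a single made-up state. The percentages are invented for illustration only, and the integer-division line is just a compact way to express the points bands; it gives the same points as the banding described above for the values used here.

```r
# Hypothetical integer race percentages for one state (invented values).
toy <- data.frame(
  variable = c("White", "Hispanic", "Black", "Asian", "Native", "HIPI"),
  percent = c(62L, 21L, 9L, 5L, 0L, 0L)
)

# Drop categories at 0 percent, as the project code does.
present <- toy[toy$percent > 0, ]

# Points per category: 60-69% earns 4 points, 20-29% earns 8,
# and anything under 10% earns 10.
present$points <- 10 - pmin(present$percent %/% 10, 9)

# Score = number of categories present + total points.
n_present <- nrow(present)           # 4 categories remain
total_points <- sum(present$points)  # 4 + 8 + 10 + 10 = 32
score <- n_present + total_points    # 4 + 32 = 36
score
```

In this toy case the state ends up with a diversity score of 36: four categories present plus 32 points.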
Web Scrape
This is an example of the web scraping process to establish a scenery rank based system.
# libraries
library(tidyverse)
library(stringr)
library(rvest) # provides html_nodes(); not attached by library(tidyverse)
library(xml2)
# source
url <- "https://www.thrillist.com/travel/nation/most-beautiful-states-in-america"
# scrape the data: pull the <h2> headings, which hold the rank and state name
scenery <- url %>%
  read_html() %>%
  html_nodes("h2")
# convert xml to data frame with an explicit column name
vals <- data.frame(heading = trimws(xml_text(scenery)))
# split each heading at the first space into two columns of rank and state
df11 <- str_split_fixed(vals$heading, " ", 2)
df11 <- data.frame(df11)
# rename
colnames(df11) <- c("scenery", "NAME")
# convert scenery to number and remove periods
df11$scenery <- as.numeric(gsub("\\.", "", as.character(df11$scenery)))
# change order so NAME comes first
df11 <- df11[, c(2, 1)]
df11
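The parsing step can be tried without hitting the website. The sketch below assumes the scraped headings look like a rank, a period, a space, and then the state name; the sample strings and their ranks are invented for illustration and are not the article's actual rankings.

```r
library(stringr)

# Hypothetical headings in the assumed "rank. State" form (invented ranks).
headings <- c("3. Colorado", "2. New Mexico", "1. Hawaii")

# Split at the first space only, so multi-word state names stay intact.
parts <- str_split_fixed(headings, " ", 2)

# Strip the period from the rank and convert it to a number.
ranks <- data.frame(
  NAME = parts[, 2],
  scenery = as.numeric(gsub("\\.", "", parts[, 1]))
)
ranks
```

Limiting the split to two pieces is what keeps a name like "New Mexico" in one column.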
Combine and Examine All Data
The tidycensus package returns fairly clean data, so little to no cleaning was needed there. The web-scraped data deserves a closer look.

Explanation of missing data:
Some sources did not contain observations for Puerto Rico and Washington DC. The average snowfall column had one additional null value because Hawaii is the only state with zero inches of snow. This is fixed later in the code.
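As a rough sketch of how this kind of gap shows up and can be handled, the example below joins two small stand-in tables by NAME, counts the missing values, and records Hawaii's snowfall as 0. The tables and their values are invented, and the hidden project code may do this differently.

```r
library(dplyr)

# Toy stand-ins for two of the per-state tables (all values invented).
census <- data.frame(
  NAME = c("Colorado", "Hawaii", "Virginia"),
  population = c(100, 200, 300)
)
snow <- data.frame(
  NAME = c("Colorado", "Virginia"),
  avg_snow_in = c(50, 10)
)

# A left join keeps every state from the census table; states that are
# missing from the scraped table come through as NA.
states <- left_join(census, snow, by = "NAME")

# Count missing values per column to see where the gaps are.
na_counts <- colSums(is.na(states))
na_counts

# Hawaii's gap means "no snow", so record it as 0 instead of leaving NA.
states <- states %>%
  mutate(avg_snow_in = if_else(NAME == "Hawaii" & is.na(avg_snow_in), 0, avg_snow_in))
states
```

Restricting the replacement to Hawaii avoids silently turning a truly unknown value for another state into a zero.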
Data Table
Below is a view of the data table. Given the compact size of the dataset, some at-a-glance context is helpful.
Analysis
To create a tool for analyzing the data, I imported the data into Tableau and built the dashboard there. An example of the tool and the results for the ranges I picked are demonstrated in the image below.

After narrowing the ranges of the variables to what I felt best fit my needs, the states remaining were Colorado and Virginia. To be more specific, I limited my variables to Democratic-leaning states with small to medium populations, a higher average income, a lower cost of living, a smaller labor force, a lower median age, a medium to low crime rate, and a high scenery ranking. All other variables were left at pretty much full range because they did not matter to me.
Oddly enough, these are the two states I traveled to at the end of 2021, exploring most of Colorado and visiting Williamsburg, Virginia, with the intention of possibly moving there. Both areas were beautiful, and it is hard to pick one over the other. This tool has helped reaffirm my considerations for moving to these areas.
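Outside of Tableau, the same kind of narrowing can be approximated as a row filter. The sketch below uses the column names from the data definitions, but the states, values, and cutoffs are placeholders chosen for illustration, not the ranges actually used in the dashboard.

```r
library(dplyr)

# Placeholder rows with invented values; column names follow the data definitions.
states <- data.frame(
  NAME = c("State A", "State B", "State C", "State D"),
  politic = c(1, 0, 1, 1),
  cost_index = c(105, 90, 150, 101),
  crime_rate = c(30, 25, 45, 20),
  scenery = c(5, 40, 2, 12)
)

# Keep only the states that fall inside every chosen range.
matches <- states %>%
  filter(
    politic == 1,                  # voted Democratic
    between(cost_index, 85, 110),  # moderate cost of living
    crime_rate <= 35,              # medium to low crime
    scenery <= 15                  # lower rank number means better scenery
  )
matches$NAME

# Ranges that no state satisfies leave an empty result.
no_matches <- states %>% filter(cost_index < 50)
nrow(no_matches)
```

An empty result like `no_matches` is the tabular equivalent of the map going completely black in the dashboard.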
If you want to read more on this subject, here is an excellent and relevant article. [Article]
