The USA is a country of migrants, and immigration is a hot political topic. Being an immigrant myself, I can relate to the topic on a personal level. However, there is another side to the migration history of the USA: domestic interstate migration, which started in colonial times as the first settlers moved westward and continues to this day.
People move from one state to another for many reasons. I would like to explore the state of USA migration and geographic mobility. And yes, I have a personal interest in this topic as well: I could be one of these domestic migrants one day. Where do other Americans move? Why? How many? For me, these are very interesting questions, which I hope this visualization will answer.
At the end of this project, I expect to come up with:
Country level summary of migration
For each state, migration summary
I would like to answer the following questions:
Variation by state in the percentage of people who chose to move vs. those who preferred to reside in the state of their birth
Effect of migration on state population
Which states drive migration and which states gain/lose from migration
These are just some of the questions that should be answered.
The USA experienced a few internal mass migrations (Wikipedia 2019):
The reasons for migration are multiple. Some of the major ones are (Wikipedia 2018):
The data source I am using for the visualization is the US Census Bureau's American Community Survey, 2017 (the latest available):
https://www.census.gov/topics/population/migration/data/tables/acs.html
| | State | Population | Alabama | Alaska | Arizona | Arkansas | California | Colorado | Connecticut | Delaware | District of Columbia | Florida | Georgia | Hawaii | Idaho | Illinois | Indiana | Iowa | Kansas | Kentucky | Louisiana | Maine | Maryland | Massachusetts | Michigan | Minnesota | Mississippi | Missouri | Montana | Nebraska | Nevada | New Hampshire | New Jersey | New Mexico | New York | North Carolina | North Dakota | Ohio | Oklahoma | Oregon | Pennsylvania | Rhode Island | South Carolina | South Dakota | Tennessee | Texas | Utah | Vermont | Virginia | Washington | West Virginia | Wisconsin | Wyoming | Puerto Rico |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | United States2 | 325719178 | 4819291 | 636317 | 3853001 | 2830172 | 29269378 | 3651605 | 3195105 | 733755 | 1434102 | 10174369 | 7679965 | 1286137 | 1356234 | 13459855 | 6568485 | 3628171 | 2979707 | 4468613 | 5240805 | 1324951 | 4428255 | 6509537 | 10636568 | 5249525 | 3350350 | 5925790 | 971481 | 2080281 | 1213229 | 964928 | 7713941 | 1846115 | 20501673 | 7668068 | 976036 | 12414993 | 3619114 | 2856761 | 13428323 | 1052785 | 3889643 | 1033941 | 5625247 | 20571420 | 2638445 | 563014 | 6251937 | 4907375 | 2312726 | 5690495 | 544576 | 1795234 |
| 3 | Alabama | 4874747 | 3407382 | 5430 | 9000 | 16533 | 53374 | 10149 | 9236 | 3173 | 4464 | 132877 | 175851 | 5007 | 960 | 57341 | 27688 | 9126 | 8722 | 28081 | 45417 | 4392 | 14181 | 12296 | 46059 | 7437 | 97255 | 22303 | 1576 | 4435 | 2500 | 1554 | 17081 | 5423 | 42854 | 26666 | 2166 | 48065 | 14023 | 3331 | 35681 | 1653 | 20761 | 3361 | 84499 | 69069 | 3291 | 688 | 25019 | 10287 | 8525 | 12473 | 1305 | 6313 |
| 4 | Alaska | 739795 | 3991 | 314056 | 6559 | 1627 | 43742 | 10323 | 1537 | 474 | 1148 | 7587 | 5034 | 3811 | 9818 | 10613 | 5129 | 4980 | 4704 | 2546 | 4587 | 2363 | 4230 | 4222 | 12650 | 12929 | 1937 | 5060 | 8818 | 3575 | 2726 | 1057 | 5739 | 2847 | 12572 | 6461 | 3112 | 9564 | 4469 | 18024 | 10005 | 744 | 2338 | 2793 | 5438 | 18463 | 5294 | 898 | 5049 | 31539 | 1176 | 11258 | 2463 | 2522 |
| 5 | Arizona | 7016270 | 14021 | 12983 | 2784137 | 20233 | 652807 | 87055 | 26940 | 4003 | 9211 | 39205 | 22879 | 16074 | 31030 | 257495 | 79342 | 83347 | 49881 | 19228 | 19911 | 14839 | 23390 | 48111 | 159073 | 91090 | 12312 | 58189 | 24368 | 41120 | 28346 | 6472 | 65519 | 76508 | 192370 | 23575 | 27504 | 142308 | 38762 | 51188 | 100980 | 9005 | 10601 | 24286 | 20483 | 146511 | 76697 | 4672 | 29153 | 88250 | 12397 | 92486 | 18785 | 11247 |
| 6 | Arkansas | 3004279 | 15983 | 4409 | 14529 | 1821927 | 107432 | 12931 | 2190 | 1715 | 2020 | 20306 | 18159 | 2644 | 2056 | 64821 | 20731 | 17580 | 32122 | 8799 | 56318 | 2132 | 6010 | 7257 | 29916 | 12589 | 37875 | 82749 | 1586 | 8324 | 2159 | 988 | 6919 | 7594 | 16899 | 9996 | 1835 | 18336 | 63845 | 7480 | 15281 | 4298 | 4580 | 3599 | 61770 | 158075 | 3709 | 359 | 8283 | 9683 | 2594 | 13472 | 2678 | 2666 |
| 7 | California | 39536653 | 69927 | 28487 | 198430 | 89517 | 21966372 | 166373 | 92521 | 15407 | 46434 | 153382 | 86968 | 130806 | 59304 | 448980 | 130370 | 111311 | 94108 | 45968 | 131506 | 33968 | 85647 | 185171 | 275751 | 135575 | 66220 | 143140 | 39627 | 72229 | 99018 | 20047 | 194887 | 81754 | 611925 | 80326 | 36512 | 266201 | 123251 | 163346 | 281889 | 28960 | 35468 | 37645 | 70068 | 468340 | 93290 | 10063 | 113554 | 236877 | 27837 | 123740 | 23090 | 38761 |
ggplot(data = subset(slope,Region=='Northeast'), aes(x = Status, y = Population, group = State2)) +
geom_line(aes(color = State2, alpha = 1), size = 2) +
geom_point(aes(color = State2), size = 4) +
geom_text_repel(data = subset(slope,Region=='Northeast') %>% filter(Status == "Actual"),
aes(label = paste0(State2," - ", round(exp(Population)/1000000,digits=2)," mln")) ,
hjust = 1.5,
fontface = "plain",
size = 4) +
geom_text_repel(data = subset(slope,Region=='Northeast')%>% filter(Status == "Born-in"),
aes(label = paste0(State2," - ", round(exp(Population)/1000000,digits=2)," mln")) ,
hjust = -.35,
fontface = "plain",
size = 4) +
scale_x_discrete(position = "top") +
theme_bw() +
# Format tweaks
# Remove the legend
theme(legend.position = "none") +
theme(panel.border = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.text.y = element_blank()) +
theme(panel.grid.major.y = element_blank()) +
theme(panel.grid.minor.y = element_blank()) +
# Remove a few things from the x axis and increase font size
theme(axis.title.x = element_blank()) +
theme(panel.grid.major.x = element_blank()) +
theme(axis.text.x.top = element_text(size=12)) +
# Remove x & y tick marks
theme(axis.ticks = element_blank()) +
ggtitle("2017 Northeast State Pop, Actual vs Born-in, on Log scale")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5))
ggplot(data = subset(slope,Region=='West'), aes(x = Status, y = Population, group = State2)) +
geom_line(aes(color = State2, alpha = 1), size = 2) +
geom_point(aes(color = State2), size = 4) +
geom_text_repel(data = subset(slope,Region=='West') %>% filter(Status == "Actual"),
aes(label = paste0(State2," - ", round(exp(Population)/1000000,digits=2)," mln")) ,
hjust = 1.5,
fontface = "plain",
size = 4) +
geom_text_repel(data = subset(slope,Region=='West')%>% filter(Status == "Born-in"),
aes(label = paste0(State2," - ", round(exp(Population)/1000000,digits=2)," mln")) ,
hjust = -.35,
fontface = "plain",
size = 4) +
scale_x_discrete(position = "top") +
theme_bw() +
# Format tweaks
# Remove the legend
theme(legend.position = "none") +
theme(panel.border = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.text.y = element_blank()) +
theme(panel.grid.major.y = element_blank()) +
theme(panel.grid.minor.y = element_blank()) +
# Remove a few things from the x axis and increase font size
theme(axis.title.x = element_blank()) +
theme(panel.grid.major.x = element_blank()) +
theme(axis.text.x.top = element_text(size=14)) +
# Remove x & y tick marks
theme(axis.ticks = element_blank()) +
ggtitle("2017 West State Pop, Current vs Born-in, on Log scale")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5))
ggplot(data = subset(slope,Region=='South'), aes(x = Status, y = Population, group = State2)) +
geom_line(aes(color = State2, alpha = 1), size = 2) +
geom_point(aes(color = State2), size = 4) +
geom_text_repel(data = subset(slope,Region=='South') %>% filter(Status == "Actual"),
aes(label = paste0(State2," - ", round(exp(Population)/1000000,digits=2)," mln")) ,
hjust = 1.5,
fontface = "plain",
size = 4) +
geom_text_repel(data = subset(slope,Region=='South')%>% filter(Status == "Born-in"),
aes(label = paste0(State2," - ", round(exp(Population)/1000000,digits=2)," mln")) ,
hjust = -.35,
fontface = "plain",
size = 4) +
scale_x_discrete(position = "top") +
theme_bw() +
# Format tweaks
# Remove the legend
theme(legend.position = "none") +
theme(panel.border = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.text.y = element_blank()) +
theme(panel.grid.major.y = element_blank()) +
theme(panel.grid.minor.y = element_blank()) +
# Remove a few things from the x axis and increase font size
theme(axis.title.x = element_blank()) +
theme(panel.grid.major.x = element_blank()) +
theme(axis.text.x.top = element_text(size=12)) +
# Remove x & y tick marks
theme(axis.ticks = element_blank()) +
ggtitle("2017 South State Pop, Current vs Born-in, on Log scale")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5))
ggplot(data = subset(slope,Region=='North Central'), aes(x = Status, y = Population, group = State2)) +
geom_line(aes(color = State2, alpha = 1), size = 2) +
geom_point(aes(color = State2), size = 4) +
geom_text_repel(data = subset(slope,Region=='North Central') %>% filter(Status == "Actual"),
aes(label = paste0(State2," - ", round(exp(Population)/1000000,digits=2)," mln")) ,
hjust = 1.5,
fontface = "plain",
size = 4) +
geom_text_repel(data = subset(slope,Region=='North Central')%>% filter(Status == "Born-in"),
aes(label = paste0(State2," - ", round(exp(Population)/1000000,digits=2)," mln")) ,
hjust = -.35,
fontface = "plain",
size = 4) +
scale_x_discrete(position = "top") +
theme_bw() +
# Format tweaks
# Remove the legend
theme(legend.position = "none") +
theme(panel.border = element_blank()) +
theme(axis.title.y = element_blank()) +
theme(axis.text.y = element_blank()) +
theme(panel.grid.major.y = element_blank()) +
theme(panel.grid.minor.y = element_blank()) +
# Remove a few things from the x axis and increase font size
theme(axis.title.x = element_blank()) +
theme(panel.grid.major.x = element_blank()) +
theme(axis.text.x.top = element_text(size=12)) +
# Remove x & y tick marks
theme(axis.ticks = element_blank()) +
ggtitle("2017 North Central State Pop, Current vs Born-in, on Log scale")+
theme(plot.title = element_text(size = 12, face = "bold",hjust = 0.5))
Observations:
Net migration is not a major factor for the Northeast. New Hampshire was the only state to gain considerable population due to migration, while DC (which might not technically be in the Northeast) lost significant population.
Most Western states gained population due to migration. Nevada gained the most, due to a low starting point.
The South presents a somewhat mixed picture. The big gainers were Florida, Texas, Georgia, and North Carolina. A few states lost population due to migration: West Virginia, Mississippi, and Louisiana.
North Central lost population across most states. Even oil-rich North Dakota lost population due to migration.
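The comparison behind the slope charts can be checked by hand for a few states. The following is a minimal sketch, assuming the figures are copied from the Census table above: the `Population` column gives current residents, and the `United States` row gives everyone born in a given state. The data frame name `check` and its column names are made up for illustration and are not part of the original analysis.

```r
# Residents (Population column) and born-in totals (United States row),
# copied from the ACS table above for three states
check <- data.frame(
  State    = c("Alabama", "Arizona", "California"),
  Resident = c(4874747, 7016270, 39536653),
  BornIn   = c(4819291, 3853001, 29269378)
)

# Change attributed to migration: current residents minus people born in the state
check$Net <- check$Resident - check$BornIn

# The slope charts plot both populations on a log scale
check$LogResident <- log(check$Resident)
check$LogBornIn   <- log(check$BornIn)

print(check[, c("State", "Net")])
```

Note that residents include the foreign born, so a positive difference here mixes domestic and international arrivals.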
borndf$State<-gsub("^\\s+|\\s+$", "", borndf$State)
actual$State<-gsub("^\\s+|\\s+$", "", actual$State)
#borndf$State
#actual$State
colnames(borndf)<-c("State","BPop","Status1")
scat<-merge(borndf,actual,by="State")
#scat
# Drop row 40 (Puerto Rico, assuming merge() sorted the rows alphabetically by State)
scat<-scat[-40,]
scat$Population<-log(scat$Population)
scat$BPop<-log(scat$BPop)
scat$Region<-state.region[match(scat$State,state.name)]
scat$Region[scat$State=='District of Columbia']<-'Northeast'
# Scatterplot
gg1 <- ggplot(scat, aes(x=Population, y=BPop)) +
geom_point(aes(col=Region, size=Population)) +
geom_smooth(method="loess", se=F) +
xlim(c(12, 18)) +
ylim(c(12, 18)) +
labs(subtitle=str_wrap("Pop by Residence vs Pop by Birth",width=35),
y="Pop by Birth, log",
x="Pop by Residence, log",
title="Scatterplot",
caption = "Source: US Census")+ geom_abline(slope = 1)
plot(gg1)
# Scatterplot
gg1 <- ggplot(scat, aes(x=Population, y=BPop)) +
geom_point(aes(color=Region)) +
geom_smooth(method="loess", se=F) +
xlim(c(12, 18)) +
ylim(c(12, 18)) +
labs(subtitle=str_wrap("Pop by Residence vs Pop by Birth",width=35),
y="Pop by Birth, log",
x="Pop by Residence, log",
title="Scatterplot",
caption = "Source: US Census")+facet_wrap(~Region)+ geom_abline(slope = 1)
plot(gg1)
Western states are mostly below the 45-degree line, confirming that they are gaining population because of migration. Most Northeastern and North Central states are above the line, again confirming that they are losing population due to migration.
g <- ggplot(mdf, aes(x=reorder(State, -gl),y=gl))
g + geom_bar(stat = "identity",fill = 'Dark blue')+ggtitle("Plot of USA States population gain/loss due to migration, as of 2017") +
xlab("States") + ylab("Population Gain/Loss, in mlns, as of 2017")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
mdf$gl<-mdf$gl*1000000/mdf$Population
theme_set(theme_bw())
# Data Prep
mdf$z <- round((mdf$gl - mean(mdf$gl))/sd(mdf$gl), 2)
# flag each state as above or below the average population-adjusted gain
#mdf
mdf$mpg_type <- ifelse(mdf$z< 0, "Lost", "Gained") # above / below avg flag
mdf <- mdf[order(mdf$z), ] # sort
mdf$State <- factor(mdf$State, levels = mdf$State) # convert to factor to retain sorted order in plot.
# Diverging Barcharts
ggplot(mdf, aes(x=State, y=z, label=z)) +
geom_bar(stat='identity', aes(fill=mpg_type), width=0.5) +
scale_fill_manual(name="Migration Pop Gains, Adjusted for Pop Size",
labels = c("Above Average", "Below Average"),
values = c("Lost"="#00ba38", "Gained"="#f8766d")) +
labs(subtitle="Normalised Population Gains due to Migration, Adjusted for Pop Size",
title= "Diverging Bars") +
coord_flip()
Nevada, Florida, and Arizona gained the most net migrants after adjusting for population size. DC, North Dakota, and West Virginia were the biggest losers. Surprisingly, New York lost population due to migration, with domestic out-migration more than offsetting gains from immigrants from outside the USA. North Dakota, which is experiencing an oil boom, unexpectedly lost population due to migration.
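To make the normalisation behind the diverging bars concrete, here is a small sketch on five states only, with populations and born-in totals copied from the Census table above. The data frame `mini` and the column `flag` are hypothetical names used for illustration; the original chart uses all states, so the z-scores here will differ from those plotted.

```r
# Figures copied from the ACS table above (five states only, for illustration)
mini <- data.frame(
  State      = c("Alabama", "Alaska", "Arizona", "Arkansas", "California"),
  Population = c(4874747, 739795, 7016270, 3004279, 39536653),
  BornIn     = c(4819291, 636317, 3853001, 2830172, 29269378)
)

# Gain per resident, then standardised to mean 0 and standard deviation 1
mini$gl <- (mini$Population - mini$BornIn) / mini$Population
mini$z  <- (mini$gl - mean(mini$gl)) / sd(mini$gl)

# Same rule as in the chart: a negative z-score is flagged as "Lost"
mini$flag <- ifelse(mini$z < 0, "Lost", "Gained")

print(mini[order(mini$z), c("State", "gl", "z", "flag")])
```

Because the z-score is centred on the mean, a state can be flagged "Lost" while its raw gain is still positive; the flag really means "below average", which is how the legend labels describe it.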
ggplot(data=mdf)+geom_density(aes(x=gl), fill="grey50")+ ggtitle("Density Plot of States' Migration Pop Gain/Loss, Adjusted for Population Size") + xlab("Migration Gain/Loss per Resident")
The density of migration gain/loss is heavily left-tailed, with DC skewing the results. A couple of states gained heavily from migration, both domestic and foreign. Most states, though, are close to 0.
theme_set(theme_classic())
# Plot
g <- ggplot(mdf, aes(Region, gl))
g + geom_boxplot(varwidth=T, fill="plum") +
labs(title="Box plot",
subtitle="Pop Gain due to Migration Grouped by Region, Adjusted for Population Size",
caption="Source: Census",
x="Region",
y="Pop Gain due to Migration")
Another look at migration gains/losses by region. The West and South are gaining, while the Northeast and North Central are close to breakeven. DC and North Dakota are outliers.
df_clean1$Stayed<-1
for (i in (1:51)){
df_clean1[i+1,ncol(df_clean1)]<-df_clean1[i+1,i+2]/df_clean1[1,i+2]
}
g <- ggplot(df_clean1[-c(1,53),], aes(x=reorder(State, -Stayed),y=Stayed))
# Share of people born in each state who still reside there:
g + geom_bar(stat = "identity",fill = 'Dark blue')+ggtitle("Plot of USA States by % of people who reside in a State of their Birth, as of 2017") +
xlab("States") + ylab("% of people who reside in a State of their Birth")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
There must be something special about Texas: its residents do not want to leave the state. And DC apparently sucks; its residents run away from it.
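The "stayed" share plotted above is the diagonal cell of the table (born in a state and still living there) divided by the total number of people born in that state (the `United States` row). A quick sketch for two states whose figures appear in the Census table above; the vector names are illustrative only:

```r
# Total born in each state (United States row of the table above)
born_total <- c(Alabama = 4819291, California = 29269378)

# Born in the state and still residing there (diagonal cells of the table)
still_home <- c(Alabama = 3407382, California = 21966372)

# Share of natives who stayed
stayed <- still_home / born_total
print(round(stayed, 3))
```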
vp<-df_clean1[-c(1,53),]
theme_set(theme_bw())
vp$Region<-state.region[match(vp$State,state.name)]
vp$Region[vp$State=='District of Columbia ']<-'Northeast'
# plot
g <- ggplot(vp, aes(Region, Stayed))
g + geom_violin() +
labs(title="Violin plot",
subtitle="% of People Born and Residing in the same State by Region",
caption="Source: Census",
x="Region",
y="% of People Born and Residing in the same State")
The South is where people are less mobile. All other regions are roughly equally mobile. The Northeast gets a peculiar shape because of DC.
df_clean1$Bout<-1
for (i in (1:51)){
df_clean1[i+1,56]<-(df_clean1[i+1,2]-df_clean1[i+1,i+2])/df_clean1[i+1,2]
}
g <- ggplot(df_clean1[-c(1,53),], aes(x=reorder(State, -Bout),y=Bout))
# Share of each state's residents who were born outside the state:
g + geom_bar(stat = "identity",fill = 'Dark blue')+ggtitle("Plot of % of State Residents Born outside of the State, As of 2017") +
xlab("States") + ylab("% of people who were born out of state")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
Nevada is a true state of new migrants: over 70% of its residents were born out of state. Louisiana has very few newcomers: only ~20% were born out of state.
df_clean1$Bos<-1
for (i in (1:51)){
df_clean1[i+1,57]<-(rowSums(df_clean1[i+1,3:54])-df_clean1[i+1,i+2])/df_clean1[i+1,2]
}
g <- ggplot(df_clean1[-c(1,53),], aes(x=reorder(State, -Bos),y=Bos))
# Share of each state's residents who were born in another US state:
g + geom_bar(stat = "identity",fill = 'Dark blue')+ggtitle("Plot of % of State Residents Born in other US State, As of 2017") +
xlab("States") + ylab("% of people who were born in other US state")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
Wyoming has the most domestic migrants, even more than Nevada. It appears that native-born Americans avoid New York, hence its low numbers.
df_bir<-df_clean1[,-c(2,54:60)]
df_bir1<-df_bir%>%gather("StateFrom","Value",-c(State))
df_bir1$Region<-state.region[match(df_bir1$State,state.name)]
df_bir1$Region[df_bir1$State=='District of Columbia ']<-'Northeast'
df_bir1$RegionF<-state.region[match(df_bir1$StateFrom,state.name)]
df_bir1$RegionF[df_bir1$StateFrom=='District of Columbia ']<-'Northeast'
library(dplyr)
library(plyr)
df_bir2<-df_bir1%>%filter(df_bir1$Region!=df_bir1$RegionF)%>%dplyr::select(State,Value)
cdata <- ddply(df_bir2, c("State"), summarise,
N = sum(Value))
df_bir3<-merge(cdata, df_clean1, by.x="State", by.y="State")
df_bir3$dr<-df_bir3$N/df_bir3$Population
g <- ggplot(df_bir3, aes(x=reorder(State, -dr),y=dr))
# Share of each state's residents who were born in a different US region:
g + geom_bar(stat = "identity",fill = 'Dark blue')+ggtitle("Plot of % of State Residents that came from another US Region, As of 2017") +
xlab("States") + ylab("% of residents who came from different US Region")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
The US has distinct subcultures that can be traced to different regions, so a person moving from one region to another might have to adapt to local specifics. Also, it is more expensive to move farther away, which is more likely when relocating to a different region.
Colorado attracts many migrants from other US regions, indicating that the state has a lot to offer. DC, being a major political center, also attracts migrants from other regions. Delaware is a surprise to me; I am not sure why it has so many migrants from other regions.
On the other side of the spectrum are New Jersey, Louisiana, and New York, which do not seem to be attractive destinations for a major relocation.
g <- ggplot(df_bir3, aes(x=reorder(StateFrom, -dr),y=dr))
# Share of people born in each state who now live in a different US region:
g + geom_bar(stat = "identity",fill = 'Dark blue')+ggtitle("Plot of % Born In State Residents who have relocated to a different US Region") +
xlab("States") + ylab("% Born In State Resid's who have moved to dif Reg")+ theme(axis.text=element_text(size=10), axis.text.x = element_text(angle = 90, hjust = 1),axis.title.y = element_text(size = rel(0.8), angle = 90))
DC seems to be an outlier. Surprisingly, almost 1/3 of the people born in North Dakota, with its booming economy, have left the state for another region. New York is very high on the list too.
Southern states such as Georgia, North Carolina, and Louisiana show very little mobility.
df_clean1$Boos<-1
for (i in (1:51)){
df_clean1[i+1,58]<-(df_clean1[i+1,2]-rowSums(df_clean1[i+1,3:54]))/df_clean1[i+1,2]
}
g <- ggplot(df_clean1[-c(1,53),], aes(x=reorder(State, -Boos),y=Boos))
# Share of each state's residents who were born abroad:
g + geom_bar(stat = "identity",fill = 'Dark blue')+ggtitle("Plot of % of Foreign Born Residents by US States, As of 2017") +
xlab("States") + ylab("% of Foreign Born Residents")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
California, New York, and New Jersey have the most foreign immigrants, while West Virginia has practically none.
#df_clean1
#cor(df_clean1[-c(1,53),]$Bos,df_clean1[-c(1,53),]$Boos)
df_clean1$Region<-state.region[match(df_clean1$State,state.name)]
df_clean1$Region[df_clean1$State=='District of Columbia ']<-'Northeast'
# Scatterplot
cor<-df_clean1[-c(1,53),]
#cor
gg1 <- ggplot(cor, aes(x=Bos, y=Boos)) +
geom_point(aes(col=Region, size=Population)) +
geom_smooth(method="loess", se=F) +
xlim(c(0, 0.6)) +
ylim(c(0, 0.4)) +
labs(subtitle=str_wrap("% of Domestic Migrants vs % of Foreign Born Migrants",width=35),
y="% of Foreign Born Migrants",
x="% of Domestic Migrants",
title="Scatterplot",
caption = "Source: US Census")
plot(gg1)
It is a surprise to me, but there is no apparent correlation between internal and external migration; they appear to be driven by different forces. One thing we can notice from the plot is that foreign migrants prefer states with larger populations (bigger dots), while internal migrants prefer states with smaller populations (smaller dots).
df_clean1$Btm<-1
for (i in (1:51)){
df_clean1[i+1,ncol(df_clean1)]<-((df_clean1[1,i+2]-df_clean1[i+1,i+2])+(df_clean1[i+1,2]-df_clean1[i+1,i+2]))/df_clean1[1,i+2]
}
g <- ggplot(df_clean1[-c(1,53),], aes(x=reorder(State, -Btm),y=Btm))
# People who moved out of plus people who moved into each state, relative to the number born there:
g + geom_bar(stat = "identity",fill = 'Dark blue')+ggtitle("Plot of % of Total Migrants (immigrants & emigrants) by US States, As of 2017") +
xlab("States") + ylab("% of people who moved in out of State")+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
Nevada sees a lot of mobility: people just keep moving in and out. Louisiana sees the least.
Just to summarize what we have already seen: the USA is a truly diverse country, even when it comes to migration patterns.
df_clean1$statef<-''
#df_clean1
for (i in (1:51)){
df_clean1[i+1,ncol(df_clean1)]<-tolower(colnames(df_clean1[,-c(i+2,2)])[which.max(df_clean1[i+1,-c(i+2,2)])])
}
df_clean1m<-df_clean1[-c(1,3,13,53),]
df_clean1m$State<-tolower(df_clean1m$State)
mystate<-as.character(df_clean1m$State)
mystate<-replace(mystate, mystate=='massachusetts', 'massachusetts:main')
mystate<-replace(mystate, mystate=='michigan', 'michigan:south')
mystate<-replace(mystate, mystate=='new york', 'new york:main')
mystate<-replace(mystate, mystate=='north carolina', 'north carolina:main')
mystate<-replace(mystate, mystate=='virginia', 'virginia:main')
mystate<-replace(mystate, mystate=='washington', 'washington:main')
mystate<-replace(mystate, mystate=='district of columbia ', 'district of columbia')
mylabel<-as.character(df_clean1m$statef)
#df_clean1m$statef
#mystate
#mylabel
library("maps")
map.text("state", regions=mystate, labels=mylabel)
title(main = "For Each State: Where Do Domestic Migrants Come From?")
We can see that New Yorkers moved to both coasts. They predominate in California, Florida, North Carolina, and Virginia. Californians moved to Texas, Nevada (now we know where Nevada gets all these migrants), Arizona, and Utah. Illinoisans are moving to Wisconsin, Iowa, Missouri, and Tennessee.
Interestingly, Wyoming, the land of internal migration, gets most of its people from… Colorado. Colorado is a beautiful place; why would you want to move somewhere else?
Louisiana, which gets very few migrants, gets most of them from Texas. New York, which is apparently repellent to internal migrants, gets most of them from New Jersey. I guess New Jerseyans are the only Americans who can stomach New York.
New Hampshire, which is the most attractive state for migrants on the East Coast, gets most of them from New York.
North Dakota, which has had an oil boom recently, gets the most migrants from Minnesota.
People from the District of Columbia, which loses a lot of population, are migrating to Maryland.
op <- par(mar = c(9,4,4,2) + 0.1)
barplot(table(mylabel),main="States Contributing most Migrants to Other States",ylab="Frequency",las=2,cex.axis=0.8, cex.names=0.8)
par(op)
California and New York are the top contributors for the largest number of states.
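The "top origin" labels on the map come down to finding, for each state of residence, the birth state with the largest count once the state itself is excluded. Below is a minimal sketch of that idea on a 5 x 5 corner of the Census table above (rows are states of residence, columns are states of birth). The names `flows` and `top_origin` are made up for this example, and with only five states the answers naturally differ from the full map.

```r
states <- c("Alabama", "Alaska", "Arizona", "Arkansas", "California")

# Rows: state of residence; columns: state of birth (values from the table above)
flows <- matrix(
  c(3407382,   5430,    9000,   16533,    53374,
       3991, 314056,    6559,    1627,    43742,
      14021,  12983, 2784137,   20233,   652807,
      15983,   4409,   14529, 1821927,   107432,
      69927,  28487,  198430,   89517, 21966372),
  nrow = 5, byrow = TRUE,
  dimnames = list(residence = states, birth = states)
)

# People who never left are not migrants, so blank out the diagonal
diag(flows) <- 0

# For each state of residence, the birth state that contributes the most people
top_origin <- states[apply(flows, 1, which.max)]
names(top_origin) <- states
print(top_origin)

# How often each state appears as a top origin (the basis of the bar plot)
print(table(top_origin))
```

Taking `which.max` down the columns instead of across the rows gives each birth state's top destination, which is the logic behind the second map.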
df_clean2<-df_clean1[,-c(2,4,14,54:70)]
df_clean2<-df_clean2[-c(1,3,13,53),]
df_clean2$st<-''
for (i in (1:49)){
df_clean2[i,51]<-tolower(df_clean2[-i,1][which.max(df_clean2[-i,i+1])])
}
df_clean2$State<-tolower(df_clean2$State)
mystate<-as.character(df_clean2$State)
mystate<-replace(mystate, mystate=='massachusetts', 'massachusetts:main')
mystate<-replace(mystate, mystate=='michigan', 'michigan:south')
mystate<-replace(mystate, mystate=='new york', 'new york:main')
mystate<-replace(mystate, mystate=='north carolina', 'north carolina:main')
mystate<-replace(mystate, mystate=='virginia', 'virginia:main')
mystate<-replace(mystate, mystate=='washington', 'washington:main')
mystate<-replace(mystate, mystate=='district of columbia ', 'district of columbia')
mylabel<-as.character(df_clean2$st)
library("maps")
# map of the contiguous states and DC, each labeled with its top destination
map.text("state", regions=mystate, labels=mylabel)
title(main = "For Each State: Where Do Domestic Migrants Move To?")
Florida is a top destination for a bunch of states: NY, PA, MA, NJ, and IL. California is a top destination for Nevada, Arizona, Washington, and Utah. Texas is a top destination for California, New Mexico, Oklahoma, and Louisiana. Surprisingly, neither Nevada nor Wyoming is a top destination for any state, while New York is a top destination for Vermont.
op <- par(mar = c(8,4,4,2) + 0.1)
barplot(table(mylabel),main="Which States are Top Destinations for Other States?",ylab="Frequency",las=2)
par(op)
Florida, famous for being a retirement destination, tops as preferred location by most states.
#install.packages("treemapify")
#install.packages("ggplotify")
library(treemapify)
#df_clean1
df_cleant<-df_clean1[-c(1,53),]
df_cleant<-df_cleant[,-c(2,54:ncol(df_cleant))]
# Zero out the diagonal (people still living in their state of birth) for all 51 rows
for (i in (1:51)){
df_cleant[i,i+1]<-0
}
#df_cleant
df_cleant1<-df_cleant%>%gather("StateFrom","Value",-State)
df_cleant1$Region<-state.region[match(df_cleant1$StateFrom,state.name)]
df_cleant1$Region[df_cleant1$StateFrom=='District of Columbia ']<-'Northeast'
States<-as.character(df_cleant1$State)
df_cleant1<-df_cleant1%>%arrange(State,Region,Value)
# Only the first state is plotted here; loop over 1:51 to draw a treemap for every state
for (i in (1:1)){
mydata<-df_cleant1%>%filter(State==States[i])
myplot<-ggplot(mydata, aes(area = Value, fill = Region, label = StateFrom,
subgroup = Region)) +
geom_treemap() +
geom_treemap_subgroup_border() +
geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 0.5, colour =
"black", fontface = "italic", min.size = 0) +
geom_treemap_text(colour = "white", place = "topleft", reflow = T)+
labs(
title = paste0(mydata$State," migrant population by state of birth"),
caption = "The area of each tile represents how many internal migrants were born in a state",
fill = "Region")
print(myplot)
#myplot
}
Now we have a lot of details on internal migration preferences. If we look at New York, we see that migrants to New York are very diverse. Most come from Northeast, such as NJ, PA, MA, CT. But FL, NC, TX, CA, and OH contribute as well.
#install.packages("treemapify")
#install.packages("ggplotify")
#library(treemapify)
df_cleant<-df_clean1[-c(1,53),]
df_cleant<-df_cleant[,-c(2,54:ncol(df_cleant))]
for (i in (1:51)){
df_cleant[i,i+1]<-0
}
df_cleant1<-df_cleant%>%gather("StateFrom","Value",-State)
df_cleant1$Region<-state.region[match(df_cleant1$State,state.name)]
df_cleant1$Region[df_cleant1$State=='District of Columbia ']<-'Northeast'
States<-as.character(df_cleant$State)
df_cleant1<-df_cleant1%>%arrange(StateFrom,Region,Value)
for (i in (1:1)){
mydata<-df_cleant1%>%filter(StateFrom==States[i])
myplot<-ggplot(mydata, aes(area = Value, fill = Region, label = State,
subgroup = Region)) +
geom_treemap() +
geom_treemap_subgroup_border() +
geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 0.5, colour =
"black", fontface = "italic", min.size = 0) +
geom_treemap_text(colour = "white", place = "topleft", reflow = T)+
labs(
title = paste0(mydata$StateFrom," migrant population by state of destination"),
caption = "The area of each tile represents how many internal migrants moved to a state",
fill = "Region")
print(myplot)
#myplot
}
library(rvest)
library(dplyr)
library(knitr)
library(rcompanion)
library(MASS)
library(tidyverse)
library(caret)
URL <- "https://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States_by_population_density"
temp<-URL%>%read_html%>%html_nodes("table")
Dens<-html_table(temp[[1]],header = NA,fill=TRUE)
Dens<-Dens[,c(1,4,7,9)]
names(Dens) <- c("State", "Density","Population","Area")
Dens$Area[Dens$State=='California']<-substr(Dens$Area[Dens$State=='California'],1,7)
Dens[, 3:4] <- sapply(Dens[, 3:4], as.character)
Dens[, 3:4] <- sapply(Dens[, 3:4], function(x) as.numeric(gsub(",", "",x)))
States<-df_clean1[,c(1,55:59)]
States<-States[-53,]
Dens1<-merge(Dens, States, by.x="State", by.y="State", sort = TRUE)
options(scipen=999) # turn-off scientific notation like 1e+48
theme_set(theme_bw()) # pre-set the bw theme.
Dens1$Region<-state.region[match(Dens1$State,state.name)]
Dens1$Region[Dens1$State=='District of Columbia ']<-'Northeast'
# Scatterplot
gg1 <- ggplot(Dens1, aes(x=log(Population), y=Stayed)) +
geom_point(aes(col=Region, size=Density)) +
geom_smooth(method="loess", se=F) +
xlim(c(12, 18)) +
ylim(c(0.3, 0.9)) +
labs(subtitle=str_wrap("Pop vs % of Pop who Resides in State where they were Born",width=35),
y="% Born and Residing in the same state",
x="Log(Population)",
title="Scatterplot",
caption = "Source: US Census")
plot(gg1)
We have a strong correlation between a state’s population size and the % of people born there who choose to reside there as well. It is possible that more populous, urban states provide more opportunities, which makes them attractive places to reside in.
States<-df_clean1[,c(1,55:ncol(df_clean1))]
States<-States[-c(1,53),]
States$pe<-'T'
States$pe[c(5:9,12,14,20,21,22,24,29,30,31,32,33,38,40,46,47,48)]<-'C'
g <- ggplot(States, aes(pe, Stayed))
g + geom_violin() +
labs(title="Violin plot",
subtitle="% of People Born and Residing in the same State by 2016 Presidential Election Voting",
caption="Source: Census",
x="States voted for Trump vs Clinton",
y="% of People Born and Residing in the same State")
g1 <- ggplot(States, aes(pe, Boos))
g1 + geom_violin() +
labs(title="Violin plot",
subtitle="States by % Foreign Born People vs 2016 Presidential Elections",
caption="Source: Census",
x="States voted for Trump vs Clinton",
y="% of Foreign Born Residents")
States that voted for Trump are, on average, less mobile.
Trump’s states have significantly fewer foreign immigrants.
library(rvest)
library(dplyr)
library(knitr)
library(rcompanion)
library(MASS)
library(tidyverse)
library(caret)
URL <- "https://en.wikipedia.org/wiki/List_of_geographic_centers_of_the_United_States"
temp<-URL%>%read_html%>%html_nodes("table")
tax<-html_table(temp[[2]],header = NA,fill=TRUE)
tax<-tax[,c(1,3)]
names(tax) <- c("State", "Coor")
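# Helper: keep the leading run of digits and convert it to a number
form<-function(x){as.numeric(gsub("([0-9]+).*$", "\\1",x))}
# Note: the first coordinate (stored in 'lon') drives the North/South split below and the
# second (stored in 'lat') drives East/West, so the names look swapped but are used consistently
tax$lon<-form(substr(tax$Coor,1,3))+ifelse(form(substr(tax$Coor,4,5))<10,form(substr(tax$Coor,4,5))/10,form(substr(tax$Coor,4,5))/100)
tax$lat<-form(substr(tax$Coor,12,14))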
tax1<-tax%>%dplyr::select(State,lon,lat)
df_l<-df_clean1[,-c(2,54:ncol(df_clean1))]
df_l<-df_l[-c(1,53),]
df_l1<-df_l%>%gather("StateFrom","Value",-c(State))
df_l1$Region<-state.region[match(df_l1$State,state.name)]
df_l1$Region[df_l1$State=='District of Columbia ']<-'Northeast'
df_l1$RegionF<-state.region[match(df_l1$StateFrom,state.name)]
df_l1$RegionF[df_l1$StateFrom=='District of Columbia ']<-'Northeast'
tax2<-merge(tax1, df_l1, by.x="State", by.y="State", sort = TRUE)
colnames(tax1)<-c("State","lonf","latf")
tax3<-merge(tax2, tax1, by.x="StateFrom", by.y="State", sort = TRUE)
tax3$NS<-ifelse((tax3$lon-tax3$lonf)>7.5,'1N',ifelse((tax3$lon-tax3$lonf)>0,'2N',
ifelse((tax3$lon-tax3$lonf)>-7.5,'3S','4S')))
tax3$EW<-ifelse((tax3$lat-tax3$latf)>10,'4N',ifelse((tax3$lat-tax3$latf)>0,'3N',
ifelse((tax3$lat-tax3$latf)>-10,'2S','1S')))
tax3$Value<-ifelse(tax3$State==tax3$StateFrom,0,tax3$Value)
NS<-aggregate(Value/1000000~NS,tax3,sum)
EW<-aggregate(Value/1000000~EW,tax3,sum)
#install.packages("waffle")
library(waffle)
waffle(setNames(as.vector(NS$Value),
c(
paste('North, 500+ miles (',as.character(round(NS[1,2],0)),'mln)'),
paste('North, 0-500 miles (',as.character(round(NS[2,2],0)),'mln)'),
paste('South, 0-500 miles (',as.character(round(NS[3,2],0)),'mln)'),
paste('South, 500+ miles (',as.character(round(NS[4,2],0)),'mln)')
)), rows=8, size=1,
title="US Migration: Moving North vs South",
xlab="One square == 1m ppl")
waffle(setNames(as.vector(EW$Value),c(
paste('East, 500+ miles (',as.character(round(EW[1,2],0)),'mln)'),
paste('East, 0-500 miles (',as.character(round(EW[2,2],0)),'mln)'),
paste('West, 0-500 miles (',as.character(round(EW[3,2],0)),'mln)'),
paste('West, 500+ miles (',as.character(round(EW[4,2],0)),'mln)')
)), rows=8, size=1,
title="US Migration: Moving East vs West",
xlab="One square == 1m ppl")
There is a clear pattern of people moving south. The largest long-distance move south (more than 500 miles) was from New York to Florida: 1.6 million people. The largest long-distance move north (more than 500 miles) was from California to Washington: 0.6 million.
In the east-west direction, the pattern is less clear. However, Americans who make a long-distance move (more than 500 miles) still prefer the West. The largest long-distance move west was from New York to California: 0.6 million. The largest long-distance move east was from California to Texas: 0.8 million.
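The "500 miles" cut-offs correspond to the degree thresholds used in the code (7.5 degrees north-south, 10 degrees east-west). A rough sketch of that arithmetic follows, assuming about 69 miles per degree of latitude and a reference latitude of 40 degrees north for the east-west case. Both numbers are approximations introduced here, not values from the analysis. The `cut()` call shows an equivalent way to write the nested `ifelse()` bucketing.

```r
# Approximate length of one degree of latitude, in miles (assumption).
miles_per_deg_lat <- 69

# 7.5 degrees north-south is a bit over 500 miles.
ns_miles <- 7.5 * miles_per_deg_lat

# One degree of longitude shrinks with the cosine of the latitude.
miles_per_deg_lon <- function(lat_deg) miles_per_deg_lat * cos(lat_deg * pi / 180)

# 10 degrees east-west at roughly 40 degrees north (assumed reference latitude).
ew_miles <- 10 * miles_per_deg_lon(40)

# The same four north-south buckets as the nested ifelse(), written with cut().
# Intervals are right-closed: (-Inf,-7.5], (-7.5,0], (0,7.5], (7.5,Inf).
diff_deg <- c(-9, -3, 2, 12)  # hypothetical destination-minus-origin differences
bucket <- cut(diff_deg,
              breaks = c(-Inf, -7.5, 0, 7.5, Inf),
              labels = c("4S", "3S", "2N", "1N"))
print(data.frame(diff_deg, bucket))
```

Because the east-west threshold is in degrees of longitude, the distance it represents varies with latitude, so "500 miles" is only an approximation there.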
library(rvest)
library(dplyr)
library(knitr)
library(rcompanion)
library(MASS)
library(tidyverse)
library(caret)
URL <- "https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_elevation"
temp<-URL%>%read_html%>%html_nodes("table")
Dens<-html_table(temp[[1]],header = NA,fill=TRUE)
Dens<-Dens[,c(1,7)]
colnames(Dens) <- c("State", "Density")
Dens[, 2] <- sapply(Dens[, 2], as.character)
library(DescTools)
#Dens[, 2] <- sapply(substr(Dens[, 2],1,5), function(x) as.numeric(gsub("f","",gsub(",","",x))))
Dens$e<-StrTrim(gsub("t","",gsub("f","",gsub(",", "",substr(Dens[,2],1,5)))))
#substr(Dens[, 2],1,5)
#Dens$e<-form(substr(Dens$Density,1,5))
#gsub(" ","",gsub("f","",gsub(",", "",substr(Dens[,2],1,5))))
#form(substr(Dens$Density,1,5))
Dens$e<-as.integer(gsub("([0-9]+).*$", "\\1", Dens$e))
States<-df_clean1[,c(1:2,55:59)]
States<-States[-c(1,53),]
Dens1<-merge(Dens, States, by.x="State", by.y="State", sort = TRUE)
options(scipen=999) # turn-off scientific notation like 1e+48
theme_set(theme_bw()) # pre-set the bw theme.
Dens1$Region<-state.region[match(Dens1$State,state.name)]
Dens1$Region[Dens1$State=='District of Columbia ']<-'Northeast'
Dens1<-Dens1%>%dplyr::select(State,e)
#cor(Dens1$Stayed,Dens1$e)
a<-merge(df_cleant1,Dens1,by.x="State",by.y="State")
colnames(Dens1)<-c("State","ef")
a<-merge(a,Dens1,by.x="StateFrom",by.y="State")
a$hl<-ifelse((a$e-a$ef)>1000,'1mh',ifelse((a$e-a$ef)>=0,'2h',ifelse((a$e-a$ef)>=-1000,'3l','4ml')))
hl<-aggregate(Value/1000000~hl,a,sum)
#install.packages("waffle")
library(waffle)
waffle(
setNames(
as.vector(hl$Value),
c(paste('Much Higher, 1000+ ft (',as.character(round(hl[1,2],0)),'mln)'),
paste('Higher, 0-1000ft (',as.character(round(hl[2,2],0)),'mln)'),
paste('Lower, 0-1000ft (',as.character(round(hl[3,2],0)),'mln)'),
paste('Much Lower, 1000+ ft (',as.character(round(hl[4,2],0)),'mln)'))), rows=8, size=1,
title="US Migration: Migration by Altitude",
xlab="One square == 1m ppl")
Migrating Americans do not show a clear preference by altitude. There was some migration to mountainous states, such as Arizona. At the same time, low-elevation states, such as Florida, attract a lot of migrants as well.
library(rvest)
library(dplyr)
library(knitr)
library(rcompanion)
library(MASS)
library(tidyverse)
library(caret)
URL <- "https://en.wikipedia.org/wiki/List_of_U.S._states_by_American_Human_Development_Index"
temp<-URL%>%read_html%>%html_nodes("table")
Dens<-html_table(temp[[1]],header = NA,fill=TRUE)
Dens<-Dens[,c(3,4)]
colnames(Dens) <- c("State", "Density")
Dens[, 2] <- sapply(Dens[, 2], as.character)
#library(DescTools)
#Dens[, 2] <- sapply(substr(Dens[, 2],1,5), function(x) as.numeric(gsub("f","",gsub(",","",x))))
Dens$Density<-as.numeric(Dens$Density)
States<-df_clean1[,c(1:2,55:59)]
States<-States[-c(1,53),]
Dens1<-merge(Dens, States, by.x="State", by.y="State", sort = TRUE)
options(scipen=999) # turn-off scientific notation like 1e+48
theme_set(theme_bw()) # pre-set the bw theme.
Dens1$Region<-state.region[match(Dens1$State,state.name)]
Dens1$Region[Dens1$State=='District of Columbia ']<-'Northeast'
Dens1<-Dens1%>%dplyr::select(State,Density)
#cor(Dens1$Stayed,Dens1$e)
a<-merge(df_cleant1,Dens1,by.x="State",by.y="State")
colnames(Dens1)<-c("State","df")
a<-merge(a,Dens1,by.x="StateFrom",by.y="State")
a$hl<-ifelse((a$Density-a$df)/a$df>0.1,'1mh',ifelse((a$Density-a$df)/a$df>=0,'2h',ifelse((a$Density-a$df)/a$df>-0.1,'3l','4ml')))
hl<-aggregate(Value/1000000~hl,a,sum)
#install.packages("waffle")
library(waffle)
waffle(setNames(as.vector(hl$Value),
c(
paste( 'Much Higher HDI, 10%+ (',as.character(round(hl[1,2],0)),'mln)' ),
paste( 'Higher HDI, 0-10% (',as.character(round(hl[2,2],0)),'mln)' ),
paste( 'Lower HDI, 0-10% (',as.character(round(hl[3,2],0)),'mln)' ),
paste( 'Much Lower HDI, 10%+ (',as.character(round(hl[4,2],0)),'mln)' )
)
),rows=8, size=1,
title="US Migration: Migration by American HDI (Hum Dev Index)",
xlab="One square == 1m ppl"
)
Surprisingly, Americans do not appear to migrate based on the HDI score of the destination state. One example is the migration from New York to Florida.
library(rvest)
library(dplyr)
library(knitr)
library(rcompanion)
library(MASS)
library(tidyverse)
library(caret)
URL <- "https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_income"
temp<-URL%>%read_html%>%html_nodes("table")
Dens<-html_table(temp[[3]],header = NA,fill=TRUE)
Dens<-Dens[,c(2,3)]
colnames(Dens) <- c("State", "Density")
Dens[, 2] <- sapply(Dens[, 2], as.character)
#library(DescTools)
#Dens[, 2] <- sapply(substr(Dens[, 2],1,5), function(x) as.numeric(gsub("f","",gsub(",","",x))))
Dens$Density<-as.numeric(gsub('[$,]', '', Dens$Density))
States<-df_clean1[,c(1:2,55:59)]
States<-States[-c(1,53),]
Dens1<-merge(Dens, States, by.x="State", by.y="State", sort = TRUE)
options(scipen=999) # turn-off scientific notation like 1e+48
theme_set(theme_bw()) # pre-set the bw theme.
Dens1$Region<-state.region[match(Dens1$State,state.name)]
Dens1$Region[Dens1$State=='District of Columbia ']<-'Northeast'
Dens1<-Dens1%>%dplyr::select(State,Density)
#cor(Dens1$Stayed,Dens1$e)
a<-merge(df_cleant1,Dens1,by.x="State",by.y="State")
colnames(Dens1)<-c("State","df")
a<-merge(a,Dens1,by.x="StateFrom",by.y="State")
a$hl<-ifelse((a$Density-a$df)/a$df>0.15,'1mh',ifelse((a$Density-a$df)/a$df>=0,'2h',ifelse((a$Density-a$df)/a$df>-0.15,'3l','4ml')))
hl<-aggregate(Value/1000000~hl,a,sum)
#install.packages("waffle")
library(waffle)
waffle(setNames(as.vector(hl$Value),
c(
paste( 'Much Higher Med House Income, 15%+ (',as.character(round(hl[1,2],0)),'mln)' ),
paste( 'Higher MHI, 0-15% (',as.character(round(hl[2,2],0)),'mln)' ),
paste( 'Lower MHI, 0-15% (',as.character(round(hl[3,2],0)),'mln)' ),
paste( 'Much Lower MHI, 15%+ (',as.character(round(hl[4,2],0)),'mln)' )
)
),rows=8, size=1,
title="US Migration: Migration based on Median Household Income",
xlab="One square == 1m ppl"
)
Median household income (MHI) does not appear to be a driving force behind migration. Splitting migrants into retirees and non-retirees might lead to a different conclusion. The largest group moving to a state with much lower MHI is New Yorkers moving to Florida: 1.6 million. The largest group moving to a state with much higher MHI is again New Yorkers, this time moving to New Jersey: 1 million.
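The HDI, income, and expenditure comparisons all bucket each origin-destination pair by the relative difference between the destination and origin values. A minimal sketch of that step as a reusable helper follows. The function name `bucket_rel_diff` and the toy rows are hypothetical, introduced only for illustration, while the column names `Density`, `df`, and `Value` mirror the ones used in the code.

```r
# Hypothetical helper: classify moves by the relative difference between the
# destination value and the origin value, using the same cut-offs as the
# nested ifelse() calls in the analysis.
bucket_rel_diff <- function(dest, origin, threshold) {
  rel <- (dest - origin) / origin
  ifelse(rel > threshold, "1mh",            # much higher at the destination
    ifelse(rel >= 0, "2h",                  # higher (or equal)
      ifelse(rel > -threshold, "3l",        # lower
        "4ml")))                            # much lower
}

# Made-up flows between placeholder states; not Census data.
toy <- data.frame(
  StateFrom = c("A", "A", "A", "A"),
  State     = c("B", "C", "D", "E"),
  Value     = c(100, 200, 300, 400),          # number of movers
  Density   = c(60000, 52000, 48000, 40000),  # destination value
  df        = c(50000, 50000, 50000, 50000)   # origin value
)

toy$hl <- bucket_rel_diff(toy$Density, toy$df, threshold = 0.15)

# Total movers per bucket, as fed to the waffle chart.
hl <- aggregate(Value ~ hl, data = toy, FUN = sum)
print(hl)
```

Only the `threshold` argument differs between the charts (0.1 for HDI, 0.15 for income, 0.4 for expenditure), so a helper like this would keep the cut-offs visible in one place.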
library(rvest)
library(dplyr)
library(knitr)
library(rcompanion)
library(MASS)
library(tidyverse)
library(caret)
URL <- "https://en.wikipedia.org/wiki/List_of_U.S._state_budgets"
temp<-URL%>%read_html%>%html_nodes("table")
Dens<-html_table(temp[[1]],header = NA,fill=TRUE)
Dens<-Dens[,c(1,5)]
colnames(Dens) <- c("State", "Density")
Dens[, 2] <- sapply(Dens[, 2], as.character)
#library(DescTools)
#Dens[, 2] <- sapply(substr(Dens[, 2],1,5), function(x) as.numeric(gsub("f","",gsub(",","",x))))
Dens$Density<-as.numeric(gsub('[$,]', '', Dens$Density))
States<-df_clean1[,c(1:2,55:59)]
States<-States[-c(1,53),]
Dens1<-merge(Dens, States, by.x="State", by.y="State", sort = TRUE)
options(scipen=999) # turn-off scientific notation like 1e+48
theme_set(theme_bw()) # pre-set the bw theme.
Dens1$Region<-state.region[match(Dens1$State,state.name)]
Dens1$Region[Dens1$State=='District of Columbia ']<-'Northeast'
Dens1<-Dens1%>%dplyr::select(State,Density)
#cor(Dens1$Stayed,Dens1$e)
a<-merge(df_cleant1,Dens1,by.x="State",by.y="State")
colnames(Dens1)<-c("State","df")
a<-merge(a,Dens1,by.x="StateFrom",by.y="State")
a$hl<-ifelse((a$Density-a$df)/a$df>0.4,'1mh',ifelse((a$Density-a$df)/a$df>=0,'2h',ifelse((a$Density-a$df)/a$df>-0.4,'3l','4ml')))
hl<-aggregate(Value/1000000~hl,a,sum)
#install.packages("waffle")
library(waffle)
waffle(setNames(as.vector(hl$Value),
c(
paste( 'Much Higher State Gov per Capita Expenditure, 40%+ (',as.character(round(hl[1,2],0)),'mln)' ),
paste( 'Higher State Gov per Cap Exp, 0-40% (',as.character(round(hl[2,2],0)),'mln)' ),
paste( 'Lower State Gov per Cap Exp, 0-40% (',as.character(round(hl[3,2],0)),'mln)' ),
paste( 'Much Lower State Gov per Cap Exp, 40%+ (',as.character(round(hl[4,2],0)),'mln)' )
)
),rows=8, size=1,
title="US Migration: Migration vs State Gov per Capita Expenditure",
xlab="One square == 1m ppl"
)
Again, it is hard to say categorically whether state government expenditure influences domestic migration. I am also not comfortable with the reliability of the underlying data.
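One way to gauge the reliability of a scraped table is to check how many values fail to parse and how many states are missing after cleaning. A minimal sketch follows, using made-up strings rather than the actual Wikipedia table. The names `raw`, `parsed`, and `scraped_states` are hypothetical.

```r
# Made-up examples of cell contents as they might come out of html_table().
raw <- c("$5,123", "$7,890", "N/A", "$6,010[a]")

# Same cleaning pattern as in the analysis: drop "$" and "," then convert.
parsed <- suppressWarnings(as.numeric(gsub("[$,]", "", raw)))

# Values that silently became NA (missing entries, footnote markers, ...).
unparsed <- raw[is.na(parsed)]
print(unparsed)

# States absent from the scraped table would be dropped by merge().
scraped_states <- c("Alabama", "Alaska", "Arizona")  # hypothetical subset
missing_states <- setdiff(state.name, scraped_states)
print(length(missing_states))
```

Running the same two checks on `Dens` before the merge would show how much of the table actually feeds into the chart.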
library(rvest)
library(dplyr)
library(knitr)
library(rcompanion)
library(MASS)
library(tidyverse)
library(caret)
URL <- "https://en.wikipedia.org/wiki/Irreligion_in_the_United_States#Demographics"
temp<-URL%>%read_html%>%html_nodes("table")
Dens<-html_table(temp[[5]],header = NA,fill=TRUE)
Dens<-Dens[,c(2,4)]
names(Dens) <- c("State", "Density")
Dens[, 2] <- sapply(Dens[, 2], as.character)
Dens[, 2] <- sapply(Dens[, 2], function(x) as.numeric(gsub("%", "",x)))
States<-df_clean1[,c(1:2,55:59)]
States<-States[-53,]
Dens1<-merge(Dens, States, by.x="State", by.y="State", sort = TRUE)
options(scipen=999) # turn-off scientific notation like 1e+48
theme_set(theme_bw()) # pre-set the bw theme.
Dens1$Region<-state.region[match(Dens1$State,state.name)]
Dens1$Region[Dens1$State=='District of Columbia ']<-'Northeast'
# Scatterplot
gg1 <- ggplot(Dens1, aes(x=Density, y=Bout)) +
geom_point(aes(col=Region, size=Population)) +
geom_smooth(method="loess", se=FALSE) +
xlim(c(0, 50)) +
ylim(c(0.3, 0.8)) +
labs(subtitle=str_wrap("Irreligiosity vs % of Migrant Residents, domestic and foreign",width=35),
y="% Migrant Residents, domestic and foreign",
x="% Irreligious",
title="Scatterplot",
caption = "Source: US Census")
plot(gg1)
The medium-strength correlation indicates that, overall, less religious states have a larger share of migrant residents.
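The strength of the relationship can be put into a number with a correlation coefficient. A minimal sketch follows, assuming columns named like `Density` (% irreligious) and `Bout` (share of migrant residents) in `Dens1`. The toy values are invented for illustration and do not come from the Census or Wikipedia tables.

```r
# Hypothetical toy data with the same column names as Dens1.
toy <- data.frame(
  Density = c(10, 15, 20, 25, 30, 35),              # % irreligious
  Bout    = c(0.35, 0.45, 0.40, 0.55, 0.50, 0.65)   # share of migrant residents
)

# Pearson correlation measures linear association.
r_pearson <- cor(toy$Density, toy$Bout)

# Spearman correlation is rank-based and less sensitive to outliers.
r_spearman <- cor(toy$Density, toy$Bout, method = "spearman")

# cor.test() adds a confidence interval and p-value for the Pearson estimate.
ct <- cor.test(toy$Density, toy$Bout)

print(c(pearson = r_pearson, spearman = r_spearman))
print(ct$conf.int)
```

With about 50 observations, the confidence interval from `cor.test()` gives a sense of how firmly "medium strength" is supported.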
To summarize, this is what we have learned from the data and visualizations:
Wikipedia. 2018. Geographic Mobility. https://en.wikipedia.org/wiki/Geographic_mobility.
Wikipedia. 2019. Internal Migration. https://en.wikipedia.org/wiki/Internal_migration.