May 2, 2017

Introduction:

Is there any correlation between the number of airports and the human population in the states and counties of the USA?

To answer this question, I collected data on all airports and on the population distribution across the USA.

I wanted to look at the role of airports in the economic development of cities and regions. In general, the bigger the city, the more airport activity there is, however it is measured. Metro size is strongly correlated with the total number of flights and the number of passengers.

The USA has several different airport types.

What are the cases, and how many are there?

  • In the data I identified the following types of airports available in the USA.
    • Heliport
    • Small_airport
    • Closed
    • Seaplane_base
    • Balloonport
    • Medium_airport
    • Large_airport
  • Airports have a big effect on economic development by moving people and cargo.

cases….

  • The US population lives around these airports.
  • I identify how the human population is distributed around them.
  • Both the number of passengers and the number of flights are related to economic output, wages, and incomes.
  • Correlations could also be found in time-sensitive manufacturing and distribution, and in hotel, entertainment, retail, convention, trade and exhibition complexes.

Data:

  • For this project I am using the data feeds from the following two sites:

This site has the most up-to-date data about airports in the world. I keep only the airports in the USA.
1. http://ourairports.com/data.

US Census web site:
2. https://www2.census.gov/programs-surveys/popest/datasets.

  • On these web sites I found relevant, accurate and up-to-date data.
  • I loaded the .csv files directly into data frames.
  • I use the R package dplyr to stage the data.
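As a minimal sketch of the loading step: the small file written below is a made-up stand-in for a downloaded feed (the real airports file has more columns, and the exact file names are not given here). `read.csv` parses the header row and returns a data frame.

```r
# Hypothetical stand-in for a downloaded airports CSV (made-up rows).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "ident,type,name,iso_country,iso_region",
  "AAA1,heliport,Example Heliport,US,US-PA",
  "BBB2,small_airport,Example Field,CA,CA-ON"
), csv_path)

# read.csv() uses the first row as column names and returns a data frame.
airports_df <- read.csv(csv_path, stringsAsFactors = FALSE)
str(airports_df)
```

`read.csv` also accepts a URL in place of a local path, so the same call should work directly against a feed's .csv address.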

What type of study is this:
- This is an observational study, because I am trying to identify the correlation from existing data.

Response:
- The response variable is the number of airports, and it is numerical.

Explanatory:
- The explanatory variable is the population in the area, and it is also numerical.

Feeds:

Data File "Airports":

Data File "Regions":

Data File "Population":

Staging and Filtering data frames

  • USA Airports

>library(dplyr)
>usairports_df <- airports_df %>% filter(iso_country == "US")
>head(usairports_df)

  • USA Regions

>usaregions_df <- regions_df %>% filter(iso_country == "US")
>head(usaregions_df)

  • USA Population

>usaPopult_df = Popult_df%>% filter(STATE != 0)
>head(usaPopult_df)

Joining two data frames Airports and Regions to get the State Names

>df1<-inner_join(usairports_df,usaregions_df, by = c("iso_region"="code"))

>allairports_df <- arrange(select(df1,Region=iso_region,Municipality = municipality,State=name.y,AirPortName=name.x,AirPortType=type),(State))

>head(allairports_df)
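The `name.x` and `name.y` columns selected above come from the join itself. The sketch below uses made-up toy rows (not the real feeds) to show that when both data frames have a `name` column, `inner_join` keeps both and adds the suffixes `.x` (left) and `.y` (right).

```r
library(dplyr)

# Toy stand-ins for the airports and regions data frames (made-up rows).
airports_toy <- data.frame(
  name = c("Example Field", "Example Heliport"),
  iso_region = c("US-TX", "US-PA"),
  stringsAsFactors = FALSE
)
regions_toy <- data.frame(
  code = c("US-PA", "US-TX"),
  name = c("Pennsylvania", "Texas"),
  stringsAsFactors = FALSE
)

# Match airports to regions on iso_region == code.
joined_toy <- inner_join(airports_toy, regions_toy, by = c("iso_region" = "code"))

# "name" exists on both sides, so it appears as name.x (airport) and name.y (state).
names(joined_toy)
joined_toy
```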

Calculating Airport Counts by State and by Airport Type into two different Data Frames
>allairports_types_bystat_df <- allairports_df %>%
group_by(State,AirPortType) %>% summarise(airportcount = n()) %>% arrange(State) %>% filter(State != '(unassigned)')

>head(allairports_types_bystat_df)

>allairports_bystat_df <- allairports_df %>% group_by(State) %>% summarise(airportcount = n()) %>% arrange(State) %>% filter(State != '(unassigned)')

>head(allairports_bystat_df)

Consider only the population in 2016

>allpopu_bystat_df <- arrange(select(usaPopult_df,Region=NAME,Population = POPESTIMATE2016) ,(Region))

>head(allpopu_bystat_df)

Join the Airport Data Frame with the Population Data Frame and calculate the population in units of 10,000 (10k)

>allairports_allpopu_bystat_df<-left_join(allairports_bystat_df,allpopu_bystat_df, by = c("State"="Region"))

>allairports_allpopu_bystat_df <-allairports_allpopu_bystat_df%>% select(State,airportcount,Population) %>% mutate(Popu_10k = Population / 10000)
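Because this is a left join on state names, any state in the airport table without an exact name match in the population table ends up with `NA` for `Population`. The sketch below, on made-up toy numbers, shows one way to list such rows before modelling. The names `counts_toy`, `popu_toy` and `unmatched` are illustrative only.

```r
library(dplyr)

# Made-up toy inputs; "Guam" has no matching population row on purpose.
counts_toy <- data.frame(State = c("Ohio", "Texas", "Guam"),
                         airportcount = c(3, 5, 1),
                         stringsAsFactors = FALSE)
popu_toy <- data.frame(Region = c("Ohio", "Texas"),
                       Population = c(11600000, 27900000),
                       stringsAsFactors = FALSE)

joined <- left_join(counts_toy, popu_toy, by = c("State" = "Region")) %>%
  mutate(Popu_10k = Population / 10000)

# Rows without a match keep NA; lm() drops such rows by default.
unmatched <- filter(joined, is.na(Population))
unmatched$State
```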

Staging and Filtering data frames

> df3 <- left_join(allairports_types_bystat_df,allpopu_bystat_df, by = c("State"="Region"))

>allairports_types_allpopu_bystat_df <- df3 %>% select(State,AirPortType,airportcount,Population) %>% mutate(avgpop_airport = Population / airportcount)

>summary(allairports_types_allpopu_bystat_df)

> summary(allairports_allpopu_bystat_df)

Create LM and plot the Graph

>allairports_allpopu_bystat_df.lm1 <- lm(airportcount ~ Popu_10k, data = allairports_allpopu_bystat_df)

>plot(allairports_allpopu_bystat_df$airportcount ~ allairports_allpopu_bystat_df$Popu_10k, main = "Relationship between State Population vs Airports",xlab='Population in States(10Ks)',ylab='Airport Count')

>abline(allairports_allpopu_bystat_df.lm1 )

>abline(h=429,col = "red")

>abline(v=633,col = "blue" )
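The values 429 and 633 appear to be the means of the airport count and of the population in 10Ks. Assuming so, the reference lines can be computed from the data instead of being typed in by hand. The sketch below uses made-up toy numbers, not the real state figures.

```r
# Made-up illustrative data, not the real state figures.
toy <- data.frame(Popu_10k = c(100, 300, 500, 900),
                  airportcount = c(220, 310, 370, 540))

fit <- lm(airportcount ~ Popu_10k, data = toy)
mean_count <- mean(toy$airportcount)
mean_popu <- mean(toy$Popu_10k)

plot(airportcount ~ Popu_10k, data = toy)
abline(fit)
abline(h = mean_count, col = "red")   # mean airport count
abline(v = mean_popu, col = "blue")   # mean population (10Ks)

# A least-squares line with an intercept passes through the point of means,
# so the prediction at the mean population equals the mean airport count.
predict(fit, data.frame(Popu_10k = mean_popu))
```

The red and blue lines therefore cross on the fitted line, which is a quick visual check that the model was fit to the same data being plotted.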

Relationship between State Population vs Airports


Conclusion.

The graph above suggests that the airport count increases roughly linearly with state population (in units of 10,000), so I fit a simple linear regression model to the data. The red and blue lines mark the mean airport count and the mean population in States (10Ks).

>summary(allairports_allpopu_bystat_df.lm1)

Conclusion……

-We get a lot of useful information here without being too overwhelmed by pages of output.

-The estimate for the model intercept is 189.87966, and the slope coefficient for Popu_10k is 0.37886. The standard errors of these estimates are also given in the Coefficients table, together with the significance tests for the model coefficients. There is evidence that the slope is significantly different from zero: as the population increases, so does the number of airports. This indicates a positive correlation between the number of airports and state population.
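To make the fitted equation concrete, the reported coefficients give airportcount ≈ 189.87966 + 0.37886 × Popu_10k. The short sketch below applies this to a hypothetical state of 10 million residents, a figure chosen only for illustration.

```r
# Coefficients as reported in the model summary.
intercept <- 189.87966
slope <- 0.37886

# Hypothetical state with 10 million residents, i.e. 1000 in units of 10,000.
popu_10k <- 10e6 / 10000
predicted_airports <- intercept + slope * popu_10k
predicted_airports
```

Read this way, the slope means that each additional 10,000 residents is associated with about 0.38 more airports, or roughly one additional airport per 26,000 residents.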

Total Airport Count in each State……
>library(ggplot2)
>ggplot(allairports_allpopu_bystat_df, aes(fill=airportcount, y=airportcount, x=State)) + ggtitle("Total Airport Count in each State ") + geom_bar( stat="identity") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + geom_point()

Airport Type Count in each State……
>ggplot(allairports_types_allpopu_bystat_df, aes(fill=AirPortType, y=airportcount, x=State)) + ggtitle("Airport Type Count in each State ") + geom_bar( stat="identity") + theme(axis.text.x = element_text(angle = 70, hjust = 1))

Population in each State (10k)……
>datn1 <- allpopu_bystat_df %>% mutate(Popu_10k = Population / 10000) %>% arrange(Popu_10k)
>ggplot(data=datn1, aes(y=Popu_10k,x=Region, colour=Region)) + theme(axis.text.x = element_text(angle = 90, hjust = 1), legend.position="none") + geom_point() + geom_bar(stat="identity") + ggtitle("Population in each State (10k )")