Email: ninjuchandra@gmail.com

College: NIT, Trichy

1. Introduction

Demographic data is statistical data of a socio-economic nature, such as population, race, income, education and employment, that describes specific geographic locations and is often tied to a point in time. Population data, for example, includes characteristics such as area population, population growth or birth rate, ethnicity, density and distribution. Employment data includes employment and unemployment rates, which can be broken down further by gender and ethnicity.

Demographic data is often gathered by census organizations, both government and private, which may use the data for research, marketing, and environmental and human development. Data such as population and employment and all their related data fields such as density, ethnicity and gender can be used by the government to plan for infrastructure development such as roads, hospitals and law enforcement.

Demographic data is also used by marketers and business entities for targeted advertising and product distribution. For example, in areas with a large population density of Latinos, fast food restaurants often offer Mexican-themed foods as opposed to Mediterranean ones. Branches of famous American fast food companies in foreign countries often tailor their menu according to the local taste. All of these are due to demographic research using demographic data.

2. Overview of the Study

This study analyzes the 2015 US Demographic Data. The data set contains demographic data for each census tract within a county, which in turn lies within a state. Census tracts are defined by the Census Bureau and are fairly consistent in size: a typical census tract has around 5,000 residents. The demographic data include the total population, the numbers of men and women, the origin of the residents, the median household income, the percentage employed in different sectors of industry, and so on. Our regression model suggests that the median household income depends mainly on the population distribution by origin. Some of our comparison tests indicate that there is a significant difference between the numbers of men and women in a tract.
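
The comparison test behind the last statement is not shown in this report. One plausible choice, given that every tract supplies both a count of men and a count of women, is a paired t-test. The following is a minimal sketch on simulated counts; the vectors `men` and `women` and their distributions are assumptions made for illustration and are not taken from the census file.

```r
# Simulated tract counts (hypothetical values, not the census data).
set.seed(42)
men <- pmax(round(rnorm(200, mean = 2100, sd = 900)), 0)
women <- pmax(men + round(rnorm(200, mean = 70, sd = 150)), 0)

# Paired test: each tract contributes one (women, men) pair.
tt <- t.test(women, men, paired = TRUE)

print(tt$estimate)  # mean of the within-tract differences
print(tt$p.value)   # small values indicate a systematic difference
```

On the real data the same call would take the `Women` and `Men` columns of `US_data`. With tens of thousands of tracts even a small average difference will be statistically significant, so the size of the difference is worth reporting alongside the p-value.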

3. An empirical field study of US Demographic Data, 2015

3.1 Overview

This study analyzes the 2015 US Demographic Data to identify the factors that affect the median household income per tract. The median household income was compared across tracts and then examined with our regression model. The model assumes that income depends mainly on the population distribution by origin and on the percentage of people working in each sector of industry.

Hypothesis H1: The income depends mainly on the population distribution based on origin, percentage of people working in different sectors of the industry.

3.2 Column Metadata and relevant information

The data was collected from the website https://www.kaggle.com/muonneutrino/us-census-demographic-data.

The data set contains demographic data for each census tract within a county, which in turn lies within a state. Census tracts are defined by the Census Bureau and are fairly consistent in size: a typical census tract has around 5,000 residents. The demographic data include the total population, the numbers of men and women, the origin of the residents, the median household income, the percentage employed in different sectors of industry, and so on. Each column name is listed below, followed by its meaning.

CensusTract: Census tract ID
State: State, DC, or Puerto Rico
County: County or county equivalent
TotalPop: Total population
Men: Number of men
Women: Number of women
Hispanic: Percent of population that is Hispanic/Latino
White: Percent of population that is white
Black: Percent of population that is black
Native: Percent of population that is Native American or Native Alaskan
Asian: Percent of population that is Asian
Pacific: Percent of population that is Native Hawaiian or Pacific Islander
Citizen: Number of citizens
Income: Median household income
IncomeErr: Median household income error
IncomePerCap: Income per capita
IncomePerCapErr: Income per capita error
Poverty: Percent under poverty level
ChildPoverty: Percent of children under poverty level
Professional: Percent employed in management, business, science, and arts
Service: Percent employed in service jobs
Office: Percent employed in sales and office jobs
Construction: Percent employed in natural resources, construction, and maintenance
Production: Percent employed in production, transportation, and material movement
Drive: Percent commuting alone in a car, van, or truck
Carpool: Percent carpooling in a car, van, or truck
Transit: Percent commuting on public transportation
Walk: Percent walking to work
OtherTransp: Percent commuting via other means
WorkAtHome: Percent working at home
MeanCommute: Mean commute time (minutes)
Employed: Number employed (16+)
PrivateWork: Percent employed in private industry
PublicWork: Percent employed in public jobs
SelfEmployed: Percent self-employed
FamilyWork: Percent in unpaid family work
Unemployment: Unemployment rate (percent)

3.3 Model

In order to test Hypothesis H1, I proposed the following model:

Income = a0 + a1.Hispanic + a2.White + a3.Black + a4.Native + a5.Asian + a6.Pacific + a7.Professional + a8.Service + a9.Office + a10.Construction + a11.Production + error factor

Read the dataset into R and inspect its structure and dimensions.

setwd("D:/R Internship")
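# stringsAsFactors = TRUE keeps State and County as factors (the default became FALSE in R 4.0.0)
US_data <- read.csv("US Demographic Data 2015_Census Tract.csv", stringsAsFactors = TRUE)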
View(US_data)
str(US_data)
## 'data.frame':    74001 obs. of  37 variables:
##  $ CensusTract    : num  1e+09 1e+09 1e+09 1e+09 1e+09 ...
##  $ State          : Factor w/ 52 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ County         : Factor w/ 1928 levels "Añasco","Abbeville",..: 90 90 90 90 90 90 90 90 90 90 ...
##  $ TotalPop       : int  1948 2156 2968 4423 10763 3851 2761 3187 10915 5668 ...
##  $ Men            : int  940 1059 1364 2172 4922 1787 1210 1502 5486 2897 ...
##  $ Women          : int  1008 1097 1604 2251 5841 2064 1551 1685 5429 2771 ...
##  $ Hispanic       : num  0.9 0.8 0 10.5 0.7 13.1 3.8 1.3 1.4 0.4 ...
##  $ White          : num  87.4 40.4 74.5 82.8 68.5 72.9 74.5 84 89.5 85.5 ...
##  $ Black          : num  7.7 53.3 18.6 3.7 24.8 11.9 19.7 10.7 8.4 12.1 ...
##  $ Native         : num  0.3 0 0.5 1.6 0 0 0 3.1 0 0 ...
##  $ Asian          : num  0.6 2.3 1.4 0 3.8 0 0 0 0 0.3 ...
##  $ Pacific        : num  0 0 0.3 0 0 0 0 0 0 0 ...
##  $ Citizen        : int  1503 1662 2335 3306 7666 2642 2060 2391 7778 4217 ...
##  $ Income         : int  61838 32303 44922 54329 51965 63092 34821 73728 60063 41287 ...
##  $ IncomeErr      : int  11900 13538 5629 7003 6935 9585 7867 2447 8602 7857 ...
##  $ IncomePerCap   : int  25713 18021 20689 24125 27526 30480 20442 32813 24028 24710 ...
##  $ IncomePerCapErr: int  4548 2474 2817 2870 2813 7550 3245 4669 2233 4149 ...
##  $ Poverty        : num  8.1 25.5 12.7 2.1 11.4 14.4 28.9 13 13.9 6.8 ...
##  $ ChildPoverty   : num  8.4 40.3 19.7 1.6 17.5 21.9 41.9 25.9 18.3 10 ...
##  $ Professional   : num  34.7 22.3 31.4 27 49.6 24.2 19.5 42.8 31.5 29.3 ...
##  $ Service        : num  17 24.7 24.9 20.8 14.2 17.5 29.6 10.7 17.5 13.7 ...
##  $ Office         : num  21.3 21.5 22.1 27 18.2 35.4 25.3 34.2 26.1 17.7 ...
##  $ Construction   : num  11.9 9.4 9.2 8.7 2.1 7.9 10.1 5.5 7.8 11 ...
##  $ Production     : num  15.2 22 12.4 16.4 15.8 14.9 15.5 6.8 17.1 28.3 ...
##  $ Drive          : num  90.2 86.3 94.8 86.6 88 82.7 92.4 84.3 90.1 88.7 ...
##  $ Carpool        : num  4.8 13.1 2.8 9.1 10.5 6.9 7.6 8.1 8.6 7.9 ...
##  $ Transit        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Walk           : num  0.5 0 0 0 0 0 0 0 0 0 ...
##  $ OtherTransp    : num  2.3 0.7 0 2.6 0.6 6 0 1.7 0 1.2 ...
##  $ WorkAtHome     : num  2.1 0 2.5 1.6 0.9 4.5 0 5.9 1.3 2.1 ...
##  $ MeanCommute    : num  25 23.4 19.6 25.3 24.8 19.8 20 24.3 29.4 32.9 ...
##  $ Employed       : int  943 753 1373 1782 5037 1560 1166 1502 4348 2485 ...
##  $ PrivateWork    : num  77.1 77 64.1 75.7 67.1 79.4 82 78.1 73.3 77.9 ...
##  $ PublicWork     : num  18.3 16.9 23.6 21.2 27.6 14.7 14.6 14.8 22.1 15.2 ...
##  $ SelfEmployed   : num  4.6 6.1 12.3 3.1 5.3 5.8 3.4 7.1 4.6 6.9 ...
##  $ FamilyWork     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Unemployment   : num  5.4 13.3 6.2 10.8 4.2 10.9 11.4 8.2 8.7 7.2 ...
dim(US_data) 
## [1] 74001    37

Next, fit a linear regression model with Income as the dependent variable.

fit1<-lm(Income~Hispanic+White+Black+Native+Asian+Pacific+Professional+Service
         +Office+Construction+Production,data = US_data)
summary(fit1)
## 
## Call:
## lm(formula = Income ~ Hispanic + White + Black + Native + Asian + 
##     Pacific + Professional + Service + Office + Construction + 
##     Production, data = US_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -123062  -10576    -842    9236  151257 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   25112.06  107112.79   0.234    0.815    
## Hispanic        -19.76      29.22  -0.676    0.499    
## White            -5.65      29.48  -0.192    0.848    
## Black          -128.99      29.67  -4.347 1.38e-05 ***
## Native         -269.09      34.56  -7.787 6.95e-15 ***
## Asian           331.76      32.25  10.288  < 2e-16 ***
## Pacific         395.81      83.93   4.716 2.41e-06 ***
## Professional   1116.58    1070.88   1.043    0.297    
## Service        -528.32    1070.91  -0.493    0.622    
## Office          178.09    1070.93   0.166    0.868    
## Construction    191.76    1070.97   0.179    0.858    
## Production     -132.75    1070.93  -0.124    0.901    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18590 on 72887 degrees of freedom
##   (1102 observations deleted due to missingness)
## Multiple R-squared:  0.5792, Adjusted R-squared:  0.5791 
## F-statistic:  9120 on 11 and 72887 DF,  p-value: < 2.2e-16

The model uses the median household income as the dependent variable and, as independent variables, the population percentages by origin and the percentage of people employed in each sector of industry. The Estimate column gives the fitted ‘a’ coefficient for each independent variable.
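
The summary reports that 1102 observations were deleted due to missingness: by default `lm` drops every row with a missing value in any model variable. The sketch below shows, on a small made-up data frame (the values in `toy` are hypothetical), how to count missing values per column and confirm how many rows a fit actually uses.

```r
# Made-up data with a few missing values (not the census file).
toy <- data.frame(
  Income = c(50000, 42000, NA, 61000, 38000, 55000),
  Asian  = c(2.1, 0.5, 3.0, NA, 1.2, 4.4),
  Black  = c(10.2, 40.1, 5.5, 7.3, 55.0, 3.1)
)

# Missing values per column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Rows with no missing value in any column.
n_complete <- sum(complete.cases(toy))

# lm() silently drops incomplete rows (default na.action is na.omit).
fit <- lm(Income ~ Asian + Black, data = toy)
print(c(complete_rows = n_complete, rows_used = nobs(fit)))
```

Applied to `US_data`, restricting `complete.cases` to the twelve model columns would show which tracts were excluded from the regression.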

3.4 Results

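Our hypothesis was only partly supported, because not every predictor had a statistically significant effect on the median household income. Among the origin variables, the percentages of Black, Native, Asian and Pacific residents are significant (p < 0.001), while Hispanic and White are not (p = 0.499 and p = 0.848). Interestingly, none of the five employment-sector variables is significant at the 0.05 level. This last result should be read with caution: the sector percentages sum to roughly 100 in each tract (34.7 + 17.0 + 21.3 + 11.9 + 15.2 = 100.1 for the first tract), so together they are nearly collinear with the intercept, which likely explains their very large standard errors (about 1,071 each).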
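
The effect of including a full set of percentages that sum to about 100 can be illustrated with a small simulation. This is a sketch under stated assumptions, not a re-analysis of the census data: the shares `A`, `B`, `C`, the response `y` and all coefficients below are invented for the illustration.

```r
# Simulated data (hypothetical): three shares that sum to about 100 per row.
set.seed(1)
n <- 500
raw <- matrix(rexp(n * 3), ncol = 3)
shares <- round(100 * raw / rowSums(raw), 1)  # rounding leaves sums of 99.9 to 100.1
toy <- data.frame(A = shares[, 1], B = shares[, 2], C = shares[, 3])
toy$y <- 30000 + 500 * toy$A - 200 * toy$B + rnorm(n, sd = 5000)

# All three shares plus an intercept: a nearly collinear design.
full <- lm(y ~ A + B + C, data = toy)

# Dropping one share makes it the reference category.
reduced <- lm(y ~ A + B, data = toy)

se_full <- coef(summary(full))[, "Std. Error"]
se_reduced <- coef(summary(reduced))[, "Std. Error"]

# Standard error of the same coefficient under both designs.
print(c(full = unname(se_full["A"]), reduced = unname(se_reduced["A"])))
```

In this simulation the standard error of `A` is far larger in the full model than in the reduced one. A comparable check on the census data would be to refit the model with one sector (for example Production) left out and compare the standard errors of the remaining sector coefficients.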

4. Conclusion

This analysis was conducted in order to understand the patterns in the US Demographic Data. We investigated the median household income in every tract and how different factors affect it. We found that the percentages of Black, Asian, Native and Pacific residents are the main determinants of median household income in our model.

5. References

  1. https://www.kaggle.com/muonneutrino/us-census-demographic-data

  2. https://www.techopedia.com/definition/30326/demographic-data

Appendix 1

Descriptive statistics (min, max, median, etc.) for each variable.

library(psych)
summary(US_data)
##   CensusTract                 State               County     
##  Min.   :1.001e+09   California  : 8057   Los Angeles: 2346  
##  1st Qu.:1.304e+10   Texas       : 5265   Cook       : 1326  
##  Median :2.805e+10   New York    : 4918   Orange     :  939  
##  Mean   :2.839e+10   Florida     : 4245   Jefferson  :  927  
##  3rd Qu.:4.200e+10   Pennsylvania: 3218   Maricopa   :  916  
##  Max.   :7.215e+10   Illinois    : 3123   Montgomery :  833  
##                      (Other)     :45175   (Other)    :66714  
##     TotalPop          Men            Women          Hispanic     
##  Min.   :    0   Min.   :    0   Min.   :    0   Min.   :  0.00  
##  1st Qu.: 2891   1st Qu.: 1409   1st Qu.: 1461   1st Qu.:  2.40  
##  Median : 4063   Median : 1986   Median : 2066   Median :  7.00  
##  Mean   : 4326   Mean   : 2128   Mean   : 2198   Mean   : 16.86  
##  3rd Qu.: 5442   3rd Qu.: 2674   3rd Qu.: 2774   3rd Qu.: 20.40  
##  Max.   :53812   Max.   :27962   Max.   :27250   Max.   :100.00  
##                                                  NA's   :690     
##      White            Black            Native             Asian       
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.0000   Min.   : 0.000  
##  1st Qu.: 39.40   1st Qu.:  0.70   1st Qu.:  0.0000   1st Qu.: 0.200  
##  Median : 71.40   Median :  3.70   Median :  0.0000   Median : 1.400  
##  Mean   : 62.03   Mean   : 13.27   Mean   :  0.7277   Mean   : 4.588  
##  3rd Qu.: 88.30   3rd Qu.: 14.40   3rd Qu.:  0.4000   3rd Qu.: 4.800  
##  Max.   :100.00   Max.   :100.00   Max.   :100.0000   Max.   :91.300  
##  NA's   :690      NA's   :690      NA's   :690        NA's   :690     
##     Pacific          Citizen          Income         IncomeErr     
##  Min.   : 0.000   Min.   :    0   Min.   :  2611   Min.   :   390  
##  1st Qu.: 0.000   1st Qu.: 2037   1st Qu.: 37683   1st Qu.:  5317  
##  Median : 0.000   Median : 2863   Median : 51094   Median :  7732  
##  Mean   : 0.145   Mean   : 3043   Mean   : 57226   Mean   :  9134  
##  3rd Qu.: 0.000   3rd Qu.: 3838   3rd Qu.: 70117   3rd Qu.: 11258  
##  Max.   :84.700   Max.   :37416   Max.   :248750   Max.   :123116  
##  NA's   :690                      NA's   :1100     NA's   :1100    
##   IncomePerCap    IncomePerCapErr     Poverty        ChildPoverty   
##  Min.   :   128   Min.   :    85   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 19123   1st Qu.:  2312   1st Qu.:  7.20   1st Qu.:  7.00  
##  Median : 25344   Median :  3127   Median : 13.40   Median : 17.80  
##  Mean   : 28491   Mean   :  3943   Mean   : 16.96   Mean   : 22.49  
##  3rd Qu.: 33894   3rd Qu.:  4537   3rd Qu.: 23.10   3rd Qu.: 33.80  
##  Max.   :254204   Max.   :134380   Max.   :100.00   Max.   :100.00  
##  NA's   :740      NA's   :740      NA's   :835      NA's   :1118    
##   Professional       Service          Office        Construction    
##  Min.   :  0.00   Min.   :  0.0   Min.   :  0.00   Min.   :  0.000  
##  1st Qu.: 24.10   1st Qu.: 13.4   1st Qu.: 20.10   1st Qu.:  5.000  
##  Median : 32.60   Median : 17.9   Median : 23.80   Median :  8.400  
##  Mean   : 34.80   Mean   : 19.1   Mean   : 23.95   Mean   :  9.292  
##  3rd Qu.: 43.88   3rd Qu.: 23.6   3rd Qu.: 27.50   3rd Qu.: 12.500  
##  Max.   :100.00   Max.   :100.0   Max.   :100.00   Max.   :100.000  
##  NA's   :807      NA's   :807     NA's   :807      NA's   :807      
##    Production         Drive           Carpool           Transit       
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.000   Min.   :  0.000  
##  1st Qu.:  7.10   1st Qu.: 72.00   1st Qu.:  6.000   1st Qu.:  0.000  
##  Median : 11.80   Median : 79.70   Median :  8.800   Median :  1.100  
##  Mean   : 12.86   Mean   : 75.53   Mean   :  9.627   Mean   :  5.456  
##  3rd Qu.: 17.40   3rd Qu.: 84.90   3rd Qu.: 12.300   3rd Qu.:  4.700  
##  Max.   :100.00   Max.   :100.00   Max.   :100.000   Max.   :100.000  
##  NA's   :807      NA's   :797      NA's   :797       NA's   :797      
##       Walk          OtherTransp        WorkAtHome       MeanCommute   
##  Min.   :  0.000   Min.   :  0.000   Min.   :  0.000   Min.   : 1.20  
##  1st Qu.:  0.400   1st Qu.:  0.400   1st Qu.:  1.800   1st Qu.:20.80  
##  Median :  1.400   Median :  1.100   Median :  3.500   Median :25.00  
##  Mean   :  3.123   Mean   :  1.892   Mean   :  4.368   Mean   :25.67  
##  3rd Qu.:  3.500   3rd Qu.:  2.500   3rd Qu.:  5.900   3rd Qu.:29.80  
##  Max.   :100.000   Max.   :100.000   Max.   :100.000   Max.   :80.00  
##  NA's   :797       NA's   :797       NA's   :797       NA's   :949    
##     Employed      PrivateWork       PublicWork      SelfEmployed    
##  Min.   :    0   Min.   :  0.00   Min.   :  0.00   Min.   :  0.000  
##  1st Qu.: 1249   1st Qu.: 74.60   1st Qu.:  9.60   1st Qu.:  3.500  
##  Median : 1846   Median : 80.10   Median : 13.40   Median :  5.500  
##  Mean   : 1984   Mean   : 78.98   Mean   : 14.62   Mean   :  6.234  
##  3rd Qu.: 2553   3rd Qu.: 84.60   3rd Qu.: 18.20   3rd Qu.:  8.100  
##  Max.   :24075   Max.   :100.00   Max.   :100.00   Max.   :100.000  
##                  NA's   :807      NA's   :807      NA's   :807      
##    FamilyWork       Unemployment    
##  Min.   : 0.0000   Min.   :  0.000  
##  1st Qu.: 0.0000   1st Qu.:  5.100  
##  Median : 0.0000   Median :  7.700  
##  Mean   : 0.1698   Mean   :  9.029  
##  3rd Qu.: 0.0000   3rd Qu.: 11.400  
##  Max.   :26.5000   Max.   :100.000  
##  NA's   :807       NA's   :802
describe(US_data) 
##                 vars     n         mean           sd     median
## CensusTract        1 74001 2.839113e+10 1.647593e+10 2.8047e+10
## State*             2 74001 2.534000e+01 1.510000e+01 2.5000e+01
## County*            3 74001 9.879300e+02 5.229400e+02 1.0340e+03
## TotalPop           4 74001 4.325590e+03 2.129310e+03 4.0630e+03
## Men                5 74001 2.127650e+03 1.072330e+03 1.9860e+03
## Women              6 74001 2.197940e+03 1.095730e+03 2.0660e+03
## Hispanic           7 73311 1.686000e+01 2.294000e+01 7.0000e+00
## White              8 73311 6.203000e+01 3.068000e+01 7.1400e+01
## Black              9 73311 1.327000e+01 2.176000e+01 3.7000e+00
## Native            10 73311 7.300000e-01 4.490000e+00 0.0000e+00
## Asian             11 73311 4.590000e+00 8.790000e+00 1.4000e+00
## Pacific           12 73311 1.500000e-01 1.040000e+00 0.0000e+00
## Citizen           13 74001 3.043080e+03 1.475490e+03 2.8630e+03
## Income            14 72901 5.722556e+04 2.866333e+04 5.1094e+04
## IncomeErr         15 72901 9.134490e+03 5.920340e+03 7.7320e+03
## IncomePerCap      16 73261 2.849123e+04 1.504707e+04 2.5344e+04
## IncomePerCapErr   17 73261 3.942910e+03 3.023030e+03 3.1270e+03
## Poverty           18 73166 1.696000e+01 1.320000e+01 1.3400e+01
## ChildPoverty      19 72883 2.249000e+01 1.919000e+01 1.7800e+01
## Professional      20 73194 3.480000e+01 1.501000e+01 3.2600e+01
## Service           21 73194 1.910000e+01 8.280000e+00 1.7900e+01
## Office            22 73194 2.395000e+01 5.960000e+00 2.3800e+01
## Construction      23 73194 9.290000e+00 6.020000e+00 8.4000e+00
## Production        24 73194 1.286000e+01 7.670000e+00 1.1800e+01
## Drive             25 73204 7.553000e+01 1.537000e+01 7.9700e+01
## Carpool           26 73204 9.630000e+00 5.370000e+00 8.8000e+00
## Transit           27 73204 5.460000e+00 1.172000e+01 1.1000e+00
## Walk              28 73204 3.120000e+00 5.880000e+00 1.4000e+00
## OtherTransp       29 73204 1.890000e+00 2.600000e+00 1.1000e+00
## WorkAtHome        30 73204 4.370000e+00 3.900000e+00 3.5000e+00
## MeanCommute       31 73052 2.567000e+01 6.960000e+00 2.5000e+01
## Employed          32 74001 1.983910e+03 1.073430e+03 1.8460e+03
## PrivateWork       33 73194 7.898000e+01 8.350000e+00 8.0100e+01
## PublicWork        34 73194 1.462000e+01 7.540000e+00 1.3400e+01
## SelfEmployed      35 73194 6.230000e+00 4.040000e+00 5.5000e+00
## FamilyWork        36 73194 1.700000e-01 4.600000e-01 0.0000e+00
## Unemployment      37 73199 9.030000e+00 5.960000e+00 7.7000e+00
##                      trimmed          mad         min          max
## CensusTract     2.803846e+10 2.080416e+10 1.00102e+09 7.215375e+10
## State*          2.516000e+01 2.076000e+01 1.00000e+00 5.200000e+01
## County*         9.921900e+02 6.553100e+02 1.00000e+00 1.928000e+03
## TotalPop        4.165830e+03 1.866590e+03 0.00000e+00 5.381200e+04
## Men             2.041010e+03 9.236600e+02 0.00000e+00 2.796200e+04
## Women           2.117140e+03 9.622100e+02 0.00000e+00 2.725000e+04
## Hispanic        1.161000e+01 8.450000e+00 0.00000e+00 1.000000e+02
## White           6.496000e+01 3.039000e+01 0.00000e+00 1.000000e+02
## Black           7.770000e+00 5.340000e+00 0.00000e+00 1.000000e+02
## Native          1.700000e-01 0.000000e+00 0.00000e+00 1.000000e+02
## Asian           2.520000e+00 2.080000e+00 0.00000e+00 9.130000e+01
## Pacific         0.000000e+00 0.000000e+00 0.00000e+00 8.470000e+01
## Citizen         2.935270e+03 1.319510e+03 0.00000e+00 3.741600e+04
## Income          5.378624e+04 2.265116e+04 2.61100e+03 2.487500e+05
## IncomeErr       8.270770e+03 4.123110e+03 3.90000e+02 1.231160e+05
## IncomePerCap    2.649608e+04 1.054425e+04 1.28000e+02 2.542040e+05
## IncomePerCapErr 3.422590e+03 1.460360e+03 8.50000e+01 1.343800e+05
## Poverty         1.509000e+01 1.082000e+01 0.00000e+00 1.000000e+02
## ChildPoverty    2.013000e+01 1.838000e+01 0.00000e+00 1.000000e+02
## Professional    3.383000e+01 1.423000e+01 0.00000e+00 1.000000e+02
## Service         1.847000e+01 7.410000e+00 0.00000e+00 1.000000e+02
## Office          2.383000e+01 5.490000e+00 0.00000e+00 1.000000e+02
## Construction    8.740000e+00 5.490000e+00 0.00000e+00 1.000000e+02
## Production      1.224000e+01 7.560000e+00 0.00000e+00 1.000000e+02
## Drive           7.831000e+01 8.900000e+00 0.00000e+00 1.000000e+02
## Carpool         9.140000e+00 4.600000e+00 0.00000e+00 1.000000e+02
## Transit         2.470000e+00 1.630000e+00 0.00000e+00 1.000000e+02
## Walk            1.920000e+00 2.080000e+00 0.00000e+00 1.000000e+02
## OtherTransp     1.410000e+00 1.480000e+00 0.00000e+00 1.000000e+02
## WorkAtHome      3.860000e+00 2.820000e+00 0.00000e+00 1.000000e+02
## MeanCommute     2.528000e+01 6.520000e+00 1.20000e+00 8.000000e+01
## Employed        1.899360e+03 9.562800e+02 0.00000e+00 2.407500e+04
## PrivateWork     7.963000e+01 7.410000e+00 0.00000e+00 1.000000e+02
## PublicWork      1.387000e+01 6.230000e+00 0.00000e+00 1.000000e+02
## SelfEmployed    5.800000e+00 3.260000e+00 0.00000e+00 1.000000e+02
## FamilyWork      6.000000e-02 0.000000e+00 0.00000e+00 2.650000e+01
## Unemployment    8.210000e+00 4.450000e+00 0.00000e+00 1.000000e+02
##                        range  skew kurtosis          se
## CensusTract     7.115273e+10  0.13    -0.92 60566306.31
## State*          5.100000e+01  0.02    -1.33        0.06
## County*         1.927000e+03 -0.08    -1.04        1.92
## TotalPop        5.381200e+04  1.83    14.56        7.83
## Men             2.796200e+04  1.96    17.07        3.94
## Women           2.725000e+04  1.79    13.80        4.03
## Hispanic        1.000000e+02  2.00     3.39        0.08
## White           1.000000e+02 -0.67    -0.86        0.11
## Black           1.000000e+02  2.31     4.77        0.08
## Native          1.000000e+02 15.89   289.71        0.02
## Asian           9.130000e+01  3.95    19.99        0.03
## Pacific         8.470000e+01 26.92  1295.51        0.00
## Citizen         3.741600e+04  1.61    12.68        5.42
## Income          2.461390e+05  1.48     3.46      106.16
## IncomeErr       1.227260e+05  3.00    21.40       21.93
## IncomePerCap    2.540760e+05  2.33    10.80       55.59
## IncomePerCapErr 1.342950e+05  5.81    96.59       11.17
## Poverty         1.000000e+02  1.46     2.69        0.05
## ChildPoverty    1.000000e+02  1.02     0.63        0.07
## Professional    1.000000e+02  0.60     0.09        0.06
## Service         1.000000e+02  0.97     2.44        0.03
## Office          1.000000e+02  0.73     7.23        0.02
## Construction    1.000000e+02  1.49     6.34        0.02
## Production      1.000000e+02  0.97     2.55        0.03
## Drive           1.000000e+02 -2.19     5.67        0.06
## Carpool         1.000000e+02  1.77    12.50        0.02
## Transit         1.000000e+02  3.63    14.68        0.04
## Walk            1.000000e+02  5.61    46.31        0.02
## OtherTransp     1.000000e+02  5.12    70.50        0.01
## WorkAtHome      1.000000e+02  3.95    50.07        0.01
## MeanCommute     7.880000e+01  0.63     0.85        0.03
## Employed        2.407500e+04  1.63    10.65        3.95
## PrivateWork     1.000000e+02 -1.32     5.17        0.03
## PublicWork      1.000000e+02  1.75     7.57        0.03
## SelfEmployed    1.000000e+02  2.92    38.47        0.01
## FamilyWork      2.650000e+01  7.29   188.67        0.00
## Unemployment    1.000000e+02  2.16    10.67        0.02
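
The `se` column in the `describe` output is the standard error of the mean, i.e. the standard deviation divided by the square root of the number of non-missing observations. As a quick worked check, the sketch below recomputes it for Income from the `sd` and `n` values printed above (the variable names are chosen here for illustration).

```r
# Values copied from the describe() output for Income.
sd_income <- 28663.33
n_income <- 72901

# Standard error of the mean: sd / sqrt(n).
se_income <- sd_income / sqrt(n_income)
print(round(se_income, 2))
```

The result agrees with the 106.16 reported in the `se` column.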

Create two subsets of the US data with different numbers of rows, each containing tracts from only two states (Alabama and Alaska).

US_data1<-US_data[c(1:50,1182:1231),]
US_data2<-US_data[(US_data$State=="Alabama") | (US_data$State=="Alaska"),]
US_data1$State_1<-droplevels(US_data1$State)
US_data1$County_1<-droplevels(US_data1$County)
US_data2$State_1<-droplevels(US_data2$State)
US_data2$County_1<-droplevels(US_data2$County)
dim(US_data1)
## [1] 100  39
View(US_data1)
dim(US_data2)
## [1] 1348   39
View(US_data2)
str(US_data2) 
## 'data.frame':    1348 obs. of  39 variables:
##  $ CensusTract    : num  1e+09 1e+09 1e+09 1e+09 1e+09 ...
##  $ State          : Factor w/ 52 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ County         : Factor w/ 1928 levels "Añasco","Abbeville",..: 90 90 90 90 90 90 90 90 90 90 ...
##  $ TotalPop       : int  1948 2156 2968 4423 10763 3851 2761 3187 10915 5668 ...
##  $ Men            : int  940 1059 1364 2172 4922 1787 1210 1502 5486 2897 ...
##  $ Women          : int  1008 1097 1604 2251 5841 2064 1551 1685 5429 2771 ...
##  $ Hispanic       : num  0.9 0.8 0 10.5 0.7 13.1 3.8 1.3 1.4 0.4 ...
##  $ White          : num  87.4 40.4 74.5 82.8 68.5 72.9 74.5 84 89.5 85.5 ...
##  $ Black          : num  7.7 53.3 18.6 3.7 24.8 11.9 19.7 10.7 8.4 12.1 ...
##  $ Native         : num  0.3 0 0.5 1.6 0 0 0 3.1 0 0 ...
##  $ Asian          : num  0.6 2.3 1.4 0 3.8 0 0 0 0 0.3 ...
##  $ Pacific        : num  0 0 0.3 0 0 0 0 0 0 0 ...
##  $ Citizen        : int  1503 1662 2335 3306 7666 2642 2060 2391 7778 4217 ...
##  $ Income         : int  61838 32303 44922 54329 51965 63092 34821 73728 60063 41287 ...
##  $ IncomeErr      : int  11900 13538 5629 7003 6935 9585 7867 2447 8602 7857 ...
##  $ IncomePerCap   : int  25713 18021 20689 24125 27526 30480 20442 32813 24028 24710 ...
##  $ IncomePerCapErr: int  4548 2474 2817 2870 2813 7550 3245 4669 2233 4149 ...
##  $ Poverty        : num  8.1 25.5 12.7 2.1 11.4 14.4 28.9 13 13.9 6.8 ...
##  $ ChildPoverty   : num  8.4 40.3 19.7 1.6 17.5 21.9 41.9 25.9 18.3 10 ...
##  $ Professional   : num  34.7 22.3 31.4 27 49.6 24.2 19.5 42.8 31.5 29.3 ...
##  $ Service        : num  17 24.7 24.9 20.8 14.2 17.5 29.6 10.7 17.5 13.7 ...
##  $ Office         : num  21.3 21.5 22.1 27 18.2 35.4 25.3 34.2 26.1 17.7 ...
##  $ Construction   : num  11.9 9.4 9.2 8.7 2.1 7.9 10.1 5.5 7.8 11 ...
##  $ Production     : num  15.2 22 12.4 16.4 15.8 14.9 15.5 6.8 17.1 28.3 ...
##  $ Drive          : num  90.2 86.3 94.8 86.6 88 82.7 92.4 84.3 90.1 88.7 ...
##  $ Carpool        : num  4.8 13.1 2.8 9.1 10.5 6.9 7.6 8.1 8.6 7.9 ...
##  $ Transit        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Walk           : num  0.5 0 0 0 0 0 0 0 0 0 ...
##  $ OtherTransp    : num  2.3 0.7 0 2.6 0.6 6 0 1.7 0 1.2 ...
##  $ WorkAtHome     : num  2.1 0 2.5 1.6 0.9 4.5 0 5.9 1.3 2.1 ...
##  $ MeanCommute    : num  25 23.4 19.6 25.3 24.8 19.8 20 24.3 29.4 32.9 ...
##  $ Employed       : int  943 753 1373 1782 5037 1560 1166 1502 4348 2485 ...
##  $ PrivateWork    : num  77.1 77 64.1 75.7 67.1 79.4 82 78.1 73.3 77.9 ...
##  $ PublicWork     : num  18.3 16.9 23.6 21.2 27.6 14.7 14.6 14.8 22.1 15.2 ...
##  $ SelfEmployed   : num  4.6 6.1 12.3 3.1 5.3 5.8 3.4 7.1 4.6 6.9 ...
##  $ FamilyWork     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Unemployment   : num  5.4 13.3 6.2 10.8 4.2 10.9 11.4 8.2 8.7 7.2 ...
##  $ State_1        : Factor w/ 2 levels "Alabama","Alaska": 1 1 1 1 1 1 1 1 1 1 ...
##  $ County_1       : Factor w/ 96 levels "Aleutians East Borough",..: 4 4 4 4 4 4 4 4 4 4 ...
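
The `str` output shows why `droplevels` is applied: after subsetting, `State` still carries all 52 levels, whereas `State_1` has only the 2 that occur. The following minimal sketch reproduces this behaviour on a made-up factor (the vector `state` is hypothetical).

```r
# Made-up factor with three states.
state <- factor(c("Alabama", "Alabama", "Alaska", "Texas", "Texas"))

# Subsetting keeps the unused level "Texas".
sub <- state[state %in% c("Alabama", "Alaska")]
print(levels(sub))
print(table(sub))  # Texas is listed with a count of 0

# droplevels() removes levels that no longer occur.
sub_clean <- droplevels(sub)
print(levels(sub_clean))
```

Without this step, tables and boxplots would include an empty entry for every unused state or county.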

Appendix 2

One-way contingency tables for the State and County columns

table(droplevels(US_data2$State))                               
## 
## Alabama  Alaska 
##    1181     167
table(droplevels(US_data1$County)) 
## 
##     Aleutians East Borough Aleutians West Census Area 
##                          1                          2 
##     Anchorage Municipality                    Autauga 
##                         47                         12 
##                    Baldwin                    Barbour 
##                         32                          6

Two-way contingency table for the categorical variables (county by state)

xtabs(~County_1+State_1,data = US_data1)
##                             State_1
## County_1                     Alabama Alaska
##   Aleutians East Borough           0      1
##   Aleutians West Census Area       0      2
##   Anchorage Municipality           0     47
##   Autauga                         12      0
##   Baldwin                         32      0
##   Barbour                          6      0
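
Row and column totals are often useful alongside a two-way table. As a small sketch, `addmargins` appends them to an `xtabs` result; the data frame `toy` below is invented for the example and only borrows a few county names from the table above.

```r
# Made-up tract list (not the census file).
toy <- data.frame(
  State  = c("Alabama", "Alabama", "Alabama", "Alaska", "Alaska"),
  County = c("Autauga", "Autauga", "Baldwin",
             "Anchorage Municipality", "Anchorage Municipality")
)

# Two-way table of tract counts: counties in rows, states in columns.
tab <- xtabs(~ County + State, data = toy)

# Append a "Sum" row and a "Sum" column.
with_totals <- addmargins(tab)
print(with_totals)
```

The same call on the table built from `US_data1` would show the tract totals per state and per county.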

Boxplot of Income in 2 specific states in US - Alabama and Alaska

boxplot(Income~State_1,data = US_data2,
        main="Boxplot of Income in 2 specific states in US",
        xlab="States",ylab="Income")

Boxplot of Population in 6 counties

boxplot(TotalPop~County_1,data = US_data1,
        main="Boxplot of Population in 6 counties",
        xlab="Counties",ylab="Population")
axis(side=1,at=2,labels = "Aleutians West")

Histogram of the number of employed people (aged 16 and over) per tract in the US

hist(US_data$Employed,main="Histogram of No.of Employed people above 16 years in US",
     xlab="No.of employed people per tract",ylab = "Count",xlim = c(0,10000),
     col = "light blue")

Histogram of percent of people who commute alone in US

hist(US_data$Drive,main="Histogram of percent of people who commute alone in US",
     xlab="percent of people who commute alone per tract",ylab = "Count",col = "grey")

Appendix 3

Scatterplot between the percent of White people and percent of Professionals

library(car)
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
scatterplot(Professional~White,data = US_data2,spread=FALSE,
            smoother.args=list(lty=2),pch=1,
main="Plot between percent of White people and percent employed in Management,Business,Science and Arts")

Scatterplot between the number of men and the percent of people employed in the construction sector

scatterplot(Construction~Men,data = US_data2,spread=FALSE,
            smoother.args=list(lty=2),pch=1,
main="Plot between no.of men and percent employed in Construction sector")

Correlation matrix for a subset of the numeric variables (columns 4 to 20)

corr.test(US_data[,4:20])
## Call:corr.test(x = US_data[, 4:20])
## Correlation matrix 
##                 TotalPop   Men Women Hispanic White Black Native Asian
## TotalPop            1.00  0.98  0.98     0.11 -0.03 -0.11  -0.04  0.10
## Men                 0.98  1.00  0.93     0.12 -0.02 -0.13  -0.03  0.10
## Women               0.98  0.93  1.00     0.10 -0.03 -0.09  -0.04  0.10
## Hispanic            0.11  0.12  0.10     1.00 -0.66 -0.12  -0.04  0.03
## White              -0.03 -0.02 -0.03    -0.66  1.00 -0.58  -0.07 -0.25
## Black              -0.11 -0.13 -0.09    -0.12 -0.58  1.00  -0.05 -0.11
## Native             -0.04 -0.03 -0.04    -0.04 -0.07 -0.05   1.00 -0.04
## Asian               0.10  0.10  0.10     0.03 -0.25 -0.11  -0.04  1.00
## Pacific             0.02  0.03  0.02     0.02 -0.09 -0.04   0.01  0.16
## Citizen             0.94  0.92  0.92    -0.11  0.17 -0.13  -0.04  0.03
## Income              0.17  0.18  0.17    -0.23  0.31 -0.31  -0.07  0.28
## IncomeErr          -0.01  0.00 -0.01    -0.10  0.10 -0.13  -0.06  0.22
## IncomePerCap        0.03  0.02  0.04    -0.31  0.38 -0.28  -0.07  0.20
## IncomePerCapErr    -0.10 -0.10 -0.09    -0.16  0.17 -0.12  -0.05  0.13
## Poverty            -0.15 -0.15 -0.15     0.34 -0.52  0.40   0.09 -0.12
## ChildPoverty       -0.15 -0.15 -0.14     0.32 -0.50  0.41   0.07 -0.16
## Professional        0.08  0.07  0.08    -0.33  0.35 -0.25  -0.04  0.26
##                 Pacific Citizen Income IncomeErr IncomePerCap
## TotalPop           0.02    0.94   0.17     -0.01         0.03
## Men                0.03    0.92   0.18      0.00         0.02
## Women              0.02    0.92   0.17     -0.01         0.04
## Hispanic           0.02   -0.11  -0.23     -0.10        -0.31
## White             -0.09    0.17   0.31      0.10         0.38
## Black             -0.04   -0.13  -0.31     -0.13        -0.28
## Native             0.01   -0.04  -0.07     -0.06        -0.07
## Asian              0.16    0.03   0.28      0.22         0.20
## Pacific            1.00    0.00   0.01      0.01        -0.03
## Citizen            0.00    1.00   0.20      0.01         0.11
## Income             0.01    0.20   1.00      0.61         0.83
## IncomeErr          0.01    0.01   0.61      1.00         0.60
## IncomePerCap      -0.03    0.11   0.83      0.60         1.00
## IncomePerCapErr   -0.01   -0.05   0.50      0.52         0.77
## Poverty            0.01   -0.24  -0.70     -0.35        -0.61
## ChildPoverty       0.00   -0.24  -0.66     -0.35        -0.59
## Professional      -0.03    0.18   0.73      0.49         0.80
##                 IncomePerCapErr Poverty ChildPoverty Professional
## TotalPop                  -0.10   -0.15        -0.15         0.08
## Men                       -0.10   -0.15        -0.15         0.07
## Women                     -0.09   -0.15        -0.14         0.08
## Hispanic                  -0.16    0.34         0.32        -0.33
## White                      0.17   -0.52        -0.50         0.35
## Black                     -0.12    0.40         0.41        -0.25
## Native                    -0.05    0.09         0.07        -0.04
## Asian                      0.13   -0.12        -0.16         0.26
## Pacific                   -0.01    0.01         0.00        -0.03
## Citizen                   -0.05   -0.24        -0.24         0.18
## Income                     0.50   -0.70        -0.66         0.73
## IncomeErr                  0.52   -0.35        -0.35         0.49
## IncomePerCap               0.77   -0.61        -0.59         0.80
## IncomePerCapErr            1.00   -0.29        -0.29         0.53
## Poverty                   -0.29    1.00         0.90        -0.54
## ChildPoverty              -0.29    0.90         1.00        -0.57
## Professional               0.53   -0.54        -0.57         1.00
## Sample Size 
##                 TotalPop   Men Women Hispanic White Black Native Asian
## TotalPop           74001 74001 74001    73311 73311 73311  73311 73311
## Men                74001 74001 74001    73311 73311 73311  73311 73311
## Women              74001 74001 74001    73311 73311 73311  73311 73311
## Hispanic           73311 73311 73311    73311 73311 73311  73311 73311
## White              73311 73311 73311    73311 73311 73311  73311 73311
## Black              73311 73311 73311    73311 73311 73311  73311 73311
## Native             73311 73311 73311    73311 73311 73311  73311 73311
## Asian              73311 73311 73311    73311 73311 73311  73311 73311
## Pacific            73311 73311 73311    73311 73311 73311  73311 73311
## Citizen            74001 74001 74001    73311 73311 73311  73311 73311
## Income             72901 72901 72901    72901 72901 72901  72901 72901
## IncomeErr          72901 72901 72901    72901 72901 72901  72901 72901
## IncomePerCap       73261 73261 73261    73261 73261 73261  73261 73261
## IncomePerCapErr    73261 73261 73261    73261 73261 73261  73261 73261
## Poverty            73166 73166 73166    73166 73166 73166  73166 73166
## ChildPoverty       72883 72883 72883    72883 72883 72883  72883 72883
## Professional       73194 73194 73194    73194 73194 73194  73194 73194
##                 Pacific Citizen Income IncomeErr IncomePerCap
## TotalPop          73311   74001  72901     72901        73261
## Men               73311   74001  72901     72901        73261
## Women             73311   74001  72901     72901        73261
## Hispanic          73311   73311  72901     72901        73261
## White             73311   73311  72901     72901        73261
## Black             73311   73311  72901     72901        73261
## Native            73311   73311  72901     72901        73261
## Asian             73311   73311  72901     72901        73261
## Pacific           73311   73311  72901     72901        73261
## Citizen           73311   74001  72901     72901        73261
## Income            72901   72901  72901     72901        72901
## IncomeErr         72901   72901  72901     72901        72901
## IncomePerCap      73261   73261  72901     72901        73261
## IncomePerCapErr   73261   73261  72901     72901        73261
## Poverty           73166   73166  72901     72901        73120
## ChildPoverty      72883   72883  72748     72748        72874
## Professional      73194   73194  72899     72899        73156
##                 IncomePerCapErr Poverty ChildPoverty Professional
## TotalPop                  73261   73166        72883        73194
## Men                       73261   73166        72883        73194
## Women                     73261   73166        72883        73194
## Hispanic                  73261   73166        72883        73194
## White                     73261   73166        72883        73194
## Black                     73261   73166        72883        73194
## Native                    73261   73166        72883        73194
## Asian                     73261   73166        72883        73194
## Pacific                   73261   73166        72883        73194
## Citizen                   73261   73166        72883        73194
## Income                    72901   72901        72748        72899
## IncomeErr                 72901   72901        72748        72899
## IncomePerCap              73261   73120        72874        73156
## IncomePerCapErr           73261   73120        72874        73156
## Poverty                   73120   73166        72883        73144
## ChildPoverty              72874   72883        72883        72880
## Professional              73156   73144        72880        73194
## Probability values (Entries above the diagonal are adjusted for multiple tests.) 
##                 TotalPop  Men Women Hispanic White Black Native Asian
## TotalPop            0.00 0.00  0.00        0     0     0   0.00     0
## Men                 0.00 0.00  0.00        0     0     0   0.00     0
## Women               0.00 0.00  0.00        0     0     0   0.00     0
## Hispanic            0.00 0.00  0.00        0     0     0   0.00     0
## White               0.00 0.00  0.00        0     0     0   0.00     0
## Black               0.00 0.00  0.00        0     0     0   0.00     0
## Native              0.00 0.00  0.00        0     0     0   0.00     0
## Asian               0.00 0.00  0.00        0     0     0   0.00     0
## Pacific             0.00 0.00  0.00        0     0     0   0.02     0
## Citizen             0.00 0.00  0.00        0     0     0   0.00     0
## Income              0.00 0.00  0.00        0     0     0   0.00     0
## IncomeErr           0.06 0.23  0.01        0     0     0   0.00     0
## IncomePerCap        0.00 0.00  0.00        0     0     0   0.00     0
## IncomePerCapErr     0.00 0.00  0.00        0     0     0   0.00     0
## Poverty             0.00 0.00  0.00        0     0     0   0.00     0
## ChildPoverty        0.00 0.00  0.00        0     0     0   0.00     0
## Professional        0.00 0.00  0.00        0     0     0   0.00     0
##                 Pacific Citizen Income IncomeErr IncomePerCap
## TotalPop           0.00    0.00   0.00      0.35            0
## Men                0.00    0.00   0.00      0.69            0
## Women              0.00    0.00   0.00      0.10            0
## Hispanic           0.00    0.00   0.00      0.00            0
## White              0.00    0.00   0.00      0.00            0
## Black              0.00    0.00   0.00      0.00            0
## Native             0.15    0.00   0.00      0.00            0
## Asian              0.00    0.00   0.00      0.00            0
## Pacific            0.00    0.93   0.23      0.01            0
## Citizen            0.46    0.00   0.00      0.40            0
## Income             0.03    0.00   0.00      0.00            0
## IncomeErr          0.00    0.10   0.00      0.00            0
## IncomePerCap       0.00    0.00   0.00      0.00            0
## IncomePerCapErr    0.01    0.00   0.00      0.00            0
## Poverty            0.06    0.00   0.00      0.00            0
## ChildPoverty       0.54    0.00   0.00      0.00            0
## Professional       0.00    0.00   0.00      0.00            0
##                 IncomePerCapErr Poverty ChildPoverty Professional
## TotalPop                   0.00    0.00         0.00            0
## Men                        0.00    0.00         0.00            0
## Women                      0.00    0.00         0.00            0
## Hispanic                   0.00    0.00         0.00            0
## White                      0.00    0.00         0.00            0
## Black                      0.00    0.00         0.00            0
## Native                     0.00    0.00         0.00            0
## Asian                      0.00    0.00         0.00            0
## Pacific                    0.05    0.35         0.93            0
## Citizen                    0.00    0.00         0.00            0
## Income                     0.00    0.00         0.00            0
## IncomeErr                  0.00    0.00         0.00            0
## IncomePerCap               0.00    0.00         0.00            0
## IncomePerCapErr            0.00    0.00         0.00            0
## Poverty                    0.00    0.00         0.00            0
## ChildPoverty               0.00    0.00         0.00            0
## Professional               0.00    0.00         0.00            0
## 
##  To see confidence intervals of the correlations, print with the short=FALSE option
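
With a matrix this large, the strongest relationships (for example Income with IncomePerCap, or Poverty with ChildPoverty) are easier to spot when the pairs are filtered programmatically. The following is a minimal sketch under stated assumptions: it uses a small synthetic data frame (`toy`) in place of `US_data`, base R's `cor()` with pairwise deletion rather than `corr.test()`, and an arbitrary cutoff of 0.7. The helper name `strong_pairs` is hypothetical.

```r
# Synthetic stand-in for a few numeric columns; values are illustrative only.
set.seed(1)
n <- 200
income <- rnorm(n, 50000, 15000)
toy <- data.frame(
  Income       = income,
  IncomePerCap = income / 2.5 + rnorm(n, 0, 2000),
  Poverty      = 40 - income / 2500 + rnorm(n, 0, 3),
  Native       = runif(n, 0, 5)
)
toy$Poverty[1:5] <- NA  # mimic missing values, handled by pairwise deletion

# Hypothetical helper: list variable pairs whose |r| exceeds a cutoff.
strong_pairs <- function(df, cutoff = 0.7) {
  r <- cor(df, use = "pairwise.complete.obs")
  # Keep only the upper triangle so each pair is reported once.
  idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = round(r[idx], 2),
    stringsAsFactors = FALSE
  )
  out[order(-abs(out$r)), ]  # strongest pairs first
}

pairs_found <- strong_pairs(toy)
print(pairs_found)
```

Applied to the real data, the same idea would be called on the numeric columns used above, with the cutoff chosen to suit the analysis.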

Visualize the correlation matrix using corrgram

library(corrgram)
## Warning: package 'corrgram' was built under R version 3.3.3
corrgram(US_data2[,4:20],order = TRUE,lower.panel = panel.shade,
         upper.panel = panel.pie,text.panel = panel.txt,
         main="Corrgram of numeric variables in the US Demographic Dataset")

Create a scatterplot matrix of the employment-sector variables

scatterplotMatrix(formula=~Professional+Service+Office+Construction+Production,
                  cex=0.6,data = US_data2)

T-test between the percentage of Native people and the percentage of Asian people

t.test(US_data2$Native,US_data2$Asian)  
## 
##  Welch Two Sample t-test
## 
## data:  US_data2$Native and US_data2$Asian
## t = 2.9768, df = 1735, p-value = 0.002953
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.2857514 1.3896023
## sample estimates:
## mean of x mean of y 
##  2.435071  1.597394

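The p-value (0.003) is below 0.05, so there is a statistically significant difference between the mean percentages of Native and Asian people. The 95 percent confidence interval for the difference (0.29 to 1.39 percentage points) excludes zero, which agrees with this conclusion.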
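
The decision rule for reading a two-sided t-test can be sketched as follows. This is an illustrative example on synthetic data, not the tract data itself; the vectors `group_a` and `group_b` and the 0.05 threshold `alpha` are assumptions made for the sketch.

```r
# Synthetic percentages standing in for two tract-level columns.
set.seed(42)
group_a <- rnorm(500, mean = 2.4, sd = 6)
group_b <- rnorm(500, mean = 1.6, sd = 4)

res <- t.test(group_a, group_b)  # Welch two-sample t-test (R's default)
alpha <- 0.05

# Reject the null hypothesis of equal means when the p-value is below alpha;
# equivalently, when the 95 percent confidence interval excludes 0.
reject_by_p  <- res$p.value < alpha
reject_by_ci <- res$conf.int[1] > 0 || res$conf.int[2] < 0

cat("p-value:", signif(res$p.value, 3), "\n")
cat("reject H0:", reject_by_p, "\n")
```

For a two-sided test the two checks always agree, so either the p-value or the confidence interval can be used to state the conclusion.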

T-test between the number of men and the number of women

t.test(US_data2$Men,US_data2$Women)
## 
##  Welch Two Sample t-test
## 
## data:  US_data2$Men and US_data2$Women
## t = -2.0056, df = 2694, p-value = 0.045
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -166.502622   -1.878683
## sample estimates:
## mean of x mean of y 
##  2021.701  2105.892

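The p-value (0.045) is just below 0.05, so the difference between the mean numbers of men and women per tract is significant at the 5 percent level, though only marginally: the upper end of the 95 percent confidence interval (-1.88) lies close to zero.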
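
Because the counts of men and women are recorded for the same tract, the two columns are paired observations rather than independent samples. A paired t-test is one alternative worth considering. The sketch below uses synthetic counts (`men`, `women`, `tract_size` are illustrative names and values, not the study data) and only shows how the two tests differ; it is not the analysis run above.

```r
# Synthetic tract-level counts; each position is one tract (illustrative only).
set.seed(7)
tract_size <- runif(300, min = 1500, max = 7000)
men   <- tract_size * 0.49 + rnorm(300, 0, 80)
women <- tract_size * 0.51 + rnorm(300, 0, 80)

welch  <- t.test(men, women)                 # treats the columns as independent
paired <- t.test(men, women, paired = TRUE)  # tests the per-tract differences

# The paired test is the one-sample t-test on the differences.
diffs <- t.test(men - women)

cat("Welch p-value: ", signif(welch$p.value, 3), "\n")
cat("Paired p-value:", signif(paired$p.value, 3), "\n")
```

The paired version removes the large tract-to-tract variation in population size, so it is typically more sensitive to a consistent within-tract difference.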