There is a common saying around the world that the more education you have, the more money you will earn. Is this true for everyone, everywhere? Is it true for people living in New York, one of the 50 states of the U.S., home to about 19 million people, including the more than 8 million who live in New York City\(^2\), and a place with a very diverse population? For this project I ask the question: Does education play a role in a person’s earnings in New York? Does that role change if the person is female, or is not a U.S. citizen?
As a female New Yorker who was not a U.S. citizen by birth, I want to know the answer to those two questions. I believe many people in New York want to know the answer as well, especially those currently in school, college, or university. Almost everyone hopes for a well-established life with a good source of income and comfortable living, and many people pursue an education to achieve those goals. Education is expensive for most people in the world, so is a higher educational attainment worth the money? Does education alone play a role in our earnings, or do factors such as gender or citizenship status also have an effect? In this project I will study less than 1% of New Yorkers to see whether educational attainment has an effect on total personal earnings.
I obtained the data from DATA.GOV (“The home of the U.S. Government’s open data”)\(^1\); it can also be obtained from the United States Census Bureau\(^5\). I obtained the details about the data and the codes for the variables from the “2009 Data Dictionary”, and the accuracy of the data from “2009 PUMS Accuracy”\(^6\).
The data were collected by the American Community Survey (ACS) and contain personal information from people living in New York State in 2009. This is an observational study in which New Yorkers were surveyed about themselves and their lives. Here is a description of the data, taken from the PDF that comes with the data\(^4\):
“The Public Use Microdata Sample (PUMS) contains a sample of actual responses to the American Community Survey (ACS). The PUMS dataset include variables for nearly every question on the survey… Each record in the file represents a single person… In the person-level file, individuals are organized into households, making possible the study people within the contexts of their families and other household members. The PUMS contain data on approximately one percent of the United States population.”
The original data contains 188,767 cases with 279 variables. Each case in this data set is the set of responses from one person living in New York in 2009.\(^5\) Since that is too big to work with, I created a subset with all the cases but only 6 variables, and I put this final data set in GitHub. The variables that I chose for this project are “Citizenship Status”, “Educational Attainment”, “SEX”, and “Total Personal Earnings”. I also included two extra variables, “Age” and “Total Personal Income”, because I will be discussing them in my conclusion. I also put a .zip file with the original data and the data dictionary in GitHub, for anyone who wants to explore or study the original data.
| variable | description | type | data dictionary |
|---|---|---|---|
| Age | age of the respondent | numerical, discrete / data as numerical | 0 - under 1 year; 1-99 - ages 1 to 99 |
| Citizen_Stat | citizenship status of the respondent | categorical, variable | 1 - born in U.S.; 2 - born in Puerto Rico and surrounding area; 3 - born abroad of American parent(s); 4 - U.S. citizen by naturalization; 5 - Not U.S. citizen. |
| Edu_Attainment | educational attainment of the respondent | categorical, ordinal / data as numerical | N/A - Less than 3 years; 1 - no school; 2-14 - pre-K to 11th grade; 15 - 12th grade no diploma; 16 - regular HS diploma; 17 - GED or equivalent; 18-19 some college; 20 - associate’s degree; 21 - bachelor’s degree; 22-24 beyond a bachelor’s degree. |
| SEX | gender of the respondent | categorical, variable / data as numerical | 1 - male; 2 - female |
| Total_Per_Earn | total person’s earnings of the respondent | numerical, discrete | N/A - less than 15 years old; 0 - no earnings; -9999 - loss of $9999 or more; -1 to -9998 - loss $1 to $9998; 1-9999999 - earn of $1 to $9999999 (all whole numbers) |
| Total_Per_Income | total person’s income of the respondent | numerical, discrete | N/A - less than 15 years old; 0 - no earnings; -9999 - loss of $9999 or more; -1 to -9998 - loss $1 to $9998; 1-9999999 - income of $1 to $9999999 (all whole numbers) |
Here is a summary of the data distribution:
| Age | Citizen_Stat | Edu_Attainment | SEX | Total_Per_Earn | Total_Per_Income |
|---|---|---|---|---|---|
| Min. : 0.00 | Min. :1.000 | Min. : 1.00 | Min. :1.000 | Min. : -7400 | Min. : -13200 |
| 1st Qu.:20.00 | 1st Qu.:1.000 | 1st Qu.:13.00 | 1st Qu.:1.000 | 1st Qu.: 0 | 1st Qu.: 7000 |
| Median :41.00 | Median :1.000 | Median :17.00 | Median :2.000 | Median : 12000 | Median : 22400 |
| Mean :40.04 | Mean :1.632 | Mean :15.88 | Mean :1.521 | Mean : 31315 | Mean : 38837 |
| 3rd Qu.:58.00 | 3rd Qu.:1.000 | 3rd Qu.:20.00 | 3rd Qu.:2.000 | 3rd Qu.: 42100 | 3rd Qu.: 50000 |
| Max. :94.00 | Max. :5.000 | Max. :24.00 | Max. :2.000 | Max. :957000 | Max. :1225000 |
| NA | NA | NA’s :6108 | NA | NA’s :35877 | NA’s :33251 |
I created a subset of the data with the four variables that I will be looking at during this project: Total_Per_Earn, Edu_Attainment, SEX, and Citizen_Stat. I only included the cases where Total_Per_Earn \(\ge 500\). I excluded negative earnings and earnings below 500 because I do not know the details of how respondents and their family members could have a loss in earnings or little to no earnings. Many factors could have produced those earnings (for example unemployment or child support), so I will only look at cases with positive earnings of at least 500. Summary statistics for this subset are given further below.
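The filtering step described above can be sketched in a few lines of R. This is only an illustration: the `toy` data frame and its values are invented, and the original subsetting code is not shown in this report, so the indexing approach here is an assumption.

```r
# Toy stand-in for the six-variable data set; all values are invented.
toy <- data.frame(
  Age              = c(34, 12, 51, 67, 29),
  Citizen_Stat     = c(1, 1, 5, 4, 1),
  Edu_Attainment   = c(21, 8, 16, 22, 19),
  SEX              = c(2, 1, 1, 2, 2),
  Total_Per_Earn   = c(52000, NA, 300, -1200, 18000),
  Total_Per_Income = c(53000, NA, 9000, 14000, 18500)
)

# The four variables studied in this project
keep_vars <- c("Total_Per_Earn", "Edu_Attainment", "SEX", "Citizen_Stat")

# Keep rows with non-missing earnings of at least 500
keep_rows <- !is.na(toy$Total_Per_Earn) & toy$Total_Per_Earn >= 500
Sub_Per_Earn <- toy[keep_rows, keep_vars]

print(Sub_Per_Earn)
```

The explicit `!is.na()` check matters because a comparison with a missing value returns `NA`, which would otherwise add empty rows to the result.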
My main focus will be on the two variables Total_Per_Earn and Edu_Attainment. I will look at whether there is a relationship between “total personal earnings” and “educational attainment”. Then I will see whether that relationship changes depending on whether an individual living in New York is male or female. Lastly, I will see whether the relationship differs between U.S. citizens and non-U.S. citizens. I will therefore perform three hypothesis tests on the data. The response variable in this project is “Total Personal Earnings”, which is numerical. The explanatory variables are “Educational Attainment”, “Citizenship Status” and “SEX”; these are categorical, although they are stored as numerical codes.
The reason I wish to look at the four variables mentioned above is to understand the relationship between “educational attainment” and “total personal earnings”. If there is a relationship between the two variables in the sample, then there is a chance that it holds for all people living in New York. I also want to see whether, for the same “educational attainment”, “total personal earnings” differ between females and males, and between U.S. citizens and non-U.S. citizens. By the end of this project we should understand whether gender or citizenship status affects the relationship between “total personal earnings” and “educational attainment” for people living in New York. Other factors (such as type of work or hours worked) might contribute to any links between these variables. However, for this project we will set those factors aside and focus only on the four variables.
Let’s look at the summary statistics of Citizen_Stat, Edu_Attainment, SEX and Total_Per_Earn, where Total_Per_Earn is greater than or equal to 500:
| | vars | n | mean | sd | median | trimmed | mad | min | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Citizen_Stat | 1 | 97349 | 1.75 | 1.42 | 1 | 1.45 | 0.00 | 1 | 5 | 4 | 1.46 | 0.32 | 0.00 |
| Edu_Attainment | 2 | 97349 | 18.60 | 3.34 | 19 | 18.93 | 2.97 | 1 | 24 | 23 | -1.58 | 4.98 | 0.01 |
| SEX | 3 | 97349 | 1.49 | 0.50 | 1 | 1.49 | 0.00 | 1 | 2 | 1 | 0.02 | -2.00 | 0.00 |
| Total_Per_Earn | 4 | 97349 | 49189.42 | 69351.07 | 33000 | 37253.27 | 31134.60 | 500 | 957000 | 956500 | 5.35 | 39.03 | 222.27 |
Next let’s study each of the variables using visualization. I created six graphs below. Graph 1 and graph 2 are histograms of total personal earnings. Graph 1 shows total personal earnings to the nearest dollar; the distribution is extremely skewed to the right, with a long tail of high earners (the skew of 5.35 in the table above confirms this). Graph 2 shows the log of total personal earnings, which is only a little skewed. Graph 3 and graph 4 show the distribution of educational attainment, the first as a numerical variable and the second as a categorical variable with 5 levels. We see in graph 3 that most respondents in this survey have at least a high school diploma (code 16 or above). Graph 4 shows that, of the 5 levels, the largest number of respondents have an associate or bachelor degree and the smallest number have no diploma. Graph 5 shows the distribution of gender in the data set; there are about as many male respondents as female respondents. Graph 6 shows the distribution of citizenship status with two levels, U.S. citizen and non-U.S. citizen. We see that most of the respondents are U.S. citizens and only a few are non-U.S. citizens.
Now that we have looked at the distribution of each variable, we will visualize the relationships between the variables. Graph 7 shows the relationship between educational attainment and total personal earnings. The curve that best describes the relationship between these two variables appears to be exponential rather than a straight line. Graph 8 portrays the distribution of the log of total personal earnings by two levels: “high school diploma or below” and “associate degree or above”.
Graph 9a and graph 9b portray the distribution of the log of total personal earnings by five levels. Looking at the histograms and the box plots, we see that as educational attainment goes from “no diploma” to “higher than bachelor degree”, the center and the spread of log(Total_Per_Earn) move from below 10 to above 10. Table 3 below shows that the mean total personal earnings differ across the five categories of educational attainment. The mean increases as educational attainment goes from “no diploma” to “higher than bachelor degree”, in line with the visualization.
| group1 | vars | n | mean | sd | median | min | max | se |
|---|---|---|---|---|---|---|---|---|
| a. no diploma | 1 | 9362 | 21280.27 | 27367.55 | 15400 | 500 | 591000 | 282.85 |
| b. HS diploma or equivalent | 1 | 23512 | 30863.62 | 31400.45 | 25000 | 500 | 957000 | 204.78 |
| c. some college, no degree | 1 | 19485 | 33868.34 | 38915.51 | 25000 | 500 | 691000 | 278.79 |
| d. associate/bachelor degree | 1 | 29547 | 57898.03 | 70645.20 | 43000 | 500 | 957000 | 410.98 |
| e. higher than bachelor degree | 1 | 15443 | 96678.85 | 114998.47 | 65000 | 500 | 957000 | 925.39 |
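The five education levels in the table above can be derived from the numerical Edu_Attainment codes. The sketch below shows one way to do this with `cut()`. The cut points are an assumption read off the data dictionary (in particular, code 15, “12th grade no diploma”, is placed in “no diploma”), and the `codes` vector is invented for illustration.

```r
# Hypothetical education codes (1 to 24) for a few respondents
codes <- c(1, 12, 15, 16, 17, 18, 19, 20, 21, 22, 24)

# Assumed intervals: (0,15], (15,17], (17,19], (19,21], (21,24]
edu_5level <- cut(
  codes,
  breaks = c(0, 15, 17, 19, 21, 24),
  labels = c("a. no diploma",
             "b. HS diploma or equivalent",
             "c. some college, no degree",
             "d. associate/bachelor degree",
             "e. higher than bachelor degree")
)

# Count how many of the toy codes fall into each level
print(table(edu_5level))
```

By default `cut()` uses intervals that are closed on the right, so a code equal to a break value falls in the lower group.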
Graph 10 and graph 11 portray the relationship between educational attainment and the log of total personal earnings, by gender and by citizenship status respectively. These two graphs show that the centers of the data for the different genders and citizenship statuses differ at each of the 5 levels. Graph 12 shows the relationship between all four variables: log(Total_Per_Earn), Edu_Attainment, SEX, and Citizen_Stat. We see in this graph that as educational attainment goes from “no diploma” to “higher than bachelor degree”, the center of log(Total_Per_Earn) rises regardless of gender or citizenship status.
Table 4 shows the relationship between earnings and educational attainment, by gender and citizenship status. The summary statistics below change as educational attainment goes from “no diploma” to “higher than bachelor degree”. For example, the first 5 rows of the table hold the information about respondents who are non-U.S. citizen females; there the mean total earnings increase from about 16 thousand dollars to about 65 thousand dollars as educational attainment rises. I think this suggests a relationship between educational attainment and total personal earnings, even for a person who is female and/or a non-U.S. citizen.
| group1 | group2 | group3 | vars | n | mean | sd | median | min | max | se |
|---|---|---|---|---|---|---|---|---|---|---|
| a. no diploma | non-U.S. citizen | female | 1 | 874 | 16462.73 | 12911.96 | 14000 | 500 | 114000 | 436.75 |
| b. HS diploma or equivalent | non-U.S. citizen | female | 1 | 929 | 19751.25 | 16546.79 | 15800 | 500 | 250000 | 542.88 |
| c. some college, no degree | non-U.S. citizen | female | 1 | 545 | 23384.77 | 40409.17 | 15000 | 590 | 591000 | 1730.94 |
| d. associate/bachelor degree | non-U.S. citizen | female | 1 | 880 | 43455.14 | 60890.46 | 30000 | 500 | 591000 | 2052.62 |
| e. higher than bachelor degree | non-U.S. citizen | female | 1 | 495 | 65207.64 | 77236.13 | 50000 | 600 | 591000 | 3471.51 |
| a. no diploma | U.S. citizen | female | 1 | 2961 | 15624.69 | 19720.74 | 11000 | 500 | 591000 | 362.41 |
| b. HS diploma or equivalent | U.S. citizen | female | 1 | 9834 | 24876.77 | 24242.66 | 20200 | 500 | 591000 | 244.46 |
| c. some college, no degree | U.S. citizen | female | 1 | 9195 | 26874.19 | 27357.98 | 21000 | 500 | 591000 | 285.30 |
| d. associate/bachelor degree | U.S. citizen | female | 1 | 14651 | 45391.87 | 48045.36 | 37000 | 500 | 957000 | 396.93 |
| e. higher than bachelor degree | U.S. citizen | female | 1 | 7741 | 70227.04 | 70520.44 | 57000 | 500 | 957000 | 801.52 |
| a. no diploma | non-U.S. citizen | male | 1 | 1554 | 23548.29 | 22917.65 | 19000 | 500 | 366000 | 581.36 |
| b. HS diploma or equivalent | non-U.S. citizen | male | 1 | 1354 | 30705.91 | 36383.76 | 25000 | 600 | 591000 | 988.78 |
| c. some college, no degree | non-U.S. citizen | male | 1 | 633 | 31008.71 | 31304.95 | 24600 | 500 | 366000 | 1244.26 |
| d. associate/bachelor degree | non-U.S. citizen | male | 1 | 978 | 60728.99 | 85770.97 | 38000 | 500 | 591000 | 2742.65 |
| e. higher than bachelor degree | non-U.S. citizen | male | 1 | 651 | 116441.38 | 143115.49 | 72000 | 500 | 957000 | 5609.14 |
| a. no diploma | U.S. citizen | male | 1 | 3973 | 25667.94 | 34396.61 | 18400 | 500 | 591000 | 545.70 |
| b. HS diploma or equivalent | U.S. citizen | male | 1 | 11395 | 36955.04 | 35633.46 | 30900 | 500 | 957000 | 333.81 |
| c. some college, no degree | U.S. citizen | male | 1 | 9112 | 41751.89 | 46895.41 | 32500 | 500 | 691000 | 491.27 |
| d. associate/bachelor degree | U.S. citizen | male | 1 | 13038 | 72713.87 | 86633.65 | 52000 | 500 | 957000 | 758.72 |
| e. higher than bachelor degree | U.S. citizen | male | 1 | 6556 | 128325.62 | 144429.16 | 82000 | 500 | 957000 | 1783.76 |
In this section we will look at the interaction between the variables Total_Per_Earn, Edu_Attainment, SEX, and Citizen_Stat. We will see whether there is a relationship between educational attainment and total personal earnings for people living in New York State. Then we will see whether gender plays a role in that relationship. Lastly, we will look at citizenship status as a factor that might or might not affect the link between educational attainment and total personal earnings. Statistical inference is concerned with understanding the quality of parameter estimates, such as an estimate of a population mean. Here we will use several methods of statistical inference: hypothesis tests and p-values, ANOVA, confidence intervals, and a regression model.
First we will look at Total_Per_Earn and Edu_Attainment by conducting the first hypothesis test, where Edu_Attainment is categorical with two levels. Before doing any analysis, let’s check whether the conditions necessary for inference are satisfied. Here is a checklist of the conditions:
From table 4 we see that n for each of the variables and their levels satisfies \(n\ge{30}\). The distribution of each of the two groups is shown in graph 8.
Since the conditions necessary for inference are satisfied, I will perform the hypothesis test. I start with two hypotheses, the null hypothesis (\(H_0\)) and the alternative hypothesis (\(H_A\)). I will not reject the null hypothesis unless the evidence in favor of the alternative hypothesis is very strong, because otherwise there is a high probability of committing a Type 1 Error. A Type 1 Error is rejecting the null hypothesis when it is actually true.\(^3\) Here is the first hypothesis test:
\(H_0\): There is no difference in average total personal earnings between respondents whose educational attainment is a HS diploma or lower and respondents whose educational attainment is an associate degree or higher.
\[\mu_{diff}=0\]
\(H_A\): There is a difference in average total personal earnings between respondents whose educational attainment is a HS diploma or lower and respondents whose educational attainment is an associate degree or higher.
\[\mu_{diff}\ne0\]
Here are the results of the hypothesis test and the confidence interval:
inference(y=Cat_Ed_2level$Total_Per_Earn, x= Cat_Ed_2level$Edu_Attainment, est = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_associate degree or above = 46183, mean_associate degree or above = 69931.17, sd_associate degree or above = 89542.75
## n_high school diploma or below = 51166, mean_high school diploma or below = 30467.68, sd_high school diploma or below = 34277.59
## Observed difference between means (associate degree or above-high school diploma or below) = 39463.49
## H0: mu_associate degree or above - mu_high school diploma or below = 0
## HA: mu_associate degree or above - mu_high school diploma or below != 0
## Standard error = 443.368
## Test statistic: Z = 89.008
## p-value = 0
inference(y=Cat_Ed_2level$Total_Per_Earn, x= Cat_Ed_2level$Edu_Attainment, est = "mean", type = "ci", null = 0, alternative = "twosided", method = "theoretical")
## Response variable: numerical, Explanatory variable: categorical
## Difference between two means
## Summary statistics:
## n_associate degree or above = 46183, mean_associate degree or above = 69931.17, sd_associate degree or above = 89542.75
## n_high school diploma or below = 51166, mean_high school diploma or below = 30467.68, sd_high school diploma or below = 34277.59
## Observed difference between means (associate degree or above-high school diploma or below) = 39463.49
## Standard error = 443.368
## 95 % Confidence interval = ( 38594.5079 , 40332.4783 )
Looking at the results above, we see that in the hypothesis test the \(p\)-value is reported as \(0\), which is less than \(0.05\), meaning that we can reject the null hypothesis. From the confidence interval we are 95% confident that the difference between the average total personal earnings of respondents with an educational attainment of associate degree or higher and respondents with an educational attainment of HS diploma or lower is between \(\$38594.5\) and \(\$40332.5\). A confidence interval is a plausible range of values for the population parameter; in my case the population is people living in New York State.
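As a check on the output above, the standard error, test statistic, and confidence interval can be recomputed by hand from the reported summary statistics. The sketch below uses only the group sizes, means, and standard deviations printed in the output. It assumes the usual two-sample formula \(SE=\sqrt{s_1^2/n_1+s_2^2/n_2}\) with a normal critical value, which may differ in small details from what the `inference` function does internally.

```r
# Summary statistics copied from the output above
n1 <- 46183; mean1 <- 69931.17; sd1 <- 89542.75   # associate degree or above
n2 <- 51166; mean2 <- 30467.68; sd2 <- 34277.59   # high school diploma or below

diff_means <- mean1 - mean2

# Standard error of the difference between two independent means
se <- sqrt(sd1^2 / n1 + sd2^2 / n2)

# Test statistic under H0: mu_diff = 0
z <- diff_means / se

# Two-sided p-value; far too small to display as anything but 0
p_value <- 2 * pnorm(-abs(z))

# 95% confidence interval for the difference
ci <- diff_means + c(-1, 1) * qnorm(0.975) * se

print(c(se = se, z = z, p_value = p_value))
print(ci)
```

The results agree with the reported standard error of 443.368 and Z of 89.008 up to rounding of the inputs.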
Looking at graph 2 and graph 7, I noticed that it is better to transform total personal earnings to log\(_{10}\) than to keep them in dollars, so from here on I will use the log of total personal earnings. Next I will conduct a hypothesis test using ANOVA, where Edu_Attainment is categorical with five levels. Analysis of Variance (ANOVA) uses the F test statistic, which is the ratio of the between-group variability to the within-group variability. ANOVA uses a single hypothesis test to check whether the means across many groups are equal.\(^3\) The conditions necessary for the ANOVA test are satisfied, as we see below:
From table 4 we see that n for each of the variables and their levels satisfies \(n\ge{30}\). The distributions of the five levels are shown in graph 9a. From graph 9b, we see that the variability of the five levels is roughly constant.
Since the conditions necessary for ANOVA are satisfied, I will perform the hypothesis test:
\(H_0\): The average total personal earnings of respondents are the same across the 5 educational attainment levels.
\[\mu_{a}=\mu_{b}=\mu_{c}=\mu_{d}=\mu_{e}\]
\(H_A\): At least one of the 5 educational attainment levels has a different average total personal earnings.
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Edu_Attainment | 4 | 4269.535 | 1067.3836281 | 4501.907 | 0 |
| Residuals | 97344 | 23079.860 | 0.2370959 | NA | NA |
Looking at the ANOVA table above, we see that the F value is large (\(>4000\)). Because the F value is so large, the p-value is reported as 0, which means we can reject the null hypothesis. The p-value is “the probability of observing data at least as favorable to the alternative hypothesis as our current data set, if the null hypothesis is true.”\(^3\) This test shows that the mean total personal earnings of at least one educational attainment level is different.
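To make the link between the table and the F statistic concrete, the F value can be recomputed from the sums of squares and degrees of freedom in the ANOVA table above. This is a small arithmetic sketch that uses only the numbers reported in the table.

```r
# Values copied from the one-way ANOVA table above
ss_group <- 4269.535;  df_group <- 4       # Edu_Attainment (between groups)
ss_resid <- 23079.860; df_resid <- 97344   # Residuals (within groups)

# Mean squares are sums of squares divided by their degrees of freedom
ms_group <- ss_group / df_group
ms_resid <- ss_resid / df_resid

# F is the ratio of between-group to within-group variability
f_value <- ms_group / ms_resid

# Upper-tail probability of the F distribution
p_value <- pf(f_value, df_group, df_resid, lower.tail = FALSE)

print(c(ms_group = ms_group, ms_resid = ms_resid, f_value = f_value))
print(p_value)
```

The computed F value matches the reported 4501.907 up to rounding of the sums of squares.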
Next we will fit a linear regression, the best-fit line for the relationship between educational attainment (in numerical form from 1 to 24) and log\(_{10}\) of total personal earnings. For the linear regression we use the least squares line because it is easier to compute, it is commonly used, and in many applications a residual twice as large as another is more than twice as bad. The strength of the fit of a linear model is most commonly evaluated using \(R^2\), which tells us what percent of the variability in the response variable is explained by the model. Let’s check the conditions for the least squares line:
Linearity: Looking at the residual plot below, we see that the points are mostly scattered around \(y=0\), but with a pattern. This means that the relationship between Edu_Attainment and Total_Per_Earn is not linear.
Nearly normal residuals: Looking at the histogram and the normal probability plot below, we see that the residuals are nearly normal.
Constant variability: Looking at the plots below, I can say that the variability of points around the least squares line is not constant.
Since the conditions for linear regression with the least squares line fail, I will not analyze the fitted line or its correlation. This shows that the relationship between educational attainment (in numerical form from 1 to 24) and log\(_{10}\) of total personal earnings is not linear. I also tried the original data without any transformation, but that does not work either.
cor(Sub_Per_Earn$Edu_Attainment, log10(Sub_Per_Earn$Total_Per_Earn))
## [1] 0.3315994
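The three condition checks described above (linearity, nearly normal residuals, constant variability) are usually carried out with a residual plot, a histogram, and a normal probability plot. The sketch below shows the mechanics on simulated data. The `toy` data frame, the curved relationship, and the noise level are all invented for illustration and are not the project data.

```r
set.seed(42)
n <- 300

# Simulated education codes and a deliberately curved log-earnings relationship
toy <- data.frame(Edu_Attainment = sample(1:24, n, replace = TRUE))
toy$log_earn <- 4 + 0.002 * toy$Edu_Attainment^2 + rnorm(n, sd = 0.3)

# Least squares line
fit <- lm(log_earn ~ Edu_Attainment, data = toy)
res <- resid(fit)

# Linearity and constant variability: residuals against fitted values
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Nearly normal residuals: histogram and normal probability plot
hist(res, main = "Histogram of residuals")
qqnorm(res)
qqline(res)

# Share of variability in the response explained by the line
r_squared <- summary(fit)$r.squared
print(r_squared)
```

A curved band in the first plot, or a fan shape that widens along the x axis, is the kind of pattern that signals a failed condition.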
Next I will perform an ANOVA test to see whether there is a relationship between educational attainment and total personal earnings when gender is added as a second explanatory variable. Let’s check the conditions necessary for the ANOVA test below:
From table 4 we see that n for each of the variables and their levels satisfies \(n\ge{30}\). From graph 10, we see that the variability for both genders is roughly constant across the five levels.
Since the conditions necessary for ANOVA are satisfied, I will perform the hypothesis test:
\(H_0\): The average total personal earnings of respondents are the same across the 5 educational attainment levels and across genders.
\(H_A\): At least one of the average total personal earnings is different.
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Edu_Attainment | 4 | 4269.5345 | 1067.3836281 | 4661.025 | 0 |
| SEX | 1 | 788.1268 | 788.1268342 | 3441.573 | 0 |
| Residuals | 97343 | 22291.7333 | 0.2290019 | NA | NA |
Looking at the ANOVA table above, we see that the F value is large for both Edu_Attainment (\(>4000\)) and SEX (\(>3000\)). Because the F values are so large, both p-values are reported as 0, which means I can reject my null hypothesis.
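A table with this layout can be produced by an additive two-factor ANOVA. The sketch below fits one with base R’s `aov()` on simulated data. It assumes the analysis used a model of the form `log10(Total_Per_Earn) ~ Edu_Attainment + SEX`, which the report does not show, and every value in `toy` is invented.

```r
set.seed(1)
n <- 200

# Simulated respondents: five education levels and two genders
toy <- data.frame(
  Edu_Attainment = factor(sample(c("a", "b", "c", "d", "e"), n, replace = TRUE)),
  SEX            = factor(sample(c("male", "female"), n, replace = TRUE))
)

# Invented earnings that rise with education level, plus noise on the log scale
toy$Total_Per_Earn <- 10^(4 + 0.1 * as.integer(toy$Edu_Attainment) +
                            rnorm(n, sd = 0.4))

# Additive model: education first, then gender
fit <- aov(log10(Total_Per_Earn) ~ Edu_Attainment + SEX, data = toy)
tab <- summary(fit)[[1]]
print(tab)
```

`aov()` reports sequential sums of squares, so the first term’s Sum Sq does not depend on which terms follow it. That is consistent with the Edu_Attainment row having the same Sum Sq in each ANOVA table in this report.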
Next I will perform another ANOVA test to see whether there is a relationship between educational attainment and total personal earnings when citizenship status is added as a second explanatory variable. Let’s check the conditions necessary for the ANOVA test below:
From table 4 we see that n for each of the variables and their levels satisfies \(n\ge{30}\). From graph 11, we see that the variability of the variables is roughly constant.
Since the conditions necessary for ANOVA are satisfied, I will perform the hypothesis test:
\(H_0\): The average total personal earnings of respondents are the same across the 5 educational attainment levels and across citizenship statuses.
\(H_A\): At least one of the average total personal earnings is different.
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Edu_Attainment | 4 | 4269.53451 | 1067.3836281 | 4502.141213 | 0.0000000 |
| Citizen_Stat | 1 | 1.43595 | 1.4359504 | 6.056727 | 0.0138551 |
| Residuals | 97343 | 23078.42415 | 0.2370836 | NA | NA |
Looking at the ANOVA table above, we see that the F value is large for Edu_Attainment (\(>4000\)) and much smaller, about 6, for Citizen_Stat. The p-value for Edu_Attainment is reported as 0, and the p-value for Citizen_Stat is \(0.014\), which is less than \(0.05\), so I can reject my null hypothesis.
After completing various statistical analyses of the data, I conclude that in New York State an individual’s total earnings differ according to the highest level of education the individual has completed. This project also showed that whether the individual is female or male is associated with a difference in earnings at the same educational attainment. Lastly, this project showed that citizenship status also plays a role. However, I would like to look at the survey data for other years and states to see whether my conclusion holds for all individuals, no matter the time period or the state.\(^5\)
During this project I learned a lot about my data. I found out that many respondents have zero or negative earnings. I learned that negative earnings mean a loss of money, and that some people earn as little as under 10 dollars. I was also surprised to learn that some respondents had little to no education, and that many respondents went to high school but did not get their high school diploma. I learned that in New York State about equal numbers of each gender have earnings greater than 499 dollars. One of the most surprising things about the data was that respondents’ ages range from \(0\) to \(94\). I would really like to know more about this data and how it was obtained (was it a simple random sample, a cluster sample, or something else).
There is room for a lot of future research with this data; I only analyzed a little bit of it. In the future we can look at whether education plays a role in income. We can look at whether there is a relationship between age and earnings or income. Do income and earnings have a linear relationship? There is so much more we can do to study this data set. The possibilities are endless, especially with the original 2009 survey data.\(^4\)
1. 2009 American Community Survey 1-Year PUMS Housing File. (2015, May 20). Retrieved March 1, 2016, from https://catalog.data.gov/dataset/2009-american-community-survey-1-year-pums-housing-file
2. 2010 Census Interactive Population Map. (n.d.). Retrieved May 1, 2016, from https://www.census.gov/2010census/popmap/
3. Diez, D. M., Barr, C. D., & Cetinkaya-Rundel, M. (2012). OpenIntro statistics. Lexington, KY: CreateSpace. Can be downloaded from https://www.openintro.org/stat/textbook.php?stat_book=os
4. Hossain, N. (2016, March). GitHub. Retrieved from https://github.com/nabilahossain/Class-DATA606/tree/master/Project
5. PUMS Data 2000 - current. (2016, January 15). Retrieved May 21, 2016, from http://www.census.gov/programs-surveys/acs/data/pums.html (the link used is “2009 ACS 1-year PUMS”).
6. PUMS Technical Documentation. (2015, October 14). Retrieved March 1, 2016, from https://www.census.gov/programs-surveys/acs/technical-documentation/pums/documentation.2009.html (the two PDFs used are “2009 Data Dictionary” and “2009 PUMS Accuracy”).
7. Ross, S. (2015). What is the difference between earnings and income? | Investopedia. Retrieved May 1, 2016, from http://www.investopedia.com/ask/answers/070615/what-difference-between-earnings-and-income.asp