The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work.
Your task is to:
- Choose any three of the “wide” datasets identified in the Week 5 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your Week 5 assignment!)
For each of the three chosen datasets:
Create a .CSV file (or optionally, a MySQL database!) that includes all of the information included in the dataset. You’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so that you can practice tidying and transformations as described below.
Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. [Most of your grade will be based on this step!]
Perform the analysis requested in the discussion item.
Your code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions.
- Please include in your homework submission, for each of the three chosen datasets:
The URL to the .Rmd file in your GitHub repository, and The URL for your rpubs.com web page.
PROCEDURE
Library definitions
library(knitr)
library(gdata)
library(stringr)
library(tidyr)
library(dplyr)
library(zipcode)
library(ggplot2)
OASDI NY State Only.
Dataset url location:
url: https://www.ssa.gov/policy/docs/statcomps/
I will be exploring the OASDI Beneficiaries by State and ZIP Code, 2015.
This annual publication focuses on the Social Security beneficiary population at the ZIP Code level. It presents basic program data on the number and type of beneficiaries and the amount of benefits paid in each state, Social Security Administration field office, and ZIP Code. It also shows the number of beneficiaries aged 65 or older.
This annual publication focuses on the Social Security beneficiary population-people receiving Old-Age, Survivors, and Disability Insurance (OASDI) benefits-at the ZIP Code level. It presents basic program data on the number and type of beneficiaries and the amount of benefits paid in each state, Social Security Administration field office, and ZIP Code. It also shows the number of men and women aged 65 or older receiving benefits. The data include only persons whose benefits are currently payable. Those whose benefits were withheld are excluded.
Cherice Jefferies in the Office of Statistical Analysis and Support programmed and compiled the data for this report. Staff of the Office of Information Resources edited the report and prepared it for web publication.
This is a complete Dataset from the federal government website managed by the Social Security Administration.
Last Updated:
This is a complete set of all data for 2015.
Date: October 2016.
Data Provided by:
Office of Retirement and Disability Policy. Office of Research, Evaluation, and Statistics.
Dataset Owner:
Social Security Administration (SSA) and the Government of the United States of America.
Dictionary
This dataset does not seem to have a dictionary. The download link is for a .xlsx file containing the desired data. However there’s a .pdf file describing the data.
Filename: oasdi_zip15.xlsx
For simplicity reasons, I will read the raw data directly from the source.
URL and Raw data name and location definitions:
url <- "https://www.ssa.gov/policy/docs/statcomps/oasdi_zip/2015/"
xlsxfile <- "oasdi_zip15.xlsx"
rm(xlsxfile)
I tried loading the data from the original location as raw as possible from the .XLSX file but found several problems trying to read it. I ended up unmerging the cells by opening the excel file for the State of New York sheet and saving as a .csv file.
For reproducibility purposes I have uploaded the untoched .csv file onto my GitHub repository.
url <- "https://raw.githubusercontent.com/dvillalobos/MSDA/master/607/Projects/Project2/"
csvfile <- "oasdi_zip15.csv"
Function to download .csv file, and extract information from it
downloadCSV <- function(myurl, mycsvfile){
myurl <- paste(myurl,mycsvfile, sep="")
my.data <- read.csv(myurl, header=FALSE, stringsAsFactors =FALSE )
head(my.data)
return(my.data)
}
Imported file structure display
my.data <- downloadCSV(myurl= url, mycsvfile= csvfile)
kable(head(my.data))
V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
New York | ||||||||||||
Number of beneficiaries with benefits in current-payment status and total monthly benefits, by field office and ZIP Code, December 2015 | ||||||||||||
Field office and ZIP Code | Number | Total monthly benefits | ||||||||||
(thousands of dollars) Number of OASDI beneficiaries aged 65 or older | ||||||||||||
Total | Retired workers | Disabled workers | Widow(er)s and parents | Spouses | Children | All beneficiaries | Retired workers | Widow(er)s and parents | ||||
All areas a | 59,963,425 | 40,089,061 | 8,909,430 | 4,190,676 | 2,477,567 | 4,296,691 | 73,642,029 | 53,790,278 | 5,194,659 | 43,243,199 | ||
Total, New York | 3,513,125 | 2,388,547 | 510,196 | 222,704 | 145,297 | 246,381 | 4,498,413 | 3,341,923 | 289,049 | 2,576,899 |
In summary, this data needs to be cleaned up.
Data transformation
Now that I have the data frame I will transform it in order to create some possible outcomes from the given information; for this, I will start by excluding small portion of it.
Excluding Information:
Excluding top and bottom unwanted Rows:
This procedure will exclude the unwanted information contained in the first six rows, then I will exclude the information contained in at the bottom of the file 1995 to 2002 becoming from 1990 to the end in the new data frame.
my.new.data <- my.data[-c(1:6), ]
my.new.data <-my.new.data[-c(1989:3003), ]
V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | Albany | 54,020 | 37,555 | 8,015 | 3,175 | 1,575 | 3,700 | 71,518 | 54,461 | 4,301 | 39,255 | ||
8 | 12007 | 50 | 40 | 10 | 0 | 0 | 0 | 74 | 58 | 0 | 40 | ||
9 | 12009 | 1,520 | 1,145 | 175 | 75 | 65 | 60 | 2,095 | 1,669 | 105 | 1,175 | ||
10 | 12023 | 490 | 340 | 60 | 35 | 20 | 35 | 630 | 470 | 46 | 350 | ||
11 | 12024 | 40 | 25 | 10 | 5 | 0 | 0 | 58 | 41 | 7 | 30 | ||
12 | 12033 | 1,790 | 1,390 | 185 | 105 | 45 | 65 | 2,458 | 1,996 | 142 | 1,415 |
Exclude unwanted V3 column:
str(my.new.data)
## 'data.frame': 1988 obs. of 13 variables:
## $ V1 : chr "Albany" "" "" "" ...
## $ V2 : chr "" "12007" "12009" "12023" ...
## $ V3 : chr "" "" "" "" ...
## $ V4 : chr "54,020" "50" "1,520" "490" ...
## $ V5 : chr "37,555" "40" "1,145" "340" ...
## $ V6 : chr "8,015" "10" "175" "60" ...
## $ V7 : chr "3,175" "0" "75" "35" ...
## $ V8 : chr "1,575" "0" "65" "20" ...
## $ V9 : chr "3,700" "0" "60" "35" ...
## $ V10: chr "71,518" "74" "2,095" "630" ...
## $ V11: chr "54,461" "58" "1,669" "470" ...
## $ V12: chr "4,301" "0" "105" "46" ...
## $ V13: chr "39,255" "40" "1,175" "350" ...
my.new.data <- my.new.data %>% subset(select=-c(V3))
str(my.new.data)
## 'data.frame': 1988 obs. of 12 variables:
## $ V1 : chr "Albany" "" "" "" ...
## $ V2 : chr "" "12007" "12009" "12023" ...
## $ V4 : chr "54,020" "50" "1,520" "490" ...
## $ V5 : chr "37,555" "40" "1,145" "340" ...
## $ V6 : chr "8,015" "10" "175" "60" ...
## $ V7 : chr "3,175" "0" "75" "35" ...
## $ V8 : chr "1,575" "0" "65" "20" ...
## $ V9 : chr "3,700" "0" "60" "35" ...
## $ V10: chr "71,518" "74" "2,095" "630" ...
## $ V11: chr "54,461" "58" "1,669" "470" ...
## $ V12: chr "4,301" "0" "105" "46" ...
## $ V13: chr "39,255" "40" "1,175" "350" ...
V1 | V2 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | Albany | 54,020 | 37,555 | 8,015 | 3,175 | 1,575 | 3,700 | 71,518 | 54,461 | 4,301 | 39,255 | |
8 | 12007 | 50 | 40 | 10 | 0 | 0 | 0 | 74 | 58 | 0 | 40 | |
9 | 12009 | 1,520 | 1,145 | 175 | 75 | 65 | 60 | 2,095 | 1,669 | 105 | 1,175 | |
10 | 12023 | 490 | 340 | 60 | 35 | 20 | 35 | 630 | 470 | 46 | 350 | |
11 | 12024 | 40 | 25 | 10 | 5 | 0 | 0 | 58 | 41 | 7 | 30 | |
12 | 12033 | 1,790 | 1,390 | 185 | 105 | 45 | 65 | 2,458 | 1,996 | 142 | 1,415 |
Renaming Columns
names(my.new.data) <- c("County","Zipcode","n Total", "n Retired", "n Disabled", "n Widow & Parents", "n Spouses", "n Children", "$ All Beneficiaries", "$ Retired", "$ Widow & Parents", "n OASDI 65 & older")
County | Zipcode | n Total | n Retired | n Disabled | n Widow & Parents | n Spouses | n Children | $ All Beneficiaries | $ Retired | $ Widow & Parents | n OASDI 65 & older | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
7 | Albany | 54,020 | 37,555 | 8,015 | 3,175 | 1,575 | 3,700 | 71,518 | 54,461 | 4,301 | 39,255 | |
8 | 12007 | 50 | 40 | 10 | 0 | 0 | 0 | 74 | 58 | 0 | 40 | |
9 | 12009 | 1,520 | 1,145 | 175 | 75 | 65 | 60 | 2,095 | 1,669 | 105 | 1,175 | |
10 | 12023 | 490 | 340 | 60 | 35 | 20 | 35 | 630 | 470 | 46 | 350 | |
11 | 12024 | 40 | 25 | 10 | 5 | 0 | 0 | 58 | 41 | 7 | 30 | |
12 | 12033 | 1,790 | 1,390 | 185 | 105 | 45 | 65 | 2,458 | 1,996 | 142 | 1,415 |
Need to split data into 2 data frames
- Zip Code Table which is going to include:
Data for Numbers
Data for Monthly Benefits
- County Table which is going to include:
Data for Numbers
Data for Monthly Benefits
Separate results:
First I will separate County Summary data from zip code data.
# Creating a County Data Frame
my.new.data$County <- str_replace_all(my.new.data$County," ","")
my.county.data <- my.new.data %>% subset(County != "")
rownames(my.county.data) <- NULL
County Summary Table.
County | Zipcode | n Total | n Retired | n Disabled | n Widow & Parents | n Spouses | n Children | $ All Beneficiaries | $ Retired | $ Widow & Parents | n OASDI 65 & older |
---|---|---|---|---|---|---|---|---|---|---|---|
Albany | 54,020 | 37,555 | 8,015 | 3,175 | 1,575 | 3,700 | 71,518 | 54,461 | 4,301 | 39,255 | |
Babylon | 36,020 | 23,460 | 5,905 | 2,530 | 1,380 | 2,745 | 48,936 | 34,249 | 3,571 | 25,630 | |
Batavia | 43,525 | 29,635 | 6,980 | 2,750 | 1,395 | 2,765 | 54,743 | 40,389 | 3,547 | 30,755 | |
Binghamton | 68,620 | 45,655 | 11,070 | 4,305 | 2,325 | 5,265 | 83,909 | 61,674 | 5,493 | 48,415 | |
Bronx,East | 47,275 | 29,245 | 8,655 | 3,260 | 2,075 | 4,040 | 54,612 | 37,140 | 3,693 | 32,220 | |
Bronx,HuntsPoint | 7,525 | 3,895 | 1,810 | 515 | 345 | 960 | 6,528 | 3,753 | 456 | 4,300 | |
Bronx,LaconiaAvenue | 44,445 | 29,245 | 7,825 | 2,295 | 1,420 | 3,660 | 52,091 | 37,586 | 2,547 | 31,190 | |
Bronx,North | 30,275 | 17,235 | 7,005 | 1,810 | 1,315 | 2,910 | 30,207 | 19,638 | 1,830 | 18,685 | |
Bronx,South | 36,405 | 19,000 | 8,840 | 2,495 | 1,765 | 4,305 | 31,846 | 18,607 | 2,217 | 21,295 | |
Bronx,WestFarms | 15,550 | 7,930 | 4,055 | 1,010 | 665 | 1,890 | 13,631 | 7,805 | 936 | 8,790 | |
Brooklyn,BedfordHeights | 36,995 | 24,440 | 6,500 | 2,040 | 1,055 | 2,960 | 40,677 | 29,266 | 2,118 | 25,845 | |
Brooklyn,BoroHall | 69,915 | 46,875 | 8,960 | 5,020 | 4,145 | 4,915 | 81,060 | 59,536 | 5,751 | 52,305 | |
Brooklyn,Bushwick | 29,115 | 16,940 | 5,785 | 2,160 | 1,650 | 2,580 | 26,963 | 17,039 | 2,001 | 19,210 | |
Brooklyn,Canarsie | 28,570 | 18,040 | 5,175 | 1,645 | 950 | 2,760 | 30,478 | 21,173 | 1,695 | 19,140 | |
Brooklyn,CypressHills | 54,775 | 34,110 | 9,660 | 3,460 | 2,880 | 4,665 | 58,221 | 39,805 | 3,734 | 37,380 | |
Brooklyn,Flatbush | 68,195 | 48,450 | 8,695 | 3,575 | 3,040 | 4,435 | 81,623 | 62,552 | 4,247 | 51,745 | |
Brooklyn,NewUtrecht | 71,155 | 49,090 | 9,255 | 4,680 | 4,550 | 3,580 | 76,385 | 55,917 | 5,484 | 54,545 | |
Buffalo | 108,520 | 69,925 | 18,335 | 8,005 | 3,670 | 8,585 | 138,299 | 99,446 | 10,549 | 75,815 | |
Corning | 26,640 | 17,415 | 4,600 | 1,770 | 925 | 1,930 | 32,019 | 22,969 | 2,148 | 18,245 | |
Dunkirk | 13,310 | 8,665 | 2,370 | 925 | 455 | 895 | 16,074 | 11,489 | 1,181 | 9,055 | |
Elmira | 28,885 | 18,585 | 5,375 | 1,875 | 910 | 2,140 | 34,713 | 24,719 | 2,309 | 19,670 | |
Flushing | 77,595 | 57,980 | 6,345 | 5,085 | 5,150 | 3,035 | 95,031 | 75,492 | 6,414 | 64,140 | |
Freeport | 121,675 | 86,210 | 14,520 | 8,145 | 5,260 | 7,540 | 176,241 | 133,447 | 11,818 | 94,310 | |
Geneva | 63,290 | 43,880 | 9,240 | 3,475 | 1,790 | 4,905 | 80,288 | 60,580 | 4,525 | 45,235 | |
Gloversville | 20,940 | 13,705 | 3,640 | 1,280 | 550 | 1,765 | 24,537 | 17,657 | 1,553 | 14,125 | |
Hudson | 27,125 | 18,610 | 4,090 | 1,690 | 815 | 1,920 | 34,411 | 25,580 | 2,188 | 19,650 | |
Ithaca | 15,985 | 11,490 | 2,050 | 875 | 595 | 975 | 21,414 | 16,814 | 1,197 | 12,065 | |
Jamaica | 77,395 | 55,295 | 9,960 | 4,055 | 3,000 | 5,085 | 96,590 | 74,257 | 4,965 | 58,845 | |
Jamestown | 20,360 | 13,580 | 3,330 | 1,340 | 645 | 1,465 | 24,334 | 17,917 | 1,646 | 14,290 | |
LongIslandCity | 50,430 | 35,720 | 5,940 | 3,450 | 3,000 | 2,320 | 57,061 | 43,205 | 3,916 | 39,645 | |
Melville | 77,860 | 54,895 | 8,735 | 5,020 | 3,845 | 5,365 | 113,445 | 86,229 | 7,407 | 60,940 | |
Mineola | 129,535 | 95,300 | 11,205 | 8,880 | 7,290 | 6,860 | 195,095 | 153,392 | 13,434 | 106,485 | |
Monticello | 17,775 | 11,395 | 3,085 | 1,070 | 550 | 1,675 | 21,809 | 15,377 | 1,317 | 12,045 | |
NewRochelle | 46,695 | 34,215 | 4,790 | 2,865 | 2,220 | 2,605 | 67,414 | 53,321 | 4,157 | 37,690 | |
NewYorkCity,Downtown | 37,075 | 28,185 | 3,510 | 1,605 | 1,360 | 2,415 | 51,998 | 42,765 | 2,089 | 30,195 | |
NewYorkCity,EastHarlem | 21,930 | 14,500 | 3,270 | 1,600 | 1,060 | 1,500 | 28,409 | 21,305 | 1,931 | 16,400 | |
NewYorkCity,EastVillage | 25,900 | 18,110 | 3,435 | 1,700 | 1,460 | 1,195 | 25,035 | 18,534 | 1,628 | 19,675 | |
NewYorkCity,Midtown | 83,785 | 66,955 | 5,485 | 4,445 | 3,890 | 3,010 | 137,182 | 116,375 | 7,425 | 73,450 | |
NewYorkCity,Uptown | 53,220 | 35,600 | 9,335 | 2,940 | 1,815 | 3,530 | 62,682 | 46,502 | 3,193 | 38,210 | |
NewYorkCity,WashingtonHeights | 43,770 | 30,030 | 6,820 | 2,430 | 2,085 | 2,405 | 46,565 | 34,835 | 2,545 | 32,295 | |
Newburgh | 65,100 | 42,410 | 10,510 | 4,060 | 2,225 | 5,895 | 86,889 | 61,579 | 5,507 | 45,145 | |
NiagaraFalls | 52,475 | 33,495 | 9,435 | 3,910 | 1,935 | 3,700 | 66,728 | 46,907 | 5,200 | 36,055 | |
Ogdensburg | 25,220 | 15,275 | 4,705 | 2,050 | 1,180 | 2,010 | 29,552 | 19,921 | 2,462 | 16,595 | |
Olean | 30,505 | 19,810 | 5,260 | 2,040 | 1,080 | 2,315 | 35,638 | 25,568 | 2,454 | 20,830 | |
Oneonta | 26,535 | 18,300 | 3,720 | 1,675 | 945 | 1,895 | 32,023 | 23,988 | 2,027 | 19,335 | |
Oswego | 30,345 | 18,785 | 5,850 | 2,060 | 1,125 | 2,525 | 37,340 | 25,723 | 2,626 | 19,830 | |
Patchogue | 139,600 | 92,085 | 21,455 | 9,265 | 5,605 | 11,190 | 195,906 | 140,767 | 13,271 | 100,710 | |
Peekskill | 40,585 | 30,710 | 3,550 | 2,385 | 1,755 | 2,185 | 61,997 | 49,766 | 3,652 | 33,300 | |
Plattsburgh | 37,785 | 23,570 | 7,145 | 2,655 | 1,380 | 3,035 | 44,243 | 30,467 | 3,166 | 25,060 | |
Poughkeepsie | 118,085 | 80,280 | 17,455 | 7,195 | 4,130 | 9,025 | 161,599 | 119,069 | 10,071 | 85,770 | |
Queensbury | 45,700 | 30,190 | 7,520 | 3,110 | 1,585 | 3,295 | 56,738 | 41,213 | 3,980 | 31,855 | |
RegoPark | 73,955 | 53,815 | 8,200 | 4,500 | 4,450 | 2,990 | 85,773 | 66,599 | 5,337 | 58,910 | |
RidgeRoad | 94,935 | 63,455 | 14,050 | 7,450 | 3,475 | 6,505 | 123,072 | 89,258 | 10,112 | 68,690 | |
Riverhead | 41,565 | 31,290 | 3,835 | 2,465 | 1,685 | 2,290 | 60,630 | 48,474 | 3,598 | 33,665 | |
Rochester,Downtown | 87,335 | 56,480 | 15,905 | 4,795 | 2,870 | 7,285 | 112,433 | 82,133 | 6,489 | 60,140 | |
Rochester,Greece | 70,080 | 50,400 | 9,185 | 4,120 | 2,335 | 4,040 | 95,769 | 73,580 | 5,821 | 52,425 | |
RockawayPark | 18,660 | 11,350 | 3,665 | 1,135 | 680 | 1,830 | 21,830 | 14,575 | 1,331 | 12,415 | |
Schenectady | 82,075 | 57,055 | 11,330 | 5,165 | 2,825 | 5,700 | 106,721 | 81,082 | 6,909 | 60,015 | |
StatenIsland | 63,730 | 41,000 | 10,060 | 4,455 | 3,125 | 5,090 | 87,833 | 60,587 | 6,138 | 44,925 | |
StatenIsland,HylanBlvd | 25,050 | 15,500 | 4,650 | 1,615 | 1,020 | 2,265 | 31,883 | 21,641 | 2,055 | 16,820 | |
Syracuse | 125,495 | 84,985 | 19,815 | 7,855 | 3,920 | 8,920 | 161,487 | 120,009 | 10,499 | 89,285 | |
Troy | 42,255 | 28,415 | 7,070 | 2,590 | 1,115 | 3,065 | 53,195 | 39,219 | 3,341 | 29,415 | |
Utica | 80,270 | 52,280 | 13,535 | 5,070 | 2,445 | 6,940 | 95,228 | 68,537 | 6,148 | 55,260 | |
Watertown | 27,705 | 17,585 | 4,615 | 2,085 | 1,190 | 2,230 | 32,080 | 22,442 | 2,501 | 18,815 | |
WestNyack | 55,535 | 39,935 | 5,535 | 3,025 | 2,595 | 4,445 | 79,804 | 62,368 | 4,327 | 43,640 | |
WhitePlains | 39,775 | 30,350 | 3,195 | 2,205 | 1,950 | 2,075 | 62,300 | 50,886 | 3,444 | 33,335 | |
Yonkers | 39,910 | 28,285 | 5,085 | 2,495 | 1,740 | 2,305 | 54,832 | 42,200 | 3,398 | 30,990 |
Zip Code Table by employing anti_join() from dplyr function.
my.zipcode.data <- anti_join(my.new.data, my.county.data, by="County")
my.zipcode.data <- my.zipcode.data %>% subset(select=-c(County))
Zip Code table.
Zipcode | n Total | n Retired | n Disabled | n Widow & Parents | n Spouses | n Children | $ All Beneficiaries | $ Retired | $ Widow & Parents | n OASDI 65 & older |
---|---|---|---|---|---|---|---|---|---|---|
12007 | 50 | 40 | 10 | 0 | 0 | 0 | 74 | 58 | 0 | 40 |
12009 | 1,520 | 1,145 | 175 | 75 | 65 | 60 | 2,095 | 1,669 | 105 | 1,175 |
12023 | 490 | 340 | 60 | 35 | 20 | 35 | 630 | 470 | 46 | 350 |
12024 | 40 | 25 | 10 | 5 | 0 | 0 | 58 | 41 | 7 | 30 |
12033 | 1,790 | 1,390 | 185 | 105 | 45 | 65 | 2,458 | 1,996 | 142 | 1,415 |
12041 | 115 | 90 | 10 | 10 | 0 | 5 | 161 | 126 | 12 | 95 |
Data Exploration
From the above table we can explore a few things as follows:
Geographical distribution:
Distribution by Region:
Distribution of OASDI Beneficiaries during 2015 by the zipcode (Region).
# Merge Zipcodes with the zipcode library
USzipCodes <- my.zipcode.data
USzipCodes$Zipcode <- clean.zipcodes(USzipCodes$Zipcode)
data(zipcode)
USzipCodes <- merge(USzipCodes, zipcode, by.x='Zipcode', by.y='zip')
# Creating ggplot of matches ZipCodes
g <- ggplot(data=USzipCodes) + geom_point(aes(x=longitude, y=latitude, colour=1))
# simplify display and limit to the "lower 48"
g <- g + theme_bw() + scale_x_continuous(limits = c(-80,-72), breaks = NULL)
g <- g + scale_y_continuous(limits = c(40,45), breaks = NULL)
g <- g + theme(legend.position="none")
g
Zip codes in which children have not received benefits.
Zipcode | city | state | latitude | longitude |
---|---|---|---|---|
06390 | Fishers Island | NY | 41.26194 | -72.00708 |
10020 | New York | NY | 40.75867 | -73.98024 |
10101 | New York | NY | 40.78075 | -73.97718 |
10107 | New York | NY | 40.76643 | -73.98273 |
10123 | New York | NY | 40.75149 | -73.99054 |
10129 | New York | NY | 40.78075 | -73.97718 |
Geographical distribution.
# Creating ggplot of matches ZipCodes
g <- ggplot(data=USzipCodesChild) + geom_point(aes(x=longitude, y=latitude, colour=3))
# simplify display and limit to the "lower 48"
g <- g + theme_bw() + scale_x_continuous(limits = c(-80,-72), breaks = NULL)
g <- g + scale_y_continuous(limits = c(40,45), breaks = NULL)
g <- g + theme(legend.position="none")
g
Conclusions
This is an interesting analysis and I believe it can play a great role in local discoveries related to OASDI Beneficiaries since it covers immediate surrounding areas.
For example, from the Region distribution we can visualize how the distribution is over the respective zipcodes having a better perspective on how respective populations are distributed.
Final conclusion:
Since there’s more data available, it will be interesting to perform more comparisons in regards of the years and states and any other “correlation” that we could find related to other entities as well.