The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv
and load the data into R. The code book, which describes the variable names, is here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf
Create a logical vector that identifies the households on greater than 10 acres who sold more than $10,000 worth of agriculture products. Assign that logical vector to the variable agricultureLogical. Apply the which() function like this to identify the rows of the data frame where the logical vector is TRUE.
which(agricultureLogical)
What are the first 3 values that result?
A 125, 238, 262
B 25, 36, 45
C 59, 460, 474
D 403, 756, 798
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
setwd("E:/Personal/especializacion/ciencia de datos/curso3/semana3/")
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(url = fileUrl, destfile = "data.csv", method = "curl")
data <- read.table("data.csv", header = TRUE, sep = ",")
head(data)
## RT SERIALNO DIVISION PUMA REGION ST ADJUST WGTP NP TYPE ACR AGS BDS BLD BUS
## 1 H 186 8 700 4 16 1015675 89 4 1 1 NA 4 2 2
## 2 H 306 8 700 4 16 1015675 310 1 1 NA NA 1 7 NA
## 3 H 395 8 100 4 16 1015675 106 2 1 1 NA 3 2 2
## 4 H 506 8 700 4 16 1015675 240 4 1 1 NA 4 2 2
## 5 H 835 8 800 4 16 1015675 118 4 1 2 1 5 2 2
## 6 H 989 8 700 4 16 1015675 115 4 1 1 NA 3 2 2
## CONP ELEP FS FULP GASP HFL INSP KIT MHP MRGI MRGP MRGT MRGX PLM RMS RNTM RNTP
## 1 NA 180 0 2 3 3 600 1 NA 1 1300 1 1 1 9 NA NA
## 2 NA 60 0 2 3 3 NA 1 NA NA NA NA NA 1 2 2 600
## 3 NA 70 0 2 30 1 200 1 NA NA NA NA 3 1 7 NA NA
## 4 NA 40 0 2 80 1 200 1 NA 1 860 1 1 1 6 NA NA
## 5 NA 250 0 2 3 3 700 1 NA 1 1900 1 1 1 7 NA NA
## 6 NA 130 0 2 3 3 250 1 NA 1 700 1 1 1 6 NA NA
## SMP TEL TEN VACS VAL VEH WATP YBL FES FINCP FPARC GRNTP GRPIP HHL HHT HINCP
## 1 NA 1 1 NA 17 3 840 5 2 105600 2 NA NA 1 1 105600
## 2 NA 1 3 NA NA 1 1 3 NA NA NA 660 23 1 4 34000
## 3 NA 1 2 NA 18 2 50 5 7 9400 2 NA NA 1 3 9400
## 4 400 1 1 NA 19 3 500 2 1 66000 1 NA NA 1 1 66000
## 5 650 1 1 NA 20 5 2 3 1 93000 2 NA NA 1 1 93000
## 6 400 1 1 NA 15 2 1200 5 2 61000 1 NA NA 1 1 61000
## HUGCL HUPAC HUPAOC HUPARC LNGI MV NOC NPF NPP NR NRC OCPIP PARTNER PSF R18
## 1 0 2 2 2 1 4 2 4 0 0 2 18 0 0 1
## 2 0 4 4 4 1 3 0 NA 0 0 0 NA 0 0 0
## 3 0 2 2 2 1 2 1 2 0 0 1 23 0 0 1
## 4 0 1 1 1 1 3 2 4 0 0 2 26 0 0 1
## 5 0 2 2 2 1 1 1 4 0 0 1 36 0 0 1
## 6 0 1 1 1 1 4 2 4 0 0 2 26 0 0 1
## R60 R65 RESMODE SMOCP SMX SRNT SVAL TAXP WIF WKEXREL WORKSTAT FACRP FAGSP
## 1 0 0 1 1550 3 0 1 24 3 2 3 0 0
## 2 0 0 2 NA NA 1 0 NA NA NA NA 0 0
## 3 0 0 1 179 NA 0 1 16 1 13 13 0 0
## 4 0 0 2 1422 1 0 1 31 2 2 1 0 0
## 5 0 0 1 2800 1 0 1 25 3 1 1 0 0
## 6 0 0 2 1330 2 0 1 7 1 7 3 0 0
## FBDSP FBLDP FBUSP FCONP FELEP FFSP FFULP FGASP FHFLP FINSP FKITP FMHP FMRGIP
## 1 0 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 1 0 0 0
## FMRGP FMRGTP FMRGXP FMVYP FPLMP FRMSP FRNTMP FRNTP FSMP FSMXHP FSMXSP FTAXP
## 1 0 0 0 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0 0 0 1
## FTELP FTENP FVACSP FVALP FVEHP FWATP FYBLP wgtp1 wgtp2 wgtp3 wgtp4 wgtp5
## 1 0 0 0 0 0 0 0 87 28 156 95 26
## 2 0 0 0 0 0 0 1 539 363 293 422 566
## 3 0 0 0 0 0 0 0 187 35 184 178 83
## 4 0 0 0 0 0 0 0 232 406 234 270 249
## 5 0 0 0 0 0 0 0 107 194 129 41 156
## 6 0 0 0 0 0 1 0 191 197 127 115 115
## wgtp6 wgtp7 wgtp8 wgtp9 wgtp10 wgtp11 wgtp12 wgtp13 wgtp14 wgtp15 wgtp16
## 1 25 95 93 93 91 87 166 90 25 153 89
## 2 289 87 242 453 453 334 358 414 102 281 99
## 3 95 31 32 177 118 110 114 184 107 95 115
## 4 242 406 249 287 67 72 413 399 77 245 424
## 5 174 47 113 101 33 115 52 113 95 135 206
## 6 107 119 34 32 30 123 199 117 33 109 117
## wgtp17 wgtp18 wgtp19 wgtp20 wgtp21 wgtp22 wgtp23 wgtp24 wgtp25 wgtp26 wgtp27
## 1 148 82 25 180 90 24 140 92 25 27 86
## 2 108 278 131 407 447 264 352 238 390 336 122
## 3 33 118 120 37 184 35 176 176 110 103 29
## 4 67 63 226 254 238 69 238 255 239 248 69
## 5 100 185 135 279 116 33 105 244 38 30 230
## 6 31 115 201 190 184 198 113 109 117 111 110
## wgtp28 wgtp29 wgtp30 wgtp31 wgtp32 wgtp33 wgtp34 wgtp35 wgtp36 wgtp37 wgtp38
## 1 84 87 93 90 149 91 28 143 81 144 95
## 2 374 482 468 335 251 613 104 284 116 91 326
## 3 30 197 127 92 118 177 99 99 109 34 100
## 4 234 247 437 423 74 61 401 267 72 388 335
## 5 123 123 243 120 238 98 90 107 44 122 32
## 6 33 37 36 110 183 114 35 134 119 32 121
## wgtp39 wgtp40 wgtp41 wgtp42 wgtp43 wgtp44 wgtp45 wgtp46 wgtp47 wgtp48 wgtp49
## 1 27 22 90 171 27 83 153 148 92 91 91
## 2 102 361 107 253 321 289 96 343 564 274 118
## 3 105 33 173 36 168 175 99 103 30 35 155
## 4 229 236 239 65 259 247 230 225 82 220 233
## 5 127 195 116 36 135 237 33 33 249 102 84
## 6 188 33 34 32 109 115 115 112 119 192 186
## wgtp50 wgtp51 wgtp52 wgtp53 wgtp54 wgtp55 wgtp56 wgtp57 wgtp58 wgtp59 wgtp60
## 1 93 90 26 94 142 24 91 29 84 148 30
## 2 118 321 261 130 463 294 479 391 307 476 283
## 3 102 95 107 185 120 114 113 36 115 103 29
## 4 419 390 69 74 391 276 70 422 409 223 245
## 5 224 119 250 119 125 126 32 112 33 131 45
## 6 213 106 34 124 179 106 107 190 112 34 35
## wgtp61 wgtp62 wgtp63 wgtp64 wgtp65 wgtp66 wgtp67 wgtp68 wgtp69 wgtp70 wgtp71
## 1 93 143 24 88 147 145 91 83 83 86 81
## 2 116 353 323 374 106 236 380 313 90 94 292
## 3 183 35 179 169 95 110 28 34 233 97 123
## 4 269 488 221 250 247 240 415 234 219 66 68
## 5 101 165 125 41 191 195 49 119 92 44 127
## 6 32 34 119 123 122 121 123 196 196 207 120
## wgtp72 wgtp73 wgtp74 wgtp75 wgtp76 wgtp77 wgtp78 wgtp79 wgtp80
## 1 27 93 151 28 79 25 101 157 129
## 2 401 81 494 346 496 615 286 454 260
## 3 119 168 107 95 101 30 124 106 31
## 4 359 385 71 234 421 76 77 242 231
## 5 36 119 121 116 209 97 176 144 38
## 6 34 109 199 116 110 211 120 31 189
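# Per the code book: ACR == 3 is a lot of 10 or more acres; AGS == 6 is agricultural sales of $10,000 or more.
agricultureLogical <- data$ACR == 3 & data$AGS == 6
#agricultureLogical
cat("The first three values are:")
## The first three values are: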
head(which(agricultureLogical), 3)
## [1] 125 238 262
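A minimal sketch of why which() is convenient here, using a made-up data frame (the toy values are assumptions, not ACS data). Comparisons against NA yield NA, and which() returns only the positions that are TRUE:

```r
# Toy data frame with hypothetical values; ACR and AGS mimic the coded ACS columns.
toy <- data.frame(
  ACR = c(3, 1, NA, 3, 3),
  AGS = c(6, 6, 6, NA, 6)
)

# Element-wise comparison; rows 3 and 4 evaluate to NA because one side is missing.
toyLogical <- toy$ACR == 3 & toy$AGS == 6

# which() keeps only the indices that are TRUE and silently drops FALSE and NA.
toyRows <- which(toyLogical)
print(toyRows)
```

Indexing a data frame directly with a logical vector that contains NA would return NA rows, which is why wrapping it in which() is the safer habit.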
Using the jpeg package, read the following picture of your instructor into R:
https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg
Use the parameter native=TRUE. What are the 30th and 80th quantiles of the resulting data? (some Linux systems may produce an answer 638 different for the 30th quantile)
A 10904118 -594524
B -15259150 -10575416
C 10904118 -10575416
D 16776430 -15390165
library(jpeg)
fileUrl <- "https://d396qusza40orc.cloudfront.net/getdata%2Fjeff.jpg"
download.file(url = fileUrl, destfile = "jeff.jpg", method = "curl")
jpg <- readJPEG("jeff.jpg", native = TRUE)
cat("The 30th and 80th quantiles are:")
## The 30th and 80th quantiles are:
quantile(jpg, probs = c(0.3, 0.8))
## 30% 80%
## -15258512 -10575416
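With native = TRUE, readJPEG() returns an integer matrix (class nativeRaster) of packed color values, and quantile() treats it like any other numeric vector. The sketch below uses a small made-up integer matrix in place of the image, so the numbers are illustrative only:

```r
# Hypothetical stand-in for the image: a 3 x 7 integer matrix holding 0..20.
pixelsToy <- matrix(0:20, nrow = 3)

# quantile() flattens the matrix and interpolates at the requested probabilities.
qToy <- quantile(pixelsToy, probs = c(0.3, 0.8))
print(qToy)
```

The result is a named numeric vector whose names are the percentages, matching the layout of the output above.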
Load the Gross Domestic Product data for the 190 ranked countries in this data set:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv
Load the educational data from this data set:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv
Match the data based on the country shortcode. How many of the IDs match? Sort the data frame in descending order by GDP rank (so United States is last). What is the 13th country in the resulting data frame?
Original data sources:
http://data.worldbank.org/data-catalog/GDP-ranking-table
http://data.worldbank.org/data-catalog/ed-stats
A 190 matches, 13th country is St. Kitts and Nevis
B 190 matches, 13th country is Spain
C 189 matches, 13th country is Spain
D 189 matches, 13th country is St. Kitts and Nevis
E 234 matches, 13th country is Spain
F 234 matches, 13th country is St. Kitts and Nevis
fileUrl1 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv"
fileUrl2 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv"
download.file(fileUrl1, destfile = "GDP.csv", method = "curl")
download.file(fileUrl2, destfile = "country.csv", method = "curl")
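# Skip the four title lines and read only the 190 ranked countries.
gdp <- read.csv("GDP.csv", header = TRUE, skip = 4, sep = ",", nrows = 190)
edu <- read.csv("country.csv", header = TRUE, nrows = 234)
# Keep the code, rank, name, and GDP columns and give them readable names.
gdps <- select(gdp, X, X.1, X.3, X.4)
names(gdps) <- c("CountryCode", "Rank", "Economy", "Total")
gdps[, 2] <- as.numeric(gdps[, 2])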
head(gdps)
## CountryCode Rank Economy Total
## 1 USA 1 United States 16,244,600
## 2 CHN 2 China 8,227,103
## 3 JPN 3 Japan 5,959,718
## 4 DEU 4 Germany 3,428,131
## 5 FRA 5 France 2,612,878
## 6 GBR 6 United Kingdom 2,471,784
head(edu)
## CountryCode Long.Name Income.Group
## 1 ABW Aruba High income: nonOECD
## 2 ADO Principality of Andorra High income: nonOECD
## 3 AFG Islamic State of Afghanistan Low income
## 4 AGO People's Republic of Angola Lower middle income
## 5 ALB Republic of Albania Upper middle income
## 6 ARE United Arab Emirates High income: nonOECD
## Region Lending.category Other.groups Currency.Unit
## 1 Latin America & Caribbean Aruban florin
## 2 Europe & Central Asia Euro
## 3 South Asia IDA HIPC Afghan afghani
## 4 Sub-Saharan Africa IDA Angolan kwanza
## 5 Europe & Central Asia IBRD Albanian lek
## 6 Middle East & North Africa U.A.E. dirham
## Latest.population.census Latest.household.survey
## 1 2000
## 2 Register based
## 3 1979 MICS, 2003
## 4 1970 MICS, 2001, MIS, 2006/07
## 5 2001 MICS, 2005
## 6 2005
## Special.Notes
## 1
## 2
## 3 Fiscal year end: March 20; reporting period for national accounts data: FY.
## 4
## 5
## 6
## National.accounts.base.year National.accounts.reference.year
## 1 1995 NA
## 2 NA
## 3 2002/2003 NA
## 4 1997 NA
## 5 1996
## 6 1995 NA
## System.of.National.Accounts SNA.price.valuation Alternative.conversion.factor
## 1 NA
## 2 NA
## 3 NA VAB
## 4 NA VAP 1991-96
## 5 1993 VAB
## 6 NA VAB
## PPP.survey.year Balance.of.Payments.Manual.in.use
## 1 NA
## 2 NA
## 3 NA
## 4 2005 BPM5
## 5 2005 BPM5
## 6 NA BPM4
## External.debt.Reporting.status System.of.trade Government.Accounting.concept
## 1 Special
## 2 General
## 3 Actual General Consolidated
## 4 Actual Special
## 5 Actual General Consolidated
## 6 General Consolidated
## IMF.data.dissemination.standard
## 1
## 2
## 3 GDDS
## 4 GDDS
## 5 GDDS
## 6 GDDS
## Source.of.most.recent.Income.and.expenditure.data Vital.registration.complete
## 1
## 2 Yes
## 3
## 4 IHS, 2000
## 5 LSMS, 2005 Yes
## 6
## Latest.agricultural.census Latest.industrial.data Latest.trade.data
## 1 NA 2008
## 2 NA 2006
## 3 NA 2008
## 4 1964-65 NA 1991
## 5 1998 2005 2008
## 6 1998 NA 2008
## Latest.water.withdrawal.data X2.alpha.code WB.2.code Table.Name
## 1 NA AW AW Aruba
## 2 NA AD AD Andorra
## 3 2000 AF AF Afghanistan
## 4 2000 AO AO Angola
## 5 2000 AL AL Albania
## 6 2005 AE AE United Arab Emirates
## Short.Name
## 1 Aruba
## 2 Andorra
## 3 Afghanistan
## 4 Angola
## 5 Albania
## 6 United Arab Emirates
joinData <- inner_join(edu, gdps, by = "CountryCode")
cat("The number of matching rows is:")
## The number of matching rows is:
nrow(joinData)
## [1] 189
joinData <- arrange(joinData, desc(Rank))
sum(!is.na(joinData[, 2]))
## [1] 189
head(joinData)
## CountryCode Long.Name Income.Group
## 1 TUV Tuvalu Lower middle income
## 2 KIR Republic of Kiribati Lower middle income
## 3 MHL Republic of the Marshall Islands Lower middle income
## 4 PLW Republic of Palau Upper middle income
## 5 STP Democratic Republic of São Tomé and Principe Lower middle income
## 6 FSM Federated States of Micronesia Lower middle income
## Region Lending.category Other.groups Currency.Unit
## 1 East Asia & Pacific Australian dollar
## 2 East Asia & Pacific IDA Australian dollar
## 3 East Asia & Pacific IBRD U.S. dollar
## 4 East Asia & Pacific IBRD U.S. dollar
## 5 Sub-Saharan Africa IDA HIPC São Tomé and Principe dobra
## 6 East Asia & Pacific IBRD U.S. dollar
## Latest.population.census Latest.household.survey
## 1
## 2 2005
## 3 1999
## 4 2005
## 5 2001
## 6 2000
## Special.Notes
## 1
## 2 The government statistical office has revised national accounts data for 1970-2008.
## 3
## 4
## 5
## 6 The government statistical office has revised national accounts data for 1995-2008.
## National.accounts.base.year National.accounts.reference.year
## 1 NA
## 2 1991 NA
## 3 1991 NA
## 4 1995 NA
## 5 2001 NA
## 6 1998 NA
## System.of.National.Accounts SNA.price.valuation Alternative.conversion.factor
## 1 NA
## 2 NA VAB
## 3 NA VAB
## 4 NA VAB
## 5 NA VAP
## 6 NA VAB
## PPP.survey.year Balance.of.Payments.Manual.in.use
## 1 NA
## 2 NA
## 3 NA
## 4 NA
## 5 2005
## 6 NA
## External.debt.Reporting.status System.of.trade Government.Accounting.concept
## 1
## 2 General
## 3
## 4
## 5 Preliminary Special
## 6
## IMF.data.dissemination.standard
## 1
## 2 GDDS
## 3
## 4
## 5 GDDS
## 6
## Source.of.most.recent.Income.and.expenditure.data Vital.registration.complete
## 1
## 2
## 3
## 4 Yes
## 5 PS 2000-01
## 6
## Latest.agricultural.census Latest.industrial.data Latest.trade.data
## 1 NA NA
## 2 NA 2005
## 3 NA NA
## 4 NA NA
## 5 NA 2008
## 6 NA NA
## Latest.water.withdrawal.data X2.alpha.code WB.2.code Table.Name
## 1 NA TV TV Tuvalu
## 2 NA KI KI Kiribati
## 3 NA MH MH Marshall Islands
## 4 NA PW PW Palau
## 5 NA ST ST São Tomé and Principe
## 6 NA FM FM Micronesia, Fed. Sts.
## Short.Name Rank Economy Total
## 1 Tuvalu 190 Tuvalu 40
## 2 Kiribati 189 Kiribati 175
## 3 Marshall Islands 188 Marshall Islands 182
## 4 Palau 187 Palau 228
## 5 São Tomé and Principe 186 São Tomé and Principe 263
## 6 Micronesia 185 Micronesia, Fed. Sts. 326
arrange(joinData, desc(Rank))[13, "Economy"]
## [1] "St. Kitts and Nevis"
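The match-then-sort step can also be sketched in base R, where merge() plays the role of inner_join() and order() the role of arrange(). The two mini tables below are hypothetical stand-ins for the GDP and education data:

```r
# Hypothetical mini tables; country codes and ranks are made up for illustration.
gdpToy <- data.frame(
  CountryCode = c("AAA", "BBB", "CCC", "DDD"),
  Rank = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)
eduToy <- data.frame(
  CountryCode = c("BBB", "CCC", "DDD", "EEE"),
  Income.Group = c("Low income", "High income: OECD", "Low income", "Low income"),
  stringsAsFactors = FALSE
)

# Inner join: keep only the codes that appear in both tables.
joinedToy <- merge(eduToy, gdpToy, by = "CountryCode")

# Sort by descending rank, so the best-ranked economy ends up last.
joinedToy <- joinedToy[order(joinedToy$Rank, decreasing = TRUE), ]

print(nrow(joinedToy))
print(joinedToy$CountryCode)
```

Codes that appear in only one table ("AAA" and "EEE" here) are dropped, which is why the number of matches can be smaller than either input.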
What is the average GDP ranking for the “High income: OECD” and “High income: nonOECD” group?
a 23, 45
b 23.966667, 30.91304
c 133.72973, 32.96667
d 32.96667, 91.91304
e 23, 30
f 30, 37
# tapply applies (hence part of its name) a function to a vector, split into the subvectors defined by a second, grouping vector.
# Here it applies mean to Rank in joinData, grouped by Income.Group.
str(joinData)
## 'data.frame': 189 obs. of 34 variables:
## $ CountryCode : chr "TUV" "KIR" "MHL" "PLW" ...
## $ Long.Name : chr "Tuvalu" "Republic of Kiribati" "Republic of the Marshall Islands" "Republic of Palau" ...
## $ Income.Group : chr "Lower middle income" "Lower middle income" "Lower middle income" "Upper middle income" ...
## $ Region : chr "East Asia & Pacific" "East Asia & Pacific" "East Asia & Pacific" "East Asia & Pacific" ...
## $ Lending.category : chr "" "IDA" "IBRD" "IBRD" ...
## $ Other.groups : chr "" "" "" "" ...
## $ Currency.Unit : chr "Australian dollar" "Australian dollar" "U.S. dollar" "U.S. dollar" ...
## $ Latest.population.census : chr "" "2005" "1999" "2005" ...
## $ Latest.household.survey : chr "" "" "" "" ...
## $ Special.Notes : chr "" "The government statistical office has revised national accounts data for 1970-2008." "" "" ...
## $ National.accounts.base.year : chr "" "1991" "1991" "1995" ...
## $ National.accounts.reference.year : int NA NA NA NA NA NA NA NA NA NA ...
## $ System.of.National.Accounts : int NA NA NA NA NA NA NA 1993 NA NA ...
## $ SNA.price.valuation : chr "" "VAB" "VAB" "VAB" ...
## $ Alternative.conversion.factor : chr "" "" "" "" ...
## $ PPP.survey.year : int NA NA NA NA 2005 NA NA NA 2005 NA ...
## $ Balance.of.Payments.Manual.in.use : chr "" "" "" "" ...
## $ External.debt.Reporting.status : chr "" "" "" "" ...
## $ System.of.trade : chr "" "General" "" "" ...
## $ Government.Accounting.concept : chr "" "" "" "" ...
## $ IMF.data.dissemination.standard : chr "" "GDDS" "" "" ...
## $ Source.of.most.recent.Income.and.expenditure.data: chr "" "" "" "" ...
## $ Vital.registration.complete : chr "" "" "" "Yes" ...
## $ Latest.agricultural.census : chr "" "" "" "" ...
## $ Latest.industrial.data : int NA NA NA NA NA NA NA NA NA NA ...
## $ Latest.trade.data : int NA 2005 NA NA 2008 NA 2007 2008 2007 2008 ...
## $ Latest.water.withdrawal.data : int NA NA NA NA NA NA NA NA NA NA ...
## $ X2.alpha.code : chr "TV" "KI" "MH" "PW" ...
## $ WB.2.code : chr "TV" "KI" "MH" "PW" ...
## $ Table.Name : chr "Tuvalu" "Kiribati" "Marshall Islands" "Palau" ...
## $ Short.Name : chr "Tuvalu" "Kiribati" "Marshall Islands" "Palau" ...
## $ Rank : num 190 189 188 187 186 185 184 183 182 181 ...
## $ Economy : chr "Tuvalu" "Kiribati" "Marshall Islands" "Palau" ...
## $ Total : chr " 40 " " 175 " " 182 " " 228 " ...
tapply(joinData$Rank, joinData$Income.Group, mean)
## High income: nonOECD High income: OECD Low income
## 91.91304 32.96667 133.72973
## Lower middle income Upper middle income
## 107.70370 92.13333
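A small sketch of the same tapply() pattern on made-up values (the ranks and group labels below are assumptions chosen for easy arithmetic):

```r
# Hypothetical ranks and their income groups.
rankToy <- c(10, 20, 30, 40, 50)
groupToy <- c("High income: OECD", "Low income", "High income: OECD",
              "Low income", "Low income")

# Split rankToy by groupToy and take the mean of each piece.
meansToy <- tapply(rankToy, groupToy, mean)
print(meansToy)
```

The result is a named array with one entry per distinct group, so a single group can be looked up by name, for example meansToy[["Low income"]].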
Cut the GDP ranking into 5 separate quantile groups. Make a table versus Income.Group. How many countries are Lower middle income but among the 38 nations with highest GDP?
A 5
B 13
C 12
D 0
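# Note: cut() intervals are open on the left by default, so rank 1 (United States) falls outside (1,38.6] and becomes NA,
# which is why the table totals 188 rather than 189. Pass include.lowest = TRUE to keep it; the Lower middle income count is unaffected.
joinData$rank.groups <- cut(joinData$Rank, breaks = quantile(joinData$Rank, c(0, 0.2, 0.4, 0.6, 0.8, 1)))
table(joinData$rank.groups, joinData$Income.Group)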
##
## High income: nonOECD High income: OECD Low income
## (1,38.6] 4 17 0
## (38.6,76.2] 5 10 1
## (76.2,114] 8 1 9
## (114,152] 4 1 16
## (152,190] 2 0 11
##
## Lower middle income Upper middle income
## (1,38.6] 5 11
## (38.6,76.2] 13 9
## (76.2,114] 11 8
## (114,152] 9 8
## (152,190] 16 9
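One detail worth checking when cutting at quantile breaks is how cut() treats the smallest value. A minimal sketch on the made-up ranks 1 to 10 compares the default behavior with include.lowest = TRUE:

```r
# Hypothetical ranks 1..10, split at the 0/20/40/60/80/100% quantiles.
ranksToy <- 1:10
breaksToy <- quantile(ranksToy, probs = seq(0, 1, by = 0.2))

# Default: intervals are (a, b], so the minimum value matches no interval and becomes NA.
groupsDefault <- cut(ranksToy, breaks = breaksToy)

# include.lowest = TRUE closes the first interval on the left, keeping the minimum.
groupsClosed <- cut(ranksToy, breaks = breaksToy, include.lowest = TRUE)

print(sum(is.na(groupsDefault)))
print(table(groupsClosed))
```

With include.lowest = TRUE every value lands in a group, so the counts add up to the full number of observations.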