When importing the SPSS file, you may see a warning such as “re-encoding from CP1252.” This is harmless and can be ignored, as suggested by the assignment instructions.
Q1. Read the SPSS file and inspect the data
# Q1-a: print the first 10 rows of the data frame
library(foreign)  # provides read.spss()
inc_dat <- read.spss("inc_real.sav", to.data.frame = TRUE)
head(inc_dat, 10) # or inc_dat[1:10, ]
   age    sex whours                             educat income   hwage edu potexp
1   24   male     40 non-tertiary post-secondary degree  18000 112.500  15      3
2   43 female     40               academic high school  14500  90.625  12     25
3   27   male     40                     apprenticeship  18000 112.500  10     11
4   37   male     40                  compulsory school  15700  98.125   9     22
5   50   male     42               academic high school  38000 237.500  12     32
6   50   male     39                     apprenticeship  22000 137.500  10     34
7   30 female     40     special vocational high school   5200  32.500  13     11
8   60   male     39                     apprenticeship  12000  75.000  10     44
9   45   male     40                     apprenticeship  15000  93.750  10     29
10  26   male     39                     apprenticeship  13000  81.250  10     10
# Q1-b: use dim() to determine how many rows and columns the data frame has
dim(inc_dat) # or: nrow(inc_dat); ncol(inc_dat)
[1] 1271 8
# Q1-c: get the variable names (use names())
names(inc_dat)
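The inspection functions used in Q1 can also be tried without the SPSS file. The sketch below uses a small made-up data frame (`toy` is hypothetical and not part of the assignment data) to show what `head()`, `dim()`, `nrow()`, `ncol()`, and `names()` return.

```r
# Hypothetical toy data frame (NOT the assignment data)
toy <- data.frame(
  age = c(24, 43, 27),
  sex = factor(c("male", "female", "male"))
)

head(toy, 2)            # first two rows

toy_dim <- dim(toy)     # integer vector: c(rows, columns)
toy_dim
nrow(toy)               # rows only
ncol(toy)               # columns only

toy_names <- names(toy) # character vector of column names
toy_names
```

`dim()` returns both counts at once, so `dim(x)[1]` and `nrow(x)` give the same number.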
## Q2. Summary statistics for all variables
summary(inc_dat)
      age            sex          whours
 Min.   :16.00   male  :839   Min.   :36.00
 1st Qu.:28.00   female:432   1st Qu.:38.00
 Median :36.00                Median :40.00
 Mean   :36.78                Mean   :39.87
 3rd Qu.:45.00                3rd Qu.:40.00
 Max.   :64.00                Max.   :80.00
                              educat        income          hwage
 apprenticeship                  :599   Min.   : 5000   Min.   : 31.25
 compulsory school               :220   1st Qu.:13000   1st Qu.: 81.25
 vocational school               :127   Median :15000   Median : 93.75
 vocational high school          :101   Mean   :16822   Mean   :105.14
 tertiary education (BA, MA, PhD): 87   3rd Qu.:20000   3rd Qu.:125.00
 academic high school            : 66   Max.   :80819   Max.   :505.12
 (Other)                         : 71
      edu            potexp
 Min.   : 9.00   Min.   : 0.00
 1st Qu.:10.00   1st Qu.:11.00
 Median :10.00   Median :19.00
 Mean   :10.95   Mean   :19.84
 3rd Qu.:12.00   3rd Qu.:28.00
 Max.   :17.00   Max.   :46.00
# or:
# rep(rep(1:0, each = 2), times = 5)

# Q3-c: 0 3 6 9 0 3 6 9 0 3 6 9
rep(seq(0, 9, by = 3), times = 5)
[1] 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9
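The difference between the `times` and `each` arguments of `rep()` is easy to mix up. A minimal sketch with illustrative variable names (`by_times`, `by_each`, `both` are not from the assignment):

```r
x <- c(0, 3, 6, 9)

# times: repeat the whole vector
by_times <- rep(x, times = 2)
by_times   # 0 3 6 9 0 3 6 9

# each: repeat every element before moving to the next
by_each <- rep(x, each = 2)
by_each    # 0 0 3 3 6 6 9 9

# both together: 'each' is applied first, then 'times'
both <- rep(1:0, each = 2, times = 2)
both       # 1 1 0 0 1 1 0 0
```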
# Read the csv file used in class demos.
# Ensure the file "income_exmpl.csv" is in the EP763R folder.
incex <- read.csv("income_exmpl.csv", stringsAsFactors = TRUE)

# Quick checks
# head(incex)
# str(incex)

## Q1. Ages of the 200th and 250th observations
# Select the 'age' values for observations 200 and 250
incex$age[c(200, 250)]
[1] 42 39
# or:
# incex[c(200, 250), "age"]

## Q2. Income for ages 25-35 with low educational level
# Logical indexing: age between 25 and 35 inclusive AND edu == "low"
incex[incex$age >= 25 & incex$age <= 35 & incex$edu == "low", "income"]
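To see how the logical index in Q2 works, it can help to look at the intermediate TRUE/FALSE vector on a few rows. The sketch below uses a made-up data frame (`toy` and its values are hypothetical, chosen only to mimic the column names of `incex`) and compares bracket indexing with `subset()`.

```r
# Hypothetical toy data (NOT the class data set)
toy <- data.frame(
  age    = c(22, 25, 30, 35, 40),
  edu    = factor(c("low", "low", "high", "low", "low")),
  income = c(900, 1000, 1500, 1100, 1200)
)

# One TRUE/FALSE per row; a row is kept only if all three conditions hold
keep <- toy$age >= 25 & toy$age <= 35 & toy$edu == "low"
keep                                   # FALSE TRUE FALSE TRUE FALSE

# Bracket indexing: rows where keep is TRUE, column "income"
sel_income <- toy[keep, "income"]
sel_income                             # 1000 1100

# Same selection with subset(), which evaluates the condition inside the data frame
same_income <- subset(toy, age >= 25 & age <= 35 & edu == "low")$income
```

Both ends of the age range are included because `>=` and `<=` are used rather than `>` and `<`.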
## Q3. Subset where 'occ' is medium or high and 'oexp' > 45
subset(incex, (occ == "med." | occ == "high") & oexp > 45)
     sex age edu  occ oexp income
227    f  64 low med.   47   1139
481    m  65 low high   47   1532
858    m  65 low med.   48   1520
1132   m  63 low med.   46   1462
1173   m  65 low high   47   1591
1368   m  65 low med.   48   1368
1383   m  62 low high   46   1800
1718   m  64 low high   46   1607
1747   m  64 low high   46   1442
# or, with the same result here: exclude the "low" category
subset(incex, occ != "low" & oexp > 45)
     sex age edu  occ oexp income
227    f  64 low med.   47   1139
481    m  65 low high   47   1532
858    m  65 low med.   48   1520
1132   m  63 low med.   46   1462
1173   m  65 low high   47   1591
1368   m  65 low med.   48   1368
1383   m  62 low high   46   1800
1718   m  64 low high   46   1607
1747   m  64 low high   46   1442
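A third way to write the Q3 condition is with `%in%`, which avoids chaining several `==` comparisons. The sketch below checks on a made-up data frame (`toy` is hypothetical, and it assumes `occ` has exactly the levels "low", "med.", and "high" with no missing values) that all three forms select the same rows.

```r
# Hypothetical toy data (NOT the class data set)
toy <- data.frame(
  occ  = factor(c("low", "med.", "high", "med.", "low"),
                levels = c("low", "med.", "high")),
  oexp = c(50, 47, 46, 30, 48)
)

res_or  <- subset(toy, (occ == "med." | occ == "high") & oexp > 45)
res_in  <- subset(toy, occ %in% c("med.", "high") & oexp > 45)
res_not <- subset(toy, occ != "low" & oexp > 45)

res_in   # rows 2 and 3
```

The `occ != "low"` form only matches the other two as long as no further category exists, so `%in%` is the safer choice if the set of levels might change.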
## Q4. Mean occupational experience by gender among those with low education
# Version 1: subset then tapply()
incex_lowedu <- subset(incex, edu == "low")
tapply(incex_lowedu$oexp, incex_lowedu$sex, mean, na.rm = TRUE)
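Group means of this kind can be computed in other ways as well. The sketch below is one possible alternative, not necessarily the version intended by the assignment: it uses a made-up data frame (`toy` is hypothetical) and compares `tapply()` on a logical index with `aggregate()` and its `subset` argument.

```r
# Hypothetical toy data (NOT the class data set)
toy <- data.frame(
  sex  = factor(c("f", "m", "f", "m", "f")),
  edu  = factor(c("low", "low", "high", "low", "low")),
  oexp = c(10, 20, 30, 40, 14)
)

# tapply() without creating a separate data frame: index both vectors by the same condition
low <- toy$edu == "low"
m_tapply <- tapply(toy$oexp[low], toy$sex[low], mean, na.rm = TRUE)
m_tapply   # f = 12, m = 30

# aggregate() with a formula; 'subset' restricts the rows before grouping
m_aggr <- aggregate(oexp ~ sex, data = toy, subset = edu == "low", FUN = mean)
m_aggr     # data frame with one row per sex
```

`tapply()` returns a named array, while `aggregate()` returns a data frame, which is often more convenient for further processing.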