Use dim() to determine how many rows and columns the data frame has, names() to list its variable names, and str() to show its structure.
library(foreign) # provides read.spss()
inc_real <- read.spss('/Users/johnhope/Desktop/DS3003/Data/inc_real.sav')
inc_real <- as.data.frame(inc_real) # read.spss() returns a list by default; convert it to a data frame
head(inc_real) # print the first 6 rows
##   age    sex whours                             educat income   hwage edu
## 1  24   male     40 non-tertiary post-secondary degree  18000 112.500  15
## 2  43 female     40               academic high school  14500  90.625  12
## 3  27   male     40                     apprenticeship  18000 112.500  10
## 4  37   male     40                  compulsory school  15700  98.125   9
## 5  50   male     42               academic high school  38000 237.500  12
## 6  50   male     39                     apprenticeship  22000 137.500  10
##   potexp
## 1      3
## 2     25
## 3     11
## 4     22
## 5     32
## 6     34
The output above shows the first 6 rows of the data.
dim(inc_real)
## [1] 1271 8
The data frame has 1271 rows and 8 columns.
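As a small cross-check, the two numbers returned by dim() can also be obtained separately with nrow() and ncol(). The sketch below is only an illustration: toy_df is a hypothetical stand-in built from three columns of the first three rows printed by head() above, because inc_real.sav is not available outside the author's machine. The same calls should work on inc_real.

```r
# toy_df is a hypothetical stand-in for inc_real (first three rows of
# age, sex and income, copied from the head() output above)
toy_df <- data.frame(
  age    = c(24, 43, 27),
  sex    = factor(c("male", "female", "male")),
  income = c(18000, 14500, 18000)
)

dims   <- dim(toy_df)   # integer vector: c(rows, columns)
n_rows <- nrow(toy_df)  # same value as dims[1]
n_cols <- ncol(toy_df)  # same value as dims[2]

cat("The toy data has", n_rows, "rows and", n_cols, "columns\n")
```

Keeping the counts in named variables makes it easier to reuse them in the written answer instead of retyping the numbers.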
names(inc_real)
## [1] "age" "sex" "whours" "educat" "income" "hwage" "edu" "potexp"
str(inc_real)
## 'data.frame': 1271 obs. of 8 variables:
## $ age : num 24 43 27 37 50 50 30 60 45 26 ...
## $ sex : Factor w/ 2 levels "male","female": 1 2 1 1 1 1 2 1 1 1 ...
## $ whours: num 40 40 40 40 42 39 40 39 40 39 ...
## $ educat: Factor w/ 9 levels "no degree","compulsory school",..: 8 5 3 2 5 3 7 3 3 3 ...
## $ income: num 18000 14500 18000 15700 38000 22000 5200 12000 15000 13000 ...
## $ hwage : num 112.5 90.6 112.5 98.1 237.5 ...
## $ edu : num 15 12 10 9 12 10 13 10 10 10 ...
## $ potexp: num 3 25 11 22 32 34 11 44 29 10 ...
The output above shows the variable names and their types: sex and educat are factors, and the other six variables are numeric.
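If only the types are of interest, one possible shortcut is to apply class() to every column with sapply(). This is a sketch on a hypothetical stand-in, toy_df, whose values are copied from the first three rows of the head() output; it is not the assignment's required approach, and the helper names col_classes, factor_cols and numeric_cols are introduced here for illustration.

```r
# toy_df is a hypothetical stand-in for inc_real (values taken from the
# first three rows shown by head())
toy_df <- data.frame(
  age    = c(24, 43, 27),
  sex    = factor(c("male", "female", "male")),
  educat = factor(c("non-tertiary post-secondary degree",
                    "academic high school",
                    "apprenticeship")),
  income = c(18000, 14500, 18000)
)

# One class name per column, returned as a named character vector
col_classes <- sapply(toy_df, class)
print(col_classes)

# Split the column names by type
factor_cols  <- names(toy_df)[sapply(toy_df, is.factor)]
numeric_cols <- names(toy_df)[sapply(toy_df, is.numeric)]
```

Compared with str(), this drops the preview of values and keeps only the name-to-type mapping, which can be handy when a data frame has many columns.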
summary(inc_real)
##       age            sex          whours
##  Min.   :16.00   male  :839   Min.   :36.00
##  1st Qu.:28.00   female:432   1st Qu.:38.00
##  Median :36.00                Median :40.00
##  Mean   :36.78                Mean   :39.87
##  3rd Qu.:45.00                3rd Qu.:40.00
##  Max.   :64.00                Max.   :80.00
##
##                               educat        income          hwage
##  apprenticeship                  :599   Min.   : 5000   Min.   : 31.25
##  compulsory school               :220   1st Qu.:13000   1st Qu.: 81.25
##  vocational school               :127   Median :15000   Median : 93.75
##  vocational high school          :101   Mean   :16822   Mean   :105.14
##  tertiary education (BA, MA, PhD): 87   3rd Qu.:20000   3rd Qu.:125.00
##  academic high school            : 66   Max.   :80819   Max.   :505.12
##  (Other)                         : 71
##       edu            potexp
##  Min.   : 9.00   Min.   : 0.00
##  1st Qu.:10.00   1st Qu.:11.00
##  Median :10.00   Median :19.00
##  Mean   :10.95   Mean   :19.84
##  3rd Qu.:12.00   3rd Qu.:28.00
##  Max.   :17.00   Max.   :46.00
##
Each sequence should have a length of 20 (i.e., 20 numbers); only the first 12 numbers are shown below.
rep(c(1, 0), 10)
## [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0
rep(rep(1:0, each = 2), 5)
## [1] 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
rep(seq(0, 9, by = 3), 5)
## [1] 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9
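The three sequences above can also be built by stating the target length directly, which makes the "length of 20" requirement explicit. The sketch below is one alternative, not the expected solution: it uses rep_len() and the combined each/times arguments of rep(), then checks the lengths. The names seq_a, seq_b, seq_c and lengths_ok are introduced here for illustration.

```r
# Alternating 1, 0 recycled to exactly 20 elements
seq_a <- rep_len(c(1, 0), 20)

# Each of 1, 0 repeated twice, and that block repeated 5 times
seq_b <- rep(1:0, each = 2, times = 5)

# 0, 3, 6, 9 recycled to exactly 20 elements
seq_c <- rep_len(seq(0, 9, by = 3), 20)

# Confirm that every sequence has the required length
lengths_ok <- c(length(seq_a), length(seq_b), length(seq_c)) == 20
print(lengths_ok)

# Show only the first 12 numbers of a sequence
head(seq_a, 12)
```

With rep_len() the length is fixed by the second argument, so the pattern does not need to divide 20 evenly; with a repetition count, as in the original calls, the length is the pattern length times the count.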
hist(inc_real$income)