1 Read the SPSS file “inc_real.sav” (contains real income data) into R and then check the data by doing the following:

Print the first 6 or 10 rows of the data frame (use head()).
Use dim() to determine how many rows and columns the data frame has.
Get the variable names (use names()).
Determine the types of the variables (numerical, factor, …), i.e., the structure of the data frame (use str()).

library(foreign)
inc_real <- read.spss('/Users/johnhope/Desktop/DS3003/Data/inc_real.sav')

inc_real <- as.data.frame(inc_real) #converting to a data frame

head(inc_real) #printing first 6 rows

##   age    sex whours                             educat income   hwage edu
## 1  24   male     40 non-tertiary post-secondary degree  18000 112.500  15
## 2  43 female     40               academic high school  14500  90.625  12
## 3  27   male     40                     apprenticeship  18000 112.500  10
## 4  37   male     40                  compulsory school  15700  98.125   9
## 5  50   male     42               academic high school  38000 237.500  12
## 6  50   male     39                     apprenticeship  22000 137.500  10
##   potexp
## 1      3
## 2     25
## 3     11
## 4     22
## 5     32
## 6     34

The output shows the first 6 rows of the data frame.
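
The read and convert steps above can also be combined, since read.spss() accepts a to.data.frame argument. The following is a minimal sketch under one assumption: because the inc_real.sav path is specific to one machine, it uses electric.sav, a sample file that ships with the foreign package, as a stand-in.

```r
library(foreign)

# Stand-in file bundled with the foreign package (not the assignment data)
sav_path <- system.file("files", "electric.sav", package = "foreign")

# to.data.frame = TRUE returns a data frame directly,
# so no separate as.data.frame() step is needed
electric <- read.spss(sav_path, to.data.frame = TRUE)

is.data.frame(electric)
head(electric, 10) # first 10 rows instead of the default 6
```

Passing a second argument to head() covers the "6 or 10 rows" part of the task.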

dim(inc_real)

## [1] 1271    8

The data frame has 1271 rows and 8 columns.

names(inc_real)

## [1] "age"    "sex"    "whours" "educat" "income" "hwage"  "edu"    "potexp"

str(inc_real)

## 'data.frame':    1271 obs. of  8 variables:
##  $ age   : num  24 43 27 37 50 50 30 60 45 26 ...
##  $ sex   : Factor w/ 2 levels "male","female": 1 2 1 1 1 1 2 1 1 1 ...
##  $ whours: num  40 40 40 40 42 39 40 39 40 39 ...
##  $ educat: Factor w/ 9 levels "no degree","compulsory school",..: 8 5 3 2 5 3 7 3 3 3 ...
##  $ income: num  18000 14500 18000 15700 38000 22000 5200 12000 15000 13000 ...
##  $ hwage : num  112.5 90.6 112.5 98.1 237.5 ...
##  $ edu   : num  15 12 10 9 12 10 13 10 10 10 ...
##  $ potexp: num  3 25 11 22 32 34 11 44 29 10 ...

The output above shows the variable names and their types: sex and educat are factors, and the other six variables are numeric.
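
The types that str() prints can also be collected as a vector, which is handy for picking out columns by type. This is a small sketch on a toy data frame with made-up values (toy, col_types, and factor_cols are names chosen for illustration, not part of the assignment data).

```r
# Toy data frame (hypothetical values) with the same kinds of columns as inc_real
toy <- data.frame(
  age    = c(24, 43, 27),
  sex    = factor(c("male", "female", "male"), levels = c("male", "female")),
  income = c(18000, 14500, 18000)
)

# One class per column, returned as a named character vector
col_types <- sapply(toy, class)
col_types

# Names of the factor columns only
factor_cols <- names(toy)[sapply(toy, is.factor)]
factor_cols
```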

2 Get the summary statistics for the variables in the data frame.

summary(inc_real)

##       age            sex          whours     
##  Min.   :16.00   male  :839   Min.   :36.00  
##  1st Qu.:28.00   female:432   1st Qu.:38.00  
##  Median :36.00                Median :40.00  
##  Mean   :36.78                Mean   :39.87  
##  3rd Qu.:45.00                3rd Qu.:40.00  
##  Max.   :64.00                Max.   :80.00  
##                                              
##                               educat        income          hwage       
##  apprenticeship                  :599   Min.   : 5000   Min.   : 31.25  
##  compulsory school               :220   1st Qu.:13000   1st Qu.: 81.25  
##  vocational school               :127   Median :15000   Median : 93.75  
##  vocational high school          :101   Mean   :16822   Mean   :105.14  
##  tertiary education (BA, MA, PhD): 87   3rd Qu.:20000   3rd Qu.:125.00  
##  academic high school            : 66   Max.   :80819   Max.   :505.12  
##  (Other)                         : 71                                   
##       edu            potexp     
##  Min.   : 9.00   Min.   : 0.00  
##  1st Qu.:10.00   1st Qu.:11.00  
##  Median :10.00   Median :19.00  
##  Mean   :10.95   Mean   :19.84  
##  3rd Qu.:12.00   3rd Qu.:28.00  
##  Max.   :17.00   Max.   :46.00  
##
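
summary() also works on a single column, and tapply() gives a statistic per group. The sketch below uses a toy data frame with made-up values rather than inc_real, so the numbers are illustrative only.

```r
# Toy data frame (hypothetical values), not the assignment data
toy <- data.frame(
  sex    = factor(c("male", "female", "male", "female")),
  income = c(18000, 14500, 22000, 15500)
)

# Summary statistics for one numeric column
income_summary <- summary(toy$income)
income_summary

# Mean income within each level of sex
mean_by_sex <- tapply(toy$income, toy$sex, mean)
mean_by_sex
```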

3 Generate the following sequences using seq() and rep().

Each sequence should have a length of 20 (i.e., 20 numbers); only the first 12 numbers are shown below.
- 1 0 1 0 1 0 1 0 1 0 1 0 …
- 1 1 0 0 1 1 0 0 1 1 0 0 …
- 0 3 6 9 0 3 6 9 0 3 6 9 …

rep(c(1,0),10)

##  [1] 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0

rep(rep(1:0, each = 2), 5)

##  [1] 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0

rep(seq(0, 9, by=3), 5)

##  [1] 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9 0 3 6 9
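
The calls above fix the number of repetitions. An alternative sketch is to fix the target length with the length.out argument of rep(), which recycles the pattern until the vector has 20 elements (s1, s2, and s3 are names chosen here for illustration).

```r
# length.out recycles the pattern until the vector has exactly 20 elements
s1 <- rep(c(1, 0), length.out = 20)
s2 <- rep(c(1, 1, 0, 0), length.out = 20)
s3 <- rep(seq(0, 9, by = 3), length.out = 20)

# Compare with the times-based versions used above;
# 1:0 is an integer vector, so it is converted before comparing with doubles
identical(s1, rep(c(1, 0), 10))
identical(s2, as.numeric(rep(rep(1:0, each = 2), 5)))
identical(s3, rep(seq(0, 9, by = 3), 5))
```

With length.out the repetition count does not have to be worked out by hand, which helps when the target length is not a multiple of the pattern length.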

4 Plot a histogram of the income variable from the data.

hist(inc_real$income)
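
The default plot can be made easier to read with a title, an axis label, and a finer binning. The sketch below assumes simulated right-skewed values as a stand-in for inc_real$income, so its shape is only illustrative.

```r
# Simulated right-skewed incomes (hypothetical stand-in for inc_real$income)
set.seed(1)
income <- rlnorm(500, meanlog = log(15000), sdlog = 0.4)

# breaks is a suggestion for the number of bins; main and xlab label the plot
h <- hist(income,
          breaks = 30,
          main = "Distribution of income",
          xlab = "Income")

# hist() returns the bin counts, which add up to the number of observations
sum(h$counts)
```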

Assignment 2 - Getting Started with R (4 pts)

John Hope (jah9kqn)

Due Date: 11:59pm, Jan 26
