Param Singh IS607 - week6 assignment
- Choose and load any R dataset (except for diamonds!) that has at least two numeric variables and at least two categorical variables. Identify which variables in your data set are numeric, and which are categorical (factors).
#install.packages("ggplot2")
#install.packages("alr4")
require(ggplot2)
## Loading required package: ggplot2
require(alr4)
## Loading required package: alr4
## Loading required package: car
## Warning: package 'car' was built under R version 3.1.3
## Loading required package: effects
##
## Attaching package: 'effects'
##
## The following object is masked from 'package:car':
##
## Prestige
head(salary)
##    degree rank    sex year ysdeg salary
## 1 Masters Prof   Male   25    35  36350
## 2 Masters Prof   Male   13    22  35350
## 3 Masters Prof   Male   10    23  28200
## 4 Masters Prof Female    7    27  26775
## 5     PhD Prof   Male   19    30  33696
## 6 Masters Prof   Male   16    21  28516
# Identify which variables in the salary dataset are numeric, and which
# variables are categorical
str(salary)
## 'data.frame': 52 obs. of 6 variables:
## $ degree: Factor w/ 2 levels "Masters","PhD": 1 1 1 1 2 1 2 1 2 2 ...
## $ rank : Factor w/ 3 levels "Asst","Assoc",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ sex : Factor w/ 2 levels "Male","Female": 1 1 1 2 1 1 2 1 1 1 ...
## $ year : int 25 13 10 7 19 16 0 16 13 13 ...
## $ ysdeg : int 35 22 23 27 30 21 32 18 30 31 ...
## $ salary: int 36350 35350 28200 26775 33696 28516 24900 31909 31850 32850 ...
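The classification read off the `str()` output above (three factors, three integer columns) can also be computed directly. This is a minimal sketch, assuming the `salary` data frame from `alr4` is available as above; the names `is_numeric` and `is_factor` are illustrative and not part of the original assignment.

```r
library(alr4)  # provides the salary data frame used throughout

# Flag each column by type; the vector names are illustrative
is_numeric <- sapply(salary, is.numeric)
is_factor  <- sapply(salary, is.factor)

names(salary)[is_numeric]  # numeric variables
names(salary)[is_factor]   # categorical variables (factors)
```

Based on the `str()` output, the first call should list `year`, `ysdeg`, and `salary`, and the second `degree`, `rank`, and `sex`.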
- Generate summary-level descriptive statistics: show the mean, median, 25th and 75th percentiles (first and third quartiles), min, and max for each of the applicable variables in the data set.
summary(salary)
##      degree      rank        sex          year            ysdeg
##  Masters:34   Asst :18   Male  :38   Min.   : 0.000   Min.   : 1.00
##  PhD    :18   Assoc:14   Female:14   1st Qu.: 3.000   1st Qu.: 6.75
##               Prof :20               Median : 7.000   Median :15.50
##                                      Mean   : 7.481   Mean   :16.12
##                                      3rd Qu.:11.000   3rd Qu.:23.25
##                                      Max.   :25.000   Max.   :35.00
##      salary
##  Min.   :15000
##  1st Qu.:18247
##  Median :23719
##  Mean   :23798
##  3rd Qu.:27258
##  Max.   :38045
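`summary()` mixes factor counts and numeric statistics in one printout. As an alternative view, the sketch below collects only the requested statistics for the numeric columns into a single matrix. It assumes the `salary` data frame from `alr4`; `num_vars` and `stats` are illustrative names.

```r
library(alr4)

# Keep only the numeric columns
num_vars <- names(salary)[sapply(salary, is.numeric)]

# One column per variable; rows are min, quartiles, median, mean, max
stats <- sapply(salary[num_vars], function(x) {
  c(min    = min(x),
    q25    = quantile(x, 0.25, names = FALSE),
    median = median(x),
    mean   = mean(x),
    q75    = quantile(x, 0.75, names = FALSE),
    max    = max(x))
})
round(stats, 2)
```

The values should agree with the `summary()` output above, up to rounding.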
- Determine the frequency for one of the categorical variables
table(salary$degree)
##
## Masters PhD
## 34 18
- Determine the frequency for one of the categorical variables, by a different categorical variable
table(salary$degree, salary$sex)
##
##           Male Female
##   Masters   24     10
##   PhD       14      4
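Raw counts are hard to compare when the groups differ in size (34 Masters vs. 18 PhD). A possible follow-up, sketched here with base R's `prop.table()`, converts the cross-tabulation to row proportions. The name `counts` is illustrative.

```r
library(alr4)

counts <- table(salary$degree, salary$sex)

# margin = 1 normalises each row, giving the share of each sex within a degree
round(prop.table(counts, margin = 1), 2)
```

Using `margin = 2` instead would give the share of each degree within a sex.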
- Create a graph for a single numeric variable
boxplot(salary$salary)

hist(salary$salary)

# ggplot
qplot(salary, data=salary)
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
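The message above suggests setting the bin width explicitly. A minimal sketch using the full `ggplot()` interface rather than `qplot()` follows; the bin width of 2000 is an illustrative choice, not a value from the assignment, and `p_hist` is a hypothetical name.

```r
library(ggplot2)
library(alr4)

# binwidth = 2000 is an assumed, illustrative value; adjust to taste
p_hist <- ggplot(salary, aes(x = salary)) +
  geom_histogram(binwidth = 2000)
p_hist
```

Setting `binwidth` makes the histogram reproducible across ggplot2 versions and silences the default-binning message.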

- Create a scatterplot of two numeric variables
qplot(ysdeg, salary, data=salary)
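The same scatterplot can be written with `ggplot()` and extended to show one of the categorical variables. This is a sketch only: colouring by `rank` is an illustrative choice that goes beyond what the assignment asks, and `p_scatter` is a hypothetical name.

```r
library(ggplot2)
library(alr4)

# Map the factor `rank` to point colour to see how groups separate
p_scatter <- ggplot(salary, aes(x = ysdeg, y = salary, colour = rank)) +
  geom_point()
p_scatter
```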
