Statistics 4869/6620: Statistical Learning with R
Prof. Eric A. Suess
3/29/2017
The main types of data structures in R
x = c(34,45,56)
y = c(178,132,99)
plot(x,y)
gender = factor(c("F", "M", "F"))
gender
[1] F M F
Levels: F M
subject1 = list(x = x[1], y = y[1],
gender = gender[1])
subject1
$x
[1] 34
$y
[1] 178
$gender
[1] F
Levels: F M
mydata = data.frame(x, y, gender)
mydata
   x   y gender
1 34 178      F
2 45 132      M
3 56  99      F
mydata$x
[1] 34 45 56
mydata$gender
[1] F M F
Levels: F M
mydata[1,]
   x   y gender
1 34 178      F
mydata[,c(2,3)]
    y gender
1 178      F
2 132      M
3  99      F
X = matrix(c(x,y), ncol=2)
X
     [,1] [,2]
[1,]   34  178
[2,]   45  132
[3,]   56   99
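To check which kind of structure an object is, class(), dim(), and str() are handy. The sketch below rebuilds the objects from this section and inspects them; it uses base R only and is meant as an illustration, not as part of the original slides.

```r
# Rebuild the objects used above
x <- c(34, 45, 56)
y <- c(178, 132, 99)
gender <- factor(c("F", "M", "F"))
subject1 <- list(x = x[1], y = y[1], gender = gender[1])
mydata <- data.frame(x, y, gender)
X <- matrix(c(x, y), ncol = 2)

# class() reports the type of each structure
class(x)         # "numeric"
class(gender)    # "factor"
class(subject1)  # "list"
class(mydata)    # "data.frame"
class(X)         # "matrix" (also "array" in R >= 4.0)

# dim() gives rows and columns for the two rectangular structures
dim(mydata)
dim(X)

# str() prints a compact summary of any object
str(mydata)
```

A data frame can hold columns of different types (here numeric and factor), while a matrix holds a single type.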
Set the working directory.
getwd()
setwd("C:\\path\\to\\your\\data")  # replace with the folder holding your data; on Windows each \ must be doubled
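A minimal sketch of changing the working directory and then restoring it. The temporary folder from tempdir() stands in for a real data folder and is only an assumption for illustration.

```r
# Remember where we started
old_wd <- getwd()

# Move to another folder (tempdir() is a stand-in for your data folder)
setwd(tempdir())
getwd()

# List the files R can now see without a full path
list.files()

# Go back to the original folder
setwd(old_wd)
```

On Windows, forward slashes also work in paths, for example "C:/Users", which avoids doubling the backslashes.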
Reading and writing .csv files
usedcars <- read.csv("usedcars.csv", stringsAsFactors = FALSE)
write.csv(mydata, file = "mydata.csv")
In RStudio, try loading the data through the menu:
Environment > Import Dataset >
From Text File… or From Web URL…
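The read.csv() and write.csv() round trip can be sketched as follows. The temporary file path is a hypothetical location chosen so the example runs anywhere, and row.names = FALSE is an optional choice that is not in the slides.

```r
# Small data frame like the one built earlier
mydata <- data.frame(x = c(34, 45, 56),
                     y = c(178, 132, 99),
                     gender = c("F", "M", "F"))

# Write to a temporary file (hypothetical location, not the course folder);
# row.names = FALSE keeps the row numbers out of the file
csv_path <- tempfile(fileext = ".csv")
write.csv(mydata, file = csv_path, row.names = FALSE)

# Read it back, keeping text columns as character vectors
mydata2 <- read.csv(csv_path, stringsAsFactors = FALSE)
str(mydata2)
```

Without row.names = FALSE, write.csv() adds the row numbers as an extra first column, which then shows up as an unnamed column when the file is read back.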
When exploring quantitative/numeric variables we use numerical summaries such as summary(), mean(), sd(), and range().
usedcars <- read.csv("usedcars.csv",
stringsAsFactors = FALSE)
head(usedcars)
  year model price mileage  color transmission
1 2011   SEL 21992    7413 Yellow         AUTO
2 2011   SEL 20995   10926   Gray         AUTO
3 2011   SEL 19995    7351 Silver         AUTO
4 2011   SEL 17809   11613   Gray         AUTO
5 2012    SE 17500    8367  White         AUTO
6 2010   SEL 17495   25125 Silver         AUTO
summary(usedcars$price)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   3800   11000   13590   12960   14900   21990
mean(usedcars$price)
[1] 12961.93
sd(usedcars$price)
[1] 3122.482
range(usedcars$price)
[1] 3800 21992
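A few more base R summaries for a numeric variable, as a minimal sketch. Because usedcars.csv is not reproduced here, the example uses only the six prices shown in the head(usedcars) output, so its results describe those six rows and not the full data set.

```r
# Only the six prices shown by head(usedcars), typed in by hand
price <- c(21992, 20995, 19995, 17809, 17500, 17495)

# Quartiles (minimum, Q1, median, Q3, maximum)
quantile(price)

# Spread of the middle 50% of the values
IQR(price)

# Width of the range: maximum minus minimum
diff(range(price))   # 4497

median(price)        # 18902

# Graphical summaries, drawn only in an interactive session
if (interactive()) {
  hist(price)
  boxplot(price)
}
```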
When exploring qualitative/categorical variables we use
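The slide does not list the tools here. A minimal sketch, assuming one-way frequency tables built with base R's table() and prop.table(), and using only the six colors shown in the head(usedcars) output:

```r
# Only the six colors shown by head(usedcars), typed in by hand
color <- c("Yellow", "Gray", "Silver", "Gray", "White", "Silver")

# Counts per category
color_counts <- table(color)
color_counts

# Proportions per category, shown as percentages
color_props <- prop.table(color_counts)
round(color_props * 100, 1)
```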
When exploring the relationships between quantitative/numeric variables we use
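The slide does not list the tools here. A minimal sketch, assuming a scatterplot with plot() and the correlation from cor(), and using only the six rows shown in the head(usedcars) output:

```r
# Only the six rows shown by head(usedcars), typed in by hand
price   <- c(21992, 20995, 19995, 17809, 17500, 17495)
mileage <- c(7413, 10926, 7351, 11613, 8367, 25125)

# Pearson correlation between mileage and price
# (negative for these six rows: higher mileage, lower price)
r <- cor(mileage, price)
r

# Scatterplot, drawn only in an interactive session
if (interactive()) {
  plot(mileage, price, xlab = "Mileage", ylab = "Price")
}
```

Six rows are far too few to draw conclusions from; the same calls apply to usedcars$mileage and usedcars$price once the full file is loaded.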
When exploring the relationships between qualitative/categorical variables we use the gmodels package:
install.packages("gmodels")
library(gmodels)
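A minimal sketch of a two-way table, assuming the intended tool is CrossTable() from gmodels (the slides load the package but do not show the call). It uses only the six rows shown in the head(usedcars) output; base R's table() gives the plain counts, and CrossTable() is called only if gmodels is installed.

```r
# Only the six rows shown by head(usedcars), typed in by hand
model <- c("SEL", "SEL", "SEL", "SEL", "SE", "SEL")
color <- c("Yellow", "Gray", "Silver", "Gray", "White", "Silver")

# Base R two-way table of counts: rows are models, columns are colors
two_way <- table(model, color)
two_way

# CrossTable() adds row, column, and overall proportions to each cell;
# it runs only when the gmodels package is available
if (requireNamespace("gmodels", quietly = TRUE)) {
  gmodels::CrossTable(x = model, y = color)
}
```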