This is my first Markdown project ever. I will use it to generate an analysis report and submit it online.
We are going to use a data set built into R to do data input, cleaning, summary, exploration, and regression.
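Set the working directory in one of three ways:

+ Menu: Session -> Set Working Directory
+ Console: `setwd("c/user/desktop/IE300/Exam1")` or `setwd(choose.dir())` (`choose.dir()` is available on Windows only)
+ Files pane: More -> Set As Working Directory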
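The console route can be sketched as follows. This is a minimal example, not the exact setup used for the course: the `project_dir` name is made up for illustration, and a temporary folder stands in for the real project folder so the code runs on any machine.

```r
# Remember the current working directory so it can be restored later.
old_wd <- getwd()

# Hypothetical project folder; a temporary directory is used here
# instead of a real path on the desktop.
project_dir <- file.path(tempdir(), "Exam1")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# Switch to the project folder and confirm the change.
setwd(project_dir)
wd_now <- getwd()
print(wd_now)

# Restore the original working directory.
setwd(old_wd)
```

Checking `getwd()` right after `setwd()` is a cheap way to catch a mistyped path before any file is read.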
To read data in:

+ Use the Import Dataset tool in RStudio.
+ CSV file: use `read.csv`.
+ Excel file: use an R package that reads xls files.
+ SAS files and database connections: there are packages for these as well.
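As a small sketch of the CSV case, the built-in `iris` data can be written to a file and read back. The file name `iris_copy.csv` and the temporary folder are assumptions made for the example; a real project would point `read.csv` at its own file.

```r
# Write the built-in iris data to a temporary CSV file
# (a hypothetical file standing in for a real data export).
csv_path <- file.path(tempdir(), "iris_copy.csv")
write.csv(iris, csv_path, row.names = FALSE)

# Read it back; stringsAsFactors = TRUE keeps Species as a factor.
iris_csv <- read.csv(csv_path, stringsAsFactors = TRUE)

# Check that the structure matches the original data set.
str(iris_csv)
```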
Data Analytics Flowchart

+ Insert the PNG image into R Markdown
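```{r}
str(iris)         # structure: column types and first values
names(iris)       # column names
dim(iris)         # number of rows and columns
attributes(iris)  # names, class, and row names
head(iris)        # first six rows
tail(iris)        # last six rows
iris[1:5, ]       # first five rows, all columns
```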
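+ `is.na(iris)`: flags each missing value.
+ `colMeans(iris[, 1:4], na.rm = TRUE)`: column means that ignore missing values (`mean()` does not work on a whole data frame).
+ `iris[!complete.cases(iris), ]`: lists the rows of data that have missing values.
+ `newdata <- na.omit(iris)`: drops the rows that have missing values.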
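The built-in `iris` data has no missing values, so the checks above return nothing interesting on it. A minimal sketch, assuming a copy called `mydata` with a few cells blanked out on purpose (the chosen rows are arbitrary), shows what each step does:

```r
# iris has no missing values, so blank out a few cells in a copy.
mydata <- iris
mydata[c(2, 10), "Sepal.Length"] <- NA
mydata[5, "Petal.Width"] <- NA

# Count missing values per column.
na_counts <- colSums(is.na(mydata))
print(na_counts)

# Column means of the four numeric columns, ignoring missing values.
col_means <- colMeans(mydata[, 1:4], na.rm = TRUE)
print(col_means)

# Rows that contain at least one missing value.
incomplete_rows <- mydata[!complete.cases(mydata), ]
print(incomplete_rows)

# Drop every row that has a missing value.
newdata <- na.omit(mydata)
dim(newdata)
```

Three rows carry a missing value here, so `newdata` keeps 147 of the 150 rows.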
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
You can also embed summaries and plots, for example:
summary(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
## 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
## Median :5.800 Median :3.000 Median :4.350 Median :1.300
## Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
## 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
## Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
## Species
## setosa :50
## versicolor:50
## virginica :50
##
##
##
quantile(iris$Sepal.Length)
## 0% 25% 50% 75% 100%
## 4.3 5.1 5.8 6.4 7.9
quantile(iris$Sepal.Length, c(.1,.03,0.65))
## 10% 3% 65%
## 4.800 4.547 6.200
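The 3% value of 4.547 is not an observed measurement. By default `quantile()` interpolates linearly between neighbouring sorted values (its type 7 method). A rough sketch of that arithmetic, with the variable names `h`, `lo`, and `manual_q` chosen only for this example:

```r
x <- sort(iris$Sepal.Length)
n <- length(x)
p <- 0.03

# Fractional position of the 3% quantile in the sorted data (type 7).
h <- 1 + (n - 1) * p
lo <- floor(h)

# Interpolate linearly between the two neighbouring sorted values.
manual_q <- x[lo] + (h - lo) * (x[lo + 1] - x[lo])

# The manual value and the built-in result should agree.
manual_q
quantile(iris$Sepal.Length, p)
```

With 150 observations the position is 5.47, so the result lies 47% of the way from the 5th sorted value (4.5) to the 6th (4.6).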
var(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## Sepal.Length 0.6856935 -0.0424340 1.2743154 0.5162707 NA
## Sepal.Width -0.0424340 0.1899794 -0.3296564 -0.1216394 NA
## Petal.Length 1.2743154 -0.3296564 3.1162779 1.2956094 NA
## Petal.Width 0.5162707 -0.1216394 1.2956094 0.5810063 NA
## Species NA NA NA NA NA
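The `NA` row and column come from `Species`, which is a factor and has no numeric variance. One way to avoid them, sketched here with the helper name `iris_num` as an assumption, is to keep only the numeric columns before calling `var()` or `cor()`:

```r
# Keep only the numeric columns (this drops the Species factor).
iris_num <- iris[sapply(iris, is.numeric)]

# Variance-covariance matrix of the four measurements.
cov_mat <- var(iris_num)
print(cov_mat)

# Correlation matrix on the same columns.
cor_mat <- cor(iris_num)
print(round(cor_mat, 3))
```

Selecting columns with `is.numeric` rather than by position keeps the code working if the column order changes.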
hist(iris$Sepal.Length)
table(iris$Sepal.Length)
##
## 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6
## 1 3 1 4 2 5 6 10 9 4 1 6 7 6 8 7 3 6
## 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7 7.1 7.2 7.3 7.4 7.6 7.7 7.9
## 6 4 9 7 5 2 8 3 4 1 1 3 1 1 1 4 1
plot(table(iris$Species))
plot(density(iris$Sepal.Length))
attach(iris)
cor(Sepal.Length,Petal.Length)
## [1] 0.8717538
detach(iris)
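The same correlation can be computed without `attach()` and `detach()`. This is a small sketch of two alternatives; the names `r_with` and `r_dollar` are made up for the example:

```r
# Evaluate the expression inside iris without changing the search path.
r_with <- with(iris, cor(Sepal.Length, Petal.Length))
r_with

# Equivalent call with explicit column references.
r_dollar <- cor(iris$Sepal.Length, iris$Petal.Length)
r_dollar
```

Both forms leave the search path untouched, so there is nothing to undo afterwards and no risk of a forgotten `detach()`.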
boxplot(Sepal.Length~Species, data=iris,xlab="Species",ylab="Sepal.Length")
with(iris,plot(Sepal.Length,Sepal.Width,col=(Species)))
pairs(iris)
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.