EXAM 1

This is my first R Markdown project. I will use it to generate an analysis report and submit it online.

Objective

We use built-in R data sets (mainly iris) to practice data input, cleaning, summary statistics, exploration, and regression.

Set Working Directory

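Any of these sets the working directory:

  • RStudio menu: Session -> Set Working Directory
  • Code: setwd("C:/user/desktop/IE300/Exam1"), with forward slashes in the path
  • Code with a folder picker: setwd(choose.dir()) (choose.dir() is available on Windows only)
  • RStudio Files pane: More -> Set As Working Directory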
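A minimal sketch of switching the working directory in code and then switching back. The folder IE300_Exam1 under tempdir() is a hypothetical stand-in so the example runs on any machine; replace it with your own project path.

```r
# Remember where R is currently working so we can switch back afterwards
old_wd <- getwd()

# Hypothetical project folder created under the session's temp directory;
# in practice use your own path, e.g. the Exam1 folder on your desktop
project_dir <- file.path(tempdir(), "IE300_Exam1")
dir.create(project_dir, showWarnings = FALSE)

setwd(project_dir)   # same effect as Session -> Set Working Directory
getwd()              # prints the new working directory
list.files()         # files R can now open without a full path

setwd(old_wd)        # restore the original working directory
```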

Input Data in R

Ways to read data into R:

  • RStudio's Import Dataset tool
  • CSV file: read.csv()
  • Excel file: an add-on package such as readxl or xlsx
  • SAS file: an add-on package such as haven
  • Database: connect through a package such as DBI
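A minimal sketch of reading a CSV file with read.csv(). Since the real data file is not part of this report, the example first writes iris to a temporary file (csv_path is a hypothetical name). The Excel and SAS lines are commented out because they need the readxl and haven packages, and mydata.xlsx and mydata.sas7bdat are placeholder file names.

```r
# Stand-in for your own data file: write iris to a temporary CSV
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# read.csv() is base R; stringsAsFactors = TRUE turns text columns into factors
mydata <- read.csv(csv_path, stringsAsFactors = TRUE)
str(mydata)

# Other formats need add-on packages (not run here), for example:
# mydata <- readxl::read_excel("mydata.xlsx")   # Excel
# mydata <- haven::read_sas("mydata.sas7bdat")  # SAS
```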

Data Analytics Workflow

Data Analytics Flowchart (insert the flowchart PNG image into the R Markdown document)

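Step 1 - Clean Data

  • fix(iris): open the data set in a spreadsheet-style editor
  • str(iris): show the structure (column types and first values)
  • names(iris): list the column names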

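```{r}
str(iris)         # structure: column types and sample values
names(iris)       # column names
dim(iris)         # number of rows and columns
attributes(iris)  # names, class and row names
head(iris)        # first 6 rows
tail(iris)        # last 6 rows
iris[1:5, ]       # rows 1 to 5, all columns
```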
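A quick check of column types before cleaning, sketched below. sapply() applies class() to every column, and the numeric-only copy (num_iris, a name chosen for this example) is the kind of input that var() and cor() expect, since they work only on numeric data.

```r
# Class of each column: four numeric measurements plus the Species factor
col_classes <- sapply(iris, class)
col_classes

# Keep only the numeric columns for variance and correlation calculations
num_iris <- iris[, sapply(iris, is.numeric)]
dim(num_iris)   # 150 rows, 4 columns
```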

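Step 1 - Missing Values

  • is.na(iris): TRUE for every missing cell, FALSE otherwise
  • mean(iris$Sepal.Length, na.rm = TRUE): mean of one numeric column, skipping missing values (mean() does not work on a whole data frame)
  • iris[!complete.cases(iris), ]: the rows that have at least one missing value
  • newdata <- na.omit(iris): a copy of the data with incomplete rows removed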
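iris itself has no missing values, so these commands return all FALSE or empty results on it. The sketch below blanks out two cells in a copy (rows 3 and 10 are arbitrary choices for illustration) to show what each command reports.

```r
# iris has no missing values, so work on a copy with two cells blanked out
mydata <- iris
mydata[c(3, 10), "Sepal.Length"] <- NA

colSums(is.na(mydata))                  # missing values per column
mean(mydata$Sepal.Length, na.rm = TRUE) # mean of the 148 non-missing values
mydata[!complete.cases(mydata), ]       # rows with at least one NA

newdata <- na.omit(mydata)              # drop incomplete rows
nrow(newdata)                           # 148
```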

R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
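To see what each row of summary() means, the sketch below recomputes the six statistics for cars$speed one at a time (speed_stats is a name chosen for this example). The quartiles come from quantile(), which summary() also uses.

```r
# What each row of summary() means, computed one statistic at a time
speed <- cars$speed
speed_stats <- c(
  min    = min(speed),
  q1     = unname(quantile(speed, 0.25)),  # "1st Qu."
  median = median(speed),
  mean   = mean(speed),
  q3     = unname(quantile(speed, 0.75)),  # "3rd Qu."
  max    = max(speed)
)
speed_stats
```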

Including Plots

You can also embed plots. The chunks below compute summary statistics for iris and then draw several plots:

summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
quantile(iris$Sepal.Length)
##   0%  25%  50%  75% 100% 
##  4.3  5.1  5.8  6.4  7.9
quantile(iris$Sepal.Length, c(.1,.03,0.65))
##   10%    3%   65% 
## 4.800 4.547 6.200
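quantile() uses type = 7 by default: for probability p it finds the fractional position h = (n - 1)p + 1 in the sorted data and interpolates linearly between the two neighboring values. The sketch below reproduces the 3% quantile this way; for p = 0.03 and n = 150, h = 5.47, which falls between the 5th sorted value (4.5) and the 6th (4.6).

```r
x <- sort(iris$Sepal.Length)   # sorted data
n <- length(x)                 # 150 observations
p <- 0.03                      # probability of interest
h <- (n - 1) * p + 1           # fractional position: 5.47
lo <- floor(h)                 # 5th value (4.5); the 6th value is 4.6
manual_q <- x[lo] + (h - lo) * (x[lo + 1] - x[lo])
manual_q                        # 4.547
quantile(iris$Sepal.Length, p)  # same value, labeled 3%
```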
var(iris[, 1:4])
##              Sepal.Length Sepal.Width Petal.Length Petal.Width
## Sepal.Length    0.6856935  -0.0424340    1.2743154   0.5162707
## Sepal.Width    -0.0424340   0.1899794   -0.3296564  -0.1216394
## Petal.Length    1.2743154  -0.3296564    3.1162779   1.2956094
## Petal.Width     0.5162707  -0.1216394    1.2956094   0.5810063
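Each diagonal entry of the variance matrix is an ordinary sample variance. The sketch below recomputes the Sepal.Length entry from its definition (divide by n - 1, not n) and takes its square root to get the standard deviation.

```r
# Sample variance by hand: squared deviations from the mean divided by n - 1
x <- iris$Sepal.Length
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)
manual_var        # 0.6856935, the Sepal.Length diagonal entry
sqrt(manual_var)  # the standard deviation, same as sd(x)
```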
hist(iris$Sepal.Length)
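hist() can also return its bins without drawing anything when plot = FALSE is set. The sketch below shows the bin edges (breaks) and the number of flowers in each bin (counts), which are the bar heights drawn by hist(iris$Sepal.Length).

```r
# Bin edges and counts behind the histogram, without drawing it
h <- hist(iris$Sepal.Length, plot = FALSE)
h$breaks        # bin edges
h$counts        # flowers per bin
sum(h$counts)   # every one of the 150 observations falls in some bin
```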

table(iris$Sepal.Length)
## 
## 4.3 4.4 4.5 4.6 4.7 4.8 4.9   5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9   6 
##   1   3   1   4   2   5   6  10   9   4   1   6   7   6   8   7   3   6 
## 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9   7 7.1 7.2 7.3 7.4 7.6 7.7 7.9 
##   6   4   9   7   5   2   8   3   4   1   1   3   1   1   1   4   1
plot(table(iris$Species))

plot(density(iris$Sepal.Length))

attach(iris)
cor(Sepal.Length,Petal.Length)
## [1] 0.8717538
detach(iris)
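Pearson correlation is the covariance divided by the product of the two standard deviations. The sketch below recomputes the value above from that formula and then prints the correlation matrix of the four numeric columns, rounded to three decimals.

```r
# Pearson correlation = covariance / (sd of x * sd of y)
x <- iris$Sepal.Length
y <- iris$Petal.Length
manual_r <- cov(x, y) / (sd(x) * sd(y))
manual_r   # same as cor(x, y)

# Correlation matrix of the four numeric columns
round(cor(iris[, 1:4]), 3)
```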

boxplot(Sepal.Length ~ Species, data = iris, xlab = "Species", ylab = "Sepal.Length")
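The box plot is built from Tukey's five-number summary, which boxplot() computes with fivenum() through boxplot.stats(). The sketch below prints that summary for each species. The box spans the lower and upper hinges with a line at the median. The whiskers stop at the most extreme points within 1.5 box lengths of the box, so when there are outliers the whiskers are shorter than the minimum and maximum listed here.

```r
# Min, lower hinge, median, upper hinge, max of Sepal.Length per species
five <- tapply(iris$Sepal.Length, iris$Species, fivenum)
five
```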

with(iris, plot(Sepal.Length, Sepal.Width, col = Species))
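With col = Species, each factor level is drawn in a palette color (levels 1, 2 and 3), but nothing on the plot says which color is which. A sketch that adds a legend; pch = 19 (filled points) and the "topright" position are arbitrary choices.

```r
# Color points by species: factor levels map to palette colors 1, 2, 3
with(iris, plot(Sepal.Length, Sepal.Width, col = Species, pch = 19))

# Legend entries use the same level order and color numbers
legend("topright",
       legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)),
       pch = 19)
```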

pairs(iris)

To show a plot without the R code that generated it, add the echo = FALSE option to the code chunk header.