CSC 360 Lecture 3 Notes

Harold Nelson

March 29, 2016

R Markdown

This is an R Markdown presentation. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.

Getting Organized

The Working Directory

What Kind of File?

Looking for Bad Data

There are some things you should always do, but they won’t find all problems.

The more questions you ask, the more problems you’ll find.

For the data frame as a whole, run head(), tail(), str(), and summary().
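As a minimal sketch of these whole-data-frame checks, the example below uses R's built-in mtcars data set as a stand-in. The notes do not name a particular data set, so that choice is an assumption.

```r
# mtcars ships with base R; it stands in for whatever data frame you loaded.
df <- mtcars

head(df)      # first six rows
tail(df)      # last six rows
str(df)       # column types and a preview of the values
summary(df)   # per-column summaries, including NA counts where present
```

Reading the four outputs side by side is a quick way to spot columns that were read with the wrong type or rows that look like stray headers or footers.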

For numeric variables:

- hist(x)
- summary(x)
- boxplot(x)
- plot(density(x))
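The four numeric checks can be tried on a small made-up vector. The data below are invented purely for illustration (heights in centimetres with one value keyed in as 1700 instead of 170); nothing here comes from the course data.

```r
# Made-up illustrative data: 50 plausible heights (cm) plus one
# entry that was typed as 1700 instead of 170.
set.seed(1)
x <- c(rnorm(50, mean = 170, sd = 8), 1700)

summary(x)         # the maximum sits far away from the quartiles
hist(x)            # most values pile up at the left, one lies far to the right
boxplot(x)         # the bad value is drawn as an isolated point
plot(density(x))   # the density has a long right tail
```

Each view makes the same problem visible in a different way, which is why it is worth running all four rather than relying on one.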

For qualitative variables.
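The notes do not list specific functions for qualitative variables. One common choice, offered here as an assumption rather than as the lecture's method, is to count the distinct values with table() and list them with unique(). The data are made up for illustration.

```r
# Made-up illustrative data with inconsistent spellings and a missing value.
sex <- c("M", "F", "f", "M", "Male", NA, "F")

counts <- table(sex, useNA = "ifany")  # one count per distinct value, NA included
counts

unique(sex)                            # the distinct values actually present
```

Unexpected categories such as "f" and "Male" alongside "F" and "M" are the qualitative counterpart of an outlier in a numeric variable.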

Relationships

Consequences of Cleaning

You need to examine the data you have left after taking care of the bad data.

Do you still have a random sample of your population?
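One way to start on that question is to compare the rows you kept with the rows you dropped. The sketch below uses R's built-in airquality data set and treats rows with a missing Ozone value as the "bad data". Both choices are assumptions made only for illustration.

```r
# airquality ships with base R; its Ozone column contains missing values.
df <- airquality

kept    <- df[!is.na(df$Ozone), ]  # rows that survive the cleaning step
dropped <- df[is.na(df$Ozone), ]   # rows the cleaning step removed

nrow(df)
nrow(kept)
nrow(dropped)

# Compare how the rows are spread across months before and after cleaning.
table(df$Month)
table(kept$Month)
```

If the dropped rows cluster in particular months, the cleaned data no longer cover the period evenly, and summaries computed from them may not describe the original population.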