Collect and document several datasets. There are three main steps.
First we need to find a dataset and download the raw data (in .txt or .csv).
# Load the datafile into R as a new object.
# Give the data a short name (in this case, just abalone)
abalone <- read.table("~/Dropbox/datarepository/txt_files/abalone.txt",
                      sep = ",",                # How are columns separated? (usually ",")
                      header = FALSE,           # If there is a header row, use header = TRUE
                      stringsAsFactors = FALSE  # Always include this: keeps text columns as character
)
Once you’ve loaded the data, take a quick look at it with head() to make sure it loaded correctly. Here are the first few rows of abalone:
head(abalone)
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1 M 0.455 0.365 0.095 0.5140 0.2245 0.1010 0.150 15
## 2 M 0.350 0.265 0.090 0.2255 0.0995 0.0485 0.070 7
## 3 F 0.530 0.420 0.135 0.6770 0.2565 0.1415 0.210 9
## 4 M 0.440 0.365 0.125 0.5160 0.2155 0.1140 0.155 10
## 5 I 0.330 0.255 0.080 0.2050 0.0895 0.0395 0.055 7
## 6 I 0.425 0.300 0.095 0.3515 0.1410 0.0775 0.120 8
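Beyond head(), a couple of quick checks can confirm that each column was parsed with the expected type. The sketch below is an illustration only: it reads the first three rows shown above from an inline string (using the text argument of read.table()) instead of the actual file, so it runs without the Dropbox path. The name abalone_sample is made up for this example.

```r
# Illustrative sample: the first three rows shown above, supplied inline
# instead of being read from abalone.txt.
sample_lines <- "M,0.455,0.365,0.095,0.5140,0.2245,0.1010,0.150,15
M,0.350,0.265,0.090,0.2255,0.0995,0.0485,0.070,7
F,0.530,0.420,0.135,0.6770,0.2565,0.1415,0.210,9"

abalone_sample <- read.table(text = sample_lines,
                             sep = ",",
                             header = FALSE,
                             stringsAsFactors = FALSE)

dim(abalone_sample)            # number of rows and columns
str(abalone_sample)            # compact overview of every column
sapply(abalone_sample, class)  # V1 character, V2 to V8 numeric, V9 integer
```

If a column that should be numeric comes back as character, the separator or header setting is the first thing to check.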
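Because the file has no header row, R assigns the default names V1 to V9, so the next step is to add column names. Use short names (no more than 10 to 15 characters), all in lower case, with no spaces.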
# Add column names
# The names should all be in lower-case and shouldn't be too long.
names(abalone) <- c("sex", "length", "diameter", "height", "whole_weight",
"shucked_weight", "viscera_weight", "shell_weight", "rings")
Here’s how the data look now:
head(abalone)
## sex length diameter height whole_weight shucked_weight viscera_weight
## 1 M 0.455 0.365 0.095 0.5140 0.2245 0.1010
## 2 M 0.350 0.265 0.090 0.2255 0.0995 0.0485
## 3 F 0.530 0.420 0.135 0.6770 0.2565 0.1415
## 4 M 0.440 0.365 0.125 0.5160 0.2155 0.1140
## 5 I 0.330 0.255 0.080 0.2050 0.0895 0.0395
## 6 I 0.425 0.300 0.095 0.3515 0.1410 0.0775
## shell_weight rings
## 1 0.150 15
## 2 0.070 7
## 3 0.210 9
## 4 0.155 10
## 5 0.055 7
## 6 0.120 8
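The naming rules above can also be checked programmatically. The following is a minimal sketch: check_names() is a hypothetical helper, not part of the repository, and the 15-character limit is an assumption taken from the upper end of the guideline.

```r
# Hypothetical helper: returns the names that break any naming rule
# (lower case only, no spaces, at most max_chars characters).
check_names <- function(x, max_chars = 15) {
  bad <- x != tolower(x) | grepl(" ", x, fixed = TRUE) | nchar(x) > max_chars
  x[bad]
}

abalone_names <- c("sex", "length", "diameter", "height", "whole_weight",
                   "shucked_weight", "viscera_weight", "shell_weight", "rings")

check_names(abalone_names)                      # character(0): all names pass
check_names(c("Sex", "whole weight", "rings"))  # returns "Sex" and "whole weight"
```

Returning the offending names, rather than just TRUE or FALSE, makes it obvious which columns need renaming.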
Save the data object as an .RData file with save():
save(abalone, file = "~/Dropbox/datarepository/RData_files/abalone.RData")
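To confirm that the file was written correctly, it can be loaded back and compared with the original object. The sketch below uses a temporary file and a tiny stand-in data frame (both assumptions for illustration) so that it does not touch the repository.

```r
# Stand-in for the real dataset (illustration only)
abalone <- data.frame(sex = c("M", "F"), rings = c(15L, 9L),
                      stringsAsFactors = FALSE)

tmp_file <- tempfile(fileext = ".RData")  # temporary path, not the repository
save(abalone, file = tmp_file)

original <- abalone  # keep a copy for comparison
rm(abalone)          # remove the object so load() has to restore it

loaded_names <- load(tmp_file)  # load() returns the names of the restored objects
loaded_names                    # "abalone"
identical(abalone, original)    # TRUE if the round trip preserved the data
```

Note that load() restores the object under the name it was saved with, which is why a short, stable object name matters.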
Now you need to document the dataset in R by creating a new file called NAME_doc.R:
Open the template_doc.R file (located in the documentation_files folder) in RStudio.
Save it as NAME_doc.R (where NAME is the name of the dataset) in the documentation_files folder (e.g., abalone_doc.R).
Fill in the documentation for the dataset.
Here’s how a completed version of abalone_doc.R should look:
#' abalone dataset
#'
#' Predict the age of abalone from physical measurements
#'
#' @format A data frame containing 4177 rows and 9 columns
#' \describe{
#' \item{sex}{either M, F, or I (infant)}
#' \item{length}{Longest shell measurement}
#' \item{diameter}{perpendicular to length}
#' \item{height}{with meat in shell}
#' \item{whole_weight}{whole abalone}
#' \item{shucked_weight}{weight of meat}
#' \item{viscera_weight}{gut weight (after bleeding)}
#' \item{shell_weight}{after being dried}
#' \item{rings}{+1.5 gives the age in years}
#' ...
#' }
#' @source http://archive.ics.uci.edu/ml/datasets/Abalone
#' @export
#'
"abalone"