SETUP

Begin by setting a working directory for the analysis.

Install and load the required packages.
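
A minimal sketch of this setup step. The directory name and the helper ensure_package() are hypothetical and only serve as an illustration; the original task does not say which directory or packages were used.

```r
# Hypothetical working directory; replace with your own project folder.
work_dir <- file.path(tempdir(), "wr_analysis")
dir.create(work_dir, showWarnings = FALSE, recursive = TRUE)
setwd(work_dir)   # make it the working directory
getwd()           # confirm where R now reads and writes files

# Hypothetical helper: install a package only if it is missing, then load it.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

ensure_package("utils")  # ships with R, so nothing is downloaded here
```

Relative paths such as "worldRecords.csv" are resolved against the working directory, so the CSV file needs to be placed there before it is read.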

DATA DESCRIPTION

For this task, I chose a dataset from the following open-source collection.

https://vincentarelbundock.github.io/Rdatasets/datasets.html?fbclid=IwAR1zE57wqKeMzPiTb8uFFK0nuIRkmMSXfcDFuifyP4BhCwTy_jEdncyVZIg

File download link - https://vincentarelbundock.github.io/Rdatasets/csv/DAAG/worldRecords.csv

The file is named "worldRecords" and is in CSV format. It contains world record times for track and road running races, set over the years at different locations.

Some characteristics of the selected dataset: it has 40 observations of 6 variables (X, Distance, roadORtrack, Place, Time and Date).

READ/IMPORT DATA

Step 1: WR <- read.csv("worldRecords.csv") - The file worldRecords.csv is imported into R.

Step 2: head(WR) - The function head() returns the first six rows of WR by default, which gives a quick look at the imported data.


WR <- read.csv("worldRecords.csv")
head(WR)

Step 3: WR.df <- data.frame(WR) - The imported data is saved as a data frame. read.csv() already returns a data frame, so this step only stores a copy under a new name.


WR.df <- data.frame(WR)
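
The import step can be tried without the downloaded file by passing inline text to read.csv(). The sketch below uses a tiny made-up CSV in roughly the same shape as worldRecords.csv; the values are illustrative and are not taken from the dataset.

```r
# Made-up CSV text: the first header field is empty, as it would be for a
# column of row numbers written without a name.
csv_text <- '"","Distance","roadORtrack"
"1",0.1,"track"
"2",10,"road"'

demo <- read.csv(text = csv_text)

is.data.frame(demo)   # read.csv() already returns a data frame
names(demo)           # the unnamed first column is given the name "X"
head(demo, n = 1)     # head() shows the first rows, here limited to one
```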

INSPECT and UNDERSTAND

This step inspects and manipulates the data frame with respect to its dimensions, data types and structure.

The dimensions of the data frame are obtained with dim(WR.df), which returns the number of rows followed by the number of columns.


dim(WR.df)
[1] 40  6
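
The two numbers returned by dim() can also be obtained separately with nrow() and ncol(). A small sketch on a made-up data frame (names and values are illustrative only):

```r
demo <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.0, 1.2, 9.9))

d      <- dim(demo)    # rows first, then columns
n_rows <- nrow(demo)   # same as d[1]
n_cols <- ncol(demo)   # same as d[2]
```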

The data type is the type of data a variable holds, for example numeric, character, integer, factor or logical. The function typeof() returns the internal storage type of an object. Applied to the whole data frame it returns "list", because a data frame is stored as a list of columns.


typeof(WR.df)
[1] "list"

attributes(WR.df)
$names
[1] "X"           "Distance"    "roadORtrack" "Place"       "Time"        "Date"       

$class
[1] "data.frame"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

typeof(WR.df$X)
[1] "integer"

typeof(WR.df$Distance)
[1] "double"

typeof(WR.df$roadORtrack)   # a factor is stored as integer codes
[1] "integer"

typeof(WR.df$Place)         # factor
[1] "integer"

typeof(WR.df$Time)
[1] "double"

typeof(WR.df$Date)          # factor
[1] "integer"
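
To check every column in one call, sapply() can apply typeof() or class() across the data frame. Note that typeof() must be given the column itself (for example df$X), not the quoted column name, otherwise it reports the type of the string. A minimal sketch on a made-up data frame (column names and values are illustrative only):

```r
demo <- data.frame(
  id    = 1:3,
  dist  = c(0.1, 0.2, 0.4),
  kind  = factor(c("track", "track", "road")),
  place = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

whole_type  <- typeof(demo)          # storage type of the data frame itself
col_types   <- sapply(demo, typeof)  # storage type of each column
col_classes <- sapply(demo, class)   # class of each column (shows "factor")
quoted_type <- typeof("id")          # type of a string, not of the column
```

class() is usually the more informative of the two for factors, since typeof() only reports the integer codes used to store them.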

Categorical (factor) variables use labels to categorise observations. These labels are stored as levels, which have an order and can be renamed or rearranged.


rORt <- factor(WR$roadORtrack, labels = c("road", "track"), levels = c("road", "track"))
levels(rORt)    
[1] "road"  "track"
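
Renaming and rearranging levels can be sketched as follows. The vector surface is made up for the illustration and does not come from the dataset.

```r
surface <- c("track", "road", "track", "road")   # made-up values

# Default level order given explicitly.
f <- factor(surface, levels = c("road", "track"))

# Rename the levels: same order, new labels.
f_renamed <- factor(surface, levels = c("road", "track"),
                    labels = c("Road", "Track"))

# Rearrange the levels so that "track" comes first.
f_reordered <- factor(surface, levels = c("track", "road"))
codes <- as.integer(f_reordered)   # integer codes follow the new order
```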

The column names of a data frame are obtained with the function colnames().


colnames(WR.df)
[1] "X"           "Distance"    "roadORtrack" "Place"       "Time"        "Date"       

The first column was named "X" by read.csv() because that column has no name in the CSV header. It can be renamed to "Sl.No" as follows.


colnames(WR.df)[1] <- c("Sl.No")   
colnames(WR.df)    
[1] "Sl.No"       "Distance"    "roadORtrack" "Place"       "Time"        "Date"       
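
Renaming by position works here, but it silently renames the wrong column if the column order ever changes. A sketch of renaming by the current name instead, on a made-up data frame:

```r
demo <- data.frame(X = 1:3, Distance = c(0.1, 0.2, 0.4))

# Select the column by its current name rather than by its index.
names(demo)[names(demo) == "X"] <- "Sl.No"
names(demo)
```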

SUBSET 1

Subset the first 10 rows of the data frame, keeping all variables.


WR.sub.df <- WR.df[1:10, ]
WR.sub.df 

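Conversion of data frame to matrix. Note that matrix() applied to a data frame does not produce a matrix of cell values: the data frame is treated as a list, so the result is a 6 x 1 list-matrix with one cell per column. as.matrix() is the usual function for a cell-by-cell conversion.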


WR.mat <- matrix(WR.sub.df)
WR.mat    
     [,1]      
[1,] Integer,10
[2,] Numeric,10
[3,] factor,10 
[4,] factor,10 
[5,] Numeric,10
[6,] factor,10 

Structure of the matrix:

str(WR.mat)
List of 6
 $ : int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ : num [1:10] 0.1 0.15 0.2 0.3 0.4 0.5 0.6 0.8 1 1.5
 $ : Factor w/ 2 levels "road","track": 2 2 2 2 2 2 2 2 2 2
 $ : Factor w/ 33 levels "Alphen aan den Rijn",..: 2 9 3 27 31 7 30 13 28 29
 $ : num [1:10] 0.163 0.247 0.322 0.514 0.72 ...
 $ : Factor w/ 37 levels "1978-10-28","1980-06-07",..: 32 5 14 26 23 7 9 18 24 21
 - attr(*, "dim")= int [1:2] 6 1
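
For a cell-by-cell conversion, as.matrix() and data.matrix() are the usual base R choices. The sketch below compares them with matrix() on a made-up data frame (values are illustrative only).

```r
demo <- data.frame(
  Distance = c(0.1, 0.2, 0.4),
  Time     = c(0.163, 0.322, 0.72),
  Surface  = factor(c("track", "track", "road"))
)

m_list <- matrix(demo)       # 3 x 1 list-matrix, one cell per column
m_chr  <- as.matrix(demo)    # 3 x 3 character matrix, since the types are mixed
m_num  <- data.matrix(demo)  # 3 x 3 numeric matrix, factors become integer codes

# Keeping only numeric columns gives a numeric matrix with as.matrix().
m_only_numeric <- as.matrix(demo[, c("Distance", "Time")])
```

A matrix holds a single type, so mixed columns are either coerced to character (as.matrix) or to numbers (data.matrix); which one is appropriate depends on what the matrix is used for.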

SUBSET 2

Subset the data frame to keep only the first and last variables.


WR.sub1.df <- WR.df[, c(1,6)]
WR.sub1.df

Saving as R object file.


save(WR.sub1.df, file = "WR.sub1.df.rdata")
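
A saved object can be restored with load(), which recreates it under its original name. The sketch below shows a round trip on a made-up data frame written to a temporary directory; the file names and values are assumptions for the illustration. saveRDS() and readRDS() are shown as an alternative that stores a single object and lets the reader choose the name on loading.

```r
demo <- data.frame(Sl.No = 1:3,
                   Date  = c("2001-01-01", "2002-02-02", "2003-03-03"))

# Round trip with save() / load().
rdata_path <- file.path(tempdir(), "demo.rdata")
save(demo, file = rdata_path)
rm(demo)
restored_names <- load(rdata_path)   # returns the names of the restored objects

# Alternative: one object per file, name chosen when reading.
rds_path <- file.path(tempdir(), "demo.rds")
saveRDS(demo, rds_path)
demo_copy <- readRDS(rds_path)
```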

CREATING A NEW DATA FRAME

A new data frame with 2 variables and 4 observations is created here. The variables are Building and Level.


newdf <- data.frame(Building = 80:83, Level = c("A", "B", "C", "D"))
newdf

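The structure of each variable and the levels of the factor variable are obtained as follows. The columns are referenced through the data frame (newdf$Building), not by a quoted name.


str(newdf$Building)
 int [1:4] 80 81 82 83

str(newdf$Level)
 Factor w/ 4 levels "A","B","C","D": 1 2 3 4

levels(newdf$Level)
[1] "A" "B" "C" "D"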
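
If Level is meant to be ordinal, one option is to create it as an ordered factor, so that its levels carry a ranking. This is a sketch under the assumption that A < B < C < D is the intended order; the original does not state an order.

```r
# Assumed ranking: A is the lowest level and D the highest.
level <- factor(c("A", "B", "C", "D"),
                levels = c("A", "B", "C", "D"),
                ordered = TRUE)

ordered_flag <- is.ordered(level)   # TRUE for an ordinal factor
level_names  <- levels(level)
below_c      <- level < "C"         # comparisons work on ordered factors

str(level)
```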

Creating a numeric vector and adding it to data frame using cbind().


Num <- c(1, 2, 3, 4)
newdf1 <- cbind(newdf, Num)
newdf1

Attributes and dimensions of the new data frame:


attributes(newdf1)
$names
[1] "Building" "Level"    "Num"     

$class
[1] "data.frame"

$row.names
[1] 1 2 3 4

dim(newdf1)
[1] 4 3
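
Besides cbind(), a column can be added by assigning to a new name with the $ operator. A brief sketch comparing the two on the same made-up data; the object names base_df, via_cbind and via_dollar are hypothetical.

```r
base_df <- data.frame(Building = 80:83, Level = c("A", "B", "C", "D"))
Num <- c(1, 2, 3, 4)

# cbind() returns a new data frame with the vector appended as a column.
via_cbind <- cbind(base_df, Num)

# $<- adds the column to a copy in place.
via_dollar <- base_df
via_dollar$Num <- Num

all.equal(via_cbind, via_dollar)
```

In both cases the vector must have the same number of elements as the data frame has rows (or a length that recycles evenly), otherwise R raises an error.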




