Data Frame

A data frame is a two-dimensional data structure in R. It is a list whose components all have the same length. Each column in a data frame can store a different class of object.
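
The list-like nature of a data frame can be checked directly. The following is a minimal sketch; the name `example_df` and its columns are made up for illustration, and `stringsAsFactors = FALSE` is set so the result does not depend on the R version.

```r
# Illustrative data frame (example_df is a hypothetical name for this sketch)
example_df <- data.frame(id = 1:3,
                         flag = c(TRUE, FALSE, TRUE),
                         label = c("x", "y", "z"),
                         stringsAsFactors = FALSE)

# A data frame is a list under the hood
is.list(example_df)
## [1] TRUE

# Each column keeps its own class
column_classes <- sapply(example_df, class)
column_classes

# Every component has the same length, which is the number of rows
column_lengths <- lengths(example_df)
column_lengths
```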

Creating a data frame

We can create a data frame by importing it from an external file (JSON, CSV), by loading one of R's built-in datasets, or by creating it manually.
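
Importing from a CSV file can be sketched with base R alone. The example below writes a small data frame to a temporary file and reads it back; the file location and the names `csv_path`, `original`, and `imported` are assumptions made for this sketch. JSON is not shown because base R has no JSON reader; an add-on package such as jsonlite is commonly used for that.

```r
# Write a small data frame to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")  # hypothetical location for this sketch
original <- data.frame(id = 1:3, name = c("a", "b", "c"))
write.csv(original, csv_path, row.names = FALSE)

# read.csv() returns a data frame
imported <- read.csv(csv_path, stringsAsFactors = FALSE)
str(imported)

unlink(csv_path)  # clean up the temporary file
```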

Manually creating a data frame

df <- data.frame(int = 1:3, boolean = c(T,T,F), str= c("one","two","three") )
print(df)
##   int boolean   str
## 1   1    TRUE   one
## 2   2    TRUE   two
## 3   3   FALSE three

A data frame can also be created from existing vectors

a = c(1,2,3)
b = c(T,F,T)
df <- data.frame(a,b)
print(df)
##   a     b
## 1 1  TRUE
## 2 2 FALSE
## 3 3  TRUE

When creating a data frame, each component needs to have the same length (or a length that divides evenly into the longest one, in which case it is recycled); otherwise data.frame() raises an error.

try(data.frame(int = 1:3, boolean = c(T,T)))
## Error in data.frame(int = 1:3, boolean = c(T, T)) : 
##   arguments imply differing number of rows: 3, 2
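
One caveat worth knowing, shown here as a small sketch: `data.frame()` recycles a shorter vector when its length divides evenly into the longest one, so only lengths that do not divide evenly trigger the error above. The names `recycled` and `result` are illustrative.

```r
# Length 2 divides evenly into length 4, so the shorter vector is recycled
recycled <- data.frame(int = 1:4, boolean = c(TRUE, FALSE))
recycled$boolean
## [1]  TRUE FALSE  TRUE FALSE

# Length 2 does not divide evenly into length 3, so this call fails
result <- try(data.frame(int = 1:3, boolean = c(TRUE, TRUE)), silent = TRUE)
inherits(result, "try-error")
## [1] TRUE
```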

Reading from an R dataset

data("mtcars")
print(head(mtcars))
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Data frame column names and row names

We can view and edit the column and row labels with these functions.

View and edit column names

# view column names
names(df)
## [1] "a" "b"
# edit column names
names(df) <- c("one","two")
names(df)
## [1] "one" "two"

View and edit row names

#see row names
row.names(df)
## [1] "1" "2" "3"
#edit row names
row.names(df) <- c("a","b","c")
row.names(df)
## [1] "a" "b" "c"
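
Assigning to `names()` or `row.names()` replaces every label at once. As a hedged sketch, a single column can be renamed by indexing into `names()`, and row names can be reset to the defaults by assigning `NULL`. The data frame `df_names` is a separate, illustrative object so the running `df` is left untouched.

```r
df_names <- data.frame(a = c(1, 2, 3), b = c(TRUE, FALSE, TRUE))

# Rename only column "b", leaving the other columns untouched
names(df_names)[names(df_names) == "b"] <- "flag"
names(df_names)
## [1] "a"    "flag"

# Assigning NULL restores the default 1, 2, 3, ... row labels
row.names(df_names) <- c("x", "y", "z")
row.names(df_names) <- NULL
row.names(df_names)
## [1] "1" "2" "3"
```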

Investigating data frame dimensions

We can inspect the dimensions of a data frame with these functions.

Checking data frame dimensions

# number of rows
nrow(df)
## [1] 3
# number of columns
ncol(df)
## [1] 2
# length() counts list components, so it also gives the number of columns
length(df)
## [1] 2
# overall dimensions: rows, then columns
dim(df)
## [1] 3 2
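
How these functions relate to each other can be verified with a short sketch. The data frame `dims_df` is illustrative only.

```r
dims_df <- data.frame(a = 1:3, b = c(TRUE, FALSE, TRUE))

# dim() is simply the row count followed by the column count
identical(dim(dims_df), c(nrow(dims_df), ncol(dims_df)))
## [1] TRUE

# length() agrees with ncol() because a data frame is a list of columns
length(dims_df) == ncol(dims_df)
## [1] TRUE
```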

Accessing components

We can access the components of a data frame with these methods.

Accessing it like a list

df["one"]
##   one
## a   1
## b   2
## c   3
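
Single brackets return a one-column data frame, as the output above shows. To get the column itself as a vector, list-style access also offers `[[ ]]` and `$`. A minimal sketch, using an illustrative data frame named `access_df`:

```r
access_df <- data.frame(one = c(1, 2, 3), two = c(TRUE, FALSE, TRUE))

class(access_df["one"])    # single brackets keep the data frame wrapper
## [1] "data.frame"
class(access_df[["one"]])  # double brackets extract the column itself
## [1] "numeric"
access_df$one              # $ is shorthand for [[ ]] with a literal name
## [1] 1 2 3
```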

Accessing it like a matrix

df[1,2]
## [1] TRUE
df[,2]
## [1]  TRUE FALSE  TRUE
df[1,]
##   one  two
## a   1 TRUE
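
Matrix-style indexing also accepts row and column labels and logical vectors, not only positions. The sketch below uses an illustrative data frame named `matrix_df` that mirrors the one above.

```r
matrix_df <- data.frame(one = c(1, 2, 3), two = c(TRUE, FALSE, TRUE),
                        row.names = c("a", "b", "c"))

matrix_df["b", "one"]       # select by row label and column label
## [1] 2
matrix_df[matrix_df$two, ]  # keep only the rows where column two is TRUE
##   one  two
## a   1 TRUE
## c   3 TRUE

# drop = FALSE keeps a single column as a data frame instead of a vector
single_col <- matrix_df[, "two", drop = FALSE]
class(single_col)
## [1] "data.frame"
```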

Adding new components

We can add new components to a data frame using these methods.

Adding a new row

df
##   one   two
## a   1  TRUE
## b   2 FALSE
## c   3  TRUE
df = rbind(df,list(1,TRUE))
df
##   one   two
## a   1  TRUE
## b   2 FALSE
## c   3  TRUE
## 4   1  TRUE
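
Passing an unnamed list relies on the values being in column order. As a hedged alternative, the new row can be built as a one-row data frame with named columns, which makes the match explicit. The names `rows_df` and `new_row` are illustrative.

```r
rows_df <- data.frame(one = c(1, 2, 3), two = c(TRUE, FALSE, TRUE))

# Naming the columns of the new row makes the intended match explicit
new_row <- data.frame(one = 4, two = FALSE)
rows_df <- rbind(rows_df, new_row)

nrow(rows_df)
## [1] 4
rows_df$one
## [1] 1 2 3 4
```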

Adding new column

df
##   one   two
## a   1  TRUE
## b   2 FALSE
## c   3  TRUE
## 4   1  TRUE
df = cbind(df,newcol=c("one","two","three","four"))
df
##   one   two newcol
## a   1  TRUE    one
## b   2 FALSE    two
## c   3  TRUE  three
## 4   1  TRUE   four
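
A column can also be added by assignment, using the same list-style access shown earlier. This is a minimal sketch on an illustrative data frame named `cols_df`; the column names `label` and `double` are made up for the example.

```r
cols_df <- data.frame(one = c(1, 2, 3), two = c(TRUE, FALSE, TRUE))

# Assigning to a name that does not exist yet creates the column
cols_df$label <- c("x", "y", "z")

# New columns can be computed from existing ones
cols_df[["double"]] <- cols_df$one * 2

names(cols_df)
## [1] "one"    "two"    "label"  "double"
cols_df$double
## [1] 2 4 6
```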

Removing components

We can remove components from a data frame using these methods.

Removing it like a list

df
##   one   two newcol
## a   1  TRUE    one
## b   2 FALSE    two
## c   3  TRUE  three
## 4   1  TRUE   four
df$newcol <- NULL
df
##   one   two
## a   1  TRUE
## b   2 FALSE
## c   3  TRUE
## 4   1  TRUE

Removing it like a matrix

df
##   one   two
## a   1  TRUE
## b   2 FALSE
## c   3  TRUE
## 4   1  TRUE
df <- df[-2, ]
df
##   one  two
## a   1 TRUE
## c   3 TRUE
## 4   1 TRUE
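
Negative indices work for columns as well as rows, and a logical condition can drop rows or columns by value or name. A short sketch on an illustrative data frame named `drop_df`:

```r
drop_df <- data.frame(one = c(1, 2, 3),
                      two = c(TRUE, FALSE, TRUE),
                      three = c("x", "y", "z"))

# Drop the second column by position
names(drop_df[, -2])
## [1] "one"   "three"

# Drop a column by name
names(drop_df[, names(drop_df) != "three"])
## [1] "one" "two"

# Drop the rows where column two is FALSE
kept_rows <- drop_df[drop_df$two, ]
nrow(kept_rows)
## [1] 2
```

None of these expressions modifies `drop_df` itself; the result has to be assigned back, as in the example above, for the removal to persist.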

Getting a summary of a data frame

We can get a summary of every column in a data frame using these functions.

Summary and checking structure

str(df)
## 'data.frame':    3 obs. of  2 variables:
##  $ one: num  1 3 1
##  $ two: logi  TRUE TRUE TRUE
summary(df)
##       one          two         
##  Min.   :1.000   Mode:logical  
##  1st Qu.:1.000   TRUE:3        
##  Median :1.000                 
##  Mean   :1.667                 
##  3rd Qu.:2.000                 
##  Max.   :3.000
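
`summary()` also works on a single column, and the individual statistics can then be picked out by name. A minimal sketch, using an illustrative data frame `summary_df` with the same values as column `one` above:

```r
summary_df <- data.frame(one = c(1, 3, 1), two = c(TRUE, TRUE, TRUE))

# Summarise one numeric column
col_summary <- summary(summary_df$one)
names(col_summary)
## [1] "Min."    "1st Qu." "Median"  "Mean"    "3rd Qu." "Max."

# Pick out a single statistic by its label
col_summary["Max."]
```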