Question: How can I find out the size, shape, and structure of a data frame?

The dim() function is a built-in R function that returns the dimensions of a matrix, array, or data frame. Called with an R object as its argument, dim() returns that object's dimensions. For a data frame, these are the number of rows followed by the number of columns. Assigning a value to dim() instead sets the dimensions of a vector, matrix, or array.
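Here is a minimal base-R sketch of both uses of dim(). The toy objects x and df are illustrative only. It also uses nrow() and ncol(), which return the row and column counts individually.

```r
# A plain vector has no dim attribute, so dim() returns NULL
x <- 1:6
dim(x)            # NULL

# Assigning to dim() reshapes the vector into a 2 x 3 matrix (filled by column)
dim(x) <- c(2, 3)
dim(x)            # 2 3
is.matrix(x)      # TRUE

# For a data frame, dim() returns c(number of rows, number of columns)
df <- data.frame(a = 1:4, b = letters[1:4])
dim(df)           # 4 2
nrow(df)          # 4, the first element of dim(df)
ncol(df)          # 2, the second element of dim(df)
```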

The str() function in R compactly displays the internal structure of an R object. It can even display the structure of large nested lists. For basic R objects it produces one-line output that tells the user what the object is and what it contains.
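To see how str() handles both simple and nested objects, here is a small sketch. The vector and the nested list below are hypothetical, with values made up for illustration. The max.level argument limits how many levels of nesting str() expands.

```r
# A basic object (numeric vector) gets one line of output
str(c(39.1, 39.5, 40.3))

# A nested list: each inner level is shown indented under its parent
nested <- list(
  id = 1L,
  name = "Adelie",
  measurements = list(bill = c(39.1, 39.5), flipper = c(181L, 186L))
)
str(nested)

# max.level = 1 shows only the top level and does not expand inner lists
str(nested, max.level = 1)
```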

Finally, the commonly used summary() function, when applied to a data frame, summarizes each column in turn. For a numeric column it reports the minimum, first quartile, median, mean, third quartile, and maximum, along with the number of missing (NA) values when there are any.
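The sketch below shows what summary() reports for different column types, using a hypothetical toy data frame. Numeric columns get the six summary statistics plus an NA count, while factor columns get a count for each level.

```r
df <- data.frame(
  size = c(3.1, 4.7, NA, 5.2),
  species = factor(c("Adelie", "Gentoo", "Adelie", "Chinstrap"))
)

# Numeric column: Min., 1st Qu., Median, Mean, 3rd Qu., Max., plus NA's
summary(df$size)

# Factor column: number of observations at each level
summary(df$species)

# Whole data frame: summary() is applied to each column in turn
summary(df)
```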

We’ll use the “palmerpenguins” package for this example.

Data Preparation

library(palmerpenguins)  # run install.packages("palmerpenguins") first if it is not installed
data(penguins)           # load the penguins dataset

Creating a sample data frame

# keep three numeric measurement columns from penguins
penguin_data <- data.frame(Flipper_Length = penguins$flipper_length_mm, 
                           Bill_Length = penguins$bill_length_mm, 
                           Bill_Depth = penguins$bill_depth_mm)

Retrieving info about the data frame using dim()

dim(penguin_data)  # number of rows, then number of columns
## [1] 344   3

Retrieving info about the data frame using str()

str(penguin_data) # compact overview of the data frame's structure
## 'data.frame':    344 obs. of  3 variables:
##  $ Flipper_Length: int  181 186 195 NA 193 190 181 195 193 190 ...
##  $ Bill_Length   : num  39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ Bill_Depth    : num  18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...

Retrieving info about the data frame using summary()

summary(penguin_data) # per-column summary statistics and NA counts
##  Flipper_Length   Bill_Length      Bill_Depth   
##  Min.   :172.0   Min.   :32.10   Min.   :13.10  
##  1st Qu.:190.0   1st Qu.:39.23   1st Qu.:15.60  
##  Median :197.0   Median :44.45   Median :17.30  
##  Mean   :200.9   Mean   :43.92   Mean   :17.15  
##  3rd Qu.:213.0   3rd Qu.:48.50   3rd Qu.:18.70  
##  Max.   :231.0   Max.   :59.60   Max.   :21.50  
##  NA's   :2       NA's   :2       NA's   :2
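The NA's row reports two missing values in each column. As a sketch for cross-checking this (an extra step, not part of the walkthrough above), is.na() combined with colSums() counts missing values per column, and complete.cases() flags rows with no missing values. This assumes the palmerpenguins package is installed.

```r
library(palmerpenguins)

penguin_data <- data.frame(Flipper_Length = penguins$flipper_length_mm,
                           Bill_Length = penguins$bill_length_mm,
                           Bill_Depth = penguins$bill_depth_mm)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(penguin_data))
na_counts

# complete.cases() is TRUE for rows that have no NA in any column
sum(complete.cases(penguin_data))
```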

Keywords

  1. dataframe
  2. dim()
  3. str()
  4. summary()
  5. summary
  6. structure
  7. shape
  8. size