The goal of this tutorial is to flatten a data frame into a single vector. This is useful when we want to plot a histogram of every value in the dataset at once.
# First we load the libraries
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(ggplot2)
# In this tutorial we will use the co2 dataset of CO2 levels at Mauna Loa
data(co2)
# However, this dataset is stored as a time series
# Let's create a proper data frame, with one row per year and one column per month
my_dimnames <- list(month.abb, unique(floor(time(co2))))
co2_df <- as.data.frame(t(matrix(co2, 12, dimnames = my_dimnames)))
head(co2_df)
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 1959 315.42 316.31 316.50 317.56 318.13 318.00 316.39 314.65 313.68 313.18
## 1960 316.27 316.81 317.42 318.87 319.87 319.43 318.01 315.74 314.00 313.68
## 1961 316.73 317.54 318.38 319.31 320.42 319.61 318.42 316.63 314.83 315.16
## 1962 317.78 318.40 319.53 320.42 320.85 320.45 319.45 317.25 316.11 315.27
## 1963 318.58 318.92 319.70 321.22 322.08 321.31 319.58 317.61 316.05 315.83
## 1964 319.41 320.07 320.74 321.40 322.06 321.73 320.27 318.54 316.54 316.71
## Nov Dec
## 1959 314.66 315.43
## 1960 314.84 316.03
## 1961 315.94 316.85
## 1962 316.53 317.53
## 1963 316.91 318.20
## 1964 317.53 318.55
# We can flatten the table into a single vector
# First we convert it into a numeric matrix
# Then c() drops the dimensions and returns the values as a vector, column by column
co2_vector <- c(as.matrix(co2_df))
str(co2_vector)
## num [1:468] 315 316 317 318 319 ...
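One detail worth checking is the order of the flattened values. The sketch below rebuilds the same data frame and compares two orderings: the column-by-column one produced above, and a row-by-row one obtained by transposing first. The names `by_month` and `by_time` are illustrative only and are not part of the original tutorial. For a histogram the order does not matter, but it does if the vector is later used as a time series.

```r
# Rebuild the data frame exactly as in the tutorial
data(co2)
my_dimnames <- list(month.abb, unique(floor(time(co2))))
co2_df <- as.data.frame(t(matrix(co2, 12, dimnames = my_dimnames)))

# Column by column: all January values first, then all February values, ...
by_month <- c(as.matrix(co2_df))

# Row by row: transpose first, so the 12 months of each year stay together
by_time <- c(t(as.matrix(co2_df)))

# Both vectors hold the same values, only in a different order
all.equal(sort(by_month), sort(by_time))

# The row-by-row ordering reproduces the original series
all.equal(by_time, as.numeric(co2))
```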
ggplot() + geom_histogram(aes(x = co2_vector), bins = 50) + ggtitle("CO2 levels at Mauna Loa") +
theme(plot.title = element_text(hjust = 0.5))
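Flattening discards the row and column labels. If those labels are still needed, for instance to draw one histogram per month, one possible approach is to rebuild them next to the flattened values. The following is a minimal sketch under that assumption. The data frame `co2_long` and its column names are hypothetical and are not part of the original tutorial.

```r
library(ggplot2)

# Rebuild the data frame exactly as in the tutorial
data(co2)
my_dimnames <- list(month.abb, unique(floor(time(co2))))
co2_df <- as.data.frame(t(matrix(co2, 12, dimnames = my_dimnames)))

# c(as.matrix(...)) walks the table column by column, so each month label
# is repeated once per year, and the year labels cycle once per month
co2_long <- data.frame(
  year  = rep(rownames(co2_df), times = ncol(co2_df)),
  month = factor(rep(colnames(co2_df), each = nrow(co2_df)), levels = month.abb),
  value = c(as.matrix(co2_df))
)

# One histogram panel per month, built from the labelled long table
p <- ggplot(co2_long, aes(x = value)) +
  geom_histogram(bins = 20) +
  facet_wrap(~ month) +
  ggtitle("CO2 levels at Mauna Loa by month") +
  theme(plot.title = element_text(hjust = 0.5))
```

Printing `p` draws the plot. The bin count of 20 is an arbitrary choice for the smaller per-month samples.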
In this tutorial we have learnt how to flatten a data frame into a vector so that the table can be studied as a whole. With all the values in a single vector, we can plot a histogram to study their overall distribution.