CS50R. Week 5

Data Visualization

Loading libraries

library(ggplot2)    # plotting
library(tidyr)      # pivot_longer() for reshaping
library(plotly)     # interactive plots (not used in this section)
library(hrbrthemes) # theme_ipsum()

Loading data

Here I load a CSV file with the prices of different grains over time, starting in the year 2000.

grains <- read.csv("GrainPrices.csv")

Exploring data

The data set has 5,757 rows.

nrow(grains)
## [1] 5757

The column names show a Date column followed by four grain types: wheat, milo, corn, and beans.

names(grains)
## [1] "Date"  "Wheat" "Milo"  "Corn"  "Beans"
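Before plotting, it can help to check each column's type and count the missing prices, since `geom_line()` and `geom_area()` drop rows containing `NA` (with a warning). A minimal sketch on a small hypothetical data frame `toy` whose values are invented, not taken from GrainPrices.csv; the same calls would apply to `grains`:

```r
# Hypothetical toy data in the same wide layout as `grains`;
# the numbers are invented for illustration only.
toy <- data.frame(
  Date  = c("01/03/00", "01/04/00", "01/05/00"),
  Wheat = c(2.50, NA, 2.55),
  Milo  = c(1.80, 1.82, 1.81),
  Corn  = c(1.90, 1.95, NA),
  Beans = c(4.60, 4.62, 4.65)
)

# Column types: Date is stored as character until it is converted
str(toy)

# Number of missing values per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Here the counts are 1 for Wheat and Corn and 0 for the other columns.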

I convert the Date column to R's Date class and check the earliest and latest dates in the data. They range from 3 January 2000 to 14 December 2022.

# Parse strings written as month/day/two-digit year
grains$Date <- as.Date(grains$Date, format = "%m/%d/%y")
min(grains$Date)
## [1] "2000-01-03"
max(grains$Date)
## [1] "2022-12-14"
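The `%y` code reads a two-digit year. Following the POSIX convention that R documents for `strptime()`, values 00 to 68 are read as 2000 to 2068 and 69 to 99 as 1969 to 1999, which works for data that starts in 2000. A small sketch on hypothetical date strings, which also shows that `range()` returns the earliest and latest dates in one call:

```r
# Hypothetical date strings in month/day/two-digit-year layout
raw <- c("01/03/00", "06/15/11", "12/14/22")

# %m = month, %d = day, %y = two-digit year
dates <- as.Date(raw, format = "%m/%d/%y")
print(dates)

# range() returns c(min, max) in a single call
print(range(dates))

# With %y, "69" is read as 1969, not 2069
print(as.Date("01/01/69", format = "%m/%d/%y"))
```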

Visualization of wheat data

Let’s first try to visualize the prices for wheat.

wheat <- grains[, c("Date", "Wheat")]  # keep only the Date and Wheat columns

wheat |>
  ggplot(aes(x = Date, y = Wheat)) +
  geom_area(fill = "#E3E963", alpha = 0.5) +  # shaded area under the price curve
  geom_line(color = "#E3E963") +              # price line drawn on top
  labs(
    x = "",
    y = "Wheat Price",
    title = "Wheat Prices Over Time"
  ) +
  theme_ipsum()

Visualization of all grains

Now let’s visualize the data for all grain types.

grain_types <- names(grains)[-1]  # every column except Date

grains_long <- grains |>
  pivot_longer(cols = all_of(grain_types), names_to = "Grain", values_to = "Price")
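`pivot_longer()` turns each grain column into rows, so the wide table (one column per grain) becomes a long table with one row per date and grain. That long layout is what lets a single `aes(color = Grain)` draw one line per grain. A minimal sketch on a hypothetical two-date data frame with made-up prices, using `-Date` as an equivalent way to select every column except Date:

```r
library(tidyr)

# Hypothetical wide data: one row per date, one column per grain
wide <- data.frame(
  Date  = as.Date(c("2000-01-03", "2000-01-04")),
  Wheat = c(2.50, 2.52),
  Milo  = c(1.80, 1.82),
  Corn  = c(1.90, 1.95),
  Beans = c(4.60, 4.62)
)

# Long data: one row per (date, grain) pair
long <- pivot_longer(
  wide,
  cols = -Date,          # every column except Date
  names_to = "Grain",    # former column names go here
  values_to = "Price"    # former cell values go here
)
print(long)
```

With 2 dates and 4 grains, the result has 2 × 4 = 8 rows and the columns Date, Grain, and Price.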

grains_long |>
  ggplot(aes(x = Date, y = Price, color = Grain)) +
  geom_line(linewidth = 0.3) +  # one solid line per grain (linetype 1 is the default)
  labs(
    x = "",
    y = "Price",
    title = "Grain Prices Over Time"
  ) +
  theme_ipsum()
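If the grains sit at different price levels, lines sharing one y-axis can be hard to compare. One option, not used in the analysis above, is to give each grain its own panel with `facet_wrap(scales = "free_y")`. A sketch on hypothetical long-format data with invented prices, using ggplot2's built-in `theme_minimal()` in place of `theme_ipsum()` so it runs without hrbrthemes:

```r
library(ggplot2)

# Hypothetical long-format data: two dates x two grains, invented prices
long <- data.frame(
  Date  = rep(as.Date(c("2000-01-03", "2000-01-04")), each = 2),
  Grain = rep(c("Wheat", "Beans"), times = 2),
  Price = c(2.50, 4.60, 2.52, 4.62)
)

p <- ggplot(long, aes(x = Date, y = Price, color = Grain)) +
  geom_line(linewidth = 0.3) +
  # One panel per grain; each panel gets its own y-axis range
  facet_wrap(~ Grain, ncol = 2, scales = "free_y") +
  labs(x = "", y = "Price", title = "Grain Prices Over Time") +
  theme_minimal() +
  theme(legend.position = "none")  # panel labels already name each grain

print(p)
```

The trade-off is that free y-axes hide differences in absolute price level between grains, so the shared-axis plot and the faceted plot answer different questions.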