The first three tasks use the mtcars data. Check out the data frame using head(mtcars):

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Task 1: stacked bar vs pie chart

Visualize the car models by gear and carb using both a stacked bar chart and pie charts.

1. Stacked bar chart

2. Pie chart

Separate the pie chart by gear.
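The two charts described above can be sketched as follows. This is a minimal sketch, assuming ggplot2 (already attached with the tidyverse) is the plotting library; the object names cars, gear_carb, p_bar and p_pie are made up for illustration, and the original plots may have been styled differently.

```r
library(ggplot2)

# Treat gear and carb as categories rather than numbers
cars <- transform(mtcars, gear = factor(gear), carb = factor(carb))

# Cross-tabulation behind both charts: rows are gear, columns are carb
gear_carb <- table(gear = cars$gear, carb = cars$carb)
print(gear_carb)

# 1. Stacked bar chart: one bar per gear, segments coloured by carb
p_bar <- ggplot(cars, aes(x = gear, fill = carb)) +
  geom_bar() +
  labs(x = "gear", y = "number of car models", fill = "carb")

# 2. Pie charts: a full-width stacked bar bent into polar coordinates,
#    with one panel per gear value
p_pie <- ggplot(cars, aes(x = "", fill = carb)) +
  geom_bar(width = 1, position = "fill") +
  coord_polar(theta = "y") +
  facet_wrap(~ gear, labeller = label_both) +
  theme_void()

print(p_bar)
print(p_pie)
```

The stacked bar makes the absolute counts per gear easy to compare, while the pie panels only show the carb shares within each gear.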

Most of the car models have 3 gears and 2 or 4 carbs.

Task 2: correlation matrix

The correlations are shown in two colors: blue refers to positive and red refers to negative. The lighter the color and the smaller the size, the closer the correlation is to 0. The significance level chosen is 0.01, and the insignificant correlations were removed.
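A plot matching this description could be produced roughly as below. This is a sketch under assumptions: the write-up does not name the plotting package, so corrplot is assumed here (its default circle style uses blue for positive and red for negative values), and the names M and p_mat are hypothetical. The p-values are computed with base R, so the numeric part runs even without corrplot installed.

```r
# Pearson correlation matrix of all mtcars columns
M <- cor(mtcars)

# Matrix of pairwise p-values from cor.test(); the diagonal is set to 0
n <- ncol(mtcars)
p_mat <- matrix(0, n, n, dimnames = dimnames(M))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) {
      p_mat[i, j] <- cor.test(mtcars[[i]], mtcars[[j]])$p.value
    }
  }
}

# Draw the matrix only if corrplot is available (assumed package)
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(
    M,
    method    = "circle",  # circle size and shade scale with |correlation|
    p.mat     = p_mat,
    sig.level = 0.01,      # significance level stated in the text
    insig     = "blank"    # leave insignificant cells empty
  )
}
```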

It is found that vs and cyl are the most negatively correlated factors whereas hp and cyl are the most positively correlated factors.

Task 3: heatmap

Use the mtcars data for this task.

The task consists of the following steps:

a) Use a heatmap to visualize the data.
b) Normalize the heatmap by column, and remove the dendrogram.
c) Customize the coloring with colorRampPalette. Select your own color palette.
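Steps a) to c) can be sketched with the base R heatmap() function. The names m and pal are hypothetical, and the red, yellow and green anchor colors and the 25 shades are only one possible palette choice.

```r
# heatmap() needs a numeric matrix, not a data frame
m <- as.matrix(mtcars)

# a) Plain heatmap of the raw values (rows and columns are clustered,
#    and dendrograms are drawn by default)
heatmap(m, scale = "none")

# c) Custom continuous palette interpolated between three anchor colours
pal <- colorRampPalette(c("red", "yellow", "green"))(25)

# b) Scale each column, and pass NA to Rowv and Colv to suppress the
#    dendrograms and the reordering that goes with them
heatmap(m, scale = "column", Rowv = NA, Colv = NA, col = pal)
```

Scaling by column matters here because variables such as disp and hp have much larger raw values than vs or am and would otherwise dominate the color range.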

Based on the plot, the correlations are shown with a continuous color palette: red refers to -1, yellow refers to 0, and green refers to 1.

Task 4: streamgraph

  1. Select the columns of year and genre. Notice that the data has a wide format and you need to convert it into long format first.
    Hint: use the function “melt” from the reshape2 library to convert the data frame.
  2. Count the frequency of each genre per year.
  3. Show the result with a streamgraph, and include a selection menu.
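The three steps can be sketched as below. The task text does not name the data set, so a tiny made-up data frame (movies, with a year column and one 0/1 column per genre) stands in for it. The names long_df and counts are also hypothetical. The streamgraph package is assumed to be the htmlwidget of that name, and the plot is drawn only if it is installed.

```r
library(reshape2)

# Hypothetical wide-format data: one row per film, one 0/1 flag per genre
movies <- data.frame(
  year   = c(2000, 2000, 2001, 2001, 2001, 2002),
  Action = c(1, 0, 1, 1, 0, 0),
  Comedy = c(0, 1, 0, 1, 1, 1),
  Drama  = c(1, 1, 0, 0, 1, 0)
)

# 1. Wide to long: one row per (film, genre) pair
long_df <- melt(movies, id.vars = "year",
                variable.name = "genre", value.name = "flag")

# 2. Frequency of each genre per year (sum of the 0/1 flags)
counts <- aggregate(flag ~ year + genre, data = long_df, FUN = sum)
names(counts)[names(counts) == "flag"] <- "n"
counts$genre <- as.character(counts$genre)
print(counts)

# 3. Streamgraph with a selection menu (assumed package; skipped if missing)
if (requireNamespace("streamgraph", quietly = TRUE)) {
  sg <- streamgraph::streamgraph(counts, key = "genre", value = "n", date = "year")
  sg <- streamgraph::sg_legend(sg, show = TRUE, label = "Genre: ")
  print(sg)
}
```

With real data, only the construction of movies changes; the melt, count and plot steps stay the same as long as the genre columns are 0/1 flags.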