Introduction to Data Analysis and Visualization

Anand Lakshmanan (consulting requests: @lan24hd)

About Me

  • Software engineering
  • Hardware-software interfacing
  • Hardware engineering
  • Data visualization

Data Visualization can be used in these fields

  • Business
  • Engineering
  • Finance
  • Marketing
  • Any other field you can think of!

Data Visualization CANNOT be used in these fields

Why visualize data?

  • Example!
  • Height data of a family:

    120, 110, 170, 180, 176, 171, 101 cm.

  • Your job is to select t-shirt sizes for them.

  • How many small t-shirts and how many large t-shirts?

  • What else can we infer about this family?

Why visualize data?

[Plot: the family's height data]

“Visualizations can surprise you!” - Hadley Wickham

3 small children, 2 parents, and 2 grandparents.

Therefore: 3 small t-shirts and 4 large t-shirts!
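The same count can be reproduced in a few lines of base R. This is only a minimal sketch: the slides do not state a cutoff, so the 150 cm threshold and the variable names (heights, cutoff_cm, size_counts) are assumptions chosen for illustration.

```r
# Heights in cm, copied from the family example
heights <- c(120, 110, 170, 180, 176, 171, 101)

# Assumed cutoff (not from the slides): under 150 cm gets a small t-shirt
cutoff_cm <- 150
size <- ifelse(heights < cutoff_cm, "small", "large")

# Count how many t-shirts of each size are needed
size_counts <- table(size)
print(size_counts)
```

With these heights the sketch reports 3 small and 4 large. Any cutoff between 120 and 170 cm gives the same answer, because no one in the family falls in that range.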

Visualize data as a table

name    height (cm)
Anil    120
Bobby   110
Manju   170
Rekha   180
Peter   176
Rohit   171
Ananya  101

1 - variable plot

[Plot: bar chart of counts per height value]

2 - variable plot

[Plot: height against name, one point per person]

3 - variable plot

[Plot: height against name, points colored by state]

Grandparents live in TN; parents and children live in KA!

4 - variable plot

[Plot: height against name, points colored by state and shaped by favorite sport]

Children like soccer!

R and RStudio Installation

Install R

https://cloud.r-project.org

Install RStudio

https://rstudio.com

Open RStudio and run

install.packages("tidyverse")

Visualize data as a table

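library(tidyverse)
library(knitr)  # provides kable(); not attached by library(tidyverse)

# tibble() replaces the deprecated data_frame()
height_data <- tibble(
  name = c("Anil", "Bobby", "Manju", "Rekha", "Peter", "Rohit", "Ananya"),
  height = c(120, 110, 170, 180, 176, 171, 101)  # heights in cm
)
kable(height_data)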

1 - variable plot

# geom_bar() counts how many family members have each exact height
height_data %>% ggplot(aes(height)) + geom_bar() +
  theme_minimal() + theme(axis.title = element_text(size = 20))
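Because every height in this family is distinct, a bar chart of exact values draws seven bars of height 1. A histogram that bins nearby heights together can make the two clusters easier to see. The sketch below is one possible variation, not the plot from the slides: the name heights_only, the 20 cm bin width, and the bin boundary at 100 cm are assumptions chosen for illustration.

```r
library(ggplot2)

# Same seven heights (in cm) as in the family example;
# stored under a separate, hypothetical name so height_data is left untouched
heights_only <- data.frame(
  height = c(120, 110, 170, 180, 176, 171, 101)
)

# binwidth = 20 and boundary = 100 are illustrative choices, not from the slides
p <- ggplot(heights_only, aes(height)) +
  geom_histogram(binwidth = 20, boundary = 100) +
  theme_minimal() +
  theme(axis.title = element_text(size = 20))

print(p)
```

Changing binwidth changes how the clusters appear, so it is worth trying a few values before drawing conclusions.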

2 - variable plot

# One point per person: name on the x-axis, height on the y-axis
height_data %>% ggplot(aes(name, height)) + geom_point(size = 20) +
  theme_minimal() + theme(axis.title = element_text(size = 20))

3 - variable plot

# Add each person's state of residence, then map it to point color
height_data <- height_data %>%
  mutate(state = c("KA", "KA", "TN", "KA", "KA", "TN", "KA"))
height_data %>% ggplot(aes(name, height, color = state)) +
  geom_point(size = 20) +
  theme_minimal() + theme(axis.title = element_text(size = 20))

4 - variable plot

# Add each person's favorite sport, then map it to point shape
height_data <- height_data %>%
  mutate(favorite_sport = c("Soccer", "Soccer", "Cricket", "Hockey",
                            "Soccer", "Hockey", "Soccer"))
height_data %>% ggplot(aes(name, height, color = state, shape = favorite_sport)) +
  geom_point(size = 20) +
  theme_minimal() + theme(axis.title = element_text(size = 20))
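The patterns read off the 3- and 4-variable plots can also be cross-checked with a quick count. This is a rough sketch under stated assumptions: the name family, the group column, and the 150 cm cutoff separating children from adults are all hypothetical and not part of the slides. The data alone cannot tell parents from grandparents, so the sketch only separates children from adults.

```r
library(dplyr)

# Same family data as in the slides, rebuilt here so the sketch runs on its own
family <- tibble(
  name = c("Anil", "Bobby", "Manju", "Rekha", "Peter", "Rohit", "Ananya"),
  height = c(120, 110, 170, 180, 176, 171, 101),
  state = c("KA", "KA", "TN", "KA", "KA", "TN", "KA"),
  favorite_sport = c("Soccer", "Soccer", "Cricket", "Hockey",
                     "Soccer", "Hockey", "Soccer")
)

# Assumed cutoff (not from the slides): under 150 cm is treated as a child
family <- family %>%
  mutate(group = if_else(height < 150, "child", "adult"))

# How many people in each group live in each state?
by_state <- family %>% count(group, state)
print(by_state)

# Which sports does each group prefer?
by_sport <- family %>% count(group, favorite_sport)
print(by_sport)
```

In the resulting tables, every child row shows KA and Soccer, which agrees with what the plots suggest.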