1.1 - Introduction to Data Visualization


Introduction to Data Visualization

🎞 Introduction to Data Visualization

📖 Introduction to Data Visualization

  • Plots of data easily communicate information that is difficult to extract from tables of raw values.
  • Data visualization is a key component of exploratory data analysis (EDA), in which the properties of data are explored through visualization and summarization techniques.
  • Data visualization can help uncover biases, systematic errors, mistakes and other unexpected problems in data before those data are used in an analysis, where they could lead to flawed conclusions.
  • This course covers the basics of data visualization and EDA in R using the ggplot2 package and motivating examples from world health, economics and infectious disease.
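# Load the dslabs package and its US murders dataset
library(dslabs)
data(murders)
# Preview the first six rows of the data frame
head(murders)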

Introduction to Distributions

🎞 Introduction to Distributions

📖 Visualizing data distributions

  • The most basic statistical summary of a list of objects is its distribution.
  • We will learn ways to visualize and analyze distributions in the upcoming videos.
  • In some cases, data can be summarized by a two-number summary: the mean and standard deviation. We will learn to use data visualization to determine when that is appropriate.
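
As a minimal sketch of the ideas above, the snippet below computes the distribution of a categorical variable (the proportion of observations in each category) and the two-number summary of a numeric variable. The vectors `regions` and `heights` are hypothetical values made up for illustration, not course data; only base R functions are used.

```r
# Hypothetical data for illustration only (not taken from the course datasets)
regions <- c("South", "West", "South", "Northeast", "West", "South")
heights <- c(64, 66, 67, 68, 69, 70, 71, 72)

# Distribution of a categorical variable: proportion of observations per category
region_dist <- prop.table(table(regions))
print(region_dist)

# Two-number summary of a numeric variable
height_mean <- mean(heights)
height_sd <- sd(heights)  # sample standard deviation (divides by n - 1)
print(c(mean = height_mean, sd = height_sd))
```

Whether the mean and standard deviation are an adequate summary depends on the shape of the data, which is why the distribution is inspected visually first.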

Data Types

🎞 Data Types

📖 Variable types

  • Categorical variables take values from a small number of groups (categories).
    • Ordinal categorical data have an inherent order to the categories (mild/medium/hot, for example).
    • Non-ordinal categorical data have no order to the categories.
  • Numerical data take a variety of numeric values.
    • Continuous variables can take any value within a range (height, for example).
    • Discrete variables are limited to sets of specific values, such as whole-number counts.
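
One way these variable types might be represented in R is sketched below, assuming base R only: factors for categorical data (ordered factors for ordinal data), doubles for continuous values and integers for discrete counts. All variable names and values here are hypothetical and chosen purely for illustration.

```r
# Hypothetical values for illustration only

# Ordinal categorical: an ordered factor with explicit level order
spice <- factor(c("mild", "hot", "medium", "mild"),
                levels = c("mild", "medium", "hot"), ordered = TRUE)

# Non-ordinal categorical: a plain (unordered) factor
region <- factor(c("South", "West", "Northeast", "South"))

# Continuous numeric: stored as double
height <- c(68.5, 70.25, 64.0)

# Discrete numeric: whole-number counts stored as integer
counts <- c(3L, 0L, 12L)

# Comparisons are meaningful only for the ordered factor
print(spice[1] < spice[2])  # compares "mild" with "hot" using the level order
print(levels(region))
print(class(height))
print(class(counts))
```

Declaring the order explicitly matters because R otherwise sorts factor levels alphabetically, which would place "hot" before "medium" and "mild".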