CHAPTER : VARIABLES AND SUBJECTS
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.5.2
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.5.2
## -- Attaching packages -------------------------------- tidyverse 1.2.1 --
## v tibble 2.0.1 v purrr 0.2.5
## v tidyr 0.8.2 v dplyr 0.7.8
## v readr 1.3.1 v stringr 1.3.1
## v tibble 2.0.1 v forcats 0.3.0
## Warning: package 'tibble' was built under R version 3.5.2
## Warning: package 'tidyr' was built under R version 3.5.2
## Warning: package 'readr' was built under R version 3.5.2
## Warning: package 'purrr' was built under R version 3.5.2
## Warning: package 'dplyr' was built under R version 3.5.2
## Warning: package 'stringr' was built under R version 3.5.2
## Warning: package 'forcats' was built under R version 3.5.2
## -- Conflicts ----------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
str(mpg)
## Classes 'tbl_df', 'tbl' and 'data.frame': 234 obs. of 11 variables:
## $ manufacturer: chr "audi" "audi" "audi" "audi" ...
## $ model : chr "a4" "a4" "a4" "a4" ...
## $ displ : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr "f" "f" "f" "f" ...
## $ cty : int 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr "p" "p" "p" "p" ...
## $ class : chr "compact" "compact" "compact" "compact" ...
##A subject is the smallest object or entity that you measure. In the mpg dataset, the subjects are types of car, and each row is a different type of car.
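##As a minimal sketch of the idea above, assuming the mpg dataset from ggplot2 is available as at the top of this chapter, nrow() counts the subjects and indexing a single row shows everything recorded for one subject.
library(ggplot2)   # provides the mpg dataset
nrow(mpg)          # number of rows, i.e. number of subjects
## [1] 234
mpg[1, ]           # all measurements recorded for the first subject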
##The things that you measure on each subject are called variables. For the first car, the manufacturer is audi and the model is a4, so manufacturer and model are both variables. In statistics, variables are classified into four main types:
##categorical ordinal,
##categorical nominal,
##quantitative continuous, and
##quantitative discrete.
##Categorical variables classify subjects using labels. Categorical ordinal variables have labels with a natural order, for example the bronze, silver and gold medals at the Olympics, while categorical nominal variables have labels with no order. In the mpg dataset, manufacturer is a categorical nominal variable, while model could be treated as categorical ordinal, as models are often ordered according to price.
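##One way to make the ordinal/nominal distinction concrete in R is with factors. This is only an illustrative sketch: the medals vector is made up for the example, and converting manufacturer to a factor is an assumption (the dataset stores it as character).
library(ggplot2)                          # provides the mpg dataset
medals <- factor(c("silver", "gold", "bronze"),
                 levels = c("bronze", "silver", "gold"),
                 ordered = TRUE)          # ordinal: the levels have an order
medals < "gold"                           # so comparisons are meaningful
## [1]  TRUE FALSE  TRUE
maker <- factor(mpg$manufacturer)         # nominal: the levels are just labels
is.ordered(maker)
## [1] FALSE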
##Quantitative variables are measured using numbers. Quantitative continuous variables can take any numerical value, including fractions and decimals. Examples are time, temperature, length and weight, and anything derived from them. Quantitative discrete variables, on the other hand, are counted, so they only take whole-number values. For example, you may count the number of people infected with a disease, or the number of cars that cross a bridge.
##The main difference is that a sufficiently accurate measuring device can record a quantitative continuous variable at any value in a range, while a quantitative discrete variable always has gaps between its possible values. For example, you can't have 3.5 people infected with a disease.
##In the mpg dataset, the variable cyl stands for the number of cylinders in the car. As cylinders are countable (rather than measured continuously), this variable is a quantitative discrete variable. The variable displ stands for an engine's displacement, which is a volume (in litres). Therefore, being something that is measured, it is a quantitative continuous variable.
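##The storage types in the dataset line up with this classification, although that is a convention rather than a guarantee: a count can be stored as a double, and a measurement can be stored as an integer. A quick check, assuming mpg from ggplot2:
library(ggplot2)           # provides the mpg dataset
class(mpg$cyl)             # the cylinder count is stored as an integer
## [1] "integer"
sort(unique(mpg$cyl))      # only a few whole-number values occur
## [1] 4 5 6 8
class(mpg$displ)           # the displacement is stored as a double
## [1] "numeric"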
##Notice that because displacement is recorded to the nearest 0.1 of a litre, this variable ends up being discretised. However, it is still considered a continuous variable, as in theory volume could be recorded to any number of decimal places. On the other hand, you can never have half a cylinder, so the cyl variable will always be quantitative discrete.
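##The discretisation of displ can be checked directly. This sketch assumes mpg from ggplot2, and uses all.equal() to allow for floating-point error when comparing each value with itself rounded to one decimal place.
library(ggplot2)                                    # provides the mpg dataset
isTRUE(all.equal(mpg$displ, round(mpg$displ, 1)))   # TRUE if no value has more than one decimal place
length(unique(mpg$displ))                           # how many distinct values actually occur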
##Evaluate yourself
##Q: 1 How many subjects does the storms data.frame contain?
dim(storms)
## [1] 10010 13
#Ans : 10010
##Q: 2 How many variables does the mpg data.frame contain?
dim(mpg)
## [1] 234 11
#Ans : 11
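##dim() returns both counts at once; nrow() and ncol() return them separately, which can be easier to read. A small sketch, assuming ggplot2 and dplyr are installed. The number of rows in storms may differ between dplyr versions, so no output is shown for it here.
library(ggplot2)   # provides mpg
library(dplyr)     # provides storms
nrow(storms)       # number of subjects (rows)
ncol(mpg)          # number of variables (columns)
## [1] 11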
##Q: 3 Which of the following variables are examples of categorical ordinal variables?
#Sizes of T-shirt available for sale at a clothing store
#Countries of Europe
#Severity of an injury, rated as "mild", "moderate", or "severe"
#Goals kicked by Lance Franklin in an AFL game
#Duration of a song in seconds
#Ans :
#Severity of an injury, rated as "mild", "moderate", or "severe"
#Sizes of T-shirt available for sale at a clothing store
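##When ordinal labels such as T-shirt sizes are stored in R, the order has to be supplied explicitly, because factor() sorts levels alphabetically by default. A minimal sketch with made-up sizes:
sizes <- c("M", "S", "L", "M")
levels(factor(sizes))                      # default: alphabetical, not size order
## [1] "L" "M" "S"
sizes_ord <- factor(sizes, levels = c("S", "M", "L"), ordered = TRUE)
max(sizes_ord)                             # largest size under the stated order
## [1] L
## Levels: S < M < L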
##Q: 4 Which of the following variables are examples of categorical nominal variables?
#Number of red cars parked in a parking garage
#Weight of bears living in a forest
#Breeds of dog
#Genres of music
#Responses to a survey where options are "Like", "Somewhat Like", "Neutral", "Somewhat Dislike", and "Dislike"
#Ans : Genres of music , Breeds of dog
##Q: 5 Which of the following variables are examples of quantitative discrete variables?
#Colour of cars parked in a parking garage
#Number of questions answered correctly in a multiple-choice quiz
#Torque produced by a car engine in Newton-metres
#Sizes of meal available at McDonalds, listed as "small", "medium", and "large"
#Points scored by LeBron James in the NBA Playoffs
#Ans :
#Number of questions answered correctly in a multiple-choice quiz
#Points scored by LeBron James in the NBA Playoffs
##Q: 6 Which of the following variables are examples of quantitative continuous variables?
#Amount of time taken to travel to work in minutes
#Brands of potato chip available for sale at a grocery store
#Number of patients in a hospital's Intensive Care Unit
#Daily maximum temperature in degrees Celsius
#Grade achieved by a student at university, classified as "fail", "pass", "credit", "distinction", or "high distinction"
#Ans :
#Daily maximum temperature in degrees Celsius
#Amount of time taken to travel to work in minutes