---
title: "DA"
author: "Graham Smith"
date: "01/28/2016"
output: html_document
---
library(tigerstats)
We are looking at the data frame verlander in the tigerstats package.
We would like to see whether the speed of a Justin Verlander pitch depends on the type of pitch he is throwing.
We know that the two variables involved in our Research Question are speed and pitch_type.
What kind of variable is the explanatory variable pitch_type: factor or numerical? Factor variable
What kind of variable is the response variable speed: factor or numerical? Numerical variable
Which of the following types of graph could we use to look at the pitching speeds: bar-charts or density plots? Why? Density plots, because speed is a numerical variable. A density plot shows where the speeds are concentrated and how spread out they are, whereas bar-charts are meant for factor variables.
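One way to double-check answers like these is to ask R what kind of column each variable is. The sketch below uses a small made-up data frame (`toy`, with invented values) in place of the real `verlander` data, so it runs without tigerstats. With the package loaded, the same calls could be pointed at `verlander$pitch_type` and `verlander$speed`.

```r
# Hypothetical stand-in for verlander: the values are invented for illustration.
toy <- data.frame(
  pitch_type = factor(c("FF", "CU", "CH", "FF")),
  speed      = c(96.1, 80.3, 86.7, 95.4)
)

# A factor column calls for bar-charts and xtabs();
# a numerical column calls for density plots and favstats().
is.factor(toy$pitch_type)   # is pitch_type a factor?
is.numeric(toy$speed)       # is speed numerical?

# Number of distinct levels of the factor in the toy data.
nlevels(toy$pitch_type)
```

Counting the levels of the factor is also a quick way to decide how many panels a `layout` argument needs.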
Here is a code chunk containing code from the slides to make density plots to compare the fastest speeds driven by guys and the fastest speeds driven by gals in the m111survey data:
densityplot(~fastest|sex,data=m111survey,
main="Fastest Speed,by sex",
xlab="Fastest Speed (mph)",
layout=c(1,2))
In the chunk below, write the code you need in order to make density plots to compare the speeds for the various types of pitches that Justin Verlander throws. (Tip: copy and paste the code from the previous chunk into the chunk below. Then modify it to fit the Verlander situation. **Big Hint:** Find out HOW MANY types of pitches Verlander throws, so you can change the layout argument correctly.)
densityplot(~speed|pitch_type,data=verlander,main="Speed, by Pitch Type",xlab="speed",layout=c(1,5))
Study the same Research Question numerically, now.
Should you use xtabs() to make a two-way table, or should you use favstats()? Why? favstats(), because the response variable speed is numerical. xtabs() tabulates counts, so it suits a pair of factor variables.
Use favstats() to break the speeds down by the type of pitch thrown. (Tip: look back in the slides to find the code used to break down the fastest speed driven by sex. Copy this code into the chunk below, then make the needed changes.)
favstats(speed~pitch_type,data=verlander)
##   pitch_type  min   Q1 median   Q3   max     mean       sd    n missing
## 1         CH 81.0 85.3   86.7 88.2  91.9 86.91929 2.343242 2550       0
## 2         CU 58.9 79.0   80.2 81.5  86.9 80.23111 1.929778 2716       0
## 3         FF 91.7 94.6   96.0 97.5 102.4 96.02916 2.031015 6756       0
## 4         FT 90.5 94.6   96.2 97.6 102.1 96.06893 2.209033 2021       0
## 5         SL 78.0 84.5   86.3 87.9  93.3 86.16329 2.386953 1264       0
Which types of pitches are the fastest? Which types are the slowest?
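A question like this can be read straight off the `mean` column of the favstats() output. As a minimal sketch, the means below are copied by hand from that output into a named vector (`pitch_means` is a name made up for this example) and then sorted.

```r
# Mean speeds (mph) copied from the favstats() output above.
pitch_means <- c(CH = 86.91929, CU = 80.23111, FF = 96.02916,
                 FT = 96.06893, SL = 86.16329)

# Sort from fastest to slowest average speed.
ranked <- sort(pitch_means, decreasing = TRUE)
ranked

names(ranked)[1]                # pitch type with the highest mean speed
names(ranked)[length(ranked)]   # pitch type with the lowest mean speed
```

Note that the FF and FT means differ by only about 0.04 mph, so the two fastball types are essentially tied at the top.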
Now we will look at the m111survey data, and ask the Research Question:
Who is more likely to believe in love at first sight: a GC gal or a GC guy?
What are the two variables from the m111survey data frame that are involved in this Research Question?
The two variables are love_first and sex; both are factor variables.
Which one would you say is the explanatory variable? sex, because a person's sex may help explain whether they believe in love at first sight.
Which one would you say is the response variable? love_first, because it is the outcome that may depend on the sex of the respondent.
For each variable, say what type of variable it is: factor or numerical. Both are factor variables.
Why would it not be correct to make density plots to study this Research Question?
The following code makes a two-way table of the responses about love at first sight, broken down by the sex of the respondent:
sexLove <- xtabs(~sex+love_first,data=m111survey)
sexLove
##         love_first
## sex      no yes
##   female 22  18
##   male   23   8
Run this code. How many people in the study were males who believe in love at first sight?
8 males believe in love at first sight (for comparison, 18 females do).
Insert code below to make row percents for the two-way table stored as sexLove:
rowPerc(sexLove)
##         love_first
## sex          no    yes  Total
##   female  55.00  45.00 100.00
##   male    74.19  25.81 100.00
What percentage of the females believe in love at first sight?
45%
What percentage of the males believe in love at first sight?
25.81%
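The row percents can be checked by hand: each count is divided by the total for its own row. The sketch below rebuilds the two-way table from the counts shown earlier (the object names `counts` and `row_pct` are made up for this example) and uses base R's `prop.table()` in place of `rowPerc()`, so it runs without tigerstats.

```r
# Counts copied from the sexLove table above.
counts <- matrix(c(22, 18,
                   23,  8),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("female", "male"),
                                 love_first = c("no", "yes")))

# Divide each row by its own total, then convert to percents.
row_pct <- round(100 * prop.table(counts, margin = 1), 2)
row_pct
```

For example, the female "yes" entry is 18 / (22 + 18) = 45%, and the male "yes" entry is 8 / (23 + 8), or about 25.81%.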
Here is some code (from the slides) that makes a bar-chart to study the relationship between batter hand and type of pitch thrown by Justin Verlander:
barchartGC(~batter_hand+pitch_type,data=verlander,
main="Verlander's Pitches, by Batter Stance",
type="percent")
In the chunk below, write the code to make a bar-chart to study the relationship between sex and love_first in the m111survey data. (Tip: Copy and paste the code from the previous chunk, and then make the necessary changes.)
barchartGC(~sex+love_first,data=m111survey,main= "Love at first sight, by sex",type="percent")
Here’s the code to make favstats() on the speeds of Justin Verlander’s pitches:
favstats(~speed,data=verlander)
In the code chunk below, enter the code to make favstats() for the fastest speeds driven by the students in the m111survey data.
favstats(~fastest,data=m111survey)
##  min   Q1 median    Q3 max     mean      sd  n missing
##   60 90.5    102 119.5 190 105.9014 20.8773 71       0
What is the median of the fastest speeds? 102 mph
The first quartile? 90.5 mph
The third quartile? 119.5 mph
The interquartile range (IQR)? 119.5 - 90.5 = 29 mph
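The IQR is simply the third quartile minus the first quartile. As a small sketch, the arithmetic can be done in R with the two quartiles copied from the favstats() output (the variable names here are made up for this example).

```r
q1 <- 90.5    # first quartile (mph), from the favstats() output
q3 <- 119.5   # third quartile (mph), from the favstats() output

# The IQR is the width of the middle 50% of the data.
iqr <- q3 - q1
iqr
```

With tigerstats loaded, applying base R's `IQR()` to `m111survey$fastest` should give a comparable value directly from the raw data.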
Here’s the code to make a box plot of the fastest speeds, broken down by sex:
bwplot(fastest~sex,data=m111survey,
main="Fastest Speed at GC, by Sex",
xlab="sex",
ylab="speed (mph)")
Make your own code chunk below, and inside it put the code needed for a box-plot of the speed of Justin Verlander’s pitches, broken down by the type of pitch he threw. Tip: After you insert the chunk, copy the code above and paste it into the chunk, then modify it so that it does what you want.
bwplot(speed~pitch_type,data=verlander,main="Speed by Pitch Type",xlab="pitch_type",ylab="speed")
Ten thousand people were asked to give their “temperature rating” of Kim Kardashian, on a scale from 0 to 100. 0 means you don’t like her at all, and 100 means you like her very much. Here is some code for a density plot of the ratings:
densityplot(~kkardashtemp,data=imagpop,
main="Kim Kardashian Ratings",
xlab="rating",
from=0,to=100,
plot.points=FALSE)
Is the distribution symmetric or skewed?
Is the distribution unimodal, bimodal, or neither?
True or False: Most of the ratings were between 40 and 60.
About 68% of Justin Verlander’s four-seam fastballs were between ? and ?. (Give the two speeds.)
About what percentage of the time were his four-seam fastballs faster than 100 miles per hour?
Find a speed so that about 16% of his four-seam fastballs were slower than that speed.
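These three questions can be approached with the 68-95-99.7 rule, assuming the four-seam fastball (FF) speeds are roughly bell-shaped. That assumption is not checked here. The sketch below plugs the FF mean and standard deviation from the favstats() output into a normal model; the variable names are made up for this example.

```r
# FF mean and sd (mph) copied from the favstats() output earlier.
ff_mean <- 96.02916
ff_sd   <- 2.031015

# About 68% of values fall within one sd of the mean (normal model assumed).
middle68 <- c(ff_mean - ff_sd, ff_mean + ff_sd)
middle68

# Approximate percentage of FF pitches faster than 100 mph under the same model.
pct_over_100 <- 100 * pnorm(100, mean = ff_mean, sd = ff_sd, lower.tail = FALSE)
pct_over_100

# Speed with about 16% of FF pitches below it.
speed_16 <- qnorm(0.16, mean = ff_mean, sd = ff_sd)
speed_16
```

Because 100 mph is close to two standard deviations above the mean, the rule of thumb gives roughly 2.5% for the second question. The 16% cutoff lands near one standard deviation below the mean, which is about 94 mph.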
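# Split the sleep values into two slightly overlapping intervals (a lattice shingle) to condition on.
Sleep <- equal.count(m111survey$sleep, number = 2, overlap = 0.1)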
cloud(fastest ~ height * GPA | extra_life * Sleep,
data = m111survey,
layout = c(2,2),
strip = strip.custom(strip.names = c(TRUE, TRUE)),
screen = list(x = -69,
y = 52,
z = 0),
groups = sex,
auto.key = list(
space = "top",
title = "sex of the student ",
cex.title = 0.8,
columns = 2),
pch = 20,
zoom = 0.65,
main = "Graph")