library(tigerstats)

Descriptive Statistics Part I

Verlander Pitching

We are looking at the data frame verlander in the tigerstats package.

We would like to see whether the speed of a Justin Verlander pitch depends on the type of pitch he is throwing.

Practice Part a

We know that the two variables involved in our Research Question are speed and pitch_type.

What kind of variable is the explanatory variable pitch_type: factor or numerical?

Factor

What kind of variable is the response variable speed: factor or numerical?

Numerical
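
One way to double-check these two answers is to ask R directly. The sketch below uses a tiny made-up data frame named toy (hypothetical values, not taken from verlander) so that it runs without tigerstats. With the package loaded, the same calls could be applied to verlander$pitch_type and verlander$speed.

```r
# Hypothetical toy data standing in for verlander (values are made up)
toy <- data.frame(
  pitch_type = factor(c("FF", "CU", "CH", "FF")),
  speed      = c(96.1, 80.3, 86.9, 95.4)
)

is.factor(toy$pitch_type)   # is this column a factor (categorical) variable?
is.numeric(toy$speed)       # is this column a numerical variable?
str(toy)                    # shows the type of every column at once
```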

Practice Part b

Which of the following types of graph could we use to look at the pitching speeds: bar-charts or density plots? Why?

Density plots. Bar charts show counts for the levels of a factor variable, so they cannot display the distribution of a numerical variable such as speed.

Practice Part c

Here is a code chunk containing code from the slides to make density plots to compare the fastest speeds driven by guys and the fastest speeds driven by gals in the m111survey data:

densityplot(~fastest|sex,data=m111survey,
       main="Fastest Speeds, by Sex",
       xlab="Fastest Speed (mph)",
       layout=c(1,2))
favstats(speed~pitch_type,data=verlander)
##   pitch_type  min   Q1 median   Q3   max     mean       sd    n missing
## 1         CH 81.0 85.3   86.7 88.2  91.9 86.91929 2.343242 2550       0
## 2         CU 58.9 79.0   80.2 81.5  86.9 80.23111 1.929778 2716       0
## 3         FF 91.7 94.6   96.0 97.5 102.4 96.02916 2.031015 6756       0
## 4         FT 90.5 94.6   96.2 97.6 102.1 96.06893 2.209033 2021       0
## 5         SL 78.0 84.5   86.3 87.9  93.3 86.16329 2.386953 1264       0
bwplot(speed~pitch_type,data=verlander,
         main="Speed by pitch type",
       xlab="pitch",
       ylab="Speed (mph)")

In the chunk below, write the code you need in order to make density plots to compare the speeds for the various types of pitches that Justin Verlander throws. (Tip: copy and paste the code from the previous chunk into the chunk below. Then modify it to fit the Verlander situation. Big Hint: Find out HOW MANY types of pitches Verlander throws, so you can change the layout argument correctly.)

versmall<-subset(verlander,speed>70)
densityplot(~speed|pitch_type,data=versmall,
       main="Pitch types",
       xlab="Speed (mph)",
       layout=c(1,5))

Practice Part d

Study the same Research Question numerically, now.

Should you use xtabs() to make a two-way table, or should you use favstats()? Why?

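favstats(), because the response variable speed is numerical. xtabs() makes two-way tables, which are for studying the relationship between two factor variables. favstats() gives numerical summaries of speed for each of the 5 pitch types.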

favstats(speed~pitch_type,data=verlander)
##   pitch_type  min   Q1 median   Q3   max     mean       sd    n missing
## 1         CH 81.0 85.3   86.7 88.2  91.9 86.91929 2.343242 2550       0
## 2         CU 58.9 79.0   80.2 81.5  86.9 80.23111 1.929778 2716       0
## 3         FF 91.7 94.6   96.0 97.5 102.4 96.02916 2.031015 6756       0
## 4         FT 90.5 94.6   96.2 97.6 102.1 96.06893 2.209033 2021       0
## 5         SL 78.0 84.5   86.3 87.9  93.3 86.16329 2.386953 1264       0

Practice Part e

Use favstats() to break the speeds down by the type of pitch thrown. (Tip: look back in the slides to find the code used to break down the fastest speed driven by sex. Copy this code into the chunk below, then make the needed changes.)

favstats(speed~pitch_type,data=verlander)
##   pitch_type  min   Q1 median   Q3   max     mean       sd    n missing
## 1         CH 81.0 85.3   86.7 88.2  91.9 86.91929 2.343242 2550       0
## 2         CU 58.9 79.0   80.2 81.5  86.9 80.23111 1.929778 2716       0
## 3         FF 91.7 94.6   96.0 97.5 102.4 96.02916 2.031015 6756       0
## 4         FT 90.5 94.6   96.2 97.6 102.1 96.06893 2.209033 2021       0
## 5         SL 78.0 84.5   86.3 87.9  93.3 86.16329 2.386953 1264       0

Practice Part f

Which types of pitches are the fastest? Which types are the slowest?

FT and FF are the fastest (mean speeds of about 96 mph each, with FT slightly higher). CU is the slowest (mean speed of about 80 mph).
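
The comparison can also be done in R rather than by eye. This is a minimal sketch that copies the group means from the favstats() output above into a named vector (the vector name means is just for illustration) and ranks them.

```r
# Group means copied from the favstats() output above
means <- c(CH = 86.91929, CU = 80.23111, FF = 96.02916,
           FT = 96.06893, SL = 86.16329)

sort(means, decreasing = TRUE)   # pitch types ordered from fastest to slowest
names(which.max(means))          # type with the highest mean speed
names(which.min(means))          # type with the lowest mean speed
```

Ranking by the median or the maximum instead of the mean is a reasonable alternative and may order the two fastball types differently.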

Love at First Sight

Now we will look at the m111survey data, and ask the Research Question:

Who is more likely to believe in love at first sight: a GC gal or a GC guy?

Practice Part a

What are the two variables from the m111survey data frame that are involved in this Research Question?

love_first and sex

Which one would you say is the explanatory variable?

sex

Which one would you say is the response variable?

love_first

For each variable, say what type of variable it is: factor or numerical.

Practice Part b

Why would it not be correct to make density plots to study this Research Question?

Density plots are for numerical variables, and both of these variables are factors.

Practice Part c

The following code makes a two-way table of the responses about love at first sight, broken down by the sex of the respondent:

sexLove <- xtabs(~sex+love_first,data=m111survey)
sexLove
##         love_first
## sex      no yes
##   female 22  18
##   male   23   8

Run this code. How many people in the study were males who believe in love at first sight?

8

Practice Part c

Insert code below to make row percents for the two-way table stored as sexLove:

rowPerc(sexLove)
##         love_first
## sex          no    yes  Total
##   female  55.00  45.00 100.00
##   male    74.19  25.81 100.00

What percentage of the females believe in love at first sight?

45%

What percentage of the males believe in love at first sight?

26%
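
To see where these row percents come from, here is a sketch in base R that rebuilds the table from the counts printed above and divides each row by its row total. The object names counts and rowPct are illustrative. rowPerc() from tigerstats reports the same percentages and also adds a Total column.

```r
# Counts copied from the sexLove table above
counts <- matrix(c(22, 18,
                   23,  8),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(sex = c("female", "male"),
                                 love_first = c("no", "yes")))

# Divide each row by its row total, then convert to percents
rowPct <- round(100 * prop.table(counts, margin = 1), 2)
rowPct
```

For example, the female "yes" entry is 18 out of 22 + 18 = 40 females, and the male "yes" entry is 8 out of 23 + 8 = 31 males.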

Practice Part d

Here is some code (from the slides) that makes a bar-chart to study the relationship between batter hand and type of pitch thrown by Justin Verlander:

barchartGC(~batter_hand+pitch_type,data=verlander,
           main="Verlander's Pitches, by Batter Stance",
           type="percent")

In the chunk below, write the code to make a bar-chart to study the relationship between sex and love_first in the m111survey data. (Tip: Copy and paste the code from the previous chunk, and then make the necessary changes.)

barchartGC(~sex+love_first,data=m111survey,
           main="Love at First Sight, by Sex",
           type="percent")

Medians and Boxplots

Practice a

Here’s the code to make favstats() on the speeds of Justin Verlander’s pitches:

favstats(~speed,data=verlander)

In the code chunk below, enter the code to make favstats() for the fastest speeds driven by the students in the m111survey data.

favstats(~fastest,data=m111survey)
##  min   Q1 median    Q3 max     mean      sd  n missing
##   60 90.5    102 119.5 190 105.9014 20.8773 71       0

What is the median of the fastest speeds?

102

The first quartile? 90.5

The third quartile? 119.5

The interquartile range (IQR)?

29
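
The IQR is the third quartile minus the first quartile. As a quick check of the arithmetic, using the two quartiles reported by favstats() above (the names q1, q3 and iqr are illustrative):

```r
q1 <- 90.5     # first quartile from the favstats() output
q3 <- 119.5    # third quartile from the favstats() output

iqr <- q3 - q1 # width of the middle half of the data
iqr
```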

Practice b

Here’s the code to make a box plot of the fastest speeds, broken down by sex:

bwplot(fastest~sex,data=m111survey,
       main="Fastest Speed at GC, by Sex",
       xlab="sex",
       ylab="speed (mph)")

Make your own code chunk below, and inside it put the code needed for a box-plot of the speed of Justin Verlander’s pitches, broken down by the type of pitch he threw. Tip: After you insert the chunk, copy the code above and paste it into the chunk, then modify it so that it does what you want.

bwplot(speed~pitch_type,data=versmall,
       main="Speed of pitches by pitch type",
       xlab="pitch type",
       ylab="speed (mph)")

Kim Kardashian

Ten thousand people were asked to give their “temperature rating” of Kim Kardashian, on a scale from 0 to 100. 0 means you don’t like her at all, and 100 means you like her very much. Here is some code for a density plot of the ratings:

densityplot(~kkardashtemp,data=imagpop,
            main="Kim Kardashian Ratings",
            xlab="rating",
            from=0,to=100,
            plot.points=FALSE)

Is the distribution symmetric or skewed?

Is the distribution unimodal, bimodal, or neither?

True or False: Most of the ratings were between 40 and 60.

Justin Verlander Again

Practice Part a

About 68% of Justin Verlander’s four-seam fastballs were between ? and ?? (Give the two speeds.)

Practice Part b

About what percentage of the time were his four-seam fastballs faster than 100 miles per hour?

Practice Part c

Find a speed so that about 16% of his four-seam fastballs were slower than that speed.
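
One possible way to approach Parts a through c is the 68-95 rule of thumb. The sketch below assumes that the four-seam fastball (FF) speeds are roughly bell-shaped, which this document does not verify, and it takes the FF mean and standard deviation from the favstats() output earlier in the document. The names m, s and z are illustrative.

```r
# Mean and SD of four-seam fastballs (FF), copied from the favstats() output
m <- 96.02916
s <- 2.031015

# Part a: about 68% of values lie within one SD of the mean
c(lower = m - s, upper = m + s)

# Part b: express 100 mph as a number of SDs above the mean
z <- (100 - m) / s
z              # close to 2, so the rule of thumb suggests roughly 2.5%
1 - pnorm(z)   # estimate from a normal model, for comparison

# Part c: about 16% of values lie below one SD under the mean
m - s
```

The 16% in Part c comes from splitting the 32% that falls outside one SD evenly between the two tails.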

Graphs Are Fun

Sleep<- equal.count(m111survey$sleep, number = 2, overlap = 0.1)
cloud(height ~ GPA * fastest | enough_Sleep * Sleep,
    data = m111survey,
    layout = c(2,2),
    screen = list(x = -91,
            y = 24,
            z = 360),
    groups = extra_life,
    auto.key = list(
        space = "top",
        title = "extra_life",
        cex.title = 1,
        columns = 1),
    pch = 25,
    zoom = 0.65)