Basic Descriptive Stats Exercise

Galton Exercises

Read in the Data

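The data we will use are called ‘Galton’ and become available once the ‘mosaic’ package is loaded. They record the heights (in inches) of children and of their fathers and mothers. They also give each child’s sex, a family identifier, and the number of children in the family.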

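library(mosaic)   # provides the Galton data and formula-based mean() / sd()

data(Galton)      # load the Galton data set
str(Galton)       # show the structure of the data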

## 'data.frame':    898 obs. of  6 variables:
##  $ family: Factor w/ 197 levels "1","10","100",..: 1 1 1 1 108 108 108 108 123 123 ...
##  $ father: num  78.5 78.5 78.5 78.5 75.5 75.5 75.5 75.5 75 75 ...
##  $ mother: num  67 67 67 67 66.5 66.5 66.5 66.5 64 64 ...
##  $ sex   : Factor w/ 2 levels "F","M": 2 1 1 1 2 2 1 1 2 1 ...
##  $ height: num  73.2 69.2 69 69 73.5 72.5 65.5 65.5 71 68 ...
##  $ nkids : int  4 4 4 4 4 4 4 4 2 2 ...

The str() command gives a compact overview of the data: the number of observations and variables, plus each variable's type and first few values.

Calculate the mean of children’s heights

mean(~ height, data = Galton)

## [1] 66.76
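The mean is the sum of the values divided by how many values there are. Here is a minimal base-R check on a small vector. The values in `heights` are made up for illustration and are not taken from Galton.

```r
# Illustrative heights in inches (made-up values, not the Galton data)
heights <- c(64, 66, 68, 70, 72)

# Arithmetic mean by hand: sum of the values divided by their count
mean_by_hand <- sum(heights) / length(heights)

mean_by_hand   # 68
mean(heights)  # base R's mean() agrees: 68
```

With the Galton data loaded, base R's `mean(Galton$height)` should give the same 66.76 as the mosaic call.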

Calculate the standard deviation of children’s heights

sd(~ height, data = Galton)

## [1] 3.583
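The sample standard deviation is the square root of the summed squared deviations from the mean, divided by n - 1. A small base-R sketch on the same kind of made-up vector (illustrative values only) shows that this matches `sd()`:

```r
# Illustrative heights in inches (made-up values, not the Galton data)
heights <- c(64, 66, 68, 70, 72)
n <- length(heights)

# Deviations from the mean (68): -4 -2 0 2 4
deviations <- heights - mean(heights)

# Sample SD uses n - 1 in the denominator: sqrt(40 / 4) = sqrt(10)
sd_by_hand <- sqrt(sum(deviations^2) / (n - 1))

sd_by_hand   # about 3.162
sd(heights)  # same value
```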

Make a histogram of children’s heights

hist(Galton$height, main = '', xlab = 'Height (inches)', ylab = 'Frequency', xlim = c(55, 80))

(Figure: histogram of children’s heights, in inches.)
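A histogram sorts values into bins and draws the count in each bin. Setting `plot = FALSE` makes `hist()` return those counts instead of drawing them. The breaks below are chosen for illustration and the values are made up. The plot above lets `hist()` choose its own breaks (Sturges' rule by default), and `xlim` only sets the axis range, not the bins.

```r
# Made-up heights in inches
x <- c(60, 61.5, 63, 64, 64.5, 66, 70)

# Bins of width 5; by default intervals are right-closed, (a, b],
# and the lowest interval also includes its left edge: [55, 60]
h <- hist(x, breaks = seq(55, 80, by = 5), plot = FALSE)

h$breaks  # 55 60 65 70 75 80
h$counts  # 1 4 2 0 0
```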

New Data Exercises

Get the data from Google Drive

library(mosaic)   # provides fetchGoogle()
library(RCurl)    # used here by fetchGoogle() to download the sheet

These data come from a fairly close replication of Baumeister et al. (1998). We induce ego depletion by asking participants to count every letter e in a passage. The control condition counts with no rules, and the depletion condition counts with many rules.

ego_data <- fetchGoogle("https://docs.google.com/spreadsheet/pub?key=0Ampua78j1HupdHpBakRhRVZfQ1RQRVpvbm1ycVpqSlE&output=csv")   # read the published sheet as a data frame
str(ego_data)

## 'data.frame':    46 obs. of  28 variables:
##  $ start_time     : Factor w/ 45 levels "1/27/2014 10:10:00",..: 17 18 19 19 20 21 22 23 24 25 ...
##  $ stop_time      : Factor w/ 46 levels "1/27/2014 10:17:00",..: 17 19 18 21 20 22 35 23 24 25 ...
##  $ min_to_complete: num  7.32 7.62 5.25 14.52 8.43 ...
##  $ condition      : int  1 1 2 2 1 1 2 2 1 1 ...
##  $ count          : int  54 55 17 21 48 53 36 16 57 50 ...
##  $ e_timer        : num  114.3 114.8 160.7 174.4 68.8 ...
##  $ lett_diff      : int  4 3 5 6 2 5 3 5 3 5 ...
##  $ lett_conc      : int  6 5 6 7 6 6 6 6 6 7 ...
##  $ ana1_NIEDM     : Factor w/ 19 levels "","DEMIN","denim",..: 5 9 1 11 7 3 1 11 1 11 ...
##  $ ana2_LEESTC    : Factor w/ 19 levels "","celest","celest ",..: 1 1 1 1 7 13 1 18 1 14 ...
##  $ ana3_SDETRE    : Factor w/ 20 levels "","desert","Desert",..: 1 5 1 20 12 5 1 7 1 7 ...
##  $ ana4_HRBOT     : Factor w/ 16 levels "","broth","Broth",..: 1 11 12 11 10 11 7 15 15 11 ...
##  $ ana5_NELMO     : Factor w/ 19 levels "","lemon","Lemon",..: 4 10 3 18 11 11 11 1 1 2 ...
##  $ ana6_MILES     : Factor w/ 30 levels "","limes, miles",..: 16 3 1 30 5 26 9 20 14 18 ...
##  $ ana_time       : num  214 190.9 67.7 316.9 244.2 ...
##  $ ana1_score     : int  1 2 0 1 1 1 0 1 0 1 ...
##  $ ana2_score     : int  0 0 0 0 0 2 0 2 0 1 ...
##  $ ana3_score     : int  0 2 0 0 1 2 0 1 0 1 ...
##  $ ana4_score     : int  0 1 1 1 1 1 2 2 2 1 ...
##  $ ana5_score     : int  1 2 1 2 1 1 1 0 0 1 ...
##  $ ana6_score     : int  2 3 0 3 1 2 2 4 2 3 ...
##  $ anagram_sum    : int  4 10 2 7 5 9 5 10 4 8 ...
##  $ anagram_avg    : num  0.667 1.667 0.333 1.167 0.833 ...
##  $ ana_clicks     : int  12 9 4 13 20 8 8 13 6 17 ...
##  $ gender         : int  2 2 2 2 2 2 1 2 1 2 ...
##  $ age            : int  19 22 19 19 21 21 20 21 21 19 ...
##  $ race           : int  4 6 4 4 5 NA 6 6 4 6 ...
##  $ suspic         : Factor w/ 42 levels "","all of the anagrams had 'e's in them",..: 42 16 18 6 36 22 1 21 5 37 ...
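`fetchGoogle()` is not part of base R and may be missing from recent mosaic releases. Base R's `read.csv()` can read a published CSV directly when given its URL as the file argument. The sketch below reads from inline text instead, so it runs offline. The four columns and their values are a hypothetical excerpt in the same layout as `ego_data`, not real rows.

Two details explain the `str()` output above. In R 4.0.0 and later, `read.csv()` defaults to `stringsAsFactors = FALSE`, so `stringsAsFactors = TRUE` is needed to get Factor columns like `suspic`. Also, blank cells in numeric columns are read as `NA`.

```r
# Hypothetical rows in the same layout as ego_data (made-up values)
csv_text <- "condition,count,race,suspic
1,54,4,none
2,17,,the rules were annoying
1,48,5,none"

# With the real sheet, pass its published CSV URL as the file argument instead of text =
toy <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(toy)
# condition, count, and race are integers (the blank race cell becomes NA);
# suspic is a factor with 2 levels
```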

Descriptive Statistics

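Let’s use the describe() function from the psych package to get some basic descriptive statistics. Variables marked with * are factors, which describe() converts to numeric codes, so their statistics are not meaningful. The same caution applies to integer-coded categories such as condition, gender, and race.

library(psych)
describe(ego_data)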

##                 vars  n    mean      sd median trimmed    mad   min
## start_time*        1 46   22.91   13.00  22.50   22.89  16.31  1.00
## stop_time*         2 46   23.50   13.42  23.50   23.50  17.05  1.00
## min_to_complete    3 46   99.40  339.75   9.02   10.59   4.46  2.22
## condition          4 46    1.48    0.51   1.00    1.47   0.00  1.00
## count              5 46   36.07   18.79  42.00   36.47  23.72  9.00
## e_timer            6 46  166.99   84.67 148.25  158.45  68.29 24.50
## lett_diff          7 46    4.22    1.47   5.00    4.32   1.48  1.00
## lett_conc          8 46    5.85    1.07   6.00    6.00   0.00  2.00
## ana1_NIEDM*        9 46    7.57    5.52   6.50    7.21   6.67  1.00
## ana2_LEESTC*      10 46    6.52    6.09   4.50    5.95   5.19  1.00
## ana3_SDETRE*      11 46    6.85    6.14   5.00    6.24   5.93  1.00
## ana4_HRBOT*       12 46    9.65    4.72  11.00    9.95   3.71  1.00
## ana5_NELMO*       13 46    7.74    5.79   6.50    7.37   6.67  1.00
## ana6_MILES*       14 46   13.13    8.62  11.50   12.71   9.64  1.00
## ana_time          15 46 1627.23 9308.00 217.27  236.00 104.52 36.31
## ana1_score        16 46    0.89    0.53   1.00    0.87   0.00  0.00
## ana2_score        17 46    0.57    0.75   0.00    0.47   0.00  0.00
## ana3_score        18 46    0.87    0.72   1.00    0.82   0.00  0.00
## ana4_score        19 46    1.11    0.60   1.00    1.13   0.00  0.00
## ana5_score        20 46    1.13    0.62   1.00    1.16   0.00  0.00
## ana6_score        21 46    1.96    1.11   2.00    1.89   1.48  0.00
## anagram_sum       22 46    6.52    3.13   6.00    6.47   2.97  0.00
## anagram_avg       23 46    1.09    0.52   1.00    1.08   0.49  0.00
## ana_clicks        24 46   11.83    6.09  11.00   11.42   5.19  1.00
## gender            25 46    1.72    0.46   2.00    1.76   0.00  1.00
## age               26 46   19.54    1.19  19.00   19.50   1.48 18.00
## race              27 45    4.16    1.00   4.00    4.16   0.00  1.00
## suspic*           28 46   20.85   12.11  19.50   20.79  14.83  1.00
##                      max    range  skew kurtosis      se
## start_time*        45.00    44.00  0.02    -1.25    1.92
## stop_time*         46.00    45.00  0.00    -1.28    1.98
## min_to_complete  1662.60  1660.38  3.57    11.54   50.09
## condition           2.00     1.00  0.08    -2.04    0.07
## count              59.00    50.00 -0.14    -1.79    2.77
## e_timer           413.01   388.52  1.01     0.71   12.48
## lett_diff           6.00     5.00 -0.61    -0.87    0.22
## lett_conc           7.00     5.00 -1.91     4.69    0.16
## ana1_NIEDM*        19.00    18.00  0.32    -1.30    0.81
## ana2_LEESTC*       19.00    18.00  0.58    -1.24    0.90
## ana3_SDETRE*       20.00    19.00  0.67    -1.02    0.90
## ana4_HRBOT*        16.00    15.00 -0.68    -0.87    0.70
## ana5_NELMO*        19.00    18.00  0.37    -1.31    0.85
## ana6_MILES*        30.00    29.00  0.35    -1.22    1.27
## ana_time        63373.51 63337.20  6.34    39.10 1372.39
## ana1_score          2.00     2.00 -0.13     0.33    0.08
## ana2_score          2.00     2.00  0.87    -0.74    0.11
## ana3_score          3.00     3.00  0.54     0.14    0.11
## ana4_score          2.00     2.00 -0.04    -0.40    0.09
## ana5_score          2.00     2.00 -0.08    -0.52    0.09
## ana6_score          4.00     4.00  0.46    -0.80    0.16
## anagram_sum        14.00    14.00  0.33    -0.62    0.46
## anagram_avg         2.33     2.33  0.33    -0.62    0.08
## ana_clicks         26.00    25.00  0.62    -0.39    0.90
## gender              2.00     1.00 -0.93    -1.15    0.07
## age                22.00     4.00  0.37    -1.08    0.18
## race                6.00     5.00 -0.57     3.04    0.15
## suspic*            42.00    41.00  0.05    -1.18    1.79
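In a few columns the mean sits far above the median: min_to_complete (99.40 vs 9.02) and ana_time (1627.23 vs 217.27). A handful of extreme values, visible in their maximums, pull the mean upward. The sketch below reproduces several of describe()'s columns with base R on a made-up vector containing one outlier.

It assumes describe()'s defaults as I understand them:

- `trimmed` is a 10% trimmed mean.
- `mad` is `stats::mad()`, scaled by 1.4826.
- `se` is sd / sqrt(n).
- Missing values are dropped, which is why race shows n = 45.

```r
# Made-up completion times in minutes; the last value is an extreme outlier
times <- c(6, 7, 8, 9, 9, 10, 11, 12, 13, 500)
n <- length(times)

mean(times)              # 58.5: pulled far upward by the outlier
median(times)            # 9.5: barely affected
mean(times, trim = 0.1)  # 9.875: drops the lowest and highest 10% first
mad(times)               # median absolute deviation (2) times 1.4826
sd(times) / sqrt(n)      # standard error of the mean, like describe()'s se

# A single NA makes mean() return NA unless missing values are removed
ages <- c(19, 21, NA, 20)
mean(ages)                # NA
mean(ages, na.rm = TRUE)  # 20
```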

Paul’s Stats Data

textbook_data <- fetchGoogle("https://docs.google.com/spreadsheet/pub?key=0Ampua78j1HupdE5OaDFOZjlNbWxwdXRYeHJJWUR0SkE&output=csv")
str(textbook_data)

## 'data.frame':    30 obs. of  3 variables:
##  $ Price: num  4.25 5.95 7 6.5 7 ...
##  $ Pages: int  57 194 51 104 294 140 336 150 600 91 ...
##  $ Year : int  2006 1972 2004 2005 2002 1991 1973 2003 2004 1997 ...
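The same descriptive statistics can be computed for every column at once with `sapply()`. This sketch uses only the first five rows as printed by `str()` above. Note that `str()` rounds numbers to about three significant digits, so the Price values are as displayed. With the full data loaded, `sapply(textbook_data, mean)` would cover all 30 rows.

```r
# First five rows of textbook_data, as displayed in the str() output
first_five <- data.frame(
  Price = c(4.25, 5.95, 7, 6.5, 7),
  Pages = c(57L, 194L, 51L, 104L, 294L),
  Year  = c(2006L, 1972L, 2004L, 2005L, 2002L)
)

# Apply the same summary function to each column
col_means <- sapply(first_five, mean)
col_sds   <- sapply(first_five, sd)

col_means  # Price 6.14, Pages 140, Year 1997.8
col_sds
```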