Week 1 and 2

Download R and RStudio

What is R?

  • R is a free software environment originally designed for statistical computing and graphics. It has grown considerably in capability and usability over the years. Now it is an excellent “jackknife” for anything that you might want to do with data.
  • R is open-source and open-development, which means that thousands of users contribute improvements and enhancements, without changing the basic functionality.


Why use R?

  • It’s free!
  • Statistics plus data management/representation/analysis
  • Great graphics
  • Ever-expanding capabilities (just about any statistics under the sun)
  • Can easily import data from most other major programs (including SPSS, SAS, Stata, Excel)
  • Beautifully-formatted output (R notebooks, R bookdown)
  • Not too hard to learn (especially after this workshop!)
  • Comprehensive help resources (internal help; r-bloggers.com; Google search)
  • Excellent video tutorials: MarinStatsLectures

Check out the following opinions: Opinion 1. Opinion 2. Opinion 3. Opinion 4.


Capabilities of R:

What is RStudio?


RStudio is a graphical user interface for R which includes a set of integrated tools designed to help you be more productive with R. It includes:

  • Console, which displays executed R commands
  • An editor (with syntax highlighting). Code may be executed directly from the editor.
  • History viewer (record of past commands)
  • Environment viewer (displays all variables in your current workspace)
  • Package installer (adds new capabilities to R)
  • Plot viewer / exporter
  • Help window


Downloading and installing R and RStudio (Windows)



Exercises:

  1. Install R (either from USB or from the Internet)
  2. Install RStudio (either from USB or from the Internet)

Note: Once R and RStudio are installed, you do not need to start R separately, because RStudio starts it for you.
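
A quick way to confirm that the installation worked (a minimal sketch, not part of the original exercises) is to type the following into the RStudio console. It uses only base R, so no packages are needed.

```r
# Print the version of R that RStudio started
r_version <- R.version.string
print(r_version)

# A trivial calculation confirms that the console evaluates commands
two_plus_two <- 2 + 2
print(two_plus_two)
```

If both lines print a result without an error, R and RStudio are talking to each other.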


Online discussion 1:

Measurement Scales:

There are 4 types of measurement scales:

  1. Nominal/Categorical

  2. Ordinal

  3. Interval

  4. Ratio

These are simply ways to categorize different types of variables. 

1. Nominal/Categorical:

o Used to label variables without any quantitative value.

o “Nominal” scales could also be called “labels.” 

o Scales are mutually exclusive.

o Categories have no intrinsic ranking.

Examples:

o Gender

o Zip code

o Eye color

2. Ordinal:

o Allows for rank order (1st, 2nd, 3rd, etc.) by which data can be sorted.

o Does not indicate the relative degree of difference between ranks.

Examples:

o ‘sick’ vs. ‘healthy’ when measuring health,

o ‘guilty’ vs. ‘not-guilty’ when making judgments in courts,

o ‘wrong/false’ vs. ‘right/true’ when measuring truth value,

o a spectrum of values, such as ‘completely agree’, ‘mostly agree’, ‘mostly disagree’, ‘completely disagree’ when measuring opinion.

3. Interval:

o Interval scales provide information about order,

o Possess equal intervals.

Examples:

o Temperature (in degrees Celsius or Fahrenheit),

o Time of day.

4. Ratio:

o Has an absolute zero (a point where none of the quality being measured exists),

o Tells us the exact value between units,

o Quantitative variable

Examples:

o Salary

o Height

o Weight
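
The four scales map naturally onto R data types. The sketch below is one possible way to represent them; the variable names and values are illustrative assumptions, not course data. Nominal variables become unordered factors, ordinal variables become ordered factors, and interval and ratio variables are stored as plain numbers.

```r
# Nominal: labels with no intrinsic ranking -> unordered factor
eye_color <- factor(c("brown", "blue", "green", "brown"))

# Ordinal: ranked categories -> ordered factor (levels listed low to high)
opinion <- factor(
  c("mostly agree", "completely disagree", "completely agree"),
  levels = c("completely disagree", "mostly disagree",
             "mostly agree", "completely agree"),
  ordered = TRUE
)

# Interval: differences are meaningful, but there is no absolute zero
temp_celsius <- c(10, 20)

# Ratio: absolute zero exists, so ratios are meaningful
height_cm <- c(150, 180)

is.ordered(eye_color)         # FALSE: categories cannot be ranked
is.ordered(opinion)           # TRUE: categories have a rank order
opinion[1] < opinion[3]       # TRUE: "mostly agree" ranks below "completely agree"
diff(temp_celsius)            # 10: a meaningful difference in degrees
height_cm[2] / height_cm[1]   # 1.2: the second height is 1.2 times the first
```

Note that R will happily compute 20 / 10 for the temperatures as well; it is up to the analyst to know that "twice as hot" is not a meaningful statement on an interval scale.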

COMPLETE ONLINE DISCUSSION 1 ON CANVAS

TExES Domain IV

Competency 015: The teacher understands how to use appropriate graphical and numerical techniques to explore data, characterize patterns and describe departures from patterns.

The beginning teacher:

  1. Selects and uses an appropriate measurement scale (i.e., nominal, ordinal, interval, ratio) to answer research questions and analyze data.

  2. Organizes, displays and interprets data in a variety of formats (e.g., tables, frequency distributions, scatterplots, stem-and-leaf plots, box-and-whisker plots, histograms, pie charts).

Week 3

Import files into R

Consider the following data file on the body temperatures of ten US males.

https://www.amazon.com/clouddrive/share/RJLhFeGmPR8j4b4dQDUzjuxbnhDLhIKqabQvJCKDnER

Watch the following video on how to import a .csv file into R. https://www.amazon.com/clouddrive/share/bcK8ZluX3i45PvJaQ5Omwc0ii53iVzRJx1jcrYIAbp9
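
For readers who prefer code to video, the sketch below shows the general pattern for importing a .csv file with base R's read.csv(). So that it runs on its own, it first writes a tiny stand-in file; the file name, column names and values are placeholders (assumptions), not the course data file.

```r
# Create a small stand-in CSV so the example is self-contained.
# Column names and values are placeholders, NOT the course data.
csv_path <- file.path(tempdir(), "bodytemp_example.csv")
writeLines(c("id,temperature",
             "1,98.2",
             "2,98.6",
             "3,97.9"), csv_path)

# read.csv() reads a comma-separated file into a data frame;
# header = TRUE treats the first row as column names
bodytemp <- read.csv(csv_path, header = TRUE)

str(bodytemp)    # structure: column names and types
head(bodytemp)   # first few rows
```

For the real file, replace csv_path with the location of the downloaded file on your computer, or use file.choose() to pick it from a dialog.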

This week we cover the following topics:


Histograms and numerical summaries


A histogram is a visual representation of the distribution of a dataset. Its shape lets you see at a glance where most of the data lie: where the middle of the distribution is located, how closely the data cluster around the middle, and where possible outliers are to be found. As shown in the figures below, a histogram consists of an x-axis, a y-axis, and bars of different heights. The x-axis is divided into intervals (called “bins”), and on each bin a vertical bar is constructed whose height represents the number of data values within that bin. Note that histograms (unlike bar charts) don’t have gaps between the bars; if there appears to be a gap, that particular bin has no data in it.


Example: Suppose you are interested in the distribution of ages for employees working in a certain office. The following data is available: 36, 25, 38, 46, 55, 68, 72, 55, 36, 38, 67, 45, 22, 48, 91, 46, 52, 61, 58, 55. We use R to construct a histogram to represent the distribution of the data.

age<-c(36, 25, 38, 46, 55, 68, 72, 55, 36, 38, 67, 45, 22, 48, 91, 46, 52, 61, 58, 55)
hist(age)

The output appears under the ‘Plots’ tab, and looks like this:
[Histogram of age]

The ‘hist’ command has many options that enable the user to change the display. For example, the user can control the number of bins with the ‘breaks’ option, the title of the histogram with the ‘main’ option, and the x- and y-axis labels with the ‘xlab’ and ‘ylab’ options.
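
One detail about ‘breaks’ is worth knowing: when it is a single number, R treats it as a suggestion and chooses round ("pretty") bin boundaries, so the number of bins drawn can differ from the number requested. The sketch below, which reuses the employee ages from the example above, shows one way to inspect the bins R actually used without drawing the plot.

```r
age <- c(36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
         67, 45, 22, 48, 91, 46, 52, 61, 58, 55)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(age, breaks = 7, plot = FALSE)

h$breaks            # the bin boundaries R actually chose
h$counts            # number of ages falling in each bin
sum(h$counts > 0)   # how many bins contain at least one value
```

To force exact boundaries, pass a vector instead of a single number, for example breaks = seq(20, 100, by = 10).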


Example: The following command creates a histogram with 7 nonempty bins, with title “Age of Employees” and x label “Employee ages”:

hist(age,breaks=7,main="Age of Employees",xlab="Employee ages")

The output appears under the ‘Plots’ tab, and looks like this:
[Histogram of age]

XY plots

The command ‘xyplot’ can be used to plot one variable against another. The command uses the ‘lattice’ package, so before using it you must load the package.


Example: Load a new package called ‘lattice’.

library(lattice)

If you get an error message, it probably means you haven’t installed ‘lattice’. In this case, go back to “R_RStudioWindows” and follow the instructions found in the section ‘Packages window’.

To demonstrate ‘xyplot’ we will be using data from the ‘mosaicData’ package, so you must load this package as well.
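
As a minimal sketch of what an ‘xyplot’ call looks like, the example below uses R's built-in ‘cars’ data set as a stand-in (an assumption made so that the code runs without ‘mosaicData’). The formula y ~ x puts the first variable on the vertical axis and the second on the horizontal axis.

```r
library(lattice)

# 'cars' ships with R: speed (mph) and stopping distance (ft) for 50 cars
p <- xyplot(dist ~ speed, data = cars,
            main = "Stopping distance vs. speed",
            xlab = "Speed (mph)",
            ylab = "Stopping distance (ft)")

print(p)   # lattice plots are objects; print() draws them
```

The same pattern applies to any data frame: replace ‘cars’ with the data frame name and ‘dist’ and ‘speed’ with two of its numeric columns.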


Week 4

Install the package ‘mosaic’:

install.packages('mosaic')

Install the package ‘mosaicData’:

install.packages('mosaicData')

Load the package ‘mosaic’:

require(mosaic)

Load the package ‘mosaicData’:

require(mosaicData)
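
Since install.packages() downloads a package every time it is called, it can be convenient to install only what is missing. The helper below is a hypothetical convenience function (its name, missing_packages, is not from any package); it relies only on base R's requireNamespace().

```r
# Return the names in 'pkgs' that are not currently installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("mosaic", "mosaicData")
to_install <- missing_packages(needed)
print(to_install)   # character(0) means everything is already installed

# Uncomment to install only what is missing (requires an Internet connection):
# if (length(to_install) > 0) install.packages(to_install)
```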

We set the default number of digits to 2:

options(digits =2)

Consider the HELPrct (Health Evaluation and Linkage to Primary Care) data set that can be found under the “mosaicData” package. The HELP study was a clinical trial for adult inpatients recruited from a detoxification unit. Patients with no primary care physician were randomized to receive a multidisciplinary assessment and a brief motivational intervention or usual care, with the goal of linking them to primary medical care.

This is a data frame with 453 observations on the following variables.

age         subject age at baseline (in years)
anysub      use of any substance post-detox: a factor with levels no, yes
cesd        Center for Epidemiologic Studies Depression measure at baseline (high scores indicate more depressive symptoms)
d1          lifetime number of hospitalizations for medical problems (measured at baseline)
daysanysub  time (in days) to first use of any substance post-detox
dayslink    time (in days) to linkage to primary care
drugrisk    Risk Assessment Battery drug risk scale at baseline
e2b         number of times in past 6 months entered a detox program (measured at baseline)
female      0 for male, 1 for female
sex         a factor with levels male, female
g1b         experienced serious thoughts of suicide in last 30 days (measured at baseline): a factor with levels no, yes
homeless    housing status: a factor with levels housed, homeless
i1          average number of drinks (standard units) consumed per day, in the past 30 days (measured at baseline)
i2          maximum number of drinks (standard units) consumed per day, in the past 30 days (measured at baseline)
id          subject identifier
indtot      Inventory of Drug Use Consequences (InDUC) total score (measured at baseline)
linkstatus  post-detox linkage to primary care (0 = no, 1 = yes)
link        post-detox linkage to primary care: no, yes
mcs         SF-36 Mental Component Score (measured at baseline, lower scores indicate worse status)
pcs         SF-36 Physical Component Score (measured at baseline, lower scores indicate worse status)
pss_fr      perceived social support by friends (measured at baseline, higher scores indicate more support)
racegrp     race/ethnicity: levels black, hispanic, other, white
satreat     any BSAS substance abuse treatment at baseline: no, yes
sexrisk     Risk Assessment Battery sex risk score (measured at baseline)
substance   primary substance of abuse: alcohol, cocaine, heroin
treat       randomized to HELP clinic: no, yes


We find the mean of the cesd variable (the Center for Epidemiologic Studies Depression measure at baseline; high scores indicate more depressive symptoms):

mean(HELPrct$cesd)

The result is approximately 33 (32.85 before rounding to two digits).

The standard deviation is:


sd(HELPrct$cesd)

The result is approximately 13.

The variance is:


var(HELPrct$cesd)

The result is approximately 157.
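
The variance is the square of the standard deviation. The values 13 and 157 reported above do not appear to satisfy this (13 squared is 169) only because options(digits = 2) rounds what is displayed; the square root of 157 is about 12.5. The sketch below checks the relationship on the employee ages from Week 3, used here as a stand-in so that the code runs without ‘mosaicData’.

```r
age <- c(36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
         67, 45, 22, 48, 91, 46, 52, 61, 58, 55)

s <- sd(age)    # sample standard deviation
v <- var(age)   # sample variance

# The unrounded values agree: sd squared equals the variance
all.equal(s^2, v)

# Rounding first and squaring afterwards generally does not
signif(s, 2)^2
signif(v, 2)
```

The same check works for the course data: sd(HELPrct$cesd)^2 should agree with var(HELPrct$cesd).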

We can also calculate the median:


median(HELPrct$cesd)

which is 34.

We can use the “summary” command to print out the minimum, maximum, mean, median, and first and third quartiles:

library(mosaic)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: ggformula
## Loading required package: ggplot2
## 
## New to ggformula?  Try the tutorials: 
##  learnr::run_tutorial("introduction", package = "ggformula")
##  learnr::run_tutorial("refining", package = "ggformula")
## Loading required package: mosaicData
## Loading required package: Matrix
## 
## The 'mosaic' package masks several functions from core packages in order to add 
## additional features.  The original behavior of these functions should not be affected by this.
## 
## Note: If you use the Matrix package, be sure to load it BEFORE loading mosaic.
## 
## Attaching package: 'mosaic'
## The following object is masked from 'package:Matrix':
## 
##     mean
## The following objects are masked from 'package:dplyr':
## 
##     count, do, tally
## The following objects are masked from 'package:stats':
## 
##     binom.test, cor, cor.test, cov, fivenum, IQR, median,
##     prop.test, quantile, sd, t.test, var
## The following objects are masked from 'package:base':
## 
##     max, mean, min, prod, range, sample, sum
summary(HELPrct$cesd)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   25.00   34.00   32.85   41.00   60.00

Min.  1st Qu.  Median  Mean  3rd Qu.  Max.
   1       25      34    33       41    60
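
The individual pieces of a summary can also be requested one at a time with base R's quantile(), median() and IQR(). The sketch below uses the employee ages from Week 3 as a stand-in so that it runs without ‘mosaicData’.

```r
age <- c(36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
         67, 45, 22, 48, 91, 46, 52, 61, 58, 55)

summary(age)

# First quartile, median and third quartile
q <- quantile(age, probs = c(0.25, 0.50, 0.75))
print(q)

# The interquartile range is the third quartile minus the first
IQR(age)
```

The 50% quantile is the median, and the 25% and 75% quantiles are the "1st Qu." and "3rd Qu." columns that summary() prints.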

Graphical summaries

hist(HELPrct$cesd)

How many females are in the dataset?

tally(~sex, data=HELPrct)
## sex
## female   male 
##    107    346
tally(~sex, format="percent", data=HELPrct)
## sex
##   female     male 
## 23.62031 76.37969
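
The percentages are simply each count divided by the total, times 100. As a small check in base R (no packages needed), the sketch below recomputes them from the two counts reported by tally() above.

```r
# Counts reported by tally(~sex, data = HELPrct) above
counts <- c(female = 107, male = 346)

sum(counts)   # total number of subjects

# prop.table() divides each entry by the total
pct <- 100 * prop.table(counts)
print(pct)
```

The total matches the 453 observations in the data frame, and the percentages agree with the format = "percent" output.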

Let’s restrict our attention to the female subjects. We use the filter() function in the dplyr package to generate a new data frame containing only females.

female<-filter(HELPrct, sex=='female')
female
##     age anysubstatus anysub cesd d1 daysanysub dayslink drugrisk e2b
## 1    39            1    yes   15  2        189      343        0   1
## 2    47            1    yes    6  1         31      365        0  NA
## 3    49           NA   <NA>   52 14         NA      334        0   1
## 4    50            1    yes   50 14         31      365       18   7
## 5    34           NA   <NA>   46  0         NA      365        8  NA
## 6    58            0     no   49  3        192      365        0  NA
## 7    28            1    yes   35  6         27       41        0   2
## 8    27            0     no   52  0        198       49       10   4
## 9    48            1    yes   19  4         67      365        0  NA
## 10   34            1    yes    5  2         23       14        0  NA
## 11   35            1    yes   46  3         17      365        0  NA
## 12   41            0     no   29  3        181       19        0   2
## 13   29            0     no   33  3        180      365        1   4
## 14   40            0     no   57  5        181       34        0  NA
## 15   26           NA   <NA>   30  4         NA       NA        0  NA
## 16   41            1    yes   43  0          2       NA       10  NA
## 17   32            1    yes   37  2        175      365        0  NA
## 18   33           NA   <NA>   47  9         NA       38        0   3
## 19   40           NA   <NA>   36  1         NA      217        0   1
## 20   35           NA   <NA>   30  2         NA       16        0  NA
## 21   30            0     no   39  0        201       18        0   1
## 22   32           NA   <NA>   53 15         NA       41        0  NA
## 23   42            0     no   26 10        183      358        0   2
## 24   30           NA   <NA>   51  9         NA       NA        9   1
## 25   35           NA   <NA>   58  5         NA       17        0   2
## 26   30            1    yes   15  1         15      365        0  NA
## 27   50            0     no   35  6        178       49        0  NA
## 28   38           NA   <NA>   26  4         NA       28        0  NA
## 29   24            1    yes   45  0         68      365        0   1
## 30   49           NA   <NA>   28 13         NA      193        0   1
## 31   28            1    yes   48  4         12      413        0  NA
## 32   37           NA   <NA>   35  1         NA      106        0  NA
## 33   31            1    yes   15  1         31      365        0  NA
## 34   30            1    yes   29  2         12      365        0  NA
## 35   57            1    yes   39  4         28      380        0   1
## 36   29           NA   <NA>   46  6         NA      365        5   3
## 37   33           NA   <NA>   44  4         NA      427        0  NA
## 38   28            1    yes   38  3        117      218        0  NA
## 39   31           NA   <NA>   38 10         NA      405       20   1
## 40   36           NA   <NA>   53  3         NA       45        0   3
## 41   38           NA   <NA>   57  4         NA      370        0  NA
## 42   39           NA   <NA>   43  1         NA      365       13   1
## 43   33            1    yes   19 40          3      146        0   1
## 44   38            1    yes   34  1          0      348       14   1
## 45   43           NA   <NA>   36  1         NA       18        0  NA
## 46   33            1    yes   24  6          2      365        1  NA
## 47   29           NA   <NA>   54  0         NA      407        4  NA
## 48   47            0     no   41  1        190       78        0  NA
## 49   31           NA   <NA>   18  3         NA       NA        8   1
## 50   40           NA   <NA>   60  7         NA      406        0  NA
## 51   32            0     no   34  3        184      365        0  NA
## 52   38            0     no   38  3        247      365        0   1
## 53   32            1    yes   37  1         82      348        0  NA
## 54   35           NA   <NA>   24  1         NA      365        0  NA
## 55   35            0     no   34  1        172      136        0  NA
## 56   45            1    yes   40  5          7      365        0   1
## 57   47           NA   <NA>   39  2         NA      365        1   3
## 58   39            1    yes   42  4        215      428        0  NA
## 59   44           NA   <NA>   13  0         NA      365        0  NA
## 60   55            1    yes   30  2         11       40        0   2
## 61   34           NA   <NA>   19  1         NA      329        0  NA
## 62   34           NA   <NA>   36  1         NA      326        0  NA
## 63   31           NA   <NA>   22  0         NA      359        0  NA
## 64   27            1    yes   33  0          4      365        0   2
## 65   33            1    yes   51  1          5      365        1   6
## 66   30           NA   <NA>   30  6         NA       83        0  NA
## 67   34           NA   <NA>   38  2         NA      365        8  NA
## 68   37            0     no   37  2        179       41        0  NA
## 69   26           NA   <NA>   56  2         NA      365        0  NA
## 70   45            1    yes   41  0         33      365        4   1
## 71   23            1    yes   48  1          2      365        0   2
## 72   35            1    yes   45  3          1       26        0   1
## 73   42           NA   <NA>   52  3         NA       63        0  NA
## 74   32            1    yes   45  4          1      427        0   2
## 75   36            1    yes   39  1        136      324        0   2
## 76   22            1    yes   51  2          2      374        9   1
## 77   37           NA   <NA>   58  8         NA      365        0   2
## 78   33            1    yes   19  0         64       33        0  NA
## 79   43            0     no    7  0        187       41        0  NA
## 80   47            1    yes   54  1          4      349        8  NA
## 81   48            1    yes   53  4          0      302        0   3
## 82   35            1    yes   54  1          5      365       13  NA
## 83   38           NA   <NA>   42  4         NA      337        0  NA
## 84   35            0     no   36  0        178      361        0  NA
## 85   47           NA   <NA>   52  8         NA      365        0   2
## 86   33           NA   <NA>   40  4         NA       21        0  NA
## 87   26            1    yes   33  0         35      296        0   1
## 88   34            1    yes   29  0         12      356        0  NA
## 89   47            0     no   32  3        158       74        0  NA
## 90   39            0     no   52  2        268      449        0  NA
## 91   37            1    yes   41 10          1      393        0  NA
## 92   31            1    yes   42  1         15      365        0  NA
## 93   42            1    yes   42  5         33       98        0  NA
## 94   33           NA   <NA>   15  0         NA      365        0  NA
## 95   38           NA   <NA>   33  1         NA      286        1  NA
## 96   43           NA   <NA>   23  4         NA      365        0   2
## 97   27           NA   <NA>    3  0         NA      365        0  NA
## 98   21           NA   <NA>   39  0         NA       NA        6  NA
## 99   29           NA   <NA>   47  2         NA      365        0  NA
## 100  45           NA   <NA>   41  2         NA      365        0   1
## 101  24           NA   <NA>   34  2         NA      365       14   8
## 102  35           NA   <NA>   23  2         NA       28        0  NA
## 103  33           NA   <NA>   21  8         NA       NA        0  NA
## 104  36           NA   <NA>   29  4         NA      365        0  NA
## 105  33           NA   <NA>   40  2         NA      365        0   1
## 106  31           NA   <NA>   47  1         NA      365        0  NA
## 107  39           NA   <NA>   28  0         NA      365        1  NA
##     female    sex g1b homeless i1  i2  id indtot linkstatus link       mcs
## 1        1 female  no   housed  5   5   4     28          0   no 43.967880
## 2        1 female  no   housed  4   4   6     29          0   no 55.508991
## 3        1 female yes   housed 13  20   7     38          0   no 21.793024
## 4        1 female  no homeless 71 129   9     44          0   no 22.029678
## 5        1 female  no   housed  0   0  11     34          0   no 43.974678
## 6        1 female  no   housed 13  13  12     11          0   no 13.382205
## 7        1 female yes homeless  0   0  17     26          1  yes 29.799828
## 8        1 female yes   housed  9  24  20     37          1  yes 15.458271
## 9        1 female  no   housed  6   8  27     40          0   no 21.668474
## 10       1 female  no   housed  6  13  50      8          1  yes 59.454094
## 11       1 female  no   housed 13  20  57     32          0   no 24.000315
## 12       1 female yes   housed  3   6  65     20          1  yes 33.374172
## 13       1 female yes homeless  0   0  66     29          0   no 27.575460
## 14       1 female yes homeless 59 164  71     43          1  yes 17.705963
## 15       1 female yes   housed 12  18  74     37         NA <NA> 26.697262
## 16       1 female  no   housed  0   0  75     40         NA <NA> 15.447794
## 17       1 female yes   housed  2   2  90     40          0   no 28.858498
## 18       1 female yes   housed 64  64 100     44          1  yes 19.595461
## 19       1 female yes homeless 33  38 104     42          1  yes 27.993336
## 20       1 female  no   housed  9  15 108     33          1  yes 23.299021
## 21       1 female  no   housed  0   0 118     19          1  yes 24.747171
## 22       1 female yes homeless 34  34 120     33          1  yes 27.136280
## 23       1 female  no homeless 39  95 121     31          0   no 41.321629
## 24       1 female yes   housed  0   0 125     43         NA <NA> 19.156574
## 25       1 female yes   housed  1   1 127     37          1  yes 18.465418
## 26       1 female  no   housed 26  26 131     25          0   no 37.438934
## 27       1 female  no   housed 13  13 134     28          1  yes 20.310446
## 28       1 female  no   housed  0   0 138     39          1  yes 22.787546
## 29       1 female  no homeless  7   7 141     39          0   no 28.505577
## 30       1 female  no homeless 15  15 143     36          1  yes 40.156929
## 31       1 female  no   housed  2   2 150     33          0   no 22.017500
## 32       1 female  no homeless  1   3 153     25          1  yes 33.366123
## 33       1 female  no   housed  0   0 166     38          0   no 50.030434
## 34       1 female  no homeless 29  29 179     31          0   no 52.197483
## 35       1 female  no   housed 12  12 181     36          0   no 36.651463
## 36       1 female  no   housed  0   0 187     39          0   no 20.119982
## 37       1 female yes homeless 59  59 188     38          0   no 25.257971
## 38       1 female yes   housed 16  20 191     35          1  yes 18.324743
## 39       1 female yes homeless 26  33 193     44          0   no 22.442661
## 40       1 female yes homeless 50  50 194     41          1  yes 27.171751
## 41       1 female yes   housed 13  32 200     39          0   no 20.356680
## 42       1 female yes   housed 20  20 203     37          0   no 22.815102
## 43       1 female  no homeless 19  26 204     32          1  yes 40.032974
## 44       1 female  no homeless  0   0 213     32          0   no 43.353584
## 45       1 female yes   housed 58  58 219     40          1  yes 36.100307
## 46       1 female yes   housed 32  38 220     23          0   no 33.259956
## 47       1 female  no   housed  0   0 221     33          0   no 12.323594
## 48       1 female yes homeless  0   0 224     21          1  yes 37.953403
## 49       1 female yes   housed  0   0 226     32         NA <NA> 27.641029
## 50       1 female yes homeless 38  38 228     43          0   no 16.786348
## 51       1 female  no   housed 13  13 229     31          0   no 54.768539
## 52       1 female yes   housed 16  26 236     34          0   no 14.919310
## 53       1 female  no   housed  1   6 237     28          0   no 40.462433
## 54       1 female  no   housed  0   0 241     34          0   no 44.351089
## 55       1 female  no homeless  4   4 242     36          1  yes 16.469986
## 56       1 female yes   housed 10  14 247     34          0   no 26.311474
## 57       1 female  no   housed 42  48 249     33          0   no 27.471394
## 58       1 female yes   housed  0   0 254     20          0   no 13.968738
## 59       1 female  no   housed 13  13 255     26          0   no 41.867615
## 60       1 female  no   housed  1   2 264     41          1  yes 23.547628
## 61       1 female  no   housed  4   4 269     27          0   no 34.048084
## 62       1 female  no   housed  1   1 272     38          0   no 32.384045
## 63       1 female  no   housed 10  20 275     23          0   no 47.442879
## 64       1 female  no homeless  8   8 284     38          0   no 31.781149
## 65       1 female yes   housed  8  13 304     28          0   no 20.911337
## 66       1 female yes homeless 27  33 306     25          1  yes 44.446507
## 67       1 female  no   housed  0   0 308     33          0   no 21.543468
## 68       1 female  no homeless  1   1 313     33          1  yes 27.601431
## 69       1 female  no   housed  1   1 316     36          0   no 14.415197
## 70       1 female  no   housed  2   2 320     22          0   no 34.747746
## 71       1 female yes homeless 29  58 324     27          0   no 16.718819
## 72       1 female  no   housed  0   0 325     32          1  yes 20.220354
## 73       1 female yes homeless  0   0 327     32          1  yes 28.447634
## 74       1 female yes homeless 67  67 333     40          0   no 17.926985
## 75       1 female yes homeless 53  53 339     36          0   no 22.237560
## 76       1 female  no   housed  0   0 342     40          0   no  7.035307
## 77       1 female yes homeless 67  80 351     41          0   no 16.922634
## 78       1 female  no homeless  6   6 354     22          1  yes 24.923189
## 79       1 female  no homeless 26  26 364     15          1  yes 60.542084
## 80       1 female yes   housed 13  13 367     35          0   no 13.852996
## 81       1 female yes homeless  0   0 370     32          0   no 19.808329
## 82       1 female  no   housed  0   0 372     44          0   no  9.406377
## 83       1 female yes   housed  3   3 374     40          0   no 27.495565
## 84       1 female  no homeless 58  58 379     13          0   no 44.767254
## 85       1 female  no   housed  6   6 391     34          0   no  7.226597
## 86       1 female  no   housed 13  26 402     38          1  yes 19.819555
## 87       1 female  no   housed  0   0 403     41          0   no 29.213017
## 88       1 female  no   housed  0   0 421     37          0   no 31.077631
## 89       1 female  no   housed 21  21 431     13          1  yes 51.922516
## 90       1 female  no   housed  0   0 442     37          0   no 24.930353
## 91       1 female  no homeless 24  51 445     44          0   no 25.710777
## 92       1 female yes homeless  6  13 461     34          0   no 16.863588
## 93       1 female yes   housed 26  41 465     35          1  yes 30.701563
## 94       1 female  no   housed  0   0 466      6          0   no 41.624706
## 95       1 female yes   housed  3  16 470     33          0   no 22.337873
## 96       1 female  no homeless 19  19  55     31          0   no 27.717655
## 97       1 female  no   housed  1   1 139     21          0   no 57.834595
## 98       1 female yes   housed  0   0 155     35         NA <NA> 47.773228
## 99       1 female  no homeless 11  14 157     35          0   no  9.732559
## 100      1 female  no homeless 19  26 162     25          0   no 55.479382
## 101      1 female  no   housed 13  26 171     38          0   no 28.590870
## 102      1 female  no   housed  4   4 303     20          1  yes 45.425110
## 103      1 female  no homeless 26  26 345     28         NA <NA> 18.594315
## 104      1 female  no   housed  7   8 349     27          0   no 25.676130
## 105      1 female yes homeless 26  32 427     37          0   no 34.152245
## 106      1 female yes homeless 56  61 451     41          0   no 17.050970
## 107      1 female  no homeless  1  24 460     28          0   no 33.434536
##          pcs pss_fr  racegrp satreat sexrisk substance treat avg_drinks
## 1   61.93168     11    white     yes       4    heroin    no          5
## 2   46.47521      5    black      no       5   cocaine   yes          4
## 3   24.51504      1    black     yes       8   cocaine    no         13
## 4   38.27088      5    white      no       8   alcohol    no         71
## 5   60.07915      0    white      no       2    heroin   yes          0
## 6   41.93376     13    black     yes       0   alcohol    no         13
## 7   44.77651      7 hispanic     yes       3    heroin   yes          0
## 8   37.45214     13    white      no       3    heroin   yes          9
## 9   36.01007      6    black      no       7   cocaine    no          6
## 10  52.69898     12    black      no       4   cocaine   yes          6
## 11  46.75086      1    black      no       7   cocaine   yes         13
## 12  55.23372     13    white     yes       4   alcohol   yes          3
## 13  35.12470      4 hispanic     yes       4    heroin    no          0
## 14  36.04016      1    black      no       4   alcohol   yes         59
## 15  54.38272      6    white      no       9   cocaine    no         12
## 16  55.32189     14    white      no       3    heroin    no          0
## 17  43.94296     11    black      no       3   cocaine    no          2
## 18  40.48884      1    other      no       7   alcohol   yes         64
## 19  44.53589      7    white     yes       3   alcohol    no         33
## 20  51.81045     12    black     yes       5   alcohol   yes          9
## 21  54.10854     14 hispanic      no       4   cocaine   yes          0
## 22  54.79462      7    black      no       5   alcohol   yes         34
## 23  36.68874      4    black      no      10   cocaine    no         39
## 24  34.33698     10    white      no       6    heroin    no          0
## 25  39.33260     13    black     yes       6   cocaine   yes          1
## 26  49.29042     11    black     yes       3   cocaine   yes         26
## 27  33.48925      2    white      no       0   alcohol    no         13
## 28  28.74085      9    other      no       7   cocaine   yes          0
## 29  37.79718      7    black     yes       7   cocaine   yes          7
## 30  40.96234      7 hispanic     yes       9   alcohol    no         15
## 31  40.24271      1    white      no       5   cocaine   yes          2
## 32  45.16520      8    black      no       9   cocaine   yes          1
## 33  57.38777      9    black     yes       2   cocaine    no          0
## 34  55.73845     13    black     yes       7   cocaine   yes         29
## 35  30.50811      6    white     yes       0   alcohol    no         12
## 36  32.96189      3    white      no       4    heroin   yes          0
## 37  42.12069      7 hispanic      no       5   alcohol    no         59
## 38  43.24062     14    black      no      11   cocaine    no         16
## 39  35.90619      8    white      no      11   alcohol    no         26
## 40  37.75567      3    white      no       9   alcohol   yes         50
## 41  35.97361      0    black      no      14   cocaine    no         13
## 42  35.22702     10    white      no       4    heroin   yes         20
## 43  38.10227      2    black     yes       7   cocaine    no         19
## 44  21.91906      9    black      no       8    heroin    no          0
## 45  37.03778     11    black     yes       2   alcohol   yes         58
## 46  41.66993      8    other      no       3    heroin    no         32
## 47  48.21926     11    white      no       6    heroin    no          0
## 48  57.64361     11    black      no       0   cocaine    no          0
## 49  48.37090     12    white      no       4    heroin    no          0
## 50  38.51597      3    white     yes      11   cocaine   yes         38
## 51  23.48208     12    black     yes       0   cocaine    no         13
## 52  57.83691      3    white      no       5   alcohol   yes         16
## 53  56.90286      3    black     yes       4   cocaine   yes          1
## 54  46.79942      4    black      no       2   cocaine    no          0
## 55  58.49455      2    black      no       8   cocaine    no          4
## 56  43.25021      8    white      no       5   alcohol    no         10
## 57  52.42204     10    black      no       5    heroin    no         42
## 58  48.97176     11    black      no       4   cocaine   yes          0
## 59  46.36879      7 hispanic      no       4    heroin    no         13
## 60  37.35865      7    black     yes       2    heroin   yes          1
## 61  57.24648     12    black      no       2   cocaine    no          4
## 62  44.85584     10    black      no       4   cocaine    no          1
## 63  52.85658     11    black      no       7   alcohol   yes         10
## 64  51.49556      7    black     yes       8   cocaine   yes          8
## 65  33.07642      6 hispanic     yes       4    heroin   yes          8
## 66  45.79400     12    black      no       4   alcohol   yes         27
## 67  52.35651     10    white      no       4    heroin    no          0
## 68  37.83872     11    black      no       6   cocaine    no          1
## 69  46.74971      2    black      no      11    heroin   yes          1
## 70  64.35030      3    white      no       1    heroin   yes          2
## 71  35.70664      3    black      no      11   alcohol   yes         29
## 72  32.44772      2    black      no       9   alcohol   yes          0
## 73  39.93384      2    other      no       0    heroin   yes          0
## 74  39.09279      7    black      no       6   alcohol    no         67
## 75  36.52407      3    black     yes       5   alcohol    no         53
## 76  52.51404      8    other      no       7    heroin   yes          0
## 77  34.09209      0    other      no       2   alcohol    no         67
## 78  63.77832      8    black      no       4   cocaine   yes          6
## 79  55.44015     13    white      no       1    heroin   yes         26
## 80  31.11147      9    black      no       0   cocaine   yes         13
## 81  27.09086     13    white     yes       3   alcohol    no          0
## 82  41.95401     13    white      no       4    heroin    no          0
## 83  51.27790      3    black      no       9   cocaine    no          3
## 84  53.42212     14    black      no       4   cocaine    no         58
## 85  47.60948      9    white      no       4   alcohol   yes          6
## 86  32.99675      0    black      no       4   alcohol   yes         13
## 87  56.69189      3    black     yes       3    heroin    no          0
## 88  64.91865     14    black      no      12   cocaine   yes          0
## 89  54.52398     12 hispanic      no       0   alcohol    no         21
## 90  33.53111      7    black      no       2    heroin   yes          0
## 91  49.18084      9    other      no       9   alcohol    no         24
## 92  46.69877      0    black      no      10   cocaine   yes          6
## 93  38.40187      5    white      no       6   alcohol   yes         26
## 94  62.08943     11    black     yes       6   cocaine   yes          0
## 95  42.31495      8    black      no       1    heroin    no          3
## 96  41.10135      3    black      no       6   alcohol    no         19
## 97  58.21511      4    black     yes       1   cocaine    no          1
## 98  41.09781     14    white      no       1    heroin    no          0
## 99  69.17161      4    black      no       7   cocaine    no         11
## 100 54.09069      4    white      no       4   alcohol    no         19
## 101 57.76270      9    white     yes      14    heroin   yes         13
## 102 58.75759      1    black      no       2   cocaine   yes          4
## 103 38.86502      3    white      no       4   alcohol    no         26
## 104 54.98139     13    white      no       4   alcohol   yes          7
## 105 45.27036      2 hispanic      no       3   alcohol   yes         26
## 106 34.51623      8 hispanic     yes      14   alcohol    no         56
## 107 40.04572      1    white      no       2    heroin    no          1
##     max_drinks
## 1            5
## 2            4
## 3           20
## 4          129
## 5            0
## 6           13
## 7            0
## 8           24
## 9            8
## 10          13
## 11          20
## 12           6
## 13           0
## 14         164
## 15          18
## 16           0
## 17           2
## 18          64
## 19          38
## 20          15
## 21           0
## 22          34
## 23          95
## 24           0
## 25           1
## 26          26
## 27          13
## 28           0
## 29           7
## 30          15
## 31           2
## 32           3
## 33           0
## 34          29
## 35          12
## 36           0
## 37          59
## 38          20
## 39          33
## 40          50
## 41          32
## 42          20
## 43          26
## 44           0
## 45          58
## 46          38
## 47           0
## 48           0
## 49           0
## 50          38
## 51          13
## 52          26
## 53           6
## 54           0
## 55           4
## 56          14
## 57          48
## 58           0
## 59          13
## 60           2
## 61           4
## 62           1
## 63          20
## 64           8
## 65          13
## 66          33
## 67           0
## 68           1
## 69           1
## 70           2
## 71          58
## 72           0
## 73           0
## 74          67
## 75          53
## 76           0
## 77          80
## 78           6
## 79          26
## 80          13
## 81           0
## 82           0
## 83           3
## 84          58
## 85           6
## 86          26
## 87           0
## 88           0
## 89          21
## 90           0
## 91          51
## 92          13
## 93          41
## 94           0
## 95          16
## 96          19
## 97           1
## 98           0
## 99          14
## 100         26
## 101         26
## 102          4
## 103         26
## 104          8
## 105         32
## 106         61
## 107         24
with(female, stem(cesd))
## 
##   The decimal point is 1 digit(s) to the right of the |
## 
##   0 | 3
##   0 | 567
##   1 | 3
##   1 | 555589999
##   2 | 123344
##   2 | 66889999
##   3 | 0000233334444
##   3 | 5556666777888899999
##   4 | 00011112222334
##   4 | 555666777889
##   5 | 011122222333444
##   5 | 67788
##   6 | 0

Week 5

Competencies covered:

  1. Concepts of probability through sampling, experiments and simulations

  2. Simple and compound events

  3. Determines probabilities by constructing sample spaces to model situations

Reading:

Whether you know it or not, you are a fortune teller of sorts. Every day, all day, you are constantly predicting what will happen:

  1. You choose clothes based on what you think the weather will be (or, be honest, what you have that’s clean regardless of the weather).

  2. You choose what table to sit at in the cafeteria based on where you think your friends will sit.

  3. You choose and choose and choose, and every choice is a prediction of how likely you think an event or series of events is to happen.

We can actually measure how likely it is for something to happen, and that measurement is called probability.

Fig. 5.1

Probability and relative frequency

Example 1:

Flip a coin. What is your sample space?

Example 2:

Let’s start with dice. Take a die (make sure it’s fair, not weighted or “funny” in any way). What are the odds (the probability) of rolling a 3 if you roll the die one time? Hopefully you figured out that it is one in six because there are six sides to the die, and one of those sides has three dots.

Let’s try it. Take the die and roll it 100 times, recording your results below. Calculate the percentage of each result (for example, if you rolled a 2 on 17 of the 100 rolls, that would be 17/100, or 17 percent).

Fig. 5.2

We would expect the percentage for each number to hover around 16 or 17 percent, because 1/6 is about 0.1667. This is probability in a nutshell.

Now guess what the percentage would be if you added up the percentages of the rolls of only the odd-numbered sides. When you add up those rolls, does the percentage come close to your guess?
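If you do not have a die handy, the experiment can be approximated on the computer. The following is a minimal sketch in base R, assuming that `sample()` is an acceptable stand-in for a fair die; the seed value and the variable names are arbitrary choices made for illustration.

```r
# Simulate 100 rolls of a fair six-sided die (a stand-in for rolling by hand).
set.seed(1)  # arbitrary seed, only so that the run can be repeated
rolls <- sample(1:6, size = 100, replace = TRUE)

# Count how often each face came up; factor() keeps faces that never appeared.
counts <- table(factor(rolls, levels = 1:6))

# Convert the counts to percentages of all rolls.
percent <- 100 * counts / length(rolls)
print(percent)

# Add up the percentages for the odd-numbered faces (1, 3 and 5).
odd_percent <- sum(percent[c("1", "3", "5")])
print(odd_percent)
```

Each percentage should land somewhere near 16 or 17, and the odd-numbered total somewhere near 50, although any single run of 100 rolls can drift a few points either way.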

Probability is the number of ways an event can occur divided by the total number of possible outcomes (assuming each outcome is equally likely).

In the case of our die, there are six possible outcomes, and exactly one of them matches any given number on each roll, so the probability is 1/6. If there were no dots on any of the sides, the probability of rolling a 3 would be zero because none of the six sides would show a 3, giving us this ratio: 0/6. If every side had three dots, the probability of rolling a 3 would be 1 because it would be 6/6, or 1. So, probability is expressed as a number somewhere between 0 (not gonna happen) and 1 (definitely going to happen), with values closer to 1 being more likely.

Let’s put it in formula form (Fig. 5.3):

Probability = number of desired outcomes / number of possible outcomes

Using the formula:

  1. What is the probability of getting heads on a coin toss?

  2. If you go to school Monday through Friday and you know the cafeteria is going to serve pizza two days that week, what is the probability that pizza will be served on any given day? It would be 2/5, because there are two desired outcomes (pizza!) and five possible outcomes (days of the school week).
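The ratio of desired outcomes to possible outcomes can also be written as a small R function. This is only a sketch: `probability` is a hypothetical helper name invented for illustration (it is not a built-in R function), and it assumes every possible outcome is equally likely.

```r
# Probability of an event when all outcomes are equally likely:
# (number of desired outcomes) / (number of possible outcomes).
# `probability` is a hypothetical helper, not part of base R or mosaic.
probability <- function(desired, possible) {
  stopifnot(possible > 0, desired >= 0, desired <= possible)
  desired / possible
}

p_three      <- probability(1, 6)  # rolling a 3 with one fair die
p_pizza      <- probability(2, 5)  # pizza on 2 of the 5 school days
p_all_threes <- probability(6, 6)  # a die with three dots on every side
p_no_threes  <- probability(0, 6)  # a die with no dots on any side

print(c(p_three, p_pizza, p_all_threes, p_no_threes))
```

The last two calls reproduce the two extremes discussed earlier: a result of 1 for an event that must happen and 0 for an event that cannot.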

So far, we’ve just looked at things that could occur. What about looking at things that have actually happened?

We call that relative frequency, and it has its own formula:

Fig. 5.4

Think back to your die experiment above. The first formula gives you your expectation (1/6). The formula for relative frequency gives you the actual outcome. What was the relative frequency of rolling 5s in your die-rolling experiment?

The law of large numbers

The more times we roll the die, the closer the relative frequency tends to get to the outcome we expected (1/6). We call that the Law of Large Numbers: even if the results don’t come out as you expect after a few tries, the more you repeat the experiment, the closer you will tend to come to the expectation.

Flip a penny 10 times and record how many times it lands on heads and how many times it lands on tails.

Heads: ____________

Tails: ____________

Now flip it 100 times and record your results.

Heads: ____________

Tails: ____________

Did you get closer to a ½ ratio the second time? That’s the Law of Large Numbers at work. Remember, the Law of Large Numbers tells us that the more times you repeat an experiment, the closer the relative frequency will come to the probability.
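The penny experiment can be repeated at larger scales on the computer. The sketch below assumes a fair coin simulated with `sample()`; the helper name `flip_heads_share`, the seed, and the chosen numbers of flips are all illustrative assumptions.

```r
# Share of heads in n simulated flips of a fair coin.
# `flip_heads_share` is a hypothetical helper written for this example.
flip_heads_share <- function(n) {
  flips <- sample(c("H", "T"), size = n, replace = TRUE)
  mean(flips == "H")
}

set.seed(42)  # arbitrary seed, only so that the run can be repeated
sizes  <- c(10, 100, 1000, 10000)
shares <- sapply(sizes, flip_heads_share)

# Compare each relative frequency with the probability 1/2.
results <- data.frame(
  flips         = sizes,
  heads_share   = shares,
  gap_from_half = abs(shares - 0.5)
)
print(results)
```

The gap from ½ tends to shrink as the number of flips grows, but it is not guaranteed to shrink at every step in a single run; the law describes the long-run tendency.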

Calculating possible outcomes

We have one more thing to learn before we move on. Let’s figure out how to calculate the possible outcomes of an event. We’ve figured out the probability of simple events occurring, but what happens when the possible outcomes are harder to figure out? When you are rolling one die or flipping one coin, it’s simple to figure out possible outcomes, but it gets more complicated when you add in more dice or more coins.

Imagine that your parents pay you an allowance of 50¢ a week. Let’s say you love nickels. What is the probability that your 50¢ will contain a nickel? We need to figure out all the possible ways to give someone 50¢. Let’s say that your parents don’t ever use pennies or 50-cent pieces. Besides those, there are three possible coins to use: nickels, dimes and quarters. If your parents pay you in quarters, it’s simple, right? Two quarters make 50¢. But what other possible combinations make 50¢? And how likely are you to get that nickel you want? Let’s set it up in the table below. First, across the top we’ll list the possible coins. Next, we’ll start listing possible combinations by listing the greatest possible number of that type of coin and then decreasing that by one on the next line. Once we’ve gotten to zero of that coin, let’s move to the highest level possible of the next highest coin. Sound confusing? It’s not once you get going. Let’s try it:

First, quarters. We list two quarters, and that makes 50¢ by itself, so the columns for dimes and nickels are zero. There are no other possible combinations with two quarters, so the next line lists one quarter. Remember, we are going from largest to smallest, so first we’ll try one quarter and the greatest number of dimes possible, which is two; that leaves 5¢, so we need one nickel. There are two other possibilities with one quarter, so we’ll list those on the next lines: one dime (which needs three nickels) and zero dimes (which needs five nickels), because the total always has to add up to 50¢.

Fig. 5.5

Now we’re out of possibilities that use quarters, so we move on to dimes. The most dimes we could have is five, so we start with that. We decrease that number by one on each line. For every dime we take away, we have to add two nickels, so notice that the nickels increase by two each line. We end up with 10 possible combinations of coins.

So, how many of the 10 possibilities contain nickels? Did you count eight? You’re right! So let’s put this in our probability formula:

8 = desired (or likely) outcomes

10 = possible outcomes

So 8/10.

Does that seem like good odds to you? Not bad!

What is the probability if you want a quarter? How many possibilities contained a quarter? Did you find four? So for getting a quarter:

4 = desired (or likely) outcomes

10 = possible outcomes

which gives 4/10.

Are you more likely to get a quarter or a nickel? All other things being equal (meaning your parents have a wide variety of coins, aren’t out of dimes or something, and are equally likely to pick any of the ten combinations), you are twice as likely to get a nickel (8/10) as a quarter (4/10).
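The table of coin combinations can be checked by letting R list every candidate and keep only those that total 50¢. This is a sketch in base R; the data frame name `combos` and its column names are illustrative, and counting combinations this way carries the same simplifying assumption as the text, namely that each combination is equally likely.

```r
# Every candidate count of quarters (0-2), dimes (0-5) and nickels (0-10).
combos <- expand.grid(quarters = 0:2, dimes = 0:5, nickels = 0:10)

# Keep only the combinations worth exactly 50 cents.
value  <- 25 * combos$quarters + 10 * combos$dimes + 5 * combos$nickels
combos <- combos[value == 50, ]

# Order from most quarters to fewest, then most dimes to fewest,
# matching the largest-to-smallest approach used for the table.
combos <- combos[order(-combos$quarters, -combos$dimes), ]
rownames(combos) <- NULL
print(combos)

n_total   <- nrow(combos)             # 10 possible combinations
n_nickel  <- sum(combos$nickels > 0)  # 8 of them contain a nickel
n_quarter <- sum(combos$quarters > 0) # 4 of them contain a quarter

print(c(nickel = n_nickel / n_total, quarter = n_quarter / n_total))
```

The same pattern (list every candidate, then filter by the total) extends to other amounts or coin sets by changing the ranges and the values in the sum.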

In summary, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.

The LLN is important because it “guarantees” stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. It is important to remember that the LLN only applies (as the name indicates) when a large number of observations is considered.

Assessment:

  1. Suppose you go to the store and there are six kinds of cereal. Your mom tells you to pick one. When she comes back, you ask her to guess which one you chose. Assuming she had no idea beforehand what your choice would be, how likely is it that she will guess the correct one? _____________________________________________________________________________________________

  2. Let’s say you are playing a coin-tossing game with a friend. You toss the same coin 500 times, and 400 times it comes up heads. Would this be normal? What could explain it? ____________________________________________________________

  3. You are bored one day, so you start rolling a die. You roll it 10 times and you get a 6 eight of the times. You think this is strange, so you keep rolling. You roll 100 more times and only get eight more 6s, leaving a total of 16. What is the rule that accounts for this scenario? ___________________________________________________________________________

  4. What is the relative frequency of the final total of 6s rolled in Question 3? _____________________________________________________________________________________________

  5. Someone hands you a standard deck of 52 cards. There are four queens. What is the likelihood that you will draw a queen from the deck the first try? _____________________________________________________________________________________________

  6. Let’s say you earn $20 doing chores (lots of them), and you’re being paid in cash. Create a table of possible outcomes that shows the possible combinations of $10’s, $5’s and $1’s that would total your $20. (Look back at the table we did before to see how to set this problem up.)

  7. You and a friend (who has no understanding of probability) are at the mall on a hot Tuesday afternoon in the middle of the summer. She offers to buy you a snow cone if you can guess the types of dollar bills she has in her wallet in fewer than three guesses. She tells you she has $20 and none of the bills are $1’s or a $20. If you don’t guess correctly by the third try, you have to give her a quarter. Are your odds of guessing correctly greater than ½?

Side-by-side plots in R

Load the mosaic and mosaicData packages:

library(mosaic)      # provides tally() and, via dplyr, filter()
library(mosaicData)  # provides the HELPrct data set

Consider the “cesd” variable in the HELPrct data set from the mosaicData package:

HELPrct$cesd
##   [1] 49 30 39 15 39  6 52 32 50 46 46 49 22 36 43 35 19 40 52 37 35 18 36
##  [24] 28 19 30 27 24 47 45 18 11 26 29 34 37 23 41 21 16 36 17 36 19  5 25
##  [47] 36 27 44 29 46 16 44 42 30 25 26 29 33 28 33 44 29 57 26 31 30 43 28
##  [70] 29 32 30 34 49 36 42 40 29 31 10 37 32 16 15  4 30 44  8 16 47 49 30
##  [93] 36 48 17 39 30 24 25 51 17 37 45 28 17 23 39 38 53 26 47 49 34 51 33
## [116] 58 28  4 15 40 33 35 28 21 33 26 45 45 31 28 22 39 31 48 48 34 35 46
## [139] 34 10 31 34 26 15 48 37 20 38 39 46 17  6 18 29 51 39 31 49 43 45 46
## [162] 44 41 29 38 51 38 53 29 31 57 38 39 43 19 23 44 12 35 47 53 34 15 31
## [185] 27 36 24 54 31 22 41 23 18 60 34 26 40 40  1 41 38 37 16 33  4 24 34
## [208] 40 39 32 40 51 39 40 22 42 13 49 35 43 27 40 38 39 30 35 34 19 39 36
## [231] 58 38 22 46 31 11 32 33 39 33 27 43 30 12 42 31 40 17 44 15 41 51 24
## [254] 29 40 33 51 30 46 38 42 17 22 37 11 56 14 26 36 41 18 19 48 45 44 52
## [277] 19  9 55 18 45 12 33 32 20 37 39 43 51 27 40  8 54 35 58 50 55 19 37
## [300] 20 40 37 43  8 56 51  7 36 49 54 53 15 53  6 54 42 31 40 37 36 40 41
## [323] 39 38 38  9 36 27 26 52 24 16 34 46 24 25 40 33 31 37 28 27  6 21 29
## [346] 23 35 55  3 36 40 29 28 21 34 42 23 36 32 30 25 35 23 16 27 14 44 52
## [369] 48 11 41 41 37 31 34 40 37 30 42 51 42 15 12 39 10 33 57 17 20 49 23
## [392] 26 28  3 18 39 51 39 47 45 28 41 31 34 21 41 38 36 24 10 41 51 45 29
## [415] 56 34  4 32 38 26 27 21 30  7 35 23 36 15 48 31 54 21 21 29 23 33 14
## [438] 27 24 33 25 37 47 40  9 37 47 34 28 37 28 11 35

The “cesd” score is the Center for Epidemiologic Studies Depression measure at baseline (higher scores indicate more depressive symptoms).

Create an object called “depscore” in which we will save the set of depressive scores:

depscore <- HELPrct$cesd
depscore
##   [1] 49 30 39 15 39  6 52 32 50 46 46 49 22 36 43 35 19 40 52 37 35 18 36
##  [24] 28 19 30 27 24 47 45 18 11 26 29 34 37 23 41 21 16 36 17 36 19  5 25
##  [47] 36 27 44 29 46 16 44 42 30 25 26 29 33 28 33 44 29 57 26 31 30 43 28
##  [70] 29 32 30 34 49 36 42 40 29 31 10 37 32 16 15  4 30 44  8 16 47 49 30
##  [93] 36 48 17 39 30 24 25 51 17 37 45 28 17 23 39 38 53 26 47 49 34 51 33
## [116] 58 28  4 15 40 33 35 28 21 33 26 45 45 31 28 22 39 31 48 48 34 35 46
## [139] 34 10 31 34 26 15 48 37 20 38 39 46 17  6 18 29 51 39 31 49 43 45 46
## [162] 44 41 29 38 51 38 53 29 31 57 38 39 43 19 23 44 12 35 47 53 34 15 31
## [185] 27 36 24 54 31 22 41 23 18 60 34 26 40 40  1 41 38 37 16 33  4 24 34
## [208] 40 39 32 40 51 39 40 22 42 13 49 35 43 27 40 38 39 30 35 34 19 39 36
## [231] 58 38 22 46 31 11 32 33 39 33 27 43 30 12 42 31 40 17 44 15 41 51 24
## [254] 29 40 33 51 30 46 38 42 17 22 37 11 56 14 26 36 41 18 19 48 45 44 52
## [277] 19  9 55 18 45 12 33 32 20 37 39 43 51 27 40  8 54 35 58 50 55 19 37
## [300] 20 40 37 43  8 56 51  7 36 49 54 53 15 53  6 54 42 31 40 37 36 40 41
## [323] 39 38 38  9 36 27 26 52 24 16 34 46 24 25 40 33 31 37 28 27  6 21 29
## [346] 23 35 55  3 36 40 29 28 21 34 42 23 36 32 30 25 35 23 16 27 14 44 52
## [369] 48 11 41 41 37 31 34 40 37 30 42 51 42 15 12 39 10 33 57 17 20 49 23
## [392] 26 28  3 18 39 51 39 47 45 28 41 31 34 21 41 38 36 24 10 41 51 45 29
## [415] 56 34  4 32 38 26 27 21 30  7 35 23 36 15 48 31 54 21 21 29 23 33 14
## [438] 27 24 33 25 37 47 40  9 37 47 34 28 37 28 11 35

Find the summary statistics for this variable:

summary(depscore)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   25.00   34.00   32.85   41.00   60.00

Create a boxplot and histogram for the scores:

boxplot(depscore)

hist(depscore)

Now consider the variable “sex”. We will count how many males and how many females are in the data set:

tally(~sex, data=HELPrct)
## sex
## female   male 
##    107    346
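The counts can also be turned into proportions, which ties back to relative frequency from the Week 5 reading. The sketch below uses only base R and rebuilds a stand-in `sex` vector from the two counts shown above, so that it runs without loading mosaicData; with the real data you would use `HELPrct$sex` instead. (The mosaic `tally()` function also accepts `format = "proportion"` for the same purpose.)

```r
# Stand-in vector rebuilt from the counts in the output above
# (107 females, 346 males); with the real data use HELPrct$sex.
sex <- factor(rep(c("female", "male"), times = c(107, 346)))

# Base-R counterpart of tally(~sex, data = HELPrct).
counts <- table(sex)
print(counts)

# Relative frequency of each category.
props <- prop.table(counts)
print(round(props, 3))
```

About 24 percent of the participants are female, so the two groups differ considerably in size, which is worth keeping in mind when comparing their plots.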

We will now keep only the rows for females by using the “filter” command:

female <- filter(HELPrct, sex == "female")
female
##     age anysubstatus anysub cesd d1 daysanysub dayslink drugrisk e2b
## 1    39            1    yes   15  2        189      343        0   1
## 2    47            1    yes    6  1         31      365        0  NA
## 3    49           NA   <NA>   52 14         NA      334        0   1
## 4    50            1    yes   50 14         31      365       18   7
## 5    34           NA   <NA>   46  0         NA      365        8  NA
## 6    58            0     no   49  3        192      365        0  NA
## 7    28            1    yes   35  6         27       41        0   2
## 8    27            0     no   52  0        198       49       10   4
## 9    48            1    yes   19  4         67      365        0  NA
## 10   34            1    yes    5  2         23       14        0  NA
## 11   35            1    yes   46  3         17      365        0  NA
## 12   41            0     no   29  3        181       19        0   2
## 13   29            0     no   33  3        180      365        1   4
## 14   40            0     no   57  5        181       34        0  NA
## 15   26           NA   <NA>   30  4         NA       NA        0  NA
## 16   41            1    yes   43  0          2       NA       10  NA
## 17   32            1    yes   37  2        175      365        0  NA
## 18   33           NA   <NA>   47  9         NA       38        0   3
## 19   40           NA   <NA>   36  1         NA      217        0   1
## 20   35           NA   <NA>   30  2         NA       16        0  NA
## 21   30            0     no   39  0        201       18        0   1
## 22   32           NA   <NA>   53 15         NA       41        0  NA
## 23   42            0     no   26 10        183      358        0   2
## 24   30           NA   <NA>   51  9         NA       NA        9   1
## 25   35           NA   <NA>   58  5         NA       17        0   2
## 26   30            1    yes   15  1         15      365        0  NA
## 27   50            0     no   35  6        178       49        0  NA
## 28   38           NA   <NA>   26  4         NA       28        0  NA
## 29   24            1    yes   45  0         68      365        0   1
## 30   49           NA   <NA>   28 13         NA      193        0   1
## 31   28            1    yes   48  4         12      413        0  NA
## 32   37           NA   <NA>   35  1         NA      106        0  NA
## 33   31            1    yes   15  1         31      365        0  NA
## 34   30            1    yes   29  2         12      365        0  NA
## 35   57            1    yes   39  4         28      380        0   1
## 36   29           NA   <NA>   46  6         NA      365        5   3
## 37   33           NA   <NA>   44  4         NA      427        0  NA
## 38   28            1    yes   38  3        117      218        0  NA
## 39   31           NA   <NA>   38 10         NA      405       20   1
## 40   36           NA   <NA>   53  3         NA       45        0   3
## 41   38           NA   <NA>   57  4         NA      370        0  NA
## 42   39           NA   <NA>   43  1         NA      365       13   1
## 43   33            1    yes   19 40          3      146        0   1
## 44   38            1    yes   34  1          0      348       14   1
## 45   43           NA   <NA>   36  1         NA       18        0  NA
## 46   33            1    yes   24  6          2      365        1  NA
## 47   29           NA   <NA>   54  0         NA      407        4  NA
## 48   47            0     no   41  1        190       78        0  NA
## 49   31           NA   <NA>   18  3         NA       NA        8   1
## 50   40           NA   <NA>   60  7         NA      406        0  NA
## 51   32            0     no   34  3        184      365        0  NA
## 52   38            0     no   38  3        247      365        0   1
## 53   32            1    yes   37  1         82      348        0  NA
## 54   35           NA   <NA>   24  1         NA      365        0  NA
## 55   35            0     no   34  1        172      136        0  NA
## 56   45            1    yes   40  5          7      365        0   1
## 57   47           NA   <NA>   39  2         NA      365        1   3
## 58   39            1    yes   42  4        215      428        0  NA
## 59   44           NA   <NA>   13  0         NA      365        0  NA
## 60   55            1    yes   30  2         11       40        0   2
## 61   34           NA   <NA>   19  1         NA      329        0  NA
## 62   34           NA   <NA>   36  1         NA      326        0  NA
## 63   31           NA   <NA>   22  0         NA      359        0  NA
## 64   27            1    yes   33  0          4      365        0   2
## 65   33            1    yes   51  1          5      365        1   6
## 66   30           NA   <NA>   30  6         NA       83        0  NA
## 67   34           NA   <NA>   38  2         NA      365        8  NA
## 68   37            0     no   37  2        179       41        0  NA
## 69   26           NA   <NA>   56  2         NA      365        0  NA
## 70   45            1    yes   41  0         33      365        4   1
## 71   23            1    yes   48  1          2      365        0   2
## 72   35            1    yes   45  3          1       26        0   1
## 73   42           NA   <NA>   52  3         NA       63        0  NA
## 74   32            1    yes   45  4          1      427        0   2
## 75   36            1    yes   39  1        136      324        0   2
## 76   22            1    yes   51  2          2      374        9   1
## 77   37           NA   <NA>   58  8         NA      365        0   2
## 78   33            1    yes   19  0         64       33        0  NA
## 79   43            0     no    7  0        187       41        0  NA
## 80   47            1    yes   54  1          4      349        8  NA
## 81   48            1    yes   53  4          0      302        0   3
## 82   35            1    yes   54  1          5      365       13  NA
## 83   38           NA   <NA>   42  4         NA      337        0  NA
## 84   35            0     no   36  0        178      361        0  NA
## 85   47           NA   <NA>   52  8         NA      365        0   2
## 86   33           NA   <NA>   40  4         NA       21        0  NA
## 87   26            1    yes   33  0         35      296        0   1
## 88   34            1    yes   29  0         12      356        0  NA
## 89   47            0     no   32  3        158       74        0  NA
## 90   39            0     no   52  2        268      449        0  NA
## 91   37            1    yes   41 10          1      393        0  NA
## 92   31            1    yes   42  1         15      365        0  NA
## 93   42            1    yes   42  5         33       98        0  NA
## 94   33           NA   <NA>   15  0         NA      365        0  NA
## 95   38           NA   <NA>   33  1         NA      286        1  NA
## 96   43           NA   <NA>   23  4         NA      365        0   2
## 97   27           NA   <NA>    3  0         NA      365        0  NA
## 98   21           NA   <NA>   39  0         NA       NA        6  NA
## 99   29           NA   <NA>   47  2         NA      365        0  NA
## 100  45           NA   <NA>   41  2         NA      365        0   1
## 101  24           NA   <NA>   34  2         NA      365       14   8
## 102  35           NA   <NA>   23  2         NA       28        0  NA
## 103  33           NA   <NA>   21  8         NA       NA        0  NA
## 104  36           NA   <NA>   29  4         NA      365        0  NA
## 105  33           NA   <NA>   40  2         NA      365        0   1
## 106  31           NA   <NA>   47  1         NA      365        0  NA
## 107  39           NA   <NA>   28  0         NA      365        1  NA
##     female    sex g1b homeless i1  i2  id indtot linkstatus link       mcs
## 1        1 female  no   housed  5   5   4     28          0   no 43.967880
## 2        1 female  no   housed  4   4   6     29          0   no 55.508991
## 3        1 female yes   housed 13  20   7     38          0   no 21.793024
## 4        1 female  no homeless 71 129   9     44          0   no 22.029678
## 5        1 female  no   housed  0   0  11     34          0   no 43.974678
## 6        1 female  no   housed 13  13  12     11          0   no 13.382205
## 7        1 female yes homeless  0   0  17     26          1  yes 29.799828
## 8        1 female yes   housed  9  24  20     37          1  yes 15.458271
## 9        1 female  no   housed  6   8  27     40          0   no 21.668474
## 10       1 female  no   housed  6  13  50      8          1  yes 59.454094
## 11       1 female  no   housed 13  20  57     32          0   no 24.000315
## 12       1 female yes   housed  3   6  65     20          1  yes 33.374172
## 13       1 female yes homeless  0   0  66     29          0   no 27.575460
## 14       1 female yes homeless 59 164  71     43          1  yes 17.705963
## 15       1 female yes   housed 12  18  74     37         NA <NA> 26.697262
## 16       1 female  no   housed  0   0  75     40         NA <NA> 15.447794
## 17       1 female yes   housed  2   2  90     40          0   no 28.858498
## 18       1 female yes   housed 64  64 100     44          1  yes 19.595461
## 19       1 female yes homeless 33  38 104     42          1  yes 27.993336
## 20       1 female  no   housed  9  15 108     33          1  yes 23.299021
## 21       1 female  no   housed  0   0 118     19          1  yes 24.747171
## 22       1 female yes homeless 34  34 120     33          1  yes 27.136280
## 23       1 female  no homeless 39  95 121     31          0   no 41.321629
## 24       1 female yes   housed  0   0 125     43         NA <NA> 19.156574
## 25       1 female yes   housed  1   1 127     37          1  yes 18.465418
## 26       1 female  no   housed 26  26 131     25          0   no 37.438934
## 27       1 female  no   housed 13  13 134     28          1  yes 20.310446
## 28       1 female  no   housed  0   0 138     39          1  yes 22.787546
## 29       1 female  no homeless  7   7 141     39          0   no 28.505577
## 30       1 female  no homeless 15  15 143     36          1  yes 40.156929
## 31       1 female  no   housed  2   2 150     33          0   no 22.017500
## 32       1 female  no homeless  1   3 153     25          1  yes 33.366123
## 33       1 female  no   housed  0   0 166     38          0   no 50.030434
## 34       1 female  no homeless 29  29 179     31          0   no 52.197483
## 35       1 female  no   housed 12  12 181     36          0   no 36.651463
## 36       1 female  no   housed  0   0 187     39          0   no 20.119982
## 37       1 female yes homeless 59  59 188     38          0   no 25.257971
## 38       1 female yes   housed 16  20 191     35          1  yes 18.324743
## 39       1 female yes homeless 26  33 193     44          0   no 22.442661
## 40       1 female yes homeless 50  50 194     41          1  yes 27.171751
## 41       1 female yes   housed 13  32 200     39          0   no 20.356680
## 42       1 female yes   housed 20  20 203     37          0   no 22.815102
## 43       1 female  no homeless 19  26 204     32          1  yes 40.032974
## 44       1 female  no homeless  0   0 213     32          0   no 43.353584
## 45       1 female yes   housed 58  58 219     40          1  yes 36.100307
## 46       1 female yes   housed 32  38 220     23          0   no 33.259956
## 47       1 female  no   housed  0   0 221     33          0   no 12.323594
## 48       1 female yes homeless  0   0 224     21          1  yes 37.953403
## 49       1 female yes   housed  0   0 226     32         NA <NA> 27.641029
## 50       1 female yes homeless 38  38 228     43          0   no 16.786348
## 51       1 female  no   housed 13  13 229     31          0   no 54.768539
## 52       1 female yes   housed 16  26 236     34          0   no 14.919310
## 53       1 female  no   housed  1   6 237     28          0   no 40.462433
## 54       1 female  no   housed  0   0 241     34          0   no 44.351089
## 55       1 female  no homeless  4   4 242     36          1  yes 16.469986
## 56       1 female yes   housed 10  14 247     34          0   no 26.311474
## 57       1 female  no   housed 42  48 249     33          0   no 27.471394
## 58       1 female yes   housed  0   0 254     20          0   no 13.968738
## 59       1 female  no   housed 13  13 255     26          0   no 41.867615
## 60       1 female  no   housed  1   2 264     41          1  yes 23.547628
## 61       1 female  no   housed  4   4 269     27          0   no 34.048084
## 62       1 female  no   housed  1   1 272     38          0   no 32.384045
## 63       1 female  no   housed 10  20 275     23          0   no 47.442879
## 64       1 female  no homeless  8   8 284     38          0   no 31.781149
## 65       1 female yes   housed  8  13 304     28          0   no 20.911337
## 66       1 female yes homeless 27  33 306     25          1  yes 44.446507
## 67       1 female  no   housed  0   0 308     33          0   no 21.543468
## 68       1 female  no homeless  1   1 313     33          1  yes 27.601431
## 69       1 female  no   housed  1   1 316     36          0   no 14.415197
## 70       1 female  no   housed  2   2 320     22          0   no 34.747746
## 71       1 female yes homeless 29  58 324     27          0   no 16.718819
## 72       1 female  no   housed  0   0 325     32          1  yes 20.220354
## 73       1 female yes homeless  0   0 327     32          1  yes 28.447634
## 74       1 female yes homeless 67  67 333     40          0   no 17.926985
## 75       1 female yes homeless 53  53 339     36          0   no 22.237560
## 76       1 female  no   housed  0   0 342     40          0   no  7.035307
## 77       1 female yes homeless 67  80 351     41          0   no 16.922634
## 78       1 female  no homeless  6   6 354     22          1  yes 24.923189
## 79       1 female  no homeless 26  26 364     15          1  yes 60.542084
## 80       1 female yes   housed 13  13 367     35          0   no 13.852996
## 81       1 female yes homeless  0   0 370     32          0   no 19.808329
## 82       1 female  no   housed  0   0 372     44          0   no  9.406377
## 83       1 female yes   housed  3   3 374     40          0   no 27.495565
## 84       1 female  no homeless 58  58 379     13          0   no 44.767254
## 85       1 female  no   housed  6   6 391     34          0   no  7.226597
## 86       1 female  no   housed 13  26 402     38          1  yes 19.819555
## 87       1 female  no   housed  0   0 403     41          0   no 29.213017
## 88       1 female  no   housed  0   0 421     37          0   no 31.077631
## 89       1 female  no   housed 21  21 431     13          1  yes 51.922516
## 90       1 female  no   housed  0   0 442     37          0   no 24.930353
## 91       1 female  no homeless 24  51 445     44          0   no 25.710777
## 92       1 female yes homeless  6  13 461     34          0   no 16.863588
## 93       1 female yes   housed 26  41 465     35          1  yes 30.701563
## 94       1 female  no   housed  0   0 466      6          0   no 41.624706
## 95       1 female yes   housed  3  16 470     33          0   no 22.337873
## 96       1 female  no homeless 19  19  55     31          0   no 27.717655
## 97       1 female  no   housed  1   1 139     21          0   no 57.834595
## 98       1 female yes   housed  0   0 155     35         NA <NA> 47.773228
## 99       1 female  no homeless 11  14 157     35          0   no  9.732559
## 100      1 female  no homeless 19  26 162     25          0   no 55.479382
## 101      1 female  no   housed 13  26 171     38          0   no 28.590870
## 102      1 female  no   housed  4   4 303     20          1  yes 45.425110
## 103      1 female  no homeless 26  26 345     28         NA <NA> 18.594315
## 104      1 female  no   housed  7   8 349     27          0   no 25.676130
## 105      1 female yes homeless 26  32 427     37          0   no 34.152245
## 106      1 female yes homeless 56  61 451     41          0   no 17.050970
## 107      1 female  no homeless  1  24 460     28          0   no 33.434536
##          pcs pss_fr  racegrp satreat sexrisk substance treat avg_drinks
## 1   61.93168     11    white     yes       4    heroin    no          5
## 2   46.47521      5    black      no       5   cocaine   yes          4
## 3   24.51504      1    black     yes       8   cocaine    no         13
## 4   38.27088      5    white      no       8   alcohol    no         71
## 5   60.07915      0    white      no       2    heroin   yes          0
## 6   41.93376     13    black     yes       0   alcohol    no         13
## 7   44.77651      7 hispanic     yes       3    heroin   yes          0
## 8   37.45214     13    white      no       3    heroin   yes          9
## 9   36.01007      6    black      no       7   cocaine    no          6
## 10  52.69898     12    black      no       4   cocaine   yes          6
## 11  46.75086      1    black      no       7   cocaine   yes         13
## 12  55.23372     13    white     yes       4   alcohol   yes          3
## 13  35.12470      4 hispanic     yes       4    heroin    no          0
## 14  36.04016      1    black      no       4   alcohol   yes         59
## 15  54.38272      6    white      no       9   cocaine    no         12
## 16  55.32189     14    white      no       3    heroin    no          0
## 17  43.94296     11    black      no       3   cocaine    no          2
## 18  40.48884      1    other      no       7   alcohol   yes         64
## 19  44.53589      7    white     yes       3   alcohol    no         33
## 20  51.81045     12    black     yes       5   alcohol   yes          9
## 21  54.10854     14 hispanic      no       4   cocaine   yes          0
## 22  54.79462      7    black      no       5   alcohol   yes         34
## 23  36.68874      4    black      no      10   cocaine    no         39
## 24  34.33698     10    white      no       6    heroin    no          0
## 25  39.33260     13    black     yes       6   cocaine   yes          1
## 26  49.29042     11    black     yes       3   cocaine   yes         26
## 27  33.48925      2    white      no       0   alcohol    no         13
## 28  28.74085      9    other      no       7   cocaine   yes          0
## 29  37.79718      7    black     yes       7   cocaine   yes          7
## 30  40.96234      7 hispanic     yes       9   alcohol    no         15
## 31  40.24271      1    white      no       5   cocaine   yes          2
## 32  45.16520      8    black      no       9   cocaine   yes          1
## 33  57.38777      9    black     yes       2   cocaine    no          0
## 34  55.73845     13    black     yes       7   cocaine   yes         29
## 35  30.50811      6    white     yes       0   alcohol    no         12
## 36  32.96189      3    white      no       4    heroin   yes          0
## 37  42.12069      7 hispanic      no       5   alcohol    no         59
## 38  43.24062     14    black      no      11   cocaine    no         16
## 39  35.90619      8    white      no      11   alcohol    no         26
## 40  37.75567      3    white      no       9   alcohol   yes         50
## 41  35.97361      0    black      no      14   cocaine    no         13
## 42  35.22702     10    white      no       4    heroin   yes         20
## 43  38.10227      2    black     yes       7   cocaine    no         19
## 44  21.91906      9    black      no       8    heroin    no          0
## 45  37.03778     11    black     yes       2   alcohol   yes         58
## 46  41.66993      8    other      no       3    heroin    no         32
## 47  48.21926     11    white      no       6    heroin    no          0
## 48  57.64361     11    black      no       0   cocaine    no          0
## 49  48.37090     12    white      no       4    heroin    no          0
## 50  38.51597      3    white     yes      11   cocaine   yes         38
## 51  23.48208     12    black     yes       0   cocaine    no         13
## 52  57.83691      3    white      no       5   alcohol   yes         16
## 53  56.90286      3    black     yes       4   cocaine   yes          1
## 54  46.79942      4    black      no       2   cocaine    no          0
## 55  58.49455      2    black      no       8   cocaine    no          4
## 56  43.25021      8    white      no       5   alcohol    no         10
## 57  52.42204     10    black      no       5    heroin    no         42
## 58  48.97176     11    black      no       4   cocaine   yes          0
## 59  46.36879      7 hispanic      no       4    heroin    no         13
## 60  37.35865      7    black     yes       2    heroin   yes          1
## 61  57.24648     12    black      no       2   cocaine    no          4
## 62  44.85584     10    black      no       4   cocaine    no          1
## 63  52.85658     11    black      no       7   alcohol   yes         10
## 64  51.49556      7    black     yes       8   cocaine   yes          8
## 65  33.07642      6 hispanic     yes       4    heroin   yes          8
## 66  45.79400     12    black      no       4   alcohol   yes         27
## 67  52.35651     10    white      no       4    heroin    no          0
## 68  37.83872     11    black      no       6   cocaine    no          1
## 69  46.74971      2    black      no      11    heroin   yes          1
## 70  64.35030      3    white      no       1    heroin   yes          2
## 71  35.70664      3    black      no      11   alcohol   yes         29
## 72  32.44772      2    black      no       9   alcohol   yes          0
## 73  39.93384      2    other      no       0    heroin   yes          0
## 74  39.09279      7    black      no       6   alcohol    no         67
## 75  36.52407      3    black     yes       5   alcohol    no         53
## 76  52.51404      8    other      no       7    heroin   yes          0
## 77  34.09209      0    other      no       2   alcohol    no         67
## 78  63.77832      8    black      no       4   cocaine   yes          6
## 79  55.44015     13    white      no       1    heroin   yes         26
## 80  31.11147      9    black      no       0   cocaine   yes         13
## 81  27.09086     13    white     yes       3   alcohol    no          0
## 82  41.95401     13    white      no       4    heroin    no          0
## 83  51.27790      3    black      no       9   cocaine    no          3
## 84  53.42212     14    black      no       4   cocaine    no         58
## 85  47.60948      9    white      no       4   alcohol   yes          6
## 86  32.99675      0    black      no       4   alcohol   yes         13
## 87  56.69189      3    black     yes       3    heroin    no          0
## 88  64.91865     14    black      no      12   cocaine   yes          0
## 89  54.52398     12 hispanic      no       0   alcohol    no         21
## 90  33.53111      7    black      no       2    heroin   yes          0
## 91  49.18084      9    other      no       9   alcohol    no         24
## 92  46.69877      0    black      no      10   cocaine   yes          6
## 93  38.40187      5    white      no       6   alcohol   yes         26
## 94  62.08943     11    black     yes       6   cocaine   yes          0
## 95  42.31495      8    black      no       1    heroin    no          3
## 96  41.10135      3    black      no       6   alcohol    no         19
## 97  58.21511      4    black     yes       1   cocaine    no          1
## 98  41.09781     14    white      no       1    heroin    no          0
## 99  69.17161      4    black      no       7   cocaine    no         11
## 100 54.09069      4    white      no       4   alcohol    no         19
## 101 57.76270      9    white     yes      14    heroin   yes         13
## 102 58.75759      1    black      no       2   cocaine   yes          4
## 103 38.86502      3    white      no       4   alcohol    no         26
## 104 54.98139     13    white      no       4   alcohol   yes          7
## 105 45.27036      2 hispanic      no       3   alcohol   yes         26
## 106 34.51623      8 hispanic     yes      14   alcohol    no         56
## 107 40.04572      1    white      no       2    heroin    no          1
##     max_drinks
## 1            5
## 2            4
## 3           20
## 4          129
## 5            0
## 6           13
## 7            0
## 8           24
## 9            8
## 10          13
## 11          20
## 12           6
## 13           0
## 14         164
## 15          18
## 16           0
## 17           2
## 18          64
## 19          38
## 20          15
## 21           0
## 22          34
## 23          95
## 24           0
## 25           1
## 26          26
## 27          13
## 28           0
## 29           7
## 30          15
## 31           2
## 32           3
## 33           0
## 34          29
## 35          12
## 36           0
## 37          59
## 38          20
## 39          33
## 40          50
## 41          32
## 42          20
## 43          26
## 44           0
## 45          58
## 46          38
## 47           0
## 48           0
## 49           0
## 50          38
## 51          13
## 52          26
## 53           6
## 54           0
## 55           4
## 56          14
## 57          48
## 58           0
## 59          13
## 60           2
## 61           4
## 62           1
## 63          20
## 64           8
## 65          13
## 66          33
## 67           0
## 68           1
## 69           1
## 70           2
## 71          58
## 72           0
## 73           0
## 74          67
## 75          53
## 76           0
## 77          80
## 78           6
## 79          26
## 80          13
## 81           0
## 82           0
## 83           3
## 84          58
## 85           6
## 86          26
## 87           0
## 88           0
## 89          21
## 90           0
## 91          51
## 92          13
## 93          41
## 94           0
## 95          16
## 96          19
## 97           1
## 98           0
## 99          14
## 100         26
## 101         26
## 102          4
## 103         26
## 104          8
## 105         32
## 106         61
## 107         24
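
Printing a filtered data frame in full, as above, produces a very long listing. The following is a minimal sketch of a more compact way to check the result of a filter. It assumes the dplyr package is installed, and it uses a small made-up data frame (the hypothetical `toy`, with invented values) in place of HELPrct so that it runs on its own:

```r
library(dplyr)

# Hypothetical toy data standing in for HELPrct (values invented for illustration)
toy <- data.frame(
  id  = 1:5,
  sex = c("male", "female", "male", "female", "male"),
  age = c(37, 26, 32, 28, 39)
)

# Same pattern as in the text: keep only the rows where sex is "male"
toy_males <- filter(toy, sex == "male")

# Compact checks instead of printing every row
nrow(toy_males)        # number of rows kept
table(toy_males$sex)   # confirms that only one category remains
head(toy_males, 2)     # first two rows only
```

The same three checks should work on any filtered data frame, including the ones created from HELPrct in this section.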

We will do the same for males, creating a new data frame called “males” to hold the filtered rows:

males <- filter(HELPrct, sex == 'male')
males
##     age anysubstatus anysub cesd  d1 daysanysub dayslink drugrisk e2b
## 1    37            1    yes   49   3        177      225        0  NA
## 2    37            1    yes   30  22          2       NA        0  NA
## 3    26            1    yes   39   0          3      365       20  NA
## 4    32            1    yes   39  12          2       57        0   1
## 5    28            1    yes   32   1         47      365        7   8
## 6    39            1    yes   46   4        115      382       20   3
## 7    58            1    yes   22   5          6      365        0  NA
## 8    60            1    yes   36  10          6       22        0   1
## 9    36            1    yes   43   2          0      443        0  NA
## 10   35            1    yes   19   1          2      405        0  NA
## 11   29            0     no   40   2        220      449        0   1
## 12   27            1    yes   37   1         52      367        0  NA
## 13   41           NA   <NA>   35   1         NA      391       12   1
## 14   33            1    yes   18   1        129      272        0  NA
## 15   34           NA   <NA>   36   4         NA      293        0   2
## 16   31            1    yes   28   2          3      428        0   3
## 17   34            1    yes   30   1        154       56        0  NA
## 18   35            1    yes   27   0         34      361        1  NA
## 19   34            0     no   24   0        204      365        0  NA
## 20   29            1    yes   47   1        142       79        0   3
## 21   35            0     no   45   2        189      364        0  NA
## 22   43            1    yes   18  10          4      365        0  NA
## 23   37            0     no   11   0        203      203        3  NA
## 24   29            0     no   26   1        193      354        0  NA
## 25   33            1    yes   29   1         10       29        0  NA
## 26   20            1    yes   34   1        177      365        0  NA
## 27   38            0     no   37   2        195      365        0   3
## 28   28            1    yes   23   0          7      365        1   2
## 29   33            1    yes   41   7         14      365        0   3
## 30   40           NA   <NA>   21   0         NA      365        1  NA
## 31   43            0     no   16  15        191      414        0  NA
## 32   28            1    yes   36   1         31      414        0  NA
## 33   45            0     no   17   2        174       43        0   2
## 34   42            1    yes   36   2         17       38        7  NA
## 35   30           NA   <NA>   19   0         NA      264        0  NA
## 36   36            1    yes   25   2          2      377        0  NA
## 37   44           NA   <NA>   36   5         NA      321       19   1
## 38   41            1    yes   27   0         30       NA        0  NA
## 39   30            0     no   44   2        209       26       21   2
## 40   37            1    yes   29   2        111       18        0  NA
## 41   37            1    yes   16   5        137      171        0  NA
## 42   44            1    yes   44   1          4       27        0  NA
## 43   47            1    yes   42   2          3      190        0   4
## 44   38            1    yes   30   5         18       30        0   2
## 45   37            1    yes   25   0          2      365        1  NA
## 46   34            1    yes   26   1          1      365        0  11
## 47   35            1    yes   28   1         36      400        0   1
## 48   36           NA   <NA>   33   0         NA      365        0   1
## 49   27            0     no   44   3        252      431        0   1
## 50   36            0     no   29   1        195      195        0   1
## 51   38           NA   <NA>   26   4         NA      133        1  NA
## 52   42            1    yes   31   2        103       48        8   3
## 53   43            1    yes   28  10         78      365        0  NA
## 54   28            1    yes   29   3          9      129        0   2
## 55   30            1    yes   32   2         53       NA        3  NA
## 56   42           NA   <NA>   30   4         NA       35        0  NA
## 57   22            1    yes   34   7          4      365        0   1
## 58   31           NA   <NA>   49   2         NA      439        3   1
## 59   30            0     no   36   0        177       44        0   3
## 60   25           NA   <NA>   42   1         NA      365        1   1
## 61   26            1    yes   40   1          4       77       10  NA
## 62   35            1    yes   29   1         47       35        0   1
## 63   53            1    yes   31   3          5      365        0   1
## 64   29           NA   <NA>   10   2         NA      143        0  NA
## 65   24            1    yes   32   2        168      115        3   1
## 66   35            1    yes   16   1         20      386        1   3
## 67   32            1    yes   15   0         55      365        0  NA
## 68   47            1    yes    4   2         56       63        1  NA
## 69   26           NA   <NA>   30   2         NA      365        0  NA
## 70   45            1    yes   44   2         63       35       14   1
## 71   33           NA   <NA>    8   1         NA       NA        0  NA
## 72   45           NA   <NA>   16  20         NA      365        0   2
## 73   27            1    yes   49   1        222      136        0  NA
## 74   40            1    yes   30   2          9       37        1  NA
## 75   37            1    yes   48   3         16      349        0  NA
## 76   26            1    yes   17   1         59       NA        0  NA
## 77   27            1    yes   39   0        102      365        0   3
## 78   29           NA   <NA>   24   0         NA       NA       10   2
## 79   33           NA   <NA>   25   2         NA       60        0  NA
## 80   39            1    yes   51   3          2      365        0   5
## 81   33            1    yes   17   3          3      365        7  NA
## 82   35            1    yes   37  20         63      399        0  NA
## 83   38           NA   <NA>   45   0         NA       NA        0   1
## 84   44            1    yes   28   1         47      112       17   1
## 85   28           NA   <NA>   17   3         NA      365        0  NA
## 86   33           NA   <NA>   23   0         NA       NA        0  NA
## 87   35            1    yes   38   2        114      365        0   4
## 88   37            0     no   47   0        183      169        0  NA
## 89   41           NA   <NA>   49   4         NA      365        0   1
## 90   28            1    yes   34   5          0      325       17   2
## 91   35            1    yes   33   2          2      345        0  14
## 92   41            1    yes   28   1         17      104        0  NA
## 93   37            0     no    4   2        183       36        0  NA
## 94   39            1    yes   40   3         11      365        0   4
## 95   32           NA   <NA>   33   2         NA       NA        0  NA
## 96   33           NA   <NA>   28   1         NA       90        0   2
## 97   27            1    yes   21   0        163      169        0  NA
## 98   33            1    yes   33   0          7      399        1  NA
## 99   43            1    yes   45   6          4      358        0   8
## 100  35            1    yes   31  10        185      387        0   1
## 101  49            1    yes   22   5          1      126        0   4
## 102  33           NA   <NA>   39   1         NA      365        1   1
## 103  24            0     no   31   0        183       52        9   1
## 104  45            0     no   48   2        185       50        0   7
## 105  46           NA   <NA>   34  20         NA       NA        0  NA
## 106  32            0     no   46   2        183       42        0  NA
## 107  45           NA   <NA>   34   1         NA      303       11   2
## 108  39            0     no   10   0        186       30        0   1
## 109  34            1    yes   31   1        146      113        0  NA
## 110  32           NA   <NA>   34   2         NA      365        0   3
## 111  32            1    yes   26   2          5      369        0   1
## 112  45           NA   <NA>   48   1         NA       98        0   2
## 113  30           NA   <NA>   37   1         NA      338        0  NA
## 114  36            1    yes   20   8         57      365        7   1
## 115  25            1    yes   38   3          0      414        8   1
## 116  48            0     no   39   8        178       58        0  NA
## 117  42            0     no   46   1        256      368        0   1
## 118  33            1    yes   17   1         61      364        0   1
## 119  36           NA   <NA>    6   1         NA      365        1  NA
## 120  41           NA   <NA>   18   4         NA      365        0  NA
## 121  57           NA   <NA>   51  10         NA      365        0  NA
## 122  47           NA   <NA>   31   2         NA      365        5  NA
## 123  54            1    yes   49   0          0       38        0   4
## 124  55            0     no   43   1        164       31        0  NA
## 125  33            1    yes   45   1         13      330       10   1
## 126  28           NA   <NA>   41   3         NA      443       11   2
## 127  37            0     no   29   2        163       29        0  NA
## 128  32           NA   <NA>   51   1         NA      365        0  NA
## 129  39           NA   <NA>   29   2         NA       14        0   2
## 130  29           NA   <NA>   31   1         NA      424       13   1
## 131  33           NA   <NA>   38   0         NA       NA        0   2
## 132  31           NA   <NA>   39  10         NA       17        2  NA
## 133  31            1    yes   23   0          9       15        0  NA
## 134  46            1    yes   44   1        144       14        0   6
## 135  36            1    yes   12   1         11      140        0  NA
## 136  22            1    yes   35   0          1      365        0   4
## 137  33            1    yes   47   2         27      365        0   2
## 138  35           NA   <NA>   53   2         NA      365       14   2
## 139  28           NA   <NA>   15   1         NA       48        0  NA
## 140  33           NA   <NA>   31   2         NA       32        0   2
## 141  49            1    yes   27   2         61      365        0  NA
## 142  34            0     no   31   2        183       30        0  NA
## 143  41           NA   <NA>   22   4         NA      365        0  NA
## 144  24           NA   <NA>   23   0         NA      365        0  NA
## 145  32            0     no   26   4        192       22        0   3
## 146  39           NA   <NA>   40   1         NA      365        0   1
## 147  19           NA   <NA>   40   1         NA       63        0   8
## 148  49            1    yes    1   2        166       78        0  NA
## 149  27           NA   <NA>   41   4         NA      365        1   4
## 150  22            0     no   16   1        162      357        0  NA
## 151  36            1    yes   33   3         47       12        0  NA
## 152  32            1    yes    4   0         88       50        0  NA
## 153  41            1    yes   40   2         63       22        0  NA
## 154  36            1    yes   39   2         94        7        0  NA
## 155  43            1    yes   32   2         73       70        0  NA
## 156  39            1    yes   51   4         33      331        0  NA
## 157  32            1    yes   40   6        183       76        0  NA
## 158  33            1    yes   22   0          9      183        0  NA
## 159  35           NA   <NA>   49   4         NA       43        0   1
## 160  31            1    yes   35   1         32      307        1   3
## 161  25           NA   <NA>   43   0         NA      365        0  NA
## 162  48            1    yes   27   1         74      353        0   6
## 163  35           NA   <NA>   40   1         NA       37        0  NA
## 164  42           NA   <NA>   38   4         NA      349        0   2
## 165  51            1    yes   39   6          4      272        0   4
## 166  32            1    yes   35   6         70       37        0  NA
## 167  41            1    yes   34   2          2      365        0   3
## 168  30           NA   <NA>   39   2         NA      442        0  NA
## 169  38           NA   <NA>   58   8         NA      452        0   1
## 170  41           NA   <NA>   38   2         NA       24        2  NA
## 171  29           NA   <NA>   46   2         NA      336        0   3
## 172  36           NA   <NA>   31  10         NA      365        0   1
## 173  45           NA   <NA>   11   0         NA      379        0  NA
## 174  36           NA   <NA>   32   2         NA      434       10  NA
## 175  30            1    yes   33   1         59       12        0  NA
## 176  40            1    yes   39   1         16      294        0  NA
## 177  39           NA   <NA>   27   1         NA       21        0  NA
## 178  39            0     no   43   4        170      350        0   2
## 179  37            1    yes   30   1          2      440        0   5
## 180  43            1    yes   12   4         11      236        0   4
## 181  20            1    yes   42   1         20      365        0  NA
## 182  35            1    yes   31   2         32       35        5  17
## 183  32           NA   <NA>   40   6         NA       29       11   2
## 184  42            0     no   17   0        188      456        0  NA
## 185  27           NA   <NA>   44   0         NA      279        0  NA
## 186  30           NA   <NA>   15   2         NA      365        0  NA
## 187  27           NA   <NA>   41   0         NA      365        8   3
## 188  41           NA   <NA>   51   3         NA      349        0  NA
## 189  32            1    yes   24  20          7       46        6   1
## 190  47            1    yes   29   1         31      368        0   1
## 191  36           NA   <NA>   40   2         NA      365        0   2
## 192  32            1    yes   33   2          2      365        0   1
## 193  29           NA   <NA>   46   0         NA       79        8  NA
## 194  34            1    yes   42   0         52      365        1   2
## 195  40           NA   <NA>   17   2         NA      365        0   2
## 196  45           NA   <NA>   22   3         NA      365        7  21
## 197  32           NA   <NA>   11   2         NA       17        0  NA
## 198  31            1    yes   14   0          2      365        0   1
## 199  39            1    yes   26   0         94      425        0  NA
## 200  49            1    yes   36   1         94      365        0  NA
## 201  43           NA   <NA>   18   0         NA      365       10  NA
## 202  38           NA   <NA>   19   1         NA      365        0  NA
## 203  23            1    yes   44  20         45      207        0  NA
## 204  29           NA   <NA>   19   1         NA      318        0  NA
## 205  43            1    yes    9   2          0      365        0   2
## 206  29           NA   <NA>   55   0         NA      365        0  NA
## 207  39            1    yes   18   0         16      358        0   2
## 208  35           NA   <NA>   12   1         NA      441        0   2
## 209  22            1    yes   33   2          3       30        0  NA
## 210  39            1    yes   32   1        132       41        0  NA
## 211  38            1    yes   20   1         NA      285        0   2
## 212  56            1    yes   37  36          0      412        3  11
## 213  40           NA   <NA>   43   1         NA       15       17   2
## 214  39           NA   <NA>   27   5         NA      293        8   4
## 215  47            1    yes   40   2          3      365        0  NA
## 216  32            1    yes    8   3         30      373        0   1
## 217  41            1    yes   54   3          1      356        4  NA
## 218  32            0     no   35   1        191       21        0  NA
## 219  41            0     no   50   2        174       17        1   1
## 220  31            1    yes   55   5         65      365        0   1
## 221  30            1    yes   37   6          8      303       16   1
## 222  32            1    yes   20   1         93      449        0   1
## 223  35           NA   <NA>   40   1         NA       77        0  NA
## 224  32           NA   <NA>   37   1         NA       35        0   3
## 225  33           NA   <NA>   43   0         NA      365        0   2
## 226  30            1    yes    8   8          5       32        1  NA
## 227  44           NA   <NA>   56   3         NA      365        0   2
## 228  46            1    yes   51   0         62      365        0   2
## 229  47           NA   <NA>   36   4         NA      365       13   5
## 230  34            1    yes   49   0         93       32        0  NA
## 231  40            1    yes   53   2          1      393        0   7
## 232  34            1    yes   15  15          5       NA        0  NA
## 233  37            1    yes    6   5          1      364        1  NA
## 234  27           NA   <NA>   31   1         NA       31        0   1
## 235  39            0     no   40   0        178        9        4  NA
## 236  23            1    yes   37   1          0      359       20   4
## 237  53            0     no   40   2        175       80       19   2
## 238  31           NA   <NA>   41   1         NA      365        0  NA
## 239  32            1    yes   39   0         15       14        0   1
## 240  33            0     no   38   1        219      398        0   1
## 241  25            1    yes   38   0          1       40        1   1
## 242  37           NA   <NA>    9   1         NA       40        0  NA
## 243  26            1    yes   36   0         18       74        0  NA
## 244  29           NA   <NA>   27   0         NA      308        5   2
## 245  30            0     no   26   1        215        7        0  NA
## 246  33           NA   <NA>   24   1         NA      300        0  NA
## 247  36            1    yes   16   1        125      361        0   1
## 248  23           NA   <NA>   34   3         NA      393        0   1
## 249  36            1    yes   46   8          5        9        0   5
## 250  34            1    yes   24   1          2      350        2   1
## 251  28            1    yes   25   2          1      365        0   2
## 252  30            1    yes   31   0         15        6        0  NA
## 253  41           NA   <NA>   37   1         NA       19        0  NA
## 254  31           NA   <NA>   28   1         NA      123        1   4
## 255  28           NA   <NA>   27   0         NA       44        0  NA
## 256  59           NA   <NA>    6   2         NA      365        0  NA
## 257  39            1    yes   21   0         31      363        0  NA
## 258  36           NA   <NA>   29   0         NA       33        0  NA
## 259  47            1    yes   23   1         32      152        0  NA
## 260  26           NA   <NA>   35   0         NA      365        0  NA
## 261  22            1    yes   55   0         10      338       11   2
## 262  36           NA   <NA>    3   0         NA      365        0  NA
## 263  34           NA   <NA>   36   1         NA      365        2   6
## 264  27           NA   <NA>   40   1         NA      365        3   2
## 265  21           NA   <NA>   28   3         NA      331        0   1
## 266  33           NA   <NA>   21   0         NA      309        0  NA
## 267  42            1    yes   34   5          3      289       11   1
## 268  46           NA   <NA>   42   2         NA      306        0   2
## 269  26            1    yes   23   4        106      410        0  NA
## 270  36            1    yes   36   3          3      362        0  NA
## 271  48            0     no   30   2        191       16        0   1
## 272  32           NA   <NA>   25   5         NA      340       10  NA
## 273  38           NA   <NA>   35   7         NA      365        0   1
## 274  43            1    yes   23   2         61       11        0   2
## 275  30            1    yes   16   0         30      365        0  NA
## 276  40            0     no   27   1        176       41        0  NA
## 277  38           NA   <NA>   14   0         NA      292        1  NA
## 278  22            0     no   44   1        260      376       NA   5
## 279  22           NA   <NA>   48   2         NA        8        2   3
## 280  37            0     no   11   1        210      370        0   2
## 281  44            1    yes   41   3          0      365        0   1
## 282  38            0     no   37   1        165      166        1  NA
## 283  37            1    yes   31   2          2       89        0   3
## 284  43            1    yes   34   4          2      418        5  NA
## 285  39            1    yes   40   8          0      247        0   3
## 286  45            1    yes   37   2          2      322        3  NA
## 287  39            0     no   30   8        154      265        0  NA
## 288  32            1    yes   51   0          5       NA        6   3
## 289  47            1    yes   12   1         NA      345        0  NA
## 290  24            1    yes   39   2         32      365        0   3
## 291  27            1    yes   10   1          2       20        0  NA
## 292  53           NA   <NA>   57   4         NA      365        0  NA
## 293  39           NA   <NA>   17   1         NA       34        0   4
## 294  32           NA   <NA>   20   4         NA      365        1  NA
## 295  27           NA   <NA>   49   2         NA      365        0   1
## 296  31           NA   <NA>   26   1         NA      365        0  NA
## 297  41           NA   <NA>   28   3         NA      365        0  NA
## 298  28           NA   <NA>   18  17         NA       85        0  NA
## 299  39           NA   <NA>   39   8         NA      365        0  NA
## 300  39           NA   <NA>   51   0         NA      365       12   3
## 301  31           NA   <NA>   45   5         NA      365        5  NA
## 302  29           NA   <NA>   28   2         NA      118        2   1
## 303  25           NA   <NA>   31   7         NA       68        0  NA
## 304  41           NA   <NA>   21   5         NA      365        0  NA
## 305  27           NA   <NA>   41   3         NA      365        0   1
## 306  21           NA   <NA>   38   1         NA       44       14   4
## 307  27           NA   <NA>   36   5         NA       NA        0  NA
## 308  31           NA   <NA>   24   1         NA      365        0   1
## 309  41           NA   <NA>   10   0         NA      365        0  NA
## 310  33           NA   <NA>   41   1         NA      365        0  NA
## 311  49           NA   <NA>   51   1         NA      365        8   3
## 312  41           NA   <NA>   45   4         NA      365        1   1
## 313  25           NA   <NA>   29   0         NA       44        6  NA
## 314  41           NA   <NA>   56   4         NA       10        0  NA
## 315  34           NA   <NA>   34   1         NA       87        0   2
## 316  29           NA   <NA>    4   0         NA      365        0  NA
## 317  28           NA   <NA>   32   0         NA      365        0  NA
## 318  29           NA   <NA>   38   2         NA       NA        9   1
## 319  36           NA   <NA>   26   0         NA      115        0   5
## 320  36           NA   <NA>   27   0         NA      365        0  NA
## 321  24           NA   <NA>   21   4         NA      365        0  NA
## 322  38           NA   <NA>   30   2         NA        6        0   2
## 323  31           NA   <NA>    7   1         NA      365        0  NA
## 324  26           NA   <NA>   35   1         NA      365        0  NA
## 325  26           NA   <NA>   36   4         NA      365        1  NA
## 326  33           NA   <NA>   15   0         NA      365        0  NA
## 327  46           NA   <NA>   48 100         NA      365        0  NA
## 328  33           NA   <NA>   31   0         NA      365        0   1
## 329  39           NA   <NA>   54   6         NA       64        0  NA
## 330  27           NA   <NA>   21   1         NA      365        9   1
## 331  23           NA   <NA>   23   0         NA      365        5   2
## 332  33           NA   <NA>   33   2         NA      365       11   1
## 333  26           NA   <NA>   14   0         NA      365        0  NA
## 334  38           NA   <NA>   27  10         NA      365        0  NA
## 335  52           NA   <NA>   24   1         NA      365        0   1
## 336  39           NA   <NA>   33   2         NA      365        3   1
## 337  36           NA   <NA>   25   1         NA        2        1  NA
## 338  44           NA   <NA>   37   0         NA       NA        0   2
## 339  37           NA   <NA>   47   2         NA        4       21  NA
## 340  31           NA   <NA>    9   1         NA      365        0  NA
## 341  25           NA   <NA>   37   3         NA      365        0   3
## 342  24           NA   <NA>   34   0         NA      365       13   2
## 343  33           NA   <NA>   28   1         NA      365        0   1
## 344  49           NA   <NA>   37   0         NA        7        0  NA
## 345  59           NA   <NA>   11   2         NA      365        0   1
## 346  45           NA   <NA>   35   1         NA      365        0   1
##     female  sex g1b homeless  i1  i2  id indtot linkstatus link       mcs
## 1        0 male yes   housed  13  26   1     39          1  yes 25.111990
## 2        0 male yes homeless  56  62   2     43         NA <NA> 26.670307
## 3        0 male  no   housed   0   0   3     41          0   no  6.762923
## 4        0 male  no homeless  10  13   5     38          1  yes 21.675755
## 5        0 male yes homeless  12  24   8     44          0   no  9.160530
## 6        0 male  no homeless  20  27  10     44          0   no 36.143761
## 7        0 male  no homeless  20  31  14     40          0   no 49.089302
## 8        0 male  no homeless  13  20  15     41          1  yes 25.846157
## 9        0 male  no   housed  51  51  16     38          0   no 23.608444
## 10       0 male  no   housed   0   0  18     17          0   no 42.166462
## 11       0 male yes homeless   1   1  19     40          0   no 16.732292
## 12       0 male  no   housed  23  23  21     37          0   no 55.128109
## 13       0 male  no   housed  26  26  22     36          0   no 20.871447
## 14       0 male  no   housed   0   0  23     27          1  yes 47.286739
## 15       0 male yes homeless  34  34  24     42          0   no 19.620596
## 16       0 male  no homeless   4   5  25     42          0   no 44.442104
## 17       0 male  no   housed   3   3  28     34          1  yes 37.371555
## 18       0 male  no homeless   7   7  30     37          0   no 34.335667
## 19       0 male yes   housed  24  48  31     41          0   no 46.340755
## 20       0 male  no homeless   0   0  32     37          1  yes 27.717710
## 21       0 male  no homeless  20  20  33     44          0   no 18.984324
## 22       0 male  no homeless   3   3  34     41          0   no 58.241264
## 23       0 male  no homeless   6   6  35     35          1  yes 27.852608
## 24       0 male  no   housed   0   0  36     21          0   no 54.774349
## 25       0 male  no   housed   0   0  37     30          1  yes 27.495481
## 26       0 male  no homeless  32 135  38     33          0   no 56.324333
## 27       0 male  no   housed   2  24  39     43          0   no 37.006042
## 28       0 male  no   housed   3   3  40     41          0   no 39.897774
## 29       0 male yes homeless  27  27  42     41          0   no 18.640594
## 30       0 male  no   housed   3   7  43     32          0   no 45.134098
## 31       0 male  no homeless  24  36  44     41          0   no 15.861924
## 32       0 male  no homeless   6  12  45     39          0   no 24.148815
## 33       0 male  no homeless   0   0  46     22          1  yes 29.901625
## 34       0 male  no   housed  13  13  47     39          1  yes 29.412977
## 35       0 male  no homeless  25  28  49     38          1  yes 35.206970
## 36       0 male  no   housed  13  61  51     36          0   no 20.999893
## 37       0 male yes homeless  15  26  52     42          0   no 29.390280
## 38       0 male yes   housed   7   7  53     31         NA <NA> 26.773279
## 39       0 male yes homeless   9  15  54     44          1  yes 17.925251
## 40       0 male  no homeless   5  13  56     40          1  yes 34.434696
## 41       0 male yes   housed  34  34  58     29          1  yes 47.671936
## 42       0 male yes   housed   3   6  59     44          1  yes 26.653036
## 43       0 male yes homeless  37  43  60     43          1  yes 28.469273
## 44       0 male  no homeless  36  36  61     38          1  yes 26.065777
## 45       0 male yes   housed  13  15  62     34          0   no 31.501711
## 46       0 male  no   housed   3  19  63     41          0   no 24.998930
## 47       0 male  no   housed  32  32  67     38          0   no 35.839642
## 48       0 male  no   housed  35  42  68     42          0   no 17.565235
## 49       0 male yes homeless  20  20  69     41          0   no 20.025341
## 50       0 male  no homeless   7  25  70     38          1  yes 25.812592
## 51       0 male  no   housed   0   0  72     38          1  yes 39.934162
## 52       0 male  no homeless  26  51  73     44          1  yes 23.996725
## 53       0 male  no   housed  18  36  76     38          0   no 38.752102
## 54       0 male  no   housed   6  12  78     29          1  yes 34.839962
## 55       0 male  no   housed  13  17  80     35         NA <NA> 22.957235
## 56       0 male yes homeless   5   5  81     28          1  yes 28.418003
## 57       0 male  no homeless   2   2  82     31          0   no 33.115913
## 58       0 male  no homeless 102 102  83     40          0   no 14.913925
## 59       0 male yes homeless   0   0  84     44          1  yes 17.449858
## 60       0 male yes   housed  21  21  85     36          0   no 13.134663
## 61       0 male yes homeless   6   8  86     29          1  yes 19.344807
## 62       0 male  no   housed   1   1  87     42          1  yes 26.221968
## 63       0 male  no homeless  19  19  88     40          0   no 34.210976
## 64       0 male  no   housed   1  22  89     29          1  yes 52.926834
## 65       0 male  no homeless   0   0  91     39          1  yes 26.918222
## 66       0 male  no   housed  26  47  93     39          0   no 39.298168
## 67       0 male  no   housed   0   0  94     35          0   no 47.550678
## 68       0 male  no homeless   9  19  95     38          1  yes 54.053368
## 69       0 male  no   housed  10  10  96     40          0   no 37.845036
## 70       0 male yes homeless   4   5  97     44          1  yes 20.202173
## 71       0 male  no   housed   6  15  98     19         NA <NA> 51.788670
## 72       0 male yes homeless  26  51  99     43          0   no 32.566528
## 73       0 male yes homeless  26  26 102     34          1  yes 16.302422
## 74       0 male yes   housed   2   3 103     42          1  yes 15.754984
## 75       0 male yes   housed  61 184 105     40          0   no 23.659925
## 76       0 male yes   housed   2   2 106     39         NA <NA> 34.737865
## 77       0 male  no homeless  19  19 107     40          0   no 15.618371
## 78       0 male  no   housed   0   0 109     38         NA <NA> 40.941338
## 79       0 male yes   housed  18  47 110     41          1  yes 24.330456
## 80       0 male yes homeless  51  51 111     42          0   no 15.196477
## 81       0 male  no   housed   0   0 112     37          0   no 50.788845
## 82       0 male  no homeless  36  66 113     43          0   no 23.554617
## 83       0 male  no   housed  31  91 114     38         NA <NA> 15.822761
## 84       0 male  no   housed   0   0 115     33          1  yes 45.402626
## 85       0 male  no   housed  26  69 116     34          0   no 53.616177
## 86       0 male  no   housed   2  20 117     28         NA <NA> 59.264427
## 87       0 male yes homeless  51  51 119     43          0   no 12.432887
## 88       0 male  no   housed  19  26 122     42          1  yes 21.912630
## 89       0 male  no homeless  13  13 123     33          0   no 28.972683
## 90       0 male  no homeless   0   0 124     36          0   no 16.284695
## 91       0 male  no homeless  13  13 126     19          0   no 41.590557
## 92       0 male  no   housed  22  22 128     25          1  yes 39.450993
## 93       0 male  no homeless  13  33 129     42          1  yes 42.539974
## 94       0 male yes homeless  19  30 132     39          0   no 22.669971
## 95       0 male yes homeless  26  26 133     41         NA <NA> 45.529411
## 96       0 male  no homeless   3   3 135     40          1  yes 23.729639
## 97       0 male  no   housed  24  24 136     40          1  yes 40.676174
## 98       0 male  no   housed   0   0 137     29          0   no 28.075939
## 99       0 male yes homeless  53  53 140     39          0   no 21.460621
## 100      0 male  no homeless  25  25 142     38          0   no 33.652927
## 101      0 male yes homeless  64 179 144     42          1  yes 45.491100
## 102      0 male yes homeless   4   4 148     42          0   no 23.371147
## 103      0 male yes homeless   3   6 149     37          1  yes 34.598862
## 104      0 male  no homeless  13  13 151     42          1  yes 29.082914
## 105      0 male  no   housed  20  51 152     37         NA <NA> 24.422007
## 106      0 male  no homeless  38  38 154     43          1  yes 18.690155
## 107      0 male  no homeless   8   8 156     40          0   no 27.683458
## 108      0 male  no homeless   0   0 158     34          1  yes 47.145802
## 109      0 male  no   housed  13  13 160     43          1  yes 33.517311
## 110      0 male  no homeless  39  39 163     30          0   no 41.131794
## 111      0 male  no   housed  12  20 164     44          0   no 24.090509
## 112      0 male  no   housed   0   0 167     37          1  yes 20.069775
## 113      0 male  no   housed   1   1 168     29          0   no 18.211269
## 114      0 male  no   housed  19  32 169     43          0   no 30.071957
## 115      0 male  no   housed   0   0 170     30          0   no 28.679745
## 116      0 male yes   housed  26  51 172     37          1  yes 20.517740
## 117      0 male  no   housed  19  19 173     29          0   no 31.188143
## 118      0 male  no homeless   3   6 174     41          0   no 43.881058
## 119      0 male  no   housed   1   1 177     35          0   no 56.784805
## 120      0 male  no   housed  12  17 178     41          0   no 39.074711
## 121      0 male  no homeless  38  38 180     42          0   no 21.200043
## 122      0 male  no   housed   4   4 182     38          0   no 10.564762
## 123      0 male yes homeless  19  50 183     41          1  yes 22.640652
## 124      0 male  no   housed  41  54 185     40          1  yes 39.270416
## 125      0 male  no   housed   1   3 186     36          0   no 18.771036
## 126      0 male  no homeless  19  19 189     42          0   no 21.049545
## 127      0 male  no   housed   8   8 190     34          1  yes 50.018494
## 128      0 male  no   housed  12  12 192     34          0   no  7.938221
## 129      0 male  no homeless  12  20 198     36          1  yes 41.054363
## 130      0 male yes homeless   1   3 199     36          0   no 29.860514
## 131      0 male  no homeless  10  13 201     44         NA <NA> 26.252979
## 132      0 male yes   housed   3  24 202     41          1  yes 40.167236
## 133      0 male yes   housed   6  12 206     32          1  yes 25.615507
## 134      0 male  no homeless 102 102 208     38          1  yes 14.358881
## 135      0 male  no   housed   1   4 209     39          1  yes 27.122667
## 136      0 male  no   housed   0   0 210     29          0   no 36.823708
## 137      0 male yes   housed  58  58 211     41          0   no 17.509274
## 138      0 male  no   housed   9   9 212     37          0   no 17.927528
## 139      0 male  no   housed  35  65 214     43          1  yes 47.711655
## 140      0 male  no   housed  33  51 215     42          1  yes 20.731987
## 141      0 male  no   housed  19  19 217     28          0   no 52.455845
## 142      0 male  no   housed   0   0 222     38          1  yes 23.058514
## 143      0 male  no   housed   6   6 223     40          0   no 45.011848
## 144      0 male  no   housed  18  18 225     36          0   no 48.410297
## 145      0 male yes homeless   0   0 230     41          1  yes 46.119808
## 146      0 male  no   housed  46  46 231     32          0   no 35.955441
## 147      0 male  no homeless  27  30 232     40          1  yes 30.300137
## 148      0 male  no homeless   3   3 233     40          1  yes 59.453930
## 149      0 male yes homeless  12  12 235     42          0   no 23.546112
## 150      0 male  no homeless  26  26 238     29          0   no 46.729744
## 151      0 male  no   housed  23  92 239     40          1  yes 37.674961
## 152      0 male  no   housed  13  13 240     34          1  yes 57.260887
## 153      0 male yes homeless  26  26 243     43          1  yes 35.235611
## 154      0 male  no homeless  13  13 245     35          1  yes 48.239128
## 155      0 male  no homeless  13  13 246     35          1  yes 30.371395
## 156      0 male yes homeless  23  42 248     42          1  yes 22.884369
## 157      0 male  no   housed  15  15 250     34          1  yes 30.280018
## 158      0 male  no   housed  19  20 253     30          1  yes 47.979435
## 159      0 male  no   housed   2   3 256     39          1  yes 25.039495
## 160      0 male yes homeless  13  26 257     45          0   no 26.453758
## 161      0 male  no   housed  14  16 258     43          0   no 14.480626
## 162      0 male  no homeless  51  51 259     36          0   no 52.789551
## 163      0 male  no homeless  10  26 260     37          1  yes 35.576111
## 164      0 male  no homeless  16  16 261     42          0   no 26.799009
## 165      0 male yes homeless 102 102 262     44          1  yes 27.808109
## 166      0 male yes   housed   6  20 265     33          1  yes 27.650967
## 167      0 male  no homeless  27  27 268     42          0   no 27.177586
## 168      0 male  no homeless  27  41 270     33          1  yes 31.328341
## 169      0 male yes   housed  54  73 273     45          0   no 16.125675
## 170      0 male yes   housed  24  36 274     40          1  yes 17.625854
## 171      0 male  no homeless  30  41 276     42          0   no 27.898603
## 172      0 male  no homeless  43  43 277     39          0   no 23.683241
## 173      0 male  no   housed   2   2 278     21          0   no 58.168713
## 174      0 male  no   housed  16  16 279     37          0   no 31.777193
## 175      0 male  no   housed   3   3 280      4          1  yes 52.955296
## 176      0 male  no   housed  34  51 283     36          1  yes 24.813925
## 177      0 male  no   housed  28  28 285     42          1  yes 46.830055
## 178      0 male  no   housed  13  13 287     44          0   no 16.398746
## 179      0 male  no   housed  51  51 288     38          0   no 36.798199
## 180      0 male  no homeless 134 140 289     42          1  yes 55.991005
## 181      0 male yes homeless   5   6 290     28          0   no 41.624405
## 182      0 male  no homeless   5   5 291     40          1  yes 19.645632
## 183      0 male yes   housed   3   3 292     44          1  yes 26.919926
## 184      0 male  no   housed   0   0 293     37          0   no 37.953053
## 185      0 male yes   housed  26  26 294     32          0   no 31.877844
## 186      0 male  no   housed  15  30 295     30          0   no 54.970051
## 187      0 male yes homeless   9  20 296     39          0   no 30.701992
## 188      0 male  no   housed  10  15 297     41          0   no 27.607288
## 189      0 male yes   housed   0   0 298     31          1  yes 29.505835
## 190      0 male yes   housed  24  45 299     39          0   no 21.931257
## 191      0 male yes homeless  33  51 300     40          0   no 20.979116
## 192      0 male  no   housed   0   0 302     32          0   no 28.558788
## 193      0 male yes   housed   0   0 307     39          1  yes 11.819070
## 194      0 male  no homeless   3   3 309     40          0   no 25.548498
## 195      0 male  no homeless  14  20 310     39          0   no 34.139271
## 196      0 male  no homeless  12  12 311     38          0   no 29.400602
## 197      0 male  no   housed   0   0 315     27          1  yes 56.963795
## 198      0 male  no   housed   0   0 317     29          0   no 41.195469
## 199      0 male  no   housed  25  33 318     39          0   no 36.719200
## 200      0 male yes   housed  42  57 319     40          0   no 48.008137
## 201      0 male  no   housed   6   6 322     32          0   no 58.477470
## 202      0 male  no   housed  19  19 323     38          0   no 62.031616
## 203      0 male yes homeless   0   0 326     37          1  yes 24.378925
## 204      0 male  no homeless  22  32 328     31          0   no 18.677704
## 205      0 male  no homeless  19  19 329     19          0   no 58.899960
## 206      0 male yes homeless  13  19 331     41          0   no 15.773271
## 207      0 male yes   housed   1   1 332     34          0   no 34.541599
## 208      0 male  no homeless  13  13 334     40          0   no 51.918278
## 209      0 male  no   housed  20  20 335     37          1  yes 23.137871
## 210      0 male  no   housed   0   0 336     39          1  yes 22.939909
## 211      0 male  no   housed   3   9 337     26          0   no 33.888065
## 212      0 male  no homeless 142 142 338     37          0   no 34.412716
## 213      0 male yes homeless  64  64 341     32          1  yes 22.354912
## 214      0 male  no   housed   2   2 343     42          0   no 19.718121
## 215      0 male  no homeless  51  51 346     43          0   no 28.747435
## 216      0 male  no   housed   1   1 347     12          0   no 55.912579
## 217      0 male  no homeless  24  30 348     44          0   no 18.948950
## 218      0 male  no   housed  35  35 350     40          1  yes 38.851971
## 219      0 male  no homeless   0   0 352     41          1  yes 31.739616
## 220      0 male yes   housed  13  26 353     38          0   no 17.837486
## 221      0 male yes homeless  12  12 355     41          0   no 20.911737
## 222      0 male  no homeless   7   7 356     37          0   no 32.773659
## 223      0 male yes homeless  26  26 357     40          1  yes 23.771542
## 224      0 male yes homeless  41  56 359     41          1  yes 23.242210
## 225      0 male yes homeless   3   3 360     41          0   no 22.447948
## 226      0 male  no   housed  18  31 361     31          1  yes 58.851147
## 227      0 male  no homeless  38  55 362     43          0   no 27.218351
## 228      0 male  no   housed  12  15 363     39          0   no 18.287806
## 229      0 male  no homeless   4   4 365     40          0   no 37.835770
## 230      0 male  no   housed  32  32 366     24          1  yes 37.698196
## 231      0 male  no homeless  34 102 368     42          0   no 18.615227
## 232      0 male  no homeless  38  51 369     29         NA <NA> 47.255920
## 233      0 male  no homeless  13  13 371     31          0   no 57.873539
## 234      0 male  no   housed  49  49 376     42          1  yes 41.010502
## 235      0 male  no homeless  18  36 377     35          1  yes 39.963680
## 236      0 male yes   housed   0   0 378     37          0   no 21.599306
## 237      0 male yes homeless   2   2 380     43          1  yes 29.332056
## 238      0 male  no   housed   6  13 381     40          0   no 18.604780
## 239      0 male  no homeless   6  13 382     33          1  yes 19.291830
## 240      0 male  no   housed  10  10 383     37          0   no 31.856297
## 241      0 male  no   housed   0   0 385     36          1  yes 26.698538
## 242      0 male  no homeless   6  20 386     26          1  yes 53.340359
## 243      0 male  no homeless   6   6 387     42          1  yes 51.003738
## 244      0 male  no   housed   0   0 388     35          0   no 28.639238
## 245      0 male  no   housed  32  32 389     41          1  yes 44.215485
## 246      0 male  no   housed   3  12 392     36          0   no 57.296200
## 247      0 male  no homeless   6   6 394     42          0   no 30.918043
## 248      0 male  no homeless   0   0 395     33          0   no 24.849377
## 249      0 male  no homeless  25  25 399     38          1  yes 17.863741
## 250      0 male  no homeless  13  26 400     41          0   no 48.483433
## 251      0 male  no homeless  18  18 401     36          0   no 27.514502
## 252      0 male  no   housed   2   2 404     39          1  yes 36.029205
## 253      0 male yes homeless  26  38 405     41          1  yes 25.465322
## 254      0 male  no homeless   5  25 406     39          1  yes 38.778580
## 255      0 male yes   housed  10  23 407     25          1  yes 31.255833
## 256      0 male  no homeless   0   0 408     32          0   no 58.750145
## 257      0 male  no   housed   4   4 409     39          0   no 32.313843
## 258      0 male  no   housed  29  85 411     31          1  yes 40.056877
## 259      0 male  no homeless  20  20 413     40          1  yes 37.504734
## 260      0 male  no   housed   3  12 415     29          0   no 18.340139
## 261      0 male yes homeless   6  12 416     41          0   no 14.108759
## 262      0 male  no homeless  13  13 418      9          0   no 59.930012
## 263      0 male  no homeless  36  36 419     39          0   no 26.474701
## 264      0 male  no homeless  18  18 420     37          0   no 57.489437
## 265      0 male  no homeless  45  45 422     40          0   no 41.324745
## 266      0 male  no   housed  13  13 423     31          0   no 38.907230
## 267      0 male  no homeless   4  10 424     42          0   no 22.673281
## 268      0 male  no homeless   6  26 425     42          0   no 30.106504
## 269      0 male  no   housed   6   6 428     15          0   no 38.276970
## 270      0 male  no   housed  25  42 430     37          0   no 45.859604
## 271      0 male  no homeless  13  13 432     35          1  yes 25.544411
## 272      0 male  no   housed  37  37 433     30          0   no 22.730097
## 273      0 male  no   housed  25  25 435     44          0   no 25.445648
## 274      0 male yes homeless  38  38 436     32          1  yes 46.967522
## 275      0 male  no   housed  12  29 437     32          0   no 47.133209
## 276      0 male  no   housed   6  24 438     38          1  yes 42.632927
## 277      0 male  no homeless   6   6 440     34          0   no 54.851093
## 278      0 male  no   housed   0   0 441     44          0   no 15.101494
## 279      0 male yes homeless   8   8 443     40          1  yes 19.116766
## 280      0 male  no   housed  32  32 444     41          0   no 51.843193
## 281      0 male  no homeless  51  51 447     30          0   no 32.484653
## 282      0 male  no   housed  35  35 448     42          1  yes 43.498222
## 283      0 male  no homeless  73  73 449     36          1  yes 18.795931
## 284      0 male yes homeless   9  31 452     45          0   no 18.525930
## 285      0 male yes homeless  51  51 457     44          1  yes 25.738285
## 286      0 male  no   housed   6   8 458     28          0   no 14.891697
## 287      0 male  no   housed   6  16 459     32          1  yes 41.360710
## 288      0 male  no homeless   2   3 464     44         NA <NA> 17.082233
## 289      0 male  no   housed   1   1 467     31          0   no 43.441059
## 290      0 male  no homeless  49 109 468     42          0   no 27.801510
## 291      0 male  no   housed  19  25 469     35          1  yes 42.457150
## 292      0 male  no   housed  38  51  13     45          0   no 18.750151
## 293      0 male  no homeless  26  40  26     45          1  yes 28.556833
## 294      0 male  no homeless  83 145  29     42          0   no 28.602417
## 295      0 male yes   housed  32  40  48     43          0   no 15.268264
## 296      0 male  no   housed  30 101  64     41          0   no 40.633827
## 297      0 male  no   housed  42  42 130     31          0   no 46.269627
## 298      0 male  no   housed  18  26 145     36          1  yes 33.659222
## 299      0 male yes homeless  35 105 146     36          0   no 21.645960
## 300      0 male  no homeless  20  20 147     41          0   no 23.724752
## 301      0 male yes   housed  26  26 159     33          0   no 15.599421
## 302      0 male  no homeless  43  54 161     43          1  yes 28.475632
## 303      0 male  no   housed   1   2 165     35          1  yes 36.594727
## 304      0 male  no   housed  51  51 175     37          0   no 15.078867
## 305      0 male yes homeless  24  48 176     44          0   no 38.950596
## 306      0 male yes   housed  13  13 184     43          1  yes 31.680859
## 307      0 male  no   housed  20  26 195     41         NA <NA> 19.096197
## 308      0 male  no   housed  26  26 197     35          0   no 48.442287
## 309      0 male  no   housed   8  18 205     36          0   no 52.697727
## 310      0 male yes   housed  61  61 207     34          0   no 19.919922
## 311      0 male  no   housed  13  19 216     33          0   no 13.312669
## 312      0 male yes homeless  28  37 218     43          0   no 15.686288
## 313      0 male  no   housed   6   7 227     32          1  yes 33.820976
## 314      0 male  no   housed  10  10 234     41          1  yes 11.499865
## 315      0 male  no   housed   0   0 244     36          1  yes 26.392733
## 316      0 male  no   housed   4  10 251     19          0   no 52.945427
## 317      0 male  no   housed  25  37 252     33          0   no 39.972664
## 318      0 male  no homeless   2   2 263     40         NA <NA> 23.446474
## 319      0 male  no homeless  26  26 266     44          1  yes 42.341843
## 320      0 male  no   housed  24  24 267     33          0   no 28.061911
## 321      0 male yes homeless   0   0 271     38          0   no 28.073883
## 322      0 male  no homeless  13  13 281     38          1  yes 37.116608
## 323      0 male  no   housed  12  12 282     31          0   no 57.800064
## 324      0 male  no   housed  12  30 301     41          0   no 12.204219
## 325      0 male  no   housed  12  18 305     38          0   no 39.038631
## 326      0 male  no homeless   3   3 312     36          0   no 37.102394
## 327      0 male  no   housed  51  69 314     29          0   no 23.898293
## 328      0 male  no homeless   5   5 321     29          0   no 46.330513
## 329      0 male  no homeless  68  68 330     42          1  yes 13.412563
## 330      0 male  no homeless  29  29 340     43          0   no 49.503277
## 331      0 male  no   housed   5   5 373     38          0   no 33.345051
## 332      0 male  no homeless  32  32 390     41          0   no 18.530807
## 333      0 male  no   housed   0   0 393     14          0   no 54.525818
## 334      0 male  no   housed  76  78 396     10          0   no 44.171612
## 335      0 male  no homeless  26  26 397     32          0   no 47.779892
## 336      0 male  no homeless  41  62 398     39          0   no 21.271496
## 337      0 male  no homeless  18  18 410     43          1  yes 39.929405
## 338      0 male  no homeless  22  30 412     31         NA <NA> 25.632202
## 339      0 male  no   housed  53  63 417     43          1  yes 23.716438
## 340      0 male  no homeless   4  13 434     34          0   no 52.792542
## 341      0 male  no   housed   3   3 439     30          0   no 28.609346
## 342      0 male  no homeless   0   0 453     36          0   no 25.851772
## 343      0 male  no   housed   0   0 454     38          0   no 41.943066
## 344      0 male  no   housed  13  20 455     39          1  yes 62.175503
## 345      0 male  no homeless  13  13 462     26          0   no 54.424816
## 346      0 male  no homeless  51  51 463     43          0   no 30.212227
##          pcs pss_fr  racegrp satreat sexrisk substance treat avg_drinks
## 1   58.41369      0    black      no       4   cocaine   yes         13
## 2   36.03694      1    white      no       7   alcohol   yes         56
## 3   74.80633     13    black      no       2    heroin    no          0
## 4   37.34558     10    black      no       6   cocaine    no         10
## 5   65.13801      4    white     yes       6   alcohol   yes         12
## 6   22.61060      0    white     yes       0    heroin   yes         20
## 7   39.24264     13    white     yes       1   alcohol    no         20
## 8   31.82965      1    black      no       4   cocaine   yes         13
## 9   55.16998      1    white      no       8   alcohol   yes         51
## 10  56.43837      9    black      no       4    heroin    no          0
## 11  58.29807      1    other      no       4   cocaine    no          1
## 12  34.33926     11    black     yes       7   cocaine    no         23
## 13  36.58481      8    black      no       4    heroin   yes         26
## 14  61.64098     14    black      no       4   cocaine   yes          0
## 15  46.22176     10    white      no       6   alcohol    no         34
## 16  51.56324      6    black     yes       9   cocaine    no          4
## 17  63.06006      3    white      no       5   cocaine   yes          3
## 18  61.82597      6    black      no       4    heroin   yes          7
## 19  43.53374      4    white      no       5   alcohol    no         24
## 20  42.22490      5    black     yes       2   cocaine   yes          0
## 21  42.40059      3    black      no       6   alcohol    no         20
## 22  50.14700     12    black      no       0   alcohol    no          3
## 23  63.52000      2    black     yes       5   cocaine   yes          6
## 24  53.35109     10    black      no       2   cocaine   yes          0
## 25  56.73985     10    black      no       0   cocaine   yes          0
## 26  53.23396      8    black      no       3   alcohol   yes         32
## 27  62.04113      6    white      no       4   alcohol    no          2
## 28  38.39529     11    black      no       4    heroin    no          3
## 29  51.30330      1    white     yes       0   alcohol    no         27
## 30  56.68389     10 hispanic      no       4    heroin    no          3
## 31  71.39259      3    white      no       7   cocaine   yes         24
## 32  52.61977      4    black      no       7   cocaine   yes          6
## 33  36.04588      7    black      no       6   cocaine    no          0
## 34  50.06427     14    white      no       4    heroin   yes         13
## 35  62.03183     10    black      no       5   alcohol    no         25
## 36  56.38669     12    black      no       1   alcohol    no         13
## 37  40.38438     11    black      no      10    heroin   yes         15
## 38  58.16169      6    black      no       6   cocaine    no          7
## 39  45.48341      6    other      no       9    heroin   yes          9
## 40  63.05807      2    black      no       7   alcohol   yes          5
## 41  29.45625      8    white      no       3   alcohol    no         34
## 42  40.46056     13    other      no       4   cocaine   yes          3
## 43  57.20213      1    white     yes       2   alcohol    no         37
## 44  47.60514     10    black      no       4   alcohol   yes         36
## 45  50.16318      7    black     yes       6    heroin    no         13
## 46  50.39870      6    black     yes       7   cocaine    no          3
## 47  52.68871     12    black     yes       6   cocaine   yes         32
## 48  67.53625     11    black     yes       4   alcohol    no         35
## 49  36.98058      5    white      no       6   alcohol    no         20
## 50  64.29022      5    black     yes       9   alcohol    no          7
## 51  53.15686      8    white     yes       2    heroin   yes          0
## 52  45.18499      3    white     yes       6   alcohol   yes         26
## 53  27.36663      4    black      no       5   cocaine   yes         18
## 54  58.25895      5    white      no       8   cocaine    no          6
## 55  63.91367     10    white      no      12   cocaine    no         13
## 56  56.90441      2    black      no       4   cocaine   yes          5
## 57  48.79136      4    black      no       9    heroin    no          2
## 58  52.59380      9    black     yes       6   cocaine    no        102
## 59  68.12395      7    white     yes       6   alcohol   yes          0
## 60  57.07777      1 hispanic      no       3    heroin    no         21
## 61  42.62894     12    white      no      11    heroin   yes          6
## 62  59.56708      1    black      no       7   cocaine   yes          1
## 63  44.16995     10    black      no       4   alcohol    no         19
## 64  58.21477     13    black      no       4   alcohol   yes          1
## 65  59.82454      9    other      no       5   alcohol   yes          0
## 66  38.46090      8    white     yes       2   alcohol    no         26
## 67  37.18519      3    black      no       1    heroin   yes          0
## 68  56.50476     12    white      no       1    heroin   yes          9
## 69  57.33492     14    black      no       4   cocaine    no         10
## 70  28.85472      4 hispanic      no       5    heroin   yes          4
## 71  60.58733     10    black      no       5   cocaine    no          6
## 72  30.05406      9    white      no       3   alcohol    no         26
## 73  55.98083     10    black     yes       4   cocaine   yes         26
## 74  48.05733      9    black      no      10   cocaine   yes          2
## 75  30.23405      6    black      no       3   cocaine   yes         61
## 76  65.74425      5    black      no       2   cocaine    no          2
## 77  55.50122      9    black     yes       7   cocaine    no         19
## 78  63.61380      1    white     yes       4    heroin    no          0
## 79  46.41464      4    white      no       6   alcohol   yes         18
## 80  54.13217      8    white      no       0   alcohol   yes         51
## 81  46.75063      4    white     yes       5    heroin    no          0
## 82  40.18310      9    black      no       6   alcohol    no         36
## 83  63.48228      6    black      no       3   alcohol    no         31
## 84  43.62142      5    white      no       4    heroin   yes          0
## 85  57.95000      7    black      no       4   alcohol   yes         26
## 86  54.44389      2    black      no       5   cocaine    no          2
## 87  48.89978      4    white     yes      13   alcohol    no         51
## 88  43.00148      3    black      no       9   cocaine   yes         19
## 89  59.74108      6    white     yes       4   alcohol    no         13
## 90  48.89844     12    white     yes       4    heroin   yes          0
## 91  40.88239      3    other      no       9   cocaine    no         13
## 92  28.93009      7    black      no       3   alcohol   yes         22
## 93  60.92048      4    black      no       7   cocaine   yes         13
## 94  35.39379      5    white     yes       3   alcohol    no         19
## 95  57.32318      2    white      no       5   alcohol    no         26
## 96  45.54259      8    black     yes       7   cocaine   yes          3
## 97  59.10600      9    black      no       7   cocaine   yes         24
## 98  42.01285     11    black      no       4    heroin   yes          0
## 99  45.01618      7    white      no       0   alcohol    no         53
## 100 48.87681      1    black      no       7   alcohol    no         25
## 101 38.13606      5    black      no       6   alcohol   yes         64
## 102 29.47202     12    black      no       4    heroin    no          4
## 103 50.21533      9    white     yes       5    heroin   yes          3
## 104 36.24839      8    white      no       7   alcohol    no         13
## 105 45.56750      7    white      no       5   alcohol    no         20
## 106 59.47648      7    other      no       5   cocaine   yes         38
## 107 31.97959      6    black     yes       6    heroin    no          8
## 108 53.66537      3    black     yes       6   cocaine   yes          0
## 109 29.78529      3    black      no       2   cocaine    no         13
## 110 24.43518      8    white      no       5   alcohol   yes         39
## 111 53.75950     10    black      no       7   cocaine    no         12
## 112 50.23810     11    white      no       4    heroin   yes          0
## 113 56.00507     11    other     yes       3    heroin    no          1
## 114 44.92406      9    white     yes       1   alcohol    no         19
## 115 61.78611      2    white      no       6    heroin    no          0
## 116 54.35444      8    white      no       4   alcohol   yes         26
## 117 55.74972      8    black      no       7   alcohol   yes         19
## 118 61.44474      7    black      no       8   cocaine    no          3
## 119 56.84005      3    black     yes       9   cocaine   yes          1
## 120 36.56960      5    white     yes       5   alcohol    no         12
## 121 32.28706      2    white      no       8   alcohol    no         38
## 122 52.94168      9    black      no       0    heroin    no          4
## 123 31.00380      7    white      no       0   alcohol   yes         19
## 124 26.45694     11 hispanic      no       3   alcohol   yes         41
## 125 40.46645      2    other      no       0    heroin    no          1
## 126 45.46138      1    white      no       6    heroin   yes         19
## 127 54.07817      7    black      no       2   cocaine   yes          8
## 128 53.61504     10    black      no       4   cocaine    no         12
## 129 57.70763     14    white      no       0   alcohol   yes         12
## 130 53.68318     11    white      no       1    heroin   yes          1
## 131 54.42475      3 hispanic      no       7   cocaine    no         10
## 132 61.28633      4    black      no       8   cocaine   yes          3
## 133 66.59317     10    white      no       4   alcohol   yes          6
## 134 49.27981      2    black     yes       7   alcohol   yes        102
## 135 58.16642     10    black      no       7   cocaine    no          1
## 136 31.52861      2    other     yes       5    heroin   yes          0
## 137 49.36320     12    black      no       8    heroin    no         58
## 138 43.17081      2    white      no       4    heroin   yes          9
## 139 57.81969      2    black      no       6   cocaine   yes         35
## 140 54.82264      5    black      no       8   alcohol   yes         33
## 141 60.41816     13    black     yes       3   alcohol    no         19
## 142 54.36913      6    white      no       6   cocaine   yes          0
## 143 35.79145     10    black      no       3   cocaine   yes          6
## 144 59.32288      6 hispanic      no       6   alcohol    no         18
## 145 23.50237      5    black      no       3   alcohol   yes          0
## 146 56.30513     11    black      no       3   alcohol    no         46
## 147 41.06454      4    white     yes       2    heroin   yes         27
## 148 58.16510     14    black      no       2   cocaine    no          3
## 149 41.57280      7    white      no       4    heroin    no         12
## 150 54.59662      1    white      no       0   alcohol    no         26
## 151 47.36353      2    black     yes       7   alcohol   yes         23
## 152 56.89963      0    black     yes       5   alcohol    no         13
## 153 48.48331      0    white     yes       5   alcohol   yes         26
## 154 56.39499      3    black      no       2   cocaine   yes         13
## 155 47.35083      1    other      no       5   cocaine   yes         13
## 156 29.11139      5    black      no       4   alcohol    no         23
## 157 34.58012     12    white      no       4   alcohol   yes         15
## 158 48.27899      6    black      no       4   cocaine   yes         19
## 159 63.25544     14    black      no       8   cocaine    no          2
## 160 46.76894      3    white      no       5    heroin    no         13
## 161 70.14779      5    white      no       5   cocaine   yes         14
## 162 50.25876      1    white     yes       6   alcohol    no         51
## 163 29.49112      3    black      no       7   cocaine   yes         10
## 164 42.42209     10    white      no       0   alcohol    no         16
## 165 25.61815      7    white      no       1   alcohol   yes        102
## 166 53.05504      6    black      no       5   cocaine   yes          6
## 167 43.00587      6    black      no      11   cocaine    no         27
## 168 41.78789      1    black      no       7   cocaine   yes         27
## 169 47.65467     11    white     yes       1   alcohol    no         54
## 170 44.01194     13 hispanic      no       4    heroin   yes         24
## 171 43.68238      2    white     yes       1   alcohol    no         30
## 172 43.55378      9    black      no       4   alcohol   yes         43
## 173 49.47607      3    black      no       8   cocaine    no          2
## 174 41.87122      4    black      no       0    heroin    no         16
## 175 60.10658     12    black      no       4   cocaine   yes          3
## 176 35.46683     12    black      no       5   cocaine    no         34
## 177 62.44834      1    black      no       1   alcohol   yes         28
## 178 42.32603      3    black      no       6   cocaine    no         13
## 179 57.78556     13    black     yes       7   cocaine   yes         51
## 180 32.58783     11    black     yes      13   alcohol   yes        134
## 181 53.04678      6    black     yes       2   alcohol    no          5
## 182 46.33508      9    white      no       5    heroin   yes          5
## 183 48.62301      3    white      no       4   alcohol    no          3
## 184 61.60262      6    black     yes       4   cocaine    no          0
## 185 51.38743     11 hispanic      no       3   cocaine    no         26
## 186 33.79744     12    white      no       1   alcohol   yes         15
## 187 51.40308      4    white      no       4    heroin   yes          9
## 188 44.29502      5    white     yes       0   alcohol    no         10
## 189 46.76040     10 hispanic      no       3    heroin    no          0
## 190 49.87759      6    other      no       2   alcohol   yes         24
## 191 59.28272      1    white     yes       0   alcohol   yes         33
## 192 36.63770      4 hispanic      no       5    heroin    no          0
## 193 62.81930      2 hispanic     yes       4    heroin   yes          0
## 194 46.98674      5 hispanic      no       3    heroin   yes          3
## 195 56.95329     14    white      no       4   alcohol   yes         14
## 196 44.11552      3    white     yes       2    heroin    no         12
## 197 46.56849      5    black      no       4   cocaine   yes          0
## 198 40.11784     11 hispanic      no       3    heroin   yes          0
## 199 30.27282      9    other      no       4   alcohol    no         25
## 200 51.74989     11 hispanic      no       0   alcohol    no         42
## 201 58.89470     11    white     yes       6    heroin   yes          6
## 202 36.10949     12    black      no       5   cocaine    no         19
## 203 35.89378      4 hispanic      no       4    heroin    no          0
## 204 71.62856      6    black      no       6   cocaine    no         22
## 205 59.34274     12    black     yes       6   alcohol   yes         19
## 206 48.61113      3    black     yes       5   alcohol    no         13
## 207 54.08614      5    black      no       8   cocaine   yes          1
## 208 51.16233     12    black     yes      11   cocaine    no         13
## 209 51.24271     10    white      no       2    heroin   yes         20
## 210 33.03571     10    black      no       2    heroin   yes          0
## 211 33.92213      2    black     yes       7   cocaine   yes          3
## 212 25.92422      5    white      no       8   alcohol    no        142
## 213 31.76573      1    white     yes       0    heroin    no         64
## 214 41.32350      7    other      no       3    heroin    no          2
## 215 51.08913     10    white      no       6   alcohol   yes         51
## 216 51.01180     11    black     yes       6   cocaine   yes          1
## 217 40.42006      7    black      no       9   alcohol    no         24
## 218 45.13578     12 hispanic      no       1   alcohol    no         35
## 219 31.52352      0    white      no       4    heroin   yes          0
## 220 54.94331      5    black      no       1   cocaine    no         13
## 221 44.87310      2 hispanic      no       8    heroin   yes         12
## 222 63.90699      2 hispanic      no       0   alcohol    no          7
## 223 47.50178      5    white      no       7   alcohol   yes         26
## 224 30.34914      9    white     yes       6   alcohol   yes         41
## 225 45.32498      2 hispanic     yes       7   alcohol    no          3
## 226 58.71478      4    black      no       5   alcohol   yes         18
## 227 34.31445      0    white      no       9   alcohol    no         38
## 228 43.60749      2    black     yes       4   alcohol    no         12
## 229 32.12609      4    black     yes       0    heroin    no          4
## 230 52.02918     11 hispanic      no       4   alcohol   yes         32
## 231 58.15246      5    white      no       4   alcohol    no         34
## 232 46.52069      6    black      no       7   cocaine   yes         38
## 233 57.59651     14    black      no       8   cocaine   yes         13
## 234 62.97789      5    black     yes      11   cocaine   yes         49
## 235 37.80672      0    white      no       3   alcohol    no         18
## 236 36.64597      1 hispanic     yes       3    heroin   yes          0
## 237 25.43683      5 hispanic      no       4    heroin   yes          2
## 238 66.09068      4    black      no       5   cocaine   yes          6
## 239 59.91458      6    black      no       5   cocaine   yes          6
## 240 64.18298      7    black     yes       6   cocaine    no         10
## 241 43.39342      3 hispanic      no       1    heroin   yes          0
## 242 57.65739     12    white      no       0   alcohol    no          6
## 243 51.70669      2 hispanic     yes       2   cocaine   yes          6
## 244 48.98777     10    white      no       2    heroin    no          0
## 245 54.15862      9    black      no       6   alcohol   yes         32
## 246 59.14530      5 hispanic     yes       4   alcohol    no          3
## 247 63.34270     11    black     yes       5   cocaine   yes          6
## 248 51.15330      1 hispanic     yes       4   cocaine   yes          0
## 249 38.19618      6    white      no       0   alcohol   yes         25
## 250 57.44889      5    white      no       5   alcohol    no         13
## 251 64.07393      3    black     yes       5   alcohol    no         18
## 252 61.19665      1    black      no       3   cocaine   yes          2
## 253 65.26759      5    black      no       9   cocaine    no         26
## 254 41.73849     10    black     yes       8   cocaine   yes          5
## 255 56.56525      7    black      no       6   cocaine   yes         10
## 256 53.01821     12    black      no       8   cocaine    no          0
## 257 57.04919     14    black      no       4    heroin    no          4
## 258 57.73149     11    black      no       5   cocaine   yes         29
## 259 54.06671      3    other      no       2   alcohol    no         20
## 260 43.89911     12    black      no       2   cocaine   yes          3
## 261 48.81484      5    white     yes       5    heroin    no          6
## 262 58.22468      3    black      no       7   cocaine    no         13
## 263 48.76114     12    white     yes       3    heroin   yes         36
## 264 37.74971      8    white     yes       3    heroin    no         18
## 265 36.81136      3 hispanic      no       2    heroin    no         45
## 266 49.43321     11    black     yes       2    heroin   yes         13
## 267 45.18067      4    white      no       0    heroin    no          4
## 268 36.35557      5    white     yes       4   alcohol    no          6
## 269 36.49366      5    black      no       3    heroin    no          6
## 270 14.07429      8    white      no       4   alcohol    no         25
## 271 42.86974     12    other      no       4    heroin   yes         13
## 272 56.85568     11 hispanic      no       2    heroin   yes         37
## 273 44.17665      8    black     yes       5   alcohol    no         25
## 274 58.74847      4    white     yes       1   alcohol   yes         38
## 275 51.92163      8 hispanic      no       3   alcohol    no         12
## 276 56.86680      6    black      no       4   cocaine   yes          6
## 277 50.26602      3    black      no       5    heroin   yes          6
## 278 48.11589      0    white     yes       5    heroin    no          0
## 279 45.58474      4    white     yes       1    heroin   yes          8
## 280 59.72128      5    white      no       5   alcohol    no         32
## 281 44.22039      4    white     yes       3   alcohol    no         51
## 282 20.74029      3    black      no       4   alcohol   yes         35
## 283 54.93296      4 hispanic     yes       2   alcohol    no         73
## 284 47.58062      3    white      no       4    heroin   yes          9
## 285 34.90893      2    white      no       5   alcohol   yes         51
## 286 60.11456      2    black      no       7    heroin    no          6
## 287 44.59728      7    white      no       3   alcohol   yes          6
## 288 47.00855     14    white      no       5    heroin   yes          2
## 289 59.99293     14    black     yes       5   cocaine    no          1
## 290 51.69448      1    black      no       5   cocaine    no         49
## 291 53.54025     11 hispanic      no       4    heroin   yes         19
## 292 46.04046      5    white      no       2   alcohol    no         38
## 293 53.17226     14    white      no       3   alcohol   yes         26
## 294 47.83191      6    white      no       5   alcohol    no         83
## 295 40.83885      7    white      no       7    heroin    no         32
## 296 58.78673      4 hispanic     yes      12   cocaine   yes         30
## 297 36.50988     14    black      no       3   alcohol    no         42
## 298 45.00826      8    other      no       7   cocaine   yes         18
## 299 41.52777      8    white      no       4   alcohol    no         35
## 300 32.87765      7 hispanic      no       0   alcohol   yes         20
## 301 47.65695      4 hispanic     yes       2    heroin   yes         26
## 302 45.82243      7    white      no       6   alcohol   yes         43
## 303 59.08202      8    white      no       5   cocaine   yes          1
## 304 41.00370      3    white     yes       4   alcohol   yes         51
## 305 59.73408     11    white      no       1   alcohol    no         24
## 306 60.97185      1    white      no      10    heroin   yes         13
## 307 59.91701      3    white     yes       1   alcohol    no         20
## 308 58.50863      7    black     yes       3   alcohol    no         26
## 309 58.58452     11    white      no       3   alcohol    no          8
## 310 64.95238      5    white      no       8   alcohol    no         61
## 311 49.44656      8    black      no       3    heroin    no         13
## 312 58.84382      1 hispanic      no       8   alcohol   yes         28
## 313 27.27006      4    other      no       5    heroin   yes          6
## 314 66.23132      4    other      no       3   alcohol   yes         10
## 315 32.35484      7 hispanic      no       8    heroin   yes          0
## 316 58.86002     11    white     yes       5   cocaine   yes          4
## 317 56.95388      3    black     yes       2   alcohol    no         25
## 318 40.40644      8 hispanic      no       9    heroin    no          2
## 319 61.74688      3    white      no       0   alcohol   yes         26
## 320 53.93607     12    black      no       4   cocaine    no         24
## 321 63.86327      9 hispanic      no       3   cocaine   yes          0
## 322 35.98627      9    black     yes       8   cocaine   yes         13
## 323 49.21747     12    black      no       4   cocaine    no         12
## 324 51.45133     11    white      no       2   cocaine    no         12
## 325 47.92621      0    black      no       7   cocaine    no         12
## 326 51.63569      2    black      no       3   cocaine    no          3
## 327 23.55043      9    white     yes       2   alcohol    no         51
## 328 59.16547      8    black      no       3   cocaine    no          5
## 329 42.08535      7    white      no       3   cocaine    no         68
## 330 51.01598      3    white      no       5   alcohol   yes         29
## 331 46.42344      1    white      no       6    heroin   yes          5
## 332 52.71838      1    white     yes       7    heroin    no         32
## 333 59.42862      5 hispanic      no       4    heroin    no          0
## 334 38.49107      1    white      no       4   alcohol    no         76
## 335 52.73988     10    white      no       0   alcohol    no         26
## 336 45.72916      2    white      no       8   alcohol   yes         41
## 337 61.97865      3    white     yes       4    heroin   yes         18
## 338 60.46511     14    white      no       3   alcohol    no         22
## 339 38.24600      7 hispanic      no       1    heroin   yes         53
## 340 57.12674     11    other      no       2   alcohol    no          4
## 341 52.02338      6    black     yes       4    heroin    no          3
## 342 50.60834      5    white     yes       4    heroin    no          0
## 343 56.96868      7    white      no       4    heroin   yes          0
## 344 57.25384     11    white      no       0   alcohol   yes         13
## 345 53.73204      7    black     yes       9   cocaine    no         13
## 346 43.47607     11    white      no       4   alcohol    no         51
##     max_drinks
## 1           26
## 2           62
## 3            0
## 4           13
## 5           24
## 6           27
## 7           31
## 8           20
## 9           51
## 10           0
## 11           1
## 12          23
## 13          26
## 14           0
## 15          34
## 16           5
## 17           3
## 18           7
## 19          48
## 20           0
## 21          20
## 22           3
## 23           6
## 24           0
## 25           0
## 26         135
## 27          24
## 28           3
## 29          27
## 30           7
## 31          36
## 32          12
## 33           0
## 34          13
## 35          28
## 36          61
## 37          26
## 38           7
## 39          15
## 40          13
## 41          34
## 42           6
## 43          43
## 44          36
## 45          15
## 46          19
## 47          32
## 48          42
## 49          20
## 50          25
## 51           0
## 52          51
## 53          36
## 54          12
## 55          17
## 56           5
## 57           2
## 58         102
## 59           0
## 60          21
## 61           8
## 62           1
## 63          19
## 64          22
## 65           0
## 66          47
## 67           0
## 68          19
## 69          10
## 70           5
## 71          15
## 72          51
## 73          26
## 74           3
## 75         184
## 76           2
## 77          19
## 78           0
## 79          47
## 80          51
## 81           0
## 82          66
## 83          91
## 84           0
## 85          69
## 86          20
## 87          51
## 88          26
## 89          13
## 90           0
## 91          13
## 92          22
## 93          33
## 94          30
## 95          26
## 96           3
## 97          24
## 98           0
## 99          53
## 100         25
## 101        179
## 102          4
## 103          6
## 104         13
## 105         51
## 106         38
## 107          8
## 108          0
## 109         13
## 110         39
## 111         20
## 112          0
## 113          1
## 114         32
## 115          0
## 116         51
## 117         19
## 118          6
## 119          1
## 120         17
## 121         38
## 122          4
## 123         50
## 124         54
## 125          3
## 126         19
## 127          8
## 128         12
## 129         20
## 130          3
## 131         13
## 132         24
## 133         12
## 134        102
## 135          4
## 136          0
## 137         58
## 138          9
## 139         65
## 140         51
## 141         19
## 142          0
## 143          6
## 144         18
## 145          0
## 146         46
## 147         30
## 148          3
## 149         12
## 150         26
## 151         92
## 152         13
## 153         26
## 154         13
## 155         13
## 156         42
## 157         15
## 158         20
## 159          3
## 160         26
## 161         16
## 162         51
## 163         26
## 164         16
## 165        102
## 166         20
## 167         27
## 168         41
## 169         73
## 170         36
## 171         41
## 172         43
## 173          2
## 174         16
## 175          3
## 176         51
## 177         28
## 178         13
## 179         51
## 180        140
## 181          6
## 182          5
## 183          3
## 184          0
## 185         26
## 186         30
## 187         20
## 188         15
## 189          0
## 190         45
## 191         51
## 192          0
## 193          0
## 194          3
## 195         20
## 196         12
## 197          0
## 198          0
## 199         33
## 200         57
## 201          6
## 202         19
## 203          0
## 204         32
## 205         19
## 206         19
## 207          1
## 208         13
## 209         20
## 210          0
## 211          9
## 212        142
## 213         64
## 214          2
## 215         51
## 216          1
## 217         30
## 218         35
## 219          0
## 220         26
## 221         12
## 222          7
## 223         26
## 224         56
## 225          3
## 226         31
## 227         55
## 228         15
## 229          4
## 230         32
## 231        102
## 232         51
## 233         13
## 234         49
## 235         36
## 236          0
## 237          2
## 238         13
## 239         13
## 240         10
## 241          0
## 242         20
## 243          6
## 244          0
## 245         32
## 246         12
## 247          6
## 248          0
## 249         25
## 250         26
## 251         18
## 252          2
## 253         38
## 254         25
## 255         23
## 256          0
## 257          4
## 258         85
## 259         20
## 260         12
## 261         12
## 262         13
## 263         36
## 264         18
## 265         45
## 266         13
## 267         10
## 268         26
## 269          6
## 270         42
## 271         13
## 272         37
## 273         25
## 274         38
## 275         29
## 276         24
## 277          6
## 278          0
## 279          8
## 280         32
## 281         51
## 282         35
## 283         73
## 284         31
## 285         51
## 286          8
## 287         16
## 288          3
## 289          1
## 290        109
## 291         25
## 292         51
## 293         40
## 294        145
## 295         40
## 296        101
## 297         42
## 298         26
## 299        105
## 300         20
## 301         26
## 302         54
## 303          2
## 304         51
## 305         48
## 306         13
## 307         26
## 308         26
## 309         18
## 310         61
## 311         19
## 312         37
## 313          7
## 314         10
## 315          0
## 316         10
## 317         37
## 318          2
## 319         26
## 320         24
## 321          0
## 322         13
## 323         12
## 324         30
## 325         18
## 326          3
## 327         69
## 328          5
## 329         68
## 330         29
## 331          5
## 332         32
## 333          0
## 334         78
## 335         26
## 336         62
## 337         18
## 338         30
## 339         63
## 340         13
## 341          3
## 342          0
## 343          0
## 344         20
## 345         13
## 346         51

We create a stem-and-leaf plot for the variable “cesd”-scores of the females:

with(female, stem(cesd))
## 
##   The decimal point is 1 digit(s) to the right of the |
## 
##   0 | 3
##   0 | 567
##   1 | 3
##   1 | 555589999
##   2 | 123344
##   2 | 66889999
##   3 | 0000233334444
##   3 | 5556666777888899999
##   4 | 00011112222334
##   4 | 555666777889
##   5 | 011122222333444
##   5 | 67788
##   6 | 0

We can also create side-by-side histograms to compare the “cesd”-scores for females and males:

histogram(~cesd|sex, data=HELPrct)

Generating random numbers in R:

For uniformly distributed (flat) random numbers, use runif(). By default, its range is from 0 to 1. If we want to generate 1 random number between 0 and 1, then we use the code:

runif(1)
## [1] 0.1511837

If we want to generate 5 random numbers between 0 and 1, then we use the code:

runif(5)
## [1] 0.57579225 0.94263535 0.50566641 0.16260997 0.03687694

To generate a random integer between 1 and 10, we use the sample function:

x3<-sample(1:10, 1)
x3
## [1] 4
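Because these numbers are random, your output will differ from the values shown above each time you run the code. The following is a minimal sketch of how to make the draws repeatable with set.seed(); the seed value 123 and the range 0 to 10 are arbitrary choices for illustration, not part of the original notes. It also shows the min and max arguments of runif().

```r
# Fix the state of the random number generator so results can be reproduced
set.seed(123)                       # 123 is an arbitrary seed
u <- runif(5, min = 0, max = 10)    # 5 uniform random numbers between 0 and 10
d <- sample(1:10, 3)                # 3 different integers between 1 and 10

# Setting the same seed again reproduces exactly the same numbers
set.seed(123)
u_again <- runif(5, min = 0, max = 10)
identical(u, u_again)               # TRUE: the two draws match
```

Setting a seed is useful when you want classmates or an instructor to obtain the same "random" numbers as you did.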

Complete Project 1 on CANVAS.

Week 6

Permutations and Combinations

Permutations

A permutation is an arrangement or ordering. For a permutation, the order matters.

Recall that:

\(n\)-factorial gives the number of permutations of \(n\) items.

\(n! = n(n - 1)(n - 2)(n - 3) ... (3)(2)(1)\)

Example 6.1:

Let’s say we have 8 people:

1: Alice

2: Bob

3: Charlie

4: David

5: Eve

6: Frank

7: George

8: Horatio

How many ways can we award a 1st, 2nd and 3rd place prize among eight contestants? (Gold / Silver / Bronze)

Fig. 6.1


We’re going to use permutations since the order we hand out these medals matters. Here’s how it breaks down:

Gold medal:

8 choices:

A B C D E F G H

Let’s say A wins the Gold.

Silver medal:

7 choices:

B C D E F G H.

Let’s say B wins the silver.

Bronze medal:

6 choices: C D E F G H.

Let’s say… C wins the bronze.

We picked certain people to win, but the details don’t matter: we had 8 choices at first, then 7, then 6. The total number of options was 8 · 7 · 6 = 336.

Let’s look at the details. We had to order 3 people out of 8. To do this, we started with all options (8) then took them away one at a time (7, then 6) until we ran out of medals.

We know the factorial is:

\(\displaystyle{ 8! = 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 }\)

Unfortunately, that does too much! We only want 8 · 7 · 6. How can we “stop” the factorial at 5?

This is where permutations get cool: notice how we want to get rid of 5 · 4 · 3 · 2 · 1. What’s another name for this? 5 factorial!

So, if we do 8!/5! we get:

\(\displaystyle{\frac{8!}{5!} = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 8 \cdot 7 \cdot 6}\)

And why did we use the number 5? Because it was left over after we picked 3 medals from 8. So, a better way to write this would be:

\(\displaystyle{\frac{8!}{(8-3)!}}\)

where 8!/(8-3)! is just a fancy way of saying “Use the first 3 numbers of 8!”. If we have n items total and want to pick k in a certain order, we get:

\(\displaystyle{\frac{n!}{(n-k)!}}\)

And this is the fancy permutation formula: You have n items and want to find the number of ways k items can be ordered:

\(\displaystyle{P(n,k) = \frac{n!}{(n-k)!}}\)
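The formula can be checked directly in R with the built-in factorial() function. This is only a small sketch; the helper name perm_count is made up for illustration and is not a function from any package.

```r
# Number of ways to order k items chosen from n (no repetition)
perm_count <- function(n, k) factorial(n) / factorial(n - k)

# Medal example: 3 ordered places among 8 contestants
perm_count(8, 3)
## [1] 336

# Same count obtained by multiplying the first 3 factors of 8!
prod(8:6)
## [1] 336
```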

Example 6.2

A license plate begins with three letters. If the possible letters are A, B, C, D and E, how many different permutations of these letters can be made if no letter is used more than once?

Solution:

Using reasoning:

For the first letter, there are 5 possible choices. After that letter is chosen, there are 4 possible choices. Finally, there are 3 possible choices.

\(5 × 4 × 3 = 60\)

Using the permutation formula:

The problem involves 5 things (A, B, C, D, E) taken 3 at a time.

\(P(5,3)= \dfrac{5!}{(5-3)!}=\dfrac{5!}{2!}=\dfrac{5*4*3*2*1}{2*1}=60\)

There are 60 different permutations for the license plate.

Example 6.3:

In how many ways can a president, a treasurer and a secretary be chosen from among 7 candidates?

Solution:

Using reasoning:

For the first position, there are 7 possible choices. After that candidate is chosen, there are 6 possible choices. Finally, there are 5 possible choices.

7 × 6 × 5 = 210

Using permutation formula:

The problem involves 7 candidates taken 3 at a time.

\(P(7,3)= \dfrac{7!}{(7-3)!}=\dfrac{7!}{4!}=\dfrac{7*6*5*4*3*2*1}{4*3*2*1}=210\)

There are 210 possible ways to choose a president, a treasurer and a secretary from among 7 candidates.

Permutations with indistinguishable items

The number of different permutations of n objects where there are \(n_1\) indistinguishable items, \(n_2\) indistinguishable items, … \(n_k\) indistinguishable items, is

\(\dfrac{n!}{(n_1!*n_2!*...*n_k!)}.\)

Example 6.4

In how many ways can the letters of the word MATHEMATICS be arranged?

Solution:

\(\dfrac{11!}{2!*2!*2!}=4,989,600\)

This is because the word has 11 letters in total, of which M, A and T each appear twice.

Example 6.5

In how many ways can the letters of MISSISSIPPI be arranged?

Solution:

We have a total of 11 letters, including 4 I’s, 4 S’s, and 2 P’s (the single M contributes a factor of 1! = 1).

So we have

\(\dfrac{11!}{4!*4!*2!} = 34,650\)

The letters in the word MISSISSIPPI can be arranged in 34,650 ways.
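Both word counts can be verified in R with factorial(). This is a plain arithmetic check of the formula above, not a new method.

```r
# MATHEMATICS: 11 letters, with M, A and T each appearing twice
factorial(11) / (factorial(2) * factorial(2) * factorial(2))
## [1] 4989600

# MISSISSIPPI: 11 letters, with 4 I's, 4 S's and 2 P's
factorial(11) / (factorial(4) * factorial(4) * factorial(2))
## [1] 34650
```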

Permutations in R:

A permutation is an ordered combination. There are basically two types of permutations, with repetition (or replacement) and without repetition (without replacement).

Permutations with repetition

The number of permutations with repetition (or with replacement) is simply calculated by:

\(n^r\),

where \(n\) is the number of things to choose from and \(r\) is the number of times we choose.

Example 6.6

Suppose you have an urn with a red, blue and black ball. If you choose two balls with replacement/repetition, there are \(3^2\) permutations:

{red, red},

{red, blue},

{red, black},

{blue, red},

{blue, blue},

{blue, black},

{black, red},

{black, blue}, and

{black, black}.

In R:

Install the “gtools” package.

#load library
library(gtools)
## 
## Attaching package: 'gtools'
## The following object is masked from 'package:mosaic':
## 
##     logit
#urn with 3 balls
x <- c('red', 'blue', 'black')
#pick 2 balls from the urn with replacement
#get all permutations
permutations(n=3,r=2,v=x,repeats.allowed=T)
##       [,1]    [,2]   
##  [1,] "black" "black"
##  [2,] "black" "blue" 
##  [3,] "black" "red"  
##  [4,] "blue"  "black"
##  [5,] "blue"  "blue" 
##  [6,] "blue"  "red"  
##  [7,] "red"   "black"
##  [8,] "red"   "blue" 
##  [9,] "red"   "red"
#number of permutations
nrow(permutations(n=3,r=2,v=x,repeats.allowed=T))
## [1] 9

Permutations without repetition

Calculating permutations without repetition/replacement just means that, for cases where \(r > 1\), the number of available choices gets smaller after each pick. For example, if we choose two balls from the urn with the red, blue and black ball but without repetition/replacement, the first pick has 3 choices and the second pick has 2 choices, giving \(3 \cdot 2 = 6\) permutations:

{red, blue},

{red, black},

{blue, red},

{blue, black},

{black, red} and

{black, blue}.

In R:

#load library
library(gtools)
#urn with 3 balls
x <- c('red', 'blue', 'black')
#pick 2 balls from the urn without replacement
#get all permutations
permutations(n=3,r=2,v=x)
##      [,1]    [,2]   
## [1,] "black" "blue" 
## [2,] "black" "red"  
## [3,] "blue"  "black"
## [4,] "blue"  "red"  
## [5,] "red"   "black"
## [6,] "red"   "blue"

#number of permutations
nrow(permutations(n=3,r=2,v=x))
## [1] 6

Combinations:

A combination is a selection of items from a collection, such that (unlike permutations) the order of selection does not matter.

For example, given three fruits, say an apple, an orange and a pear, there are three combinations of two that can be drawn from this set:

an apple and a pear; an apple and an orange; or a pear and an orange.

More formally, a \(k\)-combination of a set \(S\) is a subset of \(k\) distinct elements of \(S\).

\({\binom {n}{k}}={\frac {n!}{k!(n-k)!}}\).

Combination without repetition

Example 6.7

Five people are in a club and three are going to be on the ‘planning committee’. To determine how many different ways this committee can be formed, we use the combination formula as follows:

\({\binom {5}{3}}={\frac {5!}{3!(5-3)!}} = 10\).

Example 6.8

Eleven students put their names on slips of paper inside a box. Three names are going to be taken out. How many different ways can the three names be chosen?

Solution:

\({\binom {11}{3}}={\frac {11!}{3!(11-3)!}} = 165\).

Example 6.9

Over the weekend, your family is going on vacation, and your mom is letting you bring your favorite video game console as well as five of your games. How many ways can you choose the five games if you have 12 games total?

Solution:

\({\binom {12}{5}}={\frac {12!}{5!(12-5)!}} = 792\).

Example 6.10

Suppose we have 12 adults and 10 kids in the audience of a certain show. Find the number of ways the host can select three persons from the audience to volunteer. The choice must contain two kids and one adult.

Solution:

The order here does not matter so we have:

\(C(10, 2) \cdot C(12, 1) = \dfrac{10 \cdot 9}{2} \cdot \dfrac{12}{1} = 45 \cdot 12 = 540\).

Combinations in R:

The choose() function computes the combination \(nCr\),

where

choose(n,r)

n: n elements

r: r subset elements

Example 6.11

Choose 3 elements from a total of 6 elements:

choose(6,3)
## [1] 20
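The same function can be used to check the worked Examples 6.7 to 6.10 above. This is simply choose() applied to the numbers already given in those examples.

```r
choose(5, 3)                    # Example 6.7: committee of 3 from 5 people
## [1] 10
choose(11, 3)                   # Example 6.8: 3 names from 11 slips
## [1] 165
choose(12, 5)                   # Example 6.9: 5 games from 12
## [1] 792
choose(10, 2) * choose(12, 1)   # Example 6.10: 2 kids and 1 adult
## [1] 540
```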

Combination with repetition

Example 6.12

Let us say there are five flavors of ice cream: banana, chocolate, lemon, strawberry and vanilla.

We can have three scoops. How many variations will there be?

Let’s use letters for the flavors: {b, c, l, s, v}. Example selections include

{c, c, c} (3 scoops of chocolate)

{b, l, v} (one each of banana, lemon and vanilla)

{b, v, v} (one of banana, two of vanilla)

(And just to be clear: There are n=5 things to choose from, and we choose r=3 of them. Order does not matter, and we can repeat!)

Now, I can’t describe directly to you how to calculate this, but I can show you a special technique that lets you work it out.

Think about the ice cream being in boxes. We could say “move past the first box, then take 3 scoops, then move along 3 more boxes to the end”, and we will have 3 scoops of chocolate!

So it is like we are ordering a robot to get our ice cream, but it doesn’t change anything, we still get what we want.

We can write this down as a sequence of moves (→) and scoops (○):

→ ○ ○ ○ → → →
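The number of possible selections in the ice cream example can be computed in R. The sketch below relies on a standard counting result that is not derived in these notes: choosing \(r\) items from \(n\) types, with repetition allowed and order ignored, can be done in \(\binom{n+r-1}{r}\) ways. The brute-force check underneath it is an illustration only; the variable names are made up for this example.

```r
flavors <- c("b", "c", "l", "s", "v")
n <- length(flavors)   # 5 flavors to choose from
r <- 3                 # 3 scoops

# Formula for combinations with repetition
choose(n + r - 1, r)
## [1] 35

# Brute-force check: list every ordered triple of flavor numbers and
# keep only those written in non-decreasing order, so that each
# unordered selection is counted exactly once
g <- expand.grid(1:n, 1:n, 1:n)
sum(g$Var1 <= g$Var2 & g$Var2 <= g$Var3)
## [1] 35
```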

Spinner

https://illuminations.nctm.org/adjustablespinner/

Week 7

Expected value

Definition

The expected value of a random variable \(X\) is the sum of the values of the random variable with each value multiplied by its probability of occurrence.

Example 1:

If the grades of five students are 65, 76, 88, 34, and 90, find the expected value of the grade of a randomly chosen student.

Solution:

Each student is equally likely to be chosen (probability \(\frac{1}{5}\) each), so the expected value is the mean of the five values.

\(E(X)=\frac{65 + 76 + 88 + 34 + 90}{5}=\frac{353}{5}=70.6\)
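The definition of expected value (each value multiplied by its probability, then summed) can be applied directly in R. In this sketch each of the five grades is assumed to be equally likely, with probability 1/5.

```r
grades <- c(65, 76, 88, 34, 90)
probs  <- rep(1/5, 5)          # each student is equally likely

# Expected value: sum of value times probability
sum(grades * probs)
## [1] 70.6

# With equal probabilities this is the same as the ordinary mean
mean(grades)
## [1] 70.6
```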

Week 8:

Take the Midterm examination.

Week 9:

Independent events:
Definition:

Independent Events:

When two events are said to be independent of each other, what this means is that the probability that one event occurs in no way affects the probability of the other event occurring. An example of two independent events is as follows; say you rolled a die and flipped a coin. The probability of getting any number face on the die in no way influences the probability of getting a head or a tail on the coin.

Dependent Events:

When two events are said to be dependent, the probability of one event occurring influences the likelihood of the other event.

Two events \(A\) and \(B\) are dependent if the occurrence of one affects the probability of the other. The probability that \(B\) will occur given that \(A\) has occurred is called the conditional probability of \(B\) given \(A\) and is written \(P(B|A)\).

If \(A\) and \(B\) are dependent events, then the probability that both \(A\) and \(B\) occur is

\(P(A \cap B) = P(A)\cdot P(B|A)\)

Example 9.1

We can calculate the chances of two or more independent events by multiplying the chances.

What is the probability of getting 3 Heads in a Row when tossing a coin?

Solution:

For each toss of a coin a “Head” has a probability of 0.5, and so the probability of getting 3 heads in a row is:

\(\dfrac{1}{2}\cdot\dfrac{1}{2}\cdot\dfrac{1}{2}=\dfrac{1}{8}.\)

Example 9.2

You are playing a game that involves spinning the money wheel shown. During your turn you get to spin the wheel twice. What is the probability that you get more than $500 on your first spin and then go bankrupt on your second spin?

Moneywheel


Solution:

Let event \(A\) be getting more than $500 on the first spin, and let event \(B\) be going bankrupt on the second spin. The two events are independent. So, the probability is:

\(P(A \cap B) = P(A) \cdot P(B) = \frac{8}{24} \cdot \frac{2}{24} = \frac{1}{36} \approx 0.028\)

Example 9.3

During the 1997 baseball season, the Florida Marlins won 5 out of 7 home games and 3 out of 7 away games against the San Francisco Giants. During the 1997 National League Division Series with the Giants, the Marlins played the first two games at home and the third game away. The Marlins won all three games. Estimate the probability of this happening.

Solution

Let events \(A\), \(B\), and \(C\) be winning the first, second, and third games. The three events are independent and have experimental probabilities based on the regular season games. So, the probability of winning the first three games is: \(P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C) = \frac{5}{7}\cdot \frac{5}{7}\cdot\frac{3}{7} = \frac{75}{343} \approx 0.219\)

Example 9.4

A computer chip manufacturer has found that only 1 out of 1000 of its chips is defective. You are ordering a shipment of chips for the computer store where you work. How many chips can you order before the probability that at least one chip is defective reaches 50%?

Solution:

Let \(n\) be the number of chips you order. From the given information you know that P(chip is not defective) \(=\frac{999}{1000}=0.999.\) Use this probability and the fact that each chip ordered represents an independent event to find the value of \(n\).

P(at least one chip is defective) = 0.5

1 - P(no chips are defective) = 0.5

\(1 - (0.999)^n = 0.5\)

\(-(0.999)^n = -0.5\)

\((0.999)^n = 0.5\)

\(n = \dfrac{log(0.5)}{log(0.999)}\)

\(n \approx 693\)

If you order 693 chips, you have a 50% chance of getting a defective chip. Therefore, you can order 692 chips before the probability that at least one chip is defective reaches 50%.
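The calculation in Example 9.4 can be reproduced in R. The variable names below are chosen for illustration only.

```r
p_ok <- 0.999                  # probability that a single chip is not defective

# Solve (0.999)^n = 0.5 for n
n <- log(0.5) / log(p_ok)
n                              # a little under 693

# Probability of at least one defective chip for 692 and 693 chips
1 - p_ok^692                   # just below 0.5
1 - p_ok^693                   # just above 0.5
```

Comparing the last two values confirms that 692 is the largest order size for which the probability of at least one defective chip stays below 50%.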

Example 9.5

The table shows the number of endangered and threatened animal species in the United States as of November 30, 1998.

Find

  1. the probability that a listed animal is a reptile and

  2. the probability that an endangered animal is a reptile.

Table 9.5


Solution:
  1. P(reptile) \(=\dfrac{number\ of\ reptiles }{total\ number \ of \ animals } = \frac{35}{475} \approx 0.0737\)

  2. P(reptile | endangered) \(=\dfrac{number\ of\ endangered \ reptiles }{total\ number \ of \ endangered \ animals } = \frac{14}{355} \approx 0.0394\).

Example 9.6

You randomly select two cards from a standard 52-card deck. What is the probability that the first card is not a face card (a king, queen, or jack) and the second card is a face card if

  1. you replace the first card before selecting the second, and

  2. you do not replace the first card?

Solution:
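  1. Let \(A\) be the event that the first card is not a face card and \(B\) the event that the second card is a face card. If you replace the first card before selecting the second card, then \(A\) and \(B\) are independent events. So, the probability is: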

\(P(A \cap B) = P(A) \cdot P(B) = \frac{40}{52} \cdot \frac{12}{52} = \frac{30}{169} \approx 0.178\).

  2. If you do not replace the first card before selecting the second card, then \(A\) and \(B\) are dependent events. So, the probability is:

\(P(A \cap B) = P(A) \cdot P(B|A) = \frac{40}{52} \cdot \frac{12}{51} = \frac{40}{221} \approx 0.181\).
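Both answers in Example 9.6 are simple products and can be checked in R. The variable names are illustrative.

```r
p_not_face <- 40 / 52               # first card is not a face card

# 1. With replacement: the deck still has 52 cards for the second draw
with_replacement <- p_not_face * 12 / 52
round(with_replacement, 3)
## [1] 0.178

# 2. Without replacement: only 51 cards remain, all 12 face cards among them
without_replacement <- p_not_face * 12 / 51
round(without_replacement, 3)
## [1] 0.181
```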

Project 2:

We use the “mosaic”-package for this project. Make sure to call the package:

require(mosaic)

The favstats() function can provide more statistics by group.

favstats(cesd~sex, data=HELPrct)
##      sex min Q1 median   Q3 max     mean       sd   n missing
## 1 female   3 29   38.0 46.5  60 36.88785 13.01764 107       0
## 2   male   1 24   32.5 40.0  58 31.59827 12.10332 346       0

Boxplots are particularly helpful to compare distributions. The bwplot() function can be used to display the boxplots for the CESD scores separately by sex.

bwplot(sex~cesd, data=HELPrct)

The box-and-whisker plots show that females tend to have higher “cesd”-scores than males (median 38.0 versus 32.5).

Exercise:
  1. Use the “mosaic”-package and the “HELPrct”-dataset to find the statistics of the “cesd”-score by race group.
  2. Use the bwplot() function to display the boxplots for the CESD scores separately by race.

Week 10:

Bayes’ theorem is a formula that describes how to update the probabilities of hypotheses when given evidence. It follows simply from the axioms of conditional probability, but can be used to powerfully reason about a wide range of problems involving belief updates.

Given a hypothesis \(H\) and evidence \(E\) , Bayes’ theorem states that the relationship between the probability of the hypothesis \(P(H)\) before getting the evidence and the probability of the hypothesis after getting the evidence \(P(H|E)\) is:

\(P(H|E)=\dfrac{P(E|H)}{P(E)}*P(H)\)

Many modern machine learning techniques rely on Bayes’ theorem. For instance, spam filters use Bayesian updating to determine whether an email is real or spam, given the words in the email. Additionally, many specific techniques in statistics, such as calculating p-values or interpreting medical results, are best described in terms of how they contribute to updating hypotheses using Bayes’ theorem.

The formula relates the probability of the hypothesis before getting the evidence \(P(H)\) to the probability of the hypothesis after getting the evidence, \(P(H|E)\). For this reason, \(P(H)\) is called the prior probability, while \(P(H|E)\) is called the posterior probability. The factor that relates the two, \(\dfrac{P(E|H)}{P(E)}\) is called the likelihood ratio. Using these terms, Bayes’ theorem can be rephrased as “the posterior probability equals the prior probability times the likelihood ratio.”

Example 1:

If a single card is drawn from a standard deck of playing cards, the probability that the card is a king is 4/52, since there are 4 kings in a standard deck of 52 cards. Rewording this, if \(KING\) is the event “this card is a king,” the prior probability

\(P(KING)= \frac{4}{52}= \frac{1}{13}\).

If evidence is provided (for instance, someone looks at the card) that the single card is a face card, then the posterior probability \(P(KING | FACE)\) can be calculated using Bayes’ theorem:

\(P(KING | FACE)= \dfrac{P(FACE | KING)}{P(FACE)}*P(KING)\)

Since every \(KING\) is also a face card,

\(P(FACE | KING)= 1\)

Since there are 3 face cards in each suit \((JACK, QUEEN, KING)\), the probability of a face card is:

\(P(FACE)=\frac{3}{13}\)

Using Bayes’ theorem gives

\(P(KING | FACE)= \dfrac{P(FACE | KING)}{P(FACE)}*P(KING) = \frac{1}{\frac{3}{13}}*\frac{1}{13}= \frac{1}{3}\)
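The same update can be written out in R. This is only a sketch of the arithmetic in the card example; the variable names prior, likelihood and evidence are labels chosen here for the three quantities in Bayes’ theorem.

```r
prior      <- 4 / 52     # P(KING)
likelihood <- 1          # P(FACE | KING): every king is a face card
evidence   <- 12 / 52    # P(FACE): 12 face cards in the deck

# Bayes' theorem: posterior = likelihood / evidence * prior
posterior <- likelihood / evidence * prior
posterior
## [1] 0.3333333
```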

Probability Distributions.

Definition:

A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.

Binomial distribution.

“Bi” means “two” (like a bicycle has two wheels), so this is about experiments with two possible outcomes. In other words, the Binomial distribution is a discrete probability distribution with parameters \(n\) and \(p\), where \(n\) represents the number of trials and \(p\) represents the probability of success on each trial.

The binomial distribution is frequently used to model the number of successes in a sample of size \(n\) drawn with replacement from a population of size \(N\). If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one.

In general, if the random variable \(X\) follows the binomial distribution with parameters \(n \in ℕ\) and \(p \in [0,1]\), we write \(X \sim B(n, p)\). The probability of getting exactly \(k\) successes in \(n\) trials is given by the probability mass function:

\(P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n\)

Binomial Experiment

A binomial experiment is a statistical experiment that has the following properties:

The experiment consists of n repeated trials.

Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.

The probability of success, denoted by \(p\), is the same on every trial.

The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.

Example 1:

Consider the following statistical experiment. You flip a coin 2 times and count the number of times the coin lands on heads. This is a binomial experiment because:

The experiment consists of repeated trials. We flip a coin 2 times. Each trial can result in just two possible outcomes - heads or tails. The probability of success is constant - 0.5 on every trial. The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other trials.
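For this two-flip experiment the probabilities of 0, 1 and 2 heads can be computed with R’s built-in binomial function dbinom(). The second line repeats the calculation for one head by hand, using choose(), as a check.

```r
# P(X = 0), P(X = 1), P(X = 2) for n = 2 flips with p = 0.5
dbinom(0:2, size = 2, prob = 0.5)
## [1] 0.25 0.50 0.25

# The same value for exactly one head, written out by hand
choose(2, 1) * 0.5^1 * (1 - 0.5)^(2 - 1)
## [1] 0.5
```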

Example 2:

When you flip a coin, there are two possible outcomes: heads and tails. Each outcome has a fixed probability, the same from trial to trial.

  1. Navigate to GeoGebra.
  2. Click on the “Classic View”.
  3. Then click on “Probability”.
  4. Select the “Binomial distribution” with n=20 and p=0.5.
  5. Your output should look as follows:
Fig. 10.1


Find the probability that X is at least 5 and at most 20:

Make the selections:

Fig. 10.1


The output shows that P(5≤X≤20) = 0.9941.
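The GeoGebra result can be cross-checked in R with dbinom() and pbinom(). This is an alternative to the applet, offered only as a check; both lines compute P(5≤X≤20) for n=20 and p=0.5.

```r
# Add up the probabilities of 5, 6, ..., 20 successes
p_sum <- sum(dbinom(5:20, size = 20, prob = 0.5))
round(p_sum, 4)
## [1] 0.9941

# Equivalent: P(X <= 20) - P(X <= 4) using the cumulative distribution
p_cdf <- pbinom(20, size = 20, prob = 0.5) - pbinom(4, size = 20, prob = 0.5)
round(p_cdf, 4)
## [1] 0.9941
```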

Complete Online Discussion 3 on Canvas


Week 11:

The Poisson distribution:

A Poisson distribution is the probability distribution that results from a Poisson experiment.

Attributes of a Poisson Experiment

A Poisson experiment is a statistical experiment that has the following properties:

  1. The experiment results in outcomes that can be classified as successes or failures.

  2. The average number of successes (μ) that occurs in a specified region is known.

  3. The probability that a success will occur is proportional to the size of the region.

  4. The probability that a success will occur in an extremely small region is virtually zero.

Note that the specified region could take many forms. For instance, it could be a length, an area, a volume, a period of time, etc.

Fig. 10.3


The Poisson distribution has the following properties:

  1. The mean of the distribution is equal to \(μ\).

  2. The variance is also equal to \(μ\).

Example 11.1:

It has been observed that the average number of traffic accidents on the Hollywood Freeway between 7 and 8 AM on Wednesday mornings is 1 per hour. What is the chance that there will be exactly 2 accidents on the Freeway in that hour on some specified Wednesday morning?

Solution:

The basic rate is r = 1 (in hour units), and our window is 1 hour. We wish to know the chance of observing 2 events in that window. The rate r = 1 is included in the Poisson table, so we don’t have to calculate anything. Reading down the r = 1 column, we come to the p(2) row, and there we find that the probability of 2 accidents is 0.1839, or a little less than 1 chance in 5. That is not unlikely: you would expect it on roughly one such Wednesday morning in five.
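Instead of reading the table, the same probability can be obtained in R with dpois(), where the argument lambda is the average rate (called r or μ in these notes). The hand calculation underneath uses the standard Poisson formula \(P(X = k) = \frac{e^{-\mu}\mu^k}{k!}\), which is not written out in the text above.

```r
# Probability of exactly 2 accidents when the average is 1 per hour
p2 <- dpois(2, lambda = 1)
round(p2, 4)
## [1] 0.1839

# The same value from the Poisson formula
round(exp(-1) * 1^2 / factorial(2), 4)
## [1] 0.1839
```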

Poisson table


Normal distribution:

Introduction to Normal Distributions and the Standard Normal Distribution

The normal distribution is a bell-shaped distribution. A normal distribution is a continuous probability distribution for a random variable x with the following properties:

The mean, median, and mode are equal.

The curve is bell-shaped and symmetric about the mean.

The total area under the curve equals 1.

The curve approaches, but never touches, the \(x\)-axis as it extends farther from the mean.

Between \(\mu - \sigma\) and \(\mu + \sigma\), the graph curves downward. To the left of \(\mu - \sigma\) and to the right of \(\mu + \sigma\), the graph curves upward. The points at which the curve changes from curving upward to downward are called points of inflection. The graph of a normal distribution is called the normal curve. The equation for the curve is:

\(y =\frac{1} {\sigma \cdot \sqrt{2\pi}} e^{\frac{-(x-\mu)^2}{2\sigma^2}}\)

The standard normal distribution is a normal distribution with \(\mu = 0\) and \(\sigma = 1\).

Any observation \(x\) from a normal distribution can be “converted” to data from a standard normal distribution by calculating a \(z\)-score:

\(z = \dfrac{Value - Mean} {Standard \ deviation} = \dfrac{x - \mu}{\sigma}\).

Example 11.2

Heights of males at a certain university are approximately normal with a mean of 70.9 inches and a standard deviation of 2.9 inches. Find the \(z\)-score for a male who is 6 feet tall.

Solution:

First, we need to convert 6 feet to inches, so we want a \(z\)-score for 72 inches.

\(z = \dfrac{72 - 70.9}{2.9} = 0.3793\).
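The \(z\)-score is a one-line calculation in R. The variable names below are illustrative.

```r
height_in <- 6 * 12      # 6 feet converted to inches
mu        <- 70.9        # mean height in inches
sigma     <- 2.9         # standard deviation in inches

z <- (height_in - mu) / sigma
round(z, 4)
## [1] 0.3793
```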

For continuous distributions, the probability that a random variable takes a value in an interval is the area under the distribution curve over that interval.

For normal distributions, tables are used to calculate probabilities.

The normal distribution table:

Fig. 11.1


Navigate to:

https://www.geogebra.org/classic/probability

  1. To find \(P(Z\leq 0)\), make sure to select the “Normal Distribution”.

  2. Select the left-sided bracket.

  3. Type \(P(Z\leq 0)\) and it should give the answer as 0.5.

Example 11.3

Find the probability that z falls below 2.74.

Solution:

From the table or by using the GeoGebra applet, we can see that

\(P(Z\leq 2.74)=0.9969\).

Example 11.4

Find the probability that z is at least 0.62.

Solution:

Looking up 0.62 in the table gives \(P(z < 0.62) = 0.7324\). The event that z is at least 0.62 is the complement of this event, so the probability that we want is 1 − 0.7324 = 0.2676. (Note that the value in the table for −0.62 is also 0.2676. This happens due to symmetry.)

Example 11.5

Find \(P (z ≥ −2.6)\).

Solution:

Note that \(P (z ≥ −2.6) = 1 − P (z < −2.6) = 1 − 0.0047 = 0.9953\).

Example 11.6

Find \(P (−0.24 ≤ z ≤ 0.43)\).

Solution:

From the table, we know that \(P (z ≤ 0.43) = 0.6664\) and \(P (z ≤ −0.24) = 0.4052\), so the area in between the two values is 0.6664 − 0.4052 = 0.2612.
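The table values used in Examples 11.3 to 11.6 can also be obtained in R with pnorm(), which returns the area to the left of a given \(z\) under the standard normal curve. This is an alternative to the table and the GeoGebra applet, shown here as a check.

```r
round(pnorm(2.74), 4)                  # Example 11.3: P(z <= 2.74)
## [1] 0.9969
round(1 - pnorm(0.62), 4)              # Example 11.4: P(z >= 0.62)
## [1] 0.2676
round(1 - pnorm(-2.6), 4)              # Example 11.5: P(z >= -2.6)
## [1] 0.9953
round(pnorm(0.43) - pnorm(-0.24), 4)   # Example 11.6: P(-0.24 <= z <= 0.43)
## [1] 0.2612
```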

Example 11.7

Find \(P(z = 1)\).

Solution:

We can think of the above probability as \(P(1 ≤ z ≤ 1)\), then use the reasoning from the last part to get that \(P (z = 1) = 0\). We can also get this answer with the “area under the curve” definition for probability, which leads us to a rectangle of width zero, which has area zero.

Fig. 11.3


Example 11.8

Given that \(X\) is a random variable that is normally distributed with \(\mu = 30\) and \(\sigma = 4\). Determine the following:

\(P (30 < x < 35)\).

Solution:

We convert the two endpoints to \(z\)-scores and then find the corresponding area under the standard normal curve.

Now, \(Z = \dfrac{(30−30)}{4} = 0\).

Also, \(Z = \dfrac{(35−30)}{4} = 1.25\).

Thus \(P (30 < x < 35) = P (0 < z < 1.25) = 0.3944\)
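In R the conversion to \(z\)-scores is optional, because pnorm() accepts the mean and standard deviation directly. The sketch below computes the same area both ways.

```r
# Directly on the original scale (mean 30, standard deviation 4)
p_direct <- pnorm(35, mean = 30, sd = 4) - pnorm(30, mean = 30, sd = 4)
round(p_direct, 4)
## [1] 0.3944

# Using the z-scores 0 and 1.25 on the standard normal scale
p_z <- pnorm(1.25) - pnorm(0)
round(p_z, 4)
## [1] 0.3944
```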

Example 11.9

Suppose the reaction times of teenage drivers are normally distributed with a mean of 0.53 seconds and a standard deviation of 0.11 seconds.

  1. What is the probability that a teenage driver chosen at random will have a reaction time less than 0.65 seconds?

  2. Find the probability that a teenage driver chosen at random will have a reaction time between 0.4 and 0.6 seconds.

Solution:

1. The goal is to find \(P(x < 0.65)\).

1.1. The first step is to convert 0.65 to a standard score.

\(z = \dfrac{(x - mean)}{standard \ deviation} = \dfrac{(0.65 - 0.53)}{0.11} = 1.09\).

1.2. The problem now is to find \(P(z < 1.09)\). This is a left tail problem as shown in the illustration to the right.

\(P(z < 1.09) = 0.8621\) (see table or use GeoGebra).

2. The goal is to find \(P(0.4 < x < 0.6)\).

2.1. The first step is to convert 0.4 and 0.6 to the corresponding standard scores.

\(z_1 = \dfrac{(x - mean)}{standard \ deviation} = \dfrac{(0.4 - 0.53)}{0.11} = -1.18\)

\(z_2 = \dfrac{(x - mean)}{standard \ deviation} = \dfrac{(0.6 - 0.53)}{0.11} = 0.64\).

2.2. The problem now is to find \(P(-1.18 < z < 0.64)\). This is a “between” problem as shown in the illustration to the right.

\(P(-1.18 < z < 0.64) = P(z < 0.64) - P(z < -1.18) = 0.7389 - 0.1190 = 0.6199\)

Therefore, \(P(0.4 < x < 0.6) = 0.6199\).
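Both parts of Example 11.9 can be checked in R by passing the mean and standard deviation to pnorm(). Because R does not round the \(z\)-scores to two decimals, its answers differ slightly (in the third or fourth decimal place) from the table-based values above; this is a rounding effect, not an error. The variable names are illustrative.

```r
mu    <- 0.53    # mean reaction time in seconds
sigma <- 0.11    # standard deviation in seconds

# 1. P(x < 0.65)
p1 <- pnorm(0.65, mean = mu, sd = sigma)
p1

# 2. P(0.4 < x < 0.6)
p2 <- pnorm(0.6, mean = mu, sd = sigma) - pnorm(0.4, mean = mu, sd = sigma)
p2

# Table-style answers, using z-scores rounded to two decimals as in the text
round(pnorm(1.09), 4)
## [1] 0.8621
round(pnorm(0.64) - pnorm(-1.18), 4)
## [1] 0.6199
```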

Project 3

Complete Project 3 on Canvas.


Week 12:

Sampling:

Researchers usually cannot make direct observations of every individual in the population they are studying. Instead, they collect data from a subset of individuals – a sample – and use those observations to make inferences about the entire population.

Ideally, the sample corresponds to the larger population on the characteristic(s) of interest. In that case, the researcher’s conclusions from the sample are probably applicable to the entire population.

Sampling Methods can be classified into one of two categories:

Probability Sampling: Each possible sample has a known probability of being selected.

Non-probability Sampling: The probability of selecting a given sample is not known, as in convenience or voluntary response surveys.

Probability Sampling

In probability sampling it is possible to both determine which sampling units belong to which sample and the probability that each sample will be selected. The following sampling methods are examples of probability sampling:

Simple Random Sampling (SRS)

Stratified Sampling

Cluster Sampling

Systematic Sampling

Multistage Sampling (in which some of the methods above are combined in stages)

Stratified Sampling is possible when it makes sense to partition the population into groups based on a factor that may influence the variable that is being measured. These groups are then called strata. An individual group is called a stratum. With stratified sampling one should:

partition the population into groups (strata)

obtain a simple random sample from each group (stratum)

collect data on each sampling unit that was randomly sampled from each group (stratum)

Stratified sampling works best when a heterogeneous population is split into fairly homogeneous groups. Under these conditions, stratification generally produces more precise estimates of the population percents than estimates that would be found from a simple random sample.
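The three steps of stratified sampling can be sketched in base R. The population below is hypothetical (1,000 units in four made-up strata), and the 10% sampling fraction is an assumption chosen only for illustration.

```r
set.seed(1)  # make the random draws reproducible

# Hypothetical population: 1,000 units, each belonging to one of four strata
population <- data.frame(
  id = 1:1000,
  stratum = rep(c("East", "Central", "Mountain", "Pacific"),
                times = c(400, 300, 100, 200))
)

# Step 1: partition the population into strata
strata <- split(population, population$stratum)

# Step 2: draw a simple random sample of 10% of the units within each stratum
sampled <- lapply(strata, function(s) s[sample(nrow(s), size = nrow(s) / 10), ])

# Step 3: combine the per-stratum samples into one data set for data collection
stratified_sample <- do.call(rbind, sampled)

# Number of sampled units per stratum
table(stratified_sample$stratum)
```

Every stratum is guaranteed to appear in the sample, in proportion to its size in the population.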

Example 12.1:

Fig 12.1


Cluster Sampling is very different from Stratified Sampling. With cluster sampling one should:

divide the population into groups (clusters).

obtain a simple random sample of so many clusters from all possible clusters.

obtain data on every sampling unit in each of the randomly selected clusters.

It is important to note that, unlike with the strata in stratified sampling, the clusters should be microcosms, rather than subsections, of the population. Each cluster should be heterogeneous. Additionally, the statistical analysis used with cluster sampling is not only different, but also more complicated than that used with stratified sampling.
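For comparison, cluster sampling can be sketched in base R as follows. The population is again hypothetical (50 clusters of 20 units each), and selecting 5 clusters is an arbitrary choice for illustration.

```r
set.seed(2)  # make the random draws reproducible

# Hypothetical population: 50 clusters of 20 units each
population <- data.frame(
  id = 1:1000,
  cluster = rep(1:50, each = 20)
)

# Step 1: the population is divided into clusters by the `cluster` column
all_clusters <- unique(population$cluster)

# Step 2: draw a simple random sample of 5 clusters
chosen <- sample(all_clusters, size = 5)

# Step 3: keep every unit in each of the selected clusters
cluster_sample <- population[population$cluster %in% chosen, ]

nrow(cluster_sample)  # 5 clusters x 20 units each
```

Unlike the stratified sketch, randomness enters only in the choice of clusters; within a selected cluster, every unit is observed.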

Fig 12.2


In the two examples above, stratified sampling would be preferred over cluster sampling, particularly if the questions of interest are affected by time zone. For example the percentage of people watching a live sporting event on television might be highly affected by the time zone they are in. Cluster sampling really works best when there are a reasonable number of clusters relative to the entire population. In this case, selecting 2 clusters from 4 possible clusters really does not provide much advantage over simple random sampling.

The most common method of carrying out a poll today is Random Digit Dialing, in which a machine randomly dials phone numbers. Some polls go even further and have a machine conduct the interview itself rather than just dialing the number! Such “robo call polls” can be very biased because they have extremely low response rates (most people don’t like speaking to a machine) and because federal law prevents such calls to cell phones. Since the people who have landline phone service tend to be older than people who have cell phone service only, another potential source of bias is introduced. National polling organizations that use random digit dialing in conducting interviewer-based polls are very careful to match the number of landline versus cell phones to the population they are trying to survey.

Non-probability Sampling

The following sampling methods are types of non-probability sampling and should be avoided:

volunteer samples

haphazard (convenience) samples

Since such non-probability sampling methods are based on human choice rather than random selection, statistical theory cannot explain how they might behave and potential sources of bias are rampant.

More examples of sampling:

Fig 12.3


Complete Online Discussion 5 on CANVAS.

Week 13:

Point estimators:

Suppose we have an unknown population parameter, such as a population mean \(\mu\) or a population proportion \(p\), which we’d like to estimate. For example, suppose we are interested in estimating:

\(p\) = the (unknown) proportion of American college students, 18-24, who have a smart phone

\(\mu\) = the (unknown) mean number of days it takes Alzheimer’s patients to achieve certain milestones

In either case, we can’t possibly survey the entire population. That is, we can’t survey all American college students between the ages of 18 and 24. Nor can we survey all patients with Alzheimer’s disease. So, of course, we do what comes naturally and take a random sample from the population, and use the resulting data to estimate the value of the population parameter. Of course, we want the estimate to be “good” in some way.

Statistical inference is the process by which we infer population properties from sample properties.

There are two types of statistical inference:

• Estimation

• Hypothesis Testing

The concepts involved are actually very similar.

Example 13.1

A poll may seek to estimate the proportion of adult residents of a city that support a proposition to build a new sports stadium. Out of a random sample of 200 people, 106 say they support the proposition. Thus in the sample, 0.53 of the people supported the proposition. This value of 0.53 is called a point estimate of the population proportion. It is called a point estimate because the estimate consists of a single value or point.
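In R, the point estimate in Example 13.1 is simply the sample proportion. The sketch below rebuilds the poll responses as a logical vector (an assumption made for illustration, since only the counts are given) and takes its mean.

```r
# Poll results from Example 13.1: 106 supporters out of 200 respondents
n <- 200
supporters <- 106

# Hypothetical raw responses: TRUE = supports the proposition
responses <- rep(c(TRUE, FALSE), times = c(supporters, n - supporters))

# The mean of a logical vector is the proportion of TRUE values
p_hat <- mean(responses)
p_hat
```

The result, 0.53, is the point estimate of the population proportion \(p\).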

Example 13.2

We’re interested in the value of \(\mu\). We collected data and we use the observed \(\bar{x}\) as a point estimate for \(\mu\).


The value we get for \(\bar{X}\) (the sample mean) depends on the specific sample chosen. This means, \(\bar{X}\) is a random variable! The distribution of the random variable \(\bar{X}\) is called the sampling distribution of \(\bar{X}\).

We expect \(\bar{X}\) to be close to \(\mu\) (we ARE using it to estimate \(\mu\)) but there is variability in \(\bar{X}\) before it is observed because we use random sampling to choose our sample of size \(n\).

The sampling distribution of \(\bar{X}\):

• Tells us what kind of values are likely to occur for \(\bar{X}\).

• Puts a probability distribution over the possible values for \(\bar{X}\).

It turns out that the random variable \(\bar{X}\) is approximately normally distributed no matter what the original distribution was, provided \(n\) is large enough (this is the central limit theorem). What’s large enough? A common rule of thumb is \(n \geq 30\).
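A small simulation can make the sampling distribution concrete. The sketch below assumes, purely for illustration, a strongly skewed population (exponential with mean 1 and standard deviation 1) and draws many samples of size \(n = 30\); the number of repetitions is arbitrary.

```r
set.seed(42)  # make the simulation reproducible

n <- 30       # size of each sample
reps <- 5000  # number of repeated samples

# Draw `reps` samples from an exponential population and record each sample mean
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

# Center and spread of the simulated sampling distribution
mean_of_means <- mean(sample_means)  # should be close to the population mean, 1
sd_of_means <- sd(sample_means)      # should be close to 1 / sqrt(30)

c(mean_of_means, sd_of_means, 1 / sqrt(n))
```

Running `hist(sample_means)` afterwards shows a roughly bell-shaped histogram, even though the individual observations come from a skewed distribution.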

Complete Project 4 on CANVAS.

Week 14:

Hypothesis testing

The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about a parameter.

Example 14.1

Is there statistical evidence, from a random sample of potential customers, to support the hypothesis that more than 10% of the potential customers will purchase a new product?

Example 14.2

The manager of a department store is interested in the cost effectiveness of establishing a new billing system for the store’s credit customers. After a careful analysis, she determines that the new system is justified only if the mean monthly account size is more than $170. The manager wishes to find out if there is sufficient statistical support for this.

The manager takes a random sample of 400 monthly accounts. The sample mean turns out to be $178. Historical data indicate that the standard deviation of monthly accounts is about $65.

Observe that what we are trying to find out is whether or not there is sufficient support for the hypothesis that the mean monthly account size is “more than $170.” The standard procedure is then to let \(\mu > 170\) be the alternative hypothesis. For this reason, the alternative hypothesis is also often referred to as the research hypothesis.

It follows that the null hypothesis should be defined as \(\mu = 170\). Note that we do not use \(\mu \leq 170\) as the null hypothesis; this is because the null hypothesis must be precise enough for us to determine a “unique” sampling distribution. Among all values \(\mu \leq 170\), the choice \(\mu = 170\) is the one under which \(H_0\) is most likely to be rejected, so limiting the chance of a false rejection at \(\mu = 170\) also limits it for every smaller value of \(\mu\).

Thus,

\(H_0 : \mu = 170\)

\(H_1 : \mu > 170\)

where \(H_1\) is what we want to determine and \(H_0\) specifies a single value for the parameter of interest.

Clearly, if the sample mean is “large” relative to 170, i.e., if

\(\bar{X} > \bar{X_L}\), for a suitably chosen control limit \(\bar{X_L}\), then we should reject the null hypothesis in favor of the alternative. Pictorially, this means that for a given \(\alpha\), we wish to find \(\bar{X_L}\) such that:

Fig. 14.1


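Formally, from the central limit theorem, we know that if our null hypothesis is true, i.e., if \(\mu = 170\), then \(\bar{X}\) is approximately normally distributed with mean 170 and standard deviation \(\dfrac{\sigma}{\sqrt{n}}\). Therefore, if we choose

\(\bar{X_L} = 170 + z_{\alpha}\dfrac{\sigma}{\sqrt{n}}\),

then

\(P(\bar{X} > \bar{X_L}) = \alpha\) and the rejection region is the interval

\((\bar{X_L}, \infty)\).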

For \(\alpha = 0.05\), we have

\(\bar{X_L}=170 + 1.645 \cdot \dfrac{65}{\sqrt{400}}=175.35.\)

Since the observed sample mean \(\bar{X} = 178\) is greater than 175.35, we reject the null hypothesis in favor of the research hypothesis (which is what we are investigating). In other words, statistical evidence suggests that the installation of the new billing system will be cost effective.
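The calculation for Example 14.2 can be reproduced in R. This is a minimal sketch with illustrative variable names; it uses `qnorm()` to obtain \(z_{\alpha}\) instead of reading it from a table, and it also shows the equivalent comparison on the z scale.

```r
# Quantities from Example 14.2
mu0 <- 170     # mean under the null hypothesis
x_bar <- 178   # observed sample mean
sigma <- 65    # historical standard deviation
n <- 400       # sample size
alpha <- 0.05  # significance level

# Standard deviation of the sample mean
se <- sigma / sqrt(n)

# Upper-tail critical value and control limit
z_alpha <- qnorm(1 - alpha)   # about 1.645
x_L <- mu0 + z_alpha * se

# Decision: reject the null hypothesis if the sample mean exceeds the control limit
reject <- x_bar > x_L

# Equivalent decision on the z scale
z <- (x_bar - mu0) / se
reject_z <- z > z_alpha

round(c(control_limit = x_L, z_statistic = z), 2)
```

Both comparisons lead to the same decision: the sample mean of 178 lies in the rejection region, so the null hypothesis is rejected.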