Check out the following opinions: Opinion 1. Opinion 2. Opinion 3. Opinion 4.
RStudio is a graphical user interface for R which includes a set of integrated tools designed to help you be more productive with R. It includes:
Note: Once R and RStudio are installed, it is not necessary to start R separately, because RStudio will start it.
There are 4 types of measurement scales:
1. Nominal/Categorical
2. Ordinal
3. Interval
4. Ratio
These are simply ways to categorize different types of variables.
Nominal scale:
o Used to label variables without any quantitative value.
o “Nominal” scales could also be called “labels.”
o Categories are mutually exclusive.
o Categories have no intrinsic ranking.
Examples of nominal variables:
o Gender
o Zip code
o Eye color
Ordinal scale:
o Allows for rank order (1st, 2nd, 3rd, etc.) by which data can be sorted.
o Does not allow the relative degree of difference between ranks to be measured.
Examples of ordinal variables:
o ‘sick’ vs. ‘healthy’ when measuring health,
o ‘guilty’ vs. ‘not-guilty’ when making judgments in courts,
o ‘wrong/false’ vs. ‘right/true’ when measuring truth value,
o a spectrum of values, such as ‘completely agree’, ‘mostly agree’, ‘mostly disagree’, ‘completely disagree’ when measuring opinion.
Interval scale:
o Provides information about order.
o Possesses equal intervals between values.
Examples of interval variables:
o Temperature (in degrees Celsius or Fahrenheit)
o Time of day
Ratio scale:
o Has an absolute zero (a point where none of the quality being measured exists).
o Tells us the exact value between units.
o Quantitative variable.
Examples of ratio variables:
o Salary
o Height
o Weight
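The four measurement scales map naturally onto R data types. The following is a minimal sketch with made-up values (the variable names and the numbers are illustrative assumptions, not course data): nominal variables become unordered factors, ordinal variables become ordered factors, and interval and ratio variables are stored as numbers.

```r
# Hypothetical values, for illustration only
eye_color <- factor(c("brown", "blue", "green", "brown"))   # nominal: labels only

opinion <- factor(
  c("mostly agree", "completely disagree", "completely agree"),
  levels = c("completely disagree", "mostly disagree",
             "mostly agree", "completely agree"),
  ordered = TRUE                                            # ordinal: ranked labels
)

temp_celsius <- c(10, 20)    # interval: differences are meaningful, ratios are not
height_cm    <- c(150, 180)  # ratio: has a true zero, so ratios are meaningful

is.ordered(eye_color)        # nominal categories have no ranking
opinion[1] > opinion[2]      # ordinal values can be compared by rank
diff(temp_celsius)           # difference in degrees
height_cm[2] / height_cm[1]  # how many times taller
```

Comparing two values of an unordered factor with `>` is not meaningful in R, which mirrors the fact that nominal categories have no intrinsic ranking.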
Competency 015: The teacher understands how to use appropriate graphical and numerical techniques to explore data, characterize patterns and describe departures from patterns.
The beginning teacher:
Selects and uses an appropriate measurement scale (i.e., nominal, ordinal, interval, ratio) to answer research questions and analyze data.
Organizes, displays and interprets data in a variety of formats (e.g., tables, frequency distributions, scatterplots, stem-and-leaf plots, box-and-whisker plots, histograms, pie charts).
Consider the following data file on the body temperatures of ten US males.
https://www.amazon.com/clouddrive/share/RJLhFeGmPR8j4b4dQDUzjuxbnhDLhIKqabQvJCKDnER
Watch the following video on how to import a .csv file into R. https://www.amazon.com/clouddrive/share/bcK8ZluX3i45PvJaQ5Omwc0ii53iVzRJx1jcrYIAbp9
This week we cover the following topics:
A histogram is a visual representation of the distribution of a dataset. The shape of a histogram allows you to easily see where most of the data is situated. In particular, you can see where the middle of the distribution is located, how closely the data lie around the middle, and where possible outliers are to be found. As shown in the figures below, a histogram consists of an x-axis, a y-axis and bars of different heights. The x-axis is divided into intervals (called “bins”), and on each bin a vertical bar is constructed whose height represents the number of data values within that bin. Note that histograms (unlike bar charts) don’t have gaps between the bars (if it looks like there’s a gap, that’s because that particular bin has no data in it).
Example: Suppose you are interested in the distribution of ages for employees working in a certain office. The following data is available: 36, 25, 38, 46, 55, 68, 72, 55, 36, 38, 67, 45, 22, 48, 91, 46, 52, 61, 58, 55. We use R to construct a histogram to represent the distribution of the data.
age<-c(36, 25, 38, 46, 55, 68, 72, 55, 36, 38, 67, 45, 22, 48, 91, 46, 52, 61, 58, 55)
hist(age)
The output appears under the ‘Plots’ tab, and looks like this:
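[Histogram of age]
The ‘hist’ command has many options that enable the user to change the display. For example, the user can control the number of bins with the ‘breaks’ option, the title of the histogram with the ‘main’ option, and the x- and y-axis labels with the ‘xlab’ and ‘ylab’ options. When ‘breaks’ is a single number, R treats it as a suggestion and may choose a slightly different number of bins so that the bin boundaries fall on round values.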
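To see what the ‘breaks’ option actually does, the bin boundaries and counts can be inspected without drawing the plot. The sketch below reuses the employee ages from the example above; the object name `h` is just an illustrative choice.

```r
# Employee ages from the example above
age <- c(36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
         67, 45, 22, 48, 91, 46, 52, 61, 58, 55)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(age, breaks = 7, plot = FALSE)

h$breaks           # bin boundaries chosen by R
h$counts           # number of ages falling in each bin
sum(h$counts > 0)  # number of nonempty bins
```

Comparing `length(h$counts)` with the requested number of breaks shows whether R followed the suggestion exactly or adjusted it.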
Example: The following command creates a histogram with 7 nonempty bins, with title “Age of Employees” and x label “Employee ages”:
hist(age,breaks=7,main="Age of Employees",xlab="Employee ages")
The output appears under the ‘Plots’ tab, and looks like this:
[Histogram of age]
### XY plots {#xyplots}
The command ‘xyplot’ can be used to plot one variable against another. The command uses the ‘lattice’ package, so before using it you must load the package.
Example: Load a new package called ‘lattice’.
library(lattice)
If you get an error message, it probably means you haven’t installed ‘lattice’. In this case, go back to “R_RStudioWindows” and follow the instructions found in the section ‘Packages window’.
To demonstrate ‘xyplot’ we will be using data from the ‘mosaicData’ package, so you must load this package as well.
Install the package ‘mosaic’:
install.packages('mosaic')
Install the package ‘mosaicData’:
install.packages('mosaicData')
Load the package ‘mosaic’:
require(mosaic)
Load the package ‘mosaicData’:
require(mosaicData)
We set the default number of digits to 2:
options(digits =2)
Consider the HELPrct (Health Evaluation and Linkage to Primary Care) data set that can be found under the “mosaicData” package. The HELP study was a clinical trial for adult inpatients recruited from a detoxification unit. Patients with no primary care physician were randomized to receive a multidisciplinary assessment and a brief motivational intervention or usual care, with the goal of linking them to primary medical care.
This is a data frame with 453 observations on the following variables.
age subject age at baseline (in years)
anysub use of any substance post-detox: a factor with levels no yes
cesd Center for Epidemiologic Studies Depression measure at baseline (high scores indicate more depressive symptoms)
d1 lifetime number of hospitalizations for medical problems (measured at baseline)
daysanysub time (in days) to first use of any substance post-detox
dayslink time (in days) to linkage to primary care
drugrisk Risk Assessment Battery drug risk scale at baseline
e2b number of times in past 6 months entered a detox program (measured at baseline)
female 0 for male, 1 for female
sex a factor with levels male female
g1b experienced serious thoughts of suicide in last 30 days (measured at baseline): a factor with levels no yes
homeless housing status: a factor with levels housed homeless
i1 average number of drinks (standard units) consumed per day, in the past 30 days (measured at baseline)
i2 maximum number of drinks (standard units) consumed per day, in the past 30 days (measured at baseline)
id subject identifier
indtot Inventory of Drug Use Consequences (InDUC) total score (measured at baseline)
linkstatus post-detox linkage to primary care (0 = no, 1 = yes)
link post-detox linkage to primary care: no yes
mcs SF-36 Mental Component Score (measured at baseline, lower scores indicate worse status)
pcs SF-36 Physical Component Score (measured at baseline, lower scores indicate worse status)
pss_fr perceived social support by friends (measured at baseline, higher scores indicate more support)
racegrp race/ethnicity: levels black hispanic other white
satreat any BSAS substance abuse treatment at baseline: no yes
sexrisk Risk Assessment Battery sex risk score (measured at baseline)
substance primary substance of abuse: alcohol cocaine heroin
treat randomized to HELP clinic: no yes
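The ‘xyplot’ command introduced earlier takes a formula of the form y ~ x together with a data frame. As a minimal sketch, assuming the ‘lattice’ and ‘mosaicData’ packages are installed, the following plots the depression score against the mental component score. The choice of these two variables, and the object name `p`, are only for illustration.

```r
library(lattice)

# Only run the example if mosaicData is available
if (requireNamespace("mosaicData", quietly = TRUE)) {
  data("HELPrct", package = "mosaicData")

  # Formula interface: vertical axis ~ horizontal axis
  p <- xyplot(cesd ~ mcs, data = HELPrct,
              xlab = "SF-36 Mental Component Score (mcs)",
              ylab = "CESD depression score (cesd)")
  print(p)  # lattice plots are drawn when they are printed
}
```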
We find the mean of the cesd variable (the Center for Epidemiologic Studies Depression measure at baseline; high scores indicate more depressive symptoms):
mean(HELPrct$cesd)
Which is equal to 33.
The standard deviation is:
sd(HELPrct$cesd)
Which works out to be 13.
The variance is:
var(HELPrct$cesd)
157
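The variance is the square of the standard deviation. Because the values above are printed with only two significant digits, squaring the rounded 13 does not reproduce 157 exactly. The sketch below checks the relationship at full precision, using the employee ages from the histogram example so that it runs without any extra packages.

```r
# Employee ages from the histogram example
age <- c(36, 25, 38, 46, 55, 68, 72, 55, 36, 38,
         67, 45, 22, 48, 91, 46, 52, 61, 58, 55)

s <- sd(age)   # standard deviation
v <- var(age)  # variance

# The squared standard deviation equals the variance (up to floating point)
all.equal(s^2, v)
```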
We can also calculate the median:
median(HELPrct$cesd)
which is 34.
We can use the “summary” command to print out the min, max, mean, median, and quantiles:
library(mosaic)
summary(HELPrct$cesd)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 25.00 34.00 32.85 41.00 60.00
hist(HELPrct$cesd)
How many females are in the dataset?
tally(~sex, data=HELPrct)
## sex
## female male
## 107 346
tally(~sex, format="percent", data=HELPrct)
## sex
## female male
## 23.62031 76.37969
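The percentages are simply each count divided by the total number of subjects, times 100. A minimal base R sketch, typing in the counts from the tally output above by hand (the object names are illustrative):

```r
# Counts taken from the tally output above
counts <- c(female = 107, male = 346)

total   <- sum(counts)            # total number of subjects
percent <- 100 * counts / total   # share of each group, in percent

total
round(percent, 2)
```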
Let’s restrict our attention to the female subjects. We use the filter() function in the dplyr package to generate a new data frame containing only females.
female<-filter(HELPrct, sex=='female')
female
## age anysubstatus anysub cesd d1 daysanysub dayslink drugrisk e2b
## 1 39 1 yes 15 2 189 343 0 1
## 2 47 1 yes 6 1 31 365 0 NA
## 3 49 NA <NA> 52 14 NA 334 0 1
## 4 50 1 yes 50 14 31 365 18 7
## 5 34 NA <NA> 46 0 NA 365 8 NA
## 6 58 0 no 49 3 192 365 0 NA
## 7 28 1 yes 35 6 27 41 0 2
## 8 27 0 no 52 0 198 49 10 4
## 9 48 1 yes 19 4 67 365 0 NA
## 10 34 1 yes 5 2 23 14 0 NA
## 11 35 1 yes 46 3 17 365 0 NA
## 12 41 0 no 29 3 181 19 0 2
## 13 29 0 no 33 3 180 365 1 4
## 14 40 0 no 57 5 181 34 0 NA
## 15 26 NA <NA> 30 4 NA NA 0 NA
## 16 41 1 yes 43 0 2 NA 10 NA
## 17 32 1 yes 37 2 175 365 0 NA
## 18 33 NA <NA> 47 9 NA 38 0 3
## 19 40 NA <NA> 36 1 NA 217 0 1
## 20 35 NA <NA> 30 2 NA 16 0 NA
## 21 30 0 no 39 0 201 18 0 1
## 22 32 NA <NA> 53 15 NA 41 0 NA
## 23 42 0 no 26 10 183 358 0 2
## 24 30 NA <NA> 51 9 NA NA 9 1
## 25 35 NA <NA> 58 5 NA 17 0 2
## 26 30 1 yes 15 1 15 365 0 NA
## 27 50 0 no 35 6 178 49 0 NA
## 28 38 NA <NA> 26 4 NA 28 0 NA
## 29 24 1 yes 45 0 68 365 0 1
## 30 49 NA <NA> 28 13 NA 193 0 1
## 31 28 1 yes 48 4 12 413 0 NA
## 32 37 NA <NA> 35 1 NA 106 0 NA
## 33 31 1 yes 15 1 31 365 0 NA
## 34 30 1 yes 29 2 12 365 0 NA
## 35 57 1 yes 39 4 28 380 0 1
## 36 29 NA <NA> 46 6 NA 365 5 3
## 37 33 NA <NA> 44 4 NA 427 0 NA
## 38 28 1 yes 38 3 117 218 0 NA
## 39 31 NA <NA> 38 10 NA 405 20 1
## 40 36 NA <NA> 53 3 NA 45 0 3
## 41 38 NA <NA> 57 4 NA 370 0 NA
## 42 39 NA <NA> 43 1 NA 365 13 1
## 43 33 1 yes 19 40 3 146 0 1
## 44 38 1 yes 34 1 0 348 14 1
## 45 43 NA <NA> 36 1 NA 18 0 NA
## 46 33 1 yes 24 6 2 365 1 NA
## 47 29 NA <NA> 54 0 NA 407 4 NA
## 48 47 0 no 41 1 190 78 0 NA
## 49 31 NA <NA> 18 3 NA NA 8 1
## 50 40 NA <NA> 60 7 NA 406 0 NA
## 51 32 0 no 34 3 184 365 0 NA
## 52 38 0 no 38 3 247 365 0 1
## 53 32 1 yes 37 1 82 348 0 NA
## 54 35 NA <NA> 24 1 NA 365 0 NA
## 55 35 0 no 34 1 172 136 0 NA
## 56 45 1 yes 40 5 7 365 0 1
## 57 47 NA <NA> 39 2 NA 365 1 3
## 58 39 1 yes 42 4 215 428 0 NA
## 59 44 NA <NA> 13 0 NA 365 0 NA
## 60 55 1 yes 30 2 11 40 0 2
## 61 34 NA <NA> 19 1 NA 329 0 NA
## 62 34 NA <NA> 36 1 NA 326 0 NA
## 63 31 NA <NA> 22 0 NA 359 0 NA
## 64 27 1 yes 33 0 4 365 0 2
## 65 33 1 yes 51 1 5 365 1 6
## 66 30 NA <NA> 30 6 NA 83 0 NA
## 67 34 NA <NA> 38 2 NA 365 8 NA
## 68 37 0 no 37 2 179 41 0 NA
## 69 26 NA <NA> 56 2 NA 365 0 NA
## 70 45 1 yes 41 0 33 365 4 1
## 71 23 1 yes 48 1 2 365 0 2
## 72 35 1 yes 45 3 1 26 0 1
## 73 42 NA <NA> 52 3 NA 63 0 NA
## 74 32 1 yes 45 4 1 427 0 2
## 75 36 1 yes 39 1 136 324 0 2
## 76 22 1 yes 51 2 2 374 9 1
## 77 37 NA <NA> 58 8 NA 365 0 2
## 78 33 1 yes 19 0 64 33 0 NA
## 79 43 0 no 7 0 187 41 0 NA
## 80 47 1 yes 54 1 4 349 8 NA
## 81 48 1 yes 53 4 0 302 0 3
## 82 35 1 yes 54 1 5 365 13 NA
## 83 38 NA <NA> 42 4 NA 337 0 NA
## 84 35 0 no 36 0 178 361 0 NA
## 85 47 NA <NA> 52 8 NA 365 0 2
## 86 33 NA <NA> 40 4 NA 21 0 NA
## 87 26 1 yes 33 0 35 296 0 1
## 88 34 1 yes 29 0 12 356 0 NA
## 89 47 0 no 32 3 158 74 0 NA
## 90 39 0 no 52 2 268 449 0 NA
## 91 37 1 yes 41 10 1 393 0 NA
## 92 31 1 yes 42 1 15 365 0 NA
## 93 42 1 yes 42 5 33 98 0 NA
## 94 33 NA <NA> 15 0 NA 365 0 NA
## 95 38 NA <NA> 33 1 NA 286 1 NA
## 96 43 NA <NA> 23 4 NA 365 0 2
## 97 27 NA <NA> 3 0 NA 365 0 NA
## 98 21 NA <NA> 39 0 NA NA 6 NA
## 99 29 NA <NA> 47 2 NA 365 0 NA
## 100 45 NA <NA> 41 2 NA 365 0 1
## 101 24 NA <NA> 34 2 NA 365 14 8
## 102 35 NA <NA> 23 2 NA 28 0 NA
## 103 33 NA <NA> 21 8 NA NA 0 NA
## 104 36 NA <NA> 29 4 NA 365 0 NA
## 105 33 NA <NA> 40 2 NA 365 0 1
## 106 31 NA <NA> 47 1 NA 365 0 NA
## 107 39 NA <NA> 28 0 NA 365 1 NA
## female sex g1b homeless i1 i2 id indtot linkstatus link mcs
## 1 1 female no housed 5 5 4 28 0 no 43.967880
## 2 1 female no housed 4 4 6 29 0 no 55.508991
## 3 1 female yes housed 13 20 7 38 0 no 21.793024
## 4 1 female no homeless 71 129 9 44 0 no 22.029678
## 5 1 female no housed 0 0 11 34 0 no 43.974678
## 6 1 female no housed 13 13 12 11 0 no 13.382205
## 7 1 female yes homeless 0 0 17 26 1 yes 29.799828
## 8 1 female yes housed 9 24 20 37 1 yes 15.458271
## 9 1 female no housed 6 8 27 40 0 no 21.668474
## 10 1 female no housed 6 13 50 8 1 yes 59.454094
## 11 1 female no housed 13 20 57 32 0 no 24.000315
## 12 1 female yes housed 3 6 65 20 1 yes 33.374172
## 13 1 female yes homeless 0 0 66 29 0 no 27.575460
## 14 1 female yes homeless 59 164 71 43 1 yes 17.705963
## 15 1 female yes housed 12 18 74 37 NA <NA> 26.697262
## 16 1 female no housed 0 0 75 40 NA <NA> 15.447794
## 17 1 female yes housed 2 2 90 40 0 no 28.858498
## 18 1 female yes housed 64 64 100 44 1 yes 19.595461
## 19 1 female yes homeless 33 38 104 42 1 yes 27.993336
## 20 1 female no housed 9 15 108 33 1 yes 23.299021
## 21 1 female no housed 0 0 118 19 1 yes 24.747171
## 22 1 female yes homeless 34 34 120 33 1 yes 27.136280
## 23 1 female no homeless 39 95 121 31 0 no 41.321629
## 24 1 female yes housed 0 0 125 43 NA <NA> 19.156574
## 25 1 female yes housed 1 1 127 37 1 yes 18.465418
## 26 1 female no housed 26 26 131 25 0 no 37.438934
## 27 1 female no housed 13 13 134 28 1 yes 20.310446
## 28 1 female no housed 0 0 138 39 1 yes 22.787546
## 29 1 female no homeless 7 7 141 39 0 no 28.505577
## 30 1 female no homeless 15 15 143 36 1 yes 40.156929
## 31 1 female no housed 2 2 150 33 0 no 22.017500
## 32 1 female no homeless 1 3 153 25 1 yes 33.366123
## 33 1 female no housed 0 0 166 38 0 no 50.030434
## 34 1 female no homeless 29 29 179 31 0 no 52.197483
## 35 1 female no housed 12 12 181 36 0 no 36.651463
## 36 1 female no housed 0 0 187 39 0 no 20.119982
## 37 1 female yes homeless 59 59 188 38 0 no 25.257971
## 38 1 female yes housed 16 20 191 35 1 yes 18.324743
## 39 1 female yes homeless 26 33 193 44 0 no 22.442661
## 40 1 female yes homeless 50 50 194 41 1 yes 27.171751
## 41 1 female yes housed 13 32 200 39 0 no 20.356680
## 42 1 female yes housed 20 20 203 37 0 no 22.815102
## 43 1 female no homeless 19 26 204 32 1 yes 40.032974
## 44 1 female no homeless 0 0 213 32 0 no 43.353584
## 45 1 female yes housed 58 58 219 40 1 yes 36.100307
## 46 1 female yes housed 32 38 220 23 0 no 33.259956
## 47 1 female no housed 0 0 221 33 0 no 12.323594
## 48 1 female yes homeless 0 0 224 21 1 yes 37.953403
## 49 1 female yes housed 0 0 226 32 NA <NA> 27.641029
## 50 1 female yes homeless 38 38 228 43 0 no 16.786348
## 51 1 female no housed 13 13 229 31 0 no 54.768539
## 52 1 female yes housed 16 26 236 34 0 no 14.919310
## 53 1 female no housed 1 6 237 28 0 no 40.462433
## 54 1 female no housed 0 0 241 34 0 no 44.351089
## 55 1 female no homeless 4 4 242 36 1 yes 16.469986
## 56 1 female yes housed 10 14 247 34 0 no 26.311474
## 57 1 female no housed 42 48 249 33 0 no 27.471394
## 58 1 female yes housed 0 0 254 20 0 no 13.968738
## 59 1 female no housed 13 13 255 26 0 no 41.867615
## 60 1 female no housed 1 2 264 41 1 yes 23.547628
## 61 1 female no housed 4 4 269 27 0 no 34.048084
## 62 1 female no housed 1 1 272 38 0 no 32.384045
## 63 1 female no housed 10 20 275 23 0 no 47.442879
## 64 1 female no homeless 8 8 284 38 0 no 31.781149
## 65 1 female yes housed 8 13 304 28 0 no 20.911337
## 66 1 female yes homeless 27 33 306 25 1 yes 44.446507
## 67 1 female no housed 0 0 308 33 0 no 21.543468
## 68 1 female no homeless 1 1 313 33 1 yes 27.601431
## 69 1 female no housed 1 1 316 36 0 no 14.415197
## 70 1 female no housed 2 2 320 22 0 no 34.747746
## 71 1 female yes homeless 29 58 324 27 0 no 16.718819
## 72 1 female no housed 0 0 325 32 1 yes 20.220354
## 73 1 female yes homeless 0 0 327 32 1 yes 28.447634
## 74 1 female yes homeless 67 67 333 40 0 no 17.926985
## 75 1 female yes homeless 53 53 339 36 0 no 22.237560
## 76 1 female no housed 0 0 342 40 0 no 7.035307
## 77 1 female yes homeless 67 80 351 41 0 no 16.922634
## 78 1 female no homeless 6 6 354 22 1 yes 24.923189
## 79 1 female no homeless 26 26 364 15 1 yes 60.542084
## 80 1 female yes housed 13 13 367 35 0 no 13.852996
## 81 1 female yes homeless 0 0 370 32 0 no 19.808329
## 82 1 female no housed 0 0 372 44 0 no 9.406377
## 83 1 female yes housed 3 3 374 40 0 no 27.495565
## 84 1 female no homeless 58 58 379 13 0 no 44.767254
## 85 1 female no housed 6 6 391 34 0 no 7.226597
## 86 1 female no housed 13 26 402 38 1 yes 19.819555
## 87 1 female no housed 0 0 403 41 0 no 29.213017
## 88 1 female no housed 0 0 421 37 0 no 31.077631
## 89 1 female no housed 21 21 431 13 1 yes 51.922516
## 90 1 female no housed 0 0 442 37 0 no 24.930353
## 91 1 female no homeless 24 51 445 44 0 no 25.710777
## 92 1 female yes homeless 6 13 461 34 0 no 16.863588
## 93 1 female yes housed 26 41 465 35 1 yes 30.701563
## 94 1 female no housed 0 0 466 6 0 no 41.624706
## 95 1 female yes housed 3 16 470 33 0 no 22.337873
## 96 1 female no homeless 19 19 55 31 0 no 27.717655
## 97 1 female no housed 1 1 139 21 0 no 57.834595
## 98 1 female yes housed 0 0 155 35 NA <NA> 47.773228
## 99 1 female no homeless 11 14 157 35 0 no 9.732559
## 100 1 female no homeless 19 26 162 25 0 no 55.479382
## 101 1 female no housed 13 26 171 38 0 no 28.590870
## 102 1 female no housed 4 4 303 20 1 yes 45.425110
## 103 1 female no homeless 26 26 345 28 NA <NA> 18.594315
## 104 1 female no housed 7 8 349 27 0 no 25.676130
## 105 1 female yes homeless 26 32 427 37 0 no 34.152245
## 106 1 female yes homeless 56 61 451 41 0 no 17.050970
## 107 1 female no homeless 1 24 460 28 0 no 33.434536
## pcs pss_fr racegrp satreat sexrisk substance treat avg_drinks
## 1 61.93168 11 white yes 4 heroin no 5
## 2 46.47521 5 black no 5 cocaine yes 4
## 3 24.51504 1 black yes 8 cocaine no 13
## 4 38.27088 5 white no 8 alcohol no 71
## 5 60.07915 0 white no 2 heroin yes 0
## 6 41.93376 13 black yes 0 alcohol no 13
## 7 44.77651 7 hispanic yes 3 heroin yes 0
## 8 37.45214 13 white no 3 heroin yes 9
## 9 36.01007 6 black no 7 cocaine no 6
## 10 52.69898 12 black no 4 cocaine yes 6
## 11 46.75086 1 black no 7 cocaine yes 13
## 12 55.23372 13 white yes 4 alcohol yes 3
## 13 35.12470 4 hispanic yes 4 heroin no 0
## 14 36.04016 1 black no 4 alcohol yes 59
## 15 54.38272 6 white no 9 cocaine no 12
## 16 55.32189 14 white no 3 heroin no 0
## 17 43.94296 11 black no 3 cocaine no 2
## 18 40.48884 1 other no 7 alcohol yes 64
## 19 44.53589 7 white yes 3 alcohol no 33
## 20 51.81045 12 black yes 5 alcohol yes 9
## 21 54.10854 14 hispanic no 4 cocaine yes 0
## 22 54.79462 7 black no 5 alcohol yes 34
## 23 36.68874 4 black no 10 cocaine no 39
## 24 34.33698 10 white no 6 heroin no 0
## 25 39.33260 13 black yes 6 cocaine yes 1
## 26 49.29042 11 black yes 3 cocaine yes 26
## 27 33.48925 2 white no 0 alcohol no 13
## 28 28.74085 9 other no 7 cocaine yes 0
## 29 37.79718 7 black yes 7 cocaine yes 7
## 30 40.96234 7 hispanic yes 9 alcohol no 15
## 31 40.24271 1 white no 5 cocaine yes 2
## 32 45.16520 8 black no 9 cocaine yes 1
## 33 57.38777 9 black yes 2 cocaine no 0
## 34 55.73845 13 black yes 7 cocaine yes 29
## 35 30.50811 6 white yes 0 alcohol no 12
## 36 32.96189 3 white no 4 heroin yes 0
## 37 42.12069 7 hispanic no 5 alcohol no 59
## 38 43.24062 14 black no 11 cocaine no 16
## 39 35.90619 8 white no 11 alcohol no 26
## 40 37.75567 3 white no 9 alcohol yes 50
## 41 35.97361 0 black no 14 cocaine no 13
## 42 35.22702 10 white no 4 heroin yes 20
## 43 38.10227 2 black yes 7 cocaine no 19
## 44 21.91906 9 black no 8 heroin no 0
## 45 37.03778 11 black yes 2 alcohol yes 58
## 46 41.66993 8 other no 3 heroin no 32
## 47 48.21926 11 white no 6 heroin no 0
## 48 57.64361 11 black no 0 cocaine no 0
## 49 48.37090 12 white no 4 heroin no 0
## 50 38.51597 3 white yes 11 cocaine yes 38
## 51 23.48208 12 black yes 0 cocaine no 13
## 52 57.83691 3 white no 5 alcohol yes 16
## 53 56.90286 3 black yes 4 cocaine yes 1
## 54 46.79942 4 black no 2 cocaine no 0
## 55 58.49455 2 black no 8 cocaine no 4
## 56 43.25021 8 white no 5 alcohol no 10
## 57 52.42204 10 black no 5 heroin no 42
## 58 48.97176 11 black no 4 cocaine yes 0
## 59 46.36879 7 hispanic no 4 heroin no 13
## 60 37.35865 7 black yes 2 heroin yes 1
## 61 57.24648 12 black no 2 cocaine no 4
## 62 44.85584 10 black no 4 cocaine no 1
## 63 52.85658 11 black no 7 alcohol yes 10
## 64 51.49556 7 black yes 8 cocaine yes 8
## 65 33.07642 6 hispanic yes 4 heroin yes 8
## 66 45.79400 12 black no 4 alcohol yes 27
## 67 52.35651 10 white no 4 heroin no 0
## 68 37.83872 11 black no 6 cocaine no 1
## 69 46.74971 2 black no 11 heroin yes 1
## 70 64.35030 3 white no 1 heroin yes 2
## 71 35.70664 3 black no 11 alcohol yes 29
## 72 32.44772 2 black no 9 alcohol yes 0
## 73 39.93384 2 other no 0 heroin yes 0
## 74 39.09279 7 black no 6 alcohol no 67
## 75 36.52407 3 black yes 5 alcohol no 53
## 76 52.51404 8 other no 7 heroin yes 0
## 77 34.09209 0 other no 2 alcohol no 67
## 78 63.77832 8 black no 4 cocaine yes 6
## 79 55.44015 13 white no 1 heroin yes 26
## 80 31.11147 9 black no 0 cocaine yes 13
## 81 27.09086 13 white yes 3 alcohol no 0
## 82 41.95401 13 white no 4 heroin no 0
## 83 51.27790 3 black no 9 cocaine no 3
## 84 53.42212 14 black no 4 cocaine no 58
## 85 47.60948 9 white no 4 alcohol yes 6
## 86 32.99675 0 black no 4 alcohol yes 13
## 87 56.69189 3 black yes 3 heroin no 0
## 88 64.91865 14 black no 12 cocaine yes 0
## 89 54.52398 12 hispanic no 0 alcohol no 21
## 90 33.53111 7 black no 2 heroin yes 0
## 91 49.18084 9 other no 9 alcohol no 24
## 92 46.69877 0 black no 10 cocaine yes 6
## 93 38.40187 5 white no 6 alcohol yes 26
## 94 62.08943 11 black yes 6 cocaine yes 0
## 95 42.31495 8 black no 1 heroin no 3
## 96 41.10135 3 black no 6 alcohol no 19
## 97 58.21511 4 black yes 1 cocaine no 1
## 98 41.09781 14 white no 1 heroin no 0
## 99 69.17161 4 black no 7 cocaine no 11
## 100 54.09069 4 white no 4 alcohol no 19
## 101 57.76270 9 white yes 14 heroin yes 13
## 102 58.75759 1 black no 2 cocaine yes 4
## 103 38.86502 3 white no 4 alcohol no 26
## 104 54.98139 13 white no 4 alcohol yes 7
## 105 45.27036 2 hispanic no 3 alcohol yes 26
## 106 34.51623 8 hispanic yes 14 alcohol no 56
## 107 40.04572 1 white no 2 heroin no 1
## max_drinks
## 1 5
## 2 4
## 3 20
## 4 129
## 5 0
## 6 13
## 7 0
## 8 24
## 9 8
## 10 13
## 11 20
## 12 6
## 13 0
## 14 164
## 15 18
## 16 0
## 17 2
## 18 64
## 19 38
## 20 15
## 21 0
## 22 34
## 23 95
## 24 0
## 25 1
## 26 26
## 27 13
## 28 0
## 29 7
## 30 15
## 31 2
## 32 3
## 33 0
## 34 29
## 35 12
## 36 0
## 37 59
## 38 20
## 39 33
## 40 50
## 41 32
## 42 20
## 43 26
## 44 0
## 45 58
## 46 38
## 47 0
## 48 0
## 49 0
## 50 38
## 51 13
## 52 26
## 53 6
## 54 0
## 55 4
## 56 14
## 57 48
## 58 0
## 59 13
## 60 2
## 61 4
## 62 1
## 63 20
## 64 8
## 65 13
## 66 33
## 67 0
## 68 1
## 69 1
## 70 2
## 71 58
## 72 0
## 73 0
## 74 67
## 75 53
## 76 0
## 77 80
## 78 6
## 79 26
## 80 13
## 81 0
## 82 0
## 83 3
## 84 58
## 85 6
## 86 26
## 87 0
## 88 0
## 89 21
## 90 0
## 91 51
## 92 13
## 93 41
## 94 0
## 95 16
## 96 19
## 97 1
## 98 0
## 99 14
## 100 26
## 101 26
## 102 4
## 103 26
## 104 8
## 105 32
## 106 61
## 107 24
with(female, stem(cesd))
##
## The decimal point is 1 digit(s) to the right of the |
##
## 0 | 3
## 0 | 567
## 1 | 3
## 1 | 555589999
## 2 | 123344
## 2 | 66889999
## 3 | 0000233334444
## 3 | 5556666777888899999
## 4 | 00011112222334
## 4 | 555666777889
## 5 | 011122222333444
## 5 | 67788
## 6 | 0
Concepts of probability through sampling, experiments and simulations
Simple and compound events
Determines probabilities by constructing sample spaces to model situations
Whether you know it or not, you are a fortune teller of sorts. Every day, all day, you are constantly predicting what will happen:
You choose clothes based on what you think the weather will be (or, be honest, what you have that’s clean regardless of the weather).
You choose what table to sit at in the cafeteria based on where you think your friends will sit.
You choose and choose and choose, and every choice is a prediction of how likely you think an event or series of events is to happen. We can actually measure how likely it is for something to happen, and that measurement is called probability.
Fig. 5.1
Flip a coin. What is your sample space?
Let’s start with dice. Take a die (make sure it’s fair, not weighted or “funny” in any way). What are the odds (the probability) of rolling a 3 if you roll the die one time? Hopefully you figured out that it is one in six because there are six sides to the die, and one of those sides has three dots.
Let’s try it. Take the die and roll it 100 times, recording your results below. Calculate the percentage of each result (for example, if you rolled a 2 17 times, that would be 17/100, or 17 percent).
Fig. 5.2
We would expect that the percentages for each number would hover around 16 or 17, which is 1/6, or about 0.167. This is probability in a nutshell.
Now guess what the percentage would be if you added up the percentages of the rolls of only the odd-numbered sides. When you add up those rolls, does the percentage come close to your guess?
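If you don’t have a die handy, the experiment can be simulated in R. This is a sketch only: the seed value is arbitrary and is set just so the run can be repeated, and a simulated die is assumed to be perfectly fair.

```r
set.seed(1)  # arbitrary seed so the same rolls come up each time

# Roll a fair six-sided die 100 times
rolls <- sample(1:6, size = 100, replace = TRUE)

# Percentage of rolls showing each face (faces never rolled count as 0)
counts  <- table(factor(rolls, levels = 1:6))
percent <- 100 * counts / length(rolls)
percent

# Combined percentage for the odd-numbered sides
odd_percent <- sum(percent[c("1", "3", "5")])
odd_percent
```

Changing the seed, or removing the `set.seed` line, gives a different set of 100 rolls, just as repeating the experiment by hand would.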
Probability is the ratio of the number of outcomes in which an event occurs to the total number of equally likely possible outcomes.
In the case of our die, there are six possible outcomes, and exactly one of them gives each number on a roll, so the probability of each number is 1/6. If there were no dots on any of the sides, the probability of rolling a 3 would be zero because none of the six sides would show a 3, giving us this ratio: 0/6, or 0. If every side had three dots, the probability of rolling a 3 would be 1 because it would be 6/6, or 1. So, probability is expressed as a number somewhere between 0 (not gonna happen) and 1 (definitely going to happen), with values closer to 1 being more likely.
Let’s put it in formula form:
Probability = desired (or likely) outcomes / possible outcomes
Using the formula:
1. What is the probability of getting heads on a coin toss?
So far, we’ve just looked at things that could occur. What about looking at things that have actually happened?
We call that relative frequency, and it has its own formula:
Fig. 5.4
Think back to your die experiment above. The first formula gives you your expectation (1/6). The formula for relative frequency gives you the actual outcome. What was the relative frequency of rolling 5s in your die-rolling experiment?
The more times we roll the die, the closer the relative frequency will tend to get to the outcome we expected (1/6). We call that the Law of Large Numbers: even if the results don’t come out like you expect after a few tries, the more you repeat the experiment, the closer you will tend to come to the expectation.
Flip a penny 10 times and record how many times it lands on heads and how many times it lands on tails.
Heads: ____________
Tails: ____________
Now flip it 100 times and record your results.
Heads: ____________
Tails: ____________
Did you get closer to a ½ ratio the second time? That’s the Law of Large Numbers at work. Remember, the Law of Large Numbers tells us that the more times you repeat an experiment, the closer the relative frequency will come to the probability.
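The penny experiment can also be simulated to watch the Law of Large Numbers at work. In this sketch the seed and the total of 10,000 flips are arbitrary choices, and the simulated coin is assumed to be fair. The relative frequency of heads is tracked after every flip.

```r
set.seed(42)  # arbitrary seed so the run is reproducible

# Flip a fair coin 10,000 times
flips <- sample(c("H", "T"), size = 10000, replace = TRUE)

# Relative frequency of heads after each flip:
# (number of heads so far) / (number of flips so far)
running_freq <- cumsum(flips == "H") / seq_along(flips)

# Relative frequency after 10, 100, 1,000 and 10,000 flips
running_freq[c(10, 100, 1000, 10000)]
```

With only 10 flips the relative frequency can land far from 1/2. As the number of flips grows, it tends to settle close to 1/2.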
We have one more thing to learn before we move on. Let’s figure out how to calculate the possible outcomes of an event. We’ve figured out the probability of simple events occurring, but what happens when the possible outcomes are harder to figure out? When you are rolling one die or flipping one coin, it’s simple to figure out possible outcomes, but it gets more complicated when you add in more dice or more coins.
Imagine that your parents pay you an allowance of 50¢ a week. Let’s say you love nickels. What is the probability that your 50¢ will contain a nickel? We need to figure out all the possible ways to give someone 50¢. Let’s say that your parents don’t ever use pennies or 50-cent pieces. Besides those, there are three possible coins to use: nickels, dimes and quarters. If your parents pay you in quarters, it’s simple, right? Two quarters make 50¢. But what other possible combinations make 50¢? And how likely are you to get that nickel you want? Let’s set it up in the table below. First, across the top we’ll list the possible coins. Next, we’ll start listing possible combinations by listing the greatest possible number of that type of coin and then decreasing that by one on the next line. Once we’ve gotten to zero of that coin, let’s move to the highest level possible of the next highest coin. Sound confusing? It’s not once you get going. Let’s try it:
First, quarters. We list two quarters, and that makes 50¢ by itself, so the columns for dimes and nickels are zero. There are no other possible combinations with two quarters, so the next line lists one quarter. Remember, we are going from largest to smallest, so first we’ll try one quarter and the greatest number of dimes possible, which is two. There are two other possibilities with one quarter, so we’ll list those on the next lines and then the next lower number of dimes, which is one. That means we’ll need three nickels because we’ve always got to add up to fifty cents.
Fig. 5.5
Now we’re out of possibilities that use quarters, so we move on to dimes. The most dimes we could have is five, so we start with that. We decrease that number by one on each line. For every dime we take away, we have to add two nickels, so notice that the nickels increase by two each line. We end up with 10 possible combinations of coins.
So, how many of the 10 possibilities contain nickels? Did you count eight? You’re right! So let’s put this in our probability formula:
8 = desired (or likely) outcomes
10 = possible outcomes
So 8/10.
Does that seem like good odds to you? Not bad!
What are the odds if you want a quarter? How many possibilities contained a quarter? Did you find four? So the odds of getting a quarter are:
4 = likely outcomes
10 = possible outcomes
which gives 4/10.
Are you more likely to get a quarter or a nickel? All other things being equal (meaning your parents have a wide variety of coins and aren’t out of dimes or something), you are far more likely to get a nickel than a quarter.
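The table of coin combinations can be checked by letting R list every possibility. The sketch below tries every count of quarters, dimes and nickels that could possibly fit and keeps the rows that total 50¢. Like the discussion above, it assumes that every combination is equally likely; the object names are illustrative.

```r
# Every candidate count of each coin (at most 2 quarters, 5 dimes, 10 nickels)
combos <- expand.grid(quarters = 0:2, dimes = 0:5, nickels = 0:10)

# Keep only the combinations worth exactly 50 cents
value  <- 25 * combos$quarters + 10 * combos$dimes + 5 * combos$nickels
combos <- combos[value == 50, ]

nrow(combos)  # number of possible combinations

# Probability = desired outcomes / possible outcomes
sum(combos$nickels > 0) / nrow(combos)   # at least one nickel
sum(combos$quarters > 0) / nrow(combos)  # at least one quarter
```

The counts agree with the table: 10 combinations in total, 8 containing a nickel and 4 containing a quarter.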
In summary, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
The LLN is important because it “guarantees” stable long-term results for the averages of some random events. For example, while a casino may lose money in a single spin of the roulette wheel, its earnings will tend towards a predictable percentage over a large number of spins. Any winning streak by a player will eventually be overcome by the parameters of the game. It is important to remember that the LLN only applies (as the name indicates) when a large number of observations is considered.
1. Suppose you go to the store and there are six kinds of cereal. Your mom tells you to pick one. When she comes back, you ask her to guess which one you chose. Assuming she had no idea beforehand what your choice would be, how likely is it that she will guess the correct one? ____________________________________________________________
2. Let’s say you are playing a coin-tossing game with a friend. You toss the same coin 500 times, and 400 times it comes up heads. Would this be normal? What could explain it? ____________________________________________________________
3. You are bored one day, so you start rolling a die. You roll it 10 times and you get a 6 eight times. You think this is strange, so you keep rolling. You roll 100 more times and only get eight more 6s, for a total of 16. What is the rule that accounts for this scenario? ____________________________________________________________
4. What is the relative frequency of the final total of 6s rolled in Question 3? ____________________________________________________________
5. Someone hands you a standard deck of 52 cards. There are four queens. What is the likelihood that you will draw a queen from the deck on the first try? ____________________________________________________________
6. Let’s say you earn $20 doing chores (lots of them), and you’re being paid in cash. Create a table of possible outcomes that shows the possible combinations of $10’s, $5’s and $1’s that would total your $20. (Look back at the table we did before to see how to set this problem up.)
7. You and a friend (who has no understanding of probability) are at the mall on a hot Tuesday afternoon in the middle of the summer. She offers to buy you a snow cone if you can guess the types of dollar bills she has in her wallet in fewer than three guesses. She tells you she has $20 and none of the bills are $1’s or a $20. If you don’t guess correctly by the third try, you have to give her a quarter. Are your odds of guessing correctly greater than ½?
Load the mosaic and mosaicData packages:
require(mosaic)
require(mosaicData)
Consider the “cesd” variable in the HELPrct data set from the mosaicData package:
HELPrct$cesd
## [1] 49 30 39 15 39 6 52 32 50 46 46 49 22 36 43 35 19 40 52 37 35 18 36
## [24] 28 19 30 27 24 47 45 18 11 26 29 34 37 23 41 21 16 36 17 36 19 5 25
## [47] 36 27 44 29 46 16 44 42 30 25 26 29 33 28 33 44 29 57 26 31 30 43 28
## [70] 29 32 30 34 49 36 42 40 29 31 10 37 32 16 15 4 30 44 8 16 47 49 30
## [93] 36 48 17 39 30 24 25 51 17 37 45 28 17 23 39 38 53 26 47 49 34 51 33
## [116] 58 28 4 15 40 33 35 28 21 33 26 45 45 31 28 22 39 31 48 48 34 35 46
## [139] 34 10 31 34 26 15 48 37 20 38 39 46 17 6 18 29 51 39 31 49 43 45 46
## [162] 44 41 29 38 51 38 53 29 31 57 38 39 43 19 23 44 12 35 47 53 34 15 31
## [185] 27 36 24 54 31 22 41 23 18 60 34 26 40 40 1 41 38 37 16 33 4 24 34
## [208] 40 39 32 40 51 39 40 22 42 13 49 35 43 27 40 38 39 30 35 34 19 39 36
## [231] 58 38 22 46 31 11 32 33 39 33 27 43 30 12 42 31 40 17 44 15 41 51 24
## [254] 29 40 33 51 30 46 38 42 17 22 37 11 56 14 26 36 41 18 19 48 45 44 52
## [277] 19 9 55 18 45 12 33 32 20 37 39 43 51 27 40 8 54 35 58 50 55 19 37
## [300] 20 40 37 43 8 56 51 7 36 49 54 53 15 53 6 54 42 31 40 37 36 40 41
## [323] 39 38 38 9 36 27 26 52 24 16 34 46 24 25 40 33 31 37 28 27 6 21 29
## [346] 23 35 55 3 36 40 29 28 21 34 42 23 36 32 30 25 35 23 16 27 14 44 52
## [369] 48 11 41 41 37 31 34 40 37 30 42 51 42 15 12 39 10 33 57 17 20 49 23
## [392] 26 28 3 18 39 51 39 47 45 28 41 31 34 21 41 38 36 24 10 41 51 45 29
## [415] 56 34 4 32 38 26 27 21 30 7 35 23 36 15 48 31 54 21 21 29 23 33 14
## [438] 27 24 33 25 37 47 40 9 37 47 34 28 37 28 11 35
The “cesd” score is the Center for Epidemiologic Studies Depression measure at baseline (high scores indicate more depressive symptoms).
Create an object called “depscore” in which we will save the set of depression scores:
depscore<-HELPrct$cesd
depscore
## [1] 49 30 39 15 39 6 52 32 50 46 46 49 22 36 43 35 19 40 52 37 35 18 36
## [24] 28 19 30 27 24 47 45 18 11 26 29 34 37 23 41 21 16 36 17 36 19 5 25
## [47] 36 27 44 29 46 16 44 42 30 25 26 29 33 28 33 44 29 57 26 31 30 43 28
## [70] 29 32 30 34 49 36 42 40 29 31 10 37 32 16 15 4 30 44 8 16 47 49 30
## [93] 36 48 17 39 30 24 25 51 17 37 45 28 17 23 39 38 53 26 47 49 34 51 33
## [116] 58 28 4 15 40 33 35 28 21 33 26 45 45 31 28 22 39 31 48 48 34 35 46
## [139] 34 10 31 34 26 15 48 37 20 38 39 46 17 6 18 29 51 39 31 49 43 45 46
## [162] 44 41 29 38 51 38 53 29 31 57 38 39 43 19 23 44 12 35 47 53 34 15 31
## [185] 27 36 24 54 31 22 41 23 18 60 34 26 40 40 1 41 38 37 16 33 4 24 34
## [208] 40 39 32 40 51 39 40 22 42 13 49 35 43 27 40 38 39 30 35 34 19 39 36
## [231] 58 38 22 46 31 11 32 33 39 33 27 43 30 12 42 31 40 17 44 15 41 51 24
## [254] 29 40 33 51 30 46 38 42 17 22 37 11 56 14 26 36 41 18 19 48 45 44 52
## [277] 19 9 55 18 45 12 33 32 20 37 39 43 51 27 40 8 54 35 58 50 55 19 37
## [300] 20 40 37 43 8 56 51 7 36 49 54 53 15 53 6 54 42 31 40 37 36 40 41
## [323] 39 38 38 9 36 27 26 52 24 16 34 46 24 25 40 33 31 37 28 27 6 21 29
## [346] 23 35 55 3 36 40 29 28 21 34 42 23 36 32 30 25 35 23 16 27 14 44 52
## [369] 48 11 41 41 37 31 34 40 37 30 42 51 42 15 12 39 10 33 57 17 20 49 23
## [392] 26 28 3 18 39 51 39 47 45 28 41 31 34 21 41 38 36 24 10 41 51 45 29
## [415] 56 34 4 32 38 26 27 21 30 7 35 23 36 15 48 31 54 21 21 29 23 33 14
## [438] 27 24 33 25 37 47 40 9 37 47 34 28 37 28 11 35
Find the summary statistics for this variable:
summary(depscore)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00   25.00   34.00   32.85   41.00   60.00
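Each number reported by summary() can also be computed on its own. The sketch below uses only the first ten cesd values printed above, stored in an illustrative vector called `first_ten`, so it runs without the mosaicData package; its results describe those ten values only, not the full variable.

```r
# First ten cesd scores from the printed output (a small subset for illustration).
first_ten <- c(49, 30, 39, 15, 39, 6, 52, 32, 50, 46)

min(first_ten)             # smallest score
quantile(first_ten, 0.25)  # 1st quartile: 25% of scores fall at or below this
median(first_ten)          # middle score
mean(first_ten)            # arithmetic average
quantile(first_ten, 0.75)  # 3rd quartile: 75% of scores fall at or below this
max(first_ten)             # largest score

# summary() reports the same six values in one call.
six_numbers <- summary(first_ten)
six_numbers
```

The quartiles use R's default interpolation rule, so they may differ slightly from quartiles worked out by hand with another convention.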
Create a boxplot and histogram for the scores:
boxplot(depscore)
hist(depscore)
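Both plots are drawn from a handful of numbers, and it can help to look at those numbers directly. The sketch below again uses the first ten cesd values as an illustrative subset (`first_ten`), and the class width of 10 for the histogram is an arbitrary choice.

```r
# First ten cesd scores from the printed output (a small subset for illustration).
first_ten <- c(49, 30, 39, 15, 39, 6, 52, 32, 50, 46)

# Numbers behind a boxplot: lower whisker, lower hinge, median,
# upper hinge, upper whisker, plus any points flagged as outliers.
box_numbers <- boxplot.stats(first_ten)
box_numbers$stats
box_numbers$out

# Numbers behind a histogram: class boundaries and the count in each class.
# plot = FALSE returns the counts without drawing the picture.
hist_numbers <- hist(first_ten, breaks = seq(0, 60, by = 10), plot = FALSE)
hist_numbers$breaks
hist_numbers$counts
```

Leaving out `plot = FALSE` draws the histogram itself, and arguments such as `main` and `xlab` can be added to boxplot() or hist() to give the plot a title and an axis label.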
Now consider the variable “sex”. We will count how many males and how many females are in the data set:
tally(~sex, data=HELPrct)
## sex
## female   male
##    107    346
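Counts like these are often easier to read as proportions. The sketch below rebuilds a vector with the same counts as the tally (107 females and 346 males) so that it runs without the data set, and uses base R's table() in place of tally(); the object names are illustrative.

```r
# Recreate the sex variable from the counts reported by tally().
sex <- rep(c("female", "male"), times = c(107, 346))

counts <- table(sex)  # base R counterpart of the tally
counts

# Relative frequency = count in a category / total count.
proportions <- counts / sum(counts)
round(proportions, 3)
```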
We will now keep only the rows for females by using the “filter” command:
female<-filter(HELPrct, sex=='female')
female
## age anysubstatus anysub cesd d1 daysanysub dayslink drugrisk e2b
## 1 39 1 yes 15 2 189 343 0 1
## 2 47 1 yes 6 1 31 365 0 NA
## 3 49 NA <NA> 52 14 NA 334 0 1
## 4 50 1 yes 50 14 31 365 18 7
## 5 34 NA <NA> 46 0 NA 365 8 NA
## 6 58 0 no 49 3 192 365 0 NA
## 7 28 1 yes 35 6 27 41 0 2
## 8 27 0 no 52 0 198 49 10 4
## 9 48 1 yes 19 4 67 365 0 NA
## 10 34 1 yes 5 2 23 14 0 NA
## 11 35 1 yes 46 3 17 365 0 NA
## 12 41 0 no 29 3 181 19 0 2
## 13 29 0 no 33 3 180 365 1 4
## 14 40 0 no 57 5 181 34 0 NA
## 15 26 NA <NA> 30 4 NA NA 0 NA
## 16 41 1 yes 43 0 2 NA 10 NA
## 17 32 1 yes 37 2 175 365 0 NA
## 18 33 NA <NA> 47 9 NA 38 0 3
## 19 40 NA <NA> 36 1 NA 217 0 1
## 20 35 NA <NA> 30 2 NA 16 0 NA
## 21 30 0 no 39 0 201 18 0 1
## 22 32 NA <NA> 53 15 NA 41 0 NA
## 23 42 0 no 26 10 183 358 0 2
## 24 30 NA <NA> 51 9 NA NA 9 1
## 25 35 NA <NA> 58 5 NA 17 0 2
## 26 30 1 yes 15 1 15 365 0 NA
## 27 50 0 no 35 6 178 49 0 NA
## 28 38 NA <NA> 26 4 NA 28 0 NA
## 29 24 1 yes 45 0 68 365 0 1
## 30 49 NA <NA> 28 13 NA 193 0 1
## 31 28 1 yes 48 4 12 413 0 NA
## 32 37 NA <NA> 35 1 NA 106 0 NA
## 33 31 1 yes 15 1 31 365 0 NA
## 34 30 1 yes 29 2 12 365 0 NA
## 35 57 1 yes 39 4 28 380 0 1
## 36 29 NA <NA> 46 6 NA 365 5 3
## 37 33 NA <NA> 44 4 NA 427 0 NA
## 38 28 1 yes 38 3 117 218 0 NA
## 39 31 NA <NA> 38 10 NA 405 20 1
## 40 36 NA <NA> 53 3 NA 45 0 3
## 41 38 NA <NA> 57 4 NA 370 0 NA
## 42 39 NA <NA> 43 1 NA 365 13 1
## 43 33 1 yes 19 40 3 146 0 1
## 44 38 1 yes 34 1 0 348 14 1
## 45 43 NA <NA> 36 1 NA 18 0 NA
## 46 33 1 yes 24 6 2 365 1 NA
## 47 29 NA <NA> 54 0 NA 407 4 NA
## 48 47 0 no 41 1 190 78 0 NA
## 49 31 NA <NA> 18 3 NA NA 8 1
## 50 40 NA <NA> 60 7 NA 406 0 NA
## 51 32 0 no 34 3 184 365 0 NA
## 52 38 0 no 38 3 247 365 0 1
## 53 32 1 yes 37 1 82 348 0 NA
## 54 35 NA <NA> 24 1 NA 365 0 NA
## 55 35 0 no 34 1 172 136 0 NA
## 56 45 1 yes 40 5 7 365 0 1
## 57 47 NA <NA> 39 2 NA 365 1 3
## 58 39 1 yes 42 4 215 428 0 NA
## 59 44 NA <NA> 13 0 NA 365 0 NA
## 60 55 1 yes 30 2 11 40 0 2
## 61 34 NA <NA> 19 1 NA 329 0 NA
## 62 34 NA <NA> 36 1 NA 326 0 NA
## 63 31 NA <NA> 22 0 NA 359 0 NA
## 64 27 1 yes 33 0 4 365 0 2
## 65 33 1 yes 51 1 5 365 1 6
## 66 30 NA <NA> 30 6 NA 83 0 NA
## 67 34 NA <NA> 38 2 NA 365 8 NA
## 68 37 0 no 37 2 179 41 0 NA
## 69 26 NA <NA> 56 2 NA 365 0 NA
## 70 45 1 yes 41 0 33 365 4 1
## 71 23 1 yes 48 1 2 365 0 2
## 72 35 1 yes 45 3 1 26 0 1
## 73 42 NA <NA> 52 3 NA 63 0 NA
## 74 32 1 yes 45 4 1 427 0 2
## 75 36 1 yes 39 1 136 324 0 2
## 76 22 1 yes 51 2 2 374 9 1
## 77 37 NA <NA> 58 8 NA 365 0 2
## 78 33 1 yes 19 0 64 33 0 NA
## 79 43 0 no 7 0 187 41 0 NA
## 80 47 1 yes 54 1 4 349 8 NA
## 81 48 1 yes 53 4 0 302 0 3
## 82 35 1 yes 54 1 5 365 13 NA
## 83 38 NA <NA> 42 4 NA 337 0 NA
## 84 35 0 no 36 0 178 361 0 NA
## 85 47 NA <NA> 52 8 NA 365 0 2
## 86 33 NA <NA> 40 4 NA 21 0 NA
## 87 26 1 yes 33 0 35 296 0 1
## 88 34 1 yes 29 0 12 356 0 NA
## 89 47 0 no 32 3 158 74 0 NA
## 90 39 0 no 52 2 268 449 0 NA
## 91 37 1 yes 41 10 1 393 0 NA
## 92 31 1 yes 42 1 15 365 0 NA
## 93 42 1 yes 42 5 33 98 0 NA
## 94 33 NA <NA> 15 0 NA 365 0 NA
## 95 38 NA <NA> 33 1 NA 286 1 NA
## 96 43 NA <NA> 23 4 NA 365 0 2
## 97 27 NA <NA> 3 0 NA 365 0 NA
## 98 21 NA <NA> 39 0 NA NA 6 NA
## 99 29 NA <NA> 47 2 NA 365 0 NA
## 100 45 NA <NA> 41 2 NA 365 0 1
## 101 24 NA <NA> 34 2 NA 365 14 8
## 102 35 NA <NA> 23 2 NA 28 0 NA
## 103 33 NA <NA> 21 8 NA NA 0 NA
## 104 36 NA <NA> 29 4 NA 365 0 NA
## 105 33 NA <NA> 40 2 NA 365 0 1
## 106 31 NA <NA> 47 1 NA 365 0 NA
## 107 39 NA <NA> 28 0 NA 365 1 NA
## female sex g1b homeless i1 i2 id indtot linkstatus link mcs
## 1 1 female no housed 5 5 4 28 0 no 43.967880
## 2 1 female no housed 4 4 6 29 0 no 55.508991
## 3 1 female yes housed 13 20 7 38 0 no 21.793024
## 4 1 female no homeless 71 129 9 44 0 no 22.029678
## 5 1 female no housed 0 0 11 34 0 no 43.974678
## 6 1 female no housed 13 13 12 11 0 no 13.382205
## 7 1 female yes homeless 0 0 17 26 1 yes 29.799828
## 8 1 female yes housed 9 24 20 37 1 yes 15.458271
## 9 1 female no housed 6 8 27 40 0 no 21.668474
## 10 1 female no housed 6 13 50 8 1 yes 59.454094
## 11 1 female no housed 13 20 57 32 0 no 24.000315
## 12 1 female yes housed 3 6 65 20 1 yes 33.374172
## 13 1 female yes homeless 0 0 66 29 0 no 27.575460
## 14 1 female yes homeless 59 164 71 43 1 yes 17.705963
## 15 1 female yes housed 12 18 74 37 NA <NA> 26.697262
## 16 1 female no housed 0 0 75 40 NA <NA> 15.447794
## 17 1 female yes housed 2 2 90 40 0 no 28.858498
## 18 1 female yes housed 64 64 100 44 1 yes 19.595461
## 19 1 female yes homeless 33 38 104 42 1 yes 27.993336
## 20 1 female no housed 9 15 108 33 1 yes 23.299021
## 21 1 female no housed 0 0 118 19 1 yes 24.747171
## 22 1 female yes homeless 34 34 120 33 1 yes 27.136280
## 23 1 female no homeless 39 95 121 31 0 no 41.321629
## 24 1 female yes housed 0 0 125 43 NA <NA> 19.156574
## 25 1 female yes housed 1 1 127 37 1 yes 18.465418
## 26 1 female no housed 26 26 131 25 0 no 37.438934
## 27 1 female no housed 13 13 134 28 1 yes 20.310446
## 28 1 female no housed 0 0 138 39 1 yes 22.787546
## 29 1 female no homeless 7 7 141 39 0 no 28.505577
## 30 1 female no homeless 15 15 143 36 1 yes 40.156929
## 31 1 female no housed 2 2 150 33 0 no 22.017500
## 32 1 female no homeless 1 3 153 25 1 yes 33.366123
## 33 1 female no housed 0 0 166 38 0 no 50.030434
## 34 1 female no homeless 29 29 179 31 0 no 52.197483
## 35 1 female no housed 12 12 181 36 0 no 36.651463
## 36 1 female no housed 0 0 187 39 0 no 20.119982
## 37 1 female yes homeless 59 59 188 38 0 no 25.257971
## 38 1 female yes housed 16 20 191 35 1 yes 18.324743
## 39 1 female yes homeless 26 33 193 44 0 no 22.442661
## 40 1 female yes homeless 50 50 194 41 1 yes 27.171751
## 41 1 female yes housed 13 32 200 39 0 no 20.356680
## 42 1 female yes housed 20 20 203 37 0 no 22.815102
## 43 1 female no homeless 19 26 204 32 1 yes 40.032974
## 44 1 female no homeless 0 0 213 32 0 no 43.353584
## 45 1 female yes housed 58 58 219 40 1 yes 36.100307
## 46 1 female yes housed 32 38 220 23 0 no 33.259956
## 47 1 female no housed 0 0 221 33 0 no 12.323594
## 48 1 female yes homeless 0 0 224 21 1 yes 37.953403
## 49 1 female yes housed 0 0 226 32 NA <NA> 27.641029
## 50 1 female yes homeless 38 38 228 43 0 no 16.786348
## 51 1 female no housed 13 13 229 31 0 no 54.768539
## 52 1 female yes housed 16 26 236 34 0 no 14.919310
## 53 1 female no housed 1 6 237 28 0 no 40.462433
## 54 1 female no housed 0 0 241 34 0 no 44.351089
## 55 1 female no homeless 4 4 242 36 1 yes 16.469986
## 56 1 female yes housed 10 14 247 34 0 no 26.311474
## 57 1 female no housed 42 48 249 33 0 no 27.471394
## 58 1 female yes housed 0 0 254 20 0 no 13.968738
## 59 1 female no housed 13 13 255 26 0 no 41.867615
## 60 1 female no housed 1 2 264 41 1 yes 23.547628
## 61 1 female no housed 4 4 269 27 0 no 34.048084
## 62 1 female no housed 1 1 272 38 0 no 32.384045
## 63 1 female no housed 10 20 275 23 0 no 47.442879
## 64 1 female no homeless 8 8 284 38 0 no 31.781149
## 65 1 female yes housed 8 13 304 28 0 no 20.911337
## 66 1 female yes homeless 27 33 306 25 1 yes 44.446507
## 67 1 female no housed 0 0 308 33 0 no 21.543468
## 68 1 female no homeless 1 1 313 33 1 yes 27.601431
## 69 1 female no housed 1 1 316 36 0 no 14.415197
## 70 1 female no housed 2 2 320 22 0 no 34.747746
## 71 1 female yes homeless 29 58 324 27 0 no 16.718819
## 72 1 female no housed 0 0 325 32 1 yes 20.220354
## 73 1 female yes homeless 0 0 327 32 1 yes 28.447634
## 74 1 female yes homeless 67 67 333 40 0 no 17.926985
## 75 1 female yes homeless 53 53 339 36 0 no 22.237560
## 76 1 female no housed 0 0 342 40 0 no 7.035307
## 77 1 female yes homeless 67 80 351 41 0 no 16.922634
## 78 1 female no homeless 6 6 354 22 1 yes 24.923189
## 79 1 female no homeless 26 26 364 15 1 yes 60.542084
## 80 1 female yes housed 13 13 367 35 0 no 13.852996
## 81 1 female yes homeless 0 0 370 32 0 no 19.808329
## 82 1 female no housed 0 0 372 44 0 no 9.406377
## 83 1 female yes housed 3 3 374 40 0 no 27.495565
## 84 1 female no homeless 58 58 379 13 0 no 44.767254
## 85 1 female no housed 6 6 391 34 0 no 7.226597
## 86 1 female no housed 13 26 402 38 1 yes 19.819555
## 87 1 female no housed 0 0 403 41 0 no 29.213017
## 88 1 female no housed 0 0 421 37 0 no 31.077631
## 89 1 female no housed 21 21 431 13 1 yes 51.922516
## 90 1 female no housed 0 0 442 37 0 no 24.930353
## 91 1 female no homeless 24 51 445 44 0 no 25.710777
## 92 1 female yes homeless 6 13 461 34 0 no 16.863588
## 93 1 female yes housed 26 41 465 35 1 yes 30.701563
## 94 1 female no housed 0 0 466 6 0 no 41.624706
## 95 1 female yes housed 3 16 470 33 0 no 22.337873
## 96 1 female no homeless 19 19 55 31 0 no 27.717655
## 97 1 female no housed 1 1 139 21 0 no 57.834595
## 98 1 female yes housed 0 0 155 35 NA <NA> 47.773228
## 99 1 female no homeless 11 14 157 35 0 no 9.732559
## 100 1 female no homeless 19 26 162 25 0 no 55.479382
## 101 1 female no housed 13 26 171 38 0 no 28.590870
## 102 1 female no housed 4 4 303 20 1 yes 45.425110
## 103 1 female no homeless 26 26 345 28 NA <NA> 18.594315
## 104 1 female no housed 7 8 349 27 0 no 25.676130
## 105 1 female yes homeless 26 32 427 37 0 no 34.152245
## 106 1 female yes homeless 56 61 451 41 0 no 17.050970
## 107 1 female no homeless 1 24 460 28 0 no 33.434536
## pcs pss_fr racegrp satreat sexrisk substance treat avg_drinks
## 1 61.93168 11 white yes 4 heroin no 5
## 2 46.47521 5 black no 5 cocaine yes 4
## 3 24.51504 1 black yes 8 cocaine no 13
## 4 38.27088 5 white no 8 alcohol no 71
## 5 60.07915 0 white no 2 heroin yes 0
## 6 41.93376 13 black yes 0 alcohol no 13
## 7 44.77651 7 hispanic yes 3 heroin yes 0
## 8 37.45214 13 white no 3 heroin yes 9
## 9 36.01007 6 black no 7 cocaine no 6
## 10 52.69898 12 black no 4 cocaine yes 6
## 11 46.75086 1 black no 7 cocaine yes 13
## 12 55.23372 13 white yes 4 alcohol yes 3
## 13 35.12470 4 hispanic yes 4 heroin no 0
## 14 36.04016 1 black no 4 alcohol yes 59
## 15 54.38272 6 white no 9 cocaine no 12
## 16 55.32189 14 white no 3 heroin no 0
## 17 43.94296 11 black no 3 cocaine no 2
## 18 40.48884 1 other no 7 alcohol yes 64
## 19 44.53589 7 white yes 3 alcohol no 33
## 20 51.81045 12 black yes 5 alcohol yes 9
## 21 54.10854 14 hispanic no 4 cocaine yes 0
## 22 54.79462 7 black no 5 alcohol yes 34
## 23 36.68874 4 black no 10 cocaine no 39
## 24 34.33698 10 white no 6 heroin no 0
## 25 39.33260 13 black yes 6 cocaine yes 1
## 26 49.29042 11 black yes 3 cocaine yes 26
## 27 33.48925 2 white no 0 alcohol no 13
## 28 28.74085 9 other no 7 cocaine yes 0
## 29 37.79718 7 black yes 7 cocaine yes 7
## 30 40.96234 7 hispanic yes 9 alcohol no 15
## 31 40.24271 1 white no 5 cocaine yes 2
## 32 45.16520 8 black no 9 cocaine yes 1
## 33 57.38777 9 black yes 2 cocaine no 0
## 34 55.73845 13 black yes 7 cocaine yes 29
## 35 30.50811 6 white yes 0 alcohol no 12
## 36 32.96189 3 white no 4 heroin yes 0
## 37 42.12069 7 hispanic no 5 alcohol no 59
## 38 43.24062 14 black no 11 cocaine no 16
## 39 35.90619 8 white no 11 alcohol no 26
## 40 37.75567 3 white no 9 alcohol yes 50
## 41 35.97361 0 black no 14 cocaine no 13
## 42 35.22702 10 white no 4 heroin yes 20
## 43 38.10227 2 black yes 7 cocaine no 19
## 44 21.91906 9 black no 8 heroin no 0
## 45 37.03778 11 black yes 2 alcohol yes 58
## 46 41.66993 8 other no 3 heroin no 32
## 47 48.21926 11 white no 6 heroin no 0
## 48 57.64361 11 black no 0 cocaine no 0
## 49 48.37090 12 white no 4 heroin no 0
## 50 38.51597 3 white yes 11 cocaine yes 38
## 51 23.48208 12 black yes 0 cocaine no 13
## 52 57.83691 3 white no 5 alcohol yes 16
## 53 56.90286 3 black yes 4 cocaine yes 1
## 54 46.79942 4 black no 2 cocaine no 0
## 55 58.49455 2 black no 8 cocaine no 4
## 56 43.25021 8 white no 5 alcohol no 10
## 57 52.42204 10 black no 5 heroin no 42
## 58 48.97176 11 black no 4 cocaine yes 0
## 59 46.36879 7 hispanic no 4 heroin no 13
## 60 37.35865 7 black yes 2 heroin yes 1
## 61 57.24648 12 black no 2 cocaine no 4
## 62 44.85584 10 black no 4 cocaine no 1
## 63 52.85658 11 black no 7 alcohol yes 10
## 64 51.49556 7 black yes 8 cocaine yes 8
## 65 33.07642 6 hispanic yes 4 heroin yes 8
## 66 45.79400 12 black no 4 alcohol yes 27
## 67 52.35651 10 white no 4 heroin no 0
## 68 37.83872 11 black no 6 cocaine no 1
## 69 46.74971 2 black no 11 heroin yes 1
## 70 64.35030 3 white no 1 heroin yes 2
## 71 35.70664 3 black no 11 alcohol yes 29
## 72 32.44772 2 black no 9 alcohol yes 0
## 73 39.93384 2 other no 0 heroin yes 0
## 74 39.09279 7 black no 6 alcohol no 67
## 75 36.52407 3 black yes 5 alcohol no 53
## 76 52.51404 8 other no 7 heroin yes 0
## 77 34.09209 0 other no 2 alcohol no 67
## 78 63.77832 8 black no 4 cocaine yes 6
## 79 55.44015 13 white no 1 heroin yes 26
## 80 31.11147 9 black no 0 cocaine yes 13
## 81 27.09086 13 white yes 3 alcohol no 0
## 82 41.95401 13 white no 4 heroin no 0
## 83 51.27790 3 black no 9 cocaine no 3
## 84 53.42212 14 black no 4 cocaine no 58
## 85 47.60948 9 white no 4 alcohol yes 6
## 86 32.99675 0 black no 4 alcohol yes 13
## 87 56.69189 3 black yes 3 heroin no 0
## 88 64.91865 14 black no 12 cocaine yes 0
## 89 54.52398 12 hispanic no 0 alcohol no 21
## 90 33.53111 7 black no 2 heroin yes 0
## 91 49.18084 9 other no 9 alcohol no 24
## 92 46.69877 0 black no 10 cocaine yes 6
## 93 38.40187 5 white no 6 alcohol yes 26
## 94 62.08943 11 black yes 6 cocaine yes 0
## 95 42.31495 8 black no 1 heroin no 3
## 96 41.10135 3 black no 6 alcohol no 19
## 97 58.21511 4 black yes 1 cocaine no 1
## 98 41.09781 14 white no 1 heroin no 0
## 99 69.17161 4 black no 7 cocaine no 11
## 100 54.09069 4 white no 4 alcohol no 19
## 101 57.76270 9 white yes 14 heroin yes 13
## 102 58.75759 1 black no 2 cocaine yes 4
## 103 38.86502 3 white no 4 alcohol no 26
## 104 54.98139 13 white no 4 alcohol yes 7
## 105 45.27036 2 hispanic no 3 alcohol yes 26
## 106 34.51623 8 hispanic yes 14 alcohol no 56
## 107 40.04572 1 white no 2 heroin no 1
## max_drinks
## 1 5
## 2 4
## 3 20
## 4 129
## 5 0
## 6 13
## 7 0
## 8 24
## 9 8
## 10 13
## 11 20
## 12 6
## 13 0
## 14 164
## 15 18
## 16 0
## 17 2
## 18 64
## 19 38
## 20 15
## 21 0
## 22 34
## 23 95
## 24 0
## 25 1
## 26 26
## 27 13
## 28 0
## 29 7
## 30 15
## 31 2
## 32 3
## 33 0
## 34 29
## 35 12
## 36 0
## 37 59
## 38 20
## 39 33
## 40 50
## 41 32
## 42 20
## 43 26
## 44 0
## 45 58
## 46 38
## 47 0
## 48 0
## 49 0
## 50 38
## 51 13
## 52 26
## 53 6
## 54 0
## 55 4
## 56 14
## 57 48
## 58 0
## 59 13
## 60 2
## 61 4
## 62 1
## 63 20
## 64 8
## 65 13
## 66 33
## 67 0
## 68 1
## 69 1
## 70 2
## 71 58
## 72 0
## 73 0
## 74 67
## 75 53
## 76 0
## 77 80
## 78 6
## 79 26
## 80 13
## 81 0
## 82 0
## 83 3
## 84 58
## 85 6
## 86 26
## 87 0
## 88 0
## 89 21
## 90 0
## 91 51
## 92 13
## 93 41
## 94 0
## 95 16
## 96 19
## 97 1
## 98 0
## 99 14
## 100 26
## 101 26
## 102 4
## 103 26
## 104 8
## 105 32
## 106 61
## 107 24
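Printing a whole data frame produces very long output, as seen above. A quicker check after filtering is to count the rows and look at only the first few. The sketch below uses a small made-up data frame (`patients`, with hypothetical values) and base R's subset() as a stand-in for the filter step, so it runs without any extra packages.

```r
# Hypothetical toy data; these values are made up and are not from HELPrct.
patients <- data.frame(
  id   = 1:6,
  sex  = c("female", "male", "male", "female", "male", "male"),
  cesd = c(15, 49, 30, 6, 39, 52)
)

# Keep only the rows that meet a condition.
female_rows <- subset(patients, sex == "female")
male_rows   <- patients[patients$sex == "male", ]  # same idea with bracket indexing

nrow(female_rows)     # how many rows were kept
head(female_rows, 3)  # peek at the first few rows instead of printing everything
nrow(male_rows)
```

The row counts of the filtered data frames should match the tally of the sex variable, which gives a simple way to confirm that the filter did what was intended.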
We will do the same for males and save the result in a new data frame called “males”:
males<-filter(HELPrct, sex=='male')
males
## age anysubstatus anysub cesd d1 daysanysub dayslink drugrisk e2b
## 1 37 1 yes 49 3 177 225 0 NA
## 2 37 1 yes 30 22 2 NA 0 NA
## 3 26 1 yes 39 0 3 365 20 NA
## 4 32 1 yes 39 12 2 57 0 1
## 5 28 1 yes 32 1 47 365 7 8
## 6 39 1 yes 46 4 115 382 20 3
## 7 58 1 yes 22 5 6 365 0 NA
## 8 60 1 yes 36 10 6 22 0 1
## 9 36 1 yes 43 2 0 443 0 NA
## 10 35 1 yes 19 1 2 405 0 NA
## 11 29 0 no 40 2 220 449 0 1
## 12 27 1 yes 37 1 52 367 0 NA
## 13 41 NA <NA> 35 1 NA 391 12 1
## 14 33 1 yes 18 1 129 272 0 NA
## 15 34 NA <NA> 36 4 NA 293 0 2
## 16 31 1 yes 28 2 3 428 0 3
## 17 34 1 yes 30 1 154 56 0 NA
## 18 35 1 yes 27 0 34 361 1 NA
## 19 34 0 no 24 0 204 365 0 NA
## 20 29 1 yes 47 1 142 79 0 3
## 21 35 0 no 45 2 189 364 0 NA
## 22 43 1 yes 18 10 4 365 0 NA
## 23 37 0 no 11 0 203 203 3 NA
## 24 29 0 no 26 1 193 354 0 NA
## 25 33 1 yes 29 1 10 29 0 NA
## 26 20 1 yes 34 1 177 365 0 NA
## 27 38 0 no 37 2 195 365 0 3
## 28 28 1 yes 23 0 7 365 1 2
## 29 33 1 yes 41 7 14 365 0 3
## 30 40 NA <NA> 21 0 NA 365 1 NA
## 31 43 0 no 16 15 191 414 0 NA
## 32 28 1 yes 36 1 31 414 0 NA
## 33 45 0 no 17 2 174 43 0 2
## 34 42 1 yes 36 2 17 38 7 NA
## 35 30 NA <NA> 19 0 NA 264 0 NA
## 36 36 1 yes 25 2 2 377 0 NA
## 37 44 NA <NA> 36 5 NA 321 19 1
## 38 41 1 yes 27 0 30 NA 0 NA
## 39 30 0 no 44 2 209 26 21 2
## 40 37 1 yes 29 2 111 18 0 NA
## 41 37 1 yes 16 5 137 171 0 NA
## 42 44 1 yes 44 1 4 27 0 NA
## 43 47 1 yes 42 2 3 190 0 4
## 44 38 1 yes 30 5 18 30 0 2
## 45 37 1 yes 25 0 2 365 1 NA
## 46 34 1 yes 26 1 1 365 0 11
## 47 35 1 yes 28 1 36 400 0 1
## 48 36 NA <NA> 33 0 NA 365 0 1
## 49 27 0 no 44 3 252 431 0 1
## 50 36 0 no 29 1 195 195 0 1
## 51 38 NA <NA> 26 4 NA 133 1 NA
## 52 42 1 yes 31 2 103 48 8 3
## 53 43 1 yes 28 10 78 365 0 NA
## 54 28 1 yes 29 3 9 129 0 2
## 55 30 1 yes 32 2 53 NA 3 NA
## 56 42 NA <NA> 30 4 NA 35 0 NA
## 57 22 1 yes 34 7 4 365 0 1
## 58 31 NA <NA> 49 2 NA 439 3 1
## 59 30 0 no 36 0 177 44 0 3
## 60 25 NA <NA> 42 1 NA 365 1 1
## 61 26 1 yes 40 1 4 77 10 NA
## 62 35 1 yes 29 1 47 35 0 1
## 63 53 1 yes 31 3 5 365 0 1
## 64 29 NA <NA> 10 2 NA 143 0 NA
## 65 24 1 yes 32 2 168 115 3 1
## 66 35 1 yes 16 1 20 386 1 3
## 67 32 1 yes 15 0 55 365 0 NA
## 68 47 1 yes 4 2 56 63 1 NA
## 69 26 NA <NA> 30 2 NA 365 0 NA
## 70 45 1 yes 44 2 63 35 14 1
## 71 33 NA <NA> 8 1 NA NA 0 NA
## 72 45 NA <NA> 16 20 NA 365 0 2
## 73 27 1 yes 49 1 222 136 0 NA
## 74 40 1 yes 30 2 9 37 1 NA
## 75 37 1 yes 48 3 16 349 0 NA
## 76 26 1 yes 17 1 59 NA 0 NA
## 77 27 1 yes 39 0 102 365 0 3
## 78 29 NA <NA> 24 0 NA NA 10 2
## 79 33 NA <NA> 25 2 NA 60 0 NA
## 80 39 1 yes 51 3 2 365 0 5
## 81 33 1 yes 17 3 3 365 7 NA
## 82 35 1 yes 37 20 63 399 0 NA
## 83 38 NA <NA> 45 0 NA NA 0 1
## 84 44 1 yes 28 1 47 112 17 1
## 85 28 NA <NA> 17 3 NA 365 0 NA
## 86 33 NA <NA> 23 0 NA NA 0 NA
## 87 35 1 yes 38 2 114 365 0 4
## 88 37 0 no 47 0 183 169 0 NA
## 89 41 NA <NA> 49 4 NA 365 0 1
## 90 28 1 yes 34 5 0 325 17 2
## 91 35 1 yes 33 2 2 345 0 14
## 92 41 1 yes 28 1 17 104 0 NA
## 93 37 0 no 4 2 183 36 0 NA
## 94 39 1 yes 40 3 11 365 0 4
## 95 32 NA <NA> 33 2 NA NA 0 NA
## 96 33 NA <NA> 28 1 NA 90 0 2
## 97 27 1 yes 21 0 163 169 0 NA
## 98 33 1 yes 33 0 7 399 1 NA
## 99 43 1 yes 45 6 4 358 0 8
## 100 35 1 yes 31 10 185 387 0 1
## 101 49 1 yes 22 5 1 126 0 4
## 102 33 NA <NA> 39 1 NA 365 1 1
## 103 24 0 no 31 0 183 52 9 1
## 104 45 0 no 48 2 185 50 0 7
## 105 46 NA <NA> 34 20 NA NA 0 NA
## 106 32 0 no 46 2 183 42 0 NA
## 107 45 NA <NA> 34 1 NA 303 11 2
## 108 39 0 no 10 0 186 30 0 1
## 109 34 1 yes 31 1 146 113 0 NA
## 110 32 NA <NA> 34 2 NA 365 0 3
## 111 32 1 yes 26 2 5 369 0 1
## 112 45 NA <NA> 48 1 NA 98 0 2
## 113 30 NA <NA> 37 1 NA 338 0 NA
## 114 36 1 yes 20 8 57 365 7 1
## 115 25 1 yes 38 3 0 414 8 1
## 116 48 0 no 39 8 178 58 0 NA
## 117 42 0 no 46 1 256 368 0 1
## 118 33 1 yes 17 1 61 364 0 1
## 119 36 NA <NA> 6 1 NA 365 1 NA
## 120 41 NA <NA> 18 4 NA 365 0 NA
## 121 57 NA <NA> 51 10 NA 365 0 NA
## 122 47 NA <NA> 31 2 NA 365 5 NA
## 123 54 1 yes 49 0 0 38 0 4
## 124 55 0 no 43 1 164 31 0 NA
## 125 33 1 yes 45 1 13 330 10 1
## 126 28 NA <NA> 41 3 NA 443 11 2
## 127 37 0 no 29 2 163 29 0 NA
## 128 32 NA <NA> 51 1 NA 365 0 NA
## 129 39 NA <NA> 29 2 NA 14 0 2
## 130 29 NA <NA> 31 1 NA 424 13 1
## 131 33 NA <NA> 38 0 NA NA 0 2
## 132 31 NA <NA> 39 10 NA 17 2 NA
## 133 31 1 yes 23 0 9 15 0 NA
## 134 46 1 yes 44 1 144 14 0 6
## 135 36 1 yes 12 1 11 140 0 NA
## 136 22 1 yes 35 0 1 365 0 4
## 137 33 1 yes 47 2 27 365 0 2
## 138 35 NA <NA> 53 2 NA 365 14 2
## 139 28 NA <NA> 15 1 NA 48 0 NA
## 140 33 NA <NA> 31 2 NA 32 0 2
## 141 49 1 yes 27 2 61 365 0 NA
## 142 34 0 no 31 2 183 30 0 NA
## 143 41 NA <NA> 22 4 NA 365 0 NA
## 144 24 NA <NA> 23 0 NA 365 0 NA
## 145 32 0 no 26 4 192 22 0 3
## 146 39 NA <NA> 40 1 NA 365 0 1
## 147 19 NA <NA> 40 1 NA 63 0 8
## 148 49 1 yes 1 2 166 78 0 NA
## 149 27 NA <NA> 41 4 NA 365 1 4
## 150 22 0 no 16 1 162 357 0 NA
## 151 36 1 yes 33 3 47 12 0 NA
## 152 32 1 yes 4 0 88 50 0 NA
## 153 41 1 yes 40 2 63 22 0 NA
## 154 36 1 yes 39 2 94 7 0 NA
## 155 43 1 yes 32 2 73 70 0 NA
## 156 39 1 yes 51 4 33 331 0 NA
## 157 32 1 yes 40 6 183 76 0 NA
## 158 33 1 yes 22 0 9 183 0 NA
## 159 35 NA <NA> 49 4 NA 43 0 1
## 160 31 1 yes 35 1 32 307 1 3
## 161 25 NA <NA> 43 0 NA 365 0 NA
## 162 48 1 yes 27 1 74 353 0 6
## 163 35 NA <NA> 40 1 NA 37 0 NA
## 164 42 NA <NA> 38 4 NA 349 0 2
## 165 51 1 yes 39 6 4 272 0 4
## 166 32 1 yes 35 6 70 37 0 NA
## 167 41 1 yes 34 2 2 365 0 3
## 168 30 NA <NA> 39 2 NA 442 0 NA
## 169 38 NA <NA> 58 8 NA 452 0 1
## 170 41 NA <NA> 38 2 NA 24 2 NA
## 171 29 NA <NA> 46 2 NA 336 0 3
## 172 36 NA <NA> 31 10 NA 365 0 1
## 173 45 NA <NA> 11 0 NA 379 0 NA
## 174 36 NA <NA> 32 2 NA 434 10 NA
## 175 30 1 yes 33 1 59 12 0 NA
## 176 40 1 yes 39 1 16 294 0 NA
## 177 39 NA <NA> 27 1 NA 21 0 NA
## 178 39 0 no 43 4 170 350 0 2
## 179 37 1 yes 30 1 2 440 0 5
## 180 43 1 yes 12 4 11 236 0 4
## 181 20 1 yes 42 1 20 365 0 NA
## 182 35 1 yes 31 2 32 35 5 17
## 183 32 NA <NA> 40 6 NA 29 11 2
## 184 42 0 no 17 0 188 456 0 NA
## 185 27 NA <NA> 44 0 NA 279 0 NA
## 186 30 NA <NA> 15 2 NA 365 0 NA
## 187 27 NA <NA> 41 0 NA 365 8 3
## 188 41 NA <NA> 51 3 NA 349 0 NA
## 189 32 1 yes 24 20 7 46 6 1
## 190 47 1 yes 29 1 31 368 0 1
## 191 36 NA <NA> 40 2 NA 365 0 2
## 192 32 1 yes 33 2 2 365 0 1
## 193 29 NA <NA> 46 0 NA 79 8 NA
## 194 34 1 yes 42 0 52 365 1 2
## 195 40 NA <NA> 17 2 NA 365 0 2
## 196 45 NA <NA> 22 3 NA 365 7 21
## 197 32 NA <NA> 11 2 NA 17 0 NA
## 198 31 1 yes 14 0 2 365 0 1
## 199 39 1 yes 26 0 94 425 0 NA
## 200 49 1 yes 36 1 94 365 0 NA
## 201 43 NA <NA> 18 0 NA 365 10 NA
## 202 38 NA <NA> 19 1 NA 365 0 NA
## 203 23 1 yes 44 20 45 207 0 NA
## 204 29 NA <NA> 19 1 NA 318 0 NA
## 205 43 1 yes 9 2 0 365 0 2
## 206 29 NA <NA> 55 0 NA 365 0 NA
## 207 39 1 yes 18 0 16 358 0 2
## 208 35 NA <NA> 12 1 NA 441 0 2
## 209 22 1 yes 33 2 3 30 0 NA
## 210 39 1 yes 32 1 132 41 0 NA
## 211 38 1 yes 20 1 NA 285 0 2
## 212 56 1 yes 37 36 0 412 3 11
## 213 40 NA <NA> 43 1 NA 15 17 2
## 214 39 NA <NA> 27 5 NA 293 8 4
## 215 47 1 yes 40 2 3 365 0 NA
## 216 32 1 yes 8 3 30 373 0 1
## 217 41 1 yes 54 3 1 356 4 NA
## 218 32 0 no 35 1 191 21 0 NA
## 219 41 0 no 50 2 174 17 1 1
## 220 31 1 yes 55 5 65 365 0 1
## 221 30 1 yes 37 6 8 303 16 1
## 222 32 1 yes 20 1 93 449 0 1
## 223 35 NA <NA> 40 1 NA 77 0 NA
## 224 32 NA <NA> 37 1 NA 35 0 3
## 225 33 NA <NA> 43 0 NA 365 0 2
## 226 30 1 yes 8 8 5 32 1 NA
## 227 44 NA <NA> 56 3 NA 365 0 2
## 228 46 1 yes 51 0 62 365 0 2
## 229 47 NA <NA> 36 4 NA 365 13 5
## 230 34 1 yes 49 0 93 32 0 NA
## 231 40 1 yes 53 2 1 393 0 7
## 232 34 1 yes 15 15 5 NA 0 NA
## 233 37 1 yes 6 5 1 364 1 NA
## 234 27 NA <NA> 31 1 NA 31 0 1
## 235 39 0 no 40 0 178 9 4 NA
## 236 23 1 yes 37 1 0 359 20 4
## 237 53 0 no 40 2 175 80 19 2
## 238 31 NA <NA> 41 1 NA 365 0 NA
## 239 32 1 yes 39 0 15 14 0 1
## 240 33 0 no 38 1 219 398 0 1
## 241 25 1 yes 38 0 1 40 1 1
## 242 37 NA <NA> 9 1 NA 40 0 NA
## 243 26 1 yes 36 0 18 74 0 NA
## 244 29 NA <NA> 27 0 NA 308 5 2
## 245 30 0 no 26 1 215 7 0 NA
## 246 33 NA <NA> 24 1 NA 300 0 NA
## 247 36 1 yes 16 1 125 361 0 1
## 248 23 NA <NA> 34 3 NA 393 0 1
## 249 36 1 yes 46 8 5 9 0 5
## 250 34 1 yes 24 1 2 350 2 1
## 251 28 1 yes 25 2 1 365 0 2
## 252 30 1 yes 31 0 15 6 0 NA
## 253 41 NA <NA> 37 1 NA 19 0 NA
## 254 31 NA <NA> 28 1 NA 123 1 4
## 255 28 NA <NA> 27 0 NA 44 0 NA
## 256 59 NA <NA> 6 2 NA 365 0 NA
## 257 39 1 yes 21 0 31 363 0 NA
## 258 36 NA <NA> 29 0 NA 33 0 NA
## 259 47 1 yes 23 1 32 152 0 NA
## 260 26 NA <NA> 35 0 NA 365 0 NA
## 261 22 1 yes 55 0 10 338 11 2
## 262 36 NA <NA> 3 0 NA 365 0 NA
## 263 34 NA <NA> 36 1 NA 365 2 6
## 264 27 NA <NA> 40 1 NA 365 3 2
## 265 21 NA <NA> 28 3 NA 331 0 1
## 266 33 NA <NA> 21 0 NA 309 0 NA
## 267 42 1 yes 34 5 3 289 11 1
## 268 46 NA <NA> 42 2 NA 306 0 2
## 269 26 1 yes 23 4 106 410 0 NA
## 270 36 1 yes 36 3 3 362 0 NA
## 271 48 0 no 30 2 191 16 0 1
## 272 32 NA <NA> 25 5 NA 340 10 NA
## 273 38 NA <NA> 35 7 NA 365 0 1
## 274 43 1 yes 23 2 61 11 0 2
## 275 30 1 yes 16 0 30 365 0 NA
## 276 40 0 no 27 1 176 41 0 NA
## 277 38 NA <NA> 14 0 NA 292 1 NA
## 278 22 0 no 44 1 260 376 NA 5
## 279 22 NA <NA> 48 2 NA 8 2 3
## 280 37 0 no 11 1 210 370 0 2
## 281 44 1 yes 41 3 0 365 0 1
## 282 38 0 no 37 1 165 166 1 NA
## 283 37 1 yes 31 2 2 89 0 3
## 284 43 1 yes 34 4 2 418 5 NA
## 285 39 1 yes 40 8 0 247 0 3
## 286 45 1 yes 37 2 2 322 3 NA
## 287 39 0 no 30 8 154 265 0 NA
## 288 32 1 yes 51 0 5 NA 6 3
## 289 47 1 yes 12 1 NA 345 0 NA
## 290 24 1 yes 39 2 32 365 0 3
## 291 27 1 yes 10 1 2 20 0 NA
## 292 53 NA <NA> 57 4 NA 365 0 NA
## 293 39 NA <NA> 17 1 NA 34 0 4
## 294 32 NA <NA> 20 4 NA 365 1 NA
## 295 27 NA <NA> 49 2 NA 365 0 1
## 296 31 NA <NA> 26 1 NA 365 0 NA
## 297 41 NA <NA> 28 3 NA 365 0 NA
## 298 28 NA <NA> 18 17 NA 85 0 NA
## 299 39 NA <NA> 39 8 NA 365 0 NA
## 300 39 NA <NA> 51 0 NA 365 12 3
## 301 31 NA <NA> 45 5 NA 365 5 NA
## 302 29 NA <NA> 28 2 NA 118 2 1
## 303 25 NA <NA> 31 7 NA 68 0 NA
## 304 41 NA <NA> 21 5 NA 365 0 NA
## 305 27 NA <NA> 41 3 NA 365 0 1
## 306 21 NA <NA> 38 1 NA 44 14 4
## 307 27 NA <NA> 36 5 NA NA 0 NA
## 308 31 NA <NA> 24 1 NA 365 0 1
## 309 41 NA <NA> 10 0 NA 365 0 NA
## 310 33 NA <NA> 41 1 NA 365 0 NA
## 311 49 NA <NA> 51 1 NA 365 8 3
## 312 41 NA <NA> 45 4 NA 365 1 1
## 313 25 NA <NA> 29 0 NA 44 6 NA
## 314 41 NA <NA> 56 4 NA 10 0 NA
## 315 34 NA <NA> 34 1 NA 87 0 2
## 316 29 NA <NA> 4 0 NA 365 0 NA
## 317 28 NA <NA> 32 0 NA 365 0 NA
## 318 29 NA <NA> 38 2 NA NA 9 1
## 319 36 NA <NA> 26 0 NA 115 0 5
## 320 36 NA <NA> 27 0 NA 365 0 NA
## 321 24 NA <NA> 21 4 NA 365 0 NA
## 322 38 NA <NA> 30 2 NA 6 0 2
## 323 31 NA <NA> 7 1 NA 365 0 NA
## 324 26 NA <NA> 35 1 NA 365 0 NA
## 325 26 NA <NA> 36 4 NA 365 1 NA
## 326 33 NA <NA> 15 0 NA 365 0 NA
## 327 46 NA <NA> 48 100 NA 365 0 NA
## 328 33 NA <NA> 31 0 NA 365 0 1
## 329 39 NA <NA> 54 6 NA 64 0 NA
## 330 27 NA <NA> 21 1 NA 365 9 1
## 331 23 NA <NA> 23 0 NA 365 5 2
## 332 33 NA <NA> 33 2 NA 365 11 1
## 333 26 NA <NA> 14 0 NA 365 0 NA
## 334 38 NA <NA> 27 10 NA 365 0 NA
## 335 52 NA <NA> 24 1 NA 365 0 1
## 336 39 NA <NA> 33 2 NA 365 3 1
## 337 36 NA <NA> 25 1 NA 2 1 NA
## 338 44 NA <NA> 37 0 NA NA 0 2
## 339 37 NA <NA> 47 2 NA 4 21 NA
## 340 31 NA <NA> 9 1 NA 365 0 NA
## 341 25 NA <NA> 37 3 NA 365 0 3
## 342 24 NA <NA> 34 0 NA 365 13 2
## 343 33 NA <NA> 28 1 NA 365 0 1
## 344 49 NA <NA> 37 0 NA 7 0 NA
## 345 59 NA <NA> 11 2 NA 365 0 1
## 346 45 NA <NA> 35 1 NA 365 0 1
## female sex g1b homeless i1 i2 id indtot linkstatus link mcs
## 1 0 male yes housed 13 26 1 39 1 yes 25.111990
## 2 0 male yes homeless 56 62 2 43 NA <NA> 26.670307
## 3 0 male no housed 0 0 3 41 0 no 6.762923
## 4 0 male no homeless 10 13 5 38 1 yes 21.675755
## 5 0 male yes homeless 12 24 8 44 0 no 9.160530
## 6 0 male no homeless 20 27 10 44 0 no 36.143761
## 7 0 male no homeless 20 31 14 40 0 no 49.089302
## 8 0 male no homeless 13 20 15 41 1 yes 25.846157
## 9 0 male no housed 51 51 16 38 0 no 23.608444
## 10 0 male no housed 0 0 18 17 0 no 42.166462
## 11 0 male yes homeless 1 1 19 40 0 no 16.732292
## 12 0 male no housed 23 23 21 37 0 no 55.128109
## 13 0 male no housed 26 26 22 36 0 no 20.871447
## 14 0 male no housed 0 0 23 27 1 yes 47.286739
## 15 0 male yes homeless 34 34 24 42 0 no 19.620596
## 16 0 male no homeless 4 5 25 42 0 no 44.442104
## 17 0 male no housed 3 3 28 34 1 yes 37.371555
## 18 0 male no homeless 7 7 30 37 0 no 34.335667
## 19 0 male yes housed 24 48 31 41 0 no 46.340755
## 20 0 male no homeless 0 0 32 37 1 yes 27.717710
## 21 0 male no homeless 20 20 33 44 0 no 18.984324
## 22 0 male no homeless 3 3 34 41 0 no 58.241264
## 23 0 male no homeless 6 6 35 35 1 yes 27.852608
## 24 0 male no housed 0 0 36 21 0 no 54.774349
## 25 0 male no housed 0 0 37 30 1 yes 27.495481
## 26 0 male no homeless 32 135 38 33 0 no 56.324333
## 27 0 male no housed 2 24 39 43 0 no 37.006042
## 28 0 male no housed 3 3 40 41 0 no 39.897774
## 29 0 male yes homeless 27 27 42 41 0 no 18.640594
## 30 0 male no housed 3 7 43 32 0 no 45.134098
## 31 0 male no homeless 24 36 44 41 0 no 15.861924
## 32 0 male no homeless 6 12 45 39 0 no 24.148815
## 33 0 male no homeless 0 0 46 22 1 yes 29.901625
## 34 0 male no housed 13 13 47 39 1 yes 29.412977
## 35 0 male no homeless 25 28 49 38 1 yes 35.206970
## 36 0 male no housed 13 61 51 36 0 no 20.999893
## 37 0 male yes homeless 15 26 52 42 0 no 29.390280
## 38 0 male yes housed 7 7 53 31 NA <NA> 26.773279
## 39 0 male yes homeless 9 15 54 44 1 yes 17.925251
## 40 0 male no homeless 5 13 56 40 1 yes 34.434696
## 41 0 male yes housed 34 34 58 29 1 yes 47.671936
## 42 0 male yes housed 3 6 59 44 1 yes 26.653036
## 43 0 male yes homeless 37 43 60 43 1 yes 28.469273
## 44 0 male no homeless 36 36 61 38 1 yes 26.065777
## 45 0 male yes housed 13 15 62 34 0 no 31.501711
## 46 0 male no housed 3 19 63 41 0 no 24.998930
## 47 0 male no housed 32 32 67 38 0 no 35.839642
## 48 0 male no housed 35 42 68 42 0 no 17.565235
## 49 0 male yes homeless 20 20 69 41 0 no 20.025341
## 50 0 male no homeless 7 25 70 38 1 yes 25.812592
## 51 0 male no housed 0 0 72 38 1 yes 39.934162
## 52 0 male no homeless 26 51 73 44 1 yes 23.996725
## 53 0 male no housed 18 36 76 38 0 no 38.752102
## 54 0 male no housed 6 12 78 29 1 yes 34.839962
## 55 0 male no housed 13 17 80 35 NA <NA> 22.957235
## 56 0 male yes homeless 5 5 81 28 1 yes 28.418003
## 57 0 male no homeless 2 2 82 31 0 no 33.115913
## 58 0 male no homeless 102 102 83 40 0 no 14.913925
## 59 0 male yes homeless 0 0 84 44 1 yes 17.449858
## 60 0 male yes housed 21 21 85 36 0 no 13.134663
## 61 0 male yes homeless 6 8 86 29 1 yes 19.344807
## 62 0 male no housed 1 1 87 42 1 yes 26.221968
## 63 0 male no homeless 19 19 88 40 0 no 34.210976
## 64 0 male no housed 1 22 89 29 1 yes 52.926834
## 65 0 male no homeless 0 0 91 39 1 yes 26.918222
## 66 0 male no housed 26 47 93 39 0 no 39.298168
## 67 0 male no housed 0 0 94 35 0 no 47.550678
## 68 0 male no homeless 9 19 95 38 1 yes 54.053368
## 69 0 male no housed 10 10 96 40 0 no 37.845036
## 70 0 male yes homeless 4 5 97 44 1 yes 20.202173
## 71 0 male no housed 6 15 98 19 NA <NA> 51.788670
## 72 0 male yes homeless 26 51 99 43 0 no 32.566528
## 73 0 male yes homeless 26 26 102 34 1 yes 16.302422
## 74 0 male yes housed 2 3 103 42 1 yes 15.754984
## 75 0 male yes housed 61 184 105 40 0 no 23.659925
## 76 0 male yes housed 2 2 106 39 NA <NA> 34.737865
## 77 0 male no homeless 19 19 107 40 0 no 15.618371
## 78 0 male no housed 0 0 109 38 NA <NA> 40.941338
## 79 0 male yes housed 18 47 110 41 1 yes 24.330456
## 80 0 male yes homeless 51 51 111 42 0 no 15.196477
## 81 0 male no housed 0 0 112 37 0 no 50.788845
## 82 0 male no homeless 36 66 113 43 0 no 23.554617
## 83 0 male no housed 31 91 114 38 NA <NA> 15.822761
## 84 0 male no housed 0 0 115 33 1 yes 45.402626
## 85 0 male no housed 26 69 116 34 0 no 53.616177
## 86 0 male no housed 2 20 117 28 NA <NA> 59.264427
## 87 0 male yes homeless 51 51 119 43 0 no 12.432887
## 88 0 male no housed 19 26 122 42 1 yes 21.912630
## 89 0 male no homeless 13 13 123 33 0 no 28.972683
## 90 0 male no homeless 0 0 124 36 0 no 16.284695
## 91 0 male no homeless 13 13 126 19 0 no 41.590557
## 92 0 male no housed 22 22 128 25 1 yes 39.450993
## 93 0 male no homeless 13 33 129 42 1 yes 42.539974
## 94 0 male yes homeless 19 30 132 39 0 no 22.669971
## 95 0 male yes homeless 26 26 133 41 NA <NA> 45.529411
## 96 0 male no homeless 3 3 135 40 1 yes 23.729639
## 97 0 male no housed 24 24 136 40 1 yes 40.676174
## 98 0 male no housed 0 0 137 29 0 no 28.075939
## 99 0 male yes homeless 53 53 140 39 0 no 21.460621
## 100 0 male no homeless 25 25 142 38 0 no 33.652927
## 101 0 male yes homeless 64 179 144 42 1 yes 45.491100
## 102 0 male yes homeless 4 4 148 42 0 no 23.371147
## 103 0 male yes homeless 3 6 149 37 1 yes 34.598862
## 104 0 male no homeless 13 13 151 42 1 yes 29.082914
## 105 0 male no housed 20 51 152 37 NA <NA> 24.422007
## 106 0 male no homeless 38 38 154 43 1 yes 18.690155
## 107 0 male no homeless 8 8 156 40 0 no 27.683458
## 108 0 male no homeless 0 0 158 34 1 yes 47.145802
## 109 0 male no housed 13 13 160 43 1 yes 33.517311
## 110 0 male no homeless 39 39 163 30 0 no 41.131794
## 111 0 male no housed 12 20 164 44 0 no 24.090509
## 112 0 male no housed 0 0 167 37 1 yes 20.069775
## 113 0 male no housed 1 1 168 29 0 no 18.211269
## 114 0 male no housed 19 32 169 43 0 no 30.071957
## 115 0 male no housed 0 0 170 30 0 no 28.679745
## 116 0 male yes housed 26 51 172 37 1 yes 20.517740
## 117 0 male no housed 19 19 173 29 0 no 31.188143
## 118 0 male no homeless 3 6 174 41 0 no 43.881058
## 119 0 male no housed 1 1 177 35 0 no 56.784805
## 120 0 male no housed 12 17 178 41 0 no 39.074711
## 121 0 male no homeless 38 38 180 42 0 no 21.200043
## 122 0 male no housed 4 4 182 38 0 no 10.564762
## 123 0 male yes homeless 19 50 183 41 1 yes 22.640652
## 124 0 male no housed 41 54 185 40 1 yes 39.270416
## 125 0 male no housed 1 3 186 36 0 no 18.771036
## 126 0 male no homeless 19 19 189 42 0 no 21.049545
## 127 0 male no housed 8 8 190 34 1 yes 50.018494
## 128 0 male no housed 12 12 192 34 0 no 7.938221
## 129 0 male no homeless 12 20 198 36 1 yes 41.054363
## 130 0 male yes homeless 1 3 199 36 0 no 29.860514
## 131 0 male no homeless 10 13 201 44 NA <NA> 26.252979
## 132 0 male yes housed 3 24 202 41 1 yes 40.167236
## 133 0 male yes housed 6 12 206 32 1 yes 25.615507
## 134 0 male no homeless 102 102 208 38 1 yes 14.358881
## 135 0 male no housed 1 4 209 39 1 yes 27.122667
## 136 0 male no housed 0 0 210 29 0 no 36.823708
## 137 0 male yes housed 58 58 211 41 0 no 17.509274
## 138 0 male no housed 9 9 212 37 0 no 17.927528
## 139 0 male no housed 35 65 214 43 1 yes 47.711655
## 140 0 male no housed 33 51 215 42 1 yes 20.731987
## 141 0 male no housed 19 19 217 28 0 no 52.455845
## 142 0 male no housed 0 0 222 38 1 yes 23.058514
## 143 0 male no housed 6 6 223 40 0 no 45.011848
## 144 0 male no housed 18 18 225 36 0 no 48.410297
## 145 0 male yes homeless 0 0 230 41 1 yes 46.119808
## 146 0 male no housed 46 46 231 32 0 no 35.955441
## 147 0 male no homeless 27 30 232 40 1 yes 30.300137
## 148 0 male no homeless 3 3 233 40 1 yes 59.453930
## 149 0 male yes homeless 12 12 235 42 0 no 23.546112
## 150 0 male no homeless 26 26 238 29 0 no 46.729744
## 151 0 male no housed 23 92 239 40 1 yes 37.674961
## 152 0 male no housed 13 13 240 34 1 yes 57.260887
## 153 0 male yes homeless 26 26 243 43 1 yes 35.235611
## 154 0 male no homeless 13 13 245 35 1 yes 48.239128
## 155 0 male no homeless 13 13 246 35 1 yes 30.371395
## 156 0 male yes homeless 23 42 248 42 1 yes 22.884369
## 157 0 male no housed 15 15 250 34 1 yes 30.280018
## 158 0 male no housed 19 20 253 30 1 yes 47.979435
## 159 0 male no housed 2 3 256 39 1 yes 25.039495
## 160 0 male yes homeless 13 26 257 45 0 no 26.453758
## 161 0 male no housed 14 16 258 43 0 no 14.480626
## 162 0 male no homeless 51 51 259 36 0 no 52.789551
## 163 0 male no homeless 10 26 260 37 1 yes 35.576111
## 164 0 male no homeless 16 16 261 42 0 no 26.799009
## 165 0 male yes homeless 102 102 262 44 1 yes 27.808109
## 166 0 male yes housed 6 20 265 33 1 yes 27.650967
## 167 0 male no homeless 27 27 268 42 0 no 27.177586
## 168 0 male no homeless 27 41 270 33 1 yes 31.328341
## 169 0 male yes housed 54 73 273 45 0 no 16.125675
## 170 0 male yes housed 24 36 274 40 1 yes 17.625854
## 171 0 male no homeless 30 41 276 42 0 no 27.898603
## 172 0 male no homeless 43 43 277 39 0 no 23.683241
## 173 0 male no housed 2 2 278 21 0 no 58.168713
## 174 0 male no housed 16 16 279 37 0 no 31.777193
## 175 0 male no housed 3 3 280 4 1 yes 52.955296
## 176 0 male no housed 34 51 283 36 1 yes 24.813925
## 177 0 male no housed 28 28 285 42 1 yes 46.830055
## 178 0 male no housed 13 13 287 44 0 no 16.398746
## 179 0 male no housed 51 51 288 38 0 no 36.798199
## 180 0 male no homeless 134 140 289 42 1 yes 55.991005
## 181 0 male yes homeless 5 6 290 28 0 no 41.624405
## 182 0 male no homeless 5 5 291 40 1 yes 19.645632
## 183 0 male yes housed 3 3 292 44 1 yes 26.919926
## 184 0 male no housed 0 0 293 37 0 no 37.953053
## 185 0 male yes housed 26 26 294 32 0 no 31.877844
## 186 0 male no housed 15 30 295 30 0 no 54.970051
## 187 0 male yes homeless 9 20 296 39 0 no 30.701992
## 188 0 male no housed 10 15 297 41 0 no 27.607288
## 189 0 male yes housed 0 0 298 31 1 yes 29.505835
## 190 0 male yes housed 24 45 299 39 0 no 21.931257
## 191 0 male yes homeless 33 51 300 40 0 no 20.979116
## 192 0 male no housed 0 0 302 32 0 no 28.558788
## 193 0 male yes housed 0 0 307 39 1 yes 11.819070
## 194 0 male no homeless 3 3 309 40 0 no 25.548498
## 195 0 male no homeless 14 20 310 39 0 no 34.139271
## 196 0 male no homeless 12 12 311 38 0 no 29.400602
## 197 0 male no housed 0 0 315 27 1 yes 56.963795
## 198 0 male no housed 0 0 317 29 0 no 41.195469
## 199 0 male no housed 25 33 318 39 0 no 36.719200
## 200 0 male yes housed 42 57 319 40 0 no 48.008137
## 201 0 male no housed 6 6 322 32 0 no 58.477470
## 202 0 male no housed 19 19 323 38 0 no 62.031616
## 203 0 male yes homeless 0 0 326 37 1 yes 24.378925
## 204 0 male no homeless 22 32 328 31 0 no 18.677704
## 205 0 male no homeless 19 19 329 19 0 no 58.899960
## 206 0 male yes homeless 13 19 331 41 0 no 15.773271
## 207 0 male yes housed 1 1 332 34 0 no 34.541599
## 208 0 male no homeless 13 13 334 40 0 no 51.918278
## 209 0 male no housed 20 20 335 37 1 yes 23.137871
## 210 0 male no housed 0 0 336 39 1 yes 22.939909
## 211 0 male no housed 3 9 337 26 0 no 33.888065
## 212 0 male no homeless 142 142 338 37 0 no 34.412716
## 213 0 male yes homeless 64 64 341 32 1 yes 22.354912
## 214 0 male no housed 2 2 343 42 0 no 19.718121
## 215 0 male no homeless 51 51 346 43 0 no 28.747435
## 216 0 male no housed 1 1 347 12 0 no 55.912579
## 217 0 male no homeless 24 30 348 44 0 no 18.948950
## 218 0 male no housed 35 35 350 40 1 yes 38.851971
## 219 0 male no homeless 0 0 352 41 1 yes 31.739616
## 220 0 male yes housed 13 26 353 38 0 no 17.837486
## 221 0 male yes homeless 12 12 355 41 0 no 20.911737
## 222 0 male no homeless 7 7 356 37 0 no 32.773659
## 223 0 male yes homeless 26 26 357 40 1 yes 23.771542
## 224 0 male yes homeless 41 56 359 41 1 yes 23.242210
## 225 0 male yes homeless 3 3 360 41 0 no 22.447948
## 226 0 male no housed 18 31 361 31 1 yes 58.851147
## 227 0 male no homeless 38 55 362 43 0 no 27.218351
## 228 0 male no housed 12 15 363 39 0 no 18.287806
## 229 0 male no homeless 4 4 365 40 0 no 37.835770
## 230 0 male no housed 32 32 366 24 1 yes 37.698196
## 231 0 male no homeless 34 102 368 42 0 no 18.615227
## 232 0 male no homeless 38 51 369 29 NA <NA> 47.255920
## 233 0 male no homeless 13 13 371 31 0 no 57.873539
## 234 0 male no housed 49 49 376 42 1 yes 41.010502
## 235 0 male no homeless 18 36 377 35 1 yes 39.963680
## 236 0 male yes housed 0 0 378 37 0 no 21.599306
## 237 0 male yes homeless 2 2 380 43 1 yes 29.332056
## 238 0 male no housed 6 13 381 40 0 no 18.604780
## 239 0 male no homeless 6 13 382 33 1 yes 19.291830
## 240 0 male no housed 10 10 383 37 0 no 31.856297
## 241 0 male no housed 0 0 385 36 1 yes 26.698538
## 242 0 male no homeless 6 20 386 26 1 yes 53.340359
## 243 0 male no homeless 6 6 387 42 1 yes 51.003738
## 244 0 male no housed 0 0 388 35 0 no 28.639238
## 245 0 male no housed 32 32 389 41 1 yes 44.215485
## 246 0 male no housed 3 12 392 36 0 no 57.296200
## 247 0 male no homeless 6 6 394 42 0 no 30.918043
## 248 0 male no homeless 0 0 395 33 0 no 24.849377
## 249 0 male no homeless 25 25 399 38 1 yes 17.863741
## 250 0 male no homeless 13 26 400 41 0 no 48.483433
## 251 0 male no homeless 18 18 401 36 0 no 27.514502
## 252 0 male no housed 2 2 404 39 1 yes 36.029205
## 253 0 male yes homeless 26 38 405 41 1 yes 25.465322
## 254 0 male no homeless 5 25 406 39 1 yes 38.778580
## 255 0 male yes housed 10 23 407 25 1 yes 31.255833
## 256 0 male no homeless 0 0 408 32 0 no 58.750145
## 257 0 male no housed 4 4 409 39 0 no 32.313843
## 258 0 male no housed 29 85 411 31 1 yes 40.056877
## 259 0 male no homeless 20 20 413 40 1 yes 37.504734
## 260 0 male no housed 3 12 415 29 0 no 18.340139
## 261 0 male yes homeless 6 12 416 41 0 no 14.108759
## 262 0 male no homeless 13 13 418 9 0 no 59.930012
## 263 0 male no homeless 36 36 419 39 0 no 26.474701
## 264 0 male no homeless 18 18 420 37 0 no 57.489437
## 265 0 male no homeless 45 45 422 40 0 no 41.324745
## 266 0 male no housed 13 13 423 31 0 no 38.907230
## 267 0 male no homeless 4 10 424 42 0 no 22.673281
## 268 0 male no homeless 6 26 425 42 0 no 30.106504
## 269 0 male no housed 6 6 428 15 0 no 38.276970
## 270 0 male no housed 25 42 430 37 0 no 45.859604
## 271 0 male no homeless 13 13 432 35 1 yes 25.544411
## 272 0 male no housed 37 37 433 30 0 no 22.730097
## 273 0 male no housed 25 25 435 44 0 no 25.445648
## 274 0 male yes homeless 38 38 436 32 1 yes 46.967522
## 275 0 male no housed 12 29 437 32 0 no 47.133209
## 276 0 male no housed 6 24 438 38 1 yes 42.632927
## 277 0 male no homeless 6 6 440 34 0 no 54.851093
## 278 0 male no housed 0 0 441 44 0 no 15.101494
## 279 0 male yes homeless 8 8 443 40 1 yes 19.116766
## 280 0 male no housed 32 32 444 41 0 no 51.843193
## 281 0 male no homeless 51 51 447 30 0 no 32.484653
## 282 0 male no housed 35 35 448 42 1 yes 43.498222
## 283 0 male no homeless 73 73 449 36 1 yes 18.795931
## 284 0 male yes homeless 9 31 452 45 0 no 18.525930
## 285 0 male yes homeless 51 51 457 44 1 yes 25.738285
## 286 0 male no housed 6 8 458 28 0 no 14.891697
## 287 0 male no housed 6 16 459 32 1 yes 41.360710
## 288 0 male no homeless 2 3 464 44 NA <NA> 17.082233
## 289 0 male no housed 1 1 467 31 0 no 43.441059
## 290 0 male no homeless 49 109 468 42 0 no 27.801510
## 291 0 male no housed 19 25 469 35 1 yes 42.457150
## 292 0 male no housed 38 51 13 45 0 no 18.750151
## 293 0 male no homeless 26 40 26 45 1 yes 28.556833
## 294 0 male no homeless 83 145 29 42 0 no 28.602417
## 295 0 male yes housed 32 40 48 43 0 no 15.268264
## 296 0 male no housed 30 101 64 41 0 no 40.633827
## 297 0 male no housed 42 42 130 31 0 no 46.269627
## 298 0 male no housed 18 26 145 36 1 yes 33.659222
## 299 0 male yes homeless 35 105 146 36 0 no 21.645960
## 300 0 male no homeless 20 20 147 41 0 no 23.724752
## 301 0 male yes housed 26 26 159 33 0 no 15.599421
## 302 0 male no homeless 43 54 161 43 1 yes 28.475632
## 303 0 male no housed 1 2 165 35 1 yes 36.594727
## 304 0 male no housed 51 51 175 37 0 no 15.078867
## 305 0 male yes homeless 24 48 176 44 0 no 38.950596
## 306 0 male yes housed 13 13 184 43 1 yes 31.680859
## 307 0 male no housed 20 26 195 41 NA <NA> 19.096197
## 308 0 male no housed 26 26 197 35 0 no 48.442287
## 309 0 male no housed 8 18 205 36 0 no 52.697727
## 310 0 male yes housed 61 61 207 34 0 no 19.919922
## 311 0 male no housed 13 19 216 33 0 no 13.312669
## 312 0 male yes homeless 28 37 218 43 0 no 15.686288
## 313 0 male no housed 6 7 227 32 1 yes 33.820976
## 314 0 male no housed 10 10 234 41 1 yes 11.499865
## 315 0 male no housed 0 0 244 36 1 yes 26.392733
## 316 0 male no housed 4 10 251 19 0 no 52.945427
## 317 0 male no housed 25 37 252 33 0 no 39.972664
## 318 0 male no homeless 2 2 263 40 NA <NA> 23.446474
## 319 0 male no homeless 26 26 266 44 1 yes 42.341843
## 320 0 male no housed 24 24 267 33 0 no 28.061911
## 321 0 male yes homeless 0 0 271 38 0 no 28.073883
## 322 0 male no homeless 13 13 281 38 1 yes 37.116608
## 323 0 male no housed 12 12 282 31 0 no 57.800064
## 324 0 male no housed 12 30 301 41 0 no 12.204219
## 325 0 male no housed 12 18 305 38 0 no 39.038631
## 326 0 male no homeless 3 3 312 36 0 no 37.102394
## 327 0 male no housed 51 69 314 29 0 no 23.898293
## 328 0 male no homeless 5 5 321 29 0 no 46.330513
## 329 0 male no homeless 68 68 330 42 1 yes 13.412563
## 330 0 male no homeless 29 29 340 43 0 no 49.503277
## 331 0 male no housed 5 5 373 38 0 no 33.345051
## 332 0 male no homeless 32 32 390 41 0 no 18.530807
## 333 0 male no housed 0 0 393 14 0 no 54.525818
## 334 0 male no housed 76 78 396 10 0 no 44.171612
## 335 0 male no homeless 26 26 397 32 0 no 47.779892
## 336 0 male no homeless 41 62 398 39 0 no 21.271496
## 337 0 male no homeless 18 18 410 43 1 yes 39.929405
## 338 0 male no homeless 22 30 412 31 NA <NA> 25.632202
## 339 0 male no housed 53 63 417 43 1 yes 23.716438
## 340 0 male no homeless 4 13 434 34 0 no 52.792542
## 341 0 male no housed 3 3 439 30 0 no 28.609346
## 342 0 male no homeless 0 0 453 36 0 no 25.851772
## 343 0 male no housed 0 0 454 38 0 no 41.943066
## 344 0 male no housed 13 20 455 39 1 yes 62.175503
## 345 0 male no homeless 13 13 462 26 0 no 54.424816
## 346 0 male no homeless 51 51 463 43 0 no 30.212227
## pcs pss_fr racegrp satreat sexrisk substance treat avg_drinks
## 1 58.41369 0 black no 4 cocaine yes 13
## 2 36.03694 1 white no 7 alcohol yes 56
## 3 74.80633 13 black no 2 heroin no 0
## 4 37.34558 10 black no 6 cocaine no 10
## 5 65.13801 4 white yes 6 alcohol yes 12
## 6 22.61060 0 white yes 0 heroin yes 20
## 7 39.24264 13 white yes 1 alcohol no 20
## 8 31.82965 1 black no 4 cocaine yes 13
## 9 55.16998 1 white no 8 alcohol yes 51
## 10 56.43837 9 black no 4 heroin no 0
## 11 58.29807 1 other no 4 cocaine no 1
## 12 34.33926 11 black yes 7 cocaine no 23
## 13 36.58481 8 black no 4 heroin yes 26
## 14 61.64098 14 black no 4 cocaine yes 0
## 15 46.22176 10 white no 6 alcohol no 34
## 16 51.56324 6 black yes 9 cocaine no 4
## 17 63.06006 3 white no 5 cocaine yes 3
## 18 61.82597 6 black no 4 heroin yes 7
## 19 43.53374 4 white no 5 alcohol no 24
## 20 42.22490 5 black yes 2 cocaine yes 0
## 21 42.40059 3 black no 6 alcohol no 20
## 22 50.14700 12 black no 0 alcohol no 3
## 23 63.52000 2 black yes 5 cocaine yes 6
## 24 53.35109 10 black no 2 cocaine yes 0
## 25 56.73985 10 black no 0 cocaine yes 0
## 26 53.23396 8 black no 3 alcohol yes 32
## 27 62.04113 6 white no 4 alcohol no 2
## 28 38.39529 11 black no 4 heroin no 3
## 29 51.30330 1 white yes 0 alcohol no 27
## 30 56.68389 10 hispanic no 4 heroin no 3
## 31 71.39259 3 white no 7 cocaine yes 24
## 32 52.61977 4 black no 7 cocaine yes 6
## 33 36.04588 7 black no 6 cocaine no 0
## 34 50.06427 14 white no 4 heroin yes 13
## 35 62.03183 10 black no 5 alcohol no 25
## 36 56.38669 12 black no 1 alcohol no 13
## 37 40.38438 11 black no 10 heroin yes 15
## 38 58.16169 6 black no 6 cocaine no 7
## 39 45.48341 6 other no 9 heroin yes 9
## 40 63.05807 2 black no 7 alcohol yes 5
## 41 29.45625 8 white no 3 alcohol no 34
## 42 40.46056 13 other no 4 cocaine yes 3
## 43 57.20213 1 white yes 2 alcohol no 37
## 44 47.60514 10 black no 4 alcohol yes 36
## 45 50.16318 7 black yes 6 heroin no 13
## 46 50.39870 6 black yes 7 cocaine no 3
## 47 52.68871 12 black yes 6 cocaine yes 32
## 48 67.53625 11 black yes 4 alcohol no 35
## 49 36.98058 5 white no 6 alcohol no 20
## 50 64.29022 5 black yes 9 alcohol no 7
## 51 53.15686 8 white yes 2 heroin yes 0
## 52 45.18499 3 white yes 6 alcohol yes 26
## 53 27.36663 4 black no 5 cocaine yes 18
## 54 58.25895 5 white no 8 cocaine no 6
## 55 63.91367 10 white no 12 cocaine no 13
## 56 56.90441 2 black no 4 cocaine yes 5
## 57 48.79136 4 black no 9 heroin no 2
## 58 52.59380 9 black yes 6 cocaine no 102
## 59 68.12395 7 white yes 6 alcohol yes 0
## 60 57.07777 1 hispanic no 3 heroin no 21
## 61 42.62894 12 white no 11 heroin yes 6
## 62 59.56708 1 black no 7 cocaine yes 1
## 63 44.16995 10 black no 4 alcohol no 19
## 64 58.21477 13 black no 4 alcohol yes 1
## 65 59.82454 9 other no 5 alcohol yes 0
## 66 38.46090 8 white yes 2 alcohol no 26
## 67 37.18519 3 black no 1 heroin yes 0
## 68 56.50476 12 white no 1 heroin yes 9
## 69 57.33492 14 black no 4 cocaine no 10
## 70 28.85472 4 hispanic no 5 heroin yes 4
## 71 60.58733 10 black no 5 cocaine no 6
## 72 30.05406 9 white no 3 alcohol no 26
## 73 55.98083 10 black yes 4 cocaine yes 26
## 74 48.05733 9 black no 10 cocaine yes 2
## 75 30.23405 6 black no 3 cocaine yes 61
## 76 65.74425 5 black no 2 cocaine no 2
## 77 55.50122 9 black yes 7 cocaine no 19
## 78 63.61380 1 white yes 4 heroin no 0
## 79 46.41464 4 white no 6 alcohol yes 18
## 80 54.13217 8 white no 0 alcohol yes 51
## 81 46.75063 4 white yes 5 heroin no 0
## 82 40.18310 9 black no 6 alcohol no 36
## 83 63.48228 6 black no 3 alcohol no 31
## 84 43.62142 5 white no 4 heroin yes 0
## 85 57.95000 7 black no 4 alcohol yes 26
## 86 54.44389 2 black no 5 cocaine no 2
## 87 48.89978 4 white yes 13 alcohol no 51
## 88 43.00148 3 black no 9 cocaine yes 19
## 89 59.74108 6 white yes 4 alcohol no 13
## 90 48.89844 12 white yes 4 heroin yes 0
## 91 40.88239 3 other no 9 cocaine no 13
## 92 28.93009 7 black no 3 alcohol yes 22
## 93 60.92048 4 black no 7 cocaine yes 13
## 94 35.39379 5 white yes 3 alcohol no 19
## 95 57.32318 2 white no 5 alcohol no 26
## 96 45.54259 8 black yes 7 cocaine yes 3
## 97 59.10600 9 black no 7 cocaine yes 24
## 98 42.01285 11 black no 4 heroin yes 0
## 99 45.01618 7 white no 0 alcohol no 53
## 100 48.87681 1 black no 7 alcohol no 25
## 101 38.13606 5 black no 6 alcohol yes 64
## 102 29.47202 12 black no 4 heroin no 4
## 103 50.21533 9 white yes 5 heroin yes 3
## 104 36.24839 8 white no 7 alcohol no 13
## 105 45.56750 7 white no 5 alcohol no 20
## 106 59.47648 7 other no 5 cocaine yes 38
## 107 31.97959 6 black yes 6 heroin no 8
## 108 53.66537 3 black yes 6 cocaine yes 0
## 109 29.78529 3 black no 2 cocaine no 13
## 110 24.43518 8 white no 5 alcohol yes 39
## 111 53.75950 10 black no 7 cocaine no 12
## 112 50.23810 11 white no 4 heroin yes 0
## 113 56.00507 11 other yes 3 heroin no 1
## 114 44.92406 9 white yes 1 alcohol no 19
## 115 61.78611 2 white no 6 heroin no 0
## 116 54.35444 8 white no 4 alcohol yes 26
## 117 55.74972 8 black no 7 alcohol yes 19
## 118 61.44474 7 black no 8 cocaine no 3
## 119 56.84005 3 black yes 9 cocaine yes 1
## 120 36.56960 5 white yes 5 alcohol no 12
## 121 32.28706 2 white no 8 alcohol no 38
## 122 52.94168 9 black no 0 heroin no 4
## 123 31.00380 7 white no 0 alcohol yes 19
## 124 26.45694 11 hispanic no 3 alcohol yes 41
## 125 40.46645 2 other no 0 heroin no 1
## 126 45.46138 1 white no 6 heroin yes 19
## 127 54.07817 7 black no 2 cocaine yes 8
## 128 53.61504 10 black no 4 cocaine no 12
## 129 57.70763 14 white no 0 alcohol yes 12
## 130 53.68318 11 white no 1 heroin yes 1
## 131 54.42475 3 hispanic no 7 cocaine no 10
## 132 61.28633 4 black no 8 cocaine yes 3
## 133 66.59317 10 white no 4 alcohol yes 6
## 134 49.27981 2 black yes 7 alcohol yes 102
## 135 58.16642 10 black no 7 cocaine no 1
## 136 31.52861 2 other yes 5 heroin yes 0
## 137 49.36320 12 black no 8 heroin no 58
## 138 43.17081 2 white no 4 heroin yes 9
## 139 57.81969 2 black no 6 cocaine yes 35
## 140 54.82264 5 black no 8 alcohol yes 33
## 141 60.41816 13 black yes 3 alcohol no 19
## 142 54.36913 6 white no 6 cocaine yes 0
## 143 35.79145 10 black no 3 cocaine yes 6
## 144 59.32288 6 hispanic no 6 alcohol no 18
## 145 23.50237 5 black no 3 alcohol yes 0
## 146 56.30513 11 black no 3 alcohol no 46
## 147 41.06454 4 white yes 2 heroin yes 27
## 148 58.16510 14 black no 2 cocaine no 3
## 149 41.57280 7 white no 4 heroin no 12
## 150 54.59662 1 white no 0 alcohol no 26
## 151 47.36353 2 black yes 7 alcohol yes 23
## 152 56.89963 0 black yes 5 alcohol no 13
## 153 48.48331 0 white yes 5 alcohol yes 26
## 154 56.39499 3 black no 2 cocaine yes 13
## 155 47.35083 1 other no 5 cocaine yes 13
## 156 29.11139 5 black no 4 alcohol no 23
## 157 34.58012 12 white no 4 alcohol yes 15
## 158 48.27899 6 black no 4 cocaine yes 19
## 159 63.25544 14 black no 8 cocaine no 2
## 160 46.76894 3 white no 5 heroin no 13
## 161 70.14779 5 white no 5 cocaine yes 14
## 162 50.25876 1 white yes 6 alcohol no 51
## 163 29.49112 3 black no 7 cocaine yes 10
## 164 42.42209 10 white no 0 alcohol no 16
## 165 25.61815 7 white no 1 alcohol yes 102
## 166 53.05504 6 black no 5 cocaine yes 6
## 167 43.00587 6 black no 11 cocaine no 27
## 168 41.78789 1 black no 7 cocaine yes 27
## 169 47.65467 11 white yes 1 alcohol no 54
## 170 44.01194 13 hispanic no 4 heroin yes 24
## 171 43.68238 2 white yes 1 alcohol no 30
## 172 43.55378 9 black no 4 alcohol yes 43
## 173 49.47607 3 black no 8 cocaine no 2
## 174 41.87122 4 black no 0 heroin no 16
## 175 60.10658 12 black no 4 cocaine yes 3
## 176 35.46683 12 black no 5 cocaine no 34
## 177 62.44834 1 black no 1 alcohol yes 28
## 178 42.32603 3 black no 6 cocaine no 13
## 179 57.78556 13 black yes 7 cocaine yes 51
## 180 32.58783 11 black yes 13 alcohol yes 134
## 181 53.04678 6 black yes 2 alcohol no 5
## 182 46.33508 9 white no 5 heroin yes 5
## 183 48.62301 3 white no 4 alcohol no 3
## 184 61.60262 6 black yes 4 cocaine no 0
## 185 51.38743 11 hispanic no 3 cocaine no 26
## 186 33.79744 12 white no 1 alcohol yes 15
## 187 51.40308 4 white no 4 heroin yes 9
## 188 44.29502 5 white yes 0 alcohol no 10
## 189 46.76040 10 hispanic no 3 heroin no 0
## 190 49.87759 6 other no 2 alcohol yes 24
## 191 59.28272 1 white yes 0 alcohol yes 33
## 192 36.63770 4 hispanic no 5 heroin no 0
## 193 62.81930 2 hispanic yes 4 heroin yes 0
## 194 46.98674 5 hispanic no 3 heroin yes 3
## 195 56.95329 14 white no 4 alcohol yes 14
## 196 44.11552 3 white yes 2 heroin no 12
## 197 46.56849 5 black no 4 cocaine yes 0
## 198 40.11784 11 hispanic no 3 heroin yes 0
## 199 30.27282 9 other no 4 alcohol no 25
## 200 51.74989 11 hispanic no 0 alcohol no 42
## 201 58.89470 11 white yes 6 heroin yes 6
## 202 36.10949 12 black no 5 cocaine no 19
## 203 35.89378 4 hispanic no 4 heroin no 0
## 204 71.62856 6 black no 6 cocaine no 22
## 205 59.34274 12 black yes 6 alcohol yes 19
## 206 48.61113 3 black yes 5 alcohol no 13
## 207 54.08614 5 black no 8 cocaine yes 1
## 208 51.16233 12 black yes 11 cocaine no 13
## 209 51.24271 10 white no 2 heroin yes 20
## 210 33.03571 10 black no 2 heroin yes 0
## 211 33.92213 2 black yes 7 cocaine yes 3
## 212 25.92422 5 white no 8 alcohol no 142
## 213 31.76573 1 white yes 0 heroin no 64
## 214 41.32350 7 other no 3 heroin no 2
## 215 51.08913 10 white no 6 alcohol yes 51
## 216 51.01180 11 black yes 6 cocaine yes 1
## 217 40.42006 7 black no 9 alcohol no 24
## 218 45.13578 12 hispanic no 1 alcohol no 35
## 219 31.52352 0 white no 4 heroin yes 0
## 220 54.94331 5 black no 1 cocaine no 13
## 221 44.87310 2 hispanic no 8 heroin yes 12
## 222 63.90699 2 hispanic no 0 alcohol no 7
## 223 47.50178 5 white no 7 alcohol yes 26
## 224 30.34914 9 white yes 6 alcohol yes 41
## 225 45.32498 2 hispanic yes 7 alcohol no 3
## 226 58.71478 4 black no 5 alcohol yes 18
## 227 34.31445 0 white no 9 alcohol no 38
## 228 43.60749 2 black yes 4 alcohol no 12
## 229 32.12609 4 black yes 0 heroin no 4
## 230 52.02918 11 hispanic no 4 alcohol yes 32
## 231 58.15246 5 white no 4 alcohol no 34
## 232 46.52069 6 black no 7 cocaine yes 38
## 233 57.59651 14 black no 8 cocaine yes 13
## 234 62.97789 5 black yes 11 cocaine yes 49
## 235 37.80672 0 white no 3 alcohol no 18
## 236 36.64597 1 hispanic yes 3 heroin yes 0
## 237 25.43683 5 hispanic no 4 heroin yes 2
## 238 66.09068 4 black no 5 cocaine yes 6
## 239 59.91458 6 black no 5 cocaine yes 6
## 240 64.18298 7 black yes 6 cocaine no 10
## 241 43.39342 3 hispanic no 1 heroin yes 0
## 242 57.65739 12 white no 0 alcohol no 6
## 243 51.70669 2 hispanic yes 2 cocaine yes 6
## 244 48.98777 10 white no 2 heroin no 0
## 245 54.15862 9 black no 6 alcohol yes 32
## 246 59.14530 5 hispanic yes 4 alcohol no 3
## 247 63.34270 11 black yes 5 cocaine yes 6
## 248 51.15330 1 hispanic yes 4 cocaine yes 0
## 249 38.19618 6 white no 0 alcohol yes 25
## 250 57.44889 5 white no 5 alcohol no 13
## 251 64.07393 3 black yes 5 alcohol no 18
## 252 61.19665 1 black no 3 cocaine yes 2
## 253 65.26759 5 black no 9 cocaine no 26
## 254 41.73849 10 black yes 8 cocaine yes 5
## 255 56.56525 7 black no 6 cocaine yes 10
## 256 53.01821 12 black no 8 cocaine no 0
## 257 57.04919 14 black no 4 heroin no 4
## 258 57.73149 11 black no 5 cocaine yes 29
## 259 54.06671 3 other no 2 alcohol no 20
## 260 43.89911 12 black no 2 cocaine yes 3
## 261 48.81484 5 white yes 5 heroin no 6
## 262 58.22468 3 black no 7 cocaine no 13
## 263 48.76114 12 white yes 3 heroin yes 36
## 264 37.74971 8 white yes 3 heroin no 18
## 265 36.81136 3 hispanic no 2 heroin no 45
## 266 49.43321 11 black yes 2 heroin yes 13
## 267 45.18067 4 white no 0 heroin no 4
## 268 36.35557 5 white yes 4 alcohol no 6
## 269 36.49366 5 black no 3 heroin no 6
## 270 14.07429 8 white no 4 alcohol no 25
## 271 42.86974 12 other no 4 heroin yes 13
## 272 56.85568 11 hispanic no 2 heroin yes 37
## 273 44.17665 8 black yes 5 alcohol no 25
## 274 58.74847 4 white yes 1 alcohol yes 38
## 275 51.92163 8 hispanic no 3 alcohol no 12
## 276 56.86680 6 black no 4 cocaine yes 6
## 277 50.26602 3 black no 5 heroin yes 6
## 278 48.11589 0 white yes 5 heroin no 0
## 279 45.58474 4 white yes 1 heroin yes 8
## 280 59.72128 5 white no 5 alcohol no 32
## 281 44.22039 4 white yes 3 alcohol no 51
## 282 20.74029 3 black no 4 alcohol yes 35
## 283 54.93296 4 hispanic yes 2 alcohol no 73
## 284 47.58062 3 white no 4 heroin yes 9
## 285 34.90893 2 white no 5 alcohol yes 51
## 286 60.11456 2 black no 7 heroin no 6
## 287 44.59728 7 white no 3 alcohol yes 6
## 288 47.00855 14 white no 5 heroin yes 2
## 289 59.99293 14 black yes 5 cocaine no 1
## 290 51.69448 1 black no 5 cocaine no 49
## 291 53.54025 11 hispanic no 4 heroin yes 19
## 292 46.04046 5 white no 2 alcohol no 38
## 293 53.17226 14 white no 3 alcohol yes 26
## 294 47.83191 6 white no 5 alcohol no 83
## 295 40.83885 7 white no 7 heroin no 32
## 296 58.78673 4 hispanic yes 12 cocaine yes 30
## 297 36.50988 14 black no 3 alcohol no 42
## 298 45.00826 8 other no 7 cocaine yes 18
## 299 41.52777 8 white no 4 alcohol no 35
## 300 32.87765 7 hispanic no 0 alcohol yes 20
## 301 47.65695 4 hispanic yes 2 heroin yes 26
## 302 45.82243 7 white no 6 alcohol yes 43
## 303 59.08202 8 white no 5 cocaine yes 1
## 304 41.00370 3 white yes 4 alcohol yes 51
## 305 59.73408 11 white no 1 alcohol no 24
## 306 60.97185 1 white no 10 heroin yes 13
## 307 59.91701 3 white yes 1 alcohol no 20
## 308 58.50863 7 black yes 3 alcohol no 26
## 309 58.58452 11 white no 3 alcohol no 8
## 310 64.95238 5 white no 8 alcohol no 61
## 311 49.44656 8 black no 3 heroin no 13
## 312 58.84382 1 hispanic no 8 alcohol yes 28
## 313 27.27006 4 other no 5 heroin yes 6
## 314 66.23132 4 other no 3 alcohol yes 10
## 315 32.35484 7 hispanic no 8 heroin yes 0
## 316 58.86002 11 white yes 5 cocaine yes 4
## 317 56.95388 3 black yes 2 alcohol no 25
## 318 40.40644 8 hispanic no 9 heroin no 2
## 319 61.74688 3 white no 0 alcohol yes 26
## 320 53.93607 12 black no 4 cocaine no 24
## 321 63.86327 9 hispanic no 3 cocaine yes 0
## 322 35.98627 9 black yes 8 cocaine yes 13
## 323 49.21747 12 black no 4 cocaine no 12
## 324 51.45133 11 white no 2 cocaine no 12
## 325 47.92621 0 black no 7 cocaine no 12
## 326 51.63569 2 black no 3 cocaine no 3
## 327 23.55043 9 white yes 2 alcohol no 51
## 328 59.16547 8 black no 3 cocaine no 5
## 329 42.08535 7 white no 3 cocaine no 68
## 330 51.01598 3 white no 5 alcohol yes 29
## 331 46.42344 1 white no 6 heroin yes 5
## 332 52.71838 1 white yes 7 heroin no 32
## 333 59.42862 5 hispanic no 4 heroin no 0
## 334 38.49107 1 white no 4 alcohol no 76
## 335 52.73988 10 white no 0 alcohol no 26
## 336 45.72916 2 white no 8 alcohol yes 41
## 337 61.97865 3 white yes 4 heroin yes 18
## 338 60.46511 14 white no 3 alcohol no 22
## 339 38.24600 7 hispanic no 1 heroin yes 53
## 340 57.12674 11 other no 2 alcohol no 4
## 341 52.02338 6 black yes 4 heroin no 3
## 342 50.60834 5 white yes 4 heroin no 0
## 343 56.96868 7 white no 4 heroin yes 0
## 344 57.25384 11 white no 0 alcohol yes 13
## 345 53.73204 7 black yes 9 cocaine no 13
## 346 43.47607 11 white no 4 alcohol no 51
## max_drinks
## 1 26
## 2 62
## 3 0
## 4 13
## 5 24
## 6 27
## 7 31
## 8 20
## 9 51
## 10 0
## 11 1
## 12 23
## 13 26
## 14 0
## 15 34
## 16 5
## 17 3
## 18 7
## 19 48
## 20 0
## 21 20
## 22 3
## 23 6
## 24 0
## 25 0
## 26 135
## 27 24
## 28 3
## 29 27
## 30 7
## 31 36
## 32 12
## 33 0
## 34 13
## 35 28
## 36 61
## 37 26
## 38 7
## 39 15
## 40 13
## 41 34
## 42 6
## 43 43
## 44 36
## 45 15
## 46 19
## 47 32
## 48 42
## 49 20
## 50 25
## 51 0
## 52 51
## 53 36
## 54 12
## 55 17
## 56 5
## 57 2
## 58 102
## 59 0
## 60 21
## 61 8
## 62 1
## 63 19
## 64 22
## 65 0
## 66 47
## 67 0
## 68 19
## 69 10
## 70 5
## 71 15
## 72 51
## 73 26
## 74 3
## 75 184
## 76 2
## 77 19
## 78 0
## 79 47
## 80 51
## 81 0
## 82 66
## 83 91
## 84 0
## 85 69
## 86 20
## 87 51
## 88 26
## 89 13
## 90 0
## 91 13
## 92 22
## 93 33
## 94 30
## 95 26
## 96 3
## 97 24
## 98 0
## 99 53
## 100 25
## 101 179
## 102 4
## 103 6
## 104 13
## 105 51
## 106 38
## 107 8
## 108 0
## 109 13
## 110 39
## 111 20
## 112 0
## 113 1
## 114 32
## 115 0
## 116 51
## 117 19
## 118 6
## 119 1
## 120 17
## 121 38
## 122 4
## 123 50
## 124 54
## 125 3
## 126 19
## 127 8
## 128 12
## 129 20
## 130 3
## 131 13
## 132 24
## 133 12
## 134 102
## 135 4
## 136 0
## 137 58
## 138 9
## 139 65
## 140 51
## 141 19
## 142 0
## 143 6
## 144 18
## 145 0
## 146 46
## 147 30
## 148 3
## 149 12
## 150 26
## 151 92
## 152 13
## 153 26
## 154 13
## 155 13
## 156 42
## 157 15
## 158 20
## 159 3
## 160 26
## 161 16
## 162 51
## 163 26
## 164 16
## 165 102
## 166 20
## 167 27
## 168 41
## 169 73
## 170 36
## 171 41
## 172 43
## 173 2
## 174 16
## 175 3
## 176 51
## 177 28
## 178 13
## 179 51
## 180 140
## 181 6
## 182 5
## 183 3
## 184 0
## 185 26
## 186 30
## 187 20
## 188 15
## 189 0
## 190 45
## 191 51
## 192 0
## 193 0
## 194 3
## 195 20
## 196 12
## 197 0
## 198 0
## 199 33
## 200 57
## 201 6
## 202 19
## 203 0
## 204 32
## 205 19
## 206 19
## 207 1
## 208 13
## 209 20
## 210 0
## 211 9
## 212 142
## 213 64
## 214 2
## 215 51
## 216 1
## 217 30
## 218 35
## 219 0
## 220 26
## 221 12
## 222 7
## 223 26
## 224 56
## 225 3
## 226 31
## 227 55
## 228 15
## 229 4
## 230 32
## 231 102
## 232 51
## 233 13
## 234 49
## 235 36
## 236 0
## 237 2
## 238 13
## 239 13
## 240 10
## 241 0
## 242 20
## 243 6
## 244 0
## 245 32
## 246 12
## 247 6
## 248 0
## 249 25
## 250 26
## 251 18
## 252 2
## 253 38
## 254 25
## 255 23
## 256 0
## 257 4
## 258 85
## 259 20
## 260 12
## 261 12
## 262 13
## 263 36
## 264 18
## 265 45
## 266 13
## 267 10
## 268 26
## 269 6
## 270 42
## 271 13
## 272 37
## 273 25
## 274 38
## 275 29
## 276 24
## 277 6
## 278 0
## 279 8
## 280 32
## 281 51
## 282 35
## 283 73
## 284 31
## 285 51
## 286 8
## 287 16
## 288 3
## 289 1
## 290 109
## 291 25
## 292 51
## 293 40
## 294 145
## 295 40
## 296 101
## 297 42
## 298 26
## 299 105
## 300 20
## 301 26
## 302 54
## 303 2
## 304 51
## 305 48
## 306 13
## 307 26
## 308 26
## 309 18
## 310 61
## 311 19
## 312 37
## 313 7
## 314 10
## 315 0
## 316 10
## 317 37
## 318 2
## 319 26
## 320 24
## 321 0
## 322 13
## 323 12
## 324 30
## 325 18
## 326 3
## 327 69
## 328 5
## 329 68
## 330 29
## 331 5
## 332 32
## 333 0
## 334 78
## 335 26
## 336 62
## 337 18
## 338 30
## 339 63
## 340 13
## 341 3
## 342 0
## 343 0
## 344 20
## 345 13
## 346 51
We create a stem-and-leaf plot for the variable “cesd”-scores of the females:
with(female, stem(cesd))
##
## The decimal point is 1 digit(s) to the right of the |
##
## 0 | 3
## 0 | 567
## 1 | 3
## 1 | 555589999
## 2 | 123344
## 2 | 66889999
## 3 | 0000233334444
## 3 | 5556666777888899999
## 4 | 00011112222334
## 4 | 555666777889
## 5 | 011122222333444
## 5 | 67788
## 6 | 0
We can also create side-by-side histograms to compare the “cesd”-scores for females and males:
histogram(~cesd|sex, data=HELPrct)
For uniformly distributed (flat) random numbers, use runif(). By default, its range is from 0 to 1. If we want to generate 1 random number between 0 and 1, then we use the code:
runif(1)
## [1] 0.1511837
If we want to generate 5 random numbers between 0 and 1, then we use the code:
runif(5)
## [1] 0.57579225 0.94263535 0.50566641 0.16260997 0.03687694
To generate a random integer between 1 and 10, we use the sample function:
x3<-sample(1:10, 1)
x3
## [1] 4
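The same functions accept a few extra arguments. The sketch below is illustrative only: the seed value and the range 5 to 10 are arbitrary choices, not values taken from the course data. It shows how to make random draws reproducible and how to change the range of runif().

```r
# Fix the seed so the same "random" numbers appear on every run
# (the value 123 is an arbitrary choice)
set.seed(123)

# Three uniform random numbers between 5 and 10
u <- runif(3, min = 5, max = 10)
u

# Five random integers between 1 and 10, drawn with replacement
k <- sample(1:10, 5, replace = TRUE)
k
```

Without set.seed(), each run produces different numbers, which is why the output shown above will not match your own.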
A permutation is an arrangement or ordering. For a permutation, the order matters.
Recall that:
\(n\)-factorial gives the number of permutations of \(n\) items.
\(n! = n(n - 1)(n - 2)(n - 3) ... (3)(2)(1)\)
Let’s say we have 8 people:
1: Alice
2: Bob
3: Charlie
4: David
5: Eve
6: Frank
7: George
8: Horatio
How many ways can we award a 1st, 2nd and 3rd place prize among eight contestants? (Gold / Silver / Bronze)
Fig. 6.1
We’re going to use permutations since the order we hand out these medals matters. Here’s how it breaks down:
Gold medal:
8 choices:
A B C D E F G H
Let’s say A wins the Gold.
Silver medal:
7 choices:
B C D E F G H.
Let’s say B wins the silver.
Bronze medal:
6 choices: C D E F G H.
Let’s say… C wins the bronze.
We picked certain people to win, but the details don’t matter: we had 8 choices at first, then 7, then 6. The total number of options was 8 · 7 · 6 = 336.
Let’s look at the details. We had to order 3 people out of 8. To do this, we started with all options (8) then took them away one at a time (7, then 6) until we ran out of medals.
We know the factorial is:
\(\displaystyle{ 8! = 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1 }\)
Unfortunately, that does too much! We only want 8 · 7 · 6. How can we “stop” the factorial at 5?
This is where permutations get cool: notice how we want to get rid of 5 · 4 · 3 · 2 · 1. What’s another name for this? 5 factorial!
So, if we do 8!/5! we get:
\(\displaystyle{\frac{8!}{5!} = \frac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 8 \cdot 7 \cdot 6}\)
And why did we use the number 5? Because it was left over after we picked 3 medals from 8. So, a better way to write this would be:
\(\displaystyle{\frac{8!}{(8-3)!}}\)
where 8!/(8-3)! is just a fancy way of saying “Use the first 3 numbers of 8!”. If we have n items total and want to pick k in a certain order, we get:
\(\displaystyle{\frac{n!}{(n-k)!}}\)
And this is the fancy permutation formula: You have n items and want to find the number of ways k items can be ordered:
\(\displaystyle{P(n,k) = \frac{n!}{(n-k)!}}\)
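As a quick check of the formula, the count can be computed in base R. The helper name perm_count is hypothetical, introduced only for this sketch; factorial() is the built-in function.

```r
# Hypothetical helper: number of ways to order k items chosen from n
perm_count <- function(n, k) {
  factorial(n) / factorial(n - k)
}

# Medal example: gold, silver and bronze among 8 contestants
perm_count(8, 3)   # 8 * 7 * 6 = 336
```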
A license plate begins with three letters. If the possible letters are A, B, C, D and E, how many different permutations of these letters can be made if no letter is used more than once?
Using reasoning:
For the first letter, there are 5 possible choices. After that letter is chosen, there are 4 possible choices. Finally, there are 3 possible choices.
\(5 × 4 × 3 = 60\)
Using the permutation formula:
The problem involves 5 things (A, B, C, D, E) taken 3 at a time.
\(P(5,3)= \dfrac{5!}{(5-3)!}=\dfrac{5!}{2!}=\dfrac{5*4*3*2*1}{2*1}=60\)
There are 60 different permutations for the license plate.
In how many ways can a president, a treasurer and a secretary be chosen from among 7 candidates?
Using reasoning:
For the first position, there are 7 possible choices. After that candidate is chosen, there are 6 possible choices. Finally, there are 5 possible choices.
7 × 6 × 5 = 210
Using permutation formula:
The problem involves 7 candidates taken 3 at a time.
\(P(7,3)= \dfrac{7!}{(7-3)!}=\dfrac{7!}{4!}=\dfrac{7*6*5*4*3*2*1}{4*3*2*1}=210\)
There are 210 possible ways to choose a president, a treasurer and a secretary from among 7 candidates.
The number of different permutations of n objects where there are \(n_1\) indistinguishable items, \(n_2\) indistinguishable items, … \(n_k\) indistinguishable items, is
\(\dfrac{n!}{(n_1!*n_2!*...*n_k!)}.\)
In how many ways can the letters of the word MATHEMATICS be arranged?
\(\dfrac{11!}{2!*2!*2!}=4,989,600\)
This is because the word has 11 letters in total, including 2 M’s, 2 A’s and 2 T’s.
In how many ways can the letters of MISSISSIPPI be arranged?
We have a total of 11 letters, but 4 of them are I’s, 4 are S’s, and 2 are P’s.
So we have
\(\dfrac{11!}{4!*4!*2!} = 34,650\)
The letters in the word MISSISSIPPI can be rearranged in 34,650 ways.
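Both word examples can be checked in base R. The helper word_arrangements is a hypothetical name used only in this sketch; it counts the letters with table() and applies the formula for permutations with indistinguishable items.

```r
# Hypothetical helper: arrangements of the letters of a word,
# treating repeated letters as indistinguishable
word_arrangements <- function(word) {
  letters_in_word <- strsplit(word, "")[[1]]
  counts <- as.vector(table(letters_in_word))   # how often each letter occurs
  factorial(length(letters_in_word)) / prod(factorial(counts))
}

word_arrangements("MATHEMATICS")
word_arrangements("MISSISSIPPI")
```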
A permutation is an ordered combination. There are basically two types of permutations, with repetition (or replacement) and without repetition (without replacement).
The number of permutations with repetition (or with replacement) is simply calculated by:
\(n^r\),
where \(n\) is the number of things to choose from and \(r\) is the number of times we choose.
Suppose you have an urn with a red, blue and black ball. If you choose two balls with replacement/repetition, there are \(3^2\) permutations:
{red, red},
{red, blue},
{red, black},
{blue, red},
{blue, blue},
{blue, black},
{black, red},
{black, blue}, and
{black, black}.
In R:
Install the “gtools” package.
#load library
library(gtools)
##
## Attaching package: 'gtools'
## The following object is masked from 'package:mosaic':
##
## logit
#urn with 3 balls
x <- c('red', 'blue', 'black')
#pick 2 balls from the urn with replacement
#get all permutations
permutations(n=3,r=2,v=x,repeats.allowed=T)
## [,1] [,2]
## [1,] "black" "black"
## [2,] "black" "blue"
## [3,] "black" "red"
## [4,] "blue" "black"
## [5,] "blue" "blue"
## [6,] "blue" "red"
## [7,] "red" "black"
## [8,] "red" "blue"
## [9,] "red" "red"
#number of permutations
nrow(permutations(n=3,r=2,v=x,repeats.allowed=T))
## [1] 9
Calculating permutations without repetition/replacement, just means that for cases where \(r > 1\), \(n\) gets smaller after each pick. For example, if we choose two balls from the urn with the red, blue and black ball but without repetition/replacement, the first pick has 3 choices and the second pick has 2 choices:
{red, blue},
{red, black},
{blue, red},
{blue, black},
{black, red} and
{black, blue}.
In R:
#load library
library(gtools)
#urn with 3 balls
x <- c('red', 'blue', 'black')
#pick 2 balls from the urn without replacement
#get all permutations
permutations(n=3,r=2,v=x)
## [,1] [,2]
## [1,] "black" "blue"
## [2,] "black" "red"
## [3,] "blue" "black"
## [4,] "blue" "red"
## [5,] "red" "black"
## [6,] "red" "blue"
#number of permutations
nrow(permutations(n=3,r=2,v=x))
## [1] 6
A combination is a selection of items from a collection, such that (unlike permutations) the order of selection does not matter.
For example, given three fruits, say an apple, an orange and a pear, there are three combinations of two that can be drawn from this set:
an apple and a pear; an apple and an orange; or a pear and an orange.
More formally, a \(k\)-combination of a set \(S\) is a subset of \(k\) distinct elements of \(S\).
\({\binom {n}{k}}={\frac {n!}{k!(n-k)!}}\).
Five people are in a club and three are going to be in the ‘planning committee,’ to determine how many different ways this committee can be created we use our combination formula as follows:
\({\binom {5}{3}}={\frac {5!}{3!(5-3)!}} = 10\).
Eleven students put their names on slips of paper inside a box. Three names are going to be taken out. How many different ways can the three names be chosen?
\({\binom {11}{3}}={\frac {11!}{3!(11-3)!}} = 165\).
Over the weekend, your family is going on vacation, and your mom is letting you bring your favorite video game console as well as five of your games. How many ways can you choose the five games if you have 12 games total?
\({\binom {12}{5}}={\frac {12!}{5!(12-5)!}} = 792\).
Suppose we have 12 adults and 10 kids as an audience of a certain show. Find the number of ways the host can select three persons from the audiences to volunteer. The choice must contain two kids and one adult.
The order here does not matter so we have:
\(C(10, 2) \cdot C(12, 1) = \dfrac{10 \cdot 9}{2} \cdot \dfrac{12}{1} = 45 \cdot 12 = 540\).
The choose() function computes the combination \(nCr\),
where
choose(n,r)
n: n elements
r: r subset elements
Choose 3 elements from a total of 6 elements:
choose(6,3)
## [1] 20
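The worked combination examples from this section can be verified the same way. This is only a check of the arithmetic; the vector name results is chosen for illustration.

```r
results <- c(
  committee   = choose(5, 3),                   # 3 of 5 club members
  names       = choose(11, 3),                  # 3 of 11 slips of paper
  games       = choose(12, 5),                  # 5 of 12 video games
  kids_adults = choose(10, 2) * choose(12, 1)   # 2 of 10 kids and 1 of 12 adults
)
results
```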
Let us say there are five flavors of icecream: banana, chocolate, lemon, strawberry and vanilla.
We can have three scoops. How many variations will there be?
Let’s use letters for the flavors: {b, c, l, s, v}. Example selections include
{c, c, c} (3 scoops of chocolate)
{b, l, v} (one each of banana, lemon and vanilla)
{b, v, v} (one of banana, two of vanilla)
(And just to be clear: There are n=5 things to choose from, and we choose r=3 of them. Order does not matter, and we can repeat!)
Now, I can’t describe directly to you how to calculate this, but I can show you a special technique that lets you work it out.
Think about the ice cream being in boxes, we could say “move past the first box, then take 3 scoops, then move along 3 more boxes to the end” and we will have 3 scoops of chocolate!
So it is like we are ordering a robot to get our ice cream, but it doesn’t change anything, we still get what we want.
We can write this down as:
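One standard way to count such selections, offered here as a sketch rather than as the notation the text goes on to use, is the formula for combinations with repetition, \(\binom{n+r-1}{r}\). The code below evaluates it for \(n = 5\) flavors and \(r = 3\) scoops and confirms the result by listing every selection. The variable names are illustrative.

```r
n <- 5  # flavors
r <- 3  # scoops

# Closed-form count of selections with repetition, order ignored
multichoose <- choose(n + r - 1, r)

# Brute-force check: list all ordered triples, then keep one
# representative (the sorted one) per unordered selection
flavors <- c("b", "c", "l", "s", "v")
triples <- expand.grid(s1 = flavors, s2 = flavors, s3 = flavors,
                       stringsAsFactors = FALSE)
sorted <- t(apply(triples, 1, sort))
n_selections <- nrow(unique(sorted))

c(formula = multichoose, enumeration = n_selections)
```

Both numbers agree, so there are 35 possible three-scoop selections.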
Spinner
The expected value of a random variable \(X\) is the sum of the values of the random variable with each value multiplied by its probability of occurrence.
If grades of five students are 65, 76, 88, 34, and 90, then find expected value of mark for a random student.
Since the student is chosen at random, each grade is equally likely (probability \(\frac{1}{5}\)), so the expected value is the mean of the five grades.
\(E(X)=\frac{65 + 76 + 88 + 34 + 90}{5}=\frac{353}{5}=70.6\)
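The definition can be applied directly in R. The sketch assumes each of the five students is equally likely to be picked; the variable names are illustrative.

```r
grades <- c(65, 76, 88, 34, 90)

# Each student is equally likely to be chosen
probs <- rep(1 / length(grades), length(grades))

# Expected value: each value multiplied by its probability, then summed
expected_value <- sum(grades * probs)
expected_value
```

With equal probabilities this is the same number that mean(grades) returns.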
Take the Midterm examination.
Independent Events:
When two events are said to be independent of each other, what this means is that the probability that one event occurs in no way affects the probability of the other event occurring. An example of two independent events is as follows; say you rolled a die and flipped a coin. The probability of getting any number face on the die in no way influences the probability of getting a head or a tail on the coin.
Dependent Events:
When two events are said to be dependent, the probability of one event occurring influences the likelihood of the other event.
Two events \(A\) and \(B\) are dependent if the occurrence of one affects the occurrence of the other. The probability that \(B\) will occur given that \(A\) has occurred is called the conditional probability of \(B\) given \(A\) and is written \(P(B|A)\).
If \(A\) and \(B\) are dependent events, then the probability that both \(A\) and \(B\) occur is
\(P(A \cap B) = P(A)\cdot P(B|A)\)
We can calculate the chances of two or more independent events by multiplying the chances.
What is the probability of getting 3 Heads in a Row when tossing a coin?
For each toss of a coin a “Head” has a probability of 0.5, and so the probability of getting 3 heads in a row is:
\(\dfrac{1}{2}\cdot\dfrac{1}{2}\cdot\dfrac{1}{2}=\dfrac{1}{8}.\)
You are playing a game that involves spinning the money wheel shown. During your turn you get to spin the wheel twice. What is the probability that you get more than $500 on your first spin and then go bankrupt on your second spin?
Moneywheel
Let event \(A\) be getting more than $500 on the first spin, and let event \(B\) be going bankrupt on the second spin. The two events are independent. So, the probability is:
\(P(A \cap B) = P(A) \cdot P(B) = \frac{8}{24} \cdot \frac{2}{24} = \frac{1}{36} \approx 0.028\)
During the 1997 baseball season, the Florida Marlins won 5 out of 7 home games and 3 out of 7 away games against the San Francisco Giants. During the 1997 National League Division Series with the Giants, the Marlins played the first two games at home and the third game away. The Marlins won all three games. Estimate the probability of this happening.
Let events \(A\), \(B\), and \(C\) be winning the first, second, and third games. The three events are independent and have experimental probabilities based on the regular season games. So, the probability of winning the first three games is: \(P(A \cap B \cap C) = P(A) \cdot P(B) \cdot P(C) = \frac{5}{7}\cdot \frac{5}{7}\cdot\frac{3}{7} = \frac{75}{343} \approx 0.219\)
A computer chip manufacturer has found that only 1 out of 1000 of its chips is defective. You are ordering a shipment of chips for the computer store where you work. How many chips can you order before the probability that at least one chip is defective reaches 50%?
Let \(n\) be the number of chips you order. From the given information you know that P(chip is not defective) \(=\frac{999}{1000}=0.999.\) Use this probability and the fact that each chip ordered represents an independent event to find the value of \(n\).
P(at least one chip is defective) = 0.5
1 - P(no chips are defective) = 0.5
\(1 - (0.999)^n = 0.5\)
\(-(0.999)^n = -0.5\)
\((0.999)^n = 0.5\)
\(n = \dfrac{log(0.5)}{log(0.999)}\)
\(n \approx 693\)
If you order 693 chips, you have a 50% chance of getting a defective chip. Therefore, you can order 692 chips before the probability that at least one chip is defective reaches 50%.
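The calculation can be reproduced in R. This is a minimal sketch using only the numbers from the problem; the variable names are illustrative.

```r
p_ok <- 0.999                     # probability that a single chip is not defective

# Solve 0.999^n = 0.5 for n
n_exact <- log(0.5) / log(p_ok)
n_exact

# Probability of at least one defective chip when ordering 692 or 693 chips
p_at_least_one <- 1 - p_ok^c(692, 693)
p_at_least_one
```

The first probability is just below 0.5 and the second is just above it, which matches the conclusion that 693 is the order size at which the 50% mark is reached.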
The table shows the number of endangered and threatened animal species in the United States as of November 30, 1998.
Find
the probability that a listed animal is a reptile and
the probability that an endangered animal is a reptile.
Table 9.5
P(reptile) \(=\dfrac{number\ of\ reptiles }{total\ number \ of \ animals } = \frac{35}{475} \approx 0.0737\)
P(reptile | endangered) \(=\dfrac{number\ of\ endangered \ reptiles }{total\ number \ of \ endangered \ animals } = \frac{14}{355} \approx 0.0394\).
You randomly select two cards from a standard 52-card deck. What is the probability that the first card is not a face card (a king, queen, or jack) and the second card is a face card if
you replace the first card before selecting the second, and
you do not replace the first card?
\(P(A \cap B) = P(A) \cdot P(B) = \frac{40}{52} \cdot \frac{12}{52} = \frac{30}{169} \approx 0.178\).
\(P(A \cap B) = P(A) \cdot P(B|A) = \frac{40}{52} \cdot \frac{12}{51} = \frac{40}{221} \approx 0.181\).
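The two card calculations differ only in the denominator of the second factor. A short sketch with illustrative variable names:

```r
non_face <- 40   # cards that are not a king, queen or jack
face     <- 12   # face cards in a 52-card deck

# First card replaced: the two draws are independent
with_replacement <- (non_face / 52) * (face / 52)

# First card not replaced: only 51 cards remain for the second draw
without_replacement <- (non_face / 52) * (face / 51)

round(c(with_replacement, without_replacement), 3)
```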
We use the “mosaic”-package for this project. Make sure to call the package:
require(mosaic)
The favstats() function can provide more statistics by group.
favstats(cesd~sex, data=HELPrct)
## sex min Q1 median Q3 max mean sd n missing
## 1 female 3 29 38.0 46.5 60 36.88785 13.01764 107 0
## 2 male 1 24 32.5 40.0 58 31.59827 12.10332 346 0
Boxplots are particularly helpful to compare distributions. The bwplot() function can be used to display the boxplots for the CESD scores separately by sex.
bwplot(sex~cesd, data=HELPrct)
The box-and-whisker plots show that females tend to have higher “cesd”-scores than males: the female median is 38.0, compared with 32.5 for males.
Bayes’ theorem is a formula that describes how to update the probabilities of hypotheses when given evidence. It follows simply from the axioms of conditional probability, but can be used to powerfully reason about a wide range of problems involving belief updates.
Given a hypothesis \(H\) and evidence \(E\) , Bayes’ theorem states that the relationship between the probability of the hypothesis \(P(H)\) before getting the evidence and the probability of the hypothesis after getting the evidence \(P(H|E)\) is:
\(P(H|E)=\dfrac{P(E|H)}{P(E)}*P(H)\)
Many modern machine learning techniques rely on Bayes’ theorem. For instance, spam filters use Bayesian updating to determine whether an email is real or spam, given the words in the email. Additionally, many specific techniques in statistics, such as calculating p-values or interpreting medical results, are best described in terms of how they contribute to updating hypotheses using Bayes’ theorem.
The formula relates the probability of the hypothesis before getting the evidence \(P(H)\) to the probability of the hypothesis after getting the evidence, \(P(H|E)\). For this reason, \(P(H)\) is called the prior probability, while \(P(H|E)\) is called the posterior probability. The factor that relates the two, \(\dfrac{P(E|H)}{P(E)}\) is called the likelihood ratio. Using these terms, Bayes’ theorem can be rephrased as “the posterior probability equals the prior probability times the likelihood ratio.”
If a single card is drawn from a standard deck of playing cards, the probability that the card is a king is 4/52, since there are 4 kings in a standard deck of 52 cards. Rewording this, if \(KING\) is the event “this card is a king,” the prior probability
\(P(KING)= \frac{4}{52}= \frac{1}{13}\).
If evidence is provided (for instance, someone looks at the card) that the single card is a face card, then the posterior probability \(P(KING | FACE)\) can be calculated using Bayes’ theorem:
\(P(KING | FACE)= \dfrac{P(FACE | KING)}{P(FACE)}*P(KING)\)
Since every \(KING\) is also a face card,
\(P(FACE | KING)= 1\)
Since there are 3 face cards in each suit \((JACK, QUEEN, KING)\), the probability of a face card is:
\(P(FACE)=\frac{3}{13}\)
Using Bayes’ theorem gives
\(P(KING | FACE)= \dfrac{P(FACE | KING)}{P(FACE)}*P(KING) = \frac{1}{\frac{3}{13}}*\frac{1}{13}= \frac{1}{3}\)
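The card example can be written as a small R function. The name posterior and its argument names are hypothetical, chosen only for this sketch.

```r
# Hypothetical helper implementing P(H|E) = P(E|H) / P(E) * P(H)
posterior <- function(prior, likelihood, evidence) {
  likelihood / evidence * prior
}

# H = "the card is a king", E = "the card is a face card"
p_king_given_face <- posterior(prior = 1 / 13, likelihood = 1, evidence = 3 / 13)
p_king_given_face
```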
A probability distribution is a table or an equation that links each outcome of a statistical experiment with its probability of occurrence.
“Bi” means “two” (like a bicycle has two wheels), so this is about experiments with two possible outcomes. In other words, the binomial distribution is a discrete probability distribution with parameters \(n\) and \(p\), where \(n\) is the number of trials and \(p\) is the probability of success on each trial.
The binomial distribution is frequently used to model the number of successes in a sample of size \(n\) drawn with replacement from a population of size \(N\). If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one.
In general, if the random variable \(X\) follows the binomial distribution with parameters \(n \in ℕ\) and \(p \in [0,1]\), we write \(X \sim B(n, p)\). The probability of getting exactly \(k\) successes in \(n\) trials is given by the probability mass function:
\(P(X = k) = {\binom {n}{k}} p^k (1-p)^{n-k}\)
A binomial experiment is a statistical experiment that has the following properties:
The experiment consists of n repeated trials.
Each trial can result in just two possible outcomes. We call one of these outcomes a success and the other, a failure.
The probability of success, denoted by \(p\), is the same on every trial.
The trials are independent; that is, the outcome on one trial does not affect the outcome on other trials.
Consider the following statistical experiment. You flip a coin 2 times and count the number of times the coin lands on heads. This is a binomial experiment because:
The experiment consists of repeated trials. We flip a coin 2 times. Each trial can result in just two possible outcomes - heads or tails. The probability of success is constant - 0.5 on every trial. The trials are independent; that is, getting heads on one trial does not affect whether we get heads on other trials.
When you flip a coin, there are two possible outcomes: heads and tails. Each outcome has a fixed probability, the same from trial to trial.
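For the two-flip experiment, the probabilities of 0, 1 and 2 heads can be computed with the built-in dbinom() function. The second calculation applies the standard binomial formula by hand as a cross-check; the variable names are illustrative.

```r
n <- 2      # number of coin flips
p <- 0.5    # probability of heads on each flip
k <- 0:n    # possible numbers of heads

# Built-in binomial probability mass function
probs <- dbinom(k, size = n, prob = p)
probs

# Same probabilities from the formula choose(n, k) * p^k * (1 - p)^(n - k)
by_hand <- choose(n, k) * p^k * (1 - p)^(n - k)
by_hand
```

Both give 0.25, 0.50 and 0.25, and the three probabilities add up to 1.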
Fig. 10.1
Find the probability that \(X\) is at least 5 and at most 20:
Make the selections:
Fig. 10.1
The result is \(P(5 \le X \le 20) = 0.9941\).
A Poisson distribution is the probability distribution that results from a Poisson experiment.
A Poisson experiment is a statistical experiment that has the following properties:
The experiment results in outcomes that can be classified as successes or failures.
The average number of successes (μ) that occurs in a specified region is known.
The probability that a success will occur is proportional to the size of the region.
The probability that a success will occur in an extremely small region is virtually zero.
Note that the specified region could take many forms. For instance, it could be a length, an area, a volume, a period of time, etc.
Fig. 10.3
The Poisson distribution has the following properties:
The mean of the distribution is equal to \(μ\).
The variance is also equal to \(μ\).
It has been observed that the average number of traffic accidents on the Hollywood Freeway between 7 and 8 AM on Wednesday mornings is 1 per hour. What is the chance that there will be 2 accidents on the Freeway on some specified Wednesday morning?
The basic rate is r = 1 (in hour units), and our window is 1 hour. We wish to know the chance of observing 2 events in that window. The rate r = 1 is included in the Poisson Table, so we don’t have to calculate anything. Reading down the r = 1 column, we come to the p(2) row, and there we find that the probability of 2 accidents is 0.1839, or a little less than 1 chance in 5. It’s not unlikely: you might see that situation on roughly one Wednesday morning in five or six.
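The table value can also be obtained in R with the built-in dpois() function, or from the standard Poisson formula \(P(X = k) = \dfrac{e^{-\mu}\mu^{k}}{k!}\). The variable names in this sketch are illustrative.

```r
rate <- 1   # average number of accidents in the one-hour window
k    <- 2   # number of accidents we are asking about

# Built-in Poisson probability mass function
p_two <- dpois(k, lambda = rate)

# Same value from the formula exp(-mu) * mu^k / k!
p_two_by_hand <- exp(-rate) * rate^k / factorial(k)

round(c(p_two, p_two_by_hand), 4)
```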
Poisson table
Introduction to Normal Distributions and the Standard Normal Distribution
The normal distribution is a bell-shaped distribution. A normal distribution is a continuous probability distribution for a random variable \(x\) with the following properties:
The mean, median, and mode are equal.
The curve is bell-shaped and symmetric about the mean.
The total area under the curve equals 1.
The curve approaches, but never touches, the \(x\)-axis as it extends farther from the mean.
Between \(\mu - \sigma\) and \(\mu + \sigma\), the graph curves downward. To the left of \(\mu - \sigma\) and to the right of \(\mu + \sigma\), the graph curves upward. The points at which the curve changes from curving upward to downward are called points of inflection. The graph of a normal distribution is called the normal curve. The equation for the curve is:
\(y =\frac{1} {\sigma \cdot \sqrt{2\pi}} e^{\frac{-(x-\mu)^2}{2\sigma^2}}\)
The standard normal distribution is a normal distribution with \(\mu = 0\) and \(\sigma = 1\).
Any observation \(x\) from a normal distribution can be “converted” to data from a standard normal distribution by calculating a \(z\)-score:
\(z = \dfrac{Value - Mean} {Standard \ deviation} = \dfrac{x - \mu}{\sigma}\).
Heights of males at a certain university are approximately normal with a mean of 70.9 inches and a standard deviation of 2.9 inches. Find the \(z\)-score for a male who is 6 feet tall.
First, we need to convert 6 feet to inches, so we want a \(z\)-score for 72 inches.
\(z = \dfrac{72 - 70.9}{2.9} = 0.3793\).
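The same conversion can be done in R. The function name z_score is hypothetical, introduced only for this sketch.

```r
# Hypothetical helper: standardize an observation
z_score <- function(x, mean, sd) {
  (x - mean) / sd
}

# A male who is 6 feet (6 * 12 = 72 inches) tall
z <- z_score(x = 6 * 12, mean = 70.9, sd = 2.9)
round(z, 4)
```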
For continuous distributions, the probability that a random variable takes an interval of values is the area under the distribution curve over that interval.
For normal distributions, tables are used to calculate probabilities.
The normal distribution table:
Fig. 11.1
Navigate to:
https://www.geogebra.org/classic/probability
To find \(P(Z\leq 0)\), make sure to select the “Normal Distribution”.
Select the left-sided bracket.
Type \(P(Z\leq 0)\) and it should give the answer as 0.5.
Find the probability that z falls below 2.74.
From the table or by using the GeoGebra applet, we can see that
\(P(Z\leq 2.74)=0.9969\).
Find the probability that z is at least 0.62.
Looking up 0.62 in the table gives us that the probability that z is less than 0.62 is 0.7324, but these are complementary events, so the probability that we want is 1 − 0.7324 = 0.2676. (Note the value in the table for −0.62 is also 0.2676. This happens due to symmetry.)
Find \(P (z ≥ −2.6)\).
Note that \(P (z ≥ −2.6) = 1 − P (z < −2.6) = 1 − 0.0047 = 0.9953\).
Find \(P (−0.24 ≤ z ≤ 0.43)\).
From the table, we know that \(P (z ≤ 0.43) = 0.6664\) and \(P (z ≤ −0.24) = 0.4052\), so the area in between the two values is 0.6664 − 0.4052 = 0.2612.
Find \(P(z = 1)\).
We can think of the above probability as \(P(1 ≤ z ≤ 1)\), then use the reasoning from the last part to get that \(P (z = 1) = 0\). We can also get this answer with the “area under the curve” definition for probability, which leads us to a rectangle of width zero, which has area zero.
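As an alternative to the table or the GeoGebra applet, the built-in pnorm() function returns \(P(Z \leq z)\) for the standard normal distribution. The sketch below recomputes the examples above; the variable names are illustrative.

```r
p_below    <- pnorm(2.74)                  # P(Z <= 2.74)
p_at_least <- 1 - pnorm(0.62)              # P(Z >= 0.62)
p_above    <- 1 - pnorm(-2.6)              # P(Z >= -2.6)
p_between  <- pnorm(0.43) - pnorm(-0.24)   # P(-0.24 <= Z <= 0.43)
p_point    <- pnorm(1) - pnorm(1)          # P(Z = 1), an interval of width zero

round(c(p_below, p_at_least, p_above, p_between, p_point), 4)
```

Rounded to four decimal places, these agree with the table values used in the examples.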
Fig. 11.3
Given that \(X\) is a random variable that is normally distributed with \(\mu = 30\) and \(\sigma = 4\). Determine the following:
\(P (30 < x < 35)\).
Here we convert both endpoints to \(z\)-scores and then find the area under the standard normal curve between them.
Now, \(Z = \dfrac{(30−30)}{4} = 0\).
Also, \(Z = \dfrac{(35−30)}{4} = 1.25\).
Thus \(P (30 < x < 35) = P (0 < z < 1.25) = 0.3944\)
Suppose the reaction times of teenage drivers are normally distributed with a mean of 0.53 seconds and a standard deviation of 0.11 seconds.
What is the probability that a teenage driver chosen at random will have a reaction time less than 0.65 seconds?
Find the probability that a teenage driver chosen at random will have a reaction time between 0.4 and 0.6 seconds.
1. The goal is to find \(P(x < 0.65)\).
1.1. The first step is to convert 0.65 to a standard score.
\(z = \dfrac{(x - mean)}{standard \ deviation} = \dfrac{(0.65 - 0.53)}{0.11} = 1.09\).
1.2. The problem now is to find \(P(z < 1.09)\). This is a left tail problem as shown in the illustration to the right.
\(P(z < 1.09) = 0.8621\) (see table or use GeoGebra).
2. The goal is to find \(P(0.4 < x < 0.6)\).
2.1. The first step is to convert 0.4 and 0.6 to the corresponding standard scores.
\(z_1 = \dfrac{(x - mean)}{standard \ deviation} = \dfrac{(0.4 - 0.53)}{0.11} = -1.18\)
\(z_2 = \dfrac{(x - mean)}{standard \ deviation} = \dfrac{(0.6 - 0.53)}{0.11} = 0.64\).
2.2. The problem now is to find \(P(-1.18 < z < 0.64)\). This is a “between” problem as shown in the illustration to the right.
\(P(-1.18 < z < 0.64) = P(z < 0.64) - P(z < -1.18) = 0.7389 - 0.1190 = 0.6199\)
Therefore, \(P(0.4 < x < 0.6) = 0.6199\).
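pnorm() also accepts a mean and a standard deviation, so the conversion to \(z\)-scores can be skipped. Because the table method rounds each \(z\)-score to two decimal places, the values below differ slightly (in the third or fourth decimal place) from the table-based answers. The variable names are illustrative.

```r
mu    <- 0.53   # mean reaction time in seconds
sigma <- 0.11   # standard deviation in seconds

# P(x < 0.65)
p_less <- pnorm(0.65, mean = mu, sd = sigma)

# P(0.4 < x < 0.6)
p_between <- pnorm(0.6, mean = mu, sd = sigma) - pnorm(0.4, mean = mu, sd = sigma)

round(c(p_less, p_between), 4)
```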
Complete Project 3 on Canvas.
Researchers usually cannot make direct observations of every individual in the population they are studying. Instead, they collect data from a subset of individuals – a sample – and use those observations to make inferences about the entire population.
Ideally, the sample corresponds to the larger population on the characteristic(s) of interest. In that case, the researcher’s conclusions from the sample are probably applicable to the entire population.
Sampling Methods can be classified into one of two categories:
Probability Sampling: Sample has a known probability of being selected
Non-probability Sampling: Sample does not have known probability of being selected as in convenience or voluntary response surveys.
Probability Sampling
In probability sampling it is possible to both determine which sampling units belong to which sample and the probability that each sample will be selected. The following sampling methods are examples of probability sampling:
Simple Random Sampling (SRS)
Stratified Sampling
Cluster Sampling
Systematic Sampling
Multistage Sampling (in which some of the methods above are combined in stages)
Stratified Sampling is possible when it makes sense to partition the population into groups based on a factor that may influence the variable that is being measured. These groups are then called strata. An individual group is called a stratum. With stratified sampling one should:
partition the population into groups (strata)
obtain a simple random sample from each group (stratum)
collect data on each sampling unit that was randomly sampled from each group (stratum)
Stratified sampling works best when a heterogeneous population is split into fairly homogeneous groups. Under these conditions, stratification generally produces more precise estimates of the population percentages than a simple random sample would.
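The three steps of stratified sampling can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stratified_sample`, the time-zone strata, and the household IDs are all hypothetical and made up for the example, and the same number of units is drawn from every stratum.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum units from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

# Hypothetical population: household IDs partitioned by time zone (made-up data).
strata = {
    "Eastern":  [f"E{i}" for i in range(100)],
    "Central":  [f"C{i}" for i in range(80)],
    "Mountain": [f"M{i}" for i in range(40)],
    "Pacific":  [f"P{i}" for i in range(90)],
}

sample = stratified_sample(strata, n_per_stratum=5, seed=1)
for name, units in sample.items():
    print(name, units)
```

Every stratum appears in the result, which is the defining feature of the method. In practice the sample size per stratum is often chosen in proportion to the stratum's size rather than held fixed.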
Example 12.1:
Fig 12.1
Cluster Sampling is very different from Stratified Sampling. With cluster sampling one should:
divide the population into groups (clusters).
obtain a simple random sample of a chosen number of clusters from all possible clusters.
obtain data on every sampling unit in each of the randomly selected clusters.
It is important to note that, unlike with the strata in stratified sampling, the clusters should be microcosms, rather than subsections, of the population. Each cluster should be heterogeneous. Additionally, the statistical analysis used with cluster sampling is not only different, but also more complicated than that used with stratified sampling.
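For contrast with stratified sampling, here is a minimal Python sketch of the cluster sampling steps. It rests on assumptions made for illustration: the function name `cluster_sample`, the city-block clusters, and the household IDs are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select n_clusters whole clusters and keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # SRS of cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: households grouped by city block (made-up data).
clusters = {
    f"Block {letter}": [f"{letter}{i}" for i in range(6)]
    for letter in "ABCDEFGH"
}

sample = cluster_sample(clusters, n_clusters=3, seed=1)
for name, units in sample.items():
    print(name, units)
```

Only some clusters appear in the result, but each selected cluster is included in full. Stratified sampling does the opposite: every group is represented, but only part of each group is sampled.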
Fig 12.2
In the two examples above, stratified sampling would be preferred over cluster sampling, particularly if the questions of interest are affected by time zone. For example, the percentage of people watching a live sporting event on television might be highly affected by the time zone they are in. Cluster sampling works best when there is a reasonably large number of clusters to choose from. In this case, selecting 2 clusters from 4 possible clusters does not provide much advantage over simple random sampling.
The most common method of carrying out a poll today is Random Digit Dialing, in which a machine randomly dials phone numbers. Some polls go even further and have a machine conduct the interview itself rather than just dial the number! Such “robo call polls” can be very biased because they have extremely low response rates (most people don’t like speaking to a machine) and because federal law prevents such calls to cell phones. Since people who have landline phone service tend to be older than people who have cell phone service only, another potential source of bias is introduced. National polling organizations that use random digit dialing in interviewer-based polls are very careful to match the proportions of landline and cell phone numbers to the population they are trying to survey.
Non-probability Sampling
The following sampling methods are types of non-probability sampling and should be avoided:
volunteer samples
haphazard (convenience) samples
Since such non-probability sampling methods are based on human choice rather than random selection, statistical theory cannot explain how they might behave and potential sources of bias are rampant.
More examples of sampling:
Fig 12.3
Complete Online Discussion 5 on Canvas.
Suppose we have an unknown population parameter, such as a population mean \(\mu\) or a population proportion \(p\), which we’d like to estimate. For example, suppose we are interested in estimating:
\(p\) = the (unknown) proportion of American college students, 18-24, who have a smart phone
\(\mu\) = the (unknown) mean number of days it takes Alzheimer’s patients to achieve certain milestones
In either case, we can’t possibly survey the entire population. That is, we can’t survey all American college students between the ages of 18 and 24, nor can we survey all patients with Alzheimer’s disease. Instead, we take a random sample from the population and use the resulting data to estimate the value of the population parameter. Naturally, we want the estimate to be “good” in some way.
Statistical inference is the process by which we infer population properties from sample properties.
There are two types of statistical inference:
• Estimation
• Hypothesis Testing
The concepts involved are actually very similar.
A poll may seek to estimate the proportion of adult residents of a city that support a proposition to build a new sports stadium. Out of a random sample of 200 people, 106 say they support the proposition. Thus in the sample, 0.53 of the people supported the proposition. This value of 0.53 is called a point estimate of the population proportion. It is called a point estimate because the estimate consists of a single value or point.
We’re interested in the value of \(\mu\). We collected data and we use the observed \(\bar{x}\) as a point estimate for \(\mu\).
The value we get for \(\bar{X}\) (the sample mean) depends on the specific sample chosen. This means, \(\bar{X}\) is a random variable! The distribution of the random variable \(\bar{X}\) is called the sampling distribution of \(\bar{X}\).
We expect \(\bar{X}\) to be close to \(\mu\) (we ARE using it to estimate \(\mu\)) but there is variability in \(\bar{X}\) before it is observed because we use random sampling to choose our sample of size \(n\).
The sampling distribution of \(\bar{X}\):
• Tells us what kind of values are likely to occur for \(\bar{X}\).
• Puts a probability distribution over the possible values for \(\bar{X}\).
It turns out that the random variable \(\bar{X}\) is approximately normally distributed, no matter what the original distribution was, if \(n\) is large enough. This result is the central limit theorem. What is large enough? A common rule of thumb is \(n \geq 30\).
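A small simulation can make the idea of a sampling distribution concrete. The sketch below is an illustration under assumptions that are not part of the notes: the population is taken to be exponential with mean 1 (a strongly skewed distribution), and the number of repetitions and the seed are arbitrary choices.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)   # fixed seed so the run is reproducible
n = 30                    # sample size (the rule-of-thumb threshold)
reps = 2000               # number of repeated samples

# Each entry is the mean of one random sample of size n from a skewed
# population: exponential with mean 1 and standard deviation 1.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(reps)
]

center = mean(sample_means)    # should be close to the population mean, 1
spread = stdev(sample_means)   # should be close to 1 / sqrt(30), about 0.18

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f}")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal. The spread of the sample means shrinks like \(\sigma / \sqrt{n}\) as the sample size grows.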
Complete Project 4 on Canvas.
The purpose of hypothesis testing is to determine whether there is enough statistical evidence in favor of a certain belief, or hypothesis, about a parameter.
Is there statistical evidence, from a random sample of potential customers, to support the hypothesis that more than 10% of the potential customers will purchase a new product?
The manager of a department store is interested in the cost effectiveness of establishing a new billing system for the store’s credit customers. After a careful analysis, she determines that the new system is justified only if the mean monthly account size is more than $170. The manager wishes to find out if there is sufficient statistical support for this.
The manager takes a random sample of 400 monthly accounts. The sample mean turns out to be $178. Historical data indicate that the standard deviation of monthly accounts is about $65.
Observe that what we are trying to find out is whether or not there is sufficient support for the hypothesis that the mean monthly account size is “more than $170.” The standard procedure is then to let \(\mu > 170\) be the alternative hypothesis. For this reason, the alternative hypothesis is also often referred to as the research hypothesis.
It follows that the null hypothesis should be defined as \(\mu = 170\). Note that we do not use \(\mu \leq 170\) as the null hypothesis; this is because the null hypothesis must be precise enough for us to determine a “unique” sampling distribution. The choice \(\mu = 170\) is also the boundary case: among all values \(\mu \leq 170\), it is the one under which a large sample mean is most likely, so a test that controls the probability of wrongly rejecting \(H_0\) at \(\mu = 170\) controls it for every smaller \(\mu\) as well.
Thus,
\(H_0 : \mu = 170\)
\(H_1 : \mu > 170\)
where \(H_1\) is what we want to determine and \(H_0\) specifies a single value for the parameter of interest.
Clearly, if the sample mean is “large” relative to 170, i.e., if
\(\bar{X} > \bar{X_L}\), for a suitably chosen control limit \(\bar{X_L}\), then we should reject the null hypothesis in favor of the alternative. Pictorially, this means that for a given \(\alpha\), we wish to find \(\bar{X_L}\), such that:
Fig. 14.1
Formally, from the central limit theorem, we know that if our null hypothesis is true, i.e., if \(\mu = 170\), and we set
\(\bar{X_L} = 170 + z_{\alpha}\dfrac{\sigma}{\sqrt{n}}\),
then
\(P(\bar{X} > \bar{X_L}) = \alpha\) and the rejection region is the interval
\((\bar{X_L}, \infty)\).
For \(\alpha = 0.05\), we have
\(\bar{X_L}=170 + 1.645 \cdot \dfrac{65}{\sqrt{400}} \approx 175.35.\)
Since the observed sample mean \(\bar{X} = 178\) is greater than 175.35, we reject the null hypothesis in favor of the research hypothesis (which is what we are investigating). In other words, statistical evidence suggests that the installation of the new billing system will be cost effective.
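The test above can be reproduced numerically. The following Python sketch uses the numbers from the billing example (\(\mu_0 = 170\), \(\sigma = 65\), \(n = 400\), \(\bar{X} = 178\), \(\alpha = 0.05\)); the variable names are illustrative. The z statistic and p-value at the end are not part of the worked solution. They are included as an equivalent way of reaching the same decision, assuming the normal approximation holds.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 170, 65, 400    # null value, historical sd, sample size
xbar, alpha = 178, 0.05         # observed sample mean, significance level

se = sigma / sqrt(n)                          # standard error of the sample mean
z_alpha = NormalDist().inv_cdf(1 - alpha)     # upper-tail critical z, about 1.645
critical = mu0 + z_alpha * se                 # control limit for the sample mean

reject = xbar > critical                      # decision rule: reject H0 if True

# Equivalent view: standardize the sample mean and compute the upper-tail p-value.
z_stat = (xbar - mu0) / se
p_value = 1 - NormalDist().cdf(z_stat)

print(f"control limit = {critical:.2f}, reject H0: {reject}")
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
```

Comparing the sample mean with the control limit and comparing the p-value with \(\alpha\) always lead to the same decision, since both express how far \(\bar{X}\) lies in the upper tail of its sampling distribution under \(H_0\).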