library(tidyverse)
library(openintro)
The variables collected on each launch include year range, outcome, and launching agency.
Launch year is an ordinal categorical variable, since the year is interpreted in the context of two ranges rather than independently. Launch outcome and launching agency are also categorical, but nominal.
The explanatory variable is the launching agency and the response variable is the success rate (successful launches/total launches).
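As a minimal sketch of the success-rate calculation described above, the following R code divides successful launches by total launches for each agency. The agency labels and counts are made-up placeholders chosen only for illustration; they are not the values from the exercise.

```r
# Hypothetical launch counts (placeholders, NOT the exercise's data)
launches <- data.frame(
  agency     = c("Agency A", "Agency B"),
  successful = c(45, 30),
  total      = c(50, 40)
)

# Response variable: success rate = successful launches / total launches
launches$success_rate <- launches$successful / launches$total
launches
```

Comparing `success_rate` across rows is what treating the agency as the explanatory variable amounts to in practice.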
This is an observational study, since no treatment was explicitly applied or withheld; the researchers simply observed the results.
The explanatory variables include screen time, sex, age, mother's education, ethnicity, psychological distress, and employment.
The response variable is the psychological well-being score.
Yes, the data came from “three nationally representative large-scale data sets”.
No, there is no control group to eliminate confounding variables, which makes it impossible to establish causal relationships.
The study will likely provide a representative view of the town as a whole, since it draws 200 households at random from the entire area (a simple random sample). Although the sample should represent the town accurately, traveling to all of the houses would be costly, as they will most likely be spread across different areas of the suburb.
Similar to the first example, this study should represent the town as a whole accurately, though it uses a stratified sampling approach that divides the town into neighborhoods (which likely share common characteristics). One issue with this approach is that the sampling is not random within each neighborhood, which introduces an element of bias. Also, the study intends to visit all 20 neighborhoods, which will be just as costly as visiting all the houses in the first study.
While this example of cluster sampling avoids the previous problem of covering a wide area, it compromises representation, as it looks at only 3 of the town's 20 neighborhoods. Households within a neighborhood tend to be similar, so with 17 neighborhoods left out, the study's results will skew toward the chosen neighborhoods.
This study's multistage cluster sampling gives a broader representation of the town by including 8 neighborhoods and sampling 50 households from each. Sampling more neighborhoods raises expenses, as the study must cover a wider area. This approach effectively combines the 2nd and 3rd sampling techniques and carries over some of the pros and cons of both, just less pronounced.
This example of convenience sampling, while the most cost-effective (it covers a very small area near the city council offices), doesn't represent the town as a whole. It only captures the viewpoints of the neighborhood(s) near the city council offices, which likely differ from those in other areas of town.
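The four probability-based designs discussed above can be sketched in base R as follows. This is only an illustration under stated assumptions: the sampling frame is hypothetical (20 neighborhoods of 100 households each), the stratified design is assumed to draw 10 households at random per neighborhood, and base R is used instead of the tidyverse so the sketch runs without extra packages.

```r
set.seed(1)  # for reproducibility of this illustration only

# Hypothetical sampling frame: 20 neighborhoods with 100 households each
# (the neighborhood sizes are an assumption made for this sketch)
frame <- data.frame(
  household    = 1:2000,
  neighborhood = rep(1:20, each = 100)
)

# (1) Simple random sample: 200 households from the whole town
srs <- frame[sample(nrow(frame), 200), ]

# (2) Stratified sample: 10 random households from every neighborhood
#     (10 per stratum is an assumption)
strat_rows <- unlist(lapply(split(frame$household, frame$neighborhood),
                            sample, size = 10))
stratified <- frame[frame$household %in% strat_rows, ]

# (3) Cluster sample: 3 random neighborhoods, every household in them
chosen3 <- sample(unique(frame$neighborhood), 3)
cluster <- frame[frame$neighborhood %in% chosen3, ]

# (4) Multistage sample: 8 random neighborhoods, then 50 households in each
chosen8 <- sample(unique(frame$neighborhood), 8)
stage2  <- frame[frame$neighborhood %in% chosen8, ]
multi_rows <- unlist(lapply(split(stage2$household, stage2$neighborhood),
                            sample, size = 50))
multistage <- frame[frame$household %in% multi_rows, ]

# Number of neighborhoods each design reaches
sapply(list(srs = srs, stratified = stratified,
            cluster = cluster, multistage = multistage),
       function(d) length(unique(d$neighborhood)))
```

The final `sapply()` call makes the trade-off visible: the cluster design reaches only 3 neighborhoods and the multistage design 8, while the stratified design reaches all 20.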
The population of interest is children between the ages of 5 and 15 and the sample is 160 children between the ages of 5 and 15.
The results of the study cannot be generalized to the population, as the sample is a specific group of individuals (ages 5 to 15) and it wasn't explicitly stated that the study's participants were randomly selected. The findings can, however, be used to establish causal relationships, as the children were assigned to the groups within the study at random.
Percentage of all videos on YouTube that are cat videos is a population parameter.
2% is a sample statistic.
A video in the sample is an observation.
Whether a video is a cat video is a variable.
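To make the four terms concrete, here is a small R sketch. The sample of 50 videos is hypothetical, chosen only so that one cat video out of 50 reproduces the 2% figure; the actual sample size is not given here.

```r
# Hypothetical sample of 50 videos; TRUE marks a cat video
# (the sample size of 50 is an assumption chosen so that 1 of 50 = 2%)
is_cat_video <- c(TRUE, rep(FALSE, 49))

length(is_cat_video)   # number of observations (videos) in the sample
mean(is_cat_video)     # sample statistic: proportion of cat videos
```

Each element of `is_cat_video` is one observation, the vector itself holds the variable, and `mean(is_cat_video)` is the sample statistic that estimates the unknown population parameter.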
\(C = 2 \times \pi \times r\)
\(A = \pi \times r^2\)
rad <- c(0.25, 1.5)
C <- 2*pi*rad
C
## [1] 1.570796 9.424778
The circumferences of circles with radii of 0.25 and 1.5 are 1.57 and 9.42 inches respectively.
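The area formula \(A = \pi \times r^2\) can be applied to the same radii in the same way. The following is a short sketch; the variable name `A` is chosen here to match the formula.

```r
rad <- c(0.25, 1.5)   # same radii as above, in inches
A <- pi * rad^2       # area of each circle, in square inches
round(A, 2)
```

Rounded to two decimals, the areas are about 0.20 and 7.07 square inches respectively.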
Example code showing how you can include equations. You will need to replace this with the correct formulas. Can also use % to get a percent sign in an answer. Note that a single $ is used for an inline equation. A double $$ is used if you want the equation centered on its own line.
\(xy\) \(x^2\) \[A = 1/2 \times b \times h\]
# We are providing the code for reading in the data and converting
# Sex and MaritalStat to type factor
PDat <- read_csv("Patient_Data.csv")
## Rows: 105 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Sex, MaritalStat
## dbl (5): ID, Age, Weight, TotChol, SystolicP
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Create a factor variable for Sex; the levels will be in
# alphabetical order
PDat$Sex <- factor(PDat$Sex)
# Create a factor variable with a specified order to the levels
# of MaritalStat. Marital status is not an ordinal variable,
# but you can still set the order of the levels to control
# the order they are printed in output
PDat$MaritalStat <- factor(PDat$MaritalStat,
                           levels = c("S", "M", "D", "W"))
##############
# The code on lines 77 - 79 demonstrates the use of functions
# you will need for parts a) - c). Copy/paste and edit the code
# as needed for each part of the problem. After you figure out
# the code you need for each part, comment out or remove these
# lines so they do not clutter up your assignment with extra output.
PDat %>% count(MaritalStat)
## # A tibble: 4 × 2
## MaritalStat n
## <fct> <int>
## 1 S 30
## 2 M 34
## 3 D 25
## 4 W 16
PDat.Married <- PDat %>% filter(MaritalStat == "M")
percent_married <- nrow(PDat.Married) / nrow(PDat)
#Percent Married
round(percent_married, 2)
## [1] 0.32
glimpse(PDat.Married)
## Rows: 34
## Columns: 7
## $ ID <dbl> 8, 12, 16, 17, 19, 22, 24, 25, 29, 30, 31, 34, 35, 38, 40,…
## $ Sex <fct> F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, F, M, F, F, F…
## $ MaritalStat <fct> M, M, M, M, M, M, M, M, M, M, M, M, M, M, M, M, M, M, M, M…
## $ Age <dbl> 56, 36, 32, 30, 34, 61, 40, 55, 29, 41, 38, 53, 38, 46, 56…
## $ Weight <dbl> 118, 121, 127, 130, 130, 133, 135, 135, 138, 139, 139, 144…
## $ TotChol <dbl> 233, 233, 193, 198, 219, 240, 225, 193, 206, 230, 234, 230…
## $ SystolicP <dbl> 111, 138, 124, 115, 130, 107, 150, 115, 131, 112, 113, 116…
#Widowed Patients (Part 2)
PDat.Widowed <- PDat %>% filter(MaritalStat == "W")
glimpse(PDat.Widowed)
## Rows: 16
## Columns: 7
## $ ID <dbl> 6, 10, 13, 15, 20, 28, 36, 37, 39, 45, 58, 65, 83, 87, 91,…
## $ Sex <fct> F, F, F, F, F, F, F, F, F, F, M, M, M, M, M, M
## $ MaritalStat <fct> W, W, W, W, W, W, W, W, W, W, W, W, W, W, W, W
## $ Age <dbl> 40, 53, 63, 58, 37, 21, 70, 62, 39, 79, 61, 54, 30, 33, 51…
## $ Weight <dbl> 113, 119, 124, 125, 131, 137, 150, 151, 151, 162, 178, 186…
## $ TotChol <dbl> 194, 280, 184, 246, 180, 211, 232, 235, 191, 189, 181, 197…
## $ SystolicP <dbl> 138, 120, 149, 137, 135, 131, 119, 129, 139, 125, 121, 124…
#Percent widowed female
percentWidowedFemale <- nrow(PDat.Widowed %>% filter(Sex == "F")) / nrow(PDat.Widowed)
round(percentWidowedFemale, 2)
## [1] 0.62
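As an alternative to filtering one group at a time, the proportion in every marital-status group can be computed in one step. The sketch below is base R only and rebuilds just the `MaritalStat` column from the counts printed above (S = 30, M = 34, D = 25, W = 16), so it runs without `Patient_Data.csv`. The names `marital` and `props` are introduced here for illustration.

```r
# Rebuild only the MaritalStat column from the counts printed above
# (S = 30, M = 34, D = 25, W = 16); the real data live in Patient_Data.csv
marital <- factor(rep(c("S", "M", "D", "W"), times = c(30, 34, 25, 16)),
                  levels = c("S", "M", "D", "W"))

# Proportion of patients in each marital-status group, in level order
props <- prop.table(table(marital))
round(props, 2)
```

The entry for `M` agrees with the 0.32 computed above, and the groups print in the order set by `levels`, not alphabetically.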
This would be classified as an experiment, as the researchers proposed several alternative names and asked participants to select one of them.
Individuals connected to schizophrenia (patients, family members, mental health providers, researchers, and government officials).
The researchers sought out those with connections to schizophrenia online, in person, and by word of mouth. The recruiters used mental health facilities, conferences, and social media outlets to gain participants. The sample size was about 1,200.
When first asked, 69% of participants favored a new name in principle; after hearing the proposed new names, that figure rose to 74%.
Two limitations of the study are a relatively small amount of minority participation and a preponderance of female respondents.
The inclusion of limitations ensures the study is not a misrepresentation of the target population. It ensures readers understand possible flaws in the study and how those flaws may impact results and conclusions.