Target Population: PARADE Readers
Sampling Frame: List of individuals who received and viewed page 5 of the June 12th Issue
Sampling Unit: Recipients of magazine
Observation Unit: Readers who dialed in
A large part of the target population may not have received the current issue, or may have received it without reading it; people often receive magazines they never read. This excludes a large share of the readership. Additionally, readers had to call in (opt in) to the survey AND pay a fee, which may have discouraged many of them, so only wealthier readers or readers who care more about the subject may be represented. The article claims that 75% of the readers who took part in the survey held this opinion, but many individuals chose not to opt in because they did not feel as strongly about the subject or did not wish to pay. Lastly, the survey ran for only 4 days, so some readers may not have had the chance to read the survey and respond in time.
Target Population: Homeless People who are mentally ill
Sampling Frame: List of homeless people who received medical attention from one of the clinics in the HCH project
Sampling Unit: Healthcare for the Homeless Clinic
Observation Unit: Homeless people
The one clinic sampled from the Health Care for the Homeless project might not yield a representative sample of all homeless people. While the sample may be demographically representative of the homeless population near that city, it does not account for other cities. This particular clinic may also have a higher or lower rate of mental illness than other clinics.
Target Population: All cows
Sampling Frame: List of all farms
Sampling Unit: Farm
Observation Unit: Cow
The sample of 50 farms, while randomized, may still be too small to represent all cows. Also, a cow's weight may depend on its environment. If there is a feed shortage in a certain region and a farm from that region is selected, it may skew the data.
Target Population: Ann Landers' Column Readers
Sampling Frame: List of people who read this column
Sampling Unit: Readers
Observation Unit: Readers
The question is not worded as well as it could be: “Do it over again” does not clearly convey the intent of the question. Additionally, only individuals with strong opinions may respond. Finally, Ann Landers' column readers are the target population she is trying to reach, but the responses are skewed towards women, and it is unclear what the sex makeup of her readership is.
Target Population: All Women
Sampling Frame: List of individuals who received the poll by the Harris Organization for Virginia Slims
Sampling Unit: Poll takers
Observation Unit: women
The question is framed as a statement instead of a question, making it difficult and confusing to respond to, which may deter poll takers. Additionally, a poll given by a cigarette company that markets to women may not capture all women, as many women do not smoke. There may be men in the poll as well, and it is unclear what the ratio of men to women is among the respondents.
https://www.businesswire.com/news/home/20180319005643/en/New-Survey-Finds-95-Percent-Shoppers-Left
HRC Retail Advisory’s survey was conducted online February 20 – March 7, 2018. The total sample size was 2,903 U.S. and Canadian consumers ages 10–73 (with those ages 10–17 recruited to participate through their parents).
Target Population: Consumers
Sampled Population: List of individuals who received the online survey
Conclusions: 95% of consumers want to be left alone while shopping unless they need a store associate’s help, according to a new consumer survey by HRC Retail Advisory. 85% of consumers surveyed want to be able to check prices at price scanners throughout a store rather than having to ask a sales associate for pricing information. Further, 69% of shoppers said that being able to order a technology product online and then pick it up in store is important (likely where they can see it and test it before buying), with a similar 65% saying it is important for apparel. Nearly 70% of Generation Z and 63% of Millennial respondents are turning to social media to share pictures and gather opinions from their friends and family before they buy, particularly in apparel. Free in-store Wi-Fi was ranked as important by 30% of respondents overall, and the rate was higher among younger generations, who tend to seek opinions from their social networks and share photos via social media when they shop.
Conclusions Reasonable/Unreasonable: The conclusions they have provided seem plausible, assuming the survey was conducted so as to minimize bias and had a large enough response rate. With increasing trends in technology, it seems logical that a large share of consumers would prefer to check a price at a price scanner rather than ask a sales associate. However, that is my opinion as a young woman who is part of the Millennial generation.
Sources of Bias: The survey results could be biased in multiple ways. First, they surveyed a wide range of individuals, but it is unclear whether the sample was distributed evenly across age groups or whether the survey was weighted correctly. Millennials and Generation Z may have stronger opinions on technology because they grew up with more and more of it. Secondly, the survey was conducted online. It is unclear whether it was distributed by email or was an “opt-in” survey, which may result in skewed data. We would need more information to judge whether this survey was conducted soundly.
kcdata <- c(98,102,154,133,190,175)
y_U <- sum(kcdata)/6
y_U
## [1] 142
y1 <- 98
y2 <- 102
y3 <- 154
y4 <- 133
y5 <- 190
y6 <- 175
#Plan 1 Sampling Distribution
S11ybar <- (y1+y3+y5)/3
S12ybar <- (y1+y3+y6)/3
S13ybar <- (y1+y4+y5)/3
S14ybar <- (y1+y4+y6)/3
S15ybar <- (y2+y3+y5)/3
S16ybar <- (y2+y3+y6)/3
S17ybar <- (y2+y4+y5)/3
S18ybar <- (y2+y4+y6)/3
p <- 0.125
x <- c(S11ybar, S12ybar, S13ybar, S14ybar, S15ybar, S16ybar, S17ybar, S18ybar)
y <- c(p, p, p, p, p, p, p, p)
cbind(x,y)
## x y
## [1,] 147.3333 0.125
## [2,] 142.3333 0.125
## [3,] 140.3333 0.125
## [4,] 135.3333 0.125
## [5,] 148.6667 0.125
## [6,] 143.6667 0.125
## [7,] 141.6667 0.125
## [8,] 136.6667 0.125
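The eight Plan 1 samples listed above can also be enumerated programmatically. The sketch below assumes, based on the pattern of the listed samples, that Plan 1 draws one unit from each of the pairs (1,2), (3,4) and (5,6); the names kc, plan1_samples, plan1_ybar and plan1_prob are introduced here for illustration only.

```r
# Population values, as given in kcdata
kc <- c(98, 102, 154, 133, 190, 175)

# Assumption: Plan 1 takes one unit from each pair (1,2), (3,4), (5,6)
plan1_samples <- expand.grid(u1 = 1:2, u2 = 3:4, u3 = 5:6)

# Mean of y for each of the 8 possible samples
plan1_ybar <- apply(plan1_samples, 1, function(idx) mean(kc[idx]))

# Under this assumption every sample is equally likely
plan1_prob <- rep(1 / nrow(plan1_samples), nrow(plan1_samples))

# Sampling distribution: sample indices, sample mean, probability
cbind(plan1_samples, ybar = plan1_ybar, prob = plan1_prob)
```

The rows come out in a different order than the hand-written list, but the set of sample means is the same.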
#Plan 2 Sampling Distribution
S21ybar <- (y1+y4+y6)/3
S22ybar <- (y2+y3+y6)/3
S23ybar <- (y1+y3+y5)/3
p2 <- 0.25
x2 <- c(S21ybar, S22ybar, S23ybar)
#Note: this reuses the name y2, overwriting the population value y2 <- 102 defined above
y2 <- c(p2, 2*p2, p2)
cbind(x2,y2)
## x2 y2
## [1,] 135.3333 0.25
## [2,] 143.6667 0.50
## [3,] 147.3333 0.25
###Find E[y_bar] = Sum of y_bars*p
#Plan 1
E_Y_bar1<- weighted.mean(x,y)
E_Y_bar1
## [1] 142
#Plan 2
E_Y_bar2<- weighted.mean(x2,y2)
E_Y_bar2
## [1] 142.5
###Find V[y_bar] = E(y^2) - (E(y))^2, BUT E(y^2) = Sum of (y_bars^2)*p
#Plan 1
S1_yb_sq <- x^2
E_Ysq <- weighted.mean(x^2,y)
V1_Y<- E_Ysq -(E_Y_bar1)^2
V1_Y
## [1] 18.94444
#Plan 2
S2_yb_sq <- x2^2
E_Y2sq <- weighted.mean(x2^2,y2)
V2_Y<- E_Y2sq -(E_Y_bar2)^2
V2_Y
## [1] 19.36111
###Find Bias. This is simply E[y_bar] - y_bar_U
###Plan 1
B1 <- E_Y_bar1-y_U
B1
## [1] 0
###Plan 2
B2<-E_Y_bar2-y_U
B2
## [1] 0.5
###Find MSE. This is simple as well: MSE = V[y_bar] + (Bias)^2
###Plan 1
MSE_1 <- V1_Y + B1^2
MSE_1
## [1] 18.94444
###Plan 2
MSE_2 <- V2_Y + B2^2
MSE_2
## [1] 19.61111
Plan 1 gives an unbiased estimator, and its y_bar also has less variability (variance 18.94 versus 19.36 for Plan 2) and therefore the smaller MSE. I would pick Plan 1.
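As a cross-check on the comparison of the two plans, the expectation, variance, bias and MSE can be computed in one place. This is a minimal sketch, not the original workflow: summarise_plan, plan1_idx and plan2_idx are hypothetical names, and the MSE is taken as variance plus squared bias.

```r
# Summarise a discrete sampling distribution of y_bar.
# ybar: possible sample means; prob: their probabilities; target: population mean
summarise_plan <- function(ybar, prob, target) {
  stopifnot(length(ybar) == length(prob), abs(sum(prob) - 1) < 1e-12)
  expectation <- sum(ybar * prob)
  variance    <- sum(prob * (ybar - expectation)^2)
  bias        <- expectation - target
  c(E = expectation, V = variance, Bias = bias, MSE = variance + bias^2)
}

kc <- c(98, 102, 154, 133, 190, 175)  # population values from kcdata
y_bar_U <- mean(kc)                   # population mean

# Unit indices of each possible sample, copied from the two plans
plan1_idx <- list(c(1, 3, 5), c(1, 3, 6), c(1, 4, 5), c(1, 4, 6),
                  c(2, 3, 5), c(2, 3, 6), c(2, 4, 5), c(2, 4, 6))
plan2_idx <- list(c(1, 4, 6), c(2, 3, 6), c(1, 3, 5))

plan1 <- summarise_plan(sapply(plan1_idx, function(i) mean(kc[i])),
                        rep(1 / 8, 8), y_bar_U)
plan2 <- summarise_plan(sapply(plan2_idx, function(i) mean(kc[i])),
                        c(0.25, 0.5, 0.25), y_bar_U)

rbind(plan1, plan2)
```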
In order to be a simple random sample, every possible sample of the given size must have the same probability of being chosen, which in particular means each unit has the same probability of selection. This example assumes that all books have the same probability of being chosen. That is not true here, as bigger or wider books have a much larger probability of being chosen.
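Based on the results below, SRS Design 3 (a sample of size 3000 from a population of 300000000) would be the most precise, as it has the lowest variance. The calculations take the population variance to be the same (set to 1) in all three designs, so the comparison depends only on the sample size and the finite population correction.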
V1 <- (1/400)*(1-(400/4000))
V1
## [1] 0.00225
V2 <-(1/30)*(1-(30/300))
V2
## [1] 0.03
V3 <-(1/3000)*(1-(3000/300000000))
V3
## [1] 0.00033333
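The three calculations above follow the same pattern, (S^2/n)(1 - n/N), so they can be wrapped in a small function. This is a sketch under the same assumption as the code above, namely that S^2 is equal across designs and set to 1; srs_var_factor and designs are hypothetical names.

```r
# Variance of the sample mean under SRS without replacement:
# (S2 / n) * (1 - n / N), where S2 is the population variance.
# Assumption: S2 = 1 for all designs, as in the calculations above.
srs_var_factor <- function(n, N, S2 = 1) {
  stopifnot(n > 0, n <= N)
  (S2 / n) * (1 - n / N)
}

designs <- data.frame(n = c(400, 30, 3000), N = c(4000, 300, 3e8))
designs$V <- srs_var_factor(designs$n, designs$N)

# Smallest variance (most precise design) first
designs[order(designs$V), ]
```

Designs 1 and 2 share the same sampling fraction (10%), yet Design 1 is far more precise, which illustrates that the sample size matters much more than the fraction of the population sampled.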
#"SDaA" package contains the data sets for the Lohr book.
#Install the package if not already installed
if (!"SDaA"%in%installed.packages()[,1]){
install.packages("SDaA")
}
library(SDaA)
data("golfsrs")
It looks like the weekday greens fees for nine holes of golf are positively skewed. Most fees appear to fall between 0 and 40.
hist(golfsrs$wkday9, main= "Histogram Weekday 9 Holes")
The average weekday greens fee for nine holes of golf is 20.1533. The SE is 1.629619 (computed below with divisor n = 120 and no finite population correction).
y_bar_golf <- mean(golfsrs$wkday9)
y_bar_golf
## [1] 20.15333
golfsrs$wkday9
## [1] 25.00 24.00 10.00 37.00 10.00 12.00 8.00 40.00 5.00 23.50
## [11] 40.00 12.00 35.00 12.00 10.00 20.00 5.00 15.00 9.00 20.00
## [21] 10.00 18.00 20.00 10.25 18.00 7.00 10.00 8.00 10.00 9.00
## [31] 10.00 30.00 75.00 6.00 50.00 11.50 40.00 6.00 15.00 6.00
## [41] 8.50 3.25 14.00 101.00 22.50 9.50 8.00 30.00 7.00 15.00
## [51] 15.00 9.60 50.00 30.00 20.00 8.50 3.00 16.00 12.00 40.00
## [61] 12.00 8.00 10.00 12.00 40.00 15.00 35.00 25.00 10.80 9.75
## [71] 30.00 30.00 18.00 14.00 20.00 7.50 13.00 27.00 75.00 75.00
## [81] 12.00 28.00 9.00 9.50 7.50 16.00 7.00 22.50 100.00 20.00
## [91] 35.00 25.00 11.00 20.00 9.00 7.00 12.00 20.00 9.50 9.50
## [101] 30.00 10.00 50.00 8.00 10.00 18.00 12.00 12.00 11.50 25.00
## [111] 10.00 9.00 40.00 7.00 10.00 50.00 40.00 5.25 8.00 11.00
diff_g<-golfsrs$wkday9-y_bar_golf
diff_g
## [1] 4.8466667 3.8466667 -10.1533333 16.8466667 -10.1533333
## [6] -8.1533333 -12.1533333 19.8466667 -15.1533333 3.3466667
## [11] 19.8466667 -8.1533333 14.8466667 -8.1533333 -10.1533333
## [16] -0.1533333 -15.1533333 -5.1533333 -11.1533333 -0.1533333
## [21] -10.1533333 -2.1533333 -0.1533333 -9.9033333 -2.1533333
## [26] -13.1533333 -10.1533333 -12.1533333 -10.1533333 -11.1533333
## [31] -10.1533333 9.8466667 54.8466667 -14.1533333 29.8466667
## [36] -8.6533333 19.8466667 -14.1533333 -5.1533333 -14.1533333
## [41] -11.6533333 -16.9033333 -6.1533333 80.8466667 2.3466667
## [46] -10.6533333 -12.1533333 9.8466667 -13.1533333 -5.1533333
## [51] -5.1533333 -10.5533333 29.8466667 9.8466667 -0.1533333
## [56] -11.6533333 -17.1533333 -4.1533333 -8.1533333 19.8466667
## [61] -8.1533333 -12.1533333 -10.1533333 -8.1533333 19.8466667
## [66] -5.1533333 14.8466667 4.8466667 -9.3533333 -10.4033333
## [71] 9.8466667 9.8466667 -2.1533333 -6.1533333 -0.1533333
## [76] -12.6533333 -7.1533333 6.8466667 54.8466667 54.8466667
## [81] -8.1533333 7.8466667 -11.1533333 -10.6533333 -12.6533333
## [86] -4.1533333 -13.1533333 2.3466667 79.8466667 -0.1533333
## [91] 14.8466667 4.8466667 -9.1533333 -0.1533333 -11.1533333
## [96] -13.1533333 -8.1533333 -0.1533333 -10.6533333 -10.6533333
## [101] 9.8466667 -10.1533333 29.8466667 -12.1533333 -10.1533333
## [106] -2.1533333 -8.1533333 -8.1533333 -8.6533333 4.8466667
## [111] -10.1533333 -11.1533333 19.8466667 -13.1533333 -10.1533333
## [116] 29.8466667 19.8466667 -14.9033333 -12.1533333 -9.1533333
diff_sq<-(diff_g)^2
sum_diff <- sum(diff_sq)
asd <- sum_diff/120
sqrt(asd)
## [1] 17.85158
sqrt(asd)/(sqrt(120))
## [1] 1.629619
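The calculation above divides the sum of squared deviations by n = 120. A common alternative uses the sample variance with divisor n - 1 and, when the population size N is known, a finite population correction. The sketch below shows that variant; srs_se is a hypothetical helper, the vector fees holds only the first ten wkday9 values printed above, and N = 1000 is an illustrative value, not the actual number of golf courses.

```r
# Standard error of the mean for an SRS without replacement:
# sqrt((1 - n/N) * s^2 / n), where s^2 uses divisor n - 1.
# N = Inf drops the finite population correction.
srs_se <- function(y, N = Inf) {
  n <- length(y)
  stopifnot(n >= 2, N >= n)
  s2  <- var(y)      # sample variance, divisor n - 1
  fpc <- 1 - n / N   # finite population correction
  sqrt(fpc * s2 / n)
}

# First ten wkday9 values from the printout, for illustration only
fees <- c(25, 24, 10, 37, 10, 12, 8, 40, 5, 23.5)

srs_se(fees)            # no finite population correction
srs_se(fees, N = 1000)  # hypothetical population size
```

With the full data loaded, the call would be srs_se(golfsrs$wkday9, N = ...) with the real population size supplied.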
The probability that an SRS of size 300 would have no missing data is 0.1416.
library(gmp)
##
## Attaching package: 'gmp'
## The following objects are masked from 'package:base':
##
## %*%, apply, crossprod, matrix, tcrossprod
chooseZ(3059,300)/chooseZ(3078,300)
## Big Rational ('bigq') :
## [1] 71016485063544213774143698593961946003/501379873065078909152598946182976157664
7.1016485063544213774143698593961946003/50.1379873065078909152598946182976157664
## [1] 0.1416421
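The same probability can be obtained in base R without big-integer arithmetic. This sketch reads the counts off the chooseZ call above (3078 units in total, 3059 of them complete, so 19 with missing data, and a sample of 300); the variable names are introduced here for illustration.

```r
N_total    <- 3078  # population size, from the chooseZ call
N_complete <- 3059  # units with no missing data
n          <- 300   # sample size

# Ratio of binomial coefficients on the log scale to avoid overflow
p_log <- exp(lchoose(N_complete, n) - lchoose(N_total, n))

# Equivalent product form: the ratio of binomial coefficients simplifies to
# prod over the k incomplete units of (N_total - n - i) / (N_total - i)
k <- N_total - N_complete
i <- 0:(k - 1)
p_prod <- prod((N_total - n - i) / (N_total - i))

c(p_log = p_log, p_prod = p_prod)
```

Both forms should agree with the gmp result to within floating-point error.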