STAT 488 Sampling HW 1

Question One

Problem 1.

Target Population: PARADE Readers

Sampling Frame: List of individuals who received and viewed page 5 of the June 12th Issue

Sampling Unit: Recipients of magazine

Observation Unit: Readers who dialed in

Selection Bias/Inaccuracy of Responses:

A large part of the target population may not have received the current issue or, if they did, may not have read it; people often receive magazines they never read. This leaves out a large part of the population. Additionally, readers had to call in (opt in) to the survey AND pay a fee, which may have discouraged many readers, so only wealthier readers or readers who care more about the subject may be represented. The article claims that 75% of the readers who took part in the survey had this opinion, but many individuals chose not to opt in because they did not feel as strongly about the subject or did not wish to pay. Lastly, the survey lasted only 4 days, so some readers may not have had the opportunity to read and respond to it.

Problem 5.

Target Population: Homeless People who are mentally ill

Sampling Frame: List of homeless people who received medical attention from one of the clinics in the HCH project

Sampling Unit: Healthcare for the Homeless Clinic

Observation Unit: Homeless people

Selection Bias/Inaccuracy of Responses:

The one clinic sampled from the Health Care for the Homeless project might not be representative of all homeless people. While the sample may be demographically representative of the homeless population near that city, it does not account for other cities. This particular clinic may also have a higher or lower rate of mental illness than other clinics.

Problem 7.

Target Population: All cows

Sampling Frame: List of all farms

Sampling Unit: Farm

Observation Unit: Cow

Selection Bias/Inaccuracy of Responses:

The sample of 50 farms, while randomized, may still be too small to represent all cows. Also, a cow's weight may depend on its environment: if there is a feed shortage in a certain region and a farm from that region is selected, it may skew the data.

Problem 18.

Target Population: Ann Landers' Column Readers

Sampling Frame: List of people who read this column

Sampling Unit: Readers

Observation Unit: Readers

Selection Bias/Inaccuracy of Responses:

The question is not worded as well as it could be: “Do it over again” does not clearly explain the intent of the question. Additionally, only individuals with strong opinions may answer this question. Finally, Ann Landers' column readers are the target population she is trying to reach, but the responses are skewed towards women, and it is unclear what the sex makeup of her readership is.

Problem 22.

Target Population: All Women

Sampling Frame: List of individuals who received the poll by the Harris Organization for Virginia Slims

Sampling Unit: Poll takers

Observation Unit: women

Selection Bias/Inaccuracy of Responses:

The question is framed as a statement instead of a question, making it difficult and confusing to respond to, which may deter poll takers. Additionally, a poll given by a cigarette company that markets to women does not capture all women, as there are many women who do not smoke. There may be men in the poll as well, and the ratio of men to women among the respondents is unclear.

Question Two

Not completed: could not find the article to read.

Question Three

https://www.businesswire.com/news/home/20180319005643/en/New-Survey-Finds-95-Percent-Shoppers-Left

HRC Retail Advisory’s survey was conducted online February 20 – March 7, 2018. The total sample size was 2,903 U.S. and Canadian consumers ages 10–73 (with those ages 10–17 recruited to participate through their parents).

Target Population: Consumers

Sampled Population: List of individuals who received the online survey

Conclusions: 95% of consumers want to be left alone while shopping unless they need a store associate’s help, according to a new consumer survey by HRC Retail Advisory. 85% of consumers surveyed want to be able to check prices at price scanners throughout a store rather than having to ask a sales associate for pricing information. Further, 69% of shoppers said that being able to order a technology product online and then pick it up in store is important (likely where they can see it and test it before buying), with a similar 65% saying it is important for apparel. Nearly 70% of Generation Z and 63% of Millennial respondents are turning to social media to share pictures and gather opinions from their friends and family before they buy, particularly in apparel. Free in-store Wi-Fi was ranked as important by 30% of respondents overall, and the rate was higher among younger generations, who tend to seek opinions from their social networks and share photos via social media when they shop.

Conclusions Reasonable/Unreasonable: The conclusions they have provided seem plausible, assuming the survey was conducted to minimize bias and had a large enough response rate. With increasing trends in technology, it seems logical that a large number of consumers would prefer to check a price at a price scanner rather than ask a sales associate. However, that is my opinion as a young woman who is part of the Millennial generation.

Sources of Bias: The survey results could be biased in multiple ways. First, they surveyed a wide range of individuals, but it is unclear whether the sample was distributed evenly across age groups or whether the survey was weighted correctly. Millennials and Generation Z may have stronger opinions on technology because they have grown up with more and more of it. Second, the survey was conducted online. It is unclear whether it was distributed by email or was an “opt-in” survey, which may result in skewed data. We would need more information to judge whether this survey was conducted properly.

Question Four

Part A.

kcdata <- c(98,102,154,133,190,175)
y_U <- sum(kcdata)/6
y_U

## [1] 142

Part B. Data/Pre-Work

y1 <- 98
y2 <- 102
y3 <- 154
y4 <- 133
y5 <- 190
y6 <- 175

#Plan 1 Sampling Distribution

S11ybar <- (y1+y3+y5)/3
S12ybar <- (y1+y3+y6)/3
S13ybar <- (y1+y4+y5)/3
S14ybar <- (y1+y4+y6)/3
S15ybar <- (y2+y3+y5)/3
S16ybar <- (y2+y3+y6)/3
S17ybar <- (y2+y4+y5)/3
S18ybar <- (y2+y4+y6)/3

p <- 0.125

x <- c(S11ybar, S12ybar, S13ybar, S14ybar, S15ybar, S16ybar, S17ybar, S18ybar)
y <- c(p, p, p, p, p, p, p, p)
cbind(x,y)

##             x     y
## [1,] 147.3333 0.125
## [2,] 142.3333 0.125
## [3,] 140.3333 0.125
## [4,] 135.3333 0.125
## [5,] 148.6667 0.125
## [6,] 143.6667 0.125
## [7,] 141.6667 0.125
## [8,] 136.6667 0.125

#Plan 2 Sampling Distribution

S21ybar <- (y1+y4+y6)/3
S22ybar <- (y2+y3+y6)/3
S23ybar <- (y1+y3+y5)/3

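p2 <- 0.25

x2 <- c(S21ybar, S22ybar, S23ybar)
#Note: y2 is reused here for the Plan 2 probabilities, overwriting the data value y2 <- 102.
#The Plan 2 means were already computed above, so the results are unaffected.
y2 <- c(p2, 2*p2, p2)
cbind(x2,y2)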

##            x2   y2
## [1,] 135.3333 0.25
## [2,] 143.6667 0.50
## [3,] 147.3333 0.25
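
The eight Plan 1 samples each take one unit from the pairs {1, 2}, {3, 4}, and {5, 6}, so the Plan 1 sampling distribution could also be generated rather than typed by hand. This is only a sketch of that idea; the names `pop`, `plan1_samples`, and `plan1_means` are made up for illustration and are not part of the original work.

```r
# Population values from Part A
pop <- c(98, 102, 154, 133, 190, 175)

# Every Plan 1 sample picks one unit from each pair: {1,2}, {3,4}, {5,6}
plan1_samples <- expand.grid(u1 = 1:2, u2 = 3:4, u3 = 5:6)

# Mean of the three selected values for each of the 8 samples
plan1_means <- apply(plan1_samples, 1, function(idx) mean(pop[idx]))

# Each sample has probability 1/8, so E[y_bar] is the plain average
cbind(plan1_samples, ybar = plan1_means)
mean(plan1_means)
```

The rows come out in a different order than the hand-typed list, but the set of eight means is the same.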

Part B. i.

###Find E[y_bar] = Sum of y_bars*p

#Plan 1
E_Y_bar1<- weighted.mean(x,y)
E_Y_bar1

## [1] 142

#Plan 2
E_Y_bar2<- weighted.mean(x2,y2)
E_Y_bar2

## [1] 142.5

Part B. ii.

###Find V[y_bar] = E[y_bar^2] - (E[y_bar])^2, where E[y_bar^2] = Sum of (y_bars^2)*p

#Plan 1
S1_yb_sq <- x^2
E_Ysq <- weighted.mean(x^2,y)
V1_Y<- E_Ysq -(E_Y_bar1)^2
V1_Y

## [1] 18.94444

#Plan 2
S2_yb_sq <- x2^2
E_Y2sq <- weighted.mean(x2^2,y2)
V2_Y<- E_Y2sq -(E_Y_bar2)^2
V2_Y

## [1] 19.36111

Part B. iii.

###Find Bias. This is simply E[y_bar] - y_bar_U 

###Plan 1 
B1 <- E_Y_bar1-y_U
B1

## [1] 0

###Plan 2 
B2<-E_Y_bar2-y_U
B2

## [1] 0.5

Part B. iv.

###Find MSE. This is V[y_bar] + Bias^2

###Plan 1
MSE_1 <- V1_Y + B1^2
MSE_1

## [1] 18.94444

###Plan 2
MSE_2 <- V2_Y + B2^2
MSE_2

## [1] 19.61111

Part C.

Plan 1 gives an unbiased estimator, and its y_bar also has less variability (variance 18.94 vs. 19.36) and a smaller MSE than Plan 2, which has a bias of 0.5. I would pick Plan 1.
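
The comparison in Part C rests on four numbers per plan. As a rough sketch, they can be collected in one helper. `design_summary` is a hypothetical name, not something from the assignment, and it uses the standard identity MSE = V + Bias^2.

```r
# Hypothetical helper: summarise an estimator's sampling distribution.
# means  = possible values of y_bar
# probs  = probability of each value
# target = population mean being estimated
design_summary <- function(means, probs, target) {
  expected <- sum(means * probs)
  variance <- sum(probs * (means - expected)^2)
  bias <- expected - target
  c(expected = expected, variance = variance,
    bias = bias, mse = variance + bias^2)
}

pop <- c(98, 102, 154, 133, 190, 175)

# Plan 1: eight equally likely samples
plan1_idx <- list(c(1, 3, 5), c(1, 3, 6), c(1, 4, 5), c(1, 4, 6),
                  c(2, 3, 5), c(2, 3, 6), c(2, 4, 5), c(2, 4, 6))
plan1_means <- sapply(plan1_idx, function(i) mean(pop[i]))
plan1 <- design_summary(plan1_means, rep(1 / 8, 8), mean(pop))

# Plan 2: samples {1,4,6}, {2,3,6}, {1,3,5} with probabilities 1/4, 1/2, 1/4
plan2_idx <- list(c(1, 4, 6), c(2, 3, 6), c(1, 3, 5))
plan2_means <- sapply(plan2_idx, function(i) mean(pop[i]))
plan2 <- design_summary(plan2_means, c(0.25, 0.5, 0.25), mean(pop))

rbind(plan1, plan2)
```

Putting both plans in one table makes it easy to see which has the smaller variance and MSE.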

Question 5.

For a Simple Random Sample, every possible sample of the given size must be equally likely, which means each unit has the same probability of being chosen. This example assumes that all books have the same probability of being chosen. That is not true, since bigger or wider books have a much larger probability of being chosen.
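
A small simulation can illustrate the point. The book widths below are invented, and the selection rule (probability proportional to spine width) is only an assumed stand-in for how the books were actually picked, so this is a sketch rather than a model of the real procedure.

```r
set.seed(488)

# Made-up spine widths (cm) for five hypothetical books
widths <- c(A = 1, B = 1, C = 2, D = 4, E = 8)

# Assumed selection rule: a book is hit with probability proportional to its width
prob_width <- widths / sum(widths)

# Draw one book many times under each rule
n_draws <- 10000
draws_width <- sample(names(widths), n_draws, replace = TRUE, prob = prob_width)
draws_equal <- sample(names(widths), n_draws, replace = TRUE)

# Share of draws landing on each book
share <- function(draws) sapply(names(widths), function(b) mean(draws == b))
freq_width <- share(draws_width)
freq_equal <- share(draws_equal)

round(rbind(width_based = freq_width, equal_chance = freq_equal), 3)
```

Under the width-based rule the widest book is chosen far more often than the thinnest, whereas equal-chance selection gives each book roughly the same share.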

Question 6.

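Based on the results below, SRS Design 3 (a sample of size 3,000 from a population of 300,000,000) would be the most precise, as it has the lowest variance. Each value is the factor (1/n)(1 - n/N) that multiplies S^2 in the variance of the sample mean, so the comparison assumes the same S^2 in all three populations.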

V1 <- (1/400)*(1-(400/4000))
V1

## [1] 0.00225

V2 <-(1/30)*(1-(30/300))
V2

## [1] 0.03

V3 <-(1/3000)*(1-(3000/300000000))
V3

## [1] 0.00033333
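
The three calculations above share one formula, so they can be written once. In this sketch, `var_factor` is a hypothetical helper returning (1/n)(1 - n/N), the multiplier of S^2 in the variance of the sample mean under SRS. S^2 is assumed equal across the three populations.

```r
# Hypothetical helper: variance multiplier (1/n)(1 - n/N) for an SRS mean
var_factor <- function(n, N) (1 / n) * (1 - n / N)

designs <- data.frame(
  design = 1:3,
  n = c(400, 30, 3000),
  N = c(4000, 300, 300000000)
)
designs$factor <- var_factor(designs$n, designs$N)
designs

# Index of the most precise design (smallest factor)
best <- which.min(designs$factor)
best
```

Designs 1 and 2 both sample 10% of their populations, yet Design 1 has a much smaller factor, which suggests that the sample size n drives precision far more than the sampling fraction n/N does.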

Question 7. Data/Pre-Work

#"SDaA" package contains the data sets for the Lohr book.
#Install the package if not already installed
if (!"SDaA"%in%installed.packages()[,1]){
install.packages("SDaA")
}
library(SDaA)
data("golfsrs")

Part A.

The weekday greens fees for nine holes of golf are positively (right) skewed. Most courses appear to charge between 0 and 40.

hist(golfsrs$wkday9, main= "Histogram Weekday 9 Holes")

Part B.

The average weekday greens fee for nine holes of golf is 20.1533. The SE is 1.629619.

y_bar_golf <- mean(golfsrs$wkday9)
y_bar_golf

## [1] 20.15333

golfsrs$wkday9

##   [1]  25.00  24.00  10.00  37.00  10.00  12.00   8.00  40.00   5.00  23.50
##  [11]  40.00  12.00  35.00  12.00  10.00  20.00   5.00  15.00   9.00  20.00
##  [21]  10.00  18.00  20.00  10.25  18.00   7.00  10.00   8.00  10.00   9.00
##  [31]  10.00  30.00  75.00   6.00  50.00  11.50  40.00   6.00  15.00   6.00
##  [41]   8.50   3.25  14.00 101.00  22.50   9.50   8.00  30.00   7.00  15.00
##  [51]  15.00   9.60  50.00  30.00  20.00   8.50   3.00  16.00  12.00  40.00
##  [61]  12.00   8.00  10.00  12.00  40.00  15.00  35.00  25.00  10.80   9.75
##  [71]  30.00  30.00  18.00  14.00  20.00   7.50  13.00  27.00  75.00  75.00
##  [81]  12.00  28.00   9.00   9.50   7.50  16.00   7.00  22.50 100.00  20.00
##  [91]  35.00  25.00  11.00  20.00   9.00   7.00  12.00  20.00   9.50   9.50
## [101]  30.00  10.00  50.00   8.00  10.00  18.00  12.00  12.00  11.50  25.00
## [111]  10.00   9.00  40.00   7.00  10.00  50.00  40.00   5.25   8.00  11.00

diff_g<-golfsrs$wkday9-y_bar_golf 
diff_g

##   [1]   4.8466667   3.8466667 -10.1533333  16.8466667 -10.1533333
##   [6]  -8.1533333 -12.1533333  19.8466667 -15.1533333   3.3466667
##  [11]  19.8466667  -8.1533333  14.8466667  -8.1533333 -10.1533333
##  [16]  -0.1533333 -15.1533333  -5.1533333 -11.1533333  -0.1533333
##  [21] -10.1533333  -2.1533333  -0.1533333  -9.9033333  -2.1533333
##  [26] -13.1533333 -10.1533333 -12.1533333 -10.1533333 -11.1533333
##  [31] -10.1533333   9.8466667  54.8466667 -14.1533333  29.8466667
##  [36]  -8.6533333  19.8466667 -14.1533333  -5.1533333 -14.1533333
##  [41] -11.6533333 -16.9033333  -6.1533333  80.8466667   2.3466667
##  [46] -10.6533333 -12.1533333   9.8466667 -13.1533333  -5.1533333
##  [51]  -5.1533333 -10.5533333  29.8466667   9.8466667  -0.1533333
##  [56] -11.6533333 -17.1533333  -4.1533333  -8.1533333  19.8466667
##  [61]  -8.1533333 -12.1533333 -10.1533333  -8.1533333  19.8466667
##  [66]  -5.1533333  14.8466667   4.8466667  -9.3533333 -10.4033333
##  [71]   9.8466667   9.8466667  -2.1533333  -6.1533333  -0.1533333
##  [76] -12.6533333  -7.1533333   6.8466667  54.8466667  54.8466667
##  [81]  -8.1533333   7.8466667 -11.1533333 -10.6533333 -12.6533333
##  [86]  -4.1533333 -13.1533333   2.3466667  79.8466667  -0.1533333
##  [91]  14.8466667   4.8466667  -9.1533333  -0.1533333 -11.1533333
##  [96] -13.1533333  -8.1533333  -0.1533333 -10.6533333 -10.6533333
## [101]   9.8466667 -10.1533333  29.8466667 -12.1533333 -10.1533333
## [106]  -2.1533333  -8.1533333  -8.1533333  -8.6533333   4.8466667
## [111] -10.1533333 -11.1533333  19.8466667 -13.1533333 -10.1533333
## [116]  29.8466667  19.8466667 -14.9033333 -12.1533333  -9.1533333

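diff_sq<-(diff_g)^2
sum_diff <- sum(diff_sq)
#Note: dividing by n = 120 gives the population-style SD; sd() would divide by n - 1
asd <- sum_diff/120
sqrt(asd)

## [1] 17.85158

#SE of the mean, with no finite population correction applied
sqrt(asd)/(sqrt(120))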

## [1] 1.629619
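
For comparison, here is a sketch of the SE using the sample standard deviation (divisor n - 1) and an optional finite population correction. `srs_se` is a hypothetical helper, and the population size `N_assumed` is a made-up placeholder, since the number of courses in the population is not stated in this write-up. The fees are copied from the printed `golfsrs$wkday9` output above.

```r
# Weekday nine-hole fees, copied from the printed golfsrs$wkday9 output
fees <- c(
  25, 24, 10, 37, 10, 12, 8, 40, 5, 23.5,
  40, 12, 35, 12, 10, 20, 5, 15, 9, 20,
  10, 18, 20, 10.25, 18, 7, 10, 8, 10, 9,
  10, 30, 75, 6, 50, 11.5, 40, 6, 15, 6,
  8.5, 3.25, 14, 101, 22.5, 9.5, 8, 30, 7, 15,
  15, 9.6, 50, 30, 20, 8.5, 3, 16, 12, 40,
  12, 8, 10, 12, 40, 15, 35, 25, 10.8, 9.75,
  30, 30, 18, 14, 20, 7.5, 13, 27, 75, 75,
  12, 28, 9, 9.5, 7.5, 16, 7, 22.5, 100, 20,
  35, 25, 11, 20, 9, 7, 12, 20, 9.5, 9.5,
  30, 10, 50, 8, 10, 18, 12, 12, 11.5, 25,
  10, 9, 40, 7, 10, 50, 40, 5.25, 8, 11
)

# Hypothetical helper: SE of the sample mean under SRS.
# Uses sd(), which divides by n - 1, and the finite population
# correction sqrt(1 - n/N); N = Inf means no correction.
srs_se <- function(y, N = Inf) {
  n <- length(y)
  sqrt(1 - n / N) * sd(y) / sqrt(n)
}

N_assumed <- 15000  # placeholder population size, for illustration only

mean(fees)
srs_se(fees)                 # divisor n - 1, no correction
srs_se(fees, N = N_assumed)  # with the (assumed) correction
```

Because the sample is a small fraction of any plausible population of courses, the correction changes the SE only slightly, and the n - 1 divisor makes it marginally larger than the value computed with divisor n.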

Question 8

The probability that an SRS of size 300 would have no missing data is 0.1416.

library(gmp)

## 
## Attaching package: 'gmp'

## The following objects are masked from 'package:base':
## 
##     %*%, apply, crossprod, matrix, tcrossprod

chooseZ(3059,300)/chooseZ(3078,300)

## Big Rational ('bigq') :
## [1] 71016485063544213774143698593961946003/501379873065078909152598946182976157664

7.1016485063544213774143698593961946003/50.1379873065078909152598946182976157664

## [1] 0.1416421
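
The same probability can be obtained in base R without big-integer arithmetic. This sketch assumes, as the chooseZ call above implies, a population of 3078 units of which 19 (3078 - 3059) have missing data, and treats "no missing data in the sample" as drawing none of those 19 in an SRS of 300. The variable names are made up for illustration.

```r
N_total   <- 3078   # population size, from the chooseZ call
n_missing <- 19     # 3078 - 3059 units with missing data
n_sample  <- 300    # SRS size

# Hypergeometric probability of drawing 0 of the units with missing data
p_hyper <- dhyper(0, m = n_missing, n = N_total - n_missing, k = n_sample)

# Same ratio of binomial coefficients, computed on the log scale
p_log <- exp(lchoose(N_total - n_missing, n_sample) - lchoose(N_total, n_sample))

c(hypergeometric = p_hyper, log_choose = p_log)
```

Both routes avoid the manual rescaling of the big rational and should agree with the value above to the printed precision.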

Kajal Chokshi

9/3/2018
