HW2 – Statistics C155

Problem 1

A sample of 8 architects was chosen in a city with 14 architects and architectural firms in order to estimate the average annual income of architects. To select a survey sample, each architect was contacted by telephone in order of appearance in the telephone directory. The first 8 agreeing to be interviewed formed the sample.

  1. What is the population?

def. A population is a collection of elements about which we wish to make an inference.

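[sol] Therefore, the population is: all architects in the city (the 14 architects and architectural firms), since these are the elements whose average annual income we want to estimate.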

  2. Do you agree with this method of selecting the sample?

def. Survey sampling in statistics is the process of selecting a sample of elements from a target population in order to conduct a survey.

[sol] No, I do not. This is considered a convenience sample.

There are several key issues :

  • Non-random : “contacted by telephone in order of appearance”, i.e., the sample was not generated randomly. Random selection is an underlying assumption of just about any statistical test, and without it the estimate can be biased.

  • Self-selection : the first 8 to agree might be Type-A personality types who also earn more because they work harder, or they might differ in some other way we cannot observe. We don’t know, which is exactly why random selection is required.

def. A convenience sample is a nonprobability sample in which elements are selected because they are easy to access, readily available, or willing to participate, rather than through random selection.
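
For contrast, a probability sample would give every architect the same chance of selection. A minimal sketch in Python, assuming the 14 architects are simply labeled 1 through 14 (the labels, the seed, and the variable names are illustrative and not part of the problem):

```python
import random

# Hypothetical sampling frame: the 14 architects in the city, labeled 1..14.
frame = list(range(1, 15))

# Fixed seed only so the draw is reproducible; any seed would do.
rng = random.Random(155)

# Simple random sample of 8 without replacement: every subset of size 8
# is equally likely, unlike "the first 8 who agree" in directory order.
srs = sorted(rng.sample(frame, 8))
print(srs)
```

Random selection removes the directory-order effect, but architects who decline would still need follow-up, since nonresponse can bias the estimate in either direction.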

  3. What concerns do you have about the adequacy of this sample to estimate annual income?

[sol] The main concern is that the sample may produce a biased estimate of the population mean income. Because participation depended on willingness to respond, the resulting data may systematically over- or under-represent certain types of architects (e.g., those with more time, interest, or incentives to respond). This introduces nonsampling error, specifically nonresponse and selection bias, which threatens the adequacy of the sample for estimating the population parameter (average annual income), even though the sample size is large relative to the population.


Problem 2

Read the following article that describes a proposal for using sampling in the year 2000 U.S. census: Roush W. A census in which all Americans count. Science 1996; 274: 713-714. What are the main arguments for using sampling in 2000? Against? What do you think?

[sol] The U.S. census became highly politicized because undercounts disproportionately affected urban, minority, and low-income populations, and correcting them tended to benefit Democratic representation. For : the traditional door-to-door census systematically misses hard-to-reach populations (non-response bias), so Democrats supported sampling to improve accuracy and equity. Against : Republicans largely opposed statistical sampling, arguing it was unconstitutional and manipulable. This culminated in major legal and congressional battles over whether sampling could be used for apportionment in the 2000 Census. The article was published during the Clinton administration, a period marked by debates over government efficiency, civil rights enforcement, and the use of statistical methods in public policy. Statistics as a whole was put under question.

Personal thoughts : Don’t trust politicians to think with their brains. They are regrettably bad at that. For some sick reason we are too busy burning things down to realize what’s around us : each other.


Problem 3

Every year, the University of California surveys graduating undergrads for their views. Review link (https://www.universityofcalifornia.edu/about-us/information-center/university-california-undergraduate-experience-survey-ucues-data-tables-2024)

  • What is the source population for the undergraduate survey? Who is the sample?

def. A source population is the set of elements from which the survey sample is actually drawn, as determined by the sampling frame and the practical constraints of the data-collection process.

[sol]

Population : Graduating undergraduates at the nine University of California undergraduate campuses

Source population : Graduating undergraduates in that year who were eligible and reachable via UC administrative records and therefore could be invited (Source of coverage error)

Sample : Students who responded to the UCUES questionnaire in a given survey cycle (biennial survey)

  • What is the difference between what they call a response rate and a completion rate? In reports from the office that conducts this survey, they emphasize completion rates. Why?

def. A Response Rate is the proportion of sampled elements that provide usable data, calculated as the number of responding units divided by the number of eligible units in the sample, i.e., the percent responding to the survey.

def. A Completion Rate is the proportion of respondents who start a survey and complete it sufficiently to provide usable data, typically calculated as the number of completed interviews divided by the number of respondents who began the survey, i.e., the percent of students in the population who responded to at least one survey question and clicked “submit”.

Reasoning : A partially completed survey doesn’t tell us as much as a fully completed one, so the completion rate indicates how much of the collected data is actually usable. Unfortunately, the characteristics of those who complete the survey likely differ from those who do not.
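
To make the two rates concrete, here is a minimal sketch in Python. The counts are hypothetical and are not UCUES figures, and treating "started the survey" as "responded" is my simplifying assumption:

```python
def response_rate(n_responded: int, n_eligible: int) -> float:
    """Share of eligible sampled units that responded."""
    return n_responded / n_eligible


def completion_rate(n_completed: int, n_started: int) -> float:
    """Share of those who began the survey that finished it."""
    return n_completed / n_started


# Hypothetical counts, for illustration only (not UCUES data).
invited, started, completed = 1000, 400, 300

rr = response_rate(started, invited)
cr = completion_rate(completed, started)
print(f"response rate   = {rr:.0%}")
print(f"completion rate = {cr:.0%}")
```

With these made-up counts the two rates differ a lot (40% versus 75%), which is why it matters which one a report chooses to emphasize.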

  • In your opinion, will the sample estimates reasonably estimate the population values? Explain why.

Recall :
Population : Graduating undergraduates at the nine University of California undergraduate campuses

And,

Source population : Graduating undergraduates in that year who were eligible and reachable via UC administrative records and therefore could be invited (Source of coverage error)

So, take into consideration that the survey is run every 2 years, meaning we miss information every other year. Further, some students transfer in from community college, so we have an even lower chance of capturing those students’ information. Take a look numerically :

    yr     class
1 2021  Freshman
2 2022 Sophomore
3 2023    Junior
4 2024    Senior

Notice that if we take every other year, we can only observe them in 2 ways :

    yr    class
1 2021 Freshman
3 2023   Junior

    yr     class
2 2022 Sophomore
4 2024    Senior

And across time, the behavior of students might change. We can hope juniors resemble seniors, but I find that unlikely given the immediate pressure and typical work habits of students (doing things last minute: Mon, Jan 19 10:03pm).

Now take the case of community college students :

    yr  class
3 2023 Junior
4 2024 Senior

We have only 2 ways to pick them :

    yr  class
3 2023 Junior

    yr  class
4 2024 Senior
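
The same enumeration can be sketched in Python. This is an illustration under my own assumptions: the function name `surveyed` is hypothetical, and the two sets of survey years are simply the two possible every-other-year patterns for this cohort:

```python
# Which class standings does an every-other-year survey catch for one student?
STANDING = ["Freshman", "Sophomore", "Junior", "Senior"]


def surveyed(entry_year, entry_standing, survey_years):
    """Return (year, standing) pairs for enrolled years that are survey years."""
    start = STANDING.index(entry_standing)
    enrolled = [(entry_year + i, s) for i, s in enumerate(STANDING[start:])]
    return [(y, s) for y, s in enrolled if y in survey_years]


odd_cycle = {2021, 2023}   # one possible biennial pattern (assumed)
even_cycle = {2022, 2024}  # the other possible pattern (assumed)

# Four-year student entering as a freshman in 2021: observed twice either way.
print(surveyed(2021, "Freshman", odd_cycle))
print(surveyed(2021, "Freshman", even_cycle))

# Transfer student entering as a junior in 2023: observed only once either way.
print(surveyed(2023, "Junior", odd_cycle))
print(surveyed(2023, "Junior", even_cycle))
```

A transfer student is captured in a single survey cycle, while a four-year student is captured in two, so transfer students contribute half as many observations.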

  • Look at the 2024 data tables for the UCUES survey. Pick one of the questions that you think is particularly badly written and rewrite it. What makes you think it is written poorly? Why is your version an improvement?

Side note : The data below has two filters applied : UCLA and Statistics major.

Now, I’m not a genius, but I’m calling BS :

You’re telling me over 60% of students aren’t using ChatGPT or “AI”? Right.

And now, look :

[1] 0.125

If I’m understanding things right, the respondents must think we are idiots. So about 1 in 8 respondents uses ChatGPT daily, yet every day I see students in the library using ChatGPT, and I know many students who admit to not knowing how to code? Makes sense. Right.

However, I don’t believe the problem is in the question but rather in the current perspective on AI in education. Students are scared to admit they use AI to do their work or to learn.

Now here’s the funniest part : Remember, we applied 2 filters : UCLA and Statistics major. Meaning, we chose a subset of students, those whose curriculum deals with data science. Let’s see what everyone else has to say :

These people are comedians, a very dishonest crowd.

So how do we get them to stop lying?

If we know they are using ChatGPT, ask how often they use it for specific use cases :

  • Asking questions they’re too afraid to ask during lecture (most lectures stay entirely quiet unless people are actively talking),

The following links were useful in solving the problem above :


Problem 4

At a small Midwest college, there are 2500 undergraduate students. The college wants to estimate how much time the students study each week. To do so, an administrator takes a simple random sample of 10 students and asks each how many hours he or she studied in the past week. The students report the following: {25,3,5,7,32,10,8,5,11,9}.

Calculate the mean and standard deviation of this sample.

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3.00    5.50    8.50   11.50   10.75   32.00 

standard dev :  9.43103623386341

Estimate the standard error.

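standard err :  s/√n = 9.43103623386341/√10 ≈ 2.98

Here s = 9.43103623386341 is the sample standard deviation and n = 10. Because only 10 of 2500 students are sampled, the finite population correction √(1 − 10/2500) ≈ 0.998 barely changes the result (≈ 2.98).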
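
The sample statistics for this problem can be checked with a short script. A minimal sketch in Python, taking the standard error of the mean as s/√n (the language choice and variable names are mine; N = 2500 and the data come from the problem statement):

```python
import math
import statistics

hours = [25, 3, 5, 7, 32, 10, 8, 5, 11, 9]  # reported weekly study hours
N = 2500                                    # undergraduates at the college
n = len(hours)

mean = statistics.mean(hours)
sd = statistics.stdev(hours)        # sample SD (divides by n - 1)
se = sd / math.sqrt(n)              # estimated standard error of the mean
se_fpc = se * math.sqrt(1 - n / N)  # with finite population correction

print(f"mean      = {mean:.2f}")
print(f"sd        = {sd:.4f}")
print(f"se        = {se:.4f}")
print(f"se (fpc)  = {se_fpc:.4f}")
```

The standard deviation describes the spread of individual students' hours, while the standard error describes the uncertainty in the sample mean, so the standard error is smaller by a factor of √10.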