Quiz 1 Preview

SOC470

Section A — Multiple Choice

(6 × 1 pt = 6 pts)

Question 1

Which of the following best defines a unit of analysis?

A. The specific theory guiding your study.

B. The primary entity about which information is collected.

C. The variable chosen for statistical analysis.

D. The summary statistic computed from a dataset.

(Lecture 2, “Unit of Analysis,” p. 22.)

Answer 1

B. The primary entity about which information is collected.

The unit of analysis is the main entity, or “case,” that we study and collect data about, such as individuals, households, organizations, or countries.

Question 2

Which statement best describes the difference between quantitative and qualitative research?

A. Quantitative focuses on numerical measurement and statistical inference; qualitative emphasizes meaning and context.

B. Quantitative requires fewer participants than qualitative.

C. Quantitative methods are inherently superior to qualitative.

D. Qualitative methods rely on experiments only.

(Lecture 2, pp. 10–12.)

Answer 2

A. Quantitative focuses on numerical measurement and statistical inference; qualitative emphasizes meaning and context.

This captures the fundamental distinction: quantitative research emphasizes numbers and statistical analysis, while qualitative research focuses on understanding meaning, context, and interpretation.

Question 3

For which type of variable is the median appropriate, but the mean is not?

A. Interval/ratio variables without outliers

B. Ordinal variables

C. Nominal variables

D. Any symmetric distribution

(Lecture 5, pp. 10–12.)

Answer 3

B. Ordinal variables

The median can be calculated for ordinal variables because we can rank-order the values, but the mean requires mathematical operations (addition/division) that aren’t meaningful for ordinal data, because the distances between categories are not defined.
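
As a rough illustration of why ranking is enough for a median, the sketch below finds the median category of some made-up survey responses. It is written in Python using only the standard library; the scale labels and responses are hypothetical and not from the lecture.

```python
from statistics import median_low

# Hypothetical ordered response scale, lowest to highest.
LEVELS = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Hypothetical survey responses, for illustration only.
responses = ["agree", "neutral", "strongly agree", "agree",
             "disagree", "agree", "neutral"]

# Convert each label to its position on the scale; only the ordering is used.
ranks = [LEVELS.index(r) for r in responses]

# median_low always returns a value that occurs in the data,
# so the result maps back to a real category.
median_label = LEVELS[median_low(ranks)]
print(median_label)
```

Averaging the rank codes would also run, but the result would depend on the arbitrary choice of codes 0 to 4, which is why the mean is not reported for ordinal variables.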

Question 4

In a left-skewed distribution, which relationship usually holds?

A. Mean < Median < Mode

B. Mode < Median < Mean

C. Mean = Median = Mode

D. Median < Mode < Mean

(Lecture 5, pp. 22–25.)

Answer 4

A. Mean < Median < Mode

In a left-skewed (negatively skewed) distribution, the tail extends to the left, pulling the mean toward the lower values. The mode sits at the peak, the median falls in the middle, and the mean is pulled lowest.
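
A small check of this ordering, sketched in Python with the standard library. The scores are invented to have a long left tail; they are not course data.

```python
from statistics import mean, median, mode

# Hypothetical exam scores out of 10: most are high, a few are very low.
scores = [1, 5, 7, 8, 8, 9, 9, 9, 10]

m_mean = mean(scores)      # pulled toward the low tail
m_median = median(scores)  # middle value of the sorted scores
m_mode = mode(scores)      # most frequent value (the peak)

print(m_mean, m_median, m_mode)
print(m_mean < m_median < m_mode)
```

The ordering is a tendency rather than a guarantee; some skewed datasets, especially small or discrete ones, do not follow it.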

Question 5

In cluster sampling, the researcher typically:

A. Samples some clusters, and includes all members of selected clusters.

B. Samples all clusters, but only a few members per cluster.

C. Randomly samples individuals without grouping.

D. Selects cases deliberately to represent variation.

(Lecture 4, pp. 41–42.)

Answer 5

A. Samples some clusters, and includes all members of selected clusters.

Cluster sampling involves randomly selecting entire clusters (groups) and then including all members within those selected clusters in the study.
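
The two steps can be sketched as follows in Python. The schools, students, seed, and the choice of 2 clusters out of 6 are all hypothetical and only illustrate the one-stage design described above.

```python
import random

# Hypothetical population: 6 schools (clusters), each with 4 students.
clusters = {
    f"school_{k}": [f"s{k}_{i}" for i in range(1, 5)]
    for k in range(1, 7)
}

rng = random.Random(470)  # fixed seed so the sketch is repeatable

# Step 1: randomly select some clusters (here 2 of the 6).
chosen = rng.sample(sorted(clusters), k=2)

# Step 2: include every member of each selected cluster.
sample = [student for name in chosen for student in clusters[name]]

print(chosen, len(sample))
```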

Question 6

Which tool or practice is most directly linked to reproducibility in quantitative research?

A. Running more participants.

B. Keeping all notes in a single Word document.

C. Version control using Git/GitHub.

D. Using descriptive statistics.

(Lecture 3, pp. 21, 24–28.)

Answer 6

C. Version control using Git/GitHub.

Version control systems such as Git (with hosting platforms such as GitHub) let researchers track changes, share code, and recover the exact version of an analysis, which makes it much easier for others to reproduce the results.

Section B — Short Answer

(4 × 2 pts = 8 pts)

Question 7

Population parameter vs. sample statistic. Define each and give one example of a parameter and its corresponding statistic.

(Lecture 4, pp. 4–7, 26–28.)

Answer 7

A parameter is a numerical summary describing a population (for example, the population mean μ).

A statistic is the corresponding summary computed from a sample (for example, the sample mean x̄, which is used to estimate μ).
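
To make the distinction concrete, here is a minimal Python sketch with a simulated population. The ages, population size, sample size, and seed are assumptions chosen for illustration; in real research the full population is rarely observed, so μ is usually unknown.

```python
import random
from statistics import mean

rng = random.Random(470)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 ages between 18 and 80.
population = [rng.randint(18, 80) for _ in range(1000)]
mu = mean(population)  # parameter: a fixed value for this population

# A simple random sample of 50 people from that population.
sample = rng.sample(population, k=50)
x_bar = mean(sample)  # statistic: changes from sample to sample

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}")
```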

Question 8

Give two reasons social scientists may avoid simple random sampling.

(Lecture 4, pp. 35–36.)

Answer 8

Reasons social scientists may avoid simple random sampling:

  1. It requires a complete list of the population (a sampling frame).
  2. It can be costly and impractical to carry out.
  3. It may miss rare subgroups.

(Any two reasons earn full credit.)

Question 9

In your own words, explain the difference between a probability sample and a non-probability sample. Give one example of each.

(Lecture 4, pp. 9–12, 29–32.)

Answer 9

A probability sample uses random selection, so each unit has a known, nonzero chance of being included (for example, a stratified random sample).

A non-probability sample does not use random selection, so the chance of inclusion is unknown (for example, a convenience sample).
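
The contrast can be sketched in Python as below. The sampling frame, the two strata, and the 10% sampling fraction are hypothetical.

```python
import random

rng = random.Random(470)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 60 first-year students and 40 seniors.
frame = [(i, "first-year" if i <= 60 else "senior") for i in range(1, 101)]

# Probability sample (stratified): draw 10% at random within each stratum,
# so every student has a known inclusion probability of 0.10.
strata = {}
for student in frame:
    strata.setdefault(student[1], []).append(student)
stratified = [
    s for group in strata.values()
    for s in rng.sample(group, k=len(group) // 10)
]

# Non-probability sample (convenience): take whoever is easiest to reach,
# here simply the first 10 people on the list.
convenience = frame[:10]

print(len(stratified), len(convenience))
```

In this sketch the convenience sample contains only first-year students, which shows how a non-probability sample can miss part of the population entirely.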

Question 10

What is the purpose of using RStudio when doing data analysis with R?

(Lecture 3, pp. 30–38.)

Answer 10

Purpose: RStudio gives R a friendly graphical user interface (GUI), so that you can see plots, the objects you created, and data analysis results in the same window.

It is an integrated development environment (IDE) that makes working with R more user-friendly and efficient.

Section C — Problems

(1 × 5 pts = 5 pts)

Question 11

A sociologist collects data on daily study hours from a small sample of 8 students:

2, 3, 2, 4, 3, 5, 50, 4

  1. Compute the sample mean x̄ and median.

  2. Compute the sample variance s².

  3. Briefly explain how the outlier (50) influences your answers, and what this implies about which measure of central tendency should be reported.

The sample mean is

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

Let the ordered data be \(x_{(1)} \le \cdots \le x_{(n)}\). The median is

\[ M = \begin{cases} x_{\left(\frac{n+1}{2}\right)}, & n \text{ odd} \\ \dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}, & n \text{ even} \end{cases} \]

The sample variance is

\[ s^{2} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^{2} \]

(Lecture 5, “Outliers…Median more resistant; Skewness,” pp. 55–60, 18–25, 30.)

Answer 11

Manual calculations:

  • Sample mean:
    • Sum: 2 + 3 + 2 + 4 + 3 + 5 + 50 + 4 = 73
    • \(\displaystyle \bar{x} = \frac{73}{8} = 9.125\)
  • Median (n = 8, even):
    • Ordered data: 2, 2, 3, 3, 4, 4, 5, 50
    • \(\displaystyle M = \frac{x_{(4)} + x_{(5)}}{2} = \frac{3 + 4}{2} = 3.5\)

  • Sample variance:
    • \(\displaystyle \sum x_i^2 = 2^2 + 3^2 + 2^2 + 4^2 + 3^2 + 5^2 + 50^2 + 4^2 = 2583\)
    • \(\displaystyle s^2 = \frac{\sum x_i^2 - n\,\bar{x}^{\,2}}{n-1} = \frac{2583 - 8\,(9.125)^2}{7} = \frac{2583 - 666.125}{7} = \frac{1916.875}{7} \approx 273.84\)
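
The calculation above uses a shortcut form of the variance rather than the definition given in the question. The two agree because \(\sum_{i=1}^{n} x_i = n\bar{x}\), so expanding the square gives:

\[ \sum_{i=1}^{n} (x_i - \bar{x})^{2} = \sum_{i=1}^{n} x_i^{2} - 2\bar{x} \sum_{i=1}^{n} x_i + n\bar{x}^{2} = \sum_{i=1}^{n} x_i^{2} - 2n\bar{x}^{2} + n\bar{x}^{2} = \sum_{i=1}^{n} x_i^{2} - n\bar{x}^{2} \]

Dividing both sides by \(n-1\) gives the form of \(s^{2}\) used above.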

Part 3) Interpretation:

The outlier (50) pulls the mean up to 9.125, far above the median of 3.5 and above seven of the eight observations, and it greatly inflates the variance (≈ 273.84).

The median is more resistant to outliers and better reflects the typical study time for most students.

Recommendation: Report the median as the measure of central tendency due to the extreme outlier.
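
As a check on the hand calculations, and to show how much the single outlier matters, the sketch below recomputes the three summaries with and without the value 50. It uses Python's standard `statistics` module, whose `variance` function divides by n − 1. Dropping the outlier here is only for comparison, not a recommendation to delete data.

```python
from statistics import mean, median, variance

hours = [2, 3, 2, 4, 3, 5, 50, 4]
trimmed = [h for h in hours if h != 50]  # same data with the outlier dropped

# statistics.variance is the sample variance (divides by n - 1).
for label, data in [("with outlier", hours), ("without outlier", trimmed)]:
    print(f"{label}: mean={mean(data):.3f}, "
          f"median={median(data)}, s^2={variance(data):.2f}")
```

With the outlier, this reproduces the mean of 9.125, median of 3.5, and variance of about 273.84. Without it, the mean falls to about 3.29 and the variance to about 1.24, while the median only moves from 3.5 to 3.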

Summary

  • Section A (Multiple Choice): 6 questions testing fundamental concepts
  • Section B (Short Answer): 4 questions on key definitions and concepts
  • Section C (Problems): 1 computational problem demonstrating statistical concepts

Key Learning Points:

  • Understanding units of analysis and research approaches
  • Sampling methods and their applications
  • Measures of central tendency and their robustness
  • The impact of outliers on statistical measures