Tylenol preferences among congestion vs. muscle ache sufferers.
Means (Two or More Groups):
t-test: Compare means of two groups (e.g., male vs. female teens’ sports drink consumption).
ANOVA: Compare means across multiple groups (e.g., age-based preferences for cars).
Paired Samples:
Paired t-test: Compare two variables within the same group.
Example: Concern for global warming vs. gasoline emissions.
Agenda
What Are Associative Analyses?
Types of Relationships
Correlation Coefficient
Cross Tabulation and Chi-Square Test
Special Consideration in Associative Analyses
Introduction to Associative Analyses
Marketing researchers often want to go beyond descriptive measures, statistical inference, and differences tests.
Explore relationships among variables in large datasets (hundreds or thousands of survey responses).
What kinds of people buy Frito-Lay snacks such as Cheetos, Fritos, or Lay’s potato chips?
Demographics: Who are the customers?
Circumstances: Under what conditions are these products chosen?
Associative analyses: determine where stable relationships exist between two variables
Relationships between Two Variables
Relationship: a consistent, systematic linkage between the levels or labels for two variables
“Levels” refers to the characteristics of description for interval or ratio scales.
e.g., Older consumers purchase more vitamins.
“Labels” refers to the characteristics of description for nominal or ordinal scales.
e.g., PayPal customers (Yes/No) tend to also be Amazon Prime customers.
Relationships between Two Variables (cont.)
Statistical Linkage (our focus): Indicates a consistent pattern or association between variables, but not causation.
Most daily exercisers purchase sports drinks → suggests correlation but does not prove that exercising causes sports drink purchases.
Causal Linkage: Requires certainty that one variable causes the other.
Increased advertising spend leads to higher sales → a proven cause-and-effect relationship supported by controlled testing or robust analysis.
Why Statistical Relationships Matter?
They offer insights that lead to deeper understanding.
Customer Preferences: Data show that young adults are more likely to purchase plant-based milk. → Marketing efforts can focus on health-conscious messaging for this demographic.
Product Usage Patterns: Consumers who frequently travel tend to buy larger quantities of travel-size toiletries. → Retailers can optimize product placement near travel-related items.
E-Commerce Behaviors: Frequent online shoppers often use digital wallets like PayPal. → Encouraging wallet-based payment options might increase conversion rates.
Cross-Selling Opportunities: Customers who buy high-end smartphones often purchase accessories like cases or earbuds. → Bundle deals can encourage higher spending.
Agenda
What Are Associative Analyses?
Types of Relationships
Correlation Coefficient
Cross Tabulation and Chi-Square Test
Special Consideration in Associative Analyses
Monotonic Linear Relationship
A “straight-line association” between two scale variables.
Formula: \[y=a+bx\]
\(y\): Dependent variable (predicted/estimated).
\(a\): Intercept (value of y when \(x=0\)).
\(b\): Slope (rate of change in \(y\) per unit change in \(x\)).
\(x\): Independent variable.
Predicts \(y\) given any value of \(x\) using known \(a\) and \(b\).
Linear relationships are commonly used in the analysis of interval or ratio scale variables.
We will cover them in more detail when we study regression analysis.
Monotonic Linear Relationship Example
Burger King estimates that every customer will spend about $12 per lunch visit.
It is easy to use a linear relationship to estimate how many dollars of revenue will be associated with the number of customers for any given location.
\[y = 0 + 12x\] where \(x\) = number of customers.
100 customers → 12 × 100 = $1,200 in revenue.
200 customers → 12 × 200 = $2,400 in revenue.
Linear relationship provides an average expectation of future revenue.
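This calculation is easy to script. Below is a minimal R sketch encoding the Burger King relationship \(y = 0 + 12x\) (the function name estimate_revenue is illustrative, not from the slides):

# Minimal sketch of the Burger King linear relationship: y = a + bx
# with intercept a = 0 and slope b = 12 dollars per customer
estimate_revenue <- function(customers, a = 0, b = 12) {
  a + b * customers
}

estimate_revenue(100)  # 1200
estimate_revenue(200)  # 2400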
Nonmonotonic Relationship
A relationship where the presence or absence of one variable’s label is systematically associated with the presence or absence of another variable’s label.
No consistent direction (e.g., increasing or decreasing).
The relationship is general and described verbally.
Describes patterns of association, not precise relationships or quantities.
Nonmonotonic relationships are often used for the analysis of nominal scale variables.
Nonmonotonic Relationship Example
Drinks Ordered at McDonald’s
Morning customers tend to buy coffee and breakfast foods.
Noon customers tend to buy soft drinks and lunch items.
Not exclusive: Labels are associated on average, but exceptions exist.
Characterizing Relationships between Variables
Presence:
Whether a systematic (statistical) relationship exists between two variables.
Pattern:
The general nature of the relationship, including its direction.
Strength of Association:
How consistent (dependable) the relationship is, e.g., strong, moderate, or weak.
Step-By-Step Procedure for Analyzing the Relationship between Two Variables
Agenda
What Are Associative Analyses?
Types of Relationships
Correlation Coefficient
Cross Tabulation and Chi-Square Test
Special Consideration in Associative Analyses
Correlation Coefficients
Definition:
A measure (r) that quantifies the strength and direction of a linear relationship between two scale variables.
Range: −1.0 to +1.0.
The correlation coefficient communicates both the strength and the direction of the linear relationship between two metric variables.
Strength: Determined by the absolute value of \(r\).
Direction: Indicated by the sign of \(r\) (+ or −).
Visualizing Covariation Using Scatter Diagrams
Vertical axis (y): Dependent variable (e.g., sales).
Horizontal axis (x): Independent variable (e.g., number of salespeople).
Points: Represent matched pairs of x and y values.
Systematic covariation forms an ellipse-like pattern.
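A scatter diagram like this takes a single call in base R. Here is a minimal sketch with simulated data (variable names and values are illustrative, echoing the salespeople example):

set.seed(1)
salespeople <- 1:50
sales <- 100 + 20 * salespeople + rnorm(50, sd = 100)

# Each point is a matched (x, y) pair; systematic covariation
# traces out the ellipse-like pattern described above
plot(salespeople, sales,
     xlab = "Number of salespeople", ylab = "Sales")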
Visualizing Covariation Using Scatter Diagrams (cont.)
Patterns can vary depending on the relationship between variables.
The Pearson Product Moment Correlation Coefficient
Measures the linear relationship between two interval or ratio-scaled variables (scale variables).
Based on the closeness of scatter points to a straight line.
Perfect Correlation: All points fall on a straight line; Correlation coefficient (r) = \(\pm\) 1.00.
No Correlation: Scatter points form a ball-shaped pattern with no discernible ellipse; r ≈ 0.0.
Typical Values: Most correlations fall somewhere between the extremes of −1.0 and +1.0; if statistically significant, they are interpreted as strong, moderate, or weak.
Provides insight into the strength and direction of the linear association.
If you are interested in the math behind it, here is the reference.
The calculation of r is VERY tedious, so I don’t even introduce the formula here.
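Even without the formula, you can build intuition with base R’s cor(). A minimal sketch on simulated data (all names and values are illustrative):

set.seed(42)
x <- 1:100

# Perfect positive correlation: y is an exact linear function of x
cor(x, 3 + 2 * x)                        # exactly 1

# Strong but imperfect: the same line plus random noise
cor(x, 3 + 2 * x + rnorm(100, sd = 20))  # high, but below 1

# No correlation: y is unrelated to x
cor(x, rnorm(100))                       # near 0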
Calculating Correlation Coefficient in R
Let’s calculate the correlation coefficient using R.
As always, we will use auto_concept.csv data.
There are six different lifestyles: Novelist, Innovator, Trendsetter, Forerunner, Mainstreamer, and Classic.
Each lifestyle type is measured with a 7-point interval scale
1 = Does not describe me at all to 7 = Describes me perfectly
Correlation analysis can find out which lifestyle profile is associated with a particular automobile model preference.
High Positive Correlations: Indicate consumers prefer a specific model and score high on the associated lifestyle type.
Low or Negative Correlations: Suggest consumers do not align with that model or lifestyle type.
Calculating Correlation Coefficient in R (cont.)
Let’s determine which of the six lifestyle types is associated with preference for the 4-seat economy gasoline automobile model.
The first step is to select the variables to analyze.
Preference for the 4 Seat Economy Gasoline model.
All six lifestyle types: Novelist, Innovator, Trendsetter, Forerunner, Mainstreamer, and Classic.
All of these variables are interval scale variables. → Use Pearson’s correlation to measure the strength and direction of the linear relationship.
Default to a two-tailed test for statistical significance.
Calculating Correlation Coefficient in R (cont.)
We will rely on the psych package for an efficient and clean workflow.
Computes both the correlation coefficient and the p-value in a single step.
Saves time and reduces redundancy compared to cor() and cor.test().
Load Data: Import the dataset into R.
Select Variables: Focus on the 4 Seat Economy Gasoline model preference and the six lifestyle types.
Rename variables to descriptive names (e.g., Novelist, Innovator) for clarity.
# Install and load the psych package
# install.packages("psych") # if not already installed
library(tidyverse)
library(psych)

setwd("~/Documents/GitHub/Marketing-Research-2025-Spring")

# Step 1: Load the dataset
auto_concept <- read_csv("auto_concept.csv")

# Step 2: Select relevant columns
# Include 'economygas4seat' and the six lifestyle variables
selected_data <- auto_concept %>%
  select(economygas4seat, lifestyle1, lifestyle2, lifestyle3,
         lifestyle4, lifestyle5, lifestyle6) %>%
  rename(
    "Economy Gasoline" = economygas4seat,
    "Novelist"     = lifestyle1,
    "Innovator"    = lifestyle2,
    "Trendsetter"  = lifestyle3,
    "Forerunner"   = lifestyle4,
    "Mainstreamer" = lifestyle5,
    "Classic"      = lifestyle6
  )
Calculating Correlation Coefficient in R (cont.)
Use the corr.test() function in the psych package for Pearson correlation.
Correlation matrix: Measures strength and direction.
P-Value: Indicates statistical significance.
# Step 3: Compute correlations and p-values
# Use corr.test() to compute the correlation matrix and significance
results <- corr.test(selected_data, method = "pearson")

# Step 4: Extract correlation coefficients and p-values
results
results$stars
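If you want individual pieces rather than the full printout, the result object can be indexed directly (a sketch assuming the standard components of a psych::corr.test object):

results$r   # matrix of Pearson correlation coefficients
results$p   # matrix of p-values
results$n   # sample size used for each correlation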
The small negative correlation for Innovator indicates that individuals identifying as Innovators are slightly less likely to prefer the economy gasoline model.
Agenda
What Are Associative Analyses?
Types of Relationships
Correlation Coefficient
Cross Tabulation and Chi-Square Test
Special Consideration in Associative Analyses
Cross-Tabulation Analysis
Cross-tabulation is a statistical method used to examine nonmonotonic relationships between two nominally scaled variables.
A cross-tabulation table is sometimes referred to as an “r×c” (r-by-c) table because it consists of rows and columns.
Rows (r): Represent categories of one variable.
Columns (c): Represent categories of another variable.
Cells: The intersection of rows and columns, containing the frequency of occurrences.
Cross-Tabulation Analysis Example
In a survey (200 respondents), researchers examine the relationship between Occupation (nominal variable) and Michelob Ultra Beer Purchasing Behavior (nominal variable).
Two Nominal Variables:
Occupation: White Collar (160) & Blue Collar (40).
Purchase Behavior: Buyer (166) & Nonbuyer (34).
Cross-tabulation analysis uses both nominal variables simultaneously and tallies up the cell frequencies
Occupation Status | Buyer | Nonbuyer | Row Total
White Collar      | 152   | 8        | 160
Blue Collar       | 14    | 26       | 40
Column Total      | 166   | 34       | 200
Raw Percentages Table
In a cross-tabulation table, raw frequencies can be converted into various types of percentages to provide additional insights.
The raw percentages table is derived by dividing each raw frequency by the grand total (200 in this case) and multiplying by 100 (e.g., \(\frac{152}{200}\times100 = 76\%\) for White Collar buyers). \[\text{Raw Percentage}=\frac{\text{Cell Frequency}}{\text{Grand Total}}\times100\]
Occupation Status (% of Total) | Buyer | Nonbuyer | Total
White Collar                   | 76%   | 4%       | 80%
Blue Collar                    | 7%    | 13%      | 20%
Total                          | 83%   | 17%      | 100%
Row Percentages Table
The row percentages table calculates percentages relative to the row totals (e.g., 160 for White Collar, 40 for Blue Collar). \[\text{Row Cell Percentage}=\frac{\text{Cell Frequency}}{\text{Total of Cell Frequencies in that Row}}\times100\]
e.g., The first column is calculated by \(\frac{152}{160}\times100\) and \(\frac{14}{40}\times100\)
Occupation Status | Buyer (%) | Nonbuyer (%) | Row Total (%)
White Collar      | 95%       | 5%           | 100%
Blue Collar       | 35%       | 65%          | 100%
Column Percentages Table
The column percentages table calculates percentages relative to the column totals (e.g., 166 for Buyer, 34 for Nonbuyer). \[\text{Column Cell Percentage}=\frac{\text{Cell Frequency}}{\text{Total of Cell Frequencies in that Column}}\times100\]
e.g., The first row is calculated by \(\frac{152}{166}\times100\) and \(\frac{8}{34}\times100\)
Occupation Status | Buyer (%) | Nonbuyer (%)
White Collar      | 91.57%    | 23.53%
Blue Collar       | 8.43%     | 76.47%
Column Total      | 100%      | 100%
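All three percentage tables can be reproduced with base R’s prop.table(). A minimal sketch, with the observed frequencies entered from the table above:

# Observed frequencies from the Michelob Ultra example
observed <- matrix(c(152, 8,
                     14, 26),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("White Collar", "Blue Collar"),
                                   c("Buyer", "Nonbuyer")))

round(prop.table(observed) * 100, 2)              # raw percentages (of grand total)
round(prop.table(observed, margin = 1) * 100, 2)  # row percentages
round(prop.table(observed, margin = 2) * 100, 2)  # column percentages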
Chi-Square (χ²) Analysis Overview
Chi-square (χ²) analysis is a statistical technique used to examine the frequencies of two nominally scaled variables in a cross-tabulation table.
It assesses nonmonotonic association in a cross-tabulation table based upon differences between observed and expected frequencies
It determines whether there is a statistically significant relationship between the two variables.
Null Hypothesis: Assumes no association between the variables (independence).
e.g., the distribution of buyers and nonbuyers is independent of occupation.
Alternative Hypothesis: Assumes that the variables are associated (dependent).
Observed Frequencies in Chi-Square Analysis
Chi-square analysis compares observed frequencies (actual counts from the data) to expected frequencies (theoretical counts assuming no association). The degree of difference is expressed in the Chi-square test statistic.
Observed Frequencies (\(O_{ij}\)):
These are the actual cell counts in the cross-tabulation table.
Example: 152 White Collar buyers and 8 White Collar nonbuyers.
Occupation Status | Buyer | Nonbuyer | Row Total
White Collar      | 152   | 8        | 160
Blue Collar       | 14    | 26       | 40
Column Total      | 166   | 34       | 200
Expected Frequencies in Chi-Square Analysis
Expected Frequencies (\(E_{ij}\)): These are theoretical frequencies calculated under the null hypothesis of no association: \[E_{ij} = \frac{\text{Row Total}_i \times \text{Column Total}_j}{\text{Grand Total}}\] e.g., \(E_{11} = \frac{160\times166}{200} = 132.8\) expected White Collar buyers.
The Chi-square (χ²) value is a single number summarizing how much the observed frequencies deviate from the expected frequencies in a cross-tabulation table.
It provides a measure of how closely the data align with the null hypothesis of no association. \[
\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\] Where:
\(O_{ij}\): Observed frequency in row \(i\) and column \(j\)
\(E_{ij}\): Expected frequency in row \(i\) and column \(j\)
\(r\), \(c\): Number of rows and columns in the table
Degrees of Freedom in Chi-Square Analysis
Degrees of freedom represent the number of values in a calculation that are free to vary while still meeting the constraints of the data.
In the context of Chi-square analysis, degrees of freedom are used to determine the critical value from the Chi-square distribution table.
\[
df = (\text{Number of Rows} - 1) \times (\text{Number of Columns} - 1)
\]
e.g., for the 2×2 Michelob Ultra table, \(df = (2-1)\times(2-1) = 1\).
Statistical significance is determined by comparing the computed Chi-square value with the critical value from the Chi-square distribution table or by using a p-value.
Modern statistical software automates the process.
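For example (a minimal base-R sketch), the critical value can be looked up without a printed table:

# Critical chi-square value at the 0.05 significance level with df = 1
qchisq(p = 0.95, df = 1)  # about 3.84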
How the Chi-Square Value is Computed
Compare Observed and Expected Frequencies: Subtract the expected frequency \(E\) from the observed frequency \(O\): \((O - E)\).
Adjust for Negative Values: Square the difference to eliminate negative values and prevent cancellation effects: \((O - E)^2\).
Normalize by Expected Frequency: Divide the squared difference by the expected frequency \(E\) to adjust for differences in cell sizes: \(\frac{(O - E)^2}{E}\).
Summation Across All Cells: Add these normalized values across all cells in the cross-tabulation table: \[
\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
\]
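To make these steps concrete, here is a minimal R sketch that computes the chi-square statistic by hand for the 2×2 Michelob Ultra table (the counts come from the observed frequencies shown earlier):

# Observed frequencies from the cross-tabulation table
observed <- matrix(c(152, 8,
                     14, 26),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("White Collar", "Blue Collar"),
                                   c("Buyer", "Nonbuyer")))

# Expected frequencies under independence:
# E_ij = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_sq <- sum((observed - expected)^2 / expected)
chi_sq  # about 81.6

# Degrees of freedom: (rows - 1) * (columns - 1) = 1
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value from the chi-square distribution
pchisq(chi_sq, df = df, lower.tail = FALSE)

# chisq.test(observed, correct = FALSE) reproduces these numbers;
# correct = FALSE disables the Yates continuity correction applied
# by default to 2x2 tables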
Interpreting the Chi-Square Value
Large Deviations:
If observed frequencies deviate significantly from expected frequencies → the computed Chi-square value increases.
Small Deviations:
If observed frequencies are close to expected frequencies → the computed Chi-square value remains small.
The Chi-square value provides a summary measure of the departure of observed frequencies from expected frequencies.
Cross Tabulation in R
Okay, let’s practice what we have learned in R.
Recall that we have several nominal variables in our auto_concept.csv data.
Marital Status (marital): Unmarried (0) / Married (1) (Nominal Variable)
Favorite Newspaper Section (newspaper): Editorial, Business, Local News, National News, Sports, Entertainment, Do Not Read (Nominal Variable)
We will use the janitor package for the associative analysis.
library(tidyverse)
# install.packages("janitor") # Install if not already installed
library(janitor)

setwd("~/Documents/GitHub/Marketing-Research-2025-Spring")
auto_concept <- read_csv("auto_concept.csv")

# Rename marital and newspaper variables
auto_concept <- auto_concept %>%
  mutate(
    marital = factor(marital,
                     levels = c(0, 1),
                     labels = c("Unmarried", "Married")),
    newspaper = factor(newspaper,
                       levels = c(1, 2, 3, 4, 5, 6, 7),
                       labels = c("Editorial", "Business", "Local News",
                                  "National News", "Sports",
                                  "Entertainment", "Do Not Read"))
  )
Cross Tabulation in R (cont.)
The tabyl() function in the janitor package provides a simple way to calculate the cross-tabulation table.
# Create a cross-tabulation table
cross_tab <- auto_concept %>%
  tabyl(marital, newspaper)
cross_tab
Observed Frequencies: Marital Status vs. Newspaper Section
marital   | Editorial | Business | Local News | National News | Sports | Entertainment | Do Not Read
Unmarried | 0         | 5        | 16         | 4             | 53     | 7             | 25
Married   | 94        | 199      | 301        | 37            | 183    | 52            | 24
Cross Tabulation in R (cont.)
Let’s do the chi-square test.
You just pass the cross_tab object to the chisq.test() function.
The p-value is essentially zero; it indicates a statistically significant association between marital status and newspaper section preferences.
However, it does not specify the nature of the relationship.
To understand this, we must analyze the row or column percentages.
# Perform Chi-square test
chi_test <- chisq.test(cross_tab)

# Display Chi-square test results
chi_test
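To explore the nature of the relationship the test flags, janitor’s adorn_* helpers can convert the same tabyl into row percentages. A minimal sketch (the formatting choices are illustrative):

# Row percentages: within each marital status, the share
# choosing each newspaper section
cross_tab %>%
  adorn_percentages("row") %>%         # convert counts to row proportions
  adorn_pct_formatting(digits = 1) %>% # format proportions as percentages
  adorn_ns()                           # append raw counts in parentheses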