Session 20&21

Associative Analysis

Sungjin Kim

Class Structure

Last Session

  • Testing Differences:

    • Percentages (Two Groups)

      • Tylenol preferences among congestion vs. muscle ache sufferers.
    • Means (Two or More Groups):

      • t-test: Compare means of two groups (e.g., male vs. female teens’ sports drink consumption).

      • ANOVA: Compare means across multiple groups (e.g., age-based preferences for cars).

    • Paired Samples:

      • Paired t-test: Compare two variables within the same group.

      • Example: Concern for global warming vs. gasoline emissions.

Agenda

  • What Are Associative Analyses?

  • Types of Relationships

  • Correlation Coefficient

  • Cross Tabulation and Chi-Square Test

  • Special Consideration in Associative Analyses

Introduction to Associative Analyses

  • Marketing researchers often want to go beyond descriptive measures, statistical inference, and differences tests.

  • Explore relationships among variables in large datasets (hundreds or thousands of survey responses).

  • What kinds of people buy Frito-Lay snacks such as Cheetos, Fritos, or Lay’s potato chips?

    • Demographics: Who are the customers?

    • Circumstances: Under what conditions are these products chosen?

  • Associative analyses: determine whether stable relationships exist between two variables

Relationships between Two Variables

  • Relationship: a consistent, systematic linkage between the levels or labels for two variables
    • “Levels” refers to the characteristics of description for interval or ratio scales.
      • e.g., Older consumers purchase more vitamins.
    • “Labels” refers to the characteristics of description for nominal or ordinal scales.
      • e.g., PayPal customers (Yes/No) tend to also be Amazon Prime customers.

Relationships between Two Variables (cont.)

  • Statistical Linkage (our focus): Indicates a consistent pattern or association between variables, but not causation.
    • Most daily exercisers purchase sports drinks → suggests correlation but does not prove that exercising causes sports drink purchases.
  • Causal Linkage: Requires certainty that one variable causes the other.
    • Increased advertising spend leads to higher sales → a proven cause-and-effect relationship supported by controlled testing or robust analysis.

Why Statistical Relationships Matter?

They offer insights that lead to deeper understanding.

  1. Customer Preferences: Data shows that young adults are more likely to purchase plant-based milk. → Marketing efforts can focus on health-conscious messaging for this demographic.

  2. Product Usage Patterns: Consumers who frequently travel tend to buy larger quantities of travel-size toiletries. → Retailers can optimize product placement near travel-related items.

  3. E-Commerce Behaviors: Frequent online shoppers often use digital wallets like PayPal. → Encouraging wallet-based payment options might increase conversion rates.

  4. Cross-Selling Opportunities: Customers who buy high-end smartphones often purchase accessories like cases or earbuds. → Bundle deals can encourage higher spending.

Agenda

  • What Are Associative Analyses?

  • Types of Relationships

  • Correlation Coefficient

  • Cross Tabulation and Chi-Square Test

  • Special Consideration in Associative Analyses

Monotonic Linear Relationship

  • A “straight-line association” between two scale variables.

  • Formula:
    \[y=a+bx\]

    • \(y\): Dependent variable (predicted/estimated).

    • \(a\): Intercept (value of y when \(x=0\)).

    • \(b\): Slope (rate of change in \(y\) per unit change in \(x\)).

    • \(x\): Independent variable.

    • Predicts \(y\) given any value of \(x\) using known \(a\) and \(b\).

  • Linear relationships are commonly used in the analysis of interval or ratio scale variables

  • We will cover this in more detail when we discuss regression analysis.

Monotonic Linear Relationship Example

  • Burger King estimates that every customer will spend about $12 per lunch visit.

  • It is easy to use a linear relationship to estimate how many dollars of revenue will be associated with the number of customers for any given location.

    \[y = 0 + 12x\] where \(x\) = number of customers.

  • 100 customers → 12 × 100 = $1,200 in revenue.

  • 200 customers → 12 × 200 = $2,400 in revenue.

  • A linear relationship provides an average expectation of future revenue; a short R sketch of this calculation follows below.
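
  • A minimal sketch in R (the intercept 0 and slope 12 come from the example above; the customer counts are illustrative):

# Linear prediction: revenue = a + b * customers, with a = 0 and b = 12 per visit
a <- 0
b <- 12
customers <- c(100, 200, 350)    # illustrative numbers of customers
revenue <- a + b * customers
revenue                          # 1200 2400 4200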

Nonmonotonic Relationship

  • A relationship where the presence or absence of one variable’s label is systematically associated with the presence or absence of another variable’s label.

    • No consistent direction (e.g., increasing or decreasing).

    • The relationship is general and described verbally.

    • Describes patterns of association, not precise relationships or quantities.

  • Nonmonotonic relationships are often used for the analysis of nominal scale variables.

Nonmonotonic Relationship Example

  • Drinks Ordered at McDonald’s

    • Morning customers tend to buy coffee and breakfast foods.
    • Noon customers tend to buy soft drinks and lunch items.
  • Not exclusive: Labels are associated on average, but exceptions exist.

Characterizing Relationships between Variables

  • Presence:
    Whether a systematic (statistical) relationship exists between two variables.
  • Pattern:
    The general nature of the relationship, including its direction.
  • Strength of Association:
    How consistent the relationship is.

Step-By-Step Procedure for Analyzing the Relationship between Two Variables 

Agenda

  • What Are Associative Analyses?

  • Types of Relationships

  • Correlation Coefficient

  • Cross Tabulation and Chi-Square Test

  • Special Consideration in Associative Analyses

Correlation Coefficients

  • Definition:
    • A measure (r) that quantifies the strength and direction of a linear relationship between two scale variables.

    • Range: −1.0 to +1.0.

  • The correlation coefficient communicates both the strength and the direction of the linear relationship between two metric variables.
    • Strength: Determined by the absolute value of r.
      • e.g., r=0.8: Strong relationship; r=0.2: Weak relationship.
    • Direction: Indicated by the sign of r.
      • Positive (+): Variables increase together.
      • Negative (−): One variable increases as the other decreases.

Correlation Coefficients (cont.)

  • Rules of Thumb about Correlation Coefficient Size
Coefficient Range Strength of Association*
\(\pm\) 0.81 to \(\pm\) 1.00 Very strong
\(\pm\) 0.61 to \(\pm\) 0.80 Strong
\(\pm\) 0.41 to \(\pm\) 0.60 Moderate
\(\pm\) 0.21 to \(\pm\) 0.40 Weak
under \(\pm\) 0.20 Very weak
  • Regardless of its absolute value, the correlation coefficient must be tested for statistical significance.

Visualizing Covariation Using Scatter Diagrams

  • Scatter diagrams visualize covariation between two variables.
    • Vertical axis (y): Dependent variable (e.g., sales).
    • Horizontal axis (x): Independent variable (e.g., number of salespeople).
    • Points: Represent matched pairs of x and y values.
  • Systematic covariation forms an ellipse-like pattern.

Visualizing Covariation Using Scatter Diagrams (cont.)

  • Patterns can vary depending on the relationship between variables.

The Pearson Product Moment Correlation Coefficient

  • Measures the linear relationship between two interval or ratio-scaled variables (scale variables).

    • Based on the closeness of scatter points to a straight line.
  • Perfect Correlation: All points fall on a straight line; Correlation coefficient (r) = \(\pm\) 1.00.

  • No Correlation: Scatter points form a ball-shaped pattern with no discernible ellipse; r ≈ 0.0.

  • Typical Values: Most correlations fall between these two extremes and, if statistically significant, are interpreted as strong, moderate, or weak.

  • Provides insight into the strength and direction of the linear association.

  • If you are interested in the math behind it, here is the reference.

    • The calculation of r is VERY tedious, so I don’t even introduce the formula here; in practice, R computes it for us, as the short sketch below shows.
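
  • A minimal sketch with two made-up vectors (the numbers are purely illustrative):

# Hypothetical data: number of salespeople (x) and territory sales (y)
x <- c(3, 5, 4, 6, 8, 7, 9, 10, 12, 11)
y <- c(30, 48, 41, 59, 80, 68, 92, 99, 118, 110)

cor(x, y)        # Pearson correlation coefficient only
cor.test(x, y)   # r plus a two-tailed test of statistical significance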

Calculating Correlation Coefficient in R

  • Let’s calculate the correlation coefficient using R.
  • As always, we will use auto_concept.csv data.
  • There are six different lifestyles: Novelist, Innovator, Trendsetter, Forerunner, Mainstreamer, and Classic.
    • Each lifestyle type is measured with a 7-point interval scale
      • 1 = Does not describe me at all to 7 = Describes me perfectly
  • Correlation analysis can find out which lifestyle profile is associated with a particular automobile model preference.
    • High Positive Correlations: Indicate consumers prefer a specific model and score high on the associated lifestyle type.

    • Low or Negative Correlations: Suggest consumers do not align with that model or lifestyle type.

Calculating Correlation Coefficient in R (cont.)

  • Let’s determine which of the six lifestyle types is associated with the preference for the 4-seat economy gasoline automobile model.

  • First step is to select the variables to analyze.

    • Preference for the 4 Seat Economy Gasoline model.
    • All six lifestyle types: Novelist, Innovator, Trendsetter, Forerunner, Mainstreamer, and Classic.
  • All of these variables are interval scale variables. -> Use Pearson’s correlation to measure the strength and direction of the linear relationship.

    • Default to a two-tailed test for statistical significance.

Calculating Correlation Coefficient in R (cont.)

  • We will rely on the psych package for an efficient and clean workflow.

    • Computes both the correlation coefficient and the p-value in a single step.
    • Saves time and reduces redundancy compared to cor() and cor.test().
  • Load Data: Import the dataset into R.

  • Select Variables: Focus on the 4 Seat Economy Gasoline model preference and the six lifestyle types.

    • Rename the variables to their lifestyle type names (e.g., Novelist, Innovator) for clarity.
# Install and load the psych package
# install.packages("psych") #if you didn't install
library(tidyverse)
library(psych)
setwd("~/Documents/GitHub/Marketing-Research-2025-Spring")
# Step 1: Load the dataset
auto_concept <- read_csv("auto_concept.csv")

# Step 2: Select relevant columns
# Include 'economygas4seat' and the six lifestyle variables
selected_data <- auto_concept %>%
  select(economygas4seat, lifestyle1, lifestyle2, lifestyle3, lifestyle4, lifestyle5, lifestyle6) %>%
  rename(
    "Economy Gasoline" = economygas4seat,
    "Novelist" = lifestyle1,
    "Innovator" = lifestyle2,
    "Trendsetter" = lifestyle3,
    "Forerunner" = lifestyle4,
    "Mainstreamer" = lifestyle5,
    "Classic" = lifestyle6
  )

Calculating Correlation Coefficient in R (cont.)

  • Use the corr.test() function in the psych package for Pearson correlation.

    • Correlation matrix: Measures strength and direction.
    • P-Value: Indicates statistical significance.
# Step 3: Compute correlations and p-values
# Use corr.test() to compute the correlation matrix and significance levels
results <- corr.test(selected_data, method = "pearson")

# Step 4: Extract correlation coefficients and p-values
results
results$stars
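# (Assumption based on the psych documentation: corr.test() also stores these
#  pieces separately, so they can be pulled out on their own if needed.)
# results$r   # correlation coefficients only
# results$p   # p-values (values above the diagonal are adjusted for multiple tests by default)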
Correlation Matrix
Economy Gasoline Novelist Innovator Trendsetter Forerunner Mainstreamer Classic
Economy Gasoline 1.00 0.09 -0.12 0.03 0.22 -0.01 0.63
Novelist 0.09 1.00 -0.06 0.09 0.12 0.02 0.07
Innovator -0.12 -0.06 1.00 -0.17 0.00 -0.02 -0.07
Trendsetter 0.03 0.09 -0.17 1.00 0.01 -0.04 0.07
Forerunner 0.22 0.12 0.00 0.01 1.00 0.11 0.11
Mainstreamer -0.01 0.02 -0.02 -0.04 0.11 1.00 -0.04
Classic 0.63 0.07 -0.07 0.07 0.11 -0.04 1.00
Correlation Matrix with Significance Stars (results$stars)
Economy Gasoline Novelist Innovator Trendsetter Forerunner Mainstreamer Classic
Economy Gasoline 1*** 0.09 -0.12** 0.03 0.22*** -0.01 0.63***
Novelist 0.09** 1*** -0.06 0.09* 0.12** 0.02 0.07
Innovator -0.12*** -0.06 1*** -0.17*** 0 -0.02 -0.07
Trendsetter 0.03 0.09** -0.17*** 1*** 0.01 -0.04 0.07
Forerunner 0.22*** 0.12*** 0 0.01 1*** 0.11** 0.11*
Mainstreamer -0.01 0.02 -0.02 -0.04 0.11*** 1*** -0.04
Classic 0.63*** 0.07* -0.07* 0.07* 0.11*** -0.04 1***

Calculating Correlation Coefficient in R (cont.)

  • Classic (r=0.63, p<0.01): Strong positive & significant correlation.
    • Highlights that individuals identifying as Classic are the most likely to prefer the economy gasoline model.
  • Forerunner (r=0.22, p<0.01): Weak positive & significant correlation.
    • Suggests Forerunners are somewhat more likely to prefer the economy gasoline model.
  • Novelist (r=0.09, p<0.01): Very weak positive & significant correlation.
    • Suggests only a slight association between preference for the economy gasoline model and the Novelist lifestyle.
  • Innovator (r=−0.12, p<0.01): Very weak negative & significant correlation.
    • Indicates that individuals identifying as Innovators are slightly less likely to prefer the economy gasoline model.

Agenda

  • What Are Associative Analyses?

  • Types of Relationships

  • Correlation Coefficient

  • Cross Tabulation and Chi-Square Test

  • Special Consideration in Associative Analyses

Cross-Tabulation Analysis

  • Cross-tabulation is a statistical method used to examine nonmonotonic relationships between two nominally scaled variables.
  • A cross-tabulation table is sometimes referred to as an “r×c” (r-by-c) table because it is composed of rows and columns.
    • Rows (r): Represent categories of one variable.
    • Columns (c): Represent categories of another variable.
    • Cells: The intersection of rows and columns, containing the frequency of occurrences.

Cross-Tabulation Analysis Example

  • In a survey (200 respondents), researchers examine the relationship between Occupation (nominal variable) and Michelob Ultra Beer Purchasing Behavior (nominal variable).
  • Two Nominal Variables:
    1. Occupation: White Collar (160) & Blue Collar (40).

    2. Beer Purchasing Behavior: Buyers (166) & Nonbuyers (34) of Michelob Ultra beer.

  • Cross-tabulation analysis uses both nominal variables simultaneously and tallies up the cell frequencies
Occupation Status Buyer Nonbuyer Row Total
White Collar 152 8 160
Blue Collar 14 26 40
Column Total 166 34 200

Raw Percentages Table

  • In a cross-tabulation table, raw frequencies can be converted into various types of percentages to provide additional insights.
  • The raw percentages table is derived by dividing each raw frequency by the grand total (200 in this case) and multiplying by 100. \[\text{Raw Percentage}=\frac{\text{Cell Frequency}}{\text{Grand Total}}\times100\]
Buyer Nonbuyer Total
White Collar % of Total 76% 4% 80%
Blue Collar % of Total 7% 13% 20%
Total % of Total 83% 17% 100%

Row Percentages Table

  • The row percentages table calculates percentages relative to the row totals (e.g., 160 for White Collar, 40 for Blue Collar). \[\text{Row Cell Percentage}=\frac{\text{Cell Frequency}}{\text{Total of Cell Frequencies in that Row}}\times100\]
  • e.g., The first column is calculated by \(\frac{152}{160}\times100\) and \(\frac{14}{40}\times100\)
Occupation Status Buyer (%) Nonbuyer (%) Row Total (%)
White Collar 95% 5% 100%
Blue Collar 35% 65% 100%

Column Percentages Table

  • The column percentages table calculates percentages relative to the column totals (e.g., 166 for Buyers, 34 for Nonbuyers). \[\text{Column Cell Percentage}=\frac{\text{Cell Frequency}}{\text{Total of Cell Frequencies in that Column}}\times100\]
  • e.g., The first row is calculated by \(\frac{152}{166}\times100\) and \(\frac{8}{34}\times100\)
Occupation Status Buyer (%) Nonbuyer (%)
White Collar 91.57% 23.53%
Blue Collar 8.43% 76.47%
Column Total 100% 100%
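
  • For reference, all three percentage tables can be reproduced in base R from the raw frequencies; a minimal sketch that re-enters the Michelob Ultra counts:

# Michelob Ultra example: observed frequencies
m <- matrix(c(152, 8,
              14, 26),
            nrow = 2, byrow = TRUE,
            dimnames = list(Occupation = c("White Collar", "Blue Collar"),
                            Purchase = c("Buyer", "Nonbuyer")))

prop.table(m) * 100      # raw percentages: each cell / grand total
prop.table(m, 1) * 100   # row percentages: each cell / its row total
prop.table(m, 2) * 100   # column percentages: each cell / its column total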

Chi-Square (χ²) Analysis Overview

  • Chi-square (χ²) analysis is a statistical technique used to examine the frequencies of two nominally scaled variables in a cross-tabulation table.

  • It assesses nonmonotonic association in a cross-tabulation table based upon differences between observed and expected frequencies

  • It determines whether there is a statistically significant relationship between the two variables.

    • Null Hypothesis: Assumes no association between the variables (independence).

      • e.g., the distribution of buyers and nonbuyers is independent of occupation.
    • Alternative Hypothesis: Assumes that the variables are associated (dependent).

Observed Frequencies in Chi-Square Analysis

  • Chi-square analysis compares observed frequencies (actual counts from the data) to expected frequencies (theoretical counts assuming no association). The degree of difference is expressed in the Chi-square test statistic.

  • Observed Frequencies (\(O_{ij}\)):

    • These are the actual cell counts in the cross-tabulation table.
    • Example: 152 White Collar buyers and 8 White Collar nonbuyers.
Occupation Status Buyer Nonbuyer Row Total
White Collar 152 8 160
Blue Collar 14 26 40
Column Total 166 34 200

Expected Frequencies in Chi-Square Analysis

  • Expected Frequencies (\(E_{ij}\)): These are theoretical frequencies calculated under the null hypothesis of no association.

  • Formula: \(E_{ij} = \frac{\text{Row Total}_{i} \times \text{Column Total}_{j}}{\text{Grand Total}}\)

\[ \text{White-collar buyer} = \frac{160 \times 166}{200} = 132.8 \]

\[ \text{White-collar nonbuyer} = \frac{160 \times 34}{200} = 27.2 \]

\[ \text{Blue-collar buyer} = \frac{40 \times 166}{200} = 33.2 \]

\[ \text{Blue-collar nonbuyer} = \frac{40 \times 34}{200} = 6.8 \]
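
  • These expected counts can be verified quickly in R; the sketch below re-enters the observed frequencies from the example (the object name observed is illustrative):

# Expected frequency = (row total x column total) / grand total, for every cell
observed <- matrix(c(152, 8,
                     14, 26), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected   # 132.8, 27.2, 33.2, 6.8 (matches the hand calculations above)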

The Computed χ² Value

  • The Chi-square (χ²) value is a single number summarizing how much the observed frequencies deviate from the expected frequencies in a cross-tabulation table.

  • It provides a measure of how closely the data align with the null hypothesis of no association. \[ \chi^2 = \sum_{i=1,j=1}^n \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \] Where:

  • \(O_{ij}\) : Observed frequency in row i and column j

  • \(E_{ij}\): Expected frequency in row i and column j

  • \(n\): Number of cells

Degrees of Freedom in Chi-Square Analysis

  • Degrees of freedom represent the number of values in a calculation that are free to vary while still meeting the constraints of the data.

  • In the context of Chi-square analysis, degrees of freedom are used to determine the critical value from the Chi-square distribution table.

\[ df = (\text{Number of Rows} - 1) \times (\text{Number of Columns} - 1) \]

  • 2×2 Table (Michelob Ultra Example): 2 rows (White Collar, Blue Collar) and 2 columns (Buyer, Nonbuyer) \[ df = (2 - 1) \times (2 - 1) = 1 \]

Characteristics of the Chi-Square Distribution

  • We will use the chi-square distribution to determine the statistical significance of the chi-square value.
  • The Chi-square distribution is skewed to the right.
    • The skewness decreases as the degrees of freedom (df) increase.

    • For large df, the Chi-square distribution approximates a normal distribution.

  • The rejection region is always located in the right-hand tail of the distribution.
  • Larger Chi-square values fall in the rejection region, indicating a significant result.


The Computed χ² Value Practice

  • Observed Frequencies (O)
Occupation Status Buyer Nonbuyer Row Total
White Collar 152 8 160
Blue Collar 14 26 40
Column Total 166 34 200
  • Expected Frequencies (E)
Occupation Status Buyer Nonbuyer Row Total
White Collar 132.8 27.2 160
Blue Collar 33.2 6.8 40
Column Total 166 34 200

The Computed χ² Value Practice

\[ \chi^2 = \frac{(152 - 132.8)^2}{132.8} + \frac{(8 - 27.2)^2}{27.2} + \frac{(14 - 33.2)^2}{33.2} + \frac{(26 - 6.8)^2}{6.8}=81.64 \] \[ df = (2 - 1) \times (2 - 1) = 1 \]

  • Statistical significance is determined by comparing the computed Chi-square value with the critical value from the Chi-square distribution table or by using a p-value.

  • Modern statistical software automates the process.

How the Chi-Square Value is Computed

  • Compare Observed and Expected Frequencies: Subtract the expected frequency \(E\) from the observed frequency \(O\); \((O - E)\).
  • Adjust for Negative Values: Square the difference to eliminate negative values and prevent cancellation effects: \((O - E)^2\)
  • Normalize by Expected Frequency: Divide the squared difference by the expected frequency \(E\) to adjust for differences in cell sizes: \(\frac{(O - E)^2}{E}\)
  • Summation Across All Cells: Add these normalized values across all cells in the cross-tabulation table (see the worked sketch below): \[ \chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
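
  • Putting these four steps together for the Michelob Ultra table, a minimal R sketch (re-entering the observed frequencies and recomputing the expected ones):

observed <- matrix(c(152, 8,
                     14, 26), nrow = 2, byrow = TRUE)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

chi_sq <- sum((observed - expected)^2 / expected)   # sum over all four cells
chi_sq                                              # about 81.64
df <- (nrow(observed) - 1) * (ncol(observed) - 1)   # (2 - 1) x (2 - 1) = 1
pchisq(chi_sq, df = df, lower.tail = FALSE)         # p-value: essentially zero

  • Note: base R’s chisq.test() applies Yates’ continuity correction to 2x2 tables by default, so its reported statistic will differ slightly from this hand calculation.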

Interpreting the Chi-Square Value

  • Large Deviations:
    • If observed frequencies deviate significantly from expected frequencies -> the computed Chi-square value increases.
  • Small Deviations:
    • If observed frequencies are close to expected frequencies -> the computed Chi-square value remains small.
  • The Chi-square value provides a summary measure of the departure of observed frequencies from expected frequencies.

Cross Tabulation in R

  • Okay, let’s practice what we have learned in R.

  • Recall that we have several nominal variables in our auto_concept.csv data.

    • Marital Status (marital): Unmarried (0) / Married (1) (Nominal Variable)

    • Favorite Newspaper Section (newspaper): Editorial, Business, Local News, National News, Sports, Entertainment, Do Not Read (Nominal Variable)

  • We will use the janitor package for the associative analysis.

library(tidyverse)
#install.packages("janitor")  # Install if not already installed
library(janitor)
setwd("~/Documents/GitHub/Marketing-Research-2025-Spring")
auto_concept <- read_csv("auto_concept.csv")
# Rename marital and newspaper variables
auto_concept <- auto_concept %>%
  mutate(
    marital = factor(marital,
                     levels = c(0, 1),
                     labels = c("Unmarried", "Married")),
    newspaper = factor(newspaper,
                       levels = c(1, 2, 3, 4, 5, 6, 7),
                       labels = c("Editorial", "Business", "Local News",
                                  "National News", "Sports", "Entertainment", "Do Not Read"))
  )

Cross Tabulation in R (cont.)

  • The tabyl() function in the janitor package provides a simple way to compute the cross-tabulation table.
# Create a cross-tabulation table
cross_tab <- auto_concept %>%
  tabyl(marital, newspaper)
cross_tab
Observed Frequencies: Marital Status vs. Newspaper Section
marital Editorial Business Local News National News Sports Entertainment Do Not Read
Unmarried 0 5 16 4 53 7 25
Married 94 199 301 37 183 52 24

Cross Tabulation in R (cont.)

  • Let’s do the chi-square test.
  • The p-value is essentially zero; it indicates a statistically significant association between marital status and newspaper section preferences.
  • However, it does not specify the nature of the relationship.
  • To understand this, we must analyze the row or column percentages.
# Perform Chi-square test
chi_test <- chisq.test(cross_tab)

# Display Chi-square test results
chi_test

    Pearson's Chi-squared test

data:  cross_tab
X-squared = 150.24, df = 6, p-value < 2.2e-16

Cross Tabulation in R (cont.)

  • We can easily calculate the raw percentages table

  • adorn_percentages("all") does the job!

cross_tab_total <- auto_concept %>%
  tabyl(marital, newspaper) %>%
  adorn_percentages("all") %>%
  adorn_pct_formatting(digits = 2)

cross_tab_total
Raw Percentages: Marital Status vs. Newspaper Section
marital Editorial Business Local News National News Sports Entertainment Do Not Read
Unmarried 0.00% 0.50% 1.60% 0.40% 5.30% 0.70% 2.50%
Married 9.40% 19.90% 30.10% 3.70% 18.30% 5.20% 2.40%

Cross Tabulation in R (cont.)

  • Similarly, we can easily calculate the row percentages table

  • adorn_percentages("row") does the job!

cross_tab_total <- auto_concept %>%
  tabyl(marital, newspaper) %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting(digits = 2)

cross_tab_total
Row Percentages: Marital Status vs. Newspaper Section
marital Editorial Business Local News National News Sports Entertainment Do Not Read
Unmarried 0.00% 4.55% 14.55% 3.64% 48.18% 6.36% 22.73%
Married 10.56% 22.36% 33.82% 4.16% 20.56% 5.84% 2.70%

Cross Tabulation in R (cont.)

  • Similarly, we can easily calculate the column percentages table

  • adorn_percentages("col") does the job!

cross_tab_total <- auto_concept %>%
  tabyl(marital, newspaper) %>%
  adorn_percentages("col") %>%
  adorn_pct_formatting(digits = 2)

cross_tab_total
Column Percentages: Marital Status vs. Newspaper Section
marital Editorial Business Local News National News Sports Entertainment Do Not Read
Unmarried 0.00% 2.45% 5.05% 9.76% 22.46% 11.86% 51.02%
Married 100.00% 97.55% 94.95% 90.24% 77.54% 88.14% 48.98%

Key Observations from Row Percentages Table

  • Unmarried Individuals:

    • Sports is the dominant preference, at about 48% of their readership.

    • Do Not Read is the second most frequent choice, at about 23%.

  • Married Individuals:

    • Local News is the most common preference, at around 34%.

    • Business comes next, with about 22%; Sports is also significant, at 21%.

Row Percentages: Marital Status vs. Newspaper Section
marital Editorial Business Local News National News Sports Entertainment Do Not Read
Unmarried 0.00% 4.55% 14.55% 3.64% 48.18% 6.36% 22.73%
Married 10.56% 22.36% 33.82% 4.16% 20.56% 5.84% 2.70%

Marketing Implications

  • If you are a marketing manager, what would you recommend to the CMO based on these results?
    • Advertisers targeting Unmarried individuals should focus on Sports sections or non-traditional platforms.
    • For Married individuals, content in Local News and Business sections would be more effective.
    • Tailor messaging based on these patterns. For example, community-based initiatives might resonate more with Married individuals.
    • Editorial, National News, and Entertainment sections need reevaluation as they perform poorly across both marital status groups.

Agenda

  • What Are Associative Analyses?

  • Types of Relationships

  • Correlation Coefficient

  • Cross Tabulation and Chi-Square Test

  • Special Consideration in Associative Analyses

Special Considerations in Association Procedures

1. Scaling Assumptions

  • Correlation: Requires both variables to have interval-level scaling at a minimum.

  • Cross-Tabulation: Used when one or both variables are nominally scaled.

2. Analysis is Limited to Two Variables

  • Association analyses focus solely on the relationship between two variables.

  • Assumptions: Other variables are assumed constant or “frozen.”

  • Limitation: Interactions with other variables are ignored, which can oversimplify complex relationships.

Special Considerations in Association Procedures (cont.)

3. Correlation ≠ Causation

  • Neither correlations nor cross-tabulations imply cause-and-effect relationships.

4. Pearson Correlation Only Identifies Linear Relationships

  • A correlation coefficient of ~0 does not mean no relationship exists—it simply means no linear relationship exists.

  • Nonlinear relationships (e.g., S-shape or J-shape patterns) will not be detected by the Pearson product-moment correlation.
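
  • A minimal illustration of this point (the data are made up: y is a perfectly U-shaped function of x, yet Pearson’s r is zero):

x <- -5:5
y <- x^2        # perfect curvilinear (U-shaped) relationship
cor(x, y)       # 0: Pearson's r completely misses the nonlinear pattern
plot(x, y)      # a scatter diagram makes the relationship obvious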

Next Class

  • xx