2026-03-26

Tempe Pre-Calls / Texts Analysis

DAT 301 Midterm Project

Author: Khoa Vo

Project Purpose

This project uses data from the Tempe Pre-Calls list for the SunDevil Days event, provided by the Admission Services department. The goal is to identify patterns in the outreach records: the result noted after each call, common student majors, locations, and event-related trends.

Main questions:

  • Which call-result note (the rightmost column) appears most often? (vm = voicemail, conf = confirmed)
  • Which states and cities appear most frequently?
  • What program and campaign interests appear most often?
  • Are there visible relationships between location, stage, or status and the call results?

Dataset Overview

  • File used: Tempe Pre-Calls_ Texts - SDD Mon 3_30.csv (I am a student worker for Admission Services and used older data for this project)
  • Number of observations: 202
  • Number of variables after cleaning: 26
  • All records are tied to the same event date (03-30-26) and event location (Tempe)
  • Status is constant (Registered) and Type is constant (First Time Freshman), so they are useful for context but not for variation-based analysis
  • Important analysis fields include Notes, Mailing.City, Mailing.State.Province, Plan, Primary.Interest..Campaign.Name, Location, and Stage
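The constant-column observation above can be checked with a short sketch. The toy data frame is hypothetical (the Stage values in particular are made up), and only the column names come from the dataset overview.

```r
# Hypothetical toy data standing in for the real CSV; values are made up.
toy <- data.frame(
  Status = c("Registered", "Registered", "Registered"),
  Type   = c("First Time Freshman", "First Time Freshman", "First Time Freshman"),
  Stage  = c("Stage A", "Stage B", "Stage A"),
  stringsAsFactors = FALSE
)

# Count the distinct non-missing values in each column.
n_unique <- sapply(toy, function(col) length(unique(col[!is.na(col)])))

# Columns with exactly one distinct value carry no variation to analyze.
constant_cols <- names(n_unique)[n_unique == 1]
print(constant_cols)
```

Running the same check on the real data frame would be one way to confirm which fields to leave out of variation-based analysis.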

Data Prep

# Data wrangling and plotting packages
library(dplyr)
library(ggplot2)
library(plotly)

# Load the outreach list and make the column names syntactically valid
tempe <- read.csv("Tempe Pre-Calls_ Texts - SDD Mon 3_30.csv", stringsAsFactors = FALSE)
names(tempe) <- make.names(names(tempe))

# Group the free-text notes by their prefix: vm = voicemail, conf = confirmed
analysis_df <- tempe %>%
  mutate(
    notes_lower = tolower(trimws(Notes)),
    call_result_group = case_when(
      grepl("^vm", notes_lower) ~ "Voicemail",
      grepl("^conf", notes_lower) ~ "Confirmed",
      TRUE ~ "Other"
    )
  )
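To see how the prefix-based grouping behaves, here is a small sketch that applies the same rules to a few hypothetical note strings. The sample notes are invented for illustration; the real Notes column may look different.

```r
library(dplyr)

# Hypothetical note strings, including mixed case, leading spaces, and a missing value.
sample_notes <- c("vm", " VM - left message", "conf", "Conf 2 guests", "changed date", NA)

# Same normalization as in the data prep step: trim, then lower-case.
cleaned <- tolower(trimws(sample_notes))

# Same prefix rules as in the data prep step.
grouped <- case_when(
  grepl("^vm", cleaned) ~ "Voicemail",
  grepl("^conf", cleaned) ~ "Confirmed",
  TRUE ~ "Other"
)

print(data.frame(note = sample_notes, group = grouped))
```

Because grepl() returns FALSE for a missing value, empty notes fall into the "Other" group under these rules.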

Most Common Call Result Notes (RESULTS)

Notes Analysis

  • vm (voicemail) is the dominant note in the pre-call results by a large margin.
  • conf (confirmed) is the second most common result.
  • The remaining notes record other outcomes, such as date changes, event changes, and smaller categories.
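A ranking like the one above could be produced along these lines. This is a minimal sketch on hypothetical toy data, not the code behind the original slide; only the column name call_result_group comes from the data prep step.

```r
library(dplyr)
library(ggplot2)

# Hypothetical grouped results; the real data has 202 rows.
toy <- data.frame(
  call_result_group = c("Voicemail", "Voicemail", "Voicemail",
                        "Confirmed", "Confirmed", "Other"),
  stringsAsFactors = FALSE
)

# Count each group, most frequent first, and add its share of all calls.
result_counts <- toy %>%
  count(call_result_group, sort = TRUE) %>%
  mutate(share = round(n / sum(n), 2))
print(result_counts)

# Horizontal bar chart ordered by frequency.
p <- ggplot(result_counts, aes(x = reorder(call_result_group, n), y = n)) +
  geom_col() +
  coord_flip() +
  labs(x = "Call result", y = "Number of calls")
```

Reporting the share next to the count makes "dominant" concrete, since it states what fraction of calls ended in each outcome.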

Most Frequent States and Cities

States and Cities Analysis

  • Two states dominate: California and Arizona contribute the most students. Within Arizona, Mesa and Chandler are among the most common cities.
  • The city distribution is more spread out than the state distribution.
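One way to make "more spread out" measurable is to compare how much of the list the top few categories cover. The sketch below assumes a hypothetical helper, top_share(), and invented toy rows; only the two column names come from the dataset.

```r
# Hypothetical toy rows; the real list has 202 records.
toy <- data.frame(
  Mailing.State.Province = c("CA", "CA", "CA", "AZ", "AZ", "TX"),
  Mailing.City = c("San Diego", "Los Angeles", "Irvine", "Mesa", "Chandler", "Austin"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of records covered by the k most frequent values.
top_share <- function(x, k = 2) {
  counts <- sort(as.vector(table(x)), decreasing = TRUE)
  sum(head(counts, k)) / sum(counts)
}

state_share <- top_share(toy$Mailing.State.Province)
city_share  <- top_share(toy$Mailing.City)

# A lower top-k share means the values are spread over more categories.
print(c(state = state_share, city = city_share))
```

In this toy example the top two states cover most of the rows while the top two cities cover only a third, which is the pattern the bullet describes.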

Most Common Student Majors / Plans

Majors / Plans Analysis

  • Finance is the most common plan for Tempe.
  • Psychology, Political Science, and Business also appear often.
  • The list suggests strong interest in business and social science majors; Engineering does not appear in large numbers.

Campaign Interest Distribution (Interactive Plotly)

Campaign Analysis

  • Business is the largest campaign interest in this dataset.
  • Public Service and Political Science and W. P. Carey School of Business also appear frequently.
  • Interest patterns are concentrated in a small number of campaigns.
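An interactive bar chart of campaign interest could be built roughly as follows. This is a sketch under assumptions: the campaign labels and counts in the toy data are placeholders, and the original slide may have used different options.

```r
library(plotly)

# Hypothetical campaign counts; the labels are placeholders, not real campaign names.
campaign_counts <- data.frame(
  campaign = c("Campaign A", "Campaign B", "Campaign C"),
  n = c(12, 7, 3),
  stringsAsFactors = FALSE
)

# Order the bars by count so the largest campaign appears at the top.
campaign_counts$campaign <- factor(
  campaign_counts$campaign,
  levels = campaign_counts$campaign[order(campaign_counts$n)]
)

# Horizontal bar chart with the count shown on hover.
fig <- plot_ly(
  campaign_counts,
  x = ~n,
  y = ~campaign,
  type = "bar",
  orientation = "h",
  text = ~paste0(campaign, ": ", n, " students"),
  hoverinfo = "text"
)
```

With the real data, the counts would come from tallying Primary.Interest..Campaign.Name before plotting.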

3D View of Location, Stage, and Call Result

3D Plot Analysis

  • This plot shows three variables at once: Location, Stage, and the grouped call result.
  • Hover text shows the original values, so the encoded axes still connect back to the real data.
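The encoding idea described above can be sketched like this: each categorical variable is turned into integer codes for the axes, while the hover text keeps the original labels. The toy values and the *_code column names are hypothetical, and the original plot may have been built differently.

```r
library(plotly)

# Hypothetical rows; Location and Stage values are placeholders.
toy <- data.frame(
  Location = c("Loc A", "Loc B", "Loc A", "Loc B"),
  Stage = c("Stage A", "Stage A", "Stage B", "Stage B"),
  call_result_group = c("Voicemail", "Confirmed", "Voicemail", "Other"),
  stringsAsFactors = FALSE
)

# Encode each categorical variable as integer codes (alphabetical level order).
toy$loc_code    <- as.integer(factor(toy$Location))
toy$stage_code  <- as.integer(factor(toy$Stage))
toy$result_code <- as.integer(factor(toy$call_result_group))

# 3D scatter on the codes; hover text carries the original labels.
fig <- plot_ly(
  toy,
  x = ~loc_code, y = ~stage_code, z = ~result_code,
  type = "scatter3d", mode = "markers",
  text = ~paste0("Location: ", Location,
                 "<br>Stage: ", Stage,
                 "<br>Result: ", call_result_group),
  hoverinfo = "text"
)
```

The integer codes have no numeric meaning, so distances along the axes should not be read as magnitudes.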

Statistical Analysis

Tests for Call Result Relationships
Relationship              Statistic   DF   P.value   Significant
Stage vs Call Result      9.49        10   0.486     No
Location vs Call Result   9.09        6    0.168     No
Status vs Call Result     NA          NA   NA        N/A

Statistical Commentary

  • A chi-square test of independence was used because the variables are categorical.
  • Stage vs Call Result has a p-value of 0.486, so no significant association is detected.
  • Location vs Call Result has a p-value of 0.168, so no significant association is detected.
  • Status vs Call Result was not testable because every record is Registered.
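The tests summarized above could be run along these lines. This is a sketch on hypothetical toy data with a hypothetical helper, run_chisq(), that returns NA when a variable has fewer than two levels, mirroring the Status row of the table. For an r-by-c table the degrees of freedom are (r - 1)(c - 1).

```r
# Hypothetical toy data: 40 rows, two stages, two call results, constant Status.
toy <- data.frame(
  Stage = rep(c("Stage A", "Stage B"), each = 20),
  call_result_group = c(rep(c("Voicemail", "Confirmed"), times = c(12, 8)),
                        rep(c("Voicemail", "Confirmed"), times = c(11, 9))),
  Status = rep("Registered", 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: chi-square test of independence, or NA if not testable.
run_chisq <- function(x, y) {
  if (length(unique(x)) < 2 || length(unique(y)) < 2) {
    return(c(statistic = NA_real_, df = NA_real_, p.value = NA_real_))
  }
  res <- chisq.test(table(x, y))
  c(statistic = unname(res$statistic),
    df = unname(res$parameter),
    p.value = res$p.value)
}

stage_res  <- run_chisq(toy$Stage, toy$call_result_group)
status_res <- run_chisq(toy$Status, toy$call_result_group)

print(rbind(Stage = stage_res, Status = status_res))
```

When some expected cell counts are small, chisq.test() warns that the approximation may be inaccurate; chisq.test(..., simulate.p.value = TRUE) is one option worth considering in that case.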

Key Findings and Conclusion

  • The note vm (voicemail) is the most common call result, followed by conf (confirmed).
  • California and Arizona are the most frequent states in the contact list.
  • Finance, Psychology, Political Science, and Business are among the most common intended majors.
  • Business and Public Service related campaign interests dominate the campaign field.
  • Status cannot explain call-result differences because every record is Registered.
  • Stage and location can be explored in relation to grouped call results, but the chi-square tests on the statistical slide find no significant association for either.
  • Final takeaway: this dataset shows that voicemail is the dominant outreach result.