Install Quality Over Install Volume: A Marketing Analytics Study of Credit Direct Mobile App Acquisition — April 2026

Author

Victoria Arinze

Published

May 13, 2026

1. Executive Summary

Credit Direct Mobile ran paid acquisition across multiple channels in April 2026, generating 42,020 non-organic Android installs. Channels included Google Ads, programmatic display networks, Meta Ads, and other paid media sources. Google Ads and Programmatic together account for over 95% of installs, and Meta Ads recorded only three. Install volume is a useful headline metric, but it does not show whether the spend was efficient.

This analysis reframes the performance question: not how many people installed, but how many actually engaged. Using raw AppsFlyer attribution data, it classifies an install as converted if the user triggered at least one tracked in-app event after installation. Five analytical techniques are applied: exploratory data analysis, data visualisation, hypothesis testing, correlation analysis, and logistic regression.

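The findings show statistically significant differences in conversion rates across acquisition channels. Google Ads and Programmatic delivered similar install volumes, yet 66.9% of Google Ads installs converted against 53.7% of Programmatic installs. These differences are modest in statistical strength but systematic enough to support a reallocation of media spend. The core recommendation is to replace cost-per-install with cost-per-engaged-user as the primary performance KPI for Q3 2026 planning.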

2. Professional Disclosure

I am Victoria Arinze, Product Marketing and Growth Lead at Credit Direct Finance Company Limited, a Nigerian digital lending and fintech company. My work sits at the intersection of product, growth, marketing, and customer experience. A core part of my role involves evaluating the quality of paid acquisition channels, understanding user engagement post-install, and translating data insights into channel investment and campaign decisions.

Exploratory Data Analysis is the starting point for every channel review I conduct. Before making any budget recommendation, I need to understand the structure of the acquisition data — which channels drive volume, how installs are distributed across geographies and devices, and where data quality issues exist.

Data Visualisation is central to how performance is communicated at Credit Direct across stakeholder levels, from the growth team to the CFO. Attribution data needs to be translated into stories that non-technical stakeholders can act on.

Hypothesis Testing provides the statistical rigour needed to confirm whether observed differences in channel performance are real or simply noise from volume variation. When one channel appears to convert better than another, I need statistical confirmation before recommending a budget shift.

Correlation Analysis helps identify which channel mix, device profile, or geographic concentration is most associated with engaged users at the daily level — directly relevant to campaign scheduling and budget pacing decisions.

Logistic Regression provides a predictive framework for estimating the probability of conversion based on observable acquisition attributes. The output — odds ratios per channel — translates directly into a channel efficiency ranking that informs bid strategy and campaign prioritisation.

3. Data Collection & Sampling

Source: Raw AppsFlyer attribution data exported from the Credit Direct Mobile app account by the Growth team.

Files used: installs.csv contains non-organic Android install records for April 2026, 42,020 rows across 50 variables. events.csv contains in-app event records for the same period, approximately 200,000 rows across 44 variables.

Collection method: Direct export from AppsFlyer raw data reports. The installs file contains one row per attributed install. The events file contains one row per tracked in-app event.

Time period: April 2026 (full calendar month).

Sampling frame: All non-organic Android installs attributed to paid channels during the period. No sampling was applied — this is the complete population of paid installs for the month.

Ethical notes: All personally identifiable information — including IP addresses, device IDs, and IMEI numbers — was removed from both files before analysis. The data was accessed in the course of my professional role. No external consent was required as the data reflects campaign attribution records, not individual user profiles.

Outcome variable: converted = 1 if the AppsFlyer ID appears in the events file (at least one in-app event recorded post-install). converted = 0 if no events were recorded.

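Channel classification: the rules are applied in order, and the first match wins. If the Partner field is populated (blanks and the placeholder values FALSE and NA count as empty), the install is classified as Programmatic, whatever its Media Source. Otherwise, if Media Source contains google, it is Google Ads; if it contains facebook, instagram, or meta, it is Meta Ads. Matching is case-insensitive. All remaining installs are Other Paid.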
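The ordered rules above can be sketched as a small base R function. classify_channel() is a hypothetical helper written for illustration, and the partner and media source strings are made-up examples rather than dataset rows; the report itself applies the same logic inside dplyr::case_when().

```r
# Hypothetical helper illustrating the ordered channel rules (first match wins)
classify_channel <- function(partner, media_source) {
  partner_str <- trimws(as.character(partner))
  # A partner counts only if it is non-missing and not a blank/placeholder value
  has_partner <- !is.na(partner_str) & !(partner_str %in% c("", "FALSE", "NA"))
  src <- tolower(media_source)   # case-insensitive; grepl() returns FALSE for NA
  ifelse(has_partner, "Programmatic",
    ifelse(grepl("google", src), "Google Ads",
      ifelse(grepl("facebook|instagram|meta", src), "Meta Ads", "Other Paid")))
}

# Illustrative inputs only (not rows from the April 2026 data)
partner      <- c("example_network", NA, "FALSE", "", NA)
media_source <- c("googleadwords_int", "googleadwords_int", "Facebook Ads", "tiktok_int", NA)
channels <- classify_channel(partner, media_source)
print(channels)
```

Because the Partner test comes first, an install with both a partner and a Google media source is counted as Programmatic. Substring matching is also broad: any media source containing "meta" would be labelled Meta Ads, so listing the distinct media_source values once is a cheap check against misclassification.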

Code
library(tidyverse)   # readr::read_csv(), dplyr verbs and ggplot2
# Load the raw AppsFlyer exports and convert column names to snake_case
installs_raw <- read_csv('installs.csv', show_col_types = FALSE)
events_raw   <- read_csv('events.csv',   show_col_types = FALSE)
installs <- installs_raw |> janitor::clean_names()
events   <- events_raw   |> janitor::clean_names()
# Outcome: an install is converted if its AppsFlyer ID has at least one in-app event
converted_ids <- unique(events$apps_flyer_id)
installs <- installs |> mutate(converted = if_else(apps_flyer_id %in% converted_ids, 1L, 0L))
# Channel rules, applied in order: a populated Partner field (blanks and the
# placeholders 'FALSE'/'NA' count as empty) takes precedence over Media Source
installs <- installs |> mutate(partner_str = trimws(as.character(partner)),
  channel = case_when(
    !is.na(partner) & partner_str != '' & partner_str != 'FALSE' & partner_str != 'NA' ~ 'Programmatic',
    grepl('google', tolower(media_source)) ~ 'Google Ads',
    grepl('facebook|instagram|meta', tolower(media_source)) ~ 'Meta Ads',
    TRUE ~ 'Other Paid'))
# Date and hour features from the 'YYYY-MM-DD HH:MM:SS' install timestamp
installs <- installs |> mutate(
  install_date = as.Date(substr(install_time, 1, 10)),
  install_hour = as.integer(substr(install_time, 12, 13)))
# Keep the five largest countries as separate groups and pool the rest as 'Other'
top5 <- installs |> count(country_code, sort = TRUE) |> slice_head(n = 5) |> pull(country_code)
installs <- installs |> mutate(
  country_group = if_else(country_code %in% top5, country_code, 'Other'),
  channel = factor(channel, levels = c('Google Ads', 'Meta Ads', 'Programmatic', 'Other Paid')))
cat('Installs:', nrow(installs), '| Converted:', sum(installs$converted),
    '| Rate:', round(mean(installs$converted) * 100, 2), '%\n')
Installs: 42020 | Converted: 25577 | Rate: 60.87 %

4. Data Description (EDA)

4.1 Install and Conversion Summary by Channel

Code
channel_summary <- installs |> group_by(channel) |>
  summarise(Installs=n(),Converted=sum(converted),Conv_Rate_Pct=round(mean(converted)*100,2),.groups='drop') |>
  arrange(desc(Installs))
knitr::kable(channel_summary,col.names=c('Channel','Total Installs','Converted Users','Conversion Rate (%)'),caption='Table 1: Install Volume and Conversion Rate by Acquisition Channel')
Table 1: Install Volume and Conversion Rate by Acquisition Channel
Channel          Total Installs   Converted Users   Conversion Rate (%)
Google Ads                20362             13624                 66.91
Programmatic              19746             10611                 53.74
Other Paid                 1909              1340                 70.19
Meta Ads                      3                 2                 66.67

4.2 Missing Value Analysis

Code
key_cols <- c('channel', 'country_code', 'device_category', 'carrier', 'converted')
missing_df <- data.frame(Variable = key_cols,
                         Missing = sapply(key_cols, function(x) sum(is.na(installs[[x]]))),
                         Missing_Pct = sapply(key_cols, function(x) round(mean(is.na(installs[[x]])) * 100, 2)))
# row.names = FALSE: sapply() names its output, and those names would otherwise print as a duplicate column
knitr::kable(missing_df, row.names = FALSE, col.names = c('Variable', 'Missing Count', 'Missing (%)'),
             caption = 'Table 2: Missing Value Summary')
Table 2: Missing Value Summary
Variable          Missing Count   Missing (%)
channel                       0          0.00
country_code                  0          0.00
device_category               0          0.00
carrier                    4993         11.88
converted                     0          0.00

4.3 Daily Install Volume

Code
daily_vol <- installs |> count(install_date) |> summarise(Min=min(n),Max=max(n),Mean=round(mean(n),0),Median=median(n))
knitr::kable(daily_vol,caption='Table 3: Daily Install Volume Statistics')
Table 3: Daily Install Volume Statistics
Min Max Mean Median
753 1966 1401 1404

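EDA Interpretation: Two data quality issues were identified and resolved. First, the Partner column did not have a consistent data type, and an empty partner could appear as a blank, FALSE or NA; the column was cast to character and these placeholder values were treated as empty before channel classification. Second, install timestamps required substring extraction to produce usable date and hour features. No rows were dropped; the full population of 42,020 April 2026 non-organic Android installs is retained. Two further points shape the later analysis. Carrier is missing for 11.88% of installs, but it is not used in any model, so no imputation was needed. Meta Ads recorded only three installs, so its 66.67% conversion rate is not a usable channel estimate.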
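Substring extraction assumes every timestamp is stored as 'YYYY-MM-DD HH:MM:SS' text. A minimal base R sketch, using made-up timestamps and an assumed UTC time zone, compares that approach with deriving both features from a single parsed date-time value.

```r
# Made-up AppsFlyer-style timestamps (not rows from the dataset)
install_time <- c("2026-04-01 09:15:42", "2026-04-30 23:59:01")

# Substring extraction, as in the report: characters 1-10 hold the date,
# characters 12-13 hold the hour
install_date <- as.Date(substr(install_time, 1, 10))
install_hour <- as.integer(substr(install_time, 12, 13))

# Alternative: parse once (UTC assumed), then derive both features
ts       <- as.POSIXct(install_time, tz = "UTC")
date_alt <- as.Date(ts)
hour_alt <- as.integer(format(ts, "%H"))

print(data.frame(install_time, install_date, install_hour, date_alt, hour_alt))
```

The parsed route does not depend on character positions, so it keeps working if an export adds fractional seconds or a different separator that as.POSIXct() can still read.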

5. Data Visualisation

Code
ggplot(channel_summary, aes(x = reorder(channel, -Installs), y = Installs, fill = channel)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = scales::comma(Installs)), vjust = -0.4, size = 3.5) +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_manual(values = c('Google Ads' = '#4285F4', 'Meta Ads' = '#1877F2',
                               'Programmatic' = '#FF6B35', 'Other Paid' = '#6C757D')) +
  labs(title = 'Install Volume by Acquisition Channel',
       subtitle = 'Credit Direct Mobile App — April 2026',
       x = 'Channel', y = 'Total Installs') +
  theme_minimal(base_size = 13)
Figure 1: Total Installs by Acquisition Channel
Code
ggplot(channel_summary, aes(x = reorder(channel, -Conv_Rate_Pct), y = Conv_Rate_Pct, fill = channel)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = paste0(Conv_Rate_Pct, '%')), vjust = -0.4, size = 3.5) +
  scale_fill_manual(values = c('Google Ads' = '#4285F4', 'Meta Ads' = '#1877F2',
                               'Programmatic' = '#FF6B35', 'Other Paid' = '#6C757D')) +
  labs(title = 'Conversion Rate by Acquisition Channel',
       subtitle = 'Converted = at least one in-app event post-install',
       x = 'Channel', y = 'Conversion Rate (%)') +
  theme_minimal(base_size = 13)
Figure 2: Conversion Rate by Acquisition Channel
Code
daily_trend <- installs |> group_by(install_date, channel) |> summarise(n = n(), .groups = 'drop')
ggplot(daily_trend, aes(x = install_date, y = n, colour = channel)) +
  geom_line(linewidth = 0.8) +
  scale_colour_manual(values = c('Google Ads' = '#4285F4', 'Meta Ads' = '#1877F2',
                                 'Programmatic' = '#FF6B35', 'Other Paid' = '#6C757D')) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = 'Daily Install Volume by Channel', x = 'Date', y = 'Daily Installs', colour = 'Channel') +
  theme_minimal(base_size = 13)
Figure 3: Daily Install Volume by Channel
Code
top10 <- installs |> count(country_code, sort = TRUE) |> slice_head(n = 10)
ggplot(top10, aes(x = reorder(country_code, n), y = n)) +
  geom_col(fill = '#2E86AB') +
  geom_text(aes(label = scales::comma(n)), hjust = -0.2, size = 3.2) +
  coord_flip() +
  scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.15))) +
  labs(title = 'Top 10 Countries by Install Volume', x = 'Country', y = 'Installs') +
  theme_minimal(base_size = 13)
Figure 4: Top 10 Countries by Install Volume
Code
# Share of each channel's installs falling in each device category
device_ch <- installs |> group_by(channel, device_category) |> summarise(n = n(), .groups = 'drop') |>
  group_by(channel) |> mutate(pct = n / sum(n) * 100)
ggplot(device_ch, aes(x = channel, y = pct, fill = device_category)) +
  geom_col(position = 'stack') +
  scale_fill_brewer(palette = 'Set2') +
  labs(title = 'Device Category Mix by Acquisition Channel', x = 'Channel', y = 'Share (%)', fill = 'Device') +
  theme_minimal(base_size = 13)
Figure 5: Device Category Mix by Channel

Visualisation Narrative: These five charts tell a single story: install volume and install quality are not the same thing. Figures 1 and 2 make that gap explicit. Google Ads and Programmatic delivered almost identical volumes (20,362 and 19,746 installs), yet 66.91% of Google Ads installs converted against 53.74% of Programmatic installs. Other Paid posted the highest rate (70.19%) on a much smaller base of 1,909 installs, and the Meta Ads bar rests on only three installs. Figure 3 shows daily volume patterns by channel, Figure 4 the geographic concentration of installs, and Figure 5 the device mix by channel, which matters because creative performance can differ across device types.

6. Hypothesis Testing

Hypothesis 1: Conversion rate differs significantly across acquisition channels

H0: Acquisition channel and conversion outcome are independent; the conversion rate is the same in every channel.

H1: Conversion rate differs across acquisition channels.

Code
ct1 <- table(installs$channel,installs$converted)
cat('Contingency Table — Channel vs Converted:\n'); print(ct1)
Contingency Table — Channel vs Converted:
              
                   0     1
  Google Ads    6738 13624
  Meta Ads         1     2
  Programmatic  9135 10611
  Other Paid     569  1340
Code
chi1 <- chisq.test(ct1); print(chi1)

    Pearson's Chi-squared test

data:  ct1
X-squared = 803.22, df = 3, p-value < 2.2e-16
Code
cat('\nCramer V:',round(sqrt(chi1$statistic/(sum(ct1)*(min(dim(ct1))-1))),4),'\n')

Cramer V: 0.1383 

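Interpretation: The p-value is far below 0.05, so we reject H0: the differences in conversion rates across channels are statistically significant and not random noise. Cramér's V rescales the chi-squared statistic to lie between 0 and 1 so that it does not grow with sample size. Its value of 0.138 is a small effect by the usual benchmarks (about 0.1 small, 0.3 medium): channel matters, but it explains only part of the variation in conversion. One caveat: the Meta Ads row holds three installs, so its expected counts are below 5 and strain the chi-squared approximation. That row contributes almost nothing to the statistic of 803.22, so the conclusion does not depend on it.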
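The Cramér's V value and the effect of the three-install Meta Ads row can be checked directly from the printed counts. This base R sketch rebuilds the contingency table from the output above, recomputes V as the square root of χ² / (n × (min(rows, cols) − 1)), and reruns the test without Meta Ads. It is a verification aid, not part of the original analysis.

```r
# Channel x converted counts, copied from the contingency table above
ct1 <- matrix(c(6738, 13624,
                   1,     2,
                9135, 10611,
                 569,  1340),
              ncol = 2, byrow = TRUE,
              dimnames = list(channel = c("Google Ads", "Meta Ads", "Programmatic", "Other Paid"),
                              converted = c("0", "1")))

# suppressWarnings(): R warns here because some expected counts are below 5
chi <- suppressWarnings(chisq.test(ct1))

# Cramér's V = sqrt(chi-squared / (n * (min(rows, cols) - 1)))
cramer_v <- sqrt(unname(chi$statistic) / (sum(ct1) * (min(dim(ct1)) - 1)))

# Expected counts for the sparse Meta Ads row
meta_expected <- chi$expected["Meta Ads", ]

# Same test with Meta Ads removed: a 3 x 2 table with df = 2
chi_no_meta <- chisq.test(ct1[rownames(ct1) != "Meta Ads", ])

print(round(c(chi_sq = unname(chi$statistic), cramer_v = cramer_v,
              chi_sq_no_meta = unname(chi_no_meta$statistic)), 4))
print(round(meta_expected, 2))
```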

Hypothesis 2: Conversion rate differs significantly by device category

H0: Device category and conversion outcome are independent.

H1: Conversion rate differs by device category.

Code
ct2 <- table(installs$device_category,installs$converted)
cat('Contingency Table — Device Category vs Converted:\n'); print(ct2)
Contingency Table — Device Category vs Converted:
                         
                              0     1
  desktop                     2     2
  mobile_phone            16197 25113
  other                       0     1
  set_top_box                 2     0
  tablet                     77   173
  unknown_device_category   165   288
Code
chi2 <- chisq.test(ct2); print(chi2)

    Pearson's Chi-squared test

data:  ct2
X-squared = 12.735, df = 5, p-value = 0.02599
Code
cat('\nCramer V:',round(sqrt(chi2$statistic/(sum(ct2)*(min(dim(ct2))-1))),4),'\n')

Cramer V: 0.0174 

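Interpretation: This test checks whether device type at install is associated with post-install engagement. The p-value of 0.026 is below 0.05, so H0 is formally rejected, but the result should not drive targeting or creative decisions on its own. Cramér's V is 0.017, a negligible effect: with 42,020 installs, even trivial differences reach significance. The test is also fragile, because desktop, other and set_top_box together account for seven installs, so several cells have expected counts far below 5. Almost all installs (41,310, or 98.3%) come from mobile phones. The practical signal, if any, is the higher conversion of the 250 tablet installs (69.2%) against mobile phones (60.8%), which needs more volume to confirm.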
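Two features of the device table deserve a check: several categories hold only a handful of installs, and the sample is large enough for small differences to reach significance. The base R sketch below uses the device counts printed above to count cells with expected values under 5, compute a Monte Carlo p-value with chisq.test(simulate.p.value = TRUE), which does not rely on the large-sample approximation, and list conversion rates by device. The seed and the 10,000 replicates are arbitrary choices, so the simulated p-value will vary slightly with them.

```r
# Device category x converted counts, copied from the contingency table above
ct2 <- matrix(c(    2,     2,
                16197, 25113,
                    0,     1,
                    2,     0,
                   77,   173,
                  165,   288),
              ncol = 2, byrow = TRUE,
              dimnames = list(device = c("desktop", "mobile_phone", "other", "set_top_box",
                                         "tablet", "unknown_device_category"),
                              converted = c("0", "1")))

chi2 <- suppressWarnings(chisq.test(ct2))
cramer_v2 <- sqrt(unname(chi2$statistic) / (sum(ct2) * (min(dim(ct2)) - 1)))

# Number of the 12 cells whose expected count falls below 5
n_sparse <- sum(chi2$expected < 5)

# Monte Carlo p-value (arbitrary seed and replicate count)
set.seed(42)
chi2_mc <- chisq.test(ct2, simulate.p.value = TRUE, B = 10000)

# Conversion rate (%) per device category
conv_rate <- round(100 * ct2[, "1"] / rowSums(ct2), 1)

print(c(chi_sq = round(unname(chi2$statistic), 3), cramer_v = round(cramer_v2, 4),
        sparse_cells = n_sparse, mc_p = round(chi2_mc$p.value, 4)))
print(conv_rate)
```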

7. Correlation Analysis

Code
# One row per calendar day: volume, conversion rate and channel shares
daily_corr <- installs |> group_by(install_date) |>
  summarise(total_installs = n(), conversion_rate = mean(converted),
            pct_google = mean(channel == 'Google Ads'), pct_meta = mean(channel == 'Meta Ads'),
            pct_prog = mean(channel == 'Programmatic'), pct_other = mean(channel == 'Other Paid'),
            day_of_week = as.integer(format(min(install_date), '%u')),   # 1 = Monday to 7 = Sunday
            .groups = 'drop')
corr_vars <- daily_corr |> select(total_installs,conversion_rate,pct_google,pct_meta,pct_prog,pct_other,day_of_week)
cm <- cor(corr_vars,use='complete.obs')
colnames(cm) <- rownames(cm) <- c('Total Installs','Conv. Rate','% Google','% Meta','% Prog','% Other','Day')
corrplot::corrplot(cm,method='color',type='upper',addCoef.col='black',tl.col='black',tl.srt=45,number.cex=0.75)

Figure 6: Correlation Matrix — Daily Channel Mix and Conversion Rate

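Correlation Interpretation: The matrix is built from daily aggregates, so each of its 30 rows is one calendar day in April 2026. The key relationships are between Conv. Rate and the four channel-share columns. A positive correlation means that days with a larger share of that channel tended to have a higher conversion rate. Three cautions apply. First, with only 30 observations a correlation needs an absolute value of roughly 0.36 to be significant at the 5% level. Second, the four shares sum to 1 each day: % Meta is almost always zero (three Meta Ads installs all month) and % Other is small, so % Google and % Prog move close to mirror images of each other. Third, because the daily conversion rate is a weighted average of the channel conversion rates, a positive link with % Google or a negative link with % Prog partly restates the channel differences in Table 1 rather than revealing a separate effect. The Day variable (1 = Monday to 7 = Sunday) tests for a linear trend across the week, which is relevant to budget pacing and scheduling decisions.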
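With one row per day, the correlation matrix rests on only 30 observations, which limits which correlations can be distinguished from zero. The t-test for a Pearson correlation converts a critical t value into a critical r via r = t / √(t² + df), with df = n − 2. A minimal base R sketch; critical_r() is a hypothetical helper, and the 90-day case is included only to show how the threshold falls as more days are added.

```r
# Critical |r| for a two-sided test of zero Pearson correlation
critical_r <- function(n, alpha = 0.05) {
  df <- n - 2
  t_crit <- qt(1 - alpha / 2, df)
  t_crit / sqrt(t_crit^2 + df)
}

print(round(critical_r(30), 3))   # April, one row per calendar day: 0.361
print(round(critical_r(90), 3))   # hypothetical: about one quarter of daily rows
```

For a specific pair, cor.test(daily_corr$conversion_rate, daily_corr$pct_prog) on the report's daily table runs the same test and returns an exact p-value.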

8. Logistic Regression

Code
# Modelling frame: outcome as a No/Yes factor, Google Ads as the reference channel
model_data <- installs |>
  mutate(converted = factor(converted, levels = c(0, 1), labels = c('No', 'Yes')),
         channel = relevel(factor(channel), ref = 'Google Ads'),
         country_group = factor(country_group),
         device_cat = factor(device_category)) |>   # reference level is alphabetical: desktop
  select(converted, channel, country_group, device_cat) |>
  tidyr::drop_na()
# 70/30 random train/test split
set.seed(42)
train_idx <- sample(seq_len(nrow(model_data)), size = 0.7 * nrow(model_data))
train_data <- model_data[train_idx, ]; test_data <- model_data[-train_idx, ]
log_model <- glm(converted ~ channel + country_group + device_cat,
                 data = train_data, family = binomial(link = 'logit'))
summary(log_model)

Call:
glm(formula = converted ~ channel + country_group + device_cat, 
    family = binomial(link = "logit"), data = train_data)

Coefficients:
                                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)                         0.083420   1.367657   0.061  0.95136    
channelMeta Ads                     0.000909   1.224875   0.001  0.99941    
channelProgrammatic                -0.538412   0.024734 -21.768  < 2e-16 ***
channelOther Paid                   0.166891   0.062157   2.685  0.00725 ** 
country_groupNG                     0.555147   0.606297   0.916  0.35986    
country_groupNL                     0.726689   0.855871   0.849  0.39585    
country_groupOther                  0.394840   0.642765   0.614  0.53903    
country_groupUK                     2.046413   1.211781   1.689  0.09126 .  
country_groupUS                     0.985706   0.700277   1.408  0.15925    
device_catmobile_phone              0.053671   1.225782   0.044  0.96508    
device_catother                     9.927460 119.474330   0.083  0.93378    
device_catset_top_box             -11.204595  84.485555  -0.133  0.89449    
device_cattablet                    0.458248   1.236851   0.370  0.71101    
device_catunknown_device_category   0.307876   1.231290   0.250  0.80255    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 39354  on 29412  degrees of freedom
Residual deviance: 38802  on 29399  degrees of freedom
AIC: 38830

Number of Fisher Scoring iterations: 9
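The device coefficients above have standard errors around 1.2 because the reference level, desktop, contains only four installs, and the other and set_top_box estimates (standard errors of 119 and 84) come from categories with one and two installs. A common remedy is to pool rare levels and use the most frequent category as the reference. The base R sketch below does that; collapse_sparse() and its 100-install threshold are hypothetical, illustrative choices, and the device vector is rebuilt from the counts in the section 6 contingency table.

```r
# Hypothetical helper: pool levels with fewer than min_n observations and
# make the most frequent remaining level the reference category
collapse_sparse <- function(x, min_n = 100, pooled = "other") {
  x <- as.character(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_n]
  x[x %in% rare] <- pooled
  f <- factor(x)
  relevel(f, ref = names(which.max(table(f))))
}

# Device categories rebuilt from the section 6 contingency table counts
device <- rep(c("desktop", "mobile_phone", "other", "set_top_box",
                "tablet", "unknown_device_category"),
              times = c(4, 41310, 1, 2, 250, 453))
device_f <- collapse_sparse(device)
print(table(device_f))
print(levels(device_f)[1])   # reference level
```

Applied to the report's data, this would mean something like model_data$device_cat <- collapse_sparse(model_data$device_cat) before the train/test split. The sketch does not refit the model, so no revised coefficients are implied. The pooled group of seven installs contains both converters and non-converters, which removes the all-or-nothing categories behind the extreme estimates.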
Code
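# Odds ratios with 95% Wald intervals (confint.default() uses estimate +/- about 1.96 * SE)
or_table <- data.frame(Predictor = names(coef(log_model)),
                       Odds_Ratio = round(exp(coef(log_model)), 3),
                       CI_Lower = round(exp(confint.default(log_model)[, 1]), 3),
                       CI_Upper = round(exp(confint.default(log_model)[, 2]), 3))
# row.names = FALSE stops the coefficient names printing twice
knitr::kable(or_table, row.names = FALSE,
             col.names = c('Predictor', 'Odds Ratio', '95% CI Lower', '95% CI Upper'),
             caption = 'Table 4: Logistic Regression Odds Ratios and Confidence Intervals')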
Table 4: Logistic Regression Odds Ratios and Confidence Intervals
Predictor                            Odds Ratio   95% CI Lower   95% CI Upper
(Intercept)                               1.087          0.074         15.863
channelMeta Ads                           1.001          0.091         11.041
channelProgrammatic                       0.584          0.556          0.613
channelOther Paid                         1.182          1.046          1.335
country_groupNG                           1.742          0.531          5.717
country_groupNL                           2.068          0.386         11.069
country_groupOther                        1.484          0.421          5.231
country_groupUK                           7.740          0.720         83.218
country_groupUS                           2.680          0.679         10.572
device_catmobile_phone                    1.055          0.095         11.660
device_catother                       20485.250          0.000     1.019e+106
device_catset_top_box                     0.000          0.000      1.117e+67
device_cattablet                          1.581          0.140         17.858
device_catunknown_device_category         1.361          0.122         15.198
Code
test_probs <- predict(log_model,newdata=test_data,type='response')
test_preds <- factor(if_else(test_probs>=0.5,'Yes','No'),levels=c('No','Yes'))
conf_mat <- table(Predicted=test_preds,Actual=test_data$converted)
cat('Confusion Matrix:\n'); print(conf_mat)
Confusion Matrix:
         Actual
Predicted   No  Yes
      No     3    1
      Yes 4953 7650
Code
cat('\nModel Accuracy:',round(sum(diag(conf_mat))/sum(conf_mat)*100,2),'%\n')

Model Accuracy: 60.7 %
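Accuracy is only informative next to the no-information rate, which is the accuracy of always predicting the majority class. This base R sketch uses the confusion matrix counts printed above to compute both, plus the share of actual non-converters that the model flags as "No".

```r
# Confusion matrix counts printed above (rows = predicted, columns = actual)
conf_mat <- matrix(c(   3,    1,
                     4953, 7650),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Predicted = c("No", "Yes"), Actual = c("No", "Yes")))
n <- sum(conf_mat)
model_acc    <- sum(diag(conf_mat)) / n
# No-information rate: always predict the majority class ("Yes")
baseline_acc <- max(colSums(conf_mat)) / n
# Recall for non-converters: actual "No" cases the model also predicts as "No"
recall_no    <- conf_mat["No", "No"] / sum(conf_mat[, "No"])

print(round(100 * c(model_accuracy = model_acc, baseline_accuracy = baseline_acc,
                    non_converters_identified = recall_no), 2))
```

At a 0.5 threshold, accuracy cannot separate a useful model from one that labels everyone a converter. A threshold-free measure such as the area under the ROC curve would better summarise how well predicted probabilities rank converters above non-converters.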

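Regression Interpretation: The logistic regression models the probability of conversion from acquisition channel, country group, and device category. Google Ads is the reference channel, so each channel odds ratio compares that channel with a Google Ads install in the same country group and device category: above 1 means higher odds of converting, below 1 lower odds. Two channel effects are precisely estimated. Programmatic installs have an odds ratio of 0.584 (95% CI 0.556 to 0.613), roughly 42% lower odds of converting than Google Ads. Other Paid has an odds ratio of 1.182 (1.046 to 1.335), about 18% higher odds. The Meta Ads estimate (1.001, CI 0.091 to 11.041) rests on three installs and carries no information.

None of the country or device terms is significant at the 5% level. The device terms have inflated standard errors because the reference level, desktop, has only four installs, and the odds ratios for other (20,485) and set_top_box (0.000) are artifacts of categories with one and two installs rather than real effects. Finally, the model is not a useful individual-level classifier: it predicts "Yes" for 12,603 of 12,607 test installs, and its 60.7% accuracy equals the 60.7% share of test installs that actually converted. The odds ratios are therefore best read as a ranking of channel conversion propensity, not as a predictive tool. Combined with spend data, they feed the cost-per-engaged-user comparison behind the Q3 budget decisions.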
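Each odds ratio and interval in Table 4 follows from the coefficient and standard error in the model summary: OR = exp(β), with a 95% Wald interval of exp(β ± 1.96 × SE), which is what confint.default() produces. A minimal base R sketch; wald_or() is a hypothetical helper, applied here to the Programmatic and Other Paid rows of the summary.

```r
# Hypothetical helper: odds ratio and Wald confidence interval from a logit coefficient
wald_or <- function(beta, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)   # 1.959964 for a 95% interval
  c(or = exp(beta), lower = exp(beta - z * se), upper = exp(beta + z * se))
}

prog  <- wald_or(-0.538412, 0.024734)   # channelProgrammatic
other <- wald_or( 0.166891, 0.062157)   # channelOther Paid

print(round(rbind(Programmatic = prog, `Other Paid` = other), 3))
# Percentage change in odds relative to Google Ads: 100 * (OR - 1)
print(round(100 * (c(prog[["or"]], other[["or"]]) - 1), 1))
```

Odds ratios are not ratios of conversion rates. With conversion rates around 60%, the two diverge noticeably, so "42% lower odds" should not be read as "42% fewer conversions".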

9. Integrated Findings

The five techniques applied here converge on one finding: install volume and install quality are not the same metric, and they do not reliably come from the same channel.

EDA established the baseline distribution of installs and conversion rates. Visualisation made the quality gap visible across channels, geographies, and device types. Hypothesis testing confirmed that conversion rates differ significantly across channels (χ² = 803.22, df = 3, p < 2.2e-16), although the association is modest (Cramér's V = 0.14), and found no practically meaningful device effect. Correlation analysis examined whether the daily channel mix and day of week track the daily conversion rate. Logistic regression quantified the channel-level effect while controlling for country and device type: Programmatic installs have about 42% lower odds of converting than Google Ads installs.

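Recommendation: Credit Direct should adopt cost-per-engaged-user as the primary channel performance KPI for Q3 2026, replacing cost-per-install as the lead metric. Budget should shift toward channels whose odds ratios are both high and precisely estimated. On April data, that means Google Ads and Other Paid ahead of Programmatic; Meta Ads cannot be assessed on three installs. Because conversion odds alone ignore cost, each shift should be checked against channel spend, with a 30-day observation window after reallocation to confirm that the relationship holds at scale.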
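Cost-per-engaged-user divides channel spend by converted users rather than by installs, which is the same as cost-per-install divided by the conversion rate. The spend figures in this sketch are hypothetical placeholders (equal spend for both channels, no currency implied), because spend is not part of the data analysed here. Only the install and conversion counts come from Table 1, and channel_kpis() is a hypothetical helper.

```r
# Hypothetical helper: cost-per-install and cost-per-engaged-user side by side
channel_kpis <- function(channel, spend, installs, converted) {
  data.frame(channel   = channel,
             cpi       = spend / installs,     # cost per install
             cpeu      = spend / converted,    # cost per engaged (converted) user
             conv_rate = converted / installs)
}

kpi <- channel_kpis(channel   = c("Google Ads", "Programmatic"),
                    spend     = c(10000, 10000),   # PLACEHOLDER spend, not real figures
                    installs  = c(20362, 19746),   # Table 1
                    converted = c(13624, 10611))   # Table 1
print(kpi)
```

With equal spend, Programmatic's cost per install is only about 3% higher than Google Ads', but its cost per engaged user is about 28% higher. That is the gap a cost-per-install KPI hides.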

10. Limitations & Further Work

The converted variable is binary and coarse: a user who opened the app once is treated identically to one who completed a loan application. A graduated engagement score would improve model precision. The analysis covers a single month, so seasonal effects cannot be assessed. It also uses no spend data, so cost-per-engaged-user, the recommended KPI, is not calculated here; channel costs must be joined before budgets move. The logistic regression excludes campaign-level variables because of their high cardinality; a regularised model could allow them to be included. AppsFlyer defaults to last-touch attribution, which may overstate bottom-of-funnel channel contributions.

Further work should extend the dataset to Q1 and Q2 2026 to test whether channel quality patterns hold over time, build a multi-touch attribution model to understand assisted conversion contributions, and implement a live dashboard tracking cost-per-engaged-user by channel on a rolling 7-day basis.

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making — from data fundamentals to machine learning in Python and R. Lagos Business School / markanalytics.online. https://markanalytics.online

Allaire, J. J., Teague, C., Scheidegger, C., Xie, Y., & Dervieux, C. (2022). Quarto (Version 1.x) [Computer software]. https://doi.org/10.5281/zenodo.5960048

Arinze, V. (2026). Credit Direct Mobile App — AppsFlyer non-organic install and event data, April 2026 [Dataset]. Collected from Credit Direct Finance Company Limited, Lagos, Nigeria. Data available on request from the author.

R Core Team. (2024). R: A language and environment for statistical computing (Version 4.6.0). R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Appendix: AI Usage Statement

Claude (Anthropic) was used to assist with debugging Quarto rendering errors and structuring R code for data loading, variable construction, and output formatting. The analytical decisions — choice of techniques, business framing, interpretation of outputs, and the final recommendation — were made independently based on direct professional familiarity with the Credit Direct acquisition data and the requirements of this assessment. All text interpretation and business conclusions are my own.