Enockevans Kirui
Course: Statistical Modeling and Regression


1.0 Introduction

Throughout the semester, under the guidance of Prof. Bradford Dykes in the STA 631 Modeling and Regression course, my understanding of statistical concepts and methods grew considerably. I began the course with limited proficiency in these areas and have since built a solid foundation in statistical methods, which gives me the confidence to interpret regression models and draw meaningful conclusions from data. As part of the coursework, I applied these skills to a project analyzing customer data from an e-commerce company. The project explores the relationship between customer engagement metrics and sales performance using the techniques taught in STA 631.

2.0 Project Objective

The primary aim of this project is to examine the relationship between customer engagement metrics and sales performance in e-commerce. By analyzing average session length, time spent on the app, time spent on the website, and length of membership, I seek to identify the main drivers of sales. Using multiple regression analysis, the project builds a predictive model that forecasts sales from customer engagement data. Ultimately, the project aims to provide actionable recommendations for improving customer engagement strategies and increasing sales revenue.


3.0 Methodology

This section outlines the approach used to analyze customer engagement metrics and their relationship with sales performance.

3.1 Data Collection and Preprocessing

Data collection began with importing the dataset from the company’s e-commerce platform. The dataset was loaded into R using the read_csv function. Upon examination, it was observed to contain 500 entries with 8 columns. To streamline the analysis, certain columns such as Email, Address, and Avatar were excluded as they were deemed irrelevant to the analysis. Additionally, column names were standardized for clarity and consistency.
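The preprocessing steps described above can be sketched in base R as follows. This is a minimal illustration only: the report loaded the real file with read_csv, whereas the toy data frame, its values, and the original column spellings here are assumptions made so that the example runs on its own.

```r
# Toy stand-in for the raw data; the real dataset has 500 rows and 8 columns.
# All values and the exact original column spellings are assumptions.
raw <- data.frame(
  "Email"                = c("a@example.com", "b@example.com"),
  "Address"              = c("1 Main St", "2 Oak Ave"),
  "Avatar"               = c("Blue", "Green"),
  "Avg. Session Length"  = c(34.5, 31.9),
  "Time on App"          = c(12.7, 11.1),
  "Time on Website"      = c(39.6, 37.3),
  "Length of Membership" = c(4.1, 2.7),
  "Yearly Amount Spent"  = c(587.95, 392.20),
  check.names = FALSE
)

# Drop the identifier columns that are not used in the analysis.
customers <- raw[, !(names(raw) %in% c("Email", "Address", "Avatar"))]

# Standardize names: replace runs of non-alphanumeric characters with "_".
names(customers) <- gsub("[^A-Za-z0-9]+", "_", names(customers))

print(names(customers))
```

The regular expression is one possible naming rule; it happens to produce names in the same style as those in the coefficient table (for example Avg_Session_Length).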

3.2 Exploratory Data Analysis

Exploratory Data Analysis (EDA) was conducted to gain insights into the distribution of the variables and the relationships among them, using both summary statistics and visualization. The ggpairs function was used to create a pair plot showing the relationships between all numerical features. The pair plot revealed that length of membership has the strongest correlation with yearly amount spent.
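A numerical companion to the pair plot is the correlation of each predictor with the response. The sketch below uses base R on simulated data, not the project data: the predictor means, intercept, and noise level are arbitrary assumptions, and the response is generated from the coefficients reported later in this report, so the ranking it prints only illustrates the computation.

```r
set.seed(631)
n <- 500

# Simulated predictors (means and standard deviations are assumptions).
sim <- data.frame(
  Avg_Session_Length   = rnorm(n, mean = 33,  sd = 1),
  Time_on_App          = rnorm(n, mean = 12,  sd = 1),
  Time_on_Website      = rnorm(n, mean = 37,  sd = 1),
  Length_of_Membership = rnorm(n, mean = 3.5, sd = 1)
)

# Simulated response built from the reported coefficients;
# the intercept (-1050) and noise sd (10) are arbitrary assumptions.
sim$Yearly_Amount_Spent <- -1050 +
  25.7342711 * sim$Avg_Session_Length +
  38.7091538 * sim$Time_on_App +
  0.4367388  * sim$Time_on_Website +
  61.5773238 * sim$Length_of_Membership +
  rnorm(n, mean = 0, sd = 10)

# Correlation of every predictor with the response, strongest first.
cors <- cor(sim)[, "Yearly_Amount_Spent"]
cors <- cors[names(cors) != "Yearly_Amount_Spent"]
print(round(sort(cors, decreasing = TRUE), 2))
```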


A linear model plot of yearly amount spent vs. length of membership confirmed a positive correlation between these two variables.

In the statistical modeling phase, I fit a multiple linear regression model to quantify the relationship between customer engagement metrics and sales performance. The model was trained on the designated training data, and its predictive accuracy was evaluated with Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). The resulting error metrics are shown below.

[1] "MAE: 7.05996921759796"
[1] "MSE: 80.4022713401859"
[1] "RMSE: 8.96673136322183"
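For reference, the three metrics above can be computed as in the following sketch. It is not the project's actual code: the data are simulated, and the 70/30 train/test split, the seed, the intercept, and the noise level are all assumptions. The final lines check only that the reported RMSE is the square root of the reported MSE.

```r
set.seed(631)
n <- 500

# Simulated stand-in data (all distribution parameters are assumptions).
sim <- data.frame(
  Avg_Session_Length   = rnorm(n, 33, 1),
  Time_on_App          = rnorm(n, 12, 1),
  Time_on_Website      = rnorm(n, 37, 1),
  Length_of_Membership = rnorm(n, 3.5, 1)
)
sim$Yearly_Amount_Spent <- -1050 +
  25.7342711 * sim$Avg_Session_Length +
  38.7091538 * sim$Time_on_App +
  0.4367388  * sim$Time_on_Website +
  61.5773238 * sim$Length_of_Membership +
  rnorm(n, 0, 10)

# Hypothetical 70/30 train/test split.
train_idx <- sample(n, size = round(0.7 * n))
train <- sim[train_idx, ]
test  <- sim[-train_idx, ]

# Fit on the training rows, predict on the held-out rows.
fit  <- lm(Yearly_Amount_Spent ~ ., data = train)
pred <- predict(fit, newdata = test)
err  <- test$Yearly_Amount_Spent - pred

mae  <- mean(abs(err))   # Mean Absolute Error
mse  <- mean(err^2)      # Mean Squared Error
rmse <- sqrt(mse)        # Root Mean Squared Error

print(paste("MAE:", mae))
print(paste("MSE:", mse))
print(paste("RMSE:", rmse))

# Consistency check on the values reported above: RMSE = sqrt(MSE).
reported_ok <- isTRUE(all.equal(sqrt(80.4022713401859), 8.96673136322183))
print(reported_ok)
```

Because RMSE is expressed in the same units as the response, it is usually the easiest of the three to interpret alongside the yearly amount spent.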

The scatterplot below compares the actual values in the test dataset with the values predicted by the regression model. Each point plots one actual value against its predicted value, so the closer the points lie to a straight diagonal line, the more closely the predictions match the true values. The plot therefore shows how accurate and reliable the model's predictions are across the range of the data.


The model fit the data well, and I explored the residuals to confirm that there were no obvious problems with the data or the fit.
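One simple way to inspect residuals in base R is sketched below. The data are simulated (all parameters are assumptions), so this shows the kind of check involved rather than the project's actual residual analysis.

```r
set.seed(631)
n <- 500

# Simulated stand-in data (all distribution parameters are assumptions).
sim <- data.frame(
  Avg_Session_Length   = rnorm(n, 33, 1),
  Time_on_App          = rnorm(n, 12, 1),
  Time_on_Website      = rnorm(n, 37, 1),
  Length_of_Membership = rnorm(n, 3.5, 1)
)
sim$Yearly_Amount_Spent <- -1050 +
  25.7342711 * sim$Avg_Session_Length +
  38.7091538 * sim$Time_on_App +
  0.4367388  * sim$Time_on_Website +
  61.5773238 * sim$Length_of_Membership +
  rnorm(n, 0, 10)

fit <- lm(Yearly_Amount_Spent ~ ., data = sim)
res <- residuals(fit)

# Five-number summary plus mean: roughly symmetric around zero is expected.
print(summary(res))

# For least squares with an intercept, the training residuals average to
# zero and are uncorrelated with the fitted values (up to rounding error).
print(mean(res))
print(cor(res, fitted(fit)))
```

In practice these numbers are usually paired with a histogram of the residuals and a residuals-versus-fitted plot, where a visible curve or funnel shape would suggest a problem with the linear model.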

To decide whether to prioritize mobile app or website enhancements, or whether membership duration matters most, I examined the model coefficients. They estimate how session length, app usage, and website usage are associated with sales. The coefficient for membership duration is the largest, which suggests that it plays a critical role in sales generation.

                     Coefficient
Avg_Session_Length    25.7342711
Time_on_App           38.7091538
Time_on_Website        0.4367388
Length_of_Membership  61.5773238


Interpreting the coefficients:

Holding all other features fixed, a one-unit increase in Avg. Session Length is associated with an increase of 25.73 dollars in total amount spent.
Holding all other features fixed, a one-unit increase in Time on App is associated with an increase of 38.71 dollars in total amount spent.
Holding all other features fixed, a one-unit increase in Time on Website is associated with an increase of 0.44 dollars in total amount spent.
Holding all other features fixed, a one-unit increase in Length of Membership is associated with an increase of 61.58 dollars in total amount spent.
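The "holding all other features fixed" reading can be checked directly from the coefficient table. In the sketch below, the coefficients are the reported ones; the intercept is not reported, so a placeholder of 0 is used (it cancels in the difference), and the example customer is hypothetical.

```r
# Coefficients as reported in the table above.
coefs <- c(
  Avg_Session_Length   = 25.7342711,
  Time_on_App          = 38.7091538,
  Time_on_Website      = 0.4367388,
  Length_of_Membership = 61.5773238
)

# Linear predictor. The intercept is a placeholder (assumption), because
# the report does not list it; it cancels when two predictions are subtracted.
predict_spend <- function(x, intercept = 0) {
  intercept + sum(coefs * x[names(coefs)])
}

# Hypothetical customer, and the same customer with membership one unit longer.
base <- c(Avg_Session_Length = 33, Time_on_App = 12,
          Time_on_Website = 37, Length_of_Membership = 3)
longer <- base
longer["Length_of_Membership"] <- base["Length_of_Membership"] + 1

# The change in predicted spending equals the membership coefficient.
delta <- predict_spend(longer) - predict_spend(base)
print(delta)
```

The same calculation with Time_on_Website changed by one unit gives a difference of only about 0.44 dollars, which is why the website coefficient carries so little weight in the comparison.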

Conclusion

In conclusion, this project has provided valuable insights into the relationship between customer engagement metrics and sales performance in e-commerce. Through data analysis and statistical modeling, I identified the engagement metrics most strongly associated with sales revenue.

The findings suggest that membership duration is the most important factor: it has the largest coefficient (61.58), followed by time on the app (38.71) and average session length (25.73), while time on the website shows almost no association with spending (0.44). Customers with longer memberships tend to contribute considerably more to sales revenue. Strategies aimed at fostering long-term customer relationships may therefore yield substantial benefits for e-commerce businesses.

Furthermore, the regression model developed in this project forecasts sales from customer engagement data with promising accuracy, with an RMSE of about 8.97. Using this model, businesses can make informed decisions and implement targeted marketing strategies to increase sales revenue.

Overall, this project underscores the importance of data-driven insights in guiding strategic decisions in the competitive e-commerce landscape. By applying statistical techniques such as multiple regression, businesses can improve customer engagement strategies and drive sales growth.