https://rpubs.com/kirui/1178406

Enockevans Kirui
Course: Statistical Modeling and Regression

1.0 Introduction

Under the mentorship of Prof. Bradford Dykes in the STA 631 Modeling and Regression course, my comprehension of statistical principles has undergone significant development. This journey has empowered me to construct a robust foundation in statistical methods, enabling the interpretation of complex models and the extraction of meaningful insights from data. This report encapsulates the application of these skills to a practical project analyzing customer data from an e-commerce company. By leveraging techniques taught in the STA 631 course, this project aims to explore the relationship between customer engagement metrics and sales performance.

2.0 Project Objective

The primary aim of this project is to examine the relationship between various customer engagement metrics and sales performance within the e-commerce domain. Through analysis of factors such as average session length, time spent on the app, time spent on the website, and length of membership, I seek to identify the underlying drivers of sales. Using multiple regression analysis, the project aims to construct predictive models capable of forecasting sales from customer engagement data. Ultimately, the project aims to provide actionable recommendations to optimize customer engagement strategies and enhance sales revenue in the e-commerce sector.

3.0 Methodology

The methodology section outlines the approach taken to conduct the analysis on customer engagement metrics and their correlation with sales performance.

3.1 Data Collection and Preprocessing

Data collection began with importing the dataset from the company’s e-commerce platform. The dataset was loaded into R using the read_csv function. Upon examination, it was observed to contain 500 entries with 8 columns. To streamline the analysis, certain columns such as Email, Address, and Avatar were excluded as they were deemed irrelevant to the analysis. Additionally, column names were standardized for clarity and consistency.
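The preprocessing steps described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the R code used in the report. The two data rows are invented, and the raw column names are assumptions chosen so that the standardized names match those in the coefficient table later in the report.

```python
import csv
import io

# Invented two-row sample; the raw column names are assumptions, not the real file.
RAW_CSV = """Email,Address,Avatar,Avg. Session Length,Time on App,Time on Website,Length of Membership,Yearly Amount Spent
a@example.com,1 Main St,Violet,34.5,12.6,39.5,4.1,587.9
b@example.com,2 Oak Ave,Green,31.9,11.1,37.2,2.6,392.2
"""

# Columns the report excludes as irrelevant to the analysis.
DROP_COLUMNS = {"Email", "Address", "Avatar"}


def standardize(name: str) -> str:
    """Turn a name such as 'Avg. Session Length' into 'Avg_Session_Length'."""
    return name.replace(".", "").strip().replace(" ", "_")


def load_customers(text: str) -> list[dict[str, float]]:
    """Parse CSV text, drop the excluded columns, and convert the rest to floats."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            standardize(key): float(value)
            for key, value in record.items()
            if key not in DROP_COLUMNS
        })
    return rows


customers = load_customers(RAW_CSV)
print(sorted(customers[0]))
```

Dropping the three text columns leaves five numeric columns, which is what the later correlation and regression steps operate on.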

3.2 Exploratory Data Analysis

Exploratory Data Analysis (EDA) was conducted to gain insights into the distribution of the variables and the relationships among them, using both summary statistics and visualization. The ggpairs function was used to create a pair plot showing the relationships between all numerical features. The pair plot revealed that, among the engagement metrics, length of membership has the strongest correlation with yearly amount spent.
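The comparison that a pair plot makes visually can also be expressed numerically by ranking features by the absolute value of their Pearson correlation with the response. The sketch below is a Python illustration under stated assumptions: the values are invented toy data, not the report's dataset, and `pearson` is a helper written for this example.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented toy values, not the report's data.
membership = [1.0, 2.0, 3.0, 4.0, 5.0]
website = [37.0, 36.0, 38.5, 36.5, 37.5]
spent = [310.0, 395.0, 450.0, 540.0, 600.0]

features = {"Length_of_Membership": membership, "Time_on_Website": website}

# Rank features from strongest to weakest linear association with spending.
ranked = sorted(
    features,
    key=lambda name: abs(pearson(features[name], spent)),
    reverse=True,
)
print(ranked[0])
```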

A linear model plot of yearly amount spent vs. length of membership confirmed a positive correlation between these two variables.

In the statistical modeling phase, I fit a multiple linear regression model relating the customer engagement metrics to yearly amount spent. The model was trained on the training portion of the dataset and evaluated on the test data using Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE), which together summarize its predictive accuracy:

[1] "MAE: 7.05996921759796"
[1] "MSE: 80.4022713401859"
[1] "RMSE: 8.96673136322183"
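For reference, the three metrics can be computed as in the sketch below. It is written in Python for illustration and is not the report's R code; the actual and predicted values are invented, and `regression_metrics` is a helper named for this example.

```python
import math


def regression_metrics(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """Return MAE, MSE, and RMSE for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Invented test-set values, not the report's data.
actual = [500.0, 520.0, 480.0, 610.0]
predicted = [495.0, 530.0, 486.0, 607.0]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

RMSE is the square root of MSE, so it is expressed in the same units as the response, and the reported values above are consistent with that relationship.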

The scatterplot below compares the actual values in the test dataset with the values predicted by the regression model. Each point plots one actual value against its prediction, so points lying close to the diagonal indicate accurate predictions, and the spread around the diagonal shows how well the model performs across the range of the data.

The model fit the data well, and I examined the residuals to check that nothing was amiss with the data or the model.
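A basic residual check can be sketched as follows. This Python example is illustrative only: the values are invented, and the summary chosen here (mean, spread, and largest absolute residual) is an assumption about what such a check might include, not a record of the checks performed in the report.

```python
from statistics import mean, pstdev


def residual_summary(actual: list[float], predicted: list[float]) -> dict[str, float]:
    """Summarize residuals (actual minus predicted) for a quick sanity check."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return {
        "mean": mean(residuals),            # should be close to zero
        "sd": pstdev(residuals),            # typical size of an error
        "largest_abs": max(abs(r) for r in residuals),  # flags outliers
    }


# Invented values, not the report's data.
actual = [400.0, 450.0, 500.0, 550.0, 600.0]
predicted = [404.0, 445.0, 503.0, 548.0, 600.0]

summary = residual_summary(actual, predicted)
print(summary)
```

A mean far from zero, or a single residual much larger than the rest, would be a reason to look more closely at the data or the model.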

To decide whether to prioritize mobile app enhancements, website enhancements, or membership duration, I examined the model coefficients. They estimate how session length, app usage, website usage, and membership duration each relate to yearly amount spent. The large coefficient for membership duration suggests that it plays a critical role in sales generation.

                     Coefficient
Avg_Session_Length    25.7342711
Time_on_App           38.7091538
Time_on_Website        0.4367388
Length_of_Membership  61.5773238

Interpreting the coefficients:

Holding all other features fixed, a 1 unit increase in Avg. Session Length is associated with an increase of 25.73 total dollars spent.
Holding all other features fixed, a 1 unit increase in Time on App is associated with an increase of 38.71 total dollars spent.
Holding all other features fixed, a 1 unit increase in Time on Website is associated with an increase of 0.44 total dollars spent.
Holding all other features fixed, a 1 unit increase in Length of Membership is associated with an increase of 61.58 total dollars spent.
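The interpretations above can be turned into a small calculation of how the predicted spend changes when one or more features change. The sketch below is in Python for illustration; the coefficients are copied from the table in this report, while `predicted_change` and the example scenarios are hypothetical. The intercept is not reported, so the example computes only differences in prediction, where the intercept cancels out.

```python
# Coefficients copied from the table in this report.
COEFFICIENTS = {
    "Avg_Session_Length": 25.7342711,
    "Time_on_App": 38.7091538,
    "Time_on_Website": 0.4367388,
    "Length_of_Membership": 61.5773238,
}


def predicted_change(deltas: dict[str, float]) -> float:
    """Change in predicted yearly spend for the given feature changes.

    Features not listed in `deltas` are held fixed, so they contribute nothing.
    """
    return sum(COEFFICIENTS[name] * delta for name, delta in deltas.items())


# Hypothetical scenarios for illustration.
app_gain = predicted_change({"Time_on_App": 1.0})
web_gain = predicted_change({"Time_on_Website": 1.0})
combined = predicted_change({"Time_on_App": 0.5, "Length_of_Membership": 1.0})

print(round(app_gain, 2), round(web_gain, 2), round(combined, 2))
```

Because the model is linear, the effects of simultaneous changes simply add up, which is what makes the coefficients directly comparable when the features are measured in the units used in the dataset.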

Conclusion

In conclusion, this project has provided valuable insights into the dynamics between customer engagement metrics and sales performance within the e-commerce domain. Through meticulous data analysis and statistical modeling, we’ve uncovered significant relationships and identified key drivers of sales revenue.

The coefficients suggest that time on the app is associated with a far larger increase in spending than time on the website (38.71 versus 0.44 dollars per unit), and that membership duration has the largest coefficient of all (61.58). Customers with longer membership durations tend to contribute significantly more to sales revenue. Therefore, strategies aimed at fostering long-term customer relationships may yield substantial benefits for e-commerce businesses.

Furthermore, the predictive model developed in this project forecasts yearly spending from customer engagement data with reasonable accuracy, with a test RMSE of about 8.97 dollars. Using this model, businesses can make informed decisions and implement targeted marketing strategies to optimize sales revenue.

Overall, this project underscores the importance of data-driven insights in guiding strategic decision-making in the competitive e-commerce landscape. By leveraging statistical techniques and advanced analytics, businesses can enhance customer engagement strategies, drive sales growth, and stay ahead in today’s dynamic market environment.