In this project, I analyzed customer data from an e-commerce company. The dataset contains customer information such as email and address, along with numerical columns: average session length, time spent on the app and on the website, length of membership, and yearly amount spent by each customer.

Exploratory Data Analysis

Upon loading and examining the dataset, I found that it contains 500 entries and 8 columns. I focused primarily on the numerical columns and used summary statistics and visualizations to explore the relationships between the variables.

Email        Address                                         Avatar     Avg. Session Length  Time on App  Time on Website  Length of Membership  Yearly Amount Spent
(not shown)  835 Frank Tunnel, Wrightmouth, MI 82180-9605    Violet     34.49727             12.65565     39.57767         4.082621              587.9511
(not shown)  4547 Archer Common, Diazchester, CA 06566-8576  DarkGreen  31.92627             11.10946     37.26896         2.664034              392.2049

A pairplot of the numerical features let me visualize the relationships between them. It indicated that Length of Membership has the strongest correlation with Yearly Amount Spent.

A linear model plot of yearly amount spent vs. length of membership confirmed a positive correlation between these two variables.

I split the data into training and testing sets and trained a linear regression model on the training data. I evaluated the model with Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). The model performed well: the errors are small relative to yearly spend values such as the 392.20 and 587.95 in the sample rows.
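The split-and-fit step can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the analysis (the printed output suggests the original was run in R). The data is synthetic and generated inside the snippet: the intercept of 300, slope of 60, and noise level are hypothetical values chosen only so the example runs on its own, and the helper names `train_test_split` and `fit_line` are my own. It uses a single feature for brevity, whereas the model in this write-up uses several.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic stand-in data (NOT the customer dataset): hypothetical
# membership length in years and yearly spend in dollars.
rng = random.Random(42)
data = []
for _ in range(500):
    membership = rng.uniform(1.0, 6.0)
    spent = 300.0 + 60.0 * membership + rng.gauss(0.0, 10.0)
    data.append((membership, spent))

# Fit on the training rows only, then predict the held-out test rows.
train, test = train_test_split(data)
intercept, slope = fit_line(train)
predictions = [intercept + slope * x for x, _ in test]

print(f"train={len(train)} test={len(test)}")
print(f"intercept={intercept:.2f} slope={slope:.2f}")
```

Fitting only on the training rows and predicting the held-out rows is what makes the error metrics an estimate of performance on unseen customers.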

[1] "MAE: 7.05996921759796"
[1] "MSE: 80.4022713401859"
[1] "RMSE: 8.96673136322183"
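For reference, the three metrics can be written out as a small plain-Python sketch. The `actual` and `predicted` lists are hypothetical values for illustration only, not rows from the dataset. The last lines check that the reported RMSE is the square root of the reported MSE, as it should be by definition.

```python
import math


def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean Squared Error: average squared error, penalizes large misses."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root Mean Squared Error: square root of MSE, in the units of the target."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical values, for illustration only.
actual = [500.0, 450.0, 610.0, 390.0]
predicted = [492.0, 461.0, 605.0, 400.0]

print("MAE:", mae(actual, predicted))
print("MSE:", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))

# Consistency check on the numbers reported above.
reported_mse = 80.4022713401859
reported_rmse = 8.96673136322183
consistent = math.isclose(math.sqrt(reported_mse), reported_rmse, rel_tol=1e-9)
print("Reported RMSE matches sqrt(MSE):", consistent)
```

Because RMSE is in the same units as the target, it is the easiest of the three to compare directly against yearly spend in dollars.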

Below is a scatterplot of the actual test values versus the predicted values.

[Figure: actual vs. predicted test values]

Residuals:

The model fits the data well. Let's explore the residuals to make sure everything is okay with the data.

I still want to answer the original question: should we focus our efforts on mobile app or website development? Or maybe that doesn’t really matter, and membership time is what is really important. Let’s see if we can interpret the coefficients to get an idea.

                     Coefficient
Avg_Session_Length    25.7342711
Time_on_App           38.7091538
Time_on_Website        0.4367388
Length_of_Membership  61.5773238

Interpreting the coefficients:

Holding all other features fixed, a 1-unit increase in Avg. Session Length is associated with an increase of 25.73 total dollars spent.

Holding all other features fixed, a 1-unit increase in Time on App is associated with an increase of 38.71 total dollars spent.

Holding all other features fixed, a 1-unit increase in Time on Website is associated with an increase of 0.44 total dollars spent.

Holding all other features fixed, a 1-unit increase in Length of Membership is associated with an increase of 61.58 total dollars spent.
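As a worked example of this reading, the coefficients can be applied to the difference between the two sample customers shown earlier. Because the intercept is not listed, the sketch below compares two customers rather than predicting a single one: the intercept cancels in the difference. It is plain Python for illustration (the variable names are my own), not the code used for the analysis.

```python
# Coefficients from the table above.
coefficients = {
    "Avg_Session_Length": 25.7342711,
    "Time_on_App": 38.7091538,
    "Time_on_Website": 0.4367388,
    "Length_of_Membership": 61.5773238,
}

# The two sample rows from the dataset preview.
customer_a = {
    "Avg_Session_Length": 34.49727,
    "Time_on_App": 12.65565,
    "Time_on_Website": 39.57767,
    "Length_of_Membership": 4.082621,
}
customer_b = {
    "Avg_Session_Length": 31.92627,
    "Time_on_App": 11.10946,
    "Time_on_Website": 37.26896,
    "Length_of_Membership": 2.664034,
}

# Each feature's share of the predicted gap: coefficient * feature difference.
contributions = {
    name: coef * (customer_a[name] - customer_b[name])
    for name, coef in coefficients.items()
}
predicted_gap = sum(contributions.values())
actual_gap = 587.9511 - 392.2049  # Yearly Amount Spent, customer A minus B

for name, value in contributions.items():
    print(f"{name:22s} {value:8.2f}")
print(f"predicted gap: {predicted_gap:.2f}")
print(f"actual gap:    {actual_gap:.2f}")
```

The model predicts a gap of roughly 214 dollars against an actual gap of about 196 dollars, with Length of Membership accounting for the largest share of the predicted difference.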

Do you think the company should focus more on their mobile app or on their website?

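This is tricky. Time on App (38.71) has a far larger association with spending than Time on Website (0.44), while Length of Membership (61.58) is larger than both. There are two ways to think about this: develop the website so that it catches up with the performance of the mobile app, or develop the app further since it is what is working better.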