An e-commerce store can generate thousands of interactions daily. Exploratory Data Analysis (EDA) of e-commerce sales data can surface insights that help us better understand the business and its customers. We performed several types of analysis on e-commerce sales data and found some interesting facts about sales.
With sales data, cluster analysis can be used to define new strategies for pricing, customer segmentation, special discounts and so on. These strategies can lead to better sales offers and higher sales in different states, helping the business grow its customer base and its profit.
Below are the packages used in this project.
These are the datasets used in this project; they were imported using read.csv. An overview of the datasets:
Several data pre-processing steps were performed: removing null values, converting data types as required by each plot, rounding numbers, and in some cases splitting the date column. We also grouped and summarized some columns, reordered and aggregated the data, built a corpus, and joined datasets to prepare the plots for clustering, probability, text analysis and time series. Unstructured text was converted into structured data by removing punctuation, stopwords, numbers, extra whitespace and so on. The code chunk below applies these data cleaning techniques for all the business questions.
The above graph shows that in 2020 the most orders were placed on the 2nd, 3rd and 21st of the month (167, 140 and 138 orders respectively), and the fewest were placed on the 31st (38 orders). This graph provides the data for the binomial distribution below.
Building on the observations from the previous plot, the above plot shows that 95% of the time a day of the month will have between 87 and 127 orders placed. Under the binomial distribution, values above or below this range are unlikely to occur by random chance alone.
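The 95% range described above can be illustrated with a minimal Python sketch (not the project's R code). It computes the central interval of a binomial distribution from its tail quantiles. The values of `n` and `p` below are hypothetical placeholders, not the parameters used in this analysis.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space for stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log(1 - p)
    )
    return math.exp(log_pmf)

def central_interval(n, p, coverage=0.95):
    """Return (low, high): the lower and upper tail quantiles of Binomial(n, p)."""
    tail = (1 - coverage) / 2
    cdf = 0.0
    low = high = None
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)
        if low is None and cdf >= tail:
            low = k          # smallest k with P(X <= k) >= 2.5%
        if cdf >= 1 - tail:
            high = k         # smallest k with P(X <= k) >= 97.5%
            break
    return low, high

# Hypothetical inputs: 3000 orders in total, each equally likely
# to fall on any of 30 days. These are NOT the report's figures.
n_orders, p_day = 3000, 1 / 30
low, high = central_interval(n_orders, p_day)
print(f"About 95% of days are expected to have between {low} and {high} orders")
```

Counts outside the printed range are still possible under this model; they are just expected on roughly 5% of days or fewer.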
From the above plot, we can see that the highest discounts were given by the Samsung and Xiaomi mobile brands; Poco, however, appears with only one model, the F1. The Galaxy and Redmi models mostly have the best discounts.
This plot shows the optimal number of clusters. Using the K-means elbow method, we found that 5 is the optimal number: the elbow is the slight bend in the plot, and its position indicates the number of clusters to use.
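The elbow method can be sketched as follows. This is a minimal, standard-library Python illustration, not the project's R code: it runs a plain Lloyd-style K-means for several values of k and reports the within-cluster sum of squares (WCSS) for each. The data points are synthetic and hypothetical, and the helper names (`sq_dist`, `centroid`, `kmeans_wcss`) are introduced here for illustration only.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def kmeans_wcss(points, k, iters=100, seed=0):
    """Run Lloyd's K-means and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise centers from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for pt in points:
            nearest = min(range(k), key=lambda i: sq_dist(pt, centers[i]))
            clusters[nearest].append(pt)
        # Keep the old center if a cluster ends up empty.
        new_centers = [centroid(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(pt, c) for c in centers) for pt in points)

# Synthetic, hypothetical data: three blobs of 30 points each.
data_rng = random.Random(42)
blob_centers = [(0, 0), (10, 0), (5, 8)]
points = [(data_rng.gauss(cx, 1.0), data_rng.gauss(cy, 1.0))
          for cx, cy in blob_centers for _ in range(30)]

wcss_by_k = {k: kmeans_wcss(points, k) for k in range(1, 9)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 1))
```

Plotting WCSS against k and looking for the point where the curve stops dropping sharply gives the elbow. Because K-means depends on its random initialisation, it is common to repeat each k with several seeds and keep the lowest WCSS.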
This plot shows profit and sales by state. California, New York and Washington have the highest profit and sales compared with other states. Texas, Pennsylvania and Illinois are running at a loss with fewer sales, and a few other states (Tennessee, North Carolina and Colorado) are also at a loss. In states such as Illinois and Pennsylvania, sales could be increased by offering discounts so that these states become profitable.
K-medoids clustering with the PAM algorithm produced 3 clusters: a higher level, a medium level and a lower level. The above plot shows that New York City, Seattle, Los Angeles and Philadelphia had the highest number of sales with a profit.
From the time series analysis in the above plot, we observed more sales in November and December, because of the festival season in the USA.
The time series graph by day shows that more sales occur in the middle of the month. A possible explanation is that people spend their salary on expenses, food, groceries and so on at the beginning of the month, and then spend what remains on online shopping mid-month. However, the graph also shows a high number of sales at the beginning of the month, which may be because many people now earn more than they need and so also shop at the start of the month.
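The monthly and day-of-month views described above both come down to grouping sales by a part of the order date. A minimal Python sketch (not the project's R code) is shown below; the `orders` records are made-up illustrative values and `total_by` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

def total_by(orders, key):
    """Sum sales amounts grouped by key(order_date)."""
    totals = defaultdict(float)
    for order_date, amount in orders:
        totals[key(order_date)] += amount
    return dict(totals)

# Hypothetical (order_date, sales_amount) records, for illustration only.
orders = [
    (date(2020, 11, 14), 120.0),
    (date(2020, 11, 15), 80.0),
    (date(2020, 12, 15), 200.0),
    (date(2020, 12, 2), 50.0),
    (date(2020, 3, 15), 30.0),
]

sales_by_month = total_by(orders, lambda d: d.month)  # seasonal view
sales_by_day = total_by(orders, lambda d: d.day)      # within-month view

print(sorted(sales_by_month.items()))
print(sorted(sales_by_day.items()))
```

Grouping by `d.month` gives the seasonal picture, while grouping by `d.day` pools the same day number across all months, which is what the within-month comparison relies on.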
From the text analysis, we found that buy, women, flipkartcom, online and products are the words most frequently used by retailers in product descriptions. The words buy, women and flipkartcom each appear in the descriptions of more than 10000 products.
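The cleaning and counting behind these word frequencies can be sketched in Python (the project itself used R). The stopword list below is a tiny illustrative subset and the product descriptions are invented examples, so both are assumptions. The sketch also shows how stripping punctuation turns "Flipkart.com" into the token flipkartcom.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "and", "for", "in", "of", "at", "to", "on", "with"}

def clean_tokens(text):
    """Lowercase, drop punctuation and digits, split on whitespace, remove stopwords."""
    text = re.sub(r"[^a-z\s]", "", text.lower())
    return [word for word in text.split() if word not in STOPWORDS]

# Hypothetical product descriptions, for illustration only.
descriptions = [
    "Buy 2 Shirts for Women online at Flipkart.com!",
    "Buy Running Shoes online. Free shipping on all products.",
    "Women Cotton Kurta: buy genuine products at Flipkart.com",
]

term_freq = Counter()  # total occurrences of each word
doc_freq = Counter()   # number of descriptions containing each word
for description in descriptions:
    tokens = clean_tokens(description)
    term_freq.update(tokens)
    doc_freq.update(set(tokens))

print(term_freq.most_common(5))
print(doc_freq.most_common(5))
```

`term_freq` counts every occurrence, while `doc_freq` counts each word once per description, which corresponds to statements of the form "used in more than N products".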
In this plot, the word cloud sizes each word according to its frequency. The word cloud can also take a set of colors or a color palette as input to distinguish the more frequent words from the less frequent ones. One of the main objectives of this study is to analyse the difference in keywords between customers who recommend the product and those who don’t.
The above plot shows the sentiment scores. These proved useful for assigning a numeric value to the strength of each sentiment in the text (positivity, negativity, trust, etc.) and made it possible to interpret the overall score of the text.
The comparison cloud gives a clear contrast between the words used by customers who are happy with the product and those who are not. Customers who did not recommend the product used negative words such as problem, dirty, bad and poor, as well as words describing the originality and behavior of the product.
Another interesting aspect is the “Positive” list of words, which includes good, awesome, nice, etc. This could imply that customers who recommend a product are happy with its quality. Overall there are more negative reviews than positive ones, but positive words such as good, nice and awesome have higher counts than the negative words.
To summarize the e-commerce analysis performed using probability, clustering, text mining and time series: the most profitable states are New York, Washington and California. Customers generally prefer standard delivery. 95% of the time, a day of the month will have between 87 and 127 orders placed. Buy, women, online and free are the top words used in product descriptions, and attractive and amazing are positive words that retailers use frequently to grab attention. The highest number of sales occurs in December because of the festival season, and the middle of the month has the most sales. Customers used positive words like good, awesome and nice and negative words like problem, dirty, bad and poor, but the positive words were used by more customers.