```{python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```
Analyzing Store Food Sales Using Python Programming Language
Uncovering Insights from Food Sales
In this analysis, I analyse sales data in different cities and regions in the United States. The objective of this analysis is to illustrate basic data analysis using Python
programming language. Python is the leading competitor for R
, the other leading data science analysis programming language.
1 Background
In this analysis I analyze data from contextures regarding food sales to derive insights that could aid the management in making decisions. The data covers the sales of food products by different sales people in several cities from several regions. The food products are grouped into several categories. I do the analysis using the Pandas library in Python (Bogdanchikov, Zhaparov, and Suliyev 2013).
Please visit my rpubs site to see more data projects. Alternatively, copy and paste the link <www.rpubs.com/Karuitha> into your browser. You can also view my linkedin site for my skills and education.My Tableau public profile contains my data visualizations.
Skills & Technologies Applied: Python, Quarto, Data Science, and Machine Learning.
Python is a widely-used programming language known for its simplicity and readability. Created by Guido van Rossum, Python emphasizes code clarity and comes with an extensive standard library. It supports multiple programming paradigms and has a thriving ecosystem of third-party libraries and frameworks. Python is popular for web development, scientific computing, data analysis, and machine learning. Its clean syntax and active community make it an excellent choice for developers of all levels to build a variety of applications efficiently.
2 Objectives
The objective of this analysis is to illustrate basic data analysis using Python
programming language. Python is the leading competitor for R
, the other leading data science analysis programming language (VanderPlas 2016).
3 Data
The data consists of the following variables.
ID: An identifier for the transaction.
Date: Date of the sales transaction.
Region: Region where the sale was made.
City: City in which the sale was made.
Category: Category of the product, for example, bars, crackers, cookies, etc.
Product: The actual product sold, for example, carrot, arrow root.
Qty: The quantity of the product that was sold for a given transaction.
UnitPrice: The price per unit of the product sold duting the transaction.
TotalPrice: The total price of the product sold in the transaction, which is the product of quantity and unit price.
To analyze this data, I start by loading the required python packages.
```{python}
= pd.read_excel("sampledatafoodsales.xlsx", sheet_name='FoodSales')
my_data ```
I set the ID column as the index.
```{python}
= my_data.set_index("ID")
my_data ```
4 Exploring the Data
I start by displaying the first few rows of the data below.
```{python}
my_data.head()```
Date | Region | City | Category | Product | Qty | UnitPrice | TotalPrice | |
---|---|---|---|---|---|---|---|---|
ID | ||||||||
ID07351 | 2022-01-01 | East | Boston | Bars | Carrot | 33 | 1.77 | 58.41 |
ID07352 | 2022-01-04 | East | Boston | Crackers | Whole Wheat | 87 | 3.49 | 303.63 |
ID07353 | 2022-01-07 | West | Los Angeles | Cookies | Chocolate Chip | 58 | 1.87 | 108.46 |
ID07354 | 2022-01-10 | East | New York | Cookies | Chocolate Chip | 82 | 1.87 | 153.34 |
ID07355 | 2022-01-13 | East | Boston | Cookies | Arrowroot | 38 | 2.18 | 82.84 |
We then create a pairs plot to visualize the entire data.
```{python}
# help(sns.pairplot)
="Region", palette="mako")
sns.pairplot(my_data, hue```
Next, I examine the total sales (Total Revenue) of each product by product category.
```{python}
'Category').mean().sort_values('Qty')
my_data.groupby(```
Qty | UnitPrice | TotalPrice | |
---|---|---|---|
Category | |||
Crackers | 36.807692 | 3.490000 | 128.458846 |
Snacks | 40.689655 | 2.012069 | 77.161034 |
Bars | 62.138298 | 1.814681 | 112.080532 |
Cookies | 78.568421 | 2.287684 | 181.183263 |
Next, I examine the sales by product.
```{python}
'Product').mean().sort_values('Qty')
my_data.groupby(```
Qty | UnitPrice | TotalPrice | |
---|---|---|---|
Product | |||
Banana | 26.333333 | 2.27 | 59.776667 |
Pretzels | 26.571429 | 3.15 | 83.700000 |
Whole Wheat | 36.807692 | 3.49 | 128.458846 |
Potato Chips | 45.181818 | 1.65 | 75.080455 |
Bran | 58.333333 | 1.87 | 109.083333 |
Carrot | 65.421875 | 1.77 | 115.796719 |
Chocolate Chip | 74.090909 | 1.87 | 138.550000 |
Arrowroot | 78.870968 | 2.18 | 171.938710 |
Oatmeal Raisin | 83.032258 | 2.84 | 235.811613 |
Table () shows the sales of the products by region.
```{python}
'Region').mean().sort_values('Qty')
my_data.groupby(```
Qty | UnitPrice | TotalPrice | |
---|---|---|---|
Region | |||
West | 61.553191 | 2.063085 | 125.544894 |
East | 64.373333 | 2.287133 | 143.495733 |
Finally, I summarise the sales by city.
```{python}
'City').mean().sort_values('Qty')
my_data.groupby(```
Qty | UnitPrice | TotalPrice | |
---|---|---|---|
City | |||
San Diego | 51.717949 | 2.099231 | 105.484615 |
Boston | 64.204545 | 2.425909 | 150.744659 |
New York | 64.612903 | 2.090161 | 133.206935 |
Los Angeles | 68.527273 | 2.037455 | 139.769455 |
5 Research Questions
5.1 Question 1: Which Regions do Different Products Categories Sell the Most?
```{python}
= my_data.groupby(['Region', 'Category']).sum()[
region_sales 'TotalPrice'].sort_values
print(region_sales)
```
<bound method Series.sort_values of Region Category
East Bars 6355.20
Cookies 10683.50
Crackers 3025.83
Snacks 1459.83
West Bars 4180.37
Cookies 6528.91
Crackers 314.10
Snacks 777.84
Name: TotalPrice, dtype: float64>
```{python}
=my_data, index="Region", columns='Category',
pd.pivot_table(data='TotalPrice', aggfunc='sum')
values```
Category | Bars | Cookies | Crackers | Snacks |
---|---|---|---|---|
Region | ||||
East | 6355.20 | 10683.50 | 3025.83 | 1459.83 |
West | 4180.37 | 6528.91 | 314.10 | 777.84 |
5.2 Question 2: Which Regions do Different Products Sell the Most?
```{python}
= my_data.groupby('Region').sum()['TotalPrice']
region_sales
= my_data.groupby('Region').sum()['TotalPrice'].plot(
region_sales ="bar", title="Sales by Region", color="steelblue")
kind```
5.3 Question 1: Which Cities do Different Products Categories Sell the Most?
```{python}
my_data.head()'City', 'Category']).sum()
my_data.groupby([```
Qty | UnitPrice | TotalPrice | ||
---|---|---|---|---|
City | Category | |||
Boston | Bars | 1827 | 53.53 | 3319.99 |
Cookies | 2706 | 73.53 | 6481.50 | |
Crackers | 726 | 59.33 | 2533.74 | |
Snacks | 391 | 27.09 | 930.30 | |
Los Angeles | Bars | 1611 | 43.18 | 2886.37 |
Cookies | 1753 | 50.14 | 4044.53 | |
Crackers | 42 | 6.98 | 146.58 | |
Snacks | 363 | 11.76 | 609.84 | |
New York | Bars | 1683 | 46.92 | 3035.21 |
Cookies | 1856 | 50.76 | 4202.00 | |
Crackers | 141 | 17.45 | 492.09 | |
Snacks | 326 | 14.46 | 529.53 | |
San Diego | Bars | 720 | 26.95 | 1294.00 |
Cookies | 1149 | 42.90 | 2484.38 | |
Crackers | 48 | 6.98 | 167.52 | |
Snacks | 100 | 5.04 | 168.00 |
5.4 Question 2: Which City Had the Highest Sales?
Here, we see that Boston had the highest sales.
```{python}
"City").sum()["TotalPrice"]
my_data.groupby(```
City
Boston 13265.53
Los Angeles 7687.32
New York 8258.83
San Diego 4113.90
Name: TotalPrice, dtype: float64
Let us plot this on a graph.
```{python}
"City").sum()["TotalPrice"].plot(
my_data.groupby(="bar", title="Sales by City")
kind```
<AxesSubplot:title={'center':'Sales by City'}, xlabel='City'>
6 Conclusion
In this analysis, I examined the sales of products in regions and cities. Please feel free to comment.