Analyzing Store Food Sales Using Python Programming Language

Uncovering Insights from Food Sales

Author

Affiliations

John Karuitha

Karatina University, School of Business

Graduate School of Business Administration, University of the Witwatersrand, Johannesburg, South Africa

Abstract

In this analysis, I analyse sales data in different cities and regions in the United States. The objective of this analysis is to illustrate basic data analysis using Python programming language. Python is the leading competitor for R, the other leading data science analysis programming language.

1 Background

In this analysis I analyze data from contextures regarding food sales to derive insights that could aid the management in making decisions. The data covers the sales of food products by different sales people in several cities from several regions. The food products are grouped into several categories. I do the analysis using the Pandas library in Python (Bogdanchikov, Zhaparov, and Suliyev 2013).

Tip

Please visit my rpubs site to see more data projects. Alternatively, copy and paste the link <www.rpubs.com/Karuitha> into your browser. You can also view my linkedin site for my skills and education.My Tableau public profile contains my data visualizations.

Important

Skills & Technologies Applied: Python, Quarto, Data Science, and Machine Learning.

Important

Python is a widely-used programming language known for its simplicity and readability. Created by Guido van Rossum, Python emphasizes code clarity and comes with an extensive standard library. It supports multiple programming paradigms and has a thriving ecosystem of third-party libraries and frameworks. Python is popular for web development, scientific computing, data analysis, and machine learning. Its clean syntax and active community make it an excellent choice for developers of all levels to build a variety of applications efficiently.

2 Objectives

The objective of this analysis is to illustrate basic data analysis using Python programming language. Python is the leading competitor for R, the other leading data science analysis programming language (VanderPlas 2016).

3 Data

```{python}
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

The data consists of the following variables.

ID: An identifier for the transaction.
Date: Date of the sales transaction.
Region: Region where the sale was made.
City: City in which the sale was made.
Category: Category of the product, for example, bars, crackers, cookies, etc.
Product: The actual product sold, for example, carrot, arrow root.
Qty: The quantity of the product that was sold for a given transaction.
UnitPrice: The price per unit of the product sold duting the transaction.
TotalPrice: The total price of the product sold in the transaction, which is the product of quantity and unit price.

To analyze this data, I start by loading the required python packages.

```{python}
my_data = pd.read_excel("sampledatafoodsales.xlsx", sheet_name='FoodSales')
```

I set the ID column as the index.

```{python}
my_data = my_data.set_index("ID")
```

4 Exploring the Data

I start by displaying the first few rows of the data below.

```{python}
my_data.head()
```

	Date	Region	City	Category	Product	Qty	UnitPrice	TotalPrice
ID
ID07351	2022-01-01	East	Boston	Bars	Carrot	33	1.77	58.41
ID07352	2022-01-04	East	Boston	Crackers	Whole Wheat	87	3.49	303.63
ID07353	2022-01-07	West	Los Angeles	Cookies	Chocolate Chip	58	1.87	108.46
ID07354	2022-01-10	East	New York	Cookies	Chocolate Chip	82	1.87	153.34
ID07355	2022-01-13	East	Boston	Cookies	Arrowroot	38	2.18	82.84

We then create a pairs plot to visualize the entire data.

```{python}
# help(sns.pairplot)
sns.pairplot(my_data, hue="Region", palette="mako")
```

Next, I examine the total sales (Total Revenue) of each product by product category.

```{python}
my_data.groupby('Category').mean().sort_values('Qty')
```

	Qty	UnitPrice	TotalPrice
Category
Crackers	36.807692	3.490000	128.458846
Snacks	40.689655	2.012069	77.161034
Bars	62.138298	1.814681	112.080532
Cookies	78.568421	2.287684	181.183263

Next, I examine the sales by product.

```{python}
my_data.groupby('Product').mean().sort_values('Qty')
```

	Qty	UnitPrice	TotalPrice
Product
Banana	26.333333	2.27	59.776667
Pretzels	26.571429	3.15	83.700000
Whole Wheat	36.807692	3.49	128.458846
Potato Chips	45.181818	1.65	75.080455
Bran	58.333333	1.87	109.083333
Carrot	65.421875	1.77	115.796719
Chocolate Chip	74.090909	1.87	138.550000
Arrowroot	78.870968	2.18	171.938710
Oatmeal Raisin	83.032258	2.84	235.811613

Table () shows the sales of the products by region.

```{python}
my_data.groupby('Region').mean().sort_values('Qty')
```

	Qty	UnitPrice	TotalPrice
Region
West	61.553191	2.063085	125.544894
East	64.373333	2.287133	143.495733

Finally, I summarise the sales by city.

```{python}
my_data.groupby('City').mean().sort_values('Qty')
```

	Qty	UnitPrice	TotalPrice
City
San Diego	51.717949	2.099231	105.484615
Boston	64.204545	2.425909	150.744659
New York	64.612903	2.090161	133.206935
Los Angeles	68.527273	2.037455	139.769455

5 Research Questions

5.1 Question 1: Which Regions do Different Products Categories Sell the Most?

```{python}
region_sales = my_data.groupby(['Region', 'Category']).sum()[
    'TotalPrice'].sort_values

print(region_sales)
```

<bound method Series.sort_values of Region  Category
East    Bars         6355.20
        Cookies     10683.50
        Crackers     3025.83
        Snacks       1459.83
West    Bars         4180.37
        Cookies      6528.91
        Crackers      314.10
        Snacks        777.84
Name: TotalPrice, dtype: float64>

```{python}
pd.pivot_table(data=my_data, index="Region", columns='Category',
               values='TotalPrice', aggfunc='sum')
```

Category	Bars	Cookies	Crackers	Snacks
Region
East	6355.20	10683.50	3025.83	1459.83
West	4180.37	6528.91	314.10	777.84

5.2 Question 2: Which Regions do Different Products Sell the Most?

```{python}
region_sales = my_data.groupby('Region').sum()['TotalPrice']


region_sales = my_data.groupby('Region').sum()['TotalPrice'].plot(
    kind="bar", title="Sales by Region", color="steelblue")
```

5.3 Question 1: Which Cities do Different Products Categories Sell the Most?

```{python}
my_data.head()
my_data.groupby(['City', 'Category']).sum()
```

		Qty	UnitPrice	TotalPrice
City	Category
Boston	Bars	1827	53.53	3319.99
	Cookies	2706	73.53	6481.50
	Crackers	726	59.33	2533.74
	Snacks	391	27.09	930.30
Los Angeles	Bars	1611	43.18	2886.37
	Cookies	1753	50.14	4044.53
	Crackers	42	6.98	146.58
	Snacks	363	11.76	609.84
New York	Bars	1683	46.92	3035.21
	Cookies	1856	50.76	4202.00
	Crackers	141	17.45	492.09
	Snacks	326	14.46	529.53
San Diego	Bars	720	26.95	1294.00
	Cookies	1149	42.90	2484.38
	Crackers	48	6.98	167.52
	Snacks	100	5.04	168.00

5.4 Question 2: Which City Had the Highest Sales?

Here, we see that Boston had the highest sales.

```{python}
my_data.groupby("City").sum()["TotalPrice"]
```

City
Boston         13265.53
Los Angeles     7687.32
New York        8258.83
San Diego       4113.90
Name: TotalPrice, dtype: float64

Let us plot this on a graph.

```{python}
my_data.groupby("City").sum()["TotalPrice"].plot(
    kind="bar", title="Sales by City")
```

<AxesSubplot:title={'center':'Sales by City'}, xlabel='City'>

6 Conclusion

In this analysis, I examined the sales of products in regions and cities. Please feel free to comment.

References

Bogdanchikov, A, M Zhaparov, and R Suliyev. 2013. “Python to Learn Programming.” Journal of Physics: Conference Series 423 (April): 012027. https://doi.org/10.1088/1742-6596/423/1/012027.

VanderPlas, Jake. 2016. Python Data Science Handbook: Essential Tools for Working with Data. Sebastopol, CA: O’Reilly Media.