Cincy Climate Control Analysis

Introduction

I am a co-owner of a small HVAC business in Cincinnati called Cincy Climate Control, which I never thought I would be doing. My business partner/best friend went to school for it, and we always wanted to have our own business. So slowly but surely, we gathered resources and built something that has been paying the bills. Understanding the industry has been a challenge on its own, but I’ve learned a few key points already.

One is that it is extremely outdated, including basic business functions and operations, all the way to the actual age of the people who work in the industry. I have never been in a supply house and not been the youngest person in the room by at least a decade. There is a lack of technology among businesses and supply houses other than the basics. Many miss opportunities with optimization on all fronts like simplifying different tasks. As well as missing out on many insights that can be generated through data analysis techniques.

Another thing that I’ve noted is that HVAC or trades in general are often overlooked. Even though they are services that every single house or building of any kind utilizes at one point or another. Socially, we have diminished working in the trades, and in return, the trades are dying. This leaves many gaps in the market and a pretty open playing field, other than major players. Yet, their demand isn’t changing, allowing for outrageous pricing from many bigger companies.

Knowing these few aspects of the industry, utilizing data analytics can go a long way if done right. In this project, I plan to look at my company’s current positives and negatives, breaking down different variables with the currently available data. I’m going to gather specific insights to answer questions and narrow down potential growth areas.

Scope of Analysis

This project explores multiple aspects of my HVAC business using real transaction data collected through Square. Rather than narrowing the analysis to a single question, I’ve evaluated several critical business areas through separate yet interconnected lines of inquiry. Each question provides actionable insights into operations, profitability, and growth opportunities. The areas of focus are:

Job Type Profitability

Which types of HVAC jobs generate the highest revenue and profit?
I will identify the most financially valuable services by grouping transactions by job type and comparing average net profit and revenue. Obviously, installs would generate more than repairs, but I want to focus on profit margin instead of the actual dollar amount.

Targeting High-Value Customers

Which regions drive the most profit, and how can that influence marketing strategies?
I want to establish where high-value customers are located using certain variables such as zip code and city. These insights will guide personalized marketing and geographic targeting to keep and expand customer outreach.

Pricing Consistency

Are similar jobs (same type/unit) priced consistently?
By analyzing the range and standard deviation of gross sales and net totals within job-type and unit-type combinations, I aim to expose pricing inconsistencies. Identifying and correcting these gaps will protect margins and ensure reasonable pricing across customer quotes.

Data

The data I will be using to perform this analysis is transaction data for transactions completed through Square. Square is a service that helps businesses perform transactions and lets you fill in various descriptive fields of each job as you send invoices. Each row of this data represents a transaction that my business Cincy Climate Control has performed. All data used is sourced from my Square account and can be loaded here from One Drive: https://myxavier-my.sharepoint.com/:x:/g/personal/scalesl_xavier_edu/EXIB7Bs7TRdAmoeR5vfdfKMByCNM8d3rgvbzC88WAX61gw?download=1

Below is a data dictionary describing each variable in the data. There is also a data cleaning section explaining what was removed and added to the data and why. There is also a summary statistics section on some important variables to help map out the basis of the analysis.

Data Dictionary

Variable.Name Description Data.Type
date Date of the transaction Text
gross_sales Total sales amount before fees and expenses Currency
expense Expense associated with the job (e.g. materials) Currency
fees Fees charged by the payment processor Currency
net_total Profit remaining after expenses and fees Currency
source Where the customer came from (e.g. Thumbtack) Text
card_brand Brand of the payment card used (if applicable) Text
job_type Type of HVAC service performed Text
unit_type Type of HVAC unit serviced or installed Text
unit_brand Brand of the HVAC unit Text
description Job description or extra notes Text
customer_id Customer ID (if available) Text
transaction_status Status of the transaction (e.g. completed, failed) Text
cash_app Amount paid through Cash App (if any) Currency
city City where the service was performed Text
state State where the service was performed Text
zip ZIP code of the service address Numeric

Data Cleaning

In this section I will be cleaning up the data set for usability. This includes removing unnecessary dollar signs. Changing date column to a new column to explicitly show the month in its own column. As well as calculating a profit margin column which, which will be useful for the analysis.

The following code removes dollar signs using str_replace_all and ensures that (gross_sale, expense, fees, net_total, and cash_app) columns are all numeric using as.numeric:

ccc_data <- 
  ccc_data %>%
    mutate(gross_sales = as.numeric(str_replace_all(gross_sales, "[$,()]", "")),
               expense = as.numeric(str_replace_all(expense, "[$,()]", "")),
                  fees = as.numeric(str_replace_all(fees, "[$,()]", "")),
             net_total = as.numeric(str_replace_all(net_total, "[$,()]", "")),
              cash_app = as.numeric(str_replace_all(cash_app, "[$,()]", "")))

This code shows the process of using lubridate functions to covert date column and make a new month column:

ccc_data <- 
  ccc_data %>%
     mutate(date = mdy(date),
           month = month(date, label = TRUE))

This last code chunk creates a new column for profit margins on each transaction via calculation using the gross_sales and net_total columns. This was performed using simple math, but an added if_else function is added to ensure 0s will not mess up the profit_margin column.

ccc_data <- 
  ccc_data %>%
    mutate(profit_margin = ifelse(gross_sales > 0, net_total /              gross_sales, NA)) 

Summary Statistics

This first summary statistic shows the average expenses per job type according to have expense data we have saved in Square:

Average Expenses per Job Type
job_type mean_expense
Add-on 57.120000
Install 1438.779091
Maintenance 7.615455
Repair 53.293125

The following summary statistic shows if there are any repeat customers (or customers that performed multiple transactions) within the data set. Customer_id is used to perform this task so that customer privacy is protected:

Customers with Multiple Jobs
customer_id n
0D7DZJQHJVMECFT03M2YBMC9EG 2
198PXTEFCHQ2W3GKSZEYNRVTRR 2
1XM0SR8B0XQ1E4EW726HP12Z8W 2
4RE2VKF641CR02BME22N0M62RC 2
4SR5R7S87RVN5H28JA41GF64H8 2
6J2R35953GA82FZ9NG8T1KACMG 2
93FY68KJ8VJPCTDK7W7ZCYR0X8 2
A6E70JRRD650QE59KRAZPYWQER 3
BT0R0R601ZT52ATWQ0BDMPP650 2
KT7RWZDXJ13Q71TJAE0SHFD9ZR 2
N031A6C98NA2FKF1XA08Z5SKWG 2
PY29RPZ81CVW6HS1230206Z13R 3
Q6MN2BNK4ZZQKMR62HVRK7CECC 2
R8FKV3SJ4N0MTQ3F9Q9ZZ3D754 2
S0R1VR1TXFSX88KB7P72KA84E8 2
SMQC9FFYPD4C9ANP1S6CW2TCP0 2
TCM28FNMDF1D0S1YRYW89QQYSW 3
TKBZ52XF59DRFWEWXVV56DSPHG 3
V1Z6KPTH6NXVZFPCB7W31ZG5HG 3
VZTTY1VJZNTKJ4D847YSDK2VE4 3

This summary statistic shows the amount of jobs per city. This specific table only includes cities that had at least 2 jobs in them. We can see at a quick glance that majority of jobs are in the city of Cincinnati:

Total Jobs by City
city n
Beltsville 2
Cincinnati 44
Covington 3
Dayton 2
Georgetown 2
Hamilton 2
Loveland 2
Newport 3
Trotwood 2
NA 6

I want to look at the total revenue made by each job type. This specific topic will get analyzed deeper in the analysis I just want to get a baseline understanding for right now:

Total Gross Sales by Job Type
job_type total_gross_sales
Add-on 378.26
Install 76594.89
Maintenance 1156.44
N/A 72.30
Repair 15377.72

I want this summary statistic to express the count between each job type. This will be helpful in our first analysis question:

# A tibble: 5 × 2
  job_type        n
  <chr>       <int>
1 Add-on          1
2 Install        22
3 Maintenance    11
4 N/A             2
5 Repair         48

For the last summary statistic I want to view how much revenue was generated with our top 5 highest paid jobs. Looking at the various aspects of the job and transaction:

Top 5 Highest Revenue Jobs
date job_type unit_type unit_brand gross_sales
2024-08-08 Install AC/Furnace Heil 8253.52
2024-04-29 Install AC/Furnace Heil 7973.39
2024-08-30 Install AC Heil 7518.85
2024-08-21 Install AC Heil 7295.96
2024-12-07 Install Furnace Heil 4534.12

Analysis

How profitable is each job type?

In this first part of the analysis, I want to be able to answer the question of how profitable each job type is, whether that be an install, repair, or maintenance. We find ourselves being super dependent on installs, and it’s actually the only type of job we pay ourselves on. I want to use this question to help us not be so dependetant on installs. Of course installs are going to bring in the largest amount of money but profit margins sometimes suck if our profit margins are better in a different category we can put more energy into another aspect our business. For example if profit margins seem to be better for repairs, 10 repairs could equal an install we can focus on getting more repairs to help profit margins. At the same time I want to factor in the amount of money as well and find an equilibrium to what is the best course of action to maximize profit.

This first visualization examines the 4 different job types, and I want to see the average profit margins per job type. I first removed the NA values from the job type variable and converted the margins to a percentage.

At a quick glance, it is surprising to see that installs have the lowest profit margins overall. This can be several factors like expenses not being properly reported for repairs but one of the reasons is because of how expensive the equipment is for installs. It should be noted that the price of equipment shouldn’t hurt our profit margins this much. Yet it does seem to work out like that. Maintenance has the highest profitability because there is essentially no cost for maintenance, as they just involve cleaning and yearly tune-ups.

Now that I see how the profit margins compare with job types I want to see how much profit each job type is bringing in on average.

This visualization tells the exact opposite of the last one. Installs bring in the most profit, on average, by $1,750. Installs may bring in the most money, but we depend on them. We don’t even pay ourselves unless its from an install. According to this data, on average, 8 repairs = 1 install. This is because the profit margins are so much better for repairs. Installs are anywhere for 1 to 2 full days, sometimes more. If we slightly increased the repair price, we could become less dependent on installs. Repairs only take 30 minutes to 2 hours and 30 minutes, so it depends. With a slight increase in the price of repairs, we can balance out what genuinely fuels the business’s profit.

How Does the Time of Year Impact Demand?

Next, I would like to look at the times of year with the most jobs and revenue. There may be no correlation, but I would love to investigate whether there is or not to see if there is any significance to the time of year. Specifically, I plan to summarize this part of the analysis by month. I assume that the summer months will be more profitable because people can usually find alternative sources to getting warm. Still, if it’s hot, the AC is usually the number one solution to the problem. Analyzing the months we get the most customers will allow us to prepare with marketing and resources for busier seasons.

To begin this part of the analysis, I want to start with a basic line chart that explicitly shows the months throughout the year and the total revenue made each month.

Looking at this first visual that just includes the totals you can see two areas where revenue goes up then you can see two areas where revenue goes down. This is expected we call them “vacation periods” but in the the spring and the fall where its “windows down” period we would go on vacation. That is just a joke to express how slow it is in those seasons. In the Sep-Oct months in the fall and the Apr-Jun months in the spring we would have the time to go on vacation because the it’s so nice people usually don’t run their systems they just instead have their windows open enjoying the nice weather.

Next I would like to see the revenue on average to have a better understanding of what each months are contributing instead of just the sum overall.

This visual expresses the same trends we saw in the previous one. You can see an easy bi-modal presentation of two periods of not getting many jobs and two sections with an increase in jobs compared to the others. This goes right with my assumptions of serious hot and cold weather really influencing revenue.

In this visual I wanted to highlight the number of jobs by each unit type in each month. There are many unit types in the data set. I want to make sure to remove some unit types that aren’t very specific, such as filters, pipes, thermometers, stoves, and the NAs. I would like to see which specific units to be sure to have stocked or even ready to buy by each month according to this data.

These results automatically show the same results we were receiving throughout all three visuals. The pink color represents AC’s. If looking at the months leading up to or in the hotter months, there are some AC calls. Interestingly, in the colder months, we may have had a spike in furnaces, but not as many as AC’s in the summer months. We have Furnace calls consistently throughout the year. We might need to market better during the colder months to spike our sales on furnaces so we can balance out the revenue distribution.

Which regions drive the most profit, and how can that influence marketing strategies?

This section will establish which areas we are generating the most money from. We can look at which zip codes or cities generate the most average profit. If we know where we are getting most of our business from, we can look to other areas to increase our marketing strategies and attract more business. We can also establish higher-value customers by looking at each ticket price. See which areas customers are willing to spend the most money in.

In this first visual, I want to look at the total profit in each zip code. As well as filter to only show the top 10 zip codes as there are a bunch.

These results show us that the zip codes 45214 and 45207 have generated the most profit. These are two prominent Cincinnati zip codes that are close, so this makes sense, as the majority of our money has come from them.

Next I would like to see on average which cities have the most profit. If the two zip codes that had the highest profit in them were around Cincinnati, we should see Cincinnati having a higher average.

These results actually tell a different story. Cincinnati is actually one of the lower cities when talking about profit on average. Maineville seems to have the highest. I would like to investigate this phenomena more by getting a count of how many jobs were in each city.

These results explain a lot of what’s happening. Every city seems to have 1-3 jobs, while Cincinnati has 44 jobs. So, on average, Cincinnati is much lower because, if I had to guess, most service calls come from there, which brings down the profit average. A city like Maineville only has one job, which is an installation, making it appear much more valuable than it might be.

Next, I would like to know the average ticket price (total revenue). This will help us locate cities that may be more willing to spend more money on nicer equipment, generating more profit. I also will be filtering to only show installs so the averages to get changed by repairs.

Overall, the majority of these cities are actually around the same amount. Maineville is still the highest, but only one job’s worth of data. What shocks me is how Cincinnati is not higher. These results may indicate that our pricing seems consistent throughout each city or job, no matter how much the equipment costs. The bigger players of Maineville, Dayton, Loveland, Middletown, and Loveland are spots where we aim to sell the newest and latest efficiency systems according to the data we have.

Are similar jobs (same type/unit) priced consistently?

As we saw in the previous section, there might be some pricing consistency. We pride ourselves on trying to offer the best price for the customer. If we have inconsistencies, that could be an unfair practice, and I would like to make those happen as little as possible. Every job is different; some require more expensive equipment, others require more labor. Either way, we want to stay fair to those getting the same job. Overall, we are trying to stay as consistent as possible.

In this first visual, I want to use a facet-wrapped box plot to show the total price distribution for each job type and the units used on each job. This will allow us to see a range of pricing inconsistencies if they are there.

These results show that AC installs and AC/Furnace installs generate the highest revenue. They also have the broadest range of variability in pricing. This tells us that the quotes for these types of installs are inconsistent. This can be an unfair pricing practice that we need to work on. On the repair side, mini-split repairs tend to be a bit uncommon. So when they occur, they are pretty hard to price as they can be for all different types. The majority of the pricing on the repair side of things seems to be pretty consistent, but the mini-split repairs can be an area to work on. Pricing inconsistencies can create problems by ruining customer trust. This kind of work is fueled by word of mouth and what past customers have to say. If prices stay inconsistent that could be bad for brand image. It could also cut into profit margins without consistent pricing we may also be selling ourselves short on bigger jobs.

I also am going to use another facet-wrap but this time with bar chart to express the standard deviation of gross sales. This will be grouped by unit type and job type like the previous visual. This is another way of expressing pricing inconsistency to see if they match our previous visual.

The look at the standard deviations of pricing confirms what we saw in the previous visual. AC/Furnace installs, with a standard deviation of over $3,000, show a lot of variability in our pricing, which needs to be tightened up. Single AC installs don’t fall far from behind, and furnace variability is not too bad; we should shoot for that much variability. Each job is vastly different, so it will never be perfect, but this definitely tells us work needs to be done in configuring our prices.

Which HVAC equipment types and brands appear most frequently?

In this section, I would love to look at the differences between different brands and unit types to see what has been selling the most. As a smaller HVAC company, we have no contracts where we are tied down to certain supply houses. Each supply house sells specific brands (usually 2-3), and they offer specific incentives, such as discounts and early access to new equipment, if you sign certain contracts with them. These are only offered to high-volume companies that would make it beneficial to the supply house. As a small company, we aren’t tied down to any certain supply house, which means we aren’t tied down to any brands. So we interact with a wide variety of offerings, almost anything the customer wants. I want to see which brands customers are most interested in and the type of units that we are selling the most.

I want to start off by showing the count of each unit type. Just out of cursiosity which units are we selling more of. This can give us insights to which units we are succeding at and which we can aim to increase.

These results confirm our assumptions from earlier in the analysis. People are more likely to fix their AC before their furnace, but at the same time, they are close behind. We can aim to improve our ductless mini-split sales as they do heating and cooling, and are new and popular in the United States. An interesting point in this visual is the number of thermostats. I didn’t think we did that many thermostats but they are one of the weak links as they are made by third-party companies not being extremely compatible with each system.

Next I would like to look at the amount of gross sales that were generated by each unit brand. I want to narrow this down to the top 10. This includes installs and repairs yet still show the brands that each customer owns and is interested in.

These results show us their is an overall demand for the brand HEIL which is the prominent brand for the supply house we use most the time. So if customers werent brand sensitive and it was left up to us it was usually a HEIL system. They are not to expensive but also not to cheap. They operate well as a middle of the road system. A lot of customers tend to be realtors or property owners that are looking for those types of systems.

Secondary Data: Web Scraped

I will scrape data from eBay on the web to complete the next part of the analysis. eBay sells many capacitors, one of the most common AC repair fixes. It is the shortest link in the system so it goes out the most. eBay provides an active marketplace for buying and selling these parts. I can gain insights to different information on capacitors from this data.

This portion of the analysis will focus more on the repair side of things. The summer heat is right around the corner, and capacitors will be in large demand. I plan to use this data to better prepare for pricing our own repairs by comparing the capacitor data with our own.

Capacitor Data

The data that was collected consists of capacitor information based off the eBay selling page. This includes variables such as price, size, and how many times it was sold. Every row is a different capacitor. Whether we buy from eBay or not this data provides valuable information to pricing strategies. We can look at the standard deviations on each capacitor to establish pricing variability and compare that with our own repair prices. This will allow us to optimally charge our customers to the best of our ability. All data was scraped from this location: https://www.ebay.com/sch/i.html?_nkw=dual+run+capacitors&_sacat=0&_from=R40&_ipg=120

Pricing Strategy

In this section, the goal is to configure the best price for replacing a capacitor. We looked at the pricing inconsistencies for our repair prices, and they seem pretty consistent. Our data doesn’t include whether or not the repair made was a capacitor so that we will focus on AC repairs. We also looked at the need to increase repair prices slightly to even out profitability and be less dependent on installs. Then I want to look at the capacitors’ pricing to find the best price to charge for replacing these parts. This part of the analysis will bring everything we know together utilizing the information we learned previously in this analysis.

AC Repairs

In this section I want to look more closesly at the current prices for our AC repairs. I will use a box plot to view the data showing the current ticket price for each repair. I am filitering to show repairs under $1,000 to remove outliers that would unimportant.

The middle of our repair prices seems to be just under $200, and then they become skewed to more expensive prices, which is expected. The bottom 50% seems to be nicely tight, which is valuable information going forward. So we now know that our bare minimum price for an AC repair is around $200.

Capacitor Price

Next, I want to look at the same type of graph for capacitor prices so I can establish how much I’m looking to spend on capacitors and configure the proper pricing on our service calls that involve capacitors. I want this to be a repair price that can fit all sizes and differences, so that we aren’t constantly changing the price just depending on the capacitor size or type. This will help us stay consistent and build a better brand reputation.

In this visual I want to express the distribution between units sold and price range. I am also going to group them together in price groups so the data looks more clean. This will allow us to see the highest quantity of capacitors sold at a certain price range.

These results give us good insights on the pricing situation. At a quick glance we can see that $45 price is selling the most capacitors which is really the information we needed from the graph. Looking at the two visuals in this section, we can infer that our service call price needs to be at least $200 if we added another $45-$50 to cover the cost of the capacitor, which would raise our service call prices to an overall price of around $250. This price will help us stay more consistent, increases profits, and not break the customers bank.

Conclusion

If we were to wrap everyhting we learned up. We walked through real HVAC data from my own personal company. I was able to look at various aspects on how my company is currently be ran. With this kind of self refelction you can really see some on the holes in the business and areas to improve on. To summarize key points again we need to boost marketing in the months leading up to hot and cold months. We need to fix certain pricing inconsistency. We need to market to other bigger cities near Cincinnati. As well as established a new pricing system for capacitor replacements using real life capacitor sales data.

Working on this project I was able to consider some other areas that collecting data on would be very valuable. Moving forward into this next year I want to create so great ways to collect more job and customer data and everything in between. This will help us optimize our businesses functionallites so we can be the best we can possibly be.