Workshop 1 - Data Structure for Accounting
1 Introduction to Business Analytics
There is not a formal definition for Data Science. However, it can be seen as the intersection between statistics, mathematics, computer science and any knowledge domain that demands analysis of big volumes of data such as Banking, Finance, Economics, Marketing, Demographics, Genetics, Astronomy, etc. If the domain is a business discipline, then we can call it Business Analytics, or just Data Science for Business.
Data Science includes not only the data management - data collection, data merging and data preparation- but also data analysis tailored to understand a problem in a specific domain, and then be able to do predictions, and finally improve decision making in organizations. For data analysis we can use techniques based on a) statistical modeling, and also b) machine learning. Actually, machine learning is a combination of statistical, mathematical and artificial intelligence methods.
Focusing on the business domain, we can classify business analytics in the following:
- Descriptive Analytics
- Diagnostic Analytics
- Predictive Analytics
- Prescriptive Analytics
Descriptive analytics refers to analysis of historical business/economic data to better understand a current business situation. Diagnostic analytics refers to the ability of finding business insights - mainly business opportunities and threats- after analyzing present and past business data. Predictive analytics refers to the ability of creating possible future scenarios based on historical data and assumptions related to organizational variables. Prescriptive analytics refers to the ability of identifying/proposing a specific strategic business plan according to the previous descriptive, diagnostic and predictive analysis.
For descriptive analytics the main techniques are data management, statistical analysis and data visualization. For predictive analytics the main techniques are statistical modeling and machine learning.
Machine learning is a very dynamic and changing field. Machine learning refers to automatic algorithms that can learn from facts (data), identify patterns, and then provide a solution to a problem or come up with forecast of variables.
Machine learning is a combination of artificial intelligence techniques along with statistical and mathematical methods. The main purpose of machine learning is to process data and come up with models to a) understand patterns, and b) predict variables. Then, machine learning is a set of algorithms that receive data, select a model, train the model with the data in order to calibrate the model, and then execute the model and come up with insights and/or predictions.
The machine learning techniques we will focus in this course will be mostly statistic techniques applied to Marketing and Finance.
Successful data-driven companies apply Data Science to keep competitive in the market. Those organizations combine a variety of internal and external data sources in an analytics engine, translate the data into quantifiable and actionable insights to make effective decisions for the organization.
2 Review of basic business concepts
Check my eBook Foundations of Business Analytics at https://www.apradie.com/StatsBAbook/. Read Chapter 1 and Chapter 2.
2.1 Understanding Financial Statements
In Chapter 2 - Basics of Financial Statements of my eBook, I explain in detail the key elements of the most essential financial statements, their interrelationships, and the primary financial insights they offer. The most relevant statements include:
- Balance Sheet: Provides a snapshot of a firm’s financial position (its assets, liabilities and equity) at a specific point in time.
- Income Statement: Reports a firm’s financial performance over a period, including revenues, expenses, and net income.
- Statement of Cash Flows: Tracks cash inflows and outflows from operating, investing, and financing activities.
- Statement of Shareholders’ Equity: Shows changes in equity accounts over time.
Understanding these financial statements is essential for business analytics and decision-making.
Let’s review the main elements of the income statement:
\[S_t-COGS_t-SGAE_t-DEP_t =EBIT_t\]
Where:
\(S_t\)= Sales during the period t (from t-1 to t)
\(COGS_t\)= Cost of Good Sold during period t
\(SGAE_t\)= Sales and General Administrative Expenses during period t
\(DEP_t\)= Depreciation during period t
\(EBIT_t\)= Earnings Before Interest and Taxes
From EBIT we estimate the final Net Income, which is the final economic wealth generated by the firm after all expenses:
\[NI_t = EBIT_t-IE_t-T_t\]
Where:
\(NI_t\) = Net income during period t
\(IE_t\) = Interest Expenses during period t (also called financial expenses)
\(T_t\) = Income Tax during period t
In some cases, depending on the industry and country, other income/expenses are considered after EBIT. For example, in the US, the income/expense that is not related to business operations are usually added after EBIT to calculate net income. In the excercise below we will add 2 variables that represent income or expenses that are not related to daily operations: otherincome and extraordinaryitems.
2.2 Financial ratios and their importance
Financial ratios are crucial for evaluating a company’s financial health and performance. They allow analysts to compare companies across industries and track performance over time. Key ratio categories include:
- Liquidity Ratios: Assess a company’s ability to meet short-term obligations.
- Profitability Ratios: Measure a company’s ability to generate profits from its operations.
- Leverage Ratios: Evaluate the extent of a company’s financial leverage.
- Efficiency Ratios: Indicate how effectively a company utilizes its assets.
A financial ratio is an indicator that usually measures something related to business performance. A financial ratio is the result of dividing 2 numbers (values of financial statement variables) to get a proportion.
In this workshop I will review a few financial ratios that we use in the exercise.
Return on Assets (ROA)
As its name suggest, ROA at period t measures how much net income a firm generates during the period t for $1.00 invested in total assets at the beginning of the period t (at t-1):
\[ROA_t=\frac{NI_t}{TA_{t-1}}\]
Where
\(NI_t\) = Net Income generated during the period t
\(TA_{t-1}\)= Total Assets at the beginnig of period t (or end of period t-1)
DuPont decomposition of ROA
In the decade of 1920, Donaldson Brown, executive at DuPont company, came up with the idea of decomposing important financial ratios such as ROA and ROE (Return on Equity) when DuPont was acquiring a stake in General Motors. That was the beginning of the DuPont analysis.
Let’s review the decomposition of ROA according to DuPont analysis:
If we do the following multiplication:
\[ROA_t=\left(\frac{NI_t}{TA_{t-1}}\right)*\left(\frac{S_t}{S_t}\right)\]
We do not modify the result since Sales divided by itself is one.
We can re-express ROA as:
\[ROA_t=\left(\frac{NI_t}{S_t}\right)*\left(\frac{S_t}{TA_{t-1}}\right)\]
These factors are:
\[ProfitMargin=\frac{NI_t}{S_t}\]
\[AssetTurnOver=\frac{S_t}{TA{t-1}}\]
Then, ROA can be expressed as the multiplication of Profit Margin and Asset Turn Over.
Profit margin tells how much net income the firm gets for each $1.00 of sale. Asset turn over indicates how many times is total sales with respect to total assets. For example, if ATO=2 this means that the company sold the equivalent to 2 times its asset value.
This decomposition helps us understand whether profitability (Profit Margin) or efficiency (Asset Turn Over) is driving ROA!
Then, how does a firm can increase its ROA?
Improve profit margin, or
Improve asset turn over, or
Improve both!
2.3 Interpretation and Business Strategy
By analyzing DuPont ratios, we can infer strategic insights about firms, such as:
- High profit margins indicating differentiation strategy
- High asset turnover suggesting low-cost strategy
But, how can we judge whether 10% or 30% of profit margin is high, medium, or low? or whether an ATO greater than 1 is good or not. We need to compare these ratios of a firm against the corresponding average ratios of the firm’s industry.
In the exercise below we will calculate financial ratios for many real US firms, calculate the average for each industry so that we can have a good starting point to judge some values for financial ratios.
2.4 Review of percentages in business
The calculation and use of percentages is very common in our dainly live and the business world. Although most people have a good intuition about percentages and how they are calculated, it is easy to make mistakes when calculating business percentage measures.
There are two types of percentage measures we have to pay attention when calculating them:
Average of Percentages or Ratios
Percentage Growth measures
2.4.1 Average of Percentages / Ratios
Some people can easily make mistakes when calculating the average of percentages or ratios.
Here is an example:
Imagine that a firm only sells 2 products: A and B. The following table shows the total sales, cost and operating gross profit margin for these 2 products:
| Product | Units Sold | Unit Price | Sales | Unit Cost (COGS) | Cost | Gross profit | Gross profit margin |
|---|---|---|---|---|---|---|---|
| A | 10,000 | $1.00 | $10,000 | $0.40 | $4,000 | $6,000 | 60% |
| B | 1000 | $90 | $90,000 | $81.00 | $81,000 | $9,000 | 10% |
| Average= | ? |
If you report the gross profit margin for this company at the end of the year, how would you calculate it?
We can identify 3 methods:
Calculate the arithmetic average of gross profit margin of each product. In this case, the arithmetic average would be ( (60%+10%) / 2) = 35%
Calculate a weighted average of gross profit margin, assigning weights according to product value sales. In this case, product A sold only 10% of total sales, while product B sold 90%. Then, the weighted average would be: 0.10 (60%) + 0.90 (10%) = 15%
Calculate the weighted average as the total gross profit divided by total sales: ($15,000 / $100,000) = 15%
The last two methods end up with the same result. Which is the correct average gross profit margin for the firm?
Rule of thumb: When calculating average of percentages or ratios, DO NOT CALCULATE ARITHMETIC AVERAGE! You have to calculate WEIGHTED AVERAGE, and the easiest way is to SUM all values of the numerator divided by the SUM of all values of the denominator.
2.4.2 Percentage growth measures
When calculating the average percentage growth per period using historical percentage growth rates, it is easy to make the mistake of calculating the arithmetic average of these historical percentage growth rates.
Let’s do an example.
Imagine you have only 2 periods and you want to calculate the average return of an investment per period:
| Period | Investment value (at the end of the period) | Simple period Return (R) |
|---|---|---|
| 0 (today) | $100 | NA |
| 1 | $50 | -0.50 |
| 2 | $75 | +0.50 |
The simple period return is a percentage growth rate of the investment in the period.
Calculating the average simple return of this investment:
\[\bar{R}=\frac{-0.5+0.5}{2}=0%\]
Then, the arithmetic (simple) average return is 0%, while I end up with $75, losing 25% of my initial investment ($100) over the first 2 periods. If I lost 25% of my initial investment over 2 periods, then the average return per period might be a midpoint between 0 and 25%. The accurate average return of an investment over time (multi-periods) is the “Geometric Mean” return.
The total return of the investment in the whole period -also called the holding-period return (HPR)- can be calculated as:
\[HPR=\left(1+R_{1}\right)\left(1+R_{2}\right)...\left(1+R_{N}\right)-1\]
Using the example, the HPR for this investment is:
\[ HPR=\left(1-0.50\right)\left(1+0.50\right)-1=0.75 - 1 = -0.25 \]
And the formula for the geometric average of returns will be:
\[\bar{R_{g}}=\sqrt[N]{\left(1+R_{1}\right)\left(1+R_{2}\right)...\left(1+R_{N}\right)}-1\]
Calculating the geometric average return for this investment:
\[\bar{R_{g}}=\sqrt[2]{\left(1-0.5\right)\left(1+0.5\right)}-1= -0.13397\]
Then, the right average return per year is about -13.4% and the HPR for the 2 years is -25%.
However, if we use continuously compounded returns (r) instead of arithmetic/simple returns (R), then the arithmetic mean of r is an accurate measure that can be converted to simple returns to get the geometric mean, which is the accurate mean return. Let’s do the same example using continuously compounded returns:
| Period | Investment value (at the end) | Continuously compounded return (r) |
|---|---|---|
| 0 | $100 | NA |
| 1 | $50 | =log(50)-log(100)=-0.6931 |
| 2 | $75 | =log(75)-log(50)=+0.4054 |
If we sum the 2 cc returns (r) we get the correct holding-return for the 2 periods (but in continuously compounded):
\[hpr = r_1+r_2 = -0.6931 + 0.4054 = -0.2877 \]
Finally, we convert from continuously compounded to simple returns by applying the exponential function and subtracting 1:
\[HPR = e^{-0.2877}-1=-0.25 = -25\%\]
As illustrated, the continuously compounded returns are additive, where the simple historical returns are not additive - they are multiplicative.
We can get the average cc return with simple arithmetic mean:
\[\bar{r}=\frac{r_1+r_2}{2}=-0.14385\]
We convert from cc return to simple return by appying the exponential function minus 1:
\[\bar{R}=e^{-0.14385}-1=-0.1339=-13.39\%\]
This is actually the same result we got with the geometric mean of returns!
It is very recommended to calculate continuously compounded returns (cc returns) and using cc returns instead of simple returns for data analysis, statistics and econometric models.
One way to calculate cc returns is by subtracting the natural log of the current investment value (at t) minus the natural log of the previous investment value (at t-1):
\[r_{t}=log(InvestmentValue_{t})-log(InvestmentValue_{t-1})\]
This is also called as the difference of the log value, which is a percentage measure of how much the investment changed from period t-1 to t.
We can also calculate cc returns as the log of the current investment value (at t) divided by the previous investment value (at t-1):
\[r_{t}=log\left(\frac{InvestmentValue_{t}}{InvestmentValue_{t-1}}\right)\]
cc returns are usually represented by small r, while simple returns are represented by CAPITAL R.
But why do we use natural logarithm to calculate cc returns? First we need to remember what is a natural logarithm.
3 CHALLENGE - From raw business data to business insights
The objective of this challenge is to start learning basic business analytic skills to make simple BI applications that transform data into meaningful information for effective executive decision making. You will also practice with computing important profitability ratios using raw transactional sales data of a company.
We will learn how to compute basic elements of the income statement using raw transactional sales data. We will introduce the generation of Financial Dashboards using Pivot Tables. Using transactional sales data we will learn how to summarize data to get elements of the income statement.
Let’s work on a practical case to understand these concepts and start learning important data analytic skills.
You will use Excel to create a simple Dashboard with simple KPI (Key Performance Indicators) of a Multinational Company.
You have to do a set of “Dashboards” of KPI’s. These type of Dashboards are classified as “Online Analytical Processing” (OLAP) systems since the end user (in this case, a manager or executive) can actually analyze transactional data online at any time and re-construct KPI with updated data. Here you have to do a set of KPI’s for a Multinational Company.
Each Dashboard usually has at least a dynamic graph and one or more tables along with an interactive section where you can do different slices by categories or dimensions. The interactive section is called in Excel: “Slicer”. For example, a graph can be Sales per Country, and the interaction can be a Slicer that will display the different Product Categories and also the actual Products of the Company. Then, an end user (a manager or executive) will be able to select a specific type of product or a set of products to see the behavior of Sales in the different countries of the company. The actual graph and the slicer are representing a specific Key Performancer Indicator (KPI).
Each Dashboard can have one or more KPI. A Business Intelligent Application usually has a set of Dashboards. In the case of this workshop, you will do a few Dashboards, and you will do each Dashboard in one Spreadsheet of an Excel Workbook.
You have to download an Excel file that contains transactional sales data of a multinational company: W1-Transactional data.xlsx” from Canvas/Modules. Download the Excel file and then find the Sheet where you have to TYPE YOUR MATRICULA TO GENERATE THE SALES DATA. Each student will have different Sales dataset.
This file is actually the result of a “Query” designed with a Database Management System using a database of the Enterprise System of the Multinational company. A “Query” is like a specific report that extract and summarize data from a raw database. As you will see, this data set contains monthly data related to sales and costs (Cost of Good Sold or COGS and Marketing expenses) at the product level of one specific year (considering the levels of product category and country).
You have to do the following with this Excel Dataset:
- Using the Sales Data Sheet, Calculate the following financial items using new Columns:
Net Sales = Sales - Discount
Cost of Good Sold (COGS) = Variable COGS + Fixes COGS
Gross Profit = Net Sales - COGS
SGAE (Sales and General Administrative Expenses) = TV adds + Raqdio Adds + MediaProduction + Promotion expenses
Earnings Before Interest and Taxes (EBIT)
Exchange Rate = Corresponding Exchange rate according to the Country (Bring the exchange rate from the “Exchange Rate and Tax Income” Sheet using the function Vlookup)
Tax Rate = Corresponding Tax Rate according to the Country (Bring the tax rate from the “Exchange Rate and Tax Income” Sheet using the function Vlookup)
Net Income (NI)
Profit Margin (PM)
Net Sales in USD
Gross Profit in USD
EBIT in USD
Net Income (NI) in USD
Gross Profit Margin in USD
- Create a Sales Dashboard using a Pivot Table (in USD) with following KPI:
- Net Sales, Gross Profit, Earnings before interest and taxes (EBIT), and Average Gross Profit Margin, per countryRemember that Gross Profit is equal to Sales minus Cost of good sold, and Gross Profit margin is equal to Gross Profit divided by Sales. Be careful to calculate Average Gross Profit Margin per country! You have to use a “Calculated Field” for this. It’s NOT correct to calculate arithmetic averages for ratios or percentages!You have to add a Slicer with 2 components: Divisions and Products. FROM YOUR PIVOT TABLE WITHOUT APPLYING A FILTER/SLICER, COPY AND PASTE WITH VALUES the corresponding columns in the Sheet “2. PERFORMANCE BY COUNTRY”. IF you do not do this, you will not receive full credits for this part.
- Create a Business cycle Dashboard (using another Pivot Table): this dashboard has to have the following KPI:
Net Sales, Gross Profit, EBIT and Net Income (in USD) per monthWith a Slicer with 3 components: Countries, Divisions, and Products
Create a graph that illustrates the business cycle showing at Net Sales, Gross Profit and EBIT per month. FROM YOUR PIVOT TABLE WITHOUT APPLYING A FILTER/SLICER, COPY AND PASTE WITH VALUES the corresponding columns in the Sheet “3. PERFORMANCE BY MONTH”. IF you do not do this, you will not receive full credits for this part.
- Assume that you are the Chief Operation Officer of the Multinational Company. Play with your Dashboards, to identify one or two areas where you believe can significantly improve sales and/or gross profit margin for next year. You can focus on a specific country or set of countries, and/or products. Define ONE sales objective for next year with one specific strategy that attend this objective (remember that your objective must be “SMART”). Make your own assumption(s) and provide a reasonable argument to formulate your strategy. Use the “SALES STRATEGY” Sheet for your objective and corresponding strategy (no more than one page/sheet long is expected).
You have to submit your Excel file through Canvas