Workshop 3 - Introduction to Data Modeling
1 Introduction to Data Base Modeling
1.1 The Star Database Model
When organizations need to analyze large volumes of historical data — such as financial statements for thousands of firms over many years — they often organize the data using a Star Database Model (also called a Star Schema). This is the foundational model of data warehouses and Business Intelligence systems.
The name “star” comes from its shape: a central Fact Table is surrounded by several Dimension Tables, forming a star-like diagram.
+─────────────────+
| DIM_TIME |
|─────────────────|
| YearID (PK) |
| Year |
| Quarter |
+────────+────────+
|
+─────────────+ | +──────────────────+
| DIM_FIRM | | | DIM_INDUSTRY |
|─────────────| | |──────────────────|
| FirmID (PK) | | | IndustryID (PK) |
| FirmName | | | IndustryName |
| Ticker +─────+─────+ Sector |
| Country | | +──────────────────+
| Exchange | |
+──────+──────+ |
| |
| +─────────────────────────────+
| | FACT_FINANCIALS |
+───|─────────────────────────────|
| FirmID (FK - DIM_FIRM) |
| YearID (FK - DIM_TIME) |
| IndustryID (FK - DIM_IND.) |
| Revenue |
| COGS |
| SGAE |
| Depreciation |
| TotalAssets |
| TotalLiabilities |
| ShareholderEquity |
+─────────────────────────────+
1.1.1 Fact Tables
A Fact Table is the core of the star schema. It stores the quantitative measurements (facts) of business events or transactions. Each row in a Fact Table represents one observation — for example, the financial results of one firm for one fiscal year.
Characteristics of a Fact Table:
- Contains numeric measures (revenue, costs, assets, liabilities, etc.).
- Contains foreign keys that link to each Dimension Table.
- Is typically in long format — one row per unique combination of dimensions.
- Tends to be large (millions of rows in enterprise systems).
Example — FACT_FINANCIALS (excerpt, values in thousands USD):
| FirmID | YearID | IndustryID | Revenue | COGS | TotalAssets |
|---|---|---|---|---|---|
| AAPL | 2023 | TECH | 383,285 | 223,546 | 352,583 |
| MSFT | 2023 | TECH | 211,915 | 65,863 | 411,976 |
| TSLA | 2023 | AUTO | 96,773 | 79,113 | 93,941 |
| AAPL | 2022 | TECH | 394,328 | 223,546 | 352,755 |
| MSFT | 2022 | TECH | 198,270 | 62,650 | 364,840 |
1.1.2 Dimension Tables
Dimension Tables store the descriptive attributes that give context to the facts. They answer the questions: Who? What? When? Where? They are usually much smaller than the Fact Table and contain a Primary Key (PK) that is referenced by the Fact Table as a Foreign Key (FK).
DIM_FIRM — Firm dimension (catalog of companies):
| FirmID | FirmName | Ticker | Country | Exchange | Status |
|---|---|---|---|---|---|
| AAPL | Apple Inc. | AAPL | USA | NASDAQ | Active |
| MSFT | Microsoft | MSFT | USA | NASDAQ | Active |
| TSLA | Tesla Inc. | TSLA | USA | NASDAQ | Active |
DIM_TIME — Time dimension:
| YearID | Year | Decade | IsCrisisYear |
|---|---|---|---|
| 2020 | 2020 | 2020s | Yes |
| 2021 | 2021 | 2020s | No |
| 2022 | 2022 | 2020s | No |
| 2023 | 2023 | 2020s | No |
DIM_INDUSTRY — Industry dimension:
| IndustryID | IndustryName | Sector |
|---|---|---|
| TECH | Technology | Technology |
| AUTO | Automotive | Consumer |
| BANK | Commercial Banking | Finance |
| HLTH | Healthcare | Healthcare |
1.1.3 How the Star Model Works Together
The power of the Star Schema comes from joining the Fact Table with the Dimension Tables using the keys. For example, to answer “What was the average Revenue by Industry and Year for active US firms?”, a query would:
- Start with FACT_FINANCIALS (to get Revenue values).
- Join with DIM_FIRM (to filter by Country = USA and Status = Active).
- Join with DIM_INDUSTRY (to group by IndustryName).
- Join with DIM_TIME (to group by Year).
In Excel, this is essentially what you do when you use VLOOKUP / BUSCARV to bring firm names and industry labels from a catalog sheet into your data sheet — and then use a Pivot Table with Slicers to summarize the results. The findata sheet acts as your Fact Table and the usfirm sheet acts as a Dimension Table!
1.1.4 Summary: Key Concepts
| Concept | Description | Analogy in Excel |
|---|---|---|
| Wide Format | One row per entity; periods as columns | Traditional accounting table |
| Long Format | One row per entity-period; periods as a column value | The findata dataset |
| Fact Table | Stores numeric measures; one row per observation | findata sheet |
| Dimension Table | Stores descriptive attributes; one row per entity | usfirm sheet |
| Foreign Key | A column in Fact Table that links to a Dimension | The ticker used in VLOOKUP |
| Star Schema | Fact Table surrounded by Dimension Tables | Data model behind Pivot Tables |
Remember: The dataset for this workshop was built following an ETL (Extraction, Transformation, Loading) process and is already structured as a long-format Fact Table, ready to be analyzed with Pivot Tables and Slicers — exactly the Star Schema approach in practice.
2 CHALLENGE 1: Introduction to PowerBI
Download and install in your computer Microsoft Power Bi from [https://powerbi.microsoft.com/es-es/desktop/||the company’s website]. Remember that this tools is available only for Windows platforms.
You have to import the Sheets of W2template.xlsx into PowerBI and replicate the 2 Summary tables.
In this Workshop, we only have 1 Dimension Table and 1 Fact Table. You can easily identify them and PowerBI will automatically connect them when you import the tables. You have to make sure tha the automatic relationship is correct.
Within PowerBI you have to create
Calculated Columns: these are like extra columns of the fact table to calculate variables such as EBIT
Measures: these are formulas similar to Calculated Fields in Excel, and they are designed to do summaries by any dimension
You have to use DAX to write the formulas for either Calculated Columns or Measures. DAX stands for Data Analytics Language and is one of the languages of PowerBI to manipulate data in a database model.
Save your PowerBI and sumbit it through Canvas
3 The REA Model : A mapping for business processes and database model
The REA model, developed by accounting professor William McCarthy in 1982, offers a principled way to describe any business process in terms of its fundamental components. The acronym stands for three types of entities: Resources, Events, and Agents. Every economic exchange a firm carries out — selling goods, receiving payments, purchasing inventory — can be described using these three building blocks and the relationships that connect them.
3.0.0.1 The Three Entity Types
Resources are the things of economic value that the firm controls, and that get created, consumed, transferred, or exchanged through business activity. In a merchandising firm, Inventory (goods held for sale) and Cash are the two classic examples. Resources matter because changes in them signal whether the firm is gaining or losing value.
Events are the business transactions — the things that happen — that cause resources to increase or decrease. A Sale event causes inventory to decrease (goods leave the firm). A Cash Receipt event causes cash to increase (money enters the firm). Events occupy the center of the model: they are the mechanism through which value changes hands, and they connect resources and agents together.
Agents are the people or organizations that participate in events. Every economic event involves at least two agents: one internal (a firm employee, the salesperson, the cashier) and one external (a customer, a supplier, a bank). Tracking agents answers the fundamental questions who did this? and with whom did we transact?
3.0.0.2 The Three Relationships
The REA model defines three types of relationships:
Stockflow links an Event to a Resource and indicates whether that event increases (inflow) or decreases (outflow) the resource. A Sale is linked by stockflow to Inventory as an outflow (goods leave the firm). A Cash Receipt is linked by stockflow to Cash as an inflow (money enters the firm).
Duality links two Events that form the two sides of an economic exchange: you give something and you receive something in return. In the Revenue cycle, Sale (give goods) and Cash Receipt (receive payment) are duality partners — they represent two inseparable aspects of the same transaction.
Participation links an Agent to an Event and records who was involved. A Customer participates in the Sale event (they receive the goods) and also in the Cash Receipt event (they make the payment). An Employee participates in both as the internal counterpart who fulfills the sale and processes the payment.
3.0.0.3 The Revenue Cycle: A Complete Example
The Revenue cycle describes the core process by which a firm generates income: it sells goods or services and then collects payment. The diagram below maps this cycle using the REA model.
Notice that Events (Sale and Cash Receipt) occupy the center column — they are the hub connecting Resources to Agents. Also observe that both agents (Customer and Employee) participate in both events, shown by the L-shaped dashed lines. This is a general principle: in REA, each agent typically appears in multiple events throughout the business process.
3.0.0.4 From REA to Star Schema
If you look carefully, you will recognize the shape of the Star Schema you already know. The two Events — Sale and Cash Receipt — are natural Fact Tables: they record measurable economic transactions. The Resources (Inventory, Cash) and the Agents (Customer, Employee) are natural Dimension Tables: they provide descriptive context about what was exchanged and who was involved.
The mapping is almost direct:
| REA entity | Star Schema counterpart |
|---|---|
| Event | Fact Table |
| Resource | Dimension Table |
| Agent | Dimension Table |
| Stockflow relationship | Foreign key (Fact → Resource dimension) |
| Participation relationship | Foreign key (Fact → Agent dimension) |
When you use a VLOOKUP in Excel to bring product descriptions into a transaction table, or when you slice a Pivot Table by customer, you are implementing exactly this mapping — connecting an event record to its resource and agent dimensions. The REA model supplies the conceptual grammar; the Star Schema gives you the implementation strategy.
4 From REAM diagram to business policies and rules
Razorback Bycicles (RB) is a small company that lease bikes from alumni and people that do not use their bikes, and then rent these bikes to current students. RB is located in Monterrey Tech, Queretaro Campus. Here is a REA diagram ONLY for the Acquisition process of the company:
Based on this REA diagram, respond to the Quiz that is posted in Canvas.
5 Challenge 2 - Applying the REA model - Creating a Database
International Food, Inc (InfoIn) is a distributor of several food products in different countries. Their main customers are retail chains like supermarkets. Recently the company decided to implement a comprehensive Enterprise System implementation, and they decided for SAP. So, they have to migrate their existing legacy system to the SAP Enterprise System. The SAP implementation is expected to last for at least 1.5 years. You are just hired by the company as business analyst, so you will need access to the operational information and process some reports for decision making.
As part of the migration process between the legacy system and the SAP system, an IT engineer downloaded part of the database of the legacy system into text files. You asked to this IT engineer to have access to the data related to sales since one of your first tasks is to do an analysis about sales. Each table of the database was downloaded as text files (.txt). These files will be processed and uploaded to the SAP system later on.
During the migration process, the company will continue using the legacy system, but this system is very limited for executive type of reporting for decision making. The IT staff will be very busy in the migration and implementation process, so they do not have time for designing reports that you need for your first data analysis.
You were just hired in the division of Sales in a special department called Strategic Planning. Then, you need to learn about the products, the market, the competence, and about several performance indicators of the company. For the next executive meeting, the new CEO just asked you to get information and do an analysis of the efficiency and effectiveness of the sales division.
In order to get the reports they need for decision making, you suggest to use Power BI. Power BI is one of the important leaders in Business Intelligence/Analytics software platforms in the market according to Gartner. With PowerBI you can easily collect data and databases from different sources and systems, integrate them, and create visualizations that can lead to business insights and effective decision making.
The data sources may be an Excel Workbook, or a collection of cloud-based and on-premises hybrid data warehouses, internal ERP systems, internal legacy systems, etc. This BI application lets you easily connect to your data sources, whatever they are coming from, visualize and discover the information they need to support the decision making process in an organization.
For now, you only have access to the text files that were downloaded from the legacy system. These text files were already organized in one Excel Workbook (each file was placed in different Sheet). In this Workbook you have sales transaction data for the last 3 years. Then, you need to get this data and create performance indicators for the executive meeting.
Before working on the executive reports and performance indicators, you have to do the following:
• Download the DatabaseInfoIn.xlsx file from the course site (Canvas/Modules). Save this file in your computer
– The file has several Sheets, and each sheet is one table from the original database of the ERP system.
– Each Data Sheet represents an Entity (A Resource, Event or Agent). Navigate in the Excel Sheets and try to figure out the possible relationships among the entities. In a paper draw a possible REA model.
– In any database each table is either a resource (or resource type), event, agent, or a relationship between any of these entities. A RELATIONSHIP TABLE is always needed for those relationships that have cardinalities of MANY-TO-MANY relationships. How can you identify these many-to-many relationships? Looking at the Cardinalities. For those relationships that have a maximum cardinality of N in both sides of the Entities, then this relationship is a MANY-TO-MANY relationship.
– Which is the table that represents the many-to-many relationship in the database?
• Follow instructions on class so you can import the DatabaseInforIn dataset into Power BI.
• According to your REA model, establish all the relationships using your PowerPivot data model. Follow instructions to do this in class.
Now that you have your Data Model in a structured database, you can start extracting meaninful information for decision making. You have to work on following reports the CEO asked you:
The company is planning to have a special program with existing customers. The CEO wants to know who have been the best customers from BRAZIL in terms of total amount of sales for each year for the last 3 years. Calculate Sales BEFORE DISCOUNTS. He wants to see a complete list of customers sorted by total amount of sales (use the Country attribute from the the Customer table). He want to see this list from the best customer to the worst customer in terms of sales.
For sending their orders to their customers, the company has special contratcs with some shipping companies who deliver all orders. Recently they have had some problems with these shipping companies, so the CEO asked you a list of the shipers along with the total number of orders they have sent in the last 3 years, and how much money the shipping firms have received for their services, and the average amount they charge for each shipment. The CEO wants to see a list of shippers SORTED by the AVERAGE freight by Order.
You have to create one “Page” for each report. Name the pages with meaningful names related to the report.
6 Challenge 3 - Relationships and Visualization
View the REA model of the InfoIn case (It is a pptx file at Canvas/Modules). Before we continue with information retrieval, you need to understand the REA model, specially the cardinalities. With a good REA model you can make sure that your relationships of your Database model in PowerPivot are correct. If you are missing a relationship or have a mistake in one relationship, it is very likely that most of your queries or reports will not be correct, no matter that you correctly design your queries. In this part you do not have to submit anything, just learn from the REA model, and how that is represented in PowerPivot using relationships among tables.
The main difference between using only Excel compared with a database model using Power Bi is that in Power Bi you do not need to put all relevant information in one sheet using the vlookup (buscarv) function. In Power Bi you can use any related table in the database to do report tables with slicers.
In general for data analysis, Power Bi is much more powerful than only using Excel. Actually Microsoft used the SQL Server database technology to create Power BI, so end users like you have the potential to do very sophisticated data analysis that only Information Technology experts used to do. Although this is a great learning opportunity for you, it also requires a lot of PRACTICE, PRACTICE and PRACTICE!
In the REA model you can have 3 types of relationships:
• 1-to-1
• 1-to-N (or N-to-1)
• N-to-N
To identify the relationship type in your REA, you have to look only at the MAXIMUM cardinalities of the relationship. In your Power Bi data model, you only have 2 types of relationships between the tables: a) 1-to-1 and b)1-to-N. The reason is that a N-to-N relationship become a table in your Power Bi database model (for example, the OrdersDetail table in your REA). Most of the relationships in any model will be 1-to-N relationship.
In your data model there are the entities you can use as a “Pivot” or categorical variable to do an aggregation. When constructing your report, you have to post this categorical variable in the “ROWS” section.
In Power BI the reports are constructed using “Pages”. A “Page” is equivalent to a set of Excel Pivot Tables and Graphs with filters. We can create several “pages” to create connected and dynamic dashboards. To design a “Page” or report to solve business questions, you can follow a simple process:
Identify which piece of information is required. More specifically, what will be your categorical variable that you will use for your aggregation, and which variable you will aggregate (sum, average, count, etc). Imagine the final report as a table.
Once you understand the REA data model of the database, then you have to identify the specific tables where those pieces of information are located.
Once your Data model is well defined with ALL relationships established according to the REA model, then you have to create a new report using the report view. Look at the data model to identify the directions of the arrows and identify the “Pivot” entity (those entities that have an arrow pointing towards them)
Identify the specific attributes of the tables that you will use to construct the report. You can classify these attributes in 3:
Categories or “Pivot” attributes to group the information. For example, Customer name or Product name in case you want to analyze information by customer or by product. You have to post this attribute(s) in the ROW box of the report design view.
Numeric attributes you will aggregate in a specific way. For example, total sales, which you can either sum or average by the categories specified previously. You have to post this attribute in the VALUES box of the report design view.
Variables (can be numeric or categorical or text) to use them as filters or slicers. For example, Countries, to see specific pivot table selecting any country.
Make sure you drag these attributes to the right place in the report Design view
Specify the aggregation formula you will use for the numeric attributes. This can be either count, sum, average, minium, maximum or standard deviation.
Let’s continue with our data analysis of the company. Go to the OrdersDetail table and create a calculated column called “Subtotal with disc” that will have the Sales amount for each order-product INCLUDING the corresponding DISCOUNT. The discount column is given in decimals.
Now you have to work on the following 2 reports. Add one “Page” in your PowerBI file for each report.
The CEO is interested in expanding the market in Denmark. Then, he needs a list of all invoices that have been sold and shipped in 2013 to Denmark. This list have to have the following columns: Order ID, Company Name (the Customer), Number of different products in the order, Total units of the order (considering all products), Total sale ($) of the order. (Use slicers to dynamically filter by COUNTRY, CITY and YEAR).
The company is working on an strategic expansion plan. The CEO wants to know a world perspective of the company. He wants to see Sales by country and by year, sorted by total sales. He wants to see first the country with the highest amount of sales for the 3 years, and at the end the country with the lowest amount of sales for the 3 years.
YOU HAVE TO SUBMIT :
YOUR .PBIX FILE, AND
A WORD FILE WITH THE SCREEN SHOTS OF THE RESULTS OF YOUR DASHBOARDS