Session 1: Getting Started with Lakebase and Apps

This session took a deep dive into Lakebase features and how Lakebase supports Databricks Apps. We completed a hands-on Databricks lab through Vocareum on setting up a Databricks app that connects to Unity Catalog, taking on the role of a data engineer working for an eCommerce platform. The full lab and code can be accessed below.

Lab URL: https://dbfe.vocareum.com/home/forgot.php?id=5172041

Takeaway: You aren’t limited to just one language in Databricks Apps. You can implement visualization features from Python, R, Tableau, and Power BI to do things that Databricks AI/BI dashboards can’t.

  • Prompt Genie: How can I use Databricks Apps for visualization?

Session 2: Introduction to Databricks Lakehouses

At its core, a data lakehouse combines the flexibility of a data lake with the structure of a data warehouse. In Databricks, Lakeflow is used to build it.

Data lake (flexible) + data warehouse (structure) = data lakehouse.

This session covered the basics of data lakehouses by discussing Lakeflow features. To use Lakeflow to load data into Databricks:

  1. Create a warehouse for storage
  2. Load data into Databricks using Lakeflow (the Databricks ingestion, connection, and jobs features)
  3. Set up jobs or alerts to automatically interact with your data
  4. Use Genie to interact with the data

Takeaway: Databricks has a feature called ‘Lakebridge’ that could be useful for our migration efforts, translating SQL, and validating data. End to end, Lakebridge starts with analysis (scope assessment), moves to the LLM converter (code conversion), then data migration (Lakeflow Connect), and ends with data validation.
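
The data validation step at the end of a migration can be pictured with a small Python sketch. This is not Lakebridge's API or its actual validation method. The function names (table_fingerprint, validate_migration) and the sample rows are hypothetical. The idea is simply to compare row counts and an order-independent checksum between the legacy table and the migrated table.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    The checksum sums per-row SHA-256 hashes modulo 2**256, so it does not
    depend on row order but still changes if any row value changes.
    """
    count = 0
    checksum = 0
    for row in rows:
        count += 1
        row_hash = hashlib.sha256(repr(row).encode("utf-8")).hexdigest()
        checksum = (checksum + int(row_hash, 16)) % (1 << 256)
    return count, checksum


def validate_migration(source_rows, target_rows):
    """Compare source and target tables by row count and checksum."""
    source_count, source_checksum = table_fingerprint(source_rows)
    target_count, target_checksum = table_fingerprint(target_rows)
    return {
        "row_count_match": source_count == target_count,
        "checksum_match": source_checksum == target_checksum,
    }


# Hypothetical sample data: the same rows, returned in a different order.
legacy = [(1, "widget", 9.99), (2, "gadget", 19.5)]
migrated = [(2, "gadget", 19.5), (1, "widget", 9.99)]

result = validate_migration(legacy, migrated)
print(result)
```

In practice the rows would come from queries against the legacy system and the Databricks table rather than from in-memory lists, and a mismatch would be the cue to drill into individual columns or partitions.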


Session 3: Why the Data Warehouse is the best Lakehouse

This session covered:

  • Built-in AI functions: functions like ai_query, ai_fix_grammar, ai_translate, and ai_classify for working with images, reviews, or dashboards
  • Lakebridge: A Databricks migration tool for moving legacy systems into Databricks

-- An AI function in Databricks that corrects grammatical errors
SELECT ai_fix_grammar('This sentance have some mistake');

-- A classification function, useful for tasks such as sentiment labeling. This classifies the message as urgent or not urgent
SELECT ai_classify('My pass is leaked.', array('urgent', 'not urgent'));

-- This translation function translates the phrase into French
SELECT ai_translate('This function is so amazing!', 'fr');

-- ai_query can be used to describe a dashboard or image in words.
-- Its first argument is a model serving endpoint name; '<endpoint-name>' is a placeholder because the notes do not record which endpoint was used.
-- files => content (assumed here from the Databricks multimodal syntax) passes each file's binary content to the model.
SELECT ai_query('<endpoint-name>', 'Describe this image in ten words or less', files => content)
FROM read_files('path_path_path');

Session 4: Embedded Analytics AI/BI Dashboards

This session covered features in AI/BI dashboards:

  • You can give all of your users the same permissions or grant unique permissions on dashboards.
  • Genie spaces are the greatest feature in Databricks for empowering non-technical users to use Databricks.

Takeaway: Databricks presented Genie ontology for the first time at the conference. It is the brain layer that connects business definitions, wikis, tickets, documents, and chat threads to Databricks. It should help bridge the gap between the insights we get out of our data and business users’ knowledge.


Session 5: Forecasting at Databricks

This session provided a framework for getting BUs to adopt a forecast or model. Three fundamentals were laid out before the framework:

  • Explainable: Why did something change? You need more than just conviction; you need real evidence. Outside BUs see you as hiding behind jargon when you don’t explain things clearly.
  • Defensible: The model has to hold up in a reproducible way when someone says, “But I didn’t expect that!”
  • Diagnosable: Do you know where to look when the model gets questioned? Can you show errors, weaknesses, or skews and provide evidence for why they occurred?

The ADOPT framework was given as a method for helping BUs adopt ideas:

  • A - Agree on the Framework first: Definitions, the input channel, and success criteria settled BEFORE the model.
  • D - Decomposition is the deliverable: Ship the breakdown, not the number. The number is only the rollup.
  • O - Own Every Slice: A named owner on every component. No black boxes.
  • P - Prove it Holds: Prediction intervals plus a back-test that beats the obvious baselines.
  • T - Trace Every Move: Attribution localizes any change to a single cell, in minutes, not days.
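
Two of the ADOPT steps lend themselves to a quick illustration. The Python sketch below is not from the session, and all names and numbers are hypothetical. It shows a back-test that compares a model against the most obvious baseline, which simply repeats the last observed value (the P step), and a per-component attribution that ranks which slice of a decomposed forecast moved the most (the T step, which relies on the D step's breakdown). Prediction intervals are left out to keep it short.

```python
from statistics import mean


def mae(actuals, predictions):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actuals, predictions))


def naive_forecast(history, horizon):
    """Obvious baseline: repeat the last observed value for every period."""
    return [history[-1]] * horizon


def attribute_change(previous, current):
    """Per-component change between two forecasts, largest move first."""
    deltas = {name: current[name] - previous[name] for name in current}
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical back-test: hold out the last two periods and score both forecasts.
history = [100, 104, 108, 112]
actuals = [116, 120]
model_predictions = [115, 121]

baseline_mae = mae(actuals, naive_forecast(history, len(actuals)))
model_mae = mae(actuals, model_predictions)
beats_baseline = model_mae < baseline_mae

# Hypothetical decomposition: the total is only the rollup of its components.
previous = {"north": 50, "south": 40, "west": 25}
current = {"north": 50, "south": 46, "west": 24}
ranked_moves = attribute_change(previous, current)

print(f"baseline MAE={baseline_mae}, model MAE={model_mae}, beats baseline={beats_baseline}")
print("largest move:", ranked_moves[0])
```

Because the forecast is shipped as components rather than a single number, a question like "why did the total go up by 5?" is answered by reading the first entry of the ranked moves.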