ETL and Data Pipelines

1 ETL and Data Pipelines

1.1 Introduction to ETL and Data Pipelines

ETL stands for Extract, Transform, and Load
Loading means writing the data to its destination environment
Cloud platforms are making ELT (Extract, Load, Transform) an emerging trend
The key differences between ETL and ELT include where the transformation takes place, flexibility, Big Data support, and time-to-insight
Growing demand for access to raw data is driving the evolution from ETL, which is still in use, to ELT, which enables ad-hoc, self-serve analytics
Data extraction often relies on techniques such as database querying, web scraping, and APIs
Data transformation, which includes typing, structuring, normalizing, aggregating, and cleaning, formats data to suit the target application
Information can be lost in transformation processes through filtering and aggregation
Data loading techniques include scheduled, on-demand, and incremental loading
Data can be loaded in batches or streamed continuously
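
The points above can be illustrated with a minimal batch ETL sketch that uses only the Python standard library. Everything named here is an assumption made for illustration: the sample data in RAW_CSV, the sales_by_region table, and the functions extract, transform, load, and run_pipeline are hypothetical and do not come from any particular tool. The sketch extracts rows from CSV text, transforms them (typing, cleaning, aggregating), and loads the result into an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export standing in for a real source system.
RAW_CSV = """order_id,region,amount
1,north,10.50
2,south,
3,north,4.50
4,south,20.00
"""


def extract(text):
    """Extract: read the raw rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean, type, and aggregate the rows by region."""
    totals = {}
    for row in rows:
        if not row["amount"]:
            # Cleaning by filtering: this row never reaches the destination.
            continue
        region = row["region"].strip().lower()
        # Typing: the amount arrives as text and is converted to a float.
        totals[region] = totals.get(region, 0.0) + float(row["amount"])
    # Aggregating: individual orders are collapsed into one total per region.
    return sorted(totals.items())


def load(conn, records):
    """Load: write the transformed records to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region "
        "(region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO sales_by_region VALUES (?, ?)", records
    )
    conn.commit()


def run_pipeline(conn, text):
    """Run one batch: extract, then transform, then load."""
    load(conn, transform(extract(text)))
    query = "SELECT region, total FROM sales_by_region ORDER BY region"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_pipeline(conn, RAW_CSV))  # [('north', 15.0), ('south', 20.0)]
    conn.close()
```

Four rows are extracted, but only two aggregated records reach the destination: the row with a missing amount is filtered out and the individual orders are no longer recoverable, which is the kind of information loss noted above. In an ELT variant of this sketch, the raw rows would be loaded first and the aggregation would run inside the destination, keeping the raw data available for ad-hoc analysis.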

2 Week 1

2.0.1 Part 1

3 Week 2

3.0.1 Part 2

4 Week 3

4.0.1 Part 2

5 Week 4

5.0.1 Part 1

5.0.2 Part 2