ETL and Data Pipelines

1 ETL and Data Pipelines

1.1 Introduction to ETL and Data Pipelines

  • ETL stands for Extract, Transform, and Load
  • Loading means writing the data to its destination environment
  • Cloud platforms are driving an emerging trend toward ELT (Extract, Load, Transform)
  • The key differences between ETL and ELT include where transformation takes place, flexibility, Big Data support, and time-to-insight
  • Growing demand for access to raw data is driving the shift from ETL, which is still in use, to ELT, which enables ad-hoc, self-serve analytics
  • Data extraction draws on techniques such as database querying, web scraping, and APIs
  • Data transformation, such as typing, structuring, normalizing, aggregating, and cleaning, is about formatting data to suit the application
  • Information can be lost in transformation processes through filtering and aggregation
  • Data loading techniques include scheduled, on-demand, and incremental loading
  • Data can be loaded in batches or streamed continuously
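
The extract, transform, and load stages summarized above can be sketched as a small batch pipeline. This is a minimal illustration, not a reference implementation: the sample data, the table name orders, the helper names (extract, transform, load, run_pipeline, totals_by_region), and the use of an in-memory SQLite database as the destination are all assumptions made for the example. It also shows where information can be lost: a row with a missing amount is filtered out, and the aggregated totals no longer expose individual orders.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this might come from a database
# query, an API call, or a web scrape.
RAW_CSV = """order_id,region,amount
1,north, 10.50
2,south,7.25
3,north,
4,South,3.00
"""


def extract(text):
    """Extract: read the raw rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean, type, and normalize each row."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            # Filtering: rows without an amount are dropped, so this
            # information does not reach the destination.
            continue
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().lower(), float(amount))
        )
    return cleaned


def load(conn, rows, last_loaded_id=0):
    """Load incrementally: write only rows newer than last_loaded_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    new_rows = [r for r in rows if r[0] > last_loaded_id]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", new_rows)
    conn.commit()
    return len(new_rows)


def run_pipeline(conn, text, last_loaded_id=0):
    """Run one batch: extract, then transform, then load."""
    return load(conn, transform(extract(text)), last_loaded_id)


def totals_by_region(conn):
    """Aggregate: per-region totals; individual orders are not visible here."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    )
    return dict(cursor)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_pipeline(connection, RAW_CSV), "rows loaded")
    print(totals_by_region(connection))
```

Passing the highest order_id already loaded as last_loaded_id makes a repeat run skip rows that are already in the destination, which is one simple way to approximate incremental loading. An ELT variant would load the raw rows first and run the cleaning and aggregation inside the destination instead.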

2 Week 1

2.0.1 Part 1

3 Week 2

3.0.1 Part 2

4 Week 3

4.0.1 Part 2

5 Week 4

5.0.1 Part 1

5.0.2 Part 2