
Twitter Data Pipeline Project

Click HERE to see the full and detailed script

Project Overview

A data engineering project that uses Apache Airflow to run an ETL process on Twitter data, with the tasks executed inside Docker containers.

Objectives

Part I. Establishing Airflow inside a Docker container

First of all, what is Docker?

Docker is a platform that packages an application and everything it needs into lightweight, isolated containers, so the pipeline runs the same way on any machine.

And what is Docker Compose?

Docker Compose is a tool that defines and runs multi-container Docker applications from a single configuration file, docker-compose.yml.

What is Airflow?

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows, which it represents as DAGs (directed acyclic graphs) of tasks.

The DAG above illustrates the pipeline for this project: once the extract-and-transform task completes, it triggers the loading task, which stores the resulting file in AWS S3.
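A minimal sketch of such a two-task DAG is shown below; the dag id, schedule, and task ids are assumptions rather than the project's actual code, and the two callables are only stubbed here (rough sketches of them appear in Parts II and III).

```python
# Sketch of a two-task Airflow DAG: extract/transform, then load to S3.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_transform_twitter_data():
    # Stub; a fuller sketch of this step appears in Part II below.
    pass


def load_to_s3():
    # Stub; a fuller sketch of this step appears in Part III below.
    pass


default_args = {
    "owner": "airflow",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="twitter_etl_dag",
    default_args=default_args,
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_transform = PythonOperator(
        task_id="extract_transform_twitter_data",
        python_callable=extract_transform_twitter_data,
    )

    load = PythonOperator(
        task_id="load_to_s3",
        python_callable=load_to_s3,
    )

    # The load task runs only after extraction and transformation succeed.
    extract_transform >> load
```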

Steps

(A snippet of docker-compose.yml)
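A minimal docker-compose.yml for running Airflow this way might look roughly like the following; the image tags, service names, and credentials here are assumptions rather than the project's actual file.

```yaml
version: "3.8"

services:
  postgres:                        # metadata database for Airflow
    image: postgres:13
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow

  airflow-webserver:
    image: apache/airflow:2.7.1
    depends_on:
      - postgres
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags   # mount local DAG files into the container
    ports:
      - "8080:8080"
    command: webserver

  airflow-scheduler:
    image: apache/airflow:2.7.1
    depends_on:
      - postgres
    environment:
      AIRFLOW__CORE__EXECUTOR: LocalExecutor
      AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    volumes:
      - ./dags:/opt/airflow/dags
    command: scheduler
```

A production-grade setup also needs a one-time database initialization step and usually more services; the official Airflow docker-compose example covers those details.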

Part II. Extraction and Transformation
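The project's full script is linked at the top of this page; as a rough, hedged sketch of what this stage can look like, tweets might be pulled with Tweepy and flattened into a CSV with pandas. The credentials, account name, tweet fields, and output path below are placeholders, not the project's actual values.

```python
# Sketch of an extract-and-transform step using Tweepy and pandas.
import pandas as pd
import tweepy


def extract_transform_twitter_data(output_path: str = "refined_tweets.csv") -> None:
    # Authenticate against the Twitter API with placeholder credentials.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    api = tweepy.API(auth)

    # Extract: pull recent tweets from a timeline.
    tweets = api.user_timeline(
        screen_name="example_account",
        count=200,
        tweet_mode="extended",
    )

    # Transform: keep only the fields of interest and flatten them into rows.
    rows = [
        {
            "user": tweet.user.screen_name,
            "text": tweet.full_text,
            "favorite_count": tweet.favorite_count,
            "retweet_count": tweet.retweet_count,
            "created_at": tweet.created_at,
        }
        for tweet in tweets
    ]

    # Write a local intermediate file; the Airflow load task then pushes it to S3.
    pd.DataFrame(rows).to_csv(output_path, index=False)
```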

Part III. Loading Data

Why use AWS S3 for storing files?

Amazon S3 provides durable, low-cost object storage that scales without any server management and is easy to access from Python and other AWS services.

What is boto3?

boto3 is the AWS SDK for Python; it is what the loading task uses to talk to S3.

Steps
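A minimal sketch of the loading step follows, as a fuller counterpart to the load_to_s3 stub in the DAG sketch above; the bucket name, object key, and local path are placeholders.

```python
# Sketch of the load step: upload the transformed CSV to an S3 bucket with boto3.
import boto3


def load_to_s3(
    local_path: str = "refined_tweets.csv",
    bucket: str = "my-twitter-etl-bucket",
    key: str = "twitter_data/refined_tweets.csv",
) -> None:
    # boto3 reads AWS credentials from the environment, ~/.aws/credentials,
    # or an attached IAM role.
    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)
```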

Future work