πŸ“Š DataPipe Analytics


A production-grade ETL pipeline for processing financial market data using Apache Airflow, dbt, and PostgreSQL. This project demonstrates modern data engineering practices with a focus on reliability, scalability, and performance.

🌟 Project Overview

This project implements a robust data engineering pipeline that processes financial market data from the Alpha Vantage API. It showcases industry best practices in data engineering, including data validation, testing, documentation, and monitoring.
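The extraction step can be sketched as follows. This is a minimal, illustrative example of calling Alpha Vantage's TIME_SERIES_DAILY endpoint and flattening its JSON payload into rows; the function names are hypothetical and do not reflect the repository's actual extractor modules.

```python
# Illustrative sketch only: function names are hypothetical, not the
# repository's actual extractor API. Uses only the standard library.
import json
import urllib.parse
import urllib.request

ALPHA_VANTAGE_URL = "https://www.alphavantage.co/query"

def fetch_daily_prices(symbol: str, api_key: str) -> dict:
    """Call the TIME_SERIES_DAILY endpoint and return the raw JSON payload."""
    params = urllib.parse.urlencode({
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "apikey": api_key,
    })
    with urllib.request.urlopen(f"{ALPHA_VANTAGE_URL}?{params}", timeout=30) as resp:
        return json.load(resp)

def parse_daily_prices(payload: dict) -> list:
    """Flatten the 'Time Series (Daily)' mapping into date-ordered row dicts."""
    series = payload.get("Time Series (Daily)", {})
    rows = []
    for trading_date, values in sorted(series.items()):
        rows.append({
            "trading_date": trading_date,
            "close_price": float(values["4. close"]),
            "volume": int(values["5. volume"]),
        })
    return rows
```

Rows in this shape map naturally onto the raw layer tables that the staging models later clean and type.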

graph TD
    A[Alpha Vantage API] -->|Extract| B[Raw Data Layer]
    B -->|Transform| C[Staging Layer]
    C -->|Model| D[Marts Layer]
    D -->|Visualize| E[Dashboards]
    
    style A fill:#f9a825,stroke:#f57f17,stroke-width:2px
    style B fill:#42a5f5,stroke:#1976d2,stroke-width:2px
    style C fill:#66bb6a,stroke:#388e3c,stroke-width:2px
    style D fill:#ab47bc,stroke:#7b1fa2,stroke-width:2px
    style E fill:#ec407a,stroke:#c2185b,stroke-width:2px

✨ Features

  • πŸ”„ Real-time Market Data: Automated extraction of stock market data from Alpha Vantage
  • πŸ›‘οΈ Data Quality: Comprehensive data testing and validation using dbt
  • πŸš€ Scalable Architecture: Containerized services with proper health checks and dependency management
  • πŸ“Š Visualization: Interactive Streamlit dashboard and Metabase BI platform
  • πŸ“ˆ Technical Analysis: Built-in indicators and market metrics
  • πŸ” Monitoring: Built-in logging and health monitoring for all services
  • πŸ“š Documentation: Extensive documentation of models, tests, and best practices
  • πŸ–₯️ Resource Optimization: Support for older hardware with minimal resource requirements

πŸ› οΈ Tech Stack

| Category | Technology |
|----------|------------|
| Orchestration | Apache Airflow 2.7.3 |
| Data Warehouse | PostgreSQL 13 |
| Transformation | dbt 1.7.3 |
| Containerization | Docker & Docker Compose |
| Programming | Python 3.9 |
| Data Source | Alpha Vantage API |
| Visualization | Streamlit & Metabase |
| Testing | pytest, dbt tests |

πŸ—οΈ Architecture

Our data pipeline follows a modern layered architecture:

flowchart LR
    subgraph Extraction
        A[Alpha Vantage API] --> B[Airflow DAGs]
    end
    subgraph Storage
        B --> C[Raw Layer]
        C --> D[Staging Layer]
        D --> E[Marts Layer]
    end
    subgraph Visualization
        E --> F[Streamlit Dashboard]
        E --> G[Metabase]
    end
    
    style A fill:#f9a825,stroke:#f57f17,stroke-width:2px
    style B fill:#42a5f5,stroke:#1976d2,stroke-width:2px
    style C fill:#90caf9,stroke:#42a5f5,stroke-width:2px
    style D fill:#66bb6a,stroke:#388e3c,stroke-width:2px
    style E fill:#ab47bc,stroke:#7b1fa2,stroke-width:2px
    style F fill:#ec407a,stroke:#c2185b,stroke-width:2px
    style G fill:#7e57c2,stroke:#512da8,stroke-width:2px

πŸ“Š Data Model

Our data model follows a star schema design for analytics:

erDiagram
    fact_market_metrics ||--o{ dim_company : references
    dim_company {
        string symbol PK
        string company_name
        string sector
        decimal market_cap
        decimal pe_ratio
    }
    fact_market_metrics {
        date trading_date
        string symbol FK
        decimal close_price
        bigint volume
        decimal price_change_pct
    }
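The fact table's price_change_pct column can be derived from successive closing prices. The sketch below is illustrative: the column names mirror the ER diagram above, but the function itself is a stand-in, not the project's actual transformation logic.

```python
# Illustrative sketch: derive day-over-day price_change_pct for fact rows.
# Column names follow the ER diagram; the function is hypothetical.

def with_price_change_pct(rows: list) -> list:
    """Add day-over-day percentage change, ordered by trading_date.

    Each input row needs 'trading_date' and 'close_price'; the first row
    gets None because it has no prior close to compare against.
    """
    ordered = sorted(rows, key=lambda r: r["trading_date"])
    prev_close = None
    out = []
    for row in ordered:
        change = None
        if prev_close:  # skips the first row and any zero closes
            change = round((row["close_price"] - prev_close) / prev_close * 100, 4)
        out.append({**row, "price_change_pct": change})
        prev_close = row["close_price"]
    return out
```

In the actual pipeline this kind of calculation would typically live in a dbt marts model rather than in Python, but the arithmetic is the same.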

πŸš€ Getting Started

Prerequisites

  • Docker and Docker Compose
  • Python 3.9+
  • Make (optional, for using Makefile commands)
  • Alpha Vantage API key

πŸ“₯ Local Development Setup

  1. Clone the repository:
git clone https://github.com/javid912/datapipe-analytics.git
cd datapipe-analytics
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate
  3. Copy the example environment file and configure your API key:
cp .env.example .env
# Edit .env and add your Alpha Vantage API key
  4. Start the services:
# For standard hardware:
docker-compose up -d

# For older or resource-constrained hardware:
docker-compose -f docker-compose-minimal.yml up -d
  5. Access the services:

⚑ Performance Optimization

For older or resource-constrained hardware, we provide a minimal Docker Compose configuration:

docker-compose -f docker-compose-minimal.yml up -d

This configuration:

  • πŸ”½ Reduces memory usage for all containers
  • πŸ”½ Limits CPU usage
  • βœ… Starts only essential services
  • βœ… Optimizes database connections
  • βœ… Implements selective computation of technical indicators
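Resource limits like these are typically expressed directly in the Compose file. The fragment below is a sketch of what such a minimal configuration might look like; the service names, limit values, and environment settings are examples, not the contents of the repository's actual docker-compose-minimal.yml.

```yaml
# Illustrative fragment only -- service names and limits are examples,
# not the repository's actual docker-compose-minimal.yml.
services:
  postgres:
    image: postgres:13
    mem_limit: 512m      # cap container memory on small hosts
    cpus: 0.5            # cap CPU share
  airflow-webserver:
    mem_limit: 1g
    cpus: 1.0
    environment:
      AIRFLOW__WEBSERVER__WORKERS: "1"   # fewer gunicorn workers for low memory
```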

πŸ“ Project Structure

datapipe-analytics/
β”œβ”€β”€ airflow/               # Airflow DAGs and configurations
β”‚   └── dags/             # DAG definitions
β”œβ”€β”€ dbt/                  # Data transformation
β”‚   β”œβ”€β”€ models/          # dbt models
β”‚   β”‚   β”œβ”€β”€ staging/    # Staging models
β”‚   β”‚   └── marts/      # Mart models
β”‚   β”œβ”€β”€ seeds/          # Seed data files
β”‚   └── tests/          # Data tests
β”œβ”€β”€ docker/              # Dockerfile definitions
β”œβ”€β”€ src/                 # Source code
β”‚   β”œβ”€β”€ dashboard/      # Streamlit dashboard
β”‚   β”œβ”€β”€ extractors/     # Data extraction modules
β”‚   └── loaders/        # Database loading modules
β”œβ”€β”€ tests/               # Python tests
└── docs/                # Documentation
    └── DEVELOPMENT_JOURNAL.md  # Development history

πŸ“Š Data Models

Our dbt models follow a layered architecture:

| Layer | Purpose | Examples |
|-------|---------|----------|
| Raw (public_raw) | Original data from external sources | raw_stock_prices, raw_company_info |
| Staging (public_staging) | Clean, typed data from raw sources | stg_daily_prices, stg_company_info |
| Marts (public_marts) | Business logic transformations for analytics | dim_company, fact_market_metrics |

πŸ§ͺ Testing

The project includes comprehensive testing at multiple levels:

  • βœ… dbt tests: Data quality and business logic validation
  • βœ… Python unit tests: Code functionality verification
  • βœ… Integration tests: End-to-end pipeline validation
  • βœ… Container health checks: Service availability monitoring
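A unit test in the Python layer might look like the sketch below. The validation function is a hypothetical stand-in for the repository's real helpers; it only illustrates the style of test pytest would collect from the tests/ directory.

```python
# Illustrative pytest-style unit test. The function under test is a
# hypothetical stand-in for the repository's real validation helpers.

def is_valid_price_row(row: dict) -> bool:
    """Reject rows with missing fields, non-positive prices, or negative volume."""
    required = {"trading_date", "close_price", "volume"}
    if not required <= row.keys():
        return False
    return row["close_price"] > 0 and row["volume"] >= 0

def test_accepts_well_formed_row():
    assert is_valid_price_row(
        {"trading_date": "2024-01-02", "close_price": 10.5, "volume": 1000}
    )

def test_rejects_negative_price():
    assert not is_valid_price_row(
        {"trading_date": "2024-01-02", "close_price": -1.0, "volume": 1000}
    )
```

Running `pytest tests/` would collect and execute tests of this shape alongside the dbt tests that guard the warehouse side.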

πŸ“ˆ Visualization

Streamlit Dashboard

Our Streamlit dashboard provides:

  • πŸ“Š Market overview with key metrics
  • πŸ“ˆ Technical analysis with indicators
  • πŸ” Company-specific deep dives
  • πŸ“‰ Historical price analysis
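A representative indicator from the technical analysis view is the simple moving average. The sketch below is a minimal, self-contained implementation for illustration; the dashboard's actual indicator code may differ.

```python
# Illustrative sketch of a simple moving average (SMA), one of the
# technical indicators a dashboard like this might plot.

def simple_moving_average(closes, window):
    """Return the trailing SMA at each position; None until a full window exists."""
    out = []
    running = 0.0
    for i, price in enumerate(closes):
        running += price
        if i >= window:
            running -= closes[i - window]  # drop the price leaving the window
        out.append(round(running / window, 4) if i >= window - 1 else None)
    return out
```

Crossovers between a short and a long SMA (for example 20-day vs 50-day) are a common way to surface trend changes on such a chart.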

Metabase BI Platform

Metabase offers:

  • πŸ“Š Custom SQL queries and visualizations
  • πŸ“ˆ Scheduled reports and alerts
  • πŸ” Interactive filtering and exploration
  • πŸ“‰ Shareable dashboards and insights

Access Metabase at:

πŸ” Monitoring

Our monitoring approach includes:

  • πŸ”„ Service health monitoring via Docker health checks
  • πŸ“Š Airflow task monitoring and alerting
  • βœ… dbt test coverage and data quality metrics
  • πŸ“ Comprehensive logging for all components

🀝 Contributing

We welcome contributions! Please read our CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.

πŸ› Issues and Feature Requests

Check out our Issues page to see current tasks, bugs, and feature requests. Feel free to pick up any issue labeled "good first issue" to get started!

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ—ΊοΈ Roadmap

  • βœ… Add Streamlit dashboard for data visualization
  • βœ… Implement resource optimization for older hardware
  • βœ… Add Metabase integration
  • πŸ”„ Implement real-time data processing
  • πŸ”„ Add more technical indicators
  • πŸ”„ Enhance monitoring and alerting
  • πŸ”„ Add support for more data sources

πŸ™ Acknowledgements
