
📡 Reinforcement Learning-Based Packet Scheduler

This project explores the use of Deep Reinforcement Learning (DRL) for intelligent packet scheduling in a simulated network router environment. The goal is to build agents that can make real-time queue-serving decisions while balancing latency, QoS (Quality of Service), and fairness across multiple traffic types.


🏷️ Badges

Python · Reinforcement Learning · Gym · Evaluation · Baselines · Coursework · License: MIT


🎯 Project Goals

  • Prioritize delay-sensitive traffic (Video, Voice) over BestEffort
  • Enforce queue-specific QoS constraints:
    • Video: Delay ≤ 6 ms
    • Voice: Delay ≤ 4 ms
  • Prevent starvation of BestEffort
  • Minimize packet drop rate and mean delay
  • Penalize excessive queue switching
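
The sketch below illustrates how these goals could be combined into a single per-packet reward signal. The weighting constants and the shaped_reward helper are illustrative assumptions, not the project's actual shaping terms (those are defined inside the environment classes).

```python
# Illustrative sketch only: the real reward shaping lives in the Gym
# environments; the weights below are placeholder assumptions.
QOS_DEADLINE_MS = {"Video": 6.0, "Voice": 4.0, "BestEffort": None}

def shaped_reward(queue: str, delay_ms: float, switched: bool) -> float:
    """Score one served packet: low delay, QoS compliance, few switches."""
    reward = -0.01 * delay_ms                    # minimize mean delay
    deadline = QOS_DEADLINE_MS[queue]
    if deadline is not None:                     # Video/Voice have hard QoS deadlines
        reward += 1.0 if delay_ms <= deadline else -1.0
    if switched:                                 # penalize excessive queue switching
        reward -= 0.1
    return reward
```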

🎯 Project Overview

This project investigates the application of Deep Reinforcement Learning (DRL) to the challenge of real-time packet scheduling in a multi-queue router system. The objective is to dynamically prioritize network trafficβ€”Video, Voice, and BestEffortβ€”by:

  • 🕒 Minimizing average delay
  • 🎯 Enforcing queue-specific Quality of Service (QoS) constraints
  • ⚖️ Ensuring fairness across competing traffic classes

🧠 DRL Agents Implemented

Two key RL architectures are explored:

  • DQN (Deep Q-Network): Operates on a discretized state space with a discrete action set
  • PPO (Proximal Policy Optimization): Trained in a continuous state space using policy gradients

🌐 Custom Gym Environments

  • 🧪 RouterEnv: A fine-grained continuous-state environment optimized for PPO
  • 🗂️ TabularStyleRouterEnv: A discretized router model tailored for DQN-based learning
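
As a quick usage illustration, the snippet below drives one of these environments with random actions. The module and class names match the files listed in the architecture table further down, while the no-argument constructor and the classic Gym reset/step signature are assumptions.

```python
# Hypothetical usage sketch; constructor arguments and the classic
# Gym (pre-Gymnasium) step/reset signature are assumptions.
from router_scheduler_env import RouterEnv

env = RouterEnv()                        # continuous 13D state, used by PPO
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()   # random policy, just to exercise the env
    obs, reward, done, info = env.step(action)
env.close()
```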

🚦 Traffic Scenarios

Agents are trained and tested across multiple scenarios that vary:

  • Traffic arrival rates
  • Queue switching penalties
  • Overall network dynamics

This allows for evaluation under both normal and stress-tested network conditions.


📊 Evaluation & Outputs

Each episode tracks key performance metrics:

  • 📈 Total reward
  • ⏱️ Mean delay
  • 🎯 QoS success rate
  • 🔁 Queue switching behavior

Results are exported as:

  • 📁 CSV logs for reproducibility
  • 📊 Visual plots showing model performance trends over time

This project highlights how deep reinforcement learning can be adapted for QoS-constrained network scheduling, showcasing critical trade-offs between:

  • 📈 Policy performance
  • ⚖️ Fairness across queues
  • 🔁 Generalization across dynamic traffic scenarios

⚙️ System Architecture

| Module | Description |
| --- | --- |
| router_scheduler_env.py | Primary Gym-style environment with continuous (13D) state |
| tabular_style_router_env.py | Lightweight environment with discretized (6D) tabular-style state |
| dqn_agent.py | DQN agent with replay buffer, target network, and Q-value logging |
| ppo_agent.py | PPO agent using Stable-Baselines3's MlpPolicy |
| main_train.py | Trains and evaluates DQN on Scenarios 1 & 2 |
| main_train_ppo.py | Trains and evaluates PPO independently |
| plotManager.py | Generates all evaluation plots, CSV exports, and summaries |

🛠️ Environments

RouterEnv (used by PPO)

  • 13D continuous observation space
  • Tracks packet urgency, backlog, delays, and QoS violations
  • Includes switch penalty and backlog difference shaping

TabularStyleRouterEnv (used by DQN)

  • 6D MultiDiscrete state space: [length_bin, deadline_flag, delay_bin] × 2
  • Suitable for tabular approximations but powered by a neural DQN agent
  • Emulates Q-table-like behavior with DQN generalization
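
For reference, here is roughly how the two observation spaces described above could be declared with gym.spaces. The 13 continuous dimensions and the 6D MultiDiscrete layout follow the bullets above, but the value bounds and per-field bin counts are assumptions.

```python
import numpy as np
from gym import spaces

# RouterEnv (PPO): 13-dimensional continuous observation vector.
# The [0, inf) bounds are an assumption; the real env may normalize features.
continuous_obs = spaces.Box(low=0.0, high=np.inf, shape=(13,), dtype=np.float32)

# TabularStyleRouterEnv (DQN): [length_bin, deadline_flag, delay_bin] x 2.
# Bin counts (5 length bins, 2 deadline flags, 4 delay bins) are assumptions.
discrete_obs = spaces.MultiDiscrete([5, 2, 4] * 2)
```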

🧠 Agents

DQN Agent

  • 3-layer MLP
  • Uses SmoothL1Loss, gradient clipping, and epsilon-greedy with decay
  • Experience replay with target network sync every 10 steps
  • Custom logging for Q-values every 500 steps
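
A condensed PyTorch sketch of the pieces listed above (3-layer MLP, SmoothL1 loss, gradient clipping, periodic target sync) is shown below. The hidden width, clipping norm, and discount factor are assumptions, and the actual implementation in dqn_agent.py may differ.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """3-layer MLP mapping a state vector to one Q-value per action."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def train_step(policy_net, target_net, optimizer, batch, step, gamma=0.99):
    """One replay-buffer update: Huber (SmoothL1) TD loss, clipped gradients,
    and a hard target-network sync every 10 steps."""
    states, actions, rewards, next_states, dones = batch
    q = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(1).values * (1 - dones)
    loss = nn.functional.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(policy_net.parameters(), max_norm=1.0)
    optimizer.step()
    if step % 10 == 0:                       # target network sync every 10 steps
        target_net.load_state_dict(policy_net.state_dict())
    return loss.item()
```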

PPO Agent

  • Policy-gradient method using SB3's MlpPolicy
  • Trained with DummyVecEnv and TransformObservation for tabular compatibility
  • CPU-optimized (no CNN policy used)
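
A minimal Stable-Baselines3 setup matching this description might look like the sketch below. The float32 observation cast, the timestep budget, and the save path are assumptions rather than the project's actual hyperparameters in ppo_agent.py / main_train_ppo.py.

```python
import numpy as np
from gym.wrappers import TransformObservation
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

from tabular_style_router_env import TabularStyleRouterEnv  # assumed import path

def make_env():
    env = TabularStyleRouterEnv()
    # Cast the discrete observation to float32 so MlpPolicy can consume it.
    return TransformObservation(env, lambda obs: np.asarray(obs, dtype=np.float32))

vec_env = DummyVecEnv([make_env])
model = PPO("MlpPolicy", vec_env, verbose=1, device="cpu")  # CPU-only MLP policy
model.learn(total_timesteps=200_000)                        # budget is an assumption
model.save("results/models/ppo_router")                     # path is an assumption
```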

🌍 Traffic Scenarios

Scenario 1 (Standard Load)

  • Video, Voice, BestEffort: 0.3, 0.25, 0.4 arrival rates
  • No switch penalty
  • Baseline scenario for fair prioritization and reward calibration

Scenario 2 (Heavier Load + Switch Penalty)

  • Increased arrival pressure and a penalty on queue switching
  • Tests agent stability under stress
  • Highlights how well the model generalizes or collapses under congestion
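
Concretely, the two scenarios could be captured by a small configuration dictionary like the one below. Only the Scenario 1 arrival rates (0.3 / 0.25 / 0.4) and the absence of a switch penalty come from this README; the Scenario 2 numbers are placeholder assumptions.

```python
# Hypothetical scenario configuration; Scenario 2 values are placeholders.
SCENARIOS = {
    "scenario_1": {   # standard load, baseline for reward calibration
        "arrival_rates": {"Video": 0.30, "Voice": 0.25, "BestEffort": 0.40},
        "switch_penalty": 0.0,
    },
    "scenario_2": {   # heavier load plus a cost for changing queues
        "arrival_rates": {"Video": 0.40, "Voice": 0.35, "BestEffort": 0.50},  # assumed
        "switch_penalty": 0.5,                                                # assumed
    },
}
```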

🔁 Transfer Learning (Decision Rationale)

  • Intentionally not used for DQN v3.1.4
  • Each scenario is evaluated independently
  • Ensures Scenario 2 learning is not biased by Scenario 1 policies
  • Future models may include transfer + fine-tuning comparisons

📊 Metrics Tracked

| Metric | Description |
| --- | --- |
| QoS_Success | Packets delivered within delay constraint |
| Avg_Delay | Per-queue average delay per episode |
| Switch_Count | Queue switches (action != 0) |
| Total_Reward | Smoothed reward = time-based + state-based |
| Dropped | Packets lost due to queue overflow |
| QoS_Rate | Success % out of served packets |
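
For illustration, per-episode rows with these columns could be appended to the exported CSV as sketched below. The field order and file handling are assumptions (the project's exports go through plotManager.py).

```python
import csv

FIELDS = ["Episode", "Total_Reward", "Avg_Delay", "QoS_Success",
          "QoS_Rate", "Switch_Count", "Dropped"]

def append_episode_row(path: str, row: dict) -> None:
    """Append one episode's metrics; write the header only for a new file."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if f.tell() == 0:          # empty file: write the header first
            writer.writeheader()
        writer.writerow(row)
```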

📈 Example Results (DQN v3.1.4, 350 Episodes)

| Scenario | Video QoS | Voice QoS | BE Drop | Max Reward | Notes |
| --- | --- | --- | --- | --- | --- |
| Scenario 1 | ✅ ~99.9% | ✅ ~97.1% | Low | 🔼 ~1723 | Stable, fair |
| Scenario 2 | ✅ ~98.6% | ⚠️ ~35% | Medium | 🔻 ~585 | Voice underperforming |

📤 Output

  • Saved models in results/models/
  • CSV logs in results/plots/dqn/csv/
  • Evaluation plots in:
    • results/plots/dqn/
    • results/plots/ppo/

🧪 Future Work

  • ✅ Reward tuning for Voice queue violations (v3.1.5+)
  • 🧠 Hybrid agent using shared DQN+PPO architecture
  • ⚖️ Fairness metric integration and age-based prioritization
  • 📉 Comparative baselines (FIFO, EDF, Strict Priority)
  • 🔁 Curriculum learning for sequential scenario training

🔗 References

📂 How to Run

➤ DQN Training

python main_train.py
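
➤ PPO Training

Assuming the PPO script follows the same invocation pattern (main_train_ppo.py is listed in the architecture table above):

python main_train_ppo.py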

About

This project implements a reinforcement learning-based packet scheduler for a simulated router handling multiple traffic classes (video, voice, and best-effort). The goal is to meet delay-based QoS constraints while minimizing latency, using model-free RL techniques such as Q-Learning.
