A real-time BGP monitoring application that detects and correlates Internet outages globally using live BGP updates from the RIPE RIS Live stream. Built in Java with Spring Boot, the system captures prefix-level visibility, identifies global withdrawals, detects recoveries, and correlates outages at the ASN level with real-time geolocation visualization.
- Real-time WebSocket ingestion of BGP UPDATEs (RIPE RIS Live)
- Live visibility tracking per prefix across global collectors
- Outage detection when a prefix becomes globally unreachable
- Recovery detection when a withdrawn prefix reappears
- ASN-level correlation of prefix outages into network-wide events
- Real-time geolocation with ASN information and organization names
- Interactive web dashboard with Leaflet.js map visualization
- Prometheus metrics exposure for monitoring dashboards
- REST API for frontend dashboard queries and external integrations
- Redis caching for performance optimization
- TimescaleDB for time-series data storage
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    RIPE RIS     │    │   Spring Boot   │    │    Frontend     │
│   Live Stream   │───▶│   Application   │───▶│  (Leaflet.js)   │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │
                                ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│      Redis      │    │   TimescaleDB   │    │   Prometheus    │
│ (Prefix State)  │    │  (Event Store)  │    │    (Metrics)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
- Connects to `wss://ris-live.ripe.net/v1/ws/`
- Subscribes to BGP UPDATE messages
- Deserializes incoming BGP UPDATE messages
- Publishes messages to the `UpdateProcessor`
- Automatic retry logic with exponential backoff
- Graceful shutdown handling
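The retry behaviour listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual code: the class name `ReconnectBackoff`, the base delay, and the cap are hypothetical, and the subscription payload assumes the RIS Live `ris_subscribe` message format.

```java
import java.time.Duration;

/** Hypothetical helper: computes reconnect delays with capped exponential backoff. */
final class ReconnectBackoff {
    private final Duration base;
    private final Duration max;
    private int attempt = 0;

    ReconnectBackoff(Duration base, Duration max) {
        this.base = base;
        this.max = max;
    }

    /** Returns the delay to wait before the next reconnect attempt, then advances the counter. */
    Duration nextDelay() {
        // Cap the shift so that small base delays cannot overflow a long.
        int shift = Math.min(attempt, 30);
        long millis = Math.min(base.toMillis() << shift, max.toMillis());
        attempt++;
        return Duration.ofMillis(millis);
    }

    /** Call after a successful connection so the next failure starts from the base delay. */
    void reset() {
        attempt = 0;
    }

    /** Subscription message sent after connecting; asks for UPDATE messages only (assumed format). */
    static String subscribeMessage() {
        return "{\"type\":\"ris_subscribe\",\"data\":{\"type\":\"UPDATE\"}}";
    }
}
```

A caller would wait for `nextDelay()` after each failed connection attempt and call `reset()` once the stream is delivering messages again.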
- Maintains per-prefix state in Redis with in-memory caching
- Handles announcements and withdrawals
- Detects:
  - Global withdrawals → triggers `outage_start`
  - Re-announcements → triggers `recovery`
- Updates corresponding events in PostgreSQL
- Integrates with ASN correlation service
- Comprehensive error handling and metrics
- Multi-API redundancy for ASN information:
  - BGPView API (primary)
  - ASNLookup API (secondary)
  - WHOIS mapping (fallback)
- Real geolocation data with latitude/longitude coordinates
- Organization names (e.g., "Google LLC", "Facebook, Inc.")
- Country information for geographic context
- Multi-level caching (memory + Redis with 24-hour TTL)
- Graceful fallbacks when APIs are unavailable
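The primary/secondary/fallback lookup order with an in-memory cache could be sketched as follows. All names here (`AsnInfo`, `AsnInfoResolver`) are hypothetical, each source is modelled as a plain function rather than a real HTTP client, and the Redis layer with its 24-hour TTL is left out for brevity.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical ASN record; field names are illustrative. */
record AsnInfo(int asn, String organization, String country, double lat, double lon) {}

/** Tries each source in order and caches the first successful answer in memory. */
final class AsnInfoResolver {
    private final List<Function<Integer, Optional<AsnInfo>>> sources;
    private final Map<Integer, AsnInfo> memoryCache = new ConcurrentHashMap<>();

    AsnInfoResolver(List<Function<Integer, Optional<AsnInfo>>> sources) {
        this.sources = List.copyOf(sources);
    }

    Optional<AsnInfo> resolve(int asn) {
        AsnInfo cached = memoryCache.get(asn);
        if (cached != null) {
            return Optional.of(cached);
        }
        for (Function<Integer, Optional<AsnInfo>> source : sources) {
            try {
                Optional<AsnInfo> result = source.apply(asn);
                if (result.isPresent()) {
                    memoryCache.put(asn, result.get());
                    return result;
                }
            } catch (RuntimeException e) {
                // A failing source must not block the remaining fallbacks.
            }
        }
        // Every source failed or had no data; the caller decides how to degrade.
        return Optional.empty();
    }
}
```

Passing the sources as an ordered list keeps the priority explicit and makes it easy to add or reorder providers without touching the lookup loop.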
- Real-time correlation of prefix outages into ASN-wide events
- In-memory tracking of active ASN outages with timeout management
- Automatic closure when all prefixes recover or timeout expires
- Severity calculation based on percentage of ASN prefixes affected
- Scheduled cleanup of timed-out outages every minute
- Country information integration via geolocation service
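As a rough sketch of the closure and severity rules above: the class below tracks one active ASN outage in memory. The name `ActiveAsnOutage` is hypothetical, and two details are assumptions rather than facts from this README: the timeout is measured from the last observed activity, and severity is rounded to the nearest whole percent.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory record of one active ASN outage. */
final class ActiveAsnOutage {
    private final Instant startTime;
    private Instant lastActivity;
    private final Set<String> affected = new HashSet<>();   // every prefix seen in this incident
    private final Set<String> stillDown = new HashSet<>();  // prefixes not yet recovered

    ActiveAsnOutage(Instant startTime) {
        this.startTime = startTime;
        this.lastActivity = startTime;
    }

    Instant startTime() {
        return startTime;
    }

    void prefixDown(String prefix, Instant at) {
        affected.add(prefix);
        stillDown.add(prefix);
        lastActivity = at;
    }

    void prefixRecovered(String prefix, Instant at) {
        stillDown.remove(prefix);
        lastActivity = at;
    }

    /** Severity as an integer percentage of the ASN's known prefixes, clamped to 0..100. */
    int severity(int totalPrefixesForAsn) {
        if (totalPrefixesForAsn <= 0) {
            return 0;
        }
        return (int) Math.min(100, Math.round(100.0 * affected.size() / totalPrefixesForAsn));
    }

    /** Closes when every prefix has recovered or no activity was seen within the timeout. */
    boolean shouldClose(Instant now, Duration timeout) {
        boolean allRecovered = stillDown.isEmpty();
        boolean timedOut = Duration.between(lastActivity, now).compareTo(timeout) >= 0;
        return allRecovered || timedOut;
    }
}
```

A scheduled cleanup task would call `shouldClose` on each tracked outage once a minute and persist the ones that return `true`.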
- Stores peer visibility for each prefix
- TTL-based sliding cache to evict stale, stable prefixes (24 hours)
- Key format: `prefix:{CIDR}`
- Value: set of collectors, last path, outage state
- In-memory caching layer for performance optimization
- JSON serialization for complex objects
Stores per-prefix events:
| Column | Type | Description |
|---|---|---|
| `id` | BIGSERIAL | Primary key |
| `prefix` | CIDR | BGP prefix (e.g. `203.0.113.0/24`) |
| `origin_asn` | INTEGER | ASN that originated the prefix |
| `timestamp` | TIMESTAMPTZ | Event time |
| `event_type` | VARCHAR | `outage_start`, `recovery` |
| `last_path` | TEXT | Last known AS path before event |
| `withdrawn_by` | TEXT[] | Collectors reporting withdrawals |
| `resolved_at` | TIMESTAMPTZ | If recovery, time the prefix reappeared |
| `duration` | INTERVAL | Duration of outage (if recovered) |
| `created_at` | TIMESTAMPTZ | Record creation time |
Stores grouped ASN-wide outage incidents:
| Column | Type | Description |
|---|---|---|
| `id` | BIGSERIAL | Primary key |
| `asn` | INTEGER | ASN affected |
| `start_time` | TIMESTAMPTZ | First prefix outage in group |
| `end_time` | TIMESTAMPTZ | Recovery or timer expiry |
| `duration` | INTERVAL | Total duration of the outage |
| `prefixes` | TEXT[] | List of affected prefixes |
| `severity` | INTEGER | Percentage of total ASN prefixes lost |
| `country` | TEXT | Country for ASN (from geolocation) |
| `created_at` | TIMESTAMPTZ | Record creation time |
- `GET /api/v1/outages/recent?limit=50` - Recent outage events
- `GET /api/v1/outages/active` - Currently active outages
- `GET /api/v1/outages/map?hours=24` - Data for map visualization
- `GET /api/v1/asn/{asn}/events` - Events for specific ASN
- `GET /api/v1/asn/{asn}/outages` - ASN-level outage correlations
- `GET /api/v1/asn/{asn}/info` - ASN information and geolocation
- `GET /api/v1/prefix/{prefix}/history` - Prefix outage history
- `GET /api/v1/stats/summary` - Summary statistics
- `GET /api/v1/health` - Health check
- Real-time visualization using Leaflet.js
- Color-coded markers:
  - 🔴 Red: Active outages
  - 🟢 Green: Recovered outages
  - 🟡 Yellow: Mixed status (both active and recovered)
- Interactive popups with ASN details:
  - Organization name
  - ASN number
  - Country information
  - Outage count and status
  - Affected prefixes
- Auto-refresh every 30 seconds
- Legend for map interpretation
- Live statistics (active outages, total outages, affected ASNs, avg duration)
- Recent outages list with clickable items
- Responsive design for mobile/desktop
- Real-time updates with geolocation data
Exposes comprehensive metrics:
- `ripe.bgp.messages.received` - BGP messages received
- `ripe.bgp.messages.processed` - BGP messages processed
- `ripe.bgp.processing.errors` - Processing errors
- `ripe.prefix.outages` - Prefix outages detected
- `ripe.prefix.recoveries` - Prefix recoveries detected
- `ripe.stream.restarts` - Stream restart count
- `ripe.websocket.errors` - WebSocket errors

Scraped at `/actuator/prometheus`
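The dotted names above are Micrometer meter names. When scraped, Prometheus sees them in snake case, and counters carry a `_total` suffix (for example `ripe_bgp_messages_received_total`). The helper below is a simplified, hypothetical approximation of that mapping for counters only; it is not Micrometer's own code and ignores units and other meter types.

```java
/** Simplified approximation of how a Micrometer counter name appears in Prometheus. */
final class PrometheusNames {
    private PrometheusNames() {
    }

    /** Converts a dotted counter name to snake case and ensures a single _total suffix. */
    static String counterName(String micrometerName) {
        String snake = micrometerName.replace('.', '_');
        return snake.endsWith("_total") ? snake : snake + "_total";
    }
}
```

This is handy when writing PromQL queries or alert rules from the meter names used in the application code.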
- BGP Announcement Arrives
  - Add collector to prefix visibility set in Redis
  - Update last seen timestamp and AS path
  - If prefix was previously withdrawn → trigger recovery event
  - Process for ASN correlation
- BGP Withdrawal Arrives
  - Remove collector from prefix visibility set
  - Add collector to `withdrawn_by` set
  - If visibility set becomes empty → trigger outage event
  - Process for ASN correlation
- ASN Correlation
  - Group multiple prefix outages by ASN
  - Track affected prefixes and duration
  - 5-minute timeout window for correlation
  - Calculate severity percentage
  - Persist to `asn_outages` table
- Recovery Detection
  - Prefix is announced again after outage
  - Trigger recovery event + update outage duration
  - Remove from active ASN outage tracking
- Geolocation Integration
  - Fetch ASN information from multiple APIs
  - Cache results in Redis and memory
  - Provide real coordinates for map visualization
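The announcement, withdrawal, and recovery steps above amount to a small per-prefix state machine. The sketch below models it with an in-memory map standing in for Redis. The class name `PrefixTracker`, the collector IDs in the usage note, and the choice to ignore withdrawals for prefixes that were never seen announced are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical in-memory stand-in for the Redis-backed prefix state. */
final class PrefixTracker {
    private static final class State {
        final Set<String> visibleFrom = new HashSet<>();
        final Set<String> withdrawnBy = new HashSet<>();
        boolean inOutage = false;
    }

    private final Map<String, State> states = new HashMap<>();

    /** Key layout taken from the README: prefix:{CIDR}. */
    static String key(String cidr) {
        return "prefix:" + cidr;
    }

    /** Returns "recovery" if this announcement ends an outage, otherwise empty. */
    Optional<String> onAnnouncement(String cidr, String collector) {
        State s = states.computeIfAbsent(key(cidr), k -> new State());
        s.visibleFrom.add(collector);
        s.withdrawnBy.remove(collector);
        if (s.inOutage) {
            s.inOutage = false;
            return Optional.of("recovery");
        }
        return Optional.empty();
    }

    /** Returns "outage_start" when the last collector that saw the prefix withdraws it. */
    Optional<String> onWithdrawal(String cidr, String collector) {
        State s = states.get(key(cidr));
        if (s == null) {
            return Optional.empty(); // never seen announced, so there is no visibility to lose
        }
        boolean wasVisible = s.visibleFrom.remove(collector);
        s.withdrawnBy.add(collector);
        if (wasVisible && s.visibleFrom.isEmpty() && !s.inOutage) {
            s.inOutage = true;
            return Optional.of("outage_start");
        }
        return Optional.empty();
    }
}
```

For example, if collectors `rrc00` and `rrc01` both announce `203.0.113.0/24` and then both withdraw it, only the second withdrawal yields `outage_start`; the next announcement from either collector yields `recovery`.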
- Java 17 / Spring Boot 3.2.2
- Spring WebFlux (reactive programming)
- Spring Data JPA (database access)
- Spring Data Redis (caching)
- Gradle (build management)
- TimescaleDB (time-series database)
- PostgreSQL (relational database)
- Redis (in-memory data store)
- Micrometer + Prometheus (metrics)
- Leaflet.js (interactive maps)
- Bootstrap 5 (responsive UI)
- Vanilla JavaScript (ES6+)
- Font Awesome (icons)
- Docker (containerization)
- Docker Compose (service orchestration)
- Prometheus (metrics collection)
- Grafana (monitoring dashboards)
- Docker and Docker Compose
- At least 4GB RAM available
- Internet connection for BGP stream
git clone <repository-url>
cd NHP
docker-compose up -d
This starts:
- Application (port 8080)
- Redis (port 6379)
- TimescaleDB (port 5432)
- Prometheus (port 9090)
- Grafana (port 3000)
- Dashboard: http://localhost:8080
- API health check: http://localhost:8080/api/v1/health
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
docker-compose logs -f app
The database schema is automatically initialized via `init-db.sql`:
- TimescaleDB extension enabled
- Hypertables for time-series optimization
- Indexes for query performance
- Triggers for automatic duration calculation
curl "http://localhost:8080/api/v1/outages/recent?limit=10"
curl http://localhost:8080/api/v1/asn/15169/info
curl http://localhost:8080/api/v1/stats/summary
curl "http://localhost:8080/api/v1/outages/map?hours=24"
Key metrics to monitor:
- `ripe_bgp_messages_received_total` - Message ingestion rate
- `ripe_prefix_outages_total` - Outage detection rate
- `ripe_prefix_recoveries_total` - Recovery detection rate
- `jvm_memory_used_bytes` - Memory usage
- `process_cpu_usage` - CPU usage
Import the following dashboards:
- System Overview - Application health and performance
- BGP Monitoring - Stream health and message rates
- Outage Analytics - Outage patterns and trends
./gradlew bootJar
./gradlew test
./gradlew bootRun
- Database Connection Failed
  - Ensure TimescaleDB container is running: `docker-compose ps`
  - Check logs: `docker-compose logs timescaledb`
- Redis Connection Failed
  - Ensure Redis container is running: `docker-compose ps`
  - Check logs: `docker-compose logs redis`
- BGP Stream Not Receiving Data
  - Check network connectivity
  - Verify RIPE stream endpoint is accessible
  - Review application logs for WebSocket errors
- Map Not Loading
  - Ensure application is running on port 8080
  - Check browser console for JavaScript errors
  - Verify API endpoints are responding
# Application logs
docker-compose logs -f app
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f redis
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request