A collection of stream processing pipelines demonstrating the capabilities of RisingWave, a PostgreSQL-compatible streaming database. This repository accompanies a Medium article series on building real-time data pipelines.
Pipedream provides three progressive stream processing pipelines, each demonstrating increasingly complex use cases and techniques:
- Log Analytics Pipeline - Process web server logs for real-time monitoring
- E-commerce Analytics Pipeline - Track user behavior, product performance, and sales metrics
- IoT Sensor Network Pipeline - Monitor sensor data with geospatial analytics
Each pipeline is self-contained with fully documented SQL scripts, sample data, and detailed explanations.
- RisingWave installed locally or in the cloud
psql
command-line client for PostgreSQL
- Clone this repository
- Start your RisingWave instance
- Choose a pipeline to explore and follow its README
# Example setup for the Log Analytics Pipeline
cd pipelines/01_sentence_stream
psql -h localhost -p 4566 -d dev -f create_tables.sql
psql -h localhost -p 4566 -d dev -f create_views.sql
psql -h localhost -p 4566 -d dev -f insert_test_data.sql
A beginner-friendly pipeline that processes web server logs:
- Core Features: Text parsing, timestamp handling, tumbling windows
- SQL Techniques: String functions, windowing, aggregation
- Skills Demonstrated: Basic stream processing, time-series analysis
An intermediate-level pipeline focused on business analytics:
- Core Features: Funnel analysis, conversion tracking, product performance
- SQL Techniques: JOIN operations, sliding windows, complex aggregations
- Skills Demonstrated: Business metrics, multi-table streaming queries
View E-commerce Analytics Pipeline →
An advanced pipeline for sensor data processing:
- Core Features: Geospatial analysis, anomaly detection, predictive maintenance
- SQL Techniques: JSON processing, statistical calculations, multi-dimensional analysis
- Skills Demonstrated: Complex event processing, time-series analysis, predictive analytics
View IoT Sensor Network Pipeline →
When working with RisingWave, be aware of certain constraints and best practices:
- Use
DOUBLE PRECISION
instead ofDECIMAL
for numeric computations - Always include explicit
PARTITION BY
clauses in window functions - Use
jsonb_build_object()
instead of string concatenation for JSON - Avoid
random()
in favor of deterministic alternatives for test data - Reference source stream timestamps instead of
NOW()
in streaming contexts
For a comprehensive list of limitations and solutions, see our limitations documentation.
This repository accompanies a Medium article series on building stream processing pipelines with RisingWave. The article provides additional context, visualizations, and step-by-step explanations of these pipelines.
This project is licensed under the MIT License - see the LICENSE file for details.