📢 Early Release: This is an early release of async-cassandra. While it has been tested extensively, you may encounter edge cases. We welcome your feedback and contributions! Please report any issues on our GitHub Issues page.
This is a monorepo containing two related Python packages:
- async-cassandra - The main async wrapper for the Cassandra Python driver, enabling async/await operations with Cassandra
- async-cassandra-bulk - (🚧 Active Development) High-performance bulk operations extension for async-cassandra
- 📦 Repository Structure
- ✨ Overview
- 🏗️ Why create this framework?
- ⚠️ Important Limitations
- 🚀 Key Features
- 🔀 Alternative Libraries
- 📋 Requirements
- 🔧 Installation
- 📚 Quick Start
- 🤝 Contributing
- 📞 Support
- 📖 Documentation
- 🎯 Running the Examples
- ⚡ Performance
- 📝 License
- 🙏 Acknowledgments
- ⚖️ Legal Notices
async-cassandra is a Python library that enables the Cassandra driver to work seamlessly with async frameworks like FastAPI, aiohttp, and Quart. It provides an async/await interface that prevents blocking your application's event loop while maintaining full compatibility with the DataStax Python driver.
When using the standard Cassandra driver in async applications, blocking operations can freeze your entire service. This wrapper solves that critical issue by bridging the gap between Cassandra's thread-based I/O and Python's async ecosystem, ensuring your web services remain responsive under load.
In async Python applications, an event loop manages all operations in a single thread. Think of it like a smart traffic controller - it efficiently switches between tasks whenever one is waiting for I/O (like a database query). This allows handling thousands of concurrent requests without creating thousands of threads.
The key issue: When you use the standard Cassandra driver (which is synchronous) inside an async web framework like FastAPI or aiohttp, you create a blocking problem. The synchronous driver operations block the event loop, preventing your async application from handling other requests.
Important clarification: This blocking issue only occurs when:
- You're building an async application (using FastAPI, aiohttp, Quart, etc.)
- You use synchronous database operations inside async handlers
If you're building a traditional synchronous application (Flask, Django without async views), the standard Cassandra driver works fine and you don't need this wrapper.
Here's a concrete example showing the problem and solution:
# ❌ Synchronous code in an async handler - BLOCKS the event loop
from fastapi import FastAPI
from cassandra.cluster import Cluster

app = FastAPI()
cluster = Cluster(['localhost'])
session = cluster.connect()

@app.get("/users/{user_id}")
async def get_user(user_id: str):
    # This blocks the event loop! While this query runs,
    # your FastAPI app cannot process ANY other requests
    result = session.execute("SELECT * FROM users WHERE id = %s", [user_id])
    return {"user": result.one()}
# ✅ Async code with our wrapper - NON-BLOCKING
from contextlib import asynccontextmanager
from fastapi import FastAPI
from async_cassandra import AsyncCluster

session = None
user_stmt = None

@asynccontextmanager
async def lifespan(app: FastAPI):
    global session, user_stmt
    # Use context managers for proper resource management
    async with AsyncCluster(['localhost']) as cluster:
        async with cluster.connect() as session:
            # Prepare statement for better performance
            user_stmt = await session.prepare(
                "SELECT * FROM users WHERE id = ?"
            )
            yield  # App runs here
    # Cleanup happens automatically

app = FastAPI(lifespan=lifespan)

@app.get("/users/{user_id}")
async def get_user(user_id: str):
    # This doesn't block! The event loop remains free to handle
    # other requests while waiting for the database response
    result = await session.execute(user_stmt, [user_id])
    return {"user": result.one()}
The key difference: with the sync driver, each database query blocks your entire async application from handling other requests. With async-cassandra, the event loop remains free to process other work while waiting for database responses.
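The effect can be observed without a Cassandra cluster. The following is a minimal sketch using only the standard library: `blocking_query` is a hypothetical stand-in for a synchronous driver call, and `heartbeat` represents the other work the event loop would normally be doing. It counts how many heartbeat ticks occur while one "query" is in flight, first when called directly and then when offloaded to a thread.

```python
import asyncio
import time


async def heartbeat(ticks: list[float], interval: float = 0.01) -> None:
    """Record a timestamp every `interval` seconds while the loop is free."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


def blocking_query(duration: float = 0.2) -> str:
    """Hypothetical stand-in for a synchronous driver call."""
    time.sleep(duration)
    return "row"


async def count_ticks(run_in_thread: bool) -> int:
    """Count heartbeat ticks observed while one 'query' is in flight."""
    ticks: list[float] = []
    task = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    if run_in_thread:
        # Offload the blocking call so the event loop keeps running.
        await asyncio.to_thread(blocking_query)
    else:
        # Call it directly: nothing else can run until it returns.
        blocking_query()
    after = len(ticks)
    task.cancel()
    return after - before


async def main() -> tuple[int, int]:
    blocked = await count_ticks(run_in_thread=False)
    free = await count_ticks(run_in_thread=True)
    return blocked, free


if __name__ == "__main__":
    blocked, free = asyncio.run(main())
    print(f"ticks while blocked: {blocked}, ticks while offloaded: {free}")
```

The direct call leaves no room for the heartbeat to run, while the offloaded call lets it tick many times. This illustrates the general principle only; it is not a description of how async-cassandra is implemented internally.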
This library provides true async/await support, enabling:
- Non-blocking Operations: Prevents your async application from freezing during database queries
- Framework Compatibility: Works naturally with FastAPI, aiohttp, and other async frameworks
- Clean Async Code: Use async/await syntax throughout your application
- Better Concurrency Management: Leverage the event loop for handling concurrent requests
See our Architecture Overview for technical details, or learn more about What This Wrapper Actually Solves (And What It Doesn't).
The standard Cassandra driver's manual paging (fetch_next_page()) is synchronous, which blocks your entire async application:
# ❌ With standard driver - blocks the event loop
result = session.execute("SELECT * FROM large_table")
while result.has_more_pages:
    result.fetch_next_page()  # This blocks! Your app freezes here

# ✅ With async-cassandra streaming - truly async
result = await session.execute_stream("SELECT * FROM large_table")
async for row in result:
    await process_row(row)  # Non-blocking, other requests keep flowing
This is critical for web applications where blocking the event loop means all other requests stop being processed. For a detailed explanation of this issue, see our streaming documentation.
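To make the idea concrete, here is a small sketch of non-blocking paging built from an async generator. Everything in it is an assumption for illustration: `PAGES` is an in-memory stand-in for a paged result set, and `fetch_page_blocking` is a hypothetical synchronous page fetch. It is not the implementation behind `execute_stream`.

```python
import asyncio
from collections.abc import AsyncIterator

# Hypothetical in-memory "table", already split into pages.
PAGES = [[1, 2, 3], [4, 5, 6], [7]]


def fetch_page_blocking(page_index: int) -> list[int]:
    """Hypothetical stand-in for a synchronous page fetch."""
    return PAGES[page_index]


async def stream_rows() -> AsyncIterator[int]:
    """Yield rows one at a time, fetching each page off the event loop."""
    for page_index in range(len(PAGES)):
        # The blocking fetch runs in a worker thread; the loop stays free.
        page = await asyncio.to_thread(fetch_page_blocking, page_index)
        for row in page:
            yield row


async def collect() -> list[int]:
    """Consume the stream with `async for`, as a caller would."""
    return [row async for row in stream_rows()]


if __name__ == "__main__":
    print(asyncio.run(collect()))
```

Only one page is held at a time, and each page fetch is awaited, so other coroutines can run between pages.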
This wrapper makes the cassandra-driver compatible with async Python applications, but it's important to understand what it does and doesn't do:
What it DOES:
- ✅ Prevents blocking the event loop
- ✅ Provides async/await syntax
- ✅ Enables use with async frameworks (FastAPI, aiohttp)
- ✅ Allows concurrent operations via event loop
What it DOESN'T do:
- ❌ Make the underlying I/O truly asynchronous (still uses threads)
- ❌ Provide performance improvements over the sync driver
- ❌ Remove thread pool limitations (concurrency still bounded by driver's thread pool size)
- ❌ Eliminate thread overhead
The cassandra-driver uses blocking sockets and thread pools internally. This wrapper provides a compatibility layer but cannot change the fundamental architecture. For a detailed technical analysis, see our Why Async Wrapper documentation.
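The thread pool limitation can be illustrated with the standard library alone. In this sketch, `blocking_query` is a hypothetical stand-in for a driver call and the pool sizes are arbitrary; it shows that awaiting work that runs on a bounded pool keeps the event loop free but does not raise the concurrency ceiling.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_query(duration: float) -> float:
    """Hypothetical stand-in for a blocking driver call."""
    time.sleep(duration)
    return duration


async def run_queries(pool_size: int, n_queries: int, duration: float = 0.1) -> float:
    """Run n_queries on a pool of pool_size threads; return elapsed seconds."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        start = time.monotonic()
        await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_query, duration) for _ in range(n_queries))
        )
        return time.monotonic() - start


if __name__ == "__main__":
    small = asyncio.run(run_queries(pool_size=2, n_queries=4))
    large = asyncio.run(run_queries(pool_size=4, n_queries=4))
    print(f"2 threads: {small:.2f}s, 4 threads: {large:.2f}s")
```

With two threads, four queries need two rounds; with four threads, they finish in one. The `await` makes the waiting non-blocking, but the pool size still bounds how many queries run at once.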
- Async/await interface for all Cassandra operations
- Streaming support for memory-efficient processing of large result sets
- Automatic retry logic for SELECT queries, with idempotency checking for writes
- Connection monitoring and health checking capabilities
- Metrics collection with pluggable backends (in-memory, Prometheus)
- Type hints throughout the codebase
- Compatible with standard cassandra-driver types (Statement, PreparedStatement, etc.)
- Context manager support for proper resource cleanup
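As a rough illustration of what "retry SELECTs, check idempotency for writes" can mean in practice, the sketch below gates retries on the statement type. The `Query` class, `may_retry`, and `execute_with_retry` are hypothetical names invented for this example; the library's actual retry policy may differ in when and how it retries.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical query wrapper; is_idempotent must be set by the caller."""
    cql: str
    is_idempotent: bool = False


def may_retry(query: Query) -> bool:
    """Reads are safe to retry; writes only when marked idempotent."""
    is_read = query.cql.lstrip().upper().startswith("SELECT")
    return is_read or query.is_idempotent


async def execute_with_retry(
    execute: Callable[[Query], Awaitable[str]],
    query: Query,
    max_attempts: int = 3,
) -> str:
    """Retry on TimeoutError, but never retry a non-idempotent write."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return await execute(query)
        except TimeoutError:
            if attempt >= max_attempts or not may_retry(query):
                raise
```

The point of the gate is that replaying a non-idempotent write (a counter update, for example) after a timeout can apply it twice, so such statements fail fast instead.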
Several other async Cassandra drivers exist for Python, each with different design approaches:
- ScyllaPy: Rust-based driver with Python bindings
- Acsylla: C++ driver wrapper using Cython
- DataStax AsyncioReactor: Experimental asyncio support in the official driver
See our comparison guide for technical differences between these libraries.
- Python 3.12 or higher
- Apache Cassandra 4.0+ (for CQL protocol v5 support)
- cassandra-driver 3.29.2+
- CQL Protocol v5 or higher (see below)
async-cassandra requires CQL protocol v5 or higher for all connections. We verify this requirement after connection to ensure you get the best available protocol version.
# Recommended: Let driver negotiate to highest available
cluster = AsyncCluster(['localhost']) # Negotiates to highest (currently v5)
await cluster.connect() # Fails if negotiated < v5
# Explicit versions (v5+):
cluster = AsyncCluster(['localhost'], protocol_version=5) # Forces v5 exactly
# This raises ConfigurationError immediately:
cluster = AsyncCluster(['localhost'], protocol_version=4) # ❌ Not supported
Why We Enforce v5+ (and not v4 or older):
- Async Performance: Protocol v5 introduced features that significantly improve async operations:
  - Better streaming control for large result sets
  - Improved connection management per host
  - More efficient prepared statement handling
- Testing & Maintenance: Supporting older protocols would require:
  - Testing against Cassandra 2.x/3.x (v3/v4 protocols)
  - Handling protocol-specific quirks and limitations
  - Maintaining compatibility code for deprecated features
- Security & Features: Older protocols lack:
  - Modern authentication mechanisms
  - Proper error reporting for async contexts
  - Features required for cloud-native deployments
- Industry Standards:
  - Cassandra 4.0 (with v5) was released in July 2021
  - Major cloud providers default to v5+
  - Cassandra 3.x reached EOL in 2023
What This Means for You:
When connecting:
- Cassandra 4.0+: Automatically uses v5 (highest currently supported)
- Cassandra 3.x or older: Connection fails with:
ConnectionError: Connected with protocol v4 but v5+ is required.
Your Cassandra server only supports up to protocol v4.
async-cassandra requires CQL protocol v5 or higher (Cassandra 4.0+).
Please upgrade your Cassandra cluster to version 4.0 or newer.
Upgrade Options:
- Self-hosted: Upgrade to Cassandra 4.0+ or 5.0 and consider AxonOps to help manage your cluster
- AxonOps: Supports Cassandra 4.0+ as well as earlier releases
- AWS Keyspaces: Already supports v5
- Azure Cosmos DB: Check current documentation
- DataStax Astra: Supports v5+ by default
We understand this requirement may be inconvenient for some users, but it allows us to provide a better, more maintainable async experience while focusing our testing and development efforts on modern Cassandra deployments.
# From PyPI
pip install async-cassandra
# From source (for development)
cd libs/async-cassandra
pip install -e .
🚧 In Active Development: async-cassandra-bulk is currently under development and not yet available on PyPI. It will provide high-performance bulk operations for async-cassandra.
import asyncio
from async_cassandra import AsyncCluster

async def main():
    # Connect to Cassandra
    cluster = AsyncCluster(['localhost'])
    session = await cluster.connect()

    # Execute queries
    result = await session.execute("SELECT * FROM system.local")
    print(f"Connected to: {result.one().cluster_name}")

    # Clean up
    await session.close()
    await cluster.shutdown()

if __name__ == "__main__":
    asyncio.run(main())
For more detailed examples, see our Getting Started Guide.
We welcome contributions! Please see:
- Contributing Guidelines - How to contribute
- Developer Documentation - Development setup, testing, and architecture
Important: All contributors must sign our Contributor License Agreement (CLA) before their pull request can be merged.
- Issues: Please report bugs and feature requests on our GitHub Issues page
- Community: For questions and discussions, visit our GitHub Discussions
- Company: Learn more about AxonOps at https://axonops.com
- Getting Started Guide - Start here!
- API Reference - Detailed API documentation
- Troubleshooting Guide - Common issues and solutions
- Understanding Context Managers - Deep dive into Python context managers
- Architecture Overview - Technical deep dive
- Connection Pooling Guide - Understanding Python driver limitations
- Thread Pool Configuration - Tuning the driver's executor
- Streaming Large Result Sets - Efficiently handle large datasets
- Performance Guide - Optimization tips and benchmarks
- Retry Policies - Why we have our own retry policy
- Metrics and Monitoring - Track performance and health
- FastAPI Integration - Complete REST API example
- More Examples - Additional usage patterns
The async-cassandra library includes comprehensive examples demonstrating various features and use cases. Examples are located in the libs/async-cassandra/examples/ directory.
First, navigate to the async-cassandra directory:
cd libs/async-cassandra
Then run any example with: make example-<name>
- make example-basic - Basic connection and query execution
- make example-streaming - Memory-efficient streaming of large result sets with True Async Paging
- make example-context-safety - Demonstrates proper context manager usage and resource isolation
- make example-export-large-table - Export large tables to CSV with progress tracking
- make example-export-parquet - Export data to Parquet format with complex data types
- make example-metrics - Comprehensive metrics collection and performance monitoring
- make example-metrics-simple - Basic metrics collection example
- make example-realtime - Real-time data processing with sliding window analytics
- make example-streaming-demo - Visual demonstration that streaming doesn't block the event loop
If you have Cassandra running elsewhere:
# From the libs/async-cassandra directory:
cd libs/async-cassandra
# Single node
CASSANDRA_CONTACT_POINTS=10.0.0.1 make example-streaming
# Multiple nodes
CASSANDRA_CONTACT_POINTS=10.0.0.1,10.0.0.2,10.0.0.3 make example-streaming
# With custom port
CASSANDRA_CONTACT_POINTS=cassandra.example.com CASSANDRA_PORT=9043 make example-basic
- Basic Example: Shows fundamental operations like connecting, executing queries, and using prepared statements
- Streaming Examples: Demonstrate True Async Paging for processing millions of rows without memory issues
- Export Examples: Show how to export Cassandra data to various formats (CSV, Parquet) with progress tracking
- Metrics Examples: Illustrate performance monitoring, query tracking, and connection health checking
- Real-time Processing: Demonstrates processing time-series IoT data with concurrent operations
- Context Safety Demo: Proves that errors in one operation don't affect others when using context managers
Each example includes detailed comments explaining the concepts and best practices. Start with example-basic if you're new to the library.
async-cassandra enables your async Python application to work with Cassandra without blocking the event loop. While it doesn't eliminate the underlying driver's thread pool, it prevents those blocking operations from freezing your entire application. This is crucial for web servers where a blocked event loop means no requests can be processed.
The wrapper's primary benefit is compatibility, not raw performance. It allows you to use Cassandra in async applications like FastAPI without sacrificing the responsiveness of your service.
For performance optimization tips and understanding the limitations, see our Performance Guide.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- DataStax™ for the Python Driver for Apache Cassandra
- The Python asyncio community for inspiration and best practices
- All contributors who help make this project better
This project may contain trademarks or logos for projects, products, or services. Any use of third-party trademarks or logos is subject to those third parties' policies.
Important: This project is not affiliated with, endorsed by, or sponsored by the Apache Software Foundation or the Apache Cassandra project. It is an independent framework developed by AxonOps.
- AxonOps is a registered trademark of AxonOps Limited.
- Apache, Apache Cassandra, Cassandra, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.
- DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
This project is an independent work and has not been authorized, sponsored, or otherwise approved by the Apache Software Foundation.
This project uses the Apache License 2.0, which is compatible with the Apache Cassandra project. We acknowledge and respect all applicable licenses of dependencies used in this project.
Made with ❤️ by the AxonOps Team