LAYRA is the world’s first “visual-native” AI automation engine. It sees documents like a human, preserves layout and graphical elements, and executes arbitrarily complex workflows with full Python control. From vision-driven Retrieval-Augmented Generation (RAG) to multi-step agent orchestration, LAYRA empowers you to build next-generation intelligent systems—no limits, no compromises.
- 🚀 Quick Start
- ❓ Why LAYRA?
- ⚡️ Core Superpowers
- 🚀 Latest Updates
- 🖼️ Screenshots
- 🧠 System Architecture
- 🧰 Tech Stack
- ⚙️ Deployment
- 📦 Roadmap
- 🤝 Contributing
- 📫 Contact
- 🌟 Star History
- 📄 License
Before starting, ensure your system meets these requirements:
- Docker and Docker Compose installed
- NVIDIA Container Toolkit configured (for GPU acceleration)
```bash
# Clone the repository
git clone https://github.com/liweiphys/layra.git
cd layra

# Edit configuration file (modify server IP/parameters as needed)
vim .env
# Key configuration options include:
# - SERVER_IP (server IP)
# - MODEL_BASE_URL (model download source)

# Initial startup will download ~15GB of model weights (be patient)
docker compose up -d --build

# Monitor logs in real-time (replace <container_name> with the actual name)
docker compose logs -f <container_name>
```
Note: If you encounter issues with `docker compose`, try using `docker-compose` (with the dash) instead. Also, ensure that you're using Docker Compose v2, as older versions may not support all features. You can check your version with `docker compose version` or `docker-compose version`.
Now that everything is running smoothly, happy building with Layra! 🚀✨ For detailed options, see the Deployment section.
While LAYRA's Visual RAG Engine revolutionizes document understanding, its true power lies in the Agent Workflow Engine - a visual-native platform for building complex AI agents that see, reason, and act. Unlike traditional RAG/Workflow systems limited to retrieval, LAYRA enables full-stack automation through:
- 🔄 **Cyclic & Nested Structures**: Build recursive workflows with loop nesting, conditional branching, and custom Python logic, with no structural limitations.
- 🐞 **Node-Level Debugging**: Inspect variables, pause/resume execution, and modify state mid-workflow with visual breakpoint debugging.
- 👤 **Human-in-the-Loop Integration**: Inject user approvals at critical nodes for collaborative AI-human decision making.
- 🧠 **Chat Memory & MCP Integration**: Maintain context across nodes with chat memory and access live information via the Model Context Protocol (MCP).
- 🐍 **Full Python Execution**: Run arbitrary Python code with `pip` installs, HTTP requests, and custom libraries in sandboxed environments.
- 🎭 **Multimodal I/O Orchestration**: Process and generate hybrid text/image outputs across workflow stages.
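As a hedged illustration of the kind of logic a Python code node can run (this is not LAYRA's actual node API; the function names and scoring rule are invented for the example), a loop with a conditional branch that sorts documents into accepted and rejected buckets might look like:

```python
import json

def score_document(doc_text: str) -> float:
    # Stand-in for an LLM relevance call; returns a score in [0, 1].
    return min(len(doc_text) / 1000, 1.0)

def node_logic(docs: list[str], threshold: float = 0.5) -> dict:
    """Loop over inputs, branch on a Python condition, emit a JSON-able dict."""
    accepted, rejected = [], []
    for doc in docs:                          # loop nesting is plain Python
        if score_document(doc) >= threshold:  # conditional branching
            accepted.append(doc)
        else:
            rejected.append(doc)
    return {"accepted": accepted, "rejected": rejected}

result = node_logic(["short", "a" * 1500])
print(json.dumps({k: len(v) for k, v in result.items()}))
```

Because a node's body is ordinary Python, the same pattern extends to HTTP calls, `pip`-installed libraries, and nested loops.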
Traditional RAG systems fail because they:
- ❌ Lose layout fidelity (columns, tables, hierarchy collapse)
- ❌ Struggle with non-text visuals (charts, diagrams, figures)
- ❌ Break semantic continuity due to poor OCR segmentation
LAYRA changes this with pure visual embeddings:
🔍 It sees each page as a whole - just like a human reader - preserving:
- ✅ Layout structure (headers, lists, sections)
- ✅ Tabular integrity (rows, columns, merged cells)
- ✅ Embedded visuals (plots, graphs, stamps, handwriting)
- ✅ Multi-modal consistency between layout and content
Together, these engines form the first complete visual-native agent platform - where AI doesn't just retrieve information, but executes complex vision-driven workflows end-to-end.
Code Without Limits, Build Without Boundaries. Our Agent Workflow Engine thinks with LLMs, sees through visuals, and builds your logic in Python: no limits, just intelligence.
- 🔄 **Unlimited Workflow Creation**: Design complex custom workflows without structural constraints. Handle unique business logic, branching, loops, and conditions through an intuitive interface.
- ⚡ **Real-Time Streaming Execution (SSE)**: Observe execution results streamed live and eliminate waiting times entirely.
- 👥 **Human-in-the-Loop Integration**: Integrate user input at critical decision points to review, adjust, or direct model reasoning. Enables collaborative AI workflows with dynamic human oversight.
- 👁️ **Visual-First Multimodal RAG**: Features LAYRA's proprietary pure visual embedding system, delivering lossless document understanding across 50+ formats (PDF, DOCX, XLSX, PPTX, etc.). The AI actively "sees" your content.
- 🧠 **Chat Memory & MCP Integration**:
  - **MCP Integration**: Access and interact with live, evolving information beyond native context windows, enhancing adaptability for long-term tasks.
  - **ChatFlow Memory**: Maintain contextual continuity through chat memory, enabling personalized interactions and intelligent workflow evolution.
- 🐍 **Full-Stack Python Control**:
  - Drive logic with arbitrary Python expressions: conditions, loops, and more
  - Execute unrestricted Python code in nodes (HTTP, AI calls, math, etc.)
  - Sandboxed environments with secure `pip` installs and persistent runtime snapshots
- 🎨 **Flexible Multimodal I/O**: Process and generate text, images, or hybrid outputs, ideal for cross-modal applications.
- 🔧 **Advanced Development Suite**:
  - **Breakpoint Debugging**: Inspect workflow states mid-execution
  - **Reusable Components**: Import/export workflows and save custom nodes
  - **Nested Logic**: Construct deeply dynamic task chains with loops and conditionals
- 🧩 **Intelligent Data Utilities**: Extract variables from LLM outputs, parse JSON dynamically, and render templates. Essential tools for advanced AI reasoning and automation.
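Because execution results are streamed as standard Server-Sent Events, any SSE client can consume them. A minimal stdlib-only Python sketch; the endpoint path is a hypothetical placeholder, not LAYRA's documented API:

```python
import urllib.request

def parse_sse_line(raw: str):
    """Return the payload of one SSE `data:` line, or None for other lines."""
    if raw.startswith("data:"):
        return raw[len("data:"):].strip()
    return None

def stream_workflow_events(base_url: str, task_id: str):
    """Yield SSE payloads as the server emits them."""
    # Hypothetical endpoint path; check the running service for the real one.
    url = f"{base_url}/workflow/stream/{task_id}"
    with urllib.request.urlopen(url) as resp:
        for raw in resp:
            data = parse_sse_line(raw.decode("utf-8").rstrip("\n"))
            if data is not None:
                yield data
```

Iterating the generator prints node results the moment each one arrives, rather than after the whole workflow finishes.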
Forget tokenization. Forget layout loss.
With pure visual embeddings, LAYRA understands documents like a human — page by page, structure and all.
LAYRA uses next-generation Retrieval-Augmented Generation (RAG) technology powered by pure visual embeddings. It treats documents not as sequences of tokens but as visually structured artifacts — preserving layout, semantics, and graphical elements like tables, figures, and charts.
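ColQwen2.5 belongs to the ColPali family of late-interaction visual retrievers: each page image is embedded as many patch-level vectors, and a page's relevance is the sum, over query tokens, of the best-matching patch similarity (MaxSim). A NumPy sketch of that scoring rule, with the embedding model itself omitted and random vectors standing in:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    """Late-interaction score: sum over query tokens of the maximum
    cosine similarity against any page-patch vector."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    p = page_vecs / np.linalg.norm(page_vecs, axis=1, keepdims=True)
    sim = q @ p.T  # (num_query_tokens, num_patches)
    return float(sim.max(axis=1).sum())

# Rank pages by score (real vectors would come from the embedding model):
rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))
pages = [rng.normal(size=(64, 128)) for _ in range(3)]
ranking = sorted(range(3), key=lambda i: maxsim_score(query, pages[i]), reverse=True)
```

Because scoring happens at the patch level, a query token about a chart can match the chart's region directly, without any OCR step in between.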
(2025.6.2) Workflow Engine Now Available:
- **Breakpoint Debugging**: Debug workflows interactively with pause/resume functionality.
- **Unrestricted Python Customization**: Execute arbitrary Python code, including external `pip` dependency installation, HTTP requests via `requests`, and advanced logic.
- **Nested Loops & Python-Powered Conditions**: Build complex workflows with loop nesting and Python-based conditional logic.
- **LLM Integration**:
  - Automatic JSON output parsing for structured responses.
  - Persistent conversation memory across nodes.
  - File uploads and knowledge-base retrieval with multimodal RAG supporting 50+ formats (PDF, DOCX, XLSX, PPTX, etc.).
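Automatic JSON parsing of LLM output typically has to tolerate prose and Markdown fences around the payload. A minimal sketch of that kind of tolerant parser, shown as an illustration rather than LAYRA's internal implementation:

```python
import json
import re

def parse_llm_json(text: str) -> dict:
    """Extract the first JSON object from an LLM reply, tolerating
    surrounding prose and ```json fences."""
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else None
    if candidate is None:
        brace = re.search(r"\{.*\}", text, re.DOTALL)
        candidate = brace.group(0) if brace else text
    return json.loads(candidate)

reply = 'Sure! Here is the result:\n```json\n{"answer": 42, "source": "p. 3"}\n```'
print(parse_llm_json(reply)["answer"])  # → 42
```

Downstream nodes can then reference the extracted fields as ordinary workflow variables.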
(2025.4.6) First Trial Version Now Available:
The first testable version of LAYRA has been released! Users can now upload PDF documents, ask questions, and receive layout-aware answers. We’re excited to see how this feature can help with real-world document understanding.
- Current Features:
- PDF batch upload and parsing functionality
- Visual-first retrieval-augmented generation (RAG) for querying document content
- Backend fully optimized for scalable data flow with FastAPI, Milvus, Redis, MongoDB, and MinIO
Stay tuned for future updates and feature releases!
Explore LAYRA's powerful interface and capabilities through these visuals:
LAYRA’s pipeline is designed for async-first, visual-native, and scalable document retrieval and generation.
The query goes through embedding → vector retrieval → answer generation:
PDFs are parsed into images and embedded visually via ColQwen2.5, with metadata and files stored in appropriate databases:
The workflow execution follows an event-driven, stateful debugging pattern with granular control:
- **Trigger & Debug Control**
  - Web UI submits the workflow with configurable breakpoints for real-time inspection
  - Backend validates the workflow DAG before executing any code
- **Asynchronous Orchestration**
  - Kafka checks predefined breakpoints and triggers pause notifications
  - Scanner performs AST-based code analysis with vulnerability detection
- **Secure Execution**
  - Sandbox spins up ephemeral containers with file-system isolation
  - Runtime state snapshots are persisted to Redis/MongoDB for recovery
- **Observability**
  - Execution metrics streamed via Server-Sent Events (SSE)
  - Users inject test inputs and resume execution through debug consoles
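The control flow above can be sketched as a tiny breakpoint-aware executor: validate the DAG via topological sort, run nodes in order, snapshot state, and pause at breakpoints. This is a simplified stand-in for the real Kafka/Redis/sandbox machinery, with invented node names:

```python
from graphlib import TopologicalSorter

def run_workflow(dag, handlers, breakpoints=frozenset()):
    """Execute a DAG of nodes; pause before any node in `breakpoints`.

    dag:      {node: set of predecessor nodes}
    handlers: {node: fn(state) -> new state}
    Returns (state, paused_at), where paused_at is None on completion.
    """
    order = list(TopologicalSorter(dag).static_order())  # raises on cycles
    state = {}
    for node in order:
        if node in breakpoints:
            snapshot = dict(state)   # recoverable snapshot of runtime state
            return snapshot, node    # pause here for inspection
        state = handlers[node](state)
    return state, None

dag = {"load": set(), "embed": {"load"}, "answer": {"embed"}}
handlers = {
    "load": lambda s: {**s, "pages": 3},
    "embed": lambda s: {**s, "vectors": s["pages"] * 64},
    "answer": lambda s: {**s, "done": True},
}
state, paused = run_workflow(dag, handlers, breakpoints={"answer"})
```

Resuming is then just re-running from the snapshot with the breakpoint cleared, which is what makes mid-workflow state edits possible.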
**Frontend:** `Next.js`, `TypeScript`, `TailwindCSS`, `Zustand`, `xyflow`

**Backend & Infrastructure:** `FastAPI`, `Kafka`, `Redis`, `MySQL`, `MongoDB`, `MinIO`, `Milvus`, `Docker`

**Models & RAG:**
- Embedding: `colqwen2.5-v0.2`
- LLM Serving: Qwen2.5-VL series (or any OpenAI-compatible model)
Before starting, ensure your system meets these requirements:
- Docker and Docker Compose installed
- NVIDIA Container Toolkit configured (for GPU acceleration)
```bash
# Clone the repository
git clone https://github.com/liweiphys/layra.git
cd layra

# Edit configuration file (modify server IP/parameters as needed)
vim .env
# Key configuration options include:
# - SERVER_IP (public server IP)
# - MODEL_BASE_URL (model download source)

# Initial startup will download ~15GB of model weights (be patient)
docker compose up -d --build

# Monitor logs in real-time (replace <container_name> with the actual name)
docker compose logs -f <container_name>
```
Note: If you encounter issues with `docker compose`, try using `docker-compose` (with the dash) instead. Also, ensure that you're using Docker Compose v2, as older versions may not support all features. You can check your version with `docker compose version` or `docker-compose version`.
```bash
# Stop services (preserves data and configurations)
docker compose down

# Full cleanup (deletes databases, model weights, and persistent data)
docker compose down -v

# Restart services
docker compose start
```
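Since the first startup spends a long time downloading model weights, a small readiness probe can tell you when the stack is actually serving. A stdlib-only sketch; the URL and port are assumptions about a default local deployment, so adjust them to match your `.env`:

```python
import time
import urllib.error
import urllib.request

def wait_until_ready(url: str, timeout_s: float = 600, interval_s: float = 5) -> bool:
    """Poll `url` until it returns HTTP 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service still starting (e.g. model weights downloading)
        time.sleep(interval_s)
    return False

# Hypothetical local address; use the SERVER_IP/port from your .env:
# if wait_until_ready("http://localhost:80"):
#     print("LAYRA is up")
```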
- Initial model download may take significant time (~15GB). Monitor progress with:

  ```bash
  docker compose logs -f model-weights-init
  ```

- Verify the NVIDIA toolkit installation:

  ```bash
  nvidia-container-toolkit --version
  ```

- For network issues:
  - Manually download the model weights
  - Copy them to the Docker volume (typically at `/var/lib/docker/volumes/layra_model_weights/_data/`)
  - Create an empty `complete.layra` file in both the `colqwen2.5-base` folder and the `colqwen2.5-v0.2` folder
  - 🚨 Critical: Verify the integrity of the downloaded weights!
- `docker compose down -v` flag warning: this permanently deletes all databases and models
- After modifying `.env`: rebuild with `docker compose up --build`
- GPU requirements:
  - Latest NVIDIA drivers
  - A working `nvidia-container-toolkit`
- Monitoring tools:

  ```bash
  # Container status
  docker compose ps -a
  # Resource usage
  docker stats
  ```
🧪 Technical Note: All components run exclusively via Docker containers.
Now that everything is running smoothly, happy building with Layra! 🚀✨
In the future, we will support multiple deployment methods, including Kubernetes (K8s) and other environments. More details will be provided when these deployment options become available.
Short-term:
- Add Chinese Language Support (coming soon)
Long-term:
- Our evolving roadmap adapts to user needs and AI breakthroughs. New technologies and features will be deployed continuously.
Contributions are welcome! Feel free to open an issue or pull request if you’d like to contribute.
We are in the process of creating a CONTRIBUTING.md file, which will provide guidelines for code contributions, issue reporting, and best practices. Stay tuned!
liweiphys
📧 [email protected]
🐙 github.com/liweiphys/layra
📺 bilibili: Biggestbiaoge
🔍 WeChat official account: LAYRA 项目
💼 Exploring Impactful Opportunities - Feel Free To Contact Me!
This project is licensed under the Apache License 2.0. See the LICENSE file for more details.
Endlessly Customizable Agent Workflow Engine - Code Without Limits, Build Without Boundaries.