LAYRA Logo

🌌 LAYRA: The Visual-First AI Agent Engine That Sees, Understands & Acts


English | 简体中文


LAYRA is the world’s first “visual-native” AI automation engine. It sees documents like a human, preserves layout and graphical elements, and executes arbitrarily complex workflows with full Python control. From vision-driven Retrieval-Augmented Generation (RAG) to multi-step agent orchestration, LAYRA empowers you to build next-generation intelligent systems—no limits, no compromises.


📚 Table of Contents

  • 🚀 Quick Start
  • ❓ Why LAYRA?
  • ⚡️ Core Superpowers
  • 🚀 Latest Updates
  • 🖼️ Screenshots
  • 🧠 System Architecture
  • 🧰 Tech Stack
  • ⚙️ Deployment
  • 📦 Roadmap
  • 🤝 Contributing
  • 📫 Contact
  • 🌟 Star History
  • 📄 License

🚀 Quick Start

📋 Prerequisites

Before starting, ensure your system meets these requirements:

  1. Docker and Docker Compose installed
  2. NVIDIA Container Toolkit configured (for GPU acceleration)

⚙️ Installation Steps

1. Configure Environment Variables
# Clone the repository
git clone https://github.com/liweiphys/layra.git
cd layra

# Edit configuration file (modify server IP/parameters as needed)
vim .env

# Key configuration options include:
# - SERVER_IP (server IP)
# - MODEL_BASE_URL (model download source)
2. Build and Start Service
# Initial startup will download ~15GB model weights (be patient)
docker compose up -d --build

# Monitor logs in real-time (replace <container_name> with actual name)
docker compose logs -f <container_name>

Note: If you encounter issues with docker compose, try using docker-compose (with the dash) instead. Also, ensure that you're using Docker Compose v2, as older versions may not support all features. You can check your version with docker compose version or docker-compose version.

🎉 Enjoy Your Deployment!

Now that everything is running smoothly, happy building with Layra! 🚀✨ For detailed options, see the Deployment section.


❓ Why LAYRA?

🚀 Beyond RAG: The Power of Visual-First Workflows

While LAYRA's Visual RAG Engine revolutionizes document understanding, its true power lies in the Agent Workflow Engine - a visual-native platform for building complex AI agents that see, reason, and act. Unlike traditional RAG/Workflow systems limited to retrieval, LAYRA enables full-stack automation through:

⚙️ Advanced Workflow Capabilities

  • 🔄 Cyclic & Nested Structures
    Build recursive workflows with loop nesting, conditional branching, and custom Python logic - no structural limitations.

  • 🐞 Node-Level Debugging
    Inspect variables, pause/resume execution, and modify state mid-workflow with visual breakpoint debugging.

  • 👤 Human-in-the-Loop Integration
    Inject user approvals at critical nodes for collaborative AI-human decision making.

  • 🧠 Chat Memory & MCP Integration
    Maintain context across nodes with chat memory and access live information via Model Context Protocol (MCP).

  • 🐍 Full Python Execution
    Run arbitrary Python code with pip installs, HTTP requests, and custom libraries in sandboxed environments (a short sketch follows this list).

  • 🎭 Multimodal I/O Orchestration
    Process and generate hybrid text/image outputs across workflow stages.
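
The sketch below makes the Python-logic bullets above concrete: a loop over items with a plain-Python branch, of the kind a workflow node might run. All names here (run_node, the score threshold, the input shape) are hypothetical illustrations, not LAYRA's actual node API.

# Hypothetical node body: loop over items from a previous node, branch on a
# plain-Python condition, and return structured output for downstream nodes.
# Illustrative only; consult the workflow editor for the real node interface.
def run_node(items: list[dict]) -> dict:
    approved, escalated = [], []
    for item in items:                       # loops can nest freely
        if item.get("score", 0.0) >= 0.8:    # conditions are ordinary Python expressions
            approved.append(item)
        else:
            escalated.append(item)           # e.g. routed on to a human-in-the-loop node
    # pip-installed libraries and HTTP calls could also be used here
    return {"approved": approved, "escalated": escalated}

print(run_node([{"doc": "a.pdf", "score": 0.91}, {"doc": "b.pdf", "score": 0.42}]))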

🔍 Visual RAG: The Seeing Engine

Traditional RAG systems fail because they:

  • Lose layout fidelity (columns, tables, hierarchy collapse)
  • Struggle with non-text visuals (charts, diagrams, figures)
  • Break semantic continuity due to poor OCR segmentation

LAYRA changes this with pure visual embeddings:

🔍 It sees each page as a whole - just like a human reader - preserving:

  • ✅ Layout structure (headers, lists, sections)
  • ✅ Tabular integrity (rows, columns, merged cells)
  • ✅ Embedded visuals (plots, graphs, stamps, handwriting)
  • ✅ Multi-modal consistency between layout and content

Together, these engines form the first complete visual-native agent platform - where AI doesn't just retrieve information, but executes complex vision-driven workflows end-to-end.


⚡️ Core Superpowers

🔥 The Agent Workflow Engine: Infinite Execution Intelligence

Code Without Limits, Build Without Boundaries. Our Agent Workflow Engine thinks in LLMs, sees in visuals, and builds your logic in Python — no limits, just intelligence.

  • 🔄 Unlimited Workflow Creation
    Design complex custom workflows without structural constraints. Handle unique business logic, branching, loops, and conditions through an intuitive interface.

  • ⚡ Real-Time Streaming Execution (SSE)
    Observe execution results streamed live – eliminate waiting times entirely.

  • 👥 Human-in-the-Loop Integration
    Integrate user input at critical decision points to review, adjust, or direct model reasoning. Enables collaborative AI workflows with dynamic human oversight.

  • 👁️ Visual-First Multimodal RAG
    Features LAYRA’s proprietary pure visual embedding system, delivering lossless document understanding across 50+ formats (PDF, DOCX, XLSX, PPTX, etc.). The AI actively "sees" your content.

  • 🧠 Chat Memory & MCP Integration

    • MCP Integration: Access and interact with live, evolving information beyond native context windows – enhancing adaptability for long-term tasks.
    • ChatFlow Memory: Maintain contextual continuity through chat memory, enabling personalized interactions and intelligent workflow evolution.
  • 🐍 Full-Stack Python Control

    • Drive logic with arbitrary Python expressions – conditions, loops, and more
    • Execute unrestricted Python code in nodes (HTTP, AI calls, math, etc.)
    • Sandboxed environments with secure pip installs and persistent runtime snapshots
  • 🎨 Flexible Multimodal I/O
    Process and generate text, images, or hybrid outputs – ideal for cross-modal applications.

  • 🔧 Advanced Development Suite

    • Breakpoint Debugging: Inspect workflow states mid-execution
    • Reusable Components: Import/export workflows and save custom nodes
    • Nested Logic: Construct deeply dynamic task chains with loops and conditionals
  • 🧩 Intelligent Data Utilities

    • Extract variables from LLM outputs
    • Parse JSON dynamically
    • Template rendering engine
      Essential tools for advanced AI reasoning and automation (illustrated in the sketch below).
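
As a concrete illustration of these data utilities, here is a minimal, hypothetical sketch of extracting a JSON object from a raw LLM reply and rendering it into a text template. It mirrors the utilities listed above but is not LAYRA's internal implementation.

# Pull the first JSON object out of a free-form LLM reply, then render it
# into a template. Purely illustrative; names and formats are placeholders.
import json
import re

llm_reply = 'Here is the result:\n{"title": "Q3 Report", "pages": 42}\nLet me know if you need more.'

match = re.search(r"\{.*\}", llm_reply, re.DOTALL)    # grab the {...} block
data = json.loads(match.group(0)) if match else {}

template = "Document '{title}' has {pages} pages."    # str.format-style template
print(template.format(**data))                        # -> Document 'Q3 Report' has 42 pages.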

👁️ Visual RAG Engine: Beyond Text, Beyond OCR

Forget tokenization. Forget layout loss.
With pure visual embeddings, LAYRA understands documents like a human — page by page, structure and all.

LAYRA uses next-generation Retrieval-Augmented Generation (RAG) technology powered by pure visual embeddings. It treats documents not as sequences of tokens but as visually structured artifacts — preserving layout, semantics, and graphical elements like tables, figures, and charts.


🚀 Latest Updates

(2025.6.2) Workflow Engine Now Available:

  • Breakpoint Debugging: Debug workflows interactively with pause/resume functionality.
  • Unrestricted Python Customization: Execute arbitrary Python code, including external pip dependency installation, HTTP requests via the requests library, and advanced logic.
  • Nested Loops & Python-Powered Conditions: Build complex workflows with loop nesting and Python-based conditional logic.
  • LLM Integration:
    • Automatic JSON output parsing for structured responses.
    • Persistent conversation memory across nodes.
    • File uploads and knowledge-base retrieval with multi-modal RAG supporting 50+ formats (PDF, DOCX, XLSX, PPTX, etc.).

(2025.4.6) First Trial Version Now Available:
The first testable version of LAYRA has been released! Users can now upload PDF documents, ask questions, and receive layout-aware answers. We’re excited to see how this feature can help with real-world document understanding.

  • Current Features:
    • PDF batch upload and parsing functionality
    • Visual-first retrieval-augmented generation (RAG) for querying document content
    • Backend fully optimized for scalable data flow with FastAPI, Milvus, Redis, MongoDB, and MinIO

Stay tuned for future updates and feature releases!


🖼️ Screenshots

  • LAYRA's web design consistently adheres to a minimalist philosophy, making it more accessible to new users.

Explore LAYRA's powerful interface and capabilities through these visuals:

  1. Homepage - Your Gateway to LAYRA
    Homepage Screenshot

  2. Knowledge Base - Centralized Document Hub
    Knowledge Base Screenshot

  3. Interactive Dialogue - Layout-Preserving Answers
    Dialogue Screenshot

  4. Workflow Builder - Drag-and-Drop Agent Creation
    Workflow Screenshot

  5. Workflow Builder - MCP Example
    MCP Screenshots


🧠 System Architecture

LAYRA’s pipeline is designed for async-first, visual-native, and scalable document retrieval and generation.

🔍 Query Flow

The query goes through embedding → vector retrieval → answer generation (sketched below):

Query Architecture
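
A minimal conceptual sketch of this flow, with stub functions standing in for the real ColQwen2.5 embedder, the Milvus search, and the LLM call (the stubs are illustrative assumptions, not LAYRA's actual API):

from typing import List

def embed_query(question: str) -> List[List[float]]:
    # Stub: the real system produces multi-vector visual/text embeddings (ColQwen2.5)
    return [[0.1, 0.2, 0.3]]

def search_pages(vectors: List[List[float]], limit: int) -> List[str]:
    # Stub: the real system searches Milvus and returns matching page images/metadata
    return ["page_1.png", "page_2.png"][:limit]

def generate_answer(question: str, pages: List[str]) -> str:
    # Stub: the real system sends the retrieved pages to a vision-language model
    return f"Answer to {question!r} grounded in {len(pages)} retrieved pages."

def answer(question: str, top_k: int = 5) -> str:
    query_vectors = embed_query(question)             # 1. embed the query
    pages = search_pages(query_vectors, limit=top_k)  # 2. retrieve visually similar pages
    return generate_answer(question, pages)           # 3. generate a layout-aware answer

print(answer("What does the revenue table show?"))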

📤 Upload & Indexing Flow

PDFs are parsed into images and embedded visually via ColQwen2.5, with metadata and files stored in the appropriate databases (sketched below):

Upload Architecture
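
A similarly simplified sketch of the indexing flow; every helper here is a stub standing in for the component named in its comment, not LAYRA's actual code:

def render_pdf_pages(pdf_path: str) -> list[bytes]:
    # Stub for the PDF-to-image parser
    return [b"<page-1-image>", b"<page-2-image>"]

def embed_page(image: bytes) -> list[list[float]]:
    # Stub for the ColQwen2.5 visual embedder
    return [[0.0, 0.1], [0.2, 0.3]]

def index_pdf(pdf_path: str) -> None:
    for page_no, image in enumerate(render_pdf_pages(pdf_path), start=1):
        vectors = embed_page(image)
        print(f"store page image {page_no} in MinIO")                        # raw files
        print(f"store metadata for page {page_no} in MongoDB")               # metadata
        print(f"store {len(vectors)} vectors for page {page_no} in Milvus")  # embeddings

index_pdf("example.pdf")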

📤 Execute Workflow (Chatflow)

The workflow execution follows an event-driven, stateful debugging pattern with granular control:

🔄 Execution Flow

  1. Trigger & Debug Control

    • Web UI submits workflow with configurable breakpoints for real-time inspection
    • Backend validates the workflow DAG before executing code
  2. Asynchronous Orchestration

    • Kafka checks predefined breakpoints and triggers pause notifications
    • Scanner performs AST-based code analysis with vulnerability detection
  3. Secure Execution

    • Sandbox spins up ephemeral containers with file system isolation
    • Runtime state snapshots persisted to Redis/MongoDB for recovery
  4. Observability

    • Execution metrics streamed via Server-Sent Events (SSE)
    • Users inject test inputs or resume execution through the debug console (see the sketch below)

Workflow Architecture
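
A highly simplified sketch of this pattern: validate and walk the DAG, pause at breakpoints, and stream events to the UI. In the real system these steps are split across FastAPI, Kafka, the sandbox service, Redis/MongoDB and SSE; everything below is illustrative only.

from typing import Dict, Iterator, List, Set

def execute_workflow(nodes: List[Dict], breakpoints: Set[str]) -> Iterator[Dict]:
    state: Dict[str, str] = {}
    for node in nodes:                            # assume nodes are already topologically sorted
        if node["id"] in breakpoints:
            # the debugger can inspect or modify state here before resuming
            yield {"event": "paused", "node": node["id"], "state": dict(state)}
        state[node["id"]] = f"ran {node['type']}"  # stands in for sandboxed execution
        yield {"event": "node_finished", "node": node["id"]}  # streamed to the UI (SSE)

for event in execute_workflow(
    nodes=[{"id": "n1", "type": "llm"}, {"id": "n2", "type": "code"}],
    breakpoints={"n2"},
):
    print(event)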


🧰 Tech Stack

Frontend:

  • Next.js, TypeScript, TailwindCSS, Zustand, xyflow

Backend & Infrastructure:

  • FastAPI, Kafka, Redis, MySQL, MongoDB, MinIO, Milvus, Docker

Models & RAG:

  • Embedding: colqwen2.5-v0.2
  • LLM Serving: Qwen2.5-VL series (or any OpenAI-compatible model; example call below)
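
Because the LLM serving layer is OpenAI-compatible, a call to a locally served Qwen2.5-VL model can look like the sketch below. The base URL, model name and API key are placeholders for whatever your deployment exposes; this is not an excerpt from LAYRA's codebase.

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder-key")

response = client.chat.completions.create(
    model="Qwen2.5-VL-7B-Instruct",   # placeholder model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize this page."},
            {"type": "image_url", "image_url": {"url": "https://example.com/page-1.png"}},
        ],
    }],
)
print(response.choices[0].message.content)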

⚙️ Deployment

📋 Prerequisites

Before starting, ensure your system meets these requirements:

  1. Docker and Docker Compose installed
  2. NVIDIA Container Toolkit configured (for GPU acceleration)

⚙️ Installation Steps

1. Configure Environment Variables
# Clone the repository
git clone https://github.com/liweiphys/layra.git
cd layra

# Edit configuration file (modify server IP/parameters as needed)
vim .env

# Key configuration options include:
# - SERVER_IP (public server IP)
# - MODEL_BASE_URL (model download source)
2. Build and Start Service
# Initial startup will download ~15GB model weights (be patient)
docker compose up -d --build

# Monitor logs in real-time (replace <container_name> with actual name)
docker compose logs -f <container_name>

Note: If you encounter issues with docker compose, try using docker-compose (with the dash) instead. Also, ensure that you're using Docker Compose v2, as older versions may not support all features. You can check your version with docker compose version or docker-compose version.

🛠️ Service Management Commands

# Stop services (preserves data and configurations)
docker compose down

# Full cleanup (deletes databases, model weights and persistent data)
docker compose down -v

# Restart services
docker compose start

⚠️ Important Notes

  1. Initial model download may take significant time (~15GB). Monitor progress:

    docker compose logs -f model-weights-init
  2. Verify NVIDIA toolkit installation:

    nvidia-container-toolkit --version
  3. For network issues:

    • Manually download the model weights
    • Copy them into the Docker volume (typically /var/lib/docker/volumes/layra_model_weights/_data/)
    • Create an empty complete.layra file in both:
      • the colqwen2.5-base folder
      • the colqwen2.5-v0.2 folder
      (a small Python sketch for the marker files follows these notes)
    • 🚨 Critical: Verify the integrity of the downloaded weights!
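
For the marker files mentioned above, a small Python sketch (the volume path is the typical default noted earlier; adjust it to your setup):

from pathlib import Path

volume = Path("/var/lib/docker/volumes/layra_model_weights/_data")
for folder in ("colqwen2.5-base", "colqwen2.5-v0.2"):
    marker = volume / folder / "complete.layra"
    marker.parent.mkdir(parents=True, exist_ok=True)   # create the folder if it is missing
    marker.touch()                                     # empty complete.layra marker
    print(f"created {marker}")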

🔑 Key Details

  • docker compose down -v flag warning: Permanently deletes all databases and models

  • After modifying .env: Rebuild with docker compose up --build

  • GPU requirements:

    • Latest NVIDIA drivers
    • Working nvidia-container-toolkit
  • Monitoring tools:

    # Container status
    docker compose ps -a
    
    # Resource usage
    docker stats

🧪 Technical Note: All components run exclusively via Docker containers.

🎉 Enjoy Your Deployment!

Now that everything is running smoothly, happy building with Layra! 🚀✨

▶️ Future Deployment Options

In the future, we will support additional deployment methods, including Kubernetes (K8s) and other environments. More details will be provided when these options become available.


📦 Roadmap

Short-term:

  • Add Chinese Language Support (coming soon)

Long-term:

  • Our evolving roadmap adapts to user needs and AI breakthroughs. New technologies and features will be deployed continuously.

🤝 Contributing

Contributions are welcome! Feel free to open an issue or pull request if you’d like to contribute.
We are in the process of creating a CONTRIBUTING.md file, which will provide guidelines for code contributions, issue reporting, and best practices. Stay tuned!


📫 Contact

liweiphys
📧 [email protected]
🐙 github.com/liweiphys/layra
📺 bilibili: Biggestbiaoge
🔍 WeChat Official Account: LAYRA 项目
💼 Exploring Impactful Opportunities - Feel Free To Contact Me!


🌟 Star History

Star History Chart


📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for more details.


Endlessly Customizable Agent Workflow Engine - Code Without Limits, Build Without Boundaries.
