🧩 Description
We need to define and implement a minimal but extensible protocol for representing GUI interaction sequences. This protocol will unify the visual state, action metadata, and interaction history into a single structured format, enabling consistent logging, dataset creation, LLM training, planning, and replay.
This format serves as the foundation for downstream systems, including the Action Graph (#10), ModelDrivenVisualState, and the planner/LLM interfaces.
🧠 Background
OmniMCP currently:
- Captures visual state via OmniParser
- Plans actions using an LLM
- Executes actions via InputController
But there is no standardized, reusable format for representing:
- What was seen
- What was done
- Why it was done (optional)
This protocol fills that gap, similar to the structured formats used by OpenAI Operator, Adept’s AWL, and WebArena’s annotated programs.
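The three questions above map naturally onto one record per capture, plan, execute iteration. The sketch below assumes pydantic v2; the classes `VisualState`, `Action`, and `Step`, their fields, and the `capture`/`plan`/`execute` callables are hypothetical stand-ins for OmniParser, the LLM planner, and InputController, not OmniMCP's actual API.

```python
from typing import Callable, List, Optional, Tuple

from pydantic import BaseModel


class VisualState(BaseModel):
    """What was seen: a parsed screen (hypothetical fields)."""
    screenshot_path: str
    element_labels: List[str]


class Action(BaseModel):
    """What was done: a single input action (hypothetical fields)."""
    type: str                   # e.g. "click" or "type"
    target: Optional[str] = None
    text: Optional[str] = None


class Step(BaseModel):
    """One capture -> plan -> execute iteration."""
    state: VisualState
    action: Action
    rationale: Optional[str] = None  # why it was done (optional)


def run_and_record(
    capture: Callable[[], VisualState],
    plan: Callable[[VisualState], Tuple[Action, Optional[str]]],
    execute: Callable[[Action], None],
    max_steps: int,
) -> List[Step]:
    """Run the agent loop and keep a structured history of every step."""
    history: List[Step] = []
    for _ in range(max_steps):
        state = capture()                  # stand-in for OmniParser
        action, rationale = plan(state)    # stand-in for the LLM planner
        execute(action)                    # stand-in for InputController
        history.append(Step(state=state, action=action, rationale=rationale))
    return history


# Usage with trivial stand-ins for the real components.
executed: List[Action] = []
history = run_and_record(
    capture=lambda: VisualState(screenshot_path="step.png", element_labels=["Submit"]),
    plan=lambda s: (Action(type="click", target=s.element_labels[0]), "Submit the form"),
    execute=executed.append,
    max_steps=2,
)
print(history[0].model_dump_json())
```

Keeping `rationale` optional mirrors the "(optional)" above: traces recorded without any reasoning still validate.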
📦 Proposed Data Model (v0.1)
Using `pydantic` for type safety and validation.
🧪 Examples
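As an illustration of what a v0.1 data model could look like, the sketch below assumes pydantic v2. All class and field names (`UIElement`, `VisualState`, `Action`, `InteractionStep`, `InteractionTrace`, `protocol_version`) and the set of action types are hypothetical; the actual schema is what this issue is meant to define. It composes the three layers (visual state, action metadata, history) into one serializable trace and shows validation rejecting malformed input.

```python
from typing import List, Literal, Optional, Tuple

from pydantic import BaseModel, Field, ValidationError


class UIElement(BaseModel):
    """One element detected on screen (e.g. by OmniParser)."""
    id: int
    type: str                                  # e.g. "button", "text_field"
    content: str                               # visible label or text
    bounds: Tuple[float, float, float, float]  # normalized x, y, width, height


class VisualState(BaseModel):
    """What the agent saw before acting."""
    screenshot_path: Optional[str] = None
    elements: List[UIElement] = Field(default_factory=list)


class Action(BaseModel):
    """What the agent did; the allowed action types are an assumption."""
    type: Literal["click", "type", "scroll", "press_key"]
    element_id: Optional[int] = None  # target element, if any
    text: Optional[str] = None        # text to type or key to press


class InteractionStep(BaseModel):
    """One observe -> act transition, with optional reasoning."""
    index: int = Field(ge=0)
    state: VisualState
    action: Action
    rationale: Optional[str] = None


class InteractionTrace(BaseModel):
    """A full interaction sequence, versioned so readers can detect old traces."""
    protocol_version: str = "0.1"
    goal: str
    steps: List[InteractionStep] = Field(default_factory=list)


login = UIElement(id=0, type="button", content="Log in", bounds=(0.4, 0.8, 0.2, 0.05))
trace = InteractionTrace(
    goal="Log in to the dashboard",
    steps=[
        InteractionStep(
            index=0,
            state=VisualState(screenshot_path="step0.png", elements=[login]),
            action=Action(type="click", element_id=0),
            rationale="The login button is the only way forward.",
        )
    ],
)

# Round-trip through JSON: the same format can serve logging, datasets, and replay.
payload = trace.model_dump_json()
restored = InteractionTrace.model_validate_json(payload)

# Validation rejects unknown action types instead of silently recording them.
try:
    Action(type="teleport")
except ValidationError as err:
    print(f"rejected: {err.error_count()} error(s)")
```

Using `Literal` for action types makes an invalid action fail at construction time rather than surfacing later in a dataset or replay.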
✅ Acceptance Criteria
- `pydantic` models with JSON schema export
- `protocol/` directory
- `AgentExecutor` logging pipeline (optional, stub OK)
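One possible shape for the schema-export and logging criteria, assuming pydantic v2: a model's `model_json_schema()` written to a file under `protocol/`, and a stub logger that appends one JSON line per step so a run can be read back for replay or dataset building. The file names, the trimmed `Step` model, the `TraceLogger` class, and its `log_step` hook are hypothetical; how `AgentExecutor` would call the hook is not specified in this issue.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterator, Optional, Type

from pydantic import BaseModel


class Step(BaseModel):
    """Trimmed stand-in for a protocol step (hypothetical fields)."""
    index: int
    action_type: str
    element_id: Optional[int] = None
    rationale: Optional[str] = None


def export_schema(model: Type[BaseModel], path: Path) -> None:
    """Write the model's JSON Schema so non-Python tools can validate traces."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(model.model_json_schema(), indent=2), encoding="utf-8")


class TraceLogger:
    """Stub logging sink: appends one JSON object per line (JSONL)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def log_step(self, step: Step) -> None:
        # Open in append mode per step so earlier steps are never rewritten.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(step.model_dump_json() + "\n")


def load_trace(path: Path) -> Iterator[Step]:
    """Read a JSONL trace back as validated steps, e.g. for replay."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield Step.model_validate_json(line)


# Usage in a throwaway directory standing in for the repository layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    export_schema(Step, root / "protocol" / "step.schema.json")
    logger = TraceLogger(root / "traces" / "run.jsonl")
    logger.log_step(Step(index=0, action_type="click", element_id=3, rationale="Open menu"))
    logger.log_step(Step(index=1, action_type="type"))
    replayed = list(load_trace(root / "traces" / "run.jsonl"))
    schema = json.loads((root / "protocol" / "step.schema.json").read_text(encoding="utf-8"))
```

JSONL keeps logging append-only, so a run interrupted between steps still leaves every completed step readable, and the exported schema lets other tools validate the same files.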
📌 Priority
High. This is foundational to planning, replay, dataset creation, and eventual fine-tuning. Enables reuse of traces across components and simplifies future evaluation and debugging.