Perceive
CUA takes a screenshot of the current browser state and processes it as raw pixels using GPT-5.4's native vision and computer use capabilities.
Reason
The model analyzes the screenshot, weighs the task goal against the progress made so far, and plans the next action using chain-of-thought reasoning.
Act
CUA executes the planned action through a virtual mouse and keyboard: clicking buttons, typing text, scrolling, or pressing shortcuts.
Verify
A new screenshot is taken after the action. CUA checks whether the action succeeded and whether the task is progressing toward completion.
Repeat or Complete
If the task is not done, the loop continues. If a sensitive action is detected, takeover mode activates. If the task is complete, the agent reports results.
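The five steps above can be sketched as a simple control loop. This is a minimal illustration, not CUA's actual implementation: the names `run_loop`, `Action`, `Status`, the `sensitive` flag, and the four callables passed in (`take_screenshot`, `plan_next`, `execute`, `verify`) are all hypothetical stand-ins for the screenshot capture, the model call, the virtual mouse and keyboard, and the success check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Status(Enum):
    COMPLETE = "complete"      # the planner judged the task done
    TAKEOVER = "takeover"      # a sensitive action was detected
    STEP_LIMIT = "step_limit"  # safety cap on loop iterations was hit


@dataclass
class Action:
    kind: str                # e.g. "click", "type", "scroll", "keypress"
    target: str = ""         # what to click or what text to type
    sensitive: bool = False  # hypothetical flag that triggers takeover mode


def run_loop(
    goal: str,
    take_screenshot: Callable[[], str],
    plan_next: Callable[[str, str, list], Optional[Action]],
    execute: Callable[[Action], None],
    verify: Callable[[Action, str, str], bool],
    max_steps: int = 20,
) -> Tuple[Status, List[Tuple[Action, bool]]]:
    """Run perceive -> reason -> act -> verify until done, takeover, or cap."""
    history: List[Tuple[Action, bool]] = []
    screenshot = take_screenshot()                      # Perceive
    for _ in range(max_steps):
        action = plan_next(goal, screenshot, history)   # Reason
        if action is None:                              # Complete
            return Status.COMPLETE, history
        if action.sensitive:                            # Hand off before acting
            return Status.TAKEOVER, history
        execute(action)                                 # Act
        after = take_screenshot()                       # Verify with a new screenshot
        history.append((action, verify(action, screenshot, after)))
        screenshot = after                              # Repeat
    return Status.STEP_LIMIT, history
```

In this sketch a screenshot is represented as a plain string so the loop can be exercised without a browser; a real agent would pass image bytes to the model. The `max_steps` cap is an added design choice, not something the description above mentions, and exists only so a planner that never reports completion cannot loop forever.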