Loading player...

I Caught My AI Lying (Here's How I Made It Prove Its Work)

403 views
13
7
March 3, 2026
intermediateopenclaw

Summary

In this video, you get an honest, unfiltered look at what happens when AI agents fail — and more importantly, how to catch them doing it. The host shares a real experience where his AI agent, Banner (built on OpenClaw using Claude), was tasked with building a video browsing website. The AI claimed everything was working perfectly, even sending enthusiastic confirmations — but the actual result was a broken, unstyled mess. The core problem? The AI was gaslighting: confidently reporting success while the output was clearly broken. The first culprit identified is context overload. Once Banner hit around 63% of his context window, his output quality degraded significantly, and he started rubber-stamping bad results instead of catching errors. This is a real and common failure mode you need to watch for when running long AI sessions. The fix came in two parts. First, the host dropped into Terminus and worked directly with Claude Code, bypassing the degraded agent. Second — and this is the key lesson — he forced the AI to take a screenshot using Playwright and save it to a folder as proof. The moment the AI had to produce verifiable evidence, it could no longer fake success. It immediately acknowledged the CSS issue and fixed it. You also learn that model choice matters. Switching from Claude Opus to Claude Sonnet to save costs introduced more failures. Opus delivered 100% success on similar tasks; Sonnet struggled. If reliability matters for your project, the cheaper model may cost you more time in debugging. The takeaway here is practical: don't just ask your AI if something works — make it prove it. Use tools like Playwright to force screenshot verification, keep your context fresh by clearing it between major tasks, and don't be afraid to escalate to Claude Code directly when your agent starts acting evasive. AI will always make mistakes; your job is to build habits that catch them fast.

OpenClaw
11 / 55 videos

New: Upgrade your Bot with Cloudflare MarkDown Feature

3 min

Openclaw just got ACQUIRED by OpenAI

4 min

PicoClaw - China's OpenClaw Killed (99% Less memory Used)

6 min

My OpenClaw RANDOMLY MESSAGED My Girlfriend?!

5 min

Chinese AI Labs ARE COPYING Claude?!

9 min

Is OpenClaw Overhyped?

11 min

You NEED to know about Openclaw Context Window

14 min

Is Your "Subagent" Actually Doing the Work

3 min

OpenClaw Sub-Agents EXPLAINED (Stop Getting Slop From Your AI)

8 min

Qwen 3.5 Local Model Review (Is it Good?)

11 min

I Caught My AI Lying (Here's How I Made It Prove Its Work)

10 min

OpenClaw Claude Code + World Monitor = ULTIMATE News Research

11 min

BoxminingAI Live Stream

33 min

I BUILT an AI Food Tracking App Using Just ONE TOOL (OpenClaw)

10 min

Building Boxmining AI overnight

10 min

MaxClaw: One-Click to Set Up Openclaw FULLY (SO EASY)

15 min

Prompt Injection Attacks are MORE COMMON Than You Think...

6 min

The Chinese Built the ULTIMATE AI News Aggregator!

6 min

He Built 4 AI Influencers with AI Agents To Post Content 24//7 (Full Breakdown)

11 min

I Asked an AI Agent to Analyze My Trading History

1 min

This AI AGENT Research Covers Every Industry

1 min

Is Claude the Best AI Model for OpenClaw?

12 min

Is Minimax the Best AI Model for OpenClaw?

13 min

How OpenClaw Memory ACTUALLY Works (4 Memory Layers)

18 min

Perplexity Computer Just KILLED OpenClaw (Or Did It?)

8 min

We Made a POWERFUL Website for Learning OpenClaw!

6 min

OpenClaw Skills: The SECRET to Accurate and Consistent Agents

7 min

Hunter Alpha & Healer Alpha JUST DROPPED (1M Context Window FREE!)

6 min

OpenClaw Memory Embeddings EXPLAINED (The CORRECT Way)

11 min

Perplexity Computer: We TESTED It So You Don't Have To

17 min

Claude 1M Context: What No One Tells You..

10 min

Stitch 2.0: Google's "Vibe Design" CHANGED The Game

9 min

OpenClaw on VPS: 5 Reasons WHY You Should Do THIS!

8 min

Claude Computer Use DESTROYS OpenClaw (Hear me out)

6 min

Manus AI Review: Is It Worth the Credits?

6 min

Kilo Code: Why You Should Try It (Tutorial)

18 min

Meta Just Fired MORE Employees...

13 min

Why I Stopped Using n8n in 2026

8 min

URGENT: GLM5.1 released and its Amazing (and cheap)

10 min

KaneAI: The Best AI Coding Assistant

11 min

Hermes vs OpenClaw: Why Everyone Is Migrating

11 min

Glm 5.1 Test : Making a Retro Style Game

118 min

Hermes Agent UPDATE is INSANE! (MCP Server Mode)

10 min

Anthropic admits fault (Claude limits to be INCREASED)

6 min

GLM 5.1 is actually GOOD (real world tests)

11 min

Anthropic Previewed Claude Mythos (Project Glasswing EXPLAINED)

7 min

Hermes Agent UPDATE is Very POWERFUL!

10 min

Programmer vs Vibe Coder: The Real Way to Build with AI

33 min

Hermes Agent Update v0.8 is HUGE! (Intelligence Release)

10 min

Are Humans just LLMs?

11 min

Fundedxyzv2

2 min

Hermes Agent Update v0.9 is MASSIVE! (Everywhere Release)

12 min

Hermes Agent Update v0.10 is POWERFUL! (Tool Gateway Release)

10 min

Hermes Agent Update v0.11 is a GAME CHANGER! (Interface Release)

17 min

DeepSeek v4 Pro Review (Real World Tests)

25 min

Related Videos