GLM-4.6V didn’t feel like an upgrade when I first tried it.
It felt like someone quietly rewired the way we interact with information altogether.
Vision, documents, video, reasoning—everything running through one engine.
Here’s what surprised me the most (with the exact prompts I used): 🧵👇
✓ See Anything, Find Anything
I tossed in a totally chaotic BTS photo—people, signs everywhere—and asked:
“Extract information who play role in "Dreamers" Music video and also provide information of next MV”
Honestly expected a partial mess.
Instead, it detected, boxed, and exported each one in seconds. No confusion, no second tries.
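If you want to try the same thing programmatically, here's a minimal sketch of how the image + prompt pair could be packed into an OpenAI-style multimodal chat payload. The model name ("glm-4.6v"), the message schema, and the example URL are my assumptions for illustration, not the official SDK; check Z.ai's API docs for the real call.

```python
# Sketch only: assumes an OpenAI-compatible chat-completions schema.
# The model id "glm-4.6v" and the image URL are placeholder assumptions.
def build_vision_request(image_url: str, prompt: str, model: str = "glm-4.6v") -> dict:
    """Pack one image and one text prompt into a single chat payload."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }

payload = build_vision_request(
    "https://example.com/crowd_photo.jpg",  # placeholder image
    "Detect every person in this photo and return bounding boxes as JSON.",
)
```

From here you'd POST the payload to the provider's chat endpoint with your API key.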
✓ Turning Photos Into Clean, Structured Data
A messy handwritten note? It turned it into a clean table.
A crumpled invoice? It gave me a table with perfect row/column logic.
Even faded stamps and seals were recognized. This is the first time OCR didn’t feel like OCR.
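When the model hands back a table, it usually arrives as markdown. A small sketch of turning that reply into structured rows (the sample invoice lines below are made up for illustration):

```python
def parse_markdown_table(text: str) -> list[dict]:
    """Turn a markdown table (as a model might return one) into row dicts."""
    # Keep only the lines that look like table rows.
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip().startswith("|")]
    # Split each row on "|" and trim cell whitespace.
    rows = [[cell.strip() for cell in ln.strip("|").split("|")] for ln in lines]
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, row)) for row in body]

reply = """
| Item   | Qty | Price |
| ---    | --- | ---   |
| Coffee | 2   | 3.50  |
| Bagel  | 1   | 2.25  |
"""
rows = parse_markdown_table(reply)
# rows[0] == {"Item": "Coffee", "Qty": "2", "Price": "3.50"}
```

From there it's one step to a CSV or a spreadsheet.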