Hey friends,
I didn’t think I’d use this feature much at first.
But over the past few months, it’s quietly become one of the most useful things AI can do for me.
Not generate content. Not brainstorm ideas.
But actually see what’s on my screen and help me make sense of it.
I’m not talking about anything futuristic. I mean something simple, practical, and already here:
Drop a screenshot. Ask a question. Get the answer you need.
That’s the entire learning curve.
From turning complex diagrams into easy explanations, to summarising a whiteboard, to pulling colours from a photo, models like ChatGPT, Claude, and Gemini are doing things I didn’t realize were even possible.
Let me show you exactly how.
(No audio overview this week, but reply to this email if that's something you want me to bring back.)
🧠 What Is AI Vision (In Plain English)?
AI vision = your AI assistant can now see and read.
It’s powered by multimodal models that process both images and text at the same time.
Meaning: you can drop in a dashboard, a photo, a sketch, or a diagram, and the AI will analyze it, answer questions about it, and even take actions with that data.
No setup. No code. No special tools.
If you’ve got a camera or a screenshot button, you’re ready.
Here are some of my favourite use cases.
🛠 6 Use Cases You Can Try Right Now (Zero Experience Needed)
Let’s make this real. Here’s what you can do today and how I’ve actually used it:
1. Screenshot → Instant Summary + Insights
Take a quick screenshot of your social media analytics, sales dashboard, or product metrics.
Drop it into ChatGPT 4o.
Prompt:
“Summarise what’s happening in this data and tell me one actionable takeaway I should focus on this week.”
(I use this for my Threads insights)
It’s the fastest way to go from overwhelmed by numbers to knowing your next move.
2. Inspiration Image → Brand Color Palette
Found an aesthetic image you like on Instagram or Etsy?
Upload it to ChatGPT.
Prompt:
“Give me the hex colour values from this image.”
It nails the colour selection and even picks up on the not-so-visible colours. It surprises me every time how accurate it is.
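Totally optional, and only for the curious: if you ever want to sanity-check the hex values the AI gives you, here is a minimal Python sketch of the idea. It is not how ChatGPT does it. It simply counts the most common pixel colours and converts them to hex codes. The function names and the sample pixels are made up for illustration, and loading pixels from a real photo would need an image library such as Pillow, which I've left out to keep this self-contained.

```python
from collections import Counter


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of 0-255 integers to a '#RRGGBB' string."""
    r, g, b = rgb
    return f"#{r:02X}{g:02X}{b:02X}"


def top_colours(pixels, count=5):
    """Return the `count` most frequent pixel colours as hex strings."""
    most_common = Counter(pixels).most_common(count)
    return [rgb_to_hex(rgb) for rgb, _ in most_common]


# Hypothetical sample standing in for the pixels of a real image.
sample_pixels = [(255, 87, 51)] * 3 + [(20, 33, 61)] * 2 + [(252, 163, 17)]

print(top_colours(sample_pixels, count=2))  # ['#FF5733', '#14213D']
```

Real photos contain thousands of near-identical shades, so exact counting like this is only a rough cross-check. The AI route is still the faster way to get a usable palette.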
3. Whiteboard Snapshot → Actionable Meeting Minutes
After a brainstorm session, snap a photo of your whiteboard or sticky notes.
Drop it into Google Gemini.
Prompt:
“Summarize this into 5 key takeaways and next steps.”
What used to take 30 minutes of typing, done in 10 seconds.
4. Complex Diagram → Simple Explanation
Let's say you find a chart or diagram in a report you're reading.
Upload it to ChatGPT.
Prompt:
“Explain this in plain English, in 5 bullet points, like you would to a colleague.”
I’ve used this for complex graphs, dashboards, and historical property growth charts. It’s a lifesaver!
5. Screenshot of an Error → Instant Fix
When you hit a weird error on your website, app, or even your smart home device, simply snap a screenshot and upload it to ChatGPT or Gemini.
Prompt:
“What does this error mean, and how can I fix it?”
Instantly, you’ve got step-by-step troubleshooting. No more trawling forums for hours.
6. Everyday Photos → Personal Help (Travel & Home Life)
For those times when you’re dealing with a leaky tap, a mystery tool, or a DIY headache, just take a photo and let AI guide you.
Prompt:
“What am I looking at, and how can I fix or use this?”
Traveling overseas and facing an unfamiliar sign or menu?
Upload the photo and prompt:
“Translate this and tell me what it means. Suggest what I should do next.”
These models are shockingly good at understanding nearly anything you show them, which makes life’s random little moments a lot less stressful.
🤖 AI Vision = The Gateway to Real Agents
Here’s why this matters more than it seems:
The moment you trust AI to see, you unlock a new kind of delegation.
Browser agents already use visual inputs to click buttons, copy values, and complete form tasks.
Customer service agents can process screenshots to resolve tech issues faster, with far less back-and-forth.
This is where agents start to feel less like “bots” and more like helpful collaborators.
And it all starts by showing them something and asking, “What do you see?”
🚀 Try This Today
Grab a photo of your latest handwritten notes. A whiteboard from your last call. Even a scribbled to-do list.
Drop it into ChatGPT.
Prompt:
“Help me turn this into a plan and a mind map.”
No code. No templates. Just one image and one prompt.
It’s the simplest way to turn visual chaos into structure, and get one step closer to letting AI actually assist your work.
🫡 Final Thought
The best tools don’t try to out-think you.
They help you see your ideas more clearly and act on them faster.
Letting AI understand images isn’t about shortcuts or hype; it’s about making your daily work just a little bit easier.
From notes on your desk to complex diagrams, it’s quietly taking the busywork out of your day, so you can focus on what actually matters.
Give it a try, see what changes for you, and remember: sometimes, progress looks like one less thing to worry about.
To seeing (and understanding) more,
​Nahid