Photos, images, and files


30-second summary

Modern AIs accept photos, screenshots, PDFs, and spreadsheets: three families of real use cases, from a paper utility bill to a forty-page contest ruleset.
Input quality is everything: an out-of-focus photo is worth less than the same text typed out by hand.
On long PDFs, the rules from the summaries lesson apply. The AI can miss middle sections: always ask for the exact quote, and verify by opening the document at that page.
What you upload is what you share. Anonymize names and third-party data before uploading, and never put credentials, credit card numbers, or identity documents into a chat.

You’ve got a paper utility bill on the table and you want to understand why it’s higher than last month. You’ve got a screenshot of an error your computer has been showing you for half an hour. You’ve got a forty-page PDF of the ruleset of a contest you’re entering, and you want to know if one detail disqualifies you.

Today, a conversational AI can handle all three cases. Not because AIs have gotten better at talking, but because they’ve learned to see. You upload the file to the chat, write the question, and get an answer based on the actual content of what you uploaded.

The technical word is multimodality: a chat that accepts not just text, but also photos, screenshots, PDFs, spreadsheets. This lesson is about how to use it. Three canonical use cases, the differences between the main platforms, what works and what doesn’t, and what to keep in mind about the privacy of what you upload.

Three canonical use cases

1. Photo of a physical document

The paper utility bill that arrives at home, the printed medical prescription, the washing machine’s instruction manual: text that lives on paper, not in a digital format within easy reach.

Take a readable photo (good light, document laid flat, straight framing, no finger on the edge) and upload it to the chat. The AI doesn’t just recognize the text: it also sees tables, small wiring diagrams, symbols printed on labels. You can ask it to read the total on a bill and explain where it comes from, translate an abbreviation on the doctor’s prescription, tell you which combination of buttons on the washing machine maps to the wool program.

Handwriting is handled too, with an asterisk: clean block letters are read well, fast cursive (including the typical doctor’s scrawled prescription) sometimes lands, sometimes doesn’t. If it matters, double-check the numbers and drug names by hand.

When the photo is clear, you save the effort of retyping data by hand. When it’s blurry or crooked, the AI can read O instead of 0 or skip a whole line. The checks that reduce these errors are further down, in What works and what doesn’t.
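If you ask the AI to transcribe the document and paste the transcription somewhere you can process it, a quick scan can point out the places where a letter may have taken the place of a digit. The sketch below is only an illustration: the function name, the list of lookalike letters, and the sample line are all assumptions, and a flagged token is a hint to re-read the paper, not proof of an error.

```python
import re

# Letters that are easily mistaken for digits in a blurry photo (assumed list).
LOOKALIKES = set("OoIlSB")


def suspicious_tokens(text: str) -> list[str]:
    """Return tokens that mix digits with letters often confused for digits."""
    flagged = []
    for raw in re.findall(r"[A-Za-z0-9.,]+", text):
        token = raw.strip(".,")  # drop sentence punctuation around the token
        has_digit = any(ch.isdigit() for ch in token)
        has_lookalike = any(ch in LOOKALIKES for ch in token)
        if has_digit and has_lookalike:
            flagged.append(token)
    return flagged


if __name__ == "__main__":
    # Hypothetical transcription of one line of a bill.
    print(suspicious_tokens("Total due: 1O4.50 EUR, meter 00217"))
```

It will also flag legitimate codes such as a product reference that really contains a letter, so treat the output as a list of spots to compare against the original.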

2. Screenshots

The error message your computer throws at you, the confusing web page where you can’t find the right button, the chart in an article you’ve read but don’t know how to interpret.

A screenshot is more readable than a paper photo: no lighting, no framing, it’s already digital and high-resolution. It fits three kinds of question:

What does this error mean? You paste it into the chat and ask for the meaning and how to fix it.
How do I get from here to X? Screenshot of the interface, the AI walks you through the flow step by step.
What does this chart show? An explanation in words of a chart where the numbers are implied but not listed.

For software errors, a screenshot beats describing it in prose: an error message can change meaning with a single word, and copying it by hand introduces slips.

3. PDFs, slides, spreadsheets

This case differs most from the first two: not a photo, but a whole file. The building’s rules, the employment contract, the course reader, the company’s annual report, an expense spreadsheet.

An important distinction: digital PDFs (generated by Word or by a web form) contain the text as text, and the AI reads it reliably. Scanned PDFs (paper pages photographed and saved as images inside a PDF) are case 1 in disguise: the AI reads them, but the same cautions from photos apply to readability, lighting, and framing of the scan.

You upload it and ask questions about the content. You aren’t asking for a generic summary (that’s the topic of Summarize a long document): here you ask specific things.

You

I'm attaching the ruleset for the illustration contest. I entered last year and made it to the finalists. Is there a passage in the ruleset that disqualifies previous finalists, or can I enter again?

I've searched the ruleset, and it's good news for you. In Article 4 ("Eligibility"), point 4.3, the text explicitly excludes only "winners of the last three editions". There's no reference to finalists, honorable mentions, or unplaced candidates. In Article 6 ("Grounds for exclusion"), the points cover late submissions, materials not matching the specs, or the use of non-original works. Nothing that touches your situation. If you want, I can give you the exact quotes from the two articles so you can read the full wording: sometimes a single adjective changes the meaning.

The AI searches inside the document, isolates the relevant passage, reports it back. For short documents (a few pages), it works straightforwardly. For long documents (tens or hundreds of pages), all the cautions from Summarize a long document apply: it may skip passages, misattribute a quote, or answer based only on the first and last pages.

How it works on ChatGPT, Claude, Gemini

The three main platforms all accept images and files, but with practical differences. Instead of a table that goes stale in two months, four criteria matter, the same ones on any platform.

Which formats it accepts. Images (JPEG, PNG) everywhere. PDFs almost everywhere. Office files (Word, Excel, PowerPoint) on all three, with some differences on less common formats (some older .doc or .xls files sometimes get converted or rejected). Scans of paper documents: all three accept them, if the photo is readable.

Size limits. The maximum size of a single file ranges from tens of MB to over 100 MB, depending on the platform and your plan. For very long PDFs (hundreds of pages), the platform sometimes asks you to extract only the relevant sections before uploading.

Number of files per message. ChatGPT and Gemini accept several in one shot. Claude is more conservative on count, but tends to handle single long PDFs better.

Free plans versus paid plans. Some features (uploading large PDFs, attaching multiple files to a chat, using vision at high volume) exist only on paid plans (Plus, Pro, Team). On free plans the basic features are available, but with tighter daily limits.

The exact numbers change often. When you need a specific operation, search the platform’s official support page (“ChatGPT file uploads”, “Claude file types”, “Gemini vision capabilities”) and read the current limits. Three minutes of searching up front save half an hour of failed attempts.

What works and what doesn’t

Photos of documents: they work when the photo is readable to a human eye. If you, leaning closer to the screen, struggle to read a number, so will the AI. Three checks that help a lot: diffused light (no direct flash burning out half the page), flat document (not photographed at an angle), full framing (no finger on the edge or corners cut off).

Screenshots: almost always reliable for reading text. Less reliable when you mix many elements. A screenshot of a dashboard full of buttons with similar colors can confuse the AI on which is active and which isn’t. If the distinction matters, screenshot the single section instead of the whole screen.

Short PDFs (1-20 pages): they work very well. The AI catches structure and detail.

Long PDFs (50+ pages): handle with caution. Two recurring problems: the AI can get lost in the middle (it remembers the start and end better than the middle), and it can invent page references that look real but don’t exist. When you ask for a quote, always ask “give me the exact sentence and the page it’s on”, then verify by opening the PDF at that page.
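If you’re comfortable with a little Python, the “verify the quote at that page” step can be sketched as follows. This is a minimal illustration, not a tool any platform provides: it assumes you already have the text of each page as a list of strings (from whatever PDF text extractor you use), and the function names and sample pages are made up for the example.

```python
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase, so line breaks don't cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_quote(pages: list[str], quote: str) -> list[int]:
    """Return the 1-based numbers of the pages whose text contains the quote."""
    needle = normalize(quote)
    if not needle:
        return []
    return [n for n, page in enumerate(pages, start=1) if needle in normalize(page)]


def check_citation(pages: list[str], quote: str, claimed_page: int) -> str:
    """Compare the page the AI claimed with where the quote actually appears."""
    found = find_quote(pages, quote)
    if claimed_page in found:
        return "confirmed"
    if found:
        return f"quote exists, but on page(s) {found}"
    return "quote not found"


if __name__ == "__main__":
    # Hypothetical page texts standing in for an extracted ruleset.
    pages = [
        "Article 1. Purpose of the contest.",
        "Article 4. Eligibility.\n4.3 Excluded: winners of the\nlast three editions.",
        "Article 6. Grounds for exclusion.",
    ]
    print(check_citation(pages, "winners of the last three editions", 2))
```

A "quote not found" result doesn’t always mean the AI invented the sentence: curly versus straight quotes or a word hyphenated across a line break can also defeat an exact match. Either way, it’s the signal to open the PDF and look yourself.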

Spreadsheets: they work for lookups and qualitative questions (“what’s the highest line item?”, “which rows have this field empty?”). Caution on calculations: the AI can botch sums or rounding. When the answer is a number, recompute it separately, or ask the AI to show you the formula instead of just the result.
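One way to recompute a total separately, assuming you can export the sheet as CSV, is a few lines of Python. The column names and figures below are invented for illustration; the sketch uses `Decimal` rather than floating point so that cents add up exactly.

```python
import csv
import io
from decimal import Decimal


def column_total(csv_text: str, column: str) -> Decimal:
    """Sum one column of a CSV export, skipping blank cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    total = Decimal("0")
    for row in reader:
        cell = (row.get(column) or "").strip()
        if cell:
            total += Decimal(cell)
    return total


if __name__ == "__main__":
    # Hypothetical expense sheet; "Internet" has an empty amount on purpose.
    expenses = (
        "item,amount\n"
        "Rent,850.00\n"
        "Groceries,312.45\n"
        "Electricity,96.10\n"
        "Internet,\n"
    )
    print(column_total(expenses, "amount"))
```

If the figure you get differs from the one in the chat, trust neither blindly: check whether the AI skipped a row or included a subtotal line.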

Privacy

What you upload to a chat is what you share with the AI. If the photo of the bill contains name, address, account number (the ID your utility uses to identify your line), that data enters the conversation.

When you’re the account holder, it’s comparable to typing your own data into an online form. When the photo is of someone else’s utility bill (a family member, a client), someone else’s medical prescription, or a contract with third-party names, you’re sharing data that isn’t only yours.

Two practical moves:

Anonymize before uploading, when you can. A slip of paper over the name before you take the photo, a digital crop on the photo preview, or an app like Microsoft Lens or Adobe Scan that lets you black out portions. If all you care about is a single number, crop the screenshot down to that one line.
Never upload credentials (logins, passwords), credit card numbers, identity documents (passport, driver’s license, ID card). For these, anonymization isn’t enough: they shouldn’t end up in a chat at all, not even yours.
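When what you’re about to share is text rather than a photo, a rough pre-check for card-like numbers can be sketched like this. It is an illustration under stated assumptions: the function names are invented, the pattern only looks for 13 to 19 digits optionally separated by spaces or hyphens, and the Luhn checksum used here recognizes numbers shaped like payment cards, not passwords or identity documents.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_like_numbers(text: str) -> list[str]:
    """Return substrings that look like card numbers and pass the Luhn check."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(match.group())
    return hits


if __name__ == "__main__":
    # 4111 1111 1111 1111 is a widely used test number, not a real card.
    print(find_card_like_numbers("Paid with card 4111 1111 1111 1111 on 12 March."))
```

An empty result is not a green light: the check misses anything that isn’t a card-shaped number, so reading the text yourself before pasting it stays the real safeguard.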

The full picture of what the AI sees when you talk to it, and what happens after your message, is in What you share when you use AI.

What comes next

In this lesson you’ve seen what the AI can do with the world around you: paper, screens, files. The next lesson, Things NOT to do, is the complement: cases where it’s better to keep your distance, even when the temptation is strong.


Open manual, written by a real person, in collaboration with AI.