🎯 Learning Objectives
Understand how OCR can extract both text and image annotations from PDF files.
Store extracted images in Supabase Storage and manage their public URLs.
Create and use a Postgres metadata table to link image IDs, annotations, and URLs.
Chunk and embed OCR text into a Supabase Vector Store for semantic search.
Build an AI Agent workflow that retrieves both text chunks and related image annotations.
Use GPT-4o as the Agent brain to combine information from text chunks (retrieved with a Reranker) and image annotations for more accurate answers.
Deliver a complete chat-based assistant that can reason over documents with both text and annotated images.
Register at https://mistral.ai/ and get your API key
Uploading the PDF file to Mistral OCR (you can check the official documentation here.)
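As a minimal sketch, the upload is one curl call to Mistral's files endpoint (sample.pdf is a placeholder; MISTRAL_API_KEY holds your key). The response includes the file's id, which the next step needs:

```bash
# Upload the PDF to Mistral for OCR processing.
curl https://api.mistral.ai/v1/files \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F purpose="ocr" \
  -F file="@sample.pdf"
```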
Getting the signed URL
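A sketch of exchanging the returned file id for a temporary signed URL (<FILE_ID> is a placeholder; per Mistral's docs the expiry query parameter is in hours):

```bash
# Get a signed URL that the OCR endpoint can read the file from.
curl "https://api.mistral.ai/v1/files/<FILE_ID>/url?expiry=24" \
  -H "Authorization: Bearer $MISTRAL_API_KEY"
```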
Getting the OCR result (without the image annotation)
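A minimal sketch of the plain OCR call; <SIGNED_URL> is a placeholder, and include_image_base64 makes the response carry each extracted image as base64 for the storage step later:

```bash
# Run OCR on the signed URL; the result comes back as markdown plus images.
curl https://api.mistral.ai/v1/ocr \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-ocr-latest",
    "document": { "type": "document_url", "document_url": "<SIGNED_URL>" },
    "include_image_base64": true
  }'
```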
Adding annotations
The code below is a sample curl request that adds a basic annotation to each extracted image.
You can read the full documentation here to customize the annotation as needed.
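A minimal sketch of that request, assuming the bbox_annotation_format field described in Mistral's annotations documentation; the one-field schema below is an assumption to adapt to your use case:

```bash
# OCR call that also asks Mistral to annotate each extracted image.
# <SIGNED_URL> is a placeholder; the "description"-only schema is an
# assumption -- extend it per Mistral's annotations documentation.
curl https://api.mistral.ai/v1/ocr \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-ocr-latest",
    "document": { "type": "document_url", "document_url": "<SIGNED_URL>" },
    "include_image_base64": true,
    "bbox_annotation_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "image_annotation",
        "schema": {
          "type": "object",
          "properties": {
            "description": { "type": "string" }
          },
          "required": ["description"]
        }
      }
    }
  }'
```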
Convert all base64 image data from the OCR result to files and upload them to Supabase Storage
Uploading images to Supabase (you can check the documentation here.)
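A minimal sketch of one image's round trip, assuming the base64 payload was saved to image.b64 with any data:image/jpeg;base64, prefix already stripped; BUCKET_NAME, the ocr/ path, and SUPABASE_SERVICE_ROLE_KEY are placeholders:

```bash
# Decode the base64 image from the OCR response into a real file.
base64 -d image.b64 > image-0.jpeg   # use `base64 -D` on older macOS

# Upload it to Supabase Storage via the Storage REST API.
curl -X POST \
  "https://YOUR_PROJECT_ID.supabase.co/storage/v1/object/BUCKET_NAME/ocr/image-0.jpeg" \
  -H "Authorization: Bearer $SUPABASE_SERVICE_ROLE_KEY" \
  -H "Content-Type: image/jpeg" \
  --data-binary "@image-0.jpeg"
```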
Add a Postgres table that will store the image_id, image_url, and image_annotation
You can check how to create a Postgres table in our Module 1 Intermediate course
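A minimal sketch of that table, created here via psql; the table name ocr_images and the column types are assumptions, not a required schema:

```bash
# Metadata table linking each extracted image to its storage URL and annotation.
psql "$DATABASE_URL" <<'SQL'
CREATE TABLE IF NOT EXISTS ocr_images (
  image_id         text PRIMARY KEY,  -- image id from the OCR result
  image_url        text NOT NULL,     -- public Supabase Storage URL
  image_annotation jsonb              -- annotation returned by Mistral OCR
);
SQL
```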
For the image URL, the standard Supabase public URL format is:
https://YOUR_PROJECT_ID.supabase.co/storage/v1/object/public/BUCKET_NAME/PATH/FILE_NAME
Create a Vector Store that will handle all the markdown data from OCR
You can check how to create a vector store in our Module 2 Intermediate course
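As a sketch, here is the pgvector-backed documents schema that Supabase's LangChain integration (which n8n's Supabase Vector Store node builds on) expects; the 1536 dimension matches OpenAI's default embedding size and should be changed to match your embedding model:

```bash
# Enable pgvector and create the table the vector store writes chunks into.
psql "$DATABASE_URL" <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
  id        bigserial PRIMARY KEY,
  content   text,          -- one markdown chunk from the OCR output
  metadata  jsonb,         -- e.g. source file, page number, related image_id
  embedding vector(1536)   -- embedding vector for the chunk
);
SQL
```

You will also need the match_documents similarity-search function from the Supabase LangChain guide so the node can query this table.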
To create a sample AI Agent that retrieves data together with its related images, we only need a basic workflow:
Chat Message node
AI Agent
Preferred chat model: GPT-4o and above
Simple memory for testing
Tools
Think tool
Vector Store with Reranker: to retrieve the vectorized data
Postgres Table tool: to retrieve the image annotation data (see the sample query after this list)
You can always review Module 1, Module 2, and Module 3 to set up the tools needed.
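As a sketch of what the Postgres Table tool does under the hood, assuming the ocr_images table from earlier (<IMAGE_ID> is a placeholder):

```bash
# Look up the storage URL and annotation for an image the agent wants to cite.
psql "$DATABASE_URL" -c \
  "SELECT image_url, image_annotation FROM ocr_images WHERE image_id = '<IMAGE_ID>';"
```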