🎯 Learning Objectives
Understand how OCR can extract both text and image annotations from PDF files.
Store extracted images in Supabase Storage and manage their public URLs.
Create and use a Postgres metadata table to link image IDs, annotations, and URLs.
Chunk and embed OCR text into a Supabase Vector Store for semantic search.
Build an AI Agent workflow that retrieves both text chunks and related image annotations.
Use GPT-4o as the Agent brain to combine information from text chunks (retrieved with a Reranker) and image annotations for more accurate answers.
Deliver a complete chat-based assistant that can reason over documents with both text and annotated images.
Register at https://mistral.ai/ and get your API key
Uploading the PDF file to Mistral OCR (you can check the official documentation here.)
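As a minimal sketch, the upload is one curl call to Mistral's files endpoint (sample.pdf is a placeholder; MISTRAL_API_KEY holds your key). The response includes the file's id, which the next step needs:

```bash
# Upload the PDF to Mistral for OCR processing.
curl https://api.mistral.ai/v1/files \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F purpose="ocr" \
  -F file="@sample.pdf"
```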
Getting the signed URL
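A sketch of exchanging the returned file id for a temporary signed URL (<FILE_ID> is a placeholder; per Mistral's docs the expiry query parameter is in hours):

```bash
# Get a signed URL that the OCR endpoint can read the file from.
curl "https://api.mistral.ai/v1/files/<FILE_ID>/url?expiry=24" \
  -H "Authorization: Bearer $MISTRAL_API_KEY"
```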
Getting the OCR result (without the image annotation)
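A minimal sketch of the plain OCR call; <SIGNED_URL> is a placeholder, and include_image_base64 makes the response carry each extracted image as base64 for the storage step later:

```bash
# Run OCR on the signed URL; the result comes back as markdown plus images.
curl https://api.mistral.ai/v1/ocr \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-ocr-latest",
    "document": { "type": "document_url", "document_url": "<SIGNED_URL>" },
    "include_image_base64": true
  }'
```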
Adding annotations
The code below is a sample curl request that adds a basic annotation to each extracted image.
You can read the full documentation here to customize the annotation as needed.
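A minimal sketch of that request, assuming the bbox_annotation_format field described in Mistral's annotations documentation; the one-field schema below is an assumption to adapt to your use case:

```bash
# OCR call that also asks Mistral to annotate each extracted image.
# <SIGNED_URL> is a placeholder; the "description"-only schema is an
# assumption -- extend it per Mistral's annotations documentation.
curl https://api.mistral.ai/v1/ocr \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-ocr-latest",
    "document": { "type": "document_url", "document_url": "<SIGNED_URL>" },
    "include_image_base64": true,
    "bbox_annotation_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "image_annotation",
        "schema": {
          "type": "object",
          "properties": {
            "description": { "type": "string" }
          },
          "required": ["description"]
        }
      }
    }
  }'
```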
Convert all base64 image data from the OCR result to files and upload them to Supabase Storage
Uploading images to Supabase (you can check the documentation here.)
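A minimal sketch of one image's round trip, assuming the base64 payload was saved to image.b64 with any data:image/jpeg;base64, prefix already stripped; BUCKET_NAME, the ocr/ path, and SUPABASE_SERVICE_ROLE_KEY are placeholders:

```bash
# Decode the base64 image from the OCR response into a real file.
base64 -d image.b64 > image-0.jpeg   # use `base64 -D` on older macOS

# Upload it to Supabase Storage via the Storage REST API.
curl -X POST \
  "https://YOUR_PROJECT_ID.supabase.co/storage/v1/object/BUCKET_NAME/ocr/image-0.jpeg" \
  -H "Authorization: Bearer $SUPABASE_SERVICE_ROLE_KEY" \
  -H "Content-Type: image/jpeg" \
  --data-binary "@image-0.jpeg"
```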
Add a Postgres table that will store the image_id, image_url, and image_annotation
You can check how to create a Postgres table in our Module 1 Intermediate course
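A minimal sketch of that table, created here via psql; the table name ocr_images and the column types are assumptions, not a required schema:

```bash
# Metadata table linking each extracted image to its storage URL and annotation.
psql "$DATABASE_URL" <<'SQL'
CREATE TABLE IF NOT EXISTS ocr_images (
  image_id         text PRIMARY KEY,  -- image id from the OCR result
  image_url        text NOT NULL,     -- public Supabase Storage URL
  image_annotation jsonb              -- annotation returned by Mistral OCR
);
SQL
```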
For the image URL, the standard Supabase public URL format is:
https://YOUR_PROJECT_ID.supabase.co/storage/v1/object/public/BUCKET_NAME/PATH/FILE_NAME
Create a Vector Store that will handle all the markdown data from OCR
You can check how to create a vector store in our Module 2 Intermediate course
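As a sketch, here is the pgvector-backed documents schema that Supabase's LangChain integration (which n8n's Supabase Vector Store node builds on) expects; the 1536 dimension matches OpenAI's default embedding size and should be changed to match your embedding model:

```bash
# Enable pgvector and create the table the vector store writes chunks into.
psql "$DATABASE_URL" <<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
  id        bigserial PRIMARY KEY,
  content   text,          -- one markdown chunk from the OCR output
  metadata  jsonb,         -- e.g. source file, page number, related image_id
  embedding vector(1536)   -- embedding vector for the chunk
);
SQL
```

You will also need the match_documents similarity-search function from the Supabase LangChain guide so the node can query this table.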
To create a sample AI Agent that retrieves data together with its related images, we only need a basic workflow:
Chat Message node
AI Agent
Preferred chat model: GPT-4o and above
Simple memory for testing
Tools
Think tool
Vector Store with Reranker: to retrieve the vectorized data
Postgres Table tool: to retrieve the image annotation data (see the sample query after this list)
You can always review Module 1, Module 2, and Module 3 to set up the tools needed.
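As a sketch of what the Postgres Table tool does under the hood, assuming the ocr_images table from earlier (<IMAGE_ID> is a placeholder):

```bash
# Look up the storage URL and annotation for an image the agent wants to cite.
psql "$DATABASE_URL" -c \
  "SELECT image_url, image_annotation FROM ocr_images WHERE image_id = '<IMAGE_ID>';"
```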