Build a receipt reader with Docuglean AI in under 10 minutes! 📜

Ever wondered how you can build an AI receipt reader in under 10 minutes? In this tutorial, we'll extract structured data from PDF receipts using the Docuglean SDK. I'll guide you step by step through the entire setup process - it's surprisingly easy and only requires basic JavaScript knowledge!

🌟 Visit and star Docuglean SDK

1- Prerequisites:

To use the Docuglean AI SDK, you need a little JavaScript knowledge (you can review some JavaScript here). You'll also need Node.js and npm installed on your machine. To check whether you already have them on Windows, press the Windows key + R, type cmd, and press Enter, then run these commands:

node -v
npm -v

If both commands print a version number, you're ready! Otherwise, download the Node.js Windows installer (.msi) from the official Node.js website. After the setup completes, run the commands again in cmd to confirm the installation.

2- Installing the Docuglean SDK

Let's create your receipt reader project:

  1. Create a project directory:

    mkdir my-receipt-extractor
    cd my-receipt-extractor
    
  2. Install Docuglean, along with Zod and dotenv, which the script in section 4 imports:

    npm i docuglean zod dotenv
    

🎉 Perfect! Your project now has all the Docuglean AI features ready to use!

Getting Your API Key

You'll need an API key from one of these supported providers:

OpenAI: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4o, o1-mini, o1, o3, o4-mini

Mistral: mistral-ocr-latest

⚠️ Important: Keep your API key secure and never share it publicly!

More providers coming soon! Check the Docuglean repository for updates.
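
One way to keep keys out of source code is to read them from environment variables and never print them in full. The sketch below assumes one variable per provider. OPENAI_API_KEY matches the script later in this tutorial, while MISTRAL_API_KEY, getApiKey, and maskKey are hypothetical names chosen for illustration and are not part of the Docuglean SDK.

```javascript
// Hypothetical mapping from provider name to the environment variable holding its key.
// OPENAI_API_KEY is used later in this tutorial; MISTRAL_API_KEY is an assumed name.
const KEY_VARIABLES = new Map([
  ['openai', 'OPENAI_API_KEY'],
  ['mistral', 'MISTRAL_API_KEY'],
]);

// Look up the key for a provider, failing early with a clear message if it is missing.
function getApiKey(provider, env = process.env) {
  const variable = KEY_VARIABLES.get(provider);
  if (!variable) {
    throw new Error(`Unsupported provider: ${provider}`);
  }
  const key = env[variable];
  if (!key) {
    throw new Error(`Missing ${variable}. Set it in your environment or .env file.`);
  }
  return key;
}

// Show only the last four characters so a key can be referenced in logs without exposing it.
function maskKey(key) {
  if (key.length <= 4) return '*'.repeat(key.length);
  return '*'.repeat(key.length - 4) + key.slice(-4);
}
```

Failing early on a missing key gives a clearer message than an authentication error from the provider, and masking lets you confirm which key is loaded without leaking it.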

3- Creating a Zod Schema for receipts

The Zod schema acts as your blueprint - it tells Docuglean AI exactly what structure and data types to extract from receipts, ensuring consistent, predictable results every time.

Why schemas matter:

  • Predictable Output: Guarantees data comes back in your expected format

  • Type Safety: Ensures fields are the correct type (dates as strings, totals as numbers, etc.)

  • AI Guidance: Helps the AI understand exactly what information to extract

Here's a comprehensive Zod schema for receipts:

import { z } from 'zod';

// Define the structure for individual receipt items
const ReceiptItemSchema = z.object({
  name: z.string().describe('The name of the item purchased.'),
  price: z.number().describe('The price of this specific item.')
});

// Define the complete receipt structure
const ReceiptSchema = z.object({
  date: z.string().describe('The date of the receipt in YYYY-MM-DD format.'),
  total: z.number().describe('The grand total amount shown on the receipt.'),
  currency: z.string().optional().describe('The currency symbol or code (e.g., "$", "EUR").'),
  vendorName: z.string().optional().describe('The name of the store or business.'),
  items: z.array(ReceiptItemSchema).describe('A list of all individual items purchased, with their names and prices.')
});
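
Note that z.string() accepts any string, so the YYYY-MM-DD format in the description is guidance for the model rather than an enforced rule. A minimal sketch of a post-extraction check follows, using a hypothetical helper, isIsoDate, that is not part of Zod or Docuglean:

```javascript
// Returns true only for strings shaped like YYYY-MM-DD that name a real calendar date.
function isIsoDate(value) {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) return false;
  const [year, month, day] = match.slice(1).map(Number);
  // Date.UTC rolls invalid dates over (Feb 30 becomes Mar 1), so compare the parts back.
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

console.log(isIsoDate('2024-07-09')); // true
console.log(isIsoDate('09/07/2024')); // false
console.log(isIsoDate('2024-02-30')); // false
```

Running a check like this on the extracted date lets you flag receipts where the model returned a different format before the value reaches a database or spreadsheet.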

4- Writing the extraction script

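This script is the heart of your application - it combines your receipt, API key, and Zod schema to tell Docuglean AI exactly what to do. Before running it, create a .env file in the project directory containing a line of the form OPENAI_API_KEY=your-key, and place the receipt PDF at the path used in the script. Because the script uses import statements, save it with a .mjs extension (for example extract.mjs, a name chosen here for illustration) or set "type": "module" in package.json, then run it with node.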

import { extract } from 'docuglean';
import { z } from 'zod';
import * as dotenv from 'dotenv';

dotenv.config(); // Load environment variables securely

// Zod schema from section 3, repeated here so the script is self-contained
const ReceiptItemSchema = z.object({
  name: z.string().describe('The name of the item purchased.'),
  price: z.number().describe('The price of this specific item.')
});

const ReceiptSchema = z.object({
  date: z.string().describe('The date of the receipt in YYYY-MM-DD format.'),
  total: z.number().describe('The grand total amount shown on the receipt.'),
  currency: z.string().optional().describe('The currency symbol or code (e.g., "$", "EUR").'),
  vendorName: z.string().optional().describe('The name of the store or business.'),
  items: z.array(ReceiptItemSchema).describe('A list of all individual items purchased, with their names and prices.')
});

async function runReceiptExtraction() {
  const apiKey = process.env.OPENAI_API_KEY; // Set this in your .env file
  const receiptFilePath = './receipt_example.pdf'; // Ensure this file exists

  if (!apiKey) {
    console.error("❌ Error: API key not found. Please set OPENAI_API_KEY in your .env file.");
    return;
  }

  try {
    console.log("🔍 Starting receipt data extraction...");
    
    const extractedData = await extract({
      filePath: receiptFilePath,
      apiKey: apiKey,
      provider: 'openai', // or 'mistral'
      responseFormat: ReceiptSchema,
      prompt: 'Extract the date, total, currency, vendor name, and a list of items with their names and prices from this receipt.'
    });

    console.log("✅ Extraction successful!");
    console.log("📋 Extracted Data:");
    console.log(JSON.stringify(extractedData, null, 2));
    
  } catch (error) {
    console.error("❌ An error occurred during extraction:", error);
  }
}

runReceiptExtraction();

5- Understanding the output

Once your script runs successfully, Docuglean returns a JavaScript object containing all extracted information, perfectly matching your Zod schema structure.

Example Output:

{
  "date": "2024-07-09",
  "total": 55.75,
  "currency": "USD",
  "vendorName": "SuperMart",
  "items": [
    {
      "name": "Organic Bananas",
      "price": 3.49
    },
    {
      "name": "Milk (1 Gallon)",
      "price": 4.99
    },
    {
      "name": "Avocado (Each)",
      "price": 2.50
    }
  ]
}
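
A quick sanity check on this output is to compare the item prices against the total. The sketch below uses hypothetical helpers, toCents and checkReceiptTotals, and works in integer cents to avoid floating-point drift; it assumes a two-decimal currency. A gap between the two figures is not necessarily an extraction error, since tax, tips, or items not listed individually also contribute to the total.

```javascript
// Convert a decimal amount to integer cents (assumes a two-decimal currency).
function toCents(amount) {
  return Math.round(amount * 100);
}

// Compare the sum of item prices with the receipt total.
function checkReceiptTotals(receipt) {
  const itemsCents = receipt.items.reduce((sum, item) => sum + toCents(item.price), 0);
  const totalCents = toCents(receipt.total);
  return {
    itemsSubtotal: itemsCents / 100,
    total: totalCents / 100,
    difference: (totalCents - itemsCents) / 100,
  };
}

// The example output from this section.
const receipt = {
  date: '2024-07-09',
  total: 55.75,
  currency: 'USD',
  vendorName: 'SuperMart',
  items: [
    { name: 'Organic Bananas', price: 3.49 },
    { name: 'Milk (1 Gallon)', price: 4.99 },
    { name: 'Avocado (Each)', price: 2.5 },
  ],
};

console.log(checkReceiptTotals(receipt));
// { itemsSubtotal: 10.98, total: 55.75, difference: 44.77 }
```

Here the three listed items account for 10.98 of the 55.75 total, so a large difference like this one is a useful cue to look at the original receipt.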

6- Let's wrap up!

Congratulations! You've successfully built an AI-powered receipt reader that automates tedious data entry and opens up new possibilities for document processing.

What's coming soon to Docuglean:

  • 🔮 summarize() - Get quick TLDRs of long documents

  • 🌐 translate() - Built-in multilingual document support

  • 🏷️ classify() - Automatically detect document types (receipt, invoice, ID, etc.)

  • 🔍 search(query) - AI-powered document search

  • 🤖 More AI Models - Integrations with Meta's Llama, Together AI, and OpenRouter

Keep up with the latest updates by starring the Docuglean GitHub repository!


⭐ Want More?

Check out the full Docuglean repository on GitHub and star the project to support future updates!

Have questions/requests? Drop them in the comments! 🖊️

Posted on: 14/7/2025

Posted by: AmiraBkh

Computer science student | Data analytics, data science, and machine learning enthusiast
