
LlamaOCR - Free AI-Powered OCR That Converts Complex Docs to Markdown Content

Hazem Abbas

Nov 14, 2024 — 4 min read


Llama OCR is an npm library that uses Llama 3.2 Vision to bring free OCR (Optical Character Recognition) to your projects. With the llama-ocr package, you can extract text from images (PDF support is planned) using the free Llama 3.2 model endpoint provided by Together AI.

If you need faster processing or higher rate limits, paid endpoints for the Llama 3.2 11B and 90B models are also available.

Installation

To get started, install the library with npm:

npm i llama-ocr

Usage

The library is simple to use. Import the ocr function, pass it the path to your image file, and provide your Together AI API key. Because the snippet uses import and top-level await, run it as an ES module:

import { ocr } from "llama-ocr";

const markdown = await ocr({
  filePath: "./trader-joes-receipt.jpg", // path to your image
  apiKey: process.env.TOGETHER_API_KEY, // Together AI API key
});

The call resolves to the extracted text as clean Markdown, which makes it a good fit for documentation, receipts, and other text-heavy images.
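To run the same call over several images, one option is a small wrapper like the sketch below. The helper name `ocrMany` and the shape of its result objects are hypothetical and not part of llama-ocr. The sketch only assumes that the function passed in behaves like the `ocr` call shown above: it takes `{ filePath, apiKey }` and resolves to a Markdown string. Images are processed one at a time, which is a cautious choice given the rate limits mentioned earlier.

```javascript
// Hypothetical helper (not part of llama-ocr): runs an ocr-like function over
// several images in sequence and records either the Markdown or the error.
// `ocrFn` is assumed to accept { filePath, apiKey } and resolve to a string.
async function ocrMany(ocrFn, filePaths, apiKey) {
  const results = [];
  for (const filePath of filePaths) {
    try {
      // One request at a time, so a burst of images does not hit the endpoint at once.
      const markdown = await ocrFn({ filePath, apiKey });
      results.push({ filePath, markdown, error: null });
    } catch (err) {
      // Keep going after a failure; the caller can inspect `error` per file.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ filePath, markdown: null, error: message });
    }
  }
  return results;
}

// Possible usage with the real library (not executed here):
//   import { ocr } from "llama-ocr";
//   const results = await ocrMany(ocr, ["./a.jpg", "./b.png"], process.env.TOGETHER_API_KEY);
```

Passing the OCR function in as an argument keeps the wrapper independent of the package, so it can be exercised with a stub before spending any API quota.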

Hosted Demo

Want to try it out before installing? Visit LlamaOCR.com to see the hosted demo in action.

How It Works

Llama OCR sends your image to the Llama 3.2 vision endpoint hosted by Together AI, which parses it and returns the text. By default, it uses the Llama-3.2-90B-Vision model, but you can select the Llama-3.2-11B-Vision model or the free model instead.
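As a rough sketch of model selection, the helper below builds an options object and rejects unknown model names before any request is made. The three identifiers come from the paragraph above. The option name `model` and the helper `buildOcrOptions` are assumptions for illustration, since this article does not show the exact parameter, so check the package README before relying on them.

```javascript
// Model identifiers named in the article; "free" stands for the free endpoint.
const MODELS = ["Llama-3.2-90B-Vision", "Llama-3.2-11B-Vision", "free"];

// Hypothetical helper (not part of llama-ocr): builds options for an ocr-like call.
// ASSUMPTION: the option is called `model`; the article does not name the parameter.
function buildOcrOptions(filePath, apiKey, model = "Llama-3.2-90B-Vision") {
  if (!MODELS.includes(model)) {
    throw new Error(`Unknown model "${model}"; expected one of: ${MODELS.join(", ")}`);
  }
  return { filePath, apiKey, model };
}

// Possible usage with the real library (not executed here):
//   import { ocr } from "llama-ocr";
//   const markdown = await ocr(buildOcrOptions("./receipt.jpg", process.env.TOGETHER_API_KEY, "free"));
```

Validating the name up front turns a typo into an immediate local error instead of a failed API call.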