PDF Table Extractor

Extract tabular data from PDF pages using layout-aware parsing

POST /v1/pdf/table-extract (1 credit)

curl -X POST "https://pdf.toolkitapi.io/v1/pdf/table-extract" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://toolkitapi.io/financial-report.pdf",
    "pages": "3-5"
  }'

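import httpx

# Send the PDF location and the page range to scan for tables
resp = httpx.post(
    "https://pdf.toolkitapi.io/v1/pdf/table-extract",
    json={
        "url": "https://toolkitapi.io/financial-report.pdf",
        "pages": "3-5",
    },
)
resp.raise_for_status()  # stop on HTTP errors before parsing the body
print(resp.json())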

const resp = await fetch("https://pdf.toolkitapi.io/v1/pdf/table-extract", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "url": "https://toolkitapi.io/financial-report.pdf",
    "pages": "3-5"
  }),
});
const data = await resp.json();
console.log(data);

Response 200 OK

{
  "tables": [
    {
      "page": 3,
      "table_index": 0,
      "rows": [
        ["Quarter", "Revenue", "Expenses", "Profit"],
        ["Q1", "$2.4M", "$1.8M", "$0.6M"],
        ["Q2", "$2.8M", "$1.9M", "$0.9M"],
        ["Q3", "$3.1M", "$2.0M", "$1.1M"]
      ],
      "row_count": 4,
      "col_count": 4
    }
  ],
  "total_tables": 1,
  "pages_scanned": 3
}

How to Use

1. Provide the PDF via `pdf` (base64) or `url` (public URL).

2. Optionally set `pages` to limit which pages are scanned for tables. Omit to scan the entire document.

3. The response contains an array of tables, each with its page number, row data, and dimensions.

4. Process the `rows` arrays as needed; the first row is typically the header.
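
The steps above can be sketched in Python as two small helpers: one that builds the request body and one that turns a returned table into records keyed by its header row. The helper names `build_payload` and `table_to_records` are hypothetical. The sketch assumes that exactly one of `pdf` or `url` should be sent and that `pdf` takes plain standard base64 without a data-URI prefix; neither detail is confirmed on this page.

```python
import base64


def build_payload(pdf_bytes=None, url=None, pages=None):
    """Build the JSON body for POST /v1/pdf/table-extract."""
    # Assumption: the API expects exactly one PDF source per request.
    if (pdf_bytes is None) == (url is None):
        raise ValueError("provide exactly one of pdf_bytes or url")
    payload = {}
    if url is not None:
        payload["url"] = url
    else:
        # Assumption: `pdf` is plain standard base64 text.
        payload["pdf"] = base64.b64encode(pdf_bytes).decode("ascii")
    if pages is not None:
        # Omitting `pages` asks the API to scan the entire document.
        payload["pages"] = pages
    return payload


def table_to_records(table):
    """Treat the first row as the header and return one dict per data row."""
    rows = table["rows"]
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]
```

The first row is only typically the header, so it may be worth inspecting a table before relying on `table_to_records` for it.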

About This Tool

PDF Table Extractor identifies and extracts tabular data from PDF documents using layout-aware parsing. It detects table structures (rows, columns, and cell boundaries) and returns the data as clean arrays that you can convert to CSV or JSON, or feed into a database.

This tool works best on PDFs with clearly defined table structures: bordered tables, consistent column alignment, and regular row spacing. It scans the pages you specify and returns every table it finds, along with page location and dimensions.
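
Because every table comes back with its page number and dimensions, a parsed response can be grouped by page and sanity-checked before further processing. The following is a minimal sketch with hypothetical helper names. It assumes `col_count` reports the width of the widest row, which this page does not state.

```python
from collections import defaultdict


def tables_by_page(response):
    """Group the returned tables by the page they were found on."""
    grouped = defaultdict(list)
    for table in response.get("tables", []):
        grouped[table["page"]].append(table)
    return dict(grouped)


def dimension_mismatches(response):
    """List (page, table_index) pairs whose reported size disagrees with rows."""
    mismatched = []
    for table in response.get("tables", []):
        rows = table["rows"]
        # Assumption: col_count is the width of the widest row.
        widest = max((len(row) for row in rows), default=0)
        if table["row_count"] != len(rows) or table["col_count"] != widest:
            mismatched.append((table["page"], table["table_index"]))
    return mismatched
```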

Use it to automate data extraction from invoices, financial reports, scientific papers, or any document with structured tabular content.

Why Use This Tool

Financial data extraction — Pull revenue tables, balance sheets, or transaction records from PDF reports
Invoice processing — Extract line items and totals from PDF invoices
Scientific data capture — Grab experimental results and statistical tables from research papers
Spreadsheet conversion — Convert PDF tables to CSV or Excel format for further analysis
Database ingestion — Parse structured data from PDF documents into database records
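
The spreadsheet conversion use case can be sketched with Python's standard `csv` module. The function name `table_to_csv` is hypothetical; it takes one entry from the `tables` array and returns CSV text.

```python
import csv
import io


def table_to_csv(table):
    """Serialize one extracted table's rows to a CSV string."""
    buffer = io.StringIO()
    # Use "\n" line endings instead of the csv module's default "\r\n".
    writer = csv.writer(buffer, lineterminator="\n")
    # csv.writer writes None cells as empty fields.
    writer.writerows(table["rows"])
    return buffer.getvalue()
```

Writing the returned string to a `.csv` file produces something most spreadsheet applications can open directly.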

Frequently Asked Questions

What types of tables are detected?

The extractor works best with bordered tables and consistently aligned columns. Borderless tables with inconsistent spacing may not be detected. Complex merged cells can reduce accuracy.

Can cells contain null values?

Yes. Cells that are empty or couldn't be parsed will appear as `null` in the rows array.
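
In Python, a JSON `null` is parsed as `None`, so downstream code should handle it explicitly. Here is a minimal sketch with hypothetical helper names that replaces missing cells with a placeholder and measures how much of a table is missing.

```python
def fill_nulls(rows, fill=""):
    """Return a copy of rows with None cells replaced by `fill`."""
    return [[fill if cell is None else cell for cell in row] for row in rows]


def null_ratio(rows):
    """Return the fraction of cells that are None (0.0 for an empty table)."""
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 0.0
    return sum(cell is None for cell in cells) / len(cells)
```

A high `null_ratio` may be a hint that the table was parsed poorly and deserves a manual check.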

Does this work on scanned PDFs?

No — the table extractor relies on the PDF's text layer. For scanned documents, first use the OCR endpoint to create a text layer, then extract tables from the result.
