PDF Table Extractor
Extract tabular data from PDF pages using layout-aware parsing
POST /v1/pdf/table-extract
curl -X POST "https://pdf.toolkitapi.io/v1/pdf/table-extract" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://toolkitapi.io/financial-report.pdf",
    "pages": "3-5"
  }'
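import httpx

# Request table extraction for pages 3-5 of a publicly reachable PDF.
resp = httpx.post(
    "https://pdf.toolkitapi.io/v1/pdf/table-extract",
    json={
        "url": "https://toolkitapi.io/financial-report.pdf",
        "pages": "3-5",
    },
)
resp.raise_for_status()  # fail loudly on a 4xx/5xx response
print(resp.json())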
const resp = await fetch("https://pdf.toolkitapi.io/v1/pdf/table-extract", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    "url": "https://toolkitapi.io/financial-report.pdf",
    "pages": "3-5"
  }),
});
const data = await resp.json();
console.log(data);
{
  "tables": [
    {
      "page": 3,
      "table_index": 0,
      "rows": [
        ["Quarter", "Revenue", "Expenses", "Profit"],
        ["Q1", "$2.4M", "$1.8M", "$0.6M"],
        ["Q2", "$2.8M", "$1.9M", "$0.9M"],
        ["Q3", "$3.1M", "$2.0M", "$1.1M"]
      ],
      "row_count": 4,
      "col_count": 4
    }
  ],
  "total_tables": 1,
  "pages_scanned": 3
}
Description
How to Use
1. Provide the PDF via `pdf` (base64) or `url` (public URL).
2. Optionally set `pages` to limit which pages are scanned for tables. Omit to scan the entire document.
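3. The response contains a `tables` array. Each entry carries its `page` number, its `table_index`, the `rows` data, and its dimensions (`row_count` and `col_count`). The top-level `total_tables` and `pages_scanned` fields summarize the scan.
4. Process the `rows` arrays as needed. The first row is typically the header.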
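The steps above can be sketched in Python as two small helpers: one that builds the request body and one that turns a returned table into records keyed by its header row. This is a minimal sketch rather than official client code. The helper names `build_payload` and `table_to_records` are hypothetical, and two details are assumptions not confirmed here: that `pdf` expects plain base64 text without a data-URI prefix, and that exactly one of `pdf` or `url` should be sent.

```python
import base64


def build_payload(pdf_bytes=None, url=None, pages=None):
    """Build a request body from raw PDF bytes or a public URL."""
    # Assumption: exactly one source should be sent per request.
    if (pdf_bytes is None) == (url is None):
        raise ValueError("pass exactly one of pdf_bytes or url")
    payload = {}
    if pdf_bytes is not None:
        # Assumption: `pdf` takes plain base64 text, no data-URI prefix.
        payload["pdf"] = base64.b64encode(pdf_bytes).decode("ascii")
    else:
        payload["url"] = url
    if pages is not None:
        payload["pages"] = pages  # e.g. "3-5"; omit to scan every page
    return payload


def table_to_records(table):
    """Turn one entry of `tables` into dicts keyed by its first row."""
    rows = table["rows"]
    if not rows:
        return []
    # Treat the first row as the header, as is typical for this output.
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Sample table copied from the example response.
sample_table = {
    "page": 3,
    "table_index": 0,
    "rows": [
        ["Quarter", "Revenue", "Expenses", "Profit"],
        ["Q1", "$2.4M", "$1.8M", "$0.6M"],
        ["Q2", "$2.8M", "$1.9M", "$0.9M"],
        ["Q3", "$3.1M", "$2.0M", "$1.1M"],
    ],
    "row_count": 4,
    "col_count": 4,
}

payload = build_payload(url="https://toolkitapi.io/financial-report.pdf", pages="3-5")
records = table_to_records(sample_table)
print(records[0]["Revenue"])
```

The payload returned by `build_payload` can be passed as the `json=` argument of the `httpx.post` call shown earlier. Because the first row is only typically the header, check it before relying on the keys.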
About This Tool
PDF Table Extractor identifies and extracts tabular data from PDF documents using layout-aware parsing. It detects table structures (rows, columns, and cell boundaries) and returns each table as an array of rows that you can convert to CSV or JSON, or load into a database.
This tool works best on PDFs with clearly defined table structures: bordered tables, consistent column alignment, and regular row spacing. It scans the pages you specify and returns every table it finds, along with page location and dimensions.
Use it to automate data extraction from invoices, financial reports, scientific papers, or any document with structured tabular content.
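As an illustration of the CSV conversion mentioned above, the following sketch writes one table's `rows` to CSV text with Python's standard `csv` module. The helper name `rows_to_csv` is hypothetical, and the sample rows are taken from the example response.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of row lists to CSV text."""
    buffer = io.StringIO()
    # csv.writer quotes any cell containing commas, quotes, or newlines.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ["Quarter", "Revenue", "Expenses", "Profit"],
    ["Q1", "$2.4M", "$1.8M", "$0.6M"],
    ["Q2", "$2.8M", "$1.9M", "$0.9M"],
]

csv_text = rows_to_csv(rows)
print(csv_text)
```

To write to disk instead, open the file with `newline=""` and pass the file object to `csv.writer` directly.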
Why Use This Tool
- Financial data extraction — Pull revenue tables, balance sheets, or transaction records from PDF reports
- Invoice processing — Extract line items and totals from PDF invoices
- Scientific data capture — Grab experimental results and statistical tables from research papers
- Spreadsheet conversion — Convert PDF tables to CSV or Excel format for further analysis
- Database ingestion — Parse structured data from PDF documents into database records
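For the database ingestion case, one possible approach is to load a table's `rows` into SQLite with Python's standard `sqlite3` module. This is a sketch under stated assumptions: the helper name `load_table` and the table name `report` are hypothetical, the first row is assumed to be a header of unique, non-empty column names, and every cell is stored as text without parsing values such as `$2.4M`.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_table(conn, table_name, rows):
    """Create a table from the header row and insert the remaining rows."""
    # Assumption: the first row holds unique, non-empty column names.
    header, *body = rows
    columns = ", ".join(quote_ident(col) + " TEXT" for col in header)
    conn.execute(f"CREATE TABLE {quote_ident(table_name)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f"INSERT INTO {quote_ident(table_name)} VALUES ({placeholders})",
        body,
    )
    conn.commit()
    return len(body)


rows = [
    ["Quarter", "Revenue", "Expenses", "Profit"],
    ["Q1", "$2.4M", "$1.8M", "$0.6M"],
    ["Q2", "$2.8M", "$1.9M", "$0.9M"],
    ["Q3", "$3.1M", "$2.0M", "$1.1M"],
]

conn = sqlite3.connect(":memory:")
inserted = load_table(conn, "report", rows)
q2_profit = conn.execute(
    'SELECT "Profit" FROM "report" WHERE "Quarter" = ?', ("Q2",)
).fetchone()
print(inserted, q2_profit)
```

Cell values are bound as parameters, so only the identifiers need quoting. Rows whose length differs from the header would make the insert fail, so validate against `col_count` first if the source tables are irregular.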
Frequently Asked Questions
What types of tables are detected?
Can cells contain null values?
Does this work on scanned PDFs?