Multi-Page Document Processing
FileLens can generate a preview image for every page of PDF, DOC, DOCX, PPT, PPTX, and other multi-page document formats. This guide covers the supported formats, processing options, output file naming, worked examples, and best practices for multi-page documents.
Supported Formats
Document Types
FileLens supports multi-page processing for the following document formats:
PDF Documents
- Native PDF processing
- Preserves vector graphics when possible
- Supports password-protected PDFs
- Handles complex layouts and fonts
Microsoft Office
- DOC, DOCX (Word documents)
- XLS, XLSX (Excel spreadsheets)
- PPT, PPTX (PowerPoint presentations)
- Converts via LibreOffice pipeline
OpenDocument Formats
- ODT (Text documents)
- ODS (Spreadsheets)
- ODP (Presentations)
- Full compatibility with LibreOffice
Other Formats
- RTF (Rich Text Format)
- TXT (Plain text files)
- CSV (Comma-separated values)
- Many other document formats via LibreOffice
Processing Pipeline
Office documents (DOC, DOCX, PPT, PPTX, etc.) are first converted to PDF using LibreOffice, then processed page by page for optimal quality and compatibility.
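To reproduce this conversion step locally (for example, to check how a file will render before uploading it), you can run LibreOffice in headless mode. The sketch below is an assumption about how such a step could look. It is not FileLens's actual implementation. It assumes Node.js and a `soffice` binary on your PATH, and `buildConvertArgs`, `expectedPdfPath`, and `convertToPdf` are hypothetical helper names.

```javascript
const { execFile } = require("node:child_process");
const path = require("node:path");

// Hypothetical helper: argument list for a headless LibreOffice PDF export.
function buildConvertArgs(inputPath, outDir) {
  return ["--headless", "--convert-to", "pdf", "--outdir", outDir, inputPath];
}

// LibreOffice keeps the input's base name and switches the extension to .pdf.
function expectedPdfPath(inputPath, outDir) {
  const base = path.basename(inputPath, path.extname(inputPath));
  return path.join(outDir, `${base}.pdf`);
}

// Hypothetical helper: runs the conversion and resolves with the PDF path.
// Assumes the `soffice` binary is installed and on PATH.
function convertToPdf(inputPath, outDir, sofficeBin = "soffice") {
  return new Promise((resolve, reject) => {
    execFile(sofficeBin, buildConvertArgs(inputPath, outDir), (error) => {
      if (error) reject(error);
      else resolve(expectedPdfPath(inputPath, outDir));
    });
  });
}
```

For example, `convertToPdf("deck.pptx", "./out")` would resolve to `out/deck.pdf` once LibreOffice finishes.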
Processing Options
Page Control
Set the all_pages option to true to generate a preview for every page of the document:
{
"input": "https://example.com/document.pdf",
"output_format": "jpg",
"options": {
"all_pages": true,
"width": 800,
"height": 600
}
}
Quality Settings
The quality option (integer) sets the output image quality. Different ranges work better for different document types:
- High quality (90-100): best for presentations and graphics-heavy documents
- Medium quality (70-89): good balance for text documents
- Low quality (50-69): fastest processing, suitable for thumbnails
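One way to apply these ranges consistently is a small lookup from use case to quality. The numbers below are illustrative picks from inside each documented range, and `qualityFor` is a hypothetical helper. It is not part of FileLens.

```javascript
// Hypothetical defaults, each picked from inside the documented ranges.
const QUALITY_BY_USE_CASE = {
  presentation: 95, // high quality: 90-100
  text: 80,         // medium quality: 70-89
  thumbnail: 60,    // low quality: 50-69
};

// Hypothetical helper: returns the value to send as options.quality.
function qualityFor(useCase) {
  if (!Object.prototype.hasOwnProperty.call(QUALITY_BY_USE_CASE, useCase)) {
    throw new Error(`Unknown use case: ${useCase}`);
  }
  return QUALITY_BY_USE_CASE[useCase];
}

// Example: request options for a text-heavy report.
const options = { all_pages: true, width: 800, height: 1000, quality: qualityFor("text") };
```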
Resolution Settings
Choose appropriate dimensions based on your use case:
Thumbnail Size
- Width: 200-400px
- Height: 200-400px
- Best for file browsers and quick previews
Standard Preview
- Width: 600-800px
- Height: 800-1000px
- Good for web display and general viewing
High Resolution
- Width: 1200-1920px
- Height: 1600-2560px
- Best for detailed viewing and printing
Custom Aspect Ratios
- Maintain document proportions
- Consider target display requirements
- Balance file size vs. quality
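To maintain document proportions, you can scale the page's native size so it fits inside your target box, then send the resulting width and height. This is a minimal sketch. It assumes you already know the page dimensions (A4 portrait is 595 x 842 points). `fitWithin` is a hypothetical helper, and this guide does not say whether FileLens itself preserves aspect ratio when both dimensions are given.

```javascript
// Hypothetical helper: scales a page of size (pageWidth x pageHeight) to the
// largest size that fits inside (maxWidth x maxHeight) without distortion.
function fitWithin(pageWidth, pageHeight, maxWidth, maxHeight) {
  if ([pageWidth, pageHeight, maxWidth, maxHeight].some((v) => !(v > 0))) {
    throw new RangeError("All dimensions must be positive numbers");
  }
  // The tighter of the two constraints decides the scale factor.
  const scale = Math.min(maxWidth / pageWidth, maxHeight / pageHeight);
  return {
    width: Math.round(pageWidth * scale),
    height: Math.round(pageHeight * scale),
  };
}

// A4 portrait page (595 x 842 pt) fitted into the standard preview box.
const size = fitWithin(595, 842, 800, 1000);
console.log(size); // { width: 707, height: 1000 }
```

Here the height limit is the tighter constraint, so the width shrinks to keep the A4 proportions.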
File Naming
Naming Convention
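Generated files follow a consistent naming pattern. Synchronous and asynchronous results order their fields differently, and only synchronous names include a hash:
sync_{timestamp}_{process_id}_{hash}_{page_number}.{extension}
result_{process_id}_{timestamp}_{page_number}.{extension}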
Synchronous Files
sync_1641312000_12345_abc1_1.jpg # Page 1
sync_1641312000_12345_abc1_2.jpg # Page 2
sync_1641312000_12345_abc1_3.jpg # Page 3
Asynchronous Files
result_550e8400-e29b-41d4-a716-446655440000_1641312000_1.png # Page 1
result_550e8400-e29b-41d4-a716-446655440000_1641312000_2.png # Page 2
result_550e8400-e29b-41d4-a716-446655440000_1641312000_3.png # Page 3
File Components
- type (string): sync for synchronous requests, result for asynchronous jobs
- timestamp (string): Unix timestamp when processing started
- process_id (string): process ID (sync) or job UUID (async)
- hash (string): short hash of the input (sync only)
- page_number (integer): sequential page number, starting from 1
- extension (string): file extension matching output_format
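Page numbers in these names are not zero-padded, so sorting file names as plain strings puts page 10 before page 2. The sketch below parses both patterns and sorts by numeric page number. The regular expressions are inferred from the example names in this guide, and `parsePreviewName` and `sortByPage` are hypothetical helpers.

```javascript
// Patterns inferred from the example file names in this guide (assumption).
const SYNC_RE = /^sync_(\d+)_(\d+)_([A-Za-z0-9]+)_(\d+)\.(\w+)$/;
const ASYNC_RE = /^result_([0-9a-fA-F-]{36})_(\d+)_(\d+)\.(\w+)$/;

// Hypothetical helper: parses a file name or download URL, or returns null.
function parsePreviewName(nameOrUrl) {
  const name = nameOrUrl.split("/").pop();
  let m = SYNC_RE.exec(name);
  if (m) {
    return { type: "sync", timestamp: m[1], processId: m[2], hash: m[3],
             pageNumber: Number(m[4]), extension: m[5] };
  }
  m = ASYNC_RE.exec(name);
  if (m) {
    return { type: "result", processId: m[1], timestamp: m[2],
             pageNumber: Number(m[3]), extension: m[4] };
  }
  return null;
}

// Hypothetical helper: sorts URLs by numeric page number; unparseable names go last.
function sortByPage(urls) {
  const page = (u) => {
    const parsed = parsePreviewName(u);
    return parsed ? parsed.pageNumber : Number.MAX_SAFE_INTEGER;
  };
  return [...urls].sort((a, b) => page(a) - page(b));
}
```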
Examples
PDF Multi-Page Processing
curl -X POST http://localhost:3000/preview \
-H "Content-Type: application/json" \
-d '{
"input": "https://example.com/report.pdf",
"output_format": "jpg",
"options": {
"width": 800,
"height": 600,
"quality": 90,
"all_pages": true
}
}'
Response:
{
"success": true,
"message": "Preview generated successfully",
"preview_urls": [
"/download/sync_1641312000_12345_def4_1.jpg",
"/download/sync_1641312000_12345_def4_2.jpg",
"/download/sync_1641312000_12345_def4_3.jpg",
"/download/sync_1641312000_12345_def4_4.jpg",
"/download/sync_1641312000_12345_def4_5.jpg"
],
"total_pages": 5,
"job_id": null
}
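You can make the same synchronous request from JavaScript. This is a minimal sketch that assumes Node.js 18+ (for the global `fetch`) and the local server used in the curl example. `generatePreview` is a hypothetical helper. The optional `fetchFn` parameter exists only so you can exercise the function without a running server.

```javascript
const BASE_URL = "http://localhost:3000";

// Hypothetical helper: POSTs to /preview and returns the parsed response,
// with preview_urls turned into absolute download URLs.
async function generatePreview(input, outputFormat, options, fetchFn = fetch) {
  const response = await fetchFn(`${BASE_URL}/preview`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input, output_format: outputFormat, options }),
  });
  if (!response.ok) {
    throw new Error(`Preview request failed with HTTP ${response.status}`);
  }
  const result = await response.json();
  if (!result.success) {
    throw new Error(`Preview failed: ${result.message}`);
  }
  return { ...result, preview_urls: result.preview_urls.map((u) => BASE_URL + u) };
}
```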
PowerPoint Presentation Processing
# Submit job
curl -X POST http://localhost:3000/preview/async \
-H "Content-Type: application/json" \
-d '{
"input": "https://example.com/presentation.pptx",
"output_format": "png",
"options": {
"width": 1920,
"height": 1080,
"quality": 95,
"all_pages": true
}
}'
# Check status using the job ID returned by the submit request
curl http://localhost:3000/preview/status/550e8400-e29b-41d4-a716-446655440000
# Download slides
curl http://localhost:3000/download/result_550e8400-e29b-41d4-a716-446655440000_1641312000_1.png -o slide1.png
curl http://localhost:3000/download/result_550e8400-e29b-41d4-a716-446655440000_1641312000_2.png -o slide2.png
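Instead of checking the status endpoint by hand, a client can poll it until the job finishes. This guide does not show the status response, so the field names below are assumptions: `status`, with the values `completed` and `failed`, plus `preview_urls`. Adjust them to match the real response. `waitForJob` is a hypothetical helper and assumes Node.js 18+ for the global `fetch`.

```javascript
const BASE_URL = "http://localhost:3000";
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: polls /preview/status/{jobId} until the job ends.
// ASSUMPTION: the body has a `status` field ("completed" / "failed") and,
// once completed, a `preview_urls` array.
async function waitForJob(jobId, { intervalMs = 2000, maxAttempts = 60, fetchFn = fetch } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchFn(`${BASE_URL}/preview/status/${jobId}`);
    if (!response.ok) {
      throw new Error(`Status check failed with HTTP ${response.status}`);
    }
    const job = await response.json();
    if (job.status === "completed") return job;
    if (job.status === "failed") throw new Error(`Job ${jobId} failed`);
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```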
Excel Spreadsheet Processing
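<?php
// FileLensClient is not defined in this guide; it is assumed to be a thin
// wrapper around POST /preview that returns the decoded JSON response.
class ExcelProcessor {
    private const BASE_URL = 'http://localhost:3000';
    private $client;
    public function __construct() {
        $this->client = new FileLensClient();
    }
    // Spreadsheets are rendered to PDF pages first, so a large worksheet can
    // span several pages; each numbered "sheet" below is one rendered page.
    public function processSpreadsheet($input, $outputDir = './sheets/') {
        $result = $this->client->generatePreview($input, 'png', [
            'width' => 1200,
            'height' => 900,
            'quality' => 85,
            'all_pages' => true
        ]);
        if (empty($result['success'])) {
            throw new RuntimeException('Preview failed: ' . ($result['message'] ?? 'unknown error'));
        }
        // Normalize the trailing slash and create the directory if needed
        $outputDir = rtrim($outputDir, '/') . '/';
        if (!is_dir($outputDir) && !mkdir($outputDir, 0755, true)) {
            throw new RuntimeException("Cannot create directory: {$outputDir}");
        }
        $downloadedSheets = [];
        foreach ($result['preview_urls'] as $index => $url) {
            $sheetNumber = $index + 1;
            $outputPath = $outputDir . "sheet_{$sheetNumber}.png";
            // preview_urls are relative paths such as /download/sync_..._1.png
            $fileContent = file_get_contents(self::BASE_URL . $url);
            if ($fileContent === false) {
                throw new RuntimeException("Download failed: {$url}");
            }
            file_put_contents($outputPath, $fileContent);
            $downloadedSheets[] = [
                'sheet' => $sheetNumber,
                'file' => $outputPath,
                'url' => $url
            ];
        }
        return [
            'total_sheets' => $result['total_pages'],
            'sheets' => $downloadedSheets
        ];
    }
}
// Usage
$processor = new ExcelProcessor();
$result = $processor->processSpreadsheet('https://example.com/data.xlsx');
echo "Processed {$result['total_sheets']} sheets:\n";
foreach ($result['sheets'] as $sheet) {
    echo "Sheet {$sheet['sheet']}: {$sheet['file']}\n";
}
?>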
Best Practices
Performance Optimization
File Size Considerations
- Documents > 50 pages: Use async processing
- Large presentations: Use async processing
- Complex spreadsheets: Consider smaller preview sizes
Quality vs. Speed
- Lower quality (50-69) for thumbnails
- Higher quality (90-95) for detailed viewing
- Balance based on use case requirements
Batch Processing
- Process multiple documents concurrently
- Use connection pooling for efficiency
- Implement proper error handling for batches
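A small worker pool is one way to process several documents concurrently while keeping each document's failure isolated from the rest of the batch. In this sketch, `processOne` stands in for your per-document call (for example, a POST to `/preview`). `processAll` and the default limit of 3 are illustrative choices. They are not FileLens settings.

```javascript
// Hypothetical helper: runs `processOne` over `inputs` with at most `limit`
// calls in flight. Each result records either the value or the error, so
// one failed document does not abort the whole batch.
async function processAll(inputs, processOne, limit = 3) {
  const results = new Array(inputs.length);
  let next = 0;

  async function worker() {
    while (next < inputs.length) {
      const index = next++; // claimed synchronously, so no two workers share it
      try {
        results[index] = { input: inputs[index], ok: true, value: await processOne(inputs[index]) };
      } catch (error) {
        results[index] = { input: inputs[index], ok: false, error };
      }
    }
  }

  const workerCount = Math.min(limit, inputs.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```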
Caching Strategy
- Cache preview URLs to avoid reprocessing
- Store metadata about processed documents
- Implement TTL for cache invalidation
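A minimal in-memory version of this strategy keys cached results by input, output format, and options, and expires entries after a fixed TTL. `PreviewCache` is a hypothetical class. A shared store would be more typical in production. A suitable TTL depends on how long generated files stay downloadable, and this guide does not specify that.

```javascript
// Hypothetical class: in-memory TTL cache for preview responses.
class PreviewCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map();
  }

  // Sort option keys so { a, b } and { b, a } map to the same entry.
  static key(input, outputFormat, options) {
    const sorted = Object.keys(options).sort().map((k) => [k, options[k]]);
    return JSON.stringify([input, outputFormat, sorted]);
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: invalidate lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```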
Memory Management
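For large documents, download pages in small batches so that only a few page images are held in memory at once:
const fs = require('node:fs/promises');
const path = require('node:path');
// Requires Node.js 18+ for the global fetch API.
// Assumes previewUrls are relative paths listed in page order.
async function downloadLargeDocument(previewUrls, outputDir, baseUrl = 'http://localhost:3000') {
  // Create the output directory (no-op if it already exists)
  await fs.mkdir(outputDir, { recursive: true });
  // Download pages in batches to limit how many buffers are held at once
  const batchSize = 5;
  for (let i = 0; i < previewUrls.length; i += batchSize) {
    const batch = previewUrls.slice(i, i + batchSize);
    await Promise.all(batch.map(async (url, index) => {
      const pageNumber = i + index + 1;
      // Keep the source extension (.jpg, .png, ...) instead of hard-coding one
      const filename = `page_${pageNumber}${path.extname(url)}`;
      const outputPath = path.join(outputDir, filename);
      const response = await fetch(`${baseUrl}${url}`);
      if (!response.ok) {
        throw new Error(`Failed to download page ${pageNumber}: HTTP ${response.status}`);
      }
      const buffer = Buffer.from(await response.arrayBuffer());
      await fs.writeFile(outputPath, buffer);
      console.log(`Downloaded page ${pageNumber}`);
    }));
    // Small delay between batches
    if (i + batchSize < previewUrls.length) {
      await new Promise(resolve => setTimeout(resolve, 100));
    }
  }
}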
Error Recovery
- Page-level errors (info): If processing fails for specific pages, the service still returns the pages that were processed successfully. Compare total_pages with the number of URLs returned to detect missing pages.
- Memory limits (warning): Very large documents (more than 200 pages) may hit memory limits. Consider splitting them into smaller batches or using lower resolution settings.
- Timeout handling (error): Long processing times can cause timeouts. Use async processing for documents with more than 50 pages or complex layouts.
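You can automate the page-level check by comparing the page numbers present in `preview_urls` with `total_pages`. This sketch assumes each URL ends in `_{page_number}.{extension}`, following the File Naming section. `findMissingPages` is a hypothetical helper.

```javascript
// Hypothetical helper: returns the 1-based page numbers with no URL in
// preview_urls. ASSUMPTION: each URL ends in "_{page_number}.{extension}".
function findMissingPages(response) {
  const present = new Set();
  for (const url of response.preview_urls) {
    const match = /_(\d+)\.\w+$/.exec(url);
    if (match) present.add(Number(match[1]));
  }
  const missing = [];
  for (let page = 1; page <= response.total_pages; page++) {
    if (!present.has(page)) missing.push(page);
  }
  return missing;
}
```

An empty array means every page was rendered. Otherwise, you can retry the document or report the listed pages.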
Use Case Examples
Document Viewer
- Generate thumbnails for navigation
- Use progressive loading for large documents
- Implement zoom functionality with higher-resolution versions
Archive System
- Process documents in background jobs
- Store previews alongside metadata
- Implement search within document content
Presentation Tools
- Generate slide thumbnails for editing interface
- Create animated previews from slide sequences
- Export individual slides as images
Reporting Dashboard
- Create visual summaries of document content
- Generate thumbnail galleries
- Provide quick document previews