Automatic PDF Processor - automatically process PDF files

The complete solution for automated processing of PDF documents

Using Data Sources

Where an extraction rule gets its value from

At a Glance

Difficulty: Intermediate
Time required: ~15 minutes
Prerequisites: Understanding Data Extraction
What you'll learn: The eight data sources and when to use each

What is a data source?

Before an extraction rule can process a value, it must be clear where that value comes from. That is exactly what the data source defines. While the data type determines how a value is understood, the data source determines where it comes from - the visible document text, a barcode, the metadata, the file information and more.

You select the data source in the rule editor under "General". Depending on your choice, the program shows the matching settings.

The eight data sources at a glance

Data source	Provides values from ...
Determine data from document text	the visible, searchable text of the pages
Determine data from QR or barcode	QR codes and barcodes in the document
Use metadata of the document	PDF metadata such as title, author or creation date
Use file information	file name, path and file date values
Use custom text	a fixed text you specify yourself
Use placeholder value	the result of another rule
Use form data	fillable PDF form fields
Use sequential number	an automatically incrementing counter

Determine data from document text

The most important and most common source. The value is read from the visible text of the PDF - for example via a keyword and the adjacent data area. The tutorial Understanding Data Extraction explains the basics.

Important: This source only works with PDFs that contain searchable text. Pure image PDFs (scans) must first be made searchable using text recognition (OCR).
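
The idea of reading a value via a keyword and the adjacent data area can be illustrated outside the program. The following Python sketch is not how Automatic PDF Processor works internally; it assumes the page text is already available as a string, and the function name value_after_keyword and the sample text are made up for illustration.

```python
import re
from typing import Optional


def value_after_keyword(text: str, keyword: str) -> Optional[str]:
    """Return the first token that follows `keyword` on the same line, or None."""
    # Escape the keyword so characters such as "." are matched literally.
    # An optional ":" or "#" and surrounding blanks may sit between keyword and value.
    pattern = re.compile(
        re.escape(keyword) + r"[ \t]*[:#]?[ \t]*(\S+)",
        re.IGNORECASE,
    )
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical page text, as it might come out of a searchable PDF.
page_text = "ACME Ltd.\nInvoice number: RE-2024-0815\nDate: 03.02.2024\n"

print(value_after_keyword(page_text, "Invoice number"))  # RE-2024-0815
print(value_after_keyword(page_text, "Order number"))    # None
```

A scan without a text layer would yield an empty string here, which is why such files need OCR before this kind of lookup can find anything.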

Determine data from QR or barcode

Reads the content of QR codes and barcodes from the document. This is especially useful when documents already carry a code that contains a unique identifier - such as a case or document number.

Example: Incoming documents carry a QR code with the case number. You read the code and name the file after the case number.

Use metadata of the document

Accesses the metadata stored in the PDF - such as title, author, subject or the creation and modification date held in the document. This information is not part of the visible text but belongs to the properties of the file.

Example: You sort documents into different folders based on the author stored in the PDF.
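
The sorting logic in this example amounts to a lookup from author to folder. The sketch below shows that lookup in Python under simple assumptions: the metadata has already been read into a dictionary, and the key "author", the folder names and the function target_folder are hypothetical. In the program itself you set this up through rules rather than code.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def target_folder(
    metadata: Dict[str, Optional[str]],
    routes: Dict[str, str],
    default: str = "Unsorted",
) -> PurePosixPath:
    """Pick a folder based on the author stored in the PDF metadata."""
    # Normalise the author so "  Jane Doe " and "jane doe" hit the same route.
    author = (metadata.get("author") or "").strip().lower()
    return PurePosixPath(routes.get(author, default))


# Hypothetical routing table: lower-case author name -> folder.
routes = {"jane doe": "Contracts/Jane", "accounting": "Invoices"}

print(target_folder({"author": " Jane Doe ", "title": "NDA"}, routes))  # Contracts/Jane
print(target_folder({"title": "No author set"}, routes))                # Unsorted
```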

Use file information

Uses properties of the file itself - the file name, the path and the file system date values (created/modified). Handy when the file name or storage location already contains usable information.

Example: The file name already contains a customer number that you want to reuse for further filing.
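
As an illustration of this example, the following Python sketch pulls a customer number out of a file name. The naming scheme (a "K" followed by exactly five digits) and the function name are assumptions made for the sketch, not something the program prescribes.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def customer_number_from_name(path: str) -> Optional[str]:
    """Return the five digits after 'K' in the file name, or None."""
    # Look only at the file name without folder and extension.
    stem = PurePosixPath(path).stem
    # "K" must not be part of a longer word, and the digits must stop after five.
    match = re.search(r"(?<![A-Za-z0-9])K(\d{5})(?!\d)", stem)
    return match.group(1) if match else None


print(customer_number_from_name("inbox/K10482_invoice.pdf"))  # 10482
print(customer_number_from_name("inbox/scan_0001.pdf"))       # None
```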

Use custom text

Provides a fixed text that you specify yourself - independent of the document content. This is useful for fixed building blocks or as a fallback value: create a second rule with the same name as the extraction rule, and the custom text is used whenever the actual extraction does not find a value.

Note: With this source, only the Text data type is available.
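
The fallback pattern described above boils down to "take the first rule that delivers a value". The Python sketch below models that idea only; the rule functions, the keyword "Department" and the text "Unassigned" are hypothetical, and how the program evaluates same-named rules internally is not shown here.

```python
import re
from typing import Callable, Optional, Sequence

# A rule takes the document text and returns a value or None.
Rule = Callable[[str], Optional[str]]


def department_from_text(text: str) -> Optional[str]:
    """Actual extraction: the word after the keyword 'Department:'."""
    match = re.search(r"Department:[ \t]*(\w+)", text)
    return match.group(1) if match else None


def custom_text(_text: str) -> Optional[str]:
    """Fallback: a fixed text, independent of the document content."""
    return "Unassigned"


def first_value(rules: Sequence[Rule], text: str) -> Optional[str]:
    # Try the rules in order and stop at the first non-empty result.
    for rule in rules:
        value = rule(text)
        if value:
            return value
    return None


rules = [department_from_text, custom_text]
print(first_value(rules, "Department: Sales"))   # Sales
print(first_value(rules, "No keyword in here"))  # Unassigned
```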

Use placeholder value

This source builds on the result of another rule. This lets you process already extracted values further or combine several values, without setting up the same extraction again.

Example: One rule reads the invoice date. A second rule takes this value and outputs the date in a different format.
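
What the second rule in this example does can be sketched in a few lines of Python. The source and target formats below (day.month.year in, ISO year-month-day out) are assumptions chosen for illustration; in the program you configure the formats in the rule editor.

```python
from datetime import datetime


def reformat_date(
    value: str,
    source_format: str = "%d.%m.%Y",
    target_format: str = "%Y-%m-%d",
) -> str:
    """Parse an already extracted date and write it in another notation."""
    # strptime raises ValueError if the value does not match source_format.
    return datetime.strptime(value, source_format).strftime(target_format)


invoice_date = "03.02.2024"  # hypothetical result of the first rule
print(reformat_date(invoice_date))                            # 2024-02-03
print(reformat_date(invoice_date, target_format="%Y%m%d"))    # 20240203
```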

Use form data

Reads the content of fillable PDF form fields (such as text fields or check boxes). This requires the PDF to contain real form fields - not just printed text. You can find a detailed guide at Extract PDF form data.

Use sequential number

Generates an automatically incrementing number - for example a continuous document number. The numbering is managed via named counters that you maintain centrally. Several rules or profiles that use the same counter share a guaranteed unique, gap-free sequence of numbers.

Example: Each processed invoice receives a continuous internal number such as 000123, 000124, 000125 - with a freely selectable start value and format.
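
The behaviour of a named counter with a start value and a fixed-width format can be sketched as follows. This is a simplified in-memory model for illustration only: the class CounterRegistry and the counter name "invoices" are hypothetical, and the way the program stores and synchronises its counters is not modelled.

```python
from typing import Dict


class CounterRegistry:
    """In-memory stand-in for centrally maintained, named counters."""

    def __init__(self) -> None:
        self._next_value: Dict[str, int] = {}

    def define(self, name: str, start: int = 1) -> None:
        # Keep an existing counter untouched so a second definition
        # does not reset the sequence.
        self._next_value.setdefault(name, start)

    def next_number(self, name: str, width: int = 6) -> str:
        # Hand out the current value, then advance by exactly one:
        # every caller of the same counter gets a unique, gap-free number.
        value = self._next_value[name]
        self._next_value[name] = value + 1
        return f"{value:0{width}d}"


counters = CounterRegistry()
counters.define("invoices", start=123)

# Two rules sharing the same counter draw from one sequence.
print(counters.next_number("invoices"))  # 000123
print(counters.next_number("invoices"))  # 000124
print(counters.next_number("invoices"))  # 000125
```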

Which source is the right one?

Your goal	Suitable data source
Value is in the visible document text	Determine data from document text
Document carries a QR/barcode	Determine data from QR or barcode
Information is in the file name or path	Use file information
Title/author from the PDF properties	Use metadata of the document
Fillable form PDF	Use form data
Fixed text or fallback value	Use custom text
Build on an already extracted value	Use placeholder value
Continuous, unique numbering	Use sequential number

Next steps

Using Data Types - How a value is processed: Text, Date, Number, Query and Query (with list)
Placeholder System Explained - Use extracted values in file names and paths
Extract PDF form data - Extract values from form fields in a targeted way
