Using Data Sources

Where an extraction rule gets its value from

At a Glance

  • Difficulty: Intermediate
  • Time required: ~15 minutes
  • Prerequisites: Understanding Data Extraction
  • What you'll learn: The eight data sources and when to use each

What is a data source?

Before an extraction rule can process a value, it must be clear where that value comes from. That is exactly what the data source defines. While the data type determines how a value is interpreted, the data source determines where it comes from: the visible document text, a barcode, the metadata, the file information, and more.

You select the data source in the rule editor under "General". Depending on your choice, the program shows the matching settings.


The eight data sources at a glance

Data source                         Provides values from ...
Determine data from document text   the visible, searchable text of the pages
Determine data from QR or barcode   QR codes and barcodes in the document
Use metadata of the document        PDF metadata such as title, author or creation date
Use file information                file name, path and file date values
Use custom text                     a fixed text you specify yourself
Use placeholder value               the result of another rule
Use form data                       fillable PDF form fields
Use sequential number               an automatically incrementing counter

Determine data from document text

This is the most important and most common source. The value is read from the visible text of the PDF, for example via a keyword and the adjacent data area. The tutorial Understanding Data Extraction explains the basics.

Important: This source only works with PDFs that contain searchable text. Pure image PDFs (scans) must first be made searchable using text recognition (OCR).
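To picture what "a keyword and the adjacent data area" means, here is a minimal Python sketch. It is not how the program works internally; you configure this in the rule editor. The function name value_after_keyword and the sample text are assumptions made for illustration only.

```python
import re
from typing import Optional


def value_after_keyword(text: str, keyword: str) -> Optional[str]:
    """Return the first token after `keyword` on the same line, or None."""
    # Allow an optional colon and spaces/tabs between keyword and value.
    pattern = re.escape(keyword) + r"[ \t]*:?[ \t]*(\S+)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Hypothetical searchable text of an invoice page.
page_text = "Invoice no.: RE-2024-0815\nInvoice date: 14.03.2024"

print(value_after_keyword(page_text, "Invoice no."))
print(value_after_keyword(page_text, "Invoice date"))
```

If the keyword does not occur in the text, the function returns None. That corresponds to a rule that finds no value.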


Determine data from QR or barcode

Reads the content of QR codes and barcodes from the document. This is especially useful when documents already carry a code that contains a unique identifier - such as a case or document number.

Example: Incoming documents carry a QR code with the case number. You read the code and name the file accordingly.


Use metadata of the document

Accesses the metadata stored in the PDF - such as title, author, subject or the creation and modification date held in the document. This information is not part of the visible text but belongs to the properties of the file.

Example: You sort documents into different folders based on the author stored in the PDF.


Use file information

Uses properties of the file itself - the file name, the path and the file system date values (created/modified). Handy when the file name or storage location already contains usable information.

Example: The file name already contains a customer number that you want to reuse for further filing.
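The idea behind this example can be sketched in a few lines of Python. The naming scheme (a customer number written as "K" followed by digits, separated by underscores) and the function name customer_number_from_name are assumptions for illustration, not something the program prescribes.

```python
import re
from pathlib import PurePath
from typing import Optional


def customer_number_from_name(path: str) -> Optional[str]:
    """Pull a customer number out of the file name, or return None."""
    # The stem is the file name without folder and extension.
    stem = PurePath(path).stem
    # Assumed scheme: "K" plus digits, delimited by underscores or name ends.
    match = re.search(r"(?:^|_)K(\d+)(?:_|$)", stem)
    return match.group(1) if match else None


print(customer_number_from_name("inbox/2024-03-14_K10482_invoice.pdf"))
```

A file name that does not follow the assumed scheme, such as scan.pdf, yields None.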


Use custom text

Provides a fixed text that you specify yourself - independent of the document content. This is useful for fixed building blocks or as a fallback value: create a second rule with the same name and it steps in if the actual extraction does not find a value.

Note: With this source, only the Text data type is available.
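The fallback idea can be illustrated with a short Python sketch: several rules share one name, and the first one that delivers a value is used. The rule list, the resolve function and the evaluation order are assumptions for illustration; the program's actual internals are not described here.

```python
from typing import Callable, List, Optional, Tuple

# A rule is modelled as (name, extractor); the extractor returns a value or None.
Rule = Tuple[str, Callable[[str], Optional[str]]]


def from_document_text(text: str) -> Optional[str]:
    # Stand-in for a real extraction: only succeeds if the word is present.
    return "Sales" if "Sales" in text else None


def custom_text(_text: str) -> Optional[str]:
    # Fixed text, independent of the document content.
    return "Unassigned"


def resolve(rules: List[Rule], name: str, text: str) -> Optional[str]:
    """Return the value of the first rule called `name` that finds something."""
    for rule_name, extract in rules:
        if rule_name != name:
            continue
        value = extract(text)
        if value:
            return value
    return None


rules: List[Rule] = [
    ("Department", from_document_text),
    ("Department", custom_text),  # same name, acts as the fallback
]

print(resolve(rules, "Department", "Forwarded to Sales"))
print(resolve(rules, "Department", "No department mentioned"))
```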


Use placeholder value

This source builds on the result of another rule. This lets you process already extracted values further or combine several values, without setting up the same extraction again.

Example: One rule reads the invoice date. A second rule uses this value to output the date in a different format.
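As a rough illustration of what such a second rule does, the following Python sketch takes an already extracted date and writes it in another format. The function name reformat_date and the two formats are assumptions chosen for the example.

```python
from datetime import datetime


def reformat_date(value: str,
                  source_format: str = "%d.%m.%Y",
                  target_format: str = "%Y-%m-%d") -> str:
    """Parse `value` in one date format and return it in another."""
    return datetime.strptime(value, source_format).strftime(target_format)


# Assumed result of the first rule (the invoice date as read from the document).
invoice_date = "14.03.2024"

print(reformat_date(invoice_date))                          # 2024-03-14
print(reformat_date(invoice_date, target_format="%Y%m"))    # 202403
```

The date is extracted only once; every further format is derived from that one value.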


Use form data

Reads the content of fillable PDF form fields (such as text fields or check boxes). This requires the PDF to contain real form fields - not just printed text. You can find a detailed guide at Extract PDF form data.


Use sequential number

Generates an automatically incrementing number, for example a continuous document number. The numbering is managed via named counters that you maintain centrally. Several rules or profiles that use the same counter share one sequence of numbers that is guaranteed to be unique and gap-free.

Example: Each processed invoice receives a continuous internal number such as 000123, 000124, 000125 - with a freely selectable start value and format.
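The behavior of a named counter can be pictured with a small Python sketch. It keeps the counters in memory only and does not deal with parallel access; how the program stores and protects its counters is not covered here. The class CounterStore and the counter name InvoiceNo are assumptions for illustration.

```python
class CounterStore:
    """Keeps one running number per counter name (in memory only)."""

    def __init__(self) -> None:
        self._next = {}

    def define(self, name: str, start: int = 1) -> None:
        # An existing counter keeps its position; only new ones get `start`.
        self._next.setdefault(name, start)

    def take(self, name: str, width: int = 6) -> str:
        """Return the next number, zero-padded to `width`, and advance."""
        number = self._next[name]
        self._next[name] = number + 1
        return f"{number:0{width}d}"


counters = CounterStore()
counters.define("InvoiceNo", start=123)

# Two rules or profiles that use the same counter name share one sequence.
first = counters.take("InvoiceNo")
second = counters.take("InvoiceNo")
third = counters.take("InvoiceNo")
print(first, second, third)  # 000123 000124 000125
```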


Which source is the right one?

Your goal                                 Suitable data source
Value is in the visible document text     Determine data from document text
Document carries a QR/barcode             Determine data from QR or barcode
Information is in the file name or path   Use file information
Title/author from the PDF properties      Use metadata of the document
Fillable form PDF                         Use form data
Fixed text or fallback value              Use custom text
Build on an already extracted value       Use placeholder value
Continuous, unique numbering              Use sequential number
