Choose the right data type for every value
At a Glance
- Difficulty: Intermediate
- Time required: ~15 minutes
- Prerequisites: Understanding Data Extraction
- What you'll learn: The five data types and when to use each - especially Date and Query (with list)
What is a data type?
Every extraction rule reads a value from your PDF. The data type determines how that value
is understood and processed. A value recognized as a Date, for example, can be reformatted freely;
a value recognized as a Number can be checked against a valid range; and a Query
returns a result value you define, depending on the content.
You select the data type in the rule editor under "General" via the
"Data type" field. It is available regardless of the data source - whether the value comes
from the document text, a barcode or the file information.
Note: If the value comes from the source "Use custom text",
only the Text data type is available. Learn more in the tutorial
Using Data Sources.
Text
Text is the default data type and the right choice for most values. The recognized text is used exactly as found.
Use it for anything that does not need special handling as a date or number, such as invoice
numbers, names or reference codes.
Tip: Even purely numeric codes such as an invoice or customer number are
usually best kept as Text. You only need the Number data type if you actually want
to calculate, round or check ranges.
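Why keep codes as Text? One general reason is that converting a code to a number typically drops leading zeros, and codes containing letters or hyphens cannot be converted at all. The plain Python below illustrates this outside the program; the sample customer number is hypothetical.

```python
customer_number = "00042"  # hypothetical customer number with leading zeros

# Kept as Text: the value stays exactly as it appears in the PDF.
print(customer_number)       # 00042

# Converted to a number: the leading zeros disappear.
print(int(customer_number))  # 42

# Codes with letters or hyphens are not numbers at all.
try:
    int("RE-2024-0042")
except ValueError:
    print("not a number")    # not a number
```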
Date
The Date data type automatically recognizes dates in many notations - for example
12/15/2024, 2024-12-15 or December 15, 2024. The big advantage: a value
recognized as a date can then be reformatted freely, because the program knows the year, month and
day individually.
Example: reassembling a date
The PDF contains:
Invoice date: December 15, 2024
Using the date placeholders, you produce for example:
<RuleId:1(InvoiceDate){Year4}>-<RuleId:1(InvoiceDate){Month}>-<RuleId:1(InvoiceDate){Day}> → 2024-12-15
<RuleId:1(InvoiceDate){Year4}>\<RuleId:1(InvoiceDate){MonthName}> → 2024\December
This lets you name files in a sortable way or file them automatically into year and month folders - regardless of
how the date was written in the original document. You can find the available date building blocks (year, month,
month name, day and more) in the tutorial
Placeholder System Explained.
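To illustrate the idea behind the Date data type, here is a minimal Python sketch, not the program's actual implementation: several notations are parsed into one date value, after which year, month, day and month name are available separately. The format list and the functions `parse_date` and `date_parts` are hypothetical, and the sketch assumes English month names and zero-padded month and day numbers.

```python
from datetime import date, datetime

# Hypothetical selection of notations; the program itself recognizes more.
DATE_FORMATS = ["%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y"]

def parse_date(text: str) -> date:
    """Try each notation in turn and return the first one that fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

def date_parts(d: date) -> dict[str, str]:
    """Split a date into building blocks similar to the placeholders."""
    return {
        "Year4": f"{d.year:04d}",
        "Month": f"{d.month:02d}",      # zero-padded in this sketch
        "Day": f"{d.day:02d}",
        "MonthName": d.strftime("%B"),  # English in the default C locale
    }

# The same date written three different ways gives the same result.
for raw in ["12/15/2024", "2024-12-15", "December 15, 2024"]:
    parts = date_parts(parse_date(raw))
    print(f"{parts['Year4']}-{parts['Month']}-{parts['Day']}")  # 2024-12-15 each time

parts = date_parts(parse_date("December 15, 2024"))
print(parts["Year4"] + "\\" + parts["MonthName"])  # 2024\December
```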
Tip: For dates, always choose the Date data type -
even if you want to keep the date unchanged. Only then are the reformatting options available later.
Number
The Number data type reads numeric values and understands different notations (for example thousands
separators and decimal marks such as 1,234.56). Use it when you need the value as an actual number - for
example to check a valid range or to enforce a consistent format.
Example: You extract an invoice amount and want to move only invoices of
1000 or more to a special folder. With the Number data type, the value can be checked
as a number.
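The following Python sketch illustrates both steps of the invoice example: turning a formatted amount into a number, then comparing it with the 1000 threshold. The function names and the `decimal_mark` parameter are hypothetical; the program detects notations on its own, whereas this sketch must be told which character is the decimal mark. The last line shows why the comparison has to happen on numbers: compared as text, 999.99 would count as larger than 1000.

```python
from decimal import Decimal

def parse_number(text: str, decimal_mark: str = ".") -> Decimal:
    """Strip thousands separators and normalize the decimal mark (hypothetical helper)."""
    thousands_sep = "," if decimal_mark == "." else "."
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_mark, ".")
    return Decimal(cleaned)

def goes_to_special_folder(amount: Decimal, threshold: Decimal = Decimal("1000")) -> bool:
    """Invoices of 1000 or more qualify."""
    return amount >= threshold

print(parse_number("1,234.56"))                            # 1234.56
print(parse_number("1.234,56", decimal_mark=","))          # 1234.56
print(goes_to_special_folder(parse_number("1,234.56")))    # True
print(goes_to_special_folder(parse_number("999.99")))      # False

# Compared as text, the result would be wrong: "9" sorts after "1".
print("999.99" >= "1000")  # True
```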
Query
With the Query data type, a rule returns not the found text itself, but a result value you
define - depending on what the document contains. To do this, you set up one or more conditions and assign a
return value to each.
Example: determining the payment status
- The document contains the word Paid → result paid
- Otherwise → result open
You can then use the result (paid or open)
as a placeholder in the file name or target folder.
The Query is ideal whenever you want to sort documents into fixed categories based on their content
without outputting the search term itself.
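As an illustration of the logic (not the program's code), the payment-status query can be sketched in Python as an ordered list of conditions, each paired with its return value, plus a fallback. The names `run_query` and `PAYMENT_STATUS` are hypothetical. The assumptions that conditions are checked in order, that the first match wins, and that Paid must appear as a whole, case-sensitive word belong to this sketch, not to documented program behavior.

```python
import re
from typing import Callable, List, Tuple

# A condition receives the document text and returns True or False.
Condition = Callable[[str], bool]

# Hypothetical query: each condition is paired with the value it returns.
PAYMENT_STATUS: List[Tuple[Condition, str]] = [
    (lambda text: re.search(r"\bPaid\b", text) is not None, "paid"),
]

def run_query(text: str, conditions: List[Tuple[Condition, str]], otherwise: str) -> str:
    """Return the value of the first matching condition, else the fallback."""
    for condition, result in conditions:
        if condition(text):
            return result
    return otherwise

print(run_query("Invoice RE-2024-0042, status: Paid", PAYMENT_STATUS, "open"))    # paid
print(run_query("Invoice RE-2024-0043, due in 14 days", PAYMENT_STATUS, "open"))  # open
```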
Query (with list)
The Query (with list) is the most powerful variant. Instead of individual conditions, you set up an
entire list of search terms, each with an assigned result value. The program checks the content
against the list and returns the result value assigned to the matching entry. This keeps even a large number of cases clear and manageable.
Example: turning codes into full names
| Found in the document | Result value |
|---|---|
| MM | Mustermann Ltd |
| BS | Example & Sons Inc |
| MH | Sample Trading Corp |
If the program finds the code MM in the document, the rule returns Mustermann Ltd. Typical
uses are determining document types, departments, suppliers or
customers based on a maintained mapping list.
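The lookup behind the code table can be sketched in Python as a dictionary of search terms and result values. The function `lookup_code` is hypothetical, and matching whole, case-sensitive words in list order is an assumption made for this sketch; it keeps a code such as MM from matching inside an unrelated word.

```python
import re

# Mapping list from the example: search term -> result value.
SUPPLIER_CODES = {
    "MM": "Mustermann Ltd",
    "BS": "Example & Sons Inc",
    "MH": "Sample Trading Corp",
}

def lookup_code(text: str, mapping: dict[str, str], default: str = "") -> str:
    """Return the result value of the first search term found as a whole word."""
    for term, result in mapping.items():
        if re.search(rf"\b{re.escape(term)}\b", text):
            return result
    return default

print(lookup_code("Supplier code: MM", SUPPLIER_CODES))         # Mustermann Ltd
print(lookup_code("Ref. BS/2024", SUPPLIER_CODES))              # Example & Sons Inc
print(lookup_code("COMMA", SUPPLIER_CODES, default="unknown"))  # unknown
```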
Fixed list or central (dynamic) list
You can maintain the list in two ways:
- Fixed in the rule: the assignments belong to this one rule only. Ideal for short, stable lists.
- Central as a dynamic list: you maintain the list once in the program options and can reuse it
across multiple rules and profiles. Changes to the list take effect everywhere immediately. Ideal for longer or
frequently changing assignments.
Tip: Use a central (dynamic) list when the same assignment is
needed in several profiles or has to be extended regularly. That way you maintain it in just one place.
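The difference between the two options resembles the difference between a shared object and a private copy. In this Python analogy (not how the program stores its lists), two rules in different profiles reference one central list, while a third rule keeps its own fixed copy. The class `QueryListRule` and its `resolve` method are hypothetical, and the lookup is simplified to an exact code match.

```python
from dataclasses import dataclass

@dataclass
class QueryListRule:
    """A rule that resolves a code through a mapping list (hypothetical)."""
    name: str
    mapping: dict[str, str]

    def resolve(self, code: str, default: str = "not found") -> str:
        return self.mapping.get(code, default)

# Central (dynamic) list: maintained once, referenced by several rules.
central_suppliers = {"MM": "Mustermann Ltd"}
invoice_rule = QueryListRule("Supplier (invoice profile)", central_suppliers)
delivery_rule = QueryListRule("Supplier (delivery note profile)", central_suppliers)

# Fixed list: a private copy that belongs to this one rule only.
fixed_rule = QueryListRule("Supplier (fixed)", dict(central_suppliers))

# Extend the central list in one place:
central_suppliers["BS"] = "Example & Sons Inc"

print(invoice_rule.resolve("BS"))   # Example & Sons Inc
print(delivery_rule.resolve("BS"))  # Example & Sons Inc
print(fixed_rule.resolve("BS"))     # not found
```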
Practical example: combining data types
Suppose you process incoming invoices and want to name and file them sensibly. To do so, you combine several rules
with different data types:
| Rule | Data type | Result |
|---|---|---|
| Invoice number | Text | RE-2024-0042 |
| Invoice date | Date | reformatted to 2024-12-15 |
| Supplier | Query (with list) | code → Mustermann Ltd |
| Payment status | Query | paid or open |
From this, you can assemble the following file name, for example:
<RuleId:2(InvoiceDate){Year4}>-<RuleId:2(InvoiceDate){Month}>-<RuleId:2(InvoiceDate){Day}>_<RuleId:3(Supplier)>_<RuleId:1(InvoiceNumber)>_<RuleId:4(PaymentStatus)>.pdf
Result: 2024-12-15_Mustermann Ltd_RE-2024-0042_paid.pdf
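To make the mechanics of the placeholders visible, here is a minimal Python sketch that fills in the file name template from a set of rule results. The placeholder syntax comes from this tutorial; the `render` function, the `results` dictionary and the support for only four date building blocks are hypothetical simplifications, since the program resolves placeholders itself.

```python
import re
from datetime import date

# Matches <RuleId:N(Name)> and <RuleId:N(Name){Part}>.
PLACEHOLDER = re.compile(r"<RuleId:(\d+)\((\w+)\)(?:\{(\w+)\})?>")

def render(template: str, results: dict[int, object]) -> str:
    """Replace each placeholder with the result of the referenced rule."""
    def substitute(match: re.Match) -> str:
        rule_id, part = int(match.group(1)), match.group(3)
        value = results[rule_id]
        if part is None:
            return str(value)  # plain placeholder: use the value as-is
        # Date building blocks (only a few are sketched here).
        date_parts = {
            "Year4": f"{value.year:04d}",
            "Month": f"{value.month:02d}",
            "Day": f"{value.day:02d}",
            "MonthName": value.strftime("%B"),
        }
        return date_parts[part]
    return PLACEHOLDER.sub(substitute, template)

# Hypothetical rule results for the four rules in the table.
results = {
    1: "RE-2024-0042",        # Invoice number (Text)
    2: date(2024, 12, 15),    # Invoice date (Date)
    3: "Mustermann Ltd",      # Supplier (Query with list)
    4: "paid",                # Payment status (Query)
}

template = ("<RuleId:2(InvoiceDate){Year4}>-<RuleId:2(InvoiceDate){Month}>-"
            "<RuleId:2(InvoiceDate){Day}>_<RuleId:3(Supplier)>_"
            "<RuleId:1(InvoiceNumber)>_<RuleId:4(PaymentStatus)>.pdf")

print(render(template, results))  # 2024-12-15_Mustermann Ltd_RE-2024-0042_paid.pdf
```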