8.4 Filter

8.4.1 Overview and Basic Principle

Filters allow you to precisely define which PDF files should be processed by a profile. If no filter criteria are defined, all PDF files in the monitored folders will be processed.

How Filters Work

For each new PDF file, the program checks whether the file meets all defined filter criteria:

  • If yes: The file is processed.
  • If no: The file is skipped. If no profile processes the file because of its filters, the file receives the status “No Match”.
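
The check described above can be pictured with a short Python sketch. This is only an illustration, not the program's actual implementation: the names Profile, matches, and status_for are hypothetical, and a file is represented here by a plain dictionary.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model: a criterion is any function that receives a file
# description and returns True (criterion met) or False (not met).
Criterion = Callable[[dict], bool]


@dataclass
class Profile:
    name: str
    criteria: List[Criterion]

    def matches(self, pdf: dict) -> bool:
        # all() of an empty list is True, so a profile without filter
        # criteria accepts every file.
        return all(criterion(pdf) for criterion in self.criteria)


def status_for(pdf: dict, profiles: List[Profile]) -> str:
    # Collect every profile whose criteria are all met by this file.
    matching = [p.name for p in profiles if p.matches(pdf)]
    if not matching:
        return "No Match"
    return "Processed by: " + ", ".join(matching)
```

In this sketch, a file reaches “No Match” only when every profile rejects it, which mirrors the status description above.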

The Filter Tabs

The filter settings are divided into the following tabs:

Tab               Description
File Properties   Filters by file name, path, size, and date
PDF Data          Filters by PDF metadata, content, attachments, and barcodes
Results Preview   Shows filter results for example files
Overlap Check     Checks for conflicts with other profiles

8.4.2 Tab: File Properties

Here you filter by properties of the file itself.

Path Contains / Does Not Contain

Filters by the storage location of the file.

Example: To process only files from the “Invoices” subfolder:

  • Path contains: Invoices

File Name Contains / Does Not Contain

Filters by the name of the PDF file.

Examples:

  • File name contains: Invoice (processes all files with “Invoice” in the name)
  • File name does not contain: DRAFT (ignores files with “DRAFT” in the name)

File Size

Filters by the size of the file. Available comparison operators:

  • Less than
  • Greater than
  • Between

Units: Bytes (B), Kilobytes (KB), Megabytes (MB)

Use Case: Route very large files (e.g., > 50 MB) to a separate profile for compression.

Creation Date / Modification Date

Filters by the file date. Available options:

  • Between: Specify two fixed dates
  • Older than: X days/weeks/months/years
  • Newer than: X days/weeks/months/years

Use Case: Only process files created in the last 7 days.
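
To make the size and date comparisons concrete, here is a minimal Python sketch. It rests on assumptions: the function names are hypothetical, 1 MB is taken as 1024 × 1024 bytes (the program may define the unit differently), and the modification time is used because the creation time is not available in the same way on every platform.

```python
import os
import time
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def is_larger_than_mb(path, megabytes: float) -> bool:
    # Assumption: 1 MB = 1024 * 1024 bytes.
    return os.stat(path).st_size > megabytes * 1024 * 1024


def is_newer_than_days(path, days: float, now: Optional[float] = None) -> bool:
    # Compares the file's modification time with the current time.
    # "now" can be passed explicitly to make the check reproducible.
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime < days * SECONDS_PER_DAY
```

With these helpers, is_larger_than_mb(path, 50) corresponds to the “Greater than 50 MB” use case and is_newer_than_days(path, 7) to “Newer than 7 days”.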


8.4.3 Tab: PDF Data

Here you filter by contents and properties of the PDF file itself.

PDF Metadata

PDF files can contain metadata set by the creator program.

Filter                                 Description
Author contains / does not contain     The author of the document
Title contains / does not contain      The document title
Subject contains / does not contain    The subject
Keywords contain / do not contain      Stored keywords
Creator contains / does not contain    The creation program
Producer contains / does not contain   The conversion program

Tip: You can view the metadata of a PDF file in the document properties (right-click > Properties in many PDF viewers).

Use Case: A scanner stores its name as “Creator”. This allows you to process files from different scanners differently.

Document Text Contains / Does Not Contain

Searches the entire text of the PDF document.

Examples:

  • Document text contains: Invoice (only PDFs with the word “Invoice”)
  • Document text contains: Mustermann GmbH (only PDFs from this sender)

Page Range: Optionally, you can restrict the search area to specific pages:

  • All pages (default)
  • First page only
  • Last page only
  • Page range (e.g., “1-3”)
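
The four page-range options can be thought of as a selection of page numbers. The following Python sketch is an illustration under assumptions: the keywords "all", "first", and "last" and the function name pages_to_search are hypothetical, and how the program treats ranges that exceed the document length is not documented here, so the sketch simply clamps them.

```python
from typing import List


def pages_to_search(spec: str, page_count: int) -> List[int]:
    """Return the 1-based page numbers selected by a page-range setting."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))
    if spec == "first":
        return [1] if page_count >= 1 else []
    if spec == "last":
        return [page_count] if page_count >= 1 else []
    # Otherwise expect a single page ("2") or a range such as "1-3".
    start_text, _, end_text = spec.partition("-")
    start = int(start_text)
    end = int(end_text) if end_text else start
    # Clamp the range to the pages that actually exist (assumption).
    start = max(start, 1)
    end = min(end, page_count)
    return list(range(start, end + 1))
```

For a two-page document, the range “1-3” selects pages 1 and 2 in this sketch.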

Page Count

Filters by the number of pages. Comparison operators:

  • Less than
  • Greater than
  • Between

Use Case: Process single-page documents differently than multi-page ones.

Character Count

Filters by the number of characters in the document.

Use Case: Route PDFs without text (character count = 0) to OCR processing.

PDF Attachments

Some PDF files contain embedded attachments.

Attachment Count: Filters by the number of embedded attachments. Comparison operators:

  • Less than
  • Greater than
  • Between

Attachment Name Contains / Does Not Contain: Filters by the name of embedded attachments.

Example: ZUGFeRD invoices often contain an attachment named factur-x.xml:

  • Attachment name contains: factur-x.xml

Use Case: Process PDFs with attachments (e.g., ZUGFeRD invoices) separately.

Barcode

Filters by barcode contents in the PDF.

Option                           Description
PDF must contain barcode         Only PDFs with at least one barcode
PDF must not contain barcode     Only PDFs without barcodes
Barcode value contains           Filters by barcode content
Barcode value does not contain   Excludes certain barcode values

Supported Barcode Types:

  • 1D Codes: Code128, Code39, EAN-13, EAN-8, UPC-A, ITF, Codabar
  • 2D Codes: QR Code, DataMatrix, PDF417, Aztec

Tip: Barcode detection can use machine learning for higher accuracy. You can enable or disable this option in Program Options under Processing.


8.4.4 Tab: Results Preview

The results preview shows you how your filter settings affect the example files.

Prerequisite

Add at least 5 representative PDF files on the Example Files tab that correspond to the typical documents to be processed.

Display

For each example file, one of the following results is shown:

  • Yes: The file meets all filter criteria.
  • No: The file fails at least one filter criterion. The unmet criterion is indicated.

How to Use the Preview Effectively

  1. Add files that should be processed (expected result: “Yes”)
  2. Add files that should NOT be processed (expected result: “No”)
  3. Check whether the results match your expectations
  4. Adjust filters as needed

Tip: The results preview is particularly important for complex filters with AND/OR combinations or regular expressions.


8.4.5 Tab: Overlap Check

This tab shows potential conflicts with other profiles.

How It Works

The program checks whether other profiles:

  • Use the same monitored folders
  • Have similar or overlapping filter criteria

Displayed Information

Column              Description
Profile             Name of the possibly overlapping profile
Monitored Folders   Common monitored folders
Filter Overlap      Type of possible overlap

Why Is This Important?

If multiple profiles could process the same files:

  • The file may be processed multiple times
  • The processing order could be unpredictable
  • Conflicts with file operations may occur

Recommendation: Ensure your filters are unambiguous or enable “Stop processing after applying” in the first matching profile.
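
The folder part of the overlap check can be sketched in a few lines of Python. This is a simplified illustration, not the program's algorithm: the function name shared_folders and the data layout (profile name mapped to a set of folder paths) are assumptions, and the comparison of filter criteria is left out entirely.

```python
from typing import Dict, List, Set


def shared_folders(profiles: Dict[str, Set[str]], current: str) -> Dict[str, List[str]]:
    """Map each other profile to the monitored folders it shares with `current`."""
    own = profiles[current]
    overlaps: Dict[str, List[str]] = {}
    for name, folders in profiles.items():
        if name == current:
            continue
        # Set intersection yields the folders monitored by both profiles.
        common = sorted(own & folders)
        if common:
            overlaps[name] = common
    return overlaps
```

A profile that shares no folder with the current one does not appear in the result; for the others, the filters decide whether the same file can really be picked up twice.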


8.4.6 AND/OR Logic

When entering multiple terms in a filter field, you can combine them with logical operators.

AND Combination

All terms must be present.

Syntax: <AND> or <UND>

Example: Invoice<AND>Mustermann

  • Matches: “Invoice to Mustermann GmbH”
  • Does not match: “Invoice to Schmidt GmbH”

OR Combination

At least one of the terms must be present.

Syntax: <OR> or <ODER>

Example: Invoice<OR>Bill<OR>Receipt

  • Matches any PDF containing “Invoice”, “Bill”, or “Receipt”

Combinations

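You can combine AND and OR in one filter. The filter is first split at every <AND>; each resulting segment can contain several terms joined by <OR>. A segment is satisfied if at least one of its terms is present, and the filter matches only if every segment is satisfied.

Example: Invoice<AND>2024<OR>Bill<AND>2024

For clarification, think of <AND> as a line break:

Invoice
<AND>
2024<OR>Bill
<AND>
2024

This means: (Invoice) AND (2024 OR Bill) AND (2024)

Result:

  • Matches “Invoice dated 15.12.2024” (contains “Invoice” and “2024”)
  • Matches “Invoice Bill 2024” (contains “Invoice”, “Bill”, and “2024”)
  • Does not match “Invoice dated 15.12.2023” (does not contain “2024”)
  • Does not match “Bill dated 15.12.2024” (does not contain “Invoice”)

Note: Because the last segment already requires “2024”, the middle segment (2024 OR Bill) is always satisfied whenever the filter matches. The filter therefore behaves like Invoice<AND>2024; it is not evaluated as (Invoice AND 2024) OR (Bill AND 2024).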

Tip: Always test complex filters with the results preview to ensure the desired result is achieved.
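
The evaluation rule from this section (split at AND first, then check the OR terms inside each segment) can be reproduced with a small Python sketch. It is an illustration under assumptions: the function name matches is hypothetical, terms are compared case-insensitively as described for normal text search, and whitespace around terms is kept as written.

```python
import re

# Both the English and the German operator spellings are accepted.
AND_TOKEN = re.compile(r"<(?:AND|UND)>")
OR_TOKEN = re.compile(r"<(?:OR|ODER)>")


def matches(filter_expr: str, text: str) -> bool:
    haystack = text.lower()
    # Every AND segment must be satisfied ...
    for segment in AND_TOKEN.split(filter_expr):
        terms = OR_TOKEN.split(segment)
        # ... and a segment is satisfied if any of its OR terms occurs.
        if not any(term.lower() in haystack for term in terms):
            return False
    return True
```

Running the combined example through this sketch gives the same results as listed in the Combinations section, including the rejection of a text that contains “Bill” and “2024” but not “Invoice”.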


8.4.7 Regular Expressions

For advanced filtering, regular expressions (Regex) are available.

Syntax

Enclose the regular expression with: <BeginOfRegex>PATTERN<EndOfRegex>

Examples

Regex                                           Description                    Matches
<BeginOfRegex>INV-\d{5}<EndOfRegex>             Invoice number with 5 digits   INV-12345, INV-00001
<BeginOfRegex>^Invoice<EndOfRegex>              Starts with “Invoice”          “Invoice No. 123”
<BeginOfRegex>\d{2}\.\d{2}\.\d{4}<EndOfRegex>   Date in format DD.MM.YYYY      15.12.2024
<BeginOfRegex>€\s*\d+[,\.]\d{2}<EndOfRegex>     Euro amount                    € 123.45 or €99.00

Commonly Used Regex Elements

Element    Meaning
\d         A digit (0-9)
\d{5}      Exactly 5 digits
\d+        One or more digits
\s         A whitespace character
\s*        Any number of whitespace characters (including none)
^          Beginning of line/text
$          End of line/text
.          Any single character
.*         Any number of any characters
[A-Z]      An uppercase letter
[a-zA-Z]   A letter (upper or lowercase)

Tip: Test your regular expressions on websites like regex101.com before using them in the filter.
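
The example patterns can also be tried out locally. The sketch below uses Python's re module purely for experimentation; the regex engine used by the program may differ in details, and the dictionary keys and the helper found are hypothetical names. Only the part between <BeginOfRegex> and <EndOfRegex> is passed to the engine.

```python
import re

# The patterns from the examples table, without the enclosing markers.
PATTERNS = {
    "invoice_number": r"INV-\d{5}",
    "starts_with_invoice": r"^Invoice",
    "date": r"\d{2}\.\d{2}\.\d{4}",
    "euro_amount": r"€\s*\d+[,\.]\d{2}",
}


def found(name: str, text: str) -> bool:
    # re.search looks for the pattern anywhere in the text.
    return re.search(PATTERNS[name], text) is not None
```

One detail worth checking in such a test: INV-\d{5} also finds a match inside “INV-123456”, because the first five digits already satisfy the pattern. If longer numbers must be excluded, the pattern needs an additional boundary such as \b after the digits.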


8.4.8 Number Ranges

With the syntax <NumberRange{MIN,MAX}> you can filter by number ranges.

Syntax

<NumberRange{Minimum,Maximum}>

Examples

Filter                     Description
<NumberRange{1,99}>        Numbers from 1 to 99
<NumberRange{2020,2025}>   Years from 2020 to 2025
<NumberRange{100,999}>     Three-digit numbers

Use Case

Filter for documents with specific customer numbers:

  • Document text contains: Customer number: <NumberRange{5100000,5200000}>

This matches all PDFs containing a customer number between 5100000 and 5200000.
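
One way to picture how such a placeholder could be evaluated is the following Python sketch. It is an assumption about the behavior, not the program's implementation: the placeholder is treated as “any whole number”, each number found at that position is compared with the limits (both inclusive), and the surrounding text is compared case-insensitively. The function name matches_number_range is hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"<NumberRange\{(\d+),(\d+)\}>")


def matches_number_range(filter_expr: str, text: str) -> bool:
    m = PLACEHOLDER.search(filter_expr)
    if m is None:
        # No placeholder: fall back to a plain text search.
        return filter_expr.lower() in text.lower()
    low, high = int(m.group(1)), int(m.group(2))
    # Text before and after the placeholder must appear literally.
    prefix = re.escape(filter_expr[:m.start()])
    suffix = re.escape(filter_expr[m.end():])
    pattern = re.compile(prefix + r"(\d+)" + suffix, re.IGNORECASE)
    # Check every number found at the placeholder position.
    return any(low <= int(hit.group(1)) <= high for hit in pattern.finditer(text))
```

In this sketch, the text “Customer number: 5150000” matches the use case above, while “Customer number: 5300000” does not.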


8.4.9 Dynamic Filter Lists

With dynamic lists, you can make filters flexible without having to change the profile.

How It Works

  1. Create a new list under Extras > Program Options > Dynamic Lists
  2. Add the desired entries (e.g., client names, project numbers)
  3. Use the list in the filter with the syntax: <EntryFromList{ListName}>

Syntax

<EntryFromList{Name of List}>

Example

You have a list “Important Customers” with the following entries:

  • Mustermann GmbH
  • Schmidt AG
  • Meyer & Co

Filter: Document text contains: <EntryFromList{Important Customers}>

The program automatically checks whether any of the list entries appears in the document.
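
Conceptually, the placeholder stands for “any entry of the named list”. The Python sketch below illustrates this idea under assumptions: the lists are held in a plain dictionary, the function name matches_with_list is hypothetical, and entries are compared case-insensitively like normal text search.

```python
import re
from typing import Dict, List

LIST_PLACEHOLDER = re.compile(r"<EntryFromList\{([^}]*)\}>")


def matches_with_list(filter_expr: str, text: str, lists: Dict[str, List[str]]) -> bool:
    m = LIST_PLACEHOLDER.search(filter_expr)
    if m is None:
        # No placeholder: plain text search.
        return filter_expr.lower() in text.lower()
    entries = lists.get(m.group(1), [])
    prefix = filter_expr[:m.start()]
    suffix = filter_expr[m.end():]
    haystack = text.lower()
    # The filter matches as soon as one list entry fits at that position.
    return any((prefix + entry + suffix).lower() in haystack for entry in entries)
```

Adding an entry to the dictionary changes the result without touching the filter expression, which is the flexibility described in this section.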

Advantages

  • Flexibility: New entries can be added to the list at any time
  • Central Management: One list can be used in multiple profiles
  • Clarity: Long lists don’t need to be in the filter itself

Tip: Lists can also be imported from external sources such as Excel files or databases.


8.4.10 Practical Tips

Build Filters Step by Step

Start with a simple filter and expand it gradually:

  1. First filter only by file name
  2. Then add a text filter
  3. Add more complex conditions

Don’t Filter Too Restrictively

Overly strict filters can cause files to be missed:

  • Watch for different spellings (Invoice/INVOICE)
  • Consider typos in source documents
  • Use OR combinations for variants

Case Sensitivity

Normal text search is not case-sensitive: “Invoice”, “INVOICE”, and “invoice” are treated as equal.

Note: Regular expressions are case-sensitive by default. Use the flag (?i) for a case-insensitive regex search:

<BeginOfRegex>(?i)invoice<EndOfRegex> finds “Invoice”, “INVOICE”, “invoice”, etc.
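
The effect of the (?i) flag can be verified with Python's re module, which supports the same inline flag. This is only a local experiment; the program's own regex engine is not necessarily identical, and the helper names are hypothetical.

```python
import re

# Without the flag, the pattern matches only the exact lowercase spelling.
CASE_SENSITIVE = re.compile(r"invoice")
# With (?i) at the start, upper and lower case are treated as equal.
CASE_INSENSITIVE = re.compile(r"(?i)invoice")


def finds_sensitive(text: str) -> bool:
    return CASE_SENSITIVE.search(text) is not None


def finds_insensitive(text: str) -> bool:
    return CASE_INSENSITIVE.search(text) is not None
```

Here finds_sensitive("INVOICE 2024") is False, while finds_insensitive("INVOICE 2024") is True.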

Multiple Profiles Instead of Complex Filters

Sometimes it’s easier to create multiple profiles with simple filters than one profile with very complex filters.

Example: Instead of a complex filter for different document types:

  • Profile 1: Invoices (Filter: Document text contains “Invoice”)
  • Profile 2: Delivery Notes (Filter: Document text contains “Delivery Note”)
  • Profile 3: Orders (Filter: Document text contains “Order”)
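
When splitting work across several simple profiles, it helps to check whether one document can satisfy more than one of them. The Python sketch below models the three example profiles as plain keyword searches; the dictionary and the function name matching_profiles are hypothetical and serve only to illustrate the point.

```python
# Profile name mapped to the term used in "Document text contains".
PROFILE_KEYWORDS = {
    "Invoices": "Invoice",
    "Delivery Notes": "Delivery Note",
    "Orders": "Order",
}


def matching_profiles(text: str) -> list:
    haystack = text.lower()
    # Normal text search is case-insensitive, so compare in lowercase.
    return [name for name, keyword in PROFILE_KEYWORDS.items()
            if keyword.lower() in haystack]
```

A delivery note that mentions an order number contains both “Delivery Note” and “Order”, so it satisfies two of these profiles in the sketch. Simple filters of this kind are therefore worth reviewing on the Overlap Check tab.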