70.2 Text Extraction

70.2.1 Overview

Text extraction extends the placeholder logic so that you can read specific partial values out of emails or attachments, e.g. an invoice number from the subject, a booking code from the body, or a contract partner from an attached TXT or CSV file.

In contrast to the fixed placeholders (see chapter 70.1), text extraction rules are configurable: for each rule, you define which part of the email is searched, which boundaries (from / to) delimit the range, and which additional constraint (regex, number of characters) applies.


70.2.2 Direct Regex in Subject or Body

The simplest variant is direct regex extraction - without a separate rule definition. In any input field you can write:

<BeginOfSubjectRegex>INV-(\d{4}-\d{3})<EndOfRegex>

The program applies the regex to the subject and replaces the placeholder with the first match group. If the pattern contains several groups, they can be referenced as $1, $2, etc.

Analogously, there is <BeginOfBodyRegex>...<EndOfRegex> for the body.

Example:

Subject | Regex | Result
Invoice INV-2026-456 Mueller GmbH | <BeginOfSubjectRegex>INV-(\d{4}-\d{3})<EndOfRegex> | 2026-456
Order Number 78901 | <BeginOfSubjectRegex>Number (\d+)<EndOfRegex> | 78901
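
The substitution described in this section can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the function name expand_subject_regex is hypothetical, Python's re module stands in for whatever regex engine the program actually uses, and returning an empty string when nothing matches is a guess, not documented behaviour.

```python
import re

# Captures the pattern embedded between the two placeholder markers.
PLACEHOLDER = re.compile(r"<BeginOfSubjectRegex>(.*?)<EndOfRegex>")


def expand_subject_regex(field, subject):
    """Replace each subject-regex placeholder in `field` with the first match group."""

    def substitute(placeholder_match):
        found = re.search(placeholder_match.group(1), subject)
        if found is None:
            return ""  # assumption: no match yields an empty string
        # Use group 1 if the pattern defines a group, else the whole match.
        value = found.group(1) if found.groups() else found.group(0)
        return value or ""

    return PLACEHOLDER.sub(substitute, field)


print(expand_subject_regex(
    r"<BeginOfSubjectRegex>INV-(\d{4}-\d{3})<EndOfRegex>",
    "Invoice INV-2026-456 Mueller GmbH",
))  # prints 2026-456
```

Text outside the placeholder is left untouched, so the same call also works for a field that mixes fixed text with a regex placeholder.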

70.2.3 Text Extraction Rules

For more complex extractions (e.g. multi-step range narrowing, application to attachments, encoding control), use text extraction rules, which are defined in the profile editor under Text Extraction.

For each rule you configure:

Field | Description
Name | Unique identifier (for the placeholder reference)
Source | Message body or Attachment (with file filter)
Encoding | ANSI, UTF-8, Unicode, or explicit code page (for attachments with a special format)
Range from | Search string or regex from which extraction begins
Range to | Search string or regex at which extraction ends
Constraint | First X characters, Last X characters, or Regex on the extracted range
Value conversion | Optional lookup table that further maps the extracted value (e.g. code -> plain text)

70.2.4 Using the Rule as a Placeholder

You reference a configured rule as a placeholder:

Placeholder | Effect
<MRuleId:5(InvoiceNumber)> | Applies the rule with ID 5 (display name “InvoiceNumber”) to the message body
<FRuleId:7(BookingCode)> | Applies the rule with ID 7 (display name “BookingCode”) to the matching attachment

MRuleId stands for Message-Rule (message body), FRuleId for File-Rule (file attachment). The ID is the unique key of the rule; the suffix in parentheses is only a readable display name and is ignored during processing.
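
To make the role of the ID and the display name concrete, here is a minimal Python sketch that parses such a placeholder. The function name parse_rule_reference and the regular expression are illustrative assumptions, not the program's actual parser; treating the display name as optional is likewise an assumption.

```python
import re
from typing import Tuple

# M = message body rule, F = file attachment rule.
# The display name in parentheses is matched but never captured,
# mirroring the statement that it is ignored during processing.
RULE_REF = re.compile(r"<([MF])RuleId:(\d+)(?:\([^)]*\))?>")


def parse_rule_reference(placeholder: str) -> Tuple[str, int]:
    """Return (source, rule_id) for a rule placeholder."""
    match = RULE_REF.fullmatch(placeholder)
    if match is None:
        raise ValueError(f"not a rule placeholder: {placeholder!r}")
    source = "message" if match.group(1) == "M" else "attachment"
    return source, int(match.group(2))


print(parse_rule_reference("<MRuleId:5(InvoiceNumber)>"))  # ('message', 5)
print(parse_rule_reference("<FRuleId:7(BookingCode)>"))    # ('attachment', 7)
```

Because only the prefix and the numeric ID are returned, a placeholder still resolves to the same rule in this sketch even if the display name in it is out of date.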

You select a rule in the placeholder menu, where all defined rules appear under “Text Extraction”.


70.2.5 Range Narrowing

The two-step range narrowing (from + to) is the central logic:

  1. Range from: The search string (or regex) identifies the starting position. Everything before it is ignored.
  2. Range to: The search string (or regex) identifies the end position. Everything after it is ignored.
  3. The text in between is the raw match.
  4. The constraint is applied to the raw match (e.g. first 20 characters).
  5. Optional: value conversion through a lookup table.

Example email body:

Dear Sir or Madam,
We hereby send you Invoice Number INV-2026-456
with a total amount of 1,234.56 EUR.
Best regards

Rule:

  • Range from: Number
  • Range to: with
  • Constraint: none

Result: INV-2026-456
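
The five steps can be sketched in Python as follows. This is an illustration, not the program's implementation: the function extract_range and its parameters are hypothetical, only plain search strings are handled (no regex boundaries), only the "first X characters" constraint is shown, and trimming surrounding whitespace from the raw match is an assumption made so that the sketch reproduces the result above.

```python
def extract_range(text, range_from, range_to, first_chars=None, lookup=None):
    """Two-step range narrowing with optional constraint and value conversion."""
    # Step 1: everything up to and including the start marker is ignored.
    start = text.find(range_from)
    if start == -1:
        return ""  # assumption: a missing start marker yields no result
    start += len(range_from)

    # Step 2: everything from the end marker onwards is ignored.
    end = text.find(range_to, start)
    # Step 3: the text in between is the raw match.
    raw = text[start:] if end == -1 else text[start:end]
    raw = raw.strip()  # assumption: surrounding whitespace is trimmed

    # Step 4: apply the constraint (here: first X characters).
    if first_chars is not None:
        raw = raw[:first_chars]

    # Step 5: optional value conversion through a lookup table.
    if lookup is not None:
        raw = lookup.get(raw, raw)
    return raw


body = (
    "Dear Sir or Madam,\n"
    "We hereby send you Invoice Number INV-2026-456\n"
    "with a total amount of 1,234.56 EUR.\n"
    "Best regards"
)

print(extract_range(body, "Number", "with"))  # prints INV-2026-456
```

Note that the end marker is searched only after the start marker, so an earlier occurrence of "with" in the text would not cut the range short in this sketch.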


70.2.6 Encoding and Attachment Sources

For file-based extraction (source: attachment), the program reads the attachment with the configured encoding:

Encoding | When to use
ANSI | Classic Windows text files
UTF-8 | Modern text files, JSON, XML
Unicode | UTF-16 Little-Endian (typical Windows email bodies)
Code page | Explicit code page (e.g. 1252, 850) for legacy formats

Text extraction only works for pure text attachments (e.g. TXT, CSV, XML, JSON, HTML). Binary formats are not supported.
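
Why the encoding setting matters can be shown with a small Python sketch that decodes attachment bytes according to the four options. The mapping is an assumption for illustration only: "ANSI" is taken to mean Windows-1252 (on a real system it depends on the system locale), and the function name decode_attachment is hypothetical.

```python
# Assumed mapping from the profile setting to a Python codec name.
CODECS = {
    "ANSI": "cp1252",      # assumption: Western European Windows code page
    "UTF-8": "utf-8-sig",  # also accepts files that start with a byte order mark
    "Unicode": "utf-16-le",
}


def decode_attachment(data: bytes, encoding: str) -> str:
    """Decode raw attachment bytes using the configured encoding."""
    codec = CODECS.get(encoding)
    if codec is None:
        # Anything else is treated as an explicit code page number, e.g. "850".
        codec = f"cp{int(encoding)}"
    text = data.decode(codec)
    # Drop a leading byte order mark if the codec left one in place.
    return text[1:] if text.startswith("\ufeff") else text


sample = "Müller GmbH"
print(decode_attachment(sample.encode("cp850"), "850"))
print(decode_attachment(sample.encode("utf-16-le"), "Unicode"))
```

Decoding the same bytes with the wrong setting typically turns umlauts and other non-ASCII characters into garbage, which then prevents search strings containing those characters from matching.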


70.2.7 Use Case

Invoice number from subject

Email subject: “Invoice INV-2026-456 from May 7.”

Direct regex: <BeginOfSubjectRegex>INV-([0-9-]+)<EndOfRegex>

Result: 2026-456

Use in the path construction: <EmailYear4>-<EmailMonth>-<EmailDay>_<BeginOfSubjectRegex>...<EndOfRegex>.pdf
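
The resulting file name can be previewed with a short Python sketch. Several details are assumptions: the function build_filename is hypothetical, <EmailMonth> and <EmailDay> are assumed to be zero-padded two-digit values, and the email date is assumed to be May 7, 2026 (the source only gives the day and month).

```python
import re
from datetime import date


def build_filename(subject: str, email_date: date) -> str:
    """Combine the email date with the invoice number taken from the subject."""
    found = re.search(r"INV-([0-9-]+)", subject)
    number = found.group(1) if found else ""  # assumption: empty if no match
    # Assumed rendering of <EmailYear4>-<EmailMonth>-<EmailDay>.
    return f"{email_date:%Y-%m-%d}_{number}.pdf"


print(build_filename("Invoice INV-2026-456 from May 7.", date(2026, 5, 7)))
# prints 2026-05-07_2026-456.pdf
```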


70.2.8 Tips

  • The value conversion through a lookup table is powerful: you can convert an extracted code directly into readable plain text (see chapter 70.3).
  • Test new rules on sample emails in the profile editor. The preview shows the result directly.