70.2 Text Extraction
70.2.1 Overview
Text extraction extends the placeholder logic so that you can read specific partial values out of emails or attachments, for example an invoice number from the subject, a booking code from the body, or a contract partner from an attached TXT or CSV file.
In contrast to the fixed placeholders (see chapter 70.1), text extraction rules are configurable: per rule, you define which part of the email is searched, with which boundaries (from / to), and with which additional constraint (regex, number of characters).
70.2.2 Direct regex in subject or body
The simplest variant is direct regex extraction - without a separate rule definition. In any input field you can write:
<BeginOfSubjectRegex>INV-(\d{4}-\d{3})<EndOfRegex>
The program applies the regex to the subject and replaces the placeholder with the first capture group of the match. If the regex contains several groups, they can be referenced as $1, $2, etc.
Analogously, there is <BeginOfBodyRegex>...<EndOfRegex> for the body.
Example:
| Subject | Regex | Result |
|---|---|---|
| Invoice INV-2026-456 Mueller GmbH | <BeginOfSubjectRegex>INV-(\d{4}-\d{3})<EndOfRegex> | 2026-456 |
| Order Number 78901 | <BeginOfSubjectRegex>Number (\d+)<EndOfRegex> | 78901 |
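The substitution behaviour described above can be approximated in a few lines of Python. This is only a sketch for reasoning about what a regex will return, not the program's implementation: the helper name `expand_subject_regex` is hypothetical, and returning an empty string for a non-matching regex is an assumption.

```python
import re

# Hypothetical pattern that recognises the placeholder syntax shown above.
PLACEHOLDER = re.compile(r"<BeginOfSubjectRegex>(.*?)<EndOfRegex>")

def expand_subject_regex(template: str, subject: str) -> str:
    """Replace each subject-regex placeholder with the first capture group."""
    def substitute(placeholder_match):
        found = re.search(placeholder_match.group(1), subject)
        if found is None:
            return ""  # assumption: no match yields an empty value
        # Use group 1 if the regex defines a group, otherwise the whole match.
        return found.group(1) if found.re.groups else found.group(0)
    return PLACEHOLDER.sub(substitute, template)

print(expand_subject_regex(
    r"<BeginOfSubjectRegex>INV-(\d{4}-\d{3})<EndOfRegex>",
    "Invoice INV-2026-456 Mueller GmbH",
))  # prints 2026-456
```

Trying a regex this way against a few real subjects is a quick check that the first capture group really is the value you want before entering it in an input field.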
70.2.3 Text Extraction Rules
For more complex extractions (e.g. multi-step range narrowing, application to attachments, encoding control), use text extraction rules, which are defined in the profile editor under Text Extraction.
For each rule you configure:
| Field | Description |
|---|---|
| Name | Unique identifier (for the placeholder reference) |
| Source | Message body or Attachment (with file filter) |
| Encoding | ANSI, UTF-8, Unicode, or explicit code page (for attachments with a special format) |
| Range from | Search string or regex from which extraction begins |
| Range to | Search string or regex at which extraction ends |
| Constraint | First X characters, Last X characters, or Regex on the extracted range |
| Value conversion | Optional lookup table that further maps the extracted value (e.g. code -> plain text) |
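For readers who think in data structures, the fields in the table above can be pictured as one record per rule. The class below is a hypothetical model for illustration only: the attribute names, the default values, and the `first:20`-style constraint notation are all assumptions and do not reflect how the program stores rules.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class ExtractionRule:
    """Hypothetical in-memory model of one text extraction rule."""
    name: str                          # identifier used in the placeholder reference
    source: str                        # "body" or "attachment"
    file_filter: Optional[str] = None  # only meaningful for attachment sources
    encoding: str = "UTF-8"            # ANSI, UTF-8, Unicode, or a code page such as "1252"
    range_from: Optional[str] = None   # search string or regex where extraction begins
    range_to: Optional[str] = None     # search string or regex where extraction ends
    constraint: Optional[str] = None   # made-up notation, e.g. "first:20" or "regex:..."
    value_map: Dict[str, str] = field(default_factory=dict)  # optional lookup table

rule = ExtractionRule(
    name="InvoiceNumber",
    source="body",
    range_from="Number",
    range_to="with",
)
print(rule)
```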
70.2.4 Using the Rule as a Placeholder
You reference a configured rule as a placeholder:
| Placeholder | Effect |
|---|---|
| <MRuleId:5(InvoiceNumber)> | Applies the rule with ID 5 (display name “InvoiceNumber”) to the message body |
| <FRuleId:7(BookingCode)> | Applies the rule with ID 7 (display name “BookingCode”) to the matching attachment |
MRuleId stands for Message-Rule (message body), FRuleId for File-Rule (file attachment). The ID is the unique key of the rule; the suffix in parentheses is only a readable display name and is ignored during processing.
You select rules from the placeholder menu, where all defined rules appear under “Text Extraction”.
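To make the roles of the ID and the display name concrete, the following sketch parses a rule placeholder and keeps only the parts that matter for processing. The names `RuleRef` and `parse_rule_placeholder` are hypothetical, and treating the parenthesized display name as optional is an assumption.

```python
import re
from typing import NamedTuple, Optional

class RuleRef(NamedTuple):
    target: str   # "body" for MRuleId, "attachment" for FRuleId
    rule_id: int  # the unique key of the rule

# The display name in parentheses is matched but deliberately not captured.
RULE_PLACEHOLDER = re.compile(r"<(M|F)RuleId:(\d+)(?:\([^)]*\))?>")

def parse_rule_placeholder(text: str) -> Optional[RuleRef]:
    """Return the rule reference, or None if the text is not a rule placeholder."""
    match = RULE_PLACEHOLDER.fullmatch(text)
    if match is None:
        return None
    target = "body" if match.group(1) == "M" else "attachment"
    return RuleRef(target, int(match.group(2)))

print(parse_rule_placeholder("<MRuleId:5(InvoiceNumber)>"))
print(parse_rule_placeholder("<FRuleId:7(BookingCode)>"))
```

Because only the ID is evaluated, a placeholder in this model still resolves to the same rule when the display name in parentheses differs.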
70.2.5 Range Narrowing
The two-step range narrowing (from + to) is the central logic:
- Range from: Search string identifies the starting position. Everything before it is ignored.
- Range to: Search string identifies the end position. Everything after it is ignored.
- The text in between is the raw match.
- The constraint is applied to the raw match (e.g. first 20 characters).
- Optional: value conversion through a lookup table.
Example email body:
Dear Sir or Madam,
We hereby send you Invoice Number INV-2026-456
with a total amount of 1,234.56 EUR.
Best regards
Rule:
- Range from: Number
- Range to: with
- Constraint: none
Result: INV-2026-456
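The narrowing steps listed above can be sketched as a small Python function. It is a simplified model under stated assumptions: boundaries are plain search strings (regex boundaries are left out), only the "first X characters" constraint is modelled, and surrounding whitespace is trimmed from the raw match, which the example result suggests but the text does not state. The function name `extract` and its parameters are hypothetical.

```python
from typing import Dict, Optional

def extract(text: str, range_from: str, range_to: str,
            first_chars: Optional[int] = None,
            lookup: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Apply range narrowing, an optional constraint, and an optional lookup."""
    start = text.find(range_from)
    if start == -1:
        return None                    # start boundary not found
    start += len(range_from)           # everything up to and including it is ignored
    end = text.find(range_to, start)
    if end == -1:
        return None                    # end boundary not found
    raw = text[start:end].strip()      # assumption: whitespace is trimmed
    if first_chars is not None:
        raw = raw[:first_chars]        # constraint: first X characters
    if lookup is not None:
        raw = lookup.get(raw, raw)     # value conversion; unknown values pass through
    return raw

body = """Dear Sir or Madam,
We hereby send you Invoice Number INV-2026-456
with a total amount of 1,234.56 EUR.
Best regards"""

print(extract(body, "Number", "with"))  # prints INV-2026-456
```

Note that the end boundary is searched only after the start boundary, so an earlier occurrence of the "to" string elsewhere in the text does not cut the range short in this model.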
70.2.6 Encoding and Attachment Sources
For file-based extraction (source: attachment), the program reads the attachment with the configured encoding:
| Encoding | When to use |
|---|---|
| ANSI | Classic Windows text files |
| UTF-8 | Modern text files, JSON, XML |
| Unicode | UTF-16 Little-Endian (typical Windows email bodies) |
| Code page | Explicit code page (e.g. 1252, 850) for legacy formats |
Text extraction only works for pure text attachments (e.g. TXT, CSV, XML, JSON, HTML). Binary formats are not supported.
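Why the encoding setting matters is easiest to see by decoding the same bytes in different ways. In the sketch below, the mapping from the encoding names in the table to Python codec names is an assumption; in particular, "ANSI" is mapped to Windows-1252, although the actual ANSI code page depends on the system locale. The helper name `decode_attachment` is hypothetical.

```python
# Assumed mapping from the encoding names in the table to Python codec names.
CODECS = {
    "ANSI": "cp1252",        # assumption: Western European Windows default
    "UTF-8": "utf-8",
    "Unicode": "utf-16-le",  # UTF-16 Little-Endian, as described in the table
}

def decode_attachment(data: bytes, encoding: str) -> str:
    """Decode attachment bytes using a named encoding or a numeric code page."""
    codec = CODECS.get(encoding)
    if codec is None:
        codec = f"cp{int(encoding)}"  # explicit code page, e.g. "850" -> "cp850"
    return data.decode(codec)

raw = "Müller".encode("cp1252")
print(decode_attachment(raw, "ANSI"))  # correct encoding: the umlaut survives
print(decode_attachment(raw, "850"))   # wrong code page: the umlaut is garbled
```

If an extracted value contains unexpected characters where umlauts or currency symbols should be, a mismatch between the configured encoding and the file's real encoding is a likely cause.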
70.2.7 Use case
Invoice number from subject
Email subject: “Invoice INV-2026-456 from May 7.”
Rule: direct regex <BeginOfSubjectRegex>INV-([0-9-]+)<EndOfRegex>, which returns 2026-456.
Usage: in the path construction as <EmailYear4>-<EmailMonth>-<EmailDay>_<BeginOfSubjectRegex>...<EndOfRegex>.pdf.
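The resulting file name can be previewed with a short Python sketch. Several things here are assumptions: that <EmailYear4>, <EmailMonth>, and <EmailDay> render as a four-digit year and zero-padded two-digit month and day, and that the email was received on 7 May 2026 (the subject gives no year). The helper name `build_filename` is hypothetical.

```python
import re
from datetime import date
from typing import Optional

def build_filename(subject: str, received: date) -> Optional[str]:
    """Combine the email date with the invoice number taken from the subject."""
    match = re.search(r"INV-([0-9-]+)", subject)
    if match is None:
        return None  # no invoice number in the subject
    # Assumption: the date placeholders render as YYYY, MM, and DD.
    return f"{received:%Y-%m-%d}_{match.group(1)}.pdf"

print(build_filename("Invoice INV-2026-456 from May 7.", date(2026, 5, 7)))
# prints 2026-05-07_2026-456.pdf
```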
70.2.8 Tips
- Value conversion through a lookup table is powerful: you can convert an extracted code directly into readable plain text (see chapter 70.3).
- Test new rules on sample emails in the profile editor; the preview shows the result directly.