
Network Working Group                                   P. Guenther, Ed.
Request for Comments: 5228                                Sendmail, Inc.
Obsoletes: 3028                                        T. Showalter, Ed.
Category: Standards Track                                   January 2008

Sieve: An Email Filtering Language

Status of This Memo

This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Abstract

This document describes a language for filtering email messages at time of final delivery. It is designed to be implementable on either a mail client or mail server. It is meant to be extensible, simple, and independent of access protocol, mail architecture, and operating system. It is suitable for running on a mail server where users may not be allowed to execute arbitrary programs, such as on black box Internet Message Access Protocol (IMAP) servers, as the base language has no variables, loops, or ability to shell out to external programs.

Guenther & Showalter

Standards Track

[Page 1]

RFC 5228

Sieve: An Email Filtering Language

January 2008

Table of Contents

1. Introduction
   1.1. Conventions Used in This Document
   1.2. Example Mail Messages
2. Design
   2.1. Form of the Language
   2.2. Whitespace
   2.3. Comments
   2.4. Literal Data
      2.4.1. Numbers
      2.4.2. Strings
         2.4.2.1. String Lists
         2.4.2.2. Headers
         2.4.2.3. Addresses
         2.4.2.4. Encoding Characters Using "encoded-character"
   2.5. Tests
      2.5.1. Test Lists
   2.6. Arguments
      2.6.1. Positional Arguments
      2.6.2. Tagged Arguments
      2.6.3. Optional Arguments
      2.6.4. Types of Arguments
   2.7. String Comparison
      2.7.1. Match Type
      2.7.2. Comparisons across Character Sets
      2.7.3. Comparators
      2.7.4. Comparisons against Addresses
   2.8. Blocks
   2.9. Commands
   2.10. Evaluation
      2.10.1. Action Interaction
      2.10.2. Implicit Keep
      2.10.3. Message Uniqueness in a Mailbox
      2.10.4. Limits on Numbers of Actions
      2.10.5. Extensions and Optional Features
      2.10.6. Errors
      2.10.7. Limits on Execution
3. Control Commands
   3.1. Control if
   3.2. Control require
   3.3. Control stop
4. Action Commands
   4.1. Action fileinto
   4.2. Action redirect
   4.3. Action keep
   4.4. Action discard
5. Test Commands
   5.1. Test address
   5.2. Test allof
   5.3. Test anyof
   5.4. Test envelope
   5.5. Test exists
   5.6. Test false
   5.7. Test header
   5.8. Test not
   5.9. Test size
   5.10. Test true
6. Extensibility
   6.1. Capability String
   6.2. IANA Considerations
      6.2.1. Template for Capability Registrations
      6.2.2. Handling of Existing Capability Registrations
      6.2.3. Initial Capability Registrations
   6.3. Capability Transport
7. Transmission
8. Parsing
   8.1. Lexical Tokens
   8.2. Grammar
   8.3. Statement Elements
9. Extended Example
10. Security Considerations
11. Acknowledgments
12. Normative References
13. Informative References
14. Changes from RFC 3028


1. Introduction

This memo documents a language that can be used to create filters for electronic mail. It is not tied to any particular operating system or mail architecture. It requires the use of [IMAIL]-compliant messages, but should otherwise generalize to many systems.

The language is powerful enough to be useful but limited in order to allow for a safe server-side filtering system. The intention is to make it impossible for users to do anything more complex (and dangerous) than write simple mail filters, along with facilitating the use of graphical user interfaces (GUIs) for filter creation and manipulation. The base language was not designed to be Turing-complete: it does not have a loop control structure or functions.

Scripts written in Sieve are executed during final delivery, when the message is moved to the user-accessible mailbox. In systems where the Mail Transfer Agent (MTA) does final delivery, such as traditional Unix mail, it is reasonable to filter when the MTA deposits mail into the user's mailbox.

There are a number of reasons to use a filtering system. Mail traffic for most users has been increasing due to increased usage of email, the emergence of unsolicited email as a form of advertising, and increased usage of mailing lists.

Experience at Carnegie Mellon has shown that if a filtering system is made available to users, many will make use of it in order to file messages from specific users or mailing lists. However, many others did not make use of the Andrew system's FLAMES filtering language [FLAMES] due to difficulty in setting it up.

Because of the expectation that users will make use of filtering if it is offered and easy to use, this language has been made simple enough to allow many users to make use of it, but rich enough that it can be used productively. However, it is expected that GUI-based editors will be the preferred way of editing filters for a large number of users.

1.1. Conventions Used in This Document

In the sections of this document that discuss the requirements of various keywords and operators, the following conventions have been adopted. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [KEYWORDS].


Each section on a command (test, action, or control) has a line labeled "Usage:". This line describes the usage of the command, including its name and its arguments. Required arguments are listed inside angle brackets ("<" and ">"). Optional arguments are listed inside square brackets ("[" and "]"). Each argument is followed by its type, so "<key: string>" represents an argument called "key" that is a string. Literal strings are represented with double-quoted strings. Alternatives are separated with slashes, and parentheses are used for grouping, similar to [ABNF].

In the "Usage:" line, there are three special pieces of syntax that are frequently repeated, MATCH-TYPE, COMPARATOR, and ADDRESS-PART. These are discussed in sections 2.7.1, 2.7.3, and 2.7.4, respectively.

The formal grammar for these commands is defined in section 8 and is the authoritative reference on how to construct commands, but the formal grammar does not specify the order, semantics, number or types of arguments to commands, or the legal command names. The intent is to allow for extension without changing the grammar.

1.2. Example Mail Messages

The following mail messages will be used throughout this document in examples.

   Message A
   -----------------------------------------------------------
   Date: Tue, 1 Apr 1997 09:06:31 -0800 (PST)
   From: coyote@desert.example.org
   To: roadrunner@acme.example.com
   Subject: I have a present for you

   Look, I'm sorry about the whole anvil thing, and I really
   didn't mean to try and drop it on you from the top of the
   cliff. I want to try to make it up to you. I've got some
   great birdseed over here at my place--top of the line
   stuff--and if you come by, I'll have it all wrapped up
   for you. I'm really sorry for all the problems I've caused
   for you over the years, but I know we can work this out.
   --
   Wile E. Coyote   "Super Genius"   coyote@desert.example.org
   -----------------------------------------------------------


   Message B
   -----------------------------------------------------------
   From: youcouldberich!@reply-by-postal-mail.invalid
   Sender: b1ff@de.res.example.com
   To: rube@landru.example.com
   Date: Mon, 31 Mar 1997 18:26:10 -0800
   Subject: $$$ YOU, TOO, CAN BE A MILLIONAIRE! $$$

   YOU MAY HAVE ALREADY WON TEN MILLION DOLLARS, BUT I DOUBT
   IT! SO JUST POST THIS TO SIX HUNDRED NEWSGROUPS! IT WILL
   GUARANTEE THAT YOU GET AT LEAST FIVE RESPONSES WITH MONEY!
   MONEY! MONEY! COLD HARD CASH! YOU WILL RECEIVE OVER
   $20,000 IN LESS THAN TWO MONTHS! AND IT'S LEGAL!!!!!!!!!
   !!!!!!!!!!!!!!!!!!111111111!!!!!!!11111111111!!1 JUST
   SEND $5 IN SMALL, UNMARKED BILLS TO THE ADDRESSES BELOW!
   -----------------------------------------------------------

2. Design

2.1. Form of the Language

The language consists of a set of commands. Each command consists of a set of tokens delimited by whitespace. The command identifier is the first token and it is followed by zero or more argument tokens. Arguments may be literal data, tags, blocks of commands, or test commands.

With the exceptions of strings and comments, the language is limited to US-ASCII characters. Strings and comments may contain octets outside the US-ASCII range. Specifically, they will normally be in UTF-8, as specified in [UTF-8]. NUL (US-ASCII 0) is never permitted in scripts, while CR and LF can only appear as the CRLF line ending.

Note: While this specification permits arbitrary octets to appear in Sieve scripts inside strings and comments, this has made it difficult to robustly handle Sieve scripts in programs that are sensitive to the encodings used. The "encoded-character" capability (section 2.4.2.4) provides an alternative means of representing such octets in strings using just US-ASCII characters. As such, the use of non-UTF-8 text in scripts should be considered a deprecated feature that may be abandoned.

Tokens other than strings are considered case-insensitive.


2.2. Whitespace

Whitespace is used to separate tokens. Whitespace is made up of tabs, newlines (CRLF, never just CR or LF), and the space character. The amount of whitespace used is not significant.

2.3. Comments

Two types of comments are offered. Comments are semantically equivalent to whitespace and can be used anyplace that whitespace is (with one exception in multi-line strings, as described in the grammar).

Hash comments begin with a "#" character that is not contained within a string and continue until the next CRLF.

Example:  if size :over 100k { # this is a comment
              discard;
          }

Bracketed comments begin with the token "/*" and end with "*/" outside of a string. Bracketed comments may span multiple lines. Bracketed comments do not nest.

Example:  if size :over 100K { /* this is a comment
              this is still a comment */ discard /* this is a comment
              */ ;
          }
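Because comments are semantically whitespace, a pre-processing pass can replace each one with a single space while copying quoted strings untouched. The Python sketch below is a hypothetical helper (strip_comments is not part of any specified API) and, for brevity, it assumes the script contains no multi-line "text:" strings, whose special comment rules are described in section 2.4.2.

```python
def strip_comments(script: str) -> str:
    """Replace hash and bracketed comments with a space (hypothetical helper).

    Assumption: no multi-line "text:" strings appear in the script.
    """
    out = []
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        if ch == '"':
            # Copy a quoted string verbatim; a backslash skips the next character.
            j = i + 1
            while j < n and script[j] != '"':
                j += 2 if script[j] == "\\" else 1
            out.append(script[i:j + 1])
            i = j + 1
        elif ch == "#":
            # Hash comment: runs up to, but not including, the next CRLF.
            end = script.find("\r\n", i)
            i = n if end == -1 else end
            out.append(" ")
        elif script.startswith("/*", i):
            # Bracketed comment: ends at the first "*/" because comments do not nest.
            end = script.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated bracketed comment")
            out.append(" ")
            i = end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Applied to the two examples above, only the tokens if, size, :over, the number, the braces, discard, and the semicolon survive, while a "#" or "/*" inside a quoted string is left alone.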

2.4. Literal Data

Literal data means data that is not executed, merely evaluated "as is", to be used as arguments to commands. Literal data is limited to numbers, strings, and string lists.

2.4.1. Numbers

Numbers are given as ordinary decimal numbers. As a shorthand for expressing larger values, such as message sizes, a suffix of "K", "M", or "G" MAY be appended to indicate a multiple of a power of two. To be comparable with the power-of-two-based versions of SI units that computers frequently use, "K" specifies kibi-, or 1,024 (2^10) times the value of the number; "M" specifies mebi-, or 1,048,576 (2^20) times the value of the number; and "G" specifies gibi-, or 1,073,741,824 (2^30) times the value of the number [BINARY-SI].
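The suffix arithmetic can be made concrete with a short Python sketch. The function name parse_sieve_number is hypothetical. It treats the suffix case-insensitively (section 2.3 uses both "100k" and "100K", and non-string tokens are case-insensitive) and accepts only unsigned decimal digits.

```python
# Binary multipliers for the optional quantifier suffix.
_MULTIPLIERS = {"K": 2**10, "M": 2**20, "G": 2**30}


def parse_sieve_number(token: str) -> int:
    """Evaluate a Sieve number literal such as "100K" (hypothetical helper)."""
    suffix = token[-1:].upper()
    factor = _MULTIPLIERS.get(suffix, 1)
    digits = token[:-1] if suffix in _MULTIPLIERS else token
    # Only non-negative decimal integers are allowed: no sign, no fraction.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a Sieve number: {token!r}")
    return int(digits) * factor
```

For example, "100K" evaluates to 102,400 and "2G" to 2,147,483,648. The latter is one more than the largest value an implementation is required to support, so an implementation limited to that range could reject it.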


Implementations MUST support integer values in the inclusive range zero to 2,147,483,647 (2^31 - 1), but MAY support larger values.

Only non-negative integers are permitted by this specification.

2.4.2. Strings

Scripts involve large numbers of string values as they are used for pattern matching, addresses, textual bodies, etc. Typically, short quoted strings suffice for most uses, but a more convenient form is provided for longer strings such as bodies of messages.

A quoted string starts and ends with a single double quote (the <"> character, US-ASCII 34). A backslash ("\", US-ASCII 92) inside of a quoted string is followed by either another backslash or a double quote. These two-character sequences represent a single backslash or double quote within the value, respectively.

Scripts SHOULD NOT escape other characters with a backslash.

An undefined escape sequence (such as "\a" in a context where "a" has no special meaning) is interpreted as if there were no backslash (in this case, "\a" is just "a"), though that may be changed by extensions.

Non-printing characters such as tabs, CRLF, and control characters are permitted in quoted strings. Quoted strings MAY span multiple lines. An unencoded NUL (US-ASCII 0) is not allowed in strings; see section 2.4.2.4 for how it can be encoded.
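The escape rules above can be sketched in Python as a function that turns the raw text between the two double quotes into the string's value. The name unescape_quoted is hypothetical, and the sketch implements only the base-specification behavior (extensions may redefine undefined escapes).

```python
def unescape_quoted(body: str) -> str:
    """Return the value of a quoted string given the raw text between its quotes.

    Hypothetical helper implementing only the base-specification escapes.
    """
    if "\0" in body:
        raise ValueError("unencoded NUL is not allowed in strings")
    out, i = [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            # backslash-backslash gives a backslash, backslash-quote gives a
            # quote, and an undefined escape such as \a is just the character a.
            out.append(body[i + 1])
            i += 2
        else:
            # A lone trailing backslash cannot occur in a well-formed quoted
            # string (it would escape the closing quote); it is kept as-is here.
            out.append(body[i])
            i += 1
    return "".join(out)
```

Note that the script text \\* yields the two-character value \* (a backslash followed by a star), which is the form the ":matches" match type (section 2.7.1) treats as a literal star.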

As message header data is converted to [UTF-8] for comparison (see section 2.7.2), most string values will use the UTF-8 encoding. However, implementations MUST accept all strings that match the grammar in section 8. The ability to use non-UTF-8 encoded strings matches existing practice and has proven to be useful both in tests for invalid data and in arguments containing raw MIME parts for extension actions that generate outgoing messages.

For entering larger amounts of text, such as an email message, a multi-line form is allowed. It starts with the keyword "text:", followed by a CRLF, and ends with the sequence of a CRLF, a single period, and another CRLF. The CRLF before the final period is considered part of the value. In order to allow the message to contain lines with a single dot, lines are dot-stuffed. That is, when composing a message body, an extra '.' is added before each line that begins with a '.'. When the server interprets the script, these extra dots are removed. Note that a line that begins with a dot followed by a non-dot character is not interpreted as dot-stuffed; that is, ".foo" is interpreted as ".foo". However, because this is potentially ambiguous, scripts SHOULD be properly dot-stuffed so such lines do not appear.

Note that a hashed comment or whitespace may occur in between the "text:" and the CRLF, but not within the string itself. Bracketed comments are not allowed here.

2.4.2.1. String Lists

When matching patterns, it is frequently convenient to match against groups of strings instead of single strings. For this reason, a list of strings is allowed in many tests, implying that if the test is true using any one of the strings, then the test is true.

For instance, the test 'header :contains ["To", "Cc"] ["me@example.com", "me00@landru.example.com"]' is true if either a To header or Cc header of the input message contains either of the email addresses "me@example.com" or "me00@landru.example.com".

Conversely, in any case where a list of strings is appropriate, a single string is allowed without being a member of a list: it is equivalent to a list with a single member. This means that the test 'exists "To"' is equivalent to the test 'exists ["To"]'.

2.4.2.2. Headers

Headers are a subset of strings. In the Internet Message Specification [IMAIL], each header line is allowed to have whitespace nearly anywhere in the line, including after the field name and before the subsequent colon. Extra spaces between the header name and the ":" in a header field are ignored. A header name never contains a colon. The "From" header refers to a line beginning "From:" (or "From :", etc.). No header will match the string "From:" due to the trailing colon. Similarly, no header will match a syntactically invalid header name. An implementation MUST NOT cause an error for syntactically invalid header names in tests. Header lines are unfolded as described in [IMAIL] section 2.2.3. Interpretation of header data SHOULD be done according to [MIME3] section 6.2 (see section 2.7.2 below for details).
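The header-name rules above can be sketched in Python: unfolding removes each CRLF that precedes whitespace, the name is the text before the first colon with trailing whitespace ignored, and the comparison uses ASCII-only case folding. The helper names unfold and header_name_matches are hypothetical. Validating the requested name against the [IMAIL] field-name characters (printable US-ASCII except ":") is how this sketch makes an invalid name such as "From:" fail to match without raising an error.

```python
import re
import string

# ftext from [IMAIL]: printable US-ASCII except ":" (decimal 33-57 and 59-126).
_FIELD_NAME = re.compile(r"[\x21-\x39\x3b-\x7e]+")
# "i;ascii-casemap" folds only the 26 US-ASCII letters.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def unfold(field_body: str) -> str:
    # [IMAIL] section 2.2.3: remove each CRLF that is immediately followed by WSP.
    return re.sub(r"\r\n(?=[ \t])", "", field_body)


def header_name_matches(raw_line: str, wanted: str) -> bool:
    # A syntactically invalid name such as "From:" never matches; no error is raised.
    if not _FIELD_NAME.fullmatch(wanted):
        return False
    name, colon, _ = raw_line.partition(":")
    if not colon:
        return False
    # Whitespace between the field name and the colon is ignored ("From :").
    name = name.rstrip(" \t")
    return name.translate(_ASCII_FOLD) == wanted.translate(_ASCII_FOLD)
```

With Message A, header_name_matches("From: coyote@desert.example.org", "FROM") is true, while the key "From:" matches no header at all.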


2.4.2.3. Addresses

A number of commands call for email addresses, which are also a subset of strings. When these addresses are used in outbound contexts, addresses must be compliant with [IMAIL], but are further constrained within this document. Using the symbols defined in [IMAIL], section 3, the syntax of an address is:

   sieve-address = addr-spec                  ; simple address
                 / phrase "<" addr-spec ">"   ; name & addr-spec

That is, routes and group syntax are not permitted. If multiple addresses are required, use a string list. Named groups are not permitted.

It is an error for a script to execute an action with a value for use as an outbound address that doesn't match the "sieve-address" syntax.

2.4.2.4. Encoding Characters Using "encoded-character"

When the "encoded-character" extension is in effect, certain character sequences in strings are replaced by their decoded value. This happens after escape sequences are interpreted and dot-unstuffing has been done. Implementations SHOULD support "encoded-character".

Arbitrary octets can be embedded in strings by using the syntax encoded-arb-octets. The sequence is replaced by the octets with the hexadecimal values given by each hex-pair.

   blank                = WSP / CRLF
   encoded-arb-octets   = "${hex:" hex-pair-seq "}"
   hex-pair-seq         = *blank hex-pair *(1*blank hex-pair) *blank
   hex-pair             = 1*2HEXDIG

Where WSP and HEXDIG non-terminals are defined in Appendix B.1 of [ABNF].

It may be inconvenient or undesirable to enter Unicode characters verbatim, and for these cases the syntax encoded-unicode-char can be used. The sequence is replaced by the UTF-8 encoding of the specified Unicode characters, which are identified by the hexadecimal value of unicode-hex.

   encoded-unicode-char = "${unicode:" unicode-hex-seq "}"
   unicode-hex-seq      = *blank unicode-hex *(1*blank unicode-hex) *blank
   unicode-hex          = 1*HEXDIG
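A minimal Python sketch of both encodings follows. It operates on bytes because "${hex:...}" can produce octets that are not valid UTF-8. It also applies the Unicode range check stated in the next paragraph. The function name decode_encoded_characters is hypothetical. The regular expressions transcribe the ABNF above, and because ABNF quoted literals are case-insensitive, the pattern is compiled with re.IGNORECASE.

```python
import re

# "blank" is WSP (space or tab) or CRLF, as in the grammar above.
_BLANK = rb"(?:[ \t]|\r\n)"
_HEX_SEQ = rb"%s*[0-9a-f]{1,2}(?:%s+[0-9a-f]{1,2})*%s*" % (_BLANK, _BLANK, _BLANK)
_UNI_SEQ = rb"%s*[0-9a-f]+(?:%s+[0-9a-f]+)*%s*" % (_BLANK, _BLANK, _BLANK)
# ABNF literals such as "${hex:" are case-insensitive, hence IGNORECASE.
_ENCODED = re.compile(
    rb"\$\{(?:hex:(?P<hex>%s)|unicode:(?P<uni>%s))\}" % (_HEX_SEQ, _UNI_SEQ),
    re.IGNORECASE,
)


def decode_encoded_characters(value: bytes) -> bytes:
    """Expand ${hex:...} and ${unicode:...} in a single left-to-right pass."""

    def expand(m):
        if m.group("hex") is not None:
            # Each hex-pair becomes one arbitrary octet.
            return bytes(int(pair, 16) for pair in m.group("hex").split())
        chars = []
        for digits in m.group("uni").split():
            cp = int(digits, 16)
            # Surrogates (D800-DFFF) and values above 10FFFF are errors.
            if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                raise ValueError(f"invalid Unicode value {digits.decode()}")
            chars.append(chr(cp))
        return "".join(chars).encode("utf-8")

    # Sequences that are not well-formed never match, so they stay unchanged.
    return _ENCODED.sub(expand, value)
```

Because the substitution is a single pass, the nested input "${hex:4${hex:30}}" decodes only its inner sequence and yields "${hex:40}", which agrees with the table of examples that follows.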


It is an error for a script to use a hexadecimal value that isn't in either the range 0 to D7FF or the range E000 to 10FFFF. (The range D800 to DFFF is excluded as those character numbers are only used as part of the UTF-16 encoding form and are not applicable to the UTF-8 encoding that the syntax here represents.)

Note: Implementations MUST NOT raise an error for an out-of-range Unicode value unless the sequence containing it is well-formed according to the grammar.

The capability string for use with the require command is "encoded-character".

In the following script, message B is discarded, since the specified test string is equivalent to "$$$".

Example:  require "encoded-character";
          if header :contains "Subject" "$${hex:24 24}" {
              discard;
          }

The following examples demonstrate valid and invalid encodings and how they are handled:

   "$${hex:40}"          -> "$@"
   "${hex: 40 }"         -> "@"
   "${HEX: 40}"          -> "@"
   "${hex:40"            -> "${hex:40"
   "${hex:400}"          -> "${hex:400}"
   "${hex:4${hex:30}}"   -> "${hex:40}"
   "${unicode:40}"       -> "@"
   "${ unicode:40}"      -> "${ unicode:40}"
   "${UNICODE:40}"       -> "@"
   "${UnICoDE:0000040}"  -> "@"
   "${Unicode:40}"       -> "@"
   "${Unicode:Cool}"     -> "${Unicode:Cool}"
   "${unicode:200000}"   -> error
   "${Unicode:DF01}"     -> error

2.5. Tests

Tests are given as arguments to commands in order to control their actions. In this document, tests are given to if/elsif to decide which block of code is run.


2.5.1. Test Lists

Some tests ("allof" and "anyof", which implement logical "and" and logical "or", respectively) may require more than a single test as an argument. The test-list syntax element provides a way of grouping tests as a comma-separated list in parentheses.

Example:

if anyof (not exists ["From", "Date"], header :contains "from" "fool@example.com") { discard; }
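The example can be traced with a small Python sketch. The tuple encoding of tests, the evaluate function, and the MESSAGE_A dictionary are all hypothetical. The sketch assumes the "exists" semantics defined in section 5.5 (every listed header must be present), keeps one value per header name for brevity, and approximates the default "i;ascii-casemap" comparator with str.lower(), which agrees with it on US-ASCII text.

```python
# Hypothetical data: Message A's header fields, keyed by lowercase name.
MESSAGE_A = {
    "date": "Tue, 1 Apr 1997 09:06:31 -0800 (PST)",
    "from": "coyote@desert.example.org",
    "to": "roadrunner@acme.example.com",
    "subject": "I have a present for you",
}


def evaluate(test, headers):
    """Evaluate a test written as nested tuples (hypothetical encoding)."""
    kind = test[0]
    if kind == "allof":  # logical "and" over a test list
        return all(evaluate(t, headers) for t in test[1])
    if kind == "anyof":  # logical "or" over a test list
        return any(evaluate(t, headers) for t in test[1])
    if kind == "not":
        return not evaluate(test[1], headers)
    if kind == "exists":  # true only if every listed header is present
        return all(name.lower() in headers for name in test[1])
    if kind == "header_contains":  # header :contains <names> <keys>
        _, names, keys = test
        values = [headers[n.lower()] for n in names if n.lower() in headers]
        # str.lower() stands in for "i;ascii-casemap" (they agree on US-ASCII).
        return any(k.lower() in v.lower() for v in values for k in keys)
    raise ValueError(f"unknown test: {kind!r}")


# if anyof (not exists ["From", "Date"],
#           header :contains "from" "fool@example.com") { discard; }
EXAMPLE = ("anyof", [
    ("not", ("exists", ["From", "Date"])),
    ("header_contains", ["from"], ["fool@example.com"]),
])

print(evaluate(EXAMPLE, MESSAGE_A))  # False: Message A is not discarded
```

Message A has both a From and a Date header, and its From header does not contain "fool@example.com", so neither element of the test list is true and the block is not run.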

2.6. Arguments

In order to specify what to do, most commands take arguments. There are three types of arguments: positional, tagged, and optional.

It is an error for a script, on a single command, to use conflicting arguments or to use a tagged or optional argument more than once.

2.6.1. Positional Arguments

Positional arguments are given to a command that discerns their meaning based on their order. When a command takes positional arguments, all positional arguments must be supplied and must be in the order prescribed.

2.6.2. Tagged Arguments

This document provides for tagged arguments in the style of CommonLISP. These are also similar to flags given to commands in most command-line systems.

A tagged argument is an argument for a command that begins with ":" followed by a tag naming the argument, such as ":contains". This argument means that zero or more of the next tokens have some particular meaning depending on the argument. These next tokens may be literal data, but they are never blocks.

Tagged arguments are similar to positional arguments, except that instead of the meaning being derived from the command, it is derived from the tag.

Tagged arguments must appear before positional arguments, but they may appear in any order with other tagged arguments. For simplicity of the specification, this is not expressed in the syntax definitions with commands, but they still may be reordered arbitrarily provided they appear before positional arguments. Tagged arguments may be mixed with optional arguments.

Tagged arguments SHOULD NOT take tagged arguments as arguments.

2.6.3. Optional Arguments

Optional arguments are exactly like tagged arguments except that they may be left out, in which case a default value is implied. Because optional arguments tend to result in shorter scripts, they have been used far more than tagged arguments.
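The ordering and uniqueness rules for tagged and optional arguments can be sketched in Python as a single pass over a command's already-tokenized arguments. The Tag class, the TAG_ARITY table (modeled on the match-type and ":comparator" arguments that tests such as "header" accept), and split_arguments are hypothetical. A real implementation would take the allowed tags and their parameter counts from each command's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """A tagged or optional argument token such as :contains (hypothetical)."""
    name: str


# Hypothetical table: tag -> number of following tokens it consumes.
TAG_ARITY = {":is": 0, ":contains": 0, ":matches": 0, ":comparator": 1}
MATCH_TYPES = {":is", ":contains", ":matches"}


def split_arguments(tokens):
    """Split argument tokens into ({tag: parameters}, [positional arguments])."""
    tagged, positional = {}, []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if isinstance(tok, Tag):
            name = tok.name.lower()  # tags are case-insensitive tokens
            if positional:
                raise ValueError(f"{name} appears after a positional argument")
            if name not in TAG_ARITY:
                raise ValueError(f"unknown tag {name}")
            if name in tagged:
                raise ValueError(f"{name} given more than once")
            if name in MATCH_TYPES and any(m in tagged for m in MATCH_TYPES):
                raise ValueError("conflicting match types")
            arity = TAG_ARITY[name]
            params = tokens[i + 1:i + 1 + arity]
            # Parameters must be literal data, never another tag.
            if len(params) < arity or any(isinstance(p, Tag) for p in params):
                raise ValueError(f"{name} is missing its argument")
            tagged[name] = params
            i += 1 + arity
        else:
            positional.append(tok)
            i += 1
    return tagged, positional
```

For a command written as header :comparator "i;octet" :contains "Subject" "MONEY", the two tagged arguments may appear in either order, but moving either one after "Subject" is an error.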

One particularly noteworthy case is the ":comparator" argument, which allows the user to specify which comparator [COLLATION] will be used to compare two strings, since different languages may impose different orderings on UTF-8 [UTF-8] strings.

2.6.4. Types of Arguments

Abstractly, arguments may be literal data, tests, or blocks of commands. In this way, an "if" control structure is merely a command that happens to take a test and a block as arguments and may execute the block of code.

However, this abstraction is ambiguous from a parsing standpoint. The grammar in section 8.2 presents a parsable version of this: Arguments are string lists (string-lists), numbers, and tags, which may be followed by a test or a test list (test-list), which may be followed by a block of commands. No more than one test or test list, or more than one block of commands, may be used, and commands that end with a block of commands do not end with semicolons.

2.7. String Comparison

When matching one string against another, there are a number of ways of performing the match operation. These are accomplished with three types of matches: an exact match, a substring match, and a wildcard glob-style match. These are described below. In order to provide for matches between character sets and case insensitivity, Sieve uses the comparators defined in the Internet Application Protocol Collation Registry [COLLATION].


However, when a string represents the name of a header, the comparator is never user-specified. Header comparisons are always done with the "i;ascii-casemap" operator, i.e., case-insensitive comparisons, because this is the way things are defined in the message specification [IMAIL].

2.7.1. Match Type

Commands that perform string comparisons may have an optional match type argument. The three match types in this specification are ":contains", ":is", and ":matches".

The ":contains" match type describes a substring match. If the value argument contains the key argument as a substring, the match is true. For instance, the string "frobnitzm" contains "frob" and "nit", but not "fbm". The empty key ("") is contained in all values.

The ":is" match type describes an absolute match; if the contents of the first string are absolutely the same as the contents of the second string, they match. Only the string "frobnitzm" is the string "frobnitzm". The empty key ("") only ":is" matches with the empty value.

The ":matches" match type specifies a wildcard match using the characters "*" and "?"; the entire value must be matched. "*" matches zero or more characters in the value and "?" matches a single character in the value, where the comparator that is used (see section 2.7.3) defines what a character is. For example, the comparators "i;octet" and "i;ascii-casemap" define a character to be a single octet, so "?" will always match exactly one octet when one of those comparators is in use. In contrast, a Unicode-based comparator would define a character to be any UTF-8 octet sequence encoding one Unicode character and thus "?" may match more than one octet. "?" and "*" may be escaped as "\\?" and "\\*" in strings to match against themselves. The first backslash escapes the second backslash; together, they escape the "*". This is awkward, but it is commonplace in several programming languages that use globs and regular expressions. In order to specify what type of match is supposed to happen, commands that support matching take optional arguments ":matches", ":is", and ":contains". Commands default to using ":is" matching if no match type argument is supplied. Note that these modifiers interact with comparators; in particular, only comparators that support the "substring match" operation are suitable for matching with ":contains" or ":matches". It is an error to use a comparator with ":contains" or ":matches" that is not compatible with it.
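The three match types and the two mandatory comparators can be combined in a short Python sketch. It works on bytes, so "?" always matches exactly one octet, which is how "i;octet" and "i;ascii-casemap" define a character. The names sieve_match, _fold, and _glob_to_regex are hypothetical. The key is assumed to be already unescaped, so the script text "\\*" arrives as the two characters \*. Treating a backslash before any character as an escape, and a trailing lone backslash as a literal, are this sketch's own assumptions rather than rules stated above.

```python
import re


def _fold(data: bytes, comparator: str) -> bytes:
    if comparator == "i;ascii-casemap":
        return data.lower()  # bytes.lower() folds only ASCII A-Z
    if comparator == "i;octet":
        return data
    raise ValueError(f"unsupported comparator {comparator!r}")


def _glob_to_regex(pattern: bytes) -> bytes:
    # "*" -> any run of octets, "?" -> exactly one octet, "\x" -> a literal x.
    out, i = [], 0
    while i < len(pattern):
        c = pattern[i:i + 1]
        if c == b"\\" and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1:i + 2]))
            i += 2
            continue
        if c == b"*":
            out.append(b".*")
        elif c == b"?":
            out.append(b".")
        else:
            out.append(re.escape(c))  # includes a trailing lone backslash
        i += 1
    return b"".join(out)


def sieve_match(value: bytes, key: bytes, match_type: str = ":is",
                comparator: str = "i;ascii-casemap") -> bool:
    value, key = _fold(value, comparator), _fold(key, comparator)
    if match_type == ":is":
        return value == key
    if match_type == ":contains":
        return key in value  # the empty key is contained in every value
    if match_type == ":matches":
        # The entire value must match; DOTALL lets "?" and "*" match CR and LF.
        return re.fullmatch(_glob_to_regex(key), value, re.DOTALL) is not None
    raise ValueError(f"unknown match type {match_type!r}")
```

With these definitions, sieve_match(b"frobnitzm", b"fro?nitzm", ":matches") is true, sieve_match(b"frobnitzm", b"frob", ":matches") is false because the whole value must match, and the two-octet UTF-8 encoding of a single accented letter needs "??" rather than "?".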


It is an error to give more than one of these arguments to a given command.

For convenience, the "MATCH-TYPE" syntax element is defined here as follows:

Syntax:   ":is" / ":contains" / ":matches"

2.7.2. Comparisons across Character Sets

Messages may involve a number of character sets. In order for comparisons to work across character sets, implementations SHOULD implement the following behavior:

   Comparisons are performed on octets. Implementations convert text from header fields in all charsets [MIME3] to Unicode, encoded as UTF-8, as input to the comparator (see section 2.7.3). Implementations MUST be capable of converting US-ASCII, ISO-8859-1, the US-ASCII subset of ISO-8859-* character sets, and UTF-8. Text that the implementation cannot convert to Unicode for any reason MAY be treated as plain US-ASCII (including any [MIME3] syntax) or processed according to local conventions. An encoded NUL octet (character zero) SHOULD NOT cause early termination of the header content being compared against.

If implementations fail to support the above behavior, they MUST conform to the following:

   No two strings can be considered equal if one contains octets greater than 127.

2.7.3. Comparators

In order to allow for language-independent, case-independent matches, the match type may be coupled with a comparator name. The Internet Application Protocol Collation Registry [COLLATION] provides the framework for describing and naming comparators. All implementations MUST support the "i;octet" comparator (simply compares octets) and the "i;ascii-casemap" comparator (which treats uppercase and lowercase characters in the US-ASCII subset of UTF-8 as the same). If left unspecified, the default is "i;ascii-casemap". Some comparators may not be usable with substring matches; that is, they ma