Chapter 22

Regex

stdlib::regex is a pure-Glide, PCRE-like regular-expression engine built on a backtracking bytecode VM. You compile a pattern into a *Regex, then test, search, capture, replace, or split strings with it. Patterns operate on raw bytes.

Import

import stdlib::regex::*;

At a glance

Item	Kind	Summary
`Regex`	struct	A compiled pattern (`pattern`, `flags`, `n_groups` are `pub`).
`Match`	struct	One match + its capture spans (`input`, `start`, `end` are `pub`).
`CharClass`	struct	A set of inclusive byte ranges, optionally negated.
`Regex::compile` / `compile_with`	fn	Build a `Regex` (returns `!*Regex`).
`matches` / `matches_full`	method	Boolean test (substring / whole-string).
`find` / `find_at` / `find_all`	method	Locate matches.
`replace` / `replace_all` / `replace_with`	method	Substitute matches.
`split`	method	Split on matches.
`full` / `group` / `group_opt` / `named` / `named_opt` / `n_groups`	method	Read a `*Match`.
`regex_matches` / `regex_find` / `regex_replace_all`	fn	One-shot helpers (compile every call).
`RxNode` / `RxParser` / `RxCompiler`	struct	Internal AST/parser/compiler (exported, no stable API).
`RX_*`	const	Opcode / flag / AST / assertion-kind / limit constants.

Compiling patterns

A pattern is compiled once into a *Regex and reused for every operation. Compilation parses the pattern, reports syntax errors as the err of the !*Regex, and produces bytecode for the VM. The whole match is wrapped in an implicit group 0, so start/end/full() are always available.

`Regex::compile` / `Regex::compile_with`

Function	Signature	Description
`compile`	`fn compile(pattern: string) -> !*Regex`	Compile with no flags.
`compile_with`	`fn compile_with(pattern: string, flag_str: string) -> !*Regex`	Compile with a flag string (any subset of `imsxU`).

import stdlib::regex::*;

fn main() -> i32 {
    let r: !*Regex = Regex::compile("(\\d+)-(\\d+)");
    if !r.ok {
        println!("bad pattern:", r.err);
        return 1;
    }
    let re: *Regex = r.val;

    if re.matches("phone: 555-1234") {
        println!("hit");
    }
    return 0;
}

compile_with turns each flag character into a flag bit; an unknown character returns err("unknown flag in flag string"). Compilation also rejects malformed patterns, with errors such as "expected ')'", "unterminated character class", and "trailing characters in pattern".

// Case-insensitive (i) + multiline (m).
let r: !*Regex = Regex::compile_with("^err", "im");

import stdlib::regex::*;

fn main() -> i32 {
    // A malformed pattern surfaces in .err.
    let bad: !*Regex = Regex::compile("(unclosed");
    if !bad.ok { println!("err:", bad.err); }       // expected ')'

    // Unknown flag.
    let bf: !*Regex = Regex::compile_with("x", "q");
    if !bf.ok { println!("flagerr:", bf.err); }     // unknown flag in flag string
    return 0;
}

The `Regex` struct

pub struct Regex {
    pub pattern: string,   // the original source pattern
    pub flags: i32,        // resolved flag bits (see RX_FLAG_*)
    pub n_groups: i32,     // number of capturing groups (excludes group 0)
    // ... internal bytecode/classes/name table
}

The three pub fields are read-only metadata you can inspect after compiling. flags reflects any flags resolved during parsing, including inline (?i)-style settings at top level.

import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile("(?<a>\\d)(?<b>\\d)").val;
    println!("pattern", re.pattern);    // (?<a>\d)(?<b>\d)
    println!("flags", re.flags);        // 0
    println!("n_groups", re.n_groups);  // 2
    return 0;
}
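
As a small sketch of the inline-flag behavior just described, the following assumes that a top-level (?i) is recorded in flags exactly as stated above, and tests the bit with the exported RX_FLAG_* constants. It prints no expected values because it only illustrates the check, not a guaranteed result:

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // No flag string is passed; the flag comes from the inline (?i) at top level.
    let re: *Regex = Regex::compile("(?i)abc").val;

    // Flag bits are OR-ed into re.flags, so test them with a bitwise AND.
    if (re.flags & RX_FLAG_I) != 0 { println!("inline i recorded"); }
    if (re.flags & RX_FLAG_M) == 0 { println!("m not set"); }
    return 0;
}
```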

Testing for a match

Method	Signature	Description
`matches`	`fn matches(self: *Regex, s: string) -> bool`	`true` if any substring matches.
`matches_full`	`fn matches_full(self: *Regex, s: string) -> bool`	`true` only if the pattern matches the entire string.

import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile("\\d+").val;
    if re.matches("hi 42 there") { println!("matches substring"); } // true
    if !re.matches("nothing")    { println!("no digits"); }         // true

    let rf: *Regex = Regex::compile("[a-z]+").val;
    if rf.matches_full("hello")        { println!("full ok"); }  // true
    if !rf.matches_full("hello world") { println!("full no"); }  // space/'world' unmatched
    return 0;
}

matches is find(s).has. matches_full succeeds only when the first match starts at offset 0 and ends at s.len() — it does not anchor the pattern, so a leftmost-shorter match can make it return false even when a full-length match exists. Anchor explicitly with ^...$ (or \A...\z) if you need that.
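
The caveat above is easiest to see with an alternation whose first branch is a prefix of the second. This is a minimal sketch that assumes ordered, first-alternative-wins backtracking, as is usual for PCRE-like engines:

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // The first match of a|ab on "ab" is "a" (the first alternative wins),
    // which ends at offset 1 rather than at s.len(), so matches_full is false.
    let loose: *Regex = Regex::compile("a|ab").val;
    if !loose.matches_full("ab") { println!("unanchored: full match missed"); }

    // With explicit anchors, "a" fails at $, and the VM backtracks into "ab".
    let strict: *Regex = Regex::compile("^(?:a|ab)$").val;
    if strict.matches("ab") { println!("anchored: whole string matched"); }
    return 0;
}
```

Anchoring inside the pattern forces the engine to keep trying alternatives until one reaches the end of the string, which matches_full alone does not do.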

Searching and finding

Method	Signature	Description
`find`	`fn find(self: *Regex, s: string) -> ?*Match`	First match, or `none()`.
`find_at`	`fn find_at(self: *Regex, s: string, from: i32) -> ?*Match`	First match found when the search starts at byte offset `from`.
`find_all`	`fn find_all(self: *Regex, s: string) -> *Vector<*Match>`	All non-overlapping matches (empty matches advance by one byte).

import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile("\\d+").val;

    // find_all: every non-overlapping match, with spans.
    let all: *Vector<*Match> = re.find_all("a12 b3 c456");
    for let i: i32 = 0; i < all.len(); i++ {
        let m: *Match = all.get(i);
        println!("match", i, m.full(), m.start, m.end);
    }

    // find_at: start searching at an offset.
    match re.find_at("a12 b3", 3) {
        some(m) => println!("from 3:", m.full(), m.start),  // 3 at offset 5
        none()  => println!("none"),
    }

    // find: no match -> none().
    match re.find("no digits here") {
        some(m) => println!("found", m.full()),
        none()  => println!("no match"),
    }
    return 0;
}

find is find_at(s, 0). Both scan forward from the start offset, trying the VM at each byte position (leftmost match wins). find_all repeatedly calls find_at, advancing past each match; a zero-width match advances by one byte so the loop terminates.

The `Match` result and captures

A *Match describes one match and its capture groups. Group 0 is the whole match; groups 1..n are the parenthesized captures in order.

pub struct Match {
    pub input: string,   // the string that was searched
    pub start: i32,      // byte offset of the match start
    pub end: i32,        // byte offset just past the match
    // ... capture spans + name table
}

Method	Signature	Description
`full`	`fn full(self: *Match) -> string`	The full matched substring (same as `group(0)`).
`group`	`fn group(self: *Match, i: i32) -> string`	Capture `i` (`0` = full match); `""` for missing/uncaptured.
`group_opt`	`fn group_opt(self: *Match, i: i32) -> ?string`	`some` if group `i` participated, else `none()`.
`named`	`fn named(self: *Match, name: string) -> string`	Named-capture lookup; `""` if absent.
`named_opt`	`fn named_opt(self: *Match, name: string) -> ?string`	`some` if the named group participated, else `none()`.
`n_groups`	`fn n_groups(self: *Match) -> i32`	Number of capturing groups (excluding group 0).

import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile("(?<year>\\d{4})-(?<month>\\d{2})").val;

    match re.find("date 2026-05 end") {
        some(m) => {
            println!("full:", m.group(0), m.full());          // 2026-05  2026-05
            println!("g1:", m.group(1), "g2:", m.group(2));   // 2026  05
            println!("year:", m.named("year"), m.named("month"));
            println!("groups:", m.n_groups());                // 2
            println!("span:", m.start, m.end);
            println!("input:", m.input);
        }
        none() => println!("no match"),
    }
    return 0;
}

import stdlib::regex::*;

fn main() -> i32 {
    // (b)? did not participate when matching just "a".
    let re: *Regex = Regex::compile("(a)(b)?").val;
    match re.find("a") {
        some(m) => {
            let g2: ?string = m.group_opt(2);
            if g2.has { println!("g2 present"); } else { println!("g2 missing"); }
            println!("g1", m.group(1));   // a
        }
        none() => println!("no match"),
    }

    // (x*) DID participate but captured the empty string.
    let re2: *Regex = Regex::compile("(a)(x*)").val;
    match re2.find("a") {
        some(m) => {
            let g2: ?string = m.group_opt(2);
            if g2.has { println!("g2 empty-but-present:", g2.val.len()); }  // 0
        }
        none() => {}
    }
    return 0;
}

Replacing

Method	Signature	Description
`replace`	`fn replace(self: *Regex, s: string, repl: string) -> string`	Replace the first match.
`replace_all`	`fn replace_all(self: *Regex, s: string, repl: string) -> string`	Replace every non-overlapping match.
`replace_with`	`fn replace_with(self: *Regex, s: string, f: fn(*Match) -> string) -> string`	Replace each match with the result of `f(match)`.

The replacement string (replace / replace_all) supports backreferences and a literal-$ escape:

Token	Expands to
`$0` .. `$9`	The corresponding capture group (`$0` = full match).
`${name}`	The named capture group `name`.
`$$`	A literal `$`.

Numbered and $$ tokens are easy to write as Glide string literals:

import stdlib::regex::*;

fn main() -> i32 {
    // $$ escapes a literal '$'; $1 is the first group.
    let price: *Regex = Regex::compile("(\\d+)").val;
    println!(price.replace_all("a5 b6", "$$$1"));     // a$5 b$6

    // Reorder groups.
    let pair: *Regex = Regex::compile("(\\w+)=(\\w+)").val;
    println!(pair.replace("k=v rest", "$2:$1"));      // v:k rest

    // $0 = whole match. replace touches only the first hit.
    let w: *Regex = Regex::compile("\\w+").val;
    println!(w.replace("hi there", "[$0]"));          // [hi] there
    return 0;
}

Named tokens need more care. Written as a bare Glide string literal, ${name} collides with string interpolation, so this example assembles each token by concatenation:

import stdlib::regex::*;

fn main() -> i32 {
    let date: *Regex =
        Regex::compile("(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})").val;
    let d: string = "$";
    let repl: string =
        d.concat("{d}/").concat(d).concat("{m}/").concat(d).concat("{y}");
    println!(date.replace("on 2026-05-30 ok", repl));   // on 30/05/2026 ok
    return 0;
}

Numbered groups ($1) and $$ have no such conflict with Glide string interpolation. If named substitution is awkward, prefer replace_with and read m.named(...) in the callback.

replace_with calls your function for each match and substitutes the returned string verbatim (no $-expansion):

import stdlib::regex::*;

fn mask(m: *Match) -> string {
    let n: i32 = m.full().len();
    let mut s: string = "";
    for let i: i32 = 0; i < n; i++ { s = s.concat("*"); }
    return s;
}

fn main() -> i32 {
    let digits: *Regex = Regex::compile("\\d+").val;
    println!(digits.replace_with("card 4242 9999", mask));  // card **** ****
    return 0;
}
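
The same callback mechanism covers the named-substitution case mentioned earlier. In this sketch the callback name to_dmy is hypothetical; it reads the named groups through m.named(...), so no ${name} token has to appear in a string literal:

```glide
import stdlib::regex::*;

// Hypothetical helper: rebuilds the date as day/month/year from named captures.
fn to_dmy(m: *Match) -> string {
    return m.named("d").concat("/").concat(m.named("m")).concat("/").concat(m.named("y"));
}

fn main() -> i32 {
    let date: *Regex =
        Regex::compile("(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})").val;
    println!(date.replace_with("on 2026-05-30 ok", to_dmy));   // on 30/05/2026 ok
    return 0;
}
```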

If there is no match, replace/replace_all/replace_with all return the input string unchanged.
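
A short sketch of the no-match case, using a hypothetical callback named hash that would only run if a match existed:

```glide
import stdlib::regex::*;

// Hypothetical callback; never invoked here because the input has no digits.
fn hash(m: *Match) -> string {
    return "#";
}

fn main() -> i32 {
    let digits: *Regex = Regex::compile("\\d+").val;
    println!(digits.replace("abc", "#"));         // abc
    println!(digits.replace_all("abc", "#"));     // abc
    println!(digits.replace_with("abc", hash));   // abc
    return 0;
}
```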

Splitting

`split`

fn split(self: *Regex, s: string) -> *Vector<string>

Splits s on every non-overlapping match of the regex, returning the pieces between matches. The result always has at least one element (the whole string when nothing matches).

import stdlib::regex::*;

fn main() -> i32 {
    let sep: *Regex = Regex::compile(",|;").val;
    let parts: *Vector<string> = sep.split("alpha,beta;gamma");
    for let i: i32 = 0; i < parts.len(); i++ {
        println!("part", i, parts.get(i));     // alpha / beta / gamma
    }

    // \s+ collapses runs of whitespace into single separators.
    let ws: *Regex = Regex::compile("\\s+").val;
    let words: *Vector<string> = ws.split("the   quick  fox");
    println!("nwords", words.len());           // 3

    // No match -> single-element vector (the whole string).
    let z: *Regex = Regex::compile("z").val;
    let one: *Vector<string> = z.split("abc");
    println!("one", one.len(), one.get(0));    // 1  abc
    return 0;
}

Free-function convenience API

One-shot helpers that compile the pattern on every call. Prefer keeping a *Regex around for hot loops.

Function	Signature	Description
`regex_matches`	`fn regex_matches(pat: string, s: string) -> bool`	`Regex::compile(pat).val.matches(s)`; `false` on bad pattern.
`regex_find`	`fn regex_find(pat: string, s: string) -> ?*Match`	`Regex::compile(pat).val.find(s)`; `none()` on bad pattern.
`regex_replace_all`	`fn regex_replace_all(pat: string, s: string, repl: string) -> string`	`Regex::compile(pat).val.replace_all(s, repl)`; returns `s` on bad pattern.

import stdlib::regex::*;

fn main() -> i32 {
    if regex_matches("\\d+", "abc 42") { println!("matched"); }

    match regex_find("(\\w+)@(\\w+)", "x@y") {
        some(m) => println!(m.group(1), m.group(2)),   // x  y
        none()  => {},
    }

    println!(regex_replace_all("\\d", "a1b2c3", "#"));  // a#b#c#

    // Bad pattern: silently returns false / none() / the input.
    if !regex_matches("(unclosed", "x") { println!("bad pattern swallowed"); }
    return 0;
}
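
For contrast with the one-shot helpers, here is a sketch of the reuse pattern recommended above: compile once, then call the method inside the loop. The input lines are made up for illustration, and the vector comes from split so that only APIs documented in this chapter are used:

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // Compile both patterns once, outside the loop.
    let nl: *Regex = Regex::compile("\\n").val;
    let digits: *Regex = Regex::compile("\\d+").val;

    let lines: *Vector<string> = nl.split("a1\nbb\nc22");
    let mut hits: i32 = 0;
    for let i: i32 = 0; i < lines.len(); i++ {
        // Reuses the compiled *Regex; regex_matches would recompile per line.
        if digits.matches(lines.get(i)) { hits = hits + 1; }
    }
    println!("lines with digits", hits);   // 2
    return 0;
}
```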

Supported syntax

Literals and escapes

Most characters match themselves. Backslash escapes:

Escape	Matches
`\n` `\t` `\r` `\f` `\v`	newline, tab, return, form-feed, vertical-tab
`\a` `\e` `\0`	bell (0x07), escape (0x1B), NUL (0x00)
`\xHH`	the byte with hex value `HH` (e.g. `\x41` = `A`)
`\.` `\\` `\(` ...	a literal metacharacter
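
A brief sketch of the escapes in the table. Note the doubled backslash: the Glide string literal "a\\.b" yields the regex a\.b, which is the convention the other examples in this chapter follow:

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // Unescaped '.' is the any-byte class; the escaped form is a literal dot.
    let any: *Regex = Regex::compile("a.b").val;
    let dot: *Regex = Regex::compile("a\\.b").val;
    if any.matches("axb")  { println!("any-byte dot accepts axb"); }
    if !dot.matches("axb") { println!("literal dot rejects axb"); }
    if dot.matches("a.b")  { println!("literal dot accepts a.b"); }

    // The regex escape \n matches a newline byte in the input.
    let nl: *Regex = Regex::compile("k\\nv").val;
    if nl.matches("k\nv") { println!("newline escape ok"); }
    return 0;
}
```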

Character classes

Syntax	Matches
`[abc]`	any of `a`, `b`, `c`
`[a-z]`	a byte range
`[^a-z]`	negation (any byte not in the set)
`[a-zA-Z0-9_]`	union of ranges
`[\d\s]`	predefined classes are allowed inside `[...]`
`[]a]`	a `]` placed first is a literal `]`

Predefined classes (usable bare or inside [...]):

Class	Matches	Negated
`\d`	digits `0-9`	`\D`
`\w`	word bytes `[A-Za-z0-9_]`	`\W`
`\s`	whitespace `\t\n\v\f\r` and space	`\S`
`.`	any byte except `\n` (any byte with the `s` flag)	—

import stdlib::regex::*;

fn main() -> i32 {
    let hex: *Regex = Regex::compile("[0-9a-fA-F]+").val;
    println!(hex.find("DEADbeef!").val.full());    // DEADbeef

    let neg: *Regex = Regex::compile("[^aeiou ]+").val;
    println!(neg.find("the fox").val.full());      // th

    let byte: *Regex = Regex::compile("\\x41\\x42").val;  // matches "AB"
    if byte.matches("xABy") { println!("hex byte ok"); }

    let mix: *Regex = Regex::compile("[\\d.]+").val;       // digits or dot
    println!(mix.find("v3.14!").val.full());               // 3.14
    return 0;
}

Quantifiers

Quantifier	Repetitions	Lazy form
`*`	0 or more	`*?`
`+`	1 or more	`+?`
`?`	0 or 1	`??`
`{n}`	exactly `n`	`{n}?`
`{n,}`	`n` or more	`{n,}?`
`{n,m}`	between `n` and `m`	`{n,m}?`

Quantifiers are greedy by default; appending ? makes them lazy. The U flag (or inline (?U)) flips greediness globally.

import stdlib::regex::*;

fn main() -> i32 {
    let greedy: *Regex = Regex::compile("<.+>").val;
    println!(greedy.find("<a><b>").val.full());     // <a><b>

    let lazy: *Regex = Regex::compile("<.+?>").val;
    println!(lazy.find("<a><b>").val.full());        // <a>

    let bounded: *Regex = Regex::compile("a{2,3}").val;
    println!(bounded.find("aaaa").val.full());       // aaa

    let exact: *Regex = Regex::compile("\\d{3}").val;
    println!(exact.find("12345").val.full());        // 123

    // The U flag flips default greediness: <.+> behaves like <.+?>.
    let ung: *Regex = Regex::compile_with("<.+>", "U").val;
    println!(ung.find("<a><b>").val.full());         // <a>
    return 0;
}

Groups, alternation, anchors

Syntax	Meaning
`(...)`	capturing group
`(?:...)`	non-capturing group
`(?<name>...)` / `(?P<name>...)`	named capturing group
`a|b|c`	alternation
`^` `$`	start / end of string (or line with the `m` flag)
`\A` `\z` `\Z`	absolute start / end of string (`\Z` is treated as `\z`)
`\b` `\B`	word boundary / non-boundary

import stdlib::regex::*;

fn main() -> i32 {
    let alt: *Regex = Regex::compile("cat|dog|bird").val;
    println!(alt.find("I have a dog").val.full());   // dog

    let anch: *Regex = Regex::compile("^\\d+$").val;
    if anch.matches("12345")  { println!("all digits"); }
    if !anch.matches("12a45") { println!("not all"); }

    // \b: whole word, not a substring.
    let word: *Regex = Regex::compile("\\bcat\\b").val;
    if word.matches("a cat sat")  { println!("whole word"); }
    if !word.matches("category")  { println!("no substring"); }

    // m flag: ^ matches after each newline.
    let ml: *Regex = Regex::compile_with("^x", "m").val;
    let hits: *Vector<*Match> = ml.find_all("ax\nx\nx");
    println!("ml hits", hits.len());                 // 2
    return 0;
}
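
The difference between the line anchors and the absolute anchors only shows up under the m flag. A minimal sketch, assuming the table above describes the behavior exactly:

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let s: string = "a\nb\nc";

    // With m, ^ and $ also match at line breaks, so the middle line qualifies.
    let line: *Regex = Regex::compile_with("^b$", "m").val;
    if line.matches(s) { println!("line-anchored hit"); }

    // \A and \z stay pinned to the ends of the whole string, even with m.
    let whole: *Regex = Regex::compile_with("\\Ab\\z", "m").val;
    if !whole.matches(s) { println!("string-anchored miss"); }
    return 0;
}
```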

Backreferences

Syntax	Meaning
`\1` .. `\9`	match the same text a numbered group captured
`\k<name>`	match the same text a named group captured

Lookaround

Syntax	Meaning
`(?=...)`	positive lookahead
`(?!...)`	negative lookahead
`(?<=...)`	positive lookbehind
`(?<!...)`	negative lookbehind

import stdlib::regex::*;

fn main() -> i32 {
    // Positive lookahead: foo only when followed by bar (bar not consumed).
    let la: *Regex = Regex::compile("foo(?=bar)").val;
    println!(la.find("foobar").val.full());        // foo

    // Negative lookahead.
    let nla: *Regex = Regex::compile("\\d+(?!px)").val;
    if nla.matches("10em") { println!("nla ok"); }

    // Lookbehind: digits preceded by '$'.
    let lb: *Regex = Regex::compile("(?<=\\$)\\d+").val;
    println!(lb.find("price $42 yen").val.full());  // 42

    // Backreference: a doubled word.
    let dup: *Regex = Regex::compile("\\b(\\w+)\\s+\\1\\b").val;
    if dup.matches("the the cat") { println!("dup"); }

    // Named backreference: matching open/close tags.
    let tag: *Regex = Regex::compile("<(?<t>\\w+)>.*?</\\k<t>>").val;
    if tag.matches("<b>hi</b>") { println!("tag ok"); }
    return 0;
}

Flags

Pass as a flag string to compile_with, or inline in the pattern via (?flags) (sets flags from that point) or (?flags:subpattern) (scoped); (?flags-flags:...) turns flags off within the scope.

Flag	Constant	Effect
`i`	`RX_FLAG_I`	case-insensitive (ASCII only)
`m`	`RX_FLAG_M`	`^`/`$` match at line breaks
`s`	`RX_FLAG_S`	`.` matches `\n` (dot-all)
`x`	`RX_FLAG_X`	extended: unescaped whitespace and `#` comments ignored
`U`	`RX_FLAG_U`	ungreedy: flip default greediness

import stdlib::regex::*;

fn main() -> i32 {
    // (?i:...) — case-insensitive only inside the group.
    let scoped: *Regex = Regex::compile("(?i:hello) world").val;
    if scoped.matches("HELLO world")   { println!("scoped i"); }
    if !scoped.matches("HELLO WORLD")  { println!("outside still sensitive"); }

    // (?x) — extended: whitespace and # comments ignored.
    let ext: *Regex = Regex::compile("(?x) \\d+  # the number \n -  \\d+").val;
    if ext.matches("12-34") { println!("extended ok"); }
    return 0;
}

import stdlib::regex::*;

fn main() -> i32 {
    // Inline flags at top level + lookahead.
    let re: *Regex = Regex::compile("(?i)foo(?=bar)").val;
    if re.matches("FOObar") { println!("lookahead ok"); }
    return 0;
}
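
Two flag behaviors described above have no example yet: dot-all (s) and the (?flags-flags:...) form that switches a flag off inside a scope. The sketch below assumes the second pattern compiles as the flag syntax above suggests; the m in that group is there only to fill the "on" part of the syntax:

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // Without s, '.' stops at \n; with s it matches any byte.
    let plain: *Regex = Regex::compile("a.b").val;
    let dotall: *Regex = Regex::compile_with("a.b", "s").val;
    if !plain.matches("a\nb") { println!("no s: miss"); }
    if dotall.matches("a\nb") { println!("s: hit"); }

    // i is on for the whole pattern but switched off inside the group.
    let mixed: *Regex = Regex::compile_with("(?m-i:abc)def", "i").val;
    if mixed.matches("abcDEF")  { println!("outside the group folds case"); }
    if !mixed.matches("ABCdef") { println!("inside the group is case-sensitive"); }
    return 0;
}
```

In real code, check .ok before reading .val, as in the compile examples earlier in this chapter.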

Lower-level building blocks

These public items back the engine. Most programs never touch them directly, but they are exported and documented here for completeness.

`CharClass`

A set of inclusive byte ranges, optionally negated. Used internally for classes; you can build one by hand.

pub struct CharClass {
    ranges: *Vector<i32>,   // [lo1, hi1, lo2, hi2, ...] inclusive
    negated: bool,
}

Method	Signature	Description
`new`	`fn new() -> *CharClass`	Empty, non-negated class.
`add`	`fn add(self: *CharClass, lo: i32, hi: i32)`	Add an inclusive byte range.
`contains`	`fn contains(self: *CharClass, b: i32) -> bool`	Test byte `b` (respects `negated`).

import stdlib::regex::*;

fn main() -> i32 {
    let cc: *CharClass = CharClass::new();
    cc.add(48, 57);   // '0'..'9'
    cc.add(65, 70);   // 'A'..'F'
    if cc.contains(53)  { println!("'5' in class"); }   // true
    if !cc.contains(103) { println!("'g' not in class"); } // true
    return 0;
}

Exported constants

These name the internal opcodes, AST node kinds, flag bits, assertion kinds, and limits. They are mainly of interest when inspecting re.flags or extending the engine.

Group	Constants
Flag bits	`RX_FLAG_I` (1), `RX_FLAG_M` (2), `RX_FLAG_S` (4), `RX_FLAG_X` (8), `RX_FLAG_U` (16)
Assertion kinds	`RX_AHEAD_POS` (0), `RX_AHEAD_NEG` (1), `RX_BEHIND_POS` (2), `RX_BEHIND_NEG` (3)
Limits	`RX_MAX_GROUPS` (64)
VM opcodes	`RX_OP_CHAR`, `RX_OP_ANY`, `RX_OP_ANY_NL`, `RX_OP_CLASS`, `RX_OP_BOL`, `RX_OP_EOL`, `RX_OP_STR_BEG`, `RX_OP_STR_END`, `RX_OP_WORDB`, `RX_OP_NWORDB`, `RX_OP_JMP`, `RX_OP_SPLIT`, `RX_OP_SAVE`, `RX_OP_BACKREF`, `RX_OP_ASSERT`, `RX_OP_ASRT_END`, `RX_OP_MATCH`
AST node kinds	`RX_AST_LIT`, `RX_AST_ANY`, `RX_AST_CLASS`, `RX_AST_CONCAT`, `RX_AST_ALT`, `RX_AST_QUANT`, `RX_AST_GROUP`, `RX_AST_ANCHOR`, `RX_AST_BACK`, `RX_AST_LOOK`

The flag bits are OR-able, so you can test re.flags:

import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile_with("abc", "is").val;
    if (re.flags & RX_FLAG_I) != 0 { println!("case-insensitive"); }
    if (re.flags & RX_FLAG_S) != 0 { println!("dot-all"); }
    return 0;
}

Internal types

RxNode (AST node), RxParser (pattern parser), and RxCompiler (AST-to-bytecode compiler) are exported structs used internally by compile. They have no stable public method surface and should be treated as implementation detail — do not depend on their fields.

Unsupported / known limitations

Documented honestly so you do not reach for features that silently behave differently:

Feature	Status
Unicode property classes (`\p{...}`), Unicode-aware `\w`/`.`/case-fold	Not supported — engine is byte-oriented, `i` folds ASCII only.
POSIX bracket classes (`[[:alpha:]]`)	Not supported — use explicit ranges or `\d`/`\w`/`\s`.
Atomic groups `(?>...)`, possessive quantifiers (`a++`, `a*+`)	Not supported.
Conditionals `(?(1)...)`, recursion `(?R)`, subroutine calls `(?1)`	Not supported.
Comments via `(?#...)`	Not supported (use the `x` flag with `#`).
`\Z` semantics (before a trailing newline)	Treated identically to `\z` (absolute end).
Octal escapes beyond `\0`, `\cX` control escapes, `\Q...\E`	Not supported.
`${name}` replacement token as a bare Glide string literal	Collides with Glide string interpolation — see the callout under Replacing.
