Chapter 22

Regex

stdlib::regex is a pure-Glide, PCRE-like regular-expression engine built on a backtracking bytecode VM. You compile a pattern into a *Regex, then test, search, capture, replace, or split strings with it. Patterns operate on raw bytes.

Import

```glide
import stdlib::regex::*;
```

At a glance

| Item | Kind | Summary |
|---|---|---|
| `Regex` | struct | A compiled pattern (`pattern`, `flags`, `n_groups` are `pub`). |
| `Match` | struct | One match plus its capture spans (`input`, `start`, `end` are `pub`). |
| `CharClass` | struct | A set of inclusive byte ranges, optionally negated. |
| `Regex::compile` / `compile_with` | fn | Build a `*Regex` (returns `!*Regex`). |
| `matches` / `matches_full` | method | Boolean test (substring / whole string). |
| `find` / `find_at` / `find_all` | method | Locate matches. |
| `replace` / `replace_all` / `replace_with` | method | Substitute matches. |
| `split` | method | Split on matches. |
| `full` / `group` / `group_opt` / `named` / `named_opt` / `n_groups` | method | Read a `*Match`. |
| `regex_matches` / `regex_find` / `regex_replace_all` | fn | One-shot helpers (compile on every call). |
| `RxNode` / `RxParser` / `RxCompiler` | struct | Internal AST / parser / compiler (exported, no stable API). |
| `RX_*` | const | Opcode, flag, AST, assertion-kind, and limit constants. |

Compiling patterns

A pattern is compiled once into a *Regex and reused for every operation. Compilation parses the pattern, reports syntax errors as the err of the !*Regex, and produces bytecode for the VM. The whole match is wrapped in an implicit group 0, so start/end/full() are always available.

Regex::compile / Regex::compile_with

| Function | Signature | Description |
|---|---|---|
| `compile` | `fn compile(pattern: string) -> !*Regex` | Compile with no flags. |
| `compile_with` | `fn compile_with(pattern: string, flag_str: string) -> !*Regex` | Compile with a flag string (any subset of `imsxU`). |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let r: !*Regex = Regex::compile("(\\d+)-(\\d+)");
    if !r.ok {
        println!("bad pattern:", r.err);
        return 1;
    }
    let re: *Regex = r.val;

    if re.matches("phone: 555-1234") {
        println!("hit");
    }
    return 0;
}
```

compile_with turns each flag character into a flag bit; an unknown character returns err("unknown flag in flag string"). Compilation also rejects malformed patterns with errors such as "expected ')'", "unterminated character class", and "trailing characters in pattern".

```glide
// Case-insensitive (i) + multiline (m).
let r: !*Regex = Regex::compile_with("^err", "im");
```

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // A malformed pattern surfaces in .err.
    let bad: !*Regex = Regex::compile("(unclosed");
    if !bad.ok { println!("err:", bad.err); }       // expected ')'

    // Unknown flag.
    let bf: !*Regex = Regex::compile_with("x", "q");
    if !bf.ok { println!("flagerr:", bf.err); }     // unknown flag in flag string
    return 0;
}
```

The Regex struct

```glide
pub struct Regex {
    pub pattern: string,   // the original source pattern
    pub flags: i32,        // resolved flag bits (see RX_FLAG_*)
    pub n_groups: i32,     // number of capturing groups (excludes group 0)
    // ... internal bytecode/classes/name table
}
```

The three pub fields are read-only metadata you can inspect after compiling. flags reflects any flags resolved during parsing, including inline (?i)-style settings at top level.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile("(?<a>\\d)(?<b>\\d)").val;
    println!("pattern", re.pattern);    // (?<a>\d)(?<b>\d)
    println!("flags", re.flags);        // 0
    println!("n_groups", re.n_groups);  // 2
    return 0;
}
```
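
Because flags also picks up top-level inline settings, a leading (?i) and the "i" flag string should both leave the RX_FLAG_I bit set. The sketch below assumes that behaviour as described above. It tests the bit with a bitwise AND instead of printing the raw value, which keeps it independent of any other bits the parser might set.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // Flag supplied through the flag string.
    let a: *Regex = Regex::compile_with("abc", "i").val;
    // Same flag supplied inline at the top level of the pattern.
    let b: *Regex = Regex::compile("(?i)abc").val;

    if (a.flags & RX_FLAG_I) != 0 { println!("compile_with sets I"); }
    if (b.flags & RX_FLAG_I) != 0 { println!("inline (?i) sets I"); }

    // Either way, the pattern ignores ASCII case.
    if a.matches("ABC") { println!("a folds case"); }
    if b.matches("ABC") { println!("b folds case"); }
    return 0;
}
```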

Testing for a match

| Method | Signature | Description |
|---|---|---|
| `matches` | `fn matches(self: *Regex, s: string) -> bool` | `true` if any substring matches. |
| `matches_full` | `fn matches_full(self: *Regex, s: string) -> bool` | `true` only if the pattern matches the entire string. |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile("\\d+").val;
    if re.matches("hi 42 there") { println!("matches substring"); } // true
    if !re.matches("nothing")    { println!("no digits"); }         // true

    let rf: *Regex = Regex::compile("[a-z]+").val;
    if rf.matches_full("hello")        { println!("full ok"); }  // true
    if !rf.matches_full("hello world") { println!("full no"); }  // space/'world' unmatched
    return 0;
}
```

matches is find(s).has. matches_full succeeds only when the first match starts at offset 0 and ends at s.len() — it does not anchor the pattern, so a leftmost-shorter match can make it return false even when a full-length match exists. Anchor explicitly with ^...$ (or \A...\z) if you need that.
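
The following sketch demonstrates the pitfall. It assumes the leftmost-first alternation order of a PCRE-style backtracking engine, where the first branch that succeeds wins. On "ab", the unanchored a|ab stops after "a", so matches_full reports false. The anchored form forces the engine to backtrack into the ab branch.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // Unanchored: the first match in "ab" is "a" (first branch wins),
    // which ends at offset 1 rather than at s.len().
    let loose: *Regex = Regex::compile("a|ab").val;
    if !loose.matches_full("ab") { println!("loose: first match too short"); }

    // Anchored: $ fails after "a", so the engine retries with "ab".
    let anchored: *Regex = Regex::compile("^(?:a|ab)$").val;
    if anchored.matches("ab") { println!("anchored: whole string matches"); }
    return 0;
}
```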

Searching and finding

| Method | Signature | Description |
|---|---|---|
| `find` | `fn find(self: *Regex, s: string) -> ?*Match` | First match, or `none()`. |
| `find_at` | `fn find_at(self: *Regex, s: string, from: i32) -> ?*Match` | First match starting at or after byte offset `from`. |
| `find_all` | `fn find_all(self: *Regex, s: string) -> *Vector<*Match>` | All non-overlapping matches (an empty match advances the search by one byte). |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile("\\d+").val;

    // find_all: every non-overlapping match, with spans.
    let all: *Vector<*Match> = re.find_all("a12 b3 c456");
    for let i: i32 = 0; i < all.len(); i++ {
        let m: *Match = all.get(i);
        println!("match", i, m.full(), m.start, m.end);  // 12 1 3 / 3 5 6 / 456 8 11
    }

    // find_at: start searching at an offset.
    match re.find_at("a12 b3", 3) {
        some(m) => println!("from 3:", m.full(), m.start),  // 3 at offset 5
        none()  => println!("none"),
    }

    // find: no match -> none().
    match re.find("no digits here") {
        some(m) => println!("found", m.full()),
        none()  => println!("no match"),
    }
    return 0;
}
```

find is find_at(s, 0). Both scan forward from the start offset, trying the VM at each byte position (leftmost match wins). find_all repeatedly calls find_at, advancing past each match; a zero-width match advances by one byte so the loop terminates.
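
The sketch below shows the resume-and-continue behaviour on a short input. It counts words with find_all and checks that a later match still reports absolute byte offsets into the original string. It also repeats one step by hand, calling find_at from the end of the first match as find_all is described as doing.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let word: *Regex = Regex::compile("\\w+").val;
    let s: string = "one two  three";

    // Every non-overlapping run of word bytes.
    let all: *Vector<*Match> = word.find_all(s);
    println!("count", all.len());                          // 3

    // Spans are absolute offsets into s, not relative to the previous match.
    let last: *Match = all.get(all.len() - 1);
    println!("last", last.full(), last.start, last.end);   // three 9 14

    // One manual step: resume from where the first match ended.
    match word.find_at(s, all.get(0).end) {
        some(m) => println!("next", m.full(), m.start),    // two 4
        none()  => println!("none"),
    }
    return 0;
}
```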

The Match result and captures

A *Match describes one match and its capture groups. Group 0 is the whole match; groups 1..n are the parenthesized captures in order.

```glide
pub struct Match {
    pub input: string,   // the string that was searched
    pub start: i32,      // byte offset of the match start
    pub end: i32,        // byte offset just past the match
    // ... capture spans + name table
}
```

| Method | Signature | Description |
|---|---|---|
| `full` | `fn full(self: *Match) -> string` | The full matched substring (same as `group(0)`). |
| `group` | `fn group(self: *Match, i: i32) -> string` | Capture `i` (0 = full match); `""` for a missing or uncaptured group. |
| `group_opt` | `fn group_opt(self: *Match, i: i32) -> ?string` | `some` if group `i` participated, else `none()`. |
| `named` | `fn named(self: *Match, name: string) -> string` | Named-capture lookup; `""` if absent. |
| `named_opt` | `fn named_opt(self: *Match, name: string) -> ?string` | `some` if the named group participated, else `none()`. |
| `n_groups` | `fn n_groups(self: *Match) -> i32` | Number of capturing groups (excluding group 0). |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile("(?<year>\\d{4})-(?<month>\\d{2})").val;

    match re.find("date 2026-05 end") {
        some(m) => {
            println!("full:", m.group(0), m.full());          // 2026-05  2026-05
            println!("g1:", m.group(1), "g2:", m.group(2));   // 2026  05
            println!("year:", m.named("year"), m.named("month"));  // 2026  05
            println!("groups:", m.n_groups());                // 2
            println!("span:", m.start, m.end);                // 5  12
            println!("input:", m.input);                      // date 2026-05 end
        }
        none() => println!("no match"),
    }
    return 0;
}
```

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // (b)? did not participate when matching just "a".
    let re: *Regex = Regex::compile("(a)(b)?").val;
    match re.find("a") {
        some(m) => {
            let g2: ?string = m.group_opt(2);
            if g2.has { println!("g2 present"); } else { println!("g2 missing"); }
            println!("g1", m.group(1));   // a
        }
        none() => println!("no match"),
    }

    // (x*) DID participate but captured the empty string.
    let re2: *Regex = Regex::compile("(a)(x*)").val;
    match re2.find("a") {
        some(m) => {
            let g2: ?string = m.group_opt(2);
            if g2.has { println!("g2 empty-but-present:", g2.val.len()); }  // 0
        }
        none() => {}
    }
    return 0;
}
```
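
Captures combine naturally with find_all: each element is a full *Match with its own groups. The sketch below pulls key/value pairs out of a line. The input string and the group names key and num are illustrative only. It reads one capture by number and the other by name to show that both address the same spans.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // Illustrative input: comma-separated key=number pairs, one malformed.
    let kv: *Regex = Regex::compile("(?<key>\\w+)=(?<num>\\d+)").val;
    let pairs: *Vector<*Match> = kv.find_all("a=1, bb=22, c=x");

    for let i: i32 = 0; i < pairs.len(); i++ {
        let m: *Match = pairs.get(i);
        // group(1) is the same capture as named("key").
        println!(m.group(1), m.named("num"));   // a 1 / bb 22
    }

    // "c=x" is skipped because \d+ cannot match "x".
    println!("pairs", pairs.len());             // 2
    return 0;
}
```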

Replacing

| Method | Signature | Description |
|---|---|---|
| `replace` | `fn replace(self: *Regex, s: string, repl: string) -> string` | Replace the first match. |
| `replace_all` | `fn replace_all(self: *Regex, s: string, repl: string) -> string` | Replace every non-overlapping match. |
| `replace_with` | `fn replace_with(self: *Regex, s: string, f: fn(*Match) -> string) -> string` | Replace each match with the result of `f(match)`. |

The replacement string (replace / replace_all) supports backreferences and a literal-$ escape:

| Token | Expands to |
|---|---|
| `$0` .. `$9` | The corresponding capture group (`$0` = full match). |
| `${name}` | The named capture group `name`. |
| `$$` | A literal `$`. |

Numbered and $$ tokens are easy to write as Glide string literals:

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // $$ escapes a literal '$'; $1 is the first group.
    let price: *Regex = Regex::compile("(\\d+)").val;
    println!(price.replace_all("a5 b6", "$$$1"));     // a$5 b$6

    // Reorder groups.
    let pair: *Regex = Regex::compile("(\\w+)=(\\w+)").val;
    println!(pair.replace("k=v rest", "$2:$1"));      // v:k rest

    // $0 = whole match. replace touches only the first hit.
    let w: *Regex = Regex::compile("\\w+").val;
    println!(w.replace("hi there", "[$0]"));          // [hi] there
    return 0;
}
```
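
Named tokens need more care. Glide string literals use ${...} for interpolation, so writing a token such as ${y} directly inside a literal collides with the ${name} replacement syntax. One workaround is to keep the "$" in its own string and concatenate, so that no single literal contains "${":

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let date: *Regex =
        Regex::compile("(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})").val;
    let d: string = "$";
    let repl: string =
        d.concat("{d}/").concat(d).concat("{m}/").concat(d).concat("{y}");
    println!(date.replace("on 2026-05-30 ok", repl));   // on 30/05/2026 ok
    return 0;
}
```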

Numbered groups ($1) and $$ have no such conflict. If named substitution is awkward, prefer replace_with and read m.named(...) in the callback.

replace_with calls your function for each match and substitutes the returned string verbatim (no $-expansion):

```glide
import stdlib::regex::*;

fn mask(m: *Match) -> string {
    let n: i32 = m.full().len();
    let mut s: string = "";
    for let i: i32 = 0; i < n; i++ { s = s.concat("*"); }
    return s;
}

fn main() -> i32 {
    let digits: *Regex = Regex::compile("\\d+").val;
    println!(digits.replace_with("card 4242 9999", mask));  // card **** ****
    return 0;
}
```

If there is no match, replace/replace_all/replace_with all return the input string unchanged.
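
As suggested above, the named-group date rewrite from earlier can also be written with replace_with. Because the callback returns plain text, no ${...} token has to appear in a Glide string literal. The helper name to_dmy is illustrative. The second call checks the no-match case, where the input comes back unchanged.

```glide
import stdlib::regex::*;

// Illustrative helper: rebuild YYYY-MM-DD as DD/MM/YYYY from named captures.
fn to_dmy(m: *Match) -> string {
    return m.named("d").concat("/").concat(m.named("m")).concat("/").concat(m.named("y"));
}

fn main() -> i32 {
    let date: *Regex =
        Regex::compile("(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})").val;
    println!(date.replace_with("on 2026-05-30 ok", to_dmy));  // on 30/05/2026 ok

    // No match: the input is returned unchanged.
    println!(date.replace_with("no date here", to_dmy));      // no date here
    return 0;
}
```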

Splitting

split

```glide
fn split(self: *Regex, s: string) -> *Vector<string>
```

Splits s on every non-overlapping match of the regex, returning the pieces between matches. The result always has at least one element (the whole string when nothing matches).

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let sep: *Regex = Regex::compile(",|;").val;
    let parts: *Vector<string> = sep.split("alpha,beta;gamma");
    for let i: i32 = 0; i < parts.len(); i++ {
        println!("part", i, parts.get(i));     // alpha / beta / gamma
    }

    // \s+ collapses runs of whitespace into single separators.
    let ws: *Regex = Regex::compile("\\s+").val;
    let words: *Vector<string> = ws.split("the   quick  fox");
    println!("nwords", words.len());           // 3

    // No match -> single-element vector (the whole string).
    let z: *Regex = Regex::compile("z").val;
    let one: *Vector<string> = z.split("abc");
    println!("one", one.len(), one.get(0));    // 1  abc
    return 0;
}
```
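
A pattern-based separator can absorb surrounding noise that a fixed-string split would leave in the pieces. In the sketch below, the separator takes optional whitespace on both sides of each comma, so the pieces come back already trimmed. The input string is illustrative.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // The separator swallows optional whitespace around each comma.
    let sep: *Regex = Regex::compile("\\s*,\\s*").val;
    let parts: *Vector<string> = sep.split("a , b,c");
    for let i: i32 = 0; i < parts.len(); i++ {
        println!("part", i, parts.get(i));   // a / b / c
    }
    return 0;
}
```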

Free-function convenience API

One-shot helpers that compile the pattern on every call. Prefer keeping a *Regex around for hot loops.

| Function | Signature | Description |
|---|---|---|
| `regex_matches` | `fn regex_matches(pat: string, s: string) -> bool` | `Regex::compile(pat).val.matches(s)`; `false` on a bad pattern. |
| `regex_find` | `fn regex_find(pat: string, s: string) -> ?*Match` | `Regex::compile(pat).val.find(s)`; `none()` on a bad pattern. |
| `regex_replace_all` | `fn regex_replace_all(pat: string, s: string, repl: string) -> string` | `Regex::compile(pat).val.replace_all(s, repl)`; returns `s` on a bad pattern. |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    if regex_matches("\\d+", "abc 42") { println!("matched"); }

    match regex_find("(\\w+)@(\\w+)", "x@y") {
        some(m) => println!(m.group(1), m.group(2)),   // x  y
        none()  => {},
    }

    // The pattern comes first, then the subject, then the replacement.
    println!(regex_replace_all("\\d", "a1b2c3", "#"));  // a#b#c#

    // Bad pattern: silently returns false / none() / the input.
    if !regex_matches("(unclosed", "x") { println!("bad pattern swallowed"); }
    return 0;
}
```
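
For repeated use, the advice above is to compile once and keep the *Regex around rather than calling a one-shot helper inside a loop. The sketch below counts log lines that start with "ok", compiling the line pattern a single time outside the loop. The log text is illustrative. It is split on the \n escape, so each line is a separate string, and ^ therefore needs no m flag.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // Illustrative log, split into lines with the \n escape.
    let lines: *Vector<string> = Regex::compile("\\n").val.split("ok 1\nerr 2\nok 3");

    // Compiled once, outside the loop (regex_matches would recompile per line).
    let ok: *Regex = Regex::compile("^ok").val;
    let mut n: i32 = 0;
    for let i: i32 = 0; i < lines.len(); i++ {
        if ok.matches(lines.get(i)) { n = n + 1; }
    }
    println!("ok lines", n);   // 2
    return 0;
}
```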

Supported syntax

Literals and escapes

Most characters match themselves. Backslash escapes:

| Escape | Matches |
|---|---|
| `\n` `\t` `\r` `\f` `\v` | newline, tab, carriage return, form feed, vertical tab |
| `\a` `\e` `\0` | bell (0x07), escape (0x1B), NUL (0x00) |
| `\xHH` | the byte with hex value HH (e.g. `\x41` = `A`) |
| `\.` `\\` `\(` ... | a literal metacharacter |

Character classes

| Syntax | Matches |
|---|---|
| `[abc]` | any of `a`, `b`, `c` |
| `[a-z]` | a byte range |
| `[^a-z]` | negation (any byte not in the set) |
| `[a-zA-Z0-9_]` | union of ranges |
| `[\d\s]` | predefined classes are allowed inside `[...]` |
| `[]a]` | a `]` placed first is a literal `]` |

Predefined classes (usable bare or inside [...]):

| Class | Matches | Negated |
|---|---|---|
| `\d` | digits `0-9` | `\D` |
| `\w` | word bytes `[A-Za-z0-9_]` | `\W` |
| `\s` | whitespace: `\t` `\n` `\v` `\f` `\r` and space | `\S` |
| `.` | any byte except `\n` (any byte with the `s` flag) | |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let hex: *Regex = Regex::compile("[0-9a-fA-F]+").val;
    println!(hex.find("DEADbeef!").val.full());    // DEADbeef

    let neg: *Regex = Regex::compile("[^aeiou ]+").val;
    println!(neg.find("the fox").val.full());      // th

    let byte: *Regex = Regex::compile("\\x41\\x42").val;  // matches "AB"
    if byte.matches("xABy") { println!("hex byte ok"); }

    let mix: *Regex = Regex::compile("[\\d.]+").val;       // digits or dot
    println!(mix.find("v3.14!").val.full());               // 3.14
    return 0;
}
```

Quantifiers

| Quantifier | Repetitions | Lazy form |
|---|---|---|
| `*` | 0 or more | `*?` |
| `+` | 1 or more | `+?` |
| `?` | 0 or 1 | `??` |
| `{n}` | exactly n | `{n}?` |
| `{n,}` | n or more | `{n,}?` |
| `{n,m}` | between n and m | `{n,m}?` |

Quantifiers are greedy by default; appending ? makes them lazy. The U flag (or inline (?U)) flips greediness globally.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let greedy: *Regex = Regex::compile("<.+>").val;
    println!(greedy.find("<a><b>").val.full());     // <a><b>

    let lazy: *Regex = Regex::compile("<.+?>").val;
    println!(lazy.find("<a><b>").val.full());        // <a>

    let bounded: *Regex = Regex::compile("a{2,3}").val;
    println!(bounded.find("aaaa").val.full());       // aaa

    let exact: *Regex = Regex::compile("\\d{3}").val;
    println!(exact.find("12345").val.full());        // 123

    // The U flag flips default greediness: <.+> behaves like <.+?>.
    let ung: *Regex = Regex::compile_with("<.+>", "U").val;
    println!(ung.find("<a><b>").val.full());         // <a>
    return 0;
}
```

Groups, alternation, anchors

| Syntax | Meaning |
|---|---|
| `(...)` | capturing group |
| `(?:...)` | non-capturing group |
| `(?<name>...)` / `(?P<name>...)` | named capturing group |
| `a\|b\|c` | alternation |
| `^` `$` | start / end of string (or of a line with the `m` flag) |
| `\A` `\z` `\Z` | absolute start / end of string (`\Z` is treated as `\z`) |
| `\b` `\B` | word boundary / non-boundary |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let alt: *Regex = Regex::compile("cat|dog|bird").val;
    println!(alt.find("I have a dog").val.full());   // dog

    let anch: *Regex = Regex::compile("^\\d+$").val;
    if anch.matches("12345")  { println!("all digits"); }
    if !anch.matches("12a45") { println!("not all"); }

    // \b: whole word, not a substring.
    let word: *Regex = Regex::compile("\\bcat\\b").val;
    if word.matches("a cat sat")  { println!("whole word"); }
    if !word.matches("category")  { println!("no substring"); }

    // m flag: ^ matches after each newline.
    let ml: *Regex = Regex::compile_with("^x", "m").val;
    let hits: *Vector<*Match> = ml.find_all("ax\nx\nx");
    println!("ml hits", hits.len());                 // 2
    return 0;
}
```
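
The m flag changes only ^ and $. According to the table above, \A always means the absolute start of the string. The sketch below contrasts the two anchors under the same flag and input.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let s: string = "x\nx";

    // With m, ^ matches at the start of every line.
    let line_start: *Regex = Regex::compile_with("^x", "m").val;
    println!("^ hits", line_start.find_all(s).len());    // 2

    // \A ignores m and matches only at offset 0.
    let abs_start: *Regex = Regex::compile_with("\\Ax", "m").val;
    println!("\\A hits", abs_start.find_all(s).len());   // 1
    return 0;
}
```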

Backreferences

| Syntax | Meaning |
|---|---|
| `\1` .. `\9` | match the same text a numbered group captured |
| `\k<name>` | match the same text a named group captured |
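
A numbered backreference re-matches whatever its group captured on this attempt, not the group's pattern. The sketch below uses (\w)\1 to find every immediately doubled word byte in an illustrative string.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // (\w) captures one word byte; \1 requires the same byte again.
    let doubled: *Regex = Regex::compile("(\\w)\\1").val;
    let hits: *Vector<*Match> = doubled.find_all("bookkeeper");
    for let i: i32 = 0; i < hits.len(); i++ {
        println!(hits.get(i).full());   // oo / kk / ee
    }
    return 0;
}
```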

Lookaround

| Syntax | Meaning |
|---|---|
| `(?=...)` | positive lookahead |
| `(?!...)` | negative lookahead |
| `(?<=...)` | positive lookbehind |
| `(?<!...)` | negative lookbehind |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // Positive lookahead: foo only when followed by bar (bar not consumed).
    let la: *Regex = Regex::compile("foo(?=bar)").val;
    println!(la.find("foobar").val.full());        // foo

    // Negative lookahead.
    let nla: *Regex = Regex::compile("\\d+(?!px)").val;
    if nla.matches("10em") { println!("nla ok"); }

    // Lookbehind: digits preceded by '$'.
    let lb: *Regex = Regex::compile("(?<=\\$)\\d+").val;
    println!(lb.find("price $42 yen").val.full());  // 42

    // Backreference: a doubled word.
    let dup: *Regex = Regex::compile("\\b(\\w+)\\s+\\1\\b").val;
    if dup.matches("the the cat") { println!("dup"); }

    // Named backreference: matching open/close tags.
    let tag: *Regex = Regex::compile("<(?<t>\\w+)>.*?</\\k<t>>").val;
    if tag.matches("<b>hi</b>") { println!("tag ok"); }
    return 0;
}
```

Flags

Pass as a flag string to compile_with, or inline in the pattern via (?flags) (sets flags from that point) or (?flags:subpattern) (scoped); (?flags-flags:...) turns flags off within the scope.

| Flag | Constant | Effect |
|---|---|---|
| `i` | `RX_FLAG_I` | case-insensitive (ASCII only) |
| `m` | `RX_FLAG_M` | `^`/`$` match at line breaks |
| `s` | `RX_FLAG_S` | `.` matches `\n` (dot-all) |
| `x` | `RX_FLAG_X` | extended: unescaped whitespace and `#` comments ignored |
| `U` | `RX_FLAG_U` | ungreedy: flip default greediness |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // (?i:...) — case-insensitive only inside the group.
    let scoped: *Regex = Regex::compile("(?i:hello) world").val;
    if scoped.matches("HELLO world")   { println!("scoped i"); }
    if !scoped.matches("HELLO WORLD")  { println!("outside still sensitive"); }

    // (?x) — extended: whitespace and # comments ignored.
    let ext: *Regex = Regex::compile("(?x) \\d+  # the number \n -  \\d+").val;
    if ext.matches("12-34") { println!("extended ok"); }
    return 0;
}
```

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // Inline flags at top level + lookahead.
    let re: *Regex = Regex::compile("(?i)foo(?=bar)").val;
    if re.matches("FOObar") { println!("lookahead ok"); }
    return 0;
}
```
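
A bare (?flags) does not have to appear at the top. Placed mid-pattern, it applies from that point onward, leaving the text before it unaffected. The sketch below relies on that documented rule: "ab" stays case-sensitive while "cd" folds case.

```glide
import stdlib::regex::*;

fn main() -> i32 {
    // (?i) takes effect only after "ab".
    let re: *Regex = Regex::compile("ab(?i)cd").val;
    if re.matches("abCD")  { println!("tail folds case"); }
    if !re.matches("ABcd") { println!("head still exact"); }
    return 0;
}
```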

Lower-level building blocks

These public items back the engine. Most programs never touch them directly, but they are exported and documented here for completeness.

CharClass

A set of inclusive byte ranges, optionally negated. Used internally for classes; you can build one by hand.

```glide
pub struct CharClass {
    ranges: *Vector<i32>,   // [lo1, hi1, lo2, hi2, ...] inclusive
    negated: bool,
}
```

| Method | Signature | Description |
|---|---|---|
| `new` | `fn new() -> *CharClass` | Empty, non-negated class. |
| `add` | `fn add(self: *CharClass, lo: i32, hi: i32)` | Add an inclusive byte range. |
| `contains` | `fn contains(self: *CharClass, b: i32) -> bool` | Test byte `b` (respects `negated`). |

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let cc: *CharClass = CharClass::new();
    cc.add(48, 57);   // '0'..'9'
    cc.add(65, 70);   // 'A'..'F'
    if cc.contains(53)  { println!("'5' in class"); }   // true
    if !cc.contains(103) { println!("'g' not in class"); } // true
    return 0;
}
```

Exported constants

These name the internal opcodes, AST node kinds, flag bits, assertion kinds, and limits. They are mainly of interest when inspecting re.flags or extending the engine.

| Group | Constants |
|---|---|
| Flag bits | `RX_FLAG_I` (1), `RX_FLAG_M` (2), `RX_FLAG_S` (4), `RX_FLAG_X` (8), `RX_FLAG_U` (16) |
| Assertion kinds | `RX_AHEAD_POS` (0), `RX_AHEAD_NEG` (1), `RX_BEHIND_POS` (2), `RX_BEHIND_NEG` (3) |
| Limits | `RX_MAX_GROUPS` (64) |
| VM opcodes | `RX_OP_CHAR`, `RX_OP_ANY`, `RX_OP_ANY_NL`, `RX_OP_CLASS`, `RX_OP_BOL`, `RX_OP_EOL`, `RX_OP_STR_BEG`, `RX_OP_STR_END`, `RX_OP_WORDB`, `RX_OP_NWORDB`, `RX_OP_JMP`, `RX_OP_SPLIT`, `RX_OP_SAVE`, `RX_OP_BACKREF`, `RX_OP_ASSERT`, `RX_OP_ASRT_END`, `RX_OP_MATCH` |
| AST node kinds | `RX_AST_LIT`, `RX_AST_ANY`, `RX_AST_CLASS`, `RX_AST_CONCAT`, `RX_AST_ALT`, `RX_AST_QUANT`, `RX_AST_GROUP`, `RX_AST_ANCHOR`, `RX_AST_BACK`, `RX_AST_LOOK` |

The flag bits are OR-able, so you can test re.flags:

```glide
import stdlib::regex::*;

fn main() -> i32 {
    let re: *Regex = Regex::compile_with("abc", "is").val;
    if (re.flags & RX_FLAG_I) != 0 { println!("case-insensitive"); }
    if (re.flags & RX_FLAG_S) != 0 { println!("dot-all"); }
    return 0;
}
```

Internal types

RxNode (AST node), RxParser (pattern parser), and RxCompiler (AST-to-bytecode compiler) are exported structs used internally by compile. They have no stable public method surface and should be treated as implementation detail — do not depend on their fields.

Unsupported / known limitations

Documented honestly so you do not reach for features that silently behave differently:

| Feature | Status |
|---|---|
| Unicode property classes (`\p{...}`), Unicode-aware `\w` / `.` / case folding | Not supported; the engine is byte-oriented and `i` folds ASCII only. |
| POSIX bracket classes (`[[:alpha:]]`) | Not supported; use explicit ranges or `\d` / `\w` / `\s`. |
| Atomic groups `(?>...)`, possessive quantifiers (`a++`, `a*+`) | Not supported. |
| Conditionals `(?(1)...)`, recursion `(?R)`, subroutine calls `(?1)` | Not supported. |
| Comments via `(?#...)` | Not supported (use the `x` flag with `#`). |
| `\Z` semantics (before a trailing newline) | Treated identically to `\z` (absolute end). |
| Octal escapes beyond `\0`, `\cX` control escapes, `\Q...\E` | Not supported. |
| `${name}` replacement token in a bare Glide string literal | Collides with Glide string interpolation; see Replacing. |

See also

  • strings: split, contains, substring, to_lower for non-regex text work (cheaper when you don't need a pattern).
  • vectors: the *Vector<*Match> / *Vector<string> returned by find_all / split.
  • collections: the HashMap that backs the named-group name→index table.
  • prelude: the ?*Match / !*Regex (some/none/ok/err) conventions used throughout.