Regex
stdlib::regex is a pure-Glide, PCRE-like regular-expression engine built on a backtracking bytecode VM. You compile a pattern into a *Regex, then test, search, capture, replace, or split strings with it. Patterns operate on raw bytes.
Import
import stdlib::regex::*;
At a glance
| Item | Kind | Summary |
|---|---|---|
| `Regex` | struct | A compiled pattern (pattern, flags, n_groups are pub). |
| `Match` | struct | One match + its capture spans (input, start, end are pub). |
| `CharClass` | struct | A set of inclusive byte ranges, optionally negated. |
| `Regex::compile` / `compile_with` | fn | Build a `*Regex` (returns `!*Regex`). |
| `matches` / `matches_full` | method | Boolean test (substring / whole-string). |
| `find` / `find_at` / `find_all` | method | Locate matches. |
| `replace` / `replace_all` / `replace_with` | method | Substitute matches. |
| `split` | method | Split on matches. |
| `full` / `group` / `group_opt` / `named` / `named_opt` / `n_groups` | method | Read a `*Match`. |
| `regex_matches` / `regex_find` / `regex_replace_all` | fn | One-shot helpers (compile every call). |
| `RxNode` / `RxParser` / `RxCompiler` | struct | Internal AST/parser/compiler (exported, no stable API). |
| `RX_*` | const | Opcode / flag / AST / assertion-kind / limit constants. |
Compiling patterns
A pattern is compiled once into a *Regex and reused for every operation. Compilation parses the pattern, reports syntax errors as the err of the !*Regex, and produces bytecode for the VM. The whole match is wrapped in an implicit group 0, so start/end/full() are always available.
Regex::compile / Regex::compile_with
| Function | Signature | Description |
|---|---|---|
| `compile` | `fn compile(pattern: string) -> !*Regex` | Compile with no flags. |
| `compile_with` | `fn compile_with(pattern: string, flag_str: string) -> !*Regex` | Compile with a flag string (any subset of imsxU). |
import stdlib::regex::*;
fn main() -> i32 {
let r: !*Regex = Regex::compile("(\\d+)-(\\d+)");
if !r.ok {
println!("bad pattern:", r.err);
return 1;
}
let re: *Regex = r.val;
if re.matches("phone: 555-1234") {
println!("hit");
}
return 0;
}
compile_with turns each flag character into a flag bit; an unknown character returns err("unknown flag in flag string"). Compilation also rejects malformed patterns, with errors such as "expected ')'", "unterminated character class", and "trailing characters in pattern".
// Case-insensitive (i) + multiline (m).
let r: !*Regex = Regex::compile_with("^err", "im");
import stdlib::regex::*;
fn main() -> i32 {
// A malformed pattern surfaces in .err.
let bad: !*Regex = Regex::compile("(unclosed");
if !bad.ok { println!("err:", bad.err); } // expected ')'
// Unknown flag.
let bf: !*Regex = Regex::compile_with("x", "q");
if !bf.ok { println!("flagerr:", bf.err); } // unknown flag in flag string
return 0;
}
The Regex struct
pub struct Regex {
pub pattern: string, // the original source pattern
pub flags: i32, // resolved flag bits (see RX_FLAG_*)
pub n_groups: i32, // number of capturing groups (excludes group 0)
// ... internal bytecode/classes/name table
}
The three pub fields are read-only metadata you can inspect after compiling. flags reflects any flags resolved during parsing, including inline (?i)-style settings at top level.
import stdlib::regex::*;
fn main() -> i32 {
let re: *Regex = Regex::compile("(?<a>\\d)(?<b>\\d)").val;
println!("pattern", re.pattern); // (?<a>\d)(?<b>\d)
println!("flags", re.flags); // 0
println!("n_groups", re.n_groups); // 2
return 0;
}
Testing for a match
| Method | Signature | Description |
|---|---|---|
| `matches` | `fn matches(self: *Regex, s: string) -> bool` | true if any substring matches. |
| `matches_full` | `fn matches_full(self: *Regex, s: string) -> bool` | true only if the pattern matches the entire string. |
import stdlib::regex::*;
fn main() -> i32 {
let re: *Regex = Regex::compile("\\d+").val;
if re.matches("hi 42 there") { println!("matches substring"); } // true
if !re.matches("nothing") { println!("no digits"); } // true
let rf: *Regex = Regex::compile("[a-z]+").val;
if rf.matches_full("hello") { println!("full ok"); } // true
if !rf.matches_full("hello world") { println!("full no"); } // space/'world' unmatched
return 0;
}
matches is find(s).has. matches_full succeeds only when the first match starts at offset 0 and ends at s.len() — it does not anchor the pattern, so a leftmost-shorter match can make it return false even when a full-length match exists. Anchor explicitly with ^...$ (or \A...\z) if you need that.
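The caveat is easiest to see with an alternation whose first branch is a prefix of the second. The following is a minimal sketch, assuming alternatives are tried left to right as in other backtracking engines; the pattern and input are illustrative only.

```glide
import stdlib::regex::*;
fn main() -> i32 {
    // Assumption: the left branch wins, so the first match of a|ab in "ab"
    // is "a". It ends at offset 1, not s.len(), so matches_full is false.
    let loose: *Regex = Regex::compile("a|ab").val;
    if !loose.matches_full("ab") { println!("first match stops short"); }
    // With explicit anchors the VM has to backtrack into the "ab" branch.
    let strict: *Regex = Regex::compile("^(?:a|ab)$").val;
    if strict.matches("ab") { println!("anchored match covers the string"); }
    return 0;
}
```

If both lines print, the engine behaves as described: the unanchored pattern settles for the shorter first match, while the anchored one accepts the whole string.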
Searching and finding
| Method | Signature | Description |
|---|---|---|
| `find` | `fn find(self: *Regex, s: string) -> ?*Match` | First match, or none(). |
| `find_at` | `fn find_at(self: *Regex, s: string, from: i32) -> ?*Match` | First match at or after byte offset from. |
| `find_all` | `fn find_all(self: *Regex, s: string) -> *Vector<*Match>` | All non-overlapping matches (empty matches advance by one byte). |
import stdlib::regex::*;
fn main() -> i32 {
let re: *Regex = Regex::compile("\\d+").val;
// find_all: every non-overlapping match, with spans.
let all: *Vector<*Match> = re.find_all("a12 b3 c456");
for let i: i32 = 0; i < all.len(); i++ {
let m: *Match = all.get(i);
println!("match", i, m.full(), m.start, m.end);
}
// find_at: start searching at an offset.
match re.find_at("a12 b3", 3) {
some(m) => println!("from 3:", m.full(), m.start), // 3 at offset 5
none() => println!("none"),
}
// find: no match -> none().
match re.find("no digits here") {
some(m) => println!("found", m.full()),
none() => println!("no match"),
}
return 0;
}
find is find_at(s, 0). Both scan forward from the start offset, trying the VM at each byte position (leftmost match wins). find_all repeatedly calls find_at, advancing past each match; a zero-width match advances by one byte so the loop terminates.
The Match result and captures
A *Match describes one match and its capture groups. Group 0 is the whole match; groups 1..n are the parenthesized captures in order.
pub struct Match {
pub input: string, // the string that was searched
pub start: i32, // byte offset of the match start
pub end: i32, // byte offset just past the match
// ... capture spans + name table
}
| Method | Signature | Description |
|---|---|---|
| `full` | `fn full(self: *Match) -> string` | The full matched substring (same as group(0)). |
| `group` | `fn group(self: *Match, i: i32) -> string` | Capture i (0 = full match); "" for missing/uncaptured. |
| `group_opt` | `fn group_opt(self: *Match, i: i32) -> ?string` | some if group i participated, else none(). |
| `named` | `fn named(self: *Match, name: string) -> string` | Named-capture lookup; "" if absent. |
| `named_opt` | `fn named_opt(self: *Match, name: string) -> ?string` | some if the named group participated, else none(). |
| `n_groups` | `fn n_groups(self: *Match) -> i32` | Number of capturing groups (excluding group 0). |
import stdlib::regex::*;
fn main() -> i32 {
let re: *Regex = Regex::compile("(?<year>\\d{4})-(?<month>\\d{2})").val;
match re.find("date 2026-05 end") {
some(m) => {
println!("full:", m.group(0), m.full()); // 2026-05 2026-05
println!("g1:", m.group(1), "g2:", m.group(2)); // 2026 05
println!("year:", m.named("year"), m.named("month"));
println!("groups:", m.n_groups()); // 2
println!("span:", m.start, m.end);
println!("input:", m.input);
}
none() => println!("no match"),
}
return 0;
}
import stdlib::regex::*;
fn main() -> i32 {
// (b)? did not participate when matching just "a".
let re: *Regex = Regex::compile("(a)(b)?").val;
match re.find("a") {
some(m) => {
let g2: ?string = m.group_opt(2);
if g2.has { println!("g2 present"); } else { println!("g2 missing"); }
println!("g1", m.group(1)); // a
}
none() => println!("no match"),
}
// (x*) DID participate but captured the empty string.
let re2: *Regex = Regex::compile("(a)(x*)").val;
match re2.find("a") {
some(m) => {
let g2: ?string = m.group_opt(2);
if g2.has { println!("g2 empty-but-present:", g2.val.len()); } // 0
}
none() => {}
}
return 0;
}
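The named accessors follow the same participated-versus-empty distinction. This is a small sketch using only the methods from the table above; the key/value pattern is a made-up illustration.

```glide
import stdlib::regex::*;
fn main() -> i32 {
    // (?<val>...) sits inside an optional group, so it does not participate
    // when the input has no "=value" part.
    let re: *Regex = Regex::compile("(?<key>\\w+)(?:=(?<val>\\w+))?").val;
    match re.find("debug") {
        some(m) => {
            let v: ?string = m.named_opt("val");
            if v.has { println!("val present"); } else { println!("val missing"); }
            println!("key", m.named("key"));           // debug
            println!("val len", m.named("val").len()); // 0: named() falls back to ""
        }
        none() => println!("no match"),
    }
    return 0;
}
```

Use named_opt when you need to tell "group absent" apart from "group matched the empty string"; named alone cannot distinguish the two.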
Replacing
| Method | Signature | Description |
|---|---|---|
| `replace` | `fn replace(self: *Regex, s: string, repl: string) -> string` | Replace the first match. |
| `replace_all` | `fn replace_all(self: *Regex, s: string, repl: string) -> string` | Replace every non-overlapping match. |
| `replace_with` | `fn replace_with(self: *Regex, s: string, f: fn(*Match) -> string) -> string` | Replace each match with the result of f(match). |
The replacement string (replace / replace_all) supports backreferences and a literal-$ escape:
| Token | Expands to |
|---|---|
| `$0` .. `$9` | The corresponding capture group ($0 = full match). |
| `${name}` | The named capture group name. |
| `$$` | A literal $. |
Numbered and $$ tokens are easy to write as Glide string literals:
import stdlib::regex::*;
fn main() -> i32 {
// $$ escapes a literal '$'; $1 is the first group.
let price: *Regex = Regex::compile("(\\d+)").val;
println!(price.replace_all("a5 b6", "$$$1")); // a$5 b$6
// Reorder groups.
let pair: *Regex = Regex::compile("(\\w+)=(\\w+)").val;
println!(pair.replace("k=v rest", "$2:$1")); // v:k rest
// $0 = whole match. replace touches only the first hit.
let w: *Regex = Regex::compile("\\w+").val;
println!(w.replace("hi there", "[$0]")); // [hi] there
return 0;
}
import stdlib::regex::*;
fn main() -> i32 {
let date: *Regex =
Regex::compile("(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})").val;
let d: string = "$";
let repl: string =
d.concat("{d}/").concat(d).concat("{m}/").concat(d).concat("{y}");
println!(date.replace("on 2026-05-30 ok", repl)); // on 30/05/2026 ok
return 0;
}
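The example above assembles its ${name} tokens at runtime from a separate "$" string because a ${name} token written directly in a Glide string literal collides with Glide string interpolation. Numbered groups ($1) and $$ have no such conflict. If named substitution is awkward, prefer replace_with and read m.named(...) in the callback.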
replace_with calls your function for each match and substitutes the returned string verbatim (no $-expansion):
import stdlib::regex::*;
fn mask(m: *Match) -> string {
let n: i32 = m.full().len();
let mut s: string = "";
for let i: i32 = 0; i < n; i++ { s = s.concat("*"); }
return s;
}
fn main() -> i32 {
let digits: *Regex = Regex::compile("\\d+").val;
println!(digits.replace_with("card 4242 9999", mask)); // card **** ****
return 0;
}
If there is no match, replace/replace_all/replace_with all return the input string unchanged.
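When a replacement needs named captures, the callback form avoids ${name} tokens altogether. The sketch below redoes the date reordering with replace_with; the helper name dmy is hypothetical and not part of the library.

```glide
import stdlib::regex::*;
// Hypothetical callback: rebuilds the date as d/m/y from the named captures.
fn dmy(m: *Match) -> string {
    return m.named("d").concat("/").concat(m.named("m")).concat("/").concat(m.named("y"));
}
fn main() -> i32 {
    let date: *Regex =
        Regex::compile("(?<y>\\d{4})-(?<m>\\d{2})-(?<d>\\d{2})").val;
    // The callback result is inserted verbatim, so no '$' handling is needed.
    println!(date.replace_with("on 2026-05-30 ok", dmy)); // on 30/05/2026 ok
    return 0;
}
```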
Splitting
split
fn split(self: *Regex, s: string) -> *Vector<string>
Splits s on every non-overlapping match of the regex, returning the pieces between matches. The result always has at least one element (the whole string when nothing matches).
import stdlib::regex::*;
fn main() -> i32 {
let sep: *Regex = Regex::compile(",|;").val;
let parts: *Vector<string> = sep.split("alpha,beta;gamma");
for let i: i32 = 0; i < parts.len(); i++ {
println!("part", i, parts.get(i)); // alpha / beta / gamma
}
// \s+ collapses runs of whitespace into single separators.
let ws: *Regex = Regex::compile("\\s+").val;
let words: *Vector<string> = ws.split("the quick fox");
println!("nwords", words.len()); // 3
// No match -> single-element vector (the whole string).
let z: *Regex = Regex::compile("z").val;
let one: *Vector<string> = z.split("abc");
println!("one", one.len(), one.get(0)); // 1 abc
return 0;
}
Free-function convenience API
One-shot helpers that compile the pattern on every call. Prefer keeping a *Regex around for hot loops.
| Function | Signature | Description |
|---|---|---|
| `regex_matches` | `fn regex_matches(pat: string, s: string) -> bool` | Regex::compile(pat).val.matches(s); false on bad pattern. |
| `regex_find` | `fn regex_find(pat: string, s: string) -> ?*Match` | Regex::compile(pat).val.find(s); none() on bad pattern. |
| `regex_replace_all` | `fn regex_replace_all(pat: string, s: string, repl: string) -> string` | Regex::compile(pat).val.replace_all(s, repl); returns s on bad pattern. |
import stdlib::regex::*;
fn main() -> i32 {
if regex_matches("\\d+", "abc 42") { println!("matched"); }
match regex_find("(\\w+)@(\\w+)", "x@y") {
some(m) => println!(m.group(1), m.group(2)), // x y
none() => {},
}
println!(regex_replace_all("\\d", "a1b2c3", "#")); // a#b#c#
// Bad pattern: silently returns false / none() / the input.
if !regex_matches("(unclosed", "x") { println!("bad pattern swallowed"); }
return 0;
}
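For repeated use, the compile-once alternative looks roughly like this. It is a sketch only, with a made-up three-line input; it compiles the pattern outside the loop and reuses the *Regex for every line.

```glide
import stdlib::regex::*;
fn main() -> i32 {
    // Compile once, outside the loop.
    let digits: *Regex = Regex::compile("\\d+").val;
    // Split a small sample input into lines on the newline byte.
    let lines: *Vector<string> = Regex::compile("\n").val.split("a1\nb\nc22");
    let mut hits: i32 = 0;
    for let i: i32 = 0; i < lines.len(); i++ {
        // Reuses the compiled pattern; regex_matches would recompile each time.
        if digits.matches(lines.get(i)) { hits = hits + 1; }
    }
    println!("hits", hits); // 2 ("a1" and "c22" contain digits)
    return 0;
}
```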
Supported syntax
Literals and escapes
Most characters match themselves. Backslash escapes:
| Escape | Matches |
|---|---|
| `\n` `\t` `\r` `\f` `\v` | newline, tab, return, form-feed, vertical-tab |
| `\a` `\e` `\0` | bell (0x07), escape (0x1B), NUL (0x00) |
| `\xHH` | the byte with hex value HH (e.g. `\x41` = A) |
| `\.` `\\` `\(` ... | a literal metacharacter |
Character classes
| Syntax | Matches |
|---|---|
| `[abc]` | any of a, b, c |
| `[a-z]` | a byte range |
| `[^a-z]` | negation (any byte not in the set) |
| `[a-zA-Z0-9_]` | union of ranges |
| `[\d\s]` | predefined classes are allowed inside [...] |
| `[]a]` | a ] placed first is a literal ] |
Predefined classes (usable bare or inside [...]):
| Class | Matches | Negated |
|---|---|---|
| `\d` | digits 0-9 | `\D` |
| `\w` | word bytes `[A-Za-z0-9_]` | `\W` |
| `\s` | whitespace `\t\n\v\f\r` and space | `\S` |
| `.` | any byte except `\n` (any byte with the s flag) | (none) |
import stdlib::regex::*;
fn main() -> i32 {
let hex: *Regex = Regex::compile("[0-9a-fA-F]+").val;
println!(hex.find("DEADbeef!").val.full()); // DEADbeef
let neg: *Regex = Regex::compile("[^aeiou ]+").val;
println!(neg.find("the fox").val.full()); // th
let byte: *Regex = Regex::compile("\\x41\\x42").val; // matches "AB"
if byte.matches("xABy") { println!("hex byte ok"); }
let mix: *Regex = Regex::compile("[\\d.]+").val; // digits or dot
println!(mix.find("v3.14!").val.full()); // 3.14
return 0;
}
Quantifiers
| Quantifier | Repetitions | Lazy form |
|---|---|---|
| `*` | 0 or more | `*?` |
| `+` | 1 or more | `+?` |
| `?` | 0 or 1 | `??` |
| `{n}` | exactly n | `{n}?` |
| `{n,}` | n or more | `{n,}?` |
| `{n,m}` | between n and m | `{n,m}?` |
Quantifiers are greedy by default; appending ? makes them lazy. The U flag (or inline (?U)) flips greediness globally.
import stdlib::regex::*;
fn main() -> i32 {
let greedy: *Regex = Regex::compile("<.+>").val;
println!(greedy.find("<a><b>").val.full()); // <a><b>
let lazy: *Regex = Regex::compile("<.+?>").val;
println!(lazy.find("<a><b>").val.full()); // <a>
let bounded: *Regex = Regex::compile("a{2,3}").val;
println!(bounded.find("aaaa").val.full()); // aaa
let exact: *Regex = Regex::compile("\\d{3}").val;
println!(exact.find("12345").val.full()); // 123
// The U flag flips default greediness: <.+> behaves like <.+?>.
let ung: *Regex = Regex::compile_with("<.+>", "U").val;
println!(ung.find("<a><b>").val.full()); // <a>
return 0;
}
Groups, alternation, anchors
| Syntax | Meaning |
|---|---|
| `(...)` | capturing group |
| `(?:...)` | non-capturing group |
| `(?<name>...)` / `(?P<name>...)` | named capturing group |
| `a\|b\|c` | alternation |
| `^` `$` | start / end of string (or line with the m flag) |
| `\A` `\z` `\Z` | absolute start / end of string (`\Z` is treated as `\z`) |
| `\b` `\B` | word boundary / non-boundary |
import stdlib::regex::*;
fn main() -> i32 {
let alt: *Regex = Regex::compile("cat|dog|bird").val;
println!(alt.find("I have a dog").val.full()); // dog
let anch: *Regex = Regex::compile("^\\d+$").val;
if anch.matches("12345") { println!("all digits"); }
if !anch.matches("12a45") { println!("not all"); }
// \b: whole word, not a substring.
let word: *Regex = Regex::compile("\\bcat\\b").val;
if word.matches("a cat sat") { println!("whole word"); }
if !word.matches("category") { println!("no substring"); }
// m flag: ^ matches after each newline.
let ml: *Regex = Regex::compile_with("^x", "m").val;
let hits: *Vector<*Match> = ml.find_all("ax\nx\nx");
println!("ml hits", hits.len()); // 2
return 0;
}
Backreferences
| Syntax | Meaning |
|---|---|
| `\1` .. `\9` | match the same text a numbered group captured |
| `\k<name>` | match the same text a named group captured |
Lookaround
| Syntax | Meaning |
|---|---|
| `(?=...)` | positive lookahead |
| `(?!...)` | negative lookahead |
| `(?<=...)` | positive lookbehind |
| `(?<!...)` | negative lookbehind |
import stdlib::regex::*;
fn main() -> i32 {
// Positive lookahead: foo only when followed by bar (bar not consumed).
let la: *Regex = Regex::compile("foo(?=bar)").val;
println!(la.find("foobar").val.full()); // foo
// Negative lookahead.
let nla: *Regex = Regex::compile("\\d+(?!px)").val;
if nla.matches("10em") { println!("nla ok"); }
// Lookbehind: digits preceded by '$'.
let lb: *Regex = Regex::compile("(?<=\\$)\\d+").val;
println!(lb.find("price $42 yen").val.full()); // 42
// Backreference: a doubled word.
let dup: *Regex = Regex::compile("\\b(\\w+)\\s+\\1\\b").val;
if dup.matches("the the cat") { println!("dup"); }
// Named backreference: matching open/close tags.
let tag: *Regex = Regex::compile("<(?<t>\\w+)>.*?</\\k<t>>").val;
if tag.matches("<b>hi</b>") { println!("tag ok"); }
return 0;
}
Flags
Pass as a flag string to compile_with, or inline in the pattern via (?flags) (sets flags from that point) or (?flags:subpattern) (scoped); (?flags-flags:...) turns flags off within the scope.
| Flag | Constant | Effect |
|---|---|---|
| `i` | `RX_FLAG_I` | case-insensitive (ASCII only) |
| `m` | `RX_FLAG_M` | ^/$ match at line breaks |
| `s` | `RX_FLAG_S` | . matches \n (dot-all) |
| `x` | `RX_FLAG_X` | extended: unescaped whitespace and # comments ignored |
| `U` | `RX_FLAG_U` | ungreedy: flip default greediness |
import stdlib::regex::*;
fn main() -> i32 {
// (?i:...) — case-insensitive only inside the group.
let scoped: *Regex = Regex::compile("(?i:hello) world").val;
if scoped.matches("HELLO world") { println!("scoped i"); }
if !scoped.matches("HELLO WORLD") { println!("outside still sensitive"); }
// (?x) — extended: whitespace and # comments ignored.
let ext: *Regex = Regex::compile("(?x) \\d+ # the number \n - \\d+").val;
if ext.matches("12-34") { println!("extended ok"); }
return 0;
}
import stdlib::regex::*;
fn main() -> i32 {
// Inline flags at top level + lookahead.
let re: *Regex = Regex::compile("(?i)foo(?=bar)").val;
if re.matches("FOObar") { println!("lookahead ok"); }
return 0;
}
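The flag-removal form is not covered by the examples above. The sketch below assumes the scoped (?flags-flags:...) syntax accepts this particular combination; the pattern is illustrative and its behaviour should be checked against the engine.

```glide
import stdlib::regex::*;
fn main() -> i32 {
    // Compiled case-insensitively, but (?s-i:b) switches i off for "b" only.
    // (s is switched on inside the scope; it has no effect on a literal.)
    let re: *Regex = Regex::compile_with("a(?s-i:b)c", "i").val;
    if re.matches("AbC") { println!("outer letters fold case"); }
    if !re.matches("ABC") { println!("b stays case-sensitive"); }
    return 0;
}
```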
Lower-level building blocks
These public items back the engine. Most programs never touch them directly, but they are exported and documented here for completeness.
CharClass
A set of inclusive byte ranges, optionally negated. Used internally for classes; you can build one by hand.
pub struct CharClass {
ranges: *Vector<i32>, // [lo1, hi1, lo2, hi2, ...] inclusive
negated: bool,
}
| Method | Signature | Description |
|---|---|---|
| `new` | `fn new() -> *CharClass` | Empty, non-negated class. |
| `add` | `fn add(self: *CharClass, lo: i32, hi: i32)` | Add an inclusive byte range. |
| `contains` | `fn contains(self: *CharClass, b: i32) -> bool` | Test byte b (respects negated). |
import stdlib::regex::*;
fn main() -> i32 {
let cc: *CharClass = CharClass::new();
cc.add(48, 57); // '0'..'9'
cc.add(65, 70); // 'A'..'F'
if cc.contains(53) { println!("'5' in class"); } // true
if !cc.contains(103) { println!("'g' not in class"); } // true
return 0;
}
Exported constants
These name the internal opcodes, AST node kinds, flag bits, assertion kinds, and limits. They are mainly of interest when inspecting re.flags or extending the engine.
| Group | Constants |
|---|---|
| Flag bits | RX_FLAG_I (1), RX_FLAG_M (2), RX_FLAG_S (4), RX_FLAG_X (8), RX_FLAG_U (16) |
| Assertion kinds | RX_AHEAD_POS (0), RX_AHEAD_NEG (1), RX_BEHIND_POS (2), RX_BEHIND_NEG (3) |
| Limits | RX_MAX_GROUPS (64) |
| VM opcodes | RX_OP_CHAR, RX_OP_ANY, RX_OP_ANY_NL, RX_OP_CLASS, RX_OP_BOL, RX_OP_EOL, RX_OP_STR_BEG, RX_OP_STR_END, RX_OP_WORDB, RX_OP_NWORDB, RX_OP_JMP, RX_OP_SPLIT, RX_OP_SAVE, RX_OP_BACKREF, RX_OP_ASSERT, RX_OP_ASRT_END, RX_OP_MATCH |
| AST node kinds | RX_AST_LIT, RX_AST_ANY, RX_AST_CLASS, RX_AST_CONCAT, RX_AST_ALT, RX_AST_QUANT, RX_AST_GROUP, RX_AST_ANCHOR, RX_AST_BACK, RX_AST_LOOK |
The flag bits are OR-able, so you can test re.flags:
import stdlib::regex::*;
fn main() -> i32 {
let re: *Regex = Regex::compile_with("abc", "is").val;
if (re.flags & RX_FLAG_I) != 0 { println!("case-insensitive"); }
if (re.flags & RX_FLAG_S) != 0 { println!("dot-all"); }
return 0;
}
Internal types
RxNode (AST node), RxParser (pattern parser), and RxCompiler (AST-to-bytecode compiler) are exported structs used internally by compile. They have no stable public method surface and should be treated as implementation detail — do not depend on their fields.
Unsupported / known limitations
Documented honestly so you do not reach for features that silently behave differently:
| Feature | Status |
|---|---|
| Unicode property classes (`\p{...}`), Unicode-aware `\w`/`.`/case-fold | Not supported; the engine is byte-oriented and i folds ASCII only. |
| POSIX bracket classes (`[[:alpha:]]`) | Not supported; use explicit ranges or `\d`/`\w`/`\s`. |
| Atomic groups `(?>...)`, possessive quantifiers (`a++`, `a*+`) | Not supported. |
| Conditionals `(?(1)...)`, recursion `(?R)`, subroutine calls `(?1)` | Not supported. |
| Comments via `(?#...)` | Not supported (use the x flag with #). |
| `\Z` semantics (before a trailing newline) | Treated identically to `\z` (absolute end). |
| Octal escapes beyond `\0`, `\cX` control escapes, `\Q...\E` | Not supported. |
| `${name}` replacement token as a bare Glide string literal | Collides with Glide string interpolation; see the callout under Replacing. |
See also
- strings: `split`, `contains`, `substring`, `to_lower` for non-regex text work (cheaper when you don't need a pattern).
- vectors: the `*Vector<*Match>` / `*Vector<string>` returned by `find_all` / `split`.
- collections: the `HashMap` that backs the named-group name→index table.
- prelude: the `?*Match` / `!*Regex` (some/none/ok/err) conventions used throughout.