Chapter 13 13 min read

HashMap & Set

Glide's standard library ships two associative containers, both keyed by string: a generic HashMap<V> (open addressing, linear probing, automatic 75%-load resize) and a StringSet built on top of it. Both are heap- or arena-allocated and own their slot buffers; pair every construction with defer x.free().

Import

import stdlib::hashmap::*;   // HashMap<V>, Pair<K, V>
import stdlib::set::*;       // StringSet (also re-exports hashmap)

HashMap<V>

A hash map from string keys to V values. Keys are stored by value, so the caller's strings can be freed independently. All operations except resize are amortised O(1); resize is O(n).

pub struct HashMap<V> {
    keys: *string,      // slot keys (valid where occupied[i])
    values: *V,         // slot values (valid where occupied[i])
    occupied: *bool,    // per-slot occupancy flag for probing
    len: i32,           // number of live entries
    cap: i32,           // allocation size in slots (power of two once non-zero)
    is_arena: bool,     // true when bump-allocated by the active arena
}

The struct fields are internal; interact with the map through its methods.

Construction & lifetime

Method	Signature	Description
`new`	`fn new() -> *HashMap<V>`	Empty map with no allocation. The first `insert` allocates the table.
`free`	`fn free(self: *HashMap<V>)`	Releases the three slot buffers and the struct. The pointer is dangling afterward. No-op for arena-backed maps.

import stdlib::hashmap::*;

fn main() -> i32 {
    let m: *HashMap<i32> = HashMap::new();
    defer m.free();

    m.insert("one", 1);
    println!("one ->", m.get("one"));   // 1
    return 0;
}

Inserting & reading

Method	Signature	Description
`insert`	`fn insert(self: *HashMap<V>, k: string, v: V)`	Bind `k` to `v`, replacing any existing value. Triggers a resize at 75% load.
`get`	`fn get(self: *HashMap<V>, k: string) -> V`	Read the value bound to `k`. Undefined behaviour if `k` is absent — guard with `contains` first.
`contains`	`fn contains(self: *HashMap<V>, k: string) -> bool`	`true` when `k` is bound to a value.
`remove`	`fn remove(self: *HashMap<V>, k: string) -> bool`	Remove the entry for `k`; returns `true` if anything was removed.

m.insert("count", 1);
m.insert("count", 2);          // overwrite — keys are unique
println!(m.get("count"));      // 2

if m.contains("count") {
    let n: i32 = m.get("count");
}

let was_present: bool = m.remove("count");   // true
let again: bool      = m.remove("count");    // false

Since HashMap<V> has no get-with-default, wrap the contains/get guard in a helper for the common "read or fall back" case:

import stdlib::hashmap::*;

fn get_or(m: *HashMap<i32>, k: string, fallback: i32) -> i32 {
    if m.contains(k) { return m.get(k); }
    return fallback;
}

fn main() -> i32 {
    let m: *HashMap<i32> = HashMap::new();
    defer m.free();
    m.insert("hits", 5);

    println!("hits:", get_or(m, "hits", 0));     // 5
    println!("misses:", get_or(m, "misses", 0)); // 0
    return 0;
}

A common pattern — accumulating counts — combines contains, get, and insert:

import stdlib::hashmap::*;

fn main() -> i32 {
    let counts: *HashMap<i32> = HashMap::new();
    defer counts.free();

    let text: string = "the cat the dog the cat";
    let words: *Vector<string> = text.split(" ");
    for let i: i32 = 0; i < words.len(); i++ {
        let w: string = words.get(i);
        if counts.contains(w) {
            counts.insert(w, counts.get(w) + 1);
        } else {
            counts.insert(w, 1);
        }
    }
    println!("the:", counts.get("the"));   // 3
    println!("cat:", counts.get("cat"));   // 2
    return 0;
}

Size & state

Method	Signature	Description
`len`	`fn len(self: *HashMap<V>) -> i32`	Number of live entries.
`is_empty`	`fn is_empty(self: *HashMap<V>) -> bool`	`true` when the map holds zero entries.
`cap`	`fn cap(self: *HashMap<V>) -> i32`	Allocated slot count (power of two once non-zero). `0` before the first insert, `8` after. For diagnostics — most callers want `len()`.
`clear`	`fn clear(self: *HashMap<V>)`	Drop every entry. Capacity is preserved, so subsequent inserts don't re-allocate.

let m: *HashMap<i32> = HashMap::new();   defer m.free();
m.cap();                 // 0  (no allocation yet)
m.insert("a", 1);
m.cap();                 // 8  (initial table size)
m.insert("b", 2);
m.len();                 // 2
m.is_empty();            // false
m.clear();
m.len();                 // 0
m.is_empty();            // true

The table doubles whenever an insert would push the load past 75% (len * 4 >= cap * 3). The first insert allocates 8 slots; the 7th distinct key trips the threshold (7*4 >= 8*3) and grows the table to 16:

import stdlib::hashmap::*;

fn main() -> i32 {
    let m: *HashMap<i32> = HashMap::new();
    defer m.free();

    println!("cap before insert:", m.cap());   // 0
    m.insert("a", 1);
    println!("cap after 1:", m.cap());          // 8
    m.insert("b", 2);
    m.insert("c", 3);
    m.insert("d", 4);
    m.insert("e", 5);
    m.insert("f", 6);
    m.insert("g", 7);   // 7th of 8 slots trips the resize -> 16
    println!("cap after 7:", m.cap());          // 16
    println!("len:", m.len());                  // 7
    return 0;
}

Iteration snapshots

These methods each return a freshly allocated Vector. Order is slot order, which is unspecified relative to insertion order. keys(), values(), and entries() share the same slot order, so index-paired iteration is meaningful within a single moment in time.

Method	Signature	Description
`keys`	`fn keys(self: HashMap<V>) -> Vector<string>`	Snapshot every live key.
`values`	`fn values(self: HashMap<V>) -> Vector<V>`	Snapshot every live value (same order as `keys`).
`entries`	`fn entries(self: HashMap<V>) -> Vector<*Pair<string, V>>`	Snapshot every `(key, value)` as a `Pair`.

import stdlib::hashmap::*;

fn main() -> i32 {
    let m: *HashMap<i32> = HashMap::new();
    defer m.free();
    m.insert("a", 1);
    m.insert("b", 2);

    let ks: *Vector<string> = m.keys();
    for let i: i32 = 0; i < ks.len(); i++ { println!(ks.get(i)); }

    let vs: *Vector<i32> = m.values();
    println!("sum:", vs.sum());          // 3

    let es: *Vector<*Pair<string, i32>> = m.entries();
    for let i: i32 = 0; i < es.len(); i++ {
        let p: *Pair<string, i32> = es.get(i);
        println!(p.first, "->", p.second);
    }
    return 0;
}

keys() and values() are emitted in the same slot order, so a single index loop walks both as matched pairs (as long as you don't mutate the map between snapshots):

import stdlib::hashmap::*;

fn main() -> i32 {
    let m: *HashMap<i32> = HashMap::new();
    defer m.free();
    m.insert("a", 1);
    m.insert("b", 2);
    m.insert("c", 3);

    let ks: *Vector<string> = m.keys();
    let vs: *Vector<i32> = m.values();
    for let i: i32 = 0; i < ks.len(); i++ {
        println!(ks.get(i), "=", vs.get(i));
    }
    return 0;
}

entries() rolls both into one Pair per slot — handy when you want to pass (key, value) around as a unit or reduce over them:

import stdlib::hashmap::*;

fn main() -> i32 {
    let m: *HashMap<i32> = HashMap::new();
    defer m.free();
    m.insert("a", 1);
    m.insert("b", 2);
    m.insert("c", 3);

    let es: *Vector<*Pair<string, i32>> = m.entries();
    let mut total: i32 = 0;
    for let i: i32 = 0; i < es.len(); i++ {
        let p: *Pair<string, i32> = es.get(i);
        total = total + p.second;
    }
    println!("sum of values:", total);   // 6
    return 0;
}

Transforming values

Method	Signature	Description
`map_values`	`fn map_values<U>(self: HashMap<V>, f: fn(V) -> U) -> HashMap<U>`	Apply `f` to every value, producing a new map with the same keys and transformed values. The original is unchanged.

import stdlib::hashmap::*;

fn square(x: i32) -> i32 { return x * x; }

fn main() -> i32 {
    let m: *HashMap<i32> = HashMap::new();
    defer m.free();
    m.insert("a", 2);
    m.insert("b", 3);

    let m2: *HashMap<i32> = m.map_values(square);
    defer m2.free();
    println!("b squared:", m2.get("b"));   // 9
    return 0;
}

map_values is generic over the output type U, so the result map can hold a different value type than the source. Here fn(i32) -> string turns a count map into a label map while leaving the original untouched:

import stdlib::hashmap::*;

fn label(x: i32) -> string { return format!("v={}", x); }

fn main() -> i32 {
    let m: *HashMap<i32> = HashMap::new();
    defer m.free();
    m.insert("a", 10);
    m.insert("b", 20);

    let labels: *HashMap<string> = m.map_values(label);
    defer labels.free();

    println!(labels.get("a"));   // v=10
    println!(labels.get("b"));   // v=20
    println!(m.get("a"));        // 10  (source unchanged)
    return 0;
}

Worked example: grouping

A HashMap whose value type is itself a *Vector<V> is the canonical "group-by" structure. Initialise an empty bucket the first time a key appears, then push into it:

import stdlib::hashmap::*;

fn main() -> i32 {
    let groups: *HashMap<*Vector<string>> = HashMap::new();
    defer groups.free();

    let words: *Vector<string> = "apple ant bee bird cat".split(" ");
    for let i: i32 = 0; i < words.len(); i++ {
        let w: string = words.get(i);
        let key: string = w.substring(0, 1);   // group by first letter
        if !groups.contains(key) {
            groups.insert(key, Vector::new());
        }
        let bucket: *Vector<string> = groups.get(key);
        bucket.push(w);
    }

    let ks: *Vector<string> = groups.keys();
    for let i: i32 = 0; i < ks.len(); i++ {
        let k: string = ks.get(i);
        let b: *Vector<string> = groups.get(k);
        println!(k, "->", b.len());   // a -> 2, b -> 2, c -> 1
    }
    return 0;
}

StringSet

A set of strings, backed by HashMap<bool>. Insertion and removal report whether the set actually changed, which makes deduplication a one-liner.

pub struct StringSet {
    m: *HashMap<bool>,   // internal backing map
}

Method	Signature	Description
`new`	`fn new() -> *StringSet`	Create an empty string set.
`insert`	`fn insert(self: *StringSet, k: string) -> bool`	Insert `k`; returns `true` when it was not already present.
`contains`	`fn contains(self: *StringSet, k: string) -> bool`	`true` when `k` is present.
`remove`	`fn remove(self: *StringSet, k: string) -> bool`	Remove `k`; returns `true` when it was present.
`len`	`fn len(self: *StringSet) -> i32`	Number of keys currently held.
`clear`	`fn clear(self: *StringSet)`	Remove every key while preserving the internal map allocation.
`free`	`fn free(self: *StringSet)`	Release the internal map and the set struct. Pair with `defer`.

import stdlib::set::*;

fn main() -> i32 {
    let s: *StringSet = StringSet::new();
    defer s.free();

    let added: bool = s.insert("apple");   // true
    let again: bool = s.insert("apple");   // false (already there)
    println!(added, again);

    s.insert("banana");
    println!("has apple:", s.contains("apple"));   // true
    println!("has grape:", s.contains("grape"));   // false
    println!("len:", s.len());                     // 2

    let removed: bool = s.remove("banana");        // true
    let removed2: bool = s.remove("banana");       // false
    println!(removed, removed2);

    s.clear();
    println!("len:", s.len());                     // 0
    return 0;
}

The boolean return from insert makes set-based deduplication concise — increment a counter only when the value is genuinely new:

import stdlib::set::*;

fn main() -> i32 {
    let seen: *StringSet = StringSet::new();
    defer seen.free();

    let words: *Vector<string> = "the cat the dog the cat".split(" ");
    let unique: i32 = 0;
    for let i: i32 = 0; i < words.len(); i++ {
        if seen.insert(words.get(i)) { unique = unique + 1; }
    }
    println!("unique words:", unique);   // 3
    return 0;
}

Set membership shines for intersection / difference. Because StringSet has no iterator, drive the comparison from a known list of candidates and test each with contains:

import stdlib::set::*;

fn main() -> i32 {
    let a: *StringSet = StringSet::new();
    defer a.free();
    let b: *StringSet = StringSet::new();
    defer b.free();

    a.insert("x"); a.insert("y"); a.insert("z");
    b.insert("y"); b.insert("z"); b.insert("w");

    let common: *StringSet = StringSet::new();
    defer common.free();

    let items: *Vector<string> = "x y z".split(" ");
    for let i: i32 = 0; i < items.len(); i++ {
        let it: string = items.get(i);
        if a.contains(it) && b.contains(it) {
            common.insert(it);
        }
    }
    println!("intersection size:", common.len());   // 2 (y, z)
    return 0;
}

Pair<K, V>

Returned by HashMap::entries. A minimal two-field tuple (Glide has no anonymous tuples yet).

pub struct Pair<K, V> {
    pub first: K,
    pub second: V,
}

pub fn new(first: K, second: V) -> *Pair<K, V>   // Pair::new(...)

Both fields are public; read them directly:

let p: *Pair<i32, string> = Pair::new(42, "answer");
p.first;     // 42
p.second;    // "answer"

Import

HashMap&lt;V&gt;

Construction & lifetime

Inserting & reading

Size & state

Iteration snapshots

Transforming values

Worked example: grouping

StringSet

Pair&lt;K, V&gt;

Lifetime & allocation notes

HashMap<V>

Pair<K, V>