Chapter 11

Strings

string is a built-in Glide primitive. A string is an immutable sequence of bytes, encoded as UTF-8 and backed by a const char*. Every method listed here is declared in impl string { ... } and operates on raw bytes: index offsets, len, and cmp are byte-based, so ASCII behaves intuitively, while multi-byte UTF-8 sequences are ordered and indexed by their raw bytes.

Import

No import is needed: string is a built-in primitive type, and its methods are always in scope.

fn main() -> i32 {
    let s: string = "hello";
    println!(s.len());   // 5
    return 0;
}

Method catalog

The full public surface of impl string, including the runtime helpers (len/at/eq/concat/substring) on which the other methods are built:

Method	Signature	Returns
`is_empty`	`pub fn is_empty(self: string) -> bool`	`bool`
`len`	runtime: `self.len()`	`i32` (bytes)
`eq`	runtime: `self.eq(other)`	`bool`
`at`	runtime: `self.at(i)`	`char`
`cmp`	`pub fn cmp(self: string, other: string) -> i32`	`i32` (sign)
`contains`	`pub fn contains(self: string, sub: string) -> bool`	`bool`
`index_of`	`pub fn index_of(self: string, sub: string) -> i32`	`i32` (`-1` if absent)
`starts_with`	`pub fn starts_with(self: string, pre: string) -> bool`	`bool`
`ends_with`	`pub fn ends_with(self: string, suf: string) -> bool`	`bool`
`substring`	runtime: `self.substring(a, b)`	`string`
`split`	`pub fn split(self: string, sep: string) -> *Vector<string>`	`*Vector<string>`
`replace`	`pub fn replace(self: string, find: string, repl: string) -> string`	`string`
`trim`	`pub fn trim(self: string) -> string`	`string`
`trim_left`	`pub fn trim_left(self: string) -> string`	`string`
`trim_right`	`pub fn trim_right(self: string) -> string`	`string`
`to_upper`	`pub fn to_upper(self: string) -> string`	`string`
`to_lower`	`pub fn to_lower(self: string) -> string`	`string`
`repeat`	`pub fn repeat(self: string, n: i32) -> string`	`string`
`concat`	runtime: `self.concat(other)`	`string`
`parse_int`	`pub fn parse_int(self: string) -> i32`	`i32` (`0` on failure)
`try_parse_int`	`pub fn try_parse_int(self: string) -> !i32`	`!i32`
`try_parse_float`	`pub fn try_parse_float(self: string) -> !f64`	`!f64`
`try_parse_bool`	`pub fn try_parse_bool(self: string) -> !bool`	`!bool`

Inspection and comparison

Method	Signature	Description
`is_empty`	`pub fn is_empty(self: string) -> bool`	`true` when the string has zero bytes.
`len`	runtime helper: `self.len()`	Length of the string in bytes (not code points).
`eq`	runtime helper: `self.eq(other)`	Byte-by-byte equality; returns `bool`.
`at`	runtime helper: `self.at(i)`	Byte at offset `i` as a `char`; use `.to_int()` to get its numeric value.
`cmp`	`pub fn cmp(self: string, other: string) -> i32`	Byte-by-byte lexicographic comparison: negative if `self` sorts first, `0` if equal, positive if it sorts later.

is_empty is simply self.len() == 0:

"".is_empty();        // true
" ".is_empty();       // false  (a space is one byte)
"abc".is_empty();     // false

cmp compares byte by byte. For ASCII, this matches code point order; when one string is a prefix of the other, the longer string sorts after the shorter one. The implementation returns exactly -1, 0, or 1:

"apple".cmp("banana");   // -1
"banana".cmp("apple");   //  1
"apple".cmp("apple");    //  0
"apple".cmp("app");      //  1   (the longer string sorts later on a shared prefix)

Branching on the sign of cmp to determine an ordering:

fn order(a: string, b: string) -> string {
    let c: i32 = a.cmp(b);
    if c < 0 { return a.concat(" < ").concat(b); }
    if c > 0 { return a.concat(" > ").concat(b); }
    return a.concat(" == ").concat(b);
}

fn main() -> i32 {
    println!(order("apple", "banana"));   // apple < banana
    println!(order("pear", "pear"));      // pear == pear
    return 0;
}

A complete inspection program. Note that at(i) returns a char, so call .to_int() to get its byte value:

fn main() -> i32 {
    let s: string = "hello";
    println!(s.len());            // 5
    println!(s.is_empty());       // false
    println!("".is_empty());      // true
    println!(s.eq("hello"));      // true
    println!("apple".cmp("banana"));  // -1
    let c: char = s.at(0);
    println!(c.to_int());         // 104 ('h')
    return 0;
}

Search

Method	Signature	Description
`contains`	`pub fn contains(self: string, sub: string) -> bool`	`true` when `sub` appears anywhere in `self`.
`index_of`	`pub fn index_of(self: string, sub: string) -> i32`	Byte offset of the first occurrence of `sub`, or `-1` if absent. An empty needle returns `0`.
`starts_with`	`pub fn starts_with(self: string, pre: string) -> bool`	`true` when `self` begins with `pre`, compared byte by byte.
`ends_with`	`pub fn ends_with(self: string, suf: string) -> bool`	`true` when `self` ends with `suf`, compared byte by byte.

All four methods are case-sensitive. contains is defined as self.index_of(sub) >= 0. index_of performs a naive left-to-right scan and returns the offset of the first occurrence.

fn main() -> i32 {
    let url: string = "https://example.com/report.pdf";
    println!(url.contains("example"));        // true
    println!(url.index_of("example"));        // 8
    println!(url.index_of("missing"));        // -1
    println!(url.starts_with("https://"));    // true
    println!(url.ends_with(".pdf"));          // true
    println!(url.ends_with(".PDF"));          // false (case-sensitive)
    return 0;
}

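For a case-insensitive search, lowercase both sides before calling contains. Because to_lower only converts ASCII letters, this folds case for ASCII text only:

fn icontains(hay: string, needle: string) -> bool {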
    return hay.to_lower().contains(needle.to_lower());
}

fn main() -> i32 {
    println!(icontains("Report.PDF", "pdf"));   // true
    println!(icontains("Report.PDF", "xml"));   // false
    return 0;
}
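
index_of reports only the first match. As a minimal sketch, the hypothetical helper count_matches below counts every byte offset at which needle matches, using only len, eq, and substring (a runtime helper covered under slicing). It is an illustration rather than part of impl string; it counts overlapping matches and treats an empty needle as zero matches by assumption:

```glide
// Hypothetical helper, not part of impl string.
fn count_matches(hay: string, needle: string) -> i32 {
    let n: i32 = needle.len();
    if n == 0 { return 0; }   // assumption: an empty needle counts as no match
    let mut count: i32 = 0;
    // Try every start offset that leaves room for the whole needle.
    for let i: i32 = 0; i + n <= hay.len(); i++ {
        if hay.substring(i, i + n).eq(needle) { count = count + 1; }
    }
    return count;
}

fn main() -> i32 {
    println!(count_matches("a,b,c", ","));   // 2
    println!(count_matches("aaaa", "aa"));   // 3 (overlapping matches are counted)
    return 0;
}
```

Unlike replace, which skips past each match, this sketch advances one byte at a time, which is why "aaaa" yields 3 matches here.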

Slicing and splitting

Method	Signature	Description
`substring`	runtime helper: `self.substring(a, b)`	Bytes in the half-open range `[a, b)`.
`split`	`pub fn split(self: string, sep: string) -> *Vector<string>`	Splits at every occurrence of `sep`. An empty `sep` yields one fragment per byte.

substring(a, b) returns the bytes from offset a up to, but not including, b.

split returns a *Vector<string> (see the Vector reference for .len(), .get(i), iteration, and so on). Two adjacent separators produce an empty fragment between them; an empty separator splits into one fragment per byte, which is useful for walking over the characters of ASCII text.

fn main() -> i32 {
    let s: string = "a,b,c";
    println!(s.substring(0, 1));   // "a"
    println!(s.substring(2, 3));   // "b"

    let parts: *Vector<string> = s.split(",");
    println!(parts.len());         // 3
    println!(parts.get(1));        // "b"

    let chars: *Vector<string> = "abc".split("");
    println!(chars.len());         // 3   ["a", "b", "c"]
    return 0;
}

Adjacent separators yield an empty fragment:

"a,,b".split(",").len();   // 3   (the middle fragment is "")

`split` → loop

The most common pattern is to split and then iterate over the resulting vector, trimming or parsing each field:

fn main() -> i32 {
    let csv: string = "  alice , bob ,carol  ";
    let parts: *Vector<string> = csv.split(",");
    for let i: i32 = 0; i < parts.len(); i++ {
        let field: string = parts.get(i).trim();
        println!(i, field);
    }
    return 0;
}

output

0 alice
1 bob
2 carol

Walking bytes with `at`

For a single pass over the bytes, split("") is unnecessary: index with at(i) and inspect the numeric value. The example below counts ASCII digits:

fn main() -> i32 {
    let s: string = "ab12c3";
    let mut digits: i32 = 0;
    for let i: i32 = 0; i < s.len(); i++ {
        let c: i32 = s.at(i).to_int();
        if c >= 48 && c <= 57 { digits = digits + 1; }   // 48..57 are ASCII '0'..'9'
    }
    println!(digits);   // 3
    return 0;
}

Transformation

Every method in this section returns a completely new string.

Method	Signature	Description
`replace`	`pub fn replace(self: string, find: string, repl: string) -> string`	Replaces every non-overlapping occurrence of `find` with `repl`. An empty `find` returns `self` unchanged.
`trim`	`pub fn trim(self: string) -> string`	Strips whitespace (space, tab, CR, LF, FF, VT) from both ends.
`trim_left`	`pub fn trim_left(self: string) -> string`	Strips leading whitespace only.
`trim_right`	`pub fn trim_right(self: string) -> string`	Strips trailing whitespace only.
`to_upper`	`pub fn to_upper(self: string) -> string`	Converts ASCII letters to uppercase; non-ASCII bytes pass through unchanged.
`to_lower`	`pub fn to_lower(self: string) -> string`	Converts ASCII letters to lowercase; non-ASCII bytes pass through unchanged.
`repeat`	`pub fn repeat(self: string, n: i32) -> string`	Returns `n` copies of `self` concatenated. Returns `""` when `n <= 0`.
`concat`	runtime helper: `self.concat(other)`	Joins two strings into a new string.

replace matches left to right and never overlaps matches, so "aaaa".replace("aa", "b") yields "bb". The whitespace set recognized by trim* is: space (0x20), tab (0x09), LF (0x0A), CR (0x0D), FF (0x0C), and VT (0x0B).

fn main() -> i32 {
    println!("hello world".replace("world", "there"));  // "hello there"
    println!("aaaa".replace("aa", "b"));                // "bb"

    println!("  hello  ".trim());        // "hello"
    println!("   hello".trim_left());    // "hello"
    println!("hello   ".trim_right());   // "hello"

    println!("Hello, World!".to_upper());  // "HELLO, WORLD!"
    println!("Hello, World!".to_lower());  // "hello, world!"

    println!("ab".repeat(3));   // "ababab"
    println!("-".repeat(10));   // "----------"

    println!("foo".concat("bar"));  // "foobar"
    return 0;
}

Building strings

concat is the primitive for assembling text. To join elements with a separator, skip the separator before the first element:

fn join(parts: *Vector<string>, sep: string) -> string {
    let mut out: string = "";
    for let i: i32 = 0; i < parts.len(); i++ {
        if i > 0 { out = out.concat(sep); }
        out = out.concat(parts.get(i));
    }
    return out;
}

fn main() -> i32 {
    let words: *Vector<string> = "a,b,c".split(",");
    println!(join(words, "-"));   // "a-b-c"
    return 0;
}

Parsing

Method	Signature	Description
`parse_int`	`pub fn parse_int(self: string) -> i32`	Parses a decimal integer with an optional `+`/`-` sign. Returns `0` on any failure.
`try_parse_int`	`pub fn try_parse_int(self: string) -> !i32`	Like `parse_int`, but reports failure as `err(...)`.
`try_parse_float`	`pub fn try_parse_float(self: string) -> !f64`	Parses a decimal float (sign, integer/fractional parts, optional `e` exponent).
`try_parse_bool`	`pub fn try_parse_bool(self: string) -> !bool`	Parses exactly `"true"` or `"false"` (case-sensitive).

`parse_int` vs `try_parse_int`

parse_int returns 0 for empty input, a lone sign, or any non-digit byte, which is ambiguous with a legitimate "0". Use try_parse_int when you need to distinguish "invalid input" from "the value really was 0".

"42".parse_int();      // 42
"-7".parse_int();      // -7
"abc".parse_int();     // 0   (same result as "0"; be careful)

try_parse_int exposes the reason for the failure ("empty string", "no digits" for a lone sign, or "non-digit byte") through the !i32 result. Read it via .ok / .val / .err, propagate it with postfix ?, or supply a default value with ??:

fn main() -> i32 {
    let r: !i32 = "42".try_parse_int();
    if r.ok { println!(r.val); }   // 42

    let bad: !i32 = "abc".try_parse_int();
    if !bad.ok { println!(bad.err); }   // "non-digit byte"
    return 0;
}

You can also match on the result to handle both cases at once:

fn classify(s: string) -> string {
    let r: !i32 = s.try_parse_int();
    match r {
        ok(v) => format!("ok: {}", v),
        err(e) => format!("bad: {}", e),
    }
}

fn main() -> i32 {
    println!(classify("42"));    // ok: 42
    println!(classify(""));      // bad: empty string
    println!(classify("-"));     // bad: no digits
    println!(classify("4x"));    // bad: non-digit byte
    return 0;
}

In a function (or main) that returns !T, postfix ? propagates the error and ?? supplies a default value:

fn main() -> !i32 {
    let n: i32 = "100".try_parse_int()?;
    let m: i32 = "23".try_parse_int()?;
    println!(n + m);   // 123

    let port: i32 = "8080".try_parse_int() ?? 80;
    println!(port);    // 8080
    return ok(0);
}
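
The pieces above combine naturally with split and trim. As a minimal sketch, the hypothetical helper sum_fields below adds up the integer fields of a comma-separated line and silently skips any field that fails to parse. It uses only calls shown in this chapter and reads the result through .ok / .val so that it works in a function returning plain i32:

```glide
// Hypothetical helper, not part of impl string.
fn sum_fields(line: string) -> i32 {
    let parts: *Vector<string> = line.split(",");
    let mut total: i32 = 0;
    for let i: i32 = 0; i < parts.len(); i++ {
        // Trim first so surrounding spaces do not count as non-digit bytes.
        let r: !i32 = parts.get(i).trim().try_parse_int();
        if r.ok { total = total + r.val; }   // invalid fields are skipped
    }
    return total;
}

fn main() -> i32 {
    println!(sum_fields("10, 20, x, 30"));   // 60
    return 0;
}
```

Skipping bad fields is a design choice made for this illustration; a stricter variant could return !i32 and propagate the first failure with postfix ? instead.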

`try_parse_float`

Accepts an optional sign, an integer part, an optional fractional part (.NNN), and an optional exponent (e[+-]NNN / E[+-]NNN). Either the integer part or the fractional part may be empty, but at least one digit must be present overall. Failure modes: "empty string", "no digits", "malformed exponent", and "non-digit byte".

"3.14".try_parse_float();    // ok(3.14)
".5".try_parse_float();      // ok(0.5)
"1e3".try_parse_float();     // ok(1000.0)
"1e".try_parse_float();      // err("malformed exponent")
"abc".try_parse_float();     // err("non-digit byte")

`try_parse_bool`

Accepts strictly "true" or "false". Anything else, including "True", "1", or "yes", is rejected with err("not a bool").

"true".try_parse_bool();    // ok(true)
"false".try_parse_bool();   // ok(false)
"yes".try_parse_bool();     // err("not a bool")

A program that exercises try_parse_float and try_parse_bool with .ok/.val/.err:

fn main() -> i32 {
    let f: !f64 = "3.14".try_parse_float();
    if f.ok { println!(f.val); }     // 3.14

    let g: !f64 = "1e3".try_parse_float();
    if g.ok { println!(g.val); }     // 1000

    let bad: !f64 = "1e".try_parse_float();
    if !bad.ok { println!(bad.err); } // malformed exponent

    let b: !bool = "true".try_parse_bool();
    if b.ok { println!(b.val); }     // true

    let nb: !bool = "yes".try_parse_bool();
    if !nb.ok { println!(nb.err); }  // not a bool
    return 0;
}

Byte semantics (UTF-8)

len, at, substring, index_of, and cmp operate on raw bytes, not Unicode code points. ASCII text behaves intuitively, but multi-byte UTF-8 sequences count as more than one byte, and slicing or indexing in the middle of a sequence splits it.

fn main() -> i32 {
    let s: string = "café";          // 'é' is 2 bytes in UTF-8
    println!(s.len());                // 5 bytes, not 4 code points
    println!(s.substring(0, 3));      // "caf"
    println!("naive".to_upper());     // "NAIVE" (ASCII letters only)
    return 0;
}

Edge cases at a glance

fn main() -> i32 {
    println!("a,,b".split(",").len());   // 3 (the middle fragment is "")
    println!("x".repeat(0));             // "" (n <= 0)
    println!("abc".replace("z", "?"));   // "abc" (no match)
    println!("abc".replace("", "?"));    // "abc" (an empty find is a no-op)
    println!("abc".index_of(""));        // 0 (empty needle)
    println!("aaaa".replace("aa", "b")); // "bb" (non-overlapping)
    return 0;
}

See also

Vector: the *Vector<string> returned by split.
Formatting: format!/println! for assembling output from mixed values.
