⌘+k ctrl+k
Search Shortcut cmd + k | ctrl + k
Text Functions

This section describes functions and operators for examining and manipulating string values. The symbol denotes a space character.

Function Description Example Result Alias
string ^@ search_string Alias for starts_with. 'abc' ^@ 'a' true  
string || string String concatenation 'Duck' || 'DB' DuckDB  
string[index] Alias for array_extract. 'DuckDB'[4] 'k'  
string[begin:end] Alias for array_slice. Missing begin or end arguments are interpreted as the beginning or end of the list respectively. 'DuckDB'[:4] 'Duck'  
array_extract(list, index) Extract a single character using a (1-based) index. array_extract('DuckDB', 2) 'u' list_element, list_extract
array_slice(list, begin, end) Extract a string using slice conventions. Negative values are accepted. array_slice('DuckDB', 5, NULL) 'DB'  
ascii(string) Returns an integer that represents the Unicode code point of the first character of the string ascii('Ω') 937  
bar(x, min, max[, width]) Draw a band whose width is proportional to (x - min) and equal to width characters when x = max. width defaults to 80. bar(5, 0, 20, 10) ██▌  
bit_length(string) Number of bits in a string. bit_length('abc') 24  
chr(x) returns a character which is corresponding the ASCII code value or Unicode code point chr(65) A  
concat(string, ...) Concatenate many strings together concat('Hello', ' ', 'World') Hello World  
concat_ws(separator, string, ...) Concatenate strings together separated by the specified separator concat_ws(', ', 'Banana', 'Apple', 'Melon') Banana, Apple, Melon  
contains(string, search_string) Return true if search_string is found within string contains('abc', 'a') true  
ends_with(string, search_string) Return true if string ends with search_string ends_with('abc', 'c') true suffix
format(format, parameters...) Formats a string using the fmt syntax format('Benchmark "{}" took {} seconds', 'CSV', 42) Benchmark "CSV" took 42 seconds  
from_base64(string) Convert a base64 encoded string to a character string. from_base64('QQ==') 'A'  
hash(value) Returns an integer with the hash of the value hash('🦆') 2595805878642663834  
instr(string, search_string) Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found. instr('test test', 'es') 2  
left(string, count) Extract the left-most count characters left('Hello🦆', 2) He  
left_grapheme(string, count) Extract the left-most grapheme clusters left_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1) 🤦🏼‍♂️  
length(string) Number of characters in string length('Hello🦆') 6  
length_grapheme(string) Number of grapheme clusters in string length_grapheme('🤦🏼‍♂️🤦🏽‍♀️') 2  
string LIKE target Returns true if the string matches the like specifier (see Pattern Matching) 'hello' LIKE '%lo' true  
like_escape(string, like_specifier, escape_character) Returns true if the string matches the like_specifier (see Pattern Matching). escape_character is used to search for wildcard characters in the string. like_escape('a%c', 'a$%c', '$') true  
lower(string) Convert string to lower case lower('Hello') hello lcase
lpad(string, count, character) Pads the string with the character from the left until it has count characters lpad('hello', 10, '>') >>>>>hello  
ltrim(string) Removes any spaces from the left side of the string ltrim('␣␣␣␣test␣␣') test␣␣  
ltrim(string, characters) Removes any occurrences of any of the characters from the left side of the string ltrim('>>>>test<<', '><') test<<  
md5(value) Returns the MD5 hash of the value md5('123') '202cb962ac59075b964b07152d234b70'  
nfc_normalize(string) Convert string to Unicode NFC normalized string. Useful for comparisons and ordering if text data is mixed between NFC normalized and not. nfc_normalize('ardèch') arde`ch  
not_like_escape(string, like_specifier, escape_character) Returns false if the string matches the like_specifier (see Pattern Matching). escape_character is used to search for wildcard characters in the string. like_escape('a%c', 'a$%c', '$') true  
ord(string) Return ASCII character code of the leftmost character in a string. ord('ü') 252  
position(search_string in string) Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found. position('b' in 'abc') 2  
printf(format, parameters...) Formats a string using printf syntax printf('Benchmark "%s" took %d seconds', 'CSV', 42) Benchmark "CSV" took 42 seconds  
regexp_full_match(string, regex) Returns true if the entire string matches the regex (see Pattern Matching) regexp_full_match('anabanana', '(an)*') false  
regexp_matches(string, regex) Returns true if a part of string matches the regex (see Pattern Matching) regexp_matches('anabanana', '(an)*') true  
regexp_replace(string, regex, replacement, modifiers) Replaces the first occurrence of regex with the replacement, use 'g' modifier to replace all occurrences instead (see Pattern Matching) select regexp_replace('hello', '[lo]', '-') he-lo  
regexp_extract(string, regex[, group = 0]) Split the string along the regex and extract first occurrence of group regexp_extract('hello_world', '([a-z ]+)_?', 1) hello  
regexp_extract_all(string, regex[, group = 0]) Split the string along the regex and extract all occurrences of group regexp_extract_all('hello_world', '([a-z ]+)_?', 1) [hello, world]  
repeat(string, count) Repeats the string count number of times repeat('A', 5) AAAAA  
replace(string, source, target) Replaces any occurrences of the source with target in string replace('hello', 'l', '-') he--o  
reverse(string) Reverses the string reverse('hello') olleh  
right(string, count) Extract the right-most count characters right('Hello🦆', 3) lo🦆  
right_grapheme(string, count) Extract the right-most count grapheme clusters right_grapheme('🤦🏼‍♂️🤦🏽‍♀️', 1) 🤦🏽‍♀️  
rpad(string, count, character) Pads the string with the character from the right until it has count characters rpad('hello', 10, '<') hello<<<<<  
rtrim(string) Removes any spaces from the right side of the string rtrim('␣␣␣␣test␣␣') ␣␣␣␣test  
rtrim(string, characters) Removes any occurrences of any of the characters from the right side of the string rtrim('>>>>test<<', '><') >>>>test  
split_part(string, separator, index) Split the string along the separator and return the data at the (1-based) index of the list. If the index is outside the bounds of the list, return an empty string (to match PostgreSQL's behavior). split_part('a|b|c', '|', 2) b  
starts_with(string, search_string) Return true if string begins with search_string starts_with('abc', 'a') true  
string SIMILAR TO regex Returns true if the string matches the regex; identical to regexp_full_match (see Pattern Matching) 'hello' SIMILAR TO 'l+' false  
strlen(string) Number of bytes in string strlen('🦆') 4  
strpos(string, search_string) Alias of instr. Return location of first occurrence of search_string in string, counting from 1. Returns 0 if no match found. strpos('test test', 'es') 2  
strip_accents(string) Strips accents from string strip_accents('mühleisen') muhleisen  
string_split(string, separator) Splits the string along the separator string_split('hello␣world', '␣') ['hello', 'world'] str_split, string_to_array
string_split_regex(string, regex) Splits the string along the regex string_split_regex('hello␣world; 42', ';?␣') ['hello', 'world', '42'] regexp_split_to_array, str_split_regex
substring(string, start, length) Extract substring of length characters starting from character start. Note that a start value of 1 refers to the first character of the string. substring('Hello', 2, 2) el substr
substring_grapheme(string, start, length) Extract substring of length grapheme clusters starting from character start. Note that a start value of 1 refers to the first character of the string. substring_grapheme('🦆🤦🏼‍♂️🤦🏽‍♀️🦆', 3, 2) 🤦🏽‍♀️🦆  
to_base64(blob) Convert a blob to a base64 encoded string. to_base64('A'::blob) QQ== base64
trim(string) Removes any spaces from either side of the string trim('␣␣␣␣test␣␣') test  
trim(string, characters) Removes any occurrences of any of the characters from either side of the string trim('>>>>test<<', '><') test  
unicode(string) Returns the unicode code of the first character of the string unicode('ü') 252  
upper(string) Convert string to upper case upper('Hello') HELLO ucase

Text Similarity Functions

These functions are used to measure the similarity of two strings using various metrics.

Function Description Example Result
editdist3(string, string) Alias of levenshtein for SQLite compatibility. The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Different case is considered different. editdist3('duck', 'db') 3
hamming(string, string) The number of positions with different characters for 2 strings of equal length. Different case is considered different. hamming('duck', 'luck') 1
jaccard(string, string) The Jaccard similarity between two strings. Different case is considered different. Returns a number between 0 and 1. jaccard('duck', 'luck') 0.6
jaro_similarity(string, string) The Jaro similarity between two strings. Different case is considered different. Returns a number between 0 and 1. jaro_similarity('duck', 'duckdb') 0.88
jaro_winkler_similarity(string, string) The Jaro-Winkler similarity between two strings. Different case is considered different. Returns a number between 0 and 1. jaro_winkler_similarity('duck', 'duckdb') 0.93
levenshtein(string, string) The minimum number of single-character edits (insertions, deletions or substitutions) required to change one string to the other. Different case is considered different. levenshtein('duck', 'db') 3
damerau_levenshtein(string, string) Extension of Levenshtein distance to also include transposition of adjacent characters as an allowed edit operation. In other words, the minimum number of edit operations (insertions, deletions, substitutions or transpositions) required to change one string to another. Different case is considered different. damerau_levenshtein('duckdb', 'udckbd') 2
mismatches(string, string) The number of positions with different characters for 2 strings of equal length. Different case is considered different. mismatches('duck', 'luck') 1