A collection of UDFs for MariaDB & MySQL, written using the rust udf library. For instructions on how to use these libraries, jump to the Installation section.
New function contributions are welcome!
The following UDFs are includes:
- UUIDs: generate and convert v1, v2, v6, and v7 UUIDs
- Hash Algorithms: run a wide variety of hash algorithms, including the following families:
blake,sha,keccak,sha3, andxxhash - IP Functions for interop:
ip_validate,ip_to_canonical,ip_to_ipv4_mapped - String Operations: Calculations such as Levenshtein edit distance, including limited and normalized versions.
- Jsonify: convert any data to JSON
- Lipsum: generate random text
See the relevant section for more information.
Provide UUID functions similar to the Postges uuid-osp package:
- Generate v1 and v4 UUIDs (v3 & v5 coming soon)
- Generate the new v6 and v7 UUIDs
- Validate UUIDs
- Create namespace UUIDs
uuid_to_binanduuid_from_bin/bin_to_uuidfunctions, including bit rearranging options
See the UUID Readme for more information
note: type uuid is type string uuid_generate_v1() -> uuid uuid_generate_v1mc() -> uuid uuid_generate_v4() -> uuid uuid_generate_v6([node_addr: string]) -> uuid uuid_generate_v7() -> uuid uuid_nil() -> uuid uuid_max() -> uuid uuid_ns_dns() -> string uuid_ns_url() -> string uuid_ns_oid() -> string uuid_ns_x500() -> string uuid_is_valid(uuid: uuid) -> boolean uuid_to_bin(uuid: uuid) -> bytes uuid_from_bin() -> uuid bin_from_uuid() -> uuid MariaDB [(none)]> select uuid_generate_v6(); +--------------------------------------+ | uuid_generate_v6() | +--------------------------------------+ | 1ede5b09-ea01-6208-bca8-8809c0dd8e70 | +--------------------------------------+ 1 row in set (0.000 sec) MariaDB [(none)]> select hex(uuid_to_bin(uuid_generate_v4())); +--------------------------------------+ | hex(uuid_to_bin(uuid_generate_v4())) | +--------------------------------------+ | B1B3AB9D490A4D20BFBD026AB1C045FB | +--------------------------------------+ 1 row in set (0.002 sec) This library provides the following functions:
blake2b512,blake2s256,blake3,blake3_thd.blake3_thdprovides a multithreaded hasher that can be much faster for large data; per the docs, 128 KiB is about the minimum size to see any signifcant improvement overblake3.sha224,sha256,sha384,sha512(these are also built in)keccak224,keccak256sha3_224,sha3_256,sha3_384,sha3_512xxhash3,xxhash32,xxhash64,xxhash(xxhashis an alias forxxhash64)
All of these return hex strings by default. _bin functions are also provided that return the binary result without going through hexification, suitable for storage in a BINARY(X) column.
blake2b512(a: any [, ...]) -> string blake2b512_bin(a: any [, ...]) -> bytes blake2s512(a: any [, ...]) -> string blake2s512_bin(a: any [, ...]) -> bytes blake3(a: any [, ...]) -> string blake3_bin(a: any [, ...]) -> bytes blake3_thd(a: any [, ...]) -> string blake3_thd_bin(a: any [, ...]) -> bytes md5_u(a: any [, ...]) -> string md5_u_bin(a: any [, ...]) -> bytes sha1_u(a: any [, ...]) -> string sha1_u_bin(a: any [, ...]) -> bytes sha224(a: any [, ...]) -> string sha224_bin(a: any [, ...]) -> bytes sha256(a: any [, ...]) -> string sha256_bin(a: any [, ...]) -> bytes sha384(a: any [, ...]) -> string sha384_bin(a: any [, ...]) -> bytes sha512(a: any [, ...]) -> string sha512_bin(a: any [, ...]) -> bytes keccak224(a: any [, ...]) -> string keccak224_bin(a: any [, ...]) -> bytes keccak256(a: any [, ...]) -> string keccak256_bin(a: any [, ...]) -> bytes sha3_224(a: any [, ...]) -> string sha3_224_bin(a: any [, ...]) -> bytes sha3_256(a: any [, ...]) -> string sha3_256_bin(a: any [, ...]) -> bytes sha3_384(a: any [, ...]) -> string sha3_384_bin(a: any [, ...]) -> bytes sha3_512(a: any [, ...]) -> string sha3_512_bin(a: any [, ...]) -> bytes xxhash(a: any [, ...]) -> integer xxhash3(a: any [, ...]) -> integer xxhash32(a: any [, ...]) -> integer xxhash64(a: any [, ...]) -> integer MariaDB [(none)]> select blake3("Hello, world!"); +------------------------------------------------------------------+ | blake3("Hello, world!") | +------------------------------------------------------------------+ | EDE5C0B10F2EC4979C69B52F61E42FF5B413519CE09BE0F14D098DCFE5F6F98D | +------------------------------------------------------------------+ 1 row in set (0.000 sec) MariaDB [(none)]> select sha3_256("Hello, world!"); +------------------------------------------------------------------+ | sha3_256("Hello, world!") | +------------------------------------------------------------------+ | F345A219DA005EBE9C1A1EAAD97BBF38A10C8473E41D0AF7FB617CAA0C6AA722 | +------------------------------------------------------------------+ 1 row in set (0.000 sec) MariaDB [(none)]> select blake3_bin("Hello, world!"); +----------------------------------+ | blake3_bin("Hello, world!") | +----------------------------------+ | ����.ė�i�/a�/��Q����M ������ | +----------------------------------+ 1 row in set (0.000 sec) For all hash functions, multiple arguments are combined to produce a single hash output:
MariaDB [(none)]> select xxhash('Hello, ', 0x77, 'orld', '!'); +--------------------------------------+ | xxhash('Hello, ', 0x77, 'orld', '!') | +--------------------------------------+ | -755700219241327498 | +--------------------------------------+ 1 row in set (0.000 sec) Note that in SQL, all integers are an i64, all floats are a f64, and all decimals are represented as a string to the UDF API. This library hashes these types as their little endian representation on all platforms. (You only need to worry about this if you have very obscure platform compatibility requirements. Strings and blobs are always unambiguous).
Provide the function levenshtein, which calculates the levenshtein edit distance between two strings. There is also levenshtein_normalized that returns a value between 0.0 (identical) and 1.0 (significantly different).
If a limit is provided as a third argument, the operation will terminate if that limit is exceeded. This can help to improve performance if filtering dissimilar strings.
These algorithms provide a byte edit distance, rather than unicode chars or graphemes. These options may be added in the future.
These algorithms are implemented by the rapidfuzz crate.
levenshtein(a: str, b: str [, limit: integer]) -> integer; levenshtein_normalized(a: str, b: str [, limit: real]) -> real; MariaDB [(none)]> SELECT levenshtein('foo', 'moose'), levenshtein_normalized('foo', 'moos'); +-----------------------------+---------------------------------------+ | levenshtein('foo', 'moose') | levenshtein_normalized('foo', 'moos') | +-----------------------------+---------------------------------------+ | 3 | 0.5 | +-----------------------------+---------------------------------------+ 1 row in set (0.001 sec) MariaDB [(none)]> SELECT levenshtein('foo', 'moose', 2), levenshtein_normalized('foo', 'moos', 0.3); +--------------------------------+--------------------------------------------+ | levenshtein('foo', 'moose', 2) | levenshtein_normalized('foo', 'moos', 0.3) | +--------------------------------+--------------------------------------------+ | 2 | 0.3 | +--------------------------------+--------------------------------------------+ 1 row in set (0.001 sec) Provide the function jsonify, which quickly creates JSON output for any given inputs.
jsonify(a: any [, ...]) -> string MariaDB [db]> select jsonify(qty, cost, class) from t1 limit 4; +-------------------------------------+ | jsonify(qty, cost, class) | +-------------------------------------+ | {"class":"a","cost":50.0,"qty":10} | | {"class":"c","cost":5.6,"qty":8} | | {"class":"a","cost":20.7,"qty":5} | | {"class":"b","cost":12.78,"qty":10} | +-------------------------------------+ 4 rows in set (0.000 sec) Aliasing also works to change key names:
MariaDB [db]> select jsonify(uuid() as uuid, qty as quantity, cost) from t1 limit 4; +----------------------------------------------------------------------------+ | jsonify(uuid() as uuid, qty as quantity, cost) | +----------------------------------------------------------------------------+ | {"cost":50.0,"quantity":10,"uuid":"45952863-5b4d-11ed-b214-0242ac110002"} | | {"cost":5.6,"quantity":8,"uuid":"4595291b-5b4d-11ed-b214-0242ac110002"} | | {"cost":20.7,"quantity":5,"uuid":"45952953-5b4d-11ed-b214-0242ac110002"} | | {"cost":12.78,"quantity":10,"uuid":"4595297a-5b4d-11ed-b214-0242ac110002"} | +----------------------------------------------------------------------------+ 4 rows in set (0.001 sec) Uses the lipsum crate to generate lipsum strings with a specified word count.
lipsum(count: integer [, seed: integer]) -> string MariaDB [(none)]> select lipsum(10); +------------------------------------------------------------------+ | lipsum(10) | +------------------------------------------------------------------+ | Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do. | +------------------------------------------------------------------+ 1 row in set (0.000 sec) We provide three IP functions:
ip_validatewhich will return eitheripv4oripv6if the format is valid,NULLotherwise.ip_to_ipv6_mappedwhich converts ipv4 addresses to their ipv6 form (e.g. for interop with theINET6data type)ip_to_canonicalwhich reverses the mapping operation
ip_validate(ip: string) -> string ip_to_canonical(ip: string) -> string ip_to_ipv6_mapped(ip: string) -> string MariaDB [db]> select -> input, -> ip_validate(input), -> ip_to_ipv6_mapped(input), -> ip_to_canonical(input) -> from t1; +--------------------------------------+--------------------+--------------------------------------+--------------------------------------+ | input | ip_validate(input) | ip_to_ipv6_mapped(input) | ip_to_canonical(input) | +--------------------------------------+--------------------+--------------------------------------+--------------------------------------+ | 203.0.113.0 | ipv4 | ::ffff:203.0.113.0 | 203.0.113.0 | | 127.0.0.1 | ipv4 | ::ffff:127.0.0.1 | 127.0.0.1 | | ::ffff:127.0.0.1 | ipv6 | ::ffff:127.0.0.1 | 127.0.0.1 | | 2001:db8::1:0:0:1 | ipv6 | 2001:db8::1:0:0:1 | 2001:db8::1:0:0:1 | | 2001:db8:85a3:8d3:1319:8a2e:370:7348 | ipv6 | 2001:db8:85a3:8d3:1319:8a2e:370:7348 | 2001:db8:85a3:8d3:1319:8a2e:370:7348 | | hello! | NULL | NULL | NULL | | NULL | NULL | NULL | NULL | +--------------------------------------+--------------------+--------------------------------------+--------------------------------------+ 7 rows in set (0.000 sec) Compiled library binaries can be downloaded from this library's releases page. The desired files can be copied to the plugin directory (usually /usr/lib/mysql/plugin) and selectively loaded:
-- **** Hash functions **** CREATE OR REPLACE FUNCTION blake2b512 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION blake2s256 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION blake3 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION blake3_thd RETURNS string SONAME 'libudf_hash.so'; -- the md5 and sha functions have builtin versions, hence the `_u` suffix CREATE OR REPLACE FUNCTION md5_u RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha1_u RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha224 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha256 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha384 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha512 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION keccak224 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION keccak256 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha3_224 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha3_256 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha3_384 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha3_512 RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION xxhash RETURNS integer SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION xxhash3 RETURNS integer SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION xxhash32 RETURNS integer SONAME 'libudf_hash.so'; -- `xxhash` and `xxhash64` are aliases CREATE OR REPLACE FUNCTION xxhash64 RETURNS integer SONAME 'libudf_hash.so'; -- binary-returning versions of hash algorithms, as a convenience alternative to -- `unhex(blake3(...))` CREATE OR REPLACE FUNCTION blake2b512_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION blake2s256_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION blake3_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION blake3_thd_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION md5_u_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha1_u_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha224_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha256_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha384_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha512_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION keccak224_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION keccak256_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha3_224_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha3_256_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha3_384_bin RETURNS string SONAME 'libudf_hash.so'; CREATE OR REPLACE FUNCTION sha3_512_bin RETURNS string SONAME 'libudf_hash.so'; -- **** JSON creation function **** CREATE OR REPLACE FUNCTION jsonify RETURNS string SONAME 'libudf_jsonify.so'; -- **** IP functions **** CREATE OR REPLACE FUNCTION ip_validate RETURNS string SONAME 'libudf_net.so'; CREATE OR REPLACE FUNCTION ip_to_canonical RETURNS string SONAME 'libudf_net.so'; CREATE OR REPLACE FUNCTION ip_to_ipv6_mapped RETURNS string SONAME 'libudf_net.so'; -- **** string operation functions **** CREATE OR REPLACE FUNCTION levenshtein RETURNS integer SONAME 'libudf_stringops.so' CREATE OR REPLACE FUNCTION levenshtein_normalized RETURNS real SONAME 'libudf_stringops.so' -- **** random string generation **** CREATE OR REPLACE FUNCTION lipsum RETURNS string SONAME 'libudf_lipsum.so'; -- **** UUID interfaces **** CREATE OR REPLACE FUNCTION uuid_generate_v1 RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_generate_v1mc RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_generate_v4 RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_generate_v6 RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_generate_v7 RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_nil RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_max RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_ns_dns RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_ns_url RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_ns_oid RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_ns_x500 RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_is_valid RETURNS integer SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_to_bin RETURNS string SONAME 'libudf_uuid.so'; CREATE OR REPLACE FUNCTION uuid_from_bin RETURNS string SONAME 'libudf_uuid.so'; -- `bin_to_uuid` and 'uuid_from_bin' are aliases CREATE OR REPLACE FUNCTION bin_to_uuid RETURNS string SONAME 'libudf_uuid.so';Note that Windows .dlls are built but have not been tested - please open an issue if you encounter any errors.
To build the binaries yourself, you can clone this repository and run:
cargo build --releaseWhich will produce the desired dynamic library files in target/release. Specific functions can also be specified with -p (e.g. cargo build --release -p udf-uuid).
This repository also comes with a docker file that simplifies getting an image up and running:
# build the image docker build . --tag mdb-udf-suite-img # run it in the background docker run --rm -d \ -e MARIADB_ROOT_PASSWORD=example \ --name mdb-udf-suite \ mdb-udf-suite-img # Enter a SQL shell docker exec -it mdb-udf-suite mariadb -pexample # Stop the server when done docker stop mdb-udf-suiteThe UDFs can then be loaded using the CREATE FUNCTION statements above.
This project has a MSRV of 1.65, but makes no commitment to uphold this.
This work is dual-licensed under Apache 2.0 and GPL 2.0 (or any later version). You can choose either of them if you use this work.
SPDX-License-Identifier: Apache-2.0 OR GPL-2.0-or-later