UME

A Rust implementation of the UME Character Encoding

Specification

UME has no real specification yet. For now, this implementation serves as the primary definition.

Binary representation of sequences with 1-4 bytes (data displayed as "x"):

Byte 1    Byte 2    Byte 3    Byte 4
0xxxxxxx
11xxxxxx  101xxxxx
11xxxxxx  100xxxxx  101xxxxx
11xxxxxx  100xxxxx  100xxxxx  101xxxxx

Byte order of data: Big-Endian
Bit order of data: most significant bit first (MSB 0)

Example:

Char  Unicode code point  Binary data        UME encoded
a     U+0061              01100001           01100001
ӕ     U+04D5              00000100 11010101  11100110 10110101
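
The byte layout above can be sketched as a small standalone encoder. This is only an illustration, not the crate's implementation: `encode_code_point` is a hypothetical helper, and it assumes every code point is written in the shortest form that fits, which the layout table does not state explicitly.

```rust
/// Hypothetical helper, not part of the `ume` crate.
/// Encodes one code point using the 1-4 byte layout from the table:
/// the first byte of a multi-byte sequence carries 6 data bits (prefix 11),
/// middle bytes carry 5 (prefix 100), and the last byte carries 5 (prefix 101).
/// Assumption: the shortest form that fits is always chosen.
fn encode_code_point(cp: u32) -> Option<Vec<u8>> {
    match cp {
        // 7 bits: 0xxxxxxx
        0..=0x7F => Some(vec![cp as u8]),
        // 11 bits: 11xxxxxx 101xxxxx
        0x80..=0x7FF => Some(vec![
            0b1100_0000 | (cp >> 5) as u8,
            0b1010_0000 | (cp & 0x1F) as u8,
        ]),
        // 16 bits: 11xxxxxx 100xxxxx 101xxxxx
        0x800..=0xFFFF => Some(vec![
            0b1100_0000 | (cp >> 10) as u8,
            0b1000_0000 | ((cp >> 5) & 0x1F) as u8,
            0b1010_0000 | (cp & 0x1F) as u8,
        ]),
        // 21 bits: 11xxxxxx 100xxxxx 100xxxxx 101xxxxx
        0x1_0000..=0x1F_FFFF => Some(vec![
            0b1100_0000 | (cp >> 15) as u8,
            0b1000_0000 | ((cp >> 10) & 0x1F) as u8,
            0b1000_0000 | ((cp >> 5) & 0x1F) as u8,
            0b1010_0000 | (cp & 0x1F) as u8,
        ]),
        // More than 21 bits does not fit into 4 bytes.
        _ => None,
    }
}
```

With this sketch, `encode_code_point(0x0061)` yields the single byte 01100001 and `encode_code_point(0x04D5)` yields 11100110 10110101, matching the example table.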

Installation

This crate is not published on crates.io. To use it, add the Git repository as a dependency in your Cargo.toml:

[dependencies]
ume = { git = "https://github.com/into-the-v0id/ume.rs" }

Usage

Strings:

use ume::ume8::Ume8String;
use ume::ume8::Ume8Str;

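pub fn main() {
    // Build an owned Ume8String from a regular Rust string literal.
    let string: Ume8String = Ume8String::from("aöӕธ💻");
    // Borrow it as a string slice.
    let str: &Ume8Str = &string;

    // Five characters, regardless of how many bytes each one encodes to.
    assert_eq!(str.chars().count(), 5);
    assert!(str.contains(&Ume8String::from('ӕ')));
}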

Streams:

use ume::ume8::DecodeUnchecked;
use ume::ume8::EncodeUnchecked;

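pub fn main() {
    // Code points to encode, as raw u32 values.
    let data = vec![
        'a' as u32,
        'ö' as u32,
        'ӕ' as u32,
        'ธ' as u32,
        '💻' as u32,
    ];

    // Encode the code points into a stream of bytes.
    let encoded_data = EncodeUnchecked::new(data.iter().cloned())
        .collect::<Vec<u8>>();

    // Decode the bytes back into code points.
    let decoded_data = DecodeUnchecked::new(encoded_data.iter().cloned())
        .collect::<Vec<u32>>();

    // The round trip reproduces the original input.
    assert_eq!(decoded_data, data);
}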

Limitations

In theory, a single sequence can contain an unlimited number of bytes. For performance reasons, this implementation limits a single sequence to 4 bytes and thus to 21 bits of data (6 bits in the first byte plus 5 bits in each of the three following bytes).
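
To illustrate the limit, here is a minimal sketch of a decoder that gives up once a sequence exceeds 4 bytes. It is not the crate's implementation: `decode_sequence` and `MAX_LEN` are hypothetical names, the bit patterns are taken from the layout in the Specification section, and the sketch does not reject overlong encodings.

```rust
/// Hypothetical helper, not part of the `ume` crate.
/// Decodes one sequence from the front of `bytes` and returns the code point
/// together with the number of bytes consumed, or `None` if the input is
/// empty, malformed, or longer than `MAX_LEN` bytes.
fn decode_sequence(bytes: &[u8]) -> Option<(u32, usize)> {
    // Assumed cap, mirroring the 4-byte limit described above.
    const MAX_LEN: usize = 4;

    let first = *bytes.first()?;
    // 0xxxxxxx: a complete single-byte sequence.
    if (first & 0b1000_0000) == 0 {
        return Some((first as u32, 1));
    }
    // A multi-byte sequence must start with 11xxxxxx.
    if (first & 0b1100_0000) != 0b1100_0000 {
        return None;
    }

    // The start byte carries the 6 most significant data bits.
    let mut value = (first & 0b0011_1111) as u32;
    for (i, &byte) in bytes.iter().enumerate().skip(1).take(MAX_LEN - 1) {
        // Every following byte contributes 5 more data bits.
        value = (value << 5) | (byte & 0b0001_1111) as u32;
        match byte & 0b1110_0000 {
            // 101xxxxx marks the last byte of the sequence.
            0b1010_0000 => return Some((value, i + 1)),
            // 100xxxxx means more bytes follow.
            0b1000_0000 => continue,
            _ => return None,
        }
    }
    // Input ended, or no final byte appeared within MAX_LEN bytes.
    None
}
```

A 4-byte sequence holds 6 + 5 + 5 + 5 = 21 data bits. In this sketch, raising `MAX_LEN` would accept longer sequences, but the `u32` accumulator would then overflow past 32 bits of data.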

License

Copyright (C) Oliver Amann

This project is licensed under the MIT License (MIT) or the Apache License Version 2.0 (Apache-2.0). Please see LICENSE-MIT and LICENSE-APACHE for more information.