HMT: Hierarchical Memory Transformer

Hierarchical Memory Transformer (HMT) is a novel framework that enables and improves models' long-context processing ability by imitating human memorization behavior. Leveraging memory-augmented segment-level recurrence, we organize the memory hierarchy by preserving tokens from early input tokens segments, passing memory embeddings along the sequence, and recalling relevant information from history.

Features

Easy to use: If you pretrained a new LLM and want to augment with HMT, simply push the model checkpoints to huggingface and use the --model_name argument to pull your model.
Command line centric: To play with different configurations of HMT during training, simply modify the argument in the command line. There is no need to modify the source code.
Memory efficient: With small and fixed segment lengths for inputs, HMT can still achieve comparable or better effectiveness than models inferencing with longer context, which consumes more GPU VRAM.
Long context with topic switching: HMT is equipped with a memory recall mechanism, which can handle multiple topics in a single long document to filter distractive information.

Updates

[2024/05/12]: The paper is available on Arxiv

Code Structure

hmt_src/main.py: Entrypoint of HMT command line.
hmt_src/pubmedqa_ds_preprocess.py: Script used to preprocess PubMedQA dataset.
modeling_rmt/language_modeling.py: The HMT model source code.
modeling_rmt/long_mem_cross_attn.py: The memory recall module.

Instructions

The code adapts the recurrent memory transformer repository (https://github.com/booydar/recurrent-memory-transformer). To use HMT, please install the dependencies in requirement.txt.

conda create -n hmt python=3.10 -y
conda activate hmt
pip install -r requirement.txt

For AMD GPUs: Please install the ROCm version of pytorch

pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/rocm5.6

Run accelerate config and set up the distributed training/evaluation environment based on your cluster configuration. If you use AMD MI210 GPUs, here is an example configuration with DeepSpeed for 4 GPU cluster:

- `Accelerate` version: 0.25.0
- Platform: Linux-5.15.0-91-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- Numpy version: 1.24.4
- PyTorch version (GPU?): 2.1.0+rocm5.6 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 503.71 GB
- GPU type: AMD Instinct MI210
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: DEEPSPEED
        - mixed_precision: bf16
        - use_cpu: False
        - debug: False
        - num_processes: 4
        - machine_rank: 0
        - num_machines: 1
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - deepspeed_config: {'gradient_accumulation_steps': 1, 'gradient_clipping': 1.0, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': False, 'zero_stage': 2}
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []

Run the following command to try PG-19 dataset with Llama 2 7B as the backbone:

accelerate launch hmt_src/main.py --task_name=emozilla/pg19 --use_lora --learning_rate=1e-4 --model_name=meta-llama/Llama-2-7b-hf --lr_decay --lr_decay_gamma=0.6 --training_step=600 --num_sensory=32 --bptt_depth=6 --train_set_split=50 --num_seg_save=8 --batch_size=2 --test_length=30000

You can use --inference_only flag to only perform inferencing for GPU profiling.

For more functionalities of HMT, please check accelerate launch hmt_src/main.py --help.

Examples on Training and Evaluating

The scripts folder contains the some examples to to train and evaluate the models on pretrained dataset and downstream tasks.

Author's Note

This work is developed and experimented on AMD GPUs, which is a relatively new attempt considering that a majority of deep learning/natural language processing works use NVIDIA GPUs. It is possible that we can utilize tools that currently do not support ROCm (e.g., FlashAttention, bitsandbytes, etc.) to further optimize memory utilization, inference speed, and effectiveness. If you have any new ideas in improving the HMT framework on AMD/NVIDIA GPUs or new evaluation results using these tools, please don't hesitate creating a Github issue and we will look into it.

If you encounter bugs or have any questions, you can send an email to Zifan ([email protected]) or submit a Github issue.

Name		Name	Last commit message	Last commit date
Latest commit History 464 Commits
hmt_src		hmt_src
img		img
modeling_rmt		modeling_rmt
scripts		scripts
tools		tools
unittests		unittests
.flake8		.flake8
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
accelerate.yaml		accelerate.yaml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HMT: Hierarchical Memory Transformer

Features

Updates

Code Structure

Instructions

Examples on Training and Evaluating

Author's Note

About

Releases

Packages

Contributors 2

Languages

License

OswaldHe/HMT-pytorch

Folders and files

Latest commit

History

Repository files navigation

HMT: Hierarchical Memory Transformer

Features

Updates

Code Structure

Instructions

Examples on Training and Evaluating

Author's Note

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages