The Official Code for Our Paper: On The Inadequacy of Optimizing Alignment and Uniformity in Contrastive Learning of Sentence Representations, ICLR 2023.
Most of our experiments were run in a modified version of the SimCSE codebase; we thank the authors for their great work!
- Create a virtual environment and install the necessary dependencies
conda create -n temp python=3.7
pip install -r requirement.txt
- Install torch 1.12.0 for your CUDA version (higher versions should also be compatible)
# CUDA 11.6
pip install torch==1.12.0+cu116 --extra-index-url https://download.pytorch.org/whl/cu116
# CUDA 11.3
pip install torch==1.12.0+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
# CUDA 10.2
pip install torch==1.12.0+cu102 --extra-index-url https://download.pytorch.org/whl/cu102
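The three install commands differ only in the wheel tag that corresponds to the local CUDA version. As a minimal sketch, the helpers below map a CUDA version string to that tag and print the matching command. The function names `torch_wheel_tag` and `torch_install_cmd` are hypothetical and not part of this repository, and only the three CUDA versions listed here are covered.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of this repository).

# Map a CUDA version to the wheel tag used in the install commands.
torch_wheel_tag() {
    case "$1" in
        11.6) echo "cu116" ;;
        11.3) echo "cu113" ;;
        10.2) echo "cu102" ;;
        *)    echo "unsupported CUDA version: $1" >&2; return 1 ;;
    esac
}

# Print (but do not run) the pip command for a given CUDA version.
torch_install_cmd() {
    local tag
    tag="$(torch_wheel_tag "$1")" || return 1
    echo "pip install torch==1.12.0+${tag} --extra-index-url https://download.pytorch.org/whl/${tag}"
}
```

For example, `torch_install_cmd 11.3` prints the CUDA 11.3 command so it can be reviewed before being executed.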
- Obtain the 1M corpus for fine-tuning from the link provided by SimCSE
- Obtain the datasets of all downstream tasks in SentEval for evaluation from the official SentEval repository
- Obtain pretrained BERT-base, BERT-large, RoBERTa-base and RoBERTa-large checkpoints provided by Hugging Face
- Change TRAIN_FILE_PATH in run_unsup_example.sh to the path of the training corpus
- Change EVAL_FILE_PATH in run_unsup_example.sh to the path of the evaluation dataset
- Change MODEL_PATH in run_unsup_example.sh to the path of the pretrained model
- Change PATH_TO_SENTEVAL in evaluation.py #17 and trainer.py #85 to the path of SentEval
- Change PATH_TO_DATA in evaluation.py #18 and trainer.py #86 to the path of the datasets in SentEval
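A wrong path in any of these settings tends to surface only after a run has started, so it can help to check them up front. The sketch below is an assumption rather than part of the repository: `check_paths` is a hypothetical helper, and the paths in the commented example are placeholders to be replaced with your own.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this repository).
# Usage: check_paths NAME=PATH [NAME=PATH ...]
# Reports every setting whose path does not exist on stderr and
# returns non-zero if at least one path is missing.
check_paths() {
    local status=0 entry name path
    for entry in "$@"; do
        name="${entry%%=*}"   # text before the first '='
        path="${entry#*=}"    # text after the first '='
        if [ ! -e "$path" ]; then
            echo "missing: $name -> $path" >&2
            status=1
        fi
    done
    return "$status"
}

# Example with placeholder paths (adjust before use):
# check_paths TRAIN_FILE_PATH=/path/to/corpus.txt \
#             MODEL_PATH=/path/to/pretrained-model \
#             PATH_TO_SENTEVAL=/path/to/SentEval \
#     && bash run_unsup_example.sh
```

Chaining the check with `&&` means the training script is launched only when every listed path exists.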
- Replicate the results of the MET loss function on BERT-base
bash run_unsup_example.sh
- Edit the hyperparameters in the shell script to replicate the results of the other experiments
- The implementation of all loss functions
- The code based on SimCSE
- The checkpoints in the paper
@inproceedings{nie2023inadequacy,
title={On The Inadequacy of Optimizing Alignment and Uniformity in Contrastive Learning of Sentence Representations},
author={Nie, Zhijie and Zhang, Richong and Mao, Yongyi},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023}
}