diff --git a/README.md b/README.md
index bb3f6de..eb59629 100644
--- a/README.md
+++ b/README.md
@@ -24,9 +24,8 @@
   </p>
 
 
-[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Huggingface**](https://huggingface.co/eosphoros) | [**Community**](https://github.com/eosphoros-ai/community)
-
 
+[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Huggingface**](https://huggingface.co/eosphoros) | [**Community**](https://github.com/eosphoros-ai/community)
 [**Text2SQL**](README.zh.md) | [**Text2GQL**](src/dbgpt-hub-gql/README.zh.md) | [**Text2NLU**](src/dbgpt-hub-nlu/README.zh.md)
 
 </div>
@@ -684,14 +683,16 @@ Our work is primarily based on the foundation of numerous open-source contributi
 Thanks to all the contributors, especially @[JBoRu](https://github.com/JBoRu) who raised the [issue](https://github.com/eosphoros-ai/DB-GPT-Hub/issues/119) which reminded us to add a new promising evaluation way, i.e. Test Suite. As the paper 《SQL-PALM: IMPROVED LARGE LANGUAGE MODEL ADAPTATION FOR TEXT-TO-SQL》 mentioned, "We consider two commonly-used evaluation metrics: execution accuracy (EX) and test-suite accuracy (TS). EX measures whether the SQL execution outcome matches ground truth (GT), whereas TS measures whether the SQL passes all EX evaluations for multiple tests, generated by database augmentation. Since EX contains false positives, we consider TS as a more reliable evaluation metric".
 
 ## 7. Citation
-Please consider citing our project if you find it useful:
+If you find `DB-GPT-Hub` useful for your research or development, please cite the following <a href="https://arxiv.org/abs/2406.11434" target="_blank">paper</a>:
 
 ```bibtex
-@software{db-gpt-hub,
-    author = {DB-GPT-Hub Team},
-    title = {{DB-GPT-Hub}},
-    url = {https://github.com/eosphoros-ai/DB-GPT-Hub},
-    year = {2023}
+@misc{zhou2024dbgpthub,
+      title={DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models},
+      author={Fan Zhou and Siqiao Xue and Danrui Qi and Wenhui Shi and Wang Zhao and Ganglin Wei and Hongyang Zhang and Caigai Jiang and Gangwei Jiang and Zhixuan Chu and Faqiang Chen},
+      year={2024},
+      eprint={2406.11434},
+      archivePrefix={arXiv},
+      primaryClass={cs.DB}
 }
 ```
 
diff --git a/README.zh.md b/README.zh.md
index 902d6f7..3c6ada8 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -23,9 +23,8 @@
   </p>
 
 
-[**英文**](README.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Huggingface**](https://huggingface.co/eosphoros) | [**Community**](https://github.com/eosphoros-ai/community)
-
 
+[**英文**](README.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Huggingface**](https://huggingface.co/eosphoros) | [**Community**](https://github.com/eosphoros-ai/community)
 [**Text2SQL**](README.zh.md) | [**Text2GQL**](src/dbgpt-hub-gql/README.zh.md) | [**Text2NLU**](src/dbgpt-hub-nlu/README.zh.md)
 </div>
 
@@ -655,14 +654,16 @@ poetry run python dbgpt_hub_sql/eval/evaluation.py --plug_value --input  Your_mo
  **20231104** ,尤其感谢 @[JBoRu](https://github.com/JBoRu) 提的[issue](https://github.com/eosphoros-ai/DB-GPT-Hub/issues/119)， 指出我们的之前按照官方网站的95M的数据库去评估的方式的不足，如论文《SQL-PALM: IMPROVED LARGE LANGUAGE MODEL ADAPTATION FOR TEXT-TO-SQL》 指出的 "We consider two commonly-used evaluation metrics: execution accuracy (EX) and test-suite accuracy (TS) [32]. EX measures whether SQL execution outcome matches ground truth (GT), whereas TS measures whether the SQL passes all EX evaluation for multiple tests, generated by database-augmentation. Since EX contains false positives, we consider TS as a more reliable evaluation metric" 。
 
 ## 七、引用
-如果您觉得我们的项目对您的科研项目或者实际生产项目有帮助，请考虑在您的参考文献里引用`DB-GPT-Hub`:
+如果您发现`DB-GPT-Hub`对您的研究或开发有用，请引用以下<a href="https://arxiv.org/abs/2406.11434" target="_blank">论文</a>：
 
 ```bibtex
-@software{db-gpt-hub,
-    author = {DB-GPT-Hub Team},
-    title = {{DB-GPT-Hub}},
-    url = {https://github.com/eosphoros-ai/DB-GPT-Hub},
-    year = {2023}
+@misc{zhou2024dbgpthub,
+      title={DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models},
+      author={Fan Zhou and Siqiao Xue and Danrui Qi and Wenhui Shi and Wang Zhao and Ganglin Wei and Hongyang Zhang and Caigai Jiang and Gangwei Jiang and Zhixuan Chu and Faqiang Chen},
+      year={2024},
+      eprint={2406.11434},
+      archivePrefix={arXiv},
+      primaryClass={cs.DB}
 }
 ```