fix: remove poetry
csunny committed Aug 27, 2024
1 parent c42b2f1 commit 3baaf97
Showing 2 changed files with 19 additions and 18 deletions.
22 changes: 11 additions & 11 deletions README.md
Expand Up @@ -26,7 +26,6 @@
[**简体中文**](README.zh.md) | [**Discord**](https://discord.gg/7uQnPuveTY) | [**Wechat**](https://github.com/eosphoros-ai/DB-GPT/blob/main/README.zh.md#%E8%81%94%E7%B3%BB%E6%88%91%E4%BB%AC) | [**Huggingface**](https://huggingface.co/eosphoros) | [**Community**](https://github.com/eosphoros-ai/community)

[**Text2SQL**](README.md) | [**Text2NLU**](src/dbgpt-hub-nlu/README.zh.md)

</div>

## 🔥🔥🔥 News
@@ -388,8 +387,9 @@ git clone https://github.com/eosphoros-ai/DB-GPT-Hub.git
cd DB-GPT-Hub
conda create -n dbgpt_hub python=3.10
conda activate dbgpt_hub
pip install poetry
poetry install
cd src/dbgpt_hub_sql
pip install -e .
```
### 3.2 Quick Start

@@ -491,7 +491,7 @@ Download the [Spider dataset](https://drive.google.com/uc?export=download&id=1T
For the data preprocessing part, simply **run the following script**:
```bash
## generate train and dev(eval) data
poetry run sh dbgpt_hub_sql/scripts/gen_train_eval_data.sh
sh dbgpt_hub_sql/scripts/gen_train_eval_data.sh
```

In the directory `dbgpt_hub_sql/data/`, you will find the newly generated training file `example_text2sql_train.json` and test file `example_text2sql_dev.json`, containing 8,659 and 1,034 entries respectively. For the data used in subsequent fine-tuning, set the `file_name` parameter in `dbgpt_hub_sql/data/dataset_info.json` to the file name of the training set, e.g. `example_text2sql_train.json`.
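As a sketch of what that registration could look like (the exact schema of `dataset_info.json` is an assumption here; consult the copy shipped in `dbgpt_hub_sql/data/` for the authoritative fields):

```json
{
  "example_text2sql_train": {
    "file_name": "example_text2sql_train.json"
  }
}
```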
@@ -515,7 +515,7 @@ The model fine-tuning supports both LoRA and QLoRA methods. We can run the follo
Run the command:

```bash
poetry run sh dbgpt_hub_sql/scripts/train_sft.sh
sh dbgpt_hub_sql/scripts/train_sft.sh
```

After fine-tuning, the model weights will be saved by default in the adapter folder, specifically in the dbgpt_hub_sql/output/adapter directory.
@@ -585,7 +585,7 @@ In the script, during fine-tuning, different models correspond to key parameters
The directory `./dbgpt_hub_sql/output/pred/` under the project root is the default output location for model predictions (create it with `mkdir` if it does not exist).

```bash
poetry run sh ./dbgpt_hub_sql/scripts/predict_sft.sh
sh ./dbgpt_hub_sql/scripts/predict_sft.sh
```

By default the script includes the `--quantization_bit` parameter, so it predicts using QLoRA; removing it switches to the LoRA prediction method.
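As a small, hypothetical convenience (not part of the repository), the prediction output can be inspected with a few lines of Python, assuming the common convention that the output file contains one generated SQL statement per line:

```python
from pathlib import Path

def load_predictions(path):
    """Return the non-empty SQL lines from a prediction output file."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]

if __name__ == "__main__":
    # Hypothetical file name; point this at your actual file under
    # dbgpt_hub_sql/output/pred/.
    sample = Path("pred_example.sql")
    sample.write_text("SELECT count(*) FROM singer\n\nSELECT name FROM stadium\n")
    print(len(load_predictions(sample)))  # prints 2
```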
@@ -600,7 +600,7 @@ You can find the second corresponding model weights from Huggingface [hg-eospho
If you need to merge the weights of the trained base model and the fine-tuned Peft module to export a complete model, execute the following model export script:

```bash
poetry run sh ./dbgpt_hub_sql/scripts/export_merge.sh
sh ./dbgpt_hub_sql/scripts/export_merge.sh
```

Be sure to replace the parameter path values in the script with the paths corresponding to your project.
@@ -609,7 +609,7 @@ Be sure to replace the parameter path values in the script with the paths corres
To evaluate model performance on a dataset (the Spider dev set by default), run the following command:
```bash
poetry run python dbgpt_hub_sql/eval/evaluation.py --plug_value --input Your_model_pred_file
python dbgpt_hub_sql/eval/evaluation.py --plug_value --input Your_model_pred_file
```
You can find our latest evaluation and part of the experiment results [here](docs/eval_llm_result.md).
**Note**: The database used by the default code is the 95 MB database downloaded from the [Spider official website](https://yale-lily.github.io/spider). If you need to use the 1.27 GB Spider-based database from [test-suite](https://github.com/taoyds/test-suite-sql-eval), download it to a custom directory first and add a parameter like `--db Your_download_db_path` to the evaluation command above.
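For intuition only, a naive "exact match after whitespace/case normalization" metric can be sketched as below. This is a simplified stand-in, not the repository's evaluation logic: the real `evaluation.py` implements Spider's component-wise exact-set matching (and execution accuracy when run against a test-suite database), which is considerably more forgiving than string equality:

```python
# Illustrative sketch only: whitespace/case-insensitive exact-match accuracy
# between predicted and gold SQL strings.
def normalize(sql: str) -> str:
    # Lowercase and collapse all runs of whitespace to single spaces.
    return " ".join(sql.lower().split())

def exact_match_accuracy(preds, golds):
    assert len(preds) == len(golds), "prediction/gold counts must match"
    if not golds:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(preds, golds))
    return hits / len(golds)

if __name__ == "__main__":
    preds = ["SELECT  count(*) FROM singer", "select name from stadium"]
    golds = ["select count(*) from singer", "SELECT avg(age) FROM singer"]
    print(exact_match_accuracy(preds, golds))  # prints 0.5
```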
@@ -651,13 +651,13 @@ We warmly invite more individuals to join us and actively engage in various aspe

Before submitting your code, please ensure that it is formatted according to the black style by using the following command:
```bash
poetry run black dbgpt_hub
black dbgpt_hub
```

If you have more time for more thorough type checking and style checking of your code, use the following commands:
```bash
poetry run pyright dbgpt_hub
poetry run pylint dbgpt_hub
pyright dbgpt_hub
pylint dbgpt_hub
```

If you have any questions or need further assistance, don't hesitate to reach out. We appreciate your involvement!
15 changes: 8 additions & 7 deletions README.zh.md
@@ -379,8 +379,9 @@ git clone https://github.com/eosphoros-ai/DB-GPT-Hub.git
cd DB-GPT-Hub
conda create -n dbgpt_hub python=3.10
conda activate dbgpt_hub
pip install poetry
poetry install
cd src/dbgpt_hub_sql
pip install -e .
```

### 3.2 Data Preparation
@@ -391,7 +392,7 @@ DB-GPT-Hub prepares data using the information-matching generation method, i.e. combining table inf
For the data preprocessing part, simply **run the following script**:
```bash
## generate train and dev (eval) data
poetry run sh dbgpt_hub_sql/scripts/gen_train_eval_data.sh
sh dbgpt_hub_sql/scripts/gen_train_eval_data.sh
```
In the directory `dbgpt_hub_sql/data/`, you will find the newly generated training file example_text2sql_train.json and test file example_text2sql_dev.json, containing 8,659 and 1,034 entries respectively. For the data used in subsequent fine-tuning, set the `file_name` parameter in dbgpt_hub_sql/data/dataset_info.json to the file name of the training set, e.g. example_text2sql_train.json.

@@ -504,7 +505,7 @@ start_evaluate(evaluate_args)
QLoRA fine-tuning is the default; run the command:

```bash
poetry run sh dbgpt_hub_sql/scripts/train_sft.sh
sh dbgpt_hub_sql/scripts/train_sft.sh
```
After fine-tuning, the model weights are saved by default under the adapter folder, i.e. in the dbgpt_hub_sql/output/adapter directory.
**If you train on multiple GPUs and want to use DeepSpeed**, change the default contents of train_sft.sh accordingly,
@@ -571,7 +572,7 @@ deepspeed --include localhost:3,4 dbgpt_hub_sql/train/sft_train.py \
The directory `output/pred/` under `./dbgpt_hub_sql/` in the project is the default output location for model prediction results (create it if it does not exist).
Run prediction with:
```bash
poetry run sh ./dbgpt_hub_sql/scripts/predict_sft.sh
sh ./dbgpt_hub_sql/scripts/predict_sft.sh
```
By default the script includes the `--quantization_bit` parameter for QLoRA prediction; removing it switches to the LoRA prediction method.
The parameter `predicted_input_filename` is the dataset file to predict on, and `--predicted_out_filename` is the file name for the model's prediction results. Results are saved in the `dbgpt_hub_sql/output/pred` directory by default.
@@ -583,7 +584,7 @@ poetry run sh ./dbgpt_hub_sql/scripts/predict_sft.sh
#### 3.5.1 Merging the Model and Fine-Tuned Weights
If you need to merge the weights of the trained base model and the fine-tuned PEFT module to export a complete model, run the following model export script:
```bash
poetry run sh ./dbgpt_hub_sql/scripts/export_merge.sh
sh ./dbgpt_hub_sql/scripts/export_merge.sh
```
Be sure to replace the parameter path values in the script with the paths corresponding to your project.

@@ -593,7 +594,7 @@ poetry run sh ./dbgpt_hub_sql/scripts/export_merge.sh
Run the following command:

```bash
poetry run python dbgpt_hub_sql/eval/evaluation.py --plug_value --input Your_model_pred_file
python dbgpt_hub_sql/eval/evaluation.py --plug_value --input Your_model_pred_file
```
You can find our latest evaluation and experiment results [here](docs/eval_llm_result.md).
**Note**: The database used by the default code is the 95 MB database downloaded from the [Spider official website](https://yale-lily.github.io/spider). If you need to use the 1.27 GB Spider-based database from [test-suite](https://github.com/taoyds/test-suite-sql-eval), download it to a custom directory first and add a parameter like `--db Your_download_db_path` to the evaluation command above.
