
[17:28:39] [Model]: Loading declare-lab/flan-alpaca-large... #62

Open
Sosycs opened this issue Sep 29, 2023 · 3 comments

Comments

@Sosycs

Sosycs commented Sep 29, 2023

Thanks for the great work!
I am trying to run the training from this paper on Google Colab with a 166 GB disk and a T4 GPU, but at the training stage, for both rationale generation and answer inference, I get this output:

2023-09-29 17:27:49.955571: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
args Namespace(data_root='/content/mm-cot/data', output_dir='/content/mm-cot/experiments', model='declare-lab/flan-alpaca-large', options=['A', 'B', 'C', 'D', 'E'], epoch=50, lr=5e-05, bs=2, input_len=512, output_len=512, eval_bs=4, eval_acc=None, train_split='train', val_split='val', test_split='test', use_generate=True, final_eval=False, user_msg='rationale', img_type='vit', eval_le=None, test_le=None, evaluate_dir=None, caption_file='data/instruct_captions.json', use_caption=True, prompt_format='QCM-E', seed=42)
====Input Arguments====
{
  "data_root": "/content/mm-cot/data",
  "output_dir": "/content/mm-cot/experiments",
  "model": "declare-lab/flan-alpaca-large",
  "options": [
    "A",
    "B",
    "C",
    "D",
    "E"
  ],
  "epoch": 50,
  "lr": 5e-05,
  "bs": 2,
  "input_len": 512,
  "output_len": 512,
  "eval_bs": 4,
  "eval_acc": null,
  "train_split": "train",
  "val_split": "val",
  "test_split": "test",
  "use_generate": true,
  "final_eval": false,
  "user_msg": "rationale",
  "img_type": "vit",
  "eval_le": null,
  "test_le": null,
  "evaluate_dir": null,
  "caption_file": "data/instruct_captions.json",
  "use_caption": true,
  "prompt_format": "QCM-E",
  "seed": 42
}
img_features size:  torch.Size([11208, 145, 1024])
number of train problems: 12726

number of val problems: 4241

number of test problems: 4241

[17:28:39] [Model]: Loading declare-lab/flan-alpaca-large... 

Then the cell stops and the experiments folder is empty. Can anyone explain what the problem is? (I am still new to the field.)

@cooelf
Contributor

cooelf commented Oct 15, 2023

Hi, did you try running a unit test to see whether a pre-trained model can be loaded via Hugging Face?

My guess is that there is not enough memory to load the model.

from transformers import T5ForConditionalGeneration

# You may also try changing "declare-lab/flan-alpaca-large" to "declare-lab/flan-alpaca-base" to see if it works.
model = T5ForConditionalGeneration.from_pretrained("declare-lab/flan-alpaca-large")

@Sosycs
Author

Sosycs commented Oct 17, 2023

I ran a unit test by loading the model from Hugging Face:

(…)an-alpaca-large/resolve/main/config.json: 100%
787/787 [00:00<00:00, 56.3kB/s]
model.safetensors: 100%
3.13G/3.13G [00:16<00:00, 261MB/s]
(…)arge/resolve/main/generation_config.json: 100%
142/142 [00:00<00:00, 13.6kB/s]

I changed the model from large to base, but I encountered the same result:

2023-10-17 16:48:58.529434: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
args Namespace(data_root='/content/mm-cot/data', output_dir='/content/mm-cot/experiments', model='declare-lab/flan-alpaca-base', options=['A', 'B', 'C', 'D', 'E'], epoch=50, lr=5e-05, bs=2, input_len=512, output_len=512, eval_bs=4, eval_acc=None, train_split='train', val_split='val', test_split='test', use_generate=True, final_eval=False, user_msg='rationale', img_type='vit', eval_le=None, test_le=None, evaluate_dir=None, caption_file='data/instruct_captions.json', use_caption=True, prompt_format='QCM-E', seed=42)
====Input Arguments====
{
  "data_root": "/content/mm-cot/data",
  "output_dir": "/content/mm-cot/experiments",
  "model": "declare-lab/flan-alpaca-base",
  "options": [
    "A",
    "B",
    "C",
    "D",
    "E"
  ],
  "epoch": 50,
  "lr": 5e-05,
  "bs": 2,
  "input_len": 512,
  "output_len": 512,
  "eval_bs": 4,
  "eval_acc": null,
  "train_split": "train",
  "val_split": "val",
  "test_split": "test",
  "use_generate": true,
  "final_eval": false,
  "user_msg": "rationale",
  "img_type": "vit",
  "eval_le": null,
  "test_le": null,
  "evaluate_dir": null,
  "caption_file": "data/instruct_captions.json",
  "use_caption": true,
  "prompt_format": "QCM-E",
  "seed": 42
}
img_features size:  torch.Size([11208, 145, 1024])
number of train problems: 12726

number of val problems: 4241

number of test problems: 4241

[16:49:05] [Model]: Loading declare-lab/flan-alpaca-base...    

I am using a Google Colab T4 runtime with high RAM.
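As a sanity check, a stdlib-only snippet like the following can confirm how much physical RAM the runtime actually has (a minimal sketch, assuming a Linux runtime such as Colab; `os.sysconf` with these names is not available on all platforms):

```python
import os

def total_ram_gib() -> float:
    """Return total physical RAM in GiB (Linux/glibc only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    num_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * num_pages / (1024 ** 3)

print(f"Total RAM: {total_ram_gib():.1f} GiB")
```

A standard Colab instance reports roughly 12 GiB here, while a high-RAM instance reports more, so this quickly shows whether the high-RAM runtime is in effect.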

@cooelf
Contributor

cooelf commented May 19, 2024

The hang may actually be expected: after loading the model, the main process could still be processing the data (there is no log message indicating that model loading has completed).
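One way to distinguish a true hang from a long silent phase is to wrap each slow step in a small timing/logging helper so that every phase prints when it starts and finishes (a minimal sketch; the step names and the `time.sleep` placeholder are illustrative, not part of the mm-cot code):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_step(name: str):
    """Log when a step starts and how long it took, so a silent phase
    (e.g. tokenizing the dataset after the model loads) is visible."""
    print(f"[{time.strftime('%H:%M:%S')}] {name}: start", flush=True)
    t0 = time.time()
    yield
    print(f"[{time.strftime('%H:%M:%S')}] {name}: done in {time.time() - t0:.1f}s", flush=True)

# Usage (the body is a placeholder for a slow call such as
# T5ForConditionalGeneration.from_pretrained(...)):
with timed_step("Loading declare-lab/flan-alpaca-base"):
    time.sleep(0.1)
```

If the "done" line for model loading appears but training output never does, the process is most likely still busy with data preparation rather than stuck on the model.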
