
[17:28:39] [Model]: Loading declare-lab/flan-alpaca-large... #62

Open
Sosycs opened this issue Sep 29, 2023 · 3 comments

Comments

@Sosycs

Sosycs commented Sep 29, 2023

Thanks for the great work!
I am trying to run the training from this paper on Google Colab with a 166 GB disk and a T4 GPU, but at the training stage, for both rationale generation and answer inference, I get this output:

2023-09-29 17:27:49.955571: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
args Namespace(data_root='/content/mm-cot/data', output_dir='/content/mm-cot/experiments', model='declare-lab/flan-alpaca-large', options=['A', 'B', 'C', 'D', 'E'], epoch=50, lr=5e-05, bs=2, input_len=512, output_len=512, eval_bs=4, eval_acc=None, train_split='train', val_split='val', test_split='test', use_generate=True, final_eval=False, user_msg='rationale', img_type='vit', eval_le=None, test_le=None, evaluate_dir=None, caption_file='data/instruct_captions.json', use_caption=True, prompt_format='QCM-E', seed=42)
====Input Arguments====
{
  "data_root": "/content/mm-cot/data",
  "output_dir": "/content/mm-cot/experiments",
  "model": "declare-lab/flan-alpaca-large",
  "options": [
    "A",
    "B",
    "C",
    "D",
    "E"
  ],
  "epoch": 50,
  "lr": 5e-05,
  "bs": 2,
  "input_len": 512,
  "output_len": 512,
  "eval_bs": 4,
  "eval_acc": null,
  "train_split": "train",
  "val_split": "val",
  "test_split": "test",
  "use_generate": true,
  "final_eval": false,
  "user_msg": "rationale",
  "img_type": "vit",
  "eval_le": null,
  "test_le": null,
  "evaluate_dir": null,
  "caption_file": "data/instruct_captions.json",
  "use_caption": true,
  "prompt_format": "QCM-E",
  "seed": 42
}
img_features size:  torch.Size([11208, 145, 1024])
number of train problems: 12726

number of val problems: 4241

number of test problems: 4241

[17:28:39] [Model]: Loading declare-lab/flan-alpaca-large... 

Then the cell stops and the experiments folder is empty. Can anyone explain what the problem is? (I am still new to the field.)

@cooelf
Contributor

cooelf commented Oct 15, 2023

Hi, did you try running a unit test to see whether a pre-trained model can be loaded via Hugging Face?

My guess is that there is not enough memory to load the model.

from transformers import T5ForConditionalGeneration

# You may also try changing "declare-lab/flan-alpaca-large" to "declare-lab/flan-alpaca-base" to see if it works.
model = T5ForConditionalGeneration.from_pretrained("declare-lab/flan-alpaca-large")

@Sosycs
Author

Sosycs commented Oct 17, 2023

I ran a unit test by loading the model from Hugging Face:

(…)an-alpaca-large/resolve/main/config.json: 100%
787/787 [00:00<00:00, 56.3kB/s]
model.safetensors: 100%
3.13G/3.13G [00:16<00:00, 261MB/s]
(…)arge/resolve/main/generation_config.json: 100%
142/142 [00:00<00:00, 13.6kB/s]

I changed the model from large to base, but I encountered the same result:

2023-10-17 16:48:58.529434: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
args Namespace(data_root='/content/mm-cot/data', output_dir='/content/mm-cot/experiments', model='declare-lab/flan-alpaca-base', options=['A', 'B', 'C', 'D', 'E'], epoch=50, lr=5e-05, bs=2, input_len=512, output_len=512, eval_bs=4, eval_acc=None, train_split='train', val_split='val', test_split='test', use_generate=True, final_eval=False, user_msg='rationale', img_type='vit', eval_le=None, test_le=None, evaluate_dir=None, caption_file='data/instruct_captions.json', use_caption=True, prompt_format='QCM-E', seed=42)
====Input Arguments====
{
  "data_root": "/content/mm-cot/data",
  "output_dir": "/content/mm-cot/experiments",
  "model": "declare-lab/flan-alpaca-base",
  "options": [
    "A",
    "B",
    "C",
    "D",
    "E"
  ],
  "epoch": 50,
  "lr": 5e-05,
  "bs": 2,
  "input_len": 512,
  "output_len": 512,
  "eval_bs": 4,
  "eval_acc": null,
  "train_split": "train",
  "val_split": "val",
  "test_split": "test",
  "use_generate": true,
  "final_eval": false,
  "user_msg": "rationale",
  "img_type": "vit",
  "eval_le": null,
  "test_le": null,
  "evaluate_dir": null,
  "caption_file": "data/instruct_captions.json",
  "use_caption": true,
  "prompt_format": "QCM-E",
  "seed": 42
}
img_features size:  torch.Size([11208, 145, 1024])
number of train problems: 12726

number of val problems: 4241

number of test problems: 4241

[16:49:05] [Model]: Loading declare-lab/flan-alpaca-base...    

I am using a Google Colab T4 runtime with high RAM.
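As a sanity check, a stdlib-only snippet like the following can confirm how much physical RAM the runtime actually has (a minimal sketch, assuming a Linux runtime such as Colab; `os.sysconf` with these names is not available on all platforms):

```python
import os

def total_ram_gib() -> float:
    """Return total physical RAM in GiB (Linux/glibc only)."""
    page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per memory page
    num_pages = os.sysconf("SC_PHYS_PAGES")  # number of physical pages
    return page_size * num_pages / (1024 ** 3)

print(f"Total RAM: {total_ram_gib():.1f} GiB")
```

A standard Colab instance reports roughly 12 GiB here, while a high-RAM instance reports more, so this quickly shows whether the high-RAM runtime is in effect.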

@cooelf
Contributor

cooelf commented May 19, 2024

The hang may actually be expected: after loading the model, the main process could still be processing the data (there is no log message indicating that model loading has completed).
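One way to distinguish a true hang from a long silent phase is to wrap each slow step in a small timing/logging helper so that every phase prints when it starts and finishes (a minimal sketch; the step names and the `time.sleep` placeholder are illustrative, not part of the mm-cot code):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_step(name: str):
    """Log when a step starts and how long it took, so a silent phase
    (e.g. tokenizing the dataset after the model loads) is visible."""
    print(f"[{time.strftime('%H:%M:%S')}] {name}: start", flush=True)
    t0 = time.time()
    yield
    print(f"[{time.strftime('%H:%M:%S')}] {name}: done in {time.time() - t0:.1f}s", flush=True)

# Usage (the body is a placeholder for a slow call such as
# T5ForConditionalGeneration.from_pretrained(...)):
with timed_step("Loading declare-lab/flan-alpaca-base"):
    time.sleep(0.1)
```

If the "done" line for model loading appears but training output never does, the process is most likely still busy with data preparation rather than stuck on the model.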
