RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory

(trl) [alex@compute-od-gpu-st-p4d-24xlarge-205 trl]$ accelerate launch --config_file configs/fsdp_config_local.yaml test_trl_accelerate.py Reusing dataset imdb (/home/alex/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1) Reusing dataset imdb (/home/alex/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1) Parameter 'function'=<function <lambda> at 0x7f589f413d30> of the transform [email protected] couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed. 0%| | 0/25 [00:00<?, ?ba/s]Parameter 'function'=<function <lambda> at 0x7f6e27aa0d30> of the transform [email protected] couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed. 0%| | 0/25 [00:00<?, ?ba/s]Reusing dataset imdb (/home/alex/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1) Reusing dataset imdb (/home/alex/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1) Parameter 'function'=<function <lambda> at 0x7f1c6fc74d30> of the transform [email protected] couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed. 0%| | 0/25 [00:00<?, ?ba/s]Parameter 'function'=<function <lambda> at 0x7f3decf69d30> of the transform [email protected] couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed. 0%| | 0/25 [00:00<?, ?ba/s]Reusing dataset imdb (/home/alex/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1) Parameter 'function'=<function <lambda> at 0x7fd92c634d30> of the transform [email protected] couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed. 0%| | 0/25 [00:00<?, ?ba/s]Reusing dataset imdb (/home/alex/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1) Parameter 'function'=<function <lambda> at 0x7fb7d634ed30> of the transform [email protected] couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed. 0%| | 0/25 [00:00<?, ?ba/s]Reusing dataset imdb (/home/alex/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1) Parameter 'function'=<function <lambda> at 0x7f70ae93bd30> of the transform [email protected] couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed. 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 247.47ba/s]

Reusing dataset imdb (/home/alex/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1) 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 248.12ba/s] Parameter 'function'=<function <lambda> at 0x7f3d2ca4ad30> of the transform [email protected] couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed. 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 243.81ba/s] 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 250.15ba/s] 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 241.42ba/s] 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 244.92ba/s] 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 241.98ba/s] /home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/pipelines/text_classification.py:89: UserWarning: return_all_scores is now deprecated, use top_k=1 if you want similar functionnality warnings.warn( /home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/pipelines/text_classification.py:89: UserWarning: return_all_scores is now deprecated, use top_k=1 if you want similar functionnality warnings.warn( /home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/pipelines/text_classification.py:89: UserWarning: return_all_scores is now deprecated, use top_k=1 if you want similar functionnality warnings.warn( /home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/pipelines/text_classification.py:89: UserWarning: return_all_scores is now deprecated, use top_k=1 if you want similar functionnality warnings.warn( /home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/pipelines/text_classification.py:89: UserWarning: return_all_scores is now deprecated, use top_k=1 if you want similar functionnality warnings.warn( /home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/pipelines/text_classification.py:89: UserWarning: return_all_scores is now deprecated, use top_k=1 if you want similar functionnality warnings.warn( /home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/pipelines/text_classification.py:89: UserWarning: return_all_scores is now deprecated, use top_k=1 if you want similar functionnality warnings.warn( /home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/pipelines/text_classification.py:89: UserWarning: return_all_scores is now deprecated, use top_k=1 if you want similar functionnality warnings.warn( DEVICE: cuda:5 DEVICE: cuda:1 DEVICE: cuda:4 DEVICE: cuda:7 DEVICE: cuda:6 DEVICE: cuda:2 DEVICE: cuda:3 DEVICE: cuda:0 huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:

Avoid using tokenizers before the fork if possible
Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) wandb: Currently logged in as: dahoas. Use wandb login --relogin to force relogin huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:
Avoid using tokenizers before the fork if possible
Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:
Avoid using tokenizers before the fork if possible
Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks... To disable this warning, you can either:
Avoid using tokenizers before the fork if possible
Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false) wandb: Tracking run with wandb version 0.12.21 wandb: Run data is saved locally in /home/alex/trl/wandb/run-20220725_151337-3dxy07t9 wandb: Run wandb offline to turn off syncing. wandb: Syncing run trl-test wandb: ⭐️ View project at https://wandb.ai/dahoas/trl-test wandb: 🚀 View run at https://wandb.ai/dahoas/trl-test/runs/3dxy07t9 Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.6.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'v_head.summary.weight', 'transformer.h.8.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.3.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.2.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'v_head.summary.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.9.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'v_head.summary.weight', 'transformer.h.0.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.10.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.7.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'v_head.summary.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['v_head.summary.bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'v_head.summary.weight', 'transformer.h.8.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.3.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.5.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'v_head.summary.weight', 'v_head.summary.bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.10.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['v_head.summary.weight', 'transformer.h.4.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.0.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.10.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'v_head.summary.weight', 'transformer.h.8.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.1.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.2.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'v_head.summary.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.9.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'v_head.summary.weight', 'transformer.h.0.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.10.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.6.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'v_head.summary.weight', 'transformer.h.8.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.3.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['v_head.summary.bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'v_head.summary.weight', 'transformer.h.8.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.3.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.7.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'v_head.summary.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.5.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'v_head.summary.weight', 'v_head.summary.bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.10.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['v_head.summary.weight', 'transformer.h.4.attn.masked_bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.8.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.0.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. 0%| | 0/24895 [00:00<?, ?ex/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1168 > 1024). Running this sequence through the model will result in indexing errors 0%| | 0/24895 [00:00<?, ?ex/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1168 > 1024). Running this sequence through the model will result in indexing errors 0%| | 0/24895 [00:00<?, ?ex/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1168 > 1024). Running this sequence through the model will result in indexing errors 0%|▌ | 104/24895 [00:00<00:23, 1038.64ex/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1168 > 1024). Running this sequence through the model will result in indexing errors 1%|█▎ | 226/24895 [00:00<00:21, 1142.10ex/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1168 > 1024). Running this sequence through the model will result in indexing errors 1%|██ | 359/24895 [00:00<00:20, 1206.24ex/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1168 > 1024). Running this sequence through the model will result in indexing errors 2%|███▍ | 597/24895 [00:00<00:20, 1165.65ex/s]Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.10.attn.masked_bias', 'transformer.h.3.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'v_head.summary.weight', 'transformer.h.8.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.6.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'transformer.h.1.attn.masked_bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. 5%|██████▊ | 1167/24895 [00:01<00:20, 1137.49ex/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1168 > 1024). Running this sequence through the model will result in indexing errors 10%|██████████████▍ | 2498/24895 [00:02<00:19, 1171.65ex/s]Token indices sequence length is longer than the specified maximum sequence length for this model (1168 > 1024). Running this sequence through the model will result in indexing errors 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24895/24895 [00:22<00:00, 1128.46ex/s] 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24895/24895 [00:22<00:00, 1131.39ex/s] 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24895/24895 [00:21<00:00, 1131.74ex/s] 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24895/24895 [00:21<00:00, 1134.49ex/s] 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24895/24895 [00:22<00:00, 1127.52ex/s] 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24895/24895 [00:22<00:00, 1102.52ex/s] 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24895/24895 [00:22<00:00, 1131.45ex/s] 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24895/24895 [00:22<00:00, 1130.70ex/s] NUM EPOCHS: 313 0it [00:00, ?it/s]Traceback (most recent call last): Traceback (most recent call last): File "test_trl_accelerate.py", line 141, in <module> File "test_trl_accelerate.py", line 141, in <module> Traceback (most recent call last): Traceback (most recent call last): Traceback (most recent call last): File "test_trl_accelerate.py", line 141, in <module> File "test_trl_accelerate.py", line 141, in <module> Traceback (most recent call last): Traceback (most recent call last): File "test_trl_accelerate.py", line 141, in <module> File "test_trl_accelerate.py", line 141, in <module> File "test_trl_accelerate.py", line 141, in <module> response = gpt2_model.generate(query_tensors[i].unsqueeze(dim=0),response = gpt2_model.generate(query_tensors[i].unsqueeze(dim=0),

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context response = gpt2_model.generate(query_tensors[i].unsqueeze(dim=0),response = gpt2_model.generate(query_tensors[i].unsqueeze(dim=0),

response = gpt2_model.generate(query_tensors[i].unsqueeze(dim=0),response = gpt2_model.generate(query_tensors[i].unsqueeze(dim=0), File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context

response = gpt2_model.generate(query_tensors[i].unsqueeze(dim=0), File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context

0it [00:05, ?it/s] File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(*args, **kwargs) return func(*args, **kwargs) File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1320, in generate

    return func(*args, **kwargs)return func(*args, **kwargs)  File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1320, in generate

return func(*args, **kwargs)

return func(*args, **kwargs)return func(*args, **kwargs) File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1320, in generate File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1320, in generate File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1320, in generate

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1320, in generate File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1320, in generate return self.sample( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1938, in sample return self.sample( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1938, in sample return self.sample(return self.sample(

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1938, in sample File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1938, in sample return self.sample( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1938, in sample return self.sample(return self.sample(

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1938, in sample File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1938, in sample outputs = self( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl outputs = self( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl outputs = self(outputs = self(

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl outputs = self( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl outputs = self( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl outputs = self( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/home/alex/trl/trl/gpt2.py", line 109, in forward return forward_call(*input, **kwargs) File "/home/alex/trl/trl/gpt2.py", line 109, in forward return forward_call(*input, **kwargs)return forward_call(*input, **kwargs)

File "/home/alex/trl/trl/gpt2.py", line 109, in forward File "/home/alex/trl/trl/gpt2.py", line 109, in forward return forward_call(*input, **kwargs) return forward_call(*input, **kwargs) File "/home/alex/trl/trl/gpt2.py", line 109, in forward File "/home/alex/trl/trl/gpt2.py", line 109, in forward transformer_outputs = self.transformer( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl transformer_outputs = self.transformer( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl transformer_outputs = self.transformer(transformer_outputs = self.transformer(

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) return forward_call(*input, **kwargs) File "/home/alex/trl/trl/gpt2.py", line 109, in forward File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 917, in forward return forward_call(*input, **kwargs) File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 917, in forward return forward_call(*input, **kwargs)return forward_call(*input, **kwargs)

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 917, in forward File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 917, in forward transformer_outputs = self.transformer( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl transformer_outputs = self.transformer( File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl hidden_states = self.ln_f(hidden_states)hidden_states = self.ln_f(hidden_states)

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl hidden_states = self.ln_f(hidden_states) File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl return forward_call(*input, **kwargs) File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 917, in forward return forward_call(*input, **kwargs)return forward_call(*input, **kwargs)

File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward return forward_call(*input, **kwargs) File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 917, in forward hidden_states = self.ln_f(hidden_states)return forward_call(*input, **kwargs)