(trl) [alex@compute-od-gpu-st-p4d-24xlarge-205 trl]$ accelerate launch --config_file configs/fsdp_config_local.yaml test_trl_accelerate.py
Reusing dataset imdb (/home/alex/.cache/huggingface/datasets/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1)
Parameter 'function'=<function <lambda> at 0x7f589f413d30> of the transform [email protected] couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
[the "Reusing dataset imdb" message and the hashing warning above are printed once by each of the eight launched processes, interleaved with one another]
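The hashing warning is triggered by passing a lambda to `Dataset.map`/`Dataset.filter`: lambdas cannot be pickled for the fingerprinting cache. A generic sketch of the usual remedy, using a named function; the actual transform in test_trl_accelerate.py is not shown in this log, so the filter criterion below is purely illustrative:

```python
from datasets import load_dataset

ds = load_dataset("imdb", split="train")

# A named, module-level function can be hashed with pickle/dill, so the results
# of map/filter are fingerprinted and cached across runs instead of being recomputed.
def keep_long_reviews(example):
    return len(example["text"]) > 200  # illustrative threshold, not taken from the log

ds = ds.filter(keep_long_reviews)
```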
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 248.12ba/s]
[similar 25/25 completion bars follow from the other processes]
/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/pipelines/text_classification.py:89: UserWarning: `return_all_scores` is now deprecated, use `top_k=1` if you want similar functionnality
  warnings.warn(
[the same UserWarning is emitted once by each of the eight processes]
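The deprecation warning refers to the `return_all_scores` argument of the text-classification pipeline. A minimal sketch of the suggested replacement, assuming the sentiment pipeline used for rewards is called roughly like this (the exact call and model in test_trl_accelerate.py are not shown in this log):

```python
from transformers import pipeline

# Assumed reward/sentiment pipeline; the model name is illustrative, not taken from this log.
sentiment_pipe = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

texts = ["This movie was surprisingly good."]

# Old style that triggers the UserWarning:
#   sentiment_pipe(texts, return_all_scores=True)
# Replacement suggested by the warning: top_k=1 keeps only the best label per text,
# while top_k=None returns a score for every label (like return_all_scores=True).
outputs = sentiment_pipe(texts, top_k=None)
```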
DEVICE: cuda:5
DEVICE: cuda:1
DEVICE: cuda:4
DEVICE: cuda:7
DEVICE: cuda:6
DEVICE: cuda:2
DEVICE: cuda:3
DEVICE: cuda:0
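The `DEVICE: cuda:N` lines appear to be one print per rank. A minimal sketch of how such a line is typically produced with Accelerate; the actual print statement in the script is an assumption:

```python
from accelerate import Accelerator

accelerator = Accelerator()
# Each of the eight launched processes is pinned to its own GPU, so across the ranks
# this prints DEVICE: cuda:0 through DEVICE: cuda:7 (in arbitrary order).
print("DEVICE:", accelerator.device)
```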
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
[the same tokenizers warning is printed by several of the forked processes]
wandb: Currently logged in as: dahoas (use `wandb login --relogin` to force relogin)
wandb: Run `wandb offline` to turn off syncing.
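As the warning itself says, the fork-related tokenizers message can be silenced by setting the environment variable before any fast tokenizer is used; where exactly this would go in test_trl_accelerate.py is an assumption:

```python
import os

# Must run before the first tokenizer call in the parent process; alternatively,
# export TOKENIZERS_PARALLELISM=false in the shell that launches accelerate.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```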
wandb: Syncing run trl-test
wandb: ⭐️ View project at https://wandb.ai/dahoas/trl-test
wandb: 🚀 View run at https://wandb.ai/dahoas/trl-test/runs/3dxy07t9
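The wandb lines correspond to a run named trl-test being synced to the dahoas/trl-test project. A minimal sketch of the kind of call that produces them; the actual logging setup in the script is an assumption:

```python
import wandb

# Entity, project, and run name are read off the URLs printed in the log above.
wandb.init(entity="dahoas", project="trl-test", name="trl-test")
```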
Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at lvwerra/gpt2-imdb and are newly initialized: ['transformer.h.6.attn.masked_bias', 'transformer.h.7.attn.masked_bias', 'transformer.h.9.attn.masked_bias', 'transformer.h.10.attn.masked_bias', 'transformer.h.5.attn.masked_bias', 'v_head.summary.bias', 'transformer.h.11.attn.masked_bias', 'transformer.h.1.attn.masked_bias', 'transformer.h.4.attn.masked_bias', 'transformer.h.0.attn.masked_bias', 'v_head.summary.weight', 'transformer.h.8.attn.masked_bias', 'transformer.h.2.attn.masked_bias', 'transformer.h.3.attn.masked_bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[this message and the line after it are repeated for every GPT2HeadWithValueModel load across the eight processes; only the ordering of the listed weights differs between copies]
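The repeated message reflects models being built from the lvwerra/gpt2-imdb checkpoint; the value head has no pretrained weights, hence the newly initialized `v_head.summary.*` entries. A minimal sketch of the loading implied here, using the trl module path that also appears later in the traceback; the rest of the script is an assumption:

```python
from trl.gpt2 import GPT2HeadWithValueModel

# Policy model and a reference copy, as in the trl PPO examples. Both loads emit the
# "newly initialized" warning because v_head.summary.{weight,bias} do not exist in the
# pretrained GPT-2 checkpoint.
gpt2_model = GPT2HeadWithValueModel.from_pretrained("lvwerra/gpt2-imdb")
gpt2_model_ref = GPT2HeadWithValueModel.from_pretrained("lvwerra/gpt2-imdb")
```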
  0%|          | 0/24895 [00:00<?, ?ex/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (1168 > 1024). Running this sequence through the model will result in indexing errors
[the same token-length warning is printed by each process while it tokenizes the 24895 examples, and one further copy of the GPT2HeadWithValueModel "newly initialized" message from above also appears here]
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24895/24895 [00:22<00:00, 1128.46ex/s]
[similar 24895/24895 completion bars follow from the other seven processes]
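The 1168 > 1024 warning means at least one tokenized IMDB review exceeds GPT-2's 1024-token context. How the script handles this is not visible in the log; the usual guard is to truncate at tokenization time, roughly as sketched below (the field name and max_length are assumptions):

```python
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("lvwerra/gpt2-imdb")

def tokenize(example):
    # Truncating at the model's context size avoids the indexing errors the warning
    # refers to when the sequence is later fed through the model.
    return tokenizer(example["text"], truncation=True, max_length=1024)
```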
NUM EPOCHS: 313
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "test_trl_accelerate.py", line 141, in <module>
    response = gpt2_model.generate(query_tensors[i].unsqueeze(dim=0),
  File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1320, in generate
    return self.sample(
  File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/generation_utils.py", line 1938, in sample
    outputs = self(
  File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/alex/trl/trl/gpt2.py", line 109, in forward
    transformer_outputs = self.transformer(
  File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/alex/.envs/trl/lib64/python3.8/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 917, in forward
    hidden_states = self.ln_f(hidden_states)
  File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/alex/.envs/trl/lib64/python3.8/site-packages/torch/nn/modules/normalization.py", line 189, in forward
0it [00:05, ?it/s]
[the same traceback is raised in each of the eight processes, with the frames interleaved across ranks; the captured log ends here, before the final exception line]