Welcome to the model-recycling page

Hardly anyone trains from scratch anymore; we all finetune a pretrained model.

Research is slowly reaching a consensus that some finetuned models are better base models than the pretrained models themselves.

This site presents a dynamic view of the best models to choose for a given model size and architecture. We follow the findings and methodology from our paper: we download the finetuned models found on HuggingFace for each architecture and efficiently rank them on a representative task. We then evaluate the top-ranked models by finetuning each of them on a large set of 36 target tasks, and report the average performance of each base model.
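The two-stage procedure described above (a cheap ranking on one representative task, then a full evaluation of the shortlist averaged over the target tasks) can be sketched roughly as follows. This is only an illustration of the idea, not the code used for this site: the function names, the model names, and all scores below are hypothetical placeholders.

```python
from statistics import mean


def top_k_by_proxy(proxy_scores, k):
    """Return the k model names with the highest score on the cheap proxy task."""
    return sorted(proxy_scores, key=proxy_scores.get, reverse=True)[:k]


def average_over_tasks(per_task_scores):
    """Map each model name to the mean of its scores across the target tasks."""
    return {model: mean(scores) for model, scores in per_task_scores.items()}


if __name__ == "__main__":
    # Hypothetical proxy-task scores for three candidate finetuned models.
    proxy = {"model-a": 0.61, "model-b": 0.74, "model-c": 0.58}
    shortlist = top_k_by_proxy(proxy, k=2)

    # Hypothetical target-task scores; the real evaluation uses 36 tasks.
    full_eval = {"model-a": [70.0, 80.0], "model-b": [72.0, 84.0]}
    averages = average_over_tasks({m: full_eval[m] for m in shortlist})
    print(shortlist, averages)
```

Only the shortlisted models go through the expensive second stage, which is what keeps the ranking affordable as the number of candidates grows.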

Tested so far: 2685 (and counting)

Best models per architecture

Pretrained	Best model	Avg.	Pretrained Avg.	Ranking
roberta-base	ibm/ColD-Fusion	78.47	76.22	link
bert-base-uncased	ibm/ColD-Fusion-bert-base-uncased-itr23-seed0	75.64	72.20	link
bert-base-cased	skim945/bert-finetuned-squad	74.43	72.43	link
t5-base	adit94/nlpcharade	78.23	75.45	link
google/t5-v1_1-base	shaiman12/flan-t5-base-samsum	78.18	68.82	link
microsoft/deberta-v3-base	sileod/deberta-v3-base-tasksource-nli	80.73	79.04	link
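To read the table as gains over the pretrained baseline, the short sketch below subtracts "Pretrained Avg." from "Avg." for each row. The numbers are copied from the table above; the names `ROWS` and `gains` are our own and exist only for this illustration.

```python
# (pretrained model, best finetuned model, Avg., Pretrained Avg.) from the table above
ROWS = [
    ("roberta-base", "ibm/ColD-Fusion", 78.47, 76.22),
    ("bert-base-uncased", "ibm/ColD-Fusion-bert-base-uncased-itr23-seed0", 75.64, 72.20),
    ("bert-base-cased", "skim945/bert-finetuned-squad", 74.43, 72.43),
    ("t5-base", "adit94/nlpcharade", 78.23, 75.45),
    ("google/t5-v1_1-base", "shaiman12/flan-t5-base-samsum", 78.18, 68.82),
    ("microsoft/deberta-v3-base", "sileod/deberta-v3-base-tasksource-nli", 80.73, 79.04),
]


def gains(rows):
    """Return the average-score gain of the best model over its pretrained base."""
    return {pre: round(avg - pre_avg, 2) for pre, _best, avg, pre_avg in rows}


if __name__ == "__main__":
    # Print architectures from largest to smallest gain.
    for name, gain in sorted(gains(ROWS).items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: +{gain:.2f}")
```

In this snapshot the gain is largest for google/t5-v1_1-base and smallest for microsoft/deberta-v3-base.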

To learn more, see our FAQ or read the paper. Detailed evaluation results for each architecture are available here. If you have any feedback or questions, please contact us.

This work was performed at IBM Research by Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim and Yoav Katz.