Ranking and performance of all 140 ranked google_t5-v1_1-base models (full table). The top 45 models were fully tested.
Notes:
- The baseline results can be found here
- While the average improvement is small, many datasets show large gains
rank | model_name | avg | mnli_lp | 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
baseline | google/t5-v1_1-base | 68.82 | nan | 82.88 | 88.18 | 66.91 | 38.06 | 65.57 | 55.45 | 70.18 | 40.50 | 70.77 | 85.58 | 66.74 | 92.99 | 71.06 | 75.51 | 72.83 | 56.14 | 68.08 | 89.37 | 83.60 | 86.05 | 60.58 | 93.72 | 51.84 | 68.79 | 93.25 | 82.07 | 33.46 | 75.61 | 51.52 | 67.62 | 82.61 | 69.88 | 55.84 | 46.90 | 48.32 | 69.26 |
1 | shaiman12/flan-t5-base-samsum | 78.18 | 84.77 | 86.92 | 89.87 | 66.60 | 52.91 | 82.29 | 80.36 | 80.35 | 67.00 | 76.53 | 90.20 | 86.40 | 93.26 | 72.95 | 87.19 | 89.46 | 62.27 | 84.62 | 93.39 | 89.49 | 90.06 | 85.56 | 94.38 | 57.92 | 89.60 | 97.40 | 92.60 | 46.85 | 80.86 | 48.62 | 74.62 | 83.84 | 70.95 | 70.06 | 56.34 | 69.23 | 73.43 |
2 | shaiman12/flan-t5-base-samsum | 78.18 | 0.00 | 86.92 | 89.87 | 66.60 | 52.91 | 82.29 | 80.36 | 80.35 | 67.00 | 76.53 | 90.20 | 86.40 | 93.26 | 72.95 | 87.19 | 89.46 | 62.27 | 84.62 | 93.39 | 89.49 | 90.06 | 85.56 | 94.38 | 57.92 | 89.60 | 97.40 | 92.60 | 46.85 | 80.86 | 48.62 | 74.62 | 83.84 | 70.95 | 70.06 | 56.34 | 69.23 | 73.43 |
3 | emozilla/flan-t5-base-sat-reading-comprehension | 78.07 | 85.01 | 87.11 | 89.97 | 86.90 | 52.75 | 82.29 | 78.57 | 80.35 | 69.00 | 76.83 | 52.75 | 46.49 | 93.27 | 72.82 | 93.45 | 87.50 | 62.27 | 88.46 | 86.64 | 89.71 | 89.59 | 56.34 | 67.14 | 93.46 | 89.61 | 97.80 | 92.40 | 80.93 | 50.91 | 76.28 | 83.95 | 70.74 | 87.10 | 69.75 | 90.01 | 64.42 | 72.87 |
4 | google/flan-t5-base | 77.98 | 84.92 | 86.22 | 89.67 | 67.12 | 51.97 | 82.32 | 78.57 | 80.15 | 75.00 | 77.67 | 90.95 | 85.40 | 93.32 | 72.43 | 87.25 | 89.46 | 62.38 | 82.69 | 92.79 | 89.77 | 89.02 | 84.84 | 94.38 | 57.29 | 89.48 | 97.20 | 92.80 | 46.85 | 80.23 | 54.98 | 76.66 | 84.30 | 70.64 | 70.06 | 56.34 | 53.85 | 73.40 |
5 | shri07/babi_qa | 77.87 | 84.92 | 87.28 | 89.93 | 87.30 | 53.09 | 82.17 | 76.79 | 81.02 | 71.00 | 76.27 | 53.09 | 47.24 | 93.24 | 73.27 | 93.36 | 85.54 | 62.25 | 86.54 | 84.48 | 89.79 | 89.31 | 60.56 | 67.20 | 94.38 | 89.57 | 97.60 | 92.20 | 81.56 | 49.26 | 72.58 | 83.49 | 70.92 | 87.40 | 69.75 | 90.26 | 59.62 | 74.00 |
6 | andreaparker/flan-t5-base-samsum | 77.86 | 84.76 | 86.43 | 89.83 | 67.10 | 52.59 | 82.17 | 80.36 | 80.54 | 66.00 | 76.50 | 90.89 | 86.70 | 93.04 | 71.64 | 87.25 | 88.73 | 62.13 | 91.35 | 93.30 | 89.14 | 89.59 | 84.48 | 93.58 | 56.97 | 89.37 | 97.40 | 93.00 | 46.33 | 81.63 | 51.48 | 74.74 | 84.77 | 69.88 | 67.87 | 56.34 | 57.69 | 72.30 |
7 | talhaa/flant5 | 77.86 | 84.74 | 87.07 | 89.53 | 67.14 | 52.19 | 82.84 | 78.57 | 80.15 | 70.00 | 77.27 | 90.70 | 84.90 | 93.51 | 72.49 | 87.48 | 86.27 | 61.84 | 87.50 | 93.12 | 90.72 | 89.68 | 85.92 | 93.81 | 56.56 | 89.44 | 97.40 | 91.60 | 47.05 | 80.51 | 52.59 | 74.87 | 84.77 | 71.76 | 68.81 | 56.34 | 55.77 | 72.63 |
8 | mrm8488/flan-t5-base-finetuned-gsm8k | 77.84 | 84.55 | 86.62 | 89.57 | 66.84 | 53.12 | 82.69 | 78.57 | 79.96 | 67.00 | 75.93 | 90.36 | 87.40 | 93.31 | 72.29 | 87.93 | 88.97 | 61.96 | 86.54 | 93.08 | 90.07 | 89.21 | 84.84 | 94.15 | 56.70 | 89.40 | 97.20 | 90.80 | 47.34 | 81.77 | 49.76 | 78.32 | 85.35 | 71.16 | 69.44 | 56.34 | 55.77 | 72.63 |
9 | ybagoury/flan-t5-base-tldr_news | 77.83 | 84.80 | 86.79 | 89.90 | 66.70 | 51.44 | 81.96 | 76.79 | 81.21 | 70.00 | 77.23 | 90.98 | 87.90 | 93.43 | 73.01 | 87.17 | 87.75 | 61.82 | 84.62 | 93.34 | 90.29 | 89.49 | 83.75 | 94.27 | 57.15 | 89.67 | 97.20 | 92.80 | 47.40 | 80.37 | 50.00 | 75.77 | 83.72 | 71.26 | 68.81 | 56.34 | 58.65 | 72.97 |
10 | spacemanidol/flan-t5-base-cnndm | 77.75 | 84.22 | 85.91 | 89.83 | 66.98 | 51.38 | 81.93 | 80.36 | 80.54 | 64.00 | 76.63 | 90.38 | 84.70 | 93.20 | 72.82 | 86.65 | 89.22 | 61.61 | 89.42 | 93.34 | 89.85 | 89.49 | 80.51 | 94.04 | 55.66 | 88.89 | 97.80 | 91.40 | 46.51 | 80.79 | 50.64 | 75.51 | 84.42 | 70.10 | 68.65 | 56.34 | 66.35 | 73.30 |
Download full models ranking table: csv
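
To see which datasets account for the gains mentioned in the notes, the table can be parsed and each model compared with the baseline row. The snippet below is a minimal sketch, not the site's own tooling: the helper names `parse_table` and `gains` are hypothetical, the first column is assumed to be labelled `rank`, and the inline excerpt copies only four columns of two rows from the table above.

```python
# Excerpt of the ranking table: four columns, two rows, copied from above.
# The "rank" label for the first column is an assumption.
EXCERPT = """
rank | model_name | avg | boolq | copa | imdb
---|---|---|---|---|---
baseline | google/t5-v1_1-base | 68.82 | 65.57 | 40.50 | 92.99
4 | google/flan-t5-base | 77.98 | 82.32 | 75.00 | 93.32
"""


def parse_table(text):
    """Parse a pipe-delimited table into {model_name: {column: float}}."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = {}
    for line in lines[1:]:
        if set(line) <= set("-|"):  # skip the ---|--- separator line
            continue
        cells = [cell.strip() for cell in line.split("|")]
        record = dict(zip(header, cells))
        rows[record["model_name"]] = {
            col: float(val)  # float("nan") is accepted for missing scores
            for col, val in record.items()
            if col not in ("rank", "model_name")
        }
    return rows


def gains(table, model, baseline="google/t5-v1_1-base"):
    """Per-dataset score difference (model minus baseline), largest first."""
    diff = {
        col: round(table[model][col] - table[baseline][col], 2)
        for col in table[model]
        if col != "avg"  # keep only individual datasets
    }
    return dict(sorted(diff.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    table = parse_table(EXCERPT)
    for dataset, gain in gains(table, "google/flan-t5-base").items():
        print(f"{dataset:10s} {gain:+.2f}")
```

The same functions should work on the full table if it is saved as pipe-delimited text with a matching header, since the column names are read from the header row.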