This page contains the baseline scores obtained by finetuning the pretrained google_t5-v1_1-base model on the 36 tasks, aggregated over 20 runs with different random initializations.
statistic | avg | 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mean | 68.82 | 82.88 | 88.18 | 66.91 | 38.06 | 65.57 | 55.45 | 70.18 | 40.50 | 70.77 | 85.58 | 66.74 | 92.99 | 71.06 | 75.51 | 72.83 | 56.14 | 68.08 | 89.37 | 83.60 | 86.05 | 60.58 | 93.72 | 51.84 | 68.79 | 93.25 | 82.07 | 33.46 | 75.61 | 51.52 | 67.62 | 82.61 | 69.88 | 55.84 | 46.90 | 48.32 | 69.26 |
std | 3.03 | 18.29 | 1.96 | 0.31 | 8.24 | 12.47 | 30.90 | 3.20 | 18.54 | 15.35 | 15.33 | 28.95 | 0.90 | 6.36 | 19.60 | 18.57 | 2.18 | 23.38 | 10.08 | 8.94 | 9.78 | 8.16 | 1.48 | 9.51 | 30.15 | 19.01 | 23.22 | 13.37 | 15.15 | 2.78 | 11.19 | 3.37 | 4.69 | 13.97 | 13.86 | 17.86 | 15.83 |
Download full repetitions table: csv
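
For readers who want to reproduce the mean and std rows from per-run scores, the aggregation can be sketched roughly as follows. This is a minimal illustration under assumptions, not the page's actual pipeline: the function `aggregate_runs`, the task names `task_a` and `task_b`, and the toy scores are all hypothetical, the `avg` column is assumed to be the per-run average across tasks, and the sample standard deviation is used because the page does not say whether it reports the sample or the population variant.

```python
from statistics import mean, stdev

def aggregate_runs(runs):
    """Aggregate per-run scores into a per-column mean and std.

    `runs` is a list of dicts mapping task name -> score, one dict per run.
    (Hypothetical layout; the real repetitions table may be organized differently.)
    """
    tasks = sorted(runs[0])
    # Assumption: 'avg' is each run's average over all tasks.
    columns = {"avg": [mean([run[t] for t in tasks]) for run in runs]}
    for t in tasks:
        columns[t] = [run[t] for run in runs]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return {
        name: {"mean": round(mean(vals), 2), "std": round(stdev(vals), 2)}
        for name, vals in columns.items()
    }

# Toy, made-up scores for three hypothetical runs (not taken from this page).
toy_runs = [
    {"task_a": 80.0, "task_b": 60.0},
    {"task_a": 82.0, "task_b": 64.0},
    {"task_a": 84.0, "task_b": 62.0},
]
summary = aggregate_runs(toy_runs)
print(summary)
```

Under this reading, the std in the `avg` column measures how much the overall average varies from run to run, which is why it can be much smaller than the std of an individual task.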