This page reports baseline scores for fine-tuning the pretrained t5-base model on 36 tasks. Each score is aggregated over 20 runs that differ in their random initialization.
| | avg | 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mean | 75.45 | 85.12 | 89.42 | 66.54 | 47.05 | 76.66 | 75.54 | 81.91 | 49.65 | 76.41 | 89.72 | 85.30 | 92.33 | 71.28 | 83.80 | 85.66 | 60.28 | 74.42 | 90.38 | 88.94 | 88.61 | 73.68 | 93.84 | 55.55 | 85.31 | 97.21 | 92.33 | 44.88 | 79.51 | 52.74 | 73.74 | 84.03 | 70.21 | 67.19 | 55.35 | 60.00 | 71.59 |
std | 1.32 | 1.70 | 0.47 | 0.45 | 4.21 | 5.89 | 19.57 | 0.64 | 4.58 | 1.16 | 2.59 | 1.29 | 1.14 | 1.25 | 8.47 | 4.78 | 1.52 | 26.17 | 8.39 | 4.78 | 0.37 | 6.02 | 0.46 | 0.59 | 9.02 | 2.66 | 0.70 | 2.82 | 9.72 | 1.06 | 8.74 | 0.57 | 0.39 | 3.91 | 2.33 | 5.01 | 0.39 |
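
As a rough illustration of how the mean and std rows above can be derived from per-run scores, here is a minimal sketch. The run data is made up for the example and is not taken from this page. The `aggregate` helper and the `runs` layout are hypothetical. Two further assumptions are labeled in the comments: that the `avg` column is each run's mean over tasks, and that the standard deviation is the sample (n - 1) version. The page does not state either.

```python
from statistics import mean, stdev

# Hypothetical per-run scores (made-up numbers, NOT taken from the table above).
# Each dict is one fine-tuning run (one random initialization): task -> score.
runs = [
    {"rte": 70.0, "cola": 82.0, "wic": 66.0},
    {"rte": 74.0, "cola": 81.0, "wic": 68.0},
    {"rte": 78.0, "cola": 83.0, "wic": 67.0},
]


def aggregate(runs):
    """Return {column: (mean, std)} across runs, including an 'avg' column."""
    tasks = sorted(runs[0])
    # Assumption: 'avg' is each run's mean over all tasks, then aggregated
    # across runs like any other column.
    columns = {"avg": [mean(run[t] for t in tasks) for run in runs]}
    for t in tasks:
        columns[t] = [run[t] for run in runs]
    # Assumption: sample standard deviation (n - 1); the page does not say
    # whether the sample or population formula was used.
    return {name: (mean(vals), stdev(vals)) for name, vals in columns.items()}


summary = aggregate(runs)
for name, (m, s) in summary.items():
    print(f"{name}: mean={m:.2f} std={s:.2f}")
```

With real data, `runs` would be filled from the per-run table instead of the toy values. Computing `avg` per run first is what lets its spread across runs be smaller than the spread of any single noisy task.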
Download the full table of per-run results: csv