This page contains the baseline scores obtained by finetuning the microsoft_deberta-v3-base pretrained model on each of the 36 tasks, aggregated over 20 runs with different random initializations.
| | avg | 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | 79.04 | 86.41 | 90.44 | 66.86 | 58.78 | 82.99 | 75.00 | 86.57 | 58.40 | 79.43 | 91.93 | 84.48 | 94.49 | 71.86 | 89.78 | 89.20 | 62.26 | 86.73 | 93.51 | 91.79 | 90.42 | 82.35 | 95.06 | 56.98 | 90.28 | 97.76 | 91.02 | 46.19 | 83.95 | 56.21 | 79.82 | 85.06 | 71.80 | 71.21 | 70.21 | 64.09 | 72.03 |
| std | 0.29 | 0.39 | 0.24 | 0.35 | 1.38 | 4.61 | 5.43 | 0.80 | 6.28 | 0.40 | 0.20 | 1.60 | 0.21 | 0.76 | 0.25 | 0.65 | 2.45 | 2.68 | 0.28 | 0.30 | 0.53 | 1.54 | 0.44 | 1.23 | 0.76 | 0.38 | 0.76 | 0.97 | 0.62 | 1.92 | 2.17 | 0.58 | 0.83 | 1.06 | 5.51 | 1.40 | 0.35 |
Download the full repetitions table: csv
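The mean and std rows above are per-task aggregates over the 20 repetitions. A minimal sketch of that aggregation, using hypothetical per-run scores for a single task (the actual CSV layout may differ):

```python
import statistics

# Hypothetical accuracies for one task, one value per random initialization.
runs = [90.1, 90.5, 90.3, 90.6, 90.4]

mean = statistics.mean(runs)    # average over repetitions
std = statistics.stdev(runs)    # sample standard deviation over repetitions

print(f"mean={mean:.2f} std={std:.2f}")
```

With 20 repetitions per task, the same two calls are applied to each task's column of the full repetitions table, and the `avg` column is the mean of the per-task scores within each run.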