This page contains the baseline scores obtained by finetuning the bert-base-uncased pretrained model on the 36 tasks, aggregated over 20 runs with different random initializations. The avg column is the mean of the 36 per-task scores.
| | avg | 20_newsgroup | ag_news | amazon_reviews_multi | anli | boolq | cb | cola | copa | dbpedia | esnli | financial_phrasebank | imdb | isear | mnli | mrpc | multirc | poem_sentiment | qnli | qqp | rotten_tomatoes | rte | sst2 | sst_5bins | stsb | trec_coarse | trec_fine | tweet_ev_emoji | tweet_ev_emotion | tweet_ev_hate | tweet_ev_irony | tweet_ev_offensive | tweet_ev_sentiment | wic | wnli | wsc | yahoo_answers |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mean | 72.20 | 83.05 | 89.59 | 65.92 | 46.95 | 68.96 | 64.38 | 81.83 | 49.45 | 78.16 | 89.70 | 68.53 | 91.58 | 69.07 | 83.73 | 81.99 | 59.97 | 66.68 | 89.88 | 90.27 | 84.85 | 59.98 | 91.97 | 52.80 | 85.86 | 96.06 | 68.33 | 36.01 | 79.91 | 52.85 | 67.76 | 85.37 | 69.48 | 63.25 | 50.56 | 62.12 | 72.32 |
| std | 0.55 | 0.43 | 0.29 | 0.31 | 0.52 | 1.20 | 10.01 | 0.49 | 5.36 | 0.67 | 1.33 | 10.67 | 0.14 | 0.46 | 0.23 | 1.61 | 1.40 | 0.90 | 0.75 | 0.54 | 0.47 | 2.04 | 0.42 | 0.56 | 0.45 | 0.63 | 2.83 | 0.60 | 0.65 | 1.20 | 1.41 | 0.63 | 0.73 | 1.63 | 6.41 | 4.55 | 0.24 |
Download the full repetitions table: csv
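
The sketch below shows one way the mean/std rows above could be recomputed from the downloaded per-repetition CSV. It is a rough illustration, not the exact aggregation script: the file name is hypothetical, and the column layout (one row per run, one numeric column per task) is an assumption.

```python
import pandas as pd

# Hypothetical file name; use the CSV linked above.
# Assumed layout: one row per run, one numeric score column per task.
runs = pd.read_csv("bert-base-uncased_repetitions.csv")

# Keep only the numeric per-task score columns.
scores = runs.select_dtypes(include="number")

# Per-task mean and standard deviation across the 20 runs.
summary = pd.DataFrame({"mean": scores.mean(), "std": scores.std()}).round(2)

# "avg" is the mean over the 36 per-task scores; one plausible convention for
# its std is the spread of the per-run averages (assumption, not confirmed here).
per_run_avg = scores.mean(axis=1)

print(f"avg: {per_run_avg.mean():.2f} +/- {per_run_avg.std():.2f}")
print(summary)
```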