Vulnerability or defect detection is a major problem in software engineering, and many new ML models have recently been proposed to solve it. To help research in this area, we recently released the D2A dataset, which is built from Infer Static Analyzer bug reports on real-world C programs. D2A goes one step further and performs a differential analysis: it compares the before-fix version (potential vulnerability) with the after-fix version (fix of the vulnerability) and labels an issue in the before-fix version as a likely real bug when that issue no longer appears after the fix. Through this leaderboard we explore multiple ways to solve the vulnerability detection problem by identifying the real vulnerabilities among the many candidates generated by static analysis.
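As a rough sketch of the differential-labeling idea (the function and field names below are hypothetical, not the actual D2A pipeline), an issue reported by Infer in the before-fix version is labeled a likely real bug when it no longer appears in the after-fix version:

```python
# Sketch of the differential-labeling idea behind D2A (illustrative only;
# identifiers and matching logic here are hypothetical, not the D2A implementation).

def label_before_issues(before_issues, after_issues):
    """Assign a label to each Infer issue found in the before-fix version.

    Label 1 (likely real bug): the issue disappears in the after-fix version.
    Label 0 (likely false positive): the issue still appears after the fix.
    """
    return {issue: int(issue not in after_issues) for issue in before_issues}


# Example: two issues reported before the fix, one of which survives the fix.
before = {"BUFFER_OVERRUN_L1@parse.c:42", "NULL_DEREFERENCE@util.c:10"}
after = {"NULL_DEREFERENCE@util.c:10"}
print(label_before_issues(before, after))
# -> {'BUFFER_OVERRUN_L1@parse.c:42': 1, 'NULL_DEREFERENCE@util.c:10': 0}
```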
From the D2A dataset we have extracted three types of data:
1. Infer bug reports (Trace)
2. Bug function source code (Function)
3. Bug function source code, trace function source code, and bug function file URL (Code)
From these three types of data we have created four real-bug prediction tasks (an illustrative sketch of the per-task inputs follows this list):
1. Code + Trace
2. Trace
3. Code
4. Function
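For illustration only, the example below sketches the kind of record each task sees and which fields each task uses; the field names are hypothetical and do not reflect the exact D2A schema.

```python
# Illustrative record and per-task inputs (field names are hypothetical;
# see the D2A documentation for the actual schema).
example = {
    "label": 1,                      # 1 = real bug, 0 = false positive
    "trace": "BUFFER_OVERRUN_L1 ... Infer bug report text describing the trace ...",
    "bug_function": "static int parse_header(buf_t *b) { ... }",
    "trace_functions": ["int read_chunk(buf_t *b, size_t n) { ... }"],
    "bug_file_url": "<URL of the file containing the bug function>",
}

# Fields consumed by each of the four prediction tasks (illustrative mapping).
task_inputs = {
    "Code + Trace": ["trace", "bug_function", "trace_functions", "bug_file_url"],
    "Trace":        ["trace"],
    "Code":         ["bug_function", "trace_functions", "bug_file_url"],
    "Function":     ["bug_function"],
}
```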
Since the Function dataset is balanced, we use accuracy to evaluate its results. For the other, unbalanced datasets we use F1 and AUROC. More details on the data and the metrics can be found here.
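A minimal sketch of computing these metrics from predicted probabilities with scikit-learn (the 0.5 decision threshold is an assumption for illustration, not necessarily what the official evaluation script uses):

```python
# Minimal metric sketch using scikit-learn; the 0.5 threshold is an assumption.
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0]              # gold labels (1 = real bug)
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]  # predicted probability of label 1
y_pred = [int(p >= 0.5) for p in y_prob] # hard labels for accuracy / F1

print("Accuracy:", accuracy_score(y_true, y_pred))  # balanced Function task
print("F1:", f1_score(y_true, y_pred))              # unbalanced tasks
print("AUROC:", roc_auc_score(y_true, y_prob))      # unbalanced tasks, threshold-free
```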
To correctly identify a defect or vulnerability, we believe a model must be able to extract features from different types of data. And since most vulnerabilities span multiple functions, it is also important for the model to perform inter-procedural feature extraction. The Trace data combines natural language and source code, while the Code data contains the source code of the multiple functions that appear in the trace. This leaderboard should demonstrate how well models detect the real vulnerabilities on each of these tasks.
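As one illustration (not the approach used by the models listed below), the trace text and the source of the functions it mentions could be concatenated into a single sequence before tokenization for the Code + Trace task; the separator token and ordering are assumptions:

```python
# Illustrative preprocessing for the Code + Trace task: join the trace text with
# the bug function and the other functions it mentions. The separator token and
# ordering are assumptions, not the preprocessing used by the listed baselines.
def build_code_trace_input(trace_text, bug_function, trace_functions, sep="</s>"):
    parts = [trace_text, bug_function, *trace_functions]
    return f" {sep} ".join(parts)


model_input = build_code_trace_input(
    "BUFFER_OVERRUN_L1: offset may exceed the buffer size ...",
    "static int parse_header(buf_t *b) { ... }",
    ["int read_chunk(buf_t *b, size_t n) { ... }"],
)
print(model_input)
```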
Overall leaderboard across all four tasks (the Overall Score is the average of the seven per-task metrics):

| Rank | Model | Team | Organization | Date | Code + Trace F1 | Code + Trace AUC | Trace F1 | Trace AUC | Code F1 | Code AUC | Function Accuracy | Overall Score (Average) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | AugSA-S | AI4VA | IBM Research | 03/26/2021 | 63.4 | 83.6 | 61.1 | 81.2 | 65.8 | 85.2 | 55.2 | 70.8 |
| 2 | C-BERT | AI4VA | IBM Research | 03/26/2021 | 66.1 | 81.7 | 62.4 | 80.4 | 62.4 | 80.2 | 60.2 | 70.5 |
| 3 | AugSA-V | AI4VA | IBM Research | 03/26/2021 | 64.3 | 85.0 | 61.3 | 80.2 | 65.2 | 85.7 | 45.6 | 69.6 |
| - | - | ML4CSec | Imperial College London | 11/10/2021 | - | - | - | - | - | - | 62.3 | - |
| - | - | ML4CSec | Imperial College London | 11/10/2021 | - | - | - | - | - | - | 60.7 | - |
Code + Trace task:

| Rank | Model | Team | Organization | Date | F1 | AUC | Average |
|---|---|---|---|---|---|---|---|
| 1 | AugSA-V | AI4VA | IBM Research | 03/26/2021 | 64.3 | 85.0 | 74.6 |
| 2 | C-BERT | AI4VA | IBM Research | 03/26/2021 | 66.1 | 81.7 | 73.9 |
| 3 | AugSA-S | AI4VA | IBM Research | 03/26/2021 | 63.4 | 83.6 | 73.5 |
Trace task:

| Rank | Model | Team | Organization | Date | F1 | AUC | Average |
|---|---|---|---|---|---|---|---|
| 1 | C-BERT | AI4VA | IBM Research | 03/26/2021 | 62.4 | 80.4 | 71.4 |
| 2 | AugSA-S | AI4VA | IBM Research | 03/26/2021 | 61.1 | 81.2 | 71.1 |
| 3 | AugSA-V | AI4VA | IBM Research | 03/26/2021 | 61.3 | 80.2 | 70.7 |
Code task:

| Rank | Model | Team | Organization | Date | F1 | AUC | Average |
|---|---|---|---|---|---|---|---|
| 1 | AugSA-S | AI4VA | IBM Research | 03/26/2021 | 65.8 | 85.2 | 75.5 |
| 2 | AugSA-V | AI4VA | IBM Research | 03/26/2021 | 65.2 | 85.7 | 75.4 |
| 3 | C-BERT | AI4VA | IBM Research | 03/26/2021 | 62.4 | 80.2 | 71.3 |
Function task:

| Rank | Model | Team | Organization | Date | Accuracy |
|---|---|---|---|---|---|
| 1 | - | ML4CSec | Imperial College London | 11/10/2021 | 62.3 |
| 2 | - | ML4CSec | Imperial College London | 11/10/2021 | 60.7 |
| 3 | C-BERT | AI4VA | IBM Research | 03/26/2021 | 60.2 |
| 4 | AugSA-S | AI4VA | IBM Research | 03/26/2021 | 55.2 |
| 5 | AugSA-V | AI4VA | IBM Research | 03/26/2021 | 45.6 |
The expected output is a probability score for each example in the test/dev set; the score is the probability that the example has label 1 (a minimal sketch of this output format appears at the end of this section). Once your model is fully trained, you can check its performance on the dev set using the evaluation script here. To get its performance on the test set, follow the steps below:
Your email must include:
We recommend that your email also include:
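For reference, here is a minimal sketch of producing the per-example probability scores described above, one score per line (the file name and line format are assumptions; follow the released evaluation script for the exact expected format):

```python
# Illustrative only: write one P(label = 1) per line for the dev/test examples.
# The file name and formatting are assumptions, not the official submission format.
probabilities = [0.91, 0.12, 0.73]  # hypothetical model outputs, one per example

with open("dev_predictions.txt", "w") as f:
    for p in probabilities:
        f.write(f"{p:.6f}\n")
```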