https://github.com/cqfn/aibolit

A Static Analyzer for Java Powered by Machine Learning: Identifies Anti-Patterns Begging for Refactoring

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (15.2%) to scientific vocabulary

Keywords

code-quality machine-learning machine-learning-algorithms python quality-control refactoring static-analysis

Keywords from Contributors

refactorings quality

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: cqfn
License: mit
Language: Java
Default Branch: master
Homepage: https://pypi.org/project/aibolit/
Size: 155 MB

Statistics

Stars: 87
Watchers: 7
Forks: 36
Open Issues: 78
Releases: 28

Topics

code-quality machine-learning machine-learning-algorithms python quality-control refactoring static-analysis

Created about 6 years ago · Last pushed 6 months ago

Metadata Files

Readme License

ML-Based Static Analyzer for Java

Learn how Aibolit works in our White Paper.

First, you install it (you must have Python 3.11+ and Pip installed):

    pip3 install aibolit~=1.3.0

To analyze your Java sources, located at src/java (for example), run:

    aibolit check --filenames src/java/File.java src/java/AnotherFile.java

    aibolit recommend --filenames src/java/File.java src/java/AnotherFile.java

Also, you can set a folder with Java files:

    aibolit recommend --folder src/java

This runs the recommendation function of the model (the model is located in aibolit/binary_files/model.pkl). The model finds the pattern that contributes the most to the Cyclomatic Complexity. If anything is found, you will see all recommendations for the patterns in question. The list of all patterns is in Patterns.md. The recommendations are printed to stdout.

The exit code reports the outcome: 0 means that none of the analyzed files has any issues, 1 means that at least one analyzed file has an issue, and 2 means that the program crashed.
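The exit codes described above make it easy to call the tool from a script. The following is a minimal sketch, not part of aibolit itself: the helper names `describe_exit_code` and `run_recommend` are hypothetical, and it assumes the `aibolit` executable is on the PATH.

```python
import subprocess

# Meanings of the exit codes as described in the text above.
EXIT_CODE_MEANINGS = {
    0: "no issues found in the analyzed files",
    1: "at least one analyzed file has an issue",
    2: "the program crashed",
}


def describe_exit_code(code: int) -> str:
    """Map an aibolit exit code to a human-readable message."""
    return EXIT_CODE_MEANINGS.get(code, f"unexpected exit code {code}")


def run_recommend(folder: str) -> tuple[int, str]:
    """Run `aibolit recommend --folder <folder>`; return (exit code, stdout)."""
    result = subprocess.run(
        ["aibolit", "recommend", "--folder", folder],
        capture_output=True,
        text=True,
        check=False,  # exit code 1 is a normal outcome here, not an error
    )
    return result.returncode, result.stdout
```

A caller could then run `code, report = run_recommend("src/java")` and print `describe_exit_code(code)` next to the report.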

You can suppress certain patterns by passing a comma-separated list of their codes. Suppressed patterns are ignored: they are left out of the report and their importance is set to 0.

    aibolit recommend --folder src/java --suppress=P12,P13

You can change the output format with the --format parameter. The default value is --format=compact.

    aibolit recommend --folder src/java --format=compact --full

The output lists the patterns sorted by importance in descending order and grouped by pattern name:

    Show all patterns
    Configuration.java score: 127.67642529949538
    Configuration.java[3840]: Var in the middle (P21: 30.95612931128819 1/4)
    Configuration.java[3844]: Var in the middle (P21: 30.95612931128819 1/4)
    Configuration.java[3848]: Var in the middle (P21: 30.95612931128819 1/4)
    Configuration.java[2411]: Null Assignment (P28: 10.76 2/4)
    Configuration.java[826]: Many primary constructors (P9: 10.76 3/4)
    Configuration.java[840]: Many primary constructors (P9: 10.76 3/4)
    Configuration.java[829]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[841]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[865]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[2586]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[3230]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[3261]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[3727]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[3956]: Partial synchronized (P14: 0.228 4/4)
    ErrorExample.java: error when calculating patterns: Can't count P1 metric:
    Total score: 127.67642529949538

(P21: 30.95612931128819 1/4) means the following:

    30.95612931128819 is the score of this pattern
    1 is the position of this pattern in the total list of patterns found in the file
    4 is the total number of found patterns
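The fields explained above can be pulled out of a report line with a regular expression. This is only a sketch based on the sample lines shown in this README; the `Finding` type and `parse_finding` function are hypothetical names, and the exact line format of other aibolit versions may differ.

```python
import re
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    file: str
    line: int
    name: str
    code: str
    score: float
    position: int  # position of the pattern among the patterns found in the file
    total: int     # total number of patterns found in the file


# Matches lines such as:
# Configuration.java[3840]: Var in the middle (P21: 30.95612931128819 1/4)
FINDING_RE = re.compile(
    r"^(?P<file>.+?)\[(?P<line>\d+)\]: (?P<name>.+) "
    r"\((?P<code>P\w+): (?P<score>[\d.]+) (?P<position>\d+)/(?P<total>\d+)\)$"
)


def parse_finding(text: str) -> Optional[Finding]:
    """Parse one report line; return None for lines that are not findings."""
    match = FINDING_RE.match(text.strip())
    if match is None:
        return None
    return Finding(
        file=match["file"],
        line=int(match["line"]),
        name=match["name"],
        code=match["code"],
        score=float(match["score"]),
        position=int(match["position"]),
        total=int(match["total"]),
    )
```

Lines that carry no finding, such as the per-file score line or the total score line, simply yield `None`.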

You can use --format=long. In this case all results are sorted by line number:

    Show all patterns
    Configuration.java: some issues found
    Configuration.java score: 127.67642529949538
    Configuration.java[826]: Many primary constructors (P9: 10.76 3/4)
    Configuration.java[829]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[840]: Many primary constructors (P9: 10.76 3/4)
    Configuration.java[841]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[865]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[2411]: Null Assignment (P28: 10.76 2/4)
    Configuration.java[2586]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[3230]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[3261]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[3727]: Partial synchronized (P14: 0.228 4/4)
    Configuration.java[3840]: Var in the middle (P21: 30.95612931128819 1/4)
    Configuration.java[3844]: Var in the middle (P21: 30.95612931128819 1/4)
    Configuration.java[3848]: Var in the middle (P21: 30.95612931128819 1/4)
    Configuration.java[3956]: Partial synchronized (P14: 0.228 4/4)
    ErrorExample.java: error when calculating patterns: Can't count P1 metric:
    MavenSlice.java: your code is perfect in aibolit's opinion
    Total score: 127.67642529949538
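The difference between the compact and long orderings can be illustrated on a few entries from the sample report. This sketch only mimics the ordering visible in the listings; it is not aibolit's own code, and the tuple layout and function names are assumptions made for illustration.

```python
# (line number, pattern name, pattern code, position in the importance ranking),
# copied from a subset of the Configuration.java sample report.
findings = [
    (826, "Many primary constructors", "P9", 3),
    (829, "Partial synchronized", "P14", 4),
    (840, "Many primary constructors", "P9", 3),
    (2411, "Null Assignment", "P28", 2),
    (3840, "Var in the middle", "P21", 1),
]


def compact_order(items):
    """Most important pattern first; entries of one pattern stay together."""
    return sorted(items, key=lambda f: (f[3], f[0]))


def long_order(items):
    """Plain ordering by line number."""
    return sorted(items, key=lambda f: f[0])


for line, name, code, position in compact_order(findings):
    print(f"Configuration.java[{line}]: {name} ({code}, position {position})")
```

Sorting by the ranking position rather than by the raw score keeps patterns with equal scores (P28 and P9 in the sample) in the same order as the report.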

You can also choose the xml format. The report contains the same information as in compact mode, but is produced as XML:

    <report>
      <score>127.67642529949538</score>
      <files>
        <file>
          <path>Configuration.java</path>
          <summary>Some issues found</summary>
          <score>127.67642529949538</score>
          <patterns>
            <pattern code="P13">
              <details>Null check</details>
              <lines>
                <number>294</number>
                <number>391</number>
              </lines>
              <score>30.95612931128819</score>
              <order>1/4</order>
            </pattern>
            <pattern code="P12">
              <details>Non final attribute</details>
              <lines>
                <number>235</number>
              </lines>
              <score>10.76</score>
              <order>2/4</order>
            </pattern>
            <pattern code="P21">
              <details>Var in the middle</details>
              <lines>
                <number>235</number>
              </lines>
              <score>2.056</score>
              <order>3/4</order>
            </pattern>
            <pattern code="P28">
              <details>Null Assignment</details>
              <lines>
                <number>2411</number>
              </lines>
              <score>0.228</score>
              <order>4/4</order>
            </pattern>
          </patterns>
        </file>
        <file>
          <path>ErrorExample.java</path>
          <summary>Error when calculating patterns: Can't count P1 metric:</summary>
        </file>
        <file>
          <path>MavenSlice.java</path>
          <summary>Your code is perfect in aibolit's opinion</summary>
        </file>
      </files>
    </report>
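An XML report of this shape can be read with Python's standard library. The sketch below assumes the element names shown in the sample report (`file`, `path`, `summary`, `pattern`, `score`, `number`); the `summarize` function and the shortened `SAMPLE` document are hypothetical and only serve as an illustration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample report, kept small for illustration.
SAMPLE = """
<report>
  <score>127.67642529949538</score>
  <files>
    <file>
      <path>Configuration.java</path>
      <summary>Some issues found</summary>
      <score>127.67642529949538</score>
      <patterns>
        <pattern code="P13">
          <details>Null check</details>
          <lines><number>294</number><number>391</number></lines>
          <score>30.95612931128819</score>
          <order>1/4</order>
        </pattern>
      </patterns>
    </file>
    <file>
      <path>MavenSlice.java</path>
      <summary>Your code is perfect in aibolit's opinion</summary>
    </file>
  </files>
</report>
"""


def summarize(xml_text: str) -> dict:
    """Return {file path: [(pattern code, score, [line numbers]), ...]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for file_el in root.iter("file"):
        patterns = []
        for pattern_el in file_el.iter("pattern"):
            lines = [int(n.text) for n in pattern_el.iter("number")]
            score = float(pattern_el.findtext("score"))
            patterns.append((pattern_el.get("code"), score, lines))
        result[file_el.findtext("path")] = patterns
    return result


for path, patterns in summarize(SAMPLE).items():
    print(path, patterns)
```

Files without a `patterns` element, such as files that failed to parse or have no findings, end up with an empty list.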

The score is the relative importance of the pattern (it has no fixed range). The larger the score, the more important the pattern. E.g., if several patterns are found, fix the pattern with the score 5.45 first:

    SampleTests.java[43]: Non final attribute (P12: 5.45 1/10)
    SampleTests.java[44]: Non final attribute (P12: 5.45 1/10)
    SampleTests.java[80]: Var in the middle (P21: 3.71 2/10)
    SampleTests.java[121]: Var in the middle (P21: 3.71 2/10)
    SampleTests.java[122]: Var declaration distance for 5 lines (P20_5: 2.13 3/10)
    SampleTests.java[41]: Non final class (P24: 1.95 4/10)
    SampleTests.java[59]: Force Type Casting (P5: 1.45 5/10)
    SampleTests.java[122]: Var declaration distance for 7 lines (P20_7: 1.07 6/10)
    SampleTests.java[122]: Var declaration distance for 11 lines (P20_11: 0.78 7/10)
    SampleTests.java[51]: Protected Method (P30: 0.60 8/10)
    SampleTests.java[52]: Super Method (P18: 0.35 9/10)
    SampleTests.java[100]: Partial synchronized (P14: 0.08 10/10)
    SampleTests.java[106]: Partial synchronized (P14: 0.08 10/10)
    SampleTests.java[113]: Partial synchronized (P14: 0.08 10/10)

The score per class is the sum of the scores of all its patterns:

    SampleTests.java score: 17.54698560768407

The total score is the average over all Java files in the project (the folder you've set to analyze):

    Total average score: 4.0801854775508914

If you compare the scores of two projects, the project with the higher score is the worse one.
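The arithmetic behind these scores can be checked by hand. In the SampleTests.java listing, the rounded scores of the ten distinct patterns add up to 17.57, close to the reported class score of 17.54698560768407, which suggests that each pattern is counted once no matter how many lines it occurs on. That reading is an assumption drawn from the sample, not a statement about aibolit's implementation, and the second project file used below is hypothetical.

```python
from statistics import mean

# Rounded scores of the distinct patterns from the SampleTests.java listing.
sample_tests = {
    "P12": 5.45, "P21": 3.71, "P20_5": 2.13, "P24": 1.95, "P5": 1.45,
    "P20_7": 1.07, "P20_11": 0.78, "P30": 0.60, "P18": 0.35, "P14": 0.08,
}


def file_score(pattern_scores: dict) -> float:
    """Score of one class: the sum of its pattern scores (assumed: once per pattern)."""
    return sum(pattern_scores.values())


def total_score(file_scores: list) -> float:
    """Score of a project: the average over its files."""
    return mean(file_scores)


# Hypothetical second file with a single low-importance pattern.
other_file = {"P14": 0.08}

scores = [file_score(sample_tests), file_score(other_file)]
print(round(scores[0], 2))           # sum of the rounded pattern scores
print(round(total_score(scores), 3))  # average over the two files
```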

The model is installed automatically with the aibolit package, but you can also try your own model:

    aibolit recommend --folder src/java --model /mnt/d/some_folder/model.pkl

You can get the full report with the --full option; then all patterns are included in the output:

    aibolit recommend --folder src/java --full

You can exclude files with the --exclude option. Pass glob patterns for the files to ignore:

    aibolit recommend --folder src/java \
        --exclude=**/*Test*.java --exclude=**/*Impl*.java
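To get a feel for which files such glob patterns would drop, the patterns can be tried out in Python first. This sketch uses the standard `fnmatch` module, whose matching rules may differ from the ones aibolit applies (for example, `*` in `fnmatch` also matches `/`), so treat it as an approximation; `filter_excluded` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def filter_excluded(paths, exclude_patterns):
    """Keep only the paths that match none of the exclude patterns."""
    return [
        path
        for path in paths
        if not any(fnmatchcase(path, pattern) for pattern in exclude_patterns)
    ]


paths = [
    "src/java/Foo.java",
    "src/java/FooTest.java",
    "src/java/BarImpl.java",
]
kept = filter_excluded(paths, ["**/*Test*.java", "**/*Impl*.java"])
print(kept)
```

With the two patterns from the command above, only `src/java/Foo.java` remains in `kept`.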

If you need help, run:

    aibolit recommend --help

How to retrain it?

The train command does the following:

Calculates patterns and metrics
Creates a dataset
Trains the model and saves it

The train command works only with a cloned git repository.

1. Clone the aibolit repository.
2. Go to cloned_aibolit_path.
3. Run pip install .
4. Set the environment variable: export HOME_AIBOLIT=cloned_aibolit_path (example for Linux).
5. Set the TARGET_FOLDER environment variable if you need to save all dataset files to another directory.
6. Specify the train and test datasets: set the HOME_TRAIN_DATASET environment variable for the train dataset and the HOME_TEST_DATASET environment variable for the test dataset.

   Usually, these files are in the scripts/target/08 directory after dataset collection (if you have not skipped it). But you can use your own datasets.

   Note that if you set TARGET_FOLDER, your dataset files will be in TARGET_FOLDER/target. In that case you need to set HOME_TRAIN_DATASET=TARGET_FOLDER\target\08\08-train.csv and HOME_TEST_DATASET=TARGET_FOLDER\target\08\08-test.csv.
7. If you need the model to be saved to your own directory, also set the SAVE_MODEL_FOLDER environment variable. Otherwise the model will be saved to cloned_aibolit_path/aibolit/binary_files/model.pkl.
8. If you need to use your own folder with Java files, use the --java_folder parameter; the default value is scripts/target/01 of the cloned aibolit repo.
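The environment variables from the steps above can be assembled in one place before starting the training. The following is a sketch under assumptions: `build_train_env` is a hypothetical helper, and the default dataset file names 08-train.csv and 08-test.csv under scripts/target/08 are inferred from the TARGET_FOLDER example rather than stated for the default case.

```python
from pathlib import PurePosixPath
from typing import Optional


def build_train_env(
    aibolit_home: str, target_folder: Optional[str] = None
) -> dict[str, str]:
    """Collect the environment variables described in the training steps."""
    if target_folder:
        # With TARGET_FOLDER set, dataset files live in TARGET_FOLDER/target.
        base = PurePosixPath(target_folder) / "target"
    else:
        # Assumed default: scripts/target inside the cloned repository.
        base = PurePosixPath(aibolit_home) / "scripts" / "target"
    env = {
        "HOME_AIBOLIT": aibolit_home,
        "HOME_TRAIN_DATASET": str(base / "08" / "08-train.csv"),
        "HOME_TEST_DATASET": str(base / "08" / "08-test.csv"),
    }
    if target_folder:
        env["TARGET_FOLDER"] = target_folder
    return env


for name, value in build_train_env("/opt/aibolit", "/data").items():
    print(f"export {name}={value}")
```

The resulting dictionary could be merged into `os.environ` and passed as the `env` argument of `subprocess.run` when invoking `aibolit train`.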

Or you can use our Docker image (a link will be added here soon).

Run the train pipeline:

    aibolit train --java_folder=src/java [--max_classes=100] [--dataset_file]

If you need to save the dataset with all calculated metrics to a different directory, use the dataset_file parameter:

    aibolit train --java_folder=src/java --dataset_file /mnt/d/new_dir/dataset.csv

You can skip dataset collection with the skip_collect_dataset parameter. In this case the model will be trained on a predefined dataset (see point 5):

    aibolit train --java_folder=src/java --skip_collect_dataset

How to contribute?

First, install the following packages if you don't have them:

    apt-get install ruby-dev libz-dev libxml2

This project does not include a virtual environment by default. If you're using one (e.g., .venv, venv), update the .xcop file to exclude it:

    --exclude=.venv/**

After forking and editing the repo, verify the build is clean by running:

    make

To build the white paper:

    cd wp
    latexmk -c && latexmk -pdf wp.tex

If everything is fine, submit a pull request.

Using Docker recommendation pipeline

    docker run --rm -it \
        -v <absolute_path_to_folder_with_classes>:/in \
        -v <absolute_path_to_out_dir>:/out \
        cqfn/aibolit-image

Owner

Name: CQFN
Login: cqfn
Kind: organization
Email: team@cqfn.org

Website: https://www.cqfn.org
Repositories: 22
Profile: https://github.com/cqfn

Code Quality Foundation

GitHub Events

Total

Create event: 3
Commit comment event: 56
Release event: 1
Issues event: 148
Watch event: 32
Delete event: 6
Issue comment event: 495
Push event: 1,392
Pull request review event: 205
Pull request review comment event: 176
Pull request event: 125
Fork event: 15

Last Year

Create event: 3
Commit comment event: 56
Release event: 1
Issues event: 148
Watch event: 32
Delete event: 6
Issue comment event: 495
Push event: 1,392
Pull request review event: 205
Pull request review comment event: 176
Pull request event: 125
Fork event: 15

Committers

Last synced: about 2 years ago

All Time

Total Commits: 1,561
Total Committers: 25
Avg Commits per committer: 62.44
Development Distribution Score (DDS): 0.712

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Evgeny Maslov	l**r@g**m	450
Yaroslav Kishchenko	y**o@g**m	247
Vitaly Protasov	i**o@y**u	183
Anton Cheshkov	a**v@h**m	114
Evgeny Maslov	e**5@c**m	81
Yegor Bugayenko	y**6@g**m	81
ALEXEY ZORCHENKOV	z**y@h**m	65
paulodamaso	p**o@g**m	57
lukyanoffpashok	l**k@y**u	55
silverCase	s**p@g**m	47
Anton Siluev	b**7@y**u	43
Anton	a**v@g**m	37
Vitaly-Protasov	y**u@e**m	31
Pavel Lukianov	p**1@c**m	18
andrey gusev	4****v	18
Vitaly-Protasov	4****v	10
Andrey Gusev	g**a@g**m	5
Evgeniy.Maslov	E**v@a**m	5
silverCase	5****e	4
dz-s	s**t@g**m	3
ALEXEY ZORCHENKOV	z**v@g**m	3
lyriccoder	E****v	1
Andrei Gusev	a**7@c**m	1
lukyanoffpashok	4****k	1
Alexey Zorchenkov	z**r@1**m	1

Committer Domains (Top 20 + Academic)

china.huawei.com: 3
huawei.com: 2
yandex.ru: 2
163.com: 1
artezio.com: 1
ya.ru: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 109
Total pull requests: 95
Average time to close issues: about 2 months
Average time to close pull requests: 15 days
Total issue authors: 14
Total pull request authors: 15
Average comments per issue: 1.83
Average comments per pull request: 4.21
Merged pull requests: 68
Bot issues: 0
Bot pull requests: 16

Past Year

Issues: 54
Pull requests: 38
Average time to close issues: 2 days
Average time to close pull requests: about 18 hours
Issue authors: 5
Pull request authors: 6
Average comments per issue: 1.63
Average comments per pull request: 4.47
Merged pull requests: 33
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

0pdd (42)
ivanovmg (32)
aravij (18)
yegor256 (11)
literally-bug-creator (10)
acheshkov (9)
lyriccoder (8)
AntonProkopyev (8)
MMenshikh (3)
DvrkRain (3)
KatGarmash (2)
g4s8 (2)
Vitaly-Protasov (2)
Error10556 (2)
iliyasone (1)

Pull Request Authors

ivanovmg (60)
AntonProkopyev (23)
aravij (18)
dependabot[bot] (17)
lyriccoder (13)
dependabot-preview[bot] (10)
acheshkov (4)
literally-bug-creator (4)
KachanovYev (3)
Error10556 (3)
Vitaly-Protasov (3)
DvrkRain (2)
MMenshikh (2)
newspec (2)
CAN4red (2)

Top Labels

Issue Labels

bug (44) good-title (32) pdd (11) help wanted (10) role/DEV (6) Extract method (6) good first issue (4) scope (3) enhancement (2) documentation (1) discussion (1) refactoring (1)

Pull Request Labels

dependencies (27) python (16) Extract method (7) role/REV (2) scope (1)

Dependencies

aibolit/metrics/cc/pom.xml maven

pmd:pmd 4.2.4

aibolit/metrics/halsteadvolume/pom.xml maven

commons-codec:commons-codec 1.11
org.eclipse.jdt:org.eclipse.jdt.core 3.20.0
junit:junit 4.12 test

aibolit/metrics/npath/pom.xml maven

pmd:pmd 4.2.4

aibolit/metrics/cc/requirements.txt pypi

bs4 ==0.0.1
lxml ==4.5.0

requirements.txt pypi

beautifulsoup4 ==4.8.2
bs4 ==0.0.1
cached-property ==1.2.0
catboost ==0.22
cchardet ==2.1.6
codecov ==2.0.15
coverage ==5.0.3
dataclasses ==0.7
deprecated ==1.2.10
flake8 ==3.7.9
javalang ==0.13.0
lxml ==4.5.0
matplotlib ==3.2.1
mypy ==0.770
networkx ==2.4
numpy ==1.18.1
pandas ==1.0.0
pebble ==4.5.3
scikit-learn ==0.23.2
scipy ==1.4.1
sphinx ==2.3.1
tqdm ==4.32.1
typing-extensions *

setup.py pypi
