https://github.com/cqfn/aibolit
A Static Analyzer for Java Powered by Machine Learning: Identifies Anti-Patterns Begging for Refactoring
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cqfn
- License: mit
- Language: Java
- Default Branch: master
- Homepage: https://pypi.org/project/aibolit/
- Size: 155 MB
Statistics
- Stars: 87
- Watchers: 7
- Forks: 36
- Open Issues: 78
- Releases: 28
Metadata Files
README.md
ML-Based Static Analyzer for Java
Learn how Aibolit works in our White Paper.
First, install it (Python 3.11+ and pip are required):
bash
pip3 install aibolit~=1.3.0
To analyze your Java sources, located at src/java (for example), run:
bash
aibolit check --filenames src/java/File.java src/java/AnotherFile.java
or
bash
aibolit recommend --filenames src/java/File.java src/java/AnotherFile.java
You can also pass a folder with Java files:
bash
aibolit recommend --folder src/java
This runs the recommendation function of the model (the model is located in
aibolit/binary_files/model.pkl).
The model finds the pattern whose contribution to Cyclomatic Complexity is
the largest.
If anything is found, you will see all recommendations for the mentioned
patterns.
You can see the list of all patterns in
Patterns.md.
The recommendations are printed to stdout.
If the program exits with code 0, none of the analyzed files has any
issues.
If the program exits with code 1, at least one analyzed file has an issue.
If the program exits with code 2, the program crashed.
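The exit codes above make it easy to call Aibolit from a script or a CI step. The following is a minimal sketch, not part of Aibolit itself: the helper names `describe_exit_code` and `run_recommend` are hypothetical, and it assumes the `aibolit` executable is on the PATH.

```python
import subprocess

# Exit-code meanings exactly as described in the text above.
EXIT_MEANINGS = {
    0: "no issues found in the analyzed files",
    1: "at least one analyzed file has an issue",
    2: "aibolit crashed",
}


def describe_exit_code(code: int) -> str:
    """Translate an aibolit exit code into a human-readable message."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_recommend(folder: str) -> tuple[int, str]:
    """Run `aibolit recommend --folder <folder>` and return (exit code, stdout)."""
    result = subprocess.run(
        ["aibolit", "recommend", "--folder", folder],
        capture_output=True,
        text=True,
        check=False,  # codes 1 and 2 are expected outcomes, so do not raise
    )
    return result.returncode, result.stdout
```

For example, `code, report = run_recommend("src/java")` followed by `print(describe_exit_code(code))` prints a one-line summary after the report.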
You can suppress certain patterns by passing a comma-separated list of pattern codes. Suppressed patterns are not included in the report, and their importance is set to 0.
bash
aibolit recommend --folder src/java --suppress=P12,P13
You can change the format, using the --format parameter. The default value
is --format=compact.
bash
aibolit recommend --folder src/java --format=compact --full
It outputs the patterns sorted by importance in descending order and grouped by pattern name:
text
Show all patterns
Configuration.java score: 127.67642529949538
Configuration.java[3840]: Var in the middle (P21: 30.95612931128819 1/4)
Configuration.java[3844]: Var in the middle (P21: 30.95612931128819 1/4)
Configuration.java[3848]: Var in the middle (P21: 30.95612931128819 1/4)
Configuration.java[2411]: Null Assignment (P28: 10.76 2/4)
Configuration.java[826]: Many primary constructors (P9: 10.76 3/4)
Configuration.java[840]: Many primary constructors (P9: 10.76 3/4)
Configuration.java[829]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[841]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[865]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[2586]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[3230]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[3261]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[3727]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[3956]: Partial synchronized (P14: 0.228 4/4)
ErrorExample.java: error when calculating patterns: Can't count P1 metric:
Total score: 127.67642529949538
(P21: 30.95612931128819 1/4) means the following:
text
30.95612931128819 is the score of this pattern
1 is the position of this pattern in the total list of patterns found in the file
4 is the total number of found patterns
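If you want to post-process the text report, each finding line can be split into these fields. The sketch below is only an illustration: the regular expression is inferred from the sample output above, not from a documented format, and the names `Finding` and `parse_finding` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Finding(NamedTuple):
    file: str
    line: int
    name: str
    code: str
    score: float
    position: int  # rank of this pattern among the patterns found in the file
    total: int     # total number of patterns found in the file


# Assumed shape, taken from the sample: File.java[LINE]: Name (CODE: SCORE POS/TOTAL)
FINDING_RE = re.compile(
    r"^(?P<file>.+?)\[(?P<line>\d+)\]: (?P<name>.+) "
    r"\((?P<code>P\w+): (?P<score>[\d.]+) (?P<pos>\d+)/(?P<total>\d+)\)$"
)


def parse_finding(text: str) -> Optional[Finding]:
    """Parse one report line; return None for lines that are not findings."""
    m = FINDING_RE.match(text.strip())
    if m is None:
        return None
    return Finding(
        m["file"], int(m["line"]), m["name"], m["code"],
        float(m["score"]), int(m["pos"]), int(m["total"]),
    )
```

Lines such as `Configuration.java score: ...` or the error line for `ErrorExample.java` do not match the pattern and yield `None`.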
You can use --format=long. In this case, all results are sorted by line
number:
text
Show all patterns
Configuration.java: some issues found
Configuration.java score: 127.67642529949538
Configuration.java[826]: Many primary constructors (P9: 10.76 3/4)
Configuration.java[829]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[840]: Many primary constructors (P9: 10.76 3/4)
Configuration.java[841]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[865]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[2411]: Null Assignment (P28: 10.76 2/4)
Configuration.java[2586]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[3230]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[3261]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[3727]: Partial synchronized (P14: 0.228 4/4)
Configuration.java[3840]: Var in the middle (P21: 30.95612931128819 1/4)
Configuration.java[3844]: Var in the middle (P21: 30.95612931128819 1/4)
Configuration.java[3848]: Var in the middle (P21: 30.95612931128819 1/4)
Configuration.java[3956]: Partial synchronized (P14: 0.228 4/4)
ErrorExample.java: error when calculating patterns: Can't count P1 metric:
MavenSlice.java: your code is perfect in aibolit's opinion
Total score: 127.67642529949538
You can also choose the xml format. The report has the same content as in
compact mode, but it is rendered as XML:
xml
<report>
  <score>127.67642529949538</score>
  <!--Show all patterns-->
  <files>
    <file>
      <path>Configuration.java</path>
      <summary>Some issues found</summary>
      <score>127.67642529949538</score>
      <patterns>
        <pattern code="P13">
          <details>Null check</details>
          <lines>
            <number>294</number>
            <number>391</number>
          </lines>
          <score>30.95612931128819</score>
          <order>1/4</order>
        </pattern>
        <pattern code="P12">
          <details>Non final attribute</details>
          <lines>
            <number>235</number>
          </lines>
          <score>10.76</score>
          <order>2/4</order>
        </pattern>
        <pattern code="P21">
          <details>Var in the middle</details>
          <lines>
            <number>235</number>
          </lines>
          <score>2.056</score>
          <order>3/4</order>
        </pattern>
        <pattern code="P28">
          <details>Null Assignment</details>
          <lines>
            <number>2411</number>
          </lines>
          <score>0.228</score>
          <order>4/4</order>
        </pattern>
      </patterns>
    </file>
    <file>
      <path>ErrorExample.java</path>
      <summary>Error when calculating patterns: Can't count P1 metric:</summary>
    </file>
    <file>
      <path>MavenSlice.java</path>
      <summary>Your code is perfect in aibolit's opinion</summary>
    </file>
  </files>
</report>
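The XML report is convenient for further processing. As a rough sketch, assuming the element names shown in the sample above (`file`, `path`, `pattern`, `score`, `lines`, `number`), the standard library is enough to extract the findings. The function name `summarize_report` is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_report(xml_text: str) -> list[tuple[str, str, float, list[int]]]:
    """Return (path, pattern code, score, line numbers) for every pattern found."""
    root = ET.fromstring(xml_text)
    rows = []
    for file_el in root.iter("file"):
        path = file_el.findtext("path")
        # Files with errors or without issues have no <pattern> children.
        for pattern in file_el.iter("pattern"):
            lines = [int(n.text) for n in pattern.iter("number")]
            score = float(pattern.findtext("score"))
            rows.append((path, pattern.get("code"), score, lines))
    return rows
```

Files that only carry a `<summary>` (an error, or no issues) contribute no rows.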
The score is the relative importance of the pattern (it has no fixed range). The larger the score, the more important the pattern. For example, if several patterns are found, fix the pattern with the score 5.45 first:
text
SampleTests.java[43]: Non final attribute (P12: 5.45 1/10)
SampleTests.java[44]: Non final attribute (P12: 5.45 1/10)
SampleTests.java[80]: Var in the middle (P21: 3.71 2/10)
SampleTests.java[121]: Var in the middle (P21: 3.71 2/10)
SampleTests.java[122]: Var declaration distance for 5 lines (P20_5: 2.13 3/10)
SampleTests.java[41]: Non final class (P24: 1.95 4/10)
SampleTests.java[59]: Force Type Casting (P5: 1.45 5/10)
SampleTests.java[122]: Var declaration distance for 7 lines (P20_7: 1.07 6/10)
SampleTests.java[122]: Var declaration distance for 11 lines (P20_11: 0.78 7/10)
SampleTests.java[51]: Protected Method (P30: 0.60 8/10)
SampleTests.java[52]: Super Method (P18: 0.35 9/10)
SampleTests.java[100]: Partial synchronized (P14: 0.08 10/10)
SampleTests.java[106]: Partial synchronized (P14: 0.08 10/10)
SampleTests.java[113]: Partial synchronized (P14: 0.08 10/10)
The score per class is the sum of all pattern scores.
text
SampleTests.java score: 17.54698560768407
The total score is the average over all Java files in the project (the folder you've set to analyze):
text
Total average score: 4.0801854775508914
If you compare the scores of two projects, the project with the higher score is the worse one.
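The arithmetic can be checked against the SampleTests.java listing above. The sketch below assumes that each pattern code is counted once per class, regardless of how many lines it occurs on; this is an inference from the sample, not documented behavior. With the rounded scores from the listing the sum is about 17.57, close to the reported 17.54698560768407, and the gap is consistent with rounding. The function names are hypothetical.

```python
import math

# Rounded per-pattern scores copied from the SampleTests.java listing.
SAMPLE_SCORES = {
    "P12": 5.45, "P21": 3.71, "P20_5": 2.13, "P24": 1.95, "P5": 1.45,
    "P20_7": 1.07, "P20_11": 0.78, "P30": 0.60, "P18": 0.35, "P14": 0.08,
}


def class_score(pattern_scores: dict[str, float]) -> float:
    """Sum the pattern scores of one class (assumption: each code counts once)."""
    return math.fsum(pattern_scores.values())


def total_score(class_scores: list[float]) -> float:
    """Average the class scores over all analyzed Java files."""
    return math.fsum(class_scores) / len(class_scores)
```

For instance, `total_score([17.5, 2.5])` averages two hypothetical class scores to `10.0`.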
The model is installed automatically with the aibolit package, but you can also try your own model:
bash
aibolit recommend --folder src/java --model /mnt/d/some_folder/model.pkl
You can get the full report with the --full option; all patterns are then
included in the output:
bash
aibolit recommend --folder src/java --full
You can exclude files with the --exclude option.
Pass glob patterns for the files to ignore:
bash
aibolit recommend --folder src/java \
--exclude=**/*Test*.java --exclude=**/*Impl*.java
If you need help, run:
bash
aibolit recommend --help
How to retrain it?
The train command does the following:
- Calculates patterns and metrics
- Creates a dataset
- Trains the model and saves it
Training works only with a cloned git repository.
1. Clone the aibolit repository.
2. Go to cloned_aibolit_path.
3. Run pip install .
4. Set the environment variable: export HOME_AIBOLIT=cloned_aibolit_path (example for Linux).
5. Set the TARGET_FOLDER environment variable if you need to save all dataset files to another directory.
6. Specify the train and test datasets: set the HOME_TRAIN_DATASET environment variable for the train dataset and the HOME_TEST_DATASET environment variable for the test dataset.
Usually, these files are in the scripts/target/08 directory after dataset
collection (if you have not skipped it).
But you can use your own datasets.
Please note that if you set TARGET_FOLDER, your dataset files will be
in TARGET_FOLDER/target.
That is why it is necessary to
set HOME_TRAIN_DATASET=TARGET_FOLDER/target/08/08-train.csv,
HOME_TEST_DATASET=TARGET_FOLDER/target/08/08-test.csv
7. If you need your own directory for saving the model, also set the
SAVE_MODEL_FOLDER environment variable.
Otherwise, the model is saved to
cloned_aibolit_path/aibolit/binary_files/model.pkl
8. If you need your own folder with Java files, use the --java_folder
parameter; the default value is scripts/target/01 of the cloned aibolit
repo
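The environment setup from the steps above can be collected in one place. This is a minimal sketch under stated assumptions: the helper `training_env` is hypothetical, the default dataset location is assumed to be scripts/target/08 inside the cloned repository, and the file names 08-train.csv and 08-test.csv are taken from the TARGET_FOLDER example. POSIX-style paths are used throughout.

```python
import os
from pathlib import PurePosixPath
from typing import Mapping, Optional


def training_env(
    repo: str,
    target_folder: Optional[str] = None,
    model_folder: Optional[str] = None,
    base_env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Build the environment variables described in the training steps."""
    env = dict(os.environ if base_env is None else base_env)
    env["HOME_AIBOLIT"] = repo
    if target_folder is None:
        # Assumption: default datasets live under the cloned repository.
        dataset_dir = PurePosixPath(repo) / "scripts" / "target" / "08"
    else:
        env["TARGET_FOLDER"] = target_folder
        dataset_dir = PurePosixPath(target_folder) / "target" / "08"
    env["HOME_TRAIN_DATASET"] = str(dataset_dir / "08-train.csv")
    env["HOME_TEST_DATASET"] = str(dataset_dir / "08-test.csv")
    if model_folder is not None:
        env["SAVE_MODEL_FOLDER"] = model_folder
    return env
```

The resulting dictionary can be passed as `env=` to `subprocess.run(["aibolit", "train", "--java_folder=src/java"], ...)`.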
Or you can use our Docker image (the link will be here soon).
Run train pipeline:
bash
aibolit train --java_folder=src/java [--max_classes=100] [--dataset_file]
If you need to save the dataset with all calculated metrics to a different
directory, use the --dataset_file parameter:
bash
aibolit train --java_folder=src/java --dataset_file /mnt/d/new_dir/dataset.csv
You can skip dataset collection with the --skip_collect_dataset parameter. In
this case, the model is trained on the predefined dataset (see point 5):
bash
aibolit train --java_folder=src/java --skip_collect_dataset
How to contribute?
First, you need to install:
Install the following packages if you don't have them:
bash
apt-get install ruby-dev libz-dev libxml2
This project does not include a virtual environment by default. If you're using one (e.g., .venv, venv), update the .xcop file to exclude it:
bash
--exclude=.venv/**
After forking and editing the repo, verify the build is clean by running:
bash
make
To build the white paper:
bash
cd wp
latexmk -c && latexmk -pdf wp.tex
If everything is fine, submit a pull request.
Using Docker recommendation pipeline
bash
docker run --rm -it \
-v <absolute_path_to_folder_with_classes>:/in \
-v <absolute_path_to_out_dir>:/out \
cqfn/aibolit-image
Owner
- Name: CQFN
- Login: cqfn
- Kind: organization
- Email: team@cqfn.org
- Website: https://www.cqfn.org
- Repositories: 22
- Profile: https://github.com/cqfn
Code Quality Foundation
GitHub Events
Total
- Create event: 3
- Commit comment event: 56
- Release event: 1
- Issues event: 148
- Watch event: 32
- Delete event: 6
- Issue comment event: 495
- Push event: 1,392
- Pull request review event: 205
- Pull request review comment event: 176
- Pull request event: 125
- Fork event: 15
Last Year
- Create event: 3
- Commit comment event: 56
- Release event: 1
- Issues event: 148
- Watch event: 32
- Delete event: 6
- Issue comment event: 495
- Push event: 1,392
- Pull request review event: 205
- Pull request review comment event: 176
- Pull request event: 125
- Fork event: 15
Committers
Last synced: about 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Evgeny Maslov | l****r@g****m | 450 |
| Yaroslav Kishchenko | y****o@g****m | 247 |
| Vitaly Protasov | i****o@y****u | 183 |
| Anton Cheshkov | a****v@h****m | 114 |
| Evgeny Maslov | e****5@c****m | 81 |
| Yegor Bugayenko | y****6@g****m | 81 |
| ALEXEY ZORCHENKOV | z****y@h****m | 65 |
| paulodamaso | p****o@g****m | 57 |
| lukyanoffpashok | l****k@y****u | 55 |
| silverCase | s****p@g****m | 47 |
| Anton Siluev | b****7@y****u | 43 |
| Anton | a****v@g****m | 37 |
| Vitaly-Protasov | y****u@e****m | 31 |
| Pavel Lukianov | p****1@c****m | 18 |
| andrey gusev | 4****v | 18 |
| Vitaly-Protasov | 4****v | 10 |
| Andrey Gusev | g****a@g****m | 5 |
| Evgeniy.Maslov | E****v@a****m | 5 |
| silverCase | 5****e | 4 |
| dz-s | s****t@g****m | 3 |
| ALEXEY ZORCHENKOV | z****v@g****m | 3 |
| lyriccoder | E****v | 1 |
| Andrei Gusev | a****7@c****m | 1 |
| lukyanoffpashok | 4****k | 1 |
| Alexey Zorchenkov | z****r@1****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 109
- Total pull requests: 95
- Average time to close issues: about 2 months
- Average time to close pull requests: 15 days
- Total issue authors: 14
- Total pull request authors: 15
- Average comments per issue: 1.83
- Average comments per pull request: 4.21
- Merged pull requests: 68
- Bot issues: 0
- Bot pull requests: 16
Past Year
- Issues: 54
- Pull requests: 38
- Average time to close issues: 2 days
- Average time to close pull requests: about 18 hours
- Issue authors: 5
- Pull request authors: 6
- Average comments per issue: 1.63
- Average comments per pull request: 4.47
- Merged pull requests: 33
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- 0pdd (42)
- ivanovmg (32)
- aravij (18)
- yegor256 (11)
- literally-bug-creator (10)
- acheshkov (9)
- lyriccoder (8)
- AntonProkopyev (8)
- MMenshikh (3)
- DvrkRain (3)
- KatGarmash (2)
- g4s8 (2)
- Vitaly-Protasov (2)
- Error10556 (2)
- iliyasone (1)
Pull Request Authors
- ivanovmg (60)
- AntonProkopyev (23)
- aravij (18)
- dependabot[bot] (17)
- lyriccoder (13)
- dependabot-preview[bot] (10)
- acheshkov (4)
- literally-bug-creator (4)
- KachanovYev (3)
- Error10556 (3)
- Vitaly-Protasov (3)
- DvrkRain (2)
- MMenshikh (2)
- newspec (2)
- CAN4red (2)
Dependencies
- pmd:pmd 4.2.4
- commons-codec:commons-codec 1.11
- org.eclipse.jdt:org.eclipse.jdt.core 3.20.0
- junit:junit 4.12 test
- pmd:pmd 4.2.4
- bs4 ==0.0.1
- lxml ==4.5.0
- beautifulsoup4 ==4.8.2
- bs4 ==0.0.1
- cached-property ==1.2.0
- catboost ==0.22
- cchardet ==2.1.6
- codecov ==2.0.15
- coverage ==5.0.3
- dataclasses ==0.7
- deprecated ==1.2.10
- flake8 ==3.7.9
- javalang ==0.13.0
- lxml ==4.5.0
- matplotlib ==3.2.1
- mypy ==0.770
- networkx ==2.4
- numpy ==1.18.1
- pandas ==1.0.0
- pebble ==4.5.3
- scikit-learn ==0.23.2
- scipy ==1.4.1
- sphinx ==2.3.1
- tqdm ==4.32.1
- typing-extensions *