cam

https://github.com/ruslangaliullin/cam

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
✓ .zenodo.json file (found .zenodo.json file)
○ DOI references
✓ Academic publication links (links to: ieee.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 13.6%, to scientific vocabulary)

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: RuslanGaliullin
Language: Jupyter Notebook
Default Branch: main
Size: 66 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme

This is a dataset of open source Java classes and some metrics on them. Every now and then I make a new version of it using the scripts in this repository. You are welcome to use it in your research. Each release has a fixed version. By referring to it in your research you avoid ambiguity and guarantee repeatability of your experiments.

This is a more formal explanation of this project: in PDF.

The latest ZIP archive with the dataset is here: cam-2023-10-22.zip (2.19Gb). There are 33 metrics calculated for 862,517 Java classes from 1000 GitHub repositories, including: lines of code (reported by cloc); NCSS; cyclomatic and cognitive complexity (by PMD); Halstead volume, effort, and difficulty; maintainability index; number of attributes, constructors, methods; and others (see PDF).

Previous archives (took me a few days to build each of them, using a pretty big machine):

cam-2023-10-22.zip (2.19Gb): 1000 repos, 33 metrics, 863K classes
cam-2023-10-11.zip (3Gb): 959 repos, 29 metrics, 840K classes
cam-2021-08-04.zip (692Mb): 1000 repos, 15 metrics
cam-2021-07-08.zip (387Mb): 1000 repos, 11 metrics
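
Once an archive is unpacked, the per-class metrics can be explored with a few lines of Python. The sketch below is only an illustration: it assumes the reports are CSV files with a header row, and the column names `java_file`, `loc`, and `cc` in the sample are hypothetical, so check the actual files in the archive before relying on them.

```python
import csv
import io
import statistics


def summarize_metric(csv_file, column):
    """Return (count, mean, median) for a numeric column.

    Cells that are blank or non-numeric are skipped, since a metric
    may not have been computed for every class.
    """
    values = []
    for row in csv.DictReader(csv_file):
        cell = (row.get(column) or "").strip()
        try:
            values.append(float(cell))
        except ValueError:
            continue  # metric missing for this class
    if not values:
        return 0, None, None
    return len(values), statistics.mean(values), statistics.median(values)


# Hypothetical sample; the real reports may use different column names.
SAMPLE = """java_file,loc,cc
src/A.java,120,7
src/B.java,40,2
src/C.java,-,3
"""

if __name__ == "__main__":
    print(summarize_metric(io.StringIO(SAMPLE), "loc"))
```

To use it on a real report, open the file with `open(path, newline="")` and pass the file object in place of the `io.StringIO` sample.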

If you want to create a new dataset, run the following command (you need to have Docker installed). The entire dataset will be built in the current directory. Here 1000 is the number of repositories to fetch from GitHub and XXX is your personal access token:

```bash
$ docker run --detach --name=cam --rm --volume "$(pwd):/dataset" \
  -e "TOKEN=XXX" -e "TOTAL=1000" -e "TARGET=/dataset" \
  yegor256/cam:0.8.1 "make -e >/dataset/make.log 2>&1"
```

This command will create a new Docker container running in the background (run docker ps -a to see it). If you want to run Docker interactively and see all the logs, disable detached mode by removing the --detach option from the command.

The dataset will be created in the current directory (this may take some time, maybe a few days!), and a .zip archive will also be there. The Docker container runs in the background: you can safely close the console and come back when the dataset is ready and the container is deleted.

If the script fails at some point, you can restart it without deleting previously created files. The process is incremental: it will pick up where it stopped before.

You can also run it without Docker:

```bash
$ make wipe
$ make TOTAL=100
```

This should work if you have all the dependencies installed, as suggested in the Dockerfile.

In order to analyze just a single repository, do this (yegor256/tojos as an example):

```bash
$ make wipe
$ make REPO=yegor256/tojos
```

How to Calculate Additional Metrics

You may want to use this dataset as a basis, with the intent of adding your own metrics on top of it. It should be easy:

1. Clone this repo into cam/ directory
2. Download ZIP archive
3. Unpack it to the cam/dataset/ directory
4. Add a new script to the cam/metrics/ directory (use ast.py as an example)
5. Delete all other files except yours from the cam/metrics/ directory
6. Run make in the cam/ directory: sudo make install; make all

Make should detect that a new metric was added. It will apply this new metric to all .java files, generate new .csv reports, aggregate them with existing reports (in the cam/dataset/data/ directory), and then update the final .pdf report.
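
As a rough illustration of what a new script in cam/metrics/ might look like, here is a toy metric that counts import statements in a Java file. Everything about its interface is an assumption made for this sketch: the file name `imports.py`, the two command-line arguments (input .java path, output file path), and the `<name> <value>` output line are all hypothetical. The real contract is whatever ast.py in the repository does, so mirror that file rather than this one.

```python
import re
import sys

# Matches "import a.b.C;", "import a.b.*;" and "import static a.b.C.*;".
IMPORT_RE = re.compile(
    r"^\s*import\s+(?:static\s+)?[\w.]+(?:\.\*)?\s*;", re.MULTILINE
)


def count_imports(source):
    """Toy metric: number of import statements in a Java source string."""
    return len(IMPORT_RE.findall(source))


def write_metric(java_path, out_path):
    """Compute the metric for one .java file and append it to out_path."""
    with open(java_path, encoding="utf-8", errors="replace") as src:
        value = count_imports(src.read())
    with open(out_path, "a", encoding="utf-8") as out:
        # Assumed output format: one "<metric name> <value>" pair per line.
        out.write(f"imports {value}\n")
    return value


# Assumed invocation: python imports.py <input.java> <output file>
if __name__ == "__main__" and len(sys.argv) == 3:
    write_metric(sys.argv[1], sys.argv[2])
```

A regular expression is used here only to keep the sketch free of dependencies; a real metric would more likely parse the source, for example with javalang, which is already listed in requirements.txt.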

How to Contribute

Fork the repository, make changes, and send us a pull request. We will review your changes and apply them to the master branch shortly, provided they don't violate our quality standards. To avoid frustration, please run the full build before sending us your pull request:

```bash
$ sudo make install
$ make test
```

This should take a few minutes to complete, without errors.

Owner

Name: rmgaliullin
Login: RuslanGaliullin
Kind: user

Repositories: 2
Profile: https://github.com/RuslanGaliullin

@R_Galiullin - tg

GitHub Events

Total

Push event: 2

Last Year

Push event: 2

Dependencies

.github/workflows/latexmk.yml actions

JamesIves/github-pages-deploy-action v4.4.3 composite
actions/checkout b4ffde65f46336ab88eb53be808477a3936bae11 composite
yegor256/latexmk-action 0.8.1 composite

.github/workflows/make.yml actions

actions/checkout b4ffde65f46336ab88eb53be808477a3936bae11 composite
yegor256/cam master composite

.github/workflows/up.yml actions

actions/checkout b4ffde65f46336ab88eb53be808477a3936bae11 composite
peter-evans/create-pull-request v5 composite

action.yml actions

Dockerfile * docker

Dockerfile docker

yegor256/cam latest build

fixtures/jaxec/pom.xml maven

com.jcabi:jcabi-log 0.23.0

requirements.txt pypi

chardet ==5.2.0
flake8 ==6.1.0
javalang ==0.13.0
multimetric ==2.0.5
pygments ==2.16.1
pylint ==3.0.2
