inspect4py

Static code analysis package for Python repositories

https://github.com/softwareunderstanding/inspect4py

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.8%) to scientific vocabulary

Keywords

ast software-classification software-mining static-analysis

Repository

Static code analysis package for Python repositories

Basic Info
Statistics
  • Stars: 31
  • Watchers: 2
  • Forks: 10
  • Open Issues: 43
  • Releases: 9
Topics
ast software-classification software-mining static-analysis
Created almost 5 years ago · Last pushed about 2 years ago
Metadata Files

README.md

inspect4py

PyPI DOI Project Status: Active – The project has reached a stable, usable state and is being actively developed.


A library that lets users inspect a software project folder (i.e., a directory and its subdirectories) and extract the most relevant information from it, such as class, method and parameter documentation, classes (and their methods), functions, etc.

Features:

Given a folder with code, inspect4py will:

  • Extract all imported modules and how each one is imported (i.e., whether it is internal or external).
  • Extract all functions in the code, including their documentation, parameters, accepted values, and call list.
  • Extract all classes in the code, with all their methods and respective documentation.
  • Extract the control flow of each file.
  • Extract the hierarchy of directories and files.
  • Extract the requirements used in the software project.
  • Classify which files are tests.
  • Classify the main type of software project (script, package, library or service). Only one type is returned as the main type (e.g., if a library has the option to be deployed as a service, inspect4py will return Library as its main type).
  • Return a ranking of the different ways in which a software component can be run, ordered by relevance.

All metadata is extracted as a JSON file.
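To make the first feature more concrete, here is a minimal sketch of how imports can be collected with Python's standard `ast` module. It is an illustration only, not inspect4py's actual implementation: the function name `extract_imports`, the record keys, and the rule used to label an import as internal (a relative import, or a top-level package listed in the hypothetical `local_packages` argument) are all assumptions.

```python
import ast


def extract_imports(source, local_packages=()):
    """Return one record per imported name found in `source`.

    `local_packages` is a hypothetical collection of top-level package
    names that belong to the project under inspection.
    """
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import x" / "import x as y": one alias per imported module.
            for alias in node.names:
                top_level = alias.name.split(".")[0]
                kind = "internal" if top_level in local_packages else "external"
                records.append({"module": alias.name, "name": None,
                                "alias": alias.asname, "kind": kind})
        elif isinstance(node, ast.ImportFrom):
            # "from x import y": node.level > 0 means a relative import.
            module = "." * node.level + (node.module or "")
            top_level = (node.module or "").split(".")[0]
            is_internal = node.level > 0 or top_level in local_packages
            for alias in node.names:
                records.append({"module": module, "name": alias.name,
                                "alias": alias.asname,
                                "kind": "internal" if is_internal else "external"})
    return records


SAMPLE = (
    "import os\n"
    "import numpy as np\n"
    "from .utils import helper\n"
    "from mypkg.core import run\n"
)

for record in extract_imports(SAMPLE, local_packages={"mypkg"}):
    print(record)
```

A real classifier would probably also need to look at the files present in the repository; the explicit `local_packages` set just keeps the sketch self-contained.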

Inspect4py currently works only for Python 3 projects.

Background:

inspect4py can also capture a Data Flow Graph for each function, a feature inspired by GraphCodeBERT: Github & Paper. The illustration below shows the source code, the list output and the Networkx image.

Source Code:

    def max(a, b):
        x = 0
        if a > b:
            x = a
        else:
            x = b
        return x

List Output:

    ('a', 3, 'comesFrom', [], [])
    ('b', 5, 'comesFrom', [], [])
    ('x', 8, 'computedFrom', ['0'], [10])
    ('0', 10, 'comesFrom', [], [])
    ('a', 12, 'comesFrom', ['a'], [3])
    ('b', 14, 'comesFrom', ['b'], [5])
    ('x', 16, 'computedFrom', ['a'], [18])
    ('a', 18, 'comesFrom', ['a'], [3])
    ('x', 21, 'computedFrom', ['b'], [23])
    ('b', 23, 'comesFrom', ['b'], [5])
    ('x', 25, 'comesFrom', ['x'], [16, 21])

Networkx Image: (image)
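The list output above can be turned into graph edges with a few lines of plain Python. This is a sketch under one assumption: each tuple is read as (token, token index, relation, source tokens, source token indices), which follows the GraphCodeBERT convention. The helper name `dfg_to_edges` is hypothetical and not part of inspect4py.

```python
# Data flow tuples copied from the list output of the max(a, b) example.
DFG = [
    ("a", 3, "comesFrom", [], []),
    ("b", 5, "comesFrom", [], []),
    ("x", 8, "computedFrom", ["0"], [10]),
    ("0", 10, "comesFrom", [], []),
    ("a", 12, "comesFrom", ["a"], [3]),
    ("b", 14, "comesFrom", ["b"], [5]),
    ("x", 16, "computedFrom", ["a"], [18]),
    ("a", 18, "comesFrom", ["a"], [3]),
    ("x", 21, "computedFrom", ["b"], [23]),
    ("b", 23, "comesFrom", ["b"], [5]),
    ("x", 25, "comesFrom", ["x"], [16, 21]),
]


def dfg_to_edges(dfg):
    """Return (source_index, target_index, relation) edges, one per source."""
    edges = []
    for _token, index, relation, _source_tokens, source_indices in dfg:
        for source_index in source_indices:
            edges.append((source_index, index, relation))
    return edges


edges = dfg_to_edges(DFG)

# The returned x (index 25) can come from either branch of the if/else.
sources_of_return = sorted(src for src, dst, _ in edges if dst == 25)
print(sources_of_return)
```

An edge list in this shape can be handed to a graph library for drawing; tokens with empty source lists (such as the parameters `a` and `b`) simply have no incoming edges.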

inspect4py uses ASTs, more specifically the ast module in Python, generating a tree of objects (per file) whose classes all inherit from ast.AST.
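As a rough idea of what working with `ast` looks like, the following sketch parses a source string, visits every function definition, and stores its name, arguments and docstring as JSON. The function name `describe_functions` and the JSON keys are assumptions made for this example; the JSON schema produced by inspect4py itself is richer and may differ.

```python
import ast
import json


def describe_functions(source):
    """Map each function name in `source` to a small metadata record."""
    tree = ast.parse(source)  # every node in the tree inherits from ast.AST
    functions = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions[node.name] = {
                "doc": ast.get_docstring(node),          # None if undocumented
                "args": [arg.arg for arg in node.args.args],
                "lineno": node.lineno,                   # line of the "def"
            }
    return functions


SAMPLE = (
    "def add(a, b):\n"
    '    """Add two numbers."""\n'
    "    return a + b\n"
)

print(json.dumps(describe_functions(SAMPLE), indent=2))
```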

inspect4py parses each input file as an AST, extracts the relevant information and stores it as a JSON file. It also captures the control flow of each input file with the help of the following library:

  • staticfg: StatiCFG is a package that can be used to produce control flow graphs (CFGs) for Python 3 programs. The CFGs it generates can be easily visualised with graphviz and used for static analysis. We have a flag in the code (FLAG_PNG) to indicate whether this type of control flow graph should be generated. Note: The original code of this package can be found here; we have fixed it in our repository.

We also use docstring_parser, which has support for ReST, Google, and Numpydoc-style docstrings. Some (basic) tests done using this library can be found here.

Finally, we reuse Pigar to automatically generate the requirements of a given repository. This is an optional functionality. To activate it, the -r argument has to be passed when running inspect4py.

Cite inspect4py

Please cite our MSR 2022 demo paper:

@inproceedings{FilgueiraG22,
  author    = {Rosa Filgueira and Daniel Garijo},
  title     = {Inspect4py: {A} Knowledge Extraction Framework for Python Code Repositories},
  booktitle = {{IEEE/ACM} 19th International Conference on Mining Software Repositories, {MSR} 2022, Pittsburgh, PA, USA, May 23-24, 2022},
  pages     = {232--236},
  publisher = {{IEEE}},
  year      = {2022},
  url       = {https://dgarijo.com/papers/inspect4py_MSR2022.pdf},
  doi       = {10.1145/3524842.3528497}
}

Install

Preliminaries

Make sure you have tree-sitter installed (a C compiler is needed, more info):

pip install tree-sitter

Note that if the ".so" file is not working properly, it is recommended to run the following commands to generate a ".so" file for your OS:

```
git clone https://github.com/tree-sitter/tree-sitter-python

python inspect4py/build.py
```

Make sure you have graphviz installed:

sudo apt-get install graphviz

Python version

We have tested inspect4py in Python 3.7+. Our recommended version is Python 3.9.

Operating System

We have tested inspect4py on Unix, macOS and Windows 11 (22621.1265).

Installation from PyPI

inspect4py is available on PyPI! Just install it like a regular package:

pip install inspect4py

You are done!

If you run into problems, try updating the python-dev utilities (where X is your Python version):

sudo apt-get install python3.X-dev

Installation from code

Prepare a virtual Python 3 environment, clone the repository, cd into the inspect4py folder and install the package as follows:

git clone https://github.com/SoftwareUnderstanding/inspect4py
cd inspect4py
pip install -e .

You are done!

Package dependencies:

docstring_parser==0.7
astor
graphviz
click
pigar
setuptools==54.2.0
json2html
configparser
bigcode_astgen
GitPython
tree-sitter

If you want to run the evaluations, do not forget to add pandas to the previous set.

Installation through Docker

You need to have Docker installed.

Next, clone the inspect4py repository:

git clone https://github.com/SoftwareUnderstanding/inspect4py/

Generate a Docker image for inspect4py:

docker build --tag inspect4py:1.0 .

Run the inspect4py image:

docker run -it --rm inspect4py:1.0 /bin/bash

Now you can run inspect4py:

root@e04792563e6a:/# inspect4py --help

For more information about inspect4py execution options, please see the section below (Execution).

Note that when running inspect4py with Docker, you will need to provide a path to the target repository to analyze. You can do this by:

  1. Cloning the target repository. For example:

```
docker run -it --rm inspect4py:1.0 /bin/bash
# Docker image starts
root@e04792563e6a:/# git clone https://github.com/repo/id
root@e04792563e6a:/# inspect4py -i id
```

  2. Creating a volume. For example, to mount the $PWD folder:

```
docker run -it -v $PWD:/out --rm inspect4py:1.0 /bin/bash
# Docker image starts
root@e04792563e6a:/# inspect4py -i /out/path/to/repo
```

Other useful commands when using Docker:

docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
docker image rm -f inspect4py:1.0

Execution

The tool can inspect a single file or all the files of a given directory (and its subdirectories). For example, it can be used to inspect all the Python files of a GitHub repository that has previously been cloned locally.
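The traversal described above can be pictured with the following sketch, which accepts either a file or a directory and walks subdirectories while skipping names that start with an ignored prefix. It only illustrates the idea and is not the tool's code: `collect_python_files` and its parameters are hypothetical names, loosely modelled on the ignore options of the command line.

```python
import os


def collect_python_files(root, ignore_dir_patterns=(), ignore_file_patterns=()):
    """Return the .py files under `root` (or `root` itself if it is a file)."""
    if os.path.isfile(root):
        return [root]

    ignore_dirs = tuple(ignore_dir_patterns)
    ignore_files = tuple(ignore_file_patterns)
    found = []
    for current, dirs, files in os.walk(root):
        # Pruning `dirs` in place stops os.walk from descending into them.
        dirs[:] = sorted(d for d in dirs if not d.startswith(ignore_dirs))
        for name in sorted(files):
            if name.endswith(".py") and not name.startswith(ignore_files):
                found.append(os.path.join(current, name))
    return found


if __name__ == "__main__":
    for path in collect_python_files(".", ignore_dir_patterns=("__", ".")):
        print(path)
```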

By default, the tool stores the results in the OutputDir directory, but users can specify their own directory name with the -o or --output_dir flags.

inspect4py --input_path <FILE.py | DIRECTORY> [--output_dir "OutputDir" --ignore_dir_pattern "__" --ignore_file_pattern "__" --requirements --html_output]

For clarity, we have added a help command to explain each input parameter:

```
inspect4py --help

Usage: inspect4py [OPTIONS]

Options:
  --version                       Show the version and exit.
  -i, --input_path TEXT           input path of the file or directory to
                                  inspect.  [required]
  -o, --output_dir TEXT           output directory path to store results. If
                                  the directory does not exist, the tool will
                                  create it.
  -ignore_dir, --ignore_dir_pattern TEXT
                                  ignore directories starting with a certain
                                  pattern. This parameter can be provided
                                  multiple times to ignore multiple directory
                                  patterns.
  -ignore_file, --ignore_file_pattern TEXT
                                  ignore files starting with a certain
                                  pattern. This parameter can be provided
                                  multiple times to ignore multiple file
                                  patterns.
  -r, --requirements              find the requirements of the repository.
  -html, --html_output            generates an html file of the DirJson in the
                                  output directory.
  -cl, --call_list                generates the call list in a separate html
                                  file.
  -cf, --control_flow             generates the call graph for each file in a
                                  different directory.
  -dt, --directory_tree           captures the file directory tree from the
                                  root path of the target repository.
  -si, --software_invocation      generates which are the software invocation
                                  commands to run and test the target
                                  repository.
  -ast, --abstract_syntax_tree    generates abstract syntax tree in json
                                  format.
  -sc, --source_code              generates source code of each ast node.
  -ld, --license_detection        detects the license of the target
                                  repository.
  -rm, --readme                   extract all readme files in the target
                                  repository.
  -md, --metadata                 extract metadata of the target repository
                                  using Github API.
  -df, --data_flow                extract data flow graph for every function,
                                  BOOL
  -st, --symbol_table             symbol table file location. STR
  --help                          Show this message and exit.
```

Documentation

For additional documentation and examples, please have a look at our online documentation

Contribution guidelines

Contributions to address any of the current issues are welcome. To contribute, open a pull request against the development branch (dev). The master branch only contains the code associated with the latest release.

Acknowledgements

We would like to thank Laura Camacho, designer of the logo.

Owner

  • Name: Software Understanding
  • Login: SoftwareUnderstanding
  • Kind: organization

Organization dedicated to making software understandable by extracting metadata and facilitating its reuse

Citation (CITATION.cff)

title: "Inspect4py: A Knowledge Extraction Framework for Python Code Repositories"
license: BSD-3-Clause license
authors:
  - family-names: Filgueira
    given-names: Rosa
    orcid: "0000-0002-5715-3046"
  - family-names: Garijo
    given-names: Daniel
    orcid: "http://orcid.org/0000-0003-0454-7145"
cff-version: 1.2.0
message: "If you use this software, please cite both the article from preferred-citation and the software itself."
preferred-citation:
  authors:
  - family-names: Filgueira
    given-names: Rosa
  - family-names: Garijo
    given-names: Daniel
  title: "Inspect4py: A Knowledge Extraction Framework for Python Code Repositories"
  type: article
  year: 2022
  doi: 10.1145/3524842.3528497
identifiers:
  - description: "Collection of archived snapshots for Inspect4py"
    type: doi
    value: 10.5281/zenodo.5907936

GitHub Events

Total
  • Watch event: 6
  • Fork event: 2
Last Year
  • Watch event: 6
  • Fork event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 44
  • Total pull requests: 58
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 1 day
  • Total issue authors: 7
  • Total pull request authors: 5
  • Average comments per issue: 1.43
  • Average comments per pull request: 0.67
  • Merged pull requests: 51
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • dgarijo (28)
  • OEG-Clark (6)
  • lazyhope (4)
  • WilliamsCJ (2)
  • rosafilgueira (2)
  • dakixr (1)
  • smith-co (1)
Pull Request Authors
  • rosafilgueira (28)
  • dgarijo (15)
  • OEG-Clark (9)
  • WilliamsCJ (3)
  • lazyhope (3)
Top Labels
Issue Labels
bug (19) enhancement (10) new feature (3) toDiscuss (2) documentation (2) question (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 108 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 1
  • Total versions: 8
  • Total maintainers: 2
pypi.org: inspect4py

Static code analysis package for Python repositories

  • Versions: 8
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 108 Last month
Rankings
Dependent packages count: 4.7%
Downloads: 10.6%
Forks count: 11.9%
Average: 12.3%
Stargazers count: 12.6%
Dependent repos count: 21.7%
Maintainers (2)
Last synced: 6 months ago

Dependencies

.github/workflows/pypi-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
Dockerfile docker
  • python 3.7 build
test/test_files/somef/Dockerfile docker
  • python 3.6 build
docs/requirements.txt pypi
  • mkdocs-material *
requirements.txt pypi
  • GitPython *
  • astor *
  • bigcode_astgen *
  • click *
  • configparser *
  • docstring_parser ==0.7
  • graphviz *
  • json2html *
  • pigar *
  • setuptools ==54.2.0
requirements_eval.txt pypi
  • astor *
  • cdmcfparser *
  • click *
  • configparser *
  • docstring_parser ==0.7
  • graphviz *
  • json2html *
  • pandas *
  • pigar *
  • setuptools ==54.2.0
test/test_files/BoostingMonocularDepth/requirements.txt pypi
  • cudnnenv * test
  • gdown * test
  • gradio * test
  • matplotlib * test
  • opencv-python * test
  • scikit-image * test
  • scipy * test
  • torch ==1.2 test
  • torchvision * test
test/test_files/BoostingMonocularDepth/structuredrl/models/syncbn/requirements.txt pypi
  • cffi * test
  • future * test
test/test_files/pylops/requirements-dev.txt pypi
  • PyWavelets * development
  • Sphinx * development
  • image * development
  • ipython * development
  • matplotlib * development
  • nbsphinx * development
  • numba * development
  • numpy >=1.15.0 development
  • numpydoc * development
  • pyfftw * development
  • pytest * development
  • pytest-runner * development
  • scikit-fmm * development
  • scipy >=1.4.0 development
  • setuptools_scm * development
  • spgl1 * development
  • sphinx-gallery * development
  • sphinx-rtd-theme * development
test/test_files/pylops/setup.py pypi
  • numpy *
  • scipy *
test/test_files/somef/docs/requirements.txt pypi
  • mkdocs-material * test
test/test_files/somef/setup.py pypi
  • Click *
  • bs4 *
  • click-option-group *
  • markdown *
  • matplotlib *
  • nltk *
  • numpy *
  • pandas *
  • rdflib *
  • rdflib-jsonld *
  • requests *
  • scikit-learn ==0.21.2
  • textblob *
test/test_files/somef/src/somef.egg-info/requires.txt pypi
  • Click * test
  • bs4 * test
  • click-option-group * test
  • markdown * test
  • matplotlib * test
  • nltk * test
  • numpy * test
  • pandas * test
  • rdflib * test
  • rdflib-jsonld * test
  • requests * test
  • scikit-learn ==0.21.2 test
  • textblob * test
pyproject.toml pypi
setup.py pypi
test/test_files/Chowlk/pyproject.toml pypi
test/test_files/pylops/environment.yml pypi
test/test_files/pylops/requirements.txt pypi