https://github.com/chains-project/dirty-waters

automatically detect software supply chain smells and issues http://arxiv.org/pdf/2410.16049

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, acm.org
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
    Organization chains-project has institutional domain (chains.proj.kth.se)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.1%) to scientific vocabulary

Keywords from Contributors

transformer
Last synced: 6 months ago

Basic Info
Statistics
  • Stars: 17
  • Watchers: 3
  • Forks: 4
  • Open Issues: 36
  • Releases: 3
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License

README.md

dirty-waters

Dirty-waters automatically finds software supply chain issues in software projects by analyzing the available metadata of all dependencies, transitively.

Reference: Dirty-Waters: Detecting Software Supply Chain Smells, Proceedings of FSE Tool Track, 2025 (doi: 10.1145/3696630.3728578).

By using dirty-waters, you identify the shady areas of your supply chain, which are natural targets for attackers to exploit.

dirty-waters's static analyses report the following smells:

  • Dependencies with no/invalid* link to source code repositories (high severity)
  • Dependencies with no tag/commit SHA for release, impossible to have reproducible builds (medium severity)
  • Deprecated Dependencies (medium severity)
  • Depends on a fork (low severity), disabled by default
  • Dependencies without/with invalid code signature (medium severity)
  • Dependencies with no build attestation (low severity)
  • Dependencies with alias (low severity)

* We consider a link invalid if it does not return a 200 status code. Furthermore, if a dependency is not hosted on GitHub, some checks cannot be performed (e.g., release tag/commit SHA).
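The validity rule in the footnote can be sketched in Python as follows. This is a minimal illustration, not the tool's actual implementation: the names `fetch_status` and `is_valid_link` are hypothetical, and issuing a GET request through the standard library is an assumption.

```python
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for `url`, or 0 if it cannot be reached.

    Hypothetical helper: the real tool may use a different client or request method.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, TimeoutError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return 0


def is_valid_link(url: str, status_of: Callable[[str], int] = fetch_status) -> bool:
    """A link counts as valid only if it returns a 200 status code."""
    return status_of(url) == 200
```

Passing `status_of` as a parameter keeps the rule itself testable without network access.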

As for its differential analyses, dirty-waters reports the following smells:

  • Dependencies with code signature changes (high severity)
  • Downgraded dependencies (medium severity)
  • Dependencies with commits made by both new authors and reviewers (medium severity)
  • Dependencies with commits approved by new reviewers (medium severity)
  • Dependencies with new contributors (low severity)

Additionally, dirty-waters gives a supplier view on the dependency trees (who owns the different dependencies?)

dirty-waters is developed as part of the Chains research project.

Installation

Installation via pip

You can install dirty-waters via pip:

```bash
pip install dirty-waters

# or

pipx install dirty-waters
```

Set up the GitHub API token (or with a .env file):

```bash
export GITHUB_API_TOKEN=<your_token>
```
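As an illustration of the two options (environment variable or `.env` file), the following sketch looks up the token in that order. It is not how dirty-waters itself loads the token; the function name `load_github_token` and the simple `KEY=value` parsing are assumptions made for this example.

```python
import os
from pathlib import Path
from typing import Optional


def load_github_token(env_file: str = ".env") -> Optional[str]:
    """Return GITHUB_API_TOKEN from the environment, else from a .env file.

    Hypothetical helper: only plain `KEY=value` lines (optionally prefixed
    with `export`) are understood.
    """
    token = os.environ.get("GITHUB_API_TOKEN")
    if token:
        return token
    path = Path(env_file)
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if sep and key.strip() == "GITHUB_API_TOKEN":
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'") or None
    return None
```

A caller could check `load_github_token() is None` before starting an analysis and fail early with a clear message.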

Usage

Command line

Run the tool using the following command structure:

```
# analyzing the software supply chain of Maven project INRIA/spoon
$ dirty-waters -p INRIA/spoon -pm maven
```

All configuration options

```
usage: main.py [-h] -p PROJECTREPONAME [-v RELEASEVERSIONOLD] [-vn RELEASEVERSIONNEW] [-d] [-n]
               -pm {yarn-classic,yarn-berry,pnpm,npm,maven} [--debug] [--config CONFIG]
               [--gradual-report GRADUAL_REPORT | --no-gradual-report] [--check-source-code]
               [--check-source-code-sha] [--check-deprecated] [--check-forks] [--check-provenance]
               [--check-code-signature] [--check-aliased-packages]

options:
  -h, --help            show this help message and exit
  -p PROJECTREPONAME, --project-repo-name PROJECTREPONAME
                        Specify the project repository name. Example: MetaMask/metamask-extension
  -v RELEASEVERSIONOLD, --release-version-old RELEASEVERSIONOLD
                        The old release tag of the project repository. Defaults to HEAD. Example: v10.0.0
  -vn RELEASEVERSIONNEW, --release-version-new RELEASEVERSIONNEW
                        The new release version of the project repository.
  -d, --differential-analysis
                        Run differential analysis and generate a markdown report of the project
  -n, --name-match      Compare the package names with the name in the in the package.json file. This option
                        will slow down the execution time due to the API rate limit of code search.
  -pm {yarn-classic,yarn-berry,pnpm,npm,maven}, --package-manager {yarn-classic,yarn-berry,pnpm,npm,maven}
                        The package manager used in the project.
  --debug               Enable debug mode.
  --config CONFIG       Path to configuration file (JSON)
  --gradual-report GRADUAL_REPORT
                        Enable/disable gradual reporting (default: true)
  --no-gradual-report   Disable gradual reporting (deprecated, use --gradual-report=false instead)

smell checks:
  --check-source-code   Check for dependencies with no link to source code repositories
  --check-source-code-sha
                        Check for dependencies with no commit sha/tag for release
  --check-deprecated    Check for deprecated dependencies
  --check-forks         Check for dependencies that are forks
  --check-provenance    Check for dependencies with no build attestation
  --check-code-signature
                        Check for dependencies with missing/invalid code signature
  --check-aliased-packages
                        Check for aliased packages
```

Reports are gradual by default: that is, only the highest severity smell type with issues found within this project is reported. You can disable this feature, and get a full report, by setting the --gradual-report flag to false.
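To make the gradual behaviour concrete, here is a minimal Python sketch. It assumes that "highest severity" refers to the severity levels (high, medium, low) listed earlier and that all smells of that level are kept; the function `gradual_report` and its input dictionaries are hypothetical and do not mirror the tool's internals.

```python
from typing import Dict, List

# Severity levels used in this README, highest first.
SEVERITY_ORDER = ["high", "medium", "low"]


def gradual_report(
    findings: Dict[str, List[str]], severity: Dict[str, str]
) -> Dict[str, List[str]]:
    """Keep only the smells of the highest severity level that has findings.

    `findings` maps a smell name to the affected dependencies;
    `severity` maps a smell name to "high", "medium" or "low".
    """
    for level in SEVERITY_ORDER:
        selected = {
            smell: deps
            for smell, deps in findings.items()
            if deps and severity[smell] == level
        }
        if selected:
            return selected
    return {}


# Made-up example data: no high-severity issue, so medium is reported.
example_severity = {"source_code": "high", "deprecated": "medium", "provenance": "low"}
example_findings = {
    "source_code": [],
    "deprecated": ["example-dep@1.0.0"],
    "provenance": ["example-dep@1.0.0", "other-dep@2.0.0"],
}
print(gradual_report(example_findings, example_severity))
# prints {'deprecated': ['example-dep@1.0.0']}
```

A full report, as obtained with `--gradual-report=false`, would correspond to returning every non-empty entry of `findings`.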

  1. Static analysis:

```bash
# If manually cloned
python3 main.py -p MetaMask/metamask-extension -pm yarn-berry

# If installed via pip
dirty-waters -p MetaMask/metamask-extension -pm yarn-berry
```

  2. Differential analysis:

```bash
# If manually cloned
python3 main.py -p MetaMask/metamask-extension -v v11.11.0 -vn v11.12.0 -d -pm yarn-berry

# If installed via pip
dirty-waters -p MetaMask/metamask-extension -v v11.11.0 -vn v11.12.0 -d -pm yarn-berry
```

Notes:

  • -v should be the version of GitHub release, e.g. for this release, the value should be v11.11.0, not Version 11.11.0 or 11.11.0.
  • When using -d for differential analysis, -vn must be specified.
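The second note can be expressed as a small `argparse` sketch. It only reuses the flags documented above and is an illustration, not the tool's actual parser: the program name `dirty-waters-sketch` and the `parse` helper are hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a reduced parser with the flags relevant to the notes above."""
    parser = argparse.ArgumentParser(prog="dirty-waters-sketch")
    parser.add_argument("-p", "--project-repo-name", required=True)
    parser.add_argument("-v", "--release-version-old")
    parser.add_argument("-vn", "--release-version-new")
    parser.add_argument("-d", "--differential-analysis", action="store_true")
    parser.add_argument(
        "-pm",
        "--package-manager",
        required=True,
        choices=["yarn-classic", "yarn-berry", "pnpm", "npm", "maven"],
    )
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    """Parse `argv` and enforce that -d is always accompanied by -vn."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.differential_analysis and not args.release_version_new:
        # parser.error prints a usage message and exits with status 2.
        parser.error("-vn/--release-version-new is required with -d")
    return args
```

For instance, `parse(["-p", "MetaMask/metamask-extension", "-v", "v11.11.0", "-vn", "v11.12.0", "-d", "-pm", "yarn-berry"])` succeeds, while dropping `-vn` from that list exits with a usage error.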

Development

To set up dirty-waters, follow these steps:

  1. Clone the repository:

```bash
git clone https://github.com/chains-project/dirty-waters.git
cd dirty-waters
```

  2. Set up a virtual environment and install dependencies:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cd tool
```

As an alternative to virtual environments, you may also use the Nix flake present in this repository.

  3. Set up the GitHub API token (ideally, in a .env file):

```bash
export GITHUB_API_TOKEN=<your_token>
```

Configuration

You can set the tool's configuration through a JSON file, which can then be passed to the tool using the --config flag. At the moment, we have configuration support to:

  • ignore smells for specific dependencies (ignore), as well as dependencies with specific parents (ignore-if-parent);
  • provide hardcoded URLs (revisions) for both source code repositories (source_code_url) and tag/SHA locations (source_code_version_url).

The dependencies can be set either as an exact match or as a regex pattern (this only for ignoring smells). Note that regular expressions don't behave the same as Unix match expressions: e.g., @types* will match every string starting with @type and 0 or more s following it. For a Unix-like behavior, the equivalent regular expression would be ^@types/.*.
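The difference between the two patterns can be checked with Python's `re` module. The dependency names below are made up for the example, and matching from the start of the string with `re.match` is an assumption about how the patterns are applied.

```python
import re
from typing import List

# Illustrative dependency keys in <package_name>@<version> form (made up).
deps = [
    "@types/node@20.0.0",
    "@typescript-eslint/parser@7.0.0",
    "@type-utils/core@1.0.0",
]


def matching(pattern: str, names: List[str]) -> List[str]:
    """Return the names that the regular expression matches from the start."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.match(name)]


# "@types*" means "@type" followed by zero or more "s": it matches all three.
print(matching(r"@types*", deps))
# prints ['@types/node@20.0.0', '@typescript-eslint/parser@7.0.0', '@type-utils/core@1.0.0']

# "^@types/.*" requires the literal "@types/" prefix: only the first one matches.
print(matching(r"^@types/.*", deps))
# prints ['@types/node@20.0.0']
```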

To ignore smells, you can either set "all" to ignore every check for the dependency or specify the checks you want to ignore.

The possible specific check options to ignore are as follows (note that checks represented as "children" of another check are ignored if the parent one is):

  • "source_code"
    • "source_code_sha"
    • "forks"
  • "deprecated"
  • "provenance"
  • "code_signature"
  • "aliased_packages"
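The parent/child rule in the list above can be sketched as a small lookup. The check names come from the list; the function `ignored_checks` and the two constants are hypothetical and only illustrate the expansion, not the tool's own code.

```python
from typing import Iterable, Set, Union

# Every check name that can appear in the configuration file.
ALL_CHECKS = {
    "source_code",
    "source_code_sha",
    "forks",
    "deprecated",
    "provenance",
    "code_signature",
    "aliased_packages",
}

# Ignoring "source_code" also ignores its children.
CHILD_CHECKS = {"source_code": {"source_code_sha", "forks"}}


def ignored_checks(setting: Union[str, Iterable[str]]) -> Set[str]:
    """Expand an `ignore` value ("all" or a list of checks) to a set of checks."""
    if setting == "all":
        return set(ALL_CHECKS)
    result: Set[str] = set()
    for check in setting:
        if check not in ALL_CHECKS:
            raise ValueError(f"unknown check: {check!r}")
        result.add(check)
        result |= CHILD_CHECKS.get(check, set())
    return result


print(sorted(ignored_checks(["source_code"])))
# prints ['forks', 'source_code', 'source_code_sha']
```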

An example configuration file:

```json
{
  "ignore": {
    "shescape@2.1.0": "all",
    "^@types/.*": ["forks"]
  },
  "ignore-if-parent": {
    "^org.apache.maven.plugins:maven-release-plugin.*": "all"
  },
  "revisions": {
    "io.perfmark:perfmark-api@0.27.0": {
      "source_code_url": "https://github.com/perfmark/perfmark",
      "source_code_version_url": "https://github.com/perfmark/perfmark/tree/v0.27.0"
    }
  }
}
```

Note that for cases where a package is aliased, we check for the original package name, not the aliased one: i.e., if we alias the package string-width to string-width-cjs, we will check for string-width@versionx.y.z, not string-width-cjs@versionx.y.z.

Package Name Formatting

The packages present in the configuration file should be set with a specific formatting: <package_name>@<version>. In the case of Maven packages, you should use the format <group_id>:<artifact_id>@<version>.
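As a sketch of the two key formats, the helper below builds configuration keys and assembles a small configuration with them. The function `package_key` is hypothetical; the package names and URLs are the ones from the example configuration file above.

```python
import json
from typing import Optional


def package_key(name: str, version: str, group_id: Optional[str] = None) -> str:
    """Build `<package_name>@<version>`, or `<group_id>:<artifact_id>@<version>` for Maven."""
    if group_id is not None:
        return f"{group_id}:{name}@{version}"
    return f"{name}@{version}"


config = {
    "ignore": {package_key("shescape", "2.1.0"): "all"},
    "revisions": {
        package_key("perfmark-api", "0.27.0", group_id="io.perfmark"): {
            "source_code_url": "https://github.com/perfmark/perfmark",
            "source_code_version_url": "https://github.com/perfmark/perfmark/tree/v0.27.0",
        }
    },
}

# The resulting JSON can be saved to a file and passed with --config.
print(json.dumps(config, indent=2))
```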

Continuous integration

See the GitHub Action at https://github.com/chains-project/dirty-waters-action

Software Supply Chain Smell Support

dirty-waters currently supports package managers within the JavaScript and Java ecosystems. However, due to some constraints associated with the nature of the package managers, the tool may not be able to detect all the smells in the project. The following table shows the supported package managers and their associated smells, for static analysis:

| Package Manager | No Source Code Repository | Invalid Source Code Repository URL | No SHA/Release Tag | Deprecated Dependency | Depends on a Fork | No Build Attestation | No/Invalid Code Signature | Aliased Packages |
| --------------- | ------------------------- | ---------------------------------- | ------------------ | --------------------- | ----------------- | -------------------- | ------------------------- | ---------------- |
| Yarn Classic    | Yes                       | Yes                                | Yes                | Yes                   | Yes               | Yes                  | Yes                       | Yes              |
| Yarn Berry      | Yes                       | Yes                                | Yes                | Yes                   | Yes               | Yes                  | Yes                       | Yes              |
| Pnpm            | Yes                       | Yes                                | Yes                | Yes                   | Yes               | Yes                  | Yes                       | No               |
| Npm             | Yes                       | Yes                                | Yes                | Yes                   | Yes               | Yes                  | Yes                       | Yes              |
| Maven           | Yes                       | Yes                                | Yes                | No                    | Yes               | No                   | Yes                       | No               |

All package managers support every smell in the differential analysis scenario.
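For scripting purposes, the static-analysis table can be encoded as a lookup. This sketch reuses the check names from the Configuration section and the package manager identifiers accepted by `-pm`; the mapping of table columns to check names and the function `supported_checks` are assumptions made for this example.

```python
from typing import Dict, List, Set

# Static checks in the column order of the table (both source code columns
# are folded into "source_code").
STATIC_CHECKS = [
    "source_code",
    "source_code_sha",
    "deprecated",
    "forks",
    "provenance",
    "code_signature",
    "aliased_packages",
]

# Checks marked "No" in the table, per package manager.
UNSUPPORTED: Dict[str, Set[str]] = {
    "yarn-classic": set(),
    "yarn-berry": set(),
    "pnpm": {"aliased_packages"},
    "npm": set(),
    "maven": {"deprecated", "provenance", "aliased_packages"},
}


def supported_checks(package_manager: str) -> List[str]:
    """Return the static checks available for the given package manager."""
    unsupported = UNSUPPORTED[package_manager]
    return [check for check in STATIC_CHECKS if check not in unsupported]


print(supported_checks("maven"))
# prints ['source_code', 'source_code_sha', 'forks', 'code_signature']
```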

Smell Check Options

By default, all supported checks for the given package manager are performed in static analysis. You can specify individual checks using the following flags (note that if at least one flag is passed, instead of all checks being performed, only the flagged ones will be):

  • --check-source-code: Check for dependencies with no link to source code repositories
  • --check-source-code-sha: Check for dependencies with no tag/commit SHA for release
  • --check-deprecated: Check for deprecated dependencies
  • --check-forks: Check for dependencies that are forks
  • --check-provenance: Check for dependencies with no build attestation
  • --check-code-signature: Check for dependencies with no/invalid code signature
  • --check-aliased-packages: Check for aliased packages

Note: The --check-source-code-sha and --check-forks flags require --check-source-code to be enabled, as release tags can only be checked if we can first verify the source code repository.

As an example of running specific checks:

bash dirty-waters -p MetaMask/metamask-extension -v v11.11.0 -pm yarn-berry --check-source-code --check-source-code-sha

This run will only check for dependencies with no link to source code repositories and dependencies with no tag/commit sha for release.
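The selection rules described in this section can be summarised in a short sketch. It assumes that a dependent flag without `--check-source-code` is treated as an error (the tool might instead enable the prerequisite silently), and it does not model the fork check being disabled by default; `resolve_checks` and `REQUIRES` are hypothetical names.

```python
from typing import Dict, List

# Checks that only make sense once the source code repository is verified.
REQUIRES: Dict[str, str] = {
    "source_code_sha": "source_code",
    "forks": "source_code",
}


def resolve_checks(selected: List[str], supported: List[str]) -> List[str]:
    """Return the checks to run for the flags in `selected`.

    No flag means every supported check; otherwise only the flagged ones.
    """
    if not selected:
        return list(supported)
    for check in selected:
        required = REQUIRES.get(check)
        if required and required not in selected:
            flag = "--check-" + check.replace("_", "-")
            needed = "--check-" + required.replace("_", "-")
            raise ValueError(f"{flag} requires {needed}")
    return [check for check in supported if check in selected]


supported = ["source_code", "source_code_sha", "deprecated", "forks"]
print(resolve_checks(["source_code", "source_code_sha"], supported))
# prints ['source_code', 'source_code_sha']
```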

For differential analysis, it is currently not possible to specify individual checks -- all checks will be performed.

Notes

Inaccessible Tags

Sometimes, the release version specified in a lockfile/pom/similar is not the same as the tag used in the repository. This can happen for a variety of reasons. We have compiled several tag formats which were deemed reasonable to look up if the exact tag specified in the lockfile/pom/similar is not found. They come from a combination of AROMA's work and our own research on this subject. These formats are the following:

Tag formats

  • ``
  • `v`
  • `r-`
  • `release-`
  • `parent-`
  • `@`
  • `-v`
  • `_v`
  • `-`
  • `_`
  • `@`
  • `-v`
  • `_v`
  • `-`
  • `_`
  • `@`
  • `-v`
  • `_v`
  • `-`
  • `_`
  • `release/`
  • `-release`
  • `v.`
  • `p1-p2-p3`

As examples of what `package_name`, `repo_name`, and `project_name` could be, `maven-surefire` is an interesting dependency:

  • `maven-surefire-common` is the package name
  • `maven-surefire` is the repo name (we remove the owner prefix)
  • `surefire` is the project name

In particular, there are many `maven-*` dependencies whose tags follow these last conventions.

Note that if dirty-waters does not find a tag, this does not mean that the tag doesn't exist: it means that it either doesn't exist, or that its format is not one of the above.
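A candidate-tag lookup of this kind could be sketched as follows. The sketch rests on several assumptions: that each format above wraps the release version, that the three repeated groups (`@`, `-v`, `_v`, `-`, `_`) are prefixed by the package, repo, and project name respectively, and that the `p1-p2-p3` format can be left out because its meaning is not spelled out here. The function `candidate_tags` and the version `3.2.5` are hypothetical.

```python
from typing import List

# Formats that depend only on the version (assumed placeholder: {tag}).
GENERIC_FORMATS = [
    "{tag}",
    "v{tag}",
    "r-{tag}",
    "release-{tag}",
    "parent-{tag}",
    "release/{tag}",
    "{tag}-release",
    "v.{tag}",
]

# Formats assumed to be prefixed by a package, repo, or project name.
NAMED_FORMATS = [
    "{name}@{tag}",
    "{name}-v{tag}",
    "{name}_v{tag}",
    "{name}-{tag}",
    "{name}_{tag}",
]


def candidate_tags(version: str, names: List[str]) -> List[str]:
    """List the tags worth looking up for `version`, without duplicates."""
    candidates = [fmt.format(tag=version) for fmt in GENERIC_FORMATS]
    for name in names:
        candidates += [fmt.format(name=name, tag=version) for fmt in NAMED_FORMATS]
    # dict.fromkeys removes duplicates while keeping the first occurrence order.
    return list(dict.fromkeys(candidates))


tags = candidate_tags("3.2.5", ["maven-surefire-common", "maven-surefire", "surefire"])
print(tags[:3])
# prints ['3.2.5', 'v3.2.5', 'r-3.2.5']
```

Each candidate would then be looked up in the repository until one exists, which is why a missing result only means that none of the known formats matched.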

This list may be expanded in the future. If you feel that a relevant format is missing, please open an issue and/or a pull request!

Academic Work

Other issues not handled by dirty-waters

  • Missing dependencies: simply run mvn/pip/... install :)
  • Bloated dependencies: we recommend DepClean for Java, depcheck for NPM
  • Version constraint inconsistencies: we recommend pipdeptree for Python
  • Smells in GitHub Actions: we recommend zizmor

License

MIT License.

Owner

  • Name: CHAINS research project at KTH Royal Institute of Technology
  • Login: chains-project
  • Kind: organization

"Consistent Hardening and Analysis of Software Supply Chains" at KTH, funded by SSF

GitHub Events

Total
  • Create event: 178
  • Commit comment event: 3
  • Release event: 1
  • Issues event: 95
  • Watch event: 13
  • Delete event: 91
  • Issue comment event: 177
  • Push event: 577
  • Pull request review comment event: 12
  • Pull request review event: 16
  • Pull request event: 218
  • Fork event: 5
Last Year
  • Create event: 178
  • Commit comment event: 3
  • Release event: 1
  • Issues event: 95
  • Watch event: 13
  • Delete event: 91
  • Issue comment event: 177
  • Push event: 577
  • Pull request review comment event: 12
  • Pull request review event: 16
  • Pull request event: 218
  • Fork event: 5

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 396
  • Total Committers: 6
  • Avg Commits per committer: 66.0
  • Development Distribution Score (DDS): 0.336
Past Year
  • Commits: 339
  • Committers: 6
  • Avg Commits per committer: 56.5
  • Development Distribution Score (DDS): 0.224
Top Committers
Name Email Commits
Diogo Gaspar d****r@k****e 263
Stamp9 3****2@q****m 94
renovate[bot] 2****] 19
Martin Monperrus m****s@g****g 10
Sofia Bobadilla 6****a 8
nektos/act n****t 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 72
  • Total pull requests: 237
  • Average time to close issues: 28 days
  • Average time to close pull requests: 4 days
  • Total issue authors: 7
  • Total pull request authors: 6
  • Average comments per issue: 0.9
  • Average comments per pull request: 0.48
  • Merged pull requests: 193
  • Bot issues: 1
  • Bot pull requests: 106
Past Year
  • Issues: 71
  • Pull requests: 236
  • Average time to close issues: 18 days
  • Average time to close pull requests: 3 days
  • Issue authors: 7
  • Pull request authors: 6
  • Average comments per issue: 0.85
  • Average comments per pull request: 0.48
  • Merged pull requests: 193
  • Bot issues: 1
  • Bot pull requests: 106
Top Authors
Issue Authors
  • randomicecube (34)
  • monperrus (22)
  • ericcornelissen (9)
  • Stamp9 (4)
  • larissaschmid (1)
  • renovate[bot] (1)
  • algomaster99 (1)
Pull Request Authors
  • randomicecube (118)
  • renovate[bot] (106)
  • monperrus (7)
  • Stamp9 (2)
  • ericcornelissen (2)
  • LogFlames (2)
Top Labels
Issue Labels
enhancement (23) Java (8) High (7) documentation (4) bug (3) Low (3) Medium (2) JavaScript (2) help wanted (2) question (1)
Pull Request Labels
enhancement (84) documentation (24) Java (20) bug (19) JavaScript (14)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 308 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 104
  • Total maintainers: 1
pypi.org: dirty-waters

Automatically detect software supply chain smells and issues

  • Versions: 104
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 308 Last month
Rankings
Dependent packages count: 10.1%
Average: 33.5%
Dependent repos count: 56.9%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • Requests ==2.32.3
  • pandas ==2.2.3
  • requests_cache ==1.2.1
  • tqdm ==4.66.5