Metrics As Scores

Metrics As Scores: A Tool- and Analysis Suite and Interactive Application for Exploring Context-Dependent Distributions - Published in JOSS (2023)

https://github.com/mrshoenel/metrics-as-scores

Science Score: 98.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 39 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords from Contributors

mesh

Scientific Fields

Economics Social Sciences - 63% confidence
Artificial Intelligence and Machine Learning Computer Science - 62% confidence
Mathematics Computer Science - 45% confidence
Last synced: 4 months ago

Repository

Contains the data and scripts needed for the application Metrics as Scores

Basic Info
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 16
Created over 3 years ago · Last pushed 10 months ago
Metadata Files
Readme Contributing License Citation

README.md

Metrics As Scores (badges: DOI, build status, codecov)


Please Note: Metrics As Scores (MAS) changed considerably between versions v1.0.8 and v2.x.x.

The current version is v2.8.2.

As of version v2.x.x, it offers the following new features:

  • Textual User Interface (TUI)
  • Proper documentation and testing
  • New version on PyPI. Install the package and run the command line interface by typing mas!

Metrics As Scores Demo.


This repository contains the data and scripts needed for the application Metrics as Scores; check out https://mas.research.hönel.net/.

This package accompanies the paper entitled “Contextual Operationalization of Metrics As Scores: Is My Metric Value Good?” (Hönel et al. 2022). It seeks to answer the question of whether the domain a software metric was captured in matters. It enables the user to compare domains and to understand their differences. In order to answer the question of whether a metric value is actually good, we need to transform it into a score. Scores are normalized and rectified distances that can be compared in an apples-to-apples manner across domains. The same metric value might be good in one domain while it is not in another. To borrow an example from the domain of software: it is much more acceptable (or common) to have large applications (in terms of lines of code) in the domains of games and databases than it is in the domains of IDEs and SDKs. Given an ideal value for a metric (which may also be user-defined), we can transform observed metric values to distances from that value and then use the cumulative distribution function to map distances to scores.
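
The following is a minimal sketch of that idea, not the package's API: it assumes a hypothetical sample of metric values and a user-chosen ideal, and maps rectified distances to scores via the empirical CDF (here, mapping a distance d to 1 − CDF(d), so that smaller distances score higher, is one plausible convention).

```python
import numpy as np

# Hypothetical sample of a metric (e.g., lines of code) within one domain.
rng = np.random.default_rng(seed=1)
observed = rng.lognormal(mean=10.0, sigma=1.0, size=1000)

ideal = np.median(observed)           # a domain-specific "ideal" value (user-definable)
distances = np.abs(observed - ideal)  # rectified distances from the ideal

def score(value: float) -> float:
    """Map a metric value to [0, 1]; closer to the ideal yields a higher score."""
    d = abs(value - ideal)
    # Empirical CDF of distances; 1 - CDF(d) so that smaller distances score higher.
    return 1.0 - np.mean(distances <= d)

print(score(ideal))  # ~1.0: the ideal value itself gets the best score
```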


Usage

You may install Metrics As Scores directly from PyPI. For users that wish to contribute to Metrics As Scores, a development setup is recommended. In either case, after the installation, you have access to the text-based user interface.

```shell
# Installation from PyPI:
pip install metrics-as-scores
```

You can bring up the TUI simply by typing the following after installing or cloning the repo (see next section for more details):

```shell
mas
```

Text-based User Interface (TUI)

Metrics As Scores features a text-based command-line user interface (TUI). It offers a couple of workflows/wizards that help you work and interact with the application. There is no need to modify any source code if you want to do one of the following:

  • Show Installed Datasets
  • Show List of Known Datasets Available Online That Can Be Downloaded
  • Download and install a known or existing dataset
  • Create Own Dataset to be used with Metrics-As-Scores
  • Fit Parametric Distributions for Own Dataset
  • Pre-generate distributions for usage in the Web-Application
  • Bundle Own dataset so it can be published
  • Run local, interactive Web-Application using a selected dataset

Metrics As Scores Text-based User Interface (TUI).

Web Application

Metrics As Scores’ main feature is perhaps the Web Application. It can be run directly and locally from the TUI using a selected dataset (you may download a known dataset or use your own). The Web Application allows you to visually inspect each feature across all the defined groups. It features the PDF/PMF, CDF and CCDF, as well as the PPF for each feature in each group. It offers five different principal types of densities: Parametric, Parametric (discrete), Empirical, Empirical (discrete), and (approximate) Kernel Density Estimation. The Web Application includes a detailed Help section that should answer most of your questions.

Metrics As Scores Interactive Web Application.
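
As a rough illustration of these density types, here is a sketch using NumPy and SciPy (not the web application's code); the sample and evaluation point are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
sample = rng.lognormal(size=300)       # hypothetical feature values of one group

kde = stats.gaussian_kde(sample)       # (approximate) kernel density estimate
pdf_at_1 = kde(1.0)[0]                 # density (PDF) at a point
cdf_at_1 = np.mean(sample <= 1.0)      # empirical CDF at that point
ccdf_at_1 = 1.0 - cdf_at_1             # complementary CDF (CCDF)
ppf_median = np.quantile(sample, 0.5)  # empirical PPF (quantile function)
```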

Development Setup

This project was developed using and requires Python >=3.10. The development documentation can be found at https://mrshoenel.github.io/metrics-as-scores/. Steps:

  1. Clone the Repository,
  2. Set up a virtual environment,
  3. Install packages.

Setting Up a Virtual Environment

It is recommended to use a virtual environment. To set one up, follow these steps (Windows-specific; activating the environment may differ on other platforms).

```shell
virtualenv --python=C:/Python310/python.exe venv # Use specific Python version for virtual environment
venv/Scripts/activate
```

Here is a Linux example that assumes you have Python 3.10 installed (this may also require installing python3.10-venv and/or python3.10-dev):

```shell
python3.10 -m venv venv
source venv/bin/activate # Linux
```

Installing Packages

The project is managed with Poetry. To install the required packages, simply run the following.

```shell
venv/Scripts/activate

# First, update pip:
(venv) C:\metrics-as-scores>python -m pip install --upgrade pip

# Then, install Poetry v1.3.2 using pip:
(venv) C:\metrics-as-scores>pip install poetry==1.3.2

# Install the project and its dependencies:
(venv) C:\metrics-as-scores>poetry install
```

The same in Linux:

```shell
source venv/bin/activate # Linux
(venv) ubuntu@vm:/tmp/metrics-as-scores$ python -m pip install --upgrade pip
(venv) ubuntu@vm:/tmp/metrics-as-scores$ pip install poetry==1.3.2
(venv) ubuntu@vm:/tmp/metrics-as-scores$ poetry install
```

Running Tests

Tests are run using poethepoet:

```shell
# Runs the tests and prints coverage:
(venv) C:\metrics-as-scores>poe test
```

You can also generate coverage reports:

```shell
# Writes reports to the local directory htmlcov:
(venv) C:\metrics-as-scores>poe cov
```


Example Usage

Metrics As Scores can be thought of as an interactive, multiple-ANOVA analysis and explorer. The analysis of variance (ANOVA; Chambers, Freeny, and Heiberger (2017)) is usually used to analyze the differences among hypothesized group means for a single feature. An ANOVA might be used to estimate the goodness-of-fit of a statistical model. Beyond ANOVA, MAS seeks to answer the question of whether a sample of a certain quantity (feature) is more or less common across groups. For each group, we can determine what might constitute a common/ideal value and how distant the sample is from that value. This is expressed in terms of a percentile (a standardized scale of [0, 1]), which we call a score.
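
For illustration, here is a minimal one-way ANOVA across hypothetical groups using SciPy; this is a sketch, not how MAS performs its analysis internally:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(seed=42)
# Hypothetical samples of one feature (e.g., size) for three domains:
games = rng.lognormal(mean=11.0, sigma=1.0, size=200)
sdks = rng.lognormal(mean=9.5, sigma=1.0, size=200)
ides = rng.lognormal(mean=9.7, sigma=1.0, size=200)

stat, pvalue = f_oneway(games, sdks, ides)
print(f"F = {stat:.2f}, p = {pvalue:.3g}")  # a small p suggests the group means differ
```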

Concrete Example Using the Qualitas.class Corpus Dataset

The notebook notebooks/Example-webapp-qcc.ipynb holds a concrete example for using the web application to interactively obtain scores. In this example, we create a hypothetical application that ought to be in the application domain SDK. Using a concrete metric, Number of Packages, we find out that our hypothetical new SDK application scores poorly for what it is intended to be.

This example illustrates the point that software metrics, when captured out of context, are meaningless (Gil and Lalouche 2016). For example, typical values for complexity metrics are vastly different, depending on the type of application. We find that, for example, applications of type SDK have a much lower expected complexity compared to Games (1.9 vs. 3.1) (Hönel et al. 2022). Software metrics are often used in software quality models. However, without knowledge of the application’s context (here: domain), the quality deduced from these models is at least misleading, if not completely off. This becomes apparent if we examine how an application’s complexity scores across certain domains.

Since there are many software metrics that are captured simultaneously, we can also compare domains in their entirety: How many metrics are statistically significantly different from each other? Is there a set of domains that are not distinguishable from each other? Are there metrics that are always different across domains and must be used with care? In this example, we use a known and downloadable dataset (Hönel 2023b). It is based on software metrics and application domains of the “Qualitas.class corpus” (Terra et al. 2013; Tempero et al. 2010).

Concrete Example Using the Iris Dataset

The notebook notebooks/Example-create-own-dataset.ipynb holds a concrete example for creating/importing/using one’s own dataset. Although all necessary steps can be achieved using the TUI, this notebook demonstrates a complete example of implementing this in code.

Diamonds Example

The diamonds dataset (Wickham 2016) holds prices of over 50,000 round-cut diamonds. It contains a number of attributes for each diamond, such as its price, length, depth, or weight. The dataset, however, features three quality attributes: the quality of the cut, the clarity, and the color. Suppose we are interested in examining properties of diamonds of the highest quality only, across colors. Therefore, we select only those diamonds from the dataset that have an ideal cut and the best (IF) clarity. Now only the color quality gives a context to each diamond and its attributes (i.e., diamonds are now grouped by color).

This setup allows us to examine differences across differently colored diamonds. For example, there are considerable differences in price. We find that only the group of diamonds of the best color is significantly different from the other groups. This example is available as a downloadable dataset (Hönel 2023c).
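
The selection described above could be sketched as follows, assuming a hypothetical local CSV export of the ggplot2 diamonds data with columns cut, clarity, color, and price (an illustration, not part of the dataset tooling):

```python
import pandas as pd

df = pd.read_csv("diamonds.csv")  # hypothetical local copy of the dataset

# Keep only diamonds with an ideal cut and the best (IF) clarity:
best = df[(df["cut"] == "Ideal") & (df["clarity"] == "IF")]

# Color is now the only remaining quality context (grouping variable):
print(best.groupby("color")["price"].describe())
```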


Datasets

Metrics As Scores can use existing datasets as well as your own. Please keep reading to learn how.

Use Your Own

Metrics As Scores has a built-in wizard that lets you import your own dataset! There is another wizard that bundles your dataset so that it can be shared with others. You may contribute your dataset so we can add it to the curated list of known datasets (see next section). If you do not have your own dataset, you can use the built-in wizard to download any of the known datasets, too!

Note that Metrics As Scores provides all the tools necessary to create a publishable dataset. For example, it carries out these common statistical tests:

  • ANOVA (Chambers, Freeny, and Heiberger 2017): Analysis of variance of your data across the available groups.
  • Tukey’s Honest Significance Test (TukeyHSD; Tukey (1949)): This test is used to gain insights into the results of an ANOVA test. While the former only allows obtaining the amount of corroboration for the null hypothesis, TukeyHSD performs all pairwise comparisons (for all possible combinations of any two groups).
  • Two-sample t-test: Compares the means of two samples to give an indication of whether or not they appear to come from the same distribution. Again, this is useful for comparing groups (see the sketch after this list).
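
A minimal sketch of these follow-up tests, assuming a recent SciPy that provides scipy.stats.tukey_hsd (hypothetical group samples as in the ANOVA sketch above; not the package's internal code):

```python
import numpy as np
from scipy.stats import tukey_hsd, ttest_ind

rng = np.random.default_rng(seed=42)
games = rng.lognormal(mean=11.0, sigma=1.0, size=200)
sdks = rng.lognormal(mean=9.5, sigma=1.0, size=200)
ides = rng.lognormal(mean=9.7, sigma=1.0, size=200)

# All pairwise comparisons, typically following a significant ANOVA:
print(tukey_hsd(games, sdks, ides))

# Welch's two-sample t-test for one pair of groups:
stat, p = ttest_ind(games, sdks, equal_var=False)
print(f"t = {stat:.2f}, p = {p:.3g}")
```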

It also creates an automatic report based on these tests that you can simply render into a PDF using Quarto.

A publishable dataset must contain parametric fits and pre-generated densities (please check the wizard for these two). Metrics As Scores can fit approximately 120 continuous and discrete random variables using Pymoo (Blank and Deb 2020). Note that Metrics As Scores also automatically carries out a number of goodness-of-fit tests. The type of test depends on the data (for example, not every test is valid for discrete data, such as the KS two-sample test). These tests are then used to select the best-fitting random variable for display in the web application (see the sketch after the following list).

  • Cramér–von Mises (Cramér 1928) and Kolmogorov–Smirnov (Stephens 1974) one-sample tests: After fitting a distribution, the sample is tested against the fitted parametric distribution. Since the fitted distribution cannot usually accommodate all of the sample’s subtleties, the test will indicate whether the fit is acceptable or not.
  • Cramér–von Mises (Anderson 1962), Kolmogorov–Smirnov, and Epps–Singleton (Epps and Singleton 1986) two-sample tests: After fitting, we create a second sample by uniformly sampling from the PPF. Then, both samples can be used in these tests. The Epps–Singleton test is also applicable to discrete distributions.
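
A minimal sketch of this fit-then-test workflow with SciPy, using a plain maximum-likelihood fit of one candidate distribution rather than the package's Pymoo-based fitting (sample and candidate are hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
sample = rng.normal(loc=5.0, scale=2.0, size=500)  # hypothetical feature sample

# Fit one candidate distribution and freeze it with the estimated parameters:
frozen = stats.norm(*stats.norm.fit(sample))

# One-sample tests of the sample against the fitted distribution:
print(stats.kstest(sample, frozen.cdf))          # Kolmogorov–Smirnov
print(stats.cramervonmises(sample, frozen.cdf))  # Cramér–von Mises

# Two-sample tests against a second sample drawn by uniformly sampling the PPF:
second = frozen.ppf(rng.uniform(size=sample.size))
print(stats.ks_2samp(sample, second))
print(stats.epps_singleton_2samp(sample, second))
```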

Known Datasets

A curated list of known, publicly available datasets that can be used with Metrics As Scores is maintained with the project. These datasets can be downloaded using the text-based user interface.


Personalizing the Web Application

The web application Metrics As Scores is located in the directory src/metrics_as_scores/webapp/. The app itself has three vertical blocks: a header, the interactive part, and a footer. Header and footer can be easily edited by modifying the files src/metrics_as_scores/webapp/header.html and src/metrics_as_scores/webapp/footer.html.

Note that when you create your own dataset, you get to add sections to the header and footer using two HTML fragments. This is recommended over modifying the web application directly.

If you want to change the title of the application, you will have to modify the file src/metrics_as_scores/webapp/main.py at the very end:

```python
# Change this line to your desired title.
curdoc().title = "Metrics As Scores"
```

Important: If you modify the web application, you must always maintain two links: one to https://mas.research.hönel.net/ and one to this repository, that is, https://github.com/MrShoenel/metrics-as-scores.

References

Anderson, T. W. 1962. “On the Distribution of the Two-Sample Cramer-von Mises Criterion.” *The Annals of Mathematical Statistics* 33 (3): 1148–59.
Blank, Julian, and Kalyanmoy Deb. 2020. “pymoo: Multi-Objective Optimization in Python.” *IEEE Access* 8: 89497–509.
Chambers, John M., Anne E. Freeny, and Richard M. Heiberger. 2017. “Analysis of Variance; Designed Experiments.” In *Statistical Models in S*, edited by John M. Chambers and Trevor J. Hastie, 1st ed. Routledge.
Cramér, Harald. 1928. “On the Composition of Elementary Errors.” *Scandinavian Actuarial Journal* 1928 (1): 13–74.
Epps, T. W., and Kenneth J. Singleton. 1986. “An Omnibus Test for the Two-Sample Problem Using the Empirical Characteristic Function.” *Journal of Statistical Computation and Simulation* 26 (3-4): 177–203.
Gil, Joseph Yossi, and Gal Lalouche. 2016. “When Do Software Complexity Metrics Mean Nothing? - When Examined Out of Context.” *J. Object Technol.* 15 (1): 2:1–25.
Hönel, Sebastian. 2023a. “Metrics As Scores Dataset: Elisa Spectrophotometer Positive Samples.” Zenodo.
———. 2023b. “Metrics As Scores Dataset: Metrics and Domains From the Qualitas.class Corpus.” Zenodo.
———. 2023c. “Metrics As Scores Dataset: Price, Weight, and Other Properties of Over 1,200 Ideal-Cut and Best-Clarity Diamonds.” Zenodo.
———. 2023d. “Metrics As Scores Dataset: The Iris Flower Data Set.” Zenodo.
Hönel, Sebastian, Morgan Ericsson, Welf Löwe, and Anna Wingkvist. 2022. “Contextual Operationalization of Metrics As Scores: Is My Metric Value Good?” In *22nd IEEE International Conference on Software Quality, Reliability and Security, QRS 2022, Guangzhou, China, December 5–9, 2022*, 333–43. IEEE.
Stephens, M. A. 1974. “EDF Statistics for Goodness of Fit and Some Comparisons.” *Journal of the American Statistical Association* 69 (347): 730–37.
Tempero, Ewan D., Craig Anslow, Jens Dietrich, Ted Han, Jing Li, Markus Lumpe, Hayden Melton, and James Noble. 2010. “The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies.” In *17th Asia Pacific Software Engineering Conference, APSEC 2010, Sydney, Australia, November 30 - December 3, 2010*, edited by Jun Han and Tran Dan Thu, 336–45. IEEE Computer Society.
Terra, Ricardo, Luis Fernando Miranda, Marco Tulio Valente, and Roberto da Silva Bigonha. 2013. “Qualitas.class corpus: a compiled version of the qualitas corpus.” *ACM SIGSOFT Software Engineering Notes* 38 (5): 1–4.
Tukey, John W. 1949. “Comparing Individual Means in the Analysis of Variance.” *Biometrics* 5 (2): 99–114.
Wickham, Hadley. 2016. *ggplot2: Elegant Graphics for Data Analysis*. Springer-Verlag New York.

Owner

  • Name: Sebastian Hönel
  • Login: MrShoenel
  • Kind: user

Ph.D. student at the Linnaeus University Centre for Data Intensive Sciences and Applications (DISA), currently working in the DISTA research group.

JOSS Publication

Metrics As Scores: A Tool- and Analysis Suite and Interactive Application for Exploring Context-Dependent Distributions
Published
August 25, 2023
Volume 8, Issue 88, Page 4913
Authors
Sebastian Hönel
Department of Computer Science and Media Technology, Linnaeus University, Sweden
Morgan Ericsson
Department of Computer Science and Media Technology, Linnaeus University, Sweden
Welf Löwe
Department of Computer Science and Media Technology, Linnaeus University, Sweden
Anna Wingkvist
Department of Computer Science and Media Technology, Linnaeus University, Sweden
Editor
Mikkel Meyer Andersen
Tags
Multiple ANOVA Distribution fitting Inverse sampling Empirical distributions Kernel density estimation

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Hönel
  given-names: Sebastian
  orcid: "https://orcid.org/0000-0001-7937-1645"
- family-names: Ericsson
  given-names: Morgan
  orcid: "https://orcid.org/0000-0003-1173-5187"
- family-names: Löwe
  given-names: Welf
  orcid: "https://orcid.org/0000-0002-7565-3714"
- family-names: Wingkvist
  given-names: Anna
  orcid: "https://orcid.org/0000-0002-0835-823X"
contact:
- family-names: Hönel
  given-names: Sebastian
  orcid: "https://orcid.org/0000-0001-7937-1645"
doi: 10.5281/zenodo.8202326
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Hönel
    given-names: Sebastian
    orcid: "https://orcid.org/0000-0001-7937-1645"
  - family-names: Ericsson
    given-names: Morgan
    orcid: "https://orcid.org/0000-0003-1173-5187"
  - family-names: Löwe
    given-names: Welf
    orcid: "https://orcid.org/0000-0002-7565-3714"
  - family-names: Wingkvist
    given-names: Anna
    orcid: "https://orcid.org/0000-0002-0835-823X"
  date-published: 2023-08-25
  doi: 10.21105/joss.04913
  issn: 2475-9066
  issue: 88
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 4913
  title: "Metrics As Scores: A Tool- and Analysis Suite and Interactive
    Application for Exploring Context-Dependent Distributions"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.04913"
  volume: 8
title: "Metrics As Scores: A Tool- and Analysis Suite and Interactive
  Application for Exploring Context-Dependent Distributions"

GitHub Events

Total
  • Delete event: 2
  • Issue comment event: 1
  • Push event: 1
  • Pull request event: 3
  • Create event: 2
Last Year
  • Delete event: 2
  • Issue comment event: 1
  • Push event: 1
  • Pull request event: 3
  • Create event: 2

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 632
  • Total Committers: 3
  • Avg Commits per committer: 210.667
  • Development Distribution Score (DDS): 0.074
Past Year
  • Commits: 3
  • Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Sebastian Hönel d****t@h****t 585
dependabot[bot] 4****] 26
Sebastian Hönel s****l@l****e 21

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 12
  • Total pull requests: 35
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 12 days
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 6.75
  • Average comments per pull request: 0.77
  • Merged pull requests: 26
  • Bot issues: 0
  • Bot pull requests: 35
Past Year
  • Issues: 0
  • Pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: 29 days
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.4
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 5
Top Authors
Issue Authors
  • mdhaber (11)
  • kostiantyn-kucher (1)
Pull Request Authors
  • dependabot[bot] (52)
Top Labels
Issue Labels
Pull Request Labels
dependencies (52)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 31 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 16
  • Total maintainers: 1
pypi.org: metrics-as-scores

Interactive web application, tool- and analysis suite for approximating, exploring, understanding, and sampling from conditional distributions.

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 31 Last month
Rankings
Dependent packages count: 6.6%
Downloads: 8.4%
Average: 20.3%
Stargazers count: 25.5%
Forks count: 30.5%
Dependent repos count: 30.6%
Maintainers (1)
Last synced: 4 months ago

Dependencies

poetry.lock pypi
  • 156 dependencies
pyproject.toml pypi
  • StrEnum ^0.4.8
  • bokeh ^2.4.3
  • joblib ^1.2.0
  • jupyterlab ^3.4.7
  • matplotlib ^3.6.0
  • nptyping ^2.3.1
  • ptvsd ^4.3.2
  • pymoo ^0.6.0
  • python >=3.10, <3.12
  • scipy ^1.9.1
  • sklearn ^0.0
  • statsmodels ^0.13.2
  • tqdm ^4.64.1
.github/workflows/draft-pdf.yml actions
  • actions/checkout v2 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
.github/workflows/build-doc.yml actions
  • actions/checkout v2.3.4 composite
  • actions/setup-python v4.5.0 composite
  • actions/upload-artifact v1 composite
  • ad-m/github-push-action master composite
.github/workflows/codecov.yml actions
  • actions/checkout v2.3.4 composite
  • actions/setup-python v4.5.0 composite
  • codecov/codecov-action v3 composite
.github/workflows/static.yml actions
  • actions/checkout v3 composite
  • actions/configure-pages v3 composite
  • actions/deploy-pages v1 composite
  • actions/upload-pages-artifact v1 composite