mildsvm

Multiple Instance Learning with Distributions, SVM

https://github.com/skent259/mildsvm

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 20.1%, to scientific vocabulary)

Keywords

distributional-data multiple-instance-learning ordinal r svm weakly-supervised-learning

Last synced: 6 months ago

Repository

Multiple Instance Learning with Distributions, SVM

Basic Info

Host: GitHub
Owner: skent259
License: other
Language: R
Default Branch: main
Homepage:
Size: 886 KB

Statistics

Stars: 3
Watchers: 0
Forks: 0
Open Issues: 18
Releases: 6

Topics

distributional-data multiple-instance-learning ordinal r svm weakly-supervised-learning

Created over 5 years ago · Last pushed 6 months ago

Metadata Files

Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# mildsvm


[![CRAN status](https://www.r-pkg.org/badges/version/mildsvm)](https://CRAN.R-project.org/package=mildsvm)
[![R-CMD-check](https://github.com/skent259/mildsvm/workflows/R-CMD-check/badge.svg)](https://github.com/skent259/mildsvm/actions)
[![Codecov test coverage](https://codecov.io/gh/skent259/mildsvm/branch/master/graph/badge.svg)](https://app.codecov.io/gh/skent259/mildsvm?branch=master)


Weakly supervised (WS), multiple instance (MI) data arise in many applications, such as drug discovery, object detection, and tumor prediction on whole slide images. The `mildsvm` package provides an easy way to learn from such data by training Support Vector Machine (SVM)-based classifiers. It also contains helper functions for building and printing multiple instance data frames.

The `mildsvm` package implements methods that cover a variety of data types, including:

- ordinal and binary labels
- weakly supervised and traditional supervised structures 
- vector-based and distributional-instance rows of data 

A full table of functions with references is available [below](#methods-implemented). We highlight two methods based on recent research: 

- `omisvm()` runs a novel OMI-SVM approach for ordinal, multiple instance (weakly supervised) data using the work of Kent and Yu (2022+).
- `mismm()` runs the MISMM approach for binary, weakly supervised data where the instances can be thought of as a matrix of draws from a distribution. This non-convex SVM approach is formalized and applied to breast cancer diagnosis based on morphological features of the tumor microenvironment in [Kent and Yu (2022)][p2].

## Usage

A typical MI data frame (a `mi_df`) with ordinal labels might look like this, with multiple rows of information for each of the `bag_name`s involved and a label that matches each bag: 

```{r ordmvnorm}
library(mildsvm)
data("ordmvnorm")

print(ordmvnorm)
# dplyr::distinct(ordmvnorm, bag_label, bag_name)
```
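To make the layout concrete, here is a minimal base R sketch of a toy data frame with the same shape: several rows per bag and one label shared by every row of a bag. The data values and the name `toy` are made up for illustration, and the sketch does not use the package's `mi_df` class.

```r
# Toy MI data in base R: three bags, each with several instance rows.
# Column names mirror the README (bag_label, bag_name); values are invented.
toy <- data.frame(
  bag_label = c(1, 1, 1, 2, 2, 3, 3, 3),
  bag_name  = c("bag1", "bag1", "bag1", "bag2", "bag2", "bag3", "bag3", "bag3"),
  V1        = c(0.1, -0.4, 0.7, 1.2, 0.9, 2.3, 1.8, 2.9),
  V2        = c(1.0, 0.3, -0.2, 0.8, 1.5, 2.1, 2.6, 1.9)
)

# Each bag should carry exactly one label, repeated on all of its rows.
labels_per_bag <- tapply(toy$bag_label, toy$bag_name,
                         function(x) length(unique(x)))
stopifnot(all(labels_per_bag == 1))

# Bag-level view: one row per bag with its label.
bag_view <- unique(toy[, c("bag_label", "bag_name")])
print(bag_view)

# Number of instance rows in each bag.
rows_per_bag <- table(toy$bag_name)
print(rows_per_bag)
```

The `unique()` call plays the same role as the commented-out `dplyr::distinct()` line above: it collapses the instance rows to one row per bag.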


The `mildsvm` package uses the formula and predict methods that R users will be familiar with. To indicate that MI data is involved, we specify the unique bag label and bag name with `mi(bag_label, bag_name) ~ predictors`:

```{r ord-example}
fit <- omisvm(mi(bag_label, bag_name) ~ V1 + V2 + V3,
              data = ordmvnorm, 
              weights = NULL)
print(fit)
predict(fit, new_data = ordmvnorm)
```

Or, if the data frame has the `mi_df` class, we can directly pass it to the function and all features will be included:

```{r ord-example-2}
fit2 <- omisvm(ordmvnorm)
print(fit2)
```


## Installation

You can install the released version of mildsvm from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("mildsvm")
```

Alternatively, you can install the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("skent259/mildsvm")
```

## Additional Usage

`mildsvm` also works well with MI data that has distributional instances (MILD data). There is a 3-level structure with *bags*, *instances*, and *samples*. As in MIL, *instances* are contained within *bags* (where we only observe the bag label). However, for MILD, each instance represents a distribution, and the *samples* are drawn from this distribution.
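The nesting of bags, instances, and samples can be sketched in base R as a data frame with one row per sample. This is only an illustration of the structure: the names `layout` and `toy_mild` and the values are hypothetical, and the package's own `mild_df` class is not used here.

```r
set.seed(1)

# Hypothetical layout: 2 bags, 2 instances per bag, 3 samples per instance.
layout <- expand.grid(sample = 1:3, instance = 1:2, bag = 1:2)

toy_mild <- data.frame(
  bag_label     = ifelse(layout$bag == 2, 1, 0),   # label is observed per bag
  bag_name      = paste0("bag", layout$bag),
  instance_name = paste0("bag", layout$bag, "inst", layout$instance),
  X1            = rnorm(nrow(layout))              # one row per sample draw
)

# Level 2: how many distinct instances sit inside each bag.
n_inst <- tapply(toy_mild$instance_name, toy_mild$bag_name,
                 function(x) length(unique(x)))
print(n_inst)

# Level 3: how many sample rows were drawn for each instance.
n_samp <- table(toy_mild$instance_name)
print(n_samp)
```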

You can generate MILD data with `generate_mild_df()`:

```{r generate_mild_df}
# Normal(mean=0, sd=1) vs Normal(mean=3, sd=1)
set.seed(4)
mild_df <- generate_mild_df(
  ncov = 1, nimp_pos = 1, nimp_neg = 1, 
  positive_dist = "mvnormal", positive_mean = 3,
  negative_dist = "mvnormal", negative_mean = 0, 
  nbag = 4,
  ninst = 2, 
  nsample = 2
)
print(mild_df)
```

You can train a MISMM classifier using `mismm()` on the MILD data with the `mild()` formula specification:

```{r message = FALSE}
fit3 <- mismm(mild(bag_label, bag_name, instance_name) ~ X1, data = mild_df, cost = 100)

# summarize predictions at the bag layer
library(dplyr)
mild_df %>% 
  dplyr::bind_cols(predict(fit3, mild_df, type = "raw")) %>% 
  dplyr::bind_cols(predict(fit3, mild_df, type = "class")) %>% 
  dplyr::distinct(bag_label, bag_name, .pred, .pred_class)
```

If you summarize a MILD data set (for example, by taking the mean of each covariate), you can recover a MIL data set.  Use `summarize_samples()` for this:

```{r summarize_samples}
mil_df <- summarize_samples(mild_df, .fns = list(mean = mean)) 
print(mil_df)
```
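Conceptually, this summary collapses the sample rows of each instance into a single row. A rough base R equivalent, shown on made-up data and not claiming to match how `summarize_samples()` is implemented or how it names its output columns, uses `aggregate()`:

```r
# Toy MILD-style data: 2 bags x 2 instances x 2 samples (values invented).
toy_mild <- data.frame(
  bag_label     = c(0, 0, 0, 0, 1, 1, 1, 1),
  bag_name      = rep(c("bag1", "bag2"), each = 4),
  instance_name = rep(c("bag1inst1", "bag1inst2", "bag2inst1", "bag2inst2"),
                      each = 2),
  X1            = c(0, 2, 1, 3, 4, 6, 5, 7)
)

# Average the samples within each instance, keeping the bag identifiers.
toy_mil <- aggregate(X1 ~ bag_label + bag_name + instance_name,
                     data = toy_mild, FUN = mean)
print(toy_mil)
```

The result has one row per instance, which is the shape of MIL data: instances nested in bags, with no sample level.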

You can train an MI-SVM classifier using `misvm()` on MIL data with the helper function `mi()`:

```{r, message = FALSE, warning=FALSE}
fit4 <- misvm(mi(bag_label, bag_name) ~ mean, data = mil_df, cost = 100)

print(fit4)
```




### Methods implemented

| Function        | Method           | Outcome/label | Data type             | Extra libraries | Reference |
|-----------------|------------------|---------------|-----------------------|-----------------|-----------|
| `omisvm()`      | `"qp-heuristic"` | ordinal       | MI                    | gurobi          | [1]       |
| `mismm()`       | `"heuristic"`    | binary        | distributional MI     | ---             | [2]       |
| `mismm()`       | `"mip"`          | binary        | distributional MI     | gurobi          | [2]       |
| `mismm()`       | `"qp-heuristic"` | binary        | distributional MI     | gurobi          | [2]       |
| `misvm()`       | `"heuristic"`    | binary        | MI                    | ---             | [3]       |
| `misvm()`       | `"mip"`          | binary        | MI                    | gurobi          | [3], [2]  |
| `misvm()`       | `"qp-heuristic"` | binary        | MI                    | gurobi          | [3]       |
| `mior()`        | `"qp-heuristic"` | ordinal       | MI                    | gurobi          | [4]       |
| `misvm_orova()` | `"heuristic"`    | ordinal       | MI                    | ---             | [3], [1]  |
| `misvm_orova()` | `"mip"`          | ordinal       | MI                    | gurobi          | [3], [1]  |
| `misvm_orova()` | `"qp-heuristic"` | ordinal       | MI                    | gurobi          | [3], [1]  |
| `svor_exc()`    | `"smo"`          | ordinal       | vector                | ---             | [5]       |
| `smm()`         | ---              | binary        | distributional vector | ---             | [6]       |

#### Table acronyms

- MI: multiple instance
- SVM: support vector machine
- SMM: support measure machine
- OR: ordinal regression
- OVA: one-vs-all
- MIP: mixed integer programming
- QP: quadratic programming
- SVOR: support vector ordinal regression
- EXC: explicit constraints
- SMO: sequential minimal optimization
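
Because several methods in the table depend on the optional `gurobi` package, a script may want to fall back to the `"heuristic"` method when it is not installed. The helper below is a hypothetical sketch (`choose_method()` is not part of `mildsvm`), and it assumes the Method column corresponds to a `method` argument of functions such as `misvm()` and `mismm()`, which offer both options.

```r
# Hypothetical helper: prefer the gurobi-backed method when gurobi is
# installed, otherwise use the method that needs no extra library.
choose_method <- function(has_gurobi = requireNamespace("gurobi", quietly = TRUE)) {
  if (has_gurobi) "qp-heuristic" else "heuristic"
}

method <- choose_method()
print(method)

# Assumed usage (not run here):
# fit <- misvm(mi(bag_label, bag_name) ~ mean, data = mil_df, method = method)
```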

### References 

[1] Kent, S., & Yu, M. (2022+). Ordinal multiple instance support vector machines. *In prep.*

[2] [Kent, S., & Yu, M. (2022)][p2]. Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment. *arXiv preprint arXiv:2206.14704.*

[3] Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. *Advances in neural information processing systems, 15.*

[4] Xiao, Y., Liu, B., & Hao, Z. (2017). Multiple-instance ordinal regression. *IEEE Transactions on Neural Networks and Learning Systems*, *29*(9), 4398-4413.

[5] Chu, W., & Keerthi, S. S. (2007). Support vector ordinal regression. *Neural computation*, *19*(3), 792-815.

[6] Muandet, K., Fukumizu, K., Dinuzzo, F., & Schölkopf, B. (2012). Learning from distributions via support measure machines. *Advances in neural information processing systems*, *25*.


[p2]: https://arxiv.org/abs/2206.14704

Owner

Name: Sean Kent
Login: skent259
Kind: user

Website: http://pages.stat.wisc.edu/~kent/
Repositories: 6
Profile: https://github.com/skent259

Ph.D. student in Statistics at UW-Madison

GitHub Events

Total

Issues event: 1
Issue comment event: 4
Push event: 6
Pull request review event: 2
Pull request review comment event: 2
Pull request event: 2

Last Year

Issues event: 1
Issue comment event: 4
Push event: 6
Pull request review event: 2
Pull request review comment event: 2
Pull request event: 2

Committers

Last synced: almost 3 years ago

All Time

Total Commits: 162
Total Committers: 3
Avg Commits per committer: 54.0
Development Distribution Score (DDS): 0.179

Top Committers

Name	Email	Commits
Sean Kent	s**9@g**m	133
Sean Kent	4**9@u**m	23
Sean Kent	s**t@s**n	6

Committer Domains (Top 20 + Academic)

seans-mbp.lan: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 49
Total pull requests: 26
Average time to close issues: 5 months
Average time to close pull requests: about 1 hour
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.82
Average comments per pull request: 0.12
Merged pull requests: 26
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 1
Average time to close issues: 22 days
Average time to close pull requests: about 2 hours
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 1.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

skent259 (49)

Pull Request Authors

skent259 (26)

Top Labels

Issue Labels

enhancement (21) bug (5) documentation (3) question (1)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 239 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2
Total maintainers: 1

cran.r-project.org: mildsvm

Multiple-Instance Learning with Support Vector Machines

Homepage: https://github.com/skent259/mildsvm
Documentation: http://cran.r-project.org/web/packages/mildsvm/mildsvm.pdf
License: MIT + file LICENSE
Latest release: 0.4.1
published 6 months ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 239 Last month

Rankings

Stargazers count: 28.5%

Forks count: 28.8%

Dependent packages count: 29.8%

Dependent repos count: 35.5%

Average: 41.2%

Downloads: 83.5%

Maintainers (1)

skent259@gmail.com

Last synced: 6 months ago

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
dplyr * imports
e1071 * imports
kernlab * imports
magrittr * imports
mvtnorm * imports
pROC * imports
pillar * imports
purrr * imports
rlang * imports
stats * imports
tibble * imports
tidyr * imports
utils * imports
Matrix * suggests
covr * suggests
gurobi * suggests
testthat * suggests

.github/workflows/R-CMD-check.yaml actions

actions/checkout v4 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v1 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/r.yml actions

actions/checkout v3 composite
r-lib/actions/setup-r f57f1301a053485946083d7a45022b278929a78a composite

.github/workflows/test-coverage.yaml actions

actions/checkout v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite
