Keywords
distributional-data
multiple-instance-learning
ordinal
r
svm
weakly-supervised-learning
Repository
Multiple Instance Learning with Distributions, SVM
Basic Info
Statistics
- Stars: 3
- Watchers: 0
- Forks: 0
- Open Issues: 18
- Releases: 6
Created over 5 years ago
· Last pushed 6 months ago
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# mildsvm
[CRAN package page](https://CRAN.R-project.org/package=mildsvm)
[GitHub Actions](https://github.com/skent259/mildsvm/actions)
[Codecov coverage](https://app.codecov.io/gh/skent259/mildsvm?branch=master)
Weakly supervised (WS), multiple instance (MI) data arise in many applications, such as drug discovery, object detection, and tumor prediction on whole slide images. The `mildsvm` package provides an easy way to learn from these data by training support vector machine (SVM)-based classifiers. It also contains helper functions for building and printing multiple instance data frames.
The `mildsvm` package implements methods that cover a variety of data types, including:
- ordinal and binary labels
- weakly supervised and traditional supervised structures
- vector-based and distributional-instance rows of data
A full table of functions with references is available [below](#methods-implemented). We highlight two methods based on recent research:
- `omisvm()` runs a novel OMI-SVM approach for ordinal, multiple instance (weakly supervised) data, based on the work of Kent and Yu (2022+).
- `mismm()` runs the MISMM approach for binary, weakly supervised data where each instance can be thought of as a matrix of draws from a distribution. This non-convex SVM approach is formalized and applied to breast cancer diagnosis based on morphological features of the tumor microenvironment in [Kent and Yu (2022)][p2].
## Usage
A typical MI data frame (a `mi_df`) with ordinal labels might look like this, with multiple rows of information for each of the `bag_name`s involved and a label that matches each bag:
```{r ordmvnorm}
library(mildsvm)
data("ordmvnorm")
print(ordmvnorm)
# dplyr::distinct(ordmvnorm, bag_label, bag_name)
```
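To make the layout concrete without relying on the packaged data set, here is a minimal base R sketch of the same idea. The data frame `toy` and its values are hypothetical, and it is a plain `data.frame` rather than an `mi_df`. It only illustrates that each bag contributes several rows and that every row in a bag shares one label.

```r
# Hypothetical toy data: 3 bags, 2 instances (rows) per bag, two features.
toy <- data.frame(
  bag_label = c(0, 0, 1, 1, 2, 2),
  bag_name  = c("bag1", "bag1", "bag2", "bag2", "bag3", "bag3"),
  V1        = c(0.1, -0.3, 1.2, 0.8, 2.5, 1.9),
  V2        = c(0.4, 0.0, 0.9, 1.5, 2.2, 2.8)
)

# One row per bag: the label is shared by every instance in the bag.
bags <- unique(toy[, c("bag_label", "bag_name")])
print(bags)

# Sanity check: each bag carries exactly one distinct label.
labels_per_bag <- tapply(
  toy$bag_label, toy$bag_name,
  function(x) length(unique(x))
)
stopifnot(all(labels_per_bag == 1))
```

The `unique()` call plays the same role as the commented-out `dplyr::distinct()` line in the chunk above.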
The `mildsvm` package uses the formula and predict methods that R users will be familiar with. To indicate that MI data is involved, we specify the bag label and bag name with `mi(bag_label, bag_name) ~ predictors`:
```{r ord-example}
fit <- omisvm(mi(bag_label, bag_name) ~ V1 + V2 + V3,
              data = ordmvnorm,
              weights = NULL)
print(fit)
predict(fit, new_data = ordmvnorm)
```
Or, if the data frame has the `mi_df` class, we can directly pass it to the function and all features will be included:
```{r ord-example-2}
fit2 <- omisvm(ordmvnorm)
print(fit2)
```
## Installation
You can install the released version of mildsvm from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("mildsvm")
```
Alternatively, you can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("skent259/mildsvm")
```
## Additional Usage
`mildsvm` also works well with MI data that has distributional instances, a setting known as multiple instance learning with distributions (MILD). There is a 3-level structure with *bags*, *instances*, and *samples*. As in multiple instance learning (MIL), *instances* are contained within *bags* (where we only observe the bag label). However, for MILD, each instance represents a distribution, and the *samples* are drawn from this distribution.
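The three levels can be sketched in base R as a plain data frame with one row per sample. Everything here is an illustrative assumption (the names `toy_mild` and `grid`, the alternating labels, and the standard normal draws); it is not how the package builds its own `mild_df` objects.

```r
set.seed(1)

# Hypothetical sizes: 4 bags, 2 instances per bag, 2 samples per instance.
nbag <- 4
ninst <- 2
nsample <- 2

# One row per (bag, instance, sample) combination.
grid <- expand.grid(
  sample = seq_len(nsample),
  inst   = seq_len(ninst),
  bag    = seq_len(nbag)
)

toy_mild <- data.frame(
  bag_label     = grid$bag %% 2,                             # placeholder labels, one per bag
  bag_name      = paste0("bag", grid$bag),
  instance_name = paste0("bag", grid$bag, "inst", grid$inst),
  X1            = rnorm(nrow(grid))                          # placeholder sample draws
)

# Rows are samples; instances group samples; bags group instances.
nrow(toy_mild)                       # number of samples
length(unique(toy_mild$instance_name))  # number of instances
length(unique(toy_mild$bag_name))       # number of bags
```

Only `bag_label` would be observed as the outcome in this setting; the instance and sample levels describe how the covariate rows are grouped.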
You can generate MILD data with `generate_mild_df()`:
```{r generate_mild_df}
# Normal(mean=0, sd=1) vs Normal(mean=3, sd=1)
set.seed(4)
mild_df <- generate_mild_df(
  ncov = 1, nimp_pos = 1, nimp_neg = 1,
  positive_dist = "mvnormal", positive_mean = 3,
  negative_dist = "mvnormal", negative_mean = 0,
  nbag = 4,
  ninst = 2,
  nsample = 2
)
print(mild_df)
```
You can train a MISMM classifier using `mismm()` on the MILD data with the `mild()` formula specification:
```{r message = FALSE}
fit3 <- mismm(mild(bag_label, bag_name, instance_name) ~ X1, data = mild_df, cost = 100)
# summarize predictions at the bag layer
library(dplyr)
mild_df %>%
  dplyr::bind_cols(predict(fit3, mild_df, type = "raw")) %>%
  dplyr::bind_cols(predict(fit3, mild_df, type = "class")) %>%
  dplyr::distinct(bag_label, bag_name, .pred, .pred_class)
```
If you summarize a MILD data set (for example, by taking the mean of each covariate), you can recover a MIL data set. Use `summarize_samples()` for this:
```{r summarize_samples}
mil_df <- summarize_samples(mild_df, .fns = list(mean = mean))
print(mil_df)
```
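The same collapsing step can be sketched with base R's `aggregate()`. This is only an illustration of the idea on hypothetical data (`toy_mild` and its values are made up), not the implementation of `summarize_samples()`, and the output column keeps the name `X1` rather than `mean`.

```r
# Hypothetical MILD-style data: 2 bags, 2 instances per bag, 2 samples each.
toy_mild <- data.frame(
  bag_label     = c(0, 0, 0, 0, 1, 1, 1, 1),
  bag_name      = rep(c("bag1", "bag2"), each = 4),
  instance_name = rep(c("bag1inst1", "bag1inst2", "bag2inst1", "bag2inst2"),
                      each = 2),
  X1            = c(0, 2, 1, 3, 2, 4, 5, 7)
)

# Collapse samples to one row per instance by averaging the covariate.
toy_mil <- aggregate(
  X1 ~ bag_label + bag_name + instance_name,
  data = toy_mild,
  FUN = mean
)
print(toy_mil)
```

After this step each instance is a single vector of summary statistics, which is the vector-based MI structure that `misvm()` expects.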
You can train an MI-SVM classifier using `misvm()` on MIL data with the helper function `mi()`:
```{r, message = FALSE, warning=FALSE}
fit4 <- misvm(mi(bag_label, bag_name) ~ mean, data = mil_df, cost = 100)
print(fit4)
```
### Methods implemented
| Function | Method | Outcome/label | Data type | Extra libraries | Reference |
|-----------------|------------------|---------------|-----------------------|-----------------|-----------|
| `omisvm()` | `"qp-heuristic"` | ordinal | MI | gurobi | [1] |
| `mismm()` | `"heuristic"` | binary | distributional MI | --- | [2] |
| `mismm()` | `"mip"` | binary | distributional MI | gurobi | [2] |
| `mismm()` | `"qp-heuristic"` | binary | distributional MI | gurobi | [2] |
| `misvm()` | `"heuristic"` | binary | MI | --- | [3] |
| `misvm()` | `"mip"` | binary | MI | gurobi | [3], [2] |
| `misvm()` | `"qp-heuristic"` | binary | MI | gurobi | [3] |
| `mior()` | `"qp-heuristic"` | ordinal | MI | gurobi | [4] |
| `misvm_orova()` | `"heuristic"` | ordinal | MI | --- | [3], [1] |
| `misvm_orova()` | `"mip"` | ordinal | MI | gurobi | [3], [1] |
| `misvm_orova()` | `"qp-heuristic"` | ordinal | MI | gurobi | [3], [1] |
| `svor_exc()` | `"smo"` | ordinal | vector | --- | [5] |
| `smm()` | --- | binary | distributional vector | --- | [6] |
#### Table acronyms
- MI: multiple instance
- SVM: support vector machine
- SMM: support measure machine
- OR: ordinal regression
- OVA: one-vs-all
- MIP: mixed integer programming
- QP: quadratic programming
- SVOR: support vector ordinal regression
- EXC: explicit constraints
- SMO: sequential minimal optimization
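Because several methods in the table above list gurobi as an extra library, it can be useful to check for it before choosing a method. The sketch below is one possible pattern, not a package feature. The commented-out call assumes that `misvm()` accepts the method string through an argument named `method`, which the table suggests but this README does not show.

```r
# Check whether the optional gurobi package is installed.
has_gurobi <- requireNamespace("gurobi", quietly = TRUE)

# Fall back to the method that needs no extra libraries.
method <- if (has_gurobi) "qp-heuristic" else "heuristic"
message("Using method: ", method)

# Assumed usage (argument name `method` is an assumption):
# fit <- misvm(mi(bag_label, bag_name) ~ mean, data = mil_df,
#              cost = 100, method = method)
```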
### References
[1] Kent, S., & Yu, M. (2022+). Ordinal multiple instance support vector machines. *In prep.*
[2] [Kent, S., & Yu, M. (2022)][p2]. Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment. *arXiv preprint arXiv:2206.14704.*
[3] Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. *Advances in Neural Information Processing Systems*, *15*.
[4] Xiao, Y., Liu, B., & Hao, Z. (2017). Multiple-instance ordinal regression. *IEEE Transactions on Neural Networks and Learning Systems*, *29*(9), 4398-4413.
[5] Chu, W., & Keerthi, S. S. (2007). Support vector ordinal regression. *Neural Computation*, *19*(3), 792-815.
[6] Muandet, K., Fukumizu, K., Dinuzzo, F., & Schölkopf, B. (2012). Learning from distributions via support measure machines. *Advances in Neural Information Processing Systems*, *25*.
[p2]: https://arxiv.org/abs/2206.14704
Owner
- Name: Sean Kent
- Login: skent259
- Kind: user
- Website: http://pages.stat.wisc.edu/~kent/
- Repositories: 6
- Profile: https://github.com/skent259
Ph.D. student in Statistics at UW-Madison
GitHub Events
Total
- Issues event: 1
- Issue comment event: 4
- Push event: 6
- Pull request review event: 2
- Pull request review comment event: 2
- Pull request event: 2
Last Year
- Issues event: 1
- Issue comment event: 4
- Push event: 6
- Pull request review event: 2
- Pull request review comment event: 2
- Pull request event: 2
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 162
- Total Committers: 3
- Avg Commits per committer: 54.0
- Development Distribution Score (DDS): 0.179
Top Committers
| Name | Email | Commits |
|---|---|---|
| Sean Kent | s****9@g****m | 133 |
| Sean Kent | 4****9@u****m | 23 |
| Sean Kent | s****t@s****n | 6 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 49
- Total pull requests: 26
- Average time to close issues: 5 months
- Average time to close pull requests: about 1 hour
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.82
- Average comments per pull request: 0.12
- Merged pull requests: 26
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: 22 days
- Average time to close pull requests: about 2 hours
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- skent259 (49)
Pull Request Authors
- skent259 (26)
Top Labels
Issue Labels
enhancement (21)
bug (5)
documentation (3)
question (1)
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - cran: 239 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
cran.r-project.org: mildsvm
Multiple-Instance Learning with Support Vector Machines
- Homepage: https://github.com/skent259/mildsvm
- Documentation: http://cran.r-project.org/web/packages/mildsvm/mildsvm.pdf
- License: MIT + file LICENSE
- Latest release: 0.4.1 (published 6 months ago)
Rankings
Stargazers count: 28.5%
Forks count: 28.8%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Average: 41.2%
Downloads: 83.5%
Maintainers (1)
Last synced: 6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.5.0 depends
- dplyr * imports
- e1071 * imports
- kernlab * imports
- magrittr * imports
- mvtnorm * imports
- pROC * imports
- pillar * imports
- purrr * imports
- rlang * imports
- stats * imports
- tibble * imports
- tidyr * imports
- utils * imports
- Matrix * suggests
- covr * suggests
- gurobi * suggests
- testthat * suggests
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v4 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v1 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/r.yml
actions
- actions/checkout v3 composite
- r-lib/actions/setup-r f57f1301a053485946083d7a45022b278929a78a composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite