Repository

Basic Info
Statistics
  • Stars: 2
  • Watchers: 2
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created about 9 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Codemeta

README.Rmd

---
output: github_document
---



```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# predictSource


The package predictSource provides functions to verify that data can be used to separate sources of samples, to predict the sources of additional samples, and to create plots that evaluate the validity of the predictions.  Data can be both quantitative and qualitative.

A proposed analysis strategy is to use random forests to evaluate whether the data can separate the sources and, if there are many predictors, to identify the most important ones; use a classification tree to understand how the data are used to separate sources; use random forests to predict the sources of unknown samples; and then evaluate the validity of the predictions by plotting the first two principal components of the unknowns with the convex hulls of the known sources.  The random forest analysis also produces the probabilities of assignment to each source for each sample; this can be helpful in identifying unknowns that are difficult to classify.

The package also contains functions for exploratory data analysis (descriptive statistics, 2- and 3-dimensional plots [the latter can be rotated], tests for 1- and 2-dimensional Gaussian distributions [helpful in identifying outliers]) and multivariate analysis (principal components). A detailed vignette provides examples for the use of each function (using obsidian data in the examples) and some background for classification trees, random forests, and checking for Gaussian distributions.
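
The classification tree step of the strategy described above can be illustrated without this package. The following is a minimal sketch that calls `rpart` directly (one of the package's listed dependencies); it is not the package's own interface. As an assumption made purely for illustration, the built-in `iris` data stand in for source samples: species play the role of sources and the four measurements play the role of element concentrations.

```r
# Stand-in data (assumption): iris species act as known sources and the
# four measurements act as the analytic variables.
library(rpart)

# Fit a classification tree that predicts the source from all variables.
tree_fit <- rpart(Species ~ ., data = iris, method = "class")

# Printing the fitted tree lists each split (variable and threshold),
# which shows how the variables are used to separate the sources.
print(tree_fit)

# Resubstitution predictions and a known-by-predicted table.
tree_pred <- predict(tree_fit, newdata = iris, type = "class")
confusion <- table(known = iris$Species, predicted = tree_pred)
print(confusion)

# Proportion of samples assigned to their known source.
accuracy <- sum(diag(confusion)) / sum(confusion)
```

Because the tree is evaluated on the same data used to fit it, the table is optimistic; it is meant for understanding the splits, not for estimating the error rate.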

The motivation for the package was predicting the sources of obsidian artifacts. Archaeologists and geochemists usually do this using 2- and 3-dimensional scatterplots. The functions in this package make predictions much faster than can be done with scatterplots; the principal components graphic identifies objects that are or may be misclassified. Archaeological knowledge should also be used in making predictions.

The figure below shows principal components plots using data sets with the composition of five elements from five obsidian sources and the predicted sources of 91 artifacts, with predictions made from scatterplots (see the vignette for information about these data sets). The left-hand plot shows the convex hulls of the first two principal components from the source data.  The second plot shows the locations of artifacts that are outside of their respective predicted source convex hulls.  That plot clearly identifies one misclassified artifact (predicted to be from source D but inside the convex hull for source C); the remaining artifacts appear to be correctly classified.  For these data, the random forests predictions appear to be correct for all of the artifacts.

```{r  results='hide', message=FALSE, fig.keep=3, fig.cap="Principal components plot with Jemez obsidian source convex hulls and obsidian artifacts with points outside the convex hull labeled with source predictions based on scatterplots."}
library(predictSource)
data(ObsidianSources)
data(ObsidianArtifacts)
analyticVars <- c("Rb", "Sr", "Y", "Zr", "Nb")
sources <- unique(ObsidianSources[, "Code"])
pcaEval <-
  ps_pcaEvaluation(
    SourceData = ObsidianSources,
    unknownData = ObsidianArtifacts,
    SourceGroup = "Code",
    unknownGroup = "Code",
    known_sources = sources,
    predicted_sources = sources,
    AnalyticVars = analyticVars,
    ID = "ID",
    plotAllPoints = TRUE,
    plotHullsOutsidePoints = TRUE,
    plotOutsidePoints = TRUE
  )

```
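
The idea behind the plot above (principal components of the known sources, convex hulls per source, and a check of which unknowns fall outside the hull of their predicted source) can be sketched in base R. This is an illustration under stated assumptions, not the implementation used by `ps_pcaEvaluation()`: the built-in `iris` data stand in for the obsidian data, odd rows are treated as known sources and even rows as unknowns with a predicted label, and the helper `in_convex_hull()` is a hypothetical name introduced here.

```r
# Stand-in data (assumption): odd iris rows are "known sources", even rows
# are "unknowns" whose Species column plays the role of a predicted source.
known   <- iris[seq(1, nrow(iris), by = 2), ]
unknown <- iris[seq(2, nrow(iris), by = 2), ]
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# Principal components are computed from the known-source data only;
# the unknowns are projected onto the same axes.
pca <- prcomp(known[, vars], center = TRUE, scale. = TRUE)
known_pc   <- pca$x[, 1:2]
unknown_pc <- predict(pca, newdata = unknown[, vars])[, 1:2]

# Hypothetical helper: TRUE if point p lies inside or on the convex polygon
# whose vertices are the rows of `hull`, in the order returned by chull().
in_convex_hull <- function(p, hull, tol = 1e-9) {
  n <- nrow(hull)
  nxt <- c(2:n, 1)
  # Cross product of each edge with the vector from the edge start to p;
  # for a convex polygon all signs agree when p is inside.
  cross <- (hull[nxt, 1] - hull[, 1]) * (p[2] - hull[, 2]) -
    (hull[nxt, 2] - hull[, 2]) * (p[1] - hull[, 1])
  all(cross >= -tol) || all(cross <= tol)
}

# One convex hull (matrix of vertices) per known source.
hulls <- lapply(split(as.data.frame(known_pc), known$Species), function(pc) {
  pc <- as.matrix(pc)
  pc[chull(pc), , drop = FALSE]
})

# For each unknown, test membership in the hull of its predicted source.
inside <- vapply(seq_len(nrow(unknown_pc)), function(i) {
  in_convex_hull(unknown_pc[i, ], hulls[[as.character(unknown$Species[i])]])
}, logical(1))

# Unknowns outside the hull of their predicted source deserve a second look.
outside <- unknown[!inside, ]
```

The hulls can be drawn with `polygon()` on a scatterplot of `known_pc`. A point outside its hull is not necessarily misclassified, since the first two components capture only part of the variation.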

The figure below is from a random forests analysis of the artifacts. The figure contains box plots of the source assignment probabilities for each artifact, excluding the probabilities of assignment to the predicted source.  This plot identifies the artifacts for which assignment is most difficult. Source C is potentially of most concern. The user can create a data frame with information on the artifacts that are most likely to be misclassified. See the vignette for more details.

```{r  results='hide',message=FALSE, fig.cap="Figure 7.3b: Box plots of the estimated probabilities of sources other than the predicted sources for the obsidian artifacts.", fig.keep='last'}
library(predictSource)
data(ObsidianSources)
data(ObsidianArtifacts)
analyticVars <- c("Rb", "Sr", "Y", "Zr", "Nb")
saveRandomForest <-
  ps_randomForest(
    data = ObsidianSources,
    GroupVar = "Code",
    Groups = "All",
    AnalyticVars = analyticVars,
    NvarUsed = 3,
    plotErrorRate = FALSE,
    plotImportance = FALSE,
    predictSources = TRUE,
    predictData = ObsidianArtifacts,
    plotSourceProbs = TRUE
  )

```
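
The quantity shown in the box plots (probabilities of assignment to sources other than the predicted one) can be derived from any matrix of class probabilities. The sketch below uses a small hypothetical matrix with made-up values purely for illustration; in practice such a matrix might come from `predict(..., type = "prob")` on a `randomForest` fit. The object names `probs`, `other`, and `runner_up` are assumptions introduced here, not objects returned by `ps_randomForest()`.

```r
# Hypothetical probability matrix (made-up values): rows are artifacts,
# columns are candidate sources, and each row sums to 1.
probs <- matrix(
  c(0.90, 0.06, 0.04,
    0.10, 0.55, 0.35,
    0.02, 0.03, 0.95,
    0.48, 0.42, 0.10),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("artifact", 1:4), c("A", "B", "C"))
)

# The predicted source is the column with the largest probability.
predicted <- colnames(probs)[max.col(probs, ties.method = "first")]

# Keep only the probabilities for sources other than the predicted one.
other <- do.call(rbind, lapply(seq_len(nrow(probs)), function(i) {
  keep <- colnames(probs) != predicted[i]
  data.frame(
    artifact = rownames(probs)[i],
    predicted = predicted[i],
    other_source = colnames(probs)[keep],
    probability = unname(probs[i, keep])
  )
}))
rownames(other) <- NULL

# Largest competing probability per artifact; high values flag hard cases.
runner_up <- tapply(other$probability, other$artifact, max)
```

Calling `boxplot(probability ~ predicted, data = other)` would then give one box per predicted source. In this toy matrix, the fourth artifact has a competing probability of 0.42, so it would stand out as difficult to classify.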


## Installation

You can install predictSource from GitHub with:

```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("benmarwick/predictSource")
```

Owner

  • Name: Ben Marwick
  • Login: benmarwick
  • Kind: user
  • Location: Seattle
  • Company: University of Washington

CodeMeta (codemeta.json)

{
  "@context": [
    "https://doi.org/10.5063/schema/codemeta-2.0",
    "http://schema.org"
  ],
  "@type": "SoftwareSourceCode",
  "identifier": "predictSource",
  "description": "A package to analyze data used to determine whether samples from multiple sources\n    can be separated, to predict the sources of samples from unknown\n    sources, and to evaluate the validity of those predictions. Sample data can include both\n    quantitative and qualitative data. The package includes functions for creating an analysis file\n    from multiple files and exploratory data analysis, as well as multivariate statistical methods to\n    determine source separation, predict sources of unknown samples, and evaluate the validity of the\n    predictions (principal components, classification trees, and random forests).",
  "name": "predictSource: Compositional Data Analysis of Archaeological Artefacts",
  "license": "https://spdx.org/licenses/MIT",
  "version": "0.1.0",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "R",
    "version": "3.6.2",
    "url": "https://r-project.org"
  },
  "runtimePlatform": "R version 3.6.2 (2019-12-12)",
  "author": [
    {
      "@type": "Person",
      "givenName": "Ben",
      "familyName": "Marwick",
      "email": "benmarwick@gmail.com"
    },
    {
      "@type": "Person",
      "givenName": "John",
      "familyName": "Karon",
      "email": "sierrastew@mindspring.com"
    },
    {
      "@type": "Person",
      "givenName": "Steven",
      "familyName": "Shackley",
      "email": "shackley@berkeley.edu"
    }
  ],
  "contributor": {},
  "copyrightHolder": {},
  "funder": {},
  "maintainer": [
    {
      "@type": "Person",
      "givenName": "Ben",
      "familyName": "Marwick",
      "email": "benmarwick@gmail.com"
    }
  ],
  "softwareSuggestions": [
    {
      "@type": "SoftwareApplication",
      "identifier": "knitr",
      "name": "knitr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=knitr"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rmarkdown",
      "name": "rmarkdown",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rmarkdown"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "testthat",
      "name": "testthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=testthat"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "here",
      "name": "here",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=here"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "kableExtra",
      "name": "kableExtra",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=kableExtra"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "magrittr",
      "name": "magrittr",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=magrittr"
    }
  ],
  "softwareRequirements": [
    {
      "@type": "SoftwareApplication",
      "identifier": "MASS",
      "name": "MASS",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=MASS"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "MVN",
      "name": "MVN",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=MVN"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "mgcv",
      "name": "mgcv",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=mgcv"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "scatterplot3d",
      "name": "scatterplot3d",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=scatterplot3d"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rgl",
      "name": "rgl",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rgl"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "randomForest",
      "name": "randomForest",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=randomForest"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "rpart",
      "name": "rpart",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=rpart"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "partykit",
      "name": "partykit",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=partykit"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "Formula",
      "name": "Formula",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=Formula"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "ellipse",
      "name": "ellipse",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=ellipse"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "devtools",
      "name": "devtools",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=devtools"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "qqtest",
      "name": "qqtest",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=qqtest"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "corrplot",
      "name": "corrplot",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=corrplot"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "missForest",
      "name": "missForest",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=missForest"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "nortest",
      "name": "nortest",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=nortest"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "assertthat",
      "name": "assertthat",
      "provider": {
        "@id": "https://cran.r-project.org",
        "@type": "Organization",
        "name": "Comprehensive R Archive Network (CRAN)",
        "url": "https://cran.r-project.org"
      },
      "sameAs": "https://CRAN.R-project.org/package=assertthat"
    },
    {
      "@type": "SoftwareApplication",
      "identifier": "R",
      "name": "R",
      "version": ">= 2.10"
    }
  ],
  "codeRepository": "https://github.com/benmarwick/predictSource",
  "readme": "https://github.com/benmarwick/predictSource/blob/master/README.md",
  "fileSize": "4311.306KB",
  "contIntegration": "https://travis-ci.org/benmarwick/predictSource"
}

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 297
  • Total Committers: 3
  • Avg Commits per committer: 99.0
  • Development Distribution Score (DDS): 0.135
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
John Karon s****w@m****m 257
Ben Marwick b****k@h****m 27
Ben Marwick b****k@g****m 13

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • benmarwick (1)

Dependencies

DESCRIPTION cran
  • R >= 2.10 depends
  • Formula * imports
  • MASS * imports
  • MVN * imports
  • assertthat * imports
  • corrplot * imports
  • devtools * imports
  • ellipse * imports
  • mgcv * imports
  • missForest * imports
  • nortest * imports
  • partykit * imports
  • qqtest * imports
  • randomForest * imports
  • rgl * imports
  • rpart * imports
  • scatterplot3d * imports
  • here * suggests
  • kableExtra * suggests
  • knitr * suggests
  • magrittr * suggests
  • rmarkdown * suggests
  • testthat * suggests
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite