caspr

CASPR is a deep learning framework applying transformer architecture to learn and predict from tabular data at scale.

https://github.com/microsoft/caspr

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary

Keywords

attention-mechanism business deep-learning tabular-data transformer transformer-architecture transformer-encoder

Keywords from Contributors

large-language-model
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: microsoft
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 2.47 MB
Statistics
  • Stars: 38
  • Watchers: 7
  • Forks: 3
  • Open Issues: 3
  • Releases: 0
Topics
attention-mechanism business deep-learning tabular-data transformer transformer-architecture transformer-encoder
Created over 3 years ago · Last pushed about 3 years ago
Metadata Files
Readme Contributing License Code of conduct Citation Security Support

README.md

CASPR is a transformer-based framework for deep learning from sequential data in tabular format, the kind of data most common in business applications.
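
In "sequential data in tabular format", each row is one event (for example, a transaction) keyed by a subject such as a customer. A sequence model consumes each subject's rows in time order. The sketch below groups rows into fixed-length, time-ordered sequences with a padding mask, a common input layout for a transformer encoder. The column layout, the `build_sequences` helper, and the keep-most-recent truncation policy are illustrative assumptions, not CASPR's actual preprocessing API.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical row layout: (customer_id, timestamp, amount, category).
Row = Tuple[str, int, float, str]


def build_sequences(
    rows: List[Row], max_len: int
) -> Dict[str, Tuple[List[Optional[Row]], List[int]]]:
    """Group rows per subject, sort by time, truncate/pad to max_len.

    Returns {subject: (sequence, mask)}; mask is 1 for real events, 0 for padding.
    """
    grouped: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        grouped[row[0]].append(row)

    sequences = {}
    for subject, events in grouped.items():
        events.sort(key=lambda r: r[1])          # oldest event first
        recent = events[-max_len:]               # keep only the latest max_len events
        pad = max_len - len(recent)
        seq: List[Optional[Row]] = list(recent) + [None] * pad  # right-pad
        mask = [1] * len(recent) + [0] * pad
        sequences[subject] = (seq, mask)
    return sequences


rows = [
    ("c1", 3, 20.0, "grocery"),
    ("c2", 1, 5.5, "coffee"),
    ("c1", 1, 12.0, "coffee"),
    ("c1", 2, 99.0, "electronics"),
]
seqs = build_sequences(rows, max_len=2)
```

Here customer `c1` keeps its two most recent events, and `c2` gets one real event plus one padded slot.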

Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection, or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering, however, adds development, operationalization, and maintenance costs over time. Recent advances in representation learning present an opportunity to simplify and generalize feature engineering across applications. With **CASPR** we propose a novel approach to encode sequential data in tabular format (e.g., customer transactions, purchase history, and other interactions) into a generic representation of a subject's (e.g., a customer's) association with the business. We evaluate these embeddings as features to train multiple models spanning a variety of applications (see the [paper](https://arxiv.org/abs/2211.09174)). CASPR (Customer Activity Sequence-based Prediction and Representation) applies transformer architecture to encode activity sequences to improve model performance and avoid bespoke feature engineering across applications. Our experiments at scale validate CASPR for both small and large enterprise applications.
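
The paragraph states that CASPR applies transformer architecture to encode an activity sequence into a generic representation. The pure-Python sketch below shows only the core mechanism behind that idea: one head of scaled dot-product self-attention over a subject's event vectors, followed by masked mean pooling into a single fixed-size vector that a downstream model could use as features. The random weights, the single head, and mean pooling are assumptions chosen for illustration, not CASPR's architecture or API.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax; -inf entries receive weight 0."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def encode_sequence(x: Matrix, mask: List[int], wq: Matrix, wk: Matrix, wv: Matrix) -> List[float]:
    """Single-head self-attention over event vectors x (seq_len x d_in),
    then masked mean pooling into one subject embedding."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(k[0]))
    attended = []
    for qi in q:
        # Padded keys get -inf so softmax gives them zero attention weight.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / scale if mj else -math.inf
            for kj, mj in zip(k, mask)
        ]
        weights = softmax(scores)
        attended.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    # Average only over real (unpadded) positions.
    real = [row for row, mj in zip(attended, mask) if mj]
    return [sum(col) / len(real) for col in zip(*real)]


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d_in, d_model = 3, 2
wq, wk, wv = (rand_matrix(d_in, d_model) for _ in range(3))
events = [[1.0, 0.0, 0.5], [0.2, 1.0, 0.0], [0.0, 0.0, 0.0]]  # last row is padding
embedding = encode_sequence(events, [1, 1, 0], wq, wk, wv)
```

Because padded positions are excluded from both attention and pooling, subjects with different history lengths map to embeddings of the same size, which is what lets them serve as generic features.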

Getting Started & Resources

  • CASPR: Customer Activity Sequence-based Prediction and Representation (NeurIPS 2022, New Orleans: Tabular Representation Learning)

  • Build

    • Prerequisites: Python 3.9, setuptools
    • Build the wheel: python setup.py build bdist_wheel
  • Installation

    Now:

    ```
    pip install .\dist\AI.Models.CASPR-.whl[]
    ```

    Future:

    ```
    pip install AI.Models.CASPR[]
    ```

    Use any of the modifiers below, inside the brackets, to customize the installation for the target system or use case:

      • horovod: distributed training and inference on Horovod
      • databricks: distributed training and inference on Databricks
      • aml: (distributed) training and inference on Azure ML
      • hdi: execution on Azure HDInsight
      • xai: enable explainability
      • test: extended test execution
      • dev: development purposes only

  • Examples

(TODO: can we point to a well commented one of our examples w/ or w/o data?)
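
The installation modifiers listed under Installation are pip "extras": they go in square brackets after the package name or wheel path, comma-separated when combining several. A small helper that builds such a spec and rejects names missing from the README's list might look like the sketch below. `install_spec`, `pip_command`, and the validation step are hypothetical conveniences, not part of CASPR, and the name-only form corresponds to the README's "future" command.

```python
import sys
from typing import Iterable, List

# Modifiers (pip "extras") listed in the README's Installation section.
KNOWN_MODIFIERS = {"horovod", "databricks", "aml", "hdi", "xai", "test", "dev"}


def install_spec(target: str, modifiers: Iterable[str]) -> str:
    """Append extras to a package name or wheel path: target[m1,m2].

    Hypothetical helper: rejects names not in KNOWN_MODIFIERS to catch typos.
    """
    chosen = sorted(set(modifiers))
    unknown = [m for m in chosen if m not in KNOWN_MODIFIERS]
    if unknown:
        raise ValueError(f"unknown modifier(s): {', '.join(unknown)}")
    return f"{target}[{','.join(chosen)}]" if chosen else target


def pip_command(spec: str) -> List[str]:
    """Argument list for running pip with the current interpreter."""
    return [sys.executable, "-m", "pip", "install", spec]


spec = install_spec("AI.Models.CASPR", ["xai", "horovod"])
print(spec)               # AI.Models.CASPR[horovod,xai]
print(pip_command(spec))  # pass to subprocess.run(...) to actually install
```

When typing such a spec directly into a shell where square brackets are special (zsh, for example), wrap it in quotes.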

Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For feature requests or bug reports, please file a GitHub issue.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

License

This project is licensed under the MIT License.


Owner

  • Name: Microsoft
  • Login: microsoft
  • Kind: organization
  • Email: opensource@microsoft.com
  • Location: Redmond, WA

Open source projects and samples from Microsoft

Citation (CITATION.cff)

cff-version: 1.2.0
title: CASPR
message: "Please use this information to cite CASPR in
  research or other publications."
authors:
  - given-names: Pin-Jung
    family-names: Chen
    email: pinjung.chen@microsoft.com
    affiliation: Microsoft Corporation
  - given-names: Sahil
    family-names: Bhatnagar
    email: sahil.bhatnagar@microsoft.com
    affiliation: Microsoft Corporation
  - given-names: Damian Konrad
    family-names: Kowalczyk
    email: damian.kowalczyk@microsoft.com
    affiliation: Microsoft Corporation
  - given-names: Mayank
    family-names: Shrivastava
    email: mayank.shrivastava@microsoft.com
    affiliation: Microsoft Corporation
  - given-names: Sagar
    family-names: Goyal
    email: goyalsagar@outlook.com

date-released: 2022-11-16
repository-code: "https://github.com/microsoft/CASPR"
license: "MIT"
keywords:
  - deep learning
  - machine learning
  - tabular data
  
version: 0.2.6
doi: 10.48550/arXiv.2211.09174
references:
  - type: article
    authors:
      - given-names: Pin-Jung
        family-names: Chen
        email: pinjung.chen@microsoft.com
        affiliation: Microsoft Corporation
      - given-names: Sahil
        family-names: Bhatnagar
        email: sahil.bhatnagar@microsoft.com
        affiliation: Microsoft Corporation
      - given-names: Damian Konrad
        family-names: Kowalczyk
        email: damian.kowalczyk@microsoft.com
        affiliation: Microsoft Corporation
      - given-names: Mayank
        family-names: Shrivastava
        email: mayank.shrivastava@microsoft.com
        affiliation: Microsoft Corporation
      - given-names: Sagar
        family-names: Goyal
        email: goyalsagar@outlook.com
    title: "CASPR: Customer Activity Sequence-based Prediction and Representation"
    year: 2022
    journal: ArXiv
    doi: 10.48550/arXiv.2211.09174
    url: https://arxiv.org/abs/2211.09174

abstract: >-
  Tasks critical to enterprise profitability, such as customer churn prediction, fraudulent account detection or customer lifetime value estimation, are often tackled by models trained on features engineered from customer data in tabular format. Application-specific feature engineering adds development, operationalization and maintenance costs over time. Recent advances in representation learning present an opportunity to simplify and generalize feature engineering across applications. When applying these advancements to tabular data researchers deal with data heterogeneity, variations in customer engagement history or the sheer volume of enterprise datasets. In this paper, we propose a novel approach to encode tabular data containing customer transactions, purchase history and other interactions into a generic representation of a customer's association with the business. We then evaluate these embeddings as features to train multiple models spanning a variety of applications. CASPR, Customer Activity Sequence-based Prediction and Representation, applies Transformer architecture to encode activity sequences to improve model performance and avoid bespoke feature engineering across applications. Our experiments at scale validate CASPR for both small and large enterprise applications.
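
The CFF file asks readers to cite CASPR using the metadata above. For LaTeX users, the `references` entry maps directly onto a BibTeX `@article`; the sketch below renders one from those fields. The `to_bibtex` function, the citation key `chen2022caspr`, and the choice of fields are illustrative assumptions, not something the repository provides.

```python
from typing import List, Tuple

# (given-names, family-names) pairs copied from CITATION.cff.
AUTHORS: List[Tuple[str, str]] = [
    ("Pin-Jung", "Chen"),
    ("Sahil", "Bhatnagar"),
    ("Damian Konrad", "Kowalczyk"),
    ("Mayank", "Shrivastava"),
    ("Sagar", "Goyal"),
]


def to_bibtex(key: str, authors: List[Tuple[str, str]], fields: dict) -> str:
    """Render an @article entry; authors become 'Family, Given and ...'."""
    entries = {"author": " and ".join(f"{family}, {given}" for given, family in authors)}
    entries.update(fields)
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in entries.items())
    return f"@article{{{key},\n{body}\n}}"


bib = to_bibtex(
    "chen2022caspr",  # hypothetical citation key
    AUTHORS,
    {
        "title": "CASPR: Customer Activity Sequence-based Prediction and Representation",
        "journal": "ArXiv",
        "year": "2022",
        "doi": "10.48550/arXiv.2211.09174",
        "url": "https://arxiv.org/abs/2211.09174",
    },
)
print(bib)
```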


Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 14
  • Total Committers: 6
  • Avg Commits per committer: 2.333
  • Development Distribution Score (DDS): 0.643
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Microsoft Open Source (m****e): 5 commits
  • Sahil Bhatnagar (s****2@g****m): 3 commits
  • Damian Kowalczyk (n****k): 3 commits
  • Damian Kowalczyk (d****c@m****m): 1 commit
  • sabhatn (s****n@m****m): 1 commit
  • microsoft-github-operations[bot] (5****]): 1 commit
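
The All Time figures are consistent with the Top Committers counts if the Development Distribution Score is taken as one minus the top committer's share of all commits: 1 - 5/14 ≈ 0.643, and 14 commits over 6 committers ≈ 2.333 per committer. Treat that DDS definition as an assumption about the metrics service; the sketch below simply reproduces the arithmetic, and `committer_stats` is a hypothetical helper.

```python
from typing import List, Tuple


def committer_stats(commit_counts: List[int]) -> Tuple[int, float, float]:
    """Return (total commits, average commits per committer, DDS),
    assuming DDS = 1 - top_committer_commits / total_commits."""
    total = sum(commit_counts)
    if total == 0:
        return 0, 0.0, 0.0  # no activity, as in the Past Year block
    avg = total / len(commit_counts)
    dds = 1 - max(commit_counts) / total
    return total, avg, dds


# Commit counts from the Top Committers list.
total, avg, dds = committer_stats([5, 3, 3, 1, 1, 1])
print(total, round(avg, 3), round(dds, 3))  # 14 2.333 0.643
```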

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 3
  • Total pull requests: 3
  • Average time to close issues: N/A
  • Average time to close pull requests: about 7 hours
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 1.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • zhouzhongmi (1)
  • rjr89 (1)
  • ananv21 (1)
Pull Request Authors
  • nightflight-dk (2)
  • cruck12 (1)
Top Labels
Issue Labels
Pull Request Labels
documentation (1)

Dependencies

setup.py pypi