Recent Releases of obp

obp - 0.5.5

Updates

  • Add some advanced off-policy gradient estimators (https://github.com/st-tech/zr-obp/pull/167)
  • Automatic candidate hyperparameter sorting for SLOPE (https://github.com/st-tech/zr-obp/pull/168)
  • Fix the error checking about "pea" in obp.ope.OffPolicyEvaluation (https://github.com/st-tech/zr-obp/pull/169)
  • Fix the computation of the factual expected reward under the independent reward structure (https://github.com/st-tech/zr-obp/pull/170)
  • Allow SLOPE to use the true marginal importance weight for MIPS (https://github.com/st-tech/zr-obp/pull/172)

References

  • Yuta Saito and Thorsten Joachims. "Off-Policy Evaluation for Large Action Spaces via Embeddings.", 2022.
  • Thorsten Joachims, Adith Swaminathan, and Maarten de Rijke. "Deep Learning with Logged Bandit Feedback.", 2018.
  • Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dudik. "Doubly Robust Off-Policy Evaluation with Shrinkage.", 2020.
  • Alberto Maria Metelli, Alessio Russo, and Marcello Restelli. "Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning.", 2021.

- Python
Published by usaito over 3 years ago

obp - 0.5.3

Updates

  • Implement a synthetic data generator class for OPE with action embeddings (obp.dataset.SyntheticBanditDatasetWithActionEmbeds) and an estimator leveraging the action embeddings (obp.ope.MarginalizedInverseProbabilityWeighting) (https://github.com/st-tech/zr-obp/pull/155); see the sketch after this list
  • Implement several OPE estimators for the multiple logger setting (https://github.com/st-tech/zr-obp/pull/154)
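The snippet below is a hedged, minimal sketch of how the two new classes can be combined. The embedding-related constructor arguments (n_cat_dim, n_cat_per_dim) and the assumption that OffPolicyEvaluation forwards the action embeddings from the logged data to the estimator follow the synthetic-data quickstart and may differ across versions.

```python
# Hedged sketch: OPE with action embeddings (MIPS) on synthetic data.
import numpy as np
from obp.dataset import SyntheticBanditDatasetWithActionEmbeds
from obp.ope import OffPolicyEvaluation, MarginalizedInverseProbabilityWeighting

# synthetic logged data whose actions come with categorical embeddings
dataset = SyntheticBanditDatasetWithActionEmbeds(
    n_actions=100,
    dim_context=5,
    n_cat_dim=3,      # number of embedding dimensions (assumed argument name)
    n_cat_per_dim=5,  # number of categories per dimension (assumed argument name)
    random_state=12345,
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# a uniformly random evaluation policy, just for illustration
n_rounds, n_actions = bandit_feedback["n_rounds"], dataset.n_actions
action_dist = np.ones((n_rounds, n_actions, 1)) / n_actions

# MIPS reads the action embeddings stored in bandit_feedback (assumed behavior)
ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[MarginalizedInverseProbabilityWeighting(n_actions=n_actions)],
)
print(ope.estimate_policy_values(action_dist=action_dist))
```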

References

  • Yuta Saito and Thorsten Joachims. "Off-Policy Evaluation for Large Action Spaces via Embeddings.", arXiv, 2022.
  • Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. "Effective Evaluation using Logged Bandit Feedback from Multiple Loggers.", KDD2018.
  • Nathan Kallus, Yuta Saito, and Masatoshi Uehara. "Optimal Off-Policy Evaluation from Multiple Logging Policies.", ICML2021.

- Python
Published by usaito almost 4 years ago

obp - v0.5.2

Updates

  • Implement obp.policy.QLearner (https://github.com/st-tech/zr-obp/pull/144)
  • Implement the Balanced IPW estimator as obp.ope.BalancedInverseProbabilityWeighting. See Sondhi et al. (2020) for details (https://github.com/st-tech/zr-obp/pull/146)
  • Implement the Cascade Doubly Robust estimator for OPE with combinatorial actions as obp.ope.CascadeDR. See Kiyohara et al. (2022) for details (https://github.com/st-tech/zr-obp/pull/142)
  • Implement a data-driven hyperparameter tuning method for OPE called SLOPE, proposed by Su et al. (2020) and Tucker et al. (2021) (https://github.com/st-tech/zr-obp/pull/148); see the sketch after this list
  • Implement new estimators for standard OPE based on a power-mean transformation of importance weights, proposed by Metelli et al. (2021) (https://github.com/st-tech/zr-obp/pull/149)
  • Implement a dataset class for generating synthetic logged bandit data with multiple loggers. The corresponding estimators will be added in the next update (https://github.com/st-tech/zr-obp/pull/150)
  • Implement an argument to control the number of deficient actions in obp.dataset.SyntheticBanditDataset and obp.dataset.MultiClassToBanditReduction. See Sachdeva et al. (2020) for details (https://github.com/st-tech/zr-obp/pull/150)
  • Implement some flexible functions for synthesizing reward functions and behavior policies (https://github.com/st-tech/zr-obp/pull/145)
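The sketch below illustrates how the SLOPE-based tuning might be invoked. The InverseProbabilityWeightingTuning class and its lambdas argument come from the library's tuning interface, while the tuning_method="slope" switch is assumed to be the option added in this release.

```python
# Hedged sketch: data-driven hyperparameter tuning (SLOPE) of the clipping threshold.
import numpy as np
from obp.dataset import SyntheticBanditDataset, logistic_reward_function
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeightingTuning

dataset = SyntheticBanditDataset(
    n_actions=10,
    dim_context=5,
    reward_function=logistic_reward_function,
    random_state=12345,
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# a uniformly random evaluation policy, just for illustration
action_dist = np.ones((bandit_feedback["n_rounds"], dataset.n_actions, 1)) / dataset.n_actions

ipw_tuned = InverseProbabilityWeightingTuning(
    lambdas=[10.0, 50.0, 100.0, 500.0, 1000.0],  # candidate clipping thresholds
    tuning_method="slope",                       # assumed name of the SLOPE option
)
ope = OffPolicyEvaluation(bandit_feedback=bandit_feedback, ope_estimators=[ipw_tuned])
print(ope.estimate_policy_values(action_dist=action_dist))
```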

Minors

  • Adjust to sklearn>=1.0.0
  • Fix/update error messages, docstrings, and examples.

References

  • Haruka Kiyohara, Yuta Saito, Tatsuya Matsuhiro, Yusuke Narita, Nobuyuki Shimizu, and Yasuo Yamamoto. "Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model.", WSDM2022.
  • Arjun Sondhi, David Arbour, and Drew Dimmery. "Balanced Off-Policy Evaluation in General Action Spaces.", AISTATS2020.
  • Yi Su, Pavithra Srinath, and Akshay Krishnamurthy. "Adaptive Estimator Selection for Off-Policy Evaluation.", ICML2020.
  • George Tucker and Jonathan Lee. "Improved Estimator Selection for Off-Policy Evaluation.", 2021.
  • Alberto Maria Metelli, Alessio Russo, and Marcello Restelli. "Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning.", NeurIPS2021.
  • Noveen Sachdeva, Yi Su, and Thorsten Joachims. "Off-policy Bandits with Deficient Support.", KDD2020.
  • Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. "Effective Evaluation using Logged Bandit Feedback from Multiple Loggers.", KDD2018.
  • Nathan Kallus, Yuta Saito, and Masatoshi Uehara. "Optimal Off-Policy Evaluation from Multiple Logging Policies.", ICML2021.

- Python
Published by usaito about 4 years ago

obp - 0.5.0

The changes are summarized below:

Major updates

  • Add OPE/OPL with continuous actions
    • SyntheticContinuousBanditDataset (https://github.com/st-tech/zr-obp/pull/112)
    • Continuous OPE estimators
    • ContinuousNNPolicyLearner
  • Add weight clipping to IPW and DR (https://github.com/st-tech/zr-obp/pull/115); see the sketch after this list
  • Add automatic hyperparameter tuning of OPE estimators
  • Add arguments to the SyntheticBanditDataset class to generate more flexible synthetic data (https://github.com/st-tech/zr-obp/pull/123)
  • Add a subsample option to the OpenBanditDataset class (https://github.com/st-tech/zr-obp/pull/118)
  • Modify the input type of the off_policy_objective argument and add some hyperparameters to NNPolicyLearner (https://github.com/st-tech/zr-obp/pull/132)
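A minimal sketch of the new weight-clipping hyperparameter follows; lambda_ is the clipping threshold introduced in this release, and the surrounding pipeline (synthetic data, regression model, uniform evaluation policy) is only illustrative.

```python
# Hedged sketch: weight clipping in IPW and DR via the new lambda_ hyperparameter.
import numpy as np
from sklearn.linear_model import LogisticRegression
from obp.dataset import SyntheticBanditDataset, logistic_reward_function
from obp.ope import (
    OffPolicyEvaluation,
    InverseProbabilityWeighting,
    DoublyRobust,
    RegressionModel,
)

dataset = SyntheticBanditDataset(
    n_actions=10, dim_context=5,
    reward_function=logistic_reward_function, random_state=12345,
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# reward model needed by DR
reg_model = RegressionModel(
    n_actions=dataset.n_actions, base_model=LogisticRegression(max_iter=1000)
)
estimated_rewards = reg_model.fit_predict(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
)

# a uniformly random evaluation policy, just for illustration
action_dist = np.ones((bandit_feedback["n_rounds"], dataset.n_actions, 1)) / dataset.n_actions

ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[
        InverseProbabilityWeighting(lambda_=100.0),  # clip importance weights at 100
        DoublyRobust(lambda_=100.0),
    ],
)
print(ope.estimate_policy_values(
    action_dist=action_dist,
    estimated_rewards_by_reg_model=estimated_rewards,
))
```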

Minor updates

  • Fix the README (https://github.com/st-tech/zr-obp/pull/119)
  • Fix scalar value checking (https://github.com/st-tech/zr-obp/pull/122)
  • Add a ValueError to the OffPolicyEvaluation class (https://github.com/st-tech/zr-obp/pull/125)
  • Fix error messages (https://github.com/st-tech/zr-obp/pull/126)
  • Add some additional error checks (https://github.com/st-tech/zr-obp/pull/125, https://github.com/st-tech/zr-obp/pull/129)
  • Update the quickstart examples (https://github.com/st-tech/zr-obp/pull/127)

Cautions

  • The hyperparameter of obp.ope.SwitchDoublyRobust has been renamed from tau to lambda_
  • The off_policy_objective argument of obp.policy.NNPolicyLearner now takes a str instead of a callable; see the sketch after this list
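A short sketch of the two interface changes; the remaining constructor arguments of NNPolicyLearner shown here are assumed and may differ by version.

```python
# Hedged sketch: off_policy_objective is now a string, and SwitchDoublyRobust
# uses lambda_ where it previously used tau.
from obp.policy import NNPolicyLearner
from obp.ope import SwitchDoublyRobust

nn_learner = NNPolicyLearner(
    n_actions=10,
    dim_context=5,
    off_policy_objective="ipw",  # previously a callable, now a string such as "ipw"
)
switch_dr = SwitchDoublyRobust(lambda_=100.0)  # previously tau=100.0
```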

References

  • Nathan Kallus and Angela Zhou. "Policy Evaluation and Optimization with Continuous Treatments.", AISTATS2018.
  • Nathan Kallus and Masatoshi Uehara. "Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies.", NeurIPS2020.
  • Yi Su, Maria Dimakopoulou, Akshay Krishnamurthy, and Miroslav Dudik. "Doubly Robust Off-Policy Evaluation with Shrinkage.", ICML2020.

- Python
Published by usaito over 4 years ago

obp - 0.4.1

The changes are summarized below:

  • Add some functions to implement OPE for the slate contextual bandit setting [1]

    • SlateSyntheticBanditFeedback (https://github.com/st-tech/zr-obp/pull/82, https://github.com/st-tech/zr-obp/pull/93, https://github.com/st-tech/zr-obp/pull/95, https://github.com/st-tech/zr-obp/pull/98, https://github.com/st-tech/zr-obp/pull/100, https://github.com/st-tech/zr-obp/pull/101, https://github.com/st-tech/zr-obp/pull/102, https://github.com/st-tech/zr-obp/pull/104, https://github.com/st-tech/zr-obp/pull/105)
    • Slate OPE Estimators (https://github.com/st-tech/zr-obp/pull/88)
  • Make the OffPolicyEvaluation class more useful

    • Add a method to visualize and compare OPE results of several different policies (https://github.com/st-tech/zr-obp/pull/103); see the sketch after this list
    • Enable the use of different estimated_rewards_by_reg_model values (this will make MRDR [2] easier to use with obp, https://github.com/st-tech/zr-obp/pull/92)
  • Fix some bugs and refactor

    • Epsilon-greedy algorithm (https://github.com/st-tech/zr-obp/pull/107)
    • Type checks in OPE estimators (https://github.com/st-tech/zr-obp/pull/106)
    • Linear and logistic policies (https://github.com/st-tech/zr-obp/pull/91)
  • Welcome new contributors (https://github.com/st-tech/zr-obp/pull/94)
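A rough sketch of comparing OPE results across several evaluation policies; the method name visualize_off_policy_estimates_of_multiple_policies and its list-style arguments are recalled from this release and should be treated as assumptions.

```python
# Hedged sketch: visualizing and comparing OPE results of several policies.
import numpy as np
from obp.dataset import SyntheticBanditDataset, logistic_reward_function
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting

dataset = SyntheticBanditDataset(
    n_actions=10, dim_context=5,
    reward_function=logistic_reward_function, random_state=12345,
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)
n_rounds, n_actions = bandit_feedback["n_rounds"], dataset.n_actions

# two toy evaluation policies to compare
uniform_policy = np.ones((n_rounds, n_actions, 1)) / n_actions
greedy_policy = np.zeros((n_rounds, n_actions, 1))
greedy_policy[np.arange(n_rounds), bandit_feedback["expected_reward"].argmax(1), 0] = 1.0

ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[InverseProbabilityWeighting()],
)
# assumed method name/arguments for the multi-policy comparison added in this release
ope.visualize_off_policy_estimates_of_multiple_policies(
    policy_name_list=["uniform", "greedy"],
    action_dist_list=[uniform_policy, greedy_policy],
    random_state=12345,
)
```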

References

  • [1] James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra, and Benjamin Carterette. 2020. Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1779–1788.
  • [2] Mehrdad Farajtabar, Yinlam Chow, and Mohammad Ghavamzadeh. 2018. More robust doubly robust off-policy evaluation. In Proceedings of the 35th International Conference on Machine Learning, PMLR 80, 1447–1456.

- Python
Published by usaito over 4 years ago

obp - 0.4.0

The changes are summarized below:

  • Implemented obp/tests and continuous integration (GitHub Actions). Now you are free to contribute to the project! See our contribution guidelines.
  • Added examples describing the usage of the ReplayMethod to evaluate adaptive bandit algorithms (https://github.com/st-tech/zr-obp/pull/67); see the sketch after this list
    • https://github.com/st-tech/zr-obp/blob/master/examples/quickstart/online.ipynb
  • Made many minor fixes to improve the usage of the package as listed below
    • https://github.com/st-tech/zr-obp/pull/77, https://github.com/st-tech/zr-obp/pull/69, https://github.com/st-tech/zr-obp/pull/68, https://github.com/st-tech/zr-obp/pull/62, https://github.com/st-tech/zr-obp/pull/59
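The pattern in the linked quickstart is roughly the following: replay the logged data through an adaptive policy and evaluate the resulting action distribution with ReplayMethod. run_bandit_simulation and its argument names are taken from the obp.simulator module and should be treated as assumptions.

```python
# Hedged sketch: evaluating an online (adaptive) bandit policy with the Replay Method.
from obp.dataset import SyntheticBanditDataset
from obp.policy import EpsilonGreedy
from obp.ope import OffPolicyEvaluation, ReplayMethod
from obp.simulator import run_bandit_simulation  # assumed import path

dataset = SyntheticBanditDataset(n_actions=10, dim_context=5, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# replay the logged data through an adaptive policy to obtain its action distribution
evaluation_policy = EpsilonGreedy(n_actions=dataset.n_actions, epsilon=0.1, random_state=12345)
action_dist = run_bandit_simulation(bandit_feedback=bandit_feedback, policy=evaluation_policy)

ope = OffPolicyEvaluation(bandit_feedback=bandit_feedback, ope_estimators=[ReplayMethod()])
print(ope.estimate_policy_values(action_dist=action_dist))
```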

- Python
Published by usaito almost 5 years ago

obp - 0.3.3

The changes are summarized below:

  • Add the sample_action method to obp.policy.IPWLearner; a trained IPWLearner can now sample a non-repetitive set of actions for new data, so it can be used in practice even when the action interface has a list structure (see the sketch after this list)
    • https://github.com/st-tech/zr-obp/pull/22
    • detailed description: https://zr-obp.readthedocs.io/en/latest/_autosummary/obp.policy.offline.html#module-obp.policy.offline
  • Fix a bug in the fit_predict method of obp.ope.RegressionModel
    • https://github.com/st-tech/zr-obp/pull/23
  • Complete the benchmark experiments on a wide variety of OPE estimators using the full size of the Open Bandit Dataset.
    • The detailed results and discussion will be available in the coming arXiv update.
    • https://github.com/st-tech/zr-obp/tree/master/benchmark/ope
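A minimal sketch of the new method follows; sample_action draws actions via a trained IPWLearner, and with len_list > 1 (plus position data) the sampled actions form a non-repetitive list per context. The single-slot setting below keeps the example self-contained.

```python
# Hedged sketch: training IPWLearner offline and sampling actions for new data.
from sklearn.linear_model import LogisticRegression
from obp.dataset import SyntheticBanditDataset
from obp.policy import IPWLearner

dataset = SyntheticBanditDataset(n_actions=10, dim_context=5, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

ipw_learner = IPWLearner(
    n_actions=dataset.n_actions,
    base_classifier=LogisticRegression(max_iter=1000),
)
ipw_learner.fit(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    pscore=bandit_feedback["pscore"],
)
# sampled actions have shape (n_rounds, n_actions, len_list);
# with len_list > 1 the sampled set contains no repeated action within a list
sampled_actions = ipw_learner.sample_action(
    context=bandit_feedback["context"], random_state=12345
)
```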

- Python
Published by usaito over 5 years ago

obp -

This release enhances the OBP package in the following ways.

  • Add some new contents to the obp documentation: https://zr-obp.readthedocs.io/en/latest/index.html
    • In particular, the "off-policy evaluation" section can be used as a textbook about this area
  • Add the obp.dataset.MultiClassToBanditReduction class for handling multi-class classification datasets as bandit feedback (https://github.com/st-tech/zr-obp/pull/19); see the sketch after this list
    • https://zr-obp.readthedocs.io/en/latest/_autosummary/obp.dataset.multiclass.html#module-obp.dataset.multiclass
    • This will allow researchers to run synthetic experiments with multi-class classification datasets easily
    • Relevant quickstart and example notebooks will be added to the repository soon
  • Add a continuous reward option to obp.dataset.SyntheticBanditDataset
  • Add a squared error (se) option for the evaluation of OPE with obp.ope.OffPolicyEvaluation
  • Fix some README and docstring inconsistencies
  • Refactor the dataset and ope modules
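A hedged usage sketch of the new class, following the linked documentation; the alpha_b/alpha_e mixing arguments are as documented there, while the base classifiers and split sizes are only illustrative.

```python
# Hedged sketch: turning a multi-class classification dataset into bandit feedback.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from obp.dataset import MultiClassToBanditReduction

X, y = load_digits(return_X_y=True)
dataset = MultiClassToBanditReduction(
    X=X,
    y=y,
    base_classifier_b=LogisticRegression(max_iter=1000, random_state=12345),
    alpha_b=0.8,  # mixing weight of the deterministic classifier in the behavior policy
)
dataset.split_train_eval(eval_size=0.7, random_state=12345)
bandit_feedback = dataset.obtain_batch_bandit_feedback(random_state=12345)

# an evaluation policy derived from another classifier
action_dist = dataset.obtain_action_dist_by_eval_policy(
    base_classifier_e=LogisticRegression(C=100, max_iter=1000, random_state=12345),
    alpha_e=0.9,
)
# ground-truth policy value is computable because the true labels are known
print(dataset.calc_ground_truth_policy_value(action_dist=action_dist))
```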

- Python
Published by usaito over 5 years ago

obp - 0.3.1

In this release, we fix some bugs in the cross-fitting procedure.

- Python
Published by usaito over 5 years ago

obp -

This release enhances the OBP package in the following ways.

  • Allow the evaluation policy to be stochastic, which makes the package more consistent with the formulation of OPE
  • Add some advanced estimation techniques such as cross-fitting and doubly robust with shrinkage (see the sketch after this list)
  • Modify the examples to evaluate offline bandit policies (not online ones), which again makes the package more consistent with the formulation of OPE: https://github.com/st-tech/zr-obp/tree/master/examples
  • Add some slides: https://github.com/st-tech/zr-obp/tree/master/slides
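A minimal sketch combining the two additions, cross-fitted reward regression (via n_folds) and Doubly Robust with Shrinkage; the lambda_ hyperparameter name is assumed, and the data-generating setup is only illustrative.

```python
# Hedged sketch: cross-fitting the reward model and using DR with shrinkage.
import numpy as np
from sklearn.linear_model import LogisticRegression
from obp.dataset import SyntheticBanditDataset, logistic_reward_function
from obp.ope import OffPolicyEvaluation, RegressionModel, DoublyRobustWithShrinkage

dataset = SyntheticBanditDataset(
    n_actions=10, dim_context=5,
    reward_function=logistic_reward_function, random_state=12345,
)
bandit_feedback = dataset.obtain_batch_bandit_feedback(n_rounds=10000)

# cross-fitting: the reward model is fit and predicted across folds to reduce overfitting bias
reg_model = RegressionModel(
    n_actions=dataset.n_actions, base_model=LogisticRegression(max_iter=1000)
)
estimated_rewards = reg_model.fit_predict(
    context=bandit_feedback["context"],
    action=bandit_feedback["action"],
    reward=bandit_feedback["reward"],
    n_folds=3,
    random_state=12345,
)

# a stochastic (uniform) evaluation policy, just for illustration
action_dist = np.ones((bandit_feedback["n_rounds"], dataset.n_actions, 1)) / dataset.n_actions

ope = OffPolicyEvaluation(
    bandit_feedback=bandit_feedback,
    ope_estimators=[DoublyRobustWithShrinkage(lambda_=100.0)],
)
print(ope.estimate_policy_values(
    action_dist=action_dist,
    estimated_rewards_by_reg_model=estimated_rewards,
))
```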

- Python
Published by usaito over 5 years ago