https://github.com/google-research/unique-randomizer
UniqueRandomizer is a data structure for sampling outputs of a randomized program, such as a neural sequence model, incrementally and without replacement.
Repository
Statistics
- Stars: 10
- Watchers: 5
- Forks: 4
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
UniqueRandomizer
Overview
UniqueRandomizer is a data structure for sampling outputs of a randomized program, such as a neural sequence model, incrementally and without replacement.
- Incremental sampling: Instead of sampling a large batch of outputs all at once, as with beam search, UniqueRandomizer provides samples one at a time. This enables flexibility in stopping criteria, such as stopping the sampling process as soon as a satisfactory output is found.
- Sampling without replacement: In many applications, a neural model is used to produce candidate solutions to some search or optimization problem. In such applications it is usually desirable to consider unique candidate solutions, since duplicates are typically not useful.
For more details, refer to our paper, Incremental Sampling Without Replacement for Sequence Models, published at ICML 2020.
BibTeX entry:
    @inproceedings{shi2020uniquerandomizer,
      title = {Incremental Sampling Without Replacement for Sequence Models},
      author = {Kensen Shi and David Bieber and Charles Sutton},
      booktitle = {Proceedings of the 37th International Conference on Machine Learning},
      year = {2020}
    }
Installation
python3 -m pip install --user unique-randomizer
This package requires Python 3. The above command automatically installs the following dependencies as well:
- absl-py >= 0.6.1
- numpy >= 1.15.4
- scipy >= 1.1.0
Usage
To use UniqueRandomizer, first identify the program or function that you wish to
draw unique samples from, such as the draw_sample function in the following
example:
    import numpy as np

    # BOS, EOS, and MAX_LEN are constants defined by the caller.
    def draw_sample(sequence_model, state):
        """Draws a sample (a sequence of token indices) from the sequence model."""
        tokens = []
        token = BOS
        for i in range(MAX_LEN):
            probs, state = sequence_model(token, state)
            token = np.random.choice(np.arange(len(probs)), p=probs)
            if token == EOS:
                break
            tokens.append(token)
        return tokens
Note that draw_sample can take inputs and can use control flow such as loops,
conditionals, and recursion. There are only two constraints on the draw_sample
function:
- It must be deterministic given the inputs, except for random choices provided by np.random.choice (or some other method of selecting a random index given a discrete probability distribution).
- Two different sequences of random choices must lead to draw_sample returning different outputs.
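The second constraint is easy to violate by accident. The following is a minimal sketch in plain Python, using hypothetical function names that are not part of the package, which enumerates every sequence of choices for two tiny programs and checks whether the outputs are all distinct:

```python
import itertools


def coin_pair(choices):
    """Returns both flips, so distinct choice sequences give distinct outputs."""
    first, second = choices
    return (first, second)


def coin_sum(choices):
    """Returns only the number of heads, so (0, 1) and (1, 0) collide."""
    first, second = choices
    return first + second


def outputs_are_unique(program, num_choices=2, num_options=2):
    """Checks that every sequence of random choices yields a different output."""
    sequences = itertools.product(range(num_options), repeat=num_choices)
    outputs = [program(seq) for seq in sequences]
    return len(outputs) == len(set(outputs))


print(outputs_are_unique(coin_pair))  # True
print(outputs_are_unique(coin_sum))   # False
```

A program like coin_sum breaks the constraint: two different choice sequences produce the same value, so deduplicating over choice sequences would not be enough to guarantee unique outputs.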
Next, add a UniqueRandomizer object as an input to draw_sample, and use its
sample_distribution function to replace np.random.choice:
    - def draw_sample(sequence_model, state):
    + def draw_sample(sequence_model, state, randomizer):
          """Draws a sample (a sequence of token indices) from the sequence model."""
          tokens = []
          token = BOS
          for i in range(MAX_LEN):
              probs, state = sequence_model(token, state)
    -         token = np.random.choice(np.arange(len(probs)), p=probs)
    +         token = randomizer.sample_distribution(probs)
              if token == EOS:
                  break
              tokens.append(token)
          return tokens
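One consequence of this change is that draw_sample no longer depends on where the random choices come from. As a rough illustration, the sketch below runs the modified function with a small adapter that samples with replacement through NumPy. The adapter class, the toy model, and the values of BOS, EOS, and MAX_LEN are all hypothetical and chosen only to make the example run; none of them come from the package.

```python
import numpy as np

BOS, EOS, MAX_LEN = 0, 1, 5  # hypothetical token ids and length limit


def toy_sequence_model(token, state):
    """Hypothetical model over 3 tokens; the state just counts steps."""
    probs = np.array([0.0, 0.4, 0.6])  # never emits BOS again
    return probs, state + 1


class WithReplacementRandomizer:
    """Hypothetical adapter: same method name, but samples may repeat."""

    def __init__(self, seed=0):
        self._rng = np.random.default_rng(seed)

    def sample_distribution(self, probs):
        return int(self._rng.choice(len(probs), p=probs))


def draw_sample(sequence_model, state, randomizer):
    """Draws a sample (a sequence of token indices) from the sequence model."""
    tokens = []
    token = BOS
    for _ in range(MAX_LEN):
        probs, state = sequence_model(token, state)
        token = randomizer.sample_distribution(probs)
        if token == EOS:
            break
        tokens.append(token)
    return tokens


sample = draw_sample(toy_sequence_model, 0, WithReplacementRandomizer())
print(sample)  # a list of at most MAX_LEN copies of token 2
```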
Finally, a simple loop around draw_sample can collect unique samples, as
follows:
    def draw_unique_samples(model, state, num_samples):
        """Draws multiple unique samples from the sequence model."""
        samples = []
        randomizer = unique_randomizer.UniqueRandomizer()
        for _ in range(num_samples):
            samples.append(draw_sample(model, state, randomizer))
            randomizer.mark_sequence_complete()
        return samples
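To build intuition for why the loop above never returns the same sample twice, here is a toy sketch of one way such a randomizer could work. It is not the package's implementation: ToyUniqueRandomizer is a hypothetical stand-in that only mirrors the two method names used above. It keeps a trie of the choices made so far, stores the probability mass not yet sampled under each node, and removes a finished sequence's mass from every node on its path.

```python
import numpy as np


class ToyUniqueRandomizer:
    """Toy stand-in (an assumption, not the library's code)."""

    def __init__(self, seed=0):
        self._rng = np.random.default_rng(seed)
        self._root = {"mass": 1.0, "children": None}
        self._path = [self._root]

    def sample_distribution(self, probs):
        node = self._path[-1]
        if node["children"] is None:
            # First visit: each child starts with its share of the node's mass.
            node["children"] = [
                {"mass": node["mass"] * p, "children": None} for p in probs
            ]
        masses = np.array([child["mass"] for child in node["children"]])
        # Treat floating-point residue as exhausted mass.
        masses = np.where(masses > 1e-12, masses, 0.0)
        index = int(self._rng.choice(len(masses), p=masses / masses.sum()))
        self._path.append(node["children"][index])
        return index

    def mark_sequence_complete(self):
        # Remove the finished sequence's mass from every node on its path,
        # so the same sequence of choices cannot be drawn again.
        leaf_mass = self._path[-1]["mass"]
        for node in self._path:
            node["mass"] -= leaf_mass
        self._path = [self._root]


def flip_two_coins(randomizer):
    """Hypothetical program: two weighted flips (0 = heads, 1 = tails)."""
    probs = [0.75, 0.25]
    return (randomizer.sample_distribution(probs),
            randomizer.sample_distribution(probs))


randomizer = ToyUniqueRandomizer()
samples = []
for _ in range(4):
    samples.append(flip_two_coins(randomizer))
    randomizer.mark_sequence_complete()
print(sorted(samples))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The sketch raises an error if asked for more samples than there are distinct outputs, and it ignores efficiency concerns that a real implementation would need to handle.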
Code Samples
We include a few code samples that demonstrate how to use UniqueRandomizer:
- examples/weighted_coin_flips.py: This provides a very simple example of using UniqueRandomizer. The function flip_two_weighted_coins simulates flipping a pair of weighted coins. The sample_flips_without_replacement function then uses UniqueRandomizer to efficiently sample outputs of flip_two_weighted_coins without replacement.
- examples/expand_grammar.py: This defines a Probabilistic Context-Free Grammar (PCFG), as well as methods to sample elements of the grammar without replacement by using UniqueRandomizer, rejection sampling, and Stochastic Beam Search (SBS). The script examples/expand_grammar_main.py enables easy comparison between the different sampling methods under different scenarios.
- examples/sequence_example.py: This implements sampling without replacement from a sequence model, using UniqueRandomizer, Batched UniqueRandomizer, rejection sampling, and SBS. The script examples/sequence_example_main.py enables easy comparison between the different sampling methods under different scenarios.
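For readers unfamiliar with the rejection sampling baseline mentioned above, the following is a generic sketch of the idea: draw with replacement and discard duplicates. It is not the code from the examples directory; the function names, the coin weights, and the max_attempts cap are assumptions made for illustration.

```python
import numpy as np


def flip_two_coins(rng):
    """Hypothetical program: two heavily weighted flips (0 = heads, 1 = tails)."""
    probs = [0.9, 0.1]
    return (int(rng.choice(2, p=probs)), int(rng.choice(2, p=probs)))


def rejection_sample_unique(draw, num_samples, rng, max_attempts=100_000):
    """Draws with replacement and keeps only outputs not seen before."""
    seen = set()
    samples = []
    attempts = 0
    while len(samples) < num_samples and attempts < max_attempts:
        attempts += 1
        sample = draw(rng)
        if sample not in seen:
            seen.add(sample)
            samples.append(sample)
    return samples, attempts


rng = np.random.default_rng(0)
samples, attempts = rejection_sample_unique(flip_two_coins, 4, rng)
print(samples, attempts)
```

With skewed weights like these, the rarest outcome has probability 0.01, so many draws are typically wasted on repeats before it appears. Avoiding that waste is the reason to sample without replacement directly.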
Disclaimer
This is not an officially supported Google product.
Owner
- Name: Google Research
- Login: google-research
- Kind: organization
- Location: Earth
- Website: https://research.google
- Repositories: 226
- Profile: https://github.com/google-research
Packages
- Total packages: 1
- Total downloads: 10 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 1
- Total maintainers: 1
pypi.org: unique-randomizer
UniqueRandomizer: Incremental Sampling Without Replacement
- Homepage: https://github.com/google-research/unique-randomizer
- Documentation: https://unique-randomizer.readthedocs.io/
- License: Apache Software License
- Latest release: 0.0.1 (published over 5 years ago)