mc-vectorizer
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Gannon44
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Size: 1.29 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MC Vectorizer
🚀 A Python library for encoding and decoding Minecraft structures using diffusion-ready vector representations. The current implementation handles a custom .npy structure, but I plan to make it more generalizable in the future.
Features
✅ Block Name Vectorization – Converts block names to 32-dimensional latent vectors.
✅ Efficient Reverse Mapping – Uses KDTree for fast latent-to-block lookups.
✅ Structure Encoding – Converts Minecraft .npy structures into trainable representations.
✅ Structure Decoding – Recovers block IDs or average colors from latent vectors.
✅ Command-Line & API Support – Use as a CLI tool or integrate into your Python code.
Overview:
The block_vectorization module provides a (very incomplete) framework for converting Minecraft structure data between two formats:
1. Vectorization:
Convert a 3D structure (where each voxel contains either a block ID string or 0) into a 4D latent representation. Each block is represented by a 32‑channel vector built from:
- Order-based (positional) encoding,
- Semantic embeddings (from a SentenceTransformer, reduced via UMAP),
- TF-IDF embeddings (reduced via UMAP).
2. Reverse–Vectorization:
Convert the 32‑channel latent representation back into a human–readable format by:
- Using a KDTree (built from the latent mappings) to find the closest latent vector,
- Mapping that latent vector to a block name,
- Converting the block name to a block ID (using a provided mapping), and
- Optionally mapping the block ID to its average color.
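As a rough illustration of the vectorization idea above, the sketch below concatenates three feature groups into one 32-channel vector. The 8/12/12 channel split, the sinusoidal order encoding, the helper names, and the random arrays standing in for the UMAP-reduced SentenceTransformer and TF-IDF embeddings are all assumptions made for this example; the module's actual construction may differ.

```python
import numpy as np

LATENT_DIM = 32
# Hypothetical channel split; the real module may divide the 32 channels differently.
ORDER_DIM, SEMANTIC_DIM, TFIDF_DIM = 8, 12, 12


def order_encoding(index, dim=ORDER_DIM):
    """Sinusoidal encoding of a block's position in the ordered block list (assumed scheme)."""
    freqs = 1.0 / (100.0 ** (np.arange(dim // 2) / (dim // 2)))
    angles = index * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])


def build_latent(index, semantic_vec, tfidf_vec):
    """Concatenate the three feature groups into a single latent vector."""
    parts = [order_encoding(index), semantic_vec, tfidf_vec]
    return np.concatenate(parts).astype(np.float32)


rng = np.random.default_rng(0)
block_names = ["Air", "Stone", "Oak Planks"]
# Random placeholders for embeddings that would come from UMAP-reduced models.
semantic = rng.normal(size=(len(block_names), SEMANTIC_DIM))
tfidf = rng.normal(size=(len(block_names), TFIDF_DIM))

block_name2latent = {
    name: build_latent(i, semantic[i], tfidf[i])
    for i, name in enumerate(block_names)
}
print(block_name2latent["Stone"].shape)
```

Each block name ends up with one fixed-length vector, which is the property the reverse lookup relies on.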
The module is designed for both programmatic use (via single function calls) and command–line use.
Module Components:
BlockNameVectorizer Class:
- Purpose: Build and store a mapping from block names to 32-dimensional latent vectors.
- Key Methods:
  - transform(block_name): Returns the latent vector for a given block name.
  - reverse(vector): Returns the closest block name for a given latent vector using an internal KDTree.
- Mapping Persistence:
  The vectorizer can save its mappings to JSON files (block_name2latent.json and latent2block_name.json) and load them from a specified directory. Use get_block_name_vectorizer() to obtain an instance, optionally forcing recreation from a CSV file.
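The persistence step could look roughly like the sketch below. Only the two JSON file names come from this README; the helper names save_mappings and load_mappings, the comma-joined string used as the reverse-map key, and the 2-dimensional toy vectors are assumptions chosen to keep the example short.

```python
import json
import tempfile
from pathlib import Path


def save_mappings(block_name2latent, directory):
    """Write forward and reverse mappings as JSON (hypothetical helper)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # JSON keys must be strings, so the reverse map keys on a
    # comma-joined vector. This key format is an assumption.
    latent2block_name = {
        ",".join(repr(float(x)) for x in vec): name
        for name, vec in block_name2latent.items()
    }
    (directory / "block_name2latent.json").write_text(json.dumps(block_name2latent))
    (directory / "latent2block_name.json").write_text(json.dumps(latent2block_name))


def load_mappings(directory):
    """Read both mappings back from the given directory (hypothetical helper)."""
    directory = Path(directory)
    forward = json.loads((directory / "block_name2latent.json").read_text())
    reverse = json.loads((directory / "latent2block_name.json").read_text())
    return forward, reverse


# Toy 2-dimensional vectors; the real mappings use 32 dimensions.
toy = {"Air": [0.0, 0.0], "Stone": [0.5, -1.25]}
with tempfile.TemporaryDirectory() as tmp:
    save_mappings(toy, tmp)
    forward, reverse = load_mappings(tmp)

print(forward == toy)
print(reverse["0.5,-1.25"])
```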
Vectorization Functions:
vectorize_structure(input_data, block_vectorizer, block_id2name, latent_dim=32, output_path=None)
- Input: A 3D structure provided as a file path, directory, single NumPy array, or list/tuple of arrays. Each voxel is either a block ID (string) or 0.
- Processing:
  For each voxel:
  - If the value is 0, it is interpreted as "Air."
  - Otherwise, the block ID is mapped to a block name using the block_id2name dictionary, then the vectorizer converts it to a latent vector.
- Output: A 4D NumPy array with shape (H, W, D, 32), or a set of such arrays saved to disk if output_path is provided.
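The per-voxel processing described above can be sketched as follows. This is a simplified stand-in, not the module's implementation: vectorize_structure_sketch is a hypothetical name, a plain dict replaces the BlockNameVectorizer, and the "minecraft:stone" ID format and the all-zeros/all-ones latent vectors are invented for the example.

```python
import numpy as np


def vectorize_structure_sketch(structure, name2latent, block_id2name, latent_dim=32):
    """Map a 3D array of block IDs (or 0 for air) to an (H, W, D, latent_dim) array."""
    out = np.zeros(structure.shape + (latent_dim,), dtype=np.float32)
    for idx in np.ndindex(structure.shape):
        value = structure[idx]
        # A voxel holding 0 is treated as "Air"; anything else is looked up by ID.
        name = "Air" if value == 0 else block_id2name[value]
        out[idx] = name2latent[name]
    return out


# Object array so voxels can hold either the integer 0 or a block ID string.
structure = np.zeros((2, 2, 2), dtype=object)
structure[0, 0, 0] = "minecraft:stone"  # assumed ID format

block_id2name = {"minecraft:stone": "Stone"}
name2latent = {"Air": np.zeros(32), "Stone": np.ones(32)}  # toy latent vectors

latent = vectorize_structure_sketch(structure, name2latent, block_id2name)
print(latent.shape)
```

The explicit loop keeps the mapping easy to follow; for large structures, a lookup table indexed by integer block codes would likely be faster.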
Reverse–Vectorization Functions:
reverse_vectorize_structure(input_data, block_vectorizer, block_name2id, block_id2color=None, stop_at_block_id=False, latent_dim=32, output_path=None)
- Input: A 4D latent vector structure (or collection thereof) provided as a file path, directory, or already loaded array(s).
- Processing:
  For each latent vector:
  - The KDTree in block_vectorizer finds the closest latent vector.
  - This latent vector is mapped to a block name.
  - The block name is converted to a block ID using block_name2id.
  - Optionally (if stop_at_block_id is False) the block ID is further mapped to an average color using block_id2color.
- Output: A structure containing either block IDs (if stop_at_block_id is True) or average color values.
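The lookup chain above (latent vector, nearest known vector, block name, block ID, optional color) is sketched below. To keep the example dependent on NumPy only, a brute-force nearest-neighbour search stands in for the KDTree; it should pick the same nearest vector, only more slowly. The function name, the RGB color triples, and the fallback to block IDs when no color mapping is given are assumptions of this sketch.

```python
import numpy as np


def reverse_vectorize_sketch(latent, names, vectors, block_name2id,
                             block_id2color=None, stop_at_block_id=False):
    """Map each latent vector to its nearest known block, then to an ID or color."""
    flat = latent.reshape(-1, latent.shape[-1])
    # Squared Euclidean distance from every voxel vector to every known block vector.
    dists = ((flat[:, None, :] - vectors[None, :, :]) ** 2).sum(axis=-1)
    nearest = dists.argmin(axis=1)
    ids = [block_name2id[names[i]] for i in nearest]
    # Sketch-only choice: fall back to block IDs when no color mapping is supplied.
    if stop_at_block_id or block_id2color is None:
        return np.array(ids, dtype=object).reshape(latent.shape[:-1])
    colors = np.array([block_id2color[b] for b in ids], dtype=np.float32)
    return colors.reshape(latent.shape[:-1] + (colors.shape[-1],))


names = ["Air", "Stone"]
vectors = np.stack([np.zeros(32), np.ones(32)])  # toy latent vectors
block_name2id = {"Air": 0, "Stone": "minecraft:stone"}  # assumed ID format
block_id2color = {0: [0, 0, 0], "minecraft:stone": [125, 125, 125]}  # assumed RGB

# Slightly perturbed vectors, as a generative model might produce.
noisy = np.full((1, 1, 2, 32), 0.9)
noisy[0, 0, 1] = 0.1

ids = reverse_vectorize_sketch(noisy, names, vectors, block_name2id, stop_at_block_id=True)
colors = reverse_vectorize_sketch(noisy, names, vectors, block_name2id, block_id2color)
print(ids[0, 0, 0], ids[0, 0, 1])
print(colors.shape)
```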
Command–Line Interface (CLI):
- The module supports two subcommands:
  - vectorize: Convert block ID structures to latent vectors. Arguments:
    - --input_path: Input file or folder containing .npy files with block IDs.
    - --output_path: Where to save the resulting .npy file(s).
    - --block_id2name: Path to a JSON file mapping block IDs to block names.
    - --vectorizer_csv: CSV file (no header) containing the ordered list of block names.
    - --vectorizer_dir: Directory to load or save vectorizer mappings.
    - --recreate_vectorizer: (Flag) Force recreation of vectorizer mappings from CSV.
    - --latent_dim: Latent vector dimensionality (default 32).
  - reverse: Convert latent vector structures back to block IDs or average colors. Arguments:
    - --input_path: Input file or folder containing latent vector .npy files.
    - --output_path: Output file or folder.
    - --block_name2id: Path to JSON mapping from block names to block IDs.
    - --block_id2color: Path to JSON mapping from block IDs to average colors.
    - --stop_at_block_id: (Flag) If set, the output will be block IDs; otherwise, average colors.
    - --vectorizer_csv, --vectorizer_dir, --recreate_vectorizer, --latent_dim: As above.
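A two-subcommand interface like this one can be laid out with argparse subparsers, as in the sketch below. The flag names are taken from the list above; which flags are required, and the use of a shared parent parser, are assumptions and may not match how block_vectorization.py is actually written.

```python
import argparse


def build_parser():
    """Build a parser with 'vectorize' and 'reverse' subcommands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="block_vectorization.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # Options shared by both subcommands. Required-ness is assumed.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument("--input_path", required=True)
    common.add_argument("--output_path", required=True)
    common.add_argument("--vectorizer_csv")
    common.add_argument("--vectorizer_dir")
    common.add_argument("--recreate_vectorizer", action="store_true")
    common.add_argument("--latent_dim", type=int, default=32)

    vec = sub.add_parser("vectorize", parents=[common])
    vec.add_argument("--block_id2name", required=True)

    rev = sub.add_parser("reverse", parents=[common])
    rev.add_argument("--block_name2id", required=True)
    rev.add_argument("--block_id2color")
    rev.add_argument("--stop_at_block_id", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "reverse",
    "--input_path", "vectorized_structures",
    "--output_path", "reconstructed_output",
    "--block_name2id", "mappings/block_name2id.json",
    "--stop_at_block_id",
])
print(args.command, args.latent_dim, args.stop_at_block_id)
```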
Programmatic API Usage:
To use the module within your Python code, import the functions and classes:
```python
import json

import numpy as np
from block_vectorization import get_block_name_vectorizer, vectorize_structure, reverse_vectorize_structure

# Obtain a BlockNameVectorizer (load precomputed mappings if available)
vectorizer = get_block_name_vectorizer(csv_path="mappings/ordered_block_names.csv", precomputed_dir="mappings", latent_dim=32)

# To vectorize a single loaded structure (NumPy array):
with open("input_structure.npy", "rb") as f:
    structure = np.load(f, allow_pickle=True)

# block_id2name is a dict mapping block IDs to block names.
with open("mappings/block_id2name.json", "r") as f:
    block_id2name = json.load(f)
vectorized = vectorize_structure(structure, vectorizer, block_id2name, latent_dim=32)

# To reverse–vectorize, first load the block name to block ID mapping:
with open("mappings/block_name2id.json", "r") as f:
    block_name2id = json.load(f)

# Optionally, load block_id2color mapping for color conversion.
with open("mappings/block_id2color.json", "r") as f:
    block_id2color = json.load(f)

# stop_at_block_id=False returns average colors; set it to True to get block IDs.
reversed_data = reverse_vectorize_structure(vectorized, vectorizer, block_name2id, block_id2color=block_id2color, stop_at_block_id=False, latent_dim=32)
```
Command–Line Usage:
From the terminal, you can run the module using:
Vectorization:
```bash
python block_vectorization.py vectorize \
  --input_path input_structures \
  --output_path vectorized_structures \
  --block_id2name mappings/block_id2name.json \
  --vectorizer_csv mappings/ordered_block_names.csv \
  --vectorizer_dir mappings \
  --latent_dim 32
```
Reverse–Vectorization:
```bash
python block_vectorization.py reverse \
  --input_path vectorized_structures \
  --output_path reconstructed_output \
  --block_name2id mappings/block_name2id.json \
  --block_id2color mappings/block_id2color.json \
  --vectorizer_csv mappings/ordered_block_names.csv \
  --vectorizer_dir mappings \
  --latent_dim 32
```
To output block IDs rather than colors, add the flag --stop_at_block_id.
Owner
- Login: Gannon44
- Kind: user
- Repositories: 1
- Profile: https://github.com/Gannon44
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Gonsiorowski"
    given-names: "Jack Gannon"
    orcid: "https://orcid.org/0009-0000-9880-8048"
title: "mc-vectorizer"
version: 1.0.0
date-released: 2025-03-10
url: "https://github.com/Gannon44/mc-vectorizer"
GitHub Events
Total
- Push event: 11
- Create event: 2
Last Year
- Push event: 11
- Create event: 2