mc-vectorizer

https://github.com/gannon44/mc-vectorizer

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.6%) to scientific vocabulary

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: Gannon44
License: mit
Language: Jupyter Notebook
Default Branch: main
Size: 1.29 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created 12 months ago · Last pushed 12 months ago

Metadata Files

Readme License Citation

MC Vectorizer

🚀 A Python library for encoding and decoding Minecraft structures using diffusion-ready vector representations. The current implementation handles only a custom .npy structure format, but I plan to generalize it in the future.

Features

✅ Block Name Vectorization – Converts block names to 32-dimensional latent vectors.
✅ Efficient Reverse Mapping – Uses KDTree for fast latent-to-block lookups.
✅ Structure Encoding – Converts Minecraft .npy structures into trainable representations.
✅ Structure Decoding – Recovers block IDs or average colors from latent vectors.
✅ Command-Line & API Support – Use as a CLI tool or integrate into your Python code.

Overview:

The block_vectorization module provides a (very incomplete) framework for converting Minecraft structure data between two formats:

1. Vectorization:
Convert a 3D structure (where each voxel contains either a block ID string or 0) into a 4D latent representation. Each block is represented by a 32-channel vector built from:
- Order-based (positional) encoding,
- Semantic embeddings (from a SentenceTransformer, reduced via UMAP),
- TF-IDF embeddings (reduced via UMAP).
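The channel layout described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the module's implementation: the 8/24 channel split, the sinusoidal form of the order-based encoding, the word-level tokenization for TF-IDF, and all helper names (positional_encoding, tfidf_matrix, reduce_svd, build_block_latents) are hypothetical. The SentenceTransformer semantic part is omitted because it needs a downloaded model, and a plain SVD (PCA) projection stands in for UMAP.

```python
import re

import numpy as np


def positional_encoding(n_items: int, dim: int) -> np.ndarray:
    """Sinusoidal encoding of each block's index in the ordered name list (assumed form; dim must be even)."""
    positions = np.arange(n_items)[:, None]
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))
    enc = np.zeros((n_items, dim))
    enc[:, 0::2] = np.sin(positions * freqs)
    enc[:, 1::2] = np.cos(positions * freqs)
    return enc


def tfidf_matrix(names):
    """TF-IDF over word tokens of each block name (tokenization is an assumption)."""
    docs = [[t for t in re.split(r"[\s_:]+", n.lower()) if t] for n in names]
    vocab = sorted({t for doc in docs for t in doc})
    col = {t: i for i, t in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for t in doc:
            tf[row, col[t]] += 1.0 / len(doc)
    df = np.count_nonzero(tf, axis=0)
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0  # smoothed IDF
    return tf * idf


def reduce_svd(x, dim):
    """PCA via SVD, used here only as a stand-in for UMAP."""
    u, s, _ = np.linalg.svd(x - x.mean(axis=0), full_matrices=False)
    out = u[:, :dim] * s[:dim]
    return np.pad(out, ((0, 0), (0, dim - out.shape[1])))  # zero-pad if too few components


def build_block_latents(names, latent_dim=32, pos_dim=8):
    """Concatenate order-based and reduced TF-IDF channels (hypothetical 8/24 split)."""
    pos = positional_encoding(len(names), pos_dim)
    text = reduce_svd(tfidf_matrix(names), latent_dim - pos_dim)
    latents = np.concatenate([pos, text], axis=1)
    return {name: latents[i] for i, name in enumerate(names)}


latents = build_block_latents(["Air", "Stone", "Oak Planks", "Oak Log", "Stone Bricks"])
```

Because the positional channels differ for every index, each block name gets a distinct vector even when two names share tokens; swapping SVD back for UMAP and appending a reduced SentenceTransformer embedding would bring the sketch closer to the description above.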

2. Reverse-Vectorization:
Convert the 32-channel latent representation back into a human-readable format by:
- Using a KDTree (built from the latent mappings) to find the closest latent vector,
- Mapping that latent vector to a block name,
- Converting the block name to a block ID (using a provided mapping), and
- Optionally mapping the block ID to its average color.

The module can be used either programmatically (via single function calls) or through the command-line interface.

Module Components:

BlockNameVectorizer Class:
- Purpose: Build and store a mapping from block names to 32-dimensional latent vectors.
- Key Methods:
  - transform(block_name): Returns the latent vector for a given block name.
  - reverse(vector): Returns the closest block name for a given latent vector using an internal KDTree.
- Mapping Persistence:
  The vectorizer can save its mappings to JSON files (block_name2latent.json and latent2block_name.json) and load them from a specified directory. Use get_block_name_vectorizer() to obtain an instance, optionally forcing recreation from a CSV file.
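A toy version of the name-to-latent mapping with a KDTree-backed reverse lookup might look like the following. ToyBlockNameVectorizer is a hypothetical stand-in, not the real BlockNameVectorizer: it uses scipy.spatial.KDTree, and it persists only block_name2latent.json (rebuilding the reverse mapping and the tree on load) because the on-disk format of latent2block_name.json is not documented here.

```python
import json
import os

import numpy as np
from scipy.spatial import KDTree


class ToyBlockNameVectorizer:
    """Hypothetical stand-in for BlockNameVectorizer: name -> latent, latent -> nearest name."""

    def __init__(self, block_name2latent):
        self.names = list(block_name2latent)
        self.matrix = np.array([block_name2latent[n] for n in self.names], dtype=float)
        self.block_name2latent = {n: self.matrix[i] for i, n in enumerate(self.names)}
        self.tree = KDTree(self.matrix)  # built once, reused by every reverse() call

    def transform(self, block_name):
        """Return the stored latent vector for a block name."""
        return self.block_name2latent[block_name]

    def reverse(self, vector):
        """Return the block name whose latent vector is closest (Euclidean) to `vector`."""
        _, idx = self.tree.query(np.asarray(vector, dtype=float), k=1)
        return self.names[int(idx)]

    def save(self, directory):
        # Assumption: only the forward mapping is written; the reverse side is rebuilt on load.
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, "block_name2latent.json"), "w") as f:
            json.dump({n: v.tolist() for n, v in self.block_name2latent.items()}, f)

    @classmethod
    def load(cls, directory):
        with open(os.path.join(directory, "block_name2latent.json")) as f:
            return cls(json.load(f))


vectorizer = ToyBlockNameVectorizer({"Air": [0.0, 0.0], "Stone": [1.0, 0.0], "Dirt": [0.0, 1.0]})
nearest = vectorizer.reverse([0.9, 0.1])  # closest stored latent is Stone's
```

Building the tree once in the constructor keeps each lookup logarithmic on average instead of scanning every stored vector.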
Vectorization Functions:
- vectorize_structure(input_data, block_vectorizer, block_id2name, latent_dim=32, output_path=None)
  - Input: A 3D structure provided as a file path, directory, single NumPy array, or list/tuple of arrays. Each voxel is either a block ID (string) or 0.
  - Processing:
    For each voxel:
    - If the value is 0, it is interpreted as "Air."
    - Otherwise, the block ID is mapped to a block name using the block_id2name dictionary, then the vectorizer converts it to a latent vector.
  - Output: A 4D NumPy array with shape (H, W, D, 32) or a set of such arrays saved to disk if output_path is provided.
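The per-voxel vectorization rule can be sketched as follows. Both function names are hypothetical: toy_vectorize_structure takes a plain block_name2latent dict instead of a BlockNameVectorizer, and load_structures only approximates the accepted input forms listed above. The sketch assumes structures are object arrays holding block ID strings or the integer 0, with block ID keys stored as strings (as json.load produces).

```python
import os

import numpy as np


def load_structures(input_data):
    """Hypothetical helper: normalize a path, directory, array, or list/tuple to a list of arrays."""
    if isinstance(input_data, np.ndarray):
        return [input_data]
    if isinstance(input_data, (list, tuple)):
        return [np.asarray(a, dtype=object) for a in input_data]
    if os.path.isdir(input_data):
        files = sorted(f for f in os.listdir(input_data) if f.endswith(".npy"))
        return [np.load(os.path.join(input_data, f), allow_pickle=True) for f in files]
    return [np.load(input_data, allow_pickle=True)]


def toy_vectorize_structure(structure, block_name2latent, block_id2name, latent_dim=32):
    """Map each voxel (0 -> "Air", else block ID -> name) to its latent vector."""
    h, w, d = structure.shape
    out = np.zeros((h, w, d, latent_dim), dtype=np.float32)
    for idx in np.ndindex(h, w, d):
        value = structure[idx]
        name = "Air" if value == 0 else block_id2name[value]
        out[idx] = block_name2latent[name]
    return out
```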
Reverse-Vectorization Functions:
- reverse_vectorize_structure(input_data, block_vectorizer, block_name2id, block_id2color=None, stop_at_block_id=False, latent_dim=32, output_path=None)
  - Input: A 4D latent vector structure (or collection thereof) provided as a file path, directory, or already loaded array(s).
  - Processing:
    For each latent vector:
    - The KDTree in block_vectorizer finds the closest latent vector.
    - This latent vector is mapped to a block name.
    - The block name is converted to a block ID using block_name2id.
    - Optionally (if stop_at_block_id is False) the block ID is further mapped to an average color using block_id2color.
  - Output: A structure containing either block IDs (if stop_at_block_id is True) or average color values.
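Reverse-vectorization can be done for a whole structure with one batched KDTree query instead of a lookup per voxel. This sketch is hypothetical: toy_reverse_vectorize_structure takes the ordered block names and their latent matrix directly instead of a BlockNameVectorizer, uses scipy.spatial.KDTree, and assumes block_id2color values are RGB-style lists. Falling back to block IDs when no color mapping is given is also an assumption.

```python
import numpy as np
from scipy.spatial import KDTree


def toy_reverse_vectorize_structure(latents, block_names, latent_matrix, block_name2id,
                                    block_id2color=None, stop_at_block_id=False):
    """latents: (H, W, D, C) array; latent_matrix row i is the latent for block_names[i]."""
    h, w, d, c = latents.shape
    tree = KDTree(latent_matrix)
    _, nearest = tree.query(latents.reshape(-1, c), k=1)  # one nearest row index per voxel
    ids = [block_name2id[block_names[i]] for i in nearest]
    # Assumption: return block IDs if requested or if no color mapping was supplied.
    if stop_at_block_id or block_id2color is None:
        return np.array(ids, dtype=object).reshape(h, w, d)
    colors = np.array([block_id2color[bid] for bid in ids], dtype=float)
    return colors.reshape(h, w, d, -1)  # e.g. (H, W, D, 3) for RGB lists
```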
Command-Line Interface (CLI):
- The module supports two subcommands:
  - vectorize:
    Convert block ID structures to latent vectors.
    - Arguments:
      - --input_path: Input file or folder containing .npy files with block IDs.
      - --output_path: Where to save the resulting .npy file(s).
      - --block_id2name: Path to a JSON file mapping block IDs to block names.
      - --vectorizer_csv: CSV file (no header) containing the ordered list of block names.
      - --vectorizer_dir: Directory to load or save vectorizer mappings.
      - --recreate_vectorizer: (Flag) Force recreation of vectorizer mappings from CSV.
      - --latent_dim: Latent vector dimensionality (default 32).
  - reverse:
    Convert latent vector structures back to block IDs or average colors.
    - Arguments:
      - --input_path: Input file or folder containing latent vector .npy files.
      - --output_path: Output file or folder.
      - --block_name2id: Path to a JSON file mapping block names to block IDs.
      - --block_id2color: Path to a JSON file mapping block IDs to average colors.
      - --stop_at_block_id: (Flag) If set, the output will be block IDs; otherwise, average colors.
      - --vectorizer_csv, --vectorizer_dir, --recreate_vectorizer, --latent_dim: As above.
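The two subcommands and their flags map naturally onto argparse subparsers. This is a sketch, not the module's actual parser: build_parser and add_vectorizer_args are hypothetical names, and whether any flag is required, plus all help strings, are assumptions. Only the flag names and the --latent_dim default of 32 come from the list above.

```python
import argparse


def add_vectorizer_args(p):
    """Flags shared by both subcommands."""
    p.add_argument("--vectorizer_csv", help="CSV file (no header) with the ordered block names")
    p.add_argument("--vectorizer_dir", help="Directory to load or save vectorizer mappings")
    p.add_argument("--recreate_vectorizer", action="store_true", help="Force recreation from CSV")
    p.add_argument("--latent_dim", type=int, default=32, help="Latent vector dimensionality")


def build_parser():
    parser = argparse.ArgumentParser(prog="block_vectorization.py")
    sub = parser.add_subparsers(dest="command", required=True)

    vec = sub.add_parser("vectorize", help="Convert block ID structures to latent vectors")
    vec.add_argument("--input_path")
    vec.add_argument("--output_path")
    vec.add_argument("--block_id2name")
    add_vectorizer_args(vec)

    rev = sub.add_parser("reverse", help="Convert latent vectors back to block IDs or colors")
    rev.add_argument("--input_path")
    rev.add_argument("--output_path")
    rev.add_argument("--block_name2id")
    rev.add_argument("--block_id2color")
    rev.add_argument("--stop_at_block_id", action="store_true")
    add_vectorizer_args(rev)
    return parser
```

Factoring the shared flags into one helper keeps the two subcommands in sync, mirroring the "As above" note in the reverse argument list.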

Programmatic API Usage:

To use the module within your Python code, import the functions and classes:

```python
import json

import numpy as np

from block_vectorization import get_block_name_vectorizer, vectorize_structure, reverse_vectorize_structure

# Obtain a BlockNameVectorizer (load precomputed mappings if available)
vectorizer = get_block_name_vectorizer(csv_path="mappings/ordered_block_names.csv", precomputed_dir="mappings", latent_dim=32)

# To vectorize a single loaded structure (NumPy array):
with open("input_structure.npy", "rb") as f:
    structure = np.load(f, allow_pickle=True)

# block_id2name is a dict mapping block IDs to block names.
with open("mappings/block_id2name.json", "r") as f:
    block_id2name = json.load(f)
vectorized = vectorize_structure(structure, vectorizer, block_id2name, latent_dim=32)

# To reverse-vectorize back to block IDs (and, with stop_at_block_id=False, on to average colors):
with open("mappings/block_name2id.json", "r") as f:
    block_name2id = json.load(f)

# Optionally, load the block_id2color mapping for color conversion.
with open("mappings/block_id2color.json", "r") as f:
    block_id2color = json.load(f)
reversed_data = reverse_vectorize_structure(vectorized, vectorizer, block_name2id, block_id2color=block_id2color, stop_at_block_id=False, latent_dim=32)
```

Command-Line Usage:

From the terminal, you can run the module using:

Vectorization:
```bash
python block_vectorization.py vectorize \
  --input_path input_structures \
  --output_path vectorized_structures \
  --block_id2name mappings/block_id2name.json \
  --vectorizer_csv mappings/ordered_block_names.csv \
  --vectorizer_dir mappings \
  --latent_dim 32
```

Reverse-Vectorization:
```bash
python block_vectorization.py reverse \
  --input_path vectorized_structures \
  --output_path reconstructed_output \
  --block_name2id mappings/block_name2id.json \
  --block_id2color mappings/block_id2color.json \
  --vectorizer_csv mappings/ordered_block_names.csv \
  --vectorizer_dir mappings \
  --latent_dim 32
```

To output block IDs rather than colors, add the flag --stop_at_block_id.

Owner

Login: Gannon44
Kind: user

Repositories: 1
Profile: https://github.com/Gannon44

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Gonsiorowski"
  given-names: "Jack Gannon"
  orcid: "https://orcid.org/0009-0000-9880-8048"
title: "mc-vectorizer"
version: 1.0.0
date-released: 2025-03-10
url: "https://github.com/Gannon44/mc-vectorizer"
