text2earth

[IEEE GRSM 2025 πŸ”₯] "Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model"

https://github.com/chen-yang-liu/text2earth

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • βœ“
    CITATION.cff file
    Found CITATION.cff file
  • βœ“
    codemeta.json file
    Found codemeta.json file
  • βœ“
    .zenodo.json file
    Found .zenodo.json file
  • βœ“
    DOI references
    Found 1 DOI reference(s) in README
  • βœ“
    Academic publication links
    Links to: scholar.google, ieee.org
  • β—‹
    Academic email domains
  • β—‹
    Institutional organization owner
  • β—‹
    JOSS paper metadata
  • β—‹
    Scientific vocabulary similarity
    Low similarity (8.8%) to scientific vocabulary

Keywords

foundation-models image-generation remote-sensing vision-language
Last synced: 6 months ago

Repository

[IEEE GRSM 2025 πŸ”₯] "Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model"

Basic Info
Statistics
  • Stars: 105
  • Watchers: 4
  • Forks: 4
  • Open Issues: 2
  • Releases: 0
Topics
foundation-models image-generation remote-sensing vision-language
Created about 1 year ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model

**[Chenyang Liu](https://chen-yang-liu.github.io/), [Keyan Chen](https://kyanchen.github.io), [Rui Zhao](https://ruizhaocv.github.io/), [Zhengxia Zou](https://scholar.google.com.hk/citations?hl=en&user=DzwoyZsAAAAJ), and [Zhenwei Shi*βœ‰](https://scholar.google.com.hk/citations?hl=en&user=kNhFWQIAAAAJ)** [![Page](https://img.shields.io/badge/Project-Page-87CEEB)](https://chen-yang-liu.github.io/Text2Earth/) [![Paper](https://img.shields.io/badge/arXiv-Paper-.svg)](https://ieeexplore.ieee.org/document/10988859) [![YouTube](https://img.shields.io/badge/YouTube-Video-red.svg)](https://youtu.be/Rw9wzUpO01M)

Give us a :star: if you're interested in this repo

This is the official repository of the paper: "Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model", accepted by IEEE Geoscience and Remote Sensing Magazine.

## News πŸ”₯

βœ… 2025.06.01: Git-RSCLIP series downloads exceeded 60,000 times πŸ”₯

βœ… 2025-04-16: The paper has been accepted by IEEE Geoscience and Remote Sensing Magazine.

βœ… 2025-03-03: Our Git-RSCLIP model available: [πŸ€— Huggingface | 🌊 Modelscope]

βœ… 2025-02-20: The Git-10M dataset is available: [πŸ€— Huggingface | 🌊 Modelscope].

βœ… 2025-01-01: The paper is available.

πŸ›°οΈ Git-10M Dataset

Dataset Download

  • The Git-10M dataset is a global-scale dataset, consisting of 10.5 million image-text pairs with geographical locations and resolution information.
  • The Git-10M dataset is available at: [πŸ€— Huggingface | 🌊 Modelscope].



Visual Quality Enhancement

  • You can skip the following steps if you do not have higher visual-quality requirements for the images.
  • Some collected images exhibit poor visual quality, such as noise and artifacts, which could negatively impact the training of image generation models. To address this, you can use an image enhancement model pre-trained on our private high-quality remote sensing dataset to improve the overall image quality.

    Follow the steps below:

    Step 1: Enter the tools directory:

    ```shell
    cd ./Text2Earth/Tools
    ```

    Step 2: Run the Python script to process the images:

    ```shell
    python visual_quality_enhancement.py \
        --input_dir /path/to/Git-10M/images \
        --output_dir /path/to/Git-10M/enhanced_images
    ```

🧩 Text2Earth Model

Pre-trained Weights

We provide two versions of the model:

  • Text2Earth: [πŸ€— Huggingface | 🌊 Modelscope]
  • Text2Earth-inpainting: [πŸ€— Huggingface | 🌊 Modelscope]

Demo-Usage:

βœ… Loading Usage 1: Use Text2Earth directly through [πŸ€— Diffusers] without installing our repository.

  • Text2Earth that generates remote sensing images from text prompts:

    ```python
    import torch
    from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

    model_id = "lcybuaa/Text2Earth"

    # If you don't swap the scheduler, the pipeline runs with the default DDIM;
    # in this example we swap it to EulerDiscreteScheduler.
    scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
    pipe = StableDiffusionPipeline.from_pretrained(model_id,
                                                   torch_dtype=torch.float16,
                                                   scheduler=scheduler,
                                                   custom_pipeline="pipeline_text2earth_diffusion",
                                                   safety_checker=None)
    pipe = pipe.to("cuda")

    prompt = "Seven green circular farmlands are neatly arranged on the ground"
    image = pipe(prompt, height=256, width=256, num_inference_steps=50, guidance_scale=4.0).images[0]
    image.save("circular.png")
    ```

  • Text2Earth-inpainting that inpaints remote sensing images based on text prompts and inpainting masks:

    ```python
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from diffusers.utils import load_image

    model_id = "lcybuaa/Text2Earth-inpainting"
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        custom_pipeline="pipeline_text2earth_diffusion_inpaint",
        safety_checker=None
    )
    pipe.to("cuda")

    # Load base and mask images. `image` and `mask_image` should be PIL images.
    # The mask is white for the region to inpaint and black for the region to keep as is.
    init_image = load_image(r"./Text2Earth/examples/text_to_image/inpainting/sparseresidential310.jpg")
    mask_image = load_image(r"./Text2Earth/examples/text_to_image/inpainting/sparseresidential310.png")

    prompt = "There is one big green lake"
    image = pipe(prompt=prompt, image=init_image, mask_image=mask_image,
                 height=256, width=256, num_inference_steps=50, guidance_scale=4.0).images[0]
    image.save("lake.png")
    ```

βœ… Loading Usage 2: Install our repository (See Installation), then you can use the provided Pipeline, which is more convenient for users to customize and edit.

  • Text2Earth that generates remote sensing images from text prompts:

    ```python
    import torch
    from diffusers import Text2EarthDiffusionPipeline, EulerDiscreteScheduler

    model_id = "lcybuaa/Text2Earth"

    # If you don't swap the scheduler, the pipeline runs with the default DDIM;
    # in this example we swap it to EulerDiscreteScheduler.
    scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
    pipe = Text2EarthDiffusionPipeline.from_pretrained(model_id,
                                                       torch_dtype=torch.float16,
                                                       scheduler=scheduler,
                                                       safety_checker=None)
    pipe = pipe.to("cuda")

    prompt = "Seven green circular farmlands are neatly arranged on the ground"
    image = pipe(prompt, height=256, width=256, num_inference_steps=50, guidance_scale=4.0).images[0]
    image.save("circular.png")
    ```

  • Text2Earth-inpainting that inpaints remote sensing images based on text prompts and inpainting masks:

    ```python
    import torch
    from diffusers import Text2EarthDiffusionInpaintPipeline
    from diffusers.utils import load_image

    model_id = "lcybuaa/Text2Earth-inpainting"
    pipe = Text2EarthDiffusionInpaintPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        custom_pipeline="pipeline_text2earth_diffusion_inpaint",
        safety_checker=None
    )
    pipe.to("cuda")

    # Load base and mask images. `image` and `mask_image` should be PIL images.
    # The mask is white for the region to inpaint and black for the region to keep as is.
    init_image = load_image(r"https://github.com/Chen-Yang-Liu/Text2Earth/blob/main/images/sparseresidential310.jpg")
    mask_image = load_image(r"https://github.com/Chen-Yang-Liu/Text2Earth/blob/main/images/sparseresidential310.png")

    prompt = "There is one big green lake"
    image = pipe(prompt=prompt, image=init_image, mask_image=mask_image,
                 height=256, width=256, num_inference_steps=50, guidance_scale=4.0).images[0]
    image.save("lake.png")
    ```
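As a hypothetical illustration of the mask convention used above (white marks the region to inpaint, black the region to keep), a mask can also be built programmatically with Pillow instead of loading a PNG; the size and rectangle coordinates here are arbitrary examples:

```python
from PIL import Image, ImageDraw

# Build a 256x256 grayscale mask: white rectangle = region to inpaint,
# black background = region to keep as is.
mask_image = Image.new("L", (256, 256), 0)    # all black: keep everything
draw = ImageDraw.Draw(mask_image)
draw.rectangle([64, 64, 192, 192], fill=255)  # white box: inpaint here
mask_image.save("mask.png")
```

Such a mask could then be passed as `mask_image=` in place of the loaded file, assuming the pipeline accepts a single-channel PIL mask as Diffusers inpainting pipelines do.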

βœ… NOTE: Text2Earth and Text2Earth-inpainting allow users to specify the spatial resolution of the generated images, ranging from 0.5 m to 128 m per pixel. This is achieved by including a specific identifier in the prompt:

```python
# You can indirectly set the spatial resolution by specifying the Google Map Level,
# which ranges over [10, 18], corresponding to resolutions from 128 m to 0.5 m.
# The conversion formula is: Resolution = 2**(17 - Level).
GoogleMapLevel = 16  # Resolution = 2**(17 - Level)
content_prompt = "Seven green circular farmlands are neatly arranged on the ground"
prompt_with_resolution = '{res}GOOGLELEVEL'.format(res=GoogleMapLevel) + content_prompt
pipe = xxx  # Text2EarthDiffusionPipeline or Text2EarthDiffusionInpaintPipeline
image = pipe(prompt=prompt_with_resolution, ...).images[0]
```
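The level/resolution formula above can be sketched as a pair of small helpers (the function names are illustrative, not part of the repository):

```python
import math

def level_to_resolution(level: int) -> float:
    """Meters per pixel from a Google Map level, per Resolution = 2**(17 - Level)."""
    return 2.0 ** (17 - level)

def resolution_to_level(resolution_m: float) -> int:
    """Inverse mapping, Level = 17 - log2(Resolution), clamped to [10, 18]."""
    level = round(17 - math.log2(resolution_m))
    return max(10, min(18, level))

print(level_to_resolution(10))   # 128.0 m/pixel (coarsest)
print(level_to_resolution(18))   # 0.5 m/pixel (finest)
print(resolution_to_level(2.0))  # 16
```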

Installation

Step 1: Download or clone the repository.

```shell
git clone https://github.com/Chen-Yang-Liu/Text2Earth.git
cd ./Text2Earth
```

Step 2: Create a virtual environment named Text2Earth_env and activate it.

```shell
conda create -n Text2Earth_env python=3.9
conda activate Text2Earth_env
```

Step 3: Install `accelerate`, then run `accelerate config`:

```shell
pip install accelerate
accelerate config
```

Step 4: Our Text2Earth is based on Diffusers. Now install Text2Earth:

```shell
cd ./Text2Earth
pip install -e ".[torch]"
```

Training

Code is coming soon.

Evaluation

Code is coming soon.

Experimental Results

Building on the Git-10M dataset, we developed Text2Earth, a 1.3 billion parameter generative foundation model. Text2Earth excels in resolution-controllable text2image generation and demonstrates robust generalization and flexibility across multiple tasks.

  • Comparison of Text2image models on the previous benchmark dataset (RSICD):

On the previous benchmark dataset RSICD, Text2Earth surpasses previous models with significant improvements of 26.23 in FID and +20.95% in Zero-shot OA.


  • Zero-Shot text2image generation: Text2Earth can generate specific image content from free-form user text input, without scene-specific fine-tuning or retraining.


  • Unbounded Remote Sensing Scene Construction: Using our Text2Earth, users can seamlessly and infinitely generate remote sensing images on a canvas, effectively overcoming the fixed-size limitations of traditional generative models. Text2Earth’s resolution controllability is the key to maintaining visual coherence across the generated scene during the expansion process.


  • Remote Sensing Image Editing: Text2Earth can perform scene modifications based on user-provided text, such as replacing or removing geographic features, and it ensures that these modifications are seamlessly integrated with the surrounding areas, maintaining continuity and coherence.


  • Cross-Modal Image Generation: Text2Earth can be used for Text-Driven Multi-modal Image Generation, including RGB, SAR, NIR, and PAN images.


Text2Earth also exhibits potential in Image-to-Image Translation, including cross-modal translation and image enhancement, such as PAN to RGB (PAN2RGB), NIR to RGB (NIR2RGB), PAN to NIR (PAN2NIR), super-resolution, and image dehazing.


πŸ€ Git-RSCLIP Model

  • The Git-RSCLIP model is a remote sensing image-text foundation model trained on the Git-10M dataset.
  • For more details, please see the GitHub repository: [Github]

✍️️ Citation

If you find this paper useful in your research, please consider citing:

```bibtex
@ARTICLE{10988859,
  author={Liu, Chenyang and Chen, Keyan and Zhao, Rui and Zou, Zhengxia and Shi, Zhenwei},
  journal={IEEE Geoscience and Remote Sensing Magazine},
  title={Text2Earth: Unlocking text-driven remote sensing image generation with a global-scale dataset and a foundation model},
  year={2025},
  volume={},
  number={},
  pages={2-23},
  doi={10.1109/MGRS.2025.3560455}
}
```

πŸ“– License

This repo is distributed under the MIT License. The code can be used for academic purposes only.

Owner

  • Name: Liu Chenyang
  • Login: Chen-Yang-Liu
  • Kind: user
  • Location: Beijing

Liu Chenyang

Citation (CITATION.cff)

cff-version: 1.2.0
title: 'Diffusers: State-of-the-art diffusion models'
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Patrick
    family-names: von Platen
  - given-names: Suraj
    family-names: Patil
  - given-names: Anton
    family-names: Lozhkov
  - given-names: Pedro
    family-names: Cuenca
  - given-names: Nathan
    family-names: Lambert
  - given-names: Kashif
    family-names: Rasul
  - given-names: Mishig
    family-names: Davaadorj
  - given-names: Dhruv
    family-names: Nair
  - given-names: Sayak
    family-names: Paul
  - given-names: Steven
    family-names: Liu
  - given-names: William
    family-names: Berman
  - given-names: Yiyi
    family-names: Xu
  - given-names: Thomas
    family-names: Wolf
repository-code: 'https://github.com/huggingface/diffusers'
abstract: >-
  Diffusers provides pretrained diffusion models across
  multiple modalities, such as vision and audio, and serves
  as a modular toolbox for inference and training of
  diffusion models.
keywords:
  - deep-learning
  - pytorch
  - image-generation
  - hacktoberfest
  - diffusion
  - text2image
  - image2image
  - score-based-generative-modeling
  - stable-diffusion
  - stable-diffusion-diffusers
license: Apache-2.0
version: 0.12.1

GitHub Events

Total
  • Issues event: 12
  • Watch event: 125
  • Issue comment event: 19
  • Public event: 1
  • Push event: 27
  • Fork event: 4
Last Year
  • Issues event: 12
  • Watch event: 125
  • Issue comment event: 19
  • Public event: 1
  • Push event: 27
  • Fork event: 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 6
  • Total pull requests: 0
  • Average time to close issues: 14 days
  • Average time to close pull requests: N/A
  • Total issue authors: 6
  • Total pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 6
  • Pull requests: 0
  • Average time to close issues: 14 days
  • Average time to close pull requests: N/A
  • Issue authors: 6
  • Pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • VoyagerXvoyagerx (1)
  • wmarkcom (1)
  • MLS2021 (1)
  • caoql98 (1)
  • Bili-Sakura (1)
  • Sonettoo (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels