text2earth
[IEEE GRSM 2025 🔥] "Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model"
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to scholar.google, ieee.org
- ✗ Academic email domains
- ✗ Institutional organization owner
- ✗ JOSS paper metadata
- ✗ Scientific vocabulary similarity: Low similarity (8.8%) to scientific vocabulary
Keywords
Repository
[IEEE GRSM 2025 🔥] "Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model"
Basic Info
- Host: GitHub
- Owner: Chen-Yang-Liu
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://chen-yang-liu.github.io/Text2Earth/
- Size: 167 MB
Statistics
- Stars: 105
- Watchers: 4
- Forks: 4
- Open Issues: 2
- Releases: 0
Topics
Metadata Files
README.md
Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model
**[Chenyang Liu](https://chen-yang-liu.github.io/), [Keyan Chen](https://kyanchen.github.io), [Rui Zhao](https://ruizhaocv.github.io/), [Zhengxia Zou](https://scholar.google.com.hk/citations?hl=en&user=DzwoyZsAAAAJ), and [Zhenwei Shi*](https://scholar.google.com.hk/citations?hl=en&user=kNhFWQIAAAAJ)**

[Project Page](https://chen-yang-liu.github.io/Text2Earth/)

Usage 1: Load the models from the Hugging Face Hub with the `diffusers` library and the provided `custom_pipeline`.

Text2Earth generates remote sensing images from text prompts:

```python
import torch
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "lcybuaa/Text2Earth"
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    scheduler=scheduler,
    custom_pipeline="pipeline_text2earth_diffusion",
    safety_checker=None,
)
pipe = pipe.to("cuda")

prompt = "Seven green circular farmlands are neatly arranged on the ground"
image = pipe(prompt, height=256, width=256, num_inference_steps=50, guidance_scale=4.0).images[0]
image.save("circular.png")
```
Text2Earth-inpainting inpaints remote sensing images based on text prompts and inpainting masks:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from diffusers.utils import load_image

model_id = "lcybuaa/Text2Earth-inpainting"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    custom_pipeline="pipeline_text2earth_diffusion_inpaint",
    safety_checker=None,
)
pipe.to("cuda")

# Load base and mask images; both should be PIL images.
# The mask is white for regions to inpaint and black for regions to keep as is.
init_image = load_image("./Text2Earth/examples/text_to_image/inpainting/sparseresidential_310.jpg")
mask_image = load_image("./Text2Earth/examples/text_to_image/inpainting/sparseresidential_310.png")

prompt = "There is one big green lake"
image = pipe(
    prompt=prompt, image=init_image, mask_image=mask_image,
    height=256, width=256, num_inference_steps=50, guidance_scale=4.0,
).images[0]
image.save("lake.png")
```
Usage 2: Install our repository (see Installation); you can then use the provided pipelines directly, which makes them more convenient to customize and edit.
Text2Earth generates remote sensing images from text prompts:

```python
import torch
from diffusers import Text2EarthDiffusionPipeline, EulerDiscreteScheduler

model_id = "lcybuaa/Text2Earth"

# If you don't swap the scheduler, the pipeline runs with its default scheduler;
# in this example we swap in EulerDiscreteScheduler.
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = Text2EarthDiffusionPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, scheduler=scheduler, safety_checker=None
)
pipe = pipe.to("cuda")

prompt = "Seven green circular farmlands are neatly arranged on the ground"
image = pipe(prompt, height=256, width=256, num_inference_steps=50, guidance_scale=4.0).images[0]
image.save("circular.png")
```
Text2Earth-inpainting inpaints remote sensing images based on text prompts and inpainting masks:

```python
import torch
from diffusers import Text2EarthDiffusionInpaintPipeline
from diffusers.utils import load_image

model_id = "lcybuaa/Text2Earth-inpainting"
pipe = Text2EarthDiffusionInpaintPipeline.from_pretrained(
    model_id, torch_dtype=torch.float16, safety_checker=None
)
pipe.to("cuda")

# Load base and mask images; both should be PIL images.
# The mask is white for regions to inpaint and black for regions to keep as is.
init_image = load_image("https://github.com/Chen-Yang-Liu/Text2Earth/raw/main/images/sparseresidential_310.jpg")
mask_image = load_image("https://github.com/Chen-Yang-Liu/Text2Earth/raw/main/images/sparseresidential_310.png")

prompt = "There is one big green lake"
image = pipe(
    prompt=prompt, image=init_image, mask_image=mask_image,
    height=256, width=256, num_inference_steps=50, guidance_scale=4.0,
).images[0]
image.save("lake.png")
```
NOTE: Text2Earth and Text2Earth-inpainting allow users to specify the spatial resolution of the generated images, ranging from 0.5m to 128m per pixel.
This can be achieved by including specific identifiers in the prompt.
```python
# You can indirectly set the spatial resolution by specifying the Google Map level,
# which ranges over [10, 18], corresponding to resolutions from 128 m to 0.5 m per pixel.
# The conversion formula is: Resolution = 2 ** (17 - Level).
GoogleMapLevel = 16  # Resolution = 2 ** (17 - Level)
content_prompt = "Seven green circular farmlands are neatly arranged on the ground"
prompt_with_resolution = f"{GoogleMapLevel}GOOGLE_LEVEL" + content_prompt
pipe = ...  # a Text2EarthDiffusionPipeline or Text2EarthDiffusionInpaintPipeline instance
image = pipe(prompt=prompt_with_resolution, ...).images[0]
```
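As a quick check of the conversion formula, the full level-to-resolution mapping can be computed directly. This is a standalone sketch; the model itself only reads the level identifier embedded in the prompt:

```python
def level_to_resolution(level: int) -> float:
    """Ground resolution in metres per pixel for a Google Map level,
    using the repository's formula: Resolution = 2 ** (17 - Level)."""
    if not 10 <= level <= 18:
        raise ValueError("Text2Earth supports Google Map levels 10-18")
    return 2.0 ** (17 - level)

# The supported range spans 128 m/px (level 10) down to 0.5 m/px (level 18).
table = {lvl: level_to_resolution(lvl) for lvl in range(10, 19)}
```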
Installation
Step 1: Download or clone the repository.

```bash
git clone https://github.com/Chen-Yang-Liu/Text2Earth.git
cd ./Text2Earth
```
Step 2: Create a virtual environment named Text2Earth_env and activate it.

```bash
conda create -n Text2Earth_env python=3.9
conda activate Text2Earth_env
```
Step 3: Install accelerate, then run `accelerate config`.
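Step 3 amounts to the following two commands (assuming `pip` targets the Text2Earth_env environment created in Step 2):

```shell
pip install accelerate
accelerate config
```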
Step 4: Our Text2Earth is based on Diffusers. Now install Text2Earth:

```bash
cd ./Text2Earth
pip install -e ".[torch]"
```
Training
Code is coming soon.
Evaluation
Code is coming soon.
Experimental Results
Building on the Git-10M dataset, we developed Text2Earth, a 1.3 billion parameter generative foundation model. Text2Earth excels in resolution-controllable text2image generation and demonstrates robust generalization and flexibility across multiple tasks.
- Comparison of Text2image models on the previous benchmark dataset (RSICD):
On the previous benchmark dataset RSICD, Text2Earth surpasses previous models by a significant margin, improving FID by 26.23 and zero-shot OA by 20.95 percentage points.
Zero-Shot text2image generation: Text2Earth can generate specific image content based on user-free text input, without scene-specific fine-tuning or retraining.
Unbounded Remote Sensing Scene Construction: Using our Text2Earth, users can seamlessly and infinitely generate remote sensing images on a canvas, effectively overcoming the fixed-size limitations of traditional generative models. Text2Earthβs resolution controllability is the key to maintaining visual coherence across the generated scene during the expansion process.
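The expansion loop above can be sketched as a sliding window over a growing canvas: each step exposes a band of blank pixels, masks it white, and asks an inpainting call to fill it so it blends with the retained overlap. The snippet below is a minimal illustration with a stubbed `generate` function standing in for a `Text2EarthDiffusionInpaintPipeline` call; the tile size and overlap are illustrative assumptions, not values from the paper.

```python
import numpy as np

TILE, OVERLAP = 256, 64  # assumed tile size and overlap, in pixels

def extend_right(canvas: np.ndarray, generate) -> np.ndarray:
    """Grow the canvas rightward by one band of newly generated pixels.

    `generate(image, mask)` stands in for an inpainting call: it must
    return `image` with the white (mask == 255) region filled in.
    """
    h, _, c = canvas.shape
    # Working tile: known overlap on the left, blank pixels on the right.
    tile = np.zeros((h, TILE, c), dtype=canvas.dtype)
    tile[:, :OVERLAP] = canvas[:, -OVERLAP:]
    # Mask: white where new content is generated, black where kept as is.
    mask = np.zeros((h, TILE), dtype=np.uint8)
    mask[:, OVERLAP:] = 255
    filled = generate(tile, mask)
    # Append only the newly generated band to the canvas.
    return np.concatenate([canvas, filled[:, OVERLAP:]], axis=1)

# Stub generator: fills the masked region with a constant. A real run would
# call the diffusion pipeline with a text prompt and a resolution identifier.
def fake_generate(image, mask):
    out = image.copy()
    out[mask == 255] = 127
    return out

canvas = np.zeros((256, 256, 3), dtype=np.uint8)
canvas = extend_right(canvas, fake_generate)
```

Repeating `extend_right` (and its up/down/left analogues) yields an unbounded canvas; the fixed resolution identifier in each prompt is what keeps the tiles visually coherent.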
Remote Sensing Image Editing: Text2Earth can perform scene modifications based on user-provided text, such as replacing or removing geographic features, and it ensures that these modifications blend seamlessly with the surrounding areas, maintaining continuity and coherence.
Cross-Modal Image Generation: Text2Earth can be used for Text-Driven Multi-modal Image Generation, including RGB, SAR, NIR, and PAN images.
Text2Earth also exhibits potential in Image-to-Image Translation, containing cross-modal translation and image enhancement, such as PAN to RGB (PAN2RGB), NIR to RGB (NIR2RGB), PAN to NIR (PAN2NIR), super-resolution, and image dehazing.
Git-RSCLIP Model
- The Git-RSCLIP model is a remote sensing image-text foundation model trained on the Git-10M dataset.
- For more details, please see the GitHub repository: [Github]
Citation
If you find this paper useful in your research, please consider citing:
```bibtex
@ARTICLE{10988859,
  author={Liu, Chenyang and Chen, Keyan and Zhao, Rui and Zou, Zhengxia and Shi, Zhenwei},
  journal={IEEE Geoscience and Remote Sensing Magazine},
  title={Text2Earth: Unlocking text-driven remote sensing image generation with a global-scale dataset and a foundation model},
  year={2025},
  volume={},
  number={},
  pages={2-23},
  doi={10.1109/MGRS.2025.3560455}}
```
License
This repo is distributed under the MIT License. The code can be used for academic purposes only.
Owner
- Name: Liu Chenyang
- Login: Chen-Yang-Liu
- Kind: user
- Location: Beijing
- Website: https://Chen-Yang-Liu.github.io
- Repositories: 15
- Profile: https://github.com/Chen-Yang-Liu
Liu Chenyang
Citation (CITATION.cff)
cff-version: 1.2.0
title: 'Diffusers: State-of-the-art diffusion models'
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Patrick
family-names: von Platen
- given-names: Suraj
family-names: Patil
- given-names: Anton
family-names: Lozhkov
- given-names: Pedro
family-names: Cuenca
- given-names: Nathan
family-names: Lambert
- given-names: Kashif
family-names: Rasul
- given-names: Mishig
family-names: Davaadorj
- given-names: Dhruv
family-names: Nair
- given-names: Sayak
family-names: Paul
- given-names: Steven
family-names: Liu
- given-names: William
family-names: Berman
- given-names: Yiyi
family-names: Xu
- given-names: Thomas
family-names: Wolf
repository-code: 'https://github.com/huggingface/diffusers'
abstract: >-
Diffusers provides pretrained diffusion models across
multiple modalities, such as vision and audio, and serves
as a modular toolbox for inference and training of
diffusion models.
keywords:
- deep-learning
- pytorch
- image-generation
- hacktoberfest
- diffusion
- text2image
- image2image
- score-based-generative-modeling
- stable-diffusion
- stable-diffusion-diffusers
license: Apache-2.0
version: 0.12.1
GitHub Events
Total
- Issues event: 12
- Watch event: 125
- Issue comment event: 19
- Public event: 1
- Push event: 27
- Fork event: 4
Last Year
- Issues event: 12
- Watch event: 125
- Issue comment event: 19
- Public event: 1
- Push event: 27
- Fork event: 4
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 6
- Total pull requests: 0
- Average time to close issues: 14 days
- Average time to close pull requests: N/A
- Total issue authors: 6
- Total pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 0
- Average time to close issues: 14 days
- Average time to close pull requests: N/A
- Issue authors: 6
- Pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- VoyagerXvoyagerx (1)
- wmarkcom (1)
- MLS2021 (1)
- caoql98 (1)
- Bili-Sakura (1)
- Sonettoo (1)