yuzumarker.fontdetection

✨ First-ever CJK (Chinese, Japanese, Korean) Font Recognition and Style Extractor: the font recognition model and implementation for YuzuMarker, developed as a side project of YuzuMarker

https://github.com/jeffersonqin/yuzumarker.fontdetection

Keywords

chinese cjk-characters cjk-font cnn computer-vision cv font font-recognition fonts japanese korean pytorch pytorch-cnn pytorch-lightning recognition

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: JeffersonQin
License: mit
Language: Python
Default Branch: master
Homepage: https://huggingface.co/spaces/gyrojeff/YuzuMarker.FontDetection
Size: 172 KB

Statistics

Stars: 485
Watchers: 2
Forks: 23
Open Issues: 8
Releases: 0

Created almost 5 years ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation

README.md

---
title: YuzuMarker.FontDetection
emoji: 😅
colorFrom: blue
colorTo: yellow
sdk: docker
app_port: 7860
---

✨YuzuMarker.FontDetection✨

First-ever CJK (Chinese, Japanese, Korean) font recognition model

News

[Update 2023/05/05] Project recommended on ruanyifeng.com (阮一峰的网络日志 - 科技爱好者周刊, Ruan Yifeng's Weblog, "Tech Enthusiast Weekly"): https://www.ruanyifeng.com/blog/2023/05/weekly-issue-253.html
[Update 2023/11/18] The dataset is now open source! Download it from Hugging Face: https://huggingface.co/datasets/gyrojeff/YuzuMarker.FontDetection/tree/master

Scene Text Font Dataset Generation

This repository also contains code for automatically generating a dataset of scene text images with different fonts. The dataset is generated using the CJK font pack by VCB-Studio and thousands of background images from pixiv.net.

The pixiv data will not be shared since it was randomly scraped. You may prepare your own background dataset to fit your target data distribution.

For the text corpus,

Chinese is randomly generated from 3500 common Chinese characters.
Japanese is randomly generated from a list of lyrics from https://www.uta-net.com.
Korean is randomly generated from its alphabet.

All text is also mixed with English text to simulate real-world data.
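
The corpus rules above can be sketched as follows. This is a hypothetical illustration, not the project's generator: `COMMON_HANZI`, `JAPANESE_LINES` and `ENGLISH_WORDS` are tiny placeholder samples standing in for the 3,500-character list and the uta-net lyrics, the function names are made up, and "generated from its alphabet" is interpreted here as composing Hangul syllables from random jamo with the standard Unicode composition formula.

```python
import random

# Hypothetical placeholder samples: the real corpus uses 3,500 common Chinese
# characters and lyrics from uta-net.com.
COMMON_HANZI = "的一是不了人我在有他这为之大来以个中上们"
JAPANESE_LINES = ["夜空に浮かぶ星", "君の声が聞こえる"]
ENGLISH_WORDS = ["font", "style", "hello", "world", "manga"]


def random_hangul_syllable(rng: random.Random) -> str:
    """Compose one Hangul syllable from random initial/medial/final jamo indices.

    Unicode formula: U+AC00 + (initial * 21 + medial) * 28 + final,
    with 19 initials, 21 medials and 28 finals (final 0 means "no final").
    """
    initial, medial, final = rng.randrange(19), rng.randrange(21), rng.randrange(28)
    return chr(0xAC00 + (initial * 21 + medial) * 28 + final)


def random_text(lang: str, length: int, rng: random.Random, english_ratio: float = 0.2) -> str:
    """Return `length` characters of random zh/ja/ko text with English words mixed in."""
    if lang not in ("zh", "ja", "ko"):
        raise ValueError(f"unsupported language: {lang!r}")
    pieces = []
    while sum(map(len, pieces)) < length:
        if rng.random() < english_ratio:
            pieces.append(rng.choice(ENGLISH_WORDS))
        elif lang == "zh":
            pieces.append(rng.choice(COMMON_HANZI))
        elif lang == "ja":
            line = rng.choice(JAPANESE_LINES)
            start = rng.randrange(len(line))
            pieces.append(line[start:start + rng.randint(1, 4)])  # short lyric fragment
        else:  # "ko"
            pieces.append(random_hangul_syllable(rng))
    return "".join(pieces)[:length]


rng = random.Random(42)
for lang in ("zh", "ja", "ko"):
    print(lang, random_text(lang, 12, rng))
```

Setting `english_ratio=0.0` produces text in a single script, which is convenient for checking that a font covers the generated characters.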

Data Preparation Walkthrough

Download the CJK font pack and extract it to the dataset/fonts directory.
Prepare the background data and put it in the dataset/pixivimages directory.
Run the following script to clean the file names:

```bash
python dataset_filename_preprocess.py
```

Generation Script Walkthrough

Now the preparation is complete. The following command can be used to generate the dataset:

```bash
python font_ds_generate_script.py 1 1
```

The command takes two parameters: the second is the number of partitions to split the task into, and the first is the 1-based index of the partition to run. For example, to run the task in 4 partitions, you can run the following commands in parallel to speed up the process:

```bash
python font_ds_generate_script.py 1 4
python font_ds_generate_script.py 2 4
python font_ds_generate_script.py 3 4
python font_ds_generate_script.py 4 4
```
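
How font_ds_generate_script.py actually divides the work is not described here. A common scheme consistent with the 1-based `index total` arguments is to cut the full job list into contiguous chunks whose sizes differ by at most one. The sketch below assumes that scheme; `partition` is a hypothetical helper, and the integer list stands in for the real generation jobs.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(items: Sequence[T], index: int, total: int) -> List[T]:
    """Return the contiguous chunk of `items` handled by 1-based partition `index` of `total`."""
    if total < 1 or not 1 <= index <= total:
        raise ValueError("expected total >= 1 and 1 <= index <= total")
    n = len(items)
    start = (index - 1) * n // total
    end = index * n // total
    return list(items[start:end])


jobs = list(range(10))  # stand-in for the real list of generation jobs
for i in range(1, 5):
    print(i, partition(jobs, i, 4))
# 1 [0, 1]
# 2 [2, 3, 4]
# 3 [5, 6]
# 4 [7, 8, 9]
```

Under this scheme, `1 1` processes the whole job list in a single process, and the chunks of any `total` together cover every job exactly once.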

The generated dataset will be saved in the dataset/font_img directory.

Note that batch_generate_script_cmd_32.bat and batch_generate_script_cmd_64.bat are Windows batch scripts that generate the dataset in parallel with 32 and 64 partitions, respectively.
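
On systems without the .bat files, the same fan-out can be reproduced from Python. This is a hypothetical launcher, not part of the repository: it starts one `font_ds_generate_script.py <index> <total>` process per partition with `subprocess.Popen`, using the same interpreter, and waits for all of them.

```python
import subprocess
import sys
from typing import List


def build_commands(total: int, script: str = "font_ds_generate_script.py") -> List[List[str]]:
    """One command line per partition, using the interpreter running this launcher."""
    return [[sys.executable, script, str(i), str(total)] for i in range(1, total + 1)]


def run_partitions(total: int) -> List[int]:
    """Start every partition in parallel and return their exit codes in index order."""
    processes = [subprocess.Popen(cmd) for cmd in build_commands(total)]
    return [p.wait() for p in processes]
```

Calling `run_partitions(32)` from the repository root mirrors batch_generate_script_cmd_32.bat. Because generation is CPU-bound, a partition count close to the number of CPU cores is a reasonable starting point.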

Final Check

Since the task might be terminated unexpectedly or deliberately by the user, the script has a caching mechanism to avoid regenerating images that already exist.

However, the script cannot detect corrupted cache entries (for example, a file left half-written when the process was terminated), so we also provide a script that checks the generated dataset and removes corrupted images and labels:

```bash
python font_ds_detect_broken.py
```

After running it, you might want to rerun the generation script to regenerate the removed files.
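
The exact checks in font_ds_detect_broken.py are not described here. As a hypothetical sketch of the two ideas involved: a cheap heuristic flags images that are empty or lack the end-of-file marker of their format (a typical symptom of an interrupted write), and, as an alternative technique not taken from this repository, writing each file to a temporary path and renaming it into place with `os.replace` keeps half-written files from ever appearing under their final name. All function names are made up; pairing images with their label files is left out because the label format is not documented here.

```python
import os
import tempfile
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_TRAILER = b"IEND\xaeB`\x82"  # IEND chunk type followed by its fixed CRC
JPEG_SOI, JPEG_EOI = b"\xff\xd8", b"\xff\xd9"


def looks_truncated(path: Path) -> bool:
    """Heuristic: empty files, or PNG/JPEG files missing their end marker."""
    data = path.read_bytes()
    if not data:
        return True
    if data.startswith(PNG_SIGNATURE):
        return not data.endswith(PNG_TRAILER)
    if data.startswith(JPEG_SOI):
        return not data.endswith(JPEG_EOI)
    return False  # unknown format: not judged here


def find_truncated_images(root: Path, patterns=("*.png", "*.jpg", "*.jpeg")) -> list:
    """Recursively collect image files under `root` that look truncated."""
    hits = []
    for pattern in patterns:
        hits.extend(p for p in root.rglob(pattern) if looks_truncated(p))
    return sorted(hits)


def write_atomically(path: Path, data: bytes) -> None:
    """Write to a temp file in the same directory, then atomically rename it over `path`."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # never leave a partial temp file behind
        raise
```

The end-marker test can give false positives for files with trailing padding, so it is only a first filter before a full decode.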

(Optional) Linux Cluster Generation Walkthrough

If you would like to run the generation script on Linux clusters, we also provide an environment setup script, linux_venv_setup.sh.

The prerequisite is a Linux cluster with python3-venv installed and python3 available on the PATH.

To set up the environment, run the following command:

```bash
./linux_venv_setup.sh
```

The script creates a virtual environment in the venv directory and installs all required packages. Running it is necessary in most cases, because it also installs libraqm, which PIL requires for text rendering and which most Linux server distributions do not install by default.
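
To confirm that the environment actually has libraqm before starting a long generation run, Pillow's `PIL.features` module can be queried. A minimal sketch: the helper name is hypothetical, while `features.check_feature("raqm")` is Pillow's own API.

```python
def raqm_available() -> bool:
    """Return True if Pillow is installed and was built with libraqm support."""
    try:
        from PIL import features
    except ImportError:
        return False  # Pillow is not installed at all
    # check_feature may return None when the backing module cannot be imported.
    return bool(features.check_feature("raqm"))


if not raqm_available():
    print("libraqm not found: rerun ./linux_venv_setup.sh or install libraqm manually")
```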

After the environment is set up, you can compile a task scheduler to run generation tasks in parallel.

The main idea is similar to running the script directly, except that the scheduler takes three compile-time parameters:

TOTAL_MISSION: the total number of partitions of the task
MIN_MISSION: the minimum partition index of the task to run
MAX_MISSION: the maximum partition index of the task to run

The compilation command is as follows:

```bash
gcc -D MIN_MISSION=<MIN_MISSION> \
    -D MAX_MISSION=<MAX_MISSION> \
    -D TOTAL_MISSION=<TOTAL_MISSION> \
    batch_generate_script_linux.c \
    -o <object-file-name>.out
```

For example, if you want to run the task in 64 partitions and split the work across 4 machines, run the following command on the corresponding machine:

```bash
# Machine 1
gcc -D MIN_MISSION=1 \
    -D MAX_MISSION=16 \
    -D TOTAL_MISSION=64 \
    batch_generate_script_linux.c \
    -o mission-1-16.out

# Machine 2
gcc -D MIN_MISSION=17 \
    -D MAX_MISSION=32 \
    -D TOTAL_MISSION=64 \
    batch_generate_script_linux.c \
    -o mission-17-32.out

# Machine 3
gcc -D MIN_MISSION=33 \
    -D MAX_MISSION=48 \
    -D TOTAL_MISSION=64 \
    batch_generate_script_linux.c \
    -o mission-33-48.out

# Machine 4
gcc -D MIN_MISSION=49 \
    -D MAX_MISSION=64 \
    -D TOTAL_MISSION=64 \
    batch_generate_script_linux.c \
    -o mission-49-64.out
```
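
For other combinations of partitions and machines, the ranges can be computed rather than written by hand. A hypothetical helper (not in the repository) that splits `1..TOTAL_MISSION` into contiguous inclusive ranges and prints the matching gcc command lines:

```python
from typing import List, Tuple


def machine_ranges(total: int, machines: int) -> List[Tuple[int, int]]:
    """Split partitions 1..total into `machines` contiguous inclusive (min, max) ranges."""
    if not 1 <= machines <= total:
        raise ValueError("need 1 <= machines <= total")
    base, extra = divmod(total, machines)
    ranges, start = [], 1
    for m in range(machines):
        size = base + (1 if m < extra else 0)  # the first `extra` machines take one more
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def gcc_command(lo: int, hi: int, total: int) -> str:
    """Build the single-line gcc invocation for one machine's range."""
    return (f"gcc -D MIN_MISSION={lo} -D MAX_MISSION={hi} -D TOTAL_MISSION={total} "
            f"batch_generate_script_linux.c -o mission-{lo}-{hi}.out")


for lo, hi in machine_ranges(64, 4):
    print(gcc_command(lo, hi, 64))
```

With `machine_ranges(64, 4)` this reproduces the four commands above; an uneven split such as `machine_ranges(10, 3)` gives `[(1, 4), (5, 7), (8, 10)]`.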

Then run the compiled executable on each machine to start the generation task.

```bash
./mission-1-16.out   # Machine 1
./mission-17-32.out  # Machine 2
./mission-33-48.out  # Machine 3
./mission-49-64.out  # Machine 4
```

There is also a helper script to check the progress of the generation task. It can be used as follows:

```bash
python font_ds_stat.py
```

MISC Info of the Dataset

The generation is CPU-bound, so its speed depends heavily on CPU performance. Speeding it up is therefore essentially an engineering problem.

Some fonts are problematic during the generation process. The script has a manual exclusion list in config/fonts.yml and also supports detecting unqualified fonts on the fly. It automatically skips problematic fonts and logs them for future model training.
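
The two filtering steps can be sketched as below. This is a hypothetical illustration, not the repository's logic: the exclusion list is passed in as a plain set of file names (the structure of config/fonts.yml is not described here), and "unqualified" is interpreted as a font that lacks glyphs for characters the sample text needs. `cmap_for` is an injected callable; with fontTools installed it could be `lambda p: set(TTFont(p).getBestCmap())`, although font collections (.ttc) also need a `fontNumber` argument.

```python
import logging
from pathlib import Path
from typing import Callable, Iterable, List, Set, Tuple

logger = logging.getLogger("font_filter")


def filter_fonts(
    font_paths: Iterable[str],
    excluded_names: Set[str],
    text: str,
    cmap_for: Callable[[str], Set[int]],
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split fonts into usable ones and (path, reason) pairs for skipped ones."""
    usable, skipped = [], []
    for path in font_paths:
        if Path(path).name in excluded_names:
            skipped.append((path, "manually excluded"))
            continue
        supported = cmap_for(path)  # set of Unicode code points the font maps
        missing = sorted({ch for ch in text if not ch.isspace() and ord(ch) not in supported})
        if missing:
            skipped.append((path, "missing glyphs: " + "".join(missing)))
            continue
        usable.append(path)
    for path, reason in skipped:
        logger.warning("skipping %s (%s)", path, reason)  # kept for later review
    return usable, skipped
```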

Model Training

Once the dataset is ready under the dataset directory, you can start training the model. You can have more than one dataset folder; the script merges them automatically as long as you pass their paths on the command line.

```bash
$ python train.py -h
usage: train.py [-h] [-d [DEVICES ...]] [-b SINGLE_BATCH_SIZE] [-c CHECKPOINT]
                [-m {resnet18,resnet34,resnet50,resnet101,deepfont}] [-p] [-i]
                [-a {v1,v2,v3}] [-l LR] [-s [DATASETS ...]] [-n MODEL_NAME] [-f]
                [-z SIZE] [-t {medium,high,heighest}] [-r]

optional arguments:
  -h, --help            show this help message and exit
  -d [DEVICES ...], --devices [DEVICES ...]
                        GPU devices to use (default: [0])
  -b SINGLE_BATCH_SIZE, --single-batch-size SINGLE_BATCH_SIZE
                        Batch size of single device (default: 64)
  -c CHECKPOINT, --checkpoint CHECKPOINT
                        Trainer checkpoint path (default: None)
  -m {resnet18,resnet34,resnet50,resnet101,deepfont}, --model {resnet18,resnet34,resnet50,resnet101,deepfont}
                        Model to use (default: resnet18)
  -p, --pretrained      Use pretrained model for ResNet (default: False)
  -i, --crop-roi-bbox   Crop ROI bounding box (default: False)
  -a {v1,v2,v3}, --augmentation {v1,v2,v3}
                        Augmentation strategy to use (default: None)
  -l LR, --lr LR        Learning rate (default: 0.0001)
  -s [DATASETS ...], --datasets [DATASETS ...]
                        Datasets paths, seperated by space (default: ['./dataset/font_img'])
  -n MODEL_NAME, --model-name MODEL_NAME
                        Model name (default: current tag)
  -f, --font-classification-only
                        Font classification only (default: False)
  -z SIZE, --size SIZE  Model feature image input size (default: 512)
  -t {medium,high,heighest}, --tensor-core {medium,high,heighest}
                        Tensor core precision (default: high)
  -r, --preserve-aspect-ratio-by-random-crop
                        Preserve aspect ratio (default: False)
```
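
One way to read `-r/--preserve-aspect-ratio-by-random-crop` (footnote ¹¹ below) is: instead of stretching an arbitrary image to the square `SIZE x SIZE` input set by `-z`, take a random square crop and resize that. The sketch below assumes this interpretation; `random_square_crop_box` is a hypothetical helper, and its output follows the `(left, upper, right, lower)` box convention of Pillow's `Image.crop`.

```python
import random
from typing import Tuple


def random_square_crop_box(width: int, height: int, rng: random.Random) -> Tuple[int, int, int, int]:
    """Pick the largest square that fits, at a random position, as (left, upper, right, lower)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    side = min(width, height)
    left = rng.randint(0, width - side)    # randint is inclusive on both ends
    upper = rng.randint(0, height - side)
    return left, upper, left + side, upper + side


# With Pillow, image.crop(box).resize((512, 512)) would then keep glyph shapes undistorted.
box = random_square_crop_box(1200, 500, random.Random(0))
print(box)
```

The trade-off is that a square crop of a wide text image can cut off part of the text, whereas stretching keeps all of it but distorts stroke proportions.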

Font Classification Experiment Results

On our synthesized dataset,

| Backbone | Data Aug | Pretrained | Crop Text BBox | Preserve Aspect Ratio | Output Norm | Input Size | Hyper Param | Accuracy | Commit | Dataset | Precision |
| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| DeepFont | ✔️* | ❌ | ✅ | ❌ | Sigmoid | 105x105 | I¹ | [Can't Converge] | 665559f | I⁵ | bfloat16_3x |
| DeepFont | ✔️* | ❌ | ✅ | ❌ | Sigmoid | 105x105 | IV⁴ | [Can't Converge] | 665559f | I | bfloat16_3x |
| ResNet-18 | ❌ | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 18.58% | 5c43f60 | I | float32 |
| ResNet-18 | ❌ | ❌ | ❌ | ❌ | Sigmoid | 512x512 | II² | 14.39% | 5a85fd3 | I | bfloat16_3x |
| ResNet-18 | ❌ | ❌ | ❌ | ❌ | Tanh | 512x512 | II | 16.24% | ff82fe6 | I | bfloat16_3x |
| ResNet-18 | ✅⁸ | ❌ | ❌ | ❌ | Tanh | 512x512 | II | 27.71% | a976004 | I | bfloat16_3x |
| ResNet-18 | ✅ | ❌ | ❌ | ❌ | Tanh | 512x512 | I | 29.95% | 8364103 | I | bfloat16_3x |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 29.37% [Early stop] | 8d2e833 | I | bfloat16_3x |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 416x416 | I | [Lower Trend] | d5a3215 | I | bfloat16_3x |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 320x320 | I | [Lower Trend] | afcdd80 | I | bfloat16_3x |
| ResNet-18 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 224x224 | I | [Lower Trend] | 8b9de80 | I | bfloat16_3x |
| ResNet-34 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 32.03% | 912d566 | I | bfloat16_3x |
| ResNet-50 | ✅* | ❌ | ❌ | ❌ | Sigmoid | 512x512 | I | 34.21% | e980b66 | I | bfloat16_3x |
| ResNet-18 | ✅* | ✅ | ❌ | ❌ | Sigmoid | 512x512 | I | 31.24% | 416c7bb | I | bfloat16_3x |
| ResNet-18 | ✅* | ✅ | ✅ | ❌ | Sigmoid | 512x512 | I | 34.69% | 855e240 | I | bfloat16_3x |
| ResNet-18 | ✔️*⁹ | ✅ | ✅ | ❌ | Sigmoid | 512x512 | I | 38.32% | 1750035 | I | bfloat16_3x |
| ResNet-18 | ✔️* | ✅ | ✅ | ❌ | Sigmoid | 512x512 | III³ | 38.87% | 0693434 | I | bfloat16_3x |
| ResNet-50 | ✔️* | ✅ | ✅ | ❌ | Sigmoid | 512x512 | III | 48.99% | bc0f7fc | II⁶ | bfloat16_3x |
| ResNet-50 | ✔️ | ✅ | ✅ | ❌ | Sigmoid | 512x512 | III | 48.45% | 0f071a5 | II | bfloat16_3x |
| ResNet-50 | ✔️ | ✅ | ✅ | ✅¹¹ | Sigmoid | 512x512 | III | 46.12% | 0f071a5 | II | bfloat16 |
| ResNet-50 | ❕¹⁰ | ✅ | ✅ | ❌ | Sigmoid | 512x512 | III | 43.86% | 0f071a5 | II | bfloat16 |
| ResNet-50 | ❕ | ✅ | ✅ | ✅ | Sigmoid | 512x512 | III | 41.35% | 0f071a5 | II | bfloat16 |

* Bug in implementation
¹ learning rate = 0.0001, lambda = (2, 0.5, 1)
² learning rate = 0.00005, lambda = (4, 0.5, 1)
³ learning rate = 0.001, lambda = (2, 0.5, 1)
⁴ learning rate = 0.01, lambda = (2, 0.5, 1)
⁵ Initial version of synthesized dataset
⁶ Doubled synthesized dataset (2x)
⁷ Quadruple synthesized dataset (4x)
⁸ Data Augmentation v1: Color Jitter + Random Crop [81%-100%]
⁹ Data Augmentation v2: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°]
¹⁰ Data Augmentation v3: Color Jitter + Random Crop [30%-130%] + Random Gaussian Blur + Random Gaussian Noise + Random Rotation [-15°, 15°] + Random Horizontal Flip + Random Downsample [1, 2]
¹¹ Preserve Aspect Ratio by Random Cropping
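
The Precision column lines up with the three modes of PyTorch's `torch.set_float32_matmul_precision`: according to the PyTorch documentation, "highest" computes float32 matmuls in float32, "high" may use TF32 or the bfloat16_3x scheme (each float32 treated as the sum of two bfloat16 values), and "medium" may use bfloat16. Assuming the `-t/--tensor-core` flag of train.py is forwarded to that function (not confirmed here; note that the flag spells the strictest choice "heighest"), the correspondence can be written down as a sketch:

```python
# Assumption: train.py's -t/--tensor-core choice is passed to
# torch.set_float32_matmul_precision, with "heighest" meaning "highest".
FLAG_TO_TORCH = {"medium": "medium", "high": "high", "heighest": "highest"}

# Precision column of the table -> torch.set_float32_matmul_precision mode.
TABLE_TO_TORCH = {"float32": "highest", "bfloat16_3x": "high", "bfloat16": "medium"}


def apply_tensor_core_flag(flag: str) -> str:
    """Set PyTorch's float32 matmul precision for a train.py-style -t value."""
    mode = FLAG_TO_TORCH[flag]
    import torch  # imported lazily so the tables above can be used without PyTorch

    torch.set_float32_matmul_precision(mode)
    return mode
```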

Pretrained Models

Available at: https://huggingface.co/gyrojeff/YuzuMarker.FontDetection/tree/main

Note that since I trained everything on PyTorch 2.0 with torch.compile, to use the pretrained models you need to install PyTorch 2.0 and compile the model with torch.compile, as done in demo.py.
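
If compiling is inconvenient, a workaround commonly used with torch.compile checkpoints (not documented by this repository, and untested against these particular weights) is to strip the `_orig_mod.` segment that the compiled wrapper adds to parameter names, then load the weights into an uncompiled model of the same architecture. The hypothetical helper below only rewrites keys:

```python
from typing import Dict, TypeVar

V = TypeVar("V")


def strip_compile_prefix(state_dict: Dict[str, V], marker: str = "_orig_mod.") -> Dict[str, V]:
    """Drop the first `_orig_mod.` segment that torch.compile inserts into each key."""
    return {key.replace(marker, "", 1): value for key, value in state_dict.items()}


# Hypothetical usage with PyTorch (Lightning checkpoints keep weights under "state_dict"):
#   ckpt = torch.load(path, map_location="cpu")
#   weights = strip_compile_prefix(ckpt["state_dict"])
```

Depending on how the training module names its backbone attribute, a further prefix may also need to be removed before calling `load_state_dict` on a bare model.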

Demo Deployment (Method 1)

To deploy the demo, you need either the whole font dataset under ./dataset/fonts or a cache file named font_demo_cache.bin that lists the fonts known to the model. This cache file will be released later as a resource.

To deploy, first run the following script to generate the demo font images (if you have the font dataset):

```bash
python generate_font_sample_image.py
```

Then run the following script to start the demo server:

```bash
$ python demo.py -h
usage: demo.py [-h] [-d DEVICE] [-c CHECKPOINT]
               [-m {resnet18,resnet34,resnet50,resnet101,deepfont}] [-f]
               [-z SIZE] [-s] [-p PORT] [-a ADDRESS]

optional arguments:
  -h, --help            show this help message and exit
  -d DEVICE, --device DEVICE
                        GPU devices to use (default: 0), -1 for CPU
  -c CHECKPOINT, --checkpoint CHECKPOINT
                        Trainer checkpoint path (default: None). Use link as huggingface://// for huggingface.co models, currently only supports model file in the root directory.
  -m {resnet18,resnet34,resnet50,resnet101,deepfont}, --model {resnet18,resnet34,resnet50,resnet101,deepfont}
                        Model to use (default: resnet18)
  -f, --font-classification-only
                        Font classification only (default: False)
  -z SIZE, --size SIZE  Model feature image input size (default: 512)
  -s, --share           Get public link via Gradio (default: False)
  -p PORT, --port PORT  Port to use for Gradio (default: 7860)
  -a ADDRESS, --address ADDRESS
                        Address to use for Gradio (default: 127.0.0.1)
```

Demo Deployment (Method 2)

If Docker is available on your machine, you can deploy directly with Docker, as I did for the Hugging Face Space.

You can change the last line of the Dockerfile, using the command-line arguments described in the previous section, to suit your needs.

Build the Docker image:

```bash
docker build -t yuzumarker.fontdetection .
```

Run the Docker image:

```bash
docker run -it -p 7860:7860 yuzumarker.fontdetection
```

Online Demo

The project is also deployed on Huggingface Space: https://huggingface.co/spaces/gyrojeff/YuzuMarker.FontDetection

Related works and Resources

DeepFont: Identify Your Font from An Image: https://arxiv.org/abs/1507.03196
Font Identification and Recommendations: https://mangahelpers.com/forum/threads/font-identification-and-recommendations.35672/
Unconstrained Text Detection in Manga: a New Dataset and Baseline: https://arxiv.org/pdf/2009.04042.pdf
SwordNet: Chinese Character Font Style Recognition Network: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9682683

Citation

If you use this work, please cite in the following manner. Thank you.

@misc{qin2023yuzumarkerfont,
  author       = {Haoyun Qin},
  title        = {YuzuMarker.FontDetection},
  year         = {2023},
  url          = {https://github.com/JeffersonQin/YuzuMarker.FontDetection},
  note         = {GitHub repository}
}

Owner

Name: gyro永不抽风
Login: JeffersonQin
Kind: user
Location: Shanghai, Philadelphia
Company: University of Pennsylvania

Website: gyrojeff.top
Repositories: 11
Profile: https://github.com/JeffersonQin

OIer | Amateur Programer | osu! | SFLS'22 | UPenn SEAS CIS'26

GitHub Events

Total

Issues event: 8
Watch event: 55
Issue comment event: 20
Push event: 2
Fork event: 4

Last Year

Issues event: 8
Watch event: 55
Issue comment event: 20
Push event: 2
Fork event: 4

Committers

Last synced: about 1 year ago

All Time

Total Commits: 152
Total Committers: 1
Avg Commits per committer: 152.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 2
Committers: 1
Avg Commits per committer: 2.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
JeffersonQin	g**f@f**m	152

Committer Domains (Top 20 + Academic)

foxmail.com: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 33
Total pull requests: 0
Average time to close issues: 15 days
Average time to close pull requests: N/A
Total issue authors: 9
Total pull request authors: 0
Average comments per issue: 1.3
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 6
Pull requests: 0
Average time to close issues: about 1 month
Average time to close pull requests: N/A
Issue authors: 4
Pull request authors: 0
Average comments per issue: 3.83
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

JeffersonQin (22)
jaweii (3)
ChenSiyi1 (1)
AndyGuo1 (1)
jim-copyhero (1)
babyta (1)
ArmandAlbert (1)
Yasuharaaa (1)

Science Score: 41.0%