https://github.com/google-research/arxiv-latex-cleaner

arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    8 of 42 committers (19.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary

Keywords

arxiv latex

Keywords from Contributors

distributed parallel hyperparameter-optimization keras reinforcement-learning
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: google-research
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 480 KB
Statistics
  • Stars: 6,405
  • Watchers: 34
  • Forks: 371
  • Open Issues: 28
  • Releases: 37
Topics
arxiv latex
Created over 7 years ago · Last pushed 11 months ago
Metadata Files
Readme Contributing License

README.md

arxiv_latex_cleaner

This tool allows you to easily clean the LaTeX code of your paper to submit to arXiv. From a folder containing all your code, e.g. /path/to/latex/, it creates a new folder /path/to/latex_arXiv/ that is ready to ZIP and upload to arXiv.

Example call:

```bash
arxiv_latex_cleaner /path/to/latex --resize_images --im_size 500 --images_allowlist='{"images/im.png":2000}'
```

Or simply from a config file:

```bash
arxiv_latex_cleaner /path/to/latex --config cleaner_config.yaml
```

Installation:

```bash
pip install arxiv-latex-cleaner
```

Note: arxiv_latex_cleaner is only compatible with Python >=3.9.

If using MacOS, you can install using Homebrew:

```bash
brew install arxiv_latex_cleaner
```

Alternatively, you can download the source code:

```bash
git clone https://github.com/google-research/arxiv-latex-cleaner
cd arxiv-latex-cleaner/
python -m arxiv_latex_cleaner --help
```

And install as a command-line program directly from the source code:

```bash
python setup.py install
```

Main features:

Privacy-oriented

  • Removes all auxiliary files (.aux, .log, .out, etc.).
  • Removes all comments from your code (yes, those are visible on arXiv and you do not want them to be). These also include \begin{comment}\end{comment}, \iffalse\fi, and \if0\fi environments.
  • Optionally removes user-defined commands entered with commands_to_delete (such as \todo{} that you redefine as the empty string at the end).
  • Optionally allows you to define custom regex replacement rules through a cleaner_config.yaml file.
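
The comment and command removal described above can be illustrated with a minimal Python sketch. This is not the tool's own implementation: the function names `strip_line_comments` and `strip_commands` are hypothetical, the comment regex ignores edge cases such as `\\%` and verbatim environments, and command removal assumes arguments without nested braces. A bare `%` is kept at the end of a line so that LaTeX's end-of-line behavior does not change.

```python
import re

# An unescaped '%' and everything after it on the same line.
# The lookbehind leaves escaped percent signs such as "50\%" alone.
_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_line_comments(tex: str) -> str:
    """Drop comment-only lines and blank out trailing comments."""
    cleaned = []
    for line in tex.splitlines():
        if line.lstrip().startswith("%"):
            continue  # the whole line is a comment
        # Keep a bare '%' so the line ending is still swallowed by LaTeX.
        cleaned.append(_COMMENT.sub("%", line))
    return "\n".join(cleaned)


def strip_commands(tex: str, commands: list[str]) -> str:
    """Delete \\name{...} for each name (no nested braces handled)."""
    for name in commands:
        tex = re.sub(r"\\" + re.escape(name) + r"\{[^{}]*\}", "", tex)
    return tex


if __name__ == "__main__":
    source = "Accuracy is 50\\% % TODO: verify\n% private note\nDone \\todo{fix}here."
    print(strip_commands(strip_line_comments(source), ["todo"]))
```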

Size-oriented

There is a 50MB limit on arXiv submissions, so to make it fit:

  • Removes all unused .tex files (those that are not in the root and not included in any other .tex file).
  • Removes all unused images that take up space (those that are not actually included in any used .tex file).
  • Optionally resizes all images to im_size pixels, to reduce the size of the submission. You can allowlist some images to skip the global size using images_allowlist.
  • Optionally compresses .pdf files using ghostscript (Linux and Mac only). You can allowlist some PDFs to skip the global size using images_allowlist.
  • Optionally converts PNG images to JPG format to reduce file size.
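
How a global im_size and a per-image images_allowlist entry might interact can be sketched as follows. The helper `target_size` is hypothetical, and two behaviors are assumptions rather than facts about the tool: the limit applies to the longest side, and images already within the limit are left untouched.

```python
def target_size(path, width, height, im_size, allowlist=None):
    """Return (new_width, new_height) with the longest side capped.

    `allowlist` maps a relative image path to its own pixel limit,
    mirroring the '{"images/im.png": 2000}' dictionary format.
    """
    limit = (allowlist or {}).get(path, im_size)
    longest = max(width, height)
    if longest <= limit:
        return width, height  # assumption: never upscale
    scale = limit / longest
    # Keep the aspect ratio; never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    allow = {"images/im.png": 2000}
    print(target_size("images/other.png", 4000, 2000, 500, allow))
    print(target_size("images/im.png", 4000, 2000, 500, allow))
```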

TikZ picture source code concealment

To prevent the upload of tikzpicture source code or raw simulation data, this feature:

  • Replaces the tikzpicture environment \begin{tikzpicture} ... \end{tikzpicture} with the respective \includegraphics{EXTERNAL_TIKZ_FOLDER/picture_name.pdf}.
  • Requires externally compiled TikZ pictures as .pdf files in folder EXTERNAL_TIKZ_FOLDER. See section 52 (Externalization Library) in the PGF/TikZ manual on TikZ picture externalization.
  • Only replaces environments with preceding \tikzsetnextfilename{picture_name} command (as in \tikzsetnextfilename{picture_name}\begin{tikzpicture} ... \end{tikzpicture}) where the externalized picture_name.pdf filename matches picture_name.
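
The replacement rule above can be sketched in Python. This is an illustration under stated assumptions, not the tool's code: `replace_tikz` is a hypothetical helper, the set of available PDFs is passed in explicitly instead of being read from disk, and nested tikzpicture environments are not handled.

```python
import re

# \tikzsetnextfilename{name} followed by one tikzpicture environment.
_TIKZ = re.compile(
    r"\\tikzsetnextfilename\{(?P<name>[^{}]+)\}\s*"
    r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}",
    re.DOTALL,
)


def replace_tikz(tex, folder, available_pdfs):
    """Swap named tikzpictures for \\includegraphics of their PDF."""

    def repl(match):
        name = match.group("name")
        if name + ".pdf" not in available_pdfs:
            return match.group(0)  # no externalized PDF: leave as is
        return "\\includegraphics{%s/%s.pdf}" % (folder, name)

    return _TIKZ.sub(repl, tex)


if __name__ == "__main__":
    doc = (
        r"\tikzsetnextfilename{plot}\begin{tikzpicture}\draw (0,0);\end{tikzpicture}"
        r" and \begin{tikzpicture}\node{x};\end{tikzpicture}"
    )
    print(replace_tikz(doc, "ext_tikz", {"plot.pdf"}))
```

Only the first picture is replaced in this example, because the second one has no preceding `\tikzsetnextfilename`.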

More sophisticated pattern replacement based on regex group captures

Sometimes it is useful to work with a set of custom LaTeX commands when writing a paper. To get rid of them upon arXiv submission, one can simply revert them to plain LaTeX with a regular expression insertion.

```yaml
{
  "pattern" : '(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}',
  "insertion" : '\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}',
  "description" : "Replace figcomp"
}
```

The pattern above finds all \figcomp{path}{w1}{w2} commands and replaces them with \parbox[c]{w1\linewidth}{\includegraphics[width=w2\linewidth]{figures/path}}. The insertion template is filled with the named group captures from the pattern. The replacement runs before the \includegraphics commands are processed and the corresponding file paths are copied, which ensures that all figure files are copied to the cleaned version. See also cleaner_config.yaml for details on how to specify the patterns.
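
To make the mechanics concrete, here is a minimal sketch of such a rule applied with Python's standard `re` module. It assumes the insertion template is filled with `str.format`, which is inferred from the doubled braces in the template and is not confirmed here as the tool's actual mechanism. The helper `apply_rule` is hypothetical.

```python
import re

# The rule from the YAML example, written as Python raw strings.
RULE = {
    "pattern": r"(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}",
    "insertion": r"\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}",
    "description": "Replace figcomp",
}


def apply_rule(tex, rule):
    """Replace every match with the template filled from named groups."""

    def fill(match):
        # '{{' and '}}' become literal braces; '{first}' etc. are filled
        # from the named captures of the pattern.
        return rule["insertion"].format(**match.groupdict())

    # A function replacement avoids backslash-escape processing by re.sub.
    return re.sub(rule["pattern"], fill, tex)


if __name__ == "__main__":
    print(apply_rule(r"\figcomp{img.png}{0.5}{0.9}", RULE))
```

In this sketch the literal spaces of the template carry over into the output; any whitespace cleanup would be a separate step.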

Usage:

```
usage: arxiv_latex_cleaner@v1.0.8 [-h] [--resize_images] [--im_size IM_SIZE]
                                  [--compress_pdf]
                                  [--pdf_im_resolution PDF_IM_RESOLUTION]
                                  [--images_allowlist IMAGES_ALLOWLIST]
                                  [--keep_bib]
                                  [--commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]]
                                  [--commands_only_to_delete COMMANDS_ONLY_TO_DELETE [COMMANDS_ONLY_TO_DELETE ...]]
                                  [--environments_to_delete ENVIRONMENTS_TO_DELETE [ENVIRONMENTS_TO_DELETE ...]]
                                  [--if_exceptions IF_EXCEPTIONS [IF_EXCEPTIONS ...]]
                                  [--use_external_tikz USE_EXTERNAL_TIKZ]
                                  [--svg_inkscape [SVG_INKSCAPE]]
                                  [--convert_png_to_jpg]
                                  [--png_quality PNG_QUALITY]
                                  [--png_size_threshold PNG_SIZE_THRESHOLD]
                                  [--config CONFIG] [--verbose]
                                  input_folder

Clean the LaTeX code of your paper to submit to arXiv. Check the README for
more information on the use.

positional arguments:
  input_folder          Input folder containing the LaTeX code.

optional arguments:
  -h, --help            show this help message and exit
  --resize_images       Resize images.
  --im_size IM_SIZE     Size of the output images (in pixels, longest side).
                        Fine tune this to get as close to 10MB as possible.
  --compress_pdf        Compress PDF images using ghostscript (Linux and Mac
                        only).
  --pdf_im_resolution PDF_IM_RESOLUTION
                        Resolution (in dpi) to which the tool resamples the
                        PDF images.
  --images_allowlist IMAGES_ALLOWLIST
                        Images (and PDFs) that won't be resized to the
                        default resolution, but the one provided here. Value
                        is pixel for images, and dpi for PDFs, as in
                        --im_size and --pdf_im_resolution, respectively.
                        Format is a dictionary as: '{"path/to/im.jpg": 1000}'
  --keep_bib            Avoid deleting the *.bib files.
  --commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]
                        LaTeX commands that will be deleted. Useful for e.g.
                        user-defined \todo commands. For example, to delete
                        all occurrences of \todo1{} and \todo2{}, run the
                        tool with `--commands_to_delete todo1 todo2`. Please
                        note that the positional argument `input_folder`
                        cannot come immediately after `commands_to_delete`,
                        as the parser does not have any way to know if it's
                        another command to delete.
  --commands_only_to_delete COMMANDS_ONLY_TO_DELETE [COMMANDS_ONLY_TO_DELETE ...]
                        LaTeX commands that will be deleted but the text
                        wrapped in the commands will be retained. Useful for
                        commands that change text formats and colors, which
                        you may want to remove but keep the text within.
                        Usages are exactly the same as commands_to_delete.
                        Note that if the commands listed here duplicate that
                        after commands_to_delete, the default action will be
                        retaining the wrapped text.
  --environments_to_delete ENVIRONMENTS_TO_DELETE [ENVIRONMENTS_TO_DELETE ...]
                        LaTeX environments that will be deleted. Useful for
                        e.g. user-defined comment environments. For example,
                        to delete all occurrences of \begin{note} ...
                        \end{note}, run the tool with
                        `--environments_to_delete note`. Please note that the
                        positional argument `input_folder` cannot come
                        immediately after `environments_to_delete`, as the
                        parser does not have any way to know if it's another
                        environment to delete.
  --if_exceptions IF_EXCEPTIONS [IF_EXCEPTIONS ...]
                        Constant TeX primitive conditionals (\iffalse,
                        \iftrue, etc.) are simplified, i.e., true branches
                        are kept, false branches deleted. To parse the
                        conditional constructs correctly, all commands
                        starting with `\if` are assumed to be TeX primitive
                        conditionals (e.g., declared by \newif\ifvar). Some
                        known exceptions to this rule are already included
                        (e.g., \iff, \ifthenelse, etc.), but you can add
                        custom exceptions using `--if_exceptions iffalt`.
  --use_external_tikz USE_EXTERNAL_TIKZ
                        Folder (relative to input folder) containing
                        externalized tikz figures in PDF format.
  --svg_inkscape [SVG_INKSCAPE]
                        Include PDF files generated by Inkscape via the
                        `\includesvg` command from the `svg` package. This is
                        done by replacing the `\includesvg` calls with
                        `\includeinkscape` calls pointing to the generated
                        `.pdf_tex` files. By default, these files and the
                        generated PDFs are located under `./svg-inkscape`
                        (relative to the input folder), but a different path
                        (relative to the input folder) can be provided in
                        case a different `inkscapepath` was set when loading
                        the `svg` package.
  --convert_png_to_jpg  Convert PNG images to JPG format to reduce file size
  --png_quality PNG_QUALITY
                        JPG quality for PNG conversion (0-100, default: 50)
  --png_size_threshold PNG_SIZE_THRESHOLD
                        Minimum PNG file size in MB to apply quality
                        reduction (default: 0.5)
  --config CONFIG       Read settings from `.yaml` config file. If command
                        line arguments are provided additionally, the config
                        file parameters are updated with the command line
                        parameters.
  --verbose             Enable detailed output.
```
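
The ordering caveat for `--commands_to_delete` and `input_folder` can be reproduced with a small argparse sketch. The parser below is a hypothetical stand-in with only two arguments, assuming the tool's parser behaves like standard argparse with a multi-value option. It is not the tool's own parser definition.

```python
import argparse
import contextlib
import io


def build_parser():
    # Stand-in parser: one positional plus one multi-value option.
    parser = argparse.ArgumentParser(prog="cleaner_sketch")
    parser.add_argument("input_folder")
    parser.add_argument("--commands_to_delete", nargs="+", default=[])
    return parser


def try_parse(argv):
    """Return parsed args, or None if argparse rejects the command line."""
    try:
        # Silence the usage message argparse prints on errors.
        with contextlib.redirect_stderr(io.StringIO()):
            return build_parser().parse_args(argv)
    except SystemExit:
        return None


if __name__ == "__main__":
    # Positional first: the option can safely consume the remaining values.
    print(try_parse(["/path/to/latex", "--commands_to_delete", "todo1", "todo2"]))
    # Positional last: it is swallowed as a third command to delete,
    # so the required input_folder is missing and parsing fails.
    print(try_parse(["--commands_to_delete", "todo1", "todo2", "/path/to/latex"]))
```

Placing the input folder first, or putting another flag between the list and the folder, avoids the ambiguity.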

Testing:

```bash
python -m unittest arxiv_latex_cleaner.tests.arxiv_latex_cleaner_test
```

Note

This is not an officially supported Google product.

Owner

  • Name: Google Research
  • Login: google-research
  • Kind: organization
  • Location: Earth

GitHub Events

Total
  • Issues event: 11
  • Watch event: 1,004
  • Issue comment event: 9
  • Push event: 1
  • Pull request event: 4
  • Fork event: 46
Last Year
  • Issues event: 11
  • Watch event: 1,002
  • Issue comment event: 9
  • Push event: 1
  • Pull request event: 4
  • Fork event: 46

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 151
  • Total Committers: 42
  • Avg Commits per committer: 3.595
  • Development Distribution Score (DDS): 0.623
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
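
The committer figures above can be cross-checked with a short sketch. It assumes the commonly used definition of the Development Distribution Score, 1 minus the top committer's share of all commits. That definition is not stated on this page, but it reproduces the listed values when the top entry of 57 commits is used.

```python
def avg_commits(total_commits, total_committers):
    """Average number of commits per committer."""
    return total_commits / total_committers


def dds(total_commits, top_committer_commits):
    """Assumed DDS definition: 1 - (top committer's commits / total)."""
    return 1 - top_committer_commits / total_commits


if __name__ == "__main__":
    print(round(avg_commits(151, 42), 3))  # all-time average
    print(round(dds(151, 57), 3))          # all-time DDS
    print(round(dds(1, 1), 3))             # past-year DDS
```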
Top Committers
Name Email Commits
Jordi Pont-Tuset j****t@g****m 57
Jordi Pont-Tuset j****t@g****m 15
Sandro Braun s****3@g****m 11
hayesall a****r@b****t 8
Waiss Azizian w****n@e****r 6
Sebastian Dörner d****r@i****e 4
Dylan Duhamel 7****l@u****m 3
Kacper Sokol k****1@m****k 3
sdnr s****r@u****m 3
Aditya Sriram a****m@g****m 2
Andy Lin 3****o@u****m 2
Gilbert Shih l****0@g****m 2
Meraj Hashemizadeh 3****i@u****m 2
Penghui Guo g****8@g****m 2
Sebastiano m****o@g****m 2
Vaufreyd g****b@d****t 2
philgzl s****s@p****m 2
Andrea a****w@g****m 1
Anna-Katharina Wickert a****k@u****m 1
Bernd Busse b****d@b****e 1
Dominik Moritz d****z@g****m 1
Dominique Vaufreydaz D****z@i****r 1
Eric Heiden e****n@o****m 1
Giulio Romualdi g****i@g****m 1
Gustavo Pinto g****6@g****m 1
Jae-Won Chung j****g@u****u 1
Jessica Zhang j****3@g****m 1
Jonas Schult J****t@u****m 1
Kento Nozawa k****w@k****p 1
Matthew Andres Moreno m****n@g****m 1
and 12 more...

Packages

  • Total packages: 3
  • Total downloads:
    • homebrew 78 last-month
    • pypi 3,509 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 4
    (may contain duplicates)
  • Total versions: 88
  • Total maintainers: 2
pypi.org: arxiv-latex-cleaner

Cleans the LaTeX code of your paper to submit to arXiv.

  • Versions: 39
  • Dependent Packages: 0
  • Dependent Repositories: 4
  • Downloads: 3,509 Last month
Rankings
Stargazers count: 1.1%
Forks count: 3.0%
Average: 5.8%
Downloads: 7.2%
Dependent repos count: 7.5%
Dependent packages count: 10.1%
Maintainers (2)
Last synced: 7 months ago
proxy.golang.org: github.com/google-research/arxiv-latex-cleaner
  • Versions: 37
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.8%
Last synced: 7 months ago
formulae.brew.sh: arxiv_latex_cleaner

Clean LaTeX code to submit to arXiv

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 78 Last month
Rankings
Stargazers count: 7.3%
Forks count: 10.1%
Dependent packages count: 19.0%
Average: 31.4%
Dependent repos count: 50.7%
Downloads: 70.1%
Last synced: 7 months ago

Dependencies

.github/workflows/release-workflow.yml actions
  • actions/checkout v2 composite
  • actions/create-release v1 composite
  • actions/setup-python v2 composite
requirements.txt pypi
  • absl_py >=0.12
  • pillow *
  • pyyaml *
  • regex *
setup.py pypi