https://github.com/google-research/arxiv-latex-cleaner
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 8 of 42 committers (19.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.3%) to scientific vocabulary
Repository
Statistics
- Stars: 6,405
- Watchers: 34
- Forks: 371
- Open Issues: 28
- Releases: 37
Metadata Files
README.md
arxiv_latex_cleaner
This tool allows you to easily clean the LaTeX code of your paper to submit to
arXiv. From a folder containing all your code, e.g. /path/to/latex/, it
creates a new folder /path/to/latex_arXiv/, that is ready to ZIP and upload to
arXiv.
Example call:
bash
arxiv_latex_cleaner /path/to/latex --resize_images --im_size 500 --images_allowlist='{"images/im.png":2000}'
Or simply from a config file:
bash
arxiv_latex_cleaner /path/to/latex --config cleaner_config.yaml
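For scripted use, the two calls above can also be assembled programmatically. The following is a minimal sketch that only builds the argument list and does not run the tool; the helper name `build_cleaner_command` is hypothetical, and the flags are the ones shown in the example calls.

```python
import json
import shlex


def build_cleaner_command(input_folder, im_size=None, images_allowlist=None, config=None):
    """Build an argv list for the arxiv_latex_cleaner CLI (hypothetical helper)."""
    cmd = ["arxiv_latex_cleaner", input_folder]
    if im_size is not None:
        cmd += ["--resize_images", "--im_size", str(im_size)]
    if images_allowlist:
        # The allowlist is passed as a JSON-style dictionary string.
        allowlist = json.dumps(images_allowlist, separators=(",", ":"))
        cmd.append("--images_allowlist=" + allowlist)
    if config is not None:
        cmd += ["--config", config]
    return cmd


cmd = build_cleaner_command(
    "/path/to/latex", im_size=500, images_allowlist={"images/im.png": 2000}
)
# shlex.join adds the shell quoting needed for the JSON argument.
print(shlex.join(cmd))
```

Once the package is installed, a list built this way could be passed to `subprocess.run(cmd, check=True)`.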
Installation:
bash
pip install arxiv-latex-cleaner
| :exclamation: arxiv_latex_cleaner is only compatible with Python >=3.9 :exclamation: |
| ---------------------------------------------------------------------------------- |
If using macOS, you can install using Homebrew:
bash
brew install arxiv_latex_cleaner
Alternatively, you can download the source code:
bash
git clone https://github.com/google-research/arxiv-latex-cleaner
cd arxiv-latex-cleaner/
python -m arxiv_latex_cleaner --help
And install as a command-line program directly from the source code:
bash
python setup.py install
Main features:
Privacy-oriented
- Removes all auxiliary files (.aux, .log, .out, etc.).
- Removes all comments from your code (yes, those are visible on arXiv and you do not want them to be). These also include \begin{comment}\end{comment}, \iffalse\fi, and \if0\fi environments.
- Optionally removes user-defined commands entered with commands_to_delete (such as \todo{} that you redefine as the empty string at the end).
- Optionally allows you to define custom regex replacement rules through a cleaner_config.yaml file.
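To make the comment and command removal concrete, here is a deliberately simplified sketch. It is not the tool's implementation: the function names are hypothetical, an escaped backslash followed by a comment (`\\%`) is not handled, and command arguments are assumed to contain no nested braces.

```python
import re

# Matches an unescaped % and everything after it on the same line.
_COMMENT = re.compile(r"(?<!\\)%.*")


def strip_comments(tex):
    """Remove % comments, keeping escaped percent signs such as 50\\%."""
    return "\n".join(_COMMENT.sub("", line).rstrip() for line in tex.split("\n"))


def delete_commands(tex, names):
    """Delete \\name{...} calls whose argument has no nested braces."""
    for name in names:
        tex = re.sub(r"\\" + re.escape(name) + r"\{[^{}]*\}", "", tex)
    return tex


source = "Accuracy is 50\\% % TODO: recheck\nDone.\\todo{fix this}"
cleaned = delete_commands(strip_comments(source), ["todo"])
print(cleaned)
```

The escaped `\%` survives, the trailing comment is dropped, and the `\todo{...}` call disappears together with its argument.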
Size-oriented
There is a 50MB limit on arXiv submissions, so to make it fit:
- Removes all unused .tex files (those that are not in the root and not included in any other .tex file).
- Removes all unused images that take up space (those that are not actually included in any used .tex file).
- Optionally resizes all images to im_size pixels, to reduce the size of the submission. You can allowlist some images to skip the global size using images_allowlist.
- Optionally compresses .pdf files using ghostscript (Linux and Mac only). You can allowlist some PDFs to skip the global size using images_allowlist.
- Optionally converts PNG images to JPG format to reduce file size.
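The interaction between im_size and images_allowlist can be sketched as plain arithmetic. This is an illustration, not the tool's code: both function names are hypothetical, and leaving already-small images untouched is an assumption.

```python
def target_longest_side(image_path, im_size, images_allowlist):
    """Return the allowlist entry for this image if present, else the global im_size."""
    return images_allowlist.get(image_path, im_size)


def scaled_size(width, height, longest_side):
    """Scale (width, height) so the longest side fits longest_side, keeping aspect ratio."""
    current = max(width, height)
    if current <= longest_side:
        return width, height  # assumption: smaller images are left untouched
    scale = longest_side / current
    return max(1, round(width * scale)), max(1, round(height * scale))


allowlist = {"images/im.png": 2000}
default_size = scaled_size(4000, 3000, target_longest_side("images/other.png", 500, allowlist))
allowed_size = scaled_size(4000, 3000, target_longest_side("images/im.png", 500, allowlist))
print(default_size, allowed_size)
```

With a global im_size of 500, a 4000x3000 image shrinks to 500x375, while the allowlisted image is only reduced to 2000x1500.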
TikZ picture source code concealment
To prevent the upload of tikzpicture source code or raw simulation data, this feature:
- Replaces the tikzpicture environment \begin{tikzpicture} ... \end{tikzpicture} with the respective \includegraphics{EXTERNAL_TIKZ_FOLDER/picture_name.pdf}.
- Requires externally compiled TikZ pictures as .pdf files in folder EXTERNAL_TIKZ_FOLDER. See section 52 (Externalization Library) in the PGF/TikZ manual on TikZ picture externalization.
- Only replaces environments with preceding \tikzsetnextfilename{picture_name} command (as in \tikzsetnextfilename{picture_name}\begin{tikzpicture} ... \end{tikzpicture}) where the externalized picture_name.pdf filename matches picture_name.
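The replacement rule described above can be illustrated with a small regex sketch. It is an approximation under stated assumptions, not the tool's implementation: `replace_tikz` is a hypothetical name, nested tikzpicture environments are not handled, and the sketch does not check that the externalized PDF actually exists.

```python
import re

# A named tikzpicture: \tikzsetnextfilename{name} directly followed by the environment.
_TIKZ = re.compile(
    r"\\tikzsetnextfilename\{(?P<name>[^{}]+)\}\s*"
    r"\\begin\{tikzpicture\}.*?\\end\{tikzpicture\}",
    re.DOTALL,
)


def replace_tikz(tex, external_folder):
    """Swap named tikzpicture environments for their externalized PDFs."""
    def to_includegraphics(match):
        return "\\includegraphics{%s/%s.pdf}" % (external_folder, match.group("name"))
    return _TIKZ.sub(to_includegraphics, tex)


source = (
    "\\tikzsetnextfilename{plot_a}\n"
    "\\begin{tikzpicture}\\draw (0,0) -- (1,1);\\end{tikzpicture}\n"
    "\\begin{tikzpicture}\\draw (0,0) circle (1);\\end{tikzpicture}"
)
result = replace_tikz(source, "ext_tikz")
print(result)
```

Only the first picture is replaced; the second has no preceding \tikzsetnextfilename and is left as is.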
More sophisticated pattern replacement based on regex group captures
Sometimes it is useful to work with a set of custom LaTeX commands when writing a paper. To get rid of them upon arXiv submission, one can simply revert them to plain LaTeX with a regular expression insertion.
yaml
{
"pattern" : '(?:\\figcomp{\s*)(?P<first>.*?)\s*}\s*{\s*(?P<second>.*?)\s*}\s*{\s*(?P<third>.*?)\s*}',
"insertion" : '\parbox[c]{{ {second} \linewidth}} {{ \includegraphics[width= {third} \linewidth]{{figures/{first} }} }}',
"description" : "Replace figcomp"
}
The pattern above finds all \figcomp{path}{w1}{w2} commands and replaces
them with
\parbox[c]{w1\linewidth}{\includegraphics[width=w2\linewidth]{figures/path}}.
The insertion template is filled with the named group captures from the
pattern. The replacement runs before the \includegraphics commands are
processed and the corresponding file paths are copied, so all figure files
referenced by the inserted code are copied to the cleaned version. See also
cleaner_config.yaml for details on how to specify the patterns.
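The way named groups fill the insertion template can be reproduced in a few lines. This sketch assumes `str.format`-style filling, which the doubled braces in the template suggest; it uses the standard `re` module, escapes the literal braces in the pattern, and trims the spaces from the template. `apply_rule` is a hypothetical helper name.

```python
import re

# Same rule as in the config entry above, with literal braces escaped for Python's re module.
PATTERN = (
    r"\\figcomp\{\s*(?P<first>.*?)\s*\}\s*"
    r"\{\s*(?P<second>.*?)\s*\}\s*"
    r"\{\s*(?P<third>.*?)\s*\}"
)
# Doubled braces are literal braces; {first}, {second}, {third} come from the named groups.
INSERTION = (
    r"\parbox[c]{{{second}\linewidth}}"
    r"{{\includegraphics[width={third}\linewidth]{{figures/{first}}}}}"
)


def apply_rule(tex, pattern, insertion):
    """Replace every match of pattern with the template filled from its named groups."""
    return re.sub(pattern, lambda m: insertion.format(**m.groupdict()), tex)


replaced = apply_rule(r"\figcomp{img.png}{0.4}{0.9}", PATTERN, INSERTION)
print(replaced)
```

Passing a function as the replacement means the backslashes in the filled template are inserted literally instead of being interpreted as regex escapes.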
Usage:
```
usage: arxiv_latex_cleaner@v1.0.8 [-h] [--resize_images] [--im_size IM_SIZE]
                                  [--compress_pdf]
                                  [--pdf_im_resolution PDF_IM_RESOLUTION]
                                  [--images_allowlist IMAGES_ALLOWLIST]
                                  [--keep_bib]
                                  [--commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]]
                                  [--commands_only_to_delete COMMANDS_ONLY_TO_DELETE [COMMANDS_ONLY_TO_DELETE ...]]
                                  [--environments_to_delete ENVIRONMENTS_TO_DELETE [ENVIRONMENTS_TO_DELETE ...]]
                                  [--if_exceptions IF_EXCEPTIONS [IF_EXCEPTIONS ...]]
                                  [--use_external_tikz USE_EXTERNAL_TIKZ]
                                  [--svg_inkscape [SVG_INKSCAPE]]
                                  [--convert_png_to_jpg]
                                  [--png_quality PNG_QUALITY]
                                  [--png_size_threshold PNG_SIZE_THRESHOLD]
                                  [--config CONFIG] [--verbose]
                                  input_folder

Clean the LaTeX code of your paper to submit to arXiv. Check the README for
more information on the use.

positional arguments:
  input_folder          Input folder containing the LaTeX code.

optional arguments:
  -h, --help            show this help message and exit
  --resize_images       Resize images.
  --im_size IM_SIZE     Size of the output images (in pixels, longest side).
                        Fine tune this to get as close to 10MB as possible.
  --compress_pdf        Compress PDF images using ghostscript (Linux and Mac
                        only).
  --pdf_im_resolution PDF_IM_RESOLUTION
                        Resolution (in dpi) to which the tool resamples the
                        PDF images.
  --images_allowlist IMAGES_ALLOWLIST
                        Images (and PDFs) that won't be resized to the default
                        resolution, but the one provided here. Value is pixel
                        for images, and dpi for PDFs, as in --im_size and
                        --pdf_im_resolution, respectively. Format is a
                        dictionary as: '{"path/to/im.jpg": 1000}'
  --keep_bib            Avoid deleting the *.bib files.
  --commands_to_delete COMMANDS_TO_DELETE [COMMANDS_TO_DELETE ...]
                        LaTeX commands that will be deleted. Useful for e.g.
                        user-defined \todo commands. For example, to delete
                        all occurrences of \todo1{} and \todo2{}, run the tool
                        with --commands_to_delete todo1 todo2. Please note
                        that the positional argument input_folder cannot
                        come immediately after commands_to_delete, as the
                        parser does not have any way to know if it's another
                        command to delete.
  --commands_only_to_delete COMMANDS_ONLY_TO_DELETE [COMMANDS_ONLY_TO_DELETE ...]
                        LaTeX commands that will be deleted but the text
                        wrapped in the commands will be retained. Useful for
                        commands that change text formats and colors, which
                        you may want to remove but keep the text within. Usages
                        are exactly the same as commands_to_delete. Note that if
                        the commands listed here duplicate that after
                        commands_to_delete, the default action will be retaining
                        the wrapped text.
  --environments_to_delete ENVIRONMENTS_TO_DELETE [ENVIRONMENTS_TO_DELETE ...]
                        LaTeX environments that will be deleted. Useful for e.g.
                        user-defined comment environments. For example, to
                        delete all occurrences of \begin{note} ... \end{note},
                        run the tool with --environments_to_delete note.
                        Please note that the positional argument input_folder
                        cannot come immediately after
                        environments_to_delete, as the parser does not have
                        any way to know if it's another environment to delete.
  --if_exceptions IF_EXCEPTIONS [IF_EXCEPTIONS ...]
                        Constant TeX primitive conditionals (\iffalse, \iftrue,
                        etc.) are simplified, i.e., true branches are kept, false
                        branches deleted. To parse the conditional constructs
                        correctly, all commands starting with \if are assumed to
                        be TeX primitive conditionals (e.g., declared by
                        \newif\ifvar). Some known exceptions to this rule are
                        already included (e.g., \iff, \ifthenelse, etc.), but you
                        can add custom exceptions using --if_exceptions iffalt.
  --use_external_tikz USE_EXTERNAL_TIKZ
                        Folder (relative to input folder) containing
                        externalized tikz figures in PDF format.
  --svg_inkscape [SVG_INKSCAPE]
                        Include PDF files generated by Inkscape via the
                        \includesvg command from the svg package. This is
                        done by replacing the \includesvg calls with
                        \includeinkscape calls pointing to the generated
                        .pdf_tex files. By default, these files and the
                        generated PDFs are located under ./svg-inkscape
                        (relative to the input folder), but a different path
                        (relative to the input folder) can be provided in case a
                        different inkscapepath was set when loading the svg
                        package.
  --convert_png_to_jpg  Convert PNG images to JPG format to reduce file size
  --png_quality PNG_QUALITY
                        JPG quality for PNG conversion (0-100, default: 50)
  --png_size_threshold PNG_SIZE_THRESHOLD
                        Minimum PNG file size in MB to apply quality reduction
                        (default: 0.5)
  --config CONFIG       Read settings from .yaml config file. If command
                        line arguments are provided additionally, the config
                        file parameters are updated with the command line
                        parameters.
  --verbose             Enable detailed output.
```
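The ordering caveat for commands_to_delete and environments_to_delete can be reproduced with a small stand-in parser. This sketch assumes the command line is parsed with `argparse`, which the format of the usage output suggests; the parser below is a reduced illustration with hypothetical helper names, not the tool's own.

```python
import argparse


def make_parser():
    """Minimal stand-in for the CLI: one positional and one list-valued option."""
    parser = argparse.ArgumentParser(prog="cleaner_sketch")
    parser.add_argument("input_folder")
    parser.add_argument("--commands_to_delete", nargs="+", default=[])
    return parser


def try_parse(argv):
    """Return parsed arguments, or None if argparse rejects the command line."""
    try:
        return make_parser().parse_args(argv)
    except SystemExit:
        return None


# Positional first: the folder is unambiguous.
ok = try_parse(["/path/to/latex", "--commands_to_delete", "todo1", "todo2"])
# Positional right after the list option: the folder is consumed as a third
# command name, so input_folder is reported as missing.
bad = try_parse(["--commands_to_delete", "todo1", "todo2", "/path/to/latex"])
print(ok.input_folder, ok.commands_to_delete)
print(bad)
```

Placing input_folder first, or putting another option between the list and the folder, avoids the ambiguity.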
Testing:
bash
python -m unittest arxiv_latex_cleaner.tests.arxiv_latex_cleaner_test
Note
This is not an officially supported Google product.
Owner
- Name: Google Research
- Login: google-research
- Kind: organization
- Location: Earth
- Website: https://research.google
- Repositories: 226
- Profile: https://github.com/google-research
GitHub Events
Total
- Issues event: 11
- Watch event: 1,004
- Issue comment event: 9
- Push event: 1
- Pull request event: 4
- Fork event: 46
Last Year
- Issues event: 11
- Watch event: 1,002
- Issue comment event: 9
- Push event: 1
- Pull request event: 4
- Fork event: 46
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Jordi Pont-Tuset | j****t@g****m | 57 |
| Jordi Pont-Tuset | j****t@g****m | 15 |
| Sandro Braun | s****3@g****m | 11 |
| hayesall | a****r@b****t | 8 |
| Waiss Azizian | w****n@e****r | 6 |
| Sebastian Dörner | d****r@i****e | 4 |
| Dylan Duhamel | 7****l@u****m | 3 |
| Kacper Sokol | k****1@m****k | 3 |
| sdnr | s****r@u****m | 3 |
| Aditya Sriram | a****m@g****m | 2 |
| Andy Lin | 3****o@u****m | 2 |
| Gilbert Shih | l****0@g****m | 2 |
| Meraj Hashemizadeh | 3****i@u****m | 2 |
| Penghui Guo | g****8@g****m | 2 |
| Sebastiano | m****o@g****m | 2 |
| Vaufreyd | g****b@d****t | 2 |
| philgzl | s****s@p****m | 2 |
| Andrea | a****w@g****m | 1 |
| Anna-Katharina Wickert | a****k@u****m | 1 |
| Bernd Busse | b****d@b****e | 1 |
| Dominik Moritz | d****z@g****m | 1 |
| Dominique Vaufreydaz | D****z@i****r | 1 |
| Eric Heiden | e****n@o****m | 1 |
| Giulio Romualdi | g****i@g****m | 1 |
| Gustavo Pinto | g****6@g****m | 1 |
| Jae-Won Chung | j****g@u****u | 1 |
| Jessica Zhang | j****3@g****m | 1 |
| Jonas Schult | J****t@u****m | 1 |
| Kento Nozawa | k****w@k****p | 1 |
| Matthew Andres Moreno | m****n@g****m | 1 |
| and 12 more... | | |
Packages
- Total packages: 3
- Total downloads:
  - homebrew: 78 last month
  - pypi: 3,509 last month
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 4 (may contain duplicates)
- Total versions: 88
- Total maintainers: 2
pypi.org: arxiv-latex-cleaner
Cleans the LaTeX code of your paper to submit to arXiv.
- Homepage: https://github.com/google-research/arxiv-latex-cleaner
- Documentation: https://arxiv-latex-cleaner.readthedocs.io/
- License: Apache License, Version 2.0
- Latest release: 1.0.8 (published almost 2 years ago)
proxy.golang.org: github.com/google-research/arxiv-latex-cleaner
- Documentation: https://pkg.go.dev/github.com/google-research/arxiv-latex-cleaner#section-documentation
- License: apache-2.0
- Latest release: v1.0.8 (published almost 2 years ago)
formulae.brew.sh: arxiv_latex_cleaner
Clean LaTeX code to submit to arXiv
- Homepage: https://github.com/google-research/arxiv-latex-cleaner
- License: Apache-2.0
- Latest release: 1.0.8 (published almost 2 years ago)
Dependencies
- actions/checkout v2 composite
- actions/create-release v1 composite
- actions/setup-python v2 composite
- absl_py >=0.12
- pillow *
- pyyaml *
- regex *