manubot-ai-editor

A tool for performing automatic, AI-assisted revisions of Manubot manuscripts.

https://github.com/manubot/manubot-ai-editor

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: manubot
License: bsd-3-clause
Language: Python
Default Branch: main
Homepage:
Size: 735 KB

Statistics

Stars: 39
Watchers: 4
Forks: 15
Open Issues: 30
Releases: 11

Created over 3 years ago · Last pushed 11 months ago

Metadata Files

Readme Contributing License Code of conduct Citation

Manubot AI Editor

A tool for performing automatic, AI-assisted revisions of Manubot manuscripts. Check out the manuscript about this tool for more background information.

Supported Large Language Models (LLMs)

We internally use LangChain to invoke models, which allows our tool to theoretically support whichever model providers LangChain supports. That said, we currently support OpenAI and Anthropic models only, and are working to add support for other model providers.

When using OpenAI models, our evaluations show that gpt-4-turbo is in general the best model for revising academic manuscripts. Therefore, this is the default option for OpenAI.

We are still evaluating the models for other providers as we add them, and will update this section accordingly as we complete our evaluations.

Using in a Manubot manuscript

Many of these instructions rely on the specific details of GitHub's website interface, which can change over time. See their official docs for more info on configuring GitHub Actions, managing secrets, and running workflows.

Setup

First, decide which model provider you'll use. Each supported provider requires its own account and API key:

- OpenAI: make an OpenAI account and create an API key.
- Anthropic: make an Anthropic account and create an API key.

Start with a manuscript repo forked from Manubot rootstock, then follow these steps:

In your fork's "▶️ Actions" tab, enable GitHub Actions.
In your fork's "⚙️ Settings" tab, give GitHub Actions workflows read/write permissions and allow them to create pull requests.
If you haven't already, follow the directions above to create an account and get an API key for your chosen model provider.
In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name PROVIDER_API_KEY and paste in your API key as the secret.

If you prefer to select fewer options when running the workflow, you can optionally set up default values for the model provider and model at either the repo or organization level.

In your fork's "⚙️ Settings" tab, you can optionally create the following Actions repository variables:

- AI_EDITOR_MODEL_PROVIDER: either "openai" or "anthropic". This is used as the provider when "(repo default)" is selected in the workflow parameters. If this variable is unset and "(repo default)" is selected, the workflow will fail with an error.
- AI_EDITOR_LANGUAGE_MODEL: the model to use with the chosen provider when the "model" field in the workflow parameters is left empty. If this variable is unset, Manubot AI Editor will select the default model for your chosen provider.
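The fallback rules for these two variables can be summarized with a small Python sketch. It is not taken from the workflow itself: the function name resolve_workflow_settings and the use of None for the model (meaning "let the editor pick its default model for this provider") are assumptions made for illustration.

```python
from typing import Mapping, Optional, Tuple

REPO_DEFAULT = "(repo default)"


def resolve_workflow_settings(
    provider_input: str,
    model_input: str,
    repo_vars: Mapping[str, str],
) -> Tuple[str, Optional[str]]:
    """Hypothetical sketch: combine workflow inputs with repository variables."""
    provider = provider_input
    if provider == REPO_DEFAULT:
        provider = repo_vars.get("AI_EDITOR_MODEL_PROVIDER", "")
        if not provider:
            # mirrors the documented behavior: no repo default means an error
            raise ValueError(
                "'(repo default)' selected but AI_EDITOR_MODEL_PROVIDER is not set"
            )

    # an empty "model" field falls back to the repository variable, if any
    model: Optional[str] = model_input.strip() or None
    if model is None:
        # None stands for "use the editor's default model for this provider"
        model = repo_vars.get("AI_EDITOR_LANGUAGE_MODEL") or None

    return provider, model
```

For example, selecting "(repo default)" when no AI_EDITOR_MODEL_PROVIDER variable is defined raises an error, while an empty model field with no AI_EDITOR_LANGUAGE_MODEL variable leaves the choice to the editor.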

Multiple Providers

If you want to use several providers in the same repo, you'll need to register an API key for each provider you intend to use. Like PROVIDER_API_KEY, these keys are registered as GitHub secrets and can be specified at either the repository or organization level.

We currently support the following secrets, with more to follow as we integrate more providers:

- OPENAI_API_KEY: the API key for the "openai" provider
- ANTHROPIC_API_KEY: the API key for the "anthropic" provider

See the API key variables docs for more information.

Configuring prompts

In order to revise your manuscript, prompts must be provided to the AI model. Manubot rootstock comes with several default, general-purpose prompts so that you can immediately use the AI editor without having to write and configure your own prompts.

But you can also define your own prompts, apply them to specific content, and control other behavior using YAML configuration files that you include with your manuscript. See docs/custom-prompts.md for more information.

Running the editor

In your fork's "▶️ Actions" tab, go to the ai-revision workflow.
Manually run the workflow. You should see several options you can specify, such as the branch to revise and the AI model to use. See these docs for an explanation of each option.
Within a few minutes, the workflow should run, the editor should generate revisions, and a pull request should be created in your fork!

Caveats

In the current implementation, the AI editor can only process one paragraph at a time, independently of the others. This limits the contextual information the LLM receives and thus the specificity of what it can check and fix. For instance, the revision process does not use information from other parts of the manuscript to revise the current paragraph. In addition, we provide section-specific prompts to revise text from different sections of the manuscript, such as the Abstract, Introduction, and Results. However, some paragraphs within the same section need different revision strategies. For example, in the Discussion section of a manuscript, the first paragraph typically summarizes the findings from the Results section, while the remaining paragraphs follow a different structure. The AI editor, however, revises every paragraph in a section with the same section-specific prompt.
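To make this limitation concrete, here is a simplified Python sketch of paragraph-by-paragraph revision. It is not the editor's code: the function names are hypothetical, paragraphs are assumed to be separated by blank lines, and the reviser is any callable that takes one paragraph plus a section-level prompt. The sketch ignores Markdown constructs such as fenced code blocks or tables that a real implementation would need to handle.

```python
import re
from typing import Callable, List

# (paragraph, section_prompt) -> revised paragraph
Reviser = Callable[[str, str], str]


def split_paragraphs(markdown_text: str) -> List[str]:
    """Split text into paragraphs separated by one or more blank lines."""
    chunks = re.split(r"\n\s*\n", markdown_text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def revise_section(markdown_text: str, section_prompt: str, revise: Reviser) -> str:
    """Revise each paragraph independently with the same section-level prompt.

    Each call to `revise` sees only one paragraph, so it cannot use context
    from the rest of the manuscript, and every paragraph in the section
    receives the same prompt.
    """
    revised = [
        revise(paragraph, section_prompt)
        for paragraph in split_paragraphs(markdown_text)
    ]
    return "\n\n".join(revised)


if __name__ == "__main__":
    text = "First paragraph.\nStill first.\n\nSecond paragraph."
    # a stand-in "model" that just upper-cases each paragraph
    print(revise_section(text, "Revise this Discussion paragraph.", lambda p, _: p.upper()))
```

Because the loop passes nothing but the current paragraph and one shared prompt, a Discussion opener and a later Discussion paragraph are treated identically, which is exactly the limitation described above.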

Finally, in addition to revising each paragraph with an LLM, the AI Editor also postprocesses the revised text, for example by putting one sentence per line to simplify diffs. This might not work as expected in some cases.
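The sketch below shows what one-sentence-per-line postprocessing looks like and why it can misfire. The splitting rule (break after ".", "!" or "?" when followed by whitespace and an uppercase letter) is a deliberately naive assumption for illustration, not the rule the AI Editor actually uses.

```python
import re

# Naive, illustrative boundary rule: end punctuation, whitespace, then a capital.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def one_sentence_per_line(paragraph: str) -> str:
    """Put each sentence of a paragraph on its own line."""
    flat = " ".join(paragraph.split())  # collapse existing line breaks
    return "\n".join(SENTENCE_BOUNDARY.split(flat))
```

With this rule, "We did A. Then we did B." becomes two lines as intended, but "Data from Dr. Smith were used." is wrongly split after "Dr.", which is the kind of edge case that can make this postprocessing step behave unexpectedly.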

We plan to reduce or remove these limitations in the future.

Using from the command line

First, install Manubot in a Python environment, e.g.:

```bash
pip install --upgrade "manubot[ai-rev]"
```

You also need to export an environment variable with your model provider's API key, e.g.:

```bash
export OPENAI_API_KEY=ABCD1234

export ANTHROPIC_API_KEY=ABCD1234  # if you are using Anthropic
```

If you only ever use one model provider (e.g., just OpenAI or just Anthropic), you can alternatively provide just PROVIDER_API_KEY and it will be used with any model provider the tool invokes.

To select a specific provider, set the environment variable AI_EDITOR_MODEL_PROVIDER to one of the following values:

- openai for OpenAI
- anthropic for Anthropic

If AI_EDITOR_MODEL_PROVIDER is unset, it will default to "openai".
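The provider and API key lookup described above can be sketched in Python as follows. This is a hypothetical helper, not the editor's own code; in particular, letting a provider-specific key such as OPENAI_API_KEY take precedence over PROVIDER_API_KEY when both are set is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple

# Provider name -> provider-specific API key variable, as listed in this README.
PROVIDER_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def resolve_provider_and_key(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Sketch of the documented provider/key lookup (not the tool's code).

    Assumption: a provider-specific key wins over PROVIDER_API_KEY.
    """
    env = os.environ if env is None else env
    provider = env.get("AI_EDITOR_MODEL_PROVIDER") or "openai"  # documented default
    if provider not in PROVIDER_KEY_VARS:
        raise ValueError(f"unsupported provider: {provider!r}")
    # fall back to the generic PROVIDER_API_KEY when no provider-specific key exists
    key = env.get(PROVIDER_KEY_VARS[provider]) or env.get("PROVIDER_API_KEY")
    if not key:
        raise RuntimeError(f"no API key found for provider {provider!r}")
    return provider, key
```

Passing an explicit mapping instead of os.environ makes the lookup easy to test without touching your shell environment.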

You can also set other environment variables that change the behavior of the editor (for example, to revise only certain files). For instance, to set the temperature parameter of OpenAI models, run export AI_EDITOR_TEMPERATURE=0.50. See the complete list of supported variables for more information.

Then, from the root directory of your Manubot manuscript, run the following:

```bash
# ⚠ THIS WILL OVERWRITE YOUR LOCAL MANUSCRIPT
manubot ai-revision --content-directory content/ --config-directory ci/
```

The editor will revise each paragraph of your manuscript and write back the revised files in the same directory. Finally, (assuming you are tracking changes to your manuscript with git) you can review each change and either keep it (commit it) or reject it (revert it).

Using model providers' APIs can incur costs. If you're worried about this, or otherwise want to test things out before calling the real API, you can do a local "dry run" with a "fake" model:

```bash
manubot ai-revision \
  --content-directory content/ \
  --config-directory ci/ \
  --model-type DummyManuscriptRevisionModel \
  --model-kwargs add_paragraph_marks=True
```

When it finishes, check out your manuscript files. This will allow you to detect whether the editor is identifying paragraphs correctly. If you find a problem, please report the issue.

Text Encodings

By default, Manubot AI Editor will assume that your input and output files are encoded in the utf-8 encoding.

If you'd prefer the tool to make a best-effort guess at the input encoding, set the env var AI_EDITOR_SRC_ENCODING to _auto_; the detected encoding will then also be used to write the output files.

Alternatively, if you prefer to have your files interpreted or written using specific encodings, you can specify the input encoding with the AI_EDITOR_SRC_ENCODING and the output encoding with the AI_EDITOR_DST_ENCODING environment variables.

See these variables' help docs for more information.

Also, see Python 3 Docs: Standard Encodings for a list of possible encodings.
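The encoding rules above can be illustrated with a small Python sketch. It is not the editor's implementation: the helpers read_text and write_text are hypothetical, and the "_auto_" branch simply tries a short, assumed list of candidate encodings in order instead of performing real encoding detection. Falling back to the input encoding when no output encoding is given matches the documented "_auto_" behavior and is an assumption for other cases.

```python
from pathlib import Path
from typing import Optional, Tuple

AUTO = "_auto_"
# Assumed fallback order for "_auto_"; a crude stand-in for real detection.
# latin-1 comes last because it can decode any byte sequence.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def read_text(path: Path, src_encoding: str = "utf-8") -> Tuple[str, str]:
    """Return (text, encoding_used) for a manuscript file."""
    data = path.read_bytes()
    if src_encoding != AUTO:
        return data.decode(src_encoding), src_encoding
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"could not decode {path}")


def write_text(path: Path, text: str, src_used: str, dst_encoding: Optional[str] = None) -> None:
    """Write text with dst_encoding, or with the input encoding if none is given."""
    path.write_text(text, encoding=dst_encoding or src_used)
```

A file saved as cp1252 fails to decode as utf-8, so the sketch falls through to cp1252 and, without an explicit output encoding, writes the revised text back in cp1252.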

Using the Python API

You can also use the functions of the editor directly from Python.

Since these functions are low-level and not tied to a particular manuscript, you don't have to install Manubot and can just install this package:

```bash
pip install -U manubot-ai-editor
```

Example usage:

```python
import shutil
from pathlib import Path

from manubot_ai_editor.editor import ManuscriptEditor
from manubot_ai_editor.models import GPT3CompletionModel

# create a manuscript editor object
me = ManuscriptEditor(
    # where your Markdown files (*.md) are
    content_dir="content",
    # where CI-related configuration, including the AI editor's, is stored.
    # optional, will fall back to defaults if omitted.
    config_dir="ci",
)

# create a model to revise the manuscript
# (if using another provider, e.g. anthropic, replace
# model_provider="openai" with model_provider="anthropic")
model = GPT3CompletionModel(
    title=me.title,
    keywords=me.keywords,
    model_provider="openai",
)

# create a temporary directory to store the revised manuscript
output_folder = (Path("tmp") / "manubot-ai-editor-output").resolve()
shutil.rmtree(output_folder, ignore_errors=True)
output_folder.mkdir(parents=True, exist_ok=True)

# revise the manuscript
me.revise_manuscript(output_folder, model)

# the revised manuscript is now in the `output_folder`

# uncomment the following code if you want to OVERWRITE the original
# manuscript in the content folder with the revised manuscript
# for f in output_folder.glob("*"):
#     f.rename(me.content_dir / f.name)
#
# # remove output folder
# output_folder.rmdir()
```

The cli_process function in this file provides another example of how to use the API.

Development and Contributions

Please see our CONTRIBUTING.md guide for more information on developing this project or making a contribution.

Owner

Name: Manubot
Login: manubot
Kind: organization

Website: https://manubot.org
Repositories: 7
Profile: https://github.com/manubot

Next generation of scholarly publishing: open, collaborative, reproducible, free.

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
---
cff-version: 1.2.0
title: Manubot AI Editor
message: >-
  If you use this work in some way, please cite both the article from
  preferred-citation and the software itself. These details can be
  found within the CITATION.cff file.
type: software
authors:
  - given-names: Milton
    family-names: Pividori
    orcid: "https://orcid.org/0000-0002-3035-4403"
  - given-names: Faisal
    family-names: Alquaddoomi
    orcid: "https://orcid.org/0000-0003-4297-8747"
  - given-names: Vincent
    family-names: Rubinetti
    orcid: "https://orcid.org/0000-0002-4655-3773"
  - given-names: Dave
    family-names: Bunten
    orcid: "https://orcid.org/0000-0001-6041-3665"
  - given-names: Casey
    family-names: Greene
    orcid: "https://orcid.org/0000-0001-8713-9213"
repository-code: "https://github.com/manubot/manubot-ai-editor"
abstract: |
  A tool for performing automatic, AI-assisted revisions of Manubot manuscripts.
keywords:
  - manubot
  - AI
  - editor
  - manuscript
  - revision
  - research
  - large-language-models
license: BSD-3-Clause
identifiers:
  - description: Manuscript
    type: doi
    value: "10.1093/jamia/ocae139"
  - description: Software
    type: doi
    value: "10.5281/zenodo.14911573"
preferred-citation:
  title: >-
    A publishing infrastructure for Artificial Intelligence (AI)-assisted academic authoring
  type: article
  url: https://academic.oup.com/jamia/article/31/9/2103/7693927
  authors: 
    - given-names: Milton
      family-names: Pividori
      orcid: "https://orcid.org/0000-0002-3035-4403"
    - given-names: Casey S.
      family-names: Greene
      orcid: "https://orcid.org/0000-0001-8713-9213"
  date-published: 2024-09-01
  identifiers:
    - type: doi
      value: 10.1093/jamia/ocae139

GitHub Events

Total

Create event: 12
Release event: 2
Issues event: 20
Watch event: 4
Delete event: 4
Issue comment event: 28
Push event: 65
Pull request event: 36
Pull request review event: 76
Pull request review comment event: 60
Fork event: 10

Last Year

Create event: 12
Release event: 2
Issues event: 20
Watch event: 4
Delete event: 4
Issue comment event: 28
Push event: 65
Pull request event: 36
Pull request review event: 76
Pull request review comment event: 60
Fork event: 10

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 50
Total pull requests: 60
Average time to close issues: 2 months
Average time to close pull requests: 22 days
Total issue authors: 11
Total pull request authors: 6
Average comments per issue: 1.44
Average comments per pull request: 1.13
Merged pull requests: 55
Bot issues: 0
Bot pull requests: 2

Past Year

Issues: 17
Pull requests: 43
Average time to close issues: about 2 months
Average time to close pull requests: 25 days
Issue authors: 6
Pull request authors: 3
Average comments per issue: 0.76
Average comments per pull request: 1.02
Merged pull requests: 39
Bot issues: 0
Bot pull requests: 2


Top Authors

Issue Authors

miltondp (19)
d33bs (12)
falquaddoomi (7)
vincerubinetti (3)
danich1 (2)
dhimmel (1)
castedo (1)
shanshen123654789 (1)
SilasK (1)
cgreene (1)
agitter (1)

Pull Request Authors

d33bs (34)
falquaddoomi (24)
miltondp (9)
dependabot[bot] (2)
cgreene (2)
vincerubinetti (2)

Top Labels

Issue Labels

enhancement (9) documentation (3)

Pull Request Labels

dependencies (2) python (2)

Packages

Total packages: 1
Total downloads:
- pypi 39 last-month

Total dependent packages: 1
Total dependent repositories: 0
Total versions: 35
Total maintainers: 3

pypi.org: manubot-ai-editor

A Manubot plugin to revise a manuscript using GPT-3

Homepage: https://github.com/manubot/manubot-ai-editor
Documentation: https://manubot-ai-editor.readthedocs.io/
License: BSD-3-Clause
Latest release: 0.5.5
published about 1 year ago

Versions: 35
Dependent Packages: 1
Dependent Repositories: 0
Downloads: 39 Last month

Rankings

Dependent packages count: 6.6%

Downloads: 15.1%

Stargazers count: 17.2%

Average: 18.6%

Forks count: 23.2%

Dependent repos count: 30.6%

Maintainers (3)

miltondp d33bs falquaddoomi