docling-mcp

Making docling agentic through MCP

https://github.com/docling-project/docling-mcp

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Keywords from Contributors

accelerated-discovery deepsearch knowledge-extraction knowledge-graph pdf-converter rag semantic-retrieval
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: docling-project
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 3.59 MB
Statistics
  • Stars: 181
  • Watchers: 3
  • Forks: 42
  • Open Issues: 16
  • Releases: 8
Created about 1 year ago · Last pushed 7 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation Security

README.md

Docling MCP: making docling agentic

A document processing service built on the Docling library that uses the Model Context Protocol (MCP) for tool integration.

Overview

Docling MCP is a service that provides tools for document conversion, processing and generation. It uses the Docling library to convert PDF documents into structured formats and provides a caching mechanism to improve performance. The service exposes functionality through a set of tools that can be called by client applications.

Features

  • Conversion tools:
    • PDF document conversion to structured JSON format (DoclingDocument)
  • Generation tools:
    • Document generation in DoclingDocument, which can be exported to multiple formats
  • Local document caching for improved performance
  • Support for local files and URLs as document sources
  • Memory management for handling large documents
  • Logging system for debugging and monitoring
  • RAG applications with Milvus upload and retrieval

Getting started

The easiest way to install Docling MCP and connect it to your client is to launch it via uvx.

Select the transport your client requires with the --transport argument, for example:

  • stdio, used e.g. in Claude for Desktop and LM Studio

    uvx --from docling-mcp docling-mcp-server --transport stdio

  • sse, used e.g. in Llama Stack

    uvx --from docling-mcp docling-mcp-server --transport sse

  • streamable-http, used e.g. in container setups

    uvx --from docling-mcp docling-mcp-server --transport streamable-http

More options are available, e.g. the selection of which toolgroup to launch. Use the --help argument to inspect all the CLI options.
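The three launch commands above differ only in the transport name, so a client-side launcher can build them from one template. The following is a minimal sketch, not part of Docling MCP: the helper name `build_server_command` and the `TRANSPORTS` tuple are hypothetical, and only the command-line arguments shown above are used.

```python
import shlex

# Transport names listed in the section above.
TRANSPORTS = ("stdio", "sse", "streamable-http")


def build_server_command(transport: str) -> list[str]:
    """Return the uvx argument list that launches the server with the given transport.

    Hypothetical helper: it only assembles the command documented above.
    """
    if transport not in TRANSPORTS:
        raise ValueError(f"unknown transport {transport!r}; expected one of {TRANSPORTS}")
    return [
        "uvx",
        "--from",
        "docling-mcp",
        "docling-mcp-server",
        "--transport",
        transport,
    ]


if __name__ == "__main__":
    # Print each command as a shell-quoted string for copy and paste.
    for name in TRANSPORTS:
        print(shlex.join(build_server_command(name)))
```

The returned list can be passed directly to `subprocess.Popen` if the server should be started from Python rather than from a shell.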

For developing the MCP tools further, please refer to the docs/development.md page for instructions.

Integration with MCP clients

One of the easiest ways to experiment with the tools provided by Docling MCP is to leverage an AI desktop client with MCP support. Most of these clients use a common config interface. Adding Docling MCP in your favorite client is usually as simple as adding the following entry in the configuration file.

    {
      "mcpServers": {
        "docling": {
          "command": "uvx",
          "args": [
            "--from=docling-mcp",
            "docling-mcp-server"
          ]
        }
      }
    }

When using Claude for Desktop, simply edit the config file claude_desktop_config.json with the snippet above or the example provided here.

In LM Studio, edit the mcp.json file with the appropriate section or simply click on the button below for a direct install.

Add MCP Server docling to LM Studio

Other integrations are described in ./docs/integrations/.
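When a client already has other servers configured, the docling entry from this section has to be merged into the existing file rather than replacing it. A minimal sketch of that merge, assuming the client stores its servers under the `mcpServers` key as in the snippet in this section; the helper name `add_docling_server` and the sample `other` entry are hypothetical.

```python
import json
from copy import deepcopy

# Server entry copied from the configuration snippet in this section.
DOCLING_ENTRY = {
    "command": "uvx",
    "args": ["--from=docling-mcp", "docling-mcp-server"],
}


def add_docling_server(config: dict) -> dict:
    """Return a copy of a client config with the docling server registered.

    Existing entries under "mcpServers" are preserved; the input is not mutated.
    """
    merged = deepcopy(config)
    merged.setdefault("mcpServers", {})["docling"] = deepcopy(DOCLING_ENTRY)
    return merged


if __name__ == "__main__":
    # Hypothetical existing configuration with one unrelated server.
    existing = {"mcpServers": {"other": {"command": "other-server"}}}
    print(json.dumps(add_docling_server(existing), indent=2))
```

Writing the result back to the client's config file is left out on purpose, because the file location differs between clients.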

Examples

Converting documents

Example prompt for converting PDF documents:

    Convert the PDF document at <provide file-path> into DoclingDocument and return its document-key.

Generating documents

Example prompt for generating new documents:

    I want you to write a Docling document. To do this, you will create a document first by invoking create_new_docling_document. Next you can add a title (by invoking add_title_to_docling_document) and then iteratively add new section-headings and paragraphs. If you want to insert lists (or nested lists), you will first open a list (by invoking open_list_in_docling_document), next add the list-items (by invoking add_listitem_to_list_in_docling_document). After adding list-items, you must close the list (by invoking close_list_in_docling_document). Nested lists can be created in the same way, by opening and closing additional lists.

    During the writing process, you can check what has been written already by calling the export_docling_document_to_markdown tool, which will return the currently written document. At the end of the writing, you must save the document and return me the filepath of the saved document.

    The document should investigate the impact of tokenizers on the quality of LLMs.
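The generation prompt describes a fixed ordering of tool calls: create, add a title, then open, fill and close each list, with nesting handled by opening another list inside an open one. The sketch below only plans that order as a sequence of tool names. The tool names come from the prompt; the `ListNode` type and both planning functions are hypothetical, tool arguments are omitted because the prompt does not specify them, and the tools for headings, paragraphs and saving are left out because the prompt does not name them.

```python
from dataclasses import dataclass, field


@dataclass
class ListNode:
    """Hypothetical outline node: each item is a string or a nested ListNode."""
    items: list = field(default_factory=list)


def plan_list_calls(node: ListNode) -> list[str]:
    """Return the tool names needed to write one (possibly nested) list."""
    calls = ["open_list_in_docling_document"]
    for item in node.items:
        if isinstance(item, ListNode):
            # A nested list is opened and closed inside the enclosing list.
            calls.extend(plan_list_calls(item))
        else:
            calls.append("add_listitem_to_list_in_docling_document")
    calls.append("close_list_in_docling_document")
    return calls


def plan_document_calls(lists: list[ListNode]) -> list[str]:
    """Return the ordered tool names for a titled document containing the given lists."""
    calls = ["create_new_docling_document", "add_title_to_docling_document"]
    for node in lists:
        calls.extend(plan_list_calls(node))
    # Final check of what has been written so far.
    calls.append("export_docling_document_to_markdown")
    return calls


if __name__ == "__main__":
    outline = [ListNode(items=["first", ListNode(items=["nested"]), "second"])]
    for name in plan_document_calls(outline):
        print(name)
```

Every open call is matched by a close call in the plan, which is the constraint the prompt states explicitly for lists.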

License

The Docling MCP codebase is released under the MIT license. For individual model usage, please refer to the model licenses found in the original packages.

LF AI & Data

Docling and Docling MCP are hosted as projects in the LF AI & Data Foundation.

IBM ❤️ Open Source AI: The project was started by the AI for knowledge team at IBM Research Zurich.

Owner

  • Name: Docling Project
  • Login: docling-project
  • Kind: organization
  • Location: Switzerland

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Docling
message: 'If you use Docling, please consider citing as below.'
type: software
authors:
  - name: Docling Team
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2408.09869'
    description: 'arXiv:2408.09869'
repository-code: 'https://github.com/DS4SD/docling'
license: MIT

GitHub Events

Total
  • Create event: 24
  • Issues event: 19
  • Release event: 6
  • Watch event: 131
  • Delete event: 19
  • Issue comment event: 78
  • Public event: 1
  • Push event: 91
  • Pull request review comment event: 22
  • Pull request review event: 49
  • Pull request event: 57
  • Fork event: 32
Last Year
  • Create event: 24
  • Issues event: 19
  • Release event: 6
  • Watch event: 131
  • Delete event: 19
  • Issue comment event: 78
  • Public event: 1
  • Push event: 91
  • Pull request review comment event: 22
  • Pull request review event: 49
  • Pull request event: 57
  • Fork event: 32

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 16
  • Total Committers: 5
  • Avg Commits per committer: 3.2
  • Development Distribution Score (DDS): 0.75
Past Year
  • Commits: 16
  • Committers: 5
  • Avg Commits per committer: 3.2
  • Development Distribution Score (DDS): 0.75
Top Committers
Name Email Commits
Michele Dolfi d****l@z****m 4
github-actions[bot] g****] 3
Peter W. J. Staar 9****M 3
Cesar Berrospi Ramis 7****m 3
Ash Evans 7****5 3
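The averages and the Development Distribution Score can be reproduced from the commit counts in the table above. This is a minimal sketch that assumes DDS is one minus the top committer's share of all commits; that definition is an assumption, although it matches the 0.75 reported here. The function names are hypothetical.

```python
# Commit counts copied from the Top Committers table above.
commits_by_committer = {
    "Michele Dolfi": 4,
    "github-actions[bot]": 3,
    "Peter W. J. Staar": 3,
    "Cesar Berrospi Ramis": 3,
    "Ash Evans": 3,
}


def average_commits(counts: dict[str, int]) -> float:
    """Mean number of commits per committer."""
    return sum(counts.values()) / len(counts)


def development_distribution_score(counts: dict[str, int]) -> float:
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    total = sum(counts.values())
    return 1 - max(counts.values()) / total


if __name__ == "__main__":
    print(average_commits(commits_by_committer))                 # 3.2
    print(development_distribution_score(commits_by_committer))  # 0.75
```

A score near 0 would mean one committer wrote almost everything, while values closer to 1 indicate work spread across more people.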
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 11
  • Total pull requests: 21
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 11 days
  • Total issue authors: 6
  • Total pull request authors: 6
  • Average comments per issue: 0.09
  • Average comments per pull request: 2.24
  • Merged pull requests: 16
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 11
  • Pull requests: 21
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 11 days
  • Issue authors: 6
  • Pull request authors: 6
  • Average comments per issue: 0.09
  • Average comments per pull request: 2.24
  • Merged pull requests: 16
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ceberam (9)
  • aevo98765 (3)
  • Ryfernandes (2)
  • victory3047 (1)
  • mskim8717 (1)
  • mataide (1)
Pull Request Authors
  • dolfim-ibm (16)
  • ceberam (14)
  • PeterStaar-IBM (11)
  • aevo98765 (10)
  • lwsinclair (3)
  • Ryfernandes (2)
  • maxmnemonic (1)
Top Labels
Issue Labels
enhancement (9) question (1)
Pull Request Labels
enhancement (4) documentation (2) bug (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 4,852 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 25
  • Total maintainers: 1
pypi.org: docling-mcp

Running Docling as an agent using tools

  • Versions: 25
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 4,852 Last month
Rankings
Dependent packages count: 9.7%
Average: 32.1%
Dependent repos count: 54.4%
Maintainers (1)
Last synced: 7 months ago