https://github.com/deepset-ai/dc-custom-component-template
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 13.7%, to scientific vocabulary)
Repository
Basic Info
- Host: GitHub
- Owner: deepset-ai
- Language: Python
- Default Branch: main
- Size: 42 KB
Statistics
- Stars: 1
- Watchers: 3
- Forks: 4
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
deepset AI Platform Custom Component Template
This repository contains a template for creating custom components for your deepset pipelines. Components are Python code snippets that perform specific tasks within your pipeline. This template will guide you through all the necessary elements your custom component must include.
This template contains two sample components which are ready to be used:
- CharacterSplitter implemented in /src/dc_custom_component/example_components/preprocessors/character_splitter.py: A component that splits documents into smaller chunks by the number of characters you set. You can use it in indexing pipelines.
- KeywordBooster implemented in /src/dc_custom_component/example_components/rankers/keyword_booster.py: A component that boosts the score of documents that contain specific keywords. You can use it in query pipelines.
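To make the shape of these two components concrete, here is a stdlib-only sketch of the core logic each one might implement. This is an illustration under assumptions, not the template's actual code: the real components wrap this logic in Haystack's component API and operate on Document objects, and whether KeywordBooster multiplies or adds to the score is a guess here.

```python
def split_by_characters(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def boost_by_keywords(docs: list[dict], keyword_boosts: dict[str, float]) -> list[dict]:
    """Raise the score of documents containing given keywords, then re-rank.

    docs: [{"content": str, "score": float}, ...]
    keyword_boosts: mapping of keyword -> boost factor (multiplicative here,
    which is an assumption for illustration).
    """
    for doc in docs:
        for keyword, boost in keyword_boosts.items():
            if keyword.lower() in doc["content"].lower():
                doc["score"] *= boost
    return sorted(docs, key=lambda d: d["score"], reverse=True)
```

In the template, the indexing-side splitter produces chunks that the query-side booster can later re-rank by keyword relevance.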
We've created these examples to help you understand how to structure your components. When importing your custom components to deepset AI Platform, you can remove or rename the example_components folder with the sample components, if you're not planning to use them.
This template serves as a custom components library for your organization. Only the components present in the most recently uploaded template are available for use in your pipelines.
Documentation
For more information about custom components, see Custom Components. For a step-by-step guide on creating custom components, see Create a Custom Component. See also our tutorial for creating a custom RegexBooster component.
1. Setting up your local dev environment
Prerequisites
- Python v3.12 or v3.13
- hatch: a Python package manager
We use hatch to manage our Python packages. Install it with pip:
Linux and macOS:

```bash
pip install hatch
```

Windows: follow the instructions at https://hatch.pypa.io/1.12/install/#windows
Once installed, create a virtual environment by running:
```bash
hatch shell
```
This installs all the necessary packages needed to create a custom component. You can reference this virtual environment in your IDE.
For more information on hatch, please refer to the official Hatch documentation.
2. Developing your custom component
Structure
| File | Description |
|------|-------------|
| /src/dc_custom_component/components | Directory for implementing custom components. You can logically group custom components in sub-directories. See how sample components are grouped by type. |
| /src/dc_custom_component/__about__.py | Your custom components' version. Bump the version every time you update your component before uploading it to deepset. This is not needed if you are using the GitHub action workflow (in this case the version will be determined by the GitHub release tag). |
| /pyproject.toml | Information about the project. If needed, add your components' dependencies in this file in the dependencies section. |
The directory where your custom component is stored determines the name of the component group in Pipeline Builder. For example, the CharacterSplitter component would appear in the Preprocessors group, while the KeywordBooster component would be listed in the Rankers group. You can drag these components onto the canvas to use them.
When working with YAML, the location of your custom component implementation defines your component's type. For example, the sample components have the following types because of their location:
- dc_custom_component.example_components.preprocessors.character_splitter.CharacterSplitter
- dc_custom_component.example_components.rankers.keyword_booster.KeywordBooster
Here is how you would add them to a pipeline:

```yaml
components:
  splitter:
    type: dc_custom_component.example_components.preprocessors.character_splitter.CharacterSplitter
    init_parameters: {}
...
```
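The type string is simply the dotted import path of the component class. If you are unsure what to write, you can derive it in Python; the snippet below is a sketch using a stand-in class, since the template's package is only importable from inside the project:

```python
class CharacterSplitter:  # stand-in for the template's component class
    pass


def component_type(cls: type) -> str:
    # The dotted path used as the `type` field in pipeline YAML:
    # the module where the class is defined, plus the class name.
    return f"{cls.__module__}.{cls.__qualname__}"


# For the real class defined in the template, this yields
# "dc_custom_component.example_components.preprocessors.character_splitter.CharacterSplitter".
```

This is also why moving a component file to a different folder changes its type and breaks existing pipeline YAML that references the old path.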
Working on your component
1. Fork this repository.
2. Navigate to the /src/dc_custom_component/components/ folder.
3. Add your custom components following the examples.
4. Update the components' version in /src/dc_custom_component/__about__.py. NOTE: This is not needed if you are using the GitHub Actions workflow; in that case, the version is determined by the GitHub release tag.
5. Format your code using the hatch run code-quality:all command. (Note that hatch commands work from the project root directory only.)
Formatting
We defined a suite of formatting tools. To format your code, run:
```bash
hatch run code-quality:all
```
Testing
It's crucial to thoroughly test your custom component before uploading it to deepset. Consider adding unit and integration tests to ensure your component functions correctly within a pipeline.
- pytest is ready to be used with hatch.
- Implement your tests under /test.
- Run hatch run tests.
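For example, a minimal pytest-style test file under /test might look like the sketch below. The chunking helper here is a hypothetical stand-in; a real test would import and exercise your actual component class instead.

```python
# test/test_character_splitter.py (hypothetical sketch)

def split_by_characters(text: str, chunk_size: int) -> list[str]:
    # Stand-in for the logic under test; import your real component here.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def test_splits_into_expected_chunks():
    assert split_by_characters("abcdef", 4) == ["abcd", "ef"]


def test_empty_text_yields_no_chunks():
    assert split_by_characters("", 4) == []
```

Running hatch run tests from the project root picks up files matching pytest's default test discovery rules.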
3. Uploading your custom component
You can upload in one of two ways:
- By releasing your forked repository.
- By zipping the forked repository and uploading it with commands.
Uploading by releasing your forked repository
We use GitHub Actions to build and push custom components to deepset AI Platform. The action runs the tests and code quality checks before pushing the component code to deepset. Create a tag to trigger the build and push job. This method helps you keep track of changes and investigate the code deployed to deepset.
After forking or cloning this repository:
1. Push all your changes to the forked repository.
2. Add the DEEPSET_CLOUD_API_KEY secret to your repository. This is your deepset API key. (To add a secret, go to your repository and choose Settings > Secrets and variables > Actions > New repository secret.)
3. Enable workflows for your repository by going to Actions > Enable workflows.
4. (Optional) Adjust the workflow file in .github/workflows/publish_on_tag.yaml as needed.
5. (Optional) If you're not using the European deepset tenant, change the API_URL variable in .github/workflows/publish_on_tag.yaml.
6. Create a new release with a tag to trigger the GitHub Actions workflow. The workflow builds and pushes the custom component to deepset with the tag as the version. For help, see the GitHub documentation.

Warning: When using this GitHub Actions workflow, the version specified in the __about__ file is overwritten by the tag value. Make sure your tag matches the desired version number.
You can check the upload status in the Actions tab of your forked repository.
Uploading a zipped repository with commands
In this method, you run commands to zip and push the repository to deepset AI Platform.
1. (Optional) If you're not using the European tenant, set the API URL:
- deepset AI Platform Europe:
- On Linux and macOS: export API_URL="https://api.cloud.deepset.ai"
- On Windows: set API_URL=https://api.cloud.deepset.ai
- deepset AI Platform US:
- On Linux and macOS: export API_URL="https://api.us.deepset.ai"
- On Windows: set API_URL=https://api.us.deepset.ai
2. Set your deepset API key.
- On Linux and macOS: export API_KEY=<TOKEN>
- On Windows: set API_KEY=<TOKEN>
3. Upload your project by running the following command from inside of this project:
- On Linux and macOS: hatch run dc:build-and-push
- On Windows: hatch run dc:build-windows and hatch run dc:push-windows
This creates a ZIP file called custom_component.zip in the dist directory and uploads it to deepset.
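Before pushing, you can sanity-check the generated archive with a few lines of Python. This is a sketch, not part of the template: the dist/custom_component.zip path comes from the build step above, and the helper simply lists what was packed.

```python
import zipfile


def list_archive_members(path: str = "dist/custom_component.zip") -> list[str]:
    """Return the file names inside the ZIP, e.g. to confirm your component
    modules and pyproject.toml actually made it into the upload."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()
```

Checking the archive locally is cheaper than debugging a failed installation on the platform after upload.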
4. Debugging
To debug the installation of custom components in deepset AI Platform, you can run:
- On Linux and macOS: hatch run dc:logs
- On Windows: hatch run dc:logs-windows
This will print the installation logs of the latest version of your custom components.
Owner
- Name: deepset
- Login: deepset-ai
- Kind: organization
- Email: hello@deepset.ai
- Location: Berlin, Germany
- Website: https://deepset.ai
- Twitter: deepset_ai
- Repositories: 14
- Profile: https://github.com/deepset-ai
Building enterprise search systems powered by latest NLP & open-source.
GitHub Events
Total
- Create event: 13
- Issues event: 3
- Watch event: 1
- Delete event: 3
- Member event: 3
- Issue comment event: 9
- Push event: 35
- Pull request review comment event: 13
- Pull request review event: 26
- Pull request event: 26
- Fork event: 14
Last Year
- Create event: 13
- Issues event: 3
- Watch event: 1
- Delete event: 3
- Member event: 3
- Issue comment event: 9
- Push event: 35
- Pull request review comment event: 13
- Pull request review event: 26
- Pull request event: 26
- Fork event: 14
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 2
- Total pull requests: 26
- Average time to close issues: 18 days
- Average time to close pull requests: 2 days
- Total issue authors: 2
- Total pull request authors: 8
- Average comments per issue: 0.5
- Average comments per pull request: 0.12
- Merged pull requests: 19
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 24
- Average time to close issues: 18 days
- Average time to close pull requests: about 24 hours
- Issue authors: 2
- Pull request authors: 8
- Average comments per issue: 0.5
- Average comments per pull request: 0.13
- Merged pull requests: 17
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- virtualroot (1)
- deep-rloebbert (1)
- sjrl (1)
Pull Request Authors
- tstadel (8)
- ArzelaAscoIi (7)
- oryx1729 (6)
- agnieszka-m (5)
- wochinge (4)
- FHardow (4)
- JasperLS (2)
- sjrl (1)
- faymarie (1)
Dependencies
- haystack-ai >=2.0.0