https://github.com/abhinavjindalnihr/rag-search

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: AbhinavJindalNIHR
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 80.1 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

logo_NIHR

Semantic Search

As the data model and process flow diagrams below show, this program calls the NIHR RDN data warehouse over a secure connection. After pre-processing the data with pandas and NumPy, it converts the text fields into sentence-embedding vectors using sentence-transformers. The vectors are then stored in a FAISS vector store for querying. A user or analyst uses the Streamlit front-end web app to enter a natural-language query (e.g. ‘Show me research around mental health’) and the number of records to fetch. The model compares the query against the vector store using a k-nearest-neighbours approach and returns a list of research studies, each with a rank (lower is better) and a similarity score (lower is better).
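The retrieval step described above can be illustrated with a small, dependency-free sketch. It is not the project's code: a toy bag-of-words vector stands in for the sentence-transformers embedding, a brute-force loop stands in for the FAISS index, and the study titles, `embed`, and `search` are invented for illustration. It assumes the score is an L2-style distance, which would be consistent with "lower is better".

```python
import math
from collections import Counter

# Invented study titles standing in for records from the data warehouse.
STUDIES = [
    "mental health support for young adults",
    "diabetes management in primary care",
    "depression and anxiety treatment trial",
]


def build_vocab(texts):
    """Map every distinct word in the corpus to a vector position."""
    words = sorted({w for t in texts for w in t.lower().split()})
    return {w: i for i, w in enumerate(words)}


def embed(text, vocab):
    """Toy stand-in for a sentence-embedding model: unit-length word counts."""
    counts = Counter(w for w in text.lower().split() if w in vocab)
    vec = [0.0] * len(vocab)
    for word, count in counts.items():
        vec[vocab[word]] = float(count)
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def search(query, texts, k):
    """Brute-force k-nearest-neighbour search by squared L2 distance."""
    vocab = build_vocab(texts)
    q = embed(query, vocab)
    scored = []
    for text in texts:
        v = embed(text, vocab)
        dist = sum((a - b) ** 2 for a, b in zip(q, v))
        scored.append((dist, text))
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return [
        (rank, round(dist, 3), text)
        for rank, (dist, text) in enumerate(scored[:k], start=1)
    ]


results = search("research around mental health", STUDIES, k=2)
for rank, dist, text in results:
    print(rank, dist, text)
```

Because the results are sorted by distance, rank 1 is always the record with the smallest score, so both columns read in the same direction.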

Authors

Data flow diagram

Screenshot 2025-02-20 202954

Consider the DeReLiCT Code principles when designing your project.

Future improvements

improvement

Terms of use

Guidance

Some key commands/directions for building your project are listed here. See further details on the wiki.

Essential linux/bash commands

The virtual machine is running on Ubuntu, a Linux distribution.

```bash
cd              # change directory to home
cd /workspaces  # return to the /workspaces directory
cd ..           # go up a level in the directory structure
ls              # list the contents of the current directory
pwd             # get the path to the current working directory
```

Essential git commands

We will use git and a GitHub remote to track our changes. You can use git in the same way you would from your local machine.

```bash
git status            # check on status of current git repo
git branch NAME       # create a branch called NAME
git checkout NAME     # swap over to the branch called NAME
git add .             # stage all changed files for commit, you can replace "." with FILE to add a single file called FILE
git commit            # commit the staged files (this will open your text editor to create a commit message)
git push origin NAME  # push local commits to the remote branch tracking the branch NAME
```

Essential conda commands

```bash
# from terminal/outside a conda env
conda env list                                                  # list built environments
conda env create --file PATH/TO/A/FILE                          # build a conda env from a file
conda env create --file .devcontainer/env-files/mkdocs-env.yml  # build a conda env from a file
conda activate ENV-NAME                                         # activate the environment ENV-NAME

# from inside a conda env (after activating the env)
conda list                                          # lists installed packages in the env
conda env export --no-builds > exported-env.yml     # exports all packages in the env
conda env export --from-history > exported-env.yml  # exports the packages that were explicitly installed
```

Essential pytest hints

Add the following to the __init__.py file in your tests/ directory:

```python
import sys

sys.path.append("src")
```

You can then run pytest from the main repo directory.
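To see why appending `src` to `sys.path` makes the package importable, here is a minimal self-contained sketch. It builds a throwaway `src` directory in a temporary folder; the module name `toy_module` and its `double` function are hypothetical and exist only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_src_import() -> int:
    """Create a temporary src/ folder, add it to sys.path, and import from it."""
    with tempfile.TemporaryDirectory() as repo:
        src = Path(repo) / "src"
        src.mkdir()
        # Hypothetical module standing in for your own package code.
        (src / "toy_module.py").write_text("def double(x):\n    return 2 * x\n")

        sys.path.append(str(src))  # same idea as sys.path.append("src")
        try:
            importlib.invalidate_caches()
            toy_module = importlib.import_module("toy_module")
            return toy_module.double(21)
        finally:
            # Clean up so the demo leaves the interpreter state unchanged.
            sys.path.remove(str(src))
            sys.modules.pop("toy_module", None)


def test_double_is_importable():
    """A pytest-style test: any function named test_* is collected by pytest."""
    assert demo_src_import() == 42
```

In a real repository the relative path `"src"` only resolves correctly when pytest is started from the repository root, which is why the run location matters.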

Essential GitHub action hints

Under workflows, select "New workflow" and choose the "Python application" option. Change the Python version to suit your application, and modify the triggers so that you can manually run the action:

```yaml
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
  workflow_dispatch:
```

Essential mkdocs commands

Ensure you are using a conda environment that has mkdocs and the required additional packages installed (you can install the ready-made mkdocs-env by running conda env create --file .devcontainer/env-files/mkdocs-env.yml and then activating it with conda activate mkdocs).

The following commands should be run from the main folder of your repository (where your pyproject.toml is).

```bash
mkdocs new .  # initialise a new mkdocs project

# You can now edit the mkdocs yml file

TZ=UTC mkdocs serve  # serve the mkdocs website without time zone errors

# you may need to set up port forwarding to view the website

TZ=UTC mkdocs build      # build your docs files in a /site dir
TZ=UTC mkdocs gh-deploy  # deploy the website - change settings on your gh repo to allow writing by actions
```

You should edit your mkdocs.yml to contain the following plugins so that it can find your docs:

```yaml
site_name: NAME HERE

theme:
  name: "material"

plugins:
  - mkdocstrings:
      handlers:
        python:
          paths: [src]  # search packages in the src folder

nav:
  - FILE NAME HERE: index.md
```

If you have added sensible and well-formatted comments and docstrings to your code, you can use the mkdocstrings plugin to automatically build your documentation.

Simply include:

::: YOUR_PACKAGE_NAME

in one of the markdown files included in your docs (for example, index.md) to include any docs you have added to your package __init__.py file.

To include function-level documentation, just include:

::: YOUR_PACKAGE_NAME.MODULE_NAME

For more detail on customising your mkdocs set-up and on writing good documentation, please see this fantastic RealPython tutorial.
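As an illustration of the kind of docstring mkdocstrings can pick up, here is a small sketch written in Google style. The function `similarity_rank` is hypothetical and not part of this repository; the docstring style is an assumption, so adjust it if your mkdocstrings handler is configured for a different convention.

```python
def similarity_rank(distances):
    """Rank search results by distance, closest first.

    Args:
        distances: A list of non-negative distance scores, one per result.

    Returns:
        A list of 1-based ranks aligned with ``distances``; the smallest
        distance receives rank 1.
    """
    # Indices of the results ordered from smallest to largest distance.
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    ranks = [0] * len(distances)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


print(similarity_rank([0.8, 0.2, 1.5]))  # prints [2, 1, 3]
```

With a module containing functions like this, the `::: YOUR_PACKAGE_NAME.MODULE_NAME` directive would render the summary line, arguments, and return value as separate sections.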


Please keep the attribution below this divider section. Update the URL in the code snippet below to direct users to installing your package from a release.


To install the package with pip

Create a virtual environment with pip available. From within this env, simply run the pip install command with the url of the desired packaged binary:

```bash
python -m pip install https://github.com/murphyqm/swd3-testing-ghcodespaces-demo-repo/releases/download/v0.0.1-alpha.2/hypot-0.0.1.tar.gz
```

You can test that it has installed correctly by running:

```bash
python -c "import hypot.calc;print(hypot.calc.squared(2))"
```

This repository was built using the template created by Maeve Murphy Quinlan (c) 2024 under the MIT license. See here for more details.

Owner

  • Login: AbhinavJindalNIHR
  • Kind: user

GitHub Events

Total
  • Issues event: 5
  • Watch event: 1
  • Issue comment event: 2
  • Member event: 1
  • Push event: 22
  • Public event: 1
  • Pull request event: 16
Last Year
  • Issues event: 5
  • Watch event: 1
  • Issue comment event: 2
  • Member event: 1
  • Push event: 22
  • Public event: 1
  • Pull request event: 16

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 8
  • Average time to close issues: 6 days
  • Average time to close pull requests: less than a minute
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.25
  • Average comments per pull request: 0.13
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 8
  • Average time to close issues: 6 days
  • Average time to close pull requests: less than a minute
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.25
  • Average comments per pull request: 0.13
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • AbhinavJindalNIHR (4)
Pull Request Authors
  • jinthebin (7)
  • AbhinavJindalNIHR (1)
Top Labels
Issue Labels
enhancement (2) help wanted (1) question (1)
Pull Request Labels