Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Atulsah17
  • Language: Python
  • Default Branch: main
  • Size: 3.91 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme Citation

README.md

Citation Extractor

Description

This program fetches messages and their sources from an API and cites a source for a response text when the two are sufficiently similar, that is, when their difflib.SequenceMatcher ratio is above 0.7 and the source has a link.
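The matching rule can be tried on its own, without calling the API. The following is a minimal, offline sketch. It assumes each source is a dict with id, link and context keys, which is how the script below reads them. The helper name cite_sources, the sample sentences and the example.com links are hypothetical. Each source is scored with difflib.SequenceMatcher(...).ratio(), and the source is kept when the score is above 0.7, the same cut-off the script uses.

```python
import difflib

SIMILARITY_THRESHOLD = 0.7  # same cut-off as citation_extractor.py


def cite_sources(response_text, sources, threshold=SIMILARITY_THRESHOLD):
    """Hypothetical helper: return {id, link} for every source whose context
    is more than `threshold` similar to the response text."""
    citations = []
    seen = set()  # each source is cited at most once
    for source in sources:
        ratio = difflib.SequenceMatcher(None, response_text, source["context"]).ratio()
        # Keep sources that are similar enough and actually have a link
        if ratio > threshold and source.get("link") and source["id"] not in seen:
            citations.append({"id": source["id"], "link": source["link"]})
            seen.add(source["id"])
    return citations


# Hypothetical sample data shaped like one item's sources
response = "Our clinic is open from 9 am to 6 pm on weekdays."
sources = [
    {"id": 1, "link": "https://example.com/hours",
     "context": "Our clinic is open from 9 am to 6 pm on weekdays."},
    {"id": 2, "link": "https://example.com/parking", "context": "Free parking."},
    {"id": 3, "link": "",  # no link, so never cited
     "context": "Our clinic is open from 9 am to 6 pm on weekdays."},
]

print(cite_sources(response, sources))
```

With this input, only source 1 is cited. Source 2 is too dissimilar to pass the threshold, and source 3 matches but has an empty link.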

Requirements

  • Python 3.x
  • requests
  • difflib (part of the Python standard library; no installation needed)

Setup

  1. Clone the repository to your local machine.
  2. Install the required dependencies by running pip install -r requirements.txt.

Usage

  1. Run the program by executing python citation_extractor.py.
  2. The program fetches every page of data from the API set in API_URL, extracts citations for each response, and prints the results.
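The fetch step requests the API's pages one at a time and stops when the reported current_page reaches last_page. The sketch below shows that loop offline. It assumes each page has the shape {"data": {"data": [...], "current_page": n, "last_page": m}}, which is the shape the script reads. The HTTP call is replaced with a page-fetching function. Here, fetch_all and the in-memory FAKE_PAGES are hypothetical stand-ins.

```python
def fetch_all(get_page):
    """Collect items from every page; get_page(n) returns one parsed JSON page."""
    items = []
    page = 1
    while True:
        payload = get_page(page)["data"]
        items.extend(payload["data"])
        # Stop when the API reports that this page is the last one
        if payload["current_page"] >= payload["last_page"]:
            break
        page += 1
    return items


# Hypothetical in-memory pages standing in for real API responses
FAKE_PAGES = {
    1: {"data": {"data": [{"response": "a"}, {"response": "b"}],
                 "current_page": 1, "last_page": 2}},
    2: {"data": {"data": [{"response": "c"}],
                 "current_page": 2, "last_page": 2}},
}

items = fetch_all(FAKE_PAGES.__getitem__)
print([item["response"] for item in items])  # prints ['a', 'b', 'c']
```

The script does not pass a function in. It calls the API directly with requests.get(api_url, params={'page': current_page}).json(). Passing the page fetcher in as a function just keeps the loop testable without network access.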

Owner

  • Name: Atul sah
  • Login: Atulsah17
  • Kind: user

My name is Atul Sah. I'm a passionate frontend web developer, currently pursuing my B.Tech.

Citation (citation_extractor.py)

import requests
import difflib
import json

API_URL = "https://devapi.beyondchats.com/api/get_message_with_sources"

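def fetch_data(api_url):
    """Fetch every page from the paginated API and return all items in one list."""
    all_data = []
    current_page = 1
    while True:
        # A timeout prevents the script from hanging forever on a stalled request
        response = requests.get(api_url, params={'page': current_page}, timeout=30)
        if response.status_code != 200:
            raise RuntimeError(f"Failed to fetch page {current_page}: HTTP {response.status_code}")
        data = response.json()
        all_data.extend(data['data']['data'])
        # Stop once the API reports that this is the last page
        if data['data']['current_page'] >= data['data']['last_page']:
            break
        current_page += 1
    return all_data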

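def extract_citations(response_texts, source_contexts):
    """Return {id, link} for each source whose context is more than 70% similar
    (difflib.SequenceMatcher ratio) to one of the response texts."""
    citations = []
    cited_ids = set()  # IDs already cited, so each source appears at most once
    for response_text in response_texts:
        for source_context in source_contexts:
            similarity = difflib.SequenceMatcher(
                None, response_text, source_context.get('context') or ''
            ).ratio()
            # Cite only sources that are similar enough and have a link
            if similarity > 0.7 and source_context.get('link') and source_context['id'] not in cited_ids:
                citations.append({
                    "id": source_context['id'],
                    "link": source_context['link']
                })
                cited_ids.add(source_context['id'])
    return citations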

def main():
    # Fetch every page of data from the API
    data = fetch_data(API_URL)

    for i, item in enumerate(data, start=1):
        sources = item.get('source', [])

        # Match this response only against its own sources, so each
        # response lists its own citations rather than every match found
        citations = extract_citations([item['response']], sources)

        # Print the response text, its sources, and its citations
        print(f"Response {i}:")
        print(f"Text: {item['response']}")
        print("Sources (JSON format):")
        print(json.dumps(sources, indent=4))
        print("Citations:")
        for citation in citations:
            print(json.dumps(citation, indent=4))
        print("=" * 50)

if __name__ == "__main__":
    main()
