citation_extractor

https://github.com/atulsah17/citation_extractor

Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (5.6%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: Atulsah17
Language: Python
Default Branch: main
Size: 3.91 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 2 years ago · Last pushed about 2 years ago

Metadata Files

Readme Citation

README.md

Citation Extractor

Description

This program extracts citations from a set of response texts based on the similarity between the response texts and source contexts obtained from an API.
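The similarity test described above can be sketched in a few lines. This is a minimal illustration, not the project's full logic: the two sample strings and the helper name `similarity` are invented here, while the 0.7 cut-off and the use of `difflib.SequenceMatcher` come from citation_extractor.py.

```python
import difflib

# Hypothetical sample strings, invented for illustration only.
response_text = "Our store opens at 9 am on weekdays."
source_context = "Our store opens at 9 am on weekdays and 10 am on weekends."


def similarity(a: str, b: str) -> float:
    """Return difflib's ratio in [0, 1]; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


THRESHOLD = 0.7  # same cut-off that citation_extractor.py uses

score = similarity(response_text, source_context)
is_citation = score > THRESHOLD  # a source counts as cited only above the cut-off
print(f"similarity={score:.2f}, cited={is_citation}")
```

Because the ratio is computed on raw characters, a long source that merely contains the response can still fall below the threshold. Lowering the cut-off trades precision for recall.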

Requirements

Python 3.x
requests
difflib (part of the Python standard library, no installation needed)

Setup

Clone the repository to your local machine.
Install the required dependencies by running pip install -r requirements.txt.

Usage

Run the program with python citation_extractor.py.
The program will fetch data from the specified API, extract citations, and print the results.
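The fetch step walks a paginated API until the last page. The sketch below shows that loop without any network access, assuming the response shape that citation_extractor.py reads (`data.data`, `data.current_page`, `data.last_page`). The `collect_pages` helper and the `FAKE_PAGES` payloads are hypothetical stand-ins, not part of the repository.

```python
from typing import Callable


def collect_pages(get_page: Callable[[int], dict]) -> list:
    """Request pages 1, 2, ... until current_page reaches last_page."""
    items = []
    page = 1
    while True:
        payload = get_page(page)["data"]
        items.extend(payload["data"])
        if payload["current_page"] >= payload["last_page"]:
            break
        page += 1
    return items


# Hypothetical two-page response standing in for the real API.
FAKE_PAGES = {
    1: {"data": {"data": [{"id": 1}], "current_page": 1, "last_page": 2}},
    2: {"data": {"data": [{"id": 2}], "current_page": 2, "last_page": 2}},
}

all_items = collect_pages(lambda page: FAKE_PAGES[page])
print(all_items)
```

Passing the page getter in as a function keeps the loop testable. In the real script the getter would wrap `requests.get` with a `page` query parameter.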

Owner

Name: Atul sah
Login: Atulsah17
Kind: user

Twitter: atulsah17
Repositories: 2
Profile: https://github.com/Atulsah17

My name is Atul Sah. I'm a passionate frontend web developer, currently pursuing my B.Tech.

Citation (citation_extractor.py)

import requests
import difflib
import json

API_URL = "https://devapi.beyondchats.com/api/get_message_with_sources"

def fetch_data(api_url):
    all_data = []
    current_page = 1
    while True:
        # A timeout keeps the loop from hanging on an unresponsive server
        response = requests.get(api_url, params={'page': current_page}, timeout=30)
        if response.status_code != 200:
            raise RuntimeError(f"Failed to fetch data: {response.status_code}")
        data = response.json()
        all_data.extend(data['data']['data'])
        if data['data']['current_page'] >= data['data']['last_page']:
            break
        current_page += 1
    return all_data

def extract_citations(response_texts, source_contexts):
    citations = []
    cited_ids = set()  
    for response_text in response_texts:
        for source_context in source_contexts:
            similarity = difflib.SequenceMatcher(None, response_text, source_context['context']).ratio()
            if similarity > 0.7 and source_context['link'] and source_context['id'] not in cited_ids:
                citations.append({
                    "id": source_context['id'],
                    "link": source_context['link']
                })
                cited_ids.add(source_context['id'])  
    return citations

def main():
    # Fetch data from the API
    data = fetch_data(API_URL)

    # Print each response with its own sources and the citations matched against them
    for i, item in enumerate(data):
        # Some items may have no 'source' key, so fall back to an empty list
        sources = item.get('source', [])
        # Match this response only against its own sources, so citations
        # found for other responses are not repeated under every entry
        citations = extract_citations([item['response']], sources)

        print(f"Response {i+1}:")
        print(f"Response text: {item['response']}")
        print("Sources (JSON format):")
        print(json.dumps(sources, indent=4))
        print("Citations:")
        for citation in citations:
            print(json.dumps(citation, indent=4))
        print("="*50)

if __name__ == "__main__":
    main()
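
A quick offline check of the matching rules can help when the API is unavailable. The sketch below restates extract_citations in compact form so it runs on its own, with the threshold exposed as a parameter (an addition made here for illustration). The sample records are invented: one exact match, one match with an empty link, and one unrelated text.

```python
import difflib


def extract_citations(response_texts, source_contexts, threshold=0.7):
    """Compact restatement of the function above; threshold parameter is an assumption."""
    citations, cited_ids = [], set()
    for text in response_texts:
        for src in source_contexts:
            score = difflib.SequenceMatcher(None, text, src["context"]).ratio()
            # Keep a source only if it is similar enough, has a link, and is new
            if score > threshold and src["link"] and src["id"] not in cited_ids:
                citations.append({"id": src["id"], "link": src["link"]})
                cited_ids.add(src["id"])
    return citations


# Invented sample records, not real API data.
sources = [
    {"id": 1, "link": "https://example.com/hours", "context": "We are open from 9 to 5."},
    {"id": 2, "link": "", "context": "We are open from 9 to 5."},
    {"id": 3, "link": "https://example.com/returns", "context": "Refunds take two weeks to process."},
]
# The same response twice, to show that a source is cited only once.
responses = ["We are open from 9 to 5.", "We are open from 9 to 5."]

result = extract_citations(responses, sources)
print(result)
```

Only source 1 is returned: source 2 is skipped because its link is empty, source 3 falls below the similarity threshold, and the repeated response does not add a duplicate entry.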
