citation_extractor
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.6%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: Atulsah17
- Language: Python
- Default Branch: main
- Size: 3.91 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Created about 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme
Citation
README.md
Citation Extractor
Description
This program extracts citations from a set of response texts based on the similarity between the response texts and source contexts obtained from an API.
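As a rough illustration of the similarity test described above, the sketch below compares a response with two candidate contexts using `difflib.SequenceMatcher`, the same standard-library class the script uses. The helper name `similarity` and the sample strings are made up for this example; the 0.7 cutoff is the one that appears in `citation_extractor.py`.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Return difflib's similarity ratio (0.0 to 1.0) for two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()


# Hypothetical texts, not taken from the API.
response = "The capital of France is Paris."
close_context = "The capital of France is Paris!"
unrelated_context = "Bananas are rich in potassium."

THRESHOLD = 0.7  # same cutoff as in citation_extractor.py

for label, context in [("close", close_context), ("unrelated", unrelated_context)]:
    score = similarity(response, context)
    print(f"{label}: {score:.2f} cited={score > THRESHOLD}")
```

Because the ratio is computed over characters, long contexts that only partly overlap a short response may score well below the cutoff, so the threshold may need tuning for real data.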
Requirements
- Python 3.x
- requests
- difflib (part of the Python standard library; no separate installation needed)
Setup
- Clone the repository to your local machine.
- Install the required dependencies by running `pip install -r requirements.txt`.
Usage
- Run the program by executing the `citation_extractor.py` file.
- The program will fetch data from the specified API, extract citations, and print the results.
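The fetch step pages through the API until the reported current page reaches the last page. The sketch below reproduces that loop against an in-memory stand-in so it can be run without network access. The names `collect_pages`, `fake_get_page`, and `FAKE_PAGES` are hypothetical, and the payload shape is only assumed from the fields the script reads (`data.data`, `data.current_page`, `data.last_page`).

```python
from typing import Callable, Dict, List


def collect_pages(get_page: Callable[[int], Dict]) -> List[Dict]:
    """Request pages 1, 2, ... until current_page reaches last_page."""
    items: List[Dict] = []
    page = 1
    while True:
        payload = get_page(page)["data"]
        items.extend(payload["data"])
        if payload["current_page"] >= payload["last_page"]:
            break
        page += 1
    return items


# Hypothetical two-page payload, shaped like the fields the script reads.
FAKE_PAGES = {
    1: {"data": {"current_page": 1, "last_page": 2, "data": [{"id": 1}, {"id": 2}]}},
    2: {"data": {"current_page": 2, "last_page": 2, "data": [{"id": 3}]}},
}


def fake_get_page(page: int) -> Dict:
    """Stand-in for an HTTP call; returns the canned payload for a page."""
    return FAKE_PAGES[page]


all_items = collect_pages(fake_get_page)
print([item["id"] for item in all_items])
```

Passing the page getter in as a function keeps the loop testable; in the real script that role is played by `requests.get` with a `page` query parameter.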
Owner
- Name: Atul sah
- Login: Atulsah17
- Kind: user
- Twitter: atulsah17
- Repositories: 2
- Profile: https://github.com/Atulsah17
My name is Atul Sah. I'm a passionate frontend web developer, currently pursuing my B.Tech.
Citation (citation_extractor.py)
import difflib
import json

import requests

API_URL = "https://devapi.beyondchats.com/api/get_message_with_sources"


def fetch_data(api_url):
    """Fetch every page of results from the API and return the combined items."""
    all_data = []
    current_page = 1
    while True:
        response = requests.get(api_url, params={'page': current_page})
        if response.status_code != 200:
            raise Exception(f"Failed to fetch data: {response.status_code}")
        data = response.json()
        all_data.extend(data['data']['data'])
        # Stop once the last page has been read
        if data['data']['current_page'] >= data['data']['last_page']:
            break
        current_page += 1
    return all_data


def extract_citations(response_texts, source_contexts):
    """Return each source (once) whose context is more than 70% similar to any response."""
    citations = []
    cited_ids = set()
    for response_text in response_texts:
        for source_context in source_contexts:
            similarity = difflib.SequenceMatcher(None, response_text, source_context['context']).ratio()
            # Skip sources without a link and sources that were already cited
            if similarity > 0.7 and source_context['link'] and source_context['id'] not in cited_ids:
                citations.append({
                    "id": source_context['id'],
                    "link": source_context['link']
                })
                cited_ids.add(source_context['id'])
    return citations


def main():
    # Fetch data from the API
    data = fetch_data(API_URL)
    # Extract response texts and source contexts
    response_texts = [item['response'] for item in data]
    source_contexts = [{'id': source['id'], 'link': source['link'], 'context': source['context']}
                       for item in data for source in item.get('source', [])]
    # Extract citations (one combined list across all responses)
    citations = extract_citations(response_texts, source_contexts)
    # Print response text, source context, and citations
    for i, item in enumerate(data):
        print(f"Response {i+1}:")
        print(f"Context: {item['response']}")
        print("Sources (JSON format):")
        # Use .get() so an item without a 'source' key does not raise KeyError
        print(json.dumps(item.get('source', []), indent=4))
        print("Citations:")
        for citation in citations:
            print(json.dumps(citation, indent=4))
        print("="*50)


if __name__ == "__main__":
    main()
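As listed, the script builds a single citation list across all responses and prints that same list under every response. If per-response citations are wanted, one possible variant is sketched below. This is not the author's method: the function `citations_per_response` and the sample items are hypothetical, and only the field names (`response`, `source`, `id`, `link`, `context`) and the 0.7 cutoff are taken from the script.

```python
import difflib
from typing import Dict, List


def citations_per_response(items: List[Dict], threshold: float = 0.7) -> List[List[Dict]]:
    """For each item, cite only its own sources that resemble its response."""
    results = []
    for item in items:
        cited, seen = [], set()
        for source in item.get("source", []):
            score = difflib.SequenceMatcher(None, item["response"], source["context"]).ratio()
            # Same conditions as the script: similar enough, has a link, not yet cited
            if score > threshold and source.get("link") and source["id"] not in seen:
                cited.append({"id": source["id"], "link": source["link"]})
                seen.add(source["id"])
        results.append(cited)
    return results


# Hypothetical data for illustration only.
sample = [
    {
        "response": "Our store opens at 9 am on weekdays.",
        "source": [
            {"id": 1, "link": "https://example.com/hours", "context": "Our store opens at 9 am on weekdays"},
            {"id": 2, "link": "", "context": "Our store opens at 9 am on weekdays."},
            {"id": 3, "link": "https://example.com/shipping", "context": "Shipping is free over $50."},
        ],
    },
    {"response": "No sources were attached to this answer.", "source": []},
]

print(citations_per_response(sample))
```

Source 2 is skipped because its link is empty, and source 3 because its context is too dissimilar, so only source 1 is cited for the first response and nothing is cited for the second.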