citemyweb
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (13.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: m0hill
- License: MIT
- Language: HTML
- Default Branch: master
- Size: 23.4 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
CiteMyWeb
CiteMyWeb is a handy web app that simplifies the citation process for research articles. It's designed to save researchers time by extracting the appropriate Digital Object Identifier (DOI) directly from the article URL, rather than requiring manual input of the DOI or article name. CiteMyWeb is an open-source project, developed using Flask, and is hosted on Railway.
This tool currently supports more than 40 research article websites and provides a range of citation styles to choose from. Users can easily copy the generated citation and paste it into their research paper, thesis, or any other document.
In the future, I aim to extend CiteMyWeb to handle all types of articles, web pages, and books, broadening its usability beyond the academic community. This will include deriving citations for books directly from Amazon URLs, without requiring an International Standard Book Number (ISBN) or other details.
The project is always looking for improvement, both in terms of additional features and the user interface/user experience (UI/UX).
Live App
Visit the live app here
Features
- Automatic DOI Extraction: Uses Selenium and Beautiful Soup to scrape DOIs directly from the source page.
- Multiple Citation Styles: Provides multiple citation styles to choose from.
- Direct Copy-Paste: Easily copy the citation and use it in your research work.
- URL-Based Citation Generation: Just input the URL of the research article and get the citation.
- Flask and Railway Integration: Developed using Flask and hosted on Railway.
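As a minimal sketch of the URL-based extraction, the DOI pattern can be matched directly against raw page HTML (a self-contained illustration only; `extract_doi` is a hypothetical helper name, and the regex mirrors the one used in `citation_fetcher.py` below):

```python
import re

# DOI pattern: "10." followed by a 4+ digit registrant code and a suffix.
DOI_REGEX = r"\b(10\.\d{4,}/[\w./-]+)\b"

def extract_doi(html: str):
    """Return the most frequently occurring DOI-shaped string, or None."""
    matches = re.findall(DOI_REGEX, html)
    if not matches:
        return None
    # The article's own DOI is usually repeated more often than cited DOIs.
    return max(matches, key=matches.count)
```

For example, `extract_doi('<a href="https://doi.org/10.1000/182">doi:10.1000/182</a>')` returns `"10.1000/182"`.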
Future Goals
- Extended Support: Planning to extend support to all kinds of articles, web pages, and books.
- UI/UX Improvements: Ongoing UI/UX improvements to make the citation process seamless.
- Enhanced Recognition: Working on recognizing book citations from retailer URLs such as Amazon, without requiring any other details.
Support
If you like this project, don't forget to give it a ⭐ on GitHub!
For any queries or suggestions, please feel free to open an issue on GitHub or reach out to us directly.
With CiteMyWeb, let's make citation easy for everyone!
Owner
- Name: Mohil
- Login: m0hill
- Kind: user
- Location: Tokyo
- Repositories: 1
- Profile: https://github.com/m0hill
Citation (citation_fetcher.py)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium_stealth import stealth
from bs4 import BeautifulSoup
import urllib.request
from urllib.error import HTTPError
import re


def setup_webdriver():
    """Create a headless Chrome driver with stealth settings applied."""
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--headless")
    # Repeated add_experimental_option calls with the same key overwrite each
    # other, so both switches must be excluded in a single call.
    options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    options.add_experimental_option("detach", True)
    options.add_experimental_option("useAutomationExtension", False)
    driver = webdriver.Chrome(options=options)
    stealth(
        driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
    )
    return driver


def get_doi_from_soup(soup, url):
    """Extract a DOI from the parsed page via meta tags, with a regex fallback."""
    if "researchgate.net" in url:
        meta_doi = soup.find("meta", {"property": "citation_doi"})
    elif "thelancet.com" in url:
        meta_doi = soup.find("meta", {"name": "citation_doi"})
    else:
        meta_doi = None

    if meta_doi:
        return meta_doi["content"]

    # Fallback: scan the raw HTML for DOI-shaped strings and pick the one
    # that occurs most often (the article's own DOI is usually repeated).
    doi_regex = r"\b(10\.\d{4,}/[\w./-]+)\b"
    doi_matches = re.findall(doi_regex, str(soup))
    if doi_matches:
        return max(doi_matches, key=doi_matches.count)
    return None


def get_citation(url, style):
    """Resolve the DOI on the given page and return a formatted citation."""
    driver = setup_webdriver()
    try:
        driver.get(url)
        soup = BeautifulSoup(driver.page_source, "html.parser")
        doi_value = get_doi_from_soup(soup, url)
        if doi_value is None:
            raise ValueError("DOI not found.")

        # Ask dx.doi.org for a formatted citation via content negotiation.
        req = urllib.request.Request("http://dx.doi.org/" + doi_value.strip())
        req.add_header("Accept", f"text/x-bibliography; style={style}")
        with urllib.request.urlopen(req) as f:
            citation = f.read().decode()

        # Strip markup and attribution noise from the response.
        citation = citation.replace("Crossref", "")
        citation = citation.replace("<i>", "").replace("</i>", "")
    except HTTPError as e:
        if e.code == 404:
            raise ValueError("DOI not found.")
        raise ValueError("Service unavailable.")
    except Exception as e:
        raise ValueError(str(e))
    finally:
        driver.quit()
    return citation
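For reference, the content-negotiation step used to turn a DOI into a formatted citation can be isolated into a small helper (a sketch only; `build_citation_request` is not part of the project):

```python
import urllib.request

def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build the dx.doi.org request that returns a formatted citation string."""
    req = urllib.request.Request("http://dx.doi.org/" + doi.strip())
    # Content negotiation: ask the DOI resolver for a plain-text
    # bibliography entry in the requested citation style.
    req.add_header("Accept", f"text/x-bibliography; style={style}")
    return req
```

Passing the result to `urllib.request.urlopen` (network access required) returns the citation text in the chosen style.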