Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: m0hill
  • License: mit
  • Language: HTML
  • Default Branch: master
  • Size: 23.4 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

CiteMyWeb

CiteMyWeb is a handy web app that simplifies the citation process for research articles. It's designed to save researchers time by extracting the appropriate Digital Object Identifier (DOI) directly from the article URL, rather than requiring manual input of the DOI or article name. CiteMyWeb is an open-source project, developed using Flask, and is hosted on Railway.

This tool currently supports more than 40 research article websites and provides a range of citation styles to choose from. Users can easily copy the generated citation and paste it into their research paper, thesis, or any other document.

In the future, I aim to extend CiteMyWeb to accommodate all types of articles, web pages, and books, broadening its usability beyond the academic community. This will even include deriving citations for books from Amazon URLs, without the need for an International Standard Book Number (ISBN) or other information.

The project is always looking for improvement, both in terms of additional features and the user interface/user experience (UI/UX).

Live App

Visit the live app here

Features

  • Automatic DOI Extraction: Uses Selenium and Beautiful Soup to scrape DOIs directly from the source page.
  • Multiple Citation Styles: Choose the citation style that matches your document's requirements.
  • Direct Copy-Paste: Easily copy the citation and use it in your research work.
  • URL-Based Citation Generation: Just input the URL of the research article and get the citation.
  • Flask and Railway Integration: Developed using Flask and hosted on Railway.
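
The DOI extraction step can be illustrated without a browser. The sketch below is a minimal, standard-library-only example, assuming the page HTML is already available as a string. It reuses the DOI pattern from citation_fetcher.py; the helper name `most_common_doi` and the use of `collections.Counter` (in place of `list.count`) are illustrative choices, not part of the project.

```python
import re
from collections import Counter

# Same DOI pattern as in citation_fetcher.py: "10.", a registrant code of
# four or more digits, a slash, then a suffix of word characters, dots,
# slashes, or hyphens.
DOI_PATTERN = re.compile(r"\b(10\.\d{4,}/[\w./-]+)\b")


def most_common_doi(html):
    """Return the DOI-like string that appears most often in html, or None.

    Hypothetical helper for illustration only.
    """
    matches = DOI_PATTERN.findall(html)
    if not matches:
        return None
    # The article's own DOI usually appears several times on its page
    # (links, metadata, citation widgets), so pick the most frequent match.
    return Counter(matches).most_common(1)[0][0]


sample = (
    '<a href="https://doi.org/10.1000/xyz123">10.1000/xyz123</a> '
    "<span>Related: 10.5555/other</span>"
)
print(most_common_doi(sample))
```

Because the pattern ends on a word boundary, a sentence-final period after a DOI is not captured as part of the match.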

Future Goals

  • Extended Support: Planning to extend support to all kinds of articles, web pages, and books.
  • UI/UX Improvements: Ongoing UI/UX improvements to make the citation process seamless.
  • Enhanced Recognition: Working toward recognizing books from URLs such as Amazon product pages, without requiring any other details.

Support

If you like this project, don't forget to give it a ⭐ on GitHub!

For any queries or suggestions, please feel free to open an issue on GitHub or reach out to us directly.

With CiteMyWeb, let's make citation easy for everyone!

Owner

  • Name: Mohil
  • Login: m0hill
  • Kind: user
  • Location: Tokyo

Citation (citation_fetcher.py)

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium_stealth import stealth
from bs4 import BeautifulSoup
import urllib.request
from urllib.error import HTTPError
import re


def setup_webdriver():
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--headless")
    # Set both switches in one call: a second add_experimental_option call
    # with the same key would overwrite the first.
    options.add_experimental_option(
        "excludeSwitches", ["enable-automation", "enable-logging"]
    )
    options.add_experimental_option("detach", True)
    options.add_experimental_option("useAutomationExtension", False)

    driver = webdriver.Chrome(options=options)

    stealth(driver,
            languages=["en-US", "en"],
            vendor="Google Inc.",
            platform="Win32",
            webgl_vendor="Intel Inc.",
            renderer="Intel Iris OpenGL Engine",
            fix_hairline=True,
            )

    return driver


def get_doi_from_soup(soup, url):
    if "researchgate.net" in url:
        meta_doi = soup.find("meta", {"property": "citation_doi"})
    elif "thelancet.com" in url:
        meta_doi = soup.find("meta", {"name": "citation_doi"})
    else:
        meta_doi = None

    if meta_doi:
        doi_value = meta_doi["content"]
    else:
        doi_regex = r"\b(10\.\d{4,}/[\w./-]+)\b"
        doi_matches = re.findall(doi_regex, str(soup))

        if doi_matches:
            doi_value = max(doi_matches, key = doi_matches.count)
        else:
            doi_value = None

    return doi_value


def get_citation(url, style):
    driver = setup_webdriver()

    try:
        driver.get(url)

        page_source = driver.page_source
        soup = BeautifulSoup(page_source, "html.parser")

        doi_value = get_doi_from_soup(soup, url)
        if doi_value is None:
            raise ValueError("DOI not found.")

        # doi.org is the current DOI resolver; dx.doi.org is a legacy alias.
        BASE_URL = 'https://doi.org/'
        url = BASE_URL + doi_value.strip()
        req = urllib.request.Request(url)
        req.add_header('Accept', f'text/x-bibliography; style={style}')

        with urllib.request.urlopen(req) as f:
            citation = f.read().decode()
        citation = citation.replace("Crossref", "")
        citation = citation.replace("<i>", "")
        citation = citation.replace("</i>", "")

    except HTTPError as e:
        if e.code == 404:
            raise ValueError("DOI not found.")
        else:
            raise ValueError("Service unavailable.")
    except Exception as e:
        raise ValueError(str(e))
    finally:
        driver.quit()

    return citation
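
The citation request in `get_citation` relies on DOI content negotiation: the resolver is asked for `text/x-bibliography` in a given style through the `Accept` header. The sketch below isolates that step so it can be inspected without launching a browser or touching the network. The names `build_citation_request` and `strip_italics` are hypothetical helpers, the `https://doi.org/` resolver is an assumption, and the style value is assumed to be a CSL style name such as `apa` or `ieee`.

```python
import urllib.request

# Assumed resolver base URL for this sketch.
DOI_RESOLVER = "https://doi.org/"


def build_citation_request(doi, style="apa"):
    """Build (but do not send) a content-negotiation request for a DOI.

    Hypothetical helper for illustration only.
    """
    req = urllib.request.Request(DOI_RESOLVER + doi.strip())
    # Ask for a formatted bibliography entry in the requested style.
    req.add_header("Accept", f"text/x-bibliography; style={style}")
    return req


def strip_italics(citation):
    """Remove the <i> tags that formatted citations may contain."""
    return citation.replace("<i>", "").replace("</i>", "")


req = build_citation_request(" 10.1000/xyz123 ", style="ieee")
print(req.full_url)
print(req.get_header("Accept"))
```

Passing the returned request to `urllib.request.urlopen` would perform the actual lookup; keeping construction separate makes the header and URL easy to check in isolation.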
