inforoots

SD&D Class Project

https://github.com/phantomlei3/inforoots

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: phantomlei3
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 1.63 MB
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 3
  • Open Issues: 8
  • Releases: 0
Created over 6 years ago · Last pushed over 3 years ago
Metadata Files

README.md

InfoRoots

A Software Design & Documentation Project at RPI

Team Members

  • Siwen Zhang
  • Joseph Om
  • Jianing Lin
  • Lei Luo

Vision Statements

Executive Summary

Fake news is prevalent on the internet, especially on topics such as climate change and vaccination. Average online readers have neither the time nor the knowledge to identify false information. InfoRoots uses automated information retrieval to fight fake news and misinformation. Our web platform helps users investigate online news and articles by providing analytic information about authors, publishers, and content. With InfoRoots, users can judge whether information is false with little effort.

Market Potential

Fact-checking is the traditional way to counter fake news. Fact-checking websites rely on journalism experts to provide analytic reviews of news and factual claims. PolitiFact, Snopes, and FactCheck.org are three mainstream fact-checking organizations that publish fact-checked articles on their websites. Instead of focusing on articles, NewsGuard produces professional reviews of online news sources and publishers. Users can install NewsGuard’s browser extension to see publisher reviews while reading news and articles. Because professional fact-checking takes a large amount of time, these organizations cannot cover every claim and article on the internet. Crowdsourcing and machine learning are relatively new approaches in this market. Our.news is developing a browser extension that gives readers both publisher information and crowdsourced reviews, but it has not been effective so far because its user base is small. InfoRoots will be an innovative, machine-learning-based business in this market.

Stakeholders

InfoRoots has two groups of project stakeholders: team instructors and team members. The team instructors consist of one overall supervisor, John Sturman, and two teaching assistants, Charly Huang and Vaishnavi Neema. Their responsibility is to help InfoRoots develop successfully and launch in its market. As an experienced project manager, John Sturman teaches the team members courses on the design and development of InfoRoots. The two teaching assistants provide feedback on the deliverables of InfoRoots.

The team members are four undergraduate students: Siwen Zhang, Joseph Om, Jianing Lin, and Lei Luo. Under the Scrum framework, Lei Luo serves as both the product owner and Scrum Master. All team members work as designers and programmers to develop the InfoRoots web platform.

Major Features

The InfoRoots web platform is designed to investigate online articles. When readers enter an article link on InfoRoots, they see three major features that help them determine whether the content of the article is false.

The first feature is author information. It presents not only the authors’ background information from Wikipedia but also reliability scores produced by our machine learning algorithm. The algorithm scores authors by examining the articles they have recently written.

The second feature is publisher information. It shows professional publisher ratings from non-partisan fact-checking organizations such as NewsGuard. It also presents the ratings of other publishers that produce similar content, so users can evaluate the credibility of information by comparing publishers.

The third feature is citation and content analysis. The analysis system pinpoints all citations in the original article and extracts the relevant paragraphs from the cited sources. These paragraphs are shown to readers when they click on a citation. As users read through the article on InfoRoots, they can check two reliability factors: whether the cited information comes from reliable publishers, and whether the content of the citations is presented accurately in the original article.
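
The citation-pinpointing step can be illustrated with a minimal sketch. It assumes the article is available as HTML and treats every absolute hyperlink inside a paragraph as a citation; the names `CitationLinkParser` and `extract_citations` are hypothetical and do not come from the InfoRoots code base, which uses a Scrapy crawler for extraction.

```python
from html.parser import HTMLParser


class CitationLinkParser(HTMLParser):
    """Hypothetical helper: collects paragraph text and the links cited inside paragraphs."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []
        self.paragraphs = []  # plain text of each <p> element, in document order
        self.links = []       # absolute URLs found inside <p> elements

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []
        elif tag == "a" and self._in_paragraph:
            href = dict(attrs).get("href")
            # Assumption: only absolute http(s) links count as citations.
            if href and href.startswith(("http://", "https://")):
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self.paragraphs.append("".join(self._buffer).strip())
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_paragraph:
            self._buffer.append(data)


def extract_citations(html):
    """Return (paragraphs, citation_links) for one article's HTML."""
    parser = CitationLinkParser()
    parser.feed(html)
    parser.close()
    return parser.paragraphs, parser.links
```

Links outside paragraphs, such as navigation menus, are ignored here on purpose; a real extractor would likely need site-specific rules.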

Major Risks

InfoRoots has two potential risks. The first risk concerns the completion of InfoRoots. Each proposed feature requires a certain level of knowledge of machine learning and data scraping. Since the team members have little experience in developing the major features described above, they will spend most of their time in the exploration and research phase, and the final deliverable might contain incomplete features. Shorter work cycles can mitigate this risk because they allow features under development to be reviewed and revised quickly.

The second risk is that all proposed features require a lot of computing power. To test the major features, InfoRoots might spend a lot of money on cloud computing subscriptions. If one of the major features consumes expensive computing resources, the team might revise that feature to save money. In other words, the three major features are highly subject to change.

Owner

  • Name: phantomlei
  • Login: phantomlei3
  • Kind: user

Citation (citationsNetwork.py)

import hashlib
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from multiprocessing import Process
from PostgreSQL.database import database
from article import article


class citationsNetwork:
    '''
    citationsNetwork represents all citations in one article.
    It communicates with the database mediator to obtain data and invokes the Scrapy crawler to extract information.
    It contains all functions related to building the citations network on the user interface.
    '''

    def __init__(self, article_id):
        '''
        The initializer of citationsNetwork only stores the given article id and creates the database mediator.
        :param article_id: identifier used to look up the article's citations
        '''
        self.article_id = article_id
        self.db = database()

    def get(self):
        '''
        The main function of citationsNetwork: looks up the article's citations in the citations table
        and uses the article class to obtain information for each citation link.
        :return: a JSON-style dictionary that contains:
                'article_paragraphs':  []
                'citation_links':   []
                'citation_info': {'link': {'article_title', 'article_content', 'article_credibility'}}
        :return: None if the article is not in the database
        '''

        json_dict = dict()

        # extract information from database
        citation_results = self.db.lookup_citation(self.article_id)
        if citation_results is not None:
            article_paragraphs = citation_results[0]
            citation_links = citation_results[1]

            # obtain specific info for each citation link
            # TODO: process citations in parallel
            # NOTE: for now, only the first three citations that resolve are profiled (for convenience)
            citation_info = dict()
            count = 0
            for i, link in enumerate(citation_links):
                # skip missing citations and anything beyond the three-citation cap
                if link == "None" or count >= 3:
                    citation_links[i] = "None"
                    continue
                article_result = article(link).get()
                if article_result is None:
                    # blank out citation links that could not be profiled
                    citation_links[i] = "None"
                    continue
                # inner dict holding the information for one citation
                citation_info[link] = {
                    'article_title': article_result['article_title'],
                    'article_content': article_result['article_content'],
                    'article_credibility': article_result['article_reliability'],
                }
                count += 1

            json_dict['article_paragraphs'] = article_paragraphs
            json_dict['citation_links'] = citation_links
            json_dict['citation_info'] = citation_info

            return json_dict
        else:
            return None
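
As a rough illustration of how a caller might consume the dictionary documented in `get()`, the sketch below walks `citation_links`, skips the `"None"` placeholders, and pairs each remaining link with its entry in `citation_info`. The function name `summarize_citations` is hypothetical, and the sample credibility values are placeholders, since the source does not state what type `article_credibility` holds.

```python
def summarize_citations(result):
    """Hypothetical helper: list (link, title, credibility) for each profiled citation.

    `result` is expected to have the shape documented in citationsNetwork.get(),
    or to be None when the article is not in the database.
    """
    if result is None:
        return []

    summary = []
    for link in result['citation_links']:
        # "None" marks citations that were missing, skipped, or could not be profiled.
        if link == "None":
            continue
        info = result['citation_info'].get(link)
        if info is None:
            continue
        summary.append((link, info['article_title'], info['article_credibility']))
    return summary


# Sample data in the documented shape; all values are made up for illustration.
sample = {
    'article_paragraphs': ['First paragraph of the article.'],
    'citation_links': ['https://example.org/a', 'None', 'https://example.org/b'],
    'citation_info': {
        'https://example.org/a': {
            'article_title': 'A',
            'article_content': 'Cited text.',
            'article_credibility': 'high',
        },
    },
}
print(summarize_citations(sample))
```

Keeping the `"None"` placeholders in `citation_links` preserves the positions of the original citations, which is presumably why `get()` overwrites entries instead of deleting them.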








Dependencies

web_backend/package-lock.json npm
  • 132 dependencies
web_backend/package.json npm
  • axios ^0.19.2
  • cors ^2.8.5
  • express ^4.17.1
  • history ^4.10.1
  • react-router-dom ^5.1.2
  • zeromq ^5.2.0
web_userinterface/package-lock.json npm
  • 1398 dependencies
web_userinterface/package.json npm
  • @iconify/icons-oi ^1.0.3
  • @iconify/react ^1.1.3
  • axios ^0.19.0
  • bulma ^0.7.5
  • classnames ^2.2.6
  • jwt-decode ^2.2.0
  • react ^16.9.0
  • react-bulma-components ^2.3.0
  • react-countdown-circle-timer 1.0.6
  • react-dom ^16.9.0
  • react-rater ^5.1.1
  • react-redux ^7.1.1
  • react-router-dom ^5.1.2
  • react-scripts 3.1.1
  • react-scroll-up-button ^1.6.4
  • redux ^4.0.4
  • redux-thunk ^2.3.0
  • styled-components ^4.3.2