csc510_project_lectureaid
Project 1 for CSC510 SE Fall 21
Science Score: 41.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary
Repository
Project 1 for CSC510 SE Fall 21
Basic Info
- Host: GitHub
- Owner: mtkumar123
- License: mit
- Language: Python
- Default Branch: main
- Size: 15.9 MB
Statistics
- Stars: 0
- Watchers: 3
- Forks: 22
- Open Issues: 8
- Releases: 0
Metadata Files
README.md
Project LectureAid
This repo contains the code for LectureAid v1.0.0, a project for CSC510 Fall 21.
What is Project LectureAid?
After a long class, have you ever come home and had to google everything you supposedly learned from that day's lecture handout? Have you ever spent 30 to 45 minutes searching Google and compiling a list of websites that explain what you need? And then a month later, when midterms are around the corner, have you spent that same 30 to 45 minutes trying to find those websites again because you forgot to bookmark them?
Project LectureAid hopes to solve that hassle for you.
Upload your lecture PDF through our terminal menu, and LectureAid will extract the text, process it, and search the internet for key topics from that lecture. Once it finds relevant results, LectureAid opens a browser window with a list of questions relevant to your topic, links to websites that should answer those questions, and a word cloud that highlights key words in the lecture.
Technologies Used
Text extraction from PDFs is done with PyMuPDF. Documentation: https://pymupdf.readthedocs.io/en/latest/
Word processing logic is done with spaCy. Documentation: https://spacy.io/api/doc
Questions and their relevant links are retrieved with the people_also_ask library. Documentation: https://pypi.org/project/people-also-ask/
Requirements
- Python (at least 3.8) and pip
- Microsoft Visual C++ Build tools
- google-api-python-client - Version 2.22.0 or greater
- people_also_ask - Version 0.0.6 or greater
- spacy - Version 3.1.2 or greater
- spacy-legacy - Version 3.0.8
- spacy models
- pyfiglet - Version 0.8.post1
- PyMuPDF - Version 1.18.19
- wordcloud - Version 1.8.1
- matplotlib - Version 3.4.3
Setup
- Run `pip install -r requirements.txt`. This installs all of the required Python libraries.
- Run `pip install .`. This installs the project as a Python package.
How to run
The user uploads a lecture PDF through the terminal menu. LectureAid processes the PDF and presents relevant results in question-and-answer format in a browser window.
- Step 1: Start the user terminal menu (`python code/user_cli.py`).
- Step 2: Press 1 to enter a PDF, then enter the path of the PDF to be uploaded. Any lecture PDF with relevant content will work.
- Step 3: A browser window opens, displaying the search results and word cloud for the uploaded PDF.
Here is a GIF showing the complete process.
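The last step above writes the results to an HTML page and opens it in a browser. The sketch below shows one way that could look using only the standard library. The names `build_results_page` and `show_results`, the `results.html` file name, and the `{question: [links]}` input shape are assumptions for illustration, not the project's actual code.

```python
import html
import webbrowser
from pathlib import Path


def build_results_page(results):
    """Render {question: [links]} as a simple HTML page (hypothetical layout)."""
    parts = ["<html><body><h1>LectureAid results</h1>"]
    for question, links in results.items():
        # Escape user-derived text so it cannot break the markup.
        parts.append(f"<h2>{html.escape(question)}</h2><ul>")
        for link in links:
            safe = html.escape(link, quote=True)
            parts.append(f'<li><a href="{safe}">{safe}</a></li>')
        parts.append("</ul>")
    parts.append("</body></html>")
    return "\n".join(parts)


def show_results(results, out_path="results.html"):
    """Write the page to disk and open it in the default browser."""
    path = Path(out_path).resolve()
    path.write_text(build_results_page(results), encoding="utf-8")
    webbrowser.open(path.as_uri())
    return path
```

Calling `show_results({"What is a binary tree?": ["https://example.com/trees"]})` would write `results.html` in the current directory and open it; the example URL is a placeholder.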

Documentation
More documentation can be viewed here: https://mtkumar123.github.io/CSC510ProjectLectureAid/
Troubleshooting
- When running the code/tests, I'm getting a `no such module named code` error?
  - Try prefixing the command with `python -m`, for example, `python -m pytest`
- When I try to run pip install, I'm getting an error for wordcloud relating to Microsoft C++?
  - Microsoft C++ build tools are needed to generate the wordcloud. See the requirements section for the download link.
Future work
Build a website for a GUI interface for the user
Our project currently uses a command line interface to get input and outputs an .html file. A roadmap item is to implement a website instead. The user would open the LectureAid website, add a file, and click a button to process it. The website would then display the results (word cloud and questions and answers). This would make the project easier to use, since users would not have to download or execute code locally.
Support for additional file types
Currently the project supports only PDF uploads. In the future, other formats such as .ppt and .doc should be supported.
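One possible way to make room for more formats is to choose the text extractor by file extension. The sketch below is hypothetical: the `EXTRACTORS` registry, the `register` decorator, and `extract_text` are illustrative names, and a plain-text extractor stands in for real PDF, .ppt, or .doc handlers.

```python
from pathlib import Path

# Hypothetical registry mapping a file extension to a text extractor.
EXTRACTORS = {}


def register(extension):
    """Decorator that registers an extractor for one file extension."""
    def wrap(func):
        EXTRACTORS[extension.lower()] = func
        return func
    return wrap


@register(".txt")
def extract_txt(path):
    # Plain text stands in for a real format here; a PDF, .ppt or .doc
    # extractor would be registered the same way.
    return Path(path).read_text(encoding="utf-8")


def extract_text(path):
    """Pick the extractor by file extension, or fail with a clear message."""
    suffix = Path(path).suffix.lower()
    try:
        extractor = EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
    return extractor(path)
```

With this layout, adding a format means writing one function and registering it, without touching the code that calls `extract_text`.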
Increase the concurrency efficiency
Currently, we use the maximum number of threads (10) for running search queries, but there could still be room for improvement using other multithreading/multiprocessing tools.
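For reference, running queries on a pool of at most 10 threads can be sketched with the standard library's `concurrent.futures`. This is an illustration under assumptions, not the project's implementation: `run_queries` is a hypothetical helper, and `fake_search` is a stand-in for the real network call.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 10  # matches the thread count mentioned above


def run_queries(queries, search, max_workers=MAX_WORKERS):
    """Run search(query) for every query in parallel, preserving input order."""
    if not queries:
        return {}
    # Never start more threads than there are queries.
    workers = min(max_workers, len(queries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the inputs.
        results = list(pool.map(search, queries))
    return dict(zip(queries, results))


def fake_search(query):
    # Hypothetical stand-in for a network call.
    return [f"https://example.com/search?q={query.replace(' ', '+')}"]
```

Threads suit this workload because search requests spend most of their time waiting on the network. Since `search` is passed in as a parameter, a different backend could be swapped in without changing `run_queries`.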
Improve Word Extraction Logic
Currently spaCy is used to extract noun phrases from each slide/page of the document. The high-frequency noun phrases are then calculated and used in the final search query. However, this causes an issue when every slide lists the document's author name and email address. The author name is treated as a noun phrase, and because it appears on every slide it has a high frequency and thus ends up in the final search query.
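One possible mitigation for the issue described above is to also count how many pages each phrase appears on, and to drop phrases that show up on nearly every page. The sketch below assumes the noun phrases have already been extracted per page. The function name `top_phrases` and the 0.8 cutoff are assumptions, not part of the project.

```python
from collections import Counter


def top_phrases(pages, top_n=3, max_page_ratio=0.8):
    """Rank noun phrases by frequency, skipping ones found on almost every page.

    pages: list of lists of noun phrases, one inner list per slide/page.
    max_page_ratio: assumed cutoff; phrases on more than this share of pages
    (for example an author name in a footer) are treated as boilerplate.
    """
    total = Counter()       # how often each phrase occurs overall
    page_count = Counter()  # on how many pages each phrase occurs
    for phrases in pages:
        normalized = [p.strip().lower() for p in phrases]
        total.update(normalized)
        page_count.update(set(normalized))

    limit = max_page_ratio * len(pages)
    kept = {p: n for p, n in total.items() if page_count[p] <= limit}
    return [p for p, _ in Counter(kept).most_common(top_n)]
```

The cutoff only makes sense for documents with several pages. On a one-page document every phrase appears on 100% of the pages and would be filtered out, so a real implementation would need to handle short documents separately.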
Save favourite links to bookmarks
A button can be added beside each link in the results to save those links to browser bookmarks.
Build a browser extension
Build a browser extension that lets the user select text on a webpage, send a request to the application, and get back links to PDF webpages.
Contact Us
E-mail: lectureaidnscu@gmail.com
Team Members
- Ashley King
- Manoj Kumar
- Rakesh Muppala
- Sayali Parab
- Ashwin Das
- Renji Joseph Sabu
Owner
- Login: mtkumar123
- Kind: user
- Repositories: 3
- Profile: https://github.com/mtkumar123
Citation (CITATION.md)
DOI: https://doi.org/10.5281/zenodo.5528349
```yaml
version: 1.0.0
authors:
- Ashley King
- Manoj Kumar
- Rakesh Muppala
- Sayali Parab
- Ashwin Das
- Renji Joseph Sabu
license: MIT License
repository-code: https://github.com/mtkumar123/CSC510_Project_LectureAid
identifiers:
- description: This is the collection of archived snapshots of version 1.0.0 of CSC510_Project_LectureAid
type: doi
value: 10.5281/zenodo.5528349
```
Dependencies
- PyMuPDF ==1.18.19
- google-api-python-client >=2.22.0
- matplotlib ==3.4.3
- people_also_ask >=0.0.6
- pyfiglet ==0.8.post1
- spacy >=3.1.2
- spacy-legacy ==3.0.8
- wordcloud ==1.8.1
- PyMuPDF *
- en-core-web-lg *
- google-api-python-client *
- people_also_ask *
- pyfiglet *
- spacy ==3.1.2
- spacy-legacy ==3.0.8