thesis-report-management-system-ci
A microservices Thesis Report Management System tailored for the School of Computer Science and Engineering at the International University
https://github.com/nhathuy1305/thesis-report-management-system-ci
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.0%) to scientific vocabulary
Keywords
Repository
A microservices Thesis Report Management System tailored for the School of Computer Science and Engineering at the International University
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
INTEGRATING DEVOPS AND NATURAL LANGUAGE PROCESSING TO STREAMLINE THESIS MANAGEMENT
This is the CI Pipeline Repository. For the CD Pipeline, check it out here.
1. About The Project
The current thesis report systems at universities lack advanced tools for content analysis and grammar detection; the tools that do exist are limited to faculty and difficult to integrate. This leads to errors and inefficiencies, especially during peak submission times.
I propose a new system that standardizes content analysis, grammar correction, and format validation, accessible to both students and faculty. This would reduce faculty workload and improve the quality of student work.
Such a system would create a fair academic environment, providing essential resources and enabling timely feedback.
1.1. Features
This system consists of:
- A thesis submission point that allows multiple attempts.
- Evaluation services that generate comprehensive feedback based on the provided thesis writing guidelines.
- An interface that displays thesis document information alongside its visualized evaluation, customized by user role.
- Automated build, test, and deployment of the application services.
- Automated deployment and management of application services on Kubernetes clusters.
- Running of application services on the infrastructure (virtual machines).
- Collection and visualization of metrics on application performance and health.
1.2. Built With
The primary tools used to develop this application are:
- Node.js
- Express.js
- React
- RabbitMQ
- Python libraries
- PostgreSQL
- Google Cloud Storage

The open-source tools used for CI/CD, automated testing, orchestration, operations, and monitoring are:
- Docker
- Jenkins
- SonarQube
- Trivy
- ArgoCD
- Kubernetes
- Prometheus
- Grafana
2. Getting Started
To set up and run the project locally, the following prerequisites should be satisfied before moving to the installation steps.
2.1. Prerequisites
Make sure the tools below are installed. The linked instructions cover different operating systems:
- Docker: encapsulates the dependencies that are needed to run the application (https://docs.docker.com/get-docker)
- Git: allows for cloning of this repository (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
2.2. Installation
- Clone the repository.
  ```sh
  git clone https://github.com/Nhathuy1305/Thesis-Report-Management-System-CI.git
  ```
- Create a Google Cloud Storage bucket named thesisfilebucket.
- Upload the requirements folder to this bucket.
- Copy your Google Cloud credentials to google_credentials.json.
- Edit the submission deadline to your preferred date at line 193 of /postgresql/init.sql. It is recommended that this value be after the current date on your system; otherwise, submission will be closed.
  ```sql
  INSERT INTO public.deadline (deadline) VALUES (deadline);
  ```
- cd to the file path of this project.
  ```sh
  cd file_path
  ```
- Run the Docker Compose command.
  ```sh
  docker-compose up -d
  ```
- If you want to stop the program:
  ```sh
  docker-compose down
  ```
3. Architecture
Before any implementation can begin, all of the system's requirements and its architecture must be identified. The subsections below describe the structural decisions of the system in detail.
3.1. Use Case Diagram
The student and the teacher, both specializations of the actor, are the system's main roles. Each has a few activities of its own, while the others are shared between the two.
3.2. Simplified System Architecture
The client side provides an interface for entering data and viewing results. A REST API answers these requests and publishes messages to the message broker, which routes and forwards them to the analysis services. Both these services and the REST API have access to the file system to save and retrieve results.
3.3. Simplified Event-driven Architecture
The system comprises RabbitMQ for event brokering, various services as event producers (e.g., Format Checking and Grammar Detection), and event consumers that process and validate theses, with PostgreSQL for secure data storage.
3.4. Event-driven Microservices Architecture
The system uses RabbitMQ for message brokering, connecting modular services via a REST API built with Express.js, which handles client interactions and database integration for user data and metadata storage. Google Cloud Storage manages larger files, while RabbitMQ facilitates communication through fanout and direct exchanges, distributing tasks to various analysis services operating independently. Data flows from the REST API through RabbitMQ to analysis services, with outputs stored in Google Cloud Storage or returned to clients.
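As a concrete illustration, here is a minimal sketch in Python with pika (which the analysis services already list as a dependency) of the fanout pattern described above. The exchange name uploaded_file_location and the host rabbitmq are taken from the citation service's .env shown later in this document; the file path and callback body are assumptions for illustration only.

```python
# Sketch of the fanout distribution step, assuming the exchange and host
# names from citation/.env. Producer and consumer run in separate processes.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()

# Fanout exchange: every bound analysis service receives a copy of the message.
channel.exchange_declare(exchange="uploaded_file_location", exchange_type="fanout")

# Consumer side (in an analysis service process): bind a private queue.
result = channel.queue_declare(queue="", exclusive=True)
channel.queue_bind(exchange="uploaded_file_location", queue=result.method.queue)

def on_message(ch, method, properties, body):
    print(f"Analyzing {body.decode()} ...")  # the service's processor would run here

channel.basic_consume(queue=result.method.queue,
                      on_message_callback=on_message,
                      auto_ack=True)

# Producer side (in the REST API process): broadcast a new submission's location.
channel.basic_publish(exchange="uploaded_file_location",
                      routing_key="",  # ignored by fanout exchanges
                      body="theses/ITITIU20043/report.pdf")  # hypothetical path

channel.start_consuming()
```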
3.5. Database Design
Data management is crucial for any system, starting with a well-organized database to ensure speed and functionality. This system uses a relational database architecture to support complex queries, such as a teacher viewing all thesis submissions by students. There are two primary database structures: one for analytical services and one for the REST APIs (Express.js).
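For illustration, here is a hedged sketch of the per-service result table pattern, using psycopg2 (pinned in the services' requirements). The connection values come from the citation service's .env shown later; the table and column names are hypothetical, since the real schema lives in /postgresql/init.sql.

```python
# Hypothetical result table for one analysis service; not the actual schema.
import psycopg2

conn = psycopg2.connect(host="postgresql", port=5432, dbname="citation",
                        user="postgres", password="123456")
with conn, conn.cursor() as cur:
    # One table per analysis service: keyed by submission, holding its output.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS citation_result (
            submission_id TEXT PRIMARY KEY,
            result        JSONB,
            created_at    TIMESTAMP DEFAULT now()
        );
    """)
conn.close()
```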
3.6. Git Workflow
The updates in Git trigger the automated build and push of container images to a registry. DevOps tooling synchronizes these updates with the Kubernetes cluster.
3.7. CI/CD Pipeline and Cluster Layers
Continuous Integration (CI) and Continuous Deployment (CD) are managed using Jenkins, with ArgoCD automatically monitoring and deploying changes to the Kubernetes cluster. Prometheus and Grafana handle monitoring, providing real-time insights into the application's performance and health. Code quality is ensured through SonarQube during the CI process.
3.8. Kubernetes Cluster
The Kubernetes cluster breaks the system down into smaller components, each of which acts independently and has a specific role. Such an approach is modular: each component can be developed, deployed, and scaled independently, so one service can change without changing the whole system.
Docker containerization packages all dependencies, the runtime environment, and the code for each microservice. It ensures that each environment consistently executes the same way anywhere, making deployment and maintenance easier.
4. Usage
Access http://localhost:3000 on your browser. Sign in with any student ID, instructor ID or admin ID given in the /postgresql/init.sql file.
Examples:
- Student ID: ITITIU20043
- Instructor ID: ITITEACH001
- Admin ID: ITITADMIN01
Provide inputs to the thesis submission form.
Select any service on the left-hand navigation area to view the evaluation.
Options are available to download the thesis document and the services' results, view the guidelines, and resubmit.
Instructors can view the thesis reports and evaluations of students they supervise and give manual feedback.
Admins can edit the deadline, access the list of student submissions, and send notifications to students and instructors.
5. Evaluation
The chart above illustrates the performance of the Kubernetes node across three different tests, comparing the total response time and throughput as the number of requests increases from 100 to 100,000. While the total time increases with the number of requests, the throughput also substantially rises, indicating the node's effective scaling and handling capabilities under varying loads.
In this part, I tested all the Python services, specifically their upload functionality, by sending ten reports from each user to the system. The CPU utilization of node 3 (worker 2) peaked at approximately 97.3%. The controller node was crucial in this scenario, facilitating data distribution between workers 1 and 2. This spike in CPU usage, depicted in the graph, reflects the system's response to the increased workload and underscores the effectiveness of the controller node in managing and balancing the load between the worker nodes.
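For reproducibility, here is a minimal sketch of how such a load test might be scripted in Python. The endpoint path and form fields are assumptions, not the actual REST API routes.

```python
# Hedged load-test sketch: measures total time and throughput for N uploads.
# The URL and payload are hypothetical; adapt them to the real REST API.
import time
import requests

N = 100  # scale toward 100,000 for the largest test in the chart
start = time.time()
for _ in range(N):
    with open("report.pdf", "rb") as f:
        requests.post("http://localhost:8000/upload",  # hypothetical endpoint
                      files={"file": f},
                      data={"student_id": "ITITIU20043"})
elapsed = time.time() - start
print(f"total time: {elapsed:.1f}s, throughput: {N / elapsed:.1f} req/s")
```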
6. Future Work
The architecture used to develop this system allows services to be easily updated, added, or removed thanks to loose coupling.
New services can be added with the following steps. For example, a service that detects and checks figure captions:
- Copy a folder of any service and assign it a new name, for example "figurecaptioncheck".
- Add the information for this service in the docker-compose.yml file.
- Add a table for it in /postgresql/init.sql.
- Update the name of this service in the client's environment variable.
  ```sh
  REACT_APP_SERVICE_LIST: "...,figure_caption_check"
  ```
- Update the name of this service in the backend's environment variable.
  ```sh
  SERVICE_LIST: "...,figure_caption_check"
  ```
- If needed, add a requirement (guidelines) file for the service by uploading it to its file path /requirements/figurecaptioncheck/requirement.txt in the cloud storage bucket.
  ```sh
  # example requirement.txt
  Figure num.num. Text text text
  Correct: Figure 1.1. Tech stack
  Incorrect: Figure 1-1 tech stack, Picture 1. Tech stack, No figure caption...
  ```
- The main algorithm can then be updated in /figurecaptioncheck/processor.py (a minimal sketch follows below).
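Here is a hypothetical skeleton for /figurecaptioncheck/processor.py, following the copy-a-service pattern above. The function name and regex are assumptions based on the example requirement.txt; a real service would also fetch the PDF from Google Cloud Storage, consume from RabbitMQ, and write results to PostgreSQL.

```python
# Hypothetical caption-checking core; not the repository's actual processor.
import re

# Matches the required format from the example guidelines: "Figure 1.1. Tech stack"
CAPTION_RE = re.compile(r"^Figure \d+\.\d+\. \S+")

def check_captions(lines):
    """Return caption-like lines that do not match the 'Figure num.num.' format."""
    errors = []
    for line in lines:
        if line.lower().startswith(("figure", "picture")) and not CAPTION_RE.match(line):
            errors.append(line)
    return errors

if __name__ == "__main__":
    sample = ["Figure 1.1. Tech stack", "Figure 1-1 tech stack", "Picture 1. Tech stack"]
    print(check_captions(sample))  # -> ['Figure 1-1 tech stack', 'Picture 1. Tech stack']
```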
7. Contact
- LinkedIn: https://www.linkedin.com/in/nhathuy1305
- Email: dangnhathuy.work@gmail.com
8. Acknowledgement
I would like to use this opportunity to thank Dr. Tran Thanh Tung from the bottom of my heart. His advice has been crucial throughout my thesis assignment. His insightful advice pushed me to develop my work in a much stronger direction, and I couldn't have done it without him.
I also want to thank the School of Computer Science and Engineering at the International University. The fundamental information covered in the curriculum gives me the confidence to carry out this study. It has been a pleasant experience, and I was able to participate in the Computer Science program and learn from experienced lecturers.
Owner
- Name: Dang Nhat Huy
- Login: Nhathuy1305
- Kind: user
- Location: Ho Chi Minh, Viet Nam
- Company: International University - VNUHCMC
- Website: https://daniel-cv.vercel.app
- Repositories: 3
- Profile: https://github.com/Nhathuy1305
💻 Implementing Ideas @ IUProjectTeam 🕵️♀️ Web App Dev | Data Science | DevOps ✍ DSA + AI (Learning and practicing) 🎓 CSE Undergraduate @ IU'20
Citation (citation/.env)
```sh
APP_NAME=citation
RABBITMQ_HOST="amqp://rabbitmq:5672"
RABBITMQ_USER=guest
RABBITMQ_PASSWORD=guest
RABBITMQ_PORT=5672
RABBITMQ_FILE_LOCATION_EXCHANGE=uploaded_file_location
RABBITMQ_OUTPUT_LOCATION_EXCHANGE=output_location
DATABASE_NAME=citation
DATABASE_USER=postgres
DATABASE_PASSWORD=123456
DATABASE_HOST=postgresql
DATABASE_PORT=5432
GOOGLE_CLOUD_STORAGE_BUCKET=thesis_file_bucket
GOOGLE_APPLICATION_CREDENTIALS=/app/google_credentials.json
ROOT_DIR=/app
```
GitHub Events
Total
- Fork event: 2
Last Year
- Fork event: 2
Dependencies
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- nginx stable-alpine build
- node 18-alpine build
- chapter_summarization latest
- chapter_title latest
- citation latest
- client latest
- format_check latest
- grammar latest
- page_count latest
- postgresql latest
- rabbitmq latest
- rest latest
- table_figure_detection latest
- table_of_content latest
- word_frequency latest
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- postgres 11-alpine build
- rabbitmq 3-management build
- node 18-alpine build
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- 1262 dependencies
- react-scripts ^5.0.1 development
- sass ^1.64.2 development
- axios ^1.4.0
- d3 ^7.8.5
- react ^18.2.0
- react-datepicker ^4.16.0
- react-dom ^18.2.0
- react-router-dom ^6.14.2
- 290 dependencies
- @types/amqplib ^0.10.1 development
- @types/cors ^2.8.13 development
- @types/express ^4.17.15 development
- @types/multer ^1.4.7 development
- @types/node ^18.11.18 development
- @types/pdfkit ^0.12.8 development
- @types/uuid ^9.0.0 development
- cors ^2.8.5 development
- nodemon ^2.0.20 development
- pdfkit ^0.13.0 development
- ts-node ^10.9.1 development
- typescript ^4.9.4 development
- uuid ^9.0.0 development
- @google-cloud/storage ^6.10.1
- amqplib ^0.10.3
- body-parser ^1.20.1
- dotenv ^16.0.3
- express ^4.18.2
- multer ^1.4.5-lts.1
- pg ^8.8.0
- prom-client ^15.1.0
- punycode ^2.3.1
- sequelize ^6.28.0
- google-cloud-storage *
- numpy *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- sumy *
- Levenshtein *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- refextract *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- language_tool_python *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *