thesis-report-management-system-ci
A microservices Thesis Report Management System tailored for the School of Computer Science and Engineering at the International University
https://github.com/nhathuy1305/thesis-report-management-system-ci
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.0%) to scientific vocabulary
Keywords
Repository
A microservices Thesis Report Management System tailored for the School of Computer Science and Engineering at the International University
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
INTEGRATING DEVOPS AND NATURAL LANGUAGE PROCESSING TO STREAMLINE THESIS MANAGEMENT
This is the CI Pipeline Repository. For the CD Pipeline, check it out here.
1. About The Project
The current thesis report systems at universities lack advanced tools for content analysis and grammar detection; the tools that do exist are limited to faculty and difficult to integrate. This leads to errors and inefficiencies, especially during peak submission times.
I propose a new system that standardizes content analysis, grammar correction, and format validation, accessible to both students and faculty. This would reduce faculty workload and improve the quality of student work.
Such a system would create a fair academic environment, providing essential resources and enabling timely feedback.
1.1. Features
This system consists of:
- A thesis submission point that allows multiple attempts.
- Evaluation services that generate comprehensive feedback based on the provided thesis writing guidelines.
- An interface that displays thesis document information alongside its visualized evaluation, customized by user role.
- Automated build, test, and deployment of the application services.
- Automated deployment and management of application services on Kubernetes clusters.
- Running of application services on the infrastructure (virtual machines).
- Collection and visualization of metrics on application performance and health.
1.2. Built With
The primary tools used to develop this application are:
- Node.js
- Express.js
- React
- RabbitMQ
- Python libraries
- PostgreSQL
- Google Cloud Storage

The open-source tools used for CI/CD, automated testing, orchestration, operations, and monitoring are:
- Docker
- Jenkins
- SonarQube
- Trivy
- ArgoCD
- Kubernetes
- Prometheus
- Grafana
2. Getting Started
To set up and run the project locally, the following prerequisites should be satisfied before moving to the installation steps.
2.1. Prerequisites
Make sure the tools below are installed. The linked instructions cover different operating systems:
- Docker: encapsulates the dependencies that are needed to run the application (https://docs.docker.com/get-docker)
- Git: allows for cloning of this repository (https://git-scm.com/book/en/v2/Getting-Started-Installing-Git)
2.2. Installation
- Clone the repository.
  ```sh
  git clone https://github.com/Nhathuy1305/Thesis-Report-Management-System-CI.git
  ```
- Create a Google Cloud Storage bucket named thesisfilebucket.
- Upload the requirements folder to this bucket.
- Copy your Google Cloud credentials to google_credentials.json.
- Edit the submission deadline to your preferred date at line 193 of /postgresql/init.sql. It is recommended that this value be after the current date on your system; otherwise, submission will be closed.
  ```sql
  INSERT INTO public.deadline (deadline) VALUES (deadline);
  ```
- cd to the file path of this project.
  ```sh
  cd file_path
  ```
- Run the Docker Compose command.
  ```sh
  docker-compose up -d
  ```
- If you want to stop the program:
  ```sh
  docker-compose down
  ```
3. Architecture
Before any implementation can begin, all of the system's requirements and its architecture must be identified. The subsections below describe the structural decisions of the system in detail.
3.1. Use Case Diagram
The student and the teacher, both specializations of the actor, are the system's main roles. Each has a few activities of its own, while the others are shared between the two.
3.2. Simplified System Architecture
The client side provides an interface for entering data and viewing results. A REST API answers these requests and publishes messages to the message broker, which routes and forwards them to the analysis services. Both these services and the REST API have access to the file system to save and retrieve results.
3.3. Simplified Event-driven Architecture
The system comprises RabbitMQ for event brokering, various services as event producers (e.g., Format Checking and Grammar Detection), and event consumers that process and validate theses, with PostgreSQL for secure data storage.
3.4. Event-driven Microservices Architecture
The system uses RabbitMQ for message brokering, connecting modular services via a REST API built with Express.js, which handles client interactions and database integration for user data and metadata storage. Google Cloud Storage manages larger files, while RabbitMQ facilitates communication through fanout and direct exchanges, distributing tasks to various analysis services operating independently. Data flows from the REST API through RabbitMQ to analysis services, with outputs stored in Google Cloud Storage or returned to clients.
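As a concrete illustration, here is a minimal sketch in Python with pika (which the analysis services already list as a dependency) of the fanout pattern described above. The exchange name uploaded_file_location and the host rabbitmq are taken from the citation service's .env shown later in this document; the file path and callback body are assumptions for illustration only.

```python
# Sketch of the fanout distribution step, assuming the exchange and host
# names from citation/.env. Producer and consumer run in separate processes.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
channel = connection.channel()

# Fanout exchange: every bound analysis service receives a copy of the message.
channel.exchange_declare(exchange="uploaded_file_location", exchange_type="fanout")

# Consumer side (in an analysis service process): bind a private queue.
result = channel.queue_declare(queue="", exclusive=True)
channel.queue_bind(exchange="uploaded_file_location", queue=result.method.queue)

def on_message(ch, method, properties, body):
    print(f"Analyzing {body.decode()} ...")  # the service's processor would run here

channel.basic_consume(queue=result.method.queue,
                      on_message_callback=on_message,
                      auto_ack=True)

# Producer side (in the REST API process): broadcast a new submission's location.
channel.basic_publish(exchange="uploaded_file_location",
                      routing_key="",  # ignored by fanout exchanges
                      body="theses/ITITIU20043/report.pdf")  # hypothetical path

channel.start_consuming()
```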
3.5. Database Design
Data management is crucial for any system, starting with a well-organized database to ensure speed and functionality. This system uses a relational database architecture to support complex queries, such as a teacher viewing all thesis submissions by students. There are two primary database structures: one for analytical services and one for the REST APIs (Express.js).
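For illustration, here is a hedged sketch of the per-service result table pattern, using psycopg2 (pinned in the services' requirements). The connection values come from the citation service's .env shown later; the table and column names are hypothetical, since the real schema lives in /postgresql/init.sql.

```python
# Hypothetical result table for one analysis service; not the actual schema.
import psycopg2

conn = psycopg2.connect(host="postgresql", port=5432, dbname="citation",
                        user="postgres", password="123456")
with conn, conn.cursor() as cur:
    # One table per analysis service: keyed by submission, holding its output.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS citation_result (
            submission_id TEXT PRIMARY KEY,
            result        JSONB,
            created_at    TIMESTAMP DEFAULT now()
        );
    """)
conn.close()
```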
3.6. Git Workflow
The updates in Git trigger the automated build and push of container images to a registry. DevOps tooling synchronizes these updates with the Kubernetes cluster.
3.7. CI/CD Pipeline and Cluster Layers
Continuous Integration (CI) and Continuous Deployment (CD) are managed using Jenkins, with ArgoCD automatically monitoring and deploying changes to the Kubernetes cluster. Prometheus and Grafana handle monitoring, providing real-time insights into the application's performance and health. Code quality is ensured through SonarQube during the CI process.
3.8. Kubernetes Cluster
The Kubernetes cluster breaks the system down into smaller components, each of which acts independently and has a specific role. Such an approach is modular: each component can be developed, deployed, and scaled independently, so one service can change without changing the whole system.
Docker containerization packages all dependencies, the runtime environment, and the code for each microservice. It ensures that each environment consistently executes the same way anywhere, making deployment and maintenance easier.
4. Usage
Access http://localhost:3000 on your browser. Sign in with any student ID, instructor ID or admin ID given in the /postgresql/init.sql file.
Examples:
- Student ID: ITITIU20043
- Instructor ID: ITITEACH001
- Admin ID: ITITADMIN01
Provide inputs to the thesis submission form.
Select any service on the left-hand navigation area to view the evaluation.
Options are available to download the thesis document and the services' results, view the guidelines, and resubmit.
Instructors can view the thesis reports and evaluations of students they supervise and give manual feedback.
Admins can edit the deadline, access the list of student submissions, and send notifications to students and instructors.
5. Evaluation
The chart above illustrates the performance of the Kubernetes node across three different tests, comparing the total response time and throughput as the number of requests increases from 100 to 100,000. While the total time increases with the number of requests, the throughput also substantially rises, indicating the node's effective scaling and handling capabilities under varying loads.
In this part, I tested all the Python services, specifically their upload functionality, by sending ten reports from each user to the system. The CPU utilization of node 3 (worker 2) peaked at approximately 97.3%. The controller node was crucial in this scenario, facilitating data distribution between workers 1 and 2. This spike in CPU usage, depicted in the graph, reflects the system's response to the increased workload and underscores the effectiveness of the controller node in managing and balancing the load between the worker nodes.
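For reproducibility, here is a minimal sketch of how such a load test might be scripted in Python. The endpoint path and form fields are assumptions, not the actual REST API routes.

```python
# Hedged load-test sketch: measures total time and throughput for N uploads.
# The URL and payload are hypothetical; adapt them to the real REST API.
import time
import requests

N = 100  # scale toward 100,000 for the largest test in the chart
start = time.time()
for _ in range(N):
    with open("report.pdf", "rb") as f:
        requests.post("http://localhost:8000/upload",  # hypothetical endpoint
                      files={"file": f},
                      data={"student_id": "ITITIU20043"})
elapsed = time.time() - start
print(f"total time: {elapsed:.1f}s, throughput: {N / elapsed:.1f} req/s")
```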
6. Future Work
The architecture used to develop this system allows services to be easily updated, added, or removed thanks to loose coupling.
New services can be added with the following steps. For example, a service that detects and checks figure captions:
- Copy a folder of any service and assign it a new name, for example "figurecaptioncheck".
- Add the information for this service in the docker-compose.yml file.
- Add a table for it in /postgresql/init.sql.
- Update the name of this service in the client's environment variable.
  ```sh
  REACT_APP_SERVICE_LIST: "...,figure_caption_check"
  ```
- Update the name of this service in the backend's environment variable.
  ```sh
  SERVICE_LIST: "...,figure_caption_check"
  ```
- If needed, add a requirement (guidelines) file for the service by uploading it to its file path /requirements/figurecaptioncheck/requirement.txt in the cloud storage bucket.
  ```sh
  # example requirement.txt
  Figure num.num. Text text text
  Correct: Figure 1.1. Tech stack
  Incorrect: Figure 1-1 tech stack, Picture 1. Tech stack, No figure caption...
  ```
- The main algorithm can then be updated in /figurecaptioncheck/processor.py (a minimal sketch follows below).
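Here is a hypothetical skeleton for /figurecaptioncheck/processor.py, following the copy-a-service pattern above. The function name and regex are assumptions based on the example requirement.txt; a real service would also fetch the PDF from Google Cloud Storage, consume from RabbitMQ, and write results to PostgreSQL.

```python
# Hypothetical caption-checking core; not the repository's actual processor.
import re

# Matches the required format from the example guidelines: "Figure 1.1. Tech stack"
CAPTION_RE = re.compile(r"^Figure \d+\.\d+\. \S+")

def check_captions(lines):
    """Return caption-like lines that do not match the 'Figure num.num.' format."""
    errors = []
    for line in lines:
        if line.lower().startswith(("figure", "picture")) and not CAPTION_RE.match(line):
            errors.append(line)
    return errors

if __name__ == "__main__":
    sample = ["Figure 1.1. Tech stack", "Figure 1-1 tech stack", "Picture 1. Tech stack"]
    print(check_captions(sample))  # -> ['Figure 1-1 tech stack', 'Picture 1. Tech stack']
```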
7. Contact
- LinkedIn: https://www.linkedin.com/in/nhathuy1305
- Email: dangnhathuy.work@gmail.com
8. Acknowledgement
I would like to use this opportunity to thank Dr. Tran Thanh Tung from the bottom of my heart. His advice has been crucial throughout my thesis assignment. His insightful advice pushed me to develop my work in a much stronger direction, and I couldn't have done it without him.
I also want to thank the School of Computer Science and Engineering at the International University. The fundamental information covered in the curriculum gives me the confidence to carry out this study. It has been a pleasant experience, and I was able to participate in the Computer Science program and learn from experienced lecturers.
Owner
- Name: Dang Nhat Huy
- Login: Nhathuy1305
- Kind: user
- Location: Ho Chi Minh, Viet Nam
- Company: International University - VNUHCMC
- Website: https://daniel-cv.vercel.app
- Repositories: 3
- Profile: https://github.com/Nhathuy1305
💻 Implementing Ideas @ IUProjectTeam 🕵️♀️ Web App Dev | Data Science | DevOps ✍ DSA + AI (Learning and practicing) 🎓 CSE Undergraduate @ IU'20
Citation (citation/.env)
```sh
APP_NAME=citation
RABBITMQ_HOST="amqp://rabbitmq:5672"
RABBITMQ_USER=guest
RABBITMQ_PASSWORD=guest
RABBITMQ_PORT=5672
RABBITMQ_FILE_LOCATION_EXCHANGE=uploaded_file_location
RABBITMQ_OUTPUT_LOCATION_EXCHANGE=output_location
DATABASE_NAME=citation
DATABASE_USER=postgres
DATABASE_PASSWORD=123456
DATABASE_HOST=postgresql
DATABASE_PORT=5432
GOOGLE_CLOUD_STORAGE_BUCKET=thesis_file_bucket
GOOGLE_APPLICATION_CREDENTIALS=/app/google_credentials.json
ROOT_DIR=/app
```
GitHub Events
Total
- Fork event: 2
Last Year
- Fork event: 2
Dependencies
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- nginx stable-alpine build
- node 18-alpine build
- chapter_summarization latest
- chapter_title latest
- citation latest
- client latest
- format_check latest
- grammar latest
- page_count latest
- postgresql latest
- rabbitmq latest
- rest latest
- table_figure_detection latest
- table_of_content latest
- word_frequency latest
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- postgres 11-alpine build
- rabbitmq 3-management build
- node 18-alpine build
- python 3.9-slim-bullseye build
- python 3.9-slim-bullseye build
- 1262 dependencies
- react-scripts ^5.0.1 development
- sass ^1.64.2 development
- axios ^1.4.0
- d3 ^7.8.5
- react ^18.2.0
- react-datepicker ^4.16.0
- react-dom ^18.2.0
- react-router-dom ^6.14.2
- 290 dependencies
- @types/amqplib ^0.10.1 development
- @types/cors ^2.8.13 development
- @types/express ^4.17.15 development
- @types/multer ^1.4.7 development
- @types/node ^18.11.18 development
- @types/pdfkit ^0.12.8 development
- @types/uuid ^9.0.0 development
- cors ^2.8.5 development
- nodemon ^2.0.20 development
- pdfkit ^0.13.0 development
- ts-node ^10.9.1 development
- typescript ^4.9.4 development
- uuid ^9.0.0 development
- @google-cloud/storage ^6.10.1
- amqplib ^0.10.3
- body-parser ^1.20.1
- dotenv ^16.0.3
- express ^4.18.2
- multer ^1.4.5-lts.1
- pg ^8.8.0
- prom-client ^15.1.0
- punycode ^2.3.1
- sequelize ^6.28.0
- google-cloud-storage *
- numpy *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- sumy *
- Levenshtein *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- refextract *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- language_tool_python *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *
- google-cloud-storage *
- pdfplumber *
- pika *
- psycopg2-binary ==2.9.1
- python-dotenv *