https://github.com/anas-elhounsri/profile-based-ir-using-glove-embeddings
A small information retrieval project in which we use GloVe and machine learning algorithms to classify news articles and direct them to a set of generated users depending on their interests.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.7%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Profile-Based Information Retrieval with GloVe Embeddings
This is a small information retrieval project. We generate 5 sets of users with different interests in different news categories, use GloVe for word embedding, and train two classification models (SVM and Random Forest). The script takes the text of an article as input, classifies it, and determines which users would be interested in it.
The data processing steps and the evaluation results for both models are also in the notebook. The dataset was obtained from Kaggle.
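The final routing step (from a predicted category to the interested users) can be sketched as below. This is a minimal sketch, not the notebook's code: the user names, the category labels, and the `USER_INTERESTS` and `route_article` names are hypothetical, and in the real pipeline the category would come from the trained SVM or Random Forest model.

```python
from typing import Dict, List, Set

# Hypothetical user profiles: each user is interested in a few news categories.
# Names and category labels are illustrative, not the project's actual data.
USER_INTERESTS: Dict[str, Set[str]] = {
    "user_1": {"sport", "tech"},
    "user_2": {"politics", "business"},
    "user_3": {"entertainment"},
    "user_4": {"tech", "business"},
    "user_5": {"sport", "entertainment", "politics"},
}


def route_article(predicted_category: str,
                  profiles: Dict[str, Set[str]] = USER_INTERESTS) -> List[str]:
    """Return the users (sorted) whose interests include the predicted category."""
    return sorted(user for user, interests in profiles.items()
                  if predicted_category in interests)


if __name__ == "__main__":
    # Here the category is hard-coded; normally it is the classifier's prediction.
    print(route_article("tech"))  # ['user_1', 'user_4']
```

Keeping profiles as sets makes the membership check per user constant-time, so routing stays cheap even with many generated users.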
I used Google Colab to write this script (you can use Jupyter instead, but you will need to upload the dataset and the GloVe file to the notebook environment). To use it:
Data and Preprocessing
- Upload your data: Upload the dataset file (e.g., dataset.csv) containing news articles and their categories to your Google Drive.
- Specify Path: In your code, define the path to the dataset.csv file within your mounted Drive, for example:
```python
path = "/content/drive/your/path/dataset.csv"
```
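Once the path points at your file, the articles can be read and tokenized roughly as follows. (In Colab, the Drive is usually mounted first with `from google.colab import drive` and `drive.mount('/content/drive')`.) This is a hedged sketch: the column names `text` and `category` are assumptions about the Kaggle CSV header, and the lowercase alphabetic tokenizer is a simple stand-in that may differ from the notebook's actual preprocessing.

```python
import csv
import re
from typing import Iterable, List, Tuple

# Simple tokenizer: lowercase, keep only runs of ASCII letters (an assumption).
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and return its alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def load_articles(lines: Iterable[str],
                  text_col: str = "text",
                  label_col: str = "category") -> List[Tuple[List[str], str]]:
    """Read (tokens, category) pairs from CSV lines.

    The default column names are hypothetical; change them to match
    the header of your dataset.csv.
    """
    reader = csv.DictReader(lines)
    return [(tokenize(row[text_col]), row[label_col]) for row in reader]


if __name__ == "__main__":
    # With the real file: load_articles(open(path, newline="", encoding="utf-8"))
    sample = ['category,text', 'tech,"New GPUs speed up AI training!"']
    print(load_articles(sample))
    # [(['new', 'gpus', 'speed', 'up', 'ai', 'training'], 'tech')]
```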
GloVe Word Embeddings
- Download Pre-trained Model: Since we're using a pre-trained GloVe model, go to the Stanford University GloVe project page and download the appropriate GloVe model file (in my case, glove.6B.zip).
- Upload to Drive: Upload the downloaded glove.6B.zip file to the same location in your Google Drive where you uploaded the dataset.
- Specify Path: In your notebook, unzip the glove.6B.zip file using its path within your mounted Drive:
```bash
# The leading ! runs this as a shell command from a Colab/Jupyter cell
!unzip /content/drive/your/path/glove.6B.zip
```
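glove.6B.zip unpacks into plain-text files (glove.6B.50d.txt, glove.6B.100d.txt, glove.6B.200d.txt and glove.6B.300d.txt), each line holding a word followed by its vector components. A minimal sketch of loading such a file and turning an article into one fixed-size feature vector is shown below. Averaging word vectors is a common choice but an assumption about what the notebook does, and `load_glove` and `document_vector` are hypothetical helper names.

```python
import io
from typing import Dict, Iterable, List


def load_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe's plain-text format: a word followed by its vector components."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def document_vector(tokens: List[str],
                    vectors: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(values) / len(known) for values in zip(*known)]


if __name__ == "__main__":
    # Toy 2-d vectors; with the real data you would pass
    # open("glove.6B.100d.txt", encoding="utf-8") after unzipping.
    glove = load_glove(io.StringIO("news 1.0 2.0\nsport 3.0 4.0\n"))
    print(document_vector(["news", "sport", "unknownword"], glove, dim=2))  # [2.0, 3.0]
```

Out-of-vocabulary tokens are simply skipped, so every article maps to a vector of the same length, which is what classifiers such as SVM and Random Forest expect as input.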
This project offers a starting point for learning about information retrieval. There is still plenty of room to explore further by:
- Training our own GloVe embeddings instead of using pre-trained ones.
- Implementing additional features.
- Experimenting with different classification algorithms.
- Using a larger and more comprehensive dataset.
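Training our own GloVe model would start from a word-word co-occurrence matrix built from the news corpus; GloVe then fits word vectors so that their dot products (plus biases) approximate the log co-occurrence counts, weighting each pair with a function that caps the influence of very frequent pairs. The sketch below covers only those two ingredients, the distance-weighted counting (a pair at distance d adds 1/d, as in the GloVe paper) and the weighting function with the paper's defaults x_max = 100 and alpha = 0.75. The window size, toy sentence, and function names are illustrative, and the actual vector fitting is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def cooccurrence_counts(sentences: List[List[str]],
                        window: int = 2) -> Dict[Tuple[str, str], float]:
    """Symmetric co-occurrence counts; a pair at distance d contributes 1/d."""
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for tokens in sentences:
        for i, center in enumerate(tokens):
            # Look only to the right and record both directions,
            # so each pair of positions is visited exactly once.
            for d in range(1, window + 1):
                j = i + d
                if j >= len(tokens):
                    break
                counts[(center, tokens[j])] += 1.0 / d
                counts[(tokens[j], center)] += 1.0 / d
    return dict(counts)


def glove_weight(x: float, x_max: float = 100.0, alpha: float = 0.75) -> float:
    """GloVe's weighting f(x): (x / x_max) ** alpha below x_max, otherwise 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


if __name__ == "__main__":
    counts = cooccurrence_counts([["stocks", "rise", "today"]], window=2)
    print(counts[("stocks", "rise")], counts[("stocks", "today")])  # 1.0 0.5
```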
Owner
- Name: Papa Zodd
- Login: Anas-Elhounsri
- Kind: user
- Repositories: 1
- Profile: https://github.com/Anas-Elhounsri
I'm a CE student aspiring to become a Solution Architect with a Data Science and Deep Learning orientation.