automating-poi-categorization-agcg
A hybrid system combining semantic embeddings and rule-based logic to classify Points of Interest (POIs) using a hierarchical category tree.
https://github.com/project-terraforma/automating-poi-categorization-agcg
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.3%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Automating POI Categorization
A hybrid system combining semantic embeddings and rule-based logic to classify Points of Interest (POIs) using a hierarchical category tree.
Description
This project automatically categorizes POIs (such as restaurants, gyms, or clinics) into a structured taxonomy. POI data is collected by scraping publicly available information from business websites. For each POI, the business name and the scraped website content are combined into a descriptive text input.
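As a rough illustration of how a business name and page content could be combined into one input string, here is a minimal sketch. It is not the project's scraper: it uses the standard-library `html.parser` as a stand-in for BeautifulSoup so that it runs without extra packages, and the names `build_poi_input` and `max_chars` are hypothetical.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collect text nodes, skipping <script>, <style>, and <noscript> content."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def build_poi_input(name, html, max_chars=500):
    """Join the POI name with whitespace-normalized, truncated page text.

    `max_chars` is an assumed cap to keep noisy pages from dominating the input.
    """
    parser = _VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return f"{name}. {text[:max_chars]}".strip()
```

In a real run the HTML would come from an HTTP fetch (for example with `requests`); here it is simply passed in as a string so the text-extraction step can be tried in isolation.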
The model then uses Sentence-BERT (SBERT) embeddings to match this input against a tree of categories and subcategories. A rule-based scoring system that matches category-specific keywords supplements the embedding match to improve prediction accuracy. The combination is intended to keep POI classification scalable, interpretable, and flexible, especially when the data is sparse or noisy.
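The sketch below shows one way such a hybrid score could be combined while walking a category tree. It rests on several assumptions that are not taken from the repository: a greedy top-down descent, a linear blend controlled by a hypothetical `weight` parameter, a keyword score defined as the fraction of matched keywords, and toy two-dimensional vectors in place of real SBERT embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def keyword_score(text, keywords):
    """Fraction of a category's keywords that occur in the text."""
    if not keywords:
        return 0.0
    lowered = text.lower()
    return sum(kw.lower() in lowered for kw in keywords) / len(keywords)


def classify(text, text_vec, node, weight=0.3):
    """Greedy descent: at each level, follow the child with the best blended score."""
    path = []
    while node.get("children"):
        node = max(
            node["children"],
            key=lambda c: (1 - weight) * cosine(text_vec, c["vec"])
            + weight * keyword_score(text, c.get("keywords", [])),
        )
        path.append(node["name"])
    return path


# Toy tree with made-up vectors. In a real run the vectors might come from
# sentence_transformers, e.g. SentenceTransformer("all-MiniLM-L6-v2").encode(...);
# that model name is an assumption, not the project's documented choice.
tree = {
    "name": "root",
    "children": [
        {
            "name": "Food",
            "vec": [1.0, 0.0],
            "keywords": ["menu", "dine"],
            "children": [
                {"name": "Restaurant", "vec": [0.9, 0.1], "keywords": ["restaurant"]},
                {"name": "Cafe", "vec": [0.8, 0.3], "keywords": ["coffee", "espresso"]},
            ],
        },
        {"name": "Health", "vec": [0.0, 1.0], "keywords": ["clinic", "gym"]},
    ],
}

text = "Blue Door Cafe. Fresh coffee and espresso, see our menu"
text_vec = [0.9, 0.2]
print(classify(text, text_vec, tree))  # ['Food', 'Cafe']
```

In this toy data the embedding alone slightly prefers "Restaurant" over "Cafe"; the keyword matches tip the choice, which is the kind of correction a rule-based layer is meant to provide. The matched keywords also give a readable reason for the prediction.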
Getting Started
Dependencies
- Operating System: Windows 10, macOS, or Linux (Python 3.8+ recommended)
- Python Libraries:
  - transformers – for loading sentence embedding models
  - sentence-transformers – high-level wrapper for semantic embedding
  - torch – backend for running SBERT model inference
  - pandas – used for handling and filtering POI datasets
  - numpy – array computations for scoring and embedding math
  - beautifulsoup4 – used for HTML parsing in the web scraper
  - requests – makes HTTP calls to fetch POI websites
  - jupyter – local interactive development
  - google-colab – cloud-based alternative to Jupyter Notebooks
Setup
1. Clone the repo
   ```sh
   git clone https://github.com/project-terraforma/Automating-POI-Categorization-AGCG.git
   ```
2. Navigate into the project directory
   ```sh
   cd Automating-POI-Categorization-AGCG
   ```
3. Install Python dependencies
   Make sure you have a Python environment set up, then install required packages:
   ```sh
   pip install -r requirements.txt
   ```
4. Prevent accidental pushes to the base repository
   Change the Git remote to your own fork or local version:
   ```sh
   git remote set-url origin https://github.com/<your-username>/<your-repo-name>.git
   git remote -v # Confirm the remote URL was updated
   ```
5. Start the project
   Launch Jupyter Notebook and open the main notebook to begin:
   ```sh
   jupyter notebook
   ```
Navigate to the notebooks/ folder and open main.ipynb.
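If the notebook fails on import, a quick check like the one below can show which packages did not install. This is an optional sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module list assumes the usual import names (`bs4` for beautifulsoup4, `sentence_transformers` for sentence-transformers).

```python
from importlib.util import find_spec

# Import names, which differ from some of the PyPI package names.
REQUIRED_MODULES = ["bs4", "numpy", "pandas", "requests", "sentence_transformers"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the top-level modules that cannot be found in this environment."""
    return [m for m in modules if find_spec(m) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Running it inside the same environment used for `jupyter notebook` matters, since a package installed in a different environment will still show up as missing.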
Authors
Adam Axtopani Gonzales – adamurlnum2@gmail.com
Carlos Garcia
Version History
- 0.1
- Initial Release
Acknowledgments
[ Project Sponsor ] Overture Maps Foundation
Sponsored this project and gave us the opportunity to approach this problem with their open-source data
Overture Maps POCs from Microsoft Corporation
Krill Fedotov, Marko Radoicic, & Nikola Bozovic
A source of guidance and expertise when tackling this project together
Owner
- Name: project-terraforma
- Login: project-terraforma
- Kind: organization
- Repositories: 1
- Profile: https://github.com/project-terraforma
Citation (CITATION.cff)
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
GitHub Events
Total
- Push event: 9
Last Year
- Push event: 9
Dependencies
- beautifulsoup4 *
- numpy *
- overturemaps *
- pandas *
- pyarrow *
- requests *
- sentence-transformers *