platform
Platform for machine learning experiments developed in the project NEWSGAC
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to arxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: newsgac
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: master
- Homepage: https://research-software.nl/projects/newsgac
- Size: 3.83 MB
Statistics
- Stars: 5
- Watchers: 5
- Forks: 1
- Open Issues: 29
- Releases: 1
Metadata Files
README.md
NEWSGAC
NEWSGAC is a research project which aims at transparent automatic classification of genres of newspaper articles. The project is a cooperation between the University of Groningen, the Amsterdam Center for Mathematics and Computer Science and the Netherlands eScience Center.
In the project, we developed an online platform for applying machine learning models to text data, with the opportunity to closely analyze the performance of the models. This repository contains the code of this platform.
Setup Instructions
To run the platform on your computer, you need to have Docker available on your system. Then execute the following commands in a command line environment (instructions for Linux):
git clone https://github.com/newsgac/platform.git
cd platform
docker build . -t "newsgac/newsgac"
export $(egrep -v '^#' .env.default | xargs)
docker stack deploy -c docker-compose.yml -c docker-compose.dev.yml newsgacdev
When these commands have completed successfully, the platform will be available as a web server at the address: http://YOUR-IP-ADDRESS:5050
Steps 1, 2 and 3 (clone the repository, change into its directory, build the image) need to be executed only once to install the system. Steps 4 and 5 (export the environment variables, deploy the stack) are required each time you start the system.
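The export command in step 4 reads KEY=VALUE pairs from .env.default and skips comment lines. The sketch below shows the same idea in Python. It is an illustration only, not part of the platform: the function names `load_env_file` and `apply_env` are hypothetical, and it only changes the environment of the current Python process, not of the shell that runs docker stack deploy.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Return the KEY=VALUE pairs of an env file, skipping comments and blanks."""
    variables = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (the shell one-liner filters
        # lines that start with '#').
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        variables[key.strip()] = value.strip()
    return variables


def apply_env(variables, environ=os.environ):
    """Copy the parsed variables into the given environment mapping."""
    environ.update(variables)
    return environ
```

A possible use would be `apply_env(load_env_file(".env.default"))` at the top of a helper script, assuming the file contains plain KEY=VALUE lines without shell quoting.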
Optional steps:
- For adaptation to the local environment: edit the file .env.default or create your own version
- For Jupyter notebook support:
docker build . -f jupyter/Dockerfile -t "newsgac/jupyterhub"
- During production:
docker build ./nginx -t newsgac/nginx
- Installation instructions for usage of a Kubernetes cluster (CLARIAH)
Stopping the system:
docker service rm newsgacdev_database newsgacdev_frog newsgacdev_frogworker newsgacdev_redis newsgacdev_web newsgacdev_worker
Note that it takes a few seconds to completely stop all parts of the system.
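The service names in the command above follow the pattern STACK_SERVICE that docker stack deploy uses, with newsgacdev as the stack name. As a minimal sketch, the same command can be assembled in Python for a different stack name. The helper names `stop_command` and `stop_stack` are hypothetical, and the list of services is copied from the command above rather than discovered from Docker.

```python
import subprocess

# Service suffixes copied from the `docker service rm` command above.
SERVICES = ["database", "frog", "frogworker", "redis", "web", "worker"]


def stop_command(stack="newsgacdev", services=SERVICES):
    """Build the `docker service rm` argument list for one stack."""
    return ["docker", "service", "rm"] + [f"{stack}_{name}" for name in services]


def stop_stack(stack="newsgacdev"):
    """Run the command; needs a Docker CLI with access to the swarm."""
    return subprocess.run(stop_command(stack), check=False).returncode
```

Calling `stop_command()` with the defaults yields the same arguments as the command shown above.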
Run flask web app locally (through IDE)
You might want to run flask outside of Docker, for example because it is easier to attach a debugger.
- Follow the Setup Instructions for Docker so that all services are online (Mongo, Redis, FROG, celery workers).
- Make sure the flask docker container is DOWN:
docker service rm newsgacdev_web
- Set up a virtual environment (python 3.7) and install the requirements:
pip install -r requirements.txt
- Set the correct environment variables (.env.local), e.g. by running
export $(cat .env.local | xargs)
- To run from the command line, navigate to platform/ and run:
PYTHONPATH=. python newsgac/app.py
- The local web server will be running on http://localhost:5050.
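After starting the app it can take a moment before the web server answers. The following sketch polls the address until it responds. It is a hedged illustration using only the Python standard library: the function names, the number of attempts and the delay are assumptions, not settings of the platform.

```python
import time
import urllib.error
import urllib.request

# An opener without proxies, so that a local address is contacted directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def server_is_up(url, timeout=2.0):
    """Return True if the URL answers with any HTTP response."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404); it is running.
        return True
    except (urllib.error.URLError, OSError):
        return False


def wait_for_server(url="http://localhost:5050", attempts=10, delay=1.0):
    """Poll the URL until it responds or the attempts run out."""
    for attempt in range(attempts):
        if server_is_up(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For the Docker setup the same check would apply to http://YOUR-IP-ADDRESS:5050.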
Debugging tasks
Typically tasks are executed by celery workers. If you want to debug a task you can do one of two things:
- Run a celery worker in debug mode
- Make sure CELERY_EAGER=True (or unset). This will cause celery to run tasks in the main thread instead of offloading them to workers.
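How the platform reads CELERY_EAGER is not shown here, so the following is only a sketch of how such a flag could be interpreted. The function name `celery_eager` and the accepted spellings of "false" are assumptions. An unset variable is treated as eager, following the "(or unset)" remark above.

```python
import os

# Spellings treated as "not eager"; this list is an assumption.
_FALSE_VALUES = {"0", "false", "no", "off"}


def celery_eager(environ=os.environ):
    """Interpret CELERY_EAGER; an unset variable counts as eager."""
    raw = environ.get("CELERY_EAGER")
    if raw is None:
        return True
    return raw.strip().lower() not in _FALSE_VALUES


# With a Celery application object `app`, the flag would typically be applied as:
#     app.conf.task_always_eager = celery_eager()
#     app.conf.task_eager_propagates = True  # re-raise task errors in the caller
```

With eager execution a breakpoint set inside a task is hit in the same process that runs the flask app, which is what makes this mode convenient for debugging.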
Running the tests (Docker)
docker run --name=mongo -it --rm -d mongo
docker run \
  --name=newsgactest \
  -it \
  --network=container:mongo \
  --mount type=bind,src="$(pwd)"/newsgac,destination=/newsgac/newsgac \
  --entrypoint=sh \
  newsgac/newsgac -c "pytest --cov=newsgac --cov-report=xml"
docker stop newsgactest mongo
Running the tests (Local)
- Set up a local (virtual) environment as when running flask locally
- Load the test env vars:
export $(cat .env.test | xargs)
- Make sure the database, Frog and redis are running (e.g. docker stack deploy -c docker-compose.yml -c docker-compose.dev.yml newsgacdev)
- Load env variables, then run tests using
pytest .
Python console
E.g. to create a user:
- Start the console using docker (or from your local environment using python):
docker exec -it newsgac_dev web python
- Import database & user model
from newsgac import database
from newsgac.users.models import User
- Create new user
u = User(email='testuser@test.com', password='testtest', name='Test', surname='User')
u.save()
- You can now login from the frontend as this user.
Useful commands
docker stack ps newsgacdev
docker service ps newsgacdev_worker
docker service inspect newsgacdev_worker
docker service logs newsgacdev_worker
References
A. Bilgin, E. Tjong Kim Sang, K. Smeenk, L. Hollink, J. van Ossenbruggen, F. Harbers and M. Broersma, Utilizing a Transparency-driven Environment toward Trusted Automatic Genre Classification: A Case Study in Journalism History (2018)
@inproceedings{bilgin2018utilizing,
title={Utilizing a Transparency-driven Environment toward Trusted Automatic Genre Classification: A Case Study in Journalism History},
author={Bilgin, Aysenur and Tjong Kim Sang, Erik and Smeenk, Kim and Hollink, Laura and van Ossenbruggen, Jacco and Harbers, Frank and Broersma, Marcel},
booktitle={2018 IEEE 14th International Conference on e-Science (e-Science)},
pages={486--496},
year={2018},
organization={IEEE}
}
K. Smeenk, A. Bilgin, T. Klaver, E. Tjong Kim Sang, L. Hollink, J. van Ossenbruggen, F. Harbers and M. Broersma, Grounding Paradigmatic Shifts In Newspaper Reporting In Big Data. Analysing Journalism History By Using Transparent Automatic Genre Classification (2019)
@inproceedings{smeenk2019dh,
author = "Kim Smeenk and Aysenur Bilgin and Tom Klaver and Erik Tjong Kim Sang and Laura Hollink and Jacco van Ossenbruggen and Frank Harbers and Marcel Broersma",
title = "{Grounding Paradigmatic Shifts In Newspaper Reporting In Big Data. Analysing Journalism History By Using Transparent Automatic Genre Classification}",
booktitle = "{Digital Humanities Conference 2019 (DH2019)}",
publisher = "{Utrecht, The Netherlands}",
year = "2019"
}
T. Klaver, E. Tjong Kim Sang, A. Bilgin, K. Smeenk, L. Hollink, J. van Ossenbruggen, F. Harbers and M. Broersma, Introducing a transparency-driven platform for creating, comparing and explaining machine learning pipelines (2019)
@inproceedings{klaver2019ictopen,
author = "Tom Klaver and Erik Tjong Kim Sang and Aysenur Bilgin and Kim Smeenk and Laura Hollink and Jacco van Ossenbruggen and Frank Harbers and Marcel Broersma",
title = "{Introducing a transparency-driven platform for creating, comparing and explaining machine learning pipelines}",
booktitle = "{ICT-Open}",
publisher = "{Hilversum, The Netherlands}",
year = "2019",
note = "(demo presentation abstract)"
}
Contributors
- Aysenur Bilgin (aysenur.bilgin@cwi.nl)
- Erik Tjong Kim Sang (e.tjongkimsang@esciencecenter.nl)
- Tom Klaver (t.klaver@esciencecenter.nl)
Owner
- Name: NEWSGAC
- Login: newsgac
- Kind: organization
- Email: erikt@xs4all.nl
- Location: Amsterdam, The Netherlands
- Website: https://www.esciencecenter.nl/project/newsgac
- Repositories: 2
- Profile: https://github.com/newsgac
NEWSGAC project, Netherlands eScience Center
Citation (CITATION.cff)
# YAML 1.2
---
abstract: "Platform for machine learning experiments developed in the project NEWSGAC"
authors:
  -
    affiliation: CWI
    family-names: Bilgin
    given-names: Aysenur
  -
    affiliation: "Netherlands eScience Center"
    family-names: Klaver
    given-names: Tom
  -
    affiliation: "Netherlands eScience Center"
    family-names: "Tjong Kim Sang"
    given-names: Erik
    orcid: "https://orcid.org/0000-0002-8431-081X"
cff-version: "1.1.0"
date-released: 2021-03-15
license: "Apache-2.0"
message: "If you use this software, please cite it using these metadata."
title: "NEWSGAC Platform"
version: "1.0"
...
GitHub Events
Total
- Delete event: 23
- Issue comment event: 26
- Push event: 2,064
- Pull request event: 49
- Create event: 26
Last Year
- Delete event: 23
- Issue comment event: 26
- Push event: 2,064
- Pull request event: 49
- Create event: 26
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 17
- Average time to close issues: N/A
- Average time to close pull requests: 8 days
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.82
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 17
- Average time to close issues: N/A
- Average time to close pull requests: 8 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.82
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- pyup-bot (67)
Dependencies
- python 3.7 build