cdhf
CERN Data Handling Framework is a small framework for working with the CERN Anonymized Mattermost Dataset
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 7 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.4%) to scientific vocabulary
Repository
Statistics
- Stars: 2
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
CERN Data Handling Framework
Introduction ☕️
Dataset Description
Mattermost is an open-source communication platform, similar to Slack, that is widely used at CERN. The CERN Anonymized Mattermost Dataset contains Mattermost data from January 2018 to November 2021, covering 20,794 CERN users, 2,367 Mattermost teams, 12,773 Mattermost channels, 151 CERN buildings, and 163 CERN organizational units. It describes the relationships between Mattermost teams, Mattermost channels, and CERN users, and holds information such as channel creation and deletion times, the times users joined and left channels, and user-specific attributes such as building and organizational unit. To hide identifiable information (e.g. team names, user names, and channel names), the dataset was anonymized by omitting some attributes, hashing string values, and removing connections between users, teams, and channels.
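The "hashing string values" step can be illustrated with a minimal sketch. It assumes SHA-256 over UTF-8 text and a hypothetical record layout with `user` and `channel` fields; the dataset description does not say which hash function or schema `mmdata.json` actually uses, so this is not the procedure that produced the dataset.

```python
import hashlib


def anonymize(value: str) -> str:
    """Replace a string with its SHA-256 hex digest (assumed hash, for illustration only)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical membership records; the real mmdata.json schema may differ.
memberships = [
    {"user": "alice", "channel": "physics-chat"},
    {"user": "bob", "channel": "physics-chat"},
    {"user": "alice", "channel": "coffee"},
]

# Hash every string field of every record.
anonymized = [{key: anonymize(val) for key, val in rec.items()} for rec in memberships]

for rec in anonymized:
    # Print shortened digests so the output stays readable.
    print(rec["user"][:8], rec["channel"][:8])
```

Because equal strings always produce equal digests, both of alice's memberships still point to the same pseudonymous user, so user/channel relationships survive while the names do not. An unsalted hash of a guessable value, such as a known username, can be reversed by hashing candidate names and comparing digests; a keyed hash (for example Python's `hmac` module) is a common mitigation.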
Dataset License: CC BY-NC (Creative Commons Attribution-NonCommercial Licence)
Dataset Link: CERN Anonymized Mattermost Data | Zenodo
```bibtex
@dataset{jakovljevicigor2022_6319684,
  author    = {Jakovljevic, Igor and Wagner, Andreas and Gütl, Christian and Pobaschnig, Martin and Mönnich, Adrian},
  title     = {CERN Anonymized Mattermost Data},
  month     = mar,
  year      = 2022,
  publisher = {Zenodo},
  version   = 1,
  doi       = {10.5281/zenodo.6319684},
  url       = {https://doi.org/10.5281/zenodo.6319684}
}
```
Getting Started 🏁
Setup 💻
1. Retrieve the Dataset
Download the Mattermost data file (mmdata.json) from Zenodo.
2. Install cdhf from PyPI
Install the cdhf package with pip:
```sh
$ pip install cdhf
```
3. Import and use cdhf
Import the cdhf package:
```python
from cdhf.data import Data
```
Create the Data object and load the dataset:
```python
data = Data("path/to/mmdata.json")
data.load_all()
```
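If `load_all()` fails, an incomplete or truncated download is one possible cause. The sketch below uses only the Python standard library to check that the file exists and parses as JSON, then summarizes its top-level structure. `inspect_dataset` is a hypothetical helper, not part of cdhf, and it makes no assumptions about which keys `mmdata.json` contains.

```python
import json
from pathlib import Path


def inspect_dataset(path: str) -> str:
    """Check that a JSON file exists and parses, and summarize its top level.

    Hypothetical helper (not part of cdhf): it does not assume any particular
    keys inside mmdata.json.
    """
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"dataset not found: {file}")
    with file.open(encoding="utf-8") as fh:
        data = json.load(fh)  # raises json.JSONDecodeError on a truncated file
    if isinstance(data, dict):
        return f"object with keys: {', '.join(sorted(data))}"
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return f"scalar of type {type(data).__name__}"
```

For example, call `print(inspect_dataset("path/to/mmdata.json"))` before creating the `Data` object. Note that `json.load` reads the whole file into memory.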
You can find examples of how to work with the data in the cdhf-examples repository.
Documentation 🖨️
API documentation is available at https://mpobaschnig.github.io/cdhf/.
Citation ✍️
If you use or mention this project in your scientific work, please cite the following paper:
- Jakovljevic, I., Pobaschnig, M., Gütl, C. and Wagner, A., 2022. Privacy Aware Identification of User Clusters in Large Organisations based on Anonymized Mattermost User and Channel Information. In: DATA ANALYTICS 2021, The Tenth International Conference on Data Analytics.
```bibtex
@inproceedings{DataAnalytics2022,
  author    = {Jakovljevic, I. and Pobaschnig, M. and Gütl, C. and Wagner, A.},
  title     = {Privacy Aware Identification of User Clusters in Large Organisations based on Anonymized Mattermost User and Channel Information},
  booktitle = {DATA ANALYTICS 2021, The Tenth International Conference on Data Analytics},
  year      = {2022},
  month     = {11}
}
```
Latest publications 📚
- Jakovljevic, I., Gütl, C., Wagner, A. and Nussbaumer, A., 2022. Compiling Open Datasets in Context of Large Organizations while Protecting User Privacy and Guaranteeing Plausible Deniability. In: Proceedings of the 11th International Conference on Data Science, Technology and Applications (DATA 2022), pp. 301-311.
```bibtex
@inproceedings{Data22,
  author       = {Jakovljevic, Igor and Gütl, Christian and Wagner, Andreas and Nussbaumer, Alexander},
  title        = {Compiling Open Datasets in Context of Large Organizations while Protecting User Privacy and Guaranteeing Plausible Deniability},
  booktitle    = {Proceedings of the 11th International Conference on Data Science, Technology and Applications - DATA},
  year         = {2022},
  pages        = {301--311},
  publisher    = {SciTePress},
  organization = {INSTICC},
  doi          = {10.5220/0011265700003269},
  isbn         = {978-989-758-583-8},
  issn         = {2184-285X}
}
```
Involved institutions 🏫
Contributors from the following institutions were involved in the development of this project:
* CERN
* Graz University of Technology
Visual Exploration & Analysis 👁️🗨️
If you would like to explore the CERN Mattermost dataset visually, without any programming, you can use Collaboration Spotting X (CSX).
It is a web-based visual network analytics application with features for exploring network datasets on the fly.
To get started with the CERN Mattermost dataset, read the CSX instructions.
Acknowledgements 🙏
We would like to express our gratitude to CERN for allowing us to publish the dataset as open data and to use it for research purposes.
Owner
- Name: Martin Pobaschnig
- Login: mpobaschnig
- Kind: user
- Repositories: 11
- Profile: https://github.com/mpobaschnig
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Pobaschnig"
    given-names: "Martin"
    orcid: "0000-0002-4077-6300"
  - family-names: "Jakovljevic"
    given-names: "Igor"
    orcid: "0000-0003-1893-9553"
title: "CERN Data Handling Framework"
version: 1.1
doi: 10.5281/zenodo.6575935
date-released: 2022-05-24
url: "https://github.com/mpobaschnig/cdhf"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: "Jakovljevic"
      given-names: "Igor"
      orcid: "0000-0003-1893-9553"
    - family-names: "Pobaschnig"
      given-names: "Martin"
      orcid: "0000-0002-4077-6300"
    - family-names: "Gütl"
      given-names: "Christian"
      orcid: "0000-0001-9589-1966"
    - family-names: "Wagner"
      given-names: "Andreas"
      orcid: "0000-0001-9589-2635"
  month: 11
  title: "Privacy Aware Identification of User Clusters in Large Organisations based on Anonymized Mattermost User and Channel Information"
  year: 2022
Issues and Pull Requests
Last synced: almost 2 years ago
All Time
- Total issues: 0
- Total pull requests: 8
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.25
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Pull Request Authors
- IgorJakovljevic (7)
- Oblynx (1)
Packages
- Total packages: 1
- Total downloads: 15 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 2
- Total versions: 1
- Total maintainers: 1
pypi.org: cdhf
- Homepage: https://github.com/mpobaschnig/cdhf
- Documentation: https://cdhf.readthedocs.io/
- License: GNU Lesser General Public License v3 (LGPLv3)
- Latest release: 1.1 (published about 4 years ago)