publicationclassification
Java package for creating a multi-level classification of scientific publications based on citation links between publications.
Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 15 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.5%) to scientific vocabulary
Statistics
- Stars: 7
- Watchers: 3
- Forks: 1
- Open Issues: 1
- Releases: 2
Metadata Files
README.md
publicationclassification
Introduction
This Java package can be used to create a multi-level classification of scientific publications based on citation links between publications.
The package uses the direct citation approach introduced by Waltman and Van Eck (2012) combined with the Leiden algorithm introduced by Traag et al. (2019). The package also supports the extended direct citation approach introduced by Waltman et al. (2020).
The publicationclassification package was developed by Nees Jan van Eck at the Centre for Science and Technology Studies (CWTS) at Leiden University. It relies on the networkanalysis package that was developed by Nees Jan van Eck, Vincent Traag, and Ludo Waltman.
Documentation
Documentation of the source code of publicationclassification is provided in the code in Javadoc format. The documentation is also available in a compiled format.
Installation
Maven
<dependency>
<groupId>nl.cwts</groupId>
<artifactId>publicationclassification</artifactId>
<version>1.1.0</version>
</dependency>
Gradle
implementation group: 'nl.cwts', name: 'publicationclassification', version: '1.1.0'
Usage
The publicationclassification package requires Java 8 or higher. The latest version of the package is available as a pre-compiled jar file on Maven Central and GitHub Packages.
Instructions for compiling the source code of the package are provided below.
Use the command-line tool PublicationClassificationCreator to create a publication classification. The tool can be run as follows:
java -cp publicationclassification-1.1.0.jar nl.cwts.publicationclassification.run.PublicationClassificationCreator
If no further arguments are provided, the following usage notice will be displayed:
```
PublicationClassificationCreator version 1.1.0
By Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University

Usage: PublicationClassificationCreator
   or  PublicationClassificationCreator

Arguments:
```
Example
The following example illustrates the use of the PublicationClassificationCreator tool. Suppose you have a text file pubs.txt:
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
…
You also have a text file cit_links.txt:
1 1516 0.5
1 1988 1
1 25388 1
2 821 0.142857142857143
2 2504 0.0714285714285714
2 24459 0.5
2 24656 0.5
3 1841 0.2
3 2009 0.166666666666667
3 5337 0.0833333333333333
…
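If you want to inspect or validate a citation links file before passing it to the tool, a small parser can help. The following is a minimal sketch, not part of the package: the class `CitationLinkParser` and its members are hypothetical names, and it assumes each line holds two publication numbers followed by a link weight, separated by whitespace, as the sample above suggests.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative helper; not part of the publicationclassification package. */
class CitationLinkParser {

    /** One citation link. The field meanings are assumptions based on the sample file. */
    static final class CitationLink {
        final int pub1;
        final int pub2;
        final double weight;

        CitationLink(int pub1, int pub2, double weight) {
            this.pub1 = pub1;
            this.pub2 = pub2;
            this.weight = weight;
        }
    }

    /** Parses lines of the assumed form "pub1 pub2 weight", skipping blank lines. */
    static List<CitationLink> parse(List<String> lines) {
        List<CitationLink> links = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length != 3) {
                throw new IllegalArgumentException("Expected 3 columns: " + line);
            }
            links.add(new CitationLink(
                    Integer.parseInt(cols[0]),
                    Integer.parseInt(cols[1]),
                    Double.parseDouble(cols[2])));
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Read the file given as first argument, or fall back to two sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("1 1516 0.5", "1 1988 1");
        List<CitationLink> links = parse(links(lines));
        double totalWeight = 0;
        for (CitationLink link : links) {
            totalWeight += link.weight;
        }
        System.out.println(links.size() + " links, total weight " + totalWeight);
    }

    /** Identity helper kept separate so main stays readable; returns the lines unchanged. */
    private static List<String> links(List<String> lines) {
        return lines;
    }
}
```

Running it with `cit_links.txt` as the only argument prints the number of links and their summed weight, which you can compare against the totals the tool reports.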
The PublicationClassificationCreator tool can then be run as follows:
java -cp publicationclassification-1.1.0.jar nl.cwts.publicationclassification.run.PublicationClassificationCreator pubs.txt cit_links.txt classification.txt true 100 4e-4 25 2e-4 250 7e-5 1000
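The same command can also be started from Java code, for example as one step in a larger pipeline. The sketch below is a hypothetical launcher built on the standard `ProcessBuilder` API and is not part of the package. `ToolLauncher` and `buildCommand` are illustrative names, the fully qualified class name is the one given in the Compilation section, and the tool arguments are copied verbatim from the example without interpreting them.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher; not part of the publicationclassification package. */
class ToolLauncher {

    /** Builds the command "java -cp jarPath mainClass toolArgs...". */
    static List<String> buildCommand(String jarPath, String mainClass, List<String> toolArgs) {
        List<String> command = new ArrayList<>(Arrays.asList("java", "-cp", jarPath, mainClass));
        command.addAll(toolArgs);
        return command;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String jarPath = "publicationclassification-1.1.0.jar";
        // Class name as stated in the Compilation section of this README.
        String mainClass = "nl.cwts.publicationclassification.run.PublicationClassificationCreator";
        // Arguments copied verbatim from the example command.
        List<String> toolArgs = Arrays.asList(
                "pubs.txt", "cit_links.txt", "classification.txt",
                "true", "100", "4e-4", "25", "2e-4", "250", "7e-5", "1000");

        List<String> command = buildCommand(jarPath, mainClass, toolArgs);
        System.out.println(String.join(" ", command));

        // Only launch when the jar is actually present in the working directory.
        if (!new File(jarPath).isFile()) {
            System.out.println("Jar not found; skipping launch.");
            return;
        }
        // inheritIO() forwards the tool's console output to this process.
        Process process = new ProcessBuilder(command).inheritIO().start();
        System.out.println("Exit code: " + process.waitFor());
    }
}
```

Keeping the command construction in its own method makes it easy to check the exact command line before anything is executed.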
The publication classification created by the tool can be found in the text file classification.txt:
1 83 9 3
2 1 1 2
3 43 14 2
4 1 1 2
5 7 7 1
18 49 2 0
19 4 5 0
20 24 0 1
21 33 20 0
22 2 3 0
…
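A quick way to sanity-check the result is to count the distinct clusters in each column of the classification file. The sketch below is illustrative and not part of the package. `ClassificationSummary` is a hypothetical name, and it assumes the first column is the publication number and the remaining columns are cluster numbers at the micro, meso, and macro level, in that order. That order is inferred from the order in which the tool reports the levels and is not confirmed by the documentation above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative helper; not part of the publicationclassification package. */
class ClassificationSummary {

    /**
     * Counts the distinct cluster numbers in every column after the first.
     * Element 0 of the result corresponds to the second column of the file.
     */
    static int[] countClustersPerLevel(List<String> lines) {
        List<Set<String>> seenPerLevel = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] cols = trimmed.split("\\s+");
            for (int col = 1; col < cols.length; col++) {
                // Grow the list of sets on demand so any number of levels works.
                while (seenPerLevel.size() < col) {
                    seenPerLevel.add(new HashSet<String>());
                }
                seenPerLevel.get(col - 1).add(cols[col]);
            }
        }
        int[] counts = new int[seenPerLevel.size()];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = seenPerLevel.get(i).size();
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Read the file given as first argument, or fall back to sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("1 83 9 3", "2 1 1 2", "3 43 14 2");
        System.out.println(Arrays.toString(countClustersPerLevel(lines)));
    }
}
```

Applied to the full classification.txt, the counts should match the number of clusters the tool reports for each level, provided the column assumption holds.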
The tool displays the following output:
```
PublicationClassificationCreator version 1.1.0
By Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University

Reading citation network from file... Finished!
Reading citation network from file took 0h 0m 0s.
Citation network:
    Number of publications: 26800
    Number of citation links: 150613
    Total publication weight: 18643
    Total citation link weight: 18321

Identifying largest connected component in citation network... Finished!
Identifying largest connected component in citation network took 0h 0m 0s.
Largest connected component:
    Number of publications: 20988
    Number of citation links: 150387
    Total publication weight: 17206
    Total citation link weight: 18131

Creating publication classification...
Clustering algorithm: Leiden algorithm
Number of iterations: 100
Random seed: 0

Adding micro-level classification...
Creating clustering... Finished! 335 clusters created.
Reassigning small clusters... Finished! 98 clusters remaining.
Adding micro-level classification took 0h 0m 2s.
Micro-level classification:
    Resolution: 4.0E-4
    Threshold: 25
    Number of clusters: 98

Adding meso-level classification...
Creating clustering... Finished! 63 clusters created.
Reassigning small clusters... Finished! 25 clusters remaining.
Adding meso-level classification took 0h 0m 0s.
Meso-level classification:
    Resolution: 2.0E-4
    Threshold: 250
    Number of clusters: 25

Adding macro-level classification...
Creating clustering... Finished! 9 clusters created.
Reassigning small clusters... Finished! 4 clusters remaining.
Adding macro-level classification took 0h 0m 0s.
Macro-level classification:
    Resolution: 7.0E-5
    Threshold: 1000
    Number of clusters: 4

Writing publication classification to file... Finished!
Writing publication classification to file took 0h 0m 0s.
```
License
The publicationclassification package is distributed under the MIT license.
Issues
If you encounter any issues, please report them using the issue tracker on GitHub.
Contribution
You are welcome to contribute to the development of the publicationclassification package. Please follow the typical GitHub workflow: fork this repository and open a pull request to submit your changes. Make sure that your pull request has a clear description and that the code has been properly tested.
Development and deployment
The latest stable version of the source code is available in the main branch on GitHub. The most recent version of the source code, which may be under development, is available in the develop branch.
Compilation
To compile the source code of the publicationclassification package, a Java Development Kit needs to be installed on your system (version 8 or higher). Having Gradle installed is optional as the Gradle Wrapper is also included in this repository.
On Windows systems, the source code can be compiled as follows:
gradlew build
On Linux and macOS systems, use the following command:
./gradlew build
The compiled class files can be found in the directory build/classes.
The compiled jar file can be found in the directory build/libs.
The compiled javadoc files can be found in the directory build/docs.
The class nl.cwts.publicationclassification.run.PublicationClassificationCreator has a main method. After compiling the source code, the PublicationClassificationCreator tool can be run as follows:
java -cp build/libs/publicationclassification-<version>.jar nl.cwts.publicationclassification.run.PublicationClassificationCreator
References
Traag, V.A., Waltman, L., & Van Eck, N.J. (2019). From Louvain to Leiden: Guaranteeing well-connected communities. Scientific Reports, 9, 5233. https://doi.org/10.1038/s41598-019-41695-z
Waltman, L., Boyack, K.W., Colavizza, G., & Van Eck, N.J. (2020). A principled methodology for comparing relatedness measures for clustering publications. Quantitative Science Studies, 1(2), 691-713. https://doi.org/10.1162/qss_a_00035
Waltman, L., & Van Eck, N.J. (2012). A new methodology for constructing a publication-level classification system of science. Journal of the American Society for Information Science and Technology, 63(12), 2378-2392. https://doi.org/10.1002/asi.22748
Owner
- Name: Centre for Science and Technology Studies
- Login: CWTSLeiden
- Kind: organization
- Email: info@cwts.leidenuniv.nl
- Location: Leiden, the Netherlands
- Website: https://www.cwts.nl/
- Repositories: 2
- Profile: https://github.com/CWTSLeiden
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it using the metadata from this file."
type: "software"
authors:
  - family-names: "Van Eck"
    given-names: "Nees Jan"
    orcid: "https://orcid.org/0000-0001-8448-4521"
    email: "ecknjpvan@cwts.leidenuniv.nl"
    affiliation: "Centre for Science and Technology Studies (CWTS), Leiden University"
title: "publicationclassification"
abstract: "This Java package can be used to create a multi-level classification of scientific publications based on citation links between publications."
keywords:
  - publication classification
  - scientific publications
  - citation links
  - multi-level classification
  - clustering
  - community detection
  - clustering algorithm
  - Leiden algorithm
  - Java
url: "https://github.com/CWTSLeiden/publicationclassification#readme"
repository-code: "https://github.com/CWTSLeiden/publicationclassification"
repository-artifact: "https://central.sonatype.com/artifact/nl.cwts/publicationclassification/"
license: MIT
doi: 10.5281/zenodo.8263452
version: 1.1.0
date-released: 2023-08-21
GitHub Events
Total
- Watch event: 2
- Fork event: 1
Last Year
- Watch event: 2
- Fork event: 1
Committers
Last synced: almost 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Nees Jan van Eck | 3****k | 8 |
| Nees Jan van Eck | e****n@c****l | 8 |
Issues and Pull Requests
Last synced: almost 2 years ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- ravwojdyla (1)
Dependencies
- actions/checkout v3 composite
- actions/setup-java v3 composite
- actions/upload-artifact v3 composite
- actions/checkout v3 composite
- actions/setup-java v3 composite