portageclusterutils

Cluster and parallelization utilities that came out of the Portage SMT project — Outils de parallélisation sur grappe de calcul issus du projet Portage de TAS

https://github.com/nrc-cnrc/portageclusterutils

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary

Keywords

machine-translation parallel-computing
Last synced: 4 months ago

Repository

Cluster and parallelization utilities that came out of the Portage SMT project — Outils de parallélisation sur grappe de calcul issus du projet Portage de TAS

Basic Info
  • Host: GitHub
  • Owner: nrc-cnrc
  • License: mit
  • Language: Shell
  • Default Branch: main
  • Homepage:
  • Size: 468 KB
Statistics
  • Stars: 2
  • Watchers: 8
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Topics
machine-translation parallel-computing
Created almost 5 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

Portage Cluster Utilities

This repo contains scripts that facilitate the parallelization of jobs on a cluster or on a multi-core machine. These scripts were originally written as part of the Portage Statistical Machine Translation project, but were extracted here because they are of more general interest.

Installation

The simplest installation procedure is to activate PortageClusterUtils in place from a clone.

git clone https://github.com/nrc-cnrc/PortageClusterUtils.git

and then, if you are on GPSC or GPSCC, add this line to your .profile or .bashrc:

source /path/to/PortageClusterUtils/GPSC_SETUP.bash

or, if you are on TRIXIE, add this line to your .profile or .bashrc:

source /path/to/PortageClusterUtils/TRIXIE_SETUP.bash

Alternatively, you can install it to the destination of your choice like this:

cd bin/
make install INSTALL_DIR=/install/path

which will copy all the scripts into /install/path/bin/. By default, the destination is $HOME/bin.

Dependencies

PortageClusterUtils requires:

  • Perl >= 5.14, as perl on your PATH;
  • any version of Python 3, as python3 on your PATH.
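The two requirements above can be checked before installing. The following is a minimal sketch, not something shipped with this repository: the helper name have_cmd and the status variable are illustrative assumptions. It relies on the fact that `perl -e 'use 5.014;'` fails when the interpreter is older than 5.14.

```shell
#!/bin/sh
# Dependency check sketch. The helper name have_cmd is illustrative and is
# not part of PortageClusterUtils.

# Succeeds if the named command can be found on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

status=0

# "use 5.014;" makes perl exit with an error if it is older than 5.14.
if have_cmd perl && perl -e 'use 5.014;' 2>/dev/null; then
    echo "perl: OK ($(perl -e 'print $^V'))"
else
    echo "perl: missing or older than 5.14"
    status=1
fi

# Any Python 3 will do, as long as it is reachable as python3.
if have_cmd python3; then
    echo "python3: OK ($(python3 --version 2>&1))"
else
    echo "python3: missing"
    status=1
fi

echo "dependency check status: $status"
```

A status of 0 means both interpreters were found; 1 means at least one is missing or too old.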

Usage

Main Scripts

The main tools provided by this repository are these scripts:

  • parallelize.pl: take any non-parallel pipeline processing tool, say, a tokenizer, and parallelize it. E.g., parallelize.pl -n 10 'utokenize.pl < input > output' will produce the same results as utokenize.pl < input > output but it will run it 10-ways parallel, either on 10 cores, or in 10 scheduled cluster jobs if you're running on a cluster.

Run "parallelize.pl -h" for more details.
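To make the idea concrete, here is a conceptual sketch of the split / process / merge pattern using only standard tools. It is not how parallelize.pl is implemented, and it runs everything on the local machine. The process function is a hypothetical stand-in for a line-by-line tool such as a tokenizer, and the chunk count and file names are assumptions for the demo.

```shell
#!/bin/sh
# Conceptual sketch only: split the input, process the chunks in parallel,
# then merge the results in order. This is NOT parallelize.pl's implementation.
set -eu

n=4                                   # number of parallel chunks (illustrative)
workdir=$(mktemp -d)                  # scratch directory for the chunks
seq 1 20 > "$workdir/input"           # toy input: one number per line

# Hypothetical stand-in for a line-by-line tool such as a tokenizer.
process() { sed 's/^/tok:/'; }

# 1. Split the input into chunks of roughly equal size, on line boundaries.
total=$(wc -l < "$workdir/input")
per_chunk=$(( (total + n - 1) / n ))
split -l "$per_chunk" "$workdir/input" "$workdir/chunk."

# 2. Run the tool on every chunk in the background, then wait for all of them.
for chunk in "$workdir"/chunk.??; do
    process < "$chunk" > "$chunk.out" &
done
wait

# 3. Concatenate the outputs in chunk order to rebuild a single output file.
cat "$workdir"/chunk.??.out > "$workdir/output"

# The parallel result should be identical to a plain sequential run.
process < "$workdir/input" > "$workdir/expected"
cmp "$workdir/output" "$workdir/expected" && echo "outputs match"
```

This only works for tools that treat each line independently, which is the same kind of tool the parallelize.pl example above targets.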

  • run-parallel.sh: given a list of M bash commands to run, independently from one another, launch N worker jobs (scheduled jobs on a cluster or worker threads on a multi-core machine) which will run all the commands, N-ways parallel, until all are done.

Run "run-parallel.sh -h" for more details.
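The "M commands, N workers" model can be illustrated on a single machine with xargs -P, which keeps at most N commands running and starts the next pending one whenever a slot frees up. This is only an analogy under stated assumptions: it does not use run-parallel.sh, it says nothing about how run-parallel.sh works internally, and the toy job list and file names are made up for the demo.

```shell
#!/bin/sh
# Conceptual sketch only: M independent commands executed by N workers.
# xargs -P is a stand-in here; this is not run-parallel.sh's implementation.
set -eu

workers=3                             # N: number of concurrent workers (illustrative)
jobdir=$(mktemp -d)                   # scratch directory for this demo

# M = 6 independent commands, one per line (hypothetical toy jobs).
i=1
while [ "$i" -le 6 ]; do
    echo "echo result-$i > $jobdir/job$i.out"
    i=$((i + 1))
done > "$jobdir/jobs.txt"

# Each input line becomes one "sh -c" invocation; at most $workers run at once.
# The commands must not contain quotes, which xargs would interpret itself.
xargs -P "$workers" -I CMD sh -c CMD < "$jobdir/jobs.txt"

# Every one of the M jobs should have produced its output file.
done_count=$(ls "$jobdir"/job*.out | wc -l)
echo "completed jobs: $done_count"
```

Unlike this local sketch, run-parallel.sh can also launch its workers as scheduled cluster jobs, as described above.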

  • psub: different computing clusters require different syntax to submit jobs. Those differences are encapsulated inside psub, so that run-parallel.sh does not have to be aware of them. psub -mem 24G -cpus 4 some-command -and -its -options will run "some-command -and -its -options" in a job with the requested resources.

Currently, psub is highly specialized to the clusters we use at the NRC. To use PortageClusterUtils on your cluster, you must adapt psub to write job scripts compatible with your cluster configuration.

Run "psub -h" for more details.
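As a rough illustration of what "writing a job script compatible with your cluster" can mean, the sketch below turns a generic memory and CPU request into a job script. It is not psub's code: the function name make_job_script and the scheduler labels slurm and local are assumptions for illustration. The Slurm branch uses the standard #SBATCH --mem and --cpus-per-task directives; other schedulers would need their own branch.

```shell
#!/bin/sh
# Hypothetical sketch: map generic resource requests to a scheduler-specific
# job script. This is NOT psub's code; names and scheduler labels are assumed.
set -eu

# Usage: make_job_script SCHEDULER MEM CPUS COMMAND...
make_job_script() {
    scheduler=$1; mem=$2; cpus=$3
    shift 3
    echo '#!/bin/bash'
    case $scheduler in
        slurm)
            # Slurm reads resource requests from #SBATCH directive lines.
            echo "#SBATCH --mem=$mem"
            echo "#SBATCH --cpus-per-task=$cpus"
            ;;
        local)
            # No scheduler: the script is simply run on the current machine.
            echo "# local run, requested mem=$mem cpus=$cpus (not enforced)"
            ;;
        *)
            echo "unsupported scheduler: $scheduler" >&2
            return 1
            ;;
    esac
    # The command to run, with its options, is the body of the job script.
    echo "$*"
}

script=$(make_job_script slurm 24G 4 some-command -and -its -options)
printf '%s\n' "$script"
```

Keeping the scheduler-specific part in one place is the design choice the psub description points to: callers such as run-parallel.sh only state what resources they need.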

Other Scripts

| Script | Brief Description |
| ------------------------------- | ------------------------------------------------------------ |
| analyze-run-parallel-log.pl | Summarize started/done/failed jobs in a run-parallel.sh log. |
| jobsig.pl | Send a signal to a job. |
| jobtree | Display the jobstat output as a tree of jobs (jobsub). |
| on-cluster.sh | Detect if we're running on a cluster. |
| process-memory-usage.pl | Tally the memory usage of a process tree. |
| qstatdir | Run qstat, showing where commands were run from (qsub). |
| qstatn | Run qstat -n with a more compact output (qsub). |
| qstattree | Display the qstat output as a tree of jobs (qsub). |
| r-parallel-d.pl | Daemon for run-parallel.sh. |
| r-parallel-worker.pl | Worker for run-parallel.sh. |
| r-scheduler.py | Monitor run-parallel.sh and maximize cluster usage (obsolete). |
| rp-mon-totals.pl | Helper for run-parallel.sh, for tallying run-time stats. |
| rsync-with-restart.sh | For really unstable connections, rsync with retries until success. Warning: never gives up! |
| stripe.py | Helper for parallelize.pl. |
| sum.pl | Sum/avg/max a column or list of numbers. |
| which-test.sh | "which" with reliable exit code, for scripting. |

Each script accepts the -h option to output its full documentation.

Scripts with a cluster/scheduler type in parentheses might only work on such a cluster.

Citation

@misc{Portage_Cluster_Utils,
  author = {Joanis, Eric and Stewart, Darlene and Larkin, Samuel and Leger, Serge},
  license = {MIT},
  title = {{Portage Cluster Utils}},
  url = {https://github.com/nrc-cnrc/PortageClusterUtils},
  year = {2022},
}

Copyright

Traitement multilingue de textes / Multilingual Text Processing
Centre de recherche en technologies numériques / Digital Technologies Research Centre
Conseil national de recherches Canada / National Research Council Canada
Copyright 2022, Sa Majesté le Roi du Chef du Canada / His Majesty the King in Right of Canada
Published under the MIT License (see LICENSE)

Owner

  • Name: National Research Council of Canada — Conseil national de recherches du Canada
  • Login: nrc-cnrc
  • Kind: organization
  • Email: info@nrc-cnrc.gc.ca
  • Location: Canada

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Portage Cluster Utils
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Eric
    family-names: Joanis
    email: Eric.Joanis@nrc-cnrc.gc.ca
    affiliation: National Research Council Canada
  - given-names: Darlene
    family-names: Stewart
    email: Darlene.Stewart@nrc-cnrc.gc.ca
    affiliation: National Research Council Canada
  - given-names: Samuel
    family-names: Larkin
    email: Samuel.Larkin@nrc-cnrc.gc.ca
    affiliation: National Research Council Canada
  - given-names: Serge
    family-names: Leger
    email: Serge.Leger@nrc-cnrc.gc.ca
    affiliation: National Research Council Canada
repository-code: 'https://github.com/nrc-cnrc/PortageClusterUtils'
abstract: >-
  Cluster and parallelization utilities that came out
  of the Portage SMT project — Outils de
  parallélisation sur grappe de calcul issus du
  projet Portage de TAS
keywords:
  - Machine Translation
  - Parallel Processing
license: MIT

GitHub Events

Total
  • Watch event: 1
  • Delete event: 2
  • Issue comment event: 2
  • Push event: 7
  • Pull request review event: 5
  • Pull request review comment event: 3
  • Pull request event: 4
  • Create event: 2
Last Year
  • Watch event: 1
  • Delete event: 2
  • Issue comment event: 2
  • Push event: 7
  • Pull request review event: 5
  • Pull request review comment event: 3
  • Pull request event: 4
  • Create event: 2

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 0
  • Total pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: about 11 hours
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: about 14 hours
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • SamuelLarkin (5)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/test-suite.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite