Recent Releases of job-defense-shield

job-defense-shield - v1.2.1

  • Added ability to use an external SMTP server
  • Added gpu_mem_eff_pct setting to --low-gpu-efficiency so that jobs with high GPU memory usage can be ignored

- Python
Published by jdh4 10 months ago

job-defense-shield - v1.2.0

  • The "GPU Model Too Powerful" alert now supports multi-GPU jobs. This introduced breaking changes to the names of some settings (e.g., num_cores_threshold has been replaced by num_cores_per_gpu).
  • The -S and -E options can be used to set the starttime and endtime, respectively, for the call to sacct.
  • Reports for low CPU/GPU utilization and excessive CPU/GPU time limits can now show all users with the show_all_users flag. Previously, only the offending users were shown in the report.
  • A debug option (--dump-files) has been added to write the raw and processed dataframes.

- Python
Published by jdh4 11 months ago

job-defense-shield - v1.1.2

  • When the sliding window cancellation method is used, the minimum elapsed time for a job to receive a warning is the max of cancel_minutes plus sampling_period_minutes and sliding_warning_minutes.
  • The default for warnings_to_admin was changed to False.
  • Admins are encouraged to use warning_frac: 0.5 instead of the default of 1.0.
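
The warning-time rule above can be read as the larger of two quantities: cancel_minutes plus sampling_period_minutes, and sliding_warning_minutes. The sketch below encodes that reading only. The function name is hypothetical, and the grouping of the terms is an assumption, not a statement of how the package computes the value.

```python
def min_elapsed_minutes_for_warning(cancel_minutes,
                                    sampling_period_minutes,
                                    sliding_warning_minutes):
    """Earliest elapsed time (minutes) at which a job could be warned.

    Assumed reading of the release note:
        max(cancel_minutes + sampling_period_minutes, sliding_warning_minutes)
    """
    return max(cancel_minutes + sampling_period_minutes,
               sliding_warning_minutes)


# Illustrative values only; they are not defaults taken from the package.
print(min_elapsed_minutes_for_warning(120, 15, 240))
print(min_elapsed_minutes_for_warning(120, 15, 60))
```

Under this reading, a small sliding_warning_minutes cannot produce a warning before cancel_minutes plus one sampling period has elapsed.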

- Python
Published by jdh4 about 1 year ago

job-defense-shield - v1.1.1

Added support for multiple alert entries for cancelling GPU jobs at 0% utilization:
  • fraction_of_period can have a max value of 0.7 divided by the number of entries
  • The cache filename is different for each entry
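
The per-entry cap described above is simple arithmetic: 0.7 divided by the number of alert entries. A minimal sketch follows, with a hypothetical helper name and constant; it is not code from the package.

```python
# Total budget stated in the release note; the constant name is hypothetical.
MAX_TOTAL_FRACTION = 0.7


def max_fraction_per_entry(num_entries):
    """Largest allowed per-entry fraction when num_entries alert entries exist."""
    if num_entries < 1:
        raise ValueError("num_entries must be at least 1")
    return MAX_TOTAL_FRACTION / num_entries


for n in (1, 2, 3):
    print(n, round(max_fraction_per_entry(n), 4))
```

With two entries, for example, each entry would be limited to 0.35 under this rule.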

- Python
Published by jdh4 about 1 year ago

job-defense-shield - v1.1.0

Can now cancel jobs with 0% GPU utilization over any time window of a specified length. Previously, job cancellations could only be done during the first N minutes of the job.
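
The difference between the two behaviors can be illustrated with a small sketch. It assumes GPU utilization is available as a list of evenly spaced percentage samples; the function names and sample data are hypothetical and do not reflect the package's internals.

```python
def idle_at_start(samples, n):
    """Earlier behavior (as described): only the first n samples are checked."""
    return len(samples) >= n and all(u == 0 for u in samples[:n])


def has_idle_window(samples, n):
    """Newer behavior (as described): any n consecutive samples at 0%."""
    if n < 1:
        raise ValueError("n must be at least 1")
    run = 0  # length of the current streak of 0% samples
    for u in samples:
        run = run + 1 if u == 0 else 0
        if run >= n:
            return True
    return False


# A job that starts busy, then sits idle for three samples.
samples = [50, 80, 0, 0, 0, 40]
print(idle_at_start(samples, 3))    # False: the job was busy at the start
print(has_idle_window(samples, 3))  # True: an idle window appears later
```

A job that goes idle partway through its run is missed by the first check but caught by the second.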

- Python
Published by jdh4 about 1 year ago

job-defense-shield - v1.0.2

  • Code can identify jobs with 0% GPU utilization over the last N minutes
  • Logging information sent to stdout and reports

- Python
Published by jdh4 about 1 year ago

job-defense-shield - v1.0.1

  • Fixed printing of emails
  • Fixed log path for report demo in docs
  • Removed dates from reports

- Python
Published by jdh4 about 1 year ago

job-defense-shield - v1.0.0

First release

Code was published to PyPI today.

- Python
Published by jdh4 about 1 year ago