Recent Releases of paleomix
paleomix - PALEOMIX v1.3.10
Fixed
- Fixed 'example' command for the phylogenetic pipeline failing.
- Removed 'defaults' conda channel
- Python
Published by MikkelSchubert about 1 year ago
paleomix - PALEOMIX v1.3.9
This is a small bug-fix release before support for several very old versions of Python is officially dropped.
Fixed
- Fixed use of deprecated 'pipes' module
- Fixed use of deprecated functions for accessing bundled resources
- Fixed regression in tests
- Fixed the 'dupcheck' command not being exposed, despite error messages indicating how to run it
Published by MikkelSchubert over 1 year ago
paleomix - PALEOMIX v1.3.8
Fixed
- Added 'genotyping' alias to match phylo pipeline documentation (issue #48).
- Fixed options for BWA 'aln' being applied to 'samse' and 'sampe' (issue #49).
Published by MikkelSchubert about 3 years ago
paleomix - PALEOMIX v1.3.7
Added
- Added example to BAM pipeline YAML template, showing how to increase the maximum allowed Phred score for AdapterRemoval. This is needed due to the value being capped at 41 by default, lower than the maximum observed in some modern data.
Fixed
- Fixed regression in config file parsing that would cause failure if no value was specified for an option.
- Fixed error message not being printed correctly when attempting to use Phred+64 data with BWA mem/bwasw.
- Fixed regressions that prevented the use of "regions of interest" in the BAM pipeline.
- Fixed failure when using --list-output-files and auxiliary files were missing or dependencies were unmet. Output files are now printed.
Published by MikkelSchubert almost 4 years ago
paleomix - PALEOMIX v1.3.6
Added
- Added explicit support for the AdapterRemoval --trim5p and --trim3p options, which may take one or two values (as a list).
Changed
- User options for AdapterRemoval are no longer restricted by a whitelist.
Published by MikkelSchubert over 4 years ago
paleomix - PALEOMIX v1.3.5
Added
- Added command-line option to reduce/turn off validation with picard ValidateSamFile.
Published by MikkelSchubert over 4 years ago
paleomix - PALEOMIX v1.3.4
Added
- Added support for the --collapse-conservatively AdapterRemoval option.
Changed
- Avoid creating log files on invalid command-line arguments.
- The directory for the log-file is created automatically if it does not exist.
- No longer prints a stack-trace if the user terminates a pipeline with Ctrl + C.
- Log-level command-line options are now case insensitive.
- The default number of threads used by AdapterRemoval, Bowtie2, and BWA is now scaled based on the available number of cores instead of defaulting to 1 thread.
- Less exhaustive validation of .bai index files using picard ValidateSamFile. The overhead of validating these files was excessive in light of the small benefit.
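A case-insensitive log-level option, as described in the list above, can be sketched with argparse as follows. This is an illustration under stated assumptions and not the PALEOMIX implementation: the program name, the helper function, and the set of accepted levels are hypothetical.

```python
import argparse

# Hypothetical set of accepted levels; the real tool may accept others.
LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR")


def build_parser():
    parser = argparse.ArgumentParser(prog="example")
    # argparse applies 'type' before checking 'choices', so converting the
    # value to upper-case makes the option case insensitive.
    parser.add_argument(
        "--log-level",
        type=str.upper,
        choices=LOG_LEVELS,
        default="ERROR",
    )
    return parser


args = build_parser().parse_args(["--log-level", "debug"])
print(args.log_level)
```

Normalizing the value once at parse time means the rest of the program only ever sees the canonical upper-case names.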
Fixed
- Fixed regression causing certain options to not be applied when mapping with BWA.
- Fixed --log-level not having an effect.
- Fixed possible infinite recursion when using lazily created log-files.
- Fixed BAM pipeline failing if mapDamage feature was not explicitly set.
- Fixed default values of 0 or 1 not being listed in command-line help text.
Published by MikkelSchubert over 4 years ago
paleomix - PALEOMIX v1.3.3
Fixed
- Fixed regression in BAM pipeline summary node, causing failure if there were zero hits or reads.
- Fixed BAM validation always being run in big-genome mode, resulting in some checks being disabled despite being applicable.
Published by MikkelSchubert about 5 years ago
paleomix - PALEOMIX v1.3.2
Minor bug-fix release.
Added
- Added -v and --version to all command-line tools.
- Added the default values (if any) to the help-strings of all command-line options.
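Both additions map onto standard argparse features. The sketch below is illustrative only: the version string, program name, and the '--max-threads' option are hypothetical and are not taken from PALEOMIX.

```python
import argparse

# Hypothetical version string, used only for this example.
__version__ = "0.0.0"


def build_parser():
    parser = argparse.ArgumentParser(
        prog="example",
        # Appends "(default: ...)" to the help text of documented options.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # The built-in 'version' action prints the version string and exits.
    parser.add_argument(
        "-v", "--version", action="version", version="%(prog)s v" + __version__
    )
    # Hypothetical option, included to show how a default is rendered.
    parser.add_argument(
        "--max-threads", type=int, default=1, help="Maximum number of threads"
    )
    return parser


print(build_parser().format_help())
```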
Changed
- Decoupled --log-level from command-line logging. Changed default log-level to ERROR and made it apply to automatically generated log files as well.
Fixed
- Fixed the pipeline failing on 'jre_options' (now 'jre-option') in config files.
- Fixed the pipeline failing on empty options in config files from PALEOMIX v1.2.x.
- Fixed Bowtie2 using command-line options from the BWA makefile section
- Fixed conda installation instructions and environment file for PALEOMIX 1.3.x.
Published by MikkelSchubert almost 6 years ago
paleomix - PALEOMIX v1.3.1
Minor bug-fix release.
Fixed
- Updated shebangs to 'python3'. Patch courtesy of Andreas Tille.
- Added minimal support for previously removed command-line options, to prevent the pipelines from failing when used with old configuration files.
Published by MikkelSchubert almost 6 years ago
paleomix - PALEOMIX v1.3.0
PALEOMIX v1.3.0 is a major maintenance release, with the goal of porting PALEOMIX to Python 3 and to prepare for further work to update and expand the pipeline. A number of deprecated tools and options have been removed, as has support for very old versions of tools used by the pipelines.
Existing makefiles are compatible with PALEOMIX 1.3.0 with a few notable exceptions:
- BAM pipeline support for the GATK Indel Realigner has been removed, and the options 'RealignedBAM' and 'RawBam' no longer have any effect. These are now simply ignored and a "raw" BAM is always produced. The Indel Realigner tool was removed from GATK as of GATK4 (released 2018) and continued support is not deemed worthwhile due to the minor benefit from running the Indel Realigner.
- BAM pipeline support for generating PCR duplicate histogram files for use with PreSeq has been removed. The option is simply ignored.
- BAM pipeline support for AdapterRemoval options --pcr1 and --pcr2 has been removed, as these options are long deprecated and will be removed from AdapterRemoval. Use the --adapter1 and --adapter2 options as described in the BAM pipeline documentation.
- Phylo pipeline options for BCFTools must be updated to replace the option invoking the consensus caller ("-g") with "-c", or with "-m" for the multiallelic caller.
- The Phylo pipeline genotyping methods 'Random Sampling' and 'Reference Sequence' are no longer supported.
Please open an issue if features or options important to your work have been removed.
Added
- The BAM and Phylo pipelines print warnings when deprecated/removed options are used
- A log-file is automatically created if errors are encountered during run-time.
Changed
- Converted project from Python 2.7 to Python 3.5+.
- Removed internal copy of pyyaml and added dependency on ruamel.yaml.
- Command-line output was changed to a simpler, log-like output using coloredlogs.
- Bumped minimum version requirements for most tools used by the pipelines; minimum versions were largely informed by availability in Debian stretch.
- Changed naming of BAM index files created by the BAM pipeline from 'filename.bai' to 'filename.bam.bai' in order to match the behavior of standard tools (e.g. samtools).
- The filenames of input FASTQ files are now used in the intermediate file-structure, with the goal of making the pipeline more robust to changes in input files.
- The pipeline no longer fails if a command generates more files than expected, instead this merely triggers a warning.
- Moved PCR duplicate filtering and rescaling to 'Features' in BAM pipeline makefiles.
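The change in BAM index naming listed above can be illustrated with a short sketch. The function names are hypothetical and the code is not taken from the pipeline; it only shows the difference between replacing the extension and appending to it.

```python
from pathlib import Path


def old_index_name(bam):
    # Older naming: the '.bam' extension is replaced with '.bai'.
    return Path(bam).with_suffix(".bai")


def new_index_name(bam):
    # Naming as of v1.3.0: '.bai' is appended to the full filename,
    # which matches what samtools produces by default.
    path = Path(bam)
    return path.with_name(path.name + ".bai")


print(old_index_name("sample.bam").name)
print(new_index_name("sample.bam").name)
```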
Fixed
- Fixed spurious warnings from pysam (htslib) when opening BAMs without index files.
Removed
- Removed limited support for 32 bit systems
- Removed the 'cat' command.
- Removed the 'duphist' command and the corresponding BAM pipeline feature.
- Removed the 'ena' command.
- Removed the 'sample_pileup' command.
- Removed the 'retable' command. A more performant standalone version can be found at https://github.com/MikkelSchubert/retable
- Removed the bam_pipeline 'remap' command.
- Removed entry-points other than the 'paleomix' command; that is to say the stand-alone 'bampipeline', 'phylopipeline', etc. commands.
- Removed data for the original publication of PALEOMIX. The instructions in that publication are outdated and cannot be carried out for current versions of PALEOMIX.
- Removed support for configuration files with per-host sections. Files are now assumed to contain only one set of command-line options.
- Removed --to-dot option for pipelines.
- Removed keyboard shortcuts for modifying pipeline behavior during runtime.
- Removed undocumented options from Zonkey.
- Removed undocumented codeml support from the Phylo pipeline.
- Removed 'Random Sampling' and 'Reference Sequence' genotyping methods.
- Removed makefile metadata (filename, hash, mtime) from BAM pipeline summary reports.
- Removed support for compressing intermediate FASTQ files using bzip2. Reads are now always compressed using gzip.
- Removed ability to merge FASTQ files with the SplitLanesByFilenames option. Files are now always split, meaning that individual FASTQ files or pairs are mapped.
- Removed support for indel realignment using GATK due to its removal from GATK.
- Removed creation of FASTA sequence dictionaries as they were only needed by GATK.
- Removed support for labels for BAM pipeline prefixes.
Published by MikkelSchubert almost 6 years ago
paleomix - PALEOMIX v1.2.14
Changed
- Improved handling of K-groups in zonkey database files
- Changed BAM pipeline version requirement for GATK to < v4.0, as the Indel Realigner has been removed in GATK v4.0
Fixed
- Fixed version detection of GATK for v4.0 (issue #23)
Published by MikkelSchubert over 6 years ago
paleomix - PALEOMIX v1.2.13.5
Fixed
- Ignore the ValidateSamFile REF_SEQ_TOO_LONG_FOR_BAI warning when processing genomes with contigs too large for BAI index files.
Published by MikkelSchubert over 6 years ago
paleomix - PALEOMIX v1.2.13.4
Fixed
- Improved detection of Picard versions in cases where 'java' outputs additional text.
Published by MikkelSchubert about 7 years ago
paleomix - PALEOMIX v1.2.13.3
Fixed
- Fixed validation/read counting of pre-trimmed reads not including the mate 1 files of paired-end reads. This resulted in the 'seq_retained_reads' count being half the expected value.
Published by MikkelSchubert over 7 years ago
paleomix - PALEOMIX v1.2.13.2
Fixed
- Additional fixes to divisions by zeros in summary calculations.
- Fixed 'empty file' message if FASTA file ends with empty sequence.
- Renamed pre-trimmed FASTQ validation/statistics file, to avoid failure if an older run was resumed.
Published by MikkelSchubert about 8 years ago
paleomix - PALEOMIX v1.2.13.1
Fixed
- Fixed divisions by zero if empty files are listed as pre-trimmed reads.
Published by MikkelSchubert about 8 years ago
paleomix - PALEOMIX v1.2.13
Added
- Added 'retable' command for pretty-printing whitespace-separated data in the format previously used by the BAM pipeline.
- Basic statistics are collected for pre-trimmed reads in the BAM pipeline.
Changed
- BAM pipeline tables are now saved as tab separated columns. The old pretty-printed format may be produced by running the 'retable' tool on the resulting files.
- Memory usage for the 'coverage' and 'depths' commands was reduced when using very big BED files.
Fixed
- Fixed input / output files not being listed in 'pipe.errors' files.
- Use the same max open files limit for picard (ulimit -n minus headroom) when determining if the default should be changed and as the final value.
- Removed explicit test for JRE version, which was failing on some (valid) runtimes. Java programs are still checked prior to running pipelines.
- Fixed changes to recent version of Pysam breaking the alignment step in the BAM pipeline.
- Fixed various test failures resulting in different environments.
- Fixed validation of pre-trimmed FASTQ files in BAM pipelines stopping early if empty files are encountered.
Removed
- Removed automatic migration of configuration files created by PALEOMIX prior to v1.2.0.
- Previously deprecated support for existing BAM files was removed from the BAM pipeline.
Published by MikkelSchubert about 8 years ago
paleomix - PALEOMIX v1.2.12
Fixed
- Fixed input / output files not being listed in 'pipe.errors' files.
- Use the same max open files limit for picard (ulimit -n minus headroom) when determining if the default should be changed and as the final value.
Added
- The 'vcftofasta' command now supports VCFs containing haploid genotype calls, courtesy of Graham Gower.
Changed
- Require Pysam version 0.10.0 or later.
Published by MikkelSchubert almost 9 years ago
paleomix - PALEOMIX v1.2.11
Fixed
- Fixed unhandled exception if a FASTA file for a prefix is missing in a BAM pipeline makefile.
- Fixed the 'RescaleQualities' option not being respected for non-global options in BAM pipeline makefiles.
Published by MikkelSchubert about 9 years ago
paleomix - PALEOMIX v1.2.10
Added
- Preliminary support for CSI indexed BAM files, required for genomes with chromosomes > 2^29 - 1 bp in size. Support is still missing in HTSJDK, so GATK cannot currently be used with such genomes. CSI indexing is enabled automatically when required.
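The size limit mentioned above can be expressed as a simple check. This is a minimal sketch; the constant and function names are hypothetical and the code is not taken from PALEOMIX.

```python
# Largest contig length that a BAI index can address (2^29 - 1 bp).
BAI_MAX_CONTIG_LENGTH = 2**29 - 1


def requires_csi_index(contig_lengths):
    """Return True if any contig is too long for a BAI index."""
    return any(length > BAI_MAX_CONTIG_LENGTH for length in contig_lengths)


print(BAI_MAX_CONTIG_LENGTH)
print(requires_csi_index([1000, 2**29]))
```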
Fixed
- Reference sequences placed in the current directory no longer cause the BAM pipeline to complain about non-writable directories.
- The maximum number of temporary files used by picard will no longer be increased above the default value used by the picard tools.
Changed
- The 'Status' of processes terminated by the pipeline will now be reported as 'Automatically terminated by PALEOMIX'. This is to help differentiate between processes that failed or were killed by an external source, and processes that were cleaned up by the pipeline itself.
- Pretty-printing of commands shown when commands fail has been revised to make it more readable, including explicit descriptions when output is piped from one process to another and vice versa.
- Commands are now shown in a format more suitable for running on the command-line, instead of as a Python list, when a node fails. Pipes are still specified separately.
- Improved error messages for missing programs during version checks, and for exceptions raised when calling Popen during version checks.
- Strip MC tags from reads with unmapped mates during cleanup; this is required since Picard (v2.9.0) ValidateSamFile considers such tags invalid.
Published by MikkelSchubert about 9 years ago
paleomix - PALEOMIX v1.2.9
Fixed
- Improved handling of BAM tags to prevent unintended type changes.
- Fixed 'rmdup_collapsed' underreporting the number of duplicate reads (in the 'XP' tag), when duplicates with different CIGAR strings were processed.
Changed
- PCR duplicates detected for collapsed reads using 'rmdup_collapsed' are now identified based on alignments that include clipped bases. This matches the behavior of the Picard 'MarkDuplicates' command.
- Depending on work-load, 'rmdup_collapsed' may now run up to twice as fast.
Published by MikkelSchubert about 9 years ago
paleomix - PALEOMIX v1.2.8
This is a minor release of PALEOMIX, which includes an important bug-fix to BAM Pipelines using the BWA 'mem' or 'bwasw' algorithms. Previously, user-specified command-line parameters would not be correctly applied to the BWA commands when either of these two algorithms was used. This has now been corrected.
Added
- Added FILTER entry for 'F' filter used in vcf_filter. This corresponds to heterozygous sites where the allele frequency was not determined.
- Added 'dupcheck' command. This command roughly corresponds to the DetectInputDuplication step that is part of the BAM pipeline, and attempts to identify duplicate data (not PCR duplicates), by locating reads mapped to the same position, with the same name, sequence, and quality scores.
- Added link to sample data used in publication to the Zonkey documentation.
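The kind of check performed by 'dupcheck', as described in the list above, can be sketched as follows. This is a simplified illustration and not the actual implementation: the 'Read' record and the function name are hypothetical, and a real tool would stream records from a sorted BAM file rather than hold them all in memory.

```python
from collections import Counter, namedtuple

# Minimal stand-in for an aligned read; all fields are hypothetical names.
Read = namedtuple("Read", ["contig", "position", "name", "sequence", "qualities"])


def find_duplicated_reads(reads):
    """Return reads that occur more than once with an identical position,
    name, sequence, and quality scores."""
    counts = Counter(reads)
    return sorted(read for read, count in counts.items() if count > 1)


reads = [
    Read("chr1", 100, "read_1", "ACGT", "IIII"),
    Read("chr1", 100, "read_1", "ACGT", "IIII"),  # same data included twice
    Read("chr1", 100, "read_2", "ACGT", "IIII"),  # different name: not flagged
]
print(find_duplicated_reads(reads))
```

Requiring the read name to match is what separates duplicated input data from PCR duplicates, which share a position but carry different names.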
Changed
- Only letters, numbers, and '-', '_', and '.' are allowed in sample-names used in Zonkey, in order to prevent invalid filenames and certain programs breaking on whitespace. Trailing whitespace is stripped.
- Show more verbose output when building Zonkey pipelines.
- Picard tools version 1.137 or later is now required by the BAM pipeline. This is necessary as newer BAM files (header version 1.5) would fail to validate when using earlier versions of Picard tools.
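The sample-name rule listed above can be sketched as a small validation function. This is an assumption-laden illustration, not the Zonkey code: the function name is hypothetical, and "letters" is taken to mean ASCII letters.

```python
import re

# Letters, numbers, '-', '_', and '.' only (assuming ASCII letters).
_VALID_NAME = re.compile(r"[A-Za-z0-9._-]+")


def normalize_sample_name(name):
    """Strip trailing whitespace and reject names with other characters."""
    name = name.rstrip()
    if not _VALID_NAME.fullmatch(name):
        raise ValueError("invalid sample name: %r" % (name,))
    return name


print(normalize_sample_name("horse_01.v2  "))
```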
Fixed
- Fixed validation nodes failing on output paths without a directory.
- Fixed possible uncaught exceptions when terminating cat commands used by FASTQ validation nodes resulting in loss of error messages.
- Fixed makefile validation failing with an unhandled TypeError if unhashable types were found in unexpected locations. For example, a dict found where a subset of strings were allowed. These now result in a proper MakeFileError.
- Fixed user options in the 'BWA' section of the BAM Pipeline makefiles not being correctly applied when using the 'mem' or the 'bwasw' algorithms.
- Fixed some unit tests failing when the environment caused getlogin to fail.
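The makefile validation fix above concerns a general Python pitfall: testing whether an unhashable value is a member of a set raises TypeError. The sketch below shows one way to turn that into a proper validation error. The exception class is a local stand-in, and the allowed values and function name are hypothetical.

```python
class MakeFileError(Exception):
    """Local stand-in for the error type reported to the user."""


ALLOWED_VALUES = frozenset(("yes", "no", "auto"))  # hypothetical values


def validate_choice(value):
    try:
        is_valid = value in ALLOWED_VALUES
    except TypeError:
        # Unhashable values such as dicts and lists cannot be looked up
        # in a set; treat them as invalid instead of crashing.
        is_valid = False
    if not is_valid:
        raise MakeFileError(
            "expected one of %s, found %r" % (sorted(ALLOWED_VALUES), value)
        )
    return value


print(validate_choice("yes"))
```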
Published by MikkelSchubert about 9 years ago
paleomix - PALEOMIX v1.2.7
Added
- PALEOMIX now includes the 'Zonkey' pipeline, a pipeline for detecting equine F1 hybrids from archeological remains. Usage is described in the documentation.
Changed
- The wrongly named per-sample option 'Gender' in the phylogenetic pipeline makefile has been replaced with a 'Sex' option. This does not break backwards compatibility, and makefiles using the old name will still work correctly.
- The 'RescaleQualities' option has been merged with the 'mapDamage' Feature in the BAM pipeline makefile. The 'mapDamage' feature now takes the options 'plot', 'model', and 'rescale', allowing more fine-grained control.
Fixed
- Fixed the phylogenetic pipeline complaining about missing sample genders (now sex) if no regions of interest had been specified. The pipeline will now complain about there being no regions of interest, instead.
- The 'random sampling' genotyper would misinterpret mapping qualities 10 (encoded as '+') and 12 (encoded as '-') as indels, resulting in the genotyping failing. These mapping qualities are now correctly ignored.
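The two characters named above follow from encoding quality values as printable characters. The sketch assumes the usual offset of 33; the function name is hypothetical.

```python
PHRED_OFFSET = 33  # assumed offset for printable quality characters


def encode_quality(quality):
    """Encode a quality value as a single printable character."""
    return chr(quality + PHRED_OFFSET)


# Prints "+ -": the same characters that mark insertions and deletions
# in pileup output, which explains the misinterpretation.
print(encode_quality(10), encode_quality(12))
```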
Published by MikkelSchubert over 9 years ago
paleomix - Minor revision
Changed
- PALEOMIX now uses the 'setproctitle' module for better compatibility; installing / upgrading PALEOMIX using pip (or equivalent tools) should automatically install this dependency.
Fixed
- mapDamage plots should not require indexed BAMs; this fixed missing file errors for some makefile configurations.
- Version check for java now works correctly for OpenJDK JVMs.
- Pressing 'l' or 'L' to list the currently running tasks now correctly reports the total runtime of the pipeline, rather than 0s.
- Fixed broken version-check in setup.py breaking on versions of python older than 2.7, preventing a meaningful message (patch by beeso018).
- The total runtime is now correctly reported when pressing the 'l' key during execution of a pipeline.
- The logger will automatically create the output directory if this does not already exist; previously logged messages could cause the pipeline to fail, even if these were not in themselves fatal.
- Executables required for version checks are now included in the prior checks for missing executables, to avoid version-checks failing due to missing executables.
Added
- PALEOMIX will attempt to automatically limit the per-process maximum number of file-handles used when invoking Picard tools, in order to prevent failures due to exceeding the system limits (ulimit -n).
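A limit of this kind can be derived from the soft limit reported by the operating system. The sketch below is an assumption, not the PALEOMIX code: the headroom value, the default passed in, and both function names are hypothetical, and the 'resource' module is only available on POSIX systems.

```python
import resource

HEADROOM = 32  # hypothetical number of handles reserved for other uses


def current_soft_limit():
    """Return the soft limit on open files, or None if it is unlimited."""
    # RLIMIT_NOFILE corresponds to 'ulimit -n' on POSIX systems.
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


def max_open_files(soft_limit, default, headroom=HEADROOM):
    """Return a file-handle limit that never exceeds 'default' and that
    leaves 'headroom' handles free below the soft limit."""
    if soft_limit is None:
        return default
    return max(1, min(default, soft_limit - headroom))


# The default of 8000 is a made-up value for illustration.
print(max_open_files(current_soft_limit(), default=8000))
```

Capping the result at the default mirrors the later note that the value is never raised above the default used by the Picard tools.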
Published by MikkelSchubert over 9 years ago
paleomix - PALEOMIX v1.2.5
Minor revision:
Changed
- Improved information capture when a node raises an unexpected exception, mainly for nodes implementing their own 'run' function (not CommandNodes).
- Improved printing of the state of output files when using the command-line option --list-output-files. Outdated files are now always listed as outdated, where previously these could be listed as 'Missing' if the task in question was queued to be run next.
- Don't attempt to validate prefixes when running 'trim_pipeline'; note that the structure of the Prefix section of the makefile still has to be valid.
- Reverted commit normalizing the strand of unmapped reads.
- The commands 'paleomix coverage' and 'paleomix depths' now accept records lacking read-group information by default; these are recorded with a placeholder value in the sample and library columns. It is further possible to ignore all read-group information using the --ignore-readgroups command-line option.
- The 'bam_pipeline mkfile' command now does limited validation of input 'SampleSheet.csv', prints generated targets sorted alphabetically, and automatically generates unique names for identically named lanes. Finally, the target template is not included when automatically generating a makefile.
- The 'coverage' and 'depth' commands are now capable of processing files containing reads with and without read-groups, without requiring the use of the --ignore-readgroups command-line option. Furthermore, reads for which the read-group is missing in the BAM header are treated as if no readgroup was specified for that read.
- The 'coverage' and 'depth' commands now check that input BAM files are sorted during startup and while processing a file.
- Normalized information printed by different progress UIs (--progress-ui), and included the maximum number of threads allowed.
- Restructured CHANGELOG based on http://keepachangelog.com/
Fixed
- Fixed mislabeling of BWA nodes; all were labeled as 'SE'.
- Terminate read duplication checks when reaching the trailing, unmapped reads; this fixes uncontrolled memory growth when an alignment produces a large number of unmapped reads.
- Fixed the pipeline demanding the existence of files from lanes that had been entirely excluded due to ExcludeReads settings.
- Fixed some tasks needlessly depending on BAM files being indexed (e.g. depth histograms of a single BAM), resulting in missing file errors for certain makefile configurations.
- Fixed per-prefix scan for duplicate input data not being run if no BAMs were set to be generated in the makefile, i.e. if both 'RawBAM' and 'RealignedBAM' were set to 'off'.
Deprecated
- Removed the BAM file from the bam_pipeline example, and added deprecation warning; support for including preexisting BAMs will be removed in a future version of PALEOMIX.
Published by MikkelSchubert about 10 years ago
paleomix - PALEOMIX v1.2.4
Minor revision:
- Fix regression causing 'fixmate' not to be run on paired-end reads. This would occasionally cause paired-end mapping to fail during validation.
- Include PATH in 'pipe.errors' file, to assist debugging of failed nodes.
Published by MikkelSchubert over 10 years ago
paleomix - PALEOMIX v1.2.3
Minor revision:
2016-03-11: Added the ability to the pipelines to output the list of input
files required for a given makefile, excluding any file built
by the pipeline itself. Use the --list-output-files command-
line option to view these.
2016-03-11: Updated 'bam_pipeline' makefile template; prefixes and targets
are described more explicitly, and values for the prefix are
commented out by default. The 'Label' option is no longer included in
the template, as it is considered deprecated.
2016-03-11: Allow the 'trim_pipeline' to be run on a makefile without any
prefixes; this eases use of this pipeline in the case where a
later mapping is not wanted.
2016-03-11: Improved handling of unmapped reads in 'paleomix cleanup';
additional flags (in particular 0x2; proper alignment) are now
cleared if the mate is unmapped, and unmapped reads are always
represented on the positive strand (clearing 0x10 and / or 0x20).
Published by MikkelSchubert over 10 years ago