Recent Releases of regulondbht-etl
regulondbht-etl - RegulonDB HT ETL 2.0.1
RegulonDB HT ETL
This version adapts the ETL to the changes in RegulonDB 13.6.0.
2.0.1 - 2025-03-18
Added
- Without changes.
Changed
- AuthorsData: commas in TSV files are now handled, and .bed files can now be read.
- RNA collection now generates metadata info.
- Growth Conditions can be loaded from the DatasetMetadata file and can handle arrays of GCs.
- Some input files start with an empty column. FIXED.
- Experiment IDs are now read correctly.
- Error when reading the Peaks score property. FIXED.
- NLP Growth Conditions adjusted to support more than one term.
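The TSV/BED reading changes above (commas inside fields, files that start with an empty column) can be illustrated with a minimal sketch. It assumes tab-delimited input and uses Python's standard `csv` module; the helper name `read_tab_rows` is hypothetical and is not the project's actual reader.

```python
import csv
from typing import Iterable, List


def read_tab_rows(lines: Iterable[str]) -> List[List[str]]:
    """Read tab-separated rows (TSV or BED) without splitting on commas.

    Hypothetical helper: if every non-empty row starts with an empty
    column (a leading tab), that column is dropped from all rows.
    """
    # Tab is the only delimiter, so "AraC,peak1" stays one field;
    # QUOTE_NONE keeps any quote characters literally.
    rows = [
        row
        for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
        if row  # skip blank lines, which csv yields as []
    ]
    # Drop a spurious leading empty column only when all rows have it.
    if rows and all(row[0] == "" for row in rows):
        rows = [row[1:] for row in rows]
    return rows
```

Because the delimiter is fixed to a tab, commas never act as separators, and the leading-column check only fires when the whole file shares the empty first column.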
Deprecated
- Without changes.
Removed
- Without changes.
Fixed
- Without changes.
Security
- Without changes.
Published by PhillBet over 1 year ago
regulondbht-etl - RegulonDB HT ETL 2.0.0
RegulonDB HT ETL
This version adapts the ETL to the changes in RegulonDB 13.5.0.
2.0.0 - 2024-10-24
Frequent updates to the input data required constant software updates, and the code was not properly modularized. To avoid future maintenance problems, we decided to rework the full codebase and make it more developer-friendly.
Added
- Feb, 2024
- Publications can be extracted from dataset PMIDs.
- Source Serie object.
- ObjectTested object.
- ObjectTested Genes.
- Sample object.
- LinkedDatasets object.
- ReleaseControl object.
- Temporal ID property.
- Reference Genome property.
- Assembly Genome ID property.
- Five Prime Enrichment property.
- Experiment Condition property.
- Cut Off property.
- Public Notes property.
- Source Reference Genome property.
- Collection Data object property.
- Metadata object property.
- Mar, 2024
- Final JSON files are generated with their respective names.
- External Reference property added to dataset object.
- Aug, 2024
- Metadata objects now have an ID.
- New utility function findmanyindictlist() added: finds dictionaries in a list of dictionaries by a given key.
- Oct, 2024
- Growth Conditions function that processes GCs from dataset metadata.
- Objects Tested now have AbbName property.
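The `findmanyindictlist()` utility is only described in one line, so the following is a minimal sketch of the behavior that description suggests. The argument order `(dict_list, key, value)` and the exact-equality matching rule are assumptions, not the project's actual signature, and the `datasetId`/`type` keys in the usage note are hypothetical.

```python
from typing import Any, Dict, List


def findmanyindictlist(
    dict_list: List[Dict[str, Any]], key: str, value: Any
) -> List[Dict[str, Any]]:
    """Return every dictionary in dict_list whose `key` equals `value`.

    Sketch only: signature and matching rule are assumed.
    Dictionaries that lack `key` never match, even when value is None.
    """
    return [item for item in dict_list if key in item and item[key] == value]
```

For example, with `datasets = [{"datasetId": "DS1", "type": "TFBINDING"}, {"datasetId": "DS2", "type": "TSS"}, {"datasetId": "DS3", "type": "TFBINDING"}]`, calling `findmanyindictlist(datasets, "type", "TFBINDING")` returns the DS1 and DS3 dictionaries, in their original order.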
Changed
- Feb, 2024
- New code structure is now implemented.
- Mar, 2024
- DB Links field contains DBName and DBLink
- Apr, 2024
- Properties that return a dictionary are now assembled at a higher level.
- Authors Data is processed in the dataset object.
- Uniformized Data is processed in the dataset object; tfBindings (sites, peaks) ready.
- May, 2024
- Uniformized Data is processed in the dataset object, TUs ready.
- Uniformized Data is processed in the dataset object, TSS ready.
- Uniformized Data is processed in the dataset object, TTS ready.
- Uniformized Data is processed in the dataset object, multiple sources ready.
- Utils module cleaned up (repeated functions, functions that should be classes, and unused functions).
- Added modules for Gene Expression processing with the uniformized data.
- Added class Summary; some of its properties still need to be defined and reviewed by the project manager.
- Aug, 2024
- GeneExpression's IDs are now more readable; properties without values were removed.
- Snakemake config files modified to fit the new project schema.
- Oct, 2024
- Updated Gene Expression Temp ID and ID format.
Deprecated
- Old code structure.
Removed
- May, 2024
- src/htetl/datasetmetadata.py
- src/htetl/peaksdatasets.py
- src/htetl/sitesdataset.py
- src/htetl/tssdatasets.py
- src/htetl/ttsdatasets.py
- src/htetl/tudatasets.py
- src/htetl/geneexpressiondatasetmetadata.py
- src/htetl/geneexp_datasets.py
- src/htetl/nlpgrowth_conditions.py
Fixed
- Feb, 2024
- ObjectTested returned null objects, which could not be read; the value must be a list.
- ObjectTested tried to process TFs that do not exist.
- Mar, 2024
- Sample Replicates IDs and Linked Datasets are now lists of lists of IDs.
- Aug, 2024
- TU objects were missing their name. Added.
- Sites' left and right positions were strings instead of integers. Fixed.
- NLPGC's terms were not object arrays, temporal IDs were generated incorrectly, and dataset IDs did not match the corresponding Dataset ID. Fixed.
- GeneExpression's temporal_id values were generated incorrectly, and dataset IDs did not match the corresponding Dataset ID. Fixed.
- UniformizedData Domain Objects Base were not adding dataset IDs. Fixed.
- UniformizedData Peaks and Sites were not adding dataset IDs. Fixed.
- Uniformized_Data TSS, TTS and TU LeftEnd and RightEnd properties were written incorrectly. Fixed.
- ObjectTested externalCrossReferenceID property was written incorrectly. Fixed.
- Genes were not generating geneNames correctly. Fixed.
- Many JSON files were generated with incorrect properties. Fixed.
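The fix for Sites' positions being strings instead of integers can be sketched as a small normalization step applied before the JSON output is written. The field names `leftEndPosition` and `rightEndPosition` and the helper `normalize_site_positions` are hypothetical; the project's real property names may differ.

```python
from typing import Any, Dict


def normalize_site_positions(site: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `site` with its position fields cast to int.

    Field names are hypothetical. Missing or empty positions are left
    untouched; the input dictionary is not modified.
    """
    fixed = dict(site)
    for field in ("leftEndPosition", "rightEndPosition"):
        if fixed.get(field) not in (None, ""):
            # int() also accepts surrounding whitespace, e.g. " 100 ".
            fixed[field] = int(fixed[field])
    return fixed
```

Returning a copy keeps the raw parsed record available for debugging while the normalized one goes to the JSON output.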
Security
- Without changes.
Published by PhillBet over 1 year ago
regulondbht-etl - RegulonDB HT-ETL 1.0.0
RegulonDB HT ETL
This version adapts the ETL to the changes in RegulonDB 12.0.0.
1.0.0 - 2023-10-24
Added
- New config file for genes by bnumbers.
Changed
- Gene Expression now reads a bnumbers file to get genes.
- Input files changed format and directory structure.
Deprecated
- Individual queries to MongoDB to get genes by bnumbers.
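Replacing per-gene MongoDB queries with a bnumbers file amounts to building an in-memory index once and resolving every gene against it. The sketch below assumes a tab-separated file with a header row containing `bnumber` and `gene_name` columns; that format and both helper names are hypothetical, since the actual file layout is not described here.

```python
import csv
from typing import Dict, Iterable, List, Optional


def load_bnumber_index(lines: Iterable[str]) -> Dict[str, str]:
    """Build a bnumber -> gene name index from tab-separated lines.

    Assumed (hypothetical) format: a header row with `bnumber` and
    `gene_name` columns, one gene per following row.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["bnumber"].strip(): row["gene_name"].strip() for row in reader}


def genes_for(bnumbers: Iterable[str], index: Dict[str, str]) -> List[Optional[str]]:
    """Resolve bnumbers with dictionary lookups instead of one query per gene.

    Unknown bnumbers map to None.
    """
    return [index.get(b) for b in bnumbers]
```

The index is loaded once per run, so resolving N genes costs N dictionary lookups rather than N database round trips.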
Removed
- Without changes.
Fixed
- Without changes.
Security
- Without changes.
Published by PhillBet over 2 years ago
regulondbht-etl - Pre-Release RegulonDBHT-ETL
HT ETL Changelog
This version is a pre-release of the extraction process for the previous HT data.
0.0.1 - 2023-09-14
Added
- In process
Changed
- In process
Deprecated
- In process
Fixed
- In process
To Fix
- In process
Published by PhillBet almost 3 years ago