Recent Releases of cdm-reader-mapper

cdm-reader-mapper - v2.1.0

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

New features and enhancements

  • implement wrapper functions read and write that call the appropriate function based on the mode argument (PR/238):

    • mode == "mdf"; calls cdm_reader_mapper.read_mdf
    • mode == "data"; calls cdm_reader_mapper.read_data or cdm_reader_mapper.write_data
    • mode == "tables"; calls cdm_reader_mapper.read_tables or cdm_reader_mapper.write_tables
  • optionally, call cdm_reader_mapper.read_tables with either source file or source directory path (PR/238).

  • apply attribute to DataBundle.data if attribute is not defined in DataBundle (PR/248).

  • apply pandas functions directly to DataBundle.data by calling DataBundle.<pandas-func> (PR/248).

  • make DataBundle support item assignment for DataBundle.data (PR/248).

  • optionally, apply selections to DataBundle.mask in DataBundle.select_* functions (PR/248).

  • cdm_reader.reader.read_tables: optionally, set null_label (PR/242)

  • new method function: DataBundle.select_where_all_false (PR/242)

  • new method functions: DataBundle.split_* which split a DataBundle into two new DataBundles containing data selected and rejected after user-defined selection criteria (PR/242)

    • DataBundle.split_by_boolean_true
    • DataBundle.split_by_boolean_false
    • DataBundle.split_by_column_entries
    • DataBundle.split_by_index
  • implement pandas indexers such as iloc for non-chunked data (PR/242)
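
The attribute forwarding and item assignment listed above can be pictured with a small stand-in class. This is only a sketch under stated assumptions: `MiniBundle` is a hypothetical name, not the library's `DataBundle`, and the real implementation may differ. It assumes that attributes missing on the wrapper are looked up on the wrapped `pandas.DataFrame` and that item assignment writes to that frame.

```python
import pandas as pd


class MiniBundle:
    """Hypothetical stand-in for a container that wraps a DataFrame."""

    def __init__(self, data):
        self.data = data

    def __getattr__(self, name):
        # Called only when normal lookup fails, so attributes defined on
        # MiniBundle itself take precedence over DataFrame attributes.
        if name == "data":
            # Avoid infinite recursion if `data` was never set.
            raise AttributeError(name)
        return getattr(self.data, name)

    def __setitem__(self, key, value):
        # Item assignment is forwarded to the wrapped DataFrame.
        self.data[key] = value


bundle = MiniBundle(pd.DataFrame({"a": [1, 2, 3]}))
bundle["b"] = [10, 20, 30]   # forwarded to bundle.data
shape = bundle.shape          # DataFrame attribute, resolved via __getattr__
first_rows = bundle.head(2)   # a plain DataFrame in this sketch
```

Unlike this sketch, the release notes state that DataBundle method functions return a DataBundle rather than a pandas.DataFrame, so the forwarded result would need to be wrapped again in a fuller version.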

Internal changes

  • cdm_reader_mapper.common.select: restructure, simplify and summarize functions (PR/242)
  • split DataBundle class into main class (cdm_reader_mapper.core._utilities) and method function class (cdm_reader_mapper.core.databundle) (PR/242)

Breaking changes

  • remove property tables from DataBundle object. Instead, DataBundle.map_model overwrites DataBundle.data (PR/238).
  • change the default value of overwrite from True to False, consistent with the pandas inplace argument, and rename overwrite to inplace (PR/238, PR/248).
  • calls with inplace=True return None, consistent with pandas (PR/242)
  • DataBundle method functions return a DataBundle instead of a pandas.DataFrame (PR/248).
  • DataBundle.select_* functions write only selected entries to DataBundle.data and do not take other list entries from common.select_* function returns into account (PR/248).
  • select functions do not reset indexes by default (PR/242)
  • rename DataBundle.select_* functions:

    • DataBundle.select_true -> DataBundle.select_where_all_boolean
    • DataBundle.select_from_list -> DataBundle.select_where_entry_isin
    • DataBundle.select_from_index -> DataBundle.select_where_index_isin
  • rename cdm_reader_mapper.common.select_* functions and make them return a tuple of selected and rejected data after user-defined selection criteria (PR/242):

    • select_true -> split_by_boolean_true
    • select_from_list -> split_by_column_entries
    • select_from_index -> split_by_index
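
The "selected and rejected" tuple and the pandas inplace convention mentioned above can be illustrated with plain pandas. This is a minimal sketch, not the library's code: `split_rows_where_all_true` and the sample column names are hypothetical, and the real split functions may take different arguments.

```python
import pandas as pd


def split_rows_where_all_true(data, mask):
    """Hypothetical helper: split `data` into (selected, rejected).

    A row is selected when every column of `mask` is True for it.
    The original index labels are kept in both parts (no reset).
    """
    keep = mask.all(axis=1)
    return data[keep], data[~keep]


data = pd.DataFrame({"sst": [10.5, 99.9, 12.1]}, index=[5, 6, 7])
mask = pd.DataFrame({"sst": [True, False, True]}, index=[5, 6, 7])

selected, rejected = split_rows_where_all_true(data, mask)

# pandas convention referred to in the notes: with inplace=True the
# object itself is modified and the call returns None.
frame = data.copy()
result = frame.rename(columns={"sst": "sea_surface_temp"}, inplace=True)
```

Here `selected` keeps index labels 5 and 7 and `rejected` keeps label 6, which mirrors the note that select functions no longer reset indexes by default.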

Bug fixes

  • cdm_reader_mapper.metmetpy: set deck keys from ??? to d??? in icoads json files, which makes values accessible again (PR/238).
  • cdm_reader_mapper.metmetpy: set imma1 to icoads and immt to gcc in icoads/gcc json files, which makes properties accessible again (PR/238).
  • DataBundle.copy function now makes a real deepcopy of DataBundle object (PR/248).
  • correct key index->section for self.df.attrs in open_netcdf (PR/252)
  • cdm_reader_mapper.map_model: return null_label if conversion fails (PR/242)
  • keep indexes during duplicate check (PR/242)

- Python
Published by ludwiglierhammer 11 months ago

cdm-reader-mapper - v2.0.1

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

Announcements

This release drops support for Python 3.9 and adds support for Python 3.13 (PR/228, PR/229)

New features and enhancements

  • add environment.yml file (PR/229)
  • cdm_reader_mapper now separates the optional dependencies into dev and docs recipes (PR/232).

    • $ python -m pip install cdm_reader_mapper # Install minimum dependency version
    • $ python -m pip install cdm_reader_mapper[dev] # Install optional development dependencies in addition
    • $ python -m pip install cdm_reader_mapper[docs] # Install optional dependencies for the documentation in addition
    • $ python -m pip install cdm_reader_mapper[all] # Install all the above for complete dependency version

Internal changes

  • GitHub workflow for testing_suite now uses uv for environment management, replacing micromamba (PR/228)
  • rename ci/requirements to CI and tidy up requirements/dependencies (PR/229)

- Python
Published by ludwiglierhammer about 1 year ago

cdm-reader-mapper - v2.0.0

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

New features and enhancements

  • New core DataBundle object including callable cdm_mapper, metmetpy and operations methods (#84, #188, #197)
  • Update readthedocs documentation (#191, #197)
  • new function: write_data to write MDF data and validation mask according to write_tables for writing CDM tables (#201)
  • new function: read_data to read MDF data and validation mask according to read_tables for reading CDM tables (#201)
  • new property: DataBundle.encoding (#222)
  • add overwrite option to some DataBundle method functions (#224)

Breaking changes

  • cdm_mapper: map_model returns pandas.DataFrame instead of CDM dictionary (#189)
  • cdm_mapper: rename function cdm_to_ascii to write_tables (#182, #185)
  • cdm_mapper: update parameter names and list of functions read_tables and write_tables (#185)
  • main cdm_mapper, mdf_reader and duplicates modules are directly callable from cdm_reader_mapper (#188)
  • new list of imported submodules: map_model, cdm_tables, read_tables, write_tables, duplicate_check and read_mdf
  • removed list of imported submodules: cdm_mapper, common, mdf_reader, metmetpy, operations
  • remove imported submodules from cdm_mapper, mdf_reader (#188)
  • read_tables: returning DataBundle object (#188)
  • read_tables: resulting dataframe always includes multi-indexed columns (#188)
  • duplicates is now a direct submodule of cdm_reader_mapper (#188)
  • import read function from mdf_reader.read as read_mdf (#188)
  • read_mdf: returning DataBundle object (#188)
  • read_mdf: remove parameter out_path to dump attribute information on disk (#201)
  • move function open_code_table from common.json_dict to cdm_mapper.codes.codes (#221)
  • move operations to common (#224)
  • cdm_mapper: rename tablewriter to writer and tablereader to reader (#224)
  • mdf_reader: rename write to writer and read to reader (#224)
  • metmetpy: gather correction functions to correct module and validation functions to validate module (#224)
  • DataBundle: remove properties selected, deselected, tablesdupflagged and tablesdupsremoved (#224)

Internal changes

  • cdm_mapper: move dtype conversion from write_tables to new submodule _conversions of map_model (#189)
  • cdm_mapper: rename mappings to _mapping_functions (#189)
  • cdm_mapper: move mapping functions from mapper to new submodule _mappings (#189)
  • cdm_mapper: save utility functions from table_reader.py and table_writer.py to _utilities.py (#185)
  • reduce complexity of several functions (#25, #200):

    • mdf_reader.read.read
    • mdf_reader.validate.validate
    • mdf_reader.utils.decoders.signed_overpunch
    • cdm_mapper._mappings._mapping
    • metmetpy.station_id.validate.validate
  • split mdf_reader.utils.auxiliary into mdf_reader.utils.filereader, mdf_reader.utils.configurator and mdf_reader.utils.utilities (#25, #200)

  • simplify cdm_mapper.read_tables function (#192)

  • mdf_reader: Refactored Configurator class, Configurator.open_pandas method, to handle looping through rows (#208, #210)

  • mdf_reader: Refactored Configurator class, Configurator.open_data method, to avoid creating a pre-validation missing_value mask (#216)

  • mdf_reader: move validate to utils.validators (#216)

  • mdf_reader: no need for multi-column key codes (e.g. ("core", "VS")) (#221)

  • mdf_reader.utils.validator: simplify function code_validation (#221)

  • cdm_mapper.codes.common: convert range-key properties to list (#221)

  • testing_suite: new chunksize test with icoadsr300d721 (#222)

  • mdf_reader, cdm_mapper: use model-dependent encoding when writing data to disk (#222)

  • code restructuring (#224)

  • remove unused functions and methods (#224)

Bug fixes

  • Solve SettingWithCopyWarning (#151, #184)
  • mdf_reader: utils.converters.decode now returns the decoded values instead of only None (#214)
  • mdf_reader: fix misreading of data caused by German umlauts (#212, #214, #222)

- Python
Published by ludwiglierhammer about 1 year ago

cdm-reader-mapper - v1.0.2

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer)

Announcements

  • New PyPI Classifiers:

    • Development Status :: 5 - Production/Stable
    • Intended Audience :: Science/Research
    • License :: OSI Approved :: Apache Software License
    • Operating System :: OS Independent

- Python
Published by ludwiglierhammer over 1 year ago

cdm-reader-mapper - v1.0.1

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer)

Announcements

  • set package version to v1.0.1

- Python
Published by ludwiglierhammer over 1 year ago

cdm-reader-mapper - v1.0.0

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer)

Announcements

  • Final version used for GLAMOD marine processing release 7.0

Bug fixes

  • cdm_mapper: Two reports that describe each other as best duplicates are not flagged as duplicates (DupDetect) (PR #149)
  • cdm_mapper: Reindex only if null values available (DupDetect) (PR #153)

- Python
Published by ludwiglierhammer over 1 year ago

cdm-reader-mapper - v0.4.3

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer)

Announcements

  • First release on PyPI (GH #17)
  • First release on Zenodo (GH #18)

- Python
Published by ludwiglierhammer over 1 year ago

cdm-reader-mapper - v0.4.2

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer)

Announcements

  • First release on PyPI (GH #17)
  • First release on Zenodo (GH #18)

- Python
Published by ludwiglierhammer over 1 year ago

cdm-reader-mapper - v0.4.1

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer)

Announcements

  • First release on PyPI (GH #17)
  • First release on Zenodo (GH #18)

- Python
Published by ludwiglierhammer over 1 year ago

cdm-reader-mapper - v0.4.0

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

Announcements

  • Now under Apache v2.0 license (PR #69)
  • First release on PyPI (GH #17)
  • First release on Zenodo (GH #18)

New features and enhancements

  • common.getting_files.load_file: optionally, load data within data reference syntax (PR #41)
  • common.getting_files.load_file: optionally, clear cache directory (PR #45)
  • reworked readthedocs documentation for gathered cdm_reader_mapper package (GH #19, PR #83)
  • mdf_reader: new validation function for datetime objects (PR #89)
  • mdf_reader: select time period with new arguments year_init and year_end (PR #98)
  • cdm_mapper: duplicate check using recordlinkage (PR #81)
  • mdf_reader.read: optionally, set left and right time bounds (year_init and year_end) (GH #11, PR #97)
  • mdf_reader.read: optionally, set both external schema and code table paths and external schema file (GH #47, PR #111)
  • cdm_mapper: change both columns history and report_quality during duplicate_check (PR #112)
  • cdm_mapper: optionally, set column names to be ignored during duplicate check (PR #115)
  • cdm_mapper: optionally, set offset values for duplicate_check (PR #119)
  • cdm_mapper: optionally, set column entries to be ignored during duplicate_check (PR #119)
  • cdm_mapper: add both column names station_speed and station_course to default duplicate check list (PR #119)
  • cdm_mapper: optionally, re-index data in ascending order according to the number of nulls in each row (PR #119)

Breaking changes

  • set chunksize from 10000 to 3 in testing suite (PR #35)
  • cdm_mapper: read header column location_quality from (c1, LZ) and set fill_value to 0 (GH #36, PR #37)
  • cdm_mapper: set default value of header column report_quality to 2 (GH #36, PR #37)
  • reading C-RAID data: set decimal places according to input file data precision (PR #60)
  • always convert data types of both int and float in schemas into default data types (GH #59, PR #60)
  • cdm_mapper.map_model: call function without input parameter data_atts (GH #66, PR #67)
  • decimal_places information is moved from mdf_reader.schema to cdm_mapper.tables; decimal_places in user-given schemas will be ignored (GH #66, PR #67)
  • cdm_mapper does not need any attribute information from mdf_reader (GH #66, PR #67)
  • cdm_mapper: map ICOADS wind direction data (361->0; 362->np.nan) (PR #82)
  • cdm_mapper: set fill_value to UNKNOWN for C-RAID's primary_station_id (PR #93)
  • cdm_mapper: map C-RAID quality flags to CDM quality flags (PR #94)
  • mdf_reader: summarize schema and code tables (GH #11, PR #97)
  • mdf_reader: rename c_raid to craid, gcc_immt to gcc and imma1 to icoads (GH #11, PR #97)
  • cdm_mapper: summarize tables and code tables (GH #11, PR #97)
  • cdm_mapper: rename c_raid to craid and gcc_mapping to gcc (GH #11, PR #97)
  • metmetpy: rename immt to gcc and imma to icoads (GH #11, PR #97)
  • cdm_mapper.map_model: use standardized imodel_name (e.g. icoads_r300_d701) (GH #11, PR #97)
  • mdf_reader.read: use standardized imodel_name as <data_model> (e.g. icoads_r300_d701) (GH #11, PR #97)
  • mdf_reader: (core, VS) set column_type to key for all ICOADS decks (GH #11, PR #97)
  • cdm_mapper: rename pub47_noc mapping to pub47 (PR #102)
  • Note by each function call: rename data_model into imodel, e.g. imodel=icoads_r300_d704 (PR #103)
  • cdm_mapper.map_model: call with (data, imodel=imodel) (PR #103)
  • mdf_reader.read: call with (source, imodel=imodel) (PR #103)
  • Re-order arguments to mdf_reader.validate, and create argument for ext_table_path (PR #105)
  • operations: delete corrections module (PR #104)
  • cdm_mapper: duplicate check is available for header table only (PR #115)
  • cdm_mapper: set report_quality to 1 for bad duplicates (PR #115)
  • cdm_mapper: set default primary_station_id to 4 for C-RAID mapping (GH #117, PR #121)
  • renamed some element names in icoads_r300_d730 schema for consistency (InsName to InstName, InsPlace to InstPlace, InsLand to InstLand, No_data_entry to NumArchiveSet) (PR #110)
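
The wind direction mapping listed above (361 to 0, 362 to missing) can be sketched with plain pandas. This is an illustration only: the sample values are made up, and the library's own mapping function may be implemented differently.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of wind direction values
wind_direction = pd.Series([90.0, 361.0, 362.0, 270.0])

# 361 -> 0 and 362 -> missing value, as stated in the release note;
# all other values are left unchanged.
mapped = wind_direction.replace({361.0: 0.0, 362.0: np.nan})
```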

Internal changes

  • replace deprecated datetime.datetime.utcnow() with datetime.datetime.now(datetime.UTC) (see: https://github.com/python/cpython/issues/103857) (PR #39, PR #43)
  • make use of cdm-testdata release v2024.06.07 https://github.com/glamod/cdm-testdata/releases/tag/v2024.06.07 (GH #44, PR #45)
  • migration to setup-micromamba: https://github.com/mamba-org/provision-with-micromamba#migration-to-setup-micromamba (PR #48)
  • update actions to use Node.js 20: https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#example-using-versioned-actions (PR #48)
  • mdf_reader.auxiliary.utils: rename variable for missing values to missing_values (PR #56)
  • add pre-commit hooks: codespell, pylint and vulture (PR #56)
  • use pytest.parametrize for testing suite (PR #61)
  • use ast.literal_eval instead of eval (PR #64)
  • remove unused code tables in mdf_reader (GH #10, PR #65)
  • cdm_mapper.mappings: use datetime to convert float into hours and minutes.
  • add FOSSA license scanning to github workflows (PR #80)
  • add cdm_reader_mapper author list including ORCID iD's (PR #38, PR #49)
  • mdf_reader: replace empty strings with missing values (PR #89)
  • metmetpy: use function overwrite_data in all platform type correction functions (PR #89)
  • rename data_model into imodel (PR #103)
  • implement assertion tests for module operations (PR #104)
  • cdm_mapper: put settings for duplicate check in duplicatesettings (PR #119)
  • cdm_mapper: use pandas.apply function instead of for loops in duplicate_check (PR #119)
  • adding some more duplicate checks to testing suite (PR #119)
  • cdm_mapper: re-adding consideration of indexes of nan values during transformation (PR #125)
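
Two of the internal changes above are standard-library substitutions that are easy to show in isolation. The following is a generic sketch, not code from the package; the dictionary string is a made-up example input.

```python
import ast
import datetime

# Timezone-aware replacement for the deprecated datetime.datetime.utcnow().
# datetime.UTC is an alias of datetime.timezone.utc added in Python 3.11;
# timezone.utc is used here so the sketch also runs on older versions.
now_utc = datetime.datetime.now(datetime.timezone.utc)

# ast.literal_eval only accepts Python literals, so arbitrary expressions
# such as function calls are rejected instead of being executed.
parsed = ast.literal_eval("{'sections': ['core', 'c1'], 'chunksize': 3}")

try:
    ast.literal_eval("__import__('os').getcwd()")
    call_rejected = False
except ValueError:
    call_rejected = True
```

The practical difference is that `now_utc` carries explicit timezone information, while `utcnow()` returned a naive datetime, and that `literal_eval` refuses input that `eval` would have run.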

Bug fixes

  • indexing working with user-given chunksize (PR #35)
  • fix reading of custom schema in mdf_reader.read (PR #40)
  • ensure format schema field for delimited files is passed correctly, avoiding "...Please specify either format or field_layout in your header schema..." error (PR #40)
  • there is a loss of data precision due to data type conversion. Hence, use default data types of both int and float (GH #59, PR #60)
  • reading C-RAID data: adjust datetime formats to read dates into MDFFileReader (PR #60)
  • ensure external code tables are used when using an external schema in mdf_reader.read (PR #105)
  • update readme and example Jupyter notebooks to reflect PR #103 (PR #110)
  • restructure CLIWOC_datamodel Jupyter notebook to add an example of data model construction (PR #110)
  • remove create_data_model.ipynb example Jupyter notebook (PR #110)

- Python
Published by ludwiglierhammer over 1 year ago

cdm-reader-mapper - v0.3.0

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

New features and enhancements

  • mdf_reader: read C-RAID netCDF buoy data (GH #13, PR #24, PR #28)
  • adding both GCC IMMT and C-RAID netCDF data to test_data (PR #24, PR #28)
  • cdm_mapper: adding C-RAID mapping and code tables (GH #13, PR #28)
  • cdm_mapper: add load_tables to __init__.py (PR #32)

Breaking changes

  • adding tests for IMMT and C-RAID data (GH #26, PR #24, PR #28)
  • cdm_mapper.map_model: drop duplicated lines in pd.DataFrame before writing CDM table on disk (PR #28)
  • add pyarrow (see: https://github.com/pandas-dev/pandas/issues/54466) to requirements
  • solving pyarrow-snappy issue (see: openforcefield/openff-nagl#106) (GH #33, PR #28, PR #34)

Internal changes

  • do not differentiate between tuple and single column names (PR #24)
  • metmetpy: do not raise errors if validate_datetime, correct_datetime, correct_pt and/or validate_id do not find any entries (PR #24)
  • get rid of warnings (GH #9, PR #27)
  • adding python 3.12 to testing suite (PR #29)
  • set time out for testing suite to 10 minutes (PR #29)

Bug fixes

  • cdm_mapper: set debugging logger into if statement (PR #24)
  • cdm_mapper: do not use code table qc_flag with report_id (PR #24)
  • metmetpy: fixing ICOADS 30000 NRT functions for pandas>=2.2.0 (PR #31)
  • cdm_mapper.read_tables: if table not available return empty pd.DataFrame (PR #32)

- Python
Published by ludwiglierhammer almost 2 years ago

cdm-reader-mapper - v0.2.0

Contributors to this version: Ludwig Lierhammer (@ludwiglierhammer) and Joseph Siddons (@jtsiddons)

Breaking changes

  • move converters and decoders from common to mdf_reader/utils (PR #3)
  • delete redundant functions from cdm_reader_mapper.common
  • cdm_reader_mapper: import common in __init__.py
  • remove unused modules from metmetpy
  • cdm_reader_mapper.mdf_reader split datamodels into codetables and schema
  • logging: Allow for use of log file (PR #6)
  • cannot use as command-line tool anymore (PR #22)
  • outsource input and result data to cdm-testdata (GH #16, PR #21)

Internal changes

  • adding tests to cdm_reader_mapper testing suite (GH #12, PR #2, #20, #22)
  • adding testing result data (PR #4)
  • use slugify instead of unidecode for licensing reasons
  • remove pip install instruction (PR #2)
  • HISTORY.rst has been renamed CHANGES.rst, to follow xclim-like conventions (PR #7).
  • speed up mapping functions with swifter (PR #4)
  • mdf_reader: adding auxiliary functions and classes (PR #4)
  • mdf_reader: read tables line-by-line (PR #20)

Bug fixes

  • Fixed an issue with missing conda dependencies in the cdm_reader_mapper documentation (PR #14)

- Python
Published by ludwiglierhammer almost 2 years ago