Recent Releases of lydata

lydata - 0.2.5

What's New

A little maintenance and a small bug fix.

๐Ÿ› Bug Fixes

  • get_repo() did not return repo
  • Respect "method" kwarg in combining mods

🧪 Testing

  • Run dvc repro to check new lyscripts

โš™๏ธ Miscellaneous Tasks

  • Bump requirements

Change

  • Slightly improve logging

- Jupyter Notebook
Published by rmnldwg about 1 year ago

lydata - 0.2.4

What's New

Just some improvements to the docs and the switch to loguru for logging. No functionality should have been altered.

📚 Documentation

  • Add __repr__ & explanation to C
  • Mention private attribute _column_map
  • Mention execute method of Q objects
  • Fix unfinished sentence in utils

Change

  • In __repr__, add parentheses around combination of AndQ and OrQ.
  • Switch to loguru for logging

Published by rmnldwg about 1 year ago

lydata -

What's New

🚀 Features

  • Add central to short name columns

๐Ÿ› Bug Fixes

  • & and | with None return original Q. Previously, Q(...) | None would return a query that evaluated to True everywhere.
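The intended behavior can be illustrated with a minimal, self-contained sketch. The class below is a hypothetical stand-in and not lydata's actual Q implementation; the names ToyQ and matches are assumptions made only for this illustration.

```python
from __future__ import annotations

from typing import Callable


class ToyQ:
    """Hypothetical stand-in for a query object (not lydata's real Q)."""

    def __init__(self, predicate: Callable[[dict], bool]) -> None:
        self.predicate = predicate

    def matches(self, row: dict) -> bool:
        return self.predicate(row)

    def __and__(self, other: ToyQ | None) -> ToyQ:
        # Combining with None is a no-op: hand back the original query.
        if other is None:
            return self
        return ToyQ(lambda row: self.matches(row) and other.matches(row))

    def __or__(self, other: ToyQ | None) -> ToyQ:
        # Same rule for OR, so `q | None` does not match every row.
        if other is None:
            return self
        return ToyQ(lambda row: self.matches(row) or other.matches(row))


is_adult = ToyQ(lambda row: row["age"] >= 18)
same_query = is_adult | None
```

In this sketch, same_query is the very same object as is_adult, so a row with age 10 is still rejected instead of being matched by an always-true query.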

📚 Documentation

  • List defined operators on Q (&, |, ~, ==) in the docstring of CombineQMixin.

🧪 Testing

  • Ensure that & and | with None return original Q.

Published by rmnldwg about 1 year ago

lydata -

What's New

Another bug fix: Previously, sub- and superlevel involvement was only computed for columns not already present in the table. Now, it is by default computed and correctly replaces unknown values.
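A rough sketch of the idea, in plain Python rather than pandas: the inference rule used here (a superlevel is involved if any sublevel is involved, healthy only if all sublevels are healthy, unknown otherwise) is an assumption for illustration, and the helper names infer_superlevel and fill_unknown are hypothetical, not part of lydata.

```python
from __future__ import annotations


def infer_superlevel(sublevels: list[bool | None]) -> bool | None:
    """Infer superlevel involvement from its sublevels (assumed rule)."""
    if any(status is True for status in sublevels):
        return True  # one involved sublevel makes the superlevel involved
    if sublevels and all(status is False for status in sublevels):
        return False  # healthy only if every sublevel is known to be healthy
    return None  # otherwise the status stays unknown


def fill_unknown(present: bool | None, inferred: bool | None) -> bool | None:
    """Keep a known value; replace only an unknown (None) one."""
    return inferred if present is None else present


# Hypothetical patient row: level II already exists as a column, but is unknown.
row = {"II": None, "IIa": False, "IIb": True}
row["II"] = fill_unknown(row["II"], infer_superlevel([row["IIa"], row["IIb"]]))
```

The point of the sketch is that an existing column is no longer skipped: the inference always runs, and its result fills in only those entries that were unknown.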

🚀 Features

  • (utils) Add better update func for pandas

๐Ÿ› Bug Fixes

  • Order of sub-/superlevel inference
  • Don't ignore present sub-/superlvl cols

Published by rmnldwg about 1 year ago

lydata -

What's New

This release fixes a bug where completely unobserved LNLs would be reported as healthy when using the ly.combine() method. Also, this method is now roughly 20x faster than before 🚀

๐Ÿ› Bug Fixes

  • If an LNL of a patient was unobserved (i.e., all diagnoses None), then the method ly.combine() returns None for that patient's LNL. Fixes #13
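The fixed behavior can be sketched for a single LNL of a single patient. The function below is a hypothetical stand-in that combines diagnoses with a simple logical OR; ly.combine() itself operates on tables and supports other combination methods, so this only illustrates how the all-None case is handled.

```python
from __future__ import annotations


def combine_any(diagnoses: list[bool | None]) -> bool | None:
    """Combine diagnoses of one LNL with a simple logical OR (stand-in rule)."""
    observed = [d for d in diagnoses if d is not None]
    if not observed:
        return None  # nothing was observed, so the result stays unknown
    return any(observed)


unobserved = combine_any([None, None])
partly_observed = combine_any([None, False])
```

Here unobserved stays None instead of being reported as healthy, while partly_observed is False because one modality did report a healthy LNL.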

🧪 Testing

  • Change the doctest of ly.combine() to check whether #13 was fixed.

Published by rmnldwg about 1 year ago

lydata -

What's New

This is a clean-up update. Some stuff I thought might be useful turned out to be unnecessary, while other things got better names. Two small features have also made it.

🚀 Features

  • Can now combine Q with None to yield Q again.
  • Add contains operator to C, Q objects. This calls pandas' str.contains method.

🧪 Testing

  • Fix wrong name in doctests

Change

  • [breaking] Add, rename, delete several methods:
    • LyDatasetConfig is now just LyDataset
    • the path property is now path_on_disk
    • the get_url() method has been removed
    • the get_description() method has been removed
    • added get_content_file() method to fetch and store remote content
    • load() was renamed to get_dataframe()
    • the repo argument was changed to repo_name
  • (utils) [breaking] Rename enhance func to infer_and_combine_levels.

Remove

  • [breaking] Two unused funcs for markdown processing were removed
  • (load) [breaking] Drop join_datasets, since it's not needed. All it did was run pd.concat(...).

Published by rmnldwg over 1 year ago

lydata -

What's New

Just a quick hotfix.

๐Ÿ› Bug Fixes

  • (load) Fix a bug where datasets with multiple subsites (e.g. 2024-umcg-hypopharynx-larynx) would cause an error because of a missing maxsplit=2 argument.
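The effect of the missing argument can be reproduced with plain str.split. Unpacking the name into exactly year, institution, and subsite is an assumption about how the dataset name is parsed; the snippet only shows why a hyphenated subsite needs maxsplit=2.

```python
name = "2024-umcg-hypopharynx-larynx"

# Without a limit, the hyphen inside the subsite part is split as well,
# so unpacking into three variables would raise a ValueError.
too_many = name.split("-")

# With maxsplit=2, everything after the second hyphen stays together.
year, institution, subsite = name.split("-", maxsplit=2)
```

With the limit in place, subsite holds the full "hypopharynx-larynx" string, whereas the unlimited split yields four parts.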

Published by rmnldwg over 1 year ago

lydata -

What's New

Small features and refactorings.

🚀 Features

  • (load) add get_repo() method that fetches remote repository information for a LyDatasetConfig
  • (load) make authentication more flexible
  • (utils) put sub-/superlevel inference in its own utility function

Published by rmnldwg over 1 year ago

lydata -

What's New

With this release, we make the switch from rapidly evolving 0.0.X versions to something that changes a little more slowly. However, we still consider the library experimental, and breaking changes may still occur frequently.

🚀 Features

  • (utils) Add often needed enhance function to complete sub-/superlevel involvement and infer maximum likelihood status.

๐Ÿ› Bug Fixes

  • Avoid KeyError in infer_superlevels

โš™๏ธ Miscellaneous Tasks

  • Add link to release 0.0.4

Change

  • infer_su(b|per)levels skips inferring involvement of sub-/super LNLs that are already present
  • (load) Rename skip_disk to use_github
  • (query) Rename in_ to isin for C object

Published by rmnldwg over 1 year ago

lydata -

What's New

🚀 Features

  • [breaking] Make several helper functions private (e.g., _max_likelihood())
  • (utils) Add more shortname columns, like surgery for ("patient", "#", "neck_dissection")
  • (load) Allow search for datasets at different locations on disk
  • (query) Add C object for easier Q creation
  • (query) Add in_ to C object
  • (validate) Add transform_to_lyprox function

๐Ÿ› Bug Fixes

  • (load) Resolve circular import of _repo

📚 Documentation

  • Add intersphinx mapping to pandera
  • Expand module docstrings
  • Update README.md with library examples

🧪 Testing

  • Fix failure due to changing order of items in set

Change

  • (validate) Add args to renamed validation
  • Import useful stuff as top-level
  • Make main() funcs private

Remove

  • (load) [breaking] load_dataset() not needed, one can just use next(load_datasets())
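The replacement pattern relies on Python's built-in next(), which pulls the first item from an iterator. The generator below is a hypothetical stand-in for load_datasets(); its name, argument, and return values are assumptions made only so the snippet is self-contained.

```python
from typing import Iterator


def toy_load_datasets(names: list[str]) -> Iterator[str]:
    """Hypothetical generator standing in for load_datasets()."""
    for name in names:
        yield f"dataset:{name}"


# Taking only the first item replaces a dedicated single-dataset loader.
first = next(toy_load_datasets(["2021-usz-oropharynx", "2021-clb-oropharynx"]))
```

Note that next() raises StopIteration when the iterator is empty, so passing a default, as in next(iterator, None), may be preferable when no dataset might match.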

Published by rmnldwg over 1 year ago

lydata -

What's New

🚀 Features

  • Add method to infer sublevel involvement #2
  • Add method to infer superlevel involvement #2
  • (load) Allow loading from different repository and/or reference (tag, commit, ...) #4

๐Ÿ› Bug Fixes

  • Make align_diagnoses() safer
  • Make combine() method work as intended
  • (load) Year may be equal to current year, not only smaller

📚 Documentation

  • Make accessor method docstring more detailed
  • Mention pandas' update() in methods

โš™๏ธ Miscellaneous Tasks

  • Add documentation link to metadata
  • Add changelog
  • Remove pyright setting (where from?)
  • Ignore B028 ruff rule

Published by rmnldwg over 1 year ago

lydata -

[!WARNING] This is still very much experimental. Anything might change at any time.

What's New

🚀 Features

  • Add some basic logging
  • Add percent and invert to portion

📚 Documentation

  • Host documentation on https://lydata.readthedocs.io
  • Ensure intersphinx links work

🧪 Testing

  • Add doctest to join_datasets()

Change

  • Switch to pydantic for dataset definition
  • Shorten accessor name to ly

Refac

  • Make load funcs/methods clean & consistent

Published by rmnldwg over 1 year ago

lydata - 2023 CLB Multisite v2

This updates the previously published dataset due to an error in the diagnostic_consensus column. The data did not actually come with this information, but we assume the diagnosis based on imaging to be negative (i.e., healthy) when no neck dissection was performed. However, due to a bug, it was also set to "healthy" when the pathology after neck dissection reported a healthy LNL. This is obviously wrong and was corrected here.

The full diff can be found here. Note that this also includes many other changes to other datasets since the first release.

Published by rmnldwg over 1 year ago

lydata -

First Version 🥳

This marks the first (highly experimental) release of a little Python package for loading, accessing, querying, and validating the datasets in this repo.

If it turns out to be useful, I'll continue to develop this into a mature set of tools that could ultimately deduplicate a lot of code in the lyprox and lyscripts repos.

Published by rmnldwg over 1 year ago

lydata - 2021 USZ Oropharynx v2

With this release, we update the previous one about the same dataset.

Note that the data itself remains unchanged, but we spotted a bug in the figures.ipynb that is supposed to reproduce the plots. That bug was fixed with this release.

Full diff to previous release (also includes the diff to the 2021-clb-oropharynx dataset, which was added in the meantime)

Published by rmnldwg over 3 years ago

lydata - 2021 USZ Oropharynx

This release contains the detailed patterns of lymphatic progression of 287 patients with squamous cell carcinomas (SCCs) in the oropharynx, treated at the University Hospital Zurich (USZ) between 2013 and 2019.

The archive 2021-usz-oropharynx.zip contains

  • a README.md that explains how the dataset was extracted and what it contains
  • the data itself as data.csv
  • a citation file CITATION.cff that can be used to cite this dataset (one may also cite our Data in Brief article).
  • a jupyter notebook figures.ipynb for rendering figures visualizing different aspects of the data
  • the folder figures containing the already rendered figures which we also used in our publication for Radiation & Oncology

Published by rmnldwg over 3 years ago