Recent Releases of lydata
lydata - 0.2.5
What's New
A little maintenance and a small bug fix.
🐛 Bug Fixes
- `get_repo()` did not return repo
- Respect "method" kwarg in combining mods
🧪 Testing
- Run `dvc repro` to check new lyscripts
⚙️ Miscellaneous Tasks
- Bump requirements
Change
- Slightly improve logging
- Jupyter Notebook
Published by rmnldwg about 1 year ago
lydata - 0.2.4
What's New
Just some improvements to the docs and the switch to loguru for logging. No functionality should have been altered.
📚 Documentation
- Add `__repr__` & explanation to `C`
- Mention private attribute `_column_map`
- Mention `execute` method of `Q` objects
- Fix unfinished sentence in utils
Change
- In `__repr__`, add parentheses around combination of `AndQ` and `OrQ`.
- Switch to loguru for logging
Published by rmnldwg about 1 year ago
lydata -
What's New
🚀 Features
- Add `central` to short name columns
🐛 Bug Fixes
- `&` and `|` with `None` return original `Q`. Previously, `Q(...) | None` would return a query that evaluated to `True` everywhere.
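
The fixed behaviour can be illustrated with a toy query class. This is a minimal sketch only: `ToyQ`, its `predicate` field, and the row format are hypothetical stand-ins and do not reflect how lydata's `Q` is actually implemented.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass(frozen=True)
class ToyQ:
    """Hypothetical stand-in for a query object (not lydata's real `Q`)."""

    predicate: Callable[[Row], bool]

    def __and__(self, other: ToyQ | None) -> ToyQ:
        # Combining with None is a no-op: hand back the original query.
        if other is None:
            return self
        return ToyQ(lambda row: self.predicate(row) and other.predicate(row))

    def __or__(self, other: ToyQ | None) -> ToyQ:
        # Same here: `query | None` must not turn into "True everywhere".
        if other is None:
            return self
        return ToyQ(lambda row: self.predicate(row) or other.predicate(row))

    def execute(self, rows: list[Row]) -> list[bool]:
        return [self.predicate(row) for row in rows]


rows = [{"age": 45}, {"age": 72}]
is_old = ToyQ(lambda row: row["age"] > 60)

print((is_old | None).execute(rows))  # [False, True]
print((is_old & None).execute(rows))  # [False, True]
```

Returning the original object keeps optional filters easy to chain: a caller can pass `None` for "no extra condition" without special-casing it.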
📚 Documentation
- List defined operators on `Q` (`&`, `|`, `~`, `==`) in the docstring of `CombineQMixin`.
🧪 Testing
- Ensure that `&` and `|` with `None` return original `Q`.
Published by rmnldwg about 1 year ago
lydata -
What's New
Another bug fix: Previously, sub- and superlevel involvement was only computed for columns not already present in the table. Now, it is by default computed and correctly replaces unknown values.
🚀 Features
- (utils) Add better update func for pandas
🐛 Bug Fixes
- Order of sub-/superlevel inference
- Don't ignore present sub-/superlvl cols
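
As a rough illustration of "computed by default and replaces unknown values", consider the sketch below. The inference rule (a superlevel is involved if any sublevel is involved, and healthy only if all sublevels are healthy) and both function names are assumptions made for this example, not lydata's actual code.

```python
from typing import Optional


def infer_superlevel(sublevels: list[Optional[bool]]) -> Optional[bool]:
    """Assumed rule: involved if any sublevel is, healthy only if all are healthy."""
    if any(s is True for s in sublevels):
        return True
    if sublevels and all(s is False for s in sublevels):
        return False
    return None  # not enough information to decide


def fill_unknown(recorded: Optional[bool], inferred: Optional[bool]) -> Optional[bool]:
    """Keep a recorded value; use the inferred one only where nothing was recorded."""
    return inferred if recorded is None else recorded


# Superlevel II exists in the table but is unknown (None) for this patient.
patient = {"II": None, "IIa": True, "IIb": None}
patient["II"] = fill_unknown(
    patient["II"], infer_superlevel([patient["IIa"], patient["IIb"]])
)
print(patient["II"])  # True
```

The point of the fix is the second function: an existing column no longer blocks inference, it only protects values that were actually recorded.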
Published by rmnldwg about 1 year ago
lydata -
What's New
This release fixes a bug where completely unobserved LNLs would be reported as healthy when using the `ly.combine()` method. Also, this method is now roughly 20x faster than before.
🐛 Bug Fixes
- If an LNL of a patient was unobserved (i.e., all diagnoses `None`), then the method `ly.combine()` returns `None` for that patient's LNL. Fixes #13
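
The difference between "unobserved" and "healthy" can be sketched in a few lines. The helper `combine_any` and its "any positive finding wins" rule are hypothetical and only serve to show the `None` handling; `ly.combine()` may use a different combination method.

```python
from typing import Optional


def combine_any(diagnoses: list[Optional[bool]]) -> Optional[bool]:
    """Toy combination rule (hypothetical, not the rule used by ly.combine())."""
    observed = [d for d in diagnoses if d is not None]
    if not observed:
        # No modality looked at this LNL: the result stays unknown.
        return None
    return any(observed)


print(combine_any([None, None]))   # None
print(combine_any([False, None]))  # False
print(combine_any([False, True]))  # True
```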
🧪 Testing
- Change the doctest of `ly.combine()` to check whether #13 was fixed.
Published by rmnldwg about 1 year ago
lydata -
What's New
This is a clean-up update. Some stuff I thought might be useful turned out to be unnecessary, while other things got better names. Two small features have also made it in.
🚀 Features
- Can now combine `Q` with `None` to yield `Q` again.
- Add `contains` operator to `C`, `Q` objects. This calls pandas' `str.contains` method.
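
For reference, this is what the underlying pandas call does on a plain `Series`. The column name and values are made up for illustration, and how `C` and `Q` forward their arguments to it is not shown here.

```python
import pandas as pd

# Made-up example data; the column name is an assumption.
patients = pd.DataFrame({"subsite": ["C09.9", "C10.2", "C32.0"]})

# `str.contains` returns a boolean mask, one entry per row.
# Note that the pattern is interpreted as a regular expression by default.
mask = patients["subsite"].str.contains("C10")
print(mask.tolist())  # [False, True, False]
```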
🧪 Testing
- Fix wrong name in doctests
Change
- [breaking] Add, rename, delete several methods:
  - `LyDatasetConfig` is now just `LyDataset`
  - the `path` property is now `path_on_disk`
  - the `get_url()` method has been removed
  - the `get_description()` method has been removed
  - added `get_content_file()` method to fetch and store remote content
  - `load()` was renamed to `get_dataframe()`
  - the `repo` argument was changed to `repo_name`
- (utils) [breaking] Rename `enhance` func to `infer_and_combine_levels`.
Remove
- [breaking] Two unused funcs for markdown processing were removed
- (load) [breaking] Drop `join_datasets`, since it's not needed. All it did was run `pd.concat(...)`.
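
Joining tables directly with pandas looks roughly like this. The two small frames are invented for the example, and `ignore_index=True` is one possible choice rather than what the removed function necessarily used.

```python
import pandas as pd

# Two invented tables standing in for separately loaded datasets.
first = pd.DataFrame({"age": [61, 54], "t_stage": [2, 3]})
second = pd.DataFrame({"age": [70], "t_stage": [1]})

# Stack the rows and renumber the index from 0.
joined = pd.concat([first, second], ignore_index=True)
print(joined.shape)  # (3, 2)
```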
Published by rmnldwg over 1 year ago
lydata -
What's New
Just a quick hotfix.
🐛 Bug Fixes
- (load) Fix a bug where datasets with multiple subsites (e.g. `2024-umcg-hypopharynx-larynx`) would cause an error because of a missing `maxsplit=2` argument.
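
The effect of `maxsplit=2` is easy to reproduce with plain `str.split`. The sketch assumes that dataset names follow a `year-institution-subsite` pattern and that the name is unpacked into three variables; the actual loading code may differ.

```python
name = "2024-umcg-hypopharynx-larynx"

# Without maxsplit, the hyphen inside the subsite produces a fourth part,
# so unpacking into three variables fails.
try:
    year, institution, subsite = name.split("-")
except ValueError as error:
    print(f"unpacking failed: {error}")

# With maxsplit=2, everything after the second hyphen stays together.
year, institution, subsite = name.split("-", maxsplit=2)
print(year, institution, subsite)  # 2024 umcg hypopharynx-larynx
```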
Published by rmnldwg over 1 year ago
lydata -
What's New
Small features and refactorings.
🚀 Features
- (load) add `get_repo()` method that fetches remote repository information for a `LyDatasetConfig`
- (load) make authentication more flexible
- (utils) put sub-/superlevel inference in its own utility function
Published by rmnldwg over 1 year ago
lydata -
What's New
With this release, we make the switch from rapidly evolving 0.0.X versions to something that changes a little more slowly. However, we still consider the library experimental and breaking changes may still occur frequently.
🚀 Features
- (utils) Add often needed `enhance` function to complete sub-/superlevel involvement and infer maximum likelihood status.
🐛 Bug Fixes
- Avoid `KeyError` in `infer_superlevels`
⚙️ Miscellaneous Tasks
- Add link to release 0.0.4
Change
- `infer_su(b|per)levels` skips inferring involvement of sub-/super LNLs that are already present
- (load) Rename `skip_disk` to `use_github`
- (query) Rename `in_` to `isin` for `C` object
Published by rmnldwg over 1 year ago
lydata -
What's New
🚀 Features
- [breaking] Make several helper functions private (e.g., `_max_likelihood()`)
- (utils) Add more shortname columns, like `surgery` for `("patient", "#", "neck_dissection")`
- (load) Allow search for datasets at different locations on disk
- (query) Add `C` object for easier `Q` creation
- (query) Add `in_` to `C` object
- (validate) Add `transform_to_lyprox` function
🐛 Bug Fixes
- (load) Resolve circular import of `_repo`
📚 Documentation
- Add intersphinx mapping to pandera
- Expand module docstrings
- Update `README.md` with library examples
🧪 Testing
- Fix failure due to changing order of items in set
Change
- (validate) Add args to renamed validation
- Import useful stuff as top-level
- Make `main()` funcs private
Remove
- (load) [breaking] `load_dataset()` not needed, one can just use `next(load_datasets())`
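
The `next(...)` idiom works on any iterator, as the sketch below shows. `load_toy_datasets` is a hypothetical generator used only for illustration; it does not mirror the signature or return type of the real `load_datasets()`.

```python
from collections.abc import Iterator


def load_toy_datasets() -> Iterator[str]:
    """Hypothetical generator standing in for a function that yields datasets."""
    for name in ("2021-usz-oropharynx", "2021-clb-oropharynx"):
        yield name


# `next` pulls only the first item; the remaining ones are never produced.
first = next(load_toy_datasets())
print(first)  # 2021-usz-oropharynx
```

Calling `next` on an exhausted or empty iterator raises `StopIteration`, so passing a default (`next(iterator, None)`) can be useful when no dataset may match.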
Published by rmnldwg over 1 year ago
lydata -
What's New
🚀 Features
- Add method to infer sublevel involvement #2
- Add method to infer superlevel involvement #2
- (load) Allow loading from different repository and/or reference (tag, commit, ...) #4
🐛 Bug Fixes
- Make `align_diagnoses()` safer
- Make `combine()` method work as intended
- (load) Year may be equal to current year, not only smaller
📚 Documentation
- Make accessor method docstring more detailed
- Mention pandas' `update()` in methods
⚙️ Miscellaneous Tasks
- Add documentation link to metadata
- Add changelog
- Remove pyright setting (where from?)
- Ignore B028 ruff rule
Published by rmnldwg over 1 year ago
lydata -
[!WARNING] This is still very much experimental. Anything might change at any time.
What's New
🚀 Features
- Add some basic logging
- Add `percent` and `invert` to portion
📚 Documentation
- Host documentation on https://lydata.readthedocs.io
- Ensure intersphinx links work
🧪 Testing
- Add doctest to `join_datasets()`
Change
- Switch to pydantic for dataset definition
- Shorten accessor name to `ly`
Refac
- Make load funcs/methods clean & consistent
Published by rmnldwg over 1 year ago
lydata - 2023 CLB Multisite v2
This updates the previously published dataset due to an error in the `diagnostic_consensus` column. The data did not actually come with this information, but we assume the diagnosis based on imaging to be negative (i.e., healthy) when no neck dissection was performed. However, due to a bug, it was also set to "healthy" when the pathology after neck dissection reported a healthy LNL. This is obviously wrong and was corrected here.
The full diff can be found here. Note that this also includes many other changes to other datasets since the first release.
Published by rmnldwg over 1 year ago
lydata -
First Version 🥳
This marks the first (highly experimental) release of a little Python package for loading, accessing, querying, and validating the datasets in this repo.
If it turns out to be useful, I'll continue to develop this into a mature set of tools that could ultimately deduplicate a lot of code in the lyprox and lyscripts repos.
Published by rmnldwg over 1 year ago
lydata - 2021 USZ Oropharynx v2
With this release, we update the previous one about the same dataset.
Note that the data itself remains unchanged, but we spotted a bug in the figures.ipynb that is supposed to reproduce the plots. That bug was fixed with this release.
full diff to previous release (also includes diff to meanwhile added 2021-clb-oropharynx dataset)
Published by rmnldwg over 3 years ago
lydata - 2021 USZ Oropharynx
This release contains the detailed patterns of lymphatic progression of 287 patients with squamous cell carcinomas (SCCs) in the oropharynx, treated at the University Hospital Zurich (USZ) between 2013 and 2019.
The archive 2021-usz-oropharynx.zip contains
- a `README.md` that explains how the dataset was extracted and what it contains
- the data itself as `data.csv`
- a citation file `CITATION.cff` that can be used to cite this dataset (one may also cite our Data in Brief article)
- a Jupyter notebook `figures.ipynb` for rendering figures visualizing different aspects of the data
- the folder `figures` containing the already rendered figures which we also used in our publication for Radiation & Oncology
Published by rmnldwg over 3 years ago