Releases | Open Source Science

Changelog

Added

Added support for multiple loggers (tensorboard, wandb, comet_ml, aim, mlflow, clearml, dvclive, csv, json, rich) in edsnlp.train via the logger parameter. Default is [json and rich] for backward compatibility.
Sub batch sizes for gradient accumulation can now be defined as simple "splits" of the original batch, e.g. batch_size = 10000 tokens and sub_batch_size = 5 splits to accumulate batches of 2000 tokens.
Parquet writer now has a pyarrow_write_kwargs parameter whose content is passed to pyarrow.dataset.write_dataset
LinearSchedule (mostly used for LR scheduling) now allows an end_value parameter to configure whether the learning rate should decay to zero or to another value.
New eds.explode pipe that splits one document into multiple documents, one per span yielded by its span_getter parameter, each new document containing exactly that single span.
New Training a span classifier tutorial, and reorganized deep-learning docs
ScheduledOptimizer now warns when a parameter selector does not match any parameter.
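The "splits" form of sub_batch_size can be pictured with a small standalone sketch. The helpers below are hypothetical and are not the edsnlp implementation: they assume each sample is described only by its token count, and they greedily pack samples so that each sub-batch stays within batch_size / n_splits tokens.

```python
from typing import List


def sub_batch_budget(batch_size: int, n_splits: int) -> int:
    """Token budget of one sub-batch when a batch is cut into n_splits parts."""
    return batch_size // n_splits


def split_by_tokens(token_counts: List[int], budget: int) -> List[List[int]]:
    """Greedily group samples so that each group holds at most `budget` tokens.

    A single sample larger than the budget still gets its own group.
    """
    sub_batches: List[List[int]] = []
    current: List[int] = []
    used = 0
    for n_tokens in token_counts:
        # Close the current group if adding this sample would exceed the budget
        if current and used + n_tokens > budget:
            sub_batches.append(current)
            current, used = [], 0
        current.append(n_tokens)
        used += n_tokens
    if current:
        sub_batches.append(current)
    return sub_batches


# batch_size = 10000 tokens, sub_batch_size = 5 splits
budget = sub_batch_budget(10_000, 5)
batch = [500] * 20  # 20 toy samples of 500 tokens each, 10000 tokens in total
sub_batches = split_by_tokens(batch, budget)
```

Gradients would then be accumulated over the resulting sub-batches before a single optimizer step.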

Fixed

use_sections in eds.history should now correctly handle cases when there are other sections following history sections.
Added clickable snippets in the documentation for more registered functions
Pyarrow dataset writing with multiprocessing should be faster, as we removed a useless data transfer
We should now correctly support loading transformers in offline mode if they were already in huggingface's cache
We now support words[-10:10] syntax in trainable span classifier context_getter parameter
:ambulance: Until now, post_init was applied after the optimizer was instantiated: if the model discovered new labels and changed its parameter tensors accordingly, the optimizer did not take these new tensors into account, which could lead to subpar performance. Now, post_init is applied before the optimizer is instantiated, so that the optimizer correctly handles the new tensors.
Added missing entry points for readers and writers in the registry, including write_parquet and support for polars in pyproject.toml. Now all implemented readers and writers are correctly registered as entry points.

Changed

Section cues in eds.history are now section titles, and not the full section.
:boom: Validation metrics are now found under the root field validation in the training logs (e.g. metrics['validation']['ner']['micro']['f'])
It is now recommended to define optimizer groups of ScheduledOptimizer as a list of dicts of optim hyper-parameters, each containing a selector regex key, rather than as a single dict with a selector as keys and a dict of optim hyper-parameters as values. This allows for more flexibility in defining the optimizer groups, and is more consistent with the rest of the EDS-NLP API. This makes it easier to reference groups values from other places in config files, since their path doesn't contain a complex regex string anymore. See the updated training tutorials for more details.
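To make the difference between the two group layouts concrete, here is a minimal plain-Python sketch. It does not call edsnlp: the pick_group helper is hypothetical, the learning rates are illustrative, and the "first matching selector wins" rule is an assumption made for this example.

```python
import re
from typing import Any, Dict, List, Optional

# Former layout: selector regexes are the keys of a single dict
old_groups: Dict[str, Dict[str, Any]] = {
    "^transformer": {"lr": 5e-5},
    "": {"lr": 3e-4},
}

# Recommended layout: a list of dicts, each carrying its own "selector" key
new_groups: List[Dict[str, Any]] = [
    {"selector": selector, **hparams} for selector, hparams in old_groups.items()
]


def pick_group(param_name: str, groups: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the hyper-parameters of the first group whose selector matches.

    First-match semantics are an assumption of this sketch.
    """
    for group in groups:
        if re.search(group["selector"], param_name):
            return {k: v for k, v in group.items() if k != "selector"}
    return None


transformer_hparams = pick_group("transformer.layer.0.weight", new_groups)
other_hparams = pick_group("ner.crf.transitions", new_groups)
```

With the list layout, a value is addressed by position (for instance new_groups[0]["lr"]) instead of through a key that is itself a regex, which is what makes cross-references in config files simpler.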

Pull Requests

chore: bump version to 0.18.0 by @percevalw in https://github.com/aphp/edsnlp/pull/439
fix: use_sections in eds.history should now work by @percevalw in https://github.com/aphp/edsnlp/pull/430
docs: fix read parquet parameters docs by @percevalw in https://github.com/aphp/edsnlp/pull/425
Explode pipe + span classifier training tutorial by @percevalw in https://github.com/aphp/edsnlp/pull/432
Update, fix and refactor doc dependencies by @percevalw in https://github.com/aphp/edsnlp/pull/438
fix: entrypoints by @aricohen93 in https://github.com/aphp/edsnlp/pull/420
fix: take filter_expr into account in dependency parsing evaluation by @percevalw in https://github.com/aphp/edsnlp/pull/382

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.17.2...v0.18.0

- Python
Published by percevalw 6 months ago

Changelog

Added

Handling intra-word linebreaks as pollution: adds a pollution pattern that detects intra-word linebreaks, which can then be removed in the get_text method
Qualifiers can process Span or Doc: this feature especially makes it easier to nest qualifier components in other components
New label_weights parameter in eds.span_classifier, which allows the user to set per label-value loss weights during training
New edsnlp.data.converters.MarkupToDocConverter to convert Markdown or XML-like markup to documents, which is particularly useful to create annotated documents from scratch (e.g., for testing purposes).
New Metrics documentation page to document the available metrics and how to use them.

Fixed

Various disorders/behaviors patches

Changed

Deduplicate spans between doc.ents and doc.spans during train: previously, a span_getter requesting entities from both ents and spans could yield duplicates.
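The deduplication described above can be sketched without spaCy. ToySpan and unique_spans are hypothetical stand-ins, and treating two spans as identical when their start, end and label coincide is an assumption of this example rather than the library's documented rule.

```python
from typing import Iterable, List, NamedTuple


class ToySpan(NamedTuple):
    """Minimal stand-in for a span: token offsets plus a label."""
    start: int
    end: int
    label: str


def unique_spans(*groups: Iterable[ToySpan]) -> List[ToySpan]:
    """Merge several span collections, keeping the first occurrence of each span."""
    seen = set()
    result: List[ToySpan] = []
    for group in groups:
        for span in group:
            if span not in seen:
                seen.add(span)
                result.append(span)
    return result


ents = [ToySpan(0, 2, "drug"), ToySpan(5, 6, "dose")]
span_group = [ToySpan(0, 2, "drug"), ToySpan(8, 9, "date")]
merged = unique_spans(ents, span_group)
```

Here the "drug" span present in both collections is yielded once, so it contributes a single training example.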

Pull Requests

feat: Various patches by @Thomzoy in https://github.com/aphp/edsnlp/pull/391
Metrics doc by @percevalw in https://github.com/aphp/edsnlp/pull/417
chore: bump version to 0.17.2 by @percevalw in https://github.com/aphp/edsnlp/pull/424

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.17.1...v0.17.2

- Python
Published by percevalw 8 months ago

Changelog

Added

Added grad spike detection to the edsnlp.train script, and per weight layer gradient logging.

Fixed

Fixed mini-batch accumulation for multi-task training
Fixed a pickling error when applying a pipeline in multiprocessing mode. This occurred in some cases when one of the pipes was declared in a "difficultly importable" module (e.g., causing a "PicklingWarning: Cannot locate reference to <class...").
Fixed typo in eds.consultation_dates towns: berck.sur.mer.
Fixed a bug where relative date expressions with bounds (e.g. 'depuis hier') raised an error when converted to durations.
Fixed pipe ADICAP to deal with cases where no code is found after 'codification'/'adicap'
Support "00"-like hours and minutes in the eds.dates component
Fix arc minutes, arc seconds and degree unit scales in eds.quantities, used when converting between different time (or angle) units

Pull Requests

fix: add grad spike detection by @percevalw in https://github.com/aphp/edsnlp/pull/375
fix: avoid pickling error in multiprocessing mode by @percevalw in https://github.com/aphp/edsnlp/pull/408
fix: correct town name typo (berck.sur.mer) by @percevalw in https://github.com/aphp/edsnlp/pull/409
fix: error when converting relative date expressions with bounds to durations by @percevalw in https://github.com/aphp/edsnlp/pull/411
Fix adicap by @aricohen93 in https://github.com/aphp/edsnlp/pull/410
Fix time matching by @LoickChardon in https://github.com/aphp/edsnlp/pull/413
chore: bump version to 0.17.1 by @percevalw in https://github.com/aphp/edsnlp/pull/416

New Contributors

@LoickChardon made their first contribution in https://github.com/aphp/edsnlp/pull/413

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.17.0...v0.17.1

- Python
Published by percevalw 9 months ago

Changelog

Added

Support for numpy>2.0, and formal support for Python 3.11 and Python 3.12
Expose the default patterns of eds.negation, eds.hypothesis, eds.family, eds.history and eds.reported_speech under a default_patterns attribute (e.g. eds.negation.default_patterns)
Added a context_getter SpanGetter argument to the eds.matcher class to only retrieve entities inside the spans returned by the getter
Added a filter_expr parameter to scorers to filter the documents to score
Added a new required field to eds.contextual_matcher assign patterns to only match if the required field has been found, and an include parameter (similar to exclude) to search for required patterns without assigning them to the entity
Added context strings (e.g., "words[0:5] | sent[0:1]") to the eds.contextual_matcher component to allow for more complex patterns in the selection of the window around the trigger spans.
Include and exclude patterns in the contextual matcher now dismiss matches that occur inside the anchor pattern (e.g. "anti" exclude pattern for anchor pattern "antibiotics" will not match the "anti" part of "antibiotics")
Pull Requests will now build a publicly accessible preview of the docs
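The rule that include and exclude patterns ignore matches located inside the anchor can be illustrated with plain regular expressions. This is only a sketch: has_exclusion is a hypothetical helper working on raw text, whereas the actual component works on a window of tokens around the trigger span.

```python
import re


def has_exclusion(text: str, anchor: str, exclude: str) -> bool:
    """Return True if `exclude` matches somewhere outside every `anchor` match."""
    anchor_spans = [m.span() for m in re.finditer(anchor, text)]
    for match in re.finditer(exclude, text):
        inside_anchor = any(
            a_start <= match.start() and match.end() <= a_end
            for a_start, a_end in anchor_spans
        )
        if not inside_anchor:
            return True
    return False


# "anti" only occurs inside the anchor "antibiotics": the match is dismissed
kept = has_exclusion("patient under antibiotics", "antibiotics", "anti")
# "anti" also occurs outside the anchor: the exclusion applies
excluded = has_exclusion("anti-inflammatory then antibiotics", "antibiotics", "anti")
```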

Changed

Improve the contextual matcher documentation.

Fixed

edsnlp.package now correctly detects whether a project uses an old-style poetry pyproject or a PEP621 pyproject.toml.
PEP621 projects containing nested directories (e.g., "my_project/pipes/foo.py") are now supported.
Try several paths to find current pip executable
The parameter "value_extract" of eds.score now correctly handles lists of patterns.
"Zero variance error" raised when computing param tuning importance is now caught and converted to a warning

Pull Requests

Fix packaging by @percevalw in https://github.com/aphp/edsnlp/pull/395
fix: avoid non-standard (pytoml) syntax in pyproject.toml by @percevalw in https://github.com/aphp/edsnlp/pull/399
fix: try several paths to find current pip executable by @percevalw in https://github.com/aphp/edsnlp/pull/401
Fix optuna issue by @LucasDedieu in https://github.com/aphp/edsnlp/pull/398
Improve contextual matcher by @percevalw in https://github.com/aphp/edsnlp/pull/289

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.16.0...v0.17.0

- Python
Published by percevalw 10 months ago

Changelog

Added

Hyperparameter Tuning for EDS-NLP: introduced a new script edsnlp.tune for hyperparameter tuning using Optuna. This feature allows users to efficiently optimize model parameters with options for single-phase or two-phase tuning strategies. Includes support for parameter importance analysis, visualization, pruning, and automatic handling of GPU time budgets.
Provided a detailed tutorial on hyperparameter tuning, covering usage scenarios and configuration options.
ScheduledOptimizer (e.g., @core: "optimizer") now supports importing optimizers using their qualified name (e.g., optim: "torch.optim.Adam").
eds.ner_crf now computes confidence score on spans.

Changed

The loss of eds.ner_crf is now computed as the mean over the words instead of the sum. This change is compatible with multi-gpu training.
Having multiple stats keys matching a batching pattern now warns instead of raising an error.

Fixed

Support packaging with poetry 2.0
Solve pickling issues with multiprocessing when pytorch is installed
Allow deep attributes like a.b.c for span_attributes in Standoff and OMOP doc2dict converters
Fixed various aspects of stream shuffling:
- Ensure the Parquet reader shuffles the data when shuffle=True
- Ensure we don't overwrite the RNG of the data reader when calling stream.shuffle() with no seed
- Raise an error if the batch size in stream.shuffle(batch_size=...) is not compatible with the stream
eds.split now keeps doc and span attributes in the sub-documents.

Pull Requests

fix: support packaging with poetry 2.0 by @percevalw in https://github.com/aphp/edsnlp/pull/362
Solve pickling issues with multiprocessing when pytorch is installed by @percevalw in https://github.com/aphp/edsnlp/pull/367
Feat: add hyperparameters tuning by @LucasDedieu in https://github.com/aphp/edsnlp/pull/361
Fix issue 368: Add metric parameter and write optimal config.yml at the end of tuning. by @LucasDedieu in https://github.com/aphp/edsnlp/pull/369
Fix issue 370: two-phase tuning now write phase 1 frozen best values into phase 2 results_summary.txt by @LucasDedieu in https://github.com/aphp/edsnlp/pull/371
fix: allow deep attributes in Standoff and OMOP doc2dict converters by @percevalw in https://github.com/aphp/edsnlp/pull/381
fix: improve various aspect of stream shuffling by @percevalw in https://github.com/aphp/edsnlp/pull/380
fix: eds.split now keeps doc and span attributes in the sub-documents by @percevalw in https://github.com/aphp/edsnlp/pull/363
feat: allow importing optims using qualified names in ScheduledOptimizer by @percevalw in https://github.com/aphp/edsnlp/pull/383
feat: compute eds.ner_crf loss as mean over words by @percevalw in https://github.com/aphp/edsnlp/pull/384
Fix issue 372: resulting tuning config file now preserve comments by @LucasDedieu in https://github.com/aphp/edsnlp/pull/373
Feat: add checkpoint management for tuning by @LucasDedieu in https://github.com/aphp/edsnlp/pull/385
feat: add ner confidence score by @LucasDedieu in https://github.com/aphp/edsnlp/pull/387
chore: bump version to 0.16.0 by @LucasDedieu in https://github.com/aphp/edsnlp/pull/393

New Contributors

@LucasDedieu made their first contribution in https://github.com/aphp/edsnlp/pull/361

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.15.0...v0.16.0

- Python
Published by LucasDedieu 11 months ago

Changelog

Added

edsnlp.data.read_parquet now accepts a work_unit="fragment" option to split tasks between workers by parquet fragment instead of by row. When this is enabled, workers do not read every fragment while skipping 1 in n rows, but read all rows of 1/n fragments, which should be faster.
Accept no validation data in edsnlp.train script
Log the training config at the beginning of the trainings
Support a specific model output dir path for trainings (output_model_dir), and whether to save the model or not (save_model)
Specify whether to log the validation results or not (logger=False)
Added support for the CoNLL format with edsnlp.data.read_conll and with a specific eds.conll_dict2doc converter
Added a Trainable Biaffine Dependency Parser (eds.biaffine_dep_parser) component and metrics
New eds.extractive_qa component to perform extractive question answering using questions as prompts to tag entities instead of a list of predefined labels as in eds.ner_crf.
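The two ways of sharing a parquet dataset between workers (by row, or by fragment with work_unit="fragment") can be compared with a toy model in which a fragment is simply a list of row ids. The two helper functions are hypothetical and only mimic the access pattern described above; they do not use pyarrow or edsnlp.

```python
from typing import List

# Four toy "fragments" of four rows each
fragments: List[List[int]] = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]


def rows_by_row(fragments: List[List[int]], worker: int, n_workers: int) -> List[int]:
    """Row work unit: the worker goes through every fragment and keeps 1 in n rows."""
    all_rows = [row for fragment in fragments for row in fragment]
    return all_rows[worker::n_workers]


def rows_by_fragment(fragments: List[List[int]], worker: int, n_workers: int) -> List[int]:
    """Fragment work unit: the worker reads all rows of 1 in n fragments."""
    return [row for fragment in fragments[worker::n_workers] for row in fragment]


n_workers = 2
by_row = [rows_by_row(fragments, w, n_workers) for w in range(n_workers)]
by_fragment = [rows_by_fragment(fragments, w, n_workers) for w in range(n_workers)]
```

Both strategies cover every row exactly once, but in the fragment case each worker only opens half of the fragments.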

Fixed

Fix join_thread missing attribute in SimpleQueue when cleaning a multiprocessing executor
Support huggingface transformers that do not set cls_token_id and sep_token_id (we now also look for these tokens in the special_tokens_map and vocab mappings)
Fix changing scorers dict size issue when evaluating during training
Seed random states (instead of using random.RandomState()) when shuffling in data readers: this is important for
1. reproducibility
2. in multiprocessing mode, ensure that the same data is shuffled in the same way in all workers
Bubble BaseComponent instantiation errors correctly
Improved support for multi-gpu gradient accumulation (only sync the gradients at the end of the accumulation), now controlled by the optional sub_batch_size argument of TrainingData.
Support again edsnlp without pytorch installed
We now test that edsnlp works without pytorch installed
Fix units and scales, i.e. 1l = 1dm3, 1ml = 1cm3
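The point about seeded random states when shuffling can be checked with the standard library alone. The shuffled helper below is hypothetical: it shows that two workers building their own generator from the same seed obtain the same order, which is the property the fix relies on.

```python
import random
from typing import List, Sequence


def shuffled(items: Sequence[int], seed: int) -> List[int]:
    """Return a shuffled copy of `items` using a dedicated, seeded generator."""
    rng = random.Random(seed)  # local generator, the global RNG is left untouched
    copy = list(items)
    rng.shuffle(copy)
    return copy


data = list(range(10))
worker_a = shuffled(data, seed=42)  # what a first worker would compute
worker_b = shuffled(data, seed=42)  # what a second worker would compute
```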

Pull Requests

fix: check join_thread attribute in queue when cleaning mp exec by @percevalw in https://github.com/aphp/edsnlp/pull/345
fix: support hf transformers with cls_token_id and sep_token_id set to None by @percevalw in https://github.com/aphp/edsnlp/pull/346
fix: changing scorers dict size issue when evaluating during training by @percevalw in https://github.com/aphp/edsnlp/pull/347
Fix streams by @percevalw in https://github.com/aphp/edsnlp/pull/350
Various trainer fixes by @percevalw in https://github.com/aphp/edsnlp/pull/352
Trainable biaffine dependency parser by @percevalw in https://github.com/aphp/edsnlp/pull/353
feat: new eds.extractive_qa component by @percevalw in https://github.com/aphp/edsnlp/pull/351
Fix training and multiprocessing by @percevalw in https://github.com/aphp/edsnlp/pull/354
fix: correct conversions for volumes, areas by @etienneguevel in https://github.com/aphp/edsnlp/pull/349
chore: bump version to 0.15.0 by @percevalw in https://github.com/aphp/edsnlp/pull/355

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.14.0...v0.15.0

- Python
Published by percevalw about 1 year ago

Changelog

Added

Support for setuptools based projects in edsnlp.package command
Pipelines can now be instantiated directly from a config file (instead of having to cast a dict containing their arguments) by putting the @core = "pipeline" or "load" field in the pipeline section
edsnlp.load now correctly takes disable, enable and exclude parameters into account
Pipeline now has a basic repr showing its base language (mostly useful to know its tokenizer) and its pipes
New python -m edsnlp.evaluate script to evaluate a model on a dataset
Sentence detection can now be configured to change the minimum number of newlines to consider a newline-triggered sentence, and disable capitalization checking.
New eds.split pipe to split a document into multiple documents based on a splitting pattern (useful for training)
Allow converter argument of edsnlp.data.read/from_... to be a list of converters instead of a single converter
New revamped and documented edsnlp.train script and API
Support YAML config files (supported only CFG/INI files before)
Most of EDS-NLP functions are now clickable in the documentation
ScheduledOptimizer now accepts schedules directly in place of parameters, and easy parameter selection:
    ScheduledOptimizer(
        optim="adamw",
        module=nlp,
        total_steps=2000,
        groups={
            "^transformer": {
                # lr will go from 0 to 5e-5 then to 0 for params matching "transformer"
                "lr": {"@schedules": "linear", "warmup_rate": 0.1, "start_value": 0, "max_value": 5e-5},
            },
            "": {
                # lr will stay at 3e-4 during 200 steps then go to 0 for other params
                "lr": {"@schedules": "linear", "warmup_rate": 0.1, "start_value": 3e-4, "max_value": 3e-4},
            },
        },
    )
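The behaviour described in the comments of this example (linear warmup, then linear decay) can be reproduced with a short standalone function. This is a sketch and not the library's LinearSchedule: linear_schedule is a hypothetical helper, its argument names mirror the warmup_rate, start_value and max_value options shown above, and end_value mirrors the option added in 0.18.0.

```python
def linear_schedule(
    step: int,
    total_steps: int,
    max_value: float,
    start_value: float = 0.0,
    warmup_rate: float = 0.0,
    end_value: float = 0.0,
) -> float:
    """Linear warmup from start_value to max_value, then linear decay to end_value."""
    warmup_steps = int(total_steps * warmup_rate)
    if step < warmup_steps:
        return start_value + (max_value - start_value) * step / warmup_steps
    decay_steps = max(total_steps - warmup_steps, 1)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    return max_value + (end_value - max_value) * progress


# Group "^transformer": 0 -> 5e-5 during the first 200 steps, then back to 0
lr_start = linear_schedule(0, 2000, max_value=5e-5, start_value=0.0, warmup_rate=0.1)
lr_peak = linear_schedule(200, 2000, max_value=5e-5, start_value=0.0, warmup_rate=0.1)
lr_end = linear_schedule(2000, 2000, max_value=5e-5, start_value=0.0, warmup_rate=0.1)

# Other params: constant 3e-4 during the first 200 steps, then decay
lr_flat = linear_schedule(100, 2000, max_value=3e-4, start_value=3e-4, warmup_rate=0.1)
```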

Changed

eds.span_context_getter's parameter context_sents is no longer optional and must be explicitly set to 0 to disable sentence context
In multi-GPU setups, streams that contain torch components are now stripped of their parameter tensors when sent to CPU Workers since these workers only perform preprocessing and postprocessing and should therefore not need the model parameters.
The batch_size argument of Pipeline is deprecated and is not used anymore. Use the batch_size argument of stream.map_pipeline instead.

Fixed

Sort files before iterating over a standoff or json folder to ensure reproducibility
Sentence detection now correctly matches capitalized letters + apostrophe
We now ensure that the workers pool is properly closed whatever happens (exception, garbage collection, data ending) in the multiprocessing backend. This prevents some executions from hanging indefinitely at the end of the processing.
Propagate torch sharing strategy to other workers in the multiprocessing backend. This is useful when the system is running out of file descriptors and ulimit -n is not an option. Torch sharing strategy can also be set via an environment variable TORCH_SHARING_STRATEGY (default is file_descriptor, consider using file_system if you encounter issues).

Data API changes

LazyCollection objects are now called Stream objects
By default, multiprocessing backend now preserves the order of the input data. To disable this and improve performance, use deterministic=False in the set_processing method
:rocket: Parallelized GPU inference throughput improvements!
- For simple {pre-process → model → post-process} pipelines, GPU inference can be up to 30% faster in non-deterministic mode (results can be out of order) and up to 20% faster in deterministic mode (results are in order)
- For multitask pipelines, GPU inference can be up to twice as fast (measured in a two-tasks BERT+NER+Qualif pipeline on T4 and A100 GPUs)
The .map_batches, .map_pipeline and .map_gpu methods now support a specific batch_size and batching function, instead of having a single batch size for all pipes
Readers now have a loop parameter to cycle over the data indefinitely (useful for training)
Readers now have a shuffle parameter to shuffle the data before iterating over it
In multiprocessing mode, file based readers now read the data in the workers (was an option before)
We now support two new special batch sizes
- "fragment" in the case of parquet datasets: rows of a full parquet file fragment per batch
- "dataset" which is mostly useful during training, for instance to shuffle the dataset at each epoch.
  These are also compatible in batched writer such as parquet, where each input fragment can be processed and mapped to a single matching output fragment.
:boom: Breaking change: a map function returning a list or a generator won't be automatically flattened anymore. Use flatten() to flatten the output if needed. This shouldn't change the behavior for most users since most writers (to_pandas, to_polars, to_parquet, ...) still flatten the output
:boom: Breaking change: the chunk_size and sort_chunks parameters are now deprecated: to sort data before applying a transformation, use .map_batches(custom_sort_fn, batch_size=...)
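The flattening change can be pictured with plain Python lists. This sketch does not use the Stream API: split_sentences is a hypothetical map function, and itertools.chain only stands in for what an explicit flatten() call would do.

```python
from itertools import chain
from typing import List


def split_sentences(text: str) -> List[str]:
    """Toy map function returning several items for one input."""
    return [part.strip() for part in text.split(".") if part.strip()]


docs = ["A. B.", "C."]

# New behaviour: one output (here a list) per input, no implicit flattening
mapped = [split_sentences(doc) for doc in docs]

# Equivalent of an explicit flatten step: one item per sentence
flattened = list(chain.from_iterable(mapped))
```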

Training API changes

We now provide a training script python -m edsnlp.train --config config.cfg that should fit many use cases. Check out the docs!
In particular, we do not require pytorch's Dataloader for training and can rely solely on EDS-NLP stream/data API, which is better suited for large streamable datasets and dynamic preprocessing (i.e., a different result each time we apply a noised preprocessing op on a sample).
Each trainable component can now provide a stats field in its preprocess output to log info about the sample (number of words, tokens, spans, ...):
- these stats are both used for batching (e.g., make batches of no more than "25000 tokens")
- for logging
- for computing correct loss means when accumulating gradients over multiple mini-mini-batches
- for computing correct loss means in multi-GPU setups, since these stats are synchronized and accumulated across GPUs
Support multi-GPU training via Hugging Face Accelerate, with the EDS-NLP Stream API taking the env['WORLD_SIZE'] and env['LOCAL_RANK'] environment variables into account
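Why per-sample stats matter for loss means under gradient accumulation can be shown with a little arithmetic. The numbers and the dict layout below are made up for illustration: each sub-batch reports a summed loss and a word count, in the spirit of the stats field described above.

```python
# Two toy sub-batches of very different sizes
sub_batches = [
    {"loss_sum": 30.0, "n_words": 10},
    {"loss_sum": 10.0, "n_words": 2},
]

# Naive approach: average the per-sub-batch means, which over-weights small sub-batches
mean_of_means = sum(b["loss_sum"] / b["n_words"] for b in sub_batches) / len(sub_batches)

# Stats-based approach: accumulate sums and counts, then divide once
total_loss = sum(b["loss_sum"] for b in sub_batches)
total_words = sum(b["n_words"] for b in sub_batches)
weighted_mean = total_loss / total_words
```

The same accumulate-then-divide logic applies when the sums and counts come from several GPUs instead of several sub-batches.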

Pull Requests

Improve training tutorials by @percevalw in https://github.com/aphp/edsnlp/pull/331
Various fixes by @percevalw in https://github.com/aphp/edsnlp/pull/332
Multiprocessing related fixes by @percevalw in https://github.com/aphp/edsnlp/pull/333
chore: bump version to 0.14.0 by @percevalw in https://github.com/aphp/edsnlp/pull/334

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.13.1...v0.14.0

- Python
Published by percevalw over 1 year ago

Changelog

Added

eds.tables accepts a minimum_table_size (default 2) argument to reduce pollution
RuleBasedQualifier now exposes a process method that only returns qualified entities and tokens without actually tagging them, deferring this task to the __call__ method.
Added new patterns for metastasis detection developed on CT-Scan reports.
Added citation of articles

Fixed

Disorder and Behavior pipes don't use a "PRESENT" or "ABSENT" status anymore. Instead, status=None by default, and ent._.negation is set to True instead of setting status to "ABSENT". To this end, the tobacco and alcohol now use the NegationQualifier internally.
Numbers are now only detected without trying to remove the pollution in between digits, i.e. 55 @ 77777 could be detected as a full number before, but not anymore.
Fix fsspec open file encoding to "utf-8".

Changed

Rename eds.measurements to eds.quantities
scikit-learn (used in eds.endlines) is no longer installed by default when installing edsnlp[ml]

Pull Requests

Remove pollution exclusion during numbers matching by @percevalw in https://github.com/aphp/edsnlp/pull/316
Rename eds.measurements by @svittoz in https://github.com/aphp/edsnlp/pull/313
Adding minimum_table_size argument to eds.tables by @svittoz in https://github.com/aphp/edsnlp/pull/318
Fs encoding fix by @Aremaki in https://github.com/aphp/edsnlp/pull/320
chore(deps): bump actions/download-artifact from 2 to 4.1.7 in /.github/workflows in the github_actions group across 1 directory by @dependabot in https://github.com/aphp/edsnlp/pull/319
fix: skip spacy 3.8.0 due to numpy build dep by @percevalw in https://github.com/aphp/edsnlp/pull/321
Fix behavior, disorder and qualifier pipes by @Thomzoy in https://github.com/aphp/edsnlp/pull/322
Metastatic status by @aricohen93 in https://github.com/aphp/edsnlp/pull/308
chore: bump version to 0.13.1 by @percevalw in https://github.com/aphp/edsnlp/pull/327
Test 3.12 by @percevalw in https://github.com/aphp/edsnlp/pull/328

New Contributors

@dependabot made their first contribution in https://github.com/aphp/edsnlp/pull/319

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.13.0...v0.13.1

- Python
Published by percevalw over 1 year ago

Changelog

Added

data.set_processing(...) now exposes an autocast parameter to disable or tweak the automatic casting of the tensor during the processing. Autocasting should result in a slight speedup, but may lead to numerical instability.
Use torch.inference_mode to disable view tracking and version counter bumps during inference.
Added a new NER pipeline for suicide attempt detection
Added date cues (regular expression matches that contributed to a date being detected) under the extension ent._.date_cues
Added tables processing in eds.measurement
Added 'all' as possible input in eds.measurement measurements config
Added new units in eds.measurement

Changed

Default to mixed precision inference

Fixed

edsnlp.load("your/huggingface-model", install_dependencies=True) now correctly resolves the python pip (especially on Colab) to auto-install the model dependencies
We now better handle empty documents in the eds.transformer, eds.text_cnn and eds.ner_crf components
Support mixed precision in eds.text_cnn and eds.ner_crf components
Support pre-quantization (<4.30) transformers versions
Verify that all batches are non empty
Fix span_context_getter for context_words = 0, context_sents > 2 and support asymmetric contexts
Don't split sentences on rare unicode symbols
Better detect abbreviations, like E.coli, now split as [E., coli] and not [E, ., coli]

What's Changed

Various ml fixes by @percevalw in https://github.com/aphp/edsnlp/pull/303
TS by @aricohen93 in https://github.com/aphp/edsnlp/pull/269
date cues by @cvinot in https://github.com/aphp/edsnlp/pull/265
Fix fast inference by @percevalw in https://github.com/aphp/edsnlp/pull/305
Fix typo in diabetes patterns by @isabelbt in https://github.com/aphp/edsnlp/pull/306
Fix span context getter by @aricohen93 in https://github.com/aphp/edsnlp/pull/307
Fix sentences by @percevalw in https://github.com/aphp/edsnlp/pull/310
chore: bump version to 0.13.0 by @percevalw in https://github.com/aphp/edsnlp/pull/312

New Contributors

@cvinot made their first contribution in https://github.com/aphp/edsnlp/pull/265
@isabelbt made their first contribution in https://github.com/aphp/edsnlp/pull/306

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.12.3...v0.13.0

- Python
Published by percevalw over 1 year ago

edsnlp - v0.12.3

Fix model loading messages

- Python
Published by percevalw over 1 year ago

Changelog

Changed

Packages:

Pip-installable models are now built with hatch instead of poetry, which allows us to expose artifacts (weights) at the root of the sdist package (uploadable to HF) and move them inside the package upon installation to avoid conflicts.
Dependencies are no longer inferred with dill-magic (this didn't work well before anyway)
Option to perform substitutions in the model's README.md file (e.g., for the model's name, metrics, ...)
Huggingface models are now installed with pip editable installations, which is faster since it doesn't copy around the weights

What's Changed

Better packages by @percevalw in https://github.com/aphp/edsnlp/pull/302

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.12.1...v0.12.2

- Python
Published by percevalw over 1 year ago

Changelog

Added

Added binary distribution for linux aarch64 (Streamlit's environment)
Added new separator option in eds.tables and new input check

Fixed

Make catalogue & entrypoints compatible with py37-py312
Check that a data has a doc before trying to use the document's note_datetime

Pull Requests

Fix catalogue entrypoints by @percevalw in https://github.com/aphp/edsnlp/pull/297
Adding sep_pattern in eds.tables docstring by @svittoz in https://github.com/aphp/edsnlp/pull/286
chore: bump version to 0.12.1 by @percevalw in https://github.com/aphp/edsnlp/pull/300

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.12.0...v0.12.1

- Python
Published by percevalw over 1 year ago

Changelog

Added

The eds.transformer component now accepts prompts (passed to its preprocess method, see breaking change below) to add before each window of text to embed.
LazyCollection.map / map_batches now support generator functions as arguments.
Window stride can now be disabled (i.e., stride = window) during training in the eds.transformer component by setting training_stride = False
Added a new eds.ner_overlap_scorer to evaluate matches between two lists of entities, counting a match as true when the Dice overlap is above a given threshold
edsnlp.load now accepts EDS-NLP models from the huggingface hub 🤗 !
New python -m edsnlp.package command to package a model for the huggingface hub or pypi-like registries
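
To make the overlap criterion of eds.ner_overlap_scorer concrete, here is a minimal standalone sketch of Dice-based entity matching. It is an illustration only, not the library's implementation: the function names, the (start, end, label) tuple layout, the greedy one-to-one matching and the "at or above" threshold comparison are all assumptions.

```python
def dice_overlap(a, b):
    """Dice coefficient between two (start, end) character spans (end exclusive)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    total = (a[1] - a[0]) + (b[1] - b[0])
    return 2 * inter / total if total else 0.0


def overlap_scores(pred, gold, threshold=0.5):
    """Greedily match (start, end, label) entities; hypothetical helper."""
    unmatched = list(gold)
    tp = 0
    for p in pred:
        for g in unmatched:
            # A prediction counts as true when labels agree and the Dice
            # overlap reaches the threshold (assumed to be inclusive here).
            if p[2] == g[2] and dice_overlap(p[:2], g[:2]) >= threshold:
                unmatched.remove(g)
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "precision": precision, "recall": recall, "f1": f1}


pred = [(0, 10, "drug"), (20, 30, "dose")]
gold = [(2, 10, "drug"), (40, 50, "dose")]
scores = overlap_scores(pred, gold)
```

With these inputs, only the "drug" entity overlaps its gold counterpart enough to be counted, so one of the two predictions is a true positive.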

Changed

Trainable embedding components now all use foldedtensor to return embeddings, instead of returning a tensor of floats and a mask tensor.
:boom: TorchComponent __call__ no longer applies the end to end method, and instead calls the forward method directly, like all torch modules.
The trainable eds.span_qualifier component has been renamed to eds.span_classifier to reflect its general purpose (it doesn't only predict qualifiers, but any attribute of a span using its context or not).
omop converter now takes the note_datetime field into account by default when building a document
span._.date.to_datetime() and span._.date.to_duration() now automatically take the note_datetime into account
nlp.vocab is no longer serialized when saving a model, as it may contain sensitive information and can be recomputed during inference anyway
:boom: Major breaking change in trainable components, moving towards a more "task-centric" design:
- the eds.transformer component is no longer responsible for deciding which spans of text ("contexts") should be embedded. These contexts are now passed via the preprocess method, which now accepts more arguments than just the docs to process.
- similarly the eds.span_pooler is no longer responsible for deciding which spans to pool, and instead pools all spans passed to it in the preprocess method.

Consequently, the eds.transformer and eds.span_pooler no longer accept their span_getter argument, and the eds.ner_crf, eds.span_classifier, eds.span_linker and eds.span_qualifier components now accept a context_getter argument instead, as well as a span_getter argument for the latter two. This refactoring can be summarized as follows:

```diff
- eds.transformer.span_getter
+ eds.ner_crf.context_getter
+ eds.span_classifier.context_getter
+ eds.span_linker.context_getter

- eds.span_pooler.span_getter
+ eds.span_qualifier.span_getter
+ eds.span_linker.span_getter
```

and as an example for the eds.span_linker component:

```diff
nlp.add_pipe(
    eds.span_linker(
        metric="cosine",
        probability_mode="sigmoid",
+       span_getter="ents",
+       # context_getter="ents",  -> by default, same as span_getter
        embedding=eds.span_pooler(
            hidden_size=128,
-           span_getter="ents",
            embedding=eds.transformer(
-               span_getter="ents",
                model="prajjwal1/bert-tiny",
                window=128,
                stride=96,
            ),
        ),
    ),
    name="linker",
)
```

Fixed

edsnlp.data.read_json now correctly reads the files from the directory passed as an argument, and not from the parent directory.
Overwrite spacy's Doc, Span and Token pickling utils to allow recursively storing Doc, Span and Token objects in the extension values (in particular, span._.date.doc)
Removed pendulum dependency, solving various pickling, multiprocessing and missing attributes errors

Pull Requests

Drop codecov by @percevalw in https://github.com/aphp/edsnlp/pull/292
Fix dates by @percevalw in https://github.com/aphp/edsnlp/pull/288
Loading models from the hf hub by @percevalw in https://github.com/aphp/edsnlp/pull/293
Fix: only reinstall hf model when cache files are changed by @percevalw in https://github.com/aphp/edsnlp/pull/295
feat: expose package script to cli by @percevalw in https://github.com/aphp/edsnlp/pull/294
chore: bump version to 0.12.0 by @percevalw in https://github.com/aphp/edsnlp/pull/296

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.11.2...v0.12.0

- Python
Published by percevalw almost 2 years ago

edsnlp - v0.11.2

Changelog

Fixed

Fix edsnlp.utils.file_system.normalize_fs_path file system detection not working correctly
Improved performance of edsnlp.data methods over a filesystem (fs parameter)

Pull Requests

Fix normalize fs path by @svittoz in https://github.com/aphp/edsnlp/pull/283
Faster fs io by @percevalw in https://github.com/aphp/edsnlp/pull/285

New Contributors

@svittoz made their first contribution in https://github.com/aphp/edsnlp/pull/283

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.11.1...v0.11.2

- Python
Published by percevalw almost 2 years ago

edsnlp - v0.11.1

Changelog

Added

Automatic estimation of cpu count when using multiprocessing
optim.initialize() method to create optim state before the first backward pass

Changed

nlp.post_init will not tee lazy collections anymore (use edsnlp.utils.collections.multi_tee yourself if needed)

Fixed

Corrected inconsistencies in eds.span_linker

Pull Requests

Fix span linking by @percevalw in https://github.com/aphp/edsnlp/pull/282

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.11.0...v0.11.1

- Python
Published by percevalw almost 2 years ago

edsnlp - v0.11.0

Changelog

Added

Support for a filesystem parameter in every edsnlp.data.read_* and edsnlp.data.write_* functions
Pipes of a pipeline are now easily accessible with nlp.pipes.xxx instead of nlp.get_pipe("xxx")
Support builtin Span attributes in converters span_attributes parameter, e.g.

```python
import edsnlp

nlp = ...
nlp.add_pipe("eds.sentences")

data = edsnlp.data.from_xxx(...)
data = data.map_pipeline(nlp)
data.to_pandas(converters={"ents": {"span_attributes": ["sent.text", "start", "end"]}})
```

Support assigning Brat AnnotatorNotes as span attributes: edsnlp.data.read_standoff(..., notes_as_span_attribute="cui")
Support for mapping full batches in edsnlp.processing pipelines with map_batches lazy collection method:

```python
import edsnlp

data = edsnlp.data.from_xxx(...)
data = data.map_batches(lambda batch: do_something(batch))
data.to_pandas()
```

New data.map_gpu method to map a deep learning operation on some data and take advantage of edsnlp multi-gpu inference capabilities
Added average precision computation in edsnlp span_classification scorer
You can now add pipes to your pipeline by instantiating them directly, which comes with many advantages, such as auto-completion, introspection and type checking !

```python
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())  # instead of nlp.add_pipe("eds.sentences")
```

The previous way of adding pipes is still supported.
New eds.span_linker deep-learning component to match entities with their concepts in a knowledge base, in synonym-similarity or concept-similarity mode.

Changed

nlp.preprocess_many now uses lazy collections to enable parallel processing
:warning: Breaking change. Improved and simplified eds.span_qualifier: we didn't support combination groups before, so this feature was scrapped for now. We now also support splitting values of a single qualifier between different span labels.
Optimized edsnlp.data batching, especially for large batch sizes (removed a quadratic loop)
:warning: Breaking change. By default, the name of components added to a pipeline is now the default name defined in their class __init__ signature. For most components of EDS-NLP, this will change the name from "eds.xxx" to "xxx".
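
As an illustration of how a default component name can be read from a class __init__ signature, here is a small sketch using the standard inspect module. The Sentences class and the default_pipe_name helper are hypothetical and only mimic the behavior described above; the library's own lookup may differ.

```python
import inspect


class Sentences:
    # Hypothetical component: the default pipe name is declared in __init__.
    def __init__(self, nlp=None, name="sentences"):
        self.nlp = nlp
        self.name = name


class NoDefaultName:
    # Hypothetical component without a default value for `name`.
    def __init__(self, nlp, name):
        self.nlp = nlp
        self.name = name


def default_pipe_name(component_cls):
    """Return the default of the `name` parameter of __init__, or None."""
    params = inspect.signature(component_cls.__init__).parameters
    param = params.get("name")
    if param is None or param.default is inspect.Parameter.empty:
        return None
    return param.default


resolved = default_pipe_name(Sentences)
```

Under this scheme a component registered as "eds.sentences" would be added under the shorter name found in its signature, which is the renaming the breaking change warns about.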

Fixed

Flatten list outputs (such as "ents" converter) when iterating: nlp.map(data).to_iterable("ents") is now a list of entities, and not a list of lists of entities
Allow span pooler to choose between multiple base embedding spans (as likely produced by eds.transformer) by sorting them by Dice overlap score.
EDS-NLP does not raise an error anymore when saving a model to an already existing, but empty directory
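
The flattening fix can be pictured with a short generator. This is a sketch of the behavior only, with hypothetical names (iter_flat, and plain dicts standing in for documents), not the library's code.

```python
def iter_flat(docs, converter):
    """Yield converted items one by one, flattening list outputs."""
    for doc in docs:
        result = converter(doc)
        if isinstance(result, list):
            # A list-producing converter (such as an "ents"-like one)
            # contributes each of its items individually.
            yield from result
        else:
            yield result


# Plain dicts stand in for documents in this sketch.
docs = [{"ents": ["aspirine", "fievre"]}, {"ents": ["toux"]}]
entities = list(iter_flat(docs, lambda doc: doc["ents"]))
```

Iterating therefore produces a flat list of entities rather than one list per document.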

Pull Requests

Support for a filesystem param in all edsnlp.data readers/writers by @percevalw in https://github.com/aphp/edsnlp/pull/274
Data fixes by @percevalw in https://github.com/aphp/edsnlp/pull/275
Refacto span classification by @percevalw in https://github.com/aphp/edsnlp/pull/276
Entity linking by @percevalw in https://github.com/aphp/edsnlp/pull/280
chore: bump version to 0.11.0 by @percevalw in https://github.com/aphp/edsnlp/pull/281

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.7...v0.11.0

- Python
Published by percevalw almost 2 years ago

edsnlp - v0.10.7

Changelog

Added

Support empty converter (by default now) in edsnlp.data writers (do not convert by default)
Add support for polars data import / export
Allow kwargs in eds.transformer to pass to the transformer model

Changed

Saving pipelines no longer saves the disabled status of the pipes (i.e., all pipes are considered "enabled" when saved). This feature was not used and was causing issues when saving a model wrapped in a nlp.select_pipes context.

Fixed

Allow missing meta.json, tokenizer and vocab paths when loading saved models
Save torch buffers when dumping machine learning models to disk (previous versions only saved the model parameters)
Fix automatic batch_size estimation in eds.transformer when max_tokens_per_device is set to auto and multiple GPUs are used
Fix JSONL file parsing

Pull Requests

Polars by @percevalw in https://github.com/aphp/edsnlp/pull/270
Various fixes by @percevalw in https://github.com/aphp/edsnlp/pull/271
chore: bump version to 0.10.7 by @percevalw in https://github.com/aphp/edsnlp/pull/272

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.6...v0.10.7

- Python
Published by percevalw almost 2 years ago

edsnlp - v0.10.6

What's Changed

Added

Added batch_by, split_into_batches_after, sort_chunks, chunk_size, disable_implicit_parallelism parameters to processing (simple and multiprocessing) backends to improve performance and memory usage. Sorting chunks can yield up to a 2x speedup in some cases.
The deep learning cache mechanism now supports multitask models with weight sharing in multiprocessing mode.
Added max_tokens_per_device="auto" parameter to eds.transformer to estimate memory usage and automatically split the input into chunks that fit into the GPU.
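
One plausible reason sorting chunks helps is that batches of similar-length documents waste less padding. The sketch below illustrates that intuition with made-up document lengths; the function names are hypothetical and this is not the backend's actual batching code.

```python
def padded_tokens(lengths, batch_size):
    """Total tokens processed when each batch is padded to its longest item."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        total += max(batch) * len(batch)
    return total


def padded_tokens_sorted_chunks(lengths, batch_size, chunk_size):
    """Same count, but each chunk is sorted by length before batching."""
    total = 0
    for i in range(0, len(lengths), chunk_size):
        chunk = sorted(lengths[i:i + chunk_size])
        total += padded_tokens(chunk, batch_size)
    return total


# Made-up document lengths (in tokens), alternating short and long.
lengths = [10, 200, 12, 180, 8, 220, 15, 190]
unsorted_cost = padded_tokens(lengths, batch_size=2)
sorted_cost = padded_tokens_sorted_chunks(lengths, batch_size=2, chunk_size=8)
```

In this toy case the padded workload drops from 1580 to 870 tokens for 835 real tokens. Larger chunks give the sort more room to group similar lengths, at the cost of holding more documents in memory.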

Changed

Improved speed and memory usage of the eds.text_cnn pipe by running the CNN on a non-padded version of its input: expect a speedup up to 1.3x in real-world use cases.
Deprecate the converters' (especially for BRAT/Standoff data) bool_attributes parameter in favor of general default_attributes. This new mapping describes how to set attributes on spans for which no attribute value was found in the input format. This is especially useful for negation, or frequent attributes values (e.g. "negated" is often False, "temporal" is often "present"), that annotators may not want to annotate every time.
Default eds.ner_crf window is now set to 40 and stride set to 20, as it doesn't affect throughput (compared to before, window set to 20) and improves accuracy.
New default overlap_policy='merge' option and parameter renaming in eds.span_context_getter (which replaces eds.span_sentence_getter)
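
The default_attributes mapping can be understood as a fallback applied to every span whose annotation lacks a value. A minimal sketch, assuming span attributes are held in plain dicts (the helper name is hypothetical):

```python
def apply_default_attributes(span_attributes, default_attributes):
    """Fill attributes missing from an annotated span with their defaults."""
    # Values found in the input format take precedence over the defaults.
    return {**default_attributes, **span_attributes}


defaults = {"negated": False, "temporal": "present"}
annotated = apply_default_attributes({"negated": True}, defaults)
unannotated = apply_default_attributes({}, defaults)
```

Annotators then only need to mark the uncommon cases, such as a negated entity, and every other span falls back to the defaults.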

Fixed

Improved error handling in multiprocessing backend (e.g., no more deadlock)
Various improvements to the data processing related documentation pages
Begin of sentence / end of sentence transitions of the eds.ner_crf component are now disabled when windows are used (i.e., when window is neither 1, which is equivalent to softmax, nor 0, which is equivalent to the default full-sequence Viterbi decoding)
eds tokenizer now inherits from spacy.Tokenizer to avoid typing errors
Only match 'ne' negation pattern when not part of another word to avoid false positives cases like u[ne] cure de 10 jours
Disabled pipes are now correctly ignored in the Pipeline.preprocess method
Add "eventuel*" patterns to eds.hypothesis
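
The 'ne' false positive mentioned above is a classic word-boundary issue. The sketch below shows the idea with the standard re module; the two patterns are illustrative and are not the library's actual negation patterns.

```python
import re

# Naive pattern: matches "ne" anywhere, including inside "une".
NAIVE = re.compile(r"ne")
# Bounded pattern: matches "ne" only as a standalone word.
BOUNDED = re.compile(r"\bne\b")

false_positive = "une cure de 10 jours"
true_negation = "le patient ne fume pas"

naive_hit = NAIVE.search(false_positive) is not None
bounded_hit = BOUNDED.search(false_positive) is not None
negation_hit = BOUNDED.search(true_negation) is not None
```

The naive pattern fires on "une", while the bounded pattern ignores it and still detects the real negation.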

Pull Requests

Multi head ml by @percevalw in https://github.com/aphp/edsnlp/pull/257
Default span attributes on data loading by @percevalw in https://github.com/aphp/edsnlp/pull/258
Disable NER CRF BOS/EOS transitions when CRF windows are enabled by @percevalw in https://github.com/aphp/edsnlp/pull/259
Fix "eds" tokenizer base by @percevalw in https://github.com/aphp/edsnlp/pull/260
fix: only match 'ne' negation pattern when not part of another word by @percevalw in https://github.com/aphp/edsnlp/pull/261
Update patterns for hypothesis détection by @LaRiffle in https://github.com/aphp/edsnlp/pull/266
Add overlap_policy='merge' option to make_sentence_span_getter by @percevalw in https://github.com/aphp/edsnlp/pull/262
Fix select pipes by @percevalw in https://github.com/aphp/edsnlp/pull/267
chore: bump version to 0.10.6 by @percevalw in https://github.com/aphp/edsnlp/pull/268

New Contributors

@LaRiffle made their first contribution in https://github.com/aphp/edsnlp/pull/266

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.5...v0.10.6

- Python
Published by percevalw almost 2 years ago

edsnlp - v0.10.5

Changelog

Fixed

Allow non-url paths when parquet filesystem is given

Pull Requests

Allow non-url paths when parquet filesystem is given by @percevalw in https://github.com/aphp/edsnlp/pull/254
Bump version to 0.10.5 by @percevalw in https://github.com/aphp/edsnlp/pull/255

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.4...v0.10.5

- Python
Published by percevalw about 2 years ago

edsnlp - v0.10.4

Changelog

Changed

Assigning doc._.note_datetime will now automatically cast the value to a pendulum.DateTime object

Added

Support loading model from package name (e.g., edsnlp.load("eds_pseudo_aphp"))
Support filesystem parameter in edsnlp.data.read_parquet and edsnlp.data.write_parquet

Fixed

Support doc -> list converters with parquet files writer
Fixed some OOM errors when writing many outputs to parquet files
Both edsnlp & spacy factories are now listed when a factory lookup fails
Fixed some GPU OOM errors with the eds.transformer pipe when processing really long documents

Pull Requests

ML inference fixes & features by @percevalw in https://github.com/aphp/edsnlp/pull/251
Bump version to 0.10.4 by @percevalw in https://github.com/aphp/edsnlp/pull/252

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.3...v0.10.4

- Python
Published by percevalw about 2 years ago

edsnlp - v0.10.3

Changelog

Changed

By default, edsnlp.data.write_json will infer if the data should be written as a single JSONL file or as a directory of JSON files, based on the path argument being a file or not.

Fixed

Measurements now correctly match "0.X", "0.XX", ... numbers
Typo in "celsius" measurement unit
Spaces and digits are now supported in BRAT entity labels
Fixed missing 'permet pas + verb' false positive negation patterns

Pull Requests

fix: support missing torch with spark and multiprocessing backends by @percevalw in https://github.com/aphp/edsnlp/pull/244
Fix typo in "celcius" by @julienduquesne in https://github.com/aphp/edsnlp/pull/247
Add pattern to measurements to catch "0.XX" numbers by @julienduquesne in https://github.com/aphp/edsnlp/pull/246
Fix for (#242): connectors not working brat json training issues by @percevalw in https://github.com/aphp/edsnlp/pull/248
Handle 'ne permet pas + verb' false positive negation patterns by @percevalw in https://github.com/aphp/edsnlp/pull/249
Bump version to 0.10.3 by @percevalw in https://github.com/aphp/edsnlp/pull/250

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.2...v0.10.3

- Python
Published by percevalw about 2 years ago

edsnlp - v0.10.2

Changelog

Changed

eds.span_qualifier qualifiers argument now automatically adds the underscore prefix to qualifiers if not present
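
A sketch of what this normalization might look like, assuming the "underscore prefix" is the "_." that designates spaCy extension attributes (this prefix and the helper name are assumptions, not confirmed by the changelog):

```python
def normalize_qualifier(name, prefix="_."):
    """Add the extension prefix to a qualifier name if it is missing."""
    # The "_." prefix is an assumption made for this sketch.
    return name if name.startswith(prefix) else prefix + name


normalized = [normalize_qualifier(q) for q in ["negation", "_.hypothesis"]]
```

Both spellings end up in the same prefixed form, so either can be passed in the qualifiers argument.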

Fixed

Fix imports of components declared in spacy_factories entry points
Support pendulum v3
AsList errors are now correctly reported
eds.span_qualifier saved configuration during to_disk is no longer null

Pull Requests

fix: use spacy entry points for missing factories by @percevalw in https://github.com/aphp/edsnlp/pull/238

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.1...v0.10.2

- Python
Published by percevalw about 2 years ago

edsnlp - v0.10.1

Changelog

Changed

Small regex matching performance improvement, up to 1.25x faster (e.g. eds.measurements)

Fixed

Microgram scale is now correctly 1/1000g and inverse meter now 1/100 inverse cm. "cac" and "goutte" units have been fixed as well.
We now isolate some of edsnlp components (trainable pipes that require ml dependencies) in a new edsnlp_factories entry points to prevent spacy from auto-importing them.
TNM scores followed by a space are now correctly detected
Removed various short TNM false positives (e.g., "PT" or "a T")
The Span value extension is no longer forcibly overwritten, and user-assigned values are returned by Span._.value in priority, before the aggregated span._.get(span.label_) getter result (#220)
Enable mmap during multiprocessing model transfers
RegexMatcher now supports all alignment modes (strict, expand, contract) and better handles partial doc matching (#201).
on_ents_only=False/True is now supported again in qualifier pipes (e.g., "eds.negation", "eds.hypothesis", ...)
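
The three alignment modes named above describe how a character-level regex match is snapped onto token boundaries. The following standalone sketch illustrates the usual semantics of strict, expand and contract; it is not the RegexMatcher implementation, and the function name and the offset representation are assumptions.

```python
def align_char_span(token_offsets, start, end, mode="strict"):
    """Snap a [start, end) character span onto token boundaries.

    token_offsets is a sorted list of (token_start, token_end) pairs.
    Returns (first_token, last_token_exclusive), or None if nothing aligns.
    """
    if mode == "strict":
        # Both ends must coincide exactly with token boundaries.
        starts = {s: i for i, (s, _) in enumerate(token_offsets)}
        ends = {e: i for i, (_, e) in enumerate(token_offsets)}
        if start in starts and end in ends and starts[start] <= ends[end]:
            return starts[start], ends[end] + 1
        return None
    if mode == "expand":
        # Keep every token that overlaps the character span.
        idx = [i for i, (s, e) in enumerate(token_offsets) if s < end and e > start]
    elif mode == "contract":
        # Keep only the tokens fully contained in the character span.
        idx = [i for i, (s, e) in enumerate(token_offsets) if s >= start and e <= end]
    else:
        raise ValueError(f"unknown alignment mode: {mode}")
    return (idx[0], idx[-1] + 1) if idx else None


# Token offsets for the text "prise de sang".
tokens = [(0, 5), (6, 8), (9, 13)]
strict = align_char_span(tokens, 2, 8, "strict")
expanded = align_char_span(tokens, 2, 8, "expand")
contracted = align_char_span(tokens, 2, 8, "contract")
```

For the partial match "ise de" (characters 2 to 8), strict alignment fails, expand returns the two tokens "prise de", and contract keeps only "de".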

Pull Requests

fix scales by @ycattan in https://github.com/aphp/edsnlp/pull/231
Isolate edsnlp entry points to prevent auto-import by spacy by @percevalw in https://github.com/aphp/edsnlp/pull/235
fix volume units "goutte" and "cac" by @ycattan in https://github.com/aphp/edsnlp/pull/233
Detect tnm entities followed by a space by @percevalw in https://github.com/aphp/edsnlp/pull/229
Enable mmap during multiprocessing model transfers by @percevalw in https://github.com/aphp/edsnlp/pull/236
Compatible Span._.value extension by @percevalw in https://github.com/aphp/edsnlp/pull/228
Support all alignment modes in regex matching & partial doc matching by @percevalw in https://github.com/aphp/edsnlp/pull/230
Fix span_getter argument for qualifiers by @Thomzoy in https://github.com/aphp/edsnlp/pull/223
Bump version to 0.10.1 by @percevalw in https://github.com/aphp/edsnlp/pull/237

New Contributors

@ycattan made their first contribution in https://github.com/aphp/edsnlp/pull/231

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.0...v0.10.1

- Python
Published by percevalw about 2 years ago

edsnlp - v0.10.0

Changelog

Added

New unified edsnlp.data API (json, brat, spark, pandas) and LazyCollection object to efficiently read / write data from / to different formats & sources.
New unified processing API to select the execution backend via data.set_processing(...)
The training scripts can now use data from multiple concatenated adapters
Support quantized transformers (compatible with multiprocessing as well !)

Changed

edsnlp.pipelines has been renamed to edsnlp.pipes, but the old name is still available for backward compatibility
Pipes (in edsnlp/pipes) are now lazily loaded, which should improve the loading time of the library.
to_disk methods can now return a config to override the initial config of the pipeline (e.g., to load a transformer directly from the path storing its fine-tuned weights)
The eds.tokenizer tokenizer has been added to entry points, making it accessible from the outside
Deprecate old connectors (e.g. BratDataConnector) in favor of the new edsnlp.data API
Deprecate old pipe wrapper in favor of the new processing API

Fixed

Support for pydantic v2
Support for python 3.11 (not ci-tested yet)

Pull Requests

Fix matcher assigns by @percevalw in https://github.com/aphp/edsnlp/pull/222
Refactor to use Pytorch for training models by @percevalw in https://github.com/aphp/edsnlp/pull/202
Relieve dependency constraints by @percevalw in https://github.com/aphp/edsnlp/pull/227

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.9.1...v0.10.0

- Python
Published by percevalw about 2 years ago

edsnlp - v0.10.0beta1

Changelog

Large refactoring of EDS-NLP to allow training models and performing inference using PyTorch as the deep-learning backend. Rather than a mere wrapper of PyTorch using spaCy, this is a new framework to build hybrid multi-task models.

To achieve this, instead of patching spaCy's pipeline, a new pipeline was implemented in a similar fashion to aphp/edspdf#12. The new pipeline tries to preserve the existing API, especially for non-machine learning uses such as rule-based components. This means that users can continue to use the library in the same way as before, while also having the option to train models using PyTorch. We still use spaCy data structures such as Doc and Span to represent the texts and their annotations.

Otherwise, changes should be transparent for users that still want to use spacy pipelines with nlp = spacy.blank('eds'). To benefit from the new features, users should use nlp = edsnlp.blank('eds') instead.

Added

New pipeline system available via edsnlp.blank('eds') (instead of spacy.blank('eds'))
Use the confit package to instantiate components
Training script with Pytorch only (tests/training/) and tutorial
New trainable embeddings: eds.transformer, eds.text_cnn, eds.span_pooler embedding contextualizer pipes
Re-implemented the trainable NER component and trainable Span qualifier with the new system under eds.ner_crf and eds.span_classifier
New efficient implementation for eds.transformer (to be used in place of spacy-transformer)

Changed

Pipe registering: Language.factory -> edsnlp.registry.factory.register via confit
Lazy loading components from their entry point (had to patch spacy.Language.__init__) to avoid having to wrap every import torch statement for pure rule-based use cases. Hence, torch is not a required dependency

Pull Requests

This pre-release is tracked in #202.

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.9.1...v0.10.0beta1

- Python
Published by percevalw over 2 years ago

edsnlp - v0.9.1

Changelog

Changed

Improve negation patterns
Absent disorders now set the negation to True when matched as ABSENT
Default qualifier is now None instead of False (empty string)

Fixed

span_getter is not incompatible with on_ents_only anymore
ContextualMatcher now supports empty matches (e.g. lookahead/lookbehind) in assign patterns

Pull Requests

Fix negations by @percevalw in https://github.com/aphp/edsnlp/pull/216
Chore: bump version to 0.9.1 by @percevalw in https://github.com/aphp/edsnlp/pull/218

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.9.0...v0.9.1

- Python
Published by percevalw over 2 years ago

edsnlp - v0.9.0

Changelog

Added

New to_duration method to convert an absolute date into a date relative to the note_datetime (or None)

Changes

Input and output of components are now specified by span_getter and span_setter arguments.
:boom: Score / disorders / behaviors entities now have a fixed label (passed as an argument), instead of being dynamically set from the component name. The following scores may have a different name than the current one in your pipelines:
- eds.emergency.gemsa → emergency_gemsa
- eds.emergency.ccmu → emergency_ccmu
- eds.emergency.priority → emergency_priority
- eds.charlson → charlson
- eds.elston_ellis → elston_ellis
- eds.SOFA → sofa
- eds.adicap → adicap
- eds.measurements → size, weight, ... instead of eds.size, eds.weight, ...
eds.dates now separates dates from durations. Each entity has its own label:
- spans["dates"] → entities labelled as date with a span._.date parsed object
- spans["durations"] → entities labelled as duration with a span._.duration parsed object
the "relative" / "absolute" / "duration" mode of the time entity is now stored in the mode attribute of the span._.date/duration
the "from" / "until" period bound, if any, is now stored in the span._.date.bound attribute
to_datetime now only returns absolute dates, converts relative dates into absolute if doc._.note_datetime is given, and None otherwise
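
The relationship between relative dates, absolute dates and note_datetime can be sketched with the standard datetime module. The two helpers below are hypothetical standalone functions that mirror the behavior described for to_datetime and to_duration; they are not the methods of the parsed date objects.

```python
from datetime import datetime, timedelta


def resolve_relative(offset, note_datetime=None):
    """Turn a relative date (an offset from the note) into an absolute date."""
    if note_datetime is None:
        # Without a reference date, a relative date cannot be resolved.
        return None
    return note_datetime + offset


def relative_to_note(absolute, note_datetime=None):
    """Express an absolute date as a duration relative to the note date."""
    if note_datetime is None:
        return None
    return absolute - note_datetime


note = datetime(2021, 3, 10)
ten_days_before = resolve_relative(timedelta(days=-10), note)
as_duration = relative_to_note(datetime(2021, 2, 28), note)
```

Both conversions return None when no note date is available.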

Fixed

export_to_brat issue with spans of entities on multiple lines.

Pull Requests

Fix export_to_brat when there are spaces before new lines by @TheooJ in https://github.com/aphp/edsnlp/pull/211
Refacto of the extensions by @percevalw in https://github.com/aphp/edsnlp/pull/213
chore: bump version to 0.9.0 by @percevalw in https://github.com/aphp/edsnlp/pull/215

New Contributors

@TheooJ made their first contribution in https://github.com/aphp/edsnlp/pull/211

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.8.1...v0.9.0

- Python
Published by percevalw over 2 years ago

edsnlp - v0.8.1

Post-release to synchronize Zenodo

- Python
Published by percevalw over 2 years ago

edsnlp - v0.8.1

What's changed

Fix release to allow installation from source.

Pull Requests

Ship cython files in sdist by @percevalw in https://github.com/aphp/edsnlp/pull/210

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.8.0...v0.8.1

- Python
Published by percevalw over 2 years ago

edsnlp - v0.8.0

Changelog

Added

New trainable component for multi-label, multi-class span qualification (any attribute/extension)
Add range measurements (like la tumeur fait entre 1 et 2 cm) to eds.measurements matcher
Add eds.CKD component
Add eds.COPD component
Add eds.alcohol component
Add eds.cerebrovascular_accident component
Add eds.congestive_heart_failure component
Add eds.connective_tissue_disease component
Add eds.dementia component
Add eds.diabetes component
Add eds.hemiplegia component
Add eds.leukemia component
Add eds.liver_disease component
Add eds.lymphoma component
Add eds.myocardial_infarction component
Add eds.peptic_ulcer_disease component
Add eds.peripheral_vascular_disease component
Add eds.solid_tumor component
Add eds.tobacco component
Add eds.spaces (or eds.normalizer with spaces=True) to detect space tokens, and add ignore_space_tokens to EDSPhraseMatcher and SimstringMatcher to skip them
Add ignore_space_tokens option in most components
eds.tables: new pipeline to identify formatted tables
New merge_mode parameter in eds.measurements to normalize existing entities or detect measures only inside existing entities
Tokenization exceptions (Mr., Dr., Mrs.) and non end-of-sentence periods are now tokenized with the next letter in the eds tokenizer
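
To give an idea of what a range measurement such as "la tumeur fait entre 1 et 2 cm" involves, here is a deliberately simplified regex sketch. The pattern, the unit list and the helper name are assumptions made for illustration; eds.measurements relies on its own, much richer matching logic.

```python
import re

# Simplified pattern: "entre <number> et <number> <unit>" (illustrative only).
RANGE_PATTERN = re.compile(
    r"entre\s+(\d+(?:[.,]\d+)?)\s+et\s+(\d+(?:[.,]\d+)?)\s*(cm|mm|m)\b"
)


def parse_range(text):
    """Return (low, high, unit) for the first range found, or None."""
    match = RANGE_PATTERN.search(text)
    if match is None:
        return None
    low, high, unit = match.groups()
    # French decimals use a comma, so normalize before converting.
    return float(low.replace(",", ".")), float(high.replace(",", ".")), unit


parsed = parse_range("la tumeur fait entre 1 et 2 cm")
```

A range is thus captured as a single measurement with two bounds and a shared unit, rather than as two unrelated numbers.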

Changed

Disable EDSMatcher preprocessing auto progress tracking by default
Moved dependencies to a single pyproject.toml: support for pip install -e '.[dev,docs,setup]'
ADICAP matcher now allows dot separators (e.g. B.H.HP.A7A0)

Fixed

Abbreviation and number tokenization issues in the eds tokenizer
eds.adicap : reparsed the dictionary used to decode the ADICAP codes (some of them were wrongly decoded)
Fix build for python 3.9 on Mac M1/M2 machines.

What's changed

Pull Requests

docs: mention INRIA in the acknowledgment by @percevalw in https://github.com/aphp/edsnlp/pull/170
Umls fixes by @percevalw in https://github.com/aphp/edsnlp/pull/183
fix typo by @gammaeva in https://github.com/aphp/edsnlp/pull/179
add link and definiton for sofa in documentation by @strayMat in https://github.com/aphp/edsnlp/pull/182
CI fail exploration by @Thomzoy in https://github.com/aphp/edsnlp/pull/189
Repare parsing errors of the ADICAP dict by @etienneguevel in https://github.com/aphp/edsnlp/pull/187
Move dependencies to pyproject.toml by @percevalw in https://github.com/aphp/edsnlp/pull/190
Add tokenization exceptions and detect some false positive EOS by @percevalw in https://github.com/aphp/edsnlp/pull/192
Bump version to 0.8.0 by @percevalw in https://github.com/aphp/edsnlp/pull/194
Update docs by @percevalw in https://github.com/aphp/edsnlp/pull/196
Ignore space tokens by @percevalw in https://github.com/aphp/edsnlp/pull/198
pipe tables by @aricohen93 in https://github.com/aphp/edsnlp/pull/180
Range measurements by @percevalw in https://github.com/aphp/edsnlp/pull/195
SpanQualifier trainable component by @percevalw in https://github.com/aphp/edsnlp/pull/193
18 pipes from the Charlson Comorbidity Index by @Thomzoy in https://github.com/aphp/edsnlp/pull/205
Bump version to v0.8.0 by @percevalw in https://github.com/aphp/edsnlp/pull/209

New Contributors

@gammaeva made their first contribution in https://github.com/aphp/edsnlp/pull/179
@strayMat made their first contribution in https://github.com/aphp/edsnlp/pull/182

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.7.4...v0.8.0

- Python
Published by percevalw over 2 years ago

edsnlp - v0.7.4

Changelog

Added

eds.history : Add the option to consider only the closest dates in the sentence (dates inside the boundaries or, if there are none, the closest date in the entire sentence).
eds.negation : It now takes into account following past participles and preceding infinitives.
eds.hypothesis: It now takes into account following past participle hypothesis verbs.
eds.negation & eds.hypothesis : Introduce new patterns and remove unnecessary patterns.
eds.dates : Add a pattern for preceding relative dates (ex: l'embolie qui est survenue à 10 jours).
Improve patterns in the eds.pollution component to account for multiline footers
Add QuickExample object to quickly try a pipeline.
Add UMLS terminology matcher eds.umls
New RegexMatcher method to create spans from groupdicts
New eds.dates option to disable time detection

Changed

Improve date detection by removing false positives

Fixed

eds.hypothesis : Remove too generic patterns.
EDSTokenizer : It now tokenizes "recherche d'" as ["recherche", "d'"], instead of ["recherche", "d", "'"].
Fix small typos in the documentation and in the docstring.
Harmonize processing utils (distributed custom_pipe) to have the same API for Pandas and Pyspark
Fix BratConnector file loading issues with complex file hierarchies

Pull Requests

👓 Feedbacks from EDS-TeVa study by @Aremaki in https://github.com/aphp/edsnlp/pull/157
feat: :stethoscope: Update negation and hypothesis pipelines by @Aremaki in https://github.com/aphp/edsnlp/pull/162
Harmonize processing utils by @aricohen93 in https://github.com/aphp/edsnlp/pull/160
Update pattern footer (pollution) by @aricohen93 in https://github.com/aphp/edsnlp/pull/159
feat: add UMLS terminology (#147) by @percevalw in https://github.com/aphp/edsnlp/pull/165
Relax pydantic version constraints by @percevalw in https://github.com/aphp/edsnlp/pull/167
Allow back spacy dot components for backward compatibility by @percevalw in https://github.com/aphp/edsnlp/pull/152
Update docs by @percevalw in https://github.com/aphp/edsnlp/pull/168
Bump version to 0.7.3 by @percevalw in https://github.com/aphp/edsnlp/pull/169
Quick example by @Thomzoy in https://github.com/aphp/edsnlp/pull/166
Update index.md by @Thomzoy in https://github.com/aphp/edsnlp/pull/171
Fix brat file path search for complex file hierarchies by @percevalw in https://github.com/aphp/edsnlp/pull/172
Improve dates by @percevalw in https://github.com/aphp/edsnlp/pull/149
Bump version to 0.7.4 by @percevalw in https://github.com/aphp/edsnlp/pull/173

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.7.2...v0.7.4

- Python
Published by percevalw about 3 years ago

edsnlp - v0.7.2

Changelog

Added

Improve the eds.history component by taking into account the date extracted from eds.dates component.
New pop up when you click on the copy icon in the termynal widget (docs).
Add NER eds.elston-ellis pipeline to identify Elston Ellis scores
Add flags=re.MULTILINE to eds.pollution and change pattern of footer

Fixed

Remove the warning in the eds.sections when eds.normalizer is in the pipe.
Fix filter_spans for strictly nested entities
Fill eds.remove-lowercase "assign" metadata to run the pipeline during EDSPhraseMatcher preprocessing
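
For readers unfamiliar with span filtering, the sketch below shows the usual "keep the longest, drop what overlaps" strategy on plain (start, end) tuples, including a strictly nested span. It is a simplified stand-in written for this note and not the library's filter_spans function.

```python
def filter_overlapping(spans):
    """Keep the longest spans and drop any span overlapping a kept one."""
    # Longest spans first; on ties, the earliest span wins.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    for start, end in ordered:
        # A candidate survives only if it is disjoint from every kept span.
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


# (2, 5) is strictly nested inside (0, 10) and must be discarded.
filtered = filter_overlapping([(2, 5), (0, 10), (12, 15)])
```

The nested span is removed, while the disjoint span at (12, 15) is preserved.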

Pull Requests

Update patterns pollution by @aricohen93 in https://github.com/aphp/edsnlp/pull/145
feat: :sparkles: Improve eds.history component with eds.dates by @Aremaki in https://github.com/aphp/edsnlp/pull/144
Small fixes by @percevalw in https://github.com/aphp/edsnlp/pull/146
Elston and Ellis by @etienneguevel in https://github.com/aphp/edsnlp/pull/148
Fix setup.py by @percevalw in https://github.com/aphp/edsnlp/pull/151
Patch patterns norm by @aricohen93 in https://github.com/aphp/edsnlp/pull/150
Bump version to 0.7.2 by @percevalw in https://github.com/aphp/edsnlp/pull/153

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.7.1...v0.7.2

Published by percevalw over 3 years ago

edsnlp - v0.7.1

Changelog

Added

Add new patterns (footer, web entities, biology tables, coding sections) to pipeline normalisation (pollution)

Changed

Improved TNM detection algorithm
Account for more modifiers in ADICAP codes detection

Fixed

Add nephew, niece and daughter to family qualifier patterns
EDSTokenizer (spacy.blank('eds')) now treats non-breaking spaces as whitespace and no longer splits float numbers
eds.dates pipeline now allows new lines as space separators in dates
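
The tokenizer behaviour described above can be approximated with a small regex-based sketch. This is not the EDSTokenizer code: the pattern and the sample sentence are assumptions chosen to show the two properties (non-breaking spaces act as separators, decimal numbers stay whole).

```python
import re

# Hypothetical token pattern: decimal numbers first, then words, then single
# punctuation characters. In Python str patterns, \s also matches U+00A0.
TOKEN = re.compile(r"\d+(?:[.,]\d+)?|\w+|[^\w\s]")

# "Poids" and "72,5" are separated by a non-breaking space (U+00A0).
text = "Poids\u00a072,5 kg"
tokens = TOKEN.findall(text)
print(tokens)
```

Putting the number alternative first is what keeps "72,5" from being split at the comma.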

Pull Requests

add: new patterns to pollution by @Thomzoy in https://github.com/aphp/edsnlp/pull/132
docs: fix cim10 docs by @percevalw in https://github.com/aphp/edsnlp/pull/130
Remove print statement by @Thomzoy in https://github.com/aphp/edsnlp/pull/133
fix: param sampling AdicapCode by @etienneguevel in https://github.com/aphp/edsnlp/pull/131
Add nephew, niece and daughter to family qualifier patterns by @julienduquesne in https://github.com/aphp/edsnlp/pull/135
Modification of the TNM ner by @etienneguevel in https://github.com/aphp/edsnlp/pull/136
modification of the ADICAP ner by @etienneguevel in https://github.com/aphp/edsnlp/pull/137
EDSTokenizer: split on non-breaking spaces and don't split float numbers by @percevalw in https://github.com/aphp/edsnlp/pull/141
Allow newlines in dates by @percevalw in https://github.com/aphp/edsnlp/pull/142
new pattern norm pollution by @aricohen93 in https://github.com/aphp/edsnlp/pull/139
Bump version to 0.7.1 by @percevalw in https://github.com/aphp/edsnlp/pull/143

New Contributors

@etienneguevel made their first contribution in https://github.com/aphp/edsnlp/pull/131
@julienduquesne made their first contribution in https://github.com/aphp/edsnlp/pull/135

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.7.0...v0.7.1

Published by percevalw over 3 years ago

edsnlp - v0.7.0

Changelog

Added

New trainable nested_ner pipeline component for nested NER
Support for nested entities and attributes in BratDataConnector
Pytorch wrappers and experimental training utils
Add a section attribute to entities
Add new cases for separator pattern when components of the TNM score are separated by a forward slash
Add NER eds.adicap pipeline to identify ADICAP codes

Changed

Update the ContextualMatcher (and all pipelines depending on it), making it more flexible to use
Rename R component of score TNM as "resection_completeness"

Fixed

Prevent section titles from capturing surrounding tokens, causing overlaps (#113)
Enhance existing patterns for section detection and add patterns for previously ignored sections (introduction, evolution, modalites de sortie, vaccination).
Fix explain mode in the eds.history factory, which was always triggered.
Fix test in eds.sections. Previously, no check was performed.
Remove spurious span suffixes from SOFA scores

Pull requests

Change links to streamlit demo by @percevalw in https://github.com/aphp/edsnlp/pull/111
Restore demo links by @percevalw in https://github.com/aphp/edsnlp/pull/112
Prevent section titles from capturing surrounding tokens by @percevalw in https://github.com/aphp/edsnlp/pull/114
Section upgrade by @paul-bssr in https://github.com/aphp/edsnlp/pull/115
Nested NER trainable pipeline component by @percevalw in https://github.com/aphp/edsnlp/pull/84
Fix history factory parameter type by @clementjumel in https://github.com/aphp/edsnlp/pull/117
Rename R component (TNM) by @aricohen93 in https://github.com/aphp/edsnlp/pull/119
Update separator pattern score TNM by @aricohen93 in https://github.com/aphp/edsnlp/pull/121
add section info to entities by @aricohen93 in https://github.com/aphp/edsnlp/pull/120
Adicap pipeline by @aricohen93 in https://github.com/aphp/edsnlp/pull/123
ContextualMatcher + ADICAP Update by @Thomzoy in https://github.com/aphp/edsnlp/pull/124
fix: handle single entity in contextual matcher by @Thomzoy in https://github.com/aphp/edsnlp/pull/126
Adicap model by @percevalw in https://github.com/aphp/edsnlp/pull/127
chore: bump version to 0.7.0 by @percevalw in https://github.com/aphp/edsnlp/pull/125
v0.7.0 + fixed package_data by @percevalw in https://github.com/aphp/edsnlp/pull/129

New Contributors

@paul-bssr made their first contribution in https://github.com/aphp/edsnlp/pull/115
@clementjumel made their first contribution in https://github.com/aphp/edsnlp/pull/117

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.6.2...v0.7.0

Published by percevalw over 3 years ago

edsnlp - v0.6.2

Changelog

Added

New SimstringMatcher matcher to perform fuzzy term matching, and algorithm parameter in terminology components and eds.matcher component
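
As a rough illustration of the kind of fuzzy term matching this enables, the sketch below scores terms by the Jaccard similarity of their character trigrams. It is a pure-Python approximation under stated assumptions: the function names, the threshold, and the sample terms are hypothetical, and it does not use the simstring library or the SimstringMatcher API.

```python
from typing import List, Set, Tuple


def char_ngrams(term: str, n: int = 3) -> Set[str]:
    """Character n-grams of a lowercased term, padded with spaces on both sides."""
    padded = " " * (n - 1) + term.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fuzzy_lookup(query: str, terms: List[str], threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (term, score) pairs whose similarity to the query reaches the threshold."""
    q = char_ngrams(query)
    scored = [(term, jaccard(q, char_ngrams(term))) for term in terms]
    return sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])


# Hypothetical terminology and a misspelled query.
matches = fuzzy_lookup("diabette", ["diabete", "hypertension"])
print(matches)
```

A real matcher would index the n-grams instead of scanning every term, but the similarity measure is the part that tolerates misspellings.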

Changed

Add consultation date pattern "CS", and False Positive patterns for dates (namely phone numbers and pagination).
Update the eds.TNM score pipeline. It can now return a dictionary whose values are either str or int

Fixed

Add new patterns to the negation qualifier
Numpy header issues with binary distributed packages
Simstring dependency on Windows

Pull Requests

chore: add acknowledgement by @bdura in https://github.com/aphp/edsnlp/pull/102
TNM by @aricohen93 in https://github.com/aphp/edsnlp/pull/103
fix: eds.sentences behaviour with dates by @bdura in https://github.com/aphp/edsnlp/pull/99
Add consultation date pattern and date False Positive by @JCharline in https://github.com/aphp/edsnlp/pull/107
Simstring by @percevalw in https://github.com/aphp/edsnlp/pull/94
Fix numpy header issues with binary packages by @percevalw in https://github.com/aphp/edsnlp/pull/109
fix: add "non" preceding pattern by @bdura in https://github.com/aphp/edsnlp/pull/105
Bump version to v0.6.2 by @percevalw in https://github.com/aphp/edsnlp/pull/110

New Contributors

@JCharline made their first contribution in https://github.com/aphp/edsnlp/pull/107

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.6.1...v0.6.2

Published by percevalw over 3 years ago

edsnlp - v0.6.1

Changelog

Added

Now possible to provide regex flags when using the RegexMatcher
New ContextualMatcher pipe, aiming at replacing the AdvancedRegex pipe.
New as_ents parameter for eds.dates, to save detected dates as entities

Changed

Faster eds.sentences pipeline component with Cython
Bump version of Pydantic in requirements.txt to 1.8.2 to handle an incompatibility with the ContextualMatcher
Optimise space requirements by using .csv.gz compression for verbs

Pull Requests

chore: bump version to 0.6.0 by @percevalw in https://github.com/aphp/edsnlp/pull/88
Fix norm and to_datetime dates methods by @percevalw in https://github.com/aphp/edsnlp/pull/92
SentenceSegmenter speed-up by @percevalw in https://github.com/aphp/edsnlp/pull/95
Contextual matcher by @Thomzoy in https://github.com/aphp/edsnlp/pull/93
bump pydantic version to minimal 1.8.2 by @Thomzoy in https://github.com/aphp/edsnlp/pull/96
correct typo by @aricohen93 in https://github.com/aphp/edsnlp/pull/98
Bump version to v0.6.1 by @bdura in https://github.com/aphp/edsnlp/pull/101

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.6.0...v0.6.1

Published by bdura over 3 years ago

edsnlp - v0.6.0

What's Changed

Add new pattern for dates pipeline by @aricohen93 in https://github.com/aphp/edsnlp/pull/74
Simple terminology matcher by @bdura in https://github.com/aphp/edsnlp/pull/75
Force batch size of 2000 when distributing pipe by @Thomzoy in https://github.com/aphp/edsnlp/pull/73
Add CIM10 terminology by @bdura in https://github.com/aphp/edsnlp/pull/77
New NER drugs pipeline by @scossin in https://github.com/aphp/edsnlp/pull/58
Fix resources by @bdura in https://github.com/aphp/edsnlp/pull/79
Improve dates by @aricohen93 in https://github.com/aphp/edsnlp/pull/80
Miscellaneous changes to the documentation and changelog by @bdura in https://github.com/aphp/edsnlp/pull/78
Hot fix distributed pipe, default extension value by @Aremaki in https://github.com/aphp/edsnlp/pull/85
Remove trailing spaces on get_text function by @Thomzoy in https://github.com/aphp/edsnlp/pull/86
Measurements complete revamp by @percevalw and @keyber in https://github.com/aphp/edsnlp/pull/21

New Contributors

@scossin made their first contribution in https://github.com/aphp/edsnlp/pull/58
@Aremaki made their first contribution in https://github.com/aphp/edsnlp/pull/85

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.5.3...v0.6.0

Published by percevalw over 3 years ago

edsnlp - v0.5.3

Changelog

Added

Support for strings in the example utility
TNM detection and normalisation with the eds.TNM pipeline
Support for arbitrary callback for Pandas multiprocessing, with the callback argument

Pull requests

Bump to version v0.5.2 by @bdura in https://github.com/aphp/edsnlp/pull/71
Add generic callback for multiprocessing by @bdura in https://github.com/aphp/edsnlp/pull/57
Add TNM detection and normalisation pipeline by @bdura in https://github.com/aphp/edsnlp/pull/56
chore: bump version to 0.5.3 by @bdura in https://github.com/aphp/edsnlp/pull/72

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.5.2...v0.5.3

Published by bdura almost 4 years ago

edsnlp - v0.5.2

Changelog

Added

Support for chained attributes in the processing pipelines
Colour utility with the category20 colour palette
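
Chained (dotted) attribute access of the kind mentioned above can be sketched as follows. The helper name and the stand-in objects are assumptions for illustration; this is not the code used in the edsnlp processing module.

```python
from functools import reduce
from types import SimpleNamespace


def get_chained(obj, path: str):
    """Resolve a dotted path such as "_.date.year" one attribute at a time."""
    return reduce(getattr, path.split("."), obj)


# Hypothetical stand-in for an entity with a nested custom attribute.
ent = SimpleNamespace(
    label_="date",
    _=SimpleNamespace(date=SimpleNamespace(year=2022)),
)

year = get_chained(ent, "_.date.year")
label = get_chained(ent, "label_")
print(year, label)
```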

Fixed

Correct a regex in the date detector (both nov and nov. are now detected, as for all other months)
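
The fix amounts to making the trailing dot of a month abbreviation optional. A minimal sketch with Python's re module, where the pattern is a hypothetical simplification and not the one used by the date detector:

```python
import re

# Hypothetical month pattern: full name, or abbreviation with an optional dot.
MONTH = r"(?:novembre|nov\.?)"
DATE = re.compile(rf"\b(\d{{1,2}})\s+{MONTH}\s+(\d{{4}})\b", flags=re.IGNORECASE)

# All three spellings should be detected.
samples = ["le 3 nov 2021", "le 3 nov. 2021", "le 3 novembre 2021"]
found = [bool(DATE.search(s)) for s in samples]
print(found)
```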

Pull requests

Fix documentation for handling multiple texts by @bdura in https://github.com/aphp/edsnlp/pull/53
feat: allow recursive attributes in processing by @Thomzoy in https://github.com/aphp/edsnlp/pull/54
Add colour utility by @bdura in https://github.com/aphp/edsnlp/pull/55
Update doc citation endlines by @aricohen93 in https://github.com/aphp/edsnlp/pull/60
Correct regex on date by @gozat in https://github.com/aphp/edsnlp/pull/70
Bump to version v0.5.2 by @bdura in https://github.com/aphp/edsnlp/pull/71

New Contributors

@aricohen93 made their first contribution in https://github.com/aphp/edsnlp/pull/60
@gozat made their first contribution in https://github.com/aphp/edsnlp/pull/70

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.5.1...v0.5.2

Published by bdura almost 4 years ago

edsnlp - v0.5.1

What's Changed

Use constrained cibuildwheel to compile wheels by @bdura in https://github.com/aphp/edsnlp/pull/50
Fix issue with Numpy and bump to v0.5.1 by @bdura in https://github.com/aphp/edsnlp/pull/52

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.5.0...v0.5.1

Published by bdura almost 4 years ago

edsnlp - v0.5.0

What's Changed

Reimplementation of the EDSPhraseMatcher in Cython by @percevalw in https://github.com/aphp/edsnlp/pull/43, with a 15x speed increase
Revamp of the date pipeline by @keyber in https://github.com/aphp/edsnlp/pull/22
New EDS Language (spacy.blank("eds")) by @percevalw in https://github.com/aphp/edsnlp/pull/34
Test code blocks in documentation by @bdura in https://github.com/aphp/edsnlp/pull/44

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.4.4...v0.5.0

Published by bdura almost 4 years ago

edsnlp - v0.4.4

What's Changed

Add measures pipeline
Cap Jinja2 version to fix mkdocs
Add the possibility to pass context in the processing module
Improve the speed of char replacement pipelines (accents and quotes)
Improve the speed of the regex matcher

Published by percevalw almost 4 years ago

edsnlp - v0.4.3

What's Changed

Demo: update dataframe representation by @bdura in https://github.com/aphp/edsnlp/pull/25
fix: regex matching on spans by @percevalw in https://github.com/aphp/edsnlp/pull/26

New Contributors

@percevalw made their first contribution in https://github.com/aphp/edsnlp/pull/26

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.4.2...v0.4.3

Published by percevalw almost 4 years ago

edsnlp - v0.4.2

New version.

Changes:

Fix issue with dateparser library (see scrapinghub/dateparser#1045)
Fix attr issue in the advanced-regex pipeline
Add documentation for eds.covid
Update the demo with an explanation for the regex

Published by bdura almost 4 years ago

edsnlp - v0.4.1

What's Changed

Deploy Streamlit demo and eds.covid pipeline component by @bdura in https://github.com/aphp/edsnlp/pull/2
Add Github CI by @bdura in https://github.com/aphp/edsnlp/pull/1
Skip no-commit-to-branch pre-commit hook by @bdura in https://github.com/aphp/edsnlp/pull/8
feat: matrices testing by @Thomzoy in https://github.com/aphp/edsnlp/pull/12
Add codecov by @bdura in https://github.com/aphp/edsnlp/pull/11
fix: gh-action strategy by @Thomzoy in https://github.com/aphp/edsnlp/pull/13
Update documentation by @bdura in https://github.com/aphp/edsnlp/pull/14
Documentation by @bdura in https://github.com/aphp/edsnlp/pull/15
Update coverage by @bdura in https://github.com/aphp/edsnlp/pull/16
Koalas support by @Thomzoy in https://github.com/aphp/edsnlp/pull/10

New Contributors

@Thomzoy made their first contribution in https://github.com/aphp/edsnlp/pull/12

Full Changelog: https://github.com/aphp/edsnlp/compare/v0.4.0...v0.4.1

Published by bdura almost 4 years ago
