Recent Releases of edsnlp
edsnlp - v0.18.0
Changelog
Added
- Added support for multiple loggers (`tensorboard`, `wandb`, `comet_ml`, `aim`, `mlflow`, `clearml`, `dvclive`, `csv`, `json`, `rich`) in `edsnlp.train` via the `logger` parameter. The default is [`json`, `rich`] for backward compatibility.
- Sub batch sizes for gradient accumulation can now be defined as simple "splits" of the original batch, e.g. `batch_size = 10000 tokens` and `sub_batch_size = 5 splits` to accumulate batches of 2000 tokens.
- Parquet writer now has a `pyarrow_write_kwargs` parameter to pass to `pyarrow.dataset.write_dataset`.
- `LinearSchedule` (mostly used for LR scheduling) now allows an `end_value` parameter to configure whether the learning rate should decay to zero or to another value.
- New `eds.explode` pipe that splits one document into multiple documents, one per span yielded by its `span_getter` parameter, each new document containing exactly that single span.
- New "Training a span classifier" tutorial, and reorganized deep-learning docs.
- `ScheduledOptimizer` now warns when a parameter selector does not match any parameter.
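To make the "splits" arithmetic concrete, here is a minimal pure-Python sketch (not EDS-NLP's implementation) that cuts a batch into a fixed number of contiguous sub-batches with roughly equal token totals. The `split_batch` helper and its greedy strategy are assumptions for illustration.

```python
from typing import List, Sequence


def split_batch(doc_lengths: Sequence[int], n_splits: int) -> List[List[int]]:
    """Greedily cut a batch (given as per-document token counts) into at most
    `n_splits` contiguous sub-batches of roughly equal token totals.

    Hypothetical helper: returns lists of document indices.
    """
    total = sum(doc_lengths)
    target = total / n_splits  # e.g. 10000 tokens / 5 splits = 2000 tokens
    splits: List[List[int]] = [[]]
    acc = 0
    for i, length in enumerate(doc_lengths):
        # Open a new split once the previous ones have reached their share,
        # but never create more than n_splits sub-batches
        if acc >= target * len(splits) and len(splits) < n_splits:
            splits.append([])
        splits[-1].append(i)
        acc += length
    return splits


# 10 documents of 1000 tokens each, i.e. a 10000-token batch
print(split_batch([1000] * 10, 5))  # -> [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
```

Each of the five sub-batches holds 2000 tokens, matching the `batch_size = 10000 tokens` / `sub_batch_size = 5 splits` example.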
Fixed
- `use_sections` in `eds.history` should now correctly handle cases when there are other sections following history sections.
- Added clickable snippets in the documentation for more registered functions.
- Pyarrow dataset writing with multiprocessing should be faster, as we removed a useless data transfer.
- We should now correctly support loading transformers in offline mode if they were already in huggingface's cache.
- We now support the `words[-10:10]` syntax in the trainable span classifier `context_getter` parameter.
- :ambulance: Until now, `post_init` was applied after the instantiation of the optimizer: if the model discovered new labels, and therefore changed its parameter tensors to reflect that, these new tensors were not taken into account by the optimizer, which could likely lead to subpar performance. Now, `post_init` is applied before the optimizer is instantiated, so that the optimizer can correctly handle the new tensors.
- Added missing entry points for readers and writers in the registry, including `write_parquet`, and support for `polars` in `pyproject.toml`. Now all implemented readers and writers are correctly registered as entry points.
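The `post_init` ordering bug is easiest to see with a toy model. The sketch below uses plain Python objects instead of PyTorch; `Param`, `ToyModel`, `ToyOptimizer` and their `post_init` are hypothetical stand-ins, not EDS-NLP classes. The key point is that an optimizer captures references to parameter objects when it is created, so if the model later replaces a parameter to fit new labels, the optimizer keeps updating the old, detached object.

```python
from typing import List


class Param:
    """Stand-in for a parameter tensor: just a list of weights."""

    def __init__(self, size: int):
        self.data = [0.0] * size


class ToyModel:
    def __init__(self, n_labels: int):
        self.classifier = Param(n_labels)

    def parameters(self) -> List[Param]:
        return [self.classifier]

    def post_init(self, labels: List[str]) -> None:
        # Discovering new labels replaces the parameter with a bigger one
        if len(labels) != len(self.classifier.data):
            self.classifier = Param(len(labels))


class ToyOptimizer:
    def __init__(self, params: List[Param]):
        # References are captured once, at construction time
        self.params = list(params)

    def step(self, lr: float = 0.1) -> None:
        for p in self.params:
            p.data = [w + lr for w in p.data]


labels = ["a", "b", "c"]

# Old order: optimizer first, then post_init -> the new parameter is never updated
model = ToyModel(n_labels=1)
optim = ToyOptimizer(model.parameters())
model.post_init(labels)
optim.step()
print(model.classifier.data)  # -> [0.0, 0.0, 0.0]

# New order: post_init first, then optimizer -> the new parameter is trained
model = ToyModel(n_labels=1)
model.post_init(labels)
optim = ToyOptimizer(model.parameters())
optim.step()
print(model.classifier.data)  # -> [0.1, 0.1, 0.1]
```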
Changed
- Section cues in `eds.history` are now section titles, and not the full section.
- :boom: Validation metrics are now found under the root field `validation` in the training logs (e.g. `metrics['validation']['ner']['micro']['f']`).
- It is now recommended to define optimizer groups of `ScheduledOptimizer` as a list of dicts of optim hyper-parameters, each containing a `selector` regex key, rather than as a single dict with selectors as keys and dicts of optim hyper-parameters as values. This allows for more flexibility in defining the optimizer groups, and is more consistent with the rest of the EDS-NLP API. It also makes it easier to reference group values from other places in config files, since their path no longer contains a complex regex string. See the updated training tutorials for more details.
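The sketch below contrasts the two group layouts and mimics the new warning for selectors that match nothing. `assign_groups` is a hypothetical pure-Python helper, not part of EDS-NLP. It assumes that each parameter goes to the first group whose `selector` regex matches its name. Check the training tutorials for the actual resolution rules.

```python
import re
import warnings
from typing import Dict, List

# Old style: regex selectors are dict keys
old_groups = {"^transformer": {"lr": 5e-5}, "": {"lr": 3e-4}}

# New recommended style: a list of dicts, each with a "selector" key
new_groups = [
    {"selector": "^transformer", "lr": 5e-5},
    {"selector": "", "lr": 3e-4},
]


def assign_groups(param_names: List[str], groups: List[Dict]) -> Dict[str, Dict]:
    """Map each parameter name to the hyper-parameters of the first group
    whose selector matches (assumed rule), warning about unused selectors."""
    assignment: Dict[str, Dict] = {}
    used = [False] * len(groups)
    for name in param_names:
        for i, group in enumerate(groups):
            if re.search(group["selector"], name):
                assignment[name] = {k: v for k, v in group.items() if k != "selector"}
                used[i] = True
                break
    for group, was_used in zip(groups, used):
        if not was_used:
            warnings.warn(f"Selector {group['selector']!r} did not match any parameter")
    return assignment


# Converting the old layout to the new one is a one-liner
converted = [{"selector": sel, **hparams} for sel, hparams in old_groups.items()]
assert converted == new_groups

names = ["transformer.layer.0.weight", "classifier.weight"]
print(assign_groups(names, new_groups))
# -> {'transformer.layer.0.weight': {'lr': 5e-05}, 'classifier.weight': {'lr': 0.0003}}
```

In the list layout, a group's values have a plain path such as `groups[0].lr` rather than one keyed by a regex string, which is what makes them easier to reference elsewhere in a config.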
Pull Requests
- chore: bump version to 0.18.0 by @percevalw in https://github.com/aphp/edsnlp/pull/439
- fix: use_sections in eds.history should now work by @percevalw in https://github.com/aphp/edsnlp/pull/430
- docs: fix read parquet parameters docs by @percevalw in https://github.com/aphp/edsnlp/pull/425
- Explode pipe + span classifier training tutorial by @percevalw in https://github.com/aphp/edsnlp/pull/432
- Update, fix and refactor doc dependencies by @percevalw in https://github.com/aphp/edsnlp/pull/438
- fix: entrypoints by @aricohen93 in https://github.com/aphp/edsnlp/pull/420
- fix: take filter_expr into account in dependency parsing evaluation by @percevalw in https://github.com/aphp/edsnlp/pull/382
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.17.2...v0.18.0
Published by percevalw 6 months ago
edsnlp - v0.17.2
Changelog
Added
- Handling intra-word linebreaks as pollution: adds a pollution pattern that detects intra-word linebreaks, which can then be removed in the `get_text` method.
- Qualifiers can process `Span` or `Doc`: this feature especially makes it easier to nest qualifier components in other components.
- New `label_weights` parameter in `eds.span_classifier`, which allows the user to set per label-value loss weights during training.
- New `edsnlp.data.converters.MarkupToDocConverter` to convert Markdown or XML-like markup to documents, which is particularly useful to create annotated documents from scratch (e.g., for testing purposes).
- New Metrics documentation page to document the available metrics and how to use them.
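Here is a standalone regex sketch of what an "intra-word linebreak" pattern can look like. The pattern is an assumption for illustration and may differ from the one shipped with EDS-NLP: it matches a line break, optionally preceded by spaces or tabs, squeezed between two lowercase letters. Removing the match reconstitutes the word, which mirrors how pollution can be removed in `get_text`.

```python
import re

# Assumed pattern: a line break (optionally preceded by spaces/tabs) between two
# lowercase letters, accented ones included, e.g. "pati\nent" -> "patient"
INTRA_WORD_LINEBREAK = re.compile(r"(?<=[a-zà-ÿ])[ \t]*\n(?=[a-zà-ÿ])")


def remove_intra_word_linebreaks(text: str) -> str:
    """Drop line breaks that split a single word in two."""
    return INTRA_WORD_LINEBREAK.sub("", text)


print(repr(remove_intra_word_linebreaks("Le pati\nent est hospitalisé.\nIl va mieux.")))
# -> 'Le patient est hospitalisé.\nIl va mieux.'
```

This heuristic also merges two genuine words that are separated only by a newline, so it should be treated as a heuristic rather than a guaranteed fix.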
Fixed
- Various disorders/behaviors patches
Changed
- Deduplicate spans between `doc.ents` and `doc.spans` during training: previously, a `span_getter` requesting entities from both `ents` and `spans` could yield duplicates.
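A minimal sketch of the deduplication idea, with spans modeled as `(start, end, label)` tuples. This representation and the `get_spans` helper are assumptions. EDS-NLP works on spaCy `Span` objects and may compare them differently.

```python
from typing import Iterable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), a stand-in for spaCy spans


def get_spans(ents: Iterable[Span], spans: Iterable[Span]) -> List[Span]:
    """Collect spans from both sources, keeping only the first occurrence of each."""
    seen = set()
    result = []
    for span in [*ents, *spans]:
        if span not in seen:
            seen.add(span)
            result.append(span)
    return result


ents = [(0, 2, "drug"), (5, 6, "dose")]
spans = [(0, 2, "drug"), (8, 10, "route")]
print(get_spans(ents, spans))
# -> [(0, 2, 'drug'), (5, 6, 'dose'), (8, 10, 'route')]
```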
Pull Requests
- feat: Various patches by @Thomzoy in https://github.com/aphp/edsnlp/pull/391
- Metrics doc by @percevalw in https://github.com/aphp/edsnlp/pull/417
- chore: bump version to 0.17.2 by @percevalw in https://github.com/aphp/edsnlp/pull/424
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.17.1...v0.17.2
Published by percevalw 8 months ago
edsnlp - v0.17.1
Changelog
Added
- Added grad spike detection to the `edsnlp.train` script, and per weight layer gradient logging.
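A common way to flag gradient spikes is to compare each step's gradient norm with running statistics of recent steps. The sketch below uses a rolling mean and standard deviation with a z-score style threshold. The window size, threshold, warmup, and the `GradSpikeDetector` name are illustrative assumptions, and the criterion used by `edsnlp.train` may differ.

```python
import math
from collections import deque


class GradSpikeDetector:
    """Flag gradient norms far above the recent rolling mean (z-score test)."""

    def __init__(self, window: int = 100, threshold: float = 5.0, warmup: int = 10):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.warmup = warmup  # number of steps observed before flagging anything

    def __call__(self, grad_norm: float) -> bool:
        is_spike = False
        if len(self.history) >= self.warmup:
            mean = sum(self.history) / len(self.history)
            var = sum((x - mean) ** 2 for x in self.history) / len(self.history)
            std = math.sqrt(var)
            is_spike = grad_norm > mean + self.threshold * max(std, 1e-8)
        if not is_spike:
            # Keep spikes out of the statistics so they don't mask later ones
            self.history.append(grad_norm)
        return is_spike


detector = GradSpikeDetector(window=50, threshold=5.0, warmup=5)
norms = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 12.0, 1.0]
print([detector(n) for n in norms])
# -> [False, False, False, False, False, False, True, False]
```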
Fixed
- Fixed mini-batch accumulation for multi-task training
- Fixed a pickling error when applying a pipeline in multiprocessing mode. This occurred in some cases when one of the pipes was declared in a "difficultly importable" module (e.g., causing a "PicklingWarning: Cannot locate reference to <class...").
- Fixed typo in `eds.consultation_dates` towns: `berck.sur.mer`.
- Fixed a bug where relative date expressions with bounds (e.g. 'depuis hier') raised an error when converted to durations.
- Fixed pipe ADICAP to deal with cases where no code is found after 'codification'/'adicap'.
- Support "00"-like hours and minutes in the `eds.dates` component.
- Fix arc minutes, arc seconds and degree unit scales in `eds.quantities`, used when converting between different time (or angle) units.
Pull Requests
- fix: add grad spike detection by @percevalw in https://github.com/aphp/edsnlp/pull/375
- fix: avoid pickling error in multiprocessing mode by @percevalw in https://github.com/aphp/edsnlp/pull/408
- fix: correct town name typo (berck.sur.mer) by @percevalw in https://github.com/aphp/edsnlp/pull/409
- fix: error when converting relative date expressions with bounds to durations by @percevalw in https://github.com/aphp/edsnlp/pull/411
- Fix adicap by @aricohen93 in https://github.com/aphp/edsnlp/pull/410
- Fix time matching by @LoickChardon in https://github.com/aphp/edsnlp/pull/413
- chore: bump version to 0.17.1 by @percevalw in https://github.com/aphp/edsnlp/pull/416
New Contributors
- @LoickChardon made their first contribution in https://github.com/aphp/edsnlp/pull/413
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.17.0...v0.17.1
Published by percevalw 9 months ago
edsnlp - v0.17.0
Changelog
Added
- Support for numpy>2.0, and formal support for Python 3.11 and Python 3.12
- Expose the default patterns of `eds.negation`, `eds.hypothesis`, `eds.family`, `eds.history` and `eds.reported_speech` under an `eds.negation.default_patterns`-style attribute.
- Added a `context_getter` SpanGetter argument to the `eds.matcher` class to only retrieve entities inside the spans returned by the getter.
- Added a `filter_expr` parameter to scorers to filter the documents to score.
- Added a new `required` field to `eds.contextual_matcher` assign patterns to only match if the required field has been found, and an `include` parameter (similar to `exclude`) to search for required patterns without assigning them to the entity.
- Added context strings (e.g., "words[0:5] | sent[0:1]") to the `eds.contextual_matcher` component to allow for more complex patterns in the selection of the window around the trigger spans.
- Include and exclude patterns in the contextual matcher now dismiss matches that occur inside the anchor pattern (e.g. "anti" exclude pattern for anchor pattern "antibiotics" will not match the "anti" part of "antibiotics").
- Pull Requests will now build a publicly accessible preview of the docs.
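To illustrate how a context string can describe a window around a trigger span, here is a hypothetical resolver for `words[a:b]` terms combined with `|`. It handles only the `words` part. The semantics are an assumption: `words[a:b]` is read as extending the span by `a` words on the left and `b` words on the right, clipped to the document, and `|` is read as a union. Sentence terms such as `sent[0:1]` are rejected rather than guessed.

```python
import re
from typing import List, Tuple

WORDS_TERM = re.compile(r"^\s*words\[(-?\d+):(-?\d+)\]\s*$")


def context_windows(expr: str, span_start: int, span_end: int, n_words: int) -> List[Tuple[int, int]]:
    """Resolve e.g. "words[-5:5] | words[0:10]" into merged [start, end)
    word-index windows around the span [span_start, span_end) (assumed semantics)."""
    windows = []
    for term in expr.split("|"):
        match = WORDS_TERM.match(term)
        if match is None:
            raise ValueError(f"Unsupported context term: {term!r}")
        before, after = int(match.group(1)), int(match.group(2))
        windows.append((max(0, span_start + before), min(n_words, span_end + after)))
    # "|" is read as a union: merge overlapping or touching windows
    windows.sort()
    merged = [windows[0]]
    for start, end in windows[1:]:
        if start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Span covering words 10-11 (end-exclusive 12) in a 30-word document
print(context_windows("words[-10:10]", 10, 12, 30))  # -> [(0, 22)]
print(context_windows("words[-2:0] | words[0:3]", 10, 12, 30))  # -> [(8, 15)]
```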
Changed
- Improve the contextual matcher documentation.
Fixed
- `edsnlp.package` now correctly detects whether a project uses an old-style poetry pyproject or a PEP621 pyproject.toml.
- PEP621 projects containing nested directories (e.g., "my_project/pipes/foo.py") are now supported.
- Try several paths to find current pip executable
- The parameter "value_extract" of `eds.score` now correctly handles lists of patterns.
- "Zero variance" errors raised when computing hyperparameter tuning importance are now caught and converted to warnings.
Pull Requests
- Fix packaging by @percevalw in https://github.com/aphp/edsnlp/pull/395
- fix: avoid non-standard (pytoml) syntax in pyproject.toml by @percevalw in https://github.com/aphp/edsnlp/pull/399
- fix: try several paths to find current pip executable by @percevalw in https://github.com/aphp/edsnlp/pull/401
- Fix optuna issue by @LucasDedieu in https://github.com/aphp/edsnlp/pull/398
- Improve contextual matcher by @percevalw in https://github.com/aphp/edsnlp/pull/289
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.16.0...v0.17.0
Published by percevalw 10 months ago
edsnlp - v0.16.0
Changelog
Added
- Hyperparameter Tuning for EDS-NLP: introduced a new script `edsnlp.tune` for hyperparameter tuning using Optuna. This feature allows users to efficiently optimize model parameters with options for single-phase or two-phase tuning strategies. It includes support for parameter importance analysis, visualization, pruning, and automatic handling of GPU time budgets.
- Provided a detailed tutorial on hyperparameter tuning, covering usage scenarios and configuration options.
- `ScheduledOptimizer` (e.g., `@core: "optimizer"`) now supports importing optimizers using their qualified name (e.g., `optim: "torch.optim.Adam"`).
- `eds.ner_crf` now computes confidence scores on spans.
Changed
- The loss of `eds.ner_crf` is now computed as the mean over the words instead of the sum. This change is compatible with multi-GPU training.
- Having multiple stats keys matching a batching pattern now warns instead of raising an error.
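The reason a mean over words suits multi-GPU training is a matter of normalization. If each GPU averages its own loss and the results are then averaged again, a GPU holding a few words weighs as much as one holding many. Dividing the summed loss by the global word count avoids this. The sketch below simulates two workers with plain lists. It shows the arithmetic only and is not the `eds.ner_crf` code.

```python
from typing import List


def mean_of_means(per_gpu_losses: List[List[float]]) -> float:
    """Average each GPU's per-word losses, then average across GPUs."""
    means = [sum(losses) / len(losses) for losses in per_gpu_losses]
    return sum(means) / len(means)


def global_mean(per_gpu_losses: List[List[float]]) -> float:
    """Sum losses and word counts across GPUs (as an all-reduce would),
    then divide once: every word gets the same weight."""
    total_loss = sum(sum(losses) for losses in per_gpu_losses)
    total_words = sum(len(losses) for losses in per_gpu_losses)
    return total_loss / total_words


# GPU 0 saw 4 words, GPU 1 saw a single (hard) word
losses = [[1.0, 1.0, 1.0, 1.0], [6.0]]
print(mean_of_means(losses))  # -> 3.5
print(global_mean(losses))  # -> 2.0
```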
Fixed
- Support packaging with poetry 2.0
- Solve pickling issues with multiprocessing when pytorch is installed
- Allow deep attributes like `a.b.c` for `span_attributes` in Standoff and OMOP doc2dict converters.
- Fixed various aspects of stream shuffling:
  - Ensure the Parquet reader shuffles the data when `shuffle=True`.
  - Ensure we don't overwrite the RNG of the data reader when calling `stream.shuffle()` with no seed.
  - Raise an error if the batch size in `stream.shuffle(batch_size=...)` is not compatible with the stream.
- `eds.split` now keeps doc and span attributes in the sub-documents.
Pull Requests
- fix: support packaging with poetry 2.0 by @percevalw in https://github.com/aphp/edsnlp/pull/362
- Solve pickling issues with multiprocessing when pytorch is installed by @percevalw in https://github.com/aphp/edsnlp/pull/367
- Feat: add hyperparameters tuning by @LucasDedieu in https://github.com/aphp/edsnlp/pull/361
- Fix issue 368: Add `metric` parameter and write optimal `config.yml` at the end of tuning. by @LucasDedieu in https://github.com/aphp/edsnlp/pull/369
- Fix issue 370: two-phase tuning now write phase 1 frozen best values into phase 2 `results_summary.txt` by @LucasDedieu in https://github.com/aphp/edsnlp/pull/371
- fix: allow deep attributes in Standoff and OMOP doc2dict converters by @percevalw in https://github.com/aphp/edsnlp/pull/381
- fix: improve various aspect of stream shuffling by @percevalw in https://github.com/aphp/edsnlp/pull/380
- fix: eds.split now keeps doc and span attributes in the sub-documents by @percevalw in https://github.com/aphp/edsnlp/pull/363
- feat: allow importing optims using qualified names in ScheduledOptimizer by @percevalw in https://github.com/aphp/edsnlp/pull/383
- feat: compute eds.ner_crf loss as mean over words by @percevalw in https://github.com/aphp/edsnlp/pull/384
- Fix issue 372: resulting tuning config file now preserve comments by @LucasDedieu in https://github.com/aphp/edsnlp/pull/373
- Feat: add checkpoint management for tuning by @LucasDedieu in https://github.com/aphp/edsnlp/pull/385
- feat: add ner confidence score by @LucasDedieu in https://github.com/aphp/edsnlp/pull/387
- chore: bump version to 0.16.0 by @LucasDedieu in https://github.com/aphp/edsnlp/pull/393
New Contributors
- @LucasDedieu made their first contribution in https://github.com/aphp/edsnlp/pull/361
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.15.0...v0.16.0
Published by LucasDedieu 11 months ago
edsnlp - v0.15.0
Changelog
Added
- `edsnlp.data.read_parquet` now accepts a `work_unit="fragment"` option to split tasks between workers by parquet fragment instead of by row. When this is enabled, workers no longer read every fragment while keeping only 1 row in n; instead, each worker reads all rows of 1/n of the fragments, which should be faster.
- Accept no validation data in the `edsnlp.train` script.
- Log the training config at the beginning of the trainings.
- Support a specific model output dir path for trainings (`output_model_dir`), and whether to save the model or not (`save_model`).
- Specify whether to log the validation results or not (`logger=False`).
- Added support for the CoNLL format with `edsnlp.data.read_conll` and with a specific `eds.conll_dict2doc` converter.
- Added a Trainable Biaffine Dependency Parser (`eds.biaffine_dep_parser`) component and metrics.
- New `eds.extractive_qa` component to perform extractive question answering using questions as prompts to tag entities instead of a list of predefined labels as in `eds.ner_crf`.
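The difference between the two work units can be sketched with fragments modeled as a mapping from fragment name to row count. This is a stand-in for pyarrow fragments, and both helpers are hypothetical. With row-based splitting every worker opens every fragment. With fragment-based splitting each worker opens only its own share.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (fragment name, row index)


def row_work_unit(fragments: Dict[str, int], n_workers: int, worker: int) -> List[Row]:
    """Every worker scans every fragment and keeps 1 row in n_workers."""
    rows = [(name, i) for name, n_rows in fragments.items() for i in range(n_rows)]
    return rows[worker::n_workers]


def fragment_work_unit(fragments: Dict[str, int], n_workers: int, worker: int) -> List[Row]:
    """Each worker opens only 1 fragment in n_workers and keeps all its rows."""
    names = list(fragments)[worker::n_workers]
    return [(name, i) for name in names for i in range(fragments[name])]


fragments = {"a": 3, "b": 3}
print(row_work_unit(fragments, 2, 0))  # -> [('a', 0), ('a', 2), ('b', 1)]
print(fragment_work_unit(fragments, 2, 0))  # -> [('a', 0), ('a', 1), ('a', 2)]
```

Both strategies cover every row exactly once across workers. The fragment-based one simply avoids having every worker open and scan every file.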
Fixed
- Fix `join_thread` missing attribute in `SimpleQueue` when cleaning a multiprocessing executor.
- Support huggingface transformers that do not set `cls_token_id` and `sep_token_id` (we now also look for these tokens in the `special_tokens_map` and `vocab` mappings).
- Fix changing scorers dict size issue when evaluating during training.
- Seed random states (instead of using `random.RandomState()`) when shuffling in data readers: this is important for
  - reproducibility
  - ensuring, in multiprocessing mode, that the same data is shuffled in the same way in all workers
- Bubble BaseComponent instantiation errors correctly.
- Improved support for multi-GPU gradient accumulation (only sync the gradients at the end of the accumulation), now controlled by the optional `sub_batch_size` argument of `TrainingData`.
- Support edsnlp without pytorch installed again.
- We now test that edsnlp works without pytorch installed.
- Fix units and scales, i.e. 1 l = 1 dm3, 1 ml = 1 cm3.
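Here is why a shared seed matters in multiprocessing mode. If each worker shuffles the same list and then keeps its own slice, all workers must produce the same permutation, or some items are read twice and others never. The sketch uses the standard-library `random.Random` class. The `worker_shard` helper is hypothetical and the actual reader code may differ.

```python
import random
from typing import List, Optional


def worker_shard(items: List[int], seed: Optional[int], worker: int, n_workers: int) -> List[int]:
    """Shuffle a copy of the items, then keep this worker's slice."""
    rng = random.Random(seed)  # seed=None -> seeded from system randomness
    order = list(items)
    rng.shuffle(order)
    return order[worker::n_workers]


items = list(range(10))
n_workers = 3

# Shared seed: the shards are disjoint and cover every item exactly once
shards = [worker_shard(items, seed=42, worker=w, n_workers=n_workers) for w in range(n_workers)]
assert sorted(x for shard in shards for x in shard) == items

# No shared seed: each worker shuffles differently, so items may be
# duplicated across shards or missed entirely
unseeded = [worker_shard(items, seed=None, worker=w, n_workers=n_workers) for w in range(n_workers)]
print(sorted(x for shard in unseeded for x in shard) == items)  # may be False
```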
Pull Requests
- fix: check join_thread attribute in queue when cleaning mp exec by @percevalw in https://github.com/aphp/edsnlp/pull/345
- fix: support hf transformers with cls_token_id and sep_token_id set to None by @percevalw in https://github.com/aphp/edsnlp/pull/346
- fix: changing scorers dict size issue when evaluating during training by @percevalw in https://github.com/aphp/edsnlp/pull/347
- Fix streams by @percevalw in https://github.com/aphp/edsnlp/pull/350
- Various trainer fixes by @percevalw in https://github.com/aphp/edsnlp/pull/352
- Trainable biaffine dependency parser by @percevalw in https://github.com/aphp/edsnlp/pull/353
- feat: new eds.extractive_qa component by @percevalw in https://github.com/aphp/edsnlp/pull/351
- Fix training and multiprocessing by @percevalw in https://github.com/aphp/edsnlp/pull/354
- fix: correct conversions for volumes, areas by @etienneguevel in https://github.com/aphp/edsnlp/pull/349
- chore: bump version to 0.15.0 by @percevalw in https://github.com/aphp/edsnlp/pull/355
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.14.0...v0.15.0
Published by percevalw about 1 year ago
edsnlp - v0.14.0
Changelog
Added
- Support for setuptools-based projects in the `edsnlp.package` command.
- Pipelines can now be instantiated directly from a config file (instead of having to cast a dict containing their arguments) by putting the `@core = "pipeline"` or `"load"` field in the pipeline section.
- `edsnlp.load` now correctly takes `disable`, `enable` and `exclude` parameters into account.
- Pipeline now has a basic repr showing its base language (mostly useful to know its tokenizer) and its pipes.
- New `python -m edsnlp.evaluate` script to evaluate a model on a dataset.
- Sentence detection can now be configured to change the minimum number of newlines to consider a newline-triggered sentence, and disable capitalization checking.
- New `eds.split` pipe to split a document into multiple documents based on a splitting pattern (useful for training).
- Allow the `converter` argument of `edsnlp.data.read/from_...` to be a list of converters instead of a single converter.
- New revamped and documented `edsnlp.train` script and API.
- Support YAML config files (supported only CFG/INI files before).
- Most EDS-NLP functions are now clickable in the documentation.
- `ScheduledOptimizer` now accepts schedules directly in place of parameters, and easy parameter selection:

```python
ScheduledOptimizer(
    optim="adamw",
    module=nlp,
    total_steps=2000,
    groups={
        "^transformer": {
            # lr will go from 0 to 5e-5 then to 0 for params matching "transformer"
            "lr": {
                "@schedules": "linear",
                "warmup_rate": 0.1,
                "start_value": 0,
                "max_value": 5e-5,
            },
        },
        "": {
            # lr will stay at 3e-4 during 200 steps then go to 0 for other params
            "lr": {
                "@schedules": "linear",
                "warmup_rate": 0.1,
                "start_value": 3e-4,
                "max_value": 3e-4,
            },
        },
    },
)
```
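For intuition about the `linear` schedule parameters (`warmup_rate`, `start_value`, `max_value`, plus the `end_value` option added in v0.18.0), here is a standalone sketch of a warmup-then-decay schedule. The exact formula of EDS-NLP's `LinearSchedule` is an assumption here. The sketch ramps linearly from `start_value` to `max_value` over the first `warmup_rate * total_steps` steps, then decays linearly to `end_value`.

```python
def linear_schedule(
    step: int,
    total_steps: int,
    start_value: float,
    max_value: float,
    warmup_rate: float = 0.0,
    end_value: float = 0.0,
) -> float:
    """Linear warmup from start_value to max_value, then linear decay to end_value."""
    warmup_steps = int(warmup_rate * total_steps)
    if step < warmup_steps:
        return start_value + (max_value - start_value) * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return max_value + (end_value - max_value) * progress


# Same settings as the "^transformer" group: 0 -> 5e-5 over 200 steps, then -> 0
for step in (0, 100, 200, 1100, 2000):
    print(step, linear_schedule(step, 2000, start_value=0.0, max_value=5e-5, warmup_rate=0.1))
```

With `start_value == max_value` (the second group), the warmup phase is flat, which is why that group's learning rate stays at 3e-4 for the first 200 steps before decaying.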
Changed
- `eds.span_context_getter`'s parameter `context_sents` is no longer optional and must be explicitly set to 0 to disable sentence context.
- In multi-GPU setups, streams that contain torch components are now stripped of their parameter tensors when sent to CPU workers, since these workers only perform preprocessing and postprocessing and should therefore not need the model parameters.
- The `batch_size` argument of `Pipeline` is deprecated and is not used anymore. Use the `batch_size` argument of `stream.map_pipeline` instead.
Fixed
- Sort files before iterating over a standoff or json folder to ensure reproducibility
- Sentence detection now correctly matches capitalized letters + apostrophe.
- We now ensure that the workers pool is properly closed whatever happens (exception, garbage collection, data ending) in the `multiprocessing` backend. This prevents some executions from hanging indefinitely at the end of the processing.
- Propagate torch sharing strategy to other workers in the `multiprocessing` backend. This is useful when the system is running out of file descriptors and `ulimit -n` is not an option. Torch sharing strategy can also be set via an environment variable `TORCH_SHARING_STRATEGY` (default is `file_descriptor`, consider using `file_system` if you encounter issues).
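Here is a minimal sketch of how such an environment variable can be read, validated, and handed to PyTorch. The helper names are hypothetical. `torch.multiprocessing.set_sharing_strategy` is the real PyTorch call, and `file_descriptor` / `file_system` are the two strategies PyTorch offers on Linux. The torch import is deferred so the snippet runs without PyTorch installed.

```python
import os
from typing import Mapping, Optional

ALLOWED_STRATEGIES = {"file_descriptor", "file_system"}


def resolve_sharing_strategy(env: Optional[Mapping[str, str]] = None) -> str:
    """Read TORCH_SHARING_STRATEGY, defaulting to "file_descriptor"."""
    env = os.environ if env is None else env
    strategy = env.get("TORCH_SHARING_STRATEGY", "file_descriptor")
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"Unknown torch sharing strategy: {strategy!r}")
    return strategy


def apply_sharing_strategy(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the strategy and apply it if PyTorch is available."""
    strategy = resolve_sharing_strategy(env)
    try:
        import torch.multiprocessing as torch_mp
    except ImportError:  # PyTorch not installed: nothing to configure
        return strategy
    torch_mp.set_sharing_strategy(strategy)
    return strategy
```

In practice you would launch your script with, e.g., `TORCH_SHARING_STRATEGY=file_system` set in the environment when file descriptors run out.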
Data API changes
- `LazyCollection` objects are now called `Stream` objects.
- By default, the `multiprocessing` backend now preserves the order of the input data. To disable this and improve performance, use `deterministic=False` in the `set_processing` method.
- :rocket: Parallelized GPU inference throughput improvements!
  - For simple {pre-process → model → post-process} pipelines, GPU inference can be up to 30% faster in non-deterministic mode (results can be out of order) and up to 20% faster in deterministic mode (results are in order).
  - For multitask pipelines, GPU inference can be up to twice as fast (measured in a two-task BERT+NER+Qualif pipeline on T4 and A100 GPUs).
- The `.map_batches`, `.map_pipeline` and `.map_gpu` methods now support a specific `batch_size` and batching function, instead of having a single batch size for all pipes.
- Readers now have a `loop` parameter to cycle over the data indefinitely (useful for training).
- Readers now have a `shuffle` parameter to shuffle the data before iterating over it.
- In `multiprocessing` mode, file-based readers now read the data in the workers (this was an option before).
- We now support two new special batch sizes:
  - "fragment" in the case of parquet datasets: rows of a full parquet file fragment per batch.
  - "dataset", which is mostly useful during training, for instance to shuffle the dataset at each epoch.
  These are also compatible with batched writers such as parquet, where each input fragment can be processed and mapped to a single matching output fragment.
- :boom: Breaking change: a `map` function returning a list or a generator won't be automatically flattened anymore. Use `flatten()` to flatten the output if needed. This shouldn't change the behavior for most users since most writers (`to_pandas`, `to_polars`, `to_parquet`, ...) still flatten the output.
- :boom: Breaking change: the `chunk_size` and `sort_chunks` parameters are now deprecated: to sort data before applying a transformation, use `.map_batches(custom_sort_fn, batch_size=...)`.
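The flattening change is easiest to see with plain Python generators standing in for a stream. `map_stream` and `flatten` below are hypothetical helpers that mimic the described behavior. They are not the `Stream` methods themselves.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, List


def map_stream(items: Iterable, fn: Callable) -> Iterator:
    """New behavior: one output per input, even if fn returns a list."""
    return (fn(item) for item in items)


def flatten(items: Iterable[Iterable]) -> Iterator:
    """Explicitly opt into flattening the outputs."""
    return chain.from_iterable(items)


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in text.split(".") if s.strip()]


docs = ["A. B.", "C."]
print(list(map_stream(docs, split_sentences)))  # -> [['A', 'B'], ['C']]
print(list(flatten(map_stream(docs, split_sentences))))  # -> ['A', 'B', 'C']
```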
Training API changes
- We now provide a training script `python -m edsnlp.train --config config.cfg` that should fit many use cases. Check out the docs!
- In particular, we do not require pytorch's Dataloader for training and can rely solely on the EDS-NLP stream/data API, which is better suited for large streamable datasets and dynamic preprocessing (i.e. a different result each time we apply a noised preprocessing op on a sample).
- Each trainable component can now provide a `stats` field in its `preprocess` output to log info about the sample (number of words, tokens, spans, ...). These stats are used:
  - for batching (e.g., make batches of no more than "25000 tokens")
  - for logging
  - for computing correct loss means when accumulating gradients over multiple mini-mini-batches
  - for computing correct loss means in multi-GPU setups, since these stats are synchronized and accumulated across GPUs
- Support multi-GPU training via Hugging Face `accelerate` and the EDS-NLP `Stream` API's handling of the `env['WORLD_SIZE']` and `env['LOCAL_RANK']` environment variables.
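Here is a greedy pure-Python sketch of stats-based batching such as "batches of no more than 25000 tokens". It assumes each preprocessed sample exposes a `stats` dict with a token count. The key name `"tokens"` and the `batchify_by_stat` helper are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List


def batchify_by_stat(samples: Iterable[Dict], max_value: int, key: str = "tokens") -> Iterator[List[Dict]]:
    """Greedily group samples so that the sum of stats[key] per batch stays
    <= max_value (a single oversized sample gets its own batch)."""
    batch: List[Dict] = []
    total = 0
    for sample in samples:
        size = sample["stats"][key]
        if batch and total + size > max_value:
            yield batch
            batch, total = [], 0
        batch.append(sample)
        total += size
    if batch:
        yield batch


samples = [{"id": i, "stats": {"tokens": n}} for i, n in enumerate([12000, 9000, 8000, 20000, 3000])]
for batch in batchify_by_stat(samples, max_value=25000):
    print([s["id"] for s in batch], sum(s["stats"]["tokens"] for s in batch))
# -> [0, 1] 21000
#    [2] 8000
#    [3, 4] 23000
```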
Pull Requests
- Improve training tutorials by @percevalw in https://github.com/aphp/edsnlp/pull/331
- Various fixes by @percevalw in https://github.com/aphp/edsnlp/pull/332
- Multiprocessing related fixes by @percevalw in https://github.com/aphp/edsnlp/pull/333
- chore: bump version to 0.14.0 by @percevalw in https://github.com/aphp/edsnlp/pull/334
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.13.1...v0.14.0
Published by percevalw over 1 year ago
edsnlp - v0.13.1
Changelog
Added
- `eds.tables` accepts a `minimum_table_size` (default 2) argument to reduce pollution.
- `RuleBasedQualifier` now exposes a `process` method that only returns qualified entities and tokens without actually tagging them, deferring this task to the `__call__` method.
- Added new patterns for metastasis detection developed on CT-Scan reports.
- Added citation of articles
Fixed
- Disorder and Behavior pipes don't use a "PRESENT" or "ABSENT" `status` anymore. Instead, `status=None` by default, and `ent._.negation` is set to True instead of setting `status` to "ABSENT". To this end, the tobacco and alcohol pipes now use the `NegationQualifier` internally.
- Numbers are now only detected without trying to remove the pollution in between digits, i.e. `55 @ 77777` could be detected as a full number before, but not anymore.
- Fix fsspec open file encoding to "utf-8".
Changed
- Rename `eds.measurements` to `eds.quantities`.
- scikit-learn (used in `eds.endlines`) is no longer installed by default when installing `edsnlp[ml]`.
Pull Requests
- Remove pollution exclusion during numbers matching by @percevalw in https://github.com/aphp/edsnlp/pull/316
- Rename eds.measurements by @svittoz in https://github.com/aphp/edsnlp/pull/313
- Adding minimum_table_size argument to eds.tables by @svittoz in https://github.com/aphp/edsnlp/pull/318
- Fs encoding fix by @Aremaki in https://github.com/aphp/edsnlp/pull/320
- chore(deps): bump actions/download-artifact from 2 to 4.1.7 in /.github/workflows in the github_actions group across 1 directory by @dependabot in https://github.com/aphp/edsnlp/pull/319
- fix: skip spacy 3.8.0 due to numpy build dep by @percevalw in https://github.com/aphp/edsnlp/pull/321
- Fix behavior, disorder and qualifier pipes by @Thomzoy in https://github.com/aphp/edsnlp/pull/322
- Metastatic status by @aricohen93 in https://github.com/aphp/edsnlp/pull/308
- chore: bump version to 0.13.1 by @percevalw in https://github.com/aphp/edsnlp/pull/327
- Test 3.12 by @percevalw in https://github.com/aphp/edsnlp/pull/328
New Contributors
- @dependabot made their first contribution in https://github.com/aphp/edsnlp/pull/319
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.13.0...v0.13.1
Published by percevalw over 1 year ago
edsnlp - v0.13.0
Changelog
Added
- `data.set_processing(...)` now exposes an `autocast` parameter to disable or tweak the automatic casting of the tensor during the processing. Autocasting should result in a slight speedup, but may lead to numerical instability.
- Use `torch.inference_mode` to disable view tracking and version counter bumps during inference.
- Added a new NER pipeline for suicide attempt detection.
- Added date cues (regular expression matches that contributed to a date being detected) under the extension `ent._.date_cues`.
- Added table processing in `eds.measurements`.
- Added 'all' as a possible input in the `eds.measurements` measurements config.
- Added new units in `eds.measurements`.
Changed
- Default to mixed precision inference
Fixed
- `edsnlp.load("your/huggingface-model", install_dependencies=True)` now correctly resolves the python pip (especially on Colab) to auto-install the model dependencies.
- We now better handle empty documents in the `eds.transformer`, `eds.text_cnn` and `eds.ner_crf` components.
- Support mixed precision in `eds.text_cnn` and `eds.ner_crf` components.
- Support pre-quantization (<4.30) transformers versions.
- Verify that all batches are non-empty.
- Fix `span_context_getter` for `context_words` = 0, `context_sents` > 2 and support asymmetric contexts.
- Don't split sentences on rare unicode symbols.
- Better detect abbreviations, like `E.coli`, now split as [`E.`, `coli`] and not [`E`, `.`, `coli`].
What's Changed
- Various ml fixes by @percevalw in https://github.com/aphp/edsnlp/pull/303
- TS by @aricohen93 in https://github.com/aphp/edsnlp/pull/269
- date cues by @cvinot in https://github.com/aphp/edsnlp/pull/265
- Fix fast inference by @percevalw in https://github.com/aphp/edsnlp/pull/305
- Fix typo in diabetes patterns by @isabelbt in https://github.com/aphp/edsnlp/pull/306
- Fix span context getter by @aricohen93 in https://github.com/aphp/edsnlp/pull/307
- Fix sentences by @percevalw in https://github.com/aphp/edsnlp/pull/310
- chore: bump version to 0.13.0 by @percevalw in https://github.com/aphp/edsnlp/pull/312
New Contributors
- @cvinot made their first contribution in https://github.com/aphp/edsnlp/pull/265
- @isabelbt made their first contribution in https://github.com/aphp/edsnlp/pull/306
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.12.3...v0.13.0
Published by percevalw over 1 year ago
edsnlp - v0.12.2
Changelog
Changed
Packages:
- Pip-installable models are now built with `hatch` instead of poetry, which allows us to expose `artifacts` (weights) at the root of the sdist package (uploadable to HF) and move them inside the package upon installation to avoid conflicts.
- Dependencies are no longer inferred with dill-magic (this didn't work well before anyway).
- Option to perform substitutions in the model's README.md file (e.g., for the model's name, metrics, ...)
- Huggingface models are now installed with pip editable installations, which is faster since it doesn't copy around the weights
What's Changed
- Better packages by @percevalw in https://github.com/aphp/edsnlp/pull/302
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.12.1...v0.12.2
Published by percevalw over 1 year ago
edsnlp - v0.12.1
Changelog
Added
- Added binary distribution for linux aarch64 (Streamlit's environment)
- Added a new separator option (`sep_pattern`) in `eds.tables` and a new input check
Fixed
- Make catalogue & entrypoints compatible with py37-py312
- Check that a data item has a doc before trying to use the document's `note_datetime`
Pull Requests
- Fix catalogue entrypoints by @percevalw in https://github.com/aphp/edsnlp/pull/297
- Adding sep_pattern in eds.tables docstring by @svittoz in https://github.com/aphp/edsnlp/pull/286
- chore: bump version to 0.12.1 by @percevalw in https://github.com/aphp/edsnlp/pull/300
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.12.0...v0.12.1
Published by percevalw over 1 year ago
edsnlp - v0.12.0
Changelog
Added
- The eds.transformer component now accepts prompts (passed to its preprocess method, see breaking change below) to add before each window of text to embed.
- LazyCollection.map / map_batches now support generator functions as arguments.
- Window stride can now be disabled (i.e., stride = window) during training in the eds.transformer component by training_stride = False
- Added a new eds.ner_overlap_scorer to evaluate matches between two lists of entities, counting true when the dice overlap is above a given threshold
- edsnlp.load now accepts EDS-NLP models from the huggingface hub 🤗 !
- New python -m edsnlp.package command to package a model for the huggingface hub or pypi-like registries
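The "dice overlap above a threshold" criterion can be illustrated with a small standard-library sketch. It is not the eds.ner_overlap_scorer code. Entities are assumed to be (start, end, label) character offsets. The greedy one-to-one matching and the dice / overlap_prf helper names are hypothetical.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # (start, end, label), end offset exclusive


def dice(a: Entity, b: Entity) -> float:
    """Dice coefficient of two intervals: 2 * |A & B| / (|A| + |B|)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    total = (a[1] - a[0]) + (b[1] - b[0])
    return 2 * inter / total if total else 0.0


def overlap_prf(gold: List[Entity], pred: List[Entity], threshold: float = 0.5):
    """Precision, recall and F1. A prediction is a true positive if it has the same
    label as a still unmatched gold entity and a Dice overlap strictly above threshold."""
    matched = set()
    tp = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i not in matched and g[2] == p[2] and dice(g, p) > threshold:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [(0, 10, "drug"), (20, 30, "dose")]
pred = [(2, 10, "drug"), (40, 45, "dose")]
print(overlap_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

An exact-match scorer would reject the (2, 10) prediction. The threshold trades that strictness for tolerance to small boundary disagreements.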
Changed
- Trainable embedding components now all use foldedtensor to return embeddings, instead of returning a tensor of floats and a mask tensor.
- :boom: TorchComponent __call__ no longer applies the end to end method, and instead calls the forward method directly, like all torch modules.
- The trainable eds.span_qualifier component has been renamed to eds.span_classifier to reflect its general purpose (it doesn't only predict qualifiers, but any attribute of a span using its context or not).
- omop converter now takes the note_datetime field into account by default when building a document
- span._.date.to_datetime() and span._.date.to_duration() now automatically take the note_datetime into account
- nlp.vocab is no longer serialized when saving a model, as it may contain sensitive information and can be recomputed during inference anyway
- :boom: Major breaking change in trainable components, moving towards a more "task-centric" design:
  - the eds.transformer component is no longer responsible for deciding which spans of text ("contexts") should be embedded. These contexts are now passed via the preprocess method, which now accepts more arguments than just the docs to process.
  - similarly, the eds.span_pooler is no longer responsible for deciding which spans to pool, and instead pools all spans passed to it in the preprocess method.
Consequently, the eds.transformer and eds.span_pooler no longer accept their span_getter argument, and the eds.ner_crf, eds.span_classifier, eds.span_linker and eds.span_qualifier components now accept a context_getter argument instead, as well as a span_getter argument for the latter two. This refactoring can be summarized as follows:
```diff
- eds.transformer.span_getter
+ eds.ner_crf.context_getter
+ eds.span_classifier.context_getter
+ eds.span_linker.context_getter

- eds.span_pooler.span_getter
+ eds.span_qualifier.span_getter
+ eds.span_linker.span_getter
```
As an example, for the eds.span_linker component:
```diff
 nlp.add_pipe(
     eds.span_linker(
         metric="cosine",
         probability_mode="sigmoid",
+        span_getter="ents",
+        # context_getter="ents",  -> by default, same as span_getter
         embedding=eds.span_pooler(
             hidden_size=128,
-            span_getter="ents",
             embedding=eds.transformer(
-                span_getter="ents",
                 model="prajjwal1/bert-tiny",
                 window=128,
                 stride=96,
             ),
         ),
     ),
     name="linker",
 )
```
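The window and stride arguments in the example above control how eds.transformer cuts long inputs into overlapping pieces. The sketch below models only that idea, on token offsets. It assumes each window holds at most window tokens and that the next window starts stride tokens later. sliding_windows is a hypothetical helper; the real component also handles wordpieces, special tokens and prompts. Passing a stride equal to the window gives the non-overlapping behaviour that training_stride = False enables during training.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """(start, end) token offsets of windows of at most `window` tokens,
    each starting `stride` tokens after the previous one, covering all tokens."""
    if not 0 < stride <= window:
        raise ValueError("stride must satisfy 0 < stride <= window")
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            return spans
        start += stride


print(sliding_windows(300, window=128, stride=96))   # [(0, 128), (96, 224), (192, 300)]
print(sliding_windows(300, window=128, stride=128))  # [(0, 128), (128, 256), (256, 300)]
```

With window=128 and stride=96, consecutive windows share 128 - 96 = 32 tokens of context.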
Fixed
- edsnlp.data.read_json now correctly reads the files from the directory passed as an argument, and not from the parent directory.
- Overwrite spaCy's Doc, Span and Token pickling utils to allow recursively storing Doc, Span and Token objects in the extension values (in particular, span._.date.doc)
- Removed pendulum dependency, solving various pickling, multiprocessing and missing attributes errors
Pull Requests
- Drop codecov by @percevalw in https://github.com/aphp/edsnlp/pull/292
- Fix dates by @percevalw in https://github.com/aphp/edsnlp/pull/288
- Loading models from the hf hub by @percevalw in https://github.com/aphp/edsnlp/pull/293
- Fix: only reinstall hf model when cache files are changed by @percevalw in https://github.com/aphp/edsnlp/pull/295
- feat: expose package script to cli by @percevalw in https://github.com/aphp/edsnlp/pull/294
- chore: bump version to 0.12.0 by @percevalw in https://github.com/aphp/edsnlp/pull/296
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.11.2...v0.12.0
Published by percevalw almost 2 years ago
edsnlp - v0.11.2
Changelog
Fixed
- Fix edsnlp.utils.file_system.normalize_fs_path file system detection not working correctly
- Improved performance of edsnlp.data methods over a filesystem (fs parameter)
Pull Requests
- Fix normalize fs path by @svittoz in https://github.com/aphp/edsnlp/pull/283
- Faster fs io by @percevalw in https://github.com/aphp/edsnlp/pull/285
New Contributors
- @svittoz made their first contribution in https://github.com/aphp/edsnlp/pull/283
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.11.1...v0.11.2
Published by percevalw almost 2 years ago
edsnlp - v0.11.1
Changelog
Added
- Automatic estimation of cpu count when using multiprocessing
- optim.initialize() method to create the optimizer state before the first backward pass
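One common way to estimate a usable CPU count in Python is to prefer the CPUs the process is actually allowed to run on, where the platform exposes them, and to fall back to the machine total otherwise. This is a minimal sketch of that approach, not EDS-NLP's heuristic. The estimate_cpu_count name and the choice to keep one CPU free for the main process are assumptions.

```python
import os


def estimate_cpu_count(reserve: int = 1) -> int:
    """Estimate how many worker processes to start.

    Uses the CPUs this process may run on when available (Linux exposes
    os.sched_getaffinity), otherwise the machine total, then keeps `reserve`
    CPUs free for the main process. Always returns at least 1.
    """
    try:
        available = len(os.sched_getaffinity(0))  # respects taskset / affinity limits
    except AttributeError:  # e.g. macOS, Windows
        available = os.cpu_count() or 1
    return max(1, available - reserve)


print(estimate_cpu_count())
```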
Changed
- nlp.post_init will no longer tee lazy collections (use edsnlp.utils.collections.multi_tee yourself if needed)
Fixed
- Corrected inconsistencies in eds.span_linker
Pull Requests
- Fix span linking by @percevalw in https://github.com/aphp/edsnlp/pull/282
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.11.0...v0.11.1
Published by percevalw almost 2 years ago
edsnlp - v0.11.0
Changelog
Added
- Support for a filesystem parameter in all edsnlp.data.read_* and edsnlp.data.write_* functions
- Pipes of a pipeline are now easily accessible with nlp.pipes.xxx instead of nlp.get_pipe("xxx")
- Support builtin Span attributes in the converters' span_attributes parameter, e.g.
```python
import edsnlp

nlp = ...
nlp.add_pipe("eds.sentences")

data = edsnlp.data.from_xxx(...)
data = data.map_pipeline(nlp)
data.to_pandas(converters={"ents": {"span_attributes": ["sent.text", "start", "end"]}})
```
- Support assigning Brat AnnotatorNotes as span attributes: `edsnlp.data.read_standoff(..., notes_as_span_attribute="cui")`
- Support for mapping full batches in `edsnlp.processing` pipelines with the `map_batches` lazy collection method:
```python
import edsnlp

data = edsnlp.data.from_xxx(...)
data = data.map_batches(lambda batch: do_something(batch))
data.to_pandas()
```
- New `data.map_gpu` method to map a deep learning operation on some data and take advantage of edsnlp multi-gpu inference capabilities
- Added average precision computation in the edsnlp span classification scorer
- You can now add pipes to your pipeline by instantiating them directly, which comes with many advantages, such as auto-completion, introspection and type checking!
```python
import edsnlp, edsnlp.pipes as eds

nlp = edsnlp.blank("eds")
nlp.add_pipe(eds.sentences())  # instead of nlp.add_pipe("eds.sentences")
```
The previous way of adding pipes is still supported.
- New eds.span_linker deep-learning component to match entities with their concepts in a knowledge base, in synonym-similarity or concept-similarity mode.
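The lazy map / map_batches behaviour described in this release can be illustrated with a toy class. This is a hypothetical sketch: ToyLazyCollection and its batch_size argument are invented for illustration and are not the edsnlp LazyCollection API. Operations are only recorded when they are chained and only run on iteration. The sketch also flattens generator functions into several outputs, in the spirit of the generator support added in v0.12.0.

```python
from itertools import islice
from types import GeneratorType
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


class ToyLazyCollection:
    """Hypothetical minimal lazy collection: map/map_batches record
    operations, which only run when the collection is iterated."""

    def __init__(self, source: Iterable,
                 ops: Optional[List[Tuple[str, Callable]]] = None, batch_size: int = 2):
        self.source = source
        self.ops = list(ops or [])
        self.batch_size = batch_size

    def map(self, fn: Callable) -> "ToyLazyCollection":
        return ToyLazyCollection(self.source, self.ops + [("map", fn)], self.batch_size)

    def map_batches(self, fn: Callable) -> "ToyLazyCollection":
        return ToyLazyCollection(self.source, self.ops + [("batch", fn)], self.batch_size)

    def _apply(self, kind: str, fn: Callable, stream: Iterator) -> Iterator:
        if kind == "map":
            for item in stream:
                out = fn(item)
                if isinstance(out, GeneratorType):  # generator function: 0..n outputs per item
                    yield from out
                else:
                    yield out
        else:
            while True:
                batch = list(islice(stream, self.batch_size))
                if not batch:
                    return
                yield from fn(batch)  # fn receives a list and returns (or yields) results

    def __iter__(self) -> Iterator:
        stream: Iterator = iter(self.source)
        for kind, fn in self.ops:
            stream = self._apply(kind, fn, stream)
        return stream


def split_words(text: str):
    yield from text.split()  # generator function: one text, several words


data = ToyLazyCollection(["a b", "c"]).map(split_words)
data = data.map_batches(lambda batch: [w.upper() for w in batch])
print(list(data))  # ['A', 'B', 'C']
```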
Changed
- nlp.preprocess_many now uses lazy collections to enable parallel processing
- :warning: Breaking change. Improved and simplified eds.span_qualifier: we didn't support combination groups before, so this feature was scrapped for now. We now also support splitting values of a single qualifier between different span labels.
- Optimized edsnlp.data batching, especially for large batch sizes (removed a quadratic loop)
- :warning: Breaking change. By default, the name of components added to a pipeline is now the default name defined in their class __init__ signature. For most components of EDS-NLP, this will change the name from "eds.xxx" to "xxx".
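The standard inspect module can read a default name from a constructor signature. This sketch assumes a component exposes a name parameter with a default value, as the entry above describes. The Sentences class and the default_component_name function are hypothetical stand-ins, not EDS-NLP code.

```python
import inspect


class Sentences:
    """Hypothetical component: its default name is declared in __init__."""

    def __init__(self, nlp=None, name: str = "sentences"):
        self.nlp = nlp
        self.name = name


def default_component_name(cls) -> str:
    """Return the default value of the `name` parameter of `cls.__init__`."""
    param = inspect.signature(cls.__init__).parameters.get("name")
    if param is None or param.default is inspect.Parameter.empty:
        raise ValueError(f"{cls.__name__} declares no default name")
    return param.default


print(default_component_name(Sentences))  # sentences
```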
Fixed
- Flatten list outputs (such as "ents" converter) when iterating: nlp.map(data).to_iterable("ents") is now a list of entities, and not a list of lists of entities
- Allow span pooler to choose between multiple base embedding spans (as likely produced by eds.transformer) by sorting them by Dice overlap score.
- EDS-NLP no longer raises an error when saving a model to an already existing but empty directory
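The difference between the old and new iteration behaviour amounts to flattening per-document lists. This is a minimal sketch, assuming each document produces either a list (such as one dict per entity) or a single value. iter_flat and the example records are hypothetical.

```python
from typing import Any, Iterable, Iterator


def iter_flat(outputs: Iterable[Any]) -> Iterator[Any]:
    """Yield converter outputs one by one: a list returned for a document
    (e.g. one dict per entity) is flattened, any other value is yielded as-is."""
    for out in outputs:
        if isinstance(out, list):
            yield from out
        else:
            yield out


per_doc = [
    [{"note_id": 1, "label": "date"}, {"note_id": 1, "label": "drug"}],
    [],  # a document without entities contributes nothing
    [{"note_id": 3, "label": "date"}],
]
print([e["note_id"] for e in iter_flat(per_doc)])  # [1, 1, 3]
```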
Pull Requests
- Support for a filesystem param in all edsnlp.data readers/writers by @percevalw in https://github.com/aphp/edsnlp/pull/274
- Data fixes by @percevalw in https://github.com/aphp/edsnlp/pull/275
- Refacto span classification by @percevalw in https://github.com/aphp/edsnlp/pull/276
- Entity linking by @percevalw in https://github.com/aphp/edsnlp/pull/280
- chore: bump version to 0.11.0 by @percevalw in https://github.com/aphp/edsnlp/pull/281
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.7...v0.11.0
Published by percevalw almost 2 years ago
edsnlp - v0.10.7
Changelog
Added
- Support empty converter (now the default) in edsnlp.data writers (do not convert by default)
- Add support for polars data import / export
- Allow kwargs in eds.transformer to pass to the transformer model
Changed
- Saving pipelines no longer saves the disabled status of the pipes (i.e., all pipes are considered "enabled" when saved). This feature was not used and caused issues when saving a model wrapped in a nlp.select_pipes context.
Fixed
- Allow missing meta.json, tokenizer and vocab paths when loading saved models
- Save torch buffers when dumping machine learning models to disk (previous versions only saved the model parameters)
- Fix automatic batch_size estimation in eds.transformer when max_tokens_per_device is set to auto and multiple GPUs are used
- Fix JSONL file parsing
Pull Requests
- Polars by @percevalw in https://github.com/aphp/edsnlp/pull/270
- Various fixes by @percevalw in https://github.com/aphp/edsnlp/pull/271
- chore: bump version to 0.10.7 by @percevalw in https://github.com/aphp/edsnlp/pull/272
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.6...v0.10.7
Published by percevalw almost 2 years ago
edsnlp - v0.10.6
What's Changed
Added
- Added batch_by, split_into_batches_after, sort_chunks, chunk_size, disable_implicit_parallelism parameters to processing (simple and multiprocessing) backends to improve performance and memory usage. Sorting chunks can make processing up to twice as fast in some cases.
- The deep learning cache mechanism now supports multitask models with weight sharing in multiprocessing mode.
- Added max_tokens_per_device="auto" parameter to eds.transformer to estimate memory usage and automatically split the input into chunks that fit into the GPU.
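Counting padding shows why sorting chunks helps. In this hypothetical sketch, documents are strings whose length stands in for their token count. The batches function only mimics the chunk_size / sort_chunks idea; it is not the signature or logic of the EDS-NLP backends.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batches(docs: Iterable[str], chunk_size: int, batch_size: int,
            sort_chunks: bool = False) -> Iterator[List[str]]:
    """Read `chunk_size` docs at a time, optionally sort each chunk by length,
    then cut it into batches of `batch_size` docs."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        if sort_chunks:
            chunk.sort(key=len)  # docs of similar length end up in the same batch
        for i in range(0, len(chunk), batch_size):
            yield chunk[i:i + batch_size]


def padded_cells(batch_list: List[List[str]]) -> int:
    """Cells processed once every doc is padded to the longest doc of its batch."""
    return sum(len(b) * max(len(d) for d in b) for b in batch_list)


docs = ["x" * 10, "x" * 1, "x" * 9, "x" * 2]
print(padded_cells(list(batches(docs, chunk_size=4, batch_size=2))))                    # 38
print(padded_cells(list(batches(docs, chunk_size=4, batch_size=2, sort_chunks=True))))  # 24
```

The sketch emits documents in sorted order. A real backend has to restore the original order of the outputs, which this toy version does not attempt.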
Changed
- Improved speed and memory usage of the eds.text_cnn pipe by running the CNN on a non-padded version of its input: expect a speedup of up to 1.3x in real-world use cases.
- Deprecate the converters' (especially for BRAT/Standoff data) bool_attributes parameter in favor of the more general default_attributes. This new mapping describes how to set attributes on spans for which no attribute value was found in the input format. This is especially useful for negation, or frequent attribute values (e.g. "negated" is often False, "temporal" is often "present"), that annotators may not want to annotate every time.
- Default eds.ner_crf window is now set to 40 and stride set to 20, as it doesn't affect throughput (compared to before, window set to 20) and improves accuracy.
- New default overlap_policy='merge' option and parameter renaming in eds.span_context_getter (which replaces eds.span_sentence_getter)
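The default_attributes idea can be pictured with plain dictionaries. This sketch assumes a flat attribute-to-default mapping and represents spans as dicts. The exact structure the converters expect may differ, and apply_default_attributes is a hypothetical helper, not the converter code.

```python
from typing import Any, Dict, List


def apply_default_attributes(spans: List[Dict[str, Any]],
                             default_attributes: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return copies of `spans` where every attribute missing from the
    annotations is set to its default; annotated values always win."""
    return [
        {**span, "attributes": {**default_attributes, **span.get("attributes", {})}}
        for span in spans
    ]


spans = [
    {"label": "disease", "attributes": {"negated": True}},  # annotator marked the negation
    {"label": "disease"},  # nothing annotated
]
filled = apply_default_attributes(spans, {"negated": False, "temporal": "present"})
print(filled[0]["attributes"])  # {'negated': True, 'temporal': 'present'}
print(filled[1]["attributes"])  # {'negated': False, 'temporal': 'present'}
```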
Fixed
- Improved error handling in multiprocessing backend (e.g., no more deadlock)
- Various improvements to the data processing related documentation pages
- Begin of sentence / end of sentence transitions of the eds.ner_crf component are now disabled when windows are used (i.e., when window is neither 1, equivalent to softmax, nor 0, equivalent to default full sequence Viterbi decoding)
- eds tokenizer now inherits from spacy.Tokenizer to avoid typing errors
- Only match 'ne' negation pattern when not part of another word to avoid false positive cases like u[ne] cure de 10 jours
- Disabled pipes are now correctly ignored in the Pipeline.preprocess method
- Add "eventuel*" patterns to eds.hypothesis
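Python's re module can check the effect of requiring "ne" to be a standalone word. The two patterns below are illustrative assumptions, not the actual eds.negation patterns.

```python
import re

# Naive pattern: "ne" anywhere, including inside "une", "neuf" or "semaine"
NAIVE_NE = re.compile(r"ne")
# Bounded pattern: "ne" must not touch another word character on either side
BOUNDED_NE = re.compile(r"(?<!\w)ne(?!\w)")

text = "une cure de 10 jours"
print(bool(NAIVE_NE.search(text)))    # True, would wrongly trigger a negation cue
print(bool(BOUNDED_NE.search(text)))  # False
print(BOUNDED_NE.search("il ne présente pas de fièvre").start())  # 3
```

In Python 3, \w is Unicode-aware, so accented letters such as "é" also count as word characters for the lookarounds.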
Pull Requests
- Multi head ml by @percevalw in https://github.com/aphp/edsnlp/pull/257
- Default span attributes on data loading by @percevalw in https://github.com/aphp/edsnlp/pull/258
- Disable NER CRF BOS/EOS transitions when CRF windows are enabled by @percevalw in https://github.com/aphp/edsnlp/pull/259
- Fix "eds" tokenizer base by @percevalw in https://github.com/aphp/edsnlp/pull/260
- fix: only match 'ne' negation pattern when not part of another word by @percevalw in https://github.com/aphp/edsnlp/pull/261
- Update patterns for hypothesis détection by @LaRiffle in https://github.com/aphp/edsnlp/pull/266
- Add overlappolicy='merge' option to makesentencespangetter by @percevalw in https://github.com/aphp/edsnlp/pull/262
- Fix select pipes by @percevalw in https://github.com/aphp/edsnlp/pull/267
- chore: bump version to 0.10.6 by @percevalw in https://github.com/aphp/edsnlp/pull/268
New Contributors
- @LaRiffle made their first contribution in https://github.com/aphp/edsnlp/pull/266
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.5...v0.10.6
Published by percevalw almost 2 years ago
edsnlp - v0.10.5
Changelog
Fixed
- Allow non-url paths when parquet filesystem is given
Pull Requests
- Allow non-url paths when parquet filesystem is given by @percevalw in https://github.com/aphp/edsnlp/pull/254
- Bump version to 0.10.5 by @percevalw in https://github.com/aphp/edsnlp/pull/255
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.4...v0.10.5
Published by percevalw about 2 years ago
edsnlp - v0.10.4
Changelog
Changed
- Assigning doc._.note_datetime will now automatically cast the value to a pendulum.DateTime object
Added
- Support loading model from package name (e.g., edsnlp.load("eds_pseudo_aphp"))
- Support filesystem parameter in edsnlp.data.read_parquet and edsnlp.data.write_parquet
Fixed
- Support doc -> list converters with parquet files writer
- Fixed some OOM errors when writing many outputs to parquet files
- Both edsnlp & spacy factories are now listed when a factory lookup fails
- Fixed some GPU OOM errors with the eds.transformer pipe when processing really long documents
Pull Requests
- ML inference fixes & features by @percevalw in https://github.com/aphp/edsnlp/pull/251
- Bump version to 0.10.4 by @percevalw in https://github.com/aphp/edsnlp/pull/252
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.3...v0.10.4
Published by percevalw about 2 years ago
edsnlp - v0.10.3
Changelog
Changed
- By default, edsnlp.data.write_json will infer whether the data should be written as a single JSONL file or as a directory of JSON files, based on whether the path argument is a file.
Fixed
- Measurements now correctly match "0.X", "0.XX", ... numbers
- Typo in "celsius" measurement unit
- Spaces and digits are now supported in BRAT entity labels
- Fixed missing 'permet pas + verb' false positive negation patterns
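A number pattern that requires the integer part to start with a non-zero digit is one plausible way for "0.X" values to be missed. The two regexes below are hypothetical "before" and "after" patterns that only illustrate that failure mode. They are not taken from eds.measurements.

```python
import re

# Hypothetical "before": the integer part must start with 1-9, so "0.5" never matches
OLD_NUMBER = re.compile(r"(?<![\d.,])[1-9]\d*(?:[.,]\d+)?(?![\d.,])")
# Hypothetical "after": an integer part made of a single 0 is accepted as well
NEW_NUMBER = re.compile(r"(?<![\d.,])(?:0|[1-9]\d*)(?:[.,]\d+)?(?![\d.,])")

text = "créatinine 0.5 mg, puis 12,25 mg"
print(OLD_NUMBER.findall(text))  # ['12,25']
print(NEW_NUMBER.findall(text))  # ['0.5', '12,25']
```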
Pull Requests
- fix: support missing torch with spark and multiprocessing backends by @percevalw in https://github.com/aphp/edsnlp/pull/244
- Fix typo in "celcius" by @julienduquesne in https://github.com/aphp/edsnlp/pull/247
- Add pattern to measurements to catch "0.XX" numbers by @julienduquesne in https://github.com/aphp/edsnlp/pull/246
- Fix for (#242): connectors not working brat json training issues by @percevalw in https://github.com/aphp/edsnlp/pull/248
- Handle 'ne permet pas + verb' false positive negation patterns by @percevalw in https://github.com/aphp/edsnlp/pull/249
- Bump version to 0.10.3 by @percevalw in https://github.com/aphp/edsnlp/pull/250
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.2...v0.10.3
Published by percevalw about 2 years ago
edsnlp - v0.10.2
Changelog
Changed
- eds.span_qualifier qualifiers argument now automatically adds the underscore prefix to qualifiers if not present
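Normalizing the qualifier names amounts to a one-line prefix check. This sketch assumes the prefix is "_." (the spaCy extension namespace, as in span._.negation). normalize_qualifiers is a hypothetical helper, not EDS-NLP code.

```python
from typing import Iterable, List


def normalize_qualifiers(qualifiers: Iterable[str]) -> List[str]:
    """Add the assumed "_." extension prefix to qualifier names that lack it."""
    return [q if q.startswith("_.") else f"_.{q}" for q in qualifiers]


print(normalize_qualifiers(["negation", "_.family"]))  # ['_.negation', '_.family']
```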
Fixed
- Fix imports of components declared in spacy_factories entry points
- Support pendulum v3
- AsList errors are now correctly reported
- eds.span_qualifier saved configuration during to_disk is no longer null
Pull Requests
- fix: use spacy entry points for missing factories by @percevalw in https://github.com/aphp/edsnlp/pull/238
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.1...v0.10.2
Published by percevalw about 2 years ago
edsnlp - v0.10.1
Changelog
Changed
- Small regex matching performance improvement, up to 1.25x faster (e.g. eds.measurements)
Fixed
- Microgram scale is now correctly 1/1000g and inverse meter now 1/100 inverse cm. "cac" and "goutte" units have been fixed as well.
- We now isolate some edsnlp components (trainable pipes that require ML dependencies) in new edsnlp_factories entry points to prevent spaCy from auto-importing them.
- TNM scores followed by a space are now correctly detected
- Removed various short TNM false positives (e.g., "PT" or "a T")
- The Span value extension is no longer forcibly overwritten, and user-assigned values are returned by Span._.value in priority, before the aggregated span._.get(span.label_) getter result (#220)
- Enable mmap during multiprocessing model transfers
- RegexMatcher now supports all alignment modes (strict, expand, contract) and better handles partial doc matching (#201).
- on_ents_only=False/True is now supported again in qualifier pipes (e.g., "eds.negation", "eds.hypothesis", ...)
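The three alignment modes follow the semantics of spaCy's Doc.char_span(..., alignment_mode=...), because a character-level regex match rarely falls exactly on token boundaries. The function below is a stdlib sketch of those semantics on (start_char, end_char) token offsets, not the RegexMatcher implementation. char_span_to_tokens is hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[int, int]  # (start_char, end_char), end exclusive


def char_span_to_tokens(tokens: List[Token], start: int, end: int,
                        mode: str = "strict") -> Optional[Tuple[int, int]]:
    """Map a character span to a (first_token, last_token_exclusive) span.

    strict:   both boundaries must fall on token boundaries, else None
    expand:   include every token the character span touches
    contract: keep only tokens fully inside the character span
    """
    if mode == "strict":
        starts = {s: i for i, (s, _) in enumerate(tokens)}
        ends = {e: i for i, (_, e) in enumerate(tokens)}
        if start in starts and end in ends:
            return starts[start], ends[end] + 1
        return None
    if mode == "expand":
        inside = [i for i, (s, e) in enumerate(tokens) if s < end and e > start]
    elif mode == "contract":
        inside = [i for i, (s, e) in enumerate(tokens) if s >= start and e <= end]
    else:
        raise ValueError(f"unknown alignment mode: {mode}")
    return (inside[0], inside[-1] + 1) if inside else None


tokens = [(0, 6), (7, 9), (10, 12)]  # "taille 12 cm"
print(char_span_to_tokens(tokens, 3, 11, "strict"))    # None
print(char_span_to_tokens(tokens, 3, 11, "expand"))    # (0, 3)
print(char_span_to_tokens(tokens, 3, 11, "contract"))  # (1, 2)
```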
Pull Requests
- fix scales by @ycattan in https://github.com/aphp/edsnlp/pull/231
- Isolate edsnlp entry points to prevent auto-import by spacy by @percevalw in https://github.com/aphp/edsnlp/pull/235
- fix volume units "goutte" and "cac" by @ycattan in https://github.com/aphp/edsnlp/pull/233
- Detect tnm entities followed by a space by @percevalw in https://github.com/aphp/edsnlp/pull/229
- Enable mmap during multiprocessing model transfers by @percevalw in https://github.com/aphp/edsnlp/pull/236
- Compatible Span._.value extension by @percevalw in https://github.com/aphp/edsnlp/pull/228
- Support all alignment modes in regex matching & partial doc matching by @percevalw in https://github.com/aphp/edsnlp/pull/230
- Fix span_getter argument for qualifiers by @Thomzoy in https://github.com/aphp/edsnlp/pull/223
- Bump version to 0.10.1 by @percevalw in https://github.com/aphp/edsnlp/pull/237
New Contributors
- @ycattan made their first contribution in https://github.com/aphp/edsnlp/pull/231
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.10.0...v0.10.1
Published by percevalw about 2 years ago
edsnlp - v0.10.0
Changelog
Added
- New unified edsnlp.data API (json, brat, spark, pandas) and LazyCollection object to efficiently read / write data from / to different formats & sources.
- New unified processing API to select the execution backends via data.set_processing(...)
- The training scripts can now use data from multiple concatenated adapters
- Support quantized transformers (compatible with multiprocessing as well!)
Changed
- edsnlp.pipelines has been renamed to edsnlp.pipes, but the old name is still available for backward compatibility
- Pipes (in edsnlp/pipes) are now lazily loaded, which should improve the loading time of the library.
- to_disk methods can now return a config to override the initial config of the pipeline (e.g., to load a transformer directly from the path storing its fine-tuned weights)
- The eds.tokenizer tokenizer has been added to entry points, making it accessible from the outside
- Deprecate old connectors (e.g. BratDataConnector) in favor of the new edsnlp.data API
- Deprecate old pipe wrapper in favor of the new processing API
Fixed
- Support for pydantic v2
- Support for python 3.11 (not ci-tested yet)
Pull Requests
- Fix matcher assigns by @percevalw in https://github.com/aphp/edsnlp/pull/222
- Refactor to use Pytorch for training models by @percevalw in https://github.com/aphp/edsnlp/pull/202
- Relieve dependency constraints by @percevalw in https://github.com/aphp/edsnlp/pull/227
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.9.1...v0.10.0
Published by percevalw about 2 years ago
edsnlp - v0.10.0beta1
Changelog
Large refactoring of EDS-NLP to allow training models and performing inference using PyTorch as the deep-learning backend. Rather than a mere wrapper of PyTorch using spaCy, this is a new framework to build hybrid multi-task models.
To achieve this, instead of patching spaCy's pipeline, a new pipeline was implemented in a similar fashion to aphp/edspdf#12. The new pipeline tries to preserve the existing API, especially for non-machine learning uses such as rule-based components. This means that users can continue to use the library in the same way as before, while also having the option to train models using PyTorch. We still use spaCy data structures such as Doc and Span to represent the texts and their annotations.
Otherwise, changes should be transparent for users who still want to use spaCy pipelines with nlp = spacy.blank('eds'). To benefit from the new features, users should use nlp = edsnlp.blank('eds') instead.
Added
- New pipeline system available via edsnlp.blank('eds') (instead of spacy.blank('eds'))
- Use the confit package to instantiate components
- Training script with PyTorch only (tests/training/) and tutorial
- New trainable embeddings: eds.transformer, eds.text_cnn, eds.span_pooler embedding contextualizer pipes
- Re-implemented the trainable NER component and trainable Span qualifier with the new system under eds.ner_crf and eds.span_classifier
- New efficient implementation for eds.transformer (to be used in place of spacy-transformers)
Changed
- Pipe registering: Language.factory -> edsnlp.registry.factory.register via confit
- Lazy loading components from their entry point (had to patch spacy.Language.__init__) to avoid having to wrap every import torch statement for pure rule-based use cases. Hence, torch is not a required dependency
Pull Requests
This pre-release is tracked in #202.
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.9.1...v0.10.0beta1
Published by percevalw over 2 years ago
edsnlp - v0.9.1
Changelog
Changed
- Improve negation patterns
- Absent disorders now set the negation to True when matched as ABSENT
- Default qualifier is now None instead of False (empty string)
Fixed
- span_getter is no longer incompatible with on_ents_only
- ContextualMatcher now supports empty matches (e.g. lookahead/lookbehind) in assign patterns
Pull Requests
- Fix negations by @percevalw in https://github.com/aphp/edsnlp/pull/216
- Chore: bump version to 0.9.1 by @percevalw in https://github.com/aphp/edsnlp/pull/218
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.9.0...v0.9.1
Published by percevalw over 2 years ago
edsnlp - v0.9.0
Changelog
Added
- New to_duration method to convert an absolute date into a date relative to the note_datetime (or None)
Changes
- Input and output of components are now specified by span_getter and span_setter arguments.
- :boom: Score / disorders / behaviors entities now have a fixed label (passed as an argument), instead of being dynamically set from the component name. The following scores may have a different name than the current one in your pipelines:
  - eds.emergency.gemsa → emergency_gemsa
  - eds.emergency.ccmu → emergency_ccmu
  - eds.emergency.priority → emergency_priority
  - eds.charlson → charlson
  - eds.elston_ellis → elston_ellis
  - eds.SOFA → sofa
  - eds.adicap → adicap
  - eds.measurements → size, weight, ... instead of eds.size, eds.weight, ...
- eds.dates now separates dates from durations. Each entity has its own label:
  - spans["dates"] → entities labelled as date with a span._.date parsed object
  - spans["durations"] → entities labelled as duration with a span._.duration parsed object
- the "relative" / "absolute" / "duration" mode of the time entity is now stored in the mode attribute of the span._.date/duration
- the "from" / "until" period bound, if any, is now stored in the span._.date.bound attribute
- to_datetime now only returns absolute dates, converts relative dates into absolute if doc._.note_datetime is given, and returns None otherwise
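The standard datetime module can sketch the conversion rules above. This is a simplified model that assumes relative dates are plain day offsets. The real span._.date objects carry more structure (mode, bound, partial dates), and both helpers are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def resolve_relative_date(days_offset: int,
                          note_datetime: Optional[datetime]) -> Optional[datetime]:
    """Turn a relative date ("il y a 3 jours" -> -3) into an absolute one,
    or return None when no note date is available."""
    if note_datetime is None:
        return None
    return note_datetime + timedelta(days=days_offset)


def duration_from_note(date: datetime,
                       note_datetime: Optional[datetime]) -> Optional[timedelta]:
    """Express an absolute date relative to the note date (negative = before the note)."""
    if note_datetime is None:
        return None
    return date - note_datetime


note = datetime(2024, 3, 10)
print(resolve_relative_date(-3, note))                 # 2024-03-07 00:00:00
print(resolve_relative_date(-3, None))                 # None
print(duration_from_note(datetime(2024, 3, 7), note))  # -3 days, 0:00:00
```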
Fixed
- export_to_brat issue with spans of entities on multiple lines.
Pull Requests
- Fix export_to_brat when there are spaces before new lines by @TheooJ in https://github.com/aphp/edsnlp/pull/211
- Refacto of the extensions by @percevalw in https://github.com/aphp/edsnlp/pull/213
- chore: bump version to 0.9.0 by @percevalw in https://github.com/aphp/edsnlp/pull/215
New Contributors
- @TheooJ made their first contribution in https://github.com/aphp/edsnlp/pull/211
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.8.1...v0.9.0
Published by percevalw over 2 years ago
edsnlp - v0.8.0
Changelog
Added
- New trainable component for multi-label, multi-class span qualification (any attribute/extension)
- Add range measurements (like la tumeur fait entre 1 et 2 cm) to eds.measurements matcher
- Add eds.CKD component
- Add eds.COPD component
- Add eds.alcohol component
- Add eds.cerebrovascular_accident component
- Add eds.congestive_heart_failure component
- Add eds.connective_tissue_disease component
- Add eds.dementia component
- Add eds.diabetes component
- Add eds.hemiplegia component
- Add eds.leukemia component
- Add eds.liver_disease component
- Add eds.lymphoma component
- Add eds.myocardial_infarction component
- Add eds.peptic_ulcer_disease component
- Add eds.peripheral_vascular_disease component
- Add eds.solid_tumor component
- Add eds.tobacco component
- Add eds.spaces (or eds.normalizer with spaces=True) to detect space tokens, and add ignore_space_tokens to EDSPhraseMatcher and SimstringMatcher to skip them
- Add ignore_space_tokens option in most components
- eds.tables: new pipeline to identify formatted tables
- New merge_mode parameter in eds.measurements to normalize existing entities or detect measures only inside existing entities
- Tokenization exceptions (Mr., Dr., Mrs.) and non end-of-sentence periods are now tokenized with the next letter in the eds tokenizer
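A single regular expression can parse range mentions such as "entre 1 et 2 cm". The pattern and the parse_range helper below are hypothetical illustrations that accept French decimal commas. eds.measurements uses its own, more complete patterns.

```python
import re

# Hypothetical pattern: "entre <low> et <high> <unit>", decimal point or comma allowed
RANGE = re.compile(
    r"entre\s+(?P<low>\d+(?:[.,]\d+)?)\s+et\s+(?P<high>\d+(?:[.,]\d+)?)\s*(?P<unit>[a-zµ]+)",
    re.IGNORECASE,
)


def parse_range(text: str):
    """Return (low, high, unit) for the first range mention in `text`, or None."""
    m = RANGE.search(text)
    if m is None:
        return None
    low = float(m.group("low").replace(",", "."))
    high = float(m.group("high").replace(",", "."))
    return low, high, m.group("unit")


print(parse_range("la tumeur fait entre 1 et 2 cm"))  # (1.0, 2.0, 'cm')
print(parse_range("lésion entre 1,5 et 3 mm"))        # (1.5, 3.0, 'mm')
```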
Changed
- Disable EDSMatcher preprocessing auto progress tracking by default
- Moved dependencies to a single pyproject.toml: support for pip install -e '.[dev,docs,setup]'
- ADICAP matcher now allows dot separators (e.g. B.H.HP.A7A0)
Fixed
- Abbreviation and number tokenization issues in the eds tokenizer
- eds.adicap: reparsed the dictionary used to decode the ADICAP codes (some of them were wrongly decoded)
- Fix build for python 3.9 on Mac M1/M2 machines.
What's changed
Pull Requests
- docs: mention INRIA in the acknowledgment by @percevalw in https://github.com/aphp/edsnlp/pull/170
- Umls fixes by @percevalw in https://github.com/aphp/edsnlp/pull/183
- fix typo by @gammaeva in https://github.com/aphp/edsnlp/pull/179
- add link and definiton for sofa in documentation by @strayMat in https://github.com/aphp/edsnlp/pull/182
- CI fail exploration by @Thomzoy in https://github.com/aphp/edsnlp/pull/189
- Repare parsing errors of the ADICAP dict by @etienneguevel in https://github.com/aphp/edsnlp/pull/187
- Move dependencies to pyproject.toml by @percevalw in https://github.com/aphp/edsnlp/pull/190
- Add tokenization exceptions and detect some false positive EOS by @percevalw in https://github.com/aphp/edsnlp/pull/192
- Bump version to 0.8.0 by @percevalw in https://github.com/aphp/edsnlp/pull/194
- Update docs by @percevalw in https://github.com/aphp/edsnlp/pull/196
- Ignore space tokens by @percevalw in https://github.com/aphp/edsnlp/pull/198
- pipe tables by @aricohen93 in https://github.com/aphp/edsnlp/pull/180
- Range measurements by @percevalw in https://github.com/aphp/edsnlp/pull/195
- SpanQualifier trainable component by @percevalw in https://github.com/aphp/edsnlp/pull/193
- 18 pipes from the Charlson Comorbidity Index by @Thomzoy in https://github.com/aphp/edsnlp/pull/205
- Bump version to v0.8.0 by @percevalw in https://github.com/aphp/edsnlp/pull/209
New Contributors
- @gammaeva made their first contribution in https://github.com/aphp/edsnlp/pull/179
- @strayMat made their first contribution in https://github.com/aphp/edsnlp/pull/182
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.7.4...v0.8.0
Published by percevalw over 2 years ago
edsnlp - v0.7.4
Changelog
Added
- eds.history: Add the option to consider only the closest dates in the sentence (dates inside the boundaries or, if there are none, the closest date in the entire sentence).
- eds.negation: It takes into account following past participles and preceding infinitives.
- eds.hypothesis: It takes into account following past-participle hypothesis verbs.
- eds.negation & eds.hypothesis: Introduce new patterns and remove unnecessary patterns.
- eds.dates: Add a pattern for preceding relative dates (ex: l'embolie qui est survenue à 10 jours).
- Improve patterns in the eds.pollution component to account for multiline footers
- Add QuickExample object to quickly try a pipeline.
- Add UMLS terminology matcher eds.umls
- New RegexMatcher method to create spans from groupdicts
- New eds.dates option to disable time detection
Changed
- Improve date detection by removing false positives
Fixed
- eds.hypothesis: Remove overly generic patterns.
- EDSTokenizer: It now tokenizes "recherche d'" as ["recherche", "d'"], instead of ["recherche", "d", "'"].
- Fix small typos in the documentation and in the docstrings.
- Harmonize processing utils (distributed custom_pipe) to have the same API for Pandas and Pyspark
- Fix BratConnector file loading issues with complex file hierarchies
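Two regular expressions can reproduce the tokenization change for elided forms such as d'. Both rules are illustrative assumptions, not the rules of the eds tokenizer.

```python
import re

# Naive rule: words and single punctuation marks, so the apostrophe is split off
NAIVE = re.compile(r"\w+|[^\w\s]")
# Hypothetical rule: a word directly followed by an apostrophe keeps it (d', l', qu')
ELISION = re.compile(r"\w+['’]|\w+|[^\w\s]")

print(NAIVE.findall("recherche d'"))    # ['recherche', 'd', "'"]
print(ELISION.findall("recherche d'"))  # ['recherche', "d'"]
print(ELISION.findall("l'embolie"))     # ["l'", 'embolie']
```

Words such as aujourd'hui show why real tokenizers also need exception lists: this rule would split it into "aujourd'" and "hui".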
Pull Requests
- 👓 Feedbacks from EDS-TeVa study by @Aremaki in https://github.com/aphp/edsnlp/pull/157
- feat: :stethoscope: Update negation and hypothesis pipelines by @Aremaki in https://github.com/aphp/edsnlp/pull/162
- Harmonize processing utils by @aricohen93 in https://github.com/aphp/edsnlp/pull/160
- Update pattern footer (pollution) by @aricohen93 in https://github.com/aphp/edsnlp/pull/159
- feat: add UMLS terminology (#147) by @percevalw in https://github.com/aphp/edsnlp/pull/165
- Relax pydantic version constraints by @percevalw in https://github.com/aphp/edsnlp/pull/167
- Allow back spacy dot components for backward compatibility by @percevalw in https://github.com/aphp/edsnlp/pull/152
- Update docs by @percevalw in https://github.com/aphp/edsnlp/pull/168
- Bump version to 0.7.3 by @percevalw in https://github.com/aphp/edsnlp/pull/169
- Quick example by @Thomzoy in https://github.com/aphp/edsnlp/pull/166
- Update index.md by @Thomzoy in https://github.com/aphp/edsnlp/pull/171
- Fix brat file path search for complex file hierarchies by @percevalw in https://github.com/aphp/edsnlp/pull/172
- Improve dates by @percevalw in https://github.com/aphp/edsnlp/pull/149
- Bump version to 0.7.4 by @percevalw in https://github.com/aphp/edsnlp/pull/173
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.7.2...v0.7.4
Published by percevalw about 3 years ago
edsnlp - v0.7.2
Changelog
Added
- Improve the eds.history component by taking into account the date extracted from the eds.dates component.
- New pop-up when you click on the copy icon in the termynal widget (docs).
- Add NER eds.elston-ellis pipeline to identify Elston Ellis scores
- Add flags=re.MULTILINE to eds.pollution and change pattern of footer
Fixed
- Remove the warning in the eds.sections when eds.normalizer is in the pipe.
- Fix filter_spans for strictly nested entities
- Fill eds.remove-lowercase "assign" metadata to run the pipeline during EDSPhraseMatcher preprocessing
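spaCy's spacy.util.filter_spans uses a common policy for resolving overlaps. It keeps the longest spans first and breaks ties by earliest start, so a span strictly nested in a longer one is dropped. The stdlib sketch below implements that policy on (start, end) token offsets. It is not EDS-NLP's own filter_spans.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def filter_spans(spans: List[Span]) -> List[Span]:
    """Keep longest spans first (earliest start on ties), dropping any span that
    overlaps an already kept one; return the kept spans sorted by start."""
    kept: List[Span] = []
    taken = set()
    for start, end in sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True):
        if not any(i in taken for i in range(start, end)):
            kept.append((start, end))
            taken.update(range(start, end))
    return sorted(kept)


# (2, 4) is strictly nested in (0, 6); (7, 9) and (8, 10) overlap with equal length
print(filter_spans([(2, 4), (0, 6), (7, 9), (8, 10)]))  # [(0, 6), (7, 9)]
```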
Pull Requests
- Update patterns pollution by @aricohen93 in https://github.com/aphp/edsnlp/pull/145
- feat: :sparkles: Improve eds.history component with eds.dates by @Aremaki in https://github.com/aphp/edsnlp/pull/144
- Small fixes by @percevalw in https://github.com/aphp/edsnlp/pull/146
- Elston and Ellis by @etienneguevel in https://github.com/aphp/edsnlp/pull/148
- Fix setup.py by @percevalw in https://github.com/aphp/edsnlp/pull/151
- Patch patterns norm by @aricohen93 in https://github.com/aphp/edsnlp/pull/150
- Bump version to 0.7.2 by @percevalw in https://github.com/aphp/edsnlp/pull/153
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.7.1...v0.7.2
Published by percevalw over 3 years ago
edsnlp - v0.7.1
Changelog
Added
- Add new patterns (footer, web entities, biology tables, coding sections) to pipeline normalisation (pollution)
Changed
- Improved TNM detection algorithm
- Account for more modifiers in ADICAP codes detection
Fixed
- Add nephew, niece and daughter to family qualifier patterns
- EDSTokenizer (spacy.blank('eds')) now recognizes non-breaking spaces as whitespace and no longer splits float numbers
- eds.dates pipeline now allows newlines as space separators in dates
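The tokenizer change can be illustrated with a plain regular expression. Decimal numbers written with a dot or a comma stay whole, and the non-breaking space (U+00A0) separates tokens like any other whitespace. Here is a minimal sketch under those assumptions. TOKEN_RE and tokenize are hypothetical and much simpler than the actual EDSTokenizer rules.

```python
import re
from typing import List

# Hypothetical pattern, tried left to right: a decimal number ("37.5" or "3,5"),
# else a run of word characters, else a single punctuation mark
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)?|\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    # For str patterns, \s covers Unicode whitespace, including U+00A0, so a
    # non-breaking space is never part of a token and acts as a separator
    return TOKEN_RE.findall(text)
```

For example, tokenize("37.5\u00a0mg") gives ["37.5", "mg"] rather than splitting the number at its dot.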
Pull Requests
- add: new patterns to pollution by @Thomzoy in https://github.com/aphp/edsnlp/pull/132
- docs: fix cim10 docs by @percevalw in https://github.com/aphp/edsnlp/pull/130
- Remove print statement by @Thomzoy in https://github.com/aphp/edsnlp/pull/133
- fix: param sampling AdicapCode by @etienneguevel in https://github.com/aphp/edsnlp/pull/131
- Add nephew, niece and daughter to family qualifier patterns by @julienduquesne in https://github.com/aphp/edsnlp/pull/135
- Modification of the TNM ner by @etienneguevel in https://github.com/aphp/edsnlp/pull/136
- modification of the ADICAP ner by @etienneguevel in https://github.com/aphp/edsnlp/pull/137
- EDSTokenizer: split on non-breaking spaces and don't split float numbers by @percevalw in https://github.com/aphp/edsnlp/pull/141
- Allow newlines in dates by @percevalw in https://github.com/aphp/edsnlp/pull/142
- new pattern norm pollution by @aricohen93 in https://github.com/aphp/edsnlp/pull/139
- Bump version to 0.7.1 by @percevalw in https://github.com/aphp/edsnlp/pull/143
New Contributors
- @etienneguevel made their first contribution in https://github.com/aphp/edsnlp/pull/131
- @julienduquesne made their first contribution in https://github.com/aphp/edsnlp/pull/135
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.7.0...v0.7.1
- Python
Published by percevalw over 3 years ago
edsnlp - v0.7.0
Changelog
Added
- New nested NER trainable nested_ner pipeline component
- Support for nested entities and attributes in BratDataConnector
- PyTorch wrappers and experimental training utils
- Add section attribute to entities
- Add new cases to the separator pattern for when components of the TNM score are separated by a forward slash
- Add NER eds.adicap pipeline to identify ADICAP codes
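Nested entity support in BratDataConnector relies on brat's standoff format. In that format, "T" lines carry a label, character offsets and the covered text, and "A" lines attach attributes to them. Below is a minimal parser sketch. It assumes T lines appear before the A lines that target them. parse_brat_entities is hypothetical, not the connector's code.

```python
from typing import Dict, List


def parse_brat_entities(ann_text: str) -> List[Dict]:
    """Hypothetical parser for brat text-bound annotations (T lines) and attributes (A lines)."""
    entities: Dict[str, Dict] = {}
    for line in ann_text.splitlines():
        if line.startswith("T"):
            # e.g. "T1<TAB>Cancer 0 21<TAB>cancer du sein gauche"
            ann_id, label_and_offsets, text = line.split("\t")
            label, offsets = label_and_offsets.split(" ", 1)
            # Discontinuous entities list fragments separated by ";": keep the outer bounds
            fragments = [[int(x) for x in frag.split()] for frag in offsets.split(";")]
            entities[ann_id] = {
                "label": label,
                "start": fragments[0][0],
                "end": fragments[-1][1],
                "text": text,
                "attributes": {},
            }
        elif line.startswith("A"):
            # e.g. "A1<TAB>Negation T1" (binary) or "A2<TAB>Certainty T1 high" (valued);
            # assumes the targeted T line appeared earlier in the file
            _, body = line.split("\t")
            name, target, *value = body.split(" ")
            entities[target]["attributes"][name] = value[0] if value else True
    # Overlapping and nested entities are all kept, ordered by start then longest first
    return sorted(entities.values(), key=lambda e: (e["start"], -e["end"]))
```

In this sketch, an entity fully contained in another (here a laterality inside a cancer mention) is kept as its own item rather than discarded.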
Changed
- Update the ContextualMatcher (and all pipelines depending on it), making it more flexible to use
- Rename the R component of the TNM score to "resection_completeness"
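The ContextualMatcher pairs an anchor pattern with extra patterns searched in the surrounding text. The sketch below illustrates only that idea, under stated assumptions:

- contextual_match and its parameters are hypothetical.
- The window is counted in characters after the anchor.
- The actual component is configured differently and offers more options.

```python
import re
from typing import Dict, List, Optional


def contextual_match(
    text: str,
    anchor: str,
    assign: Dict[str, str],
    exclude: Optional[str] = None,
    window: int = 30,
) -> List[Dict]:
    """Hypothetical sketch of anchor + context matching, not the ContextualMatcher API.

    An anchor match is discarded if `exclude` occurs within `window` characters
    after it; otherwise every `assign` pattern found in that window is attached.
    """
    results = []
    for match in re.finditer(anchor, text, flags=re.IGNORECASE):
        context = text[match.end() : match.end() + window]
        if exclude and re.search(exclude, context, flags=re.IGNORECASE):
            continue
        assigned = {}
        for name, pattern in assign.items():
            found = re.search(pattern, context, flags=re.IGNORECASE)
            if found:
                # Use the first capture group when the pattern defines one
                assigned[name] = found.group(1) if found.groups() else found.group(0)
        results.append({"text": match.group(0), "start": match.start(), "assigned": assigned})
    return results
```

A fixed character window is crude: it can spill into the next sentence. That is one reason a real implementation would work on tokens and sentence boundaries instead.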
Fixed
- Prevent section titles from capturing surrounding tokens, causing overlaps (#113)
- Enhance existing patterns for section detection and add patterns for previously ignored sections (introduction, evolution, modalites de sortie, vaccination).
- Fix explain mode, which was always triggered, in the eds.history factory.
- Fix test in eds.sections. Previously, no check was done.
- Remove spurious span suffixes from SOFA scores
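The section attribute added to entities in this release amounts to an interval lookup: find the detected section whose character range contains the entity. Below is a minimal sketch using the standard bisect module. It assumes sections come as sorted, non-overlapping (start, end, title) tuples. assign_section and the section titles are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def assign_section(entity_start: int, sections: List[Tuple[int, int, str]]) -> Optional[str]:
    """Hypothetical helper: title of the section (start, end, title) containing entity_start.

    `sections` must be sorted by start offset and must not overlap.
    """
    starts = [start for start, _, _ in sections]
    # Index of the last section starting at or before the entity
    index = bisect_right(starts, entity_start) - 1
    if index >= 0 and entity_start < sections[index][1]:
        return sections[index][2]
    # The entity falls before the first section or in a gap between sections
    return None
```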
Pull requests
- Change links to streamlit demo by @percevalw in https://github.com/aphp/edsnlp/pull/111
- Restore demo links by @percevalw in https://github.com/aphp/edsnlp/pull/112
- Prevent section titles from capturing surrounding tokens by @percevalw in https://github.com/aphp/edsnlp/pull/114
- Section upgrade by @paul-bssr in https://github.com/aphp/edsnlp/pull/115
- Nested NER trainable pipeline component by @percevalw in https://github.com/aphp/edsnlp/pull/84
- Fix history factory parameter type by @clementjumel in https://github.com/aphp/edsnlp/pull/117
- Rename R component (TNM) by @aricohen93 in https://github.com/aphp/edsnlp/pull/119
- Update separator pattern score TNM by @aricohen93 in https://github.com/aphp/edsnlp/pull/121
- add section info to entities by @aricohen93 in https://github.com/aphp/edsnlp/pull/120
- Adicap pipeline by @aricohen93 in https://github.com/aphp/edsnlp/pull/123
- ContextualMatcher + ADICAP Update by @Thomzoy in https://github.com/aphp/edsnlp/pull/124
- fix: handle single entity in contextual matcher by @Thomzoy in https://github.com/aphp/edsnlp/pull/126
- Adicap model by @percevalw in https://github.com/aphp/edsnlp/pull/127
- chore: bump version to 0.7.0 by @percevalw in https://github.com/aphp/edsnlp/pull/125
- v0.7.0 + fixed package_data by @percevalw in https://github.com/aphp/edsnlp/pull/129
New Contributors
- @paul-bssr made their first contribution in https://github.com/aphp/edsnlp/pull/115
- @clementjumel made their first contribution in https://github.com/aphp/edsnlp/pull/117
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.6.2...v0.7.0
- Python
Published by percevalw over 3 years ago
edsnlp - v0.6.2
Changelog
Added
- New SimstringMatcher matcher to perform fuzzy term matching, and a new algorithm parameter in the terminology components and the eds.matcher component
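SimstringMatcher does fuzzy matching by comparing sets of character n-grams. The sketch below shows only the similarity measure, assuming character trigrams with space padding and the Dice coefficient. It scans every term one by one, whereas simstring answers such queries with an index. fuzzy_lookup and its 0.75 default threshold are hypothetical choices.

```python
from typing import Iterable, List, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lowercased text, padded with one space on each side."""
    padded = f" {text.lower()} "
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient between the trigram sets of a and b (1.0 means identical sets)."""
    grams_a, grams_b = char_ngrams(a), char_ngrams(b)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def fuzzy_lookup(query: str, terms: Iterable[str], threshold: float = 0.75) -> List[Tuple[str, float]]:
    """Linear scan: terms whose similarity to the query reaches the threshold, best first."""
    scored = [(term, dice(query, term)) for term in terms]
    return sorted((pair for pair in scored if pair[1] >= threshold), key=lambda pair: -pair[1])
```

On short words, a single accent change already costs several trigrams. For example, "diabete" and "diabète" share 4 of 7 trigrams, a Dice score of about 0.57. This is why the threshold has to be tuned to the vocabulary.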
Changed
- Add consultation date pattern "CS", and False Positive patterns for dates (namely phone numbers and pagination).
- Update the eds.TNM score pipeline. It can now return a dictionary whose values are either str or int
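A TNM score can be turned into a dictionary with int values for numeric grades and str values for the rest ("x", "is", the prefix). Below is a minimal regex sketch. It accepts both space and forward-slash separators, as later supported in v0.7.0. TNM_RE, parse_tnm and the group names are hypothetical and far less complete than eds.TNM.

```python
import re
from typing import Dict, Optional, Union

# Hypothetical pattern: optional prefix ("p", "c", "yp"...), then T, N, M and R parts
# separated by spaces and/or forward slashes ("pT2 N1 M0" or "pT2/N1/M0")
TNM_RE = re.compile(
    r"\b(?P<prefix>y?[pc])?T(?P<tumour>[0-4x]|is)[a-d]?"
    r"(?:[\s/]*N(?P<node>[0-3x]))?"
    r"(?:[\s/]*M(?P<metastasis>[01x]))?"
    r"(?:[\s/]*R(?P<resection_completeness>[0-2x]))?",
    re.IGNORECASE,
)


def parse_tnm(text: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the first TNM score found in text as a dict, or None if there is none."""
    match = TNM_RE.search(text)
    if match is None:
        return None
    result: Dict[str, Union[str, int]] = {}
    for key, value in match.groupdict().items():
        if value is None:
            continue  # component absent from the text
        # Numeric grades become int; "x", "is" and the prefix stay str
        result[key] = int(value) if value.isdigit() else value.lower()
    return result
```

For example, parse_tnm("stade pT2/N1/M0 R0") returns {"prefix": "p", "tumour": 2, "node": 1, "metastasis": 0, "resection_completeness": 0}.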
Fixed
- Add new patterns to the negation qualifier
- Numpy header issues with binary distributed packages
- Simstring dependency on Windows
Pull Requests
- chore: add acknowledgement by @bdura in https://github.com/aphp/edsnlp/pull/102
- TNM by @aricohen93 in https://github.com/aphp/edsnlp/pull/103
- fix: eds.sentences behaviour with dates by @bdura in https://github.com/aphp/edsnlp/pull/99
- Add consultation date pattern and date False Positive by @JCharline in https://github.com/aphp/edsnlp/pull/107
- Simstring by @percevalw in https://github.com/aphp/edsnlp/pull/94
- Fix numpy header issues with binary packages by @percevalw in https://github.com/aphp/edsnlp/pull/109
- fix: add "non" preceding pattern by @bdura in https://github.com/aphp/edsnlp/pull/105
- Bump version to v0.6.2 by @percevalw in https://github.com/aphp/edsnlp/pull/110
New Contributors
- @JCharline made their first contribution in https://github.com/aphp/edsnlp/pull/107
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.6.1...v0.6.2
- Python
Published by percevalw over 3 years ago
edsnlp - v0.6.1
Changelog
Added
- It is now possible to provide regex flags when using the RegexMatcher
- New ContextualMatcher pipe, intended to replace the AdvancedRegex pipe.
- New as_ents parameter for eds.dates, to save detected dates as entities
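Regex flags matter most for line-anchored patterns, such as the footer patterns of eds.pollution (which later received flags=re.MULTILINE in v0.7.2). Below is a minimal illustration with Python's re module. FOOTER and find_footers are hypothetical, not edsnlp's actual pollution patterns.

```python
import re
from typing import List

# Hypothetical footer pattern: a whole line of the form "Page X / Y"
FOOTER = r"^Page \d+ / \d+$"


def find_footers(text: str, flags: int = 0) -> List[str]:
    """Return every footer line matched by FOOTER under the given regex flags."""
    return re.findall(FOOTER, text, flags=flags)


report = "Compte rendu de consultation\nPage 1 / 2\nSuite du compte rendu"

# Without MULTILINE, ^ and $ only anchor at the start and end of the whole text
print(find_footers(report))  # []
# With MULTILINE, they also anchor at each line boundary
print(find_footers(report, re.MULTILINE))  # ['Page 1 / 2']
```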
Changed
- Faster eds.sentences pipeline component with Cython
- Bump the Pydantic version in requirements.txt to 1.8.2 to handle an incompatibility with the ContextualMatcher
- Optimise space requirements by using .csv.gz compression for verbs
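Storing the verb tables as .csv.gz trades a little decompression time for disk space. Below is a minimal round-trip sketch with the standard gzip and csv modules. The function and column names are hypothetical and do not reflect edsnlp's actual verb resources. pandas.read_csv can also read such files directly, since it infers gzip compression from the .gz extension.

```python
import csv
import gzip
from typing import Dict, List


def write_csv_gz(path: str, rows: List[Dict[str, str]]) -> None:
    """Write dict rows to a gzip-compressed CSV file, header first."""
    # Text mode ("wt") lets csv write str while gzip compresses on the fly
    with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def read_csv_gz(path: str) -> List[Dict[str, str]]:
    """Read a gzip-compressed CSV file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```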
Pull Requests
- chore: bump version to 0.6.0 by @percevalw in https://github.com/aphp/edsnlp/pull/88
- Fix norm and to_datetime dates methods by @percevalw in https://github.com/aphp/edsnlp/pull/92
- SentenceSegmenter speed-up by @percevalw in https://github.com/aphp/edsnlp/pull/95
- Contextual matcher by @Thomzoy in https://github.com/aphp/edsnlp/pull/93
- bump pydantic version to minimal 1.8.2 by @Thomzoy in https://github.com/aphp/edsnlp/pull/96
- correct typo by @aricohen93 in https://github.com/aphp/edsnlp/pull/98
- Bump version to v0.6.1 by @bdura in https://github.com/aphp/edsnlp/pull/101
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.6.0...v0.6.1
- Python
Published by bdura over 3 years ago
edsnlp - v0.6.0
What's Changed
- Add new pattern for dates pipeline by @aricohen93 in https://github.com/aphp/edsnlp/pull/74
- Simple terminology matcher by @bdura in https://github.com/aphp/edsnlp/pull/75
- Force batch size of 2000 when distributing pipe by @Thomzoy in https://github.com/aphp/edsnlp/pull/73
- Add CIM10 terminology by @bdura in https://github.com/aphp/edsnlp/pull/77
- New NER drugs pipeline by @scossin in https://github.com/aphp/edsnlp/pull/58
- Fix resources by @bdura in https://github.com/aphp/edsnlp/pull/79
- Improve dates by @aricohen93 in https://github.com/aphp/edsnlp/pull/80
- Miscellaneous changes to the documentation and changelog by @bdura in https://github.com/aphp/edsnlp/pull/78
- Hot fix distributed pipe, default extension value by @Aremaki in https://github.com/aphp/edsnlp/pull/85
- Remove trailing spaces on get_text function by @Thomzoy in https://github.com/aphp/edsnlp/pull/86
- Measurements complete rewamp by @percevalw and @keyber in https://github.com/aphp/edsnlp/pull/21
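Forcing a fixed batch size when distributing the pipe means splitting the stream of texts into chunks of at most 2000 items. Below is a minimal sketch with itertools.islice. batched is a hypothetical helper. Python 3.12 ships a similar itertools.batched, which yields tuples instead of lists.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int = 2000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items (the last one may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice consumes the shared iterator, so each call picks up where the last stopped
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

With 4500 texts, this produces batches of 2000, 2000 and 500, so each worker receives bounded chunks whatever the input size.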
New Contributors
- @scossin made their first contribution in https://github.com/aphp/edsnlp/pull/58
- @Aremaki made their first contribution in https://github.com/aphp/edsnlp/pull/85
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.5.3...v0.6.0
- Python
Published by percevalw over 3 years ago
edsnlp - v0.5.3
Changelog
Added
- Support for strings in the example utility
- TNM detection and normalisation with the eds.TNM pipeline
- Support for arbitrary callbacks in Pandas multiprocessing, with the callback argument
Pull requests
- Bump to version v0.5.2 by @bdura in https://github.com/aphp/edsnlp/pull/71
- Add generic callback for multiprocessing by @bdura in https://github.com/aphp/edsnlp/pull/57
- Add TNM detection and normalisation pipeline by @bdura in https://github.com/aphp/edsnlp/pull/56
- chore: bump version to 0.5.3 by @bdura in https://github.com/aphp/edsnlp/pull/72
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.5.2...v0.5.3
- Python
Published by bdura almost 4 years ago
edsnlp - v0.5.2
Changelog
Added
- Support for chained attributes in the processing pipelines
- Colour utility with the category20 colour palette
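Chained attributes let the processing pipelines read nested values, such as a spaCy extension stored under ent._. Below is a minimal sketch that resolves a dotted path with functools.reduce. get_chained_attr is hypothetical, not the function edsnlp uses.

```python
from functools import reduce
from typing import Any


def get_chained_attr(obj: Any, path: str) -> Any:
    """Resolve a dotted attribute path such as "_.date.year" one segment at a time."""
    # reduce applies getattr(current, segment) for each segment, left to right
    return reduce(getattr, path.split("."), obj)
```

For instance, get_chained_attr(ent, "_.date.year") evaluates ent._.date.year. A missing segment raises AttributeError, just like direct attribute access.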
Fixed
- Correct a regex in the date detector (both nov and nov. are now detected, as for all other months)
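Month abbreviations with an optional trailing dot are easy to get subtly wrong. Below is a minimal sketch of a pattern that accepts "nov", "nov." and "novembre". NOVEMBER is hypothetical and is not the corrected edsnlp regex.

```python
import re

# Hypothetical month pattern: "nov", "nov." and "novembre", case-insensitive.
# The lookahead (?!\w) replaces a trailing \b: after an optional ".", a \b would
# need a word character next, so the match would stop before the dot.
NOVEMBER = re.compile(r"\bnov(?:embre|\.)?(?!\w)", re.IGNORECASE)

print(NOVEMBER.findall("le 12 nov. 2021, le 3 Nov 2020 et le 5 novembre"))
# ['nov.', 'Nov', 'novembre']
```

The lookahead also stops words that merely start with "nov" (such as "nova") from matching.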
Pull requests
- Fix documentation for handling multiple texts by @bdura in https://github.com/aphp/edsnlp/pull/53
- feat: allow recursive attributes in processing by @Thomzoy in https://github.com/aphp/edsnlp/pull/54
- Add colour utility by @bdura in https://github.com/aphp/edsnlp/pull/55
- Update doc citation endlines by @aricohen93 in https://github.com/aphp/edsnlp/pull/60
- Correct regex on date by @gozat in https://github.com/aphp/edsnlp/pull/70
- Bump to version v0.5.2 by @bdura in https://github.com/aphp/edsnlp/pull/71
New Contributors
- @aricohen93 made their first contribution in https://github.com/aphp/edsnlp/pull/60
- @gozat made their first contribution in https://github.com/aphp/edsnlp/pull/70
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.5.1...v0.5.2
- Python
Published by bdura almost 4 years ago
edsnlp - v0.5.1
What's Changed
- Use constrained cibuildwheel to compile wheels by @bdura in https://github.com/aphp/edsnlp/pull/50
- Fix issue with Numpy and bump to v0.5.1 by @bdura in https://github.com/aphp/edsnlp/pull/52
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.5.0...v0.5.1
- Python
Published by bdura almost 4 years ago
edsnlp - v0.5.0
What's Changed
- Reimplementation of the EDSPhraseMatcher in Cython by @percevalw in https://github.com/aphp/edsnlp/pull/43, with a 15x speed increase
- Revamp of the date pipeline by @keyber in https://github.com/aphp/edsnlp/pull/22
- New EDS Language (spacy.blank("eds")) by @percevalw in https://github.com/aphp/edsnlp/pull/34
- Test code blocs in documentation by @bdura in https://github.com/aphp/edsnlp/pull/44
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.4.4...v0.5.0
- Python
Published by bdura almost 4 years ago
edsnlp - v0.4.4
What's Changed
- Add measures pipeline
- Cap Jinja2 version to fix mkdocs
- Add the possibility to add context in the processing module
- Improve the speed of char replacement pipelines (accents and quotes)
- Improve the speed of the regex matcher
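A common way to speed up character replacement (accents, quotes) is to build one translation table and rewrite the text in a single pass. This avoids calling str.replace once per character. Below is a minimal sketch of that technique. The QUOTES and ACCENTS tables and normalize_chars are hypothetical and do not reproduce edsnlp's character lists.

```python
# Hypothetical replacement tables; the real pipelines cover many more characters
QUOTES = {"«": '"', "»": '"', "“": '"', "”": '"', "‘": "'", "’": "'"}
ACCENTS = {"é": "e", "è": "e", "ê": "e", "à": "a", "ç": "c", "ô": "o", "ù": "u"}

# str.maketrans accepts a dict mapping single characters to replacement strings;
# building the table once lets str.translate rewrite the text in a single pass
TABLE = str.maketrans({**QUOTES, **ACCENTS})


def normalize_chars(text: str) -> str:
    """Replace the listed quotes and accented characters in one pass."""
    return text.translate(TABLE)
```

Every replacement here is a single character, so the output keeps the input's length and character offsets still line up with the original text.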
- Python
Published by percevalw almost 4 years ago
edsnlp - v0.4.3
What's Changed
- Demo: update dataframe representation by @bdura in https://github.com/aphp/edsnlp/pull/25
- fix: regex matching on spans by @percevalw in https://github.com/aphp/edsnlp/pull/26
New Contributors
- @percevalw made their first contribution in https://github.com/aphp/edsnlp/pull/26
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.4.2...v0.4.3
- Python
Published by percevalw almost 4 years ago
edsnlp - v0.4.1
What's Changed
- Deploy Streamlit demo and eds.covid pipeline component by @bdura in https://github.com/aphp/edsnlp/pull/2
- Add Github CI by @bdura in https://github.com/aphp/edsnlp/pull/1
- Skip no-commit-to-branch pre-commit hook by @bdura in https://github.com/aphp/edsnlp/pull/8
- feat: matrices testing by @Thomzoy in https://github.com/aphp/edsnlp/pull/12
- Add codecov by @bdura in https://github.com/aphp/edsnlp/pull/11
- fix: gh-action strategy by @Thomzoy in https://github.com/aphp/edsnlp/pull/13
- Update documentation by @bdura in https://github.com/aphp/edsnlp/pull/14
- Documentation by @bdura in https://github.com/aphp/edsnlp/pull/15
- Update coverage by @bdura in https://github.com/aphp/edsnlp/pull/16
- Koalas support by @Thomzoy in https://github.com/aphp/edsnlp/pull/10
New Contributors
- @Thomzoy made their first contribution in https://github.com/aphp/edsnlp/pull/12
Full Changelog: https://github.com/aphp/edsnlp/compare/v0.4.0...v0.4.1
- Python
Published by bdura almost 4 years ago