Recent Releases of ktrain

ktrain - v0.41.4

0.41.4 (2024-06-18)

new:

  • N/A

changed

  • N/A

fixed:

  • Remove references to paper-qa (#530)
  • Reduce memory footprint of TopicModel.filter (#531)

- Jupyter Notebook
Published by amaiya over 1 year ago

ktrain - v0.41.3

0.41.3 (2024-04-05)

new:

  • N/A

changed

  • N/A

fixed:

  • Removed tf_keras as a dependency due to issues in various dependencies related to TF 2.16, and allow TF to prompt the user for it (#528)
  • Removed auto-setting TF_USE_LEGACY_KERAS, as it causes problems in tensorflow<2.16 (#528)
  • Unpin transformers due to incompatibilities with different versions of TensorFlow.
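
Because ktrain no longer sets TF_USE_LEGACY_KERAS itself, users on tensorflow>=2.16 who have installed tf_keras may want to set it on their own. The following is a minimal sketch of one way to do that; enable_legacy_keras is a hypothetical helper, not part of ktrain, and it assumes the variable is set before TensorFlow is first imported.

```python
import os


def enable_legacy_keras(env=None):
    """Set TF_USE_LEGACY_KERAS=1 unless a value was already chosen.

    Hypothetical helper (not part of ktrain). It only has an effect if it
    runs before the first `import tensorflow`.
    """
    env = os.environ if env is None else env
    env.setdefault("TF_USE_LEGACY_KERAS", "1")
    return env["TF_USE_LEGACY_KERAS"]


enable_legacy_keras()
# import tensorflow as tf   # import only after the variable is set
# import ktrain
```

Using setdefault keeps any value the user has already exported, which matters on tensorflow<2.16, where the release note above says the variable causes problems.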

Published by amaiya almost 2 years ago

ktrain - 0.41.2

0.41.2 (2024-03-11)

new:

  • N/A

changed

  • N/A

fixed:

  • Added tf_keras to dependencies and set TF_USE_LEGACY_KERAS (#525)

Published by amaiya almost 2 years ago

ktrain - v0.41.1

0.41.1 (2024-03-02)

new:

  • N/A

changed

  • N/A

fixed:

  • temporarily pinning to transformers==4.37.2 due to issue (#523) on Google Colab

Published by amaiya almost 2 years ago

ktrain - v0.41.0

0.41.0 (2024-02-20)

new:

  • N/A

changed

  • Breaking Change: Removed the ktrain.text.qa.generative_qa module. Users should use our OnPrem.LLM for generative question-answering (#522)

fixed:

  • use arrays in TextPredictor due to possible issues with tf.Dataset (#521)

Published by amaiya about 2 years ago

ktrain - v0.40.0

0.40.0 (2024-01-27)

new:

  • N/A

changed

  • Changed shallownlp.classifier API with respect to hyperparameters and defaults

fixed:

  • Ensure weight files in checkpoint folder have val_loss in file name (#519)

Published by amaiya about 2 years ago

ktrain - v0.39.0

0.39.0 (2023-11-18)

new:

  • N/A

changed

  • Changes to custom eli5 and stellargraph to support Python 3.11 (#515)

fixed:

  • Switch from unmaintained cchardet to charset-normalizer (#512)
  • Use textract-py3 instead of textract (#511)

Published by amaiya over 2 years ago

ktrain - v0.38.0

0.38.0 (2023-09-05)

new:

  • N/A

changed

  • Breaking Change: The generative_ai.LLM class replaces generative_ai.GenerativeAI and is now powered by our OnPrem.LLM package (see example notebook).
  • GenerativeQA now recommends langchain==0.0.240

fixed:

  • N/A

Published by amaiya over 2 years ago

ktrain - v0.37.6

0.37.6 (2023-07-23)

new:

  • N/A

changed

  • N/A

fixed:

  • Removed pin to paper-qa==2.1.1 due to issue in latest langchain release. Added notification to install langchain==0.0.180

Published by amaiya over 2 years ago

ktrain - v0.37.5

0.37.5 (2023-07-22)

new:

  • N/A

changed

  • N/A

fixed:

  • Removed pin on scikit-learn, as eli5-tf repo was updated to support scikit-learn>=1.3 (#505)
  • pin to paper-qa==2.1.1 due to breaking changes (#506)

Published by amaiya over 2 years ago

ktrain - v0.37.4

0.37.4 (2023-07-22)

new:

  • N/A

changed

  • N/A

fixed:

  • Temporarily pin to scikit-learn<1.3 to avoid eli5 import error (#505)
  • Temporarily changed generative_qa imports to avoid OPENAI_API_KEY error (#506)

Published by amaiya over 2 years ago

ktrain - v0.37.3

0.37.3 (2023-07-22)

new:

  • N/A

changed

  • N/A

fixed:

  • fix eda.py topic visualization to work with bokeh>=3.0.0 (#504)

Published by amaiya over 2 years ago

ktrain - v0.37.2

0.37.2 (2023-06-14)

new:

  • N/A

changed

  • text.models, vision.models, and tabular.models now all automatically set metrics to use binary_accuracy for multilabel problems (#500)

fixed:

  • fix validate to support multilabel classification problems (#498)
  • add a warning to TransformerPreprocessor.get_classifier to use binary_accuracy for multilabel problems (#498)
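
To see why binary_accuracy suits multilabel problems, the sketch below compares it with an argmax-based accuracy on toy data. Both functions are plain-Python illustrations written for this note (they are not ktrain or Keras code), and the 0.5 threshold is an assumption.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of individual label decisions that are correct."""
    correct = total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            correct += int((p > threshold) == bool(t))
            total += 1
    return correct / total


def argmax_accuracy(y_true, y_pred):
    """Fraction of rows where the top-scoring label matches argmax of the targets."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)

    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Two samples, three labels, two active labels per sample.
y_true = [[1, 1, 0], [0, 1, 1]]
y_pred = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3]]

print(binary_accuracy(y_true, y_pred))   # 4 of 6 label decisions are right
print(argmax_accuracy(y_true, y_pred))   # only looks at one label per row
```

On this data the argmax-based score is 1.0 even though one active label is missed in each sample, while the per-label score exposes the misses.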

Published by amaiya over 2 years ago

ktrain - v0.37.1

0.37.1 (2023-06-05)

new:

  • Supply arguments to generate in TransformerSummarizer.summarize

changed

  • N/A

fixed:

  • N/A

Published by amaiya over 2 years ago

ktrain - v0.37.0

0.37.0 (2023-05-11)

new:

  • Support for Generative Question-Answering powered by OpenAI models, LangChain, and Paper-QA. Ask questions to any set of documents and get back answers with citations to where the answer was found in your documents.

changed

  • N/A

fixed:

  • N/A

Published by amaiya almost 3 years ago

ktrain - v0.36.1

0.36.1 (2023-05-09)

new:

  • N/A

changed

  • N/A

fixed:

  • resolved issue with using DeBERTa embedding models with NER (#492)

Published by amaiya almost 3 years ago

ktrain - v0.36.0

0.36.0 (2023-04-21)

new:

  • easy-to-use-wrapper for sentiment analysis

changed

  • N/A

fixed:

  • N/A

Published by amaiya almost 3 years ago

ktrain - v0.35.1

0.35.1 (2023-04-02)

new:

  • N/A

changed

  • N/A

fixed:

  • Ensure do_sample=True for GenerativeAI

Published by amaiya almost 3 years ago

ktrain - v0.35.0

0.35.0 (2023-04-01)

new:

  • Support for generative AI with few-shot and zero-shot prompting using a GPT-based model that can run on your own machine.

changed

  • N/A

fixed:

  • N/A

Published by amaiya almost 3 years ago

ktrain - v0.34.0

0.34.0 (2023-03-30)

new:

  • Support for LexRank summarization

changed

  • N/A

fixed:

  • Bug fix in dataset module (#486)

Published by amaiya almost 3 years ago

ktrain - v0.33.4

0.33.4 (2023-03-22)

new:

  • N/A

changed

  • Added verbose parameter to predict* methods in all Predictor classes

fixed:

  • N/A

Published by amaiya almost 3 years ago

ktrain - v0.33.3

0.33.3 (2023-03-17)

new:

  • N/A

changed

  • Added exclude_unigrams argument to text.kw module and support unigram extraction when noun_phrases is selected

fixed:

  • explicitly set num_beams and early_stopping for generate in ktrain.text.translation.core to prevent errors in transformers>=4.26.0

Published by amaiya almost 3 years ago

ktrain - v0.33.2

0.33.2 (2023-02-06)

new:

  • N/A

changed

  • N/A

fixed:

  • fixed typo in translation module (#479)
  • removed superfluous warning when inspecting transformer model signature

Published by amaiya about 3 years ago

ktrain - v0.33.1

0.33.1 (2023-02-03)

new:

  • N/A

changed

  • N/A

fixed:

  • Resolved bug that causes problems when loading PyTorch models (#478)

Published by amaiya about 3 years ago

ktrain - v0.33.0

0.33.0 (2023-01-14)

new:

  • Support for the latest version of transformers.

changed

  • Removed pin to transformers==4.17

fixed:

  • Changed numpy.float and numpy.int to numpy.float64 and numpy.int_ respectively, in ktrain.utils (#474)
  • Removed pandas deprecation warnings from ktrain.tabular.preprocessor (#475)
  • Ensure use_token_type_ids always exists in TransformerPreprocessor objects to ensure backwards compatibility
  • Removed reference to networkx.info, as it was removed in networkx>=3

Published by amaiya about 3 years ago

ktrain - v0.32.3

0.32.3 (2022-12-12)

new:

  • N/A

changed

  • N/A

fixed:

  • Changed NMF to accept optional parameters nmf_alpha_W and nmf_alpha_H based on changes in scikit-learn==1.2.0.
  • Change ktrain.utils to check for TensorFlow before doing a version check, so that ktrain can be imported without TensorFlow being installed.

Published by amaiya about 3 years ago

ktrain - v0.32.2

0.32.2 (2022-12-12)

new:

  • N/A

changed

  • N/A

fixed:

  • Changed call to NMF to use alpha_W instead of alpha, as alpha parameter was removed in scikit-learn==1.2. (#470)
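
Code that has to run against several scikit-learn versions can pick the regularization keyword by version. This is a rough sketch, not ktrain's implementation: nmf_regularization_kwargs is a hypothetical helper, and it assumes a plain numeric version string such as "1.2.0" and that alpha_W is available from scikit-learn 1.0 onward.

```python
def nmf_regularization_kwargs(sklearn_version, alpha=0.0):
    """Return the regularization keyword to pass to sklearn.decomposition.NMF.

    Hypothetical helper: `alpha` was removed in scikit-learn 1.2, and
    `alpha_W` is assumed to exist from 1.0 onward.
    """
    major, minor = (int(part) for part in sklearn_version.split(".")[:2])
    if (major, minor) >= (1, 0):
        return {"alpha_W": alpha}
    return {"alpha": alpha}


print(nmf_regularization_kwargs("1.2.0", alpha=0.1))
print(nmf_regularization_kwargs("0.24.2", alpha=0.1))
# Usage idea: NMF(n_components=10, **nmf_regularization_kwargs(sklearn.__version__))
```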

Published by amaiya about 3 years ago

ktrain - v0.32.1

0.32.1 (2022-12-11)

new:

  • N/A

changed

  • N/A

fixed:

  • In TensorFlow 2.11, the tf.optimizers.Optimizer base class points to the new Keras optimizer, which seems to have problems. Users should use the legacy optimizers in tf.keras.optimizers.legacy with ktrain (which evidently will never be deleted). This means that, in TF 2.11, supplying a string representation of an optimizer like "adam" to model.compile uses the new optimizer instead of the legacy optimizers. In these cases, ktrain will issue a warning and automatically recompile the model with the default tf.keras.optimizers.legacy.Adam optimizer.
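
To avoid the automatic recompile, a model can be compiled with a legacy optimizer object instead of the string "adam". The sketch below is one possible way to do that; both helper names are hypothetical, and the lazy import assumes a TensorFlow release that ships tf.keras.optimizers.legacy (2.11 and the releases shortly after it).

```python
def string_optimizer_is_new_style(tf_version):
    """True if a string like "adam" resolves to the new optimizer (TF >= 2.11)."""
    major, minor = (int(part) for part in tf_version.split(".")[:2])
    return (major, minor) >= (2, 11)


def make_legacy_adam(learning_rate=1e-3):
    """Build a legacy Adam optimizer (hypothetical convenience wrapper)."""
    import tensorflow as tf  # imported lazily so this module loads without TF

    return tf.keras.optimizers.legacy.Adam(learning_rate=learning_rate)


# Usage idea, given an existing Keras `model`:
# model.compile(optimizer=make_legacy_adam(), loss="categorical_crossentropy")
print(string_optimizer_is_new_style("2.11.0"))
```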

Published by amaiya about 3 years ago

ktrain - v0.32.0

0.32.0 (2022-12-08)

new:

  • Support for TensorFlow 2.11. For now, as recommended in the TF release notes, ktrain has been changed to use the legacy optimizers in tf.keras.optimizers.legacy. This means that, when compiling Keras models, you should supply tf.keras.optimizers.legacy.Adam() instead of the string "adam".
  • Support for Python 3.10. Changed references from CountVectorizer.get_feature_names to CountVectorizer.get_feature_names_out. Updated supported versions in setup.py.

changed

  • N/A

fixed:

  • fixed error in docs

Published by amaiya about 3 years ago

ktrain - v0.31.10

0.31.10 (2022-10-01)

new:

  • N/A

changed

  • N/A

fixed:

  • Adjusted tika imports due to issue with /tmp/tika.log in multi-user scenario

Published by amaiya over 3 years ago

ktrain - v0.31.9

0.31.9 (2022-09-24)

new:

  • N/A

changed

  • N/A

fixed:

  • Adjustment for kwe
  • Fixed problem with importing ktrain without TensorFlow installed

Published by amaiya over 3 years ago

ktrain - v0.31.8

0.31.8 (2022-09-08)

new:

  • N/A

changed

  • N/A

fixed:

  • Fixed paragraph tokenization in AnswerExtractor

Published by amaiya over 3 years ago

ktrain - v0.31.7

0.31.7 (2022-08-04)

new:

  • N/A

changed

  • re-arranged dep warnings for TF
  • ktrain now pinned to transformers==4.17.0. Python 3.6 users can downgrade to transformers==4.10.3 and still use ktrain.

fixed:

  • N/A

Published by amaiya over 3 years ago

ktrain - v0.31.6

0.31.6 (2022-08-02)

new:

  • N/A

changed

  • updated dependencies to work with newer versions (but temporarily continue pinning to transformers==4.10.1)

fixed:

  • fixes for newer networkx

Published by amaiya over 3 years ago

ktrain - v0.31.5

0.31.5 (2022-08-01)

new:

  • N/A

changed

  • N/A

fixed:

  • fix release

Published by amaiya over 3 years ago

ktrain - v0.31.4

0.31.4 (2022-08-01)

new:

  • N/A

changed

  • TextPredictor.explain and ImagePredictor.explain now use a different fork of eli5: pip install https://github.com/amaiya/eli5-tf/archive/refs/heads/master.zip

fixed:

  • Fixed loss_fn_from_model function to work with DISABLE_V2_BEHAVIOR properly
  • TextPredictor.explain and ImagePredictor.explain now work with tensorflow>=2.9 and scipy>=1.9 (due to new eli5-tf fork -- see above)

Published by amaiya over 3 years ago

ktrain - v0.31.3

0.31.3 (2022-07-15)

new:

  • N/A

changed

  • added alnum check and period check to KeywordExtractor

fixed:

  • fixed bug in text.qa.core caused by previous refactoring of paragraph_tokenize and tokenize

Published by amaiya over 3 years ago

ktrain - v0.31.2

0.31.2 (2022-05-20)

new:

  • N/A

changed

  • added truncate_to argument (default: 5000) and minchars argument (default: 3) to KeywordExtractor.extract_keywords method.
  • added score_by argument to KeywordExtractor.extract_keywords. Default is freqpos, which means keywords are now ranked by a combination of frequency and position in document.

fixed:

  • N/A

Published by amaiya almost 4 years ago

ktrain - v0.31.1

0.31.1 (2022-05-17)

new:

  • N/A

changed

  • Allow for returning prediction probabilities when merging tokens in sequence-tagging (PR #445)
  • added basic ML pipeline test to workflow using latest TensorFlow

fixed:

  • N/A

Published by amaiya almost 4 years ago

ktrain - v0.31.0

0.31.0 (2022-05-07)

new:

  • The text.ner.models.sequence_tagger now supports word embeddings from non-BERT transformer models (e.g., roberta-base, codebert). Thanks to @Niekvdplas.
  • Custom tokenization can now be used in sequence-tagging even when using transformer word embeddings. See custom_tokenizer argument to NERPredictor.predict.

changed

  • [breaking change] In the text.ner.models.sequence_tagger function, the bilstm-bert model is now called bilstm-transformer and the bert_model parameter has been renamed to transformer_model.
  • [breaking change] The syntok package is now used as the default tokenizer for NERPredictor (sequence-tagging prediction). To use the tokenization scheme from older versions of ktrain, you can import the re and string packages and supply this function to the custom_tokenizer argument: lambda s: re.compile(f"([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])").sub(r" \1 ", s).split().
  • Code base was reformatted using black and isort
  • ktrain now supports TIKA for text extraction in the text.textractor.TextExtractor package with the use_tika=True argument as default. To use the old-style text extraction based on the textract package, you can supply use_tika=False to TextExtractor.
  • removed warning about sentence pair classification to avoid confusion
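
The older tokenization scheme mentioned above can be wrapped in a named function so the regular expression is compiled once. This sketch reuses the expression from the note unchanged; the name legacy_tokenize is an illustrative choice, not a ktrain function.

```python
import re
import string

# Same pattern as in the release note: put spaces around each punctuation mark.
_PUNCT = re.compile(f"([{string.punctuation}“”¨«»®´·º½¾¿¡§£₤‘’])")


def legacy_tokenize(text):
    """Split text on whitespace after isolating punctuation characters."""
    return _PUNCT.sub(r" \1 ", text).split()


print(legacy_tokenize("Paul Newman (1925-2008) was born in Ohio."))
# Usage idea: predictor.predict(text, custom_tokenizer=legacy_tokenize)
```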

fixed:

  • N/A

Published by amaiya almost 4 years ago

ktrain - v0.30.0

0.30.0 (2022-03-28)

new:

  • ktrain now supports simple, fast, and robust keyphrase extraction with the ktrain.text.kw.KeywordExtractor module
  • ktrain now only issues a warning if TensorFlow is not installed, instead of halting and preventing further use. This means that pre-trained PyTorch models (e.g., text.zsl.ZeroShotClassifier) and sklearn models (e.g., text.eda.TopicModel) in ktrain can now be used without having TensorFlow installed.
  • text.qa.SimpleQA and text.qa.AnswerExtractor now both support PyTorch with optional quantization (use framework='pt' for PyTorch version)
  • text.zsl.ZeroShotClassifier, text.translation.Translator, and text.translation.EnglishTranslator all support a quantize argument.
  • pretrained image-captioning and object-detection via transformers are now supported

changed

  • reorganized imports
  • localized seqeval
  • The half parameter to text.translation.Translator, and text.translation.EnglishTranslator was changed to quantize and now supports both CPU and GPU.
  • TFDataset and SequenceDataset classes must now be imported as ktrain.dataset.TFDataset and ktrain.dataset.SequenceDataset.

fixed:

  • N/A

Published by amaiya almost 4 years ago

ktrain - v0.29.3

0.29.3 (2022-03-09)

new:

  • NERPredictor.predict now includes a return_offsets parameter. If True, the results will include character offsets of predicted entities.

changed

  • In eda.TopicModel, changed lda_max_iter to max_iter and nmf_alpha to alpha
  • Added show_counts parameter to TopicModel.get_topics method
  • Changed qa.core._process_question to qa.core.process_question
  • In qa.core, added remove_english_stopwords and and_np parameters to process_question
  • The valley learning rate suggestion is now returned in learner.lr_estimate and learner.lr_plot (when suggest=True supplied to learner.lr_plot)

fixed:

  • save TransformerEmbedding model, tokenizer, and configuration when saving NERPredictor and reset te_model to facilitate loading NERPredictors with BERT embeddings offline (#423)
  • switched from keras2onnx to tf2onnx, which supports newer versions of TensorFlow

Published by amaiya almost 4 years ago

ktrain - v0.29.2

0.29.2 (2022-02-09)

new:

  • N/A

changed

  • N/A

fixed:

  • added get_tokenizer call to TransformersPreprocessor._load_pretrained to address issue #416

Published by amaiya about 4 years ago

ktrain - v0.29.1

0.29.1 (2022-02-08)

new:

  • N/A

changed

  • pin to sklearn==0.24.2 due to breaking changes. This scikit-learn version change only really affects TextPredictor.explain. The eli5 fork supporting tf.keras was updated for scikit-learn 0.24.2. To use scikit-learn==0.24.2, users must uninstall and re-install the eli5 fork with: pip install https://github.com/amaiya/eli5/archive/refs/heads/tfkeras_0_10_1.zip.

fixed:

  • N/A

Published by amaiya about 4 years ago

ktrain - v0.29.0

0.29.0 (2022-01-28)

new:

  • New vision models: added MobileNetV3-Small and EfficientNet. Thanks to @ilos-vigil.

changed

  • core.Learner.plot now supports plotting of any value that exists in the training History object (e.g., mae if previously specified as metric). Thanks to @ilos-vigil.
  • added raw_confidence parameter to QA.ask method to return raw confidence scores. Thanks to @ilos-vigil.

fixed:

  • pin to transformers==4.10.3 due to Issue #398
  • pin to syntok==1.3.3 due to bug with syntok==1.4.1 causing paragraph tokenization in qa module to break
  • properly suppress TF/CUDA warnings by default
  • ensure document fed to keras_bert tokenizer to avoid this issue

Published by amaiya about 4 years ago

ktrain - v0.28.3

0.28.3 (2021-11-05)

new:

  • speech transcription support

changed

  • N/A

fixed:

  • N/A

Published by amaiya over 4 years ago

ktrain - v0.28.2

0.28.2 (2021-10-17)

new:

  • N/A

changed

  • minor fix to installation due to pypi

fixed:

  • N/A

Published by amaiya over 4 years ago

ktrain - v0.28.1

0.28.1 (2021-10-17)

New:

  • N/A

Changed

  • added extra_requirements to setup.py
  • changed imports for summarization, translation, qa, and zsl in notebooks and tests

Fixed:

  • N/A

Published by amaiya over 4 years ago

ktrain - v0.28.0

0.28.0 (2021-10-13)

New:

  • text.AnswerExtractor is a universal information extractor powered by a Question-Answering module and capable of extracting user-specified information from texts.
  • text.TextExtractor is a text extraction pipeline (e.g., convert PDFs to plain text)

Changed

  • changed transformers pin to transformers>=4.0.0,<=4.10.3

Fixed:

  • N/A

Published by amaiya over 4 years ago

ktrain - v0.27.3

0.27.3 (2021-09-03)

New:

  • N/A

Changed

  • N/A

Fixed:

  • SimpleQA now can load PyTorch question-answering checkpoints
  • change API call to support newest causalnlp

Published by amaiya over 4 years ago

ktrain - v0.27.2

0.27.2 (2021-07-28)

New:

  • N/A

Changed

  • N/A

Fixed:

  • check for logits attribute when predicting using transformers
  • change raised Exception to warning for longer sequence lengths for transformers

Published by amaiya over 4 years ago

ktrain - v0.27.1

0.27.1 (2021-07-20)

New:

  • N/A

Changed

  • Added method parameter to tabular.causal_inference_model.

Fixed:

  • N/A

Published by amaiya over 4 years ago

ktrain - v0.27.0

0.27.0 (2021-07-20)

New:

  • Added tabular.causal_inference_model function for causal inference support.

Changed

  • N/A

Fixed:

  • N/A

Published by amaiya over 4 years ago

ktrain - v0.26.5

0.26.5 (2021-07-15)

New:

  • N/A

Changed

  • added query parameter to SimpleQA.ask so that an alternative query can be used to retrieve contexts from corpus
  • added chardet as dependency for stellargraph

Fixed:

  • fixed issue with TopicModel.build when threshold=None

Published by amaiya over 4 years ago

ktrain - v0.26.4

0.26.4 (2021-06-23)

New:

  • API documentation index

Changed

  • Added warning when a TensorFlow version of selected transformers model is not available and the PyTorch version is being downloaded and converted instead using from_pt=True.

Fixed:

  • Fixed utils.metrics_from_model to support alternative metrics
  • Check for AUC in ktrain.utils "inspect" function

Published by amaiya over 4 years ago

ktrain - v0.26.3

0.26.3 (2021-05-19)

New:

  • N/A

Changed

  • shallownlp.ner.NER.predict processes lists of sentences in batches resulting in faster predictions
  • batch_size argument added to shallownlp.ner.NER.predict
  • added verbose parameter to ktrain.text.textutils.extract_copy to optionally see why each skipped document was skipped

Fixed:

  • Changed TextPredictor.save to save Hugging Face tokenizer files locally to ensure they can be easily reloaded when text.Transformer is supplied with local path.
  • For transformers models, the predictor.preproc.model_name variable is automatically updated to be new Predictor folder to avoid having users manually update model_name. Applies when a local path is supplied to text.Transformer and resultant Predictor is moved to new machine.

Published by amaiya almost 5 years ago

ktrain - v0.26.2

0.26.2 (2021-03-26)

New:

  • N/A

Changed

  • NERPredictor.predict now optionally accepts lists of sentences to make sequence-labeling predictions in batches (as all other Predictor instances already do).

Fixed:

  • N/A

Published by amaiya almost 5 years ago

ktrain - v0.26.1

0.26.1 (2021-03-11)

New:

  • N/A

Changed

  • expose errors from transformers in _load_pretrained
  • changed TextPreprocessor.check_trained to be a warning instead of Exception

Fixed:

  • N/A

Published by amaiya almost 5 years ago

ktrain - v0.26.0

0.26.0 (2021-03-10)

New:

  • Support for transformers 4.0 and above.

Changed

  • added set_tokenizer to TransformerPreprocessor
  • show error message when original weights cannot be saved (for reset_weights method)

Fixed:

  • cast filename to string before concatenating with suffix in images_from_csv and images_from_df (addresses issue #330)
  • resolved import error for sklearn>=0.24.0, but eli5 still requires sklearn<0.24.0.

Published by amaiya almost 5 years ago

ktrain - v0.25.4

0.25.4 (2021-01-26)

New:

  • N/A

Changed

  • N/A

Fixed:

  • fixed problem with LabelEncoder not properly being stored when texts_from_df is invoked
  • refrain from invoking max on empty sequence (#307)
  • corrected issue with return_proba=True in NER predictions (#316)

Published by amaiya about 5 years ago

ktrain - v0.25.3

0.25.3 (2020-12-23)

New:

  • N/A

Changed

  • A steps_per_epoch argument has been added to all *fit* methods that operate on generators
  • Added get_tokenizer methods to all instances of TextPreprocessor

Fixed:

  • propagate custom metrics to model when distilbert is chosen in text_classifier and text_regression_model functions
  • pin scikit-learn to 0.24.0 due to breaking change

Published by amaiya about 5 years ago

ktrain - v0.25.2

0.25.2 (2020-12-05)

New:

  • N/A

Changed

  • N/A

Fixed:

  • Added custom_objects argument to load_predictor to load models with custom loss functions, etc.
  • Fixed bug #286 related to length computation when use_dynamic_shape=True

Published by amaiya about 5 years ago

ktrain - v0.25.1

0.25.1 (2020-12-02)

New:

  • N/A

Changed

  • Added use_dynamic_shape parameter to text.preprocessor.hf_convert_examples, which is set to True when running predictions. This reduces the input length when making predictions, if possible.
  • Added warnings to some imports in imports.py to allow for slightly lighter-weight deployments
  • Temporarily pinning to transformers>=3.1,<4.0 due to breaking changes in v4.0.

Fixed:

  • Suppress progress bar in predictor.predict for keras_bert models
  • Fixed typo causing problems when loading predictor for Inception models
  • Fixes to address documented/undocumented breaking changes in transformers>=4.0. But, temporarily pinning to transformers>=3.1,<4.0 for backwards compatibility.

Published by amaiya about 5 years ago

ktrain - v0.25.0

0.25.0 (2020-11-08)

New:

  • The SimpleQA.index_from_folder method now supports text extraction from many file types including PDFs, MS Word documents, and MS PowerPoint files (i.e., set use_text_extraction=True to use this feature).

Changed

  • The default in SimpleQA.index_from_list and SimpleQA.index_from_folder has been changed to breakup_docs=True.

Fixed:

  • N/A

Published by amaiya over 5 years ago

ktrain - v0.24.2

0.24.2 (2020-11-07)

New:

  • N/A

Changed

  • ktrain.text.textutils.extract_copy now uses textract to extract text from many file types (e.g., PDF, DOC, PPT) instead of just PDFs.

Fixed:

  • N/A

Published by amaiya over 5 years ago

ktrain - v0.24.1

0.24.1 (2020-11-06)

New:

  • N/A

Changed

  • N/A

Fixed:

  • Change exception in model ID check in Translator to warning to better allow offline language translations

Published by amaiya over 5 years ago

ktrain - v0.24.0

0.24.0 (2020-11-05)

New:

  • Predictor instances now provide built-in support for exporting to TensorFlow Lite and ONNX.

Changed

  • N/A

Fixed:

  • N/A

Published by amaiya over 5 years ago

ktrain - v0.23.2

0.23.2 (2020-10-27)

New:

  • N/A

Changed

  • Use fast tokenizers for the following Hugging Face transformers models: BERT, DistilBERT, and RoBERTa. This change affects models created with either text.Transformer(...) or text.text_classifier('distilbert', ...). BERT models created with text.text_classifier('bert', ...), which uses keras_bert instead of transformers, are not affected by this change.

Fixed:

  • N/A

Published by amaiya over 5 years ago

ktrain - v0.23.1

0.23.1 (2020-10-26)

New:

  • N/A

Changed

  • N/A

Fixed:

  • Resolved issue in qa.ask method occurring with embedding computations when full answer sentences exceed 512 tokens.

Published by amaiya over 5 years ago

ktrain - v0.23.0

0.23.0 (2020-10-16)

New:

  • Support for upcoming release of TensorFlow 2.4 such as removal of references to obsolete multi_gpu_model

Changed

  • [breaking change] TopicModel.get_docs now returns a list of dicts instead of a list of tuples. Each dict has keys: text, doc_id, topic_proba, topic_id.
  • added TopicModel.get_document_topic_distribution
  • added TopicModel.get_sorted_docs method to return all documents sorted by relevance to a given topic_id
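
Code that consumed the old tuple output needs to switch to key lookups. The sketch below works on hand-written sample records that only mimic the documented keys (text, doc_id, topic_proba, topic_id); the values and the group_by_topic helper are made up for illustration.

```python
from collections import defaultdict

# Sample records shaped like the documented dicts; the values are invented.
docs = [
    {"text": "GPU prices fell again", "doc_id": 0, "topic_proba": 0.91, "topic_id": 3},
    {"text": "New graphics card review", "doc_id": 7, "topic_proba": 0.64, "topic_id": 3},
    {"text": "Shuttle launch delayed", "doc_id": 2, "topic_proba": 0.88, "topic_id": 5},
]


def group_by_topic(docs):
    """Map each topic_id to its doc_ids, most probable documents first."""
    grouped = defaultdict(list)
    for doc in sorted(docs, key=lambda d: d["topic_proba"], reverse=True):
        grouped[doc["topic_id"]].append(doc["doc_id"])
    return dict(grouped)


print(group_by_topic(docs))
```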

Fixed:

  • Changed version check warning in lr_find to a raised Exception to avoid confusion when warnings from ktrain are suppressed
  • Pass verbose parameter to hf_convert_examples

Published by amaiya over 5 years ago

ktrain - v0.22.4

0.22.4 (2020-10-12)

New:

  • N/A

Changed

  • changed qa.core.display_answers to make URLs open in new tab

Fixed:

  • pin to seqeval==0.0.19 due to numpy version incompatibility with latest TensorFlow and to suppress errors during installation

Published by amaiya over 5 years ago

ktrain - v0.22.3

0.22.3 (2020-10-09)

New:

  • N/A

Changed

  • N/A

Fixed:

  • fixed issue with missing noun phrase at end of sentence in extract_noun_phrases
  • fixed TensorFlow versioning issues with utils.metrics_from_model

Published by amaiya over 5 years ago

ktrain - v0.22.2

0.22.2 (2020-10-09)

New:

  • added extract_noun_phrases to textutils

Changed

  • SimpleQA.ask now includes an include_np parameter. When True, noun phrases will be used to retrieve documents containing candidate answers.

Fixed:

  • N/A

Published by amaiya over 5 years ago

ktrain - v0.22.1

0.22.1 (2020-10-08)

New:

  • N/A

Changed

  • added optional references argument to SimpleQA.index_from_list
  • added min_words argument to SimpleQA.index_from_list and SimpleQA.index_from_folder to prune small documents or paragraphs that are unlikely to include good answers
  • qa.display_answers now supports hyperlinks for document references
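
The idea behind min_words can be pictured as a word-count filter over paragraphs. This is only a rough sketch under stated assumptions: split_and_prune is a hypothetical function, paragraphs are assumed to be separated by blank lines, and the comparison (at least min_words words) may differ from what ktrain actually does.

```python
def split_and_prune(docs, min_words=20):
    """Split documents into paragraphs and keep those with enough words."""
    kept = []
    for doc in docs:
        for paragraph in doc.split("\n\n"):
            paragraph = paragraph.strip()
            if len(paragraph.split()) >= min_words:
                kept.append(paragraph)
    return kept


sample = ["short one\n\nthis paragraph has exactly six words"]
print(split_and_prune(sample, min_words=5))
```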

Fixed:

  • N/A

Published by amaiya over 5 years ago

ktrain - v0.22.0

0.22.0 (2020-10-06)

New:

  • added breakup_docs argument to index_from_list and index_from_folder that potentially speeds up ask method substantially
  • added batch_size argument to ask and set default at 8 for faster answer-retrieval

Changed

  • refactored QA and SimpleQA for better extensibility

Fixed:

  • Ensure save_path is correctly processed in Learner.evaluate

Published by amaiya over 5 years ago

ktrain - v0.21.4

0.21.4 (2020-09-24)

New:

  • N/A

Changed

  • Changed installation instructions in README.md to reflect that using ktrain with TensorFlow 2.1 will require downgrading transformers to 3.1.0.
  • updated requirements with keras_bert>=0.86.0 due to TensorFlow 2.3 error with older versions of keras_bert
  • In lr_find and lr_plot, check for TF 2.2 or 2.3 and make necessary adjustments due to TF bug 41174.

Fixed:

  • fixed typos in __all__ in text and graph modules (PR #250)
  • fixed Chinese language translation based on name-changes of models with zh as source language

Published by amaiya over 5 years ago

ktrain - v0.21.3

0.21.3 (2020-09-08)

New:

  • N/A

Changed

  • added TopicModel.get_word_weights method to retrieve the word weights for a given topic
  • added return_fig option to Learner.lr_plot and Learner.plot, which allows the matplotlib Figure to be returned to user

Fixed:

  • N/A

Published by amaiya over 5 years ago

ktrain - v0.21.2

0.21.2 (2020-09-03)

New:

  • N/A

Changed

  • SUPPRESS_KTRAIN_WARNINGS environment variable changed to SUPPRESS_DEP_WARNINGS

Fixed:

  • N/A

Published by amaiya over 5 years ago

ktrain - v0.21.1

0.21.1 (2020-09-03)

New:

  • N/A

Changed

  • added num_beams and early_stopping arguments to translate methods in translation module that can be set to improve translation speed
  • added half parameter to Translator constructor

Fixed:

  • N/A

Published by amaiya over 5 years ago

ktrain - v0.21.0

0.21.0 (2020-09-03)

New:

  • Added translate_sentences method to Translator class that translates list of sentences, where list is fed to model as single batch

Changed

  • Removed TensorFlow dependency from setup.py to allow users to use ktrain with any version of TensorFlow 2 they choose.
  • Added truncation=True to tokenization in summarization module
  • Require transformers>=3.1.0 due to breaking changes
  • SUPPRESS_TF_WARNINGS environment variable changed to SUPPRESS_KTRAIN_WARNINGS

Fixed:

  • Use prepare_seq2seq_batch instead of prepare_translation_batch in translation module due to breaking change in transformers==3.1.0

Published by amaiya over 5 years ago

ktrain - v0.20.2

0.20.2 (2020-08-27)

New:

  • N/A

Changed

  • N/A

Fixed:

  • Always use *Auto* classes to load transformers models to prevent loading errors

Published by amaiya over 5 years ago

ktrain - v0.20.1

0.20.1 (2020-08-25)

New:

  • N/A

Changed

  • N/A

Fixed:

  • Added missing torch.no_grad() scope in text.translation and text.summarization modules

Published by amaiya over 5 years ago

ktrain - v0.20.0

0.20.0 (2020-08-24)

New:

  • added nli_template parameter to ZeroShotClassifier.predict to allow versatility in the kinds of labels that can be predicted
  • efficiency improvements to ZeroShotClassifier.predict that allow faster predictions on large sequences of documents and a large number of labels to predict
  • added multilabel parameter to ZeroShotClassifier.predict
  • added labels parameter to ZeroShotClassifier.predict, an alias to topic_strings parameter
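
In NLI-based zero-shot classification, each candidate label is turned into a hypothesis sentence by filling a template, which is what nli_template controls. The sketch below shows only that string-building step; build_hypotheses is a hypothetical helper, and the default template shown is an assumption rather than a confirmed ktrain default.

```python
def build_hypotheses(labels, nli_template="This text is about {}."):
    """Fill the template once per label to get NLI hypothesis sentences."""
    return [nli_template.format(label) for label in labels]


print(build_hypotheses(["politics", "sports"]))
print(build_hypotheses(["negative", "positive"], nli_template="The sentiment of this review is {}."))
```

Changing the template lets the same model score labels that are not topics, such as sentiment or intent.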

Changed

  • N/A

Fixed:

  • Allow variations on accuracy metric such as binary_accuracy when inspecting model in is_classifier

Published by amaiya over 5 years ago

ktrain - v0.19.9

0.19.9 (2020-08-17)

New:

  • N/A

Changed

  • N/A

Fixed:

  • In texts_from_array, check class_names only after preprocessing when printing classification vs. regression status.

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.19.8

0.19.8 (2020-08-17)

New:

  • N/A

Changed

  • N/A

Fixed:

  • In TextPreprocessor instances, correctly reset class_names when targets are in string format.

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.19.7

0.19.7 (2020-08-16)

New:

  • N/A

Changed

  • added class_weight parameter to lr_find for imbalanced datasets
  • removed pins for cchardet and scikitlearn from setup.py
  • added version check for eli5 fork
  • removed scipy pin from setup.py
  • Allow TensorFlow 2.3 for Python 3.8
  • Request manual installation of shap in TabularPredictor.explain instead of inclusion in setup.py

Fixed:

  • N/A

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.19.6

0.19.6 (2020-08-12)

New:

  • N/A

Changed

  • N/A

Fixed:

  • include metrics check in is_classifier function to support non-standard loss functions

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.19.5

0.19.5 (2020-08-11)

New:

  • N/A

Changed

  • N/A

Fixed:

  • Ensure transition to YTransform is backwards compatible for StandardTextPreprocessor and BertPreprocessor

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.19.4

0.19.4 (2020-08-10)

New:

  • N/A

Changed

  • TextPreprocessor instances now use YTransform class to transform targets
  • texts_from_df, texts_from_csv, and texts_from_array employ the use of either YTransformDataFrame or YTransform
  • images_from_df, images_from_fname, images_from_csv, and images_from_array use YTransformDataFrame or YTransform
  • Extra imports removed from PyTorch-based zsl.core.ZeroShotClassifier and summarization.core.TransformerSummarizer. If necessary, both can now be used without having TensorFlow installed by installing ktrain using --no-deps and importing these modules using a method like this.

Fixed:

  • N/A

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.19.3

0.19.3 (2020-08-05)

New:

  • N/A

Changed

  • NERPredictor.predict was changed to accept an optional custom_tokenizer argument

Fixed:

  • N/A

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.19.2

0.19.2 (2020-08-03)

New:

  • N/A

Changed

  • N/A

Fixed:

  • added missing num_classes argument to to_categorical

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.19.1

0.19.1 (2020-07-29)

New:

  • N/A

Changed

  • Adjusted no_grad scope in ZeroShotClassifier.predict

Fixed:

  • N/A

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.19.0

0.19.0 (2020-07-29)

New:

  • support for tabular data including explainable AI for tabular predictions
  • learner.validate and learner.evaluate now support regression models
  • added restore_weights_only flag to lr_find. When True, only the model weights will be restored after simulating training, not the optimizer weights. In at least a few observed cases, this "warm up" seems to improve performance when actual training begins. Further investigation is needed, so it is False by default.

Changed

  • N/A

Fixed:

  • added save_path argument to Learner.validate and Learner.evaluate. If print_report=False, classification report will be saved as CSV to save_path.
  • Use torch.no_grad with ZeroShotClassifier.predict to prevent OOM
  • Added max_length parameter to ZeroShotClassifier.predict to prevent errors on long documents
  • Added type check to TransformersPreprocessor.preprocess_train

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.18.5

0.18.5 (2020-07-20)

New:

  • N/A

Changed

  • N/A

Fixed:

  • Changed qa module to use 'Auto' classes when loading QuestionAnswering models and tokenizer
  • try from_pt=True for qa module if initial model-loading fails
  • use get_hf_model_name in qa module

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.18.4

0.18.4 (2020-07-17)

New:

  • N/A

Changed

  • N/A

Fixed:

  • return gracefully if no documents match question in qa module
  • tokenize question in qa module to ensure all candidate documents are returned
  • Added error in text.preprocessor when training set has incomplete integer labels
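As an illustration of the last fix, the sketch below shows one way to detect incomplete integer labels, meaning a training set whose integer class IDs skip a value. The function check_integer_labels and its error message are hypothetical and are not ktrain's actual implementation. It assumes a non-empty list of non-negative integers.

```python
def check_integer_labels(y):
    """Return the sorted class IDs, or raise if any ID in 0..max(y) is missing.

    Hypothetical check; assumes y is a non-empty list of non-negative ints.
    """
    classes = sorted(set(y))
    expected = list(range(max(classes) + 1))
    if classes != expected:
        missing = sorted(set(expected) - set(classes))
        raise ValueError(f"training labels are missing class IDs: {missing}")
    return classes


print(check_integer_labels([0, 1, 2, 1]))  # complete: IDs 0, 1, and 2 all occur

try:
    check_integer_labels([0, 2, 2])  # incomplete: ID 1 never occurs
except ValueError as err:
    print(err)
```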

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.18.3

0.18.3 (2020-07-12)

New:

  • added batch_size argument to ZeroShotClassifier.predict that can be increased to speed up predictions. This is especially useful if len(topic_strings) is large.
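To see why a larger batch size helps when len(topic_strings) is large, consider that an NLI-based zero-shot classifier scores every (document, label) pair. The sketch below only illustrates that arithmetic. The function batched_pairs is hypothetical, and how ktrain groups inputs internally is not described in these notes.

```python
from itertools import product


def batched_pairs(docs, topic_strings, batch_size=8):
    """Yield lists of (document, label) pairs, at most batch_size pairs each.

    Hypothetical illustration of batching; not ktrain's internal logic.
    """
    pairs = list(product(docs, topic_strings))
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


docs = ["doc one", "doc two"]
topics = ["politics", "sports", "science", "health", "travel"]

# 2 documents x 5 labels = 10 pairs to score.
small = list(batched_pairs(docs, topics, batch_size=4))
large = list(batched_pairs(docs, topics, batch_size=10))
print(len(small), len(large))
```

Under these assumptions, each batch corresponds to one forward pass, so fewer and larger batches mean fewer model calls, at the cost of more memory per call.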

Changed

  • N/A

Fixed:

  • fixed typo in load_predictor error message

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.18.2

0.18.2 (2020-07-08)

New:

  • N/A

Changed

  • updated doc comments in core module
  • removed unused nosave parameter from reset_weights
  • added warning about obsolete show_wd parameter in print_layers method
  • pin to scipy==1.4.1 due to TensorFlow requirement

Fixed:

  • N/A

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.18.1

0.18.1 (2020-07-07)

New:

  • N/A

Changed

  • Use tensorflow==2.1.0 if Python 3.6/3.7 and use tensorflow==2.2.0 only if on Python 3.8 due to TensorFlow v2.2.0 issues
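The version rule above can be sketched as a small selection function. This is an illustrative reconstruction only: the function tensorflow_requirement is hypothetical, and the way ktrain's setup.py actually expresses the rule is not shown in these notes.

```python
import sys


def tensorflow_requirement(version_info=sys.version_info):
    """Pick the TensorFlow pin described in the 0.18.1 notes.

    Hypothetical helper: 2.2.0 only on Python 3.8 and later, otherwise 2.1.0.
    """
    if tuple(version_info[:2]) >= (3, 8):
        return "tensorflow==2.2.0"
    return "tensorflow==2.1.0"


print(tensorflow_requirement((3, 7)))
print(tensorflow_requirement((3, 8)))
```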

Fixed:

  • N/A

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.18.0

0.18.0 (2020-07-07)

New:

  • N/A

Changed

  • Fixes to address changes or issues in TensorFlow 2.2.0:
    • created metrics_from_model function due to changes in the way metrics are extracted from compiled model
    • use loss_fn_from_model function due to changes in the way loss functions are extracted from compiled model
    • added **kwargs to AdamWeightDecay based on this issue
    • changed TransformerTextClassLearner.predict and TextPredictor.predict to deal with tuples being returned by predict in TensorFlow 2.2.0
    • changed multilabel test to use loss instead of accuracy due to TF 2.2.0 issue
    • changed Learner.lr_find to use save_model and load_model to restore weights due to this TF issue and added TransformersPreprocessor.load_model_and_configure_from_data to support this

Fixed:

  • N/A

- Jupyter Notebook
Published by amaiya over 5 years ago

ktrain - v0.17.5

0.17.5 (2020-07-02)

New:

  • N/A

Changed

  • N/A

Fixed:

  • Explicitly supply truncate='longest_first' to prevent sentence pair classification from breaking in transformers==3.0.0
  • Fixed typo in encode_plus invocation

- Jupyter Notebook
Published by amaiya over 5 years ago