Recent Releases of https://github.com/johnsnowlabs/johnsnowlabs

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.1.0 Release

John Snow Labs 6.1.0 has been released with the following upgrades:

  • Bump Enterprise-NLP to 6.1.0
  • Bump Visual-NLP to 6.1.0
  • Bump Spark-NLP to 6.1.1

- Python
Published by C-K-Loan 6 months ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.4 Release

John Snow Labs 6.0.4 comes with the following upgrades:

  • Bump Enterprise-NLP to 6.0.4
  • Bump Spark-NLP to 6.0.4

- Python
Published by C-K-Loan 7 months ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.3 Release

This release comes with improvements to the Snowflake and Docker integrations, enabling you to deploy custom pipelines, any pretrained John Snow Labs pipeline, or any NLU pipeline as a Snowflake UDF or Docker container. Additionally, Spark-NLP and Enterprise-NLP are bumped to version 6.0.3.

Improvements to Docker integration

Instead of providing an NLU model reference for deploying a model, users now have two options:

Deploy Pipeline via Name, Language and Bucket

```python
nlp.build_image(pipeline_name=pipe_name, pipeline_language=pipe_lang, pipeline_bucket=pipe_bucket)
```

Deploy a custom pipeline by providing a fitted pipeline object

```python
custom_pipe = nlp.Pipeline(stages=[...]).fit(df)
nlp.build_image(custom_pipe=custom_pipe)
```

Deploy NLU pipelines as container

Any NLU pipeline can still be deployed as a custom pipeline:

```python
# Load model
pipe = nlp.load(model_nlu_ref)

# Predict once so that under the hood the pipeline is fitted and the
# vanilla_transformer_pipe attribute is available
pipe.predict('')

# Now we can handle it like a custom pipeline
nlp.build_image(custom_pipe=pipe.vanilla_transformer_pipe)
```


Improvements to Snowflake integration

Since the Snowflake integration builds on Docker, all of the changes to the Docker utilities are reflected in the Snowflake integration as well.

The same two deployment options described above for Docker apply here: deploy a pipeline via name, language, and bucket, or deploy a custom fitted pipeline object. Likewise, any NLU pipeline can still be deployed as a Snowflake UDF via its vanilla_transformer_pipe attribute.


Version Bumps

The following dependency versions have been bumped:

  • Spark NLP to 6.0.3
  • Enterprise NLP to 6.0.3


Bug Fixes

  • Fixed bug causing errors when creating Databricks cluster with nlp.install_to_databricks()

- Python
Published by C-K-Loan 8 months ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.2 Release

John Snow Labs 6.0.2 comes with the following upgrades:

  • Bump Enterprise-NLP to 6.0.2

- Python
Published by C-K-Loan 9 months ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.1 Release

John Snow Labs 6.0.1 comes with the following upgrades:

  • Bump Spark-NLP to 6.0.1
  • Bump Enterprise-NLP to 6.0.1

- Python
Published by C-K-Loan 9 months ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.0 Release

John Snow Labs 6.0.0 comes with the following upgrades:

  • Bump Spark-NLP to 6.0.0
  • Bump Enterprise-NLP to 6.0.0
  • Bump Visual-NLP to 6.0.0

Additionally, a few bugfixes and improvements have been made:

  • Deprecated pkg_resources usage and refactored modules depending on it, which caused import errors in some Python versions
  • Made the import of _shared_pyspark_ml_param more robust, as it caused import errors in some Python versions

- Python
Published by C-K-Loan 10 months ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.5 Release

John Snow Labs 5.5.5 comes with the following upgrades:

  • Bump Medical NLP to 5.5.3
  • Bump Spark-NLP to 5.5.3

- Python
Published by C-K-Loan 11 months ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.4 Release

John Snow Labs 5.5.4 comes with the following upgrades and fixes:

  • Bump Visual NLP to 5.5.0
  • Fix bug causing browser-based install to fail if you only have PAYG licenses in my.jsl

- Python
Published by C-K-Loan about 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.3 Release

John Snow Labs 5.5.3 comes with the following upgrades:

  • Bump Enterprise NLP to 5.5.2
  • Bump Spark NLP to 5.5.2

- Python
Published by C-K-Loan about 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.2 Release

John Snow Labs 5.5.2 comes with the following upgrades:

  • Bump Spark-NLP to 5.5.1
  • Bump Enterprise NLP to 5.5.1
  • Bump Visual NLP to 5.4.2

- Python
Published by C-K-Loan about 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.1 Release

John Snow Labs 5.5.1 comes with various improvements to the Snowflake utilities and docs.

  • New Snowflake Tutorial Notebook for creating Snowflake UDFs and calling them
  • Updated Snowflake Utility Docs
  • Automated Login to Docker Repo while creating Snowflake UDF
  • nlp.deploy_as_snowflake_udf now blocks until the Snowflake UDF is successfully created or a timeout occurs (see the sketch below)
  • Less verbose logging when creating a Snowflake UDF
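
A minimal sketch of the blocking deployment call. Only the function name and its blocking behavior come from these notes; the model reference and every connection parameter below are hypothetical placeholders, not the documented signature.

```python
from johnsnowlabs import nlp

# Hedged sketch: deploy_as_snowflake_udf blocks until the UDF is created or a timeout occurs.
# All arguments shown here are hypothetical placeholders.
nlp.deploy_as_snowflake_udf(
    'en.classify.sentiment',           # hypothetical NLU model reference
    snowflake_user='MY_USER',          # hypothetical credential parameters
    snowflake_password='MY_PASSWORD',
    snowflake_account='MY_ACCOUNT',
)
```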

- Python
Published by C-K-Loan over 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.0 Release

John Snow Labs 5.5.0 comes with the following upgrades:

  • Bump Enterprise NLP to 5.5.0
  • Bump Spark NLP to 5.5.0
  • Support for Pydantic >= 2.0

- Python
Published by C-K-Loan over 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.5 Release

John Snow Labs 5.4.5 comes with the following upgrades:

  • Bump Visual NLP to 5.4.1
  • Bump NLU to 5.4.1

- Python
Published by C-K-Loan over 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.4 Release

John Snow Labs 5.4.4 is a hotfix release for import issues with Healthcare NLP 5.4.1.

- Python
Published by C-K-Loan over 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.3 Release

John Snow Labs 5.4.3 has been released with the following upgrades:

  • Bump Spark NLP to 5.4.1
  • Bump Healthcare NLP to 5.4.1

- Python
Published by C-K-Loan over 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.2 Release

  • Bugfix for creating Docker images via nlp.build_image

- Python
Published by C-K-Loan over 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.1 Release

John Snow Labs 5.4.1 has been released with the following changes:

  • Bump NLU to 5.4.1rc1

- Python
Published by C-K-Loan over 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.0 Release

We are very excited to announce that John Snow Labs 5.4.0 has been released: the entire JSL suite has been upgraded to 5.4.0!

Changes:

  • Bump Spark NLP to 5.4.0
  • Bump Healthcare NLP to 5.4.0
  • Bump Visual NLP to 5.4.0
  • Bump NLU to 5.4.0

- Python
Published by C-K-Loan over 1 year ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.6 Library Release

  • Bump NLU to 5.3.2
  • Fix detection of the Databricks environment (https://github.com/JohnSnowLabs/johnsnowlabs/pull/1222) and follow-up bugs in endpoint environments

- Python
Published by C-K-Loan almost 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.5 Library Release

We are very excited to announce John Snow Labs 5.3.5 has been released! It features:

  • Bump NLU to 5.3.1
  • Bump Spark-NLP to 5.3.2
  • Bump Medical NLP to 5.3.2
  • Bump Visual NLP to 5.3.2
  • Fixed bug that caused nbformat import exceptions

- Python
Published by C-K-Loan almost 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.4 Library Release

John Snow Labs 5.3.4 includes:

  • Bump Visual NLP to 5.3.1
  • Bump Medical NLP to 5.3.1
  • Fix bug with test paths

- Python
Published by C-K-Loan almost 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - Snowflake UDFs and Docker Image creation for NLP, Healthcare and Visual Models in John Snow Labs 5.3.3

John Snow Labs 5.3.3 has been released with support for creating Snowflake UDFs from John Snow Labs models, as well as easy containerization of any John Snow Labs model, bundled with a simple REST API that supports text and file prediction for any file type supported by nlp.load(), like PDFs, images, and more!

For more info, see the new official documentation pages: the Docker utility documentation and the Snowflake utility documentation.
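
As a rough sketch of the workflow under stated assumptions: nlp.build_image appears elsewhere in these notes, but the positional model reference, REST host, port, and route below are assumptions for illustration only.

```python
import requests

from johnsnowlabs import nlp

# Build a Docker image that serves the model behind the bundled REST API
# (passing the NLU reference positionally is an assumption for this release)
nlp.build_image('en.med_ner.diseases')

# Hypothetical query once the container is running; host, port, and route are assumptions
resp = requests.post('http://localhost:8000/predict', json={'text': 'The patient has diabetes.'})
print(resp.json())
```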

- Python
Published by C-K-Loan almost 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.1 Library Release

  • fix bug causing unexpected browser pop-up during install
  • fix bug causing databricks install to fail

- Python
Published by C-K-Loan almost 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.1 Library Release

  • bump spark-nlp to 5.3.1

- Python
Published by C-K-Loan almost 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.0 Library Release

We are excited to announce johnsnowlabs 5.3.0 has been released!

It features:

  • Bump Spark NLP to 5.3.0
  • Bump Medical NLP to 5.3.0
  • Bump Visual NLP to 5.3.0
  • Bump NLU to 5.3.0
  • Bump Spark NLP Display to 5.0.0
  • Bugfixes for installing to existing Databricks clusters

- Python
Published by C-K-Loan almost 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.8 Library Release

Fix bug with pip install

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.7 Library Release

  • Bump pyspark to 3.4.0
  • NerConverterInternal accessible via medical.NerConverter, finance.NerConverter, and legal.NerConverter (see the sketch below)
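
A minimal sketch of the alias, assuming the usual Spark NLP NER column convention; the surrounding pipeline stages are omitted.

```python
from johnsnowlabs import medical

# NerConverterInternal, exposed under the module-specific alias
ner_converter = medical.NerConverter() \
    .setInputCols(["sentence", "token", "ner"]) \
    .setOutputCol("ner_chunk")
```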

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.6 Library Release

Fix minor bugs on Databricks marketplace causing some models to load improperly

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.5 Library Release

Fix minor bugs on Databricks marketplace causing some models to load improperly

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.4 Library Release

Fix bug on the marketplace when providing a Databricks host URL with a trailing comma

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.3 Library Release

New Databricks Model Marketplace Utils and Notebook updates and tweaks

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.2 Library Release

New Databricks Model Marketplace Utils and Notebook

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.1 Library Release

  • bump Spark NLP to 5.2.2
  • bump Enterprise NLP to 5.2.1
  • bump NLU to 5.1.3

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.0 Library Release

  • bump Spark NLP to 5.2.0
  • bump Enterprise NLP to 5.2.0
  • bump Visual NLP to 5.1.2

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.9 Library Release

  • Add IOB tagger to the legal, finance, and medical modules
  • Bugfix for missing metadata in Haystack
  • InternalDocumentSplitter for the legal, finance, and medical modules
  • Bump Visual NLP to 5.1.0
  • Bump Medical NLP to 5.1.4

- Python
Published by C-K-Loan about 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.8 Library Release

Johnsnowlabs Haystack Integrations

Johnsnowlabs provides the following nodes, which can be used inside the Haystack framework for scalable pre-processing and embedding on Spark clusters. With this you can create easily scalable, production-grade LLM and RAG applications. See the Haystack with Johnsnowlabs Tutorial Notebook and the new Haystack + Johnsnowlabs documentation.

JohnSnowLabsHaystackProcessor

Pre-process your documents in a scalable fashion in Haystack. It is based on Spark NLP's DocumentCharacterTextSplitter and supports all of its parameters.

```python
from johnsnowlabs.llm import embedding_retrieval

# Create pre-processor, which is connected to the Spark cluster
processor = embedding_retrieval.JohnSnowLabsHaystackProcessor(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)

# Process documents distributed on the Spark cluster
processor.process(some_documents)
```

JohnSnowLabsHaystackEmbedder

Scalable embedding computation with any sentence embedding from John Snow Labs in Haystack. You must provide the NLU reference of a sentence embedding to load it. If you want to use a GPU with the embedding model, set use_gpu=True; on localhost this will start a Spark session with GPU jars. For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.

```python
from johnsnowlabs.llm import embedding_retrieval
from haystack.document_stores import InMemoryDocumentStore

# Write some processed data to the doc store, so we can retrieve it later
document_store = InMemoryDocumentStore(embedding_dim=512)
document_store.write_documents(some_documents)

# Create embedder, which is connected to the Spark cluster
retriever = embedding_retrieval.JohnSnowLabsHaystackEmbedder(
    embedding_model='en.embed_sentence.bert_base_uncased',
    document_store=document_store,
    use_gpu=False,
)

# Compute embeddings distributed in a cluster
document_store.update_embeddings(retriever)
```

Johnsnowlabs Langchain Integrations

Johnsnowlabs provides the following components, which can be used inside the Langchain framework for scalable pre-processing and embedding on Spark clusters, as agent tools and pipeline components. With this you can create easily scalable, production-grade LLM and RAG applications. See the Langchain with Johnsnowlabs Tutorial Notebook and the new Langchain + Johnsnowlabs documentation.

JohnSnowLabsLangChainCharSplitter

Pre-process your documents in a scalable fashion in Langchain. It is based on Spark NLP's DocumentCharacterTextSplitter and supports all of its parameters.

```python
from langchain.document_loaders import TextLoader
from johnsnowlabs.llm import embedding_retrieval

loader = TextLoader('/content/state_of_the_union.txt')
documents = loader.load()

# Create pre-processor, which is connected to the Spark cluster
jsl_splitter = embedding_retrieval.JohnSnowLabsLangChainCharSplitter(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)

# Process documents distributed on the Spark cluster
pre_processed_docs = jsl_splitter.split_documents(documents)
```

JohnSnowLabsLangChainEmbedder

Scalable embedding computation with any sentence embedding from John Snow Labs. You must provide the NLU reference of a sentence embedding to load it. You can start a Spark session by setting hardware_target to one of cpu, gpu, apple_silicon, or aarch on localhost environments. For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.

```python
# Create embedder, which is connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval
embeddings = embedding_retrieval.JohnSnowLabsLangChainEmbedder('en.embed_sentence.bert_base_uncased', hardware_target='cpu')

# Compute embeddings distributed
from langchain.vectorstores import FAISS
retriever = FAISS.from_documents(pre_processed_docs, embeddings).as_retriever()

# Create a tool
from langchain.agents.agent_toolkits import create_retriever_tool
tool = create_retriever_tool(
    retriever,
    "search_state_of_union",
    "Searches and returns documents regarding the state-of-the-union.",
)

# Create an LLM agent with the tool
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(openai_api_key='YOUR_API_KEY')
agent_executor = create_conversational_retrieval_agent(llm, [tool], verbose=True)
result = agent_executor({"input": "what did the president say about going to east of Columbus?"})
result['output']
```

Output:

```
> Entering new AgentExecutor chain...
Invoking: `search_state_of_union` with `{'query': 'going to east of Columbus'}`
[Document(page_content='miles east of', metadata={'source': '/content/state_of_the_union.txt'}),
 Document(page_content='in America.', metadata={'source': '/content/state_of_the_union.txt'}),
 Document(page_content='out of America.', metadata={'source': '/content/state_of_the_union.txt'}),
 Document(page_content='upside down.', metadata={'source': '/content/state_of_the_union.txt'})]
I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
> Finished chain.
I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
```

nlp.deploy_endpoint and nlp.query_endpoint

You can query and deploy John Snow Labs models with one line of code, as Databricks Model Serving endpoints.
Data is passed to the predict() function and predictions are shaped accordingly.
You must create endpoints from a Databricks cluster created by nlp.install.

See the Cluster Creation Notebook and the Databricks Endpoint Tutorial Notebook.
These functions deprecate nlp.query_and_deploy_if_missing, which will be dropped in John Snow Labs 5.2.0.

```python
# You need mlflow_by_johnsnowlabs installed until the next mlflow release
! pip install mlflow_by_johnsnowlabs

from johnsnowlabs import nlp
nlp.deploy_endpoint('bert')
nlp.query_endpoint('bert_ENDPOINT', 'My String to embed')
```

nlp.deploy_endpoint will register an MLflow model into your registry and deploy an endpoint with a JSL license. It has the following parameters:

| Parameter | Description |
|-----------|-------------|
| model | Model to be deployed as an endpoint, which is converted into an NluPipeline. Supported classes are: string reference to an NLU pipeline name like 'bert', NLUPipeline, List[Annotator], Pipeline, LightPipeline, PretrainedPipeline, PipelineModel. In case of an NLU reference, the endpoint name is auto-generated as <nlu_ref>_ENDPOINT, i.e. bert_ENDPOINT. '.' is replaced with '_' in the NLU reference for the endpoint name |
| endpoint_name | Name for the deployed endpoint. Optional if using an NLU model reference, but mandatory for custom pipelines |
| re_create_endpoint | If False, endpoint creation is skipped if one already exists. If True, it will delete the existing endpoint if it exists |
| re_create_model | If False, model creation is skipped if one already exists. If True, the model will be re-logged again, bumping the current version by 2 |
| workload_size | One of Small, Medium, Large |
| gpu | True/False to load GPU-optimized or CPU-optimized jars in the container. Must use a GPU-based workload_type if gpu=True |
| new_run | If True, mlflow will start a new run before logging the model |
| block_until_deployed | If True, this function will block until the endpoint is created |
| workload_type | CPU by default; use GPU_SMALL to spawn a GPU-based endpoint instead. Check the Databricks docs for alternative values |
| db_host | The Databricks host URL. If not specified, the DATABRICKS_HOST environment variable is used |
| db_token | The Databricks access token. If not specified, the DATABRICKS_TOKEN environment variable is used |

nlp.query_endpoint translates your query to JSON, sends it to the endpoint, and returns the result as a pandas DataFrame. It has the following parameters, which are forwarded to the model.predict() call inside of the endpoint:

| Parameter | Description |
|-----------|-------------|
| endpoint_name | Name of the endpoint to query |
| query | str, list of strings, or raw JSON string. If raw JSON, is_json_query must be True |
| is_json_query | If True, query is treated as a raw JSON string |
| output_level | One of token, chunk, sentence, relation, document to shape outputs |
| positions | Set True/False to include or exclude the character index position of predictions |
| metadata | Set True/False to include additional metadata |
| drop_irrelevant_cols | Set True/False to drop irrelevant columns |
| get_embeddings | Set True/False to include embeddings or not |
| keep_stranger_features | Set True/False to return columns not named "text", "image" or "file_type" from your input data |
| multithread | Set True/False to use multi-threading for inference. Auto-inferred if not set |
| db_host | The Databricks host URL. If not specified, the DATABRICKS_HOST environment variable is used |
| db_token | The Databricks access token. If not specified, the DATABRICKS_TOKEN environment variable is used |

nlp.query_endpoint and nlp.deploy_endpoint check the following mandatory env vars to resolve wheels for endpoints:

| Env Var Name | Description |
|--------------|-------------|
| HEALTHCARE_SECRET | Automatically set on your cluster if you run nlp.install() |
| VISUAL_SECRET | Automatically set if you run nlp.install(..., visual=True). You can only spawn a visual endpoint from a cluster created by nlp.install(..., visual=True) |
| JOHNSNOWLABS_LICENSE_JSON | JSON content of your John Snow Labs license to use for endpoints. Should be an airgap license |

Bug Fixes & Minor tweaks

  • Fixed a bug causing cached jars in the johnsnowlabs home directory on Databricks to not be used
  • Fixed bugs with boto3 imports in certain envs
  • New parameter nlp.run_in_databricks(return_job_url=True) to optionally return the URL of the job

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.7 Library Release

  • Bump Enterprise NLP to 5.1.2
  • Bump open source Spark NLP to 5.1.2
  • Bump NLU to 5.0.4rc2
  • Support for deploying endpoints with GPU infrastructure in Databricks via the workload_type parameter in nlp.query_and_deploy
  • Yarn mode support for EMR configs

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.6 Library Release

  • bump visual NLP to 5.0.2

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.5 Library Release

  • bump NLU to 5.0.3

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.4 Library Release

  • upgrade NLU to 5.0.2
  • remove pandas >=2 downgrade for databricks clusters

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.3 Library Release

  • Fix updating Databricks clusters
  • nlp.install(med_license=) now works without AWS keys for floating licenses
  • Add nlp.install_to_databricks() and a deprecation warning for nlp.install() when creating a new Databricks cluster; this usage will be dropped next release
  • Pinned pandas to 1.5.3 for newly created Databricks clusters until NLU supports pandas >= 2
  • New parameters argument in nlp.run_in_databricks for parameterizing submitted Databricks jobs, plus new documentation
  • New parameter extra_pip_installs, which can be used to install additional PyPI dependencies when creating a Databricks cluster or installing to an existing cluster.

Example of extra_pip_installs:

```python
nlp.install_to_databricks(
    databricks_cluster_id=cluster_id,
    databricks_host=host,
    databricks_token=token,
    extra_pip_installs=["farm-haystack==1.21.2", "langchain"],
)
```

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.2 Library Release

  • bump Healthcare NLP to 5.1.1

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.1 Library Release

  • bump Enterprise NLP to 5.1.1
  • bump Healthcare NLP to 5.1.1
  • Support for submitting Jupyter notebooks in nlp.run_in_databricks, with new docs for notebook submission (see the sketch below)
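
A short sketch of notebook submission, reusing the run_in_databricks parameter names shown elsewhere in these notes; the notebook path and credentials are placeholders.

```python
from johnsnowlabs import nlp

# Placeholders for your cluster and workspace credentials
cluster_id = 'my-cluster-id'
my_host = 'https://my-workspace.cloud.databricks.com'
my_token = 'dapi...'

# Submit a Jupyter notebook as a Databricks job
nlp.run_in_databricks(
    'path/to/my/notebook.ipynb',
    databricks_cluster_id=cluster_id,
    databricks_host=my_host,
    databricks_token=my_token,
    run_name='Notebook submission example',
)
```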

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.0 Library Release

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.8 Library Release

nlp.query_and_deploy_if_missing() has been upgraded with powerful new features!

| Parameter | Description |
|-----------|-------------|
| output_level | One of token, chunk, sentence, relation, document to shape outputs |
| positions | Set True/False to include or exclude the character index position of predictions |
| metadata | Set True/False to include additional metadata |
| drop_irrelevant_cols | Set True/False to drop irrelevant columns |
| get_embeddings | Set True/False to include embeddings or not |
| keep_stranger_features | Set True/False to return columns not named "text", "image" or "file_type" from your input data |
| multithread | Set True/False to use multi-threading for inference. Auto-inferred if not set |
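
A hedged sketch of combining these parameters. The keyword arguments come from the table above, while passing the model reference and query positionally is an assumption about the signature.

```python
from johnsnowlabs import nlp

# Deploy 'bert' if no endpoint exists yet, then query it with document-level output
df = nlp.query_and_deploy_if_missing(
    'bert',                  # model reference (positional usage is an assumption)
    'My String to embed',    # query (positional usage is an assumption)
    output_level='document',
    get_embeddings=True,
    metadata=False,
)
print(df)
```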

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.7 Library Release

Hotfix for bad package

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.6 Library Release

  • clean_cluster parameter in nlp.install() to clean the Databricks cluster before installing johnsnowlabs software. Defaults to True
  • write_db_credentials parameter in nlp.install() to write the Databricks host and access token into env variables, which will be used for endpoint creation. Defaults to True
  • Fixed bug which caused the visual library to be installed to Databricks clusters even if visual=False
  • Updated documentation
  • New powerful 1-liner nlp.query_and_deploy_if_missing(), which deploys a John Snow Labs model as a Databricks Serving endpoint and queries it. If the model is already deployed, it will not be deployed again (see the sketch below)
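
A minimal sketch of the new install flags, assuming the Databricks parameter names used elsewhere in these notes; host and token values are placeholders.

```python
from johnsnowlabs import nlp

# Placeholders for your Databricks workspace credentials
my_host = 'https://my-workspace.cloud.databricks.com'
my_token = 'dapi...'

nlp.install(
    databricks_host=my_host,
    databricks_token=my_token,
    clean_cluster=True,          # clean the cluster before installing (default True)
    write_db_credentials=True,   # write host/token into env vars used for endpoint creation (default True)
)
```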

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.5 Library Release

  • New block_till_complete parameter in nlp.run_in_databricks and logging of the Databricks task URL for monitoring
  • optimized Databricks configs for Visual NLP clusters

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.4 Library Release

  • Bump Spark-NLP to 5.0.2
  • Bump Healthcare NLP to 5.0.2
  • Bump OCR to 5.0.0
  • Bump NLU to 5.0.0

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.1 Library Release

  • Bump Spark-NLP to 5.0.1
  • Bump Healthcare NLP to 5.0.1
  • Bump OCR to 4.4.4

  • nlp.install(hardware_target='m1') is now nlp.install(hardware_target='apple_silicon')

  • nlp.start(hardware_target='m1') is now nlp.start(hardware_target='apple_silicon')

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.0 Library Release

We are very excited to announce John Snow Labs 5.0.0 has been released.

  • bump spark-nlp to 5.0.0
  • bump enterprise-nlp to 5.0.0

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.11 Library Release

Hotfix: pin pydantic to 1.10.11 because of a validation bug

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.10 Library Release

Add ChunkFiltererApproach to finance, legal and medical modules

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.9 Library Release

  • bump Spark-NLP to 4.4.4
  • Bump Enterprise-NLP to 4.4.4
  • bump Visual-NLP to 4.4.3

Improved nlp.install(): when providing credentials with outdated secrets for the library versions, they will automatically be upgraded to the latest recommended versions, as long as you have a valid license and settings.enforce_versions=True.

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.8 Library Release

  • upgrade NLU to 4.2.2
  • Support for JOHNSNOWLABS_LICENSE_JSON as a single env variable to provide credentials. This is the raw JSON string of your license file (see the sketch below).
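
A small sketch of providing credentials through the single env variable; the license file path is a placeholder.

```python
import os

# JOHNSNOWLABS_LICENSE_JSON holds the raw JSON string of your license file
with open('path/to/my/license.json') as f:
    os.environ['JOHNSNOWLABS_LICENSE_JSON'] = f.read()

from johnsnowlabs import nlp

nlp.install()  # credentials are now resolved from the env variable
```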

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.7 Library Release

  • bump visual NLP to 4.4.2
  • bump enterprise NLP to 4.4.3
  • bump NLU to 4.2.1

Fixes https://github.com/JohnSnowLabs/johnsnowlabs/issues/333 and partially https://github.com/JohnSnowLabs/johnsnowlabs/issues/348.

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.6 Library Release

We are very excited to announce johnsnowlabs 4.4.6 has been released!

Features:

  • create_jsl_home_if_missing parameter added to nlp.start(), which can be set to False to disable creation of the ~/.johnsnowlabs directory. This is useful when jars are provided directly via the jar_paths parameter.
  • Dynamic wheel resolution for Spark NLP, enabling you to set settings.nlp_version='4.4.2'; the appropriate jars and wheels are then used automatically when starting a session or building an environment (see the sketch after this list).
  • Fixed erroneous handling of enterprise secrets that follow the <VERSION>.<PR-NUM>.<COMMIT_HASH> pattern.
  • Bump Enterprise NLP version to 4.4.2
  • Bump OCR version to 4.4.1
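
A brief sketch combining the two new knobs; that jar_paths accepts a list, and the path itself, are assumptions.

```python
from johnsnowlabs import nlp, settings

# Pin the Spark NLP version; matching jars and wheels resolve dynamically
settings.nlp_version = '4.4.2'

# Skip creating ~/.johnsnowlabs when jars are supplied directly
# (the jar_paths value is a hypothetical placeholder)
spark = nlp.start(create_jsl_home_if_missing=False,
                  jar_paths=['path/to/spark-nlp-assembly.jar'])
```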

- Python
Published by C-K-Loan over 2 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.5 Library Release

bump enterprise NLP to 4.4.1

- Python
Published by C-K-Loan almost 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.4 Library Release

  • bugfix in databricks installation

- Python
Published by C-K-Loan almost 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.3 Library Release

  • Bump Spark-NLP to 4.4.1, introducing DistilBertForZeroShotClassification to the nlp module
  • Fix type hint bug

- Python
Published by C-K-Loan almost 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.2 Library Release

Open Source NLP:

  • Bump version to 4.4.0
  • Support for ConvNextForImageClassification, BartTransformer, BertForZeroShotClassification

Enterprise NLP:

  • Bump version to 4.4.0
  • Support for QuestionAnswering, TextGenerator, Summarizer, and WindowedSentenceModel in the Finance, Legal, and Medical modules

Visual NLP:

  • PretrainedPipeline support via visual.PretrainedPipeline

General: enhanced backward-compatibility mechanisms, which ensure all available classes, functions, and submodules are usable for the finance, legal, and medical modules, even if an outdated version of Enterprise NLP with breaking changes is installed.

- Python
Published by C-K-Loan almost 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.1 Library Release

Downgrade Enterprise and Spark NLP to 4.3.2 for the workshop

- Python
Published by C-K-Loan almost 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.0 Library Release

Open Source NLP:

  • Bump version to 4.4.0
  • Support for ConvNextForImageClassification, BartTransformer, BertForZeroShotClassification

Enterprise NLP:

  • Bump version to 4.4.0
  • Support for QuestionAnswering, TextGenerator, Summarizer, and WindowedSentenceModel in the Finance, Legal, and Medical modules

- Python
Published by C-K-Loan almost 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.5 Library Release

  • fix bug that caused failure when installing enterprise-nlp 4.3.2

- Python
Published by C-K-Loan almost 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.4 Library Release

  • Better Secret Handling for Databricks
  • Bump NLU Version to 4.2.0
  • Bump NLP Version to 4.3.2
  • Bump Enterprise NLP Version to 4.3.2
  • Bump Visual Version to 4.3.3

- Python
Published by C-K-Loan almost 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.3 Library Release

  • Bump Visual NLP to 4.3.1
  • Access to visual.LightPipeline

- Python
Published by C-K-Loan almost 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.2 Library Release

Bump NLU version. Update client ID to the dedicated jsl-lib secret for OAuth.

johnsnowlabs-4.3.2.tar.gz

- Python
Published by C-K-Loan about 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.1 Library Release

  • Hotfix for bug that causes pip install to fail because of dependency conflicts from NLU

- Python
Published by C-K-Loan about 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.0 Library Release

  • bump enterprise NLP and open source NLP to 4.3.0
  • Generic Log Reg and Generic SVM available for finance, legal and medical modules
  • Hubert, Swin Transformer, Zero-Shot NER, and CamemBERT for QA in the nlp module

- Python
Published by C-K-Loan about 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.9 Library Release

  • New TextSplitter Annotator for Finance & Legal which is just a use-case focused alias for SentenceDetector
  • Fix bug with NLP module not properly refreshing attached classes after running nlp.install()

- Python
Published by C-K-Loan about 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.5 Release

  • Bump Visual NLP to 4.2.4
  • Bump Enterprise NLP to 4.2.5
  • Training log parser

- Python
Published by C-K-Loan about 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.3 Release

  • version bumps
  • docstring updates
  • new nlp annotators from enterprise 4.2.1

https://github.com/JohnSnowLabs/johnsnowlabs/pull/12

- Python
Published by C-K-Loan about 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.4 Release

  • Bump nlp-enterprise version
  • Fix bad import mapping for some legal annotators
  • Fix bug with setting license env variables on Databricks

johnsnowlabs-4.2.4.tar.gz

- Python
Published by C-K-Loan about 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.8 Library Release

  • DocMapper, DocMapperApproach, DocObfuscator, DocMlClassifier, Resolution2Chunk, ContextParserModel for the finance, legal, and healthcare modules
  • Upgrade Enterprise NLP to 4.2.8
  • Upgrade Open Source NLP to 4.2.8
  • Upgrade Visual NLP to 4.3.0
  • Better error messages when importing modules fails
  • Improved various error messages
  • Fix bug causing a dependency on nbformat
  • Fix bugs with handling paths incorrectly on Windows

- Python
Published by C-K-Loan about 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.2 - Spark NLP version Bump to 4.2.1

We are glad to announce johnsnowlabs 4.2.2 has been released!

Changes:

  • Version Bump Spark NLP to 4.2.1
  • Fix minor bug with type conversion during PyPI standard install

johnsnowlabs422.zip

- Python
Published by C-K-Loan over 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.1 Release - No more restarts required after installing licensed libs

We are pleased to announce that version 4.2.1 of the johnsnowlabs library has been released! It comes with one crucial improvement: no more notebook restarts required after running jsl.install().

4.2.1johnsnowlabs.zip

- Python
Published by C-K-Loan over 3 years ago

https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.0 Release

We are announcing with incredible excitement the release of the John Snow Labs 4.2.0 library! It introduces:

  • New enterprise syntax to easily access any feature of any JSL library.
  • Highly configurable automatic installers with various authorization flows and installation targets, like 1-click OAuth, 1-line Databricks, 1 line for a new enterprise-compatible venv, and extended offline support.
  • Easily run a Python function, raw Python code snippet, Python script, or Python module in a Databricks cluster in one line of code, and create a cluster if missing.
  • Smart license/jar/wheel caching: never type your license twice on the same machine when starting up a SparkSession or re-installing licensed libs!
  • Various safety mechanisms and footguns removed, to reduce injuries :)

Introducing the new enterprise syntax for working with all of John Snow Labs' libraries.
It bundles every relevant function and class you might ever need when working with JSL libraries into one simple import line: from johnsnowlabs import *
This single import gets you through all of the certification notebooks, with the exception of a few third-party libraries. The following modules become available:

Links to existing products

See Usage & Overview for more details on the import structure.

  • nlp.MyAnno() and nlp.my_function() for all of Spark NLP's Python functions/classes/modules
  • ocr.MyAnno() and ocr.my_function() for all of Spark OCR's Python functions/classes/modules
  • legal.MyAnno() and legal.my_function() for all of Spark for Legal Python functions/classes/modules
  • finance.MyAnno() and finance.my_function() for all of Spark for Finance Python functions/classes/modules
  • medical.MyAnno() and medical.my_function() for all of Spark for Medical Python functions/classes/modules
  • viz.MyVisualizer() for all of Spark NLP Display classes
  • jsl.load() and jsl.viz() from NLU

New Powerful Installation and Spark Session Start


The John Snow Labs library aims to make installing licensed libraries and starting a SparkSession as easy as possible. See the Installation Docs and the Launch a Spark Session Docs.

  • jsl.install()
    • Authorization Flows (prove you have a license):
      • Auto-Detect Environment Variables
      • Auto Detect license files in current working dir
      • Auto Detect cached license information that was stored in ~/.johnsnowlabs from previous runs
      • Auto-Inject Local Browser Based OAuth
      • Auto-Inject Colab Button based Oauth
      • Manual Variable Definition
      • Manual Json Path
      • Access Token
    • Installation Targets (Where to install to?):
      • Currently running Python Process
      • Into a Python environment, which is not the currently running Process
      • Into a provided Venv
      • Into a freshly created venv by the john snow labs library
      • Airgap, by creating an easily copy-pastable zip file with all jars/wheels/licenses to run in airgapped environments
      • Databricks
  • jsl.start()
    • After having run jsl.install(), you can just run jsl.start(). It remembers the license that was used to install and has all jars pre-downloaded. Additionally, it prints very helpful logs when launching a session, telling you the loaded jars and their versions. You can even load a new license during jsl.start(), which supports all of the previously mentioned authorization flows (see the sketch below).
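
A tiny sketch of the cached flow described above:

```python
from johnsnowlabs import *

jsl.install()        # one-time: license and jars are cached in ~/.johnsnowlabs
spark = jsl.start()  # later sessions reuse the cached license and pre-downloaded jars
```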

License Management

List all of your usable JSL licenses with jsl.list_remote_licenses() and your locally cached licenses with jsl.list_local_licenses().


Databricks Utils

Easily submit any task to a Databricks cluster, in various formats; see the Utils for Databricks Docs.

Run a raw Python code string in a cluster, and also create one on the fly:

```python
from johnsnowlabs import *

script = """
import nlu
print(nlu.load('sentiment').predict('That was easy!'))
"""

cluster_id = jsl.install(json_license_path=my_license,
                         databricks_host=my_host,
                         databricks_token=my_token)
jsl.run_in_databricks(script,
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='Python Code String Example')
```

Run a Python function in a cluster:

```python
def my_function():
    import nlu
    medical_text = """A 28-year-old female with a history of gestational diabetes
    presented with a one-week history of polyuria, polydipsia, poor appetite,
    and vomiting."""
    df = nlu.load('en.med_ner.diseases').predict(medical_text)
    for c in df.columns:
        print(df[c])

# my_function will run on Databricks
jsl.run_in_databricks(my_function,
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='Function test')
```

Run a Python script in a cluster:

```python
jsl.run_in_databricks('path/to/my/script.py',
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='Script test')
```

Run a Python module in a cluster:

```python
import johnsnowlabs.auto_install.health_checks.nlp_test as nlp_test

jsl.run_in_databricks(nlp_test,
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='nlp_test')
```

Testing Utils

You can use the John Snow Labs library to automatically test 10,000+ models and 100+ notebooks in one line of code within a small machine like a single Google Colab instance, and generate very handy error reports of potentially broken models, notebooks, or Models Hub markdown snippets.

Automatically test notebooks and Models Hub markdown via URL, file path, and many more options!

Workshop Notebook Testing Utils

See Utils for Testing Notebooks docs

```python
from johnsnowlabs.utils.notebooks import test_ipynb

# Test a local notebook
test_ipynb('path/to/local/notebook.ipynb')

# Test a notebook via URL
test_ipynb('https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb')

# Test a folder of notebooks and generate a report file which captures all stderr/stdout
test_ipynb('my/notebook/folder')

# Test an array of URLs/paths to notebooks
test_ipynb([
    'https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb',
    'path/to/local/notebook.ipynb',
])

# Run ALL notebooks in the certification folder
test_result = test_ipynb('WORKSHOP')

# Only run Finance notebooks
test_result = test_ipynb('WORKSHOP-FIN')

# Only run Legal notebooks
test_result = test_ipynb('WORKSHOP-LEG')

# Only run Medical notebooks
test_result = test_ipynb('WORKSHOP-MED')

# Only run Open Source notebooks
test_result = test_ipynb('WORKSHOP-OS')
```

Models Hub Testing Utils

See the Utils for Testing Models & Models Hub Markdown Snippets Docs

```python
from johnsnowlabs.utils.modelhub_markdown import test_markdown

# Test a local markdown file with a Python snippet
test_markdown('path/to/my/file.md')

# Test a Models Hub Python markdown snippet via URL
test_markdown('https://nlp.johnsnowlabs.com/2022/08/31/legpipe_deid_en.html')

# Test a folder of markdown snippets and generate a report file which captures all stderr/stdout
test_markdown('my/markdown/folder')

# Test an array of URLs/paths to markdown files
test_markdown([
    'legpipe_deid_en.html',
    'path/to/local/markdown_snippet.md',
])
```

New Documentation Pages

- Python
Published by C-K-Loan over 3 years ago