Recent Releases of https://github.com/johnsnowlabs/johnsnowlabs
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.1.0 Release
John Snow Labs 6.1.0 has been released with the following upgrades:
- Bump Enterprise-NLP to 6.1.0
- Bump Visual-NLP to 6.1.0
- Bump Spark-NLP to 6.1.1
- Python
Published by C-K-Loan 6 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.4 Release
John Snow Labs 6.0.4 comes with the following upgrades:
- Bump Enterprise-NLP to 6.0.4
- Bump Spark-NLP to 6.0.4
- Python
Published by C-K-Loan 7 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.3 Release
This release comes with improvements to the Snowflake and Docker integrations, enabling you to deploy custom pipelines, any pretrained John Snow Labs pipeline, or any NLU pipeline as a Snowflake UDF or Docker container. Additionally, Spark-NLP and Enterprise-NLP are bumped to version 6.0.3.
Improvements to Docker integration
Instead of providing an NLU model reference for deploying a model, users now have two options:
Deploy Pipeline via Name, Language and Bucket
```python
nlp.build_image(pipeline_name=pipe_name, pipeline_language=pipe_lang, pipeline_bucket=pipe_bucket)
```
Deploy custom pipeline by providing a fitted pipeline object
```python
custom_pipe = nlp.Pipeline(stages=[...]).fit(df)
nlp.build_image(custom_pipe=custom_pipe)
```
**Deploy NLU pipelines as container**
Any NLU pipeline can still be deployed as a custom pipe:
```python
# Load model
pipe = nlp.load(model_nlu_ref)
# Predict so that under the hood the pipeline is fitted and the vanilla_transformer_pipe attribute is available
pipe.predict('')
# Now we can just handle it like a custom pipeline
nlp.build_image(custom_pipe=pipe.vanilla_transformer_pipe)
```
Improvements to Snowflake integration
- Docs have been updated to reflect changes
- Compute pool parameters `compute_pool_min_nodes`, `compute_pool_max_nodes`, and `compute_pool_instance_family` can now be configured in `nlp.snowflake_common_setup` with values defined in the Snowflake documentation
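As a rough illustration, a pure-Python sanity check of those compute-pool values before they are handed to the setup call (the `validate_compute_pool` helper and the instance-family value are hypothetical; valid families are defined in the Snowflake documentation):

```python
def validate_compute_pool(min_nodes: int, max_nodes: int, instance_family: str) -> dict:
    """Basic sanity checks on the compute-pool parameters before they are
    passed on; valid instance families are defined by Snowflake."""
    if min_nodes < 1 or max_nodes < min_nodes:
        raise ValueError("need 1 <= compute_pool_min_nodes <= compute_pool_max_nodes")
    return {
        "compute_pool_min_nodes": min_nodes,
        "compute_pool_max_nodes": max_nodes,
        "compute_pool_instance_family": instance_family,
    }
```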
Since the Snowflake integration uses Docker, all of the changes to the Docker utilities are reflected in Snowflake as well.
Version Bumps
The following dependency versions have been bumped:
- Spark NLP to 6.0.3
- Enterprise NLP to 6.0.3
Bug Fixes
- Fixed bug causing errors when creating a Databricks cluster with `nlp.install_to_databricks()`
- Python
Published by C-K-Loan 8 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.2 Release
John Snow Labs 6.0.2 comes with the following upgrades:
- Bump Enterprise-NLP to 6.0.2
- Python
Published by C-K-Loan 9 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.1 Release
John Snow Labs 6.0.1 comes with the following upgrades:
- Bump Spark-NLP to 6.0.1
- Bump Enterprise-NLP to 6.0.1
- Python
Published by C-K-Loan 9 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.0 Release
John Snow Labs 6.0.0 comes with the following upgrades:
- Bump Spark-NLP to 6.0.0
- Bump Enterprise-NLP to 6.0.0
- Bump Visual-NLP to 6.0.0
Additionally, a few bugfixes and improvements have been made:
- Deprecated pkg_resources usage and refactored modules depending on it, which caused import errors in some Python versions
- Made the import of _shared_pyspark_ml_param more robust, which previously caused import errors in some Python versions
- Python
Published by C-K-Loan 10 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.5 Release
John Snow Labs 5.5.5 comes with the following upgrades:
- Bump Medical NLP to 5.5.3
- Bump Spark-NLP to 5.5.3
- Python
Published by C-K-Loan 11 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.4 Release
John Snow Labs 5.5.4 comes with the following upgrades and fixes:
- Bump Visual NLP to 5.5.0
- Fix bug causing browser-based install to fail if you only have PAYG licenses in my.jsl
- Python
Published by C-K-Loan about 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.3 Release
John Snow Labs 5.5.3 comes with the following upgrades:
- Bump Enterprise NLP to 5.5.2
- Bump Spark NLP to 5.5.2
- Python
Published by C-K-Loan about 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.2 Release
John Snow Labs 5.5.2 releases with the following upgrades:
- Bump Spark-NLP to 5.5.1
- Bump Enterprise NLP to 5.5.1
- Bump Visual NLP to 5.4.2
- Python
Published by C-K-Loan about 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.1 Release
John Snow Labs 5.5.1 comes with various improvements to the Snowflake utilities and docs.
- New Snowflake Tutorial Notebook for creating Snowflake UDFs and calling them
- Updated Snowflake Utility Docs
- Automated Login to Docker Repo while creating Snowflake UDF
- `nlp.deploy_as_snowflake_udf` now blocks until the Snowflake UDF is successfully created or a timeout occurs
- Less verbose logging when creating a Snowflake UDF
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.0 Release
John Snow Labs 5.5.0 comes with the following upgrades:
- Bump Enterprise NLP to 5.5.0
- Bump Spark NLP to 5.5.0
- Support for Pydantic >= 2.0
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.5 Release
John Snow Labs 5.4.5 comes with the following upgrades:
- Bump Visual NLP to 5.4.1
- Bump NLU to 5.4.1
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.4 Release
John Snow Labs 5.4.4 is a hotfix release for import issues with Healthcare NLP 5.4.1
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.3 Release
John Snow Labs 5.4.3 has been released with the following upgrades:
- Bump Spark NLP to 5.4.1
- Bump Healthcare NLP to 5.4.1
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.2 Release
- Bugfix for creating Docker images via `nlp.build_image`
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.1 Release
John Snow Labs 5.4.1 has been released with the following changes:
- Bump NLU to 5.4.1rc1
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.0 Release
We are very excited to announce that John Snow Labs 5.4.0 has been released; the entire JSL suite has been upgraded to 5.4.0!
Changes:
- Bump Spark NLP to 5.4.0
- Bump Healthcare NLP to 5.4.0
- Bump Visual NLP to 5.4.0
- Bump NLU to 5.4.0
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.6 Library Release
- bump NLU to 5.3.2
- Fix detection of the Databricks environment (https://github.com/JohnSnowLabs/johnsnowlabs/pull/1222) and follow-up bugs in endpoint environments
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.5 Library Release
We are very excited to announce John Snow Labs 5.3.5 has been released! It features:
- bump NLU to 5.3.1
- bump Spark-NLP to 5.3.2
- bump Medical NLP to 5.3.2
- bump Visual NLP to 5.3.2
- fixed bug that caused nbformat import exceptions
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.4 Library Release
John Snow Labs 5.3.4 includes:
- Bump Visual NLP to 5.3.1
- Bump Medical NLP to 5.3.1
- Fix bug with test paths
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - Snowflake UDFs and Docker Image creation for NLP, Healthcare and Visual Models in John Snow Labs 5.3.3
John Snow Labs 5.3.3 has been released with support for creating Snowflake UDFs from John Snow Labs models, as well as easy containerization of any John Snow Labs model, bundled with a simple REST API that supports text and file prediction for any file type supported by nlp.load(), like PDFs, images, and more!
For more info, see the new official documentation pages: the Docker utility documentation and the Snowflake utility documentation.
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.1 Library Release
- fix bug causing unexpected browser pop-up during install
- fix bug causing databricks install to fail
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.1 Library Release
- bump spark-nlp to 5.3.1
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.0 Library Release
We are excited to announce johnsnowlabs 5.3.0 has been released!
It features:
- Bump Spark NLP to 5.3.0
- Bump Medical NLP to 5.3.0
- Bump Visual NLP to 5.3.0
- Bump NLU to 5.3.0
- Bump Spark NLP Display to 5.0.0
- Bugfixes for installing to existing Databricks clusters
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.8 Library Release
Fix bug with pip install
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.7 Library Release
- Bump pyspark to 3.4.0
- NerConverterInternal accessible via medical.NerConverter, finance.NerConverter, legal.NerConverter
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.6 Library Release
Fix minor bugs on Databricks marketplace causing some models to load improperly
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.5 Library Release
Fix minor bugs on Databricks marketplace causing some models to load improperly
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.4 Library Release
Fix bug on marketplace when providing a Databricks host URL with a trailing comma
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.3 Library Release
New Databricks Model Marketplace Utils and Notebook updates and tweaks
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.2 Library Release
New Databricks Model Marketplace Utils and Notebook
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.1 Library Release
- bump Spark NLP to 5.2.2
- bump Enterprise NLP to 5.2.1
- bump NLU to 5.1.3
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.0 Library Release
- bump Spark NLP to 5.2.0
- bump Enterprise NLP to 5.2.0
- bump Visual NLP to 5.1.2
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.9 Library Release
- add IOB tagger to legal, finance and medical modules
- bugfix for missing metadata in haystack
- InternalDocumentSplitter for legal, finance and medical modules
- bump Visual NLP to 5.1.0
- bump Medical NLP to 5.1.4
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.8 Library Release
Johnsnowlabs Haystack Integrations
Johnsnowlabs provides the following nodes, which can be used inside the Haystack framework for scalable pre-processing and embedding on Spark clusters. With this you can create easy, scalable, production-grade LLM and RAG applications. See the Haystack with Johnsnowlabs Tutorial Notebook and the new Haystack + Johnsnowlabs documentation.
JohnSnowLabsHaystackProcessor
Pre-process your documents in a scalable fashion in Haystack. It is based on Spark-NLP's DocumentCharacterTextSplitter and supports all of its parameters.
```python
# Create pre-processor which is connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval

processor = embedding_retrieval.JohnSnowLabsHaystackProcessor(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process documents distributed on a Spark cluster
processor.process(some_documents)
```
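To make the chunking parameters concrete, here is a minimal pure-Python sketch of character-window splitting with overlap (an illustration only, not the library's implementation):

```python
def split_text(text: str, chunk_size: int = 20, chunk_overlap: int = 2) -> list:
    """Greedy character splitter: fixed-size windows that share chunk_overlap
    characters with the previous window."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must be larger than chunk_overlap")
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

With `chunk_size=4` and `chunk_overlap=1`, `"abcdefghij"` yields `["abcd", "defg", "ghij"]`; each chunk repeats the last character of the previous one.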
JohnSnowLabsHaystackEmbedder
Scalable embedding computation with any sentence embedding from John Snow Labs in Haystack. You must provide the NLU reference of a sentence embedding model to load it. If you want to use a GPU with the embedding model, set use_gpu=True; on localhost, this will start a Spark session with GPU jars. For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.
```python
from johnsnowlabs.llm import embedding_retrieval
from haystack.document_stores import InMemoryDocumentStore

# Write some processed data to the doc store, so we can retrieve it later
document_store = InMemoryDocumentStore(embedding_dim=512)
document_store.write_documents(some_documents)

# Create embedder which is connected to the Spark cluster
retriever = embedding_retrieval.JohnSnowLabsHaystackEmbedder(
    embedding_model='en.embed_sentence.bert_base_uncased',
    document_store=document_store,
    use_gpu=False,
)
# Compute embeddings distributed in a cluster
document_store.update_embeddings(retriever)
```
Johnsnowlabs Langchain Integrations
Johnsnowlabs provides the following components, which can be used inside the Langchain framework for scalable pre-processing and embedding on Spark clusters, as agent tools and pipeline components. With this you can create easy, scalable, production-grade LLM and RAG applications. See the Langchain with Johnsnowlabs Tutorial Notebook and the new Langchain + Johnsnowlabs documentation.
JohnSnowLabsLangChainCharSplitter
Pre-process your documents in a scalable fashion in Langchain. It is based on Spark-NLP's DocumentCharacterTextSplitter and supports all of its parameters.
```python
from langchain.document_loaders import TextLoader
from johnsnowlabs.llm import embedding_retrieval

loader = TextLoader('/content/state_of_the_union.txt')
documents = loader.load()

# Create pre-processor which is connected to the Spark cluster
jsl_splitter = embedding_retrieval.JohnSnowLabsLangChainCharSplitter(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process documents distributed on a Spark cluster
pre_processed_docs = jsl_splitter.split_documents(documents)
```
JohnSnowLabsLangChainEmbedder
Scalable embedding computation with any sentence embedding from John Snow Labs.
You must provide the NLU reference of a sentence embedding model to load it.
You can start a Spark session by setting hardware_target to one of cpu, gpu, apple_silicon, or aarch in localhost environments.
For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.
```python
# Create embedder which is connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval
embeddings = embedding_retrieval.JohnSnowLabsLangChainEmbedder(
    'en.embed_sentence.bert_base_uncased',
    hardware_target='cpu',
)

# Compute embeddings distributed
from langchain.vectorstores import FAISS
retriever = FAISS.from_documents(pre_processed_docs, embeddings).as_retriever()

# Create a tool
from langchain.agents.agent_toolkits import create_retriever_tool
tool = create_retriever_tool(
    retriever,
    "search_state_of_union",
    "Searches and returns documents regarding the state-of-the-union.",
)

# Create an LLM agent with the tool
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(openai_api_key='YOUR_API_KEY')
agent_executor = create_conversational_retrieval_agent(llm, [tool], verbose=True)
result = agent_executor({"input": "what did the president say about going to east of Columbus?"})
result['output']
```
Example output:
```
> Entering new AgentExecutor chain...
Invoking: `search_state_of_union` with `{'query': 'going to east of Columbus'}`
[Document(page_content='miles east of', metadata={'source': '/content/state_of_the_union.txt'}),
 Document(page_content='in America.', metadata={'source': '/content/state_of_the_union.txt'}),
 Document(page_content='out of America.', metadata={'source': '/content/state_of_the_union.txt'}),
 Document(page_content='upside down.', metadata={'source': '/content/state_of_the_union.txt'})]
I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
> Finished chain.
```
nlp.deploy_endpoint and nlp.query_endpoint
You can query and deploy John Snow Labs models as Databricks Model Serving endpoints with one line of code.
Data is passed to the predict() function and predictions are shaped accordingly.
You must create endpoints from a Databricks cluster created by nlp.install.
See Cluster Creation Notebook
and Databricks Endpoint Tutorial Notebook
These functions deprecate nlp.query_and_deploy_if_missing, which will be dropped in John Snow Labs 5.2.0.
```python
# You need mlflow_by_johnsnowlabs installed until the next mlflow is released
! pip install mlflow_by_johnsnowlabs
from johnsnowlabs import nlp
nlp.deploy_endpoint('bert')
nlp.query_endpoint('bert_ENDPOINT', 'My String to embed')
```
nlp.deploy_endpoint will register an MLflow model into your registry and deploy an endpoint with a JSL license.
It has the following parameters:
| Parameter | Description |
|-----------|-------------|
| `model` | Model to be deployed as an endpoint, which is converted into an NluPipeline. Supported classes: a string reference to an NLU pipeline name like 'bert', NLUPipeline, List[Annotator], Pipeline, LightPipeline, PretrainedPipeline, PipelineModel. In the case of an NLU reference, the endpoint name is auto-generated as `<nlu_ref>_ENDPOINT`, i.e. bert_ENDPOINT. '.' is replaced with '_' in the NLU reference for the endpoint name |
| `endpoint_name` | Name for the deployed endpoint. Optional if using an NLU model reference, but mandatory for custom pipelines |
| `re_create_endpoint` | If False, endpoint creation is skipped if one already exists. If True, it will delete the existing endpoint if it exists |
| `re_create_model` | If False, model creation is skipped if one already exists. If True, the model will be re-logged again, bumping the current version by 2 |
| `workload_size` | One of Small, Medium, Large |
| `gpu` | True/False to load GPU-optimized or CPU-optimized jars in the container. Must use a GPU-based `workload_type` if `gpu=True` |
| `new_run` | If True, mlflow will start a new run before logging the model |
| `block_until_deployed` | If True, this function will block until the endpoint is created |
| `workload_type` | `CPU` by default; use `GPU_SMALL` to spawn a GPU-based endpoint instead. Check the Databricks docs for alternative values |
| `db_host` | The Databricks host URL. If not specified, the DATABRICKS_HOST environment variable is used |
| `db_token` | The Databricks access token. If not specified, the DATABRICKS_TOKEN environment variable is used |
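The endpoint auto-naming rule described for the `model` parameter can be sketched as a small helper (hypothetical code, assuming dots in the NLU reference become underscores):

```python
def endpoint_name_for(nlu_ref: str) -> str:
    """Derive a Databricks endpoint name from an NLU reference:
    replace '.' (not valid in endpoint names) and append the suffix."""
    return nlu_ref.replace('.', '_') + '_ENDPOINT'
```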
nlp.query_endpoint translates your query to JSON, sends it to the endpoint, and returns the result as a pandas DataFrame.
It has the following parameters, which are forwarded to the model.predict() call inside the endpoint:
| Parameter | Description |
|-----------|-------------|
| `endpoint_name` | Name of the endpoint to query |
| `query` | A string, list of strings, or raw JSON string. If raw JSON, `is_json_query` must be True |
| `is_json_query` | If True, query is treated as a raw JSON string |
| `output_level` | One of token, chunk, sentence, relation, document to shape outputs |
| `positions` | Set True/False to include or exclude the character index position of predictions |
| `metadata` | Set True/False to include additional metadata |
| `drop_irrelevant_cols` | Set True/False to drop irrelevant columns |
| `get_embeddings` | Set True/False to include embeddings or not |
| `keep_stranger_features` | Set True/False to return columns not named "text", "image" or "file_type" from your input data |
| `multithread` | Set True/False to use multi-threading for inference. Auto-inferred if not set |
| `db_host` | The Databricks host URL. If not specified, the DATABRICKS_HOST environment variable is used |
| `db_token` | The Databricks access token. If not specified, the DATABRICKS_TOKEN environment variable is used |
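For intuition, the query-to-JSON translation might look like the following pure-Python sketch (the payload shape shown is a hypothetical `dataframe_split` layout, not the library's exact wire format):

```python
import json

def to_endpoint_payload(query, is_json_query: bool = False) -> str:
    """Wrap a string or list of strings as a JSON payload;
    raw JSON strings pass through untouched when is_json_query=True."""
    if is_json_query:
        return query
    rows = [query] if isinstance(query, str) else list(query)
    payload = {"dataframe_split": {"columns": ["text"], "data": [[r] for r in rows]}}
    return json.dumps(payload)
```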
nlp.query_endpoint and nlp.deploy_endpoint check the following mandatory env vars to resolve wheels for endpoints:
| Env Var Name | Description |
|--------------|-------------|
| HEALTHCARE_SECRET | Automatically set on your cluster if you run nlp.install() |
| VISUAL_SECRET | Automatically set if you run nlp.install(..., visual=True). You can only spawn a visual endpoint from a cluster created by nlp.install(..., visual=True) |
| JOHNSNOWLABS_LICENSE_JSON | JSON content of your John Snow Labs license to use for endpoints. Should be an airgap license |
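A quick pre-flight check for those env vars could look like this (the helper name is hypothetical; the variable names are the ones documented above):

```python
import os

def missing_endpoint_secrets(visual: bool = False) -> list:
    """Return the documented env vars that are not set;
    VISUAL_SECRET is only required for visual endpoints."""
    required = ["HEALTHCARE_SECRET", "JOHNSNOWLABS_LICENSE_JSON"]
    if visual:
        required.append("VISUAL_SECRET")
    return [name for name in required if not os.environ.get(name)]
```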
Version Bumps
Bug Fixes & Minor tweaks
- Fixed a bug causing cached jars in the johnsnowlabs home directory on Databricks to not be used
- Fixed bugs with boto3 imports in certain envs
- New parameter `nlp.run_in_databricks(return_job_url=True)` to optionally return the URL of the job
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.7 Library Release
- enterprise nlp bump to 5.1.2
- open source nlp bump to 5.1.2
- nlu bump to 5.0.4rc2
- support for deploying endpoints with GPU infrastructure in Databricks via the `workload_type` parameter in `nlp.query_and_deploy`
- yarn mode support for EMR configs
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.6 Library Release
- bump visual NLP to 5.0.2
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.5 Library Release
- bump NLU to 5.0.3
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.4 Library Release
- upgrade NLU to 5.0.2
- remove pandas >=2 downgrade for databricks clusters
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.3 Library Release
- Fix updating Databricks clusters
- nlp.install(med_license=...) should work without AWS keys for floating licenses
- Add `nlp.install_to_databricks` and add a deprecation warning for nlp.install() when creating a new Databricks cluster. Will be dropped next release
- Fixed pandas to 1.5.3 for newly created Databricks clusters until NLU supports pandas>=2
- New `parameters` parameter in `nlp.run_in_databricks` for parameterizing submitted Databricks jobs, and new documentation
- New parameter `extra_pip_installs`, which can be used to install additional PyPI dependencies when creating a Databricks cluster or installing to an existing cluster
Example of `extra_pip_installs`:
```python
nlp.install_to_databricks(
    databricks_cluster_id=cluster_id,
    databricks_host=host,
    databricks_token=token,
    extra_pip_installs=["farm-haystack==1.21.2", "langchain"],
)
```
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.2 Library Release
- bump Healthcare NLP to 5.1.1
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.1 Library Release
- bump Enterprise NLP to 5.1.1
- bump Healthcare NLP to 5.1.1
- support for submitting Jupyter notebooks in `nlp.run_in_databricks` and new docs for notebook submission
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.0 Library Release
- bump Enterprise NLP to 5.1.0
- bump Healthcare NLP to 5.1.0
- bump Visual NLP to 5.0.1
- AWS EMR auto install & utilities see EMR cluster creation notebook and EMR Workshop and John Snow Labs EMR Docs
- AWS GLUE auto install & utilities; see the GLUE cluster creation notebook, the GLUE Workshop, and the John Snow Labs GLUE Docs
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.8 Library Release
nlp.query_and_deploy_if_missing() has been upgraded with powerful new features!
- support for `gpu` jar injection into endpoint containers
- support for all parameters of model.predict():
| Parameter | Description |
|-----------------------------|----------------------------------------------------------------------------------------------------|
| output_level | One of token, chunk, sentence, relation, document to shape outputs |
| positions | Set True/False to include or exclude character index position of predictions |
| metadata | Set True/False to include additional metadata |
| drop_irrelevant_cols | Set True/False to drop irrelevant columns |
| get_embeddings | Set True/False to include embedding or not |
| keep_stranger_features | Set True/False to return columns not named "text", "image" or "file_type" from your input data |
| multithread | Set True/False to use multi-Threading for inference. Auto-inferred if not set |
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.7 Library Release
Hotfix for bad package
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.6 Library Release
- `clean_cluster` in nlp.install() to clean the Databricks cluster before installing johnsnowlabs software. Default True
- `write_db_credentials` in nlp.install() to write the Databricks host and access token into env variables, which will be used for endpoint creation. Default True
- fixed bug which caused the visual library to be installed to Databricks clusters even if visual=False
- updated documentation
- New powerful 1-liner `nlp.query_and_deploy_if_missing()`, which deploys a John Snow Labs model as a Databricks Serving endpoint and queries it. If the model is already deployed, it will not be deployed again. For more details, see the documentation.
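The "deploy only if missing" behavior boils down to a simple idempotency guard, sketched here with hypothetical names:

```python
def deploy_if_missing(name: str, registry: set, deploy) -> bool:
    """Deploy only when the endpoint is not already registered;
    returns True if a deployment actually happened."""
    if name in registry:
        return False
    deploy(name)
    registry.add(name)
    return True
```

Calling it twice with the same name triggers exactly one deployment.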
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.5 Library Release
- new `block_till_complete` parameter in `nlp.run_in_databricks` and logging of the Databricks task URL for monitoring
- optimized Databricks configs for Visual NLP clusters
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.4 Library Release
- bump spark-nlp to 5.0.2
- bump healthcare to 5.0.2
- bump ocr to 5.0.0
- bump nlu to 5.0.0
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.1 Library Release
- bump spark-nlp to 5.0.1
- bump healthcare to 5.0.1
- bump ocr to 4.4.4
- `nlp.install(hardware_target='m1')` is now `nlp.install(hardware_target='apple_silicon')`
- `nlp.start(hardware_target='m1')` is now `nlp.start(hardware_target='apple_silicon')`
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.0 Library Release
We are very excited to announce John Snow Labs 5.0.0 has been released.
- bump spark-nlp to 5.0.0
- bump enterprise-nlp to 5.0.0
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.11 Library Release
Hotfix: pin pydantic version to 1.10.11 because of a validation bug
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.10 Library Release
Add ChunkFiltererApproach to finance, legal and medical modules
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.9 Library Release
- bump Spark-NLP to 4.4.4
- bump Enterprise-NLP to 4.4.4
- bump Visual-NLP to 4.4.3
Improved nlp.install():
When providing credentials with outdated secrets for the library versions, they will automatically be upgraded to the latest recommended versions, as long as you have a valid license and settings.enforce_versions=True
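The upgrade decision can be pictured as a simple version comparison (a sketch with a hypothetical helper; the real logic lives inside nlp.install()):

```python
def should_upgrade(secret_version: str, recommended: str, enforce_versions: bool = True) -> bool:
    """An outdated secret version is upgraded to the recommended release
    only while version enforcement is enabled."""
    if not enforce_versions:
        return False
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(secret_version) < parse(recommended)
```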
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.8 Library Release
- upgrade NLU to 4.2.2
- support for `JOHNSNOWLABS_LICENSE_JSON` as a single env variable to provide credentials. This is the raw JSON string of your license file.
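One way to populate that variable from a license file on disk (a hedged sketch; the env var name comes from the release note, the helper itself is hypothetical):

```python
import json
import os

def export_license(license_path: str) -> None:
    """Load a John Snow Labs license file and expose its raw JSON string
    via the single JOHNSNOWLABS_LICENSE_JSON env variable."""
    with open(license_path) as fh:
        creds = json.load(fh)  # fails early on malformed JSON
    os.environ["JOHNSNOWLABS_LICENSE_JSON"] = json.dumps(creds)
```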
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.7 Library Release
- bump visual NLP to 4.4.2
- bump enterprise NLP to 4.4.3
- bump NLU to 4.2.1
Fixes https://github.com/JohnSnowLabs/johnsnowlabs/issues/333 and partially https://github.com/JohnSnowLabs/johnsnowlabs/issues/348
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.6 Library Release
We are very excited to announce johnsnowlabs 4.4.6 has been released!
Features :
- create_jsl_home_if_missing parameter added to nlp.start(), which can be set to False to disable the creation of the ~/.johnsnowlabs directory. This is useful when jars are provided directly via the jar_paths parameter.
- Dynamic wheel resolution for Spark NLP, enabling you to set settings.nlp_version='4.4.2'; it will automatically use the appropriate jars and wheels when starting a session or building an environment.
- Fixed erroneous handling of enterprise secrets which have the `<VERSION>.<PR-NUM>.<COMMIT_HASH>` pattern
- Bump Enterprise NLP version to 4.4.2
- Bump OCR version to 4.4.1
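The secret shape mentioned above can be validated with a small regex sketch (a hypothetical helper; the exact commit-hash alphabet and the example values are assumptions):

```python
import re

# <VERSION>.<PR-NUM>.<COMMIT_HASH>, e.g. "4.4.2.1234.ab12cd3"
SECRET_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)\.(\d+)\.([0-9a-fA-F]+)$")

def parse_secret(secret: str):
    """Split an enterprise secret into (version, pr_number, commit_hash),
    or return None when it does not match the pattern."""
    match = SECRET_PATTERN.match(secret)
    if match is None:
        return None
    version, pr_num, commit = match.groups()
    return version, int(pr_num), commit
```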
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.5 Library Release
bump enterprise NLP to 4.4.1
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.4 Library Release
- bugfix in databricks installation
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.3 Library Release
- Bump Spark-NLP to 4.4.1, introducing `DistilBertForZeroShotClassification` to the nlp module
- Fix type hint bug
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.2 Library Release
Open Source NLP:
- Bump version to 4.4.0
- Support for ConvNextForImageClassification, BartTransformer, BertForZeroShotClassification
Enterprise NLP:
- Bump version to 4.4.0
- Support for QuestionAnswering, TextGenerator, Summarizer, and WindowedSentenceModel in Finance, Legal and Medical modules
Visual NLP:
- PretrainedPipeline support via visual.PretrainedPipeline
General:
Enhanced backward compatibility mechanisms which ensure all available classes, functions and submodules are usable for the finance, legal, and medical modules, even if an outdated version of Enterprise NLP with breaking changes is installed.
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.1 Library Release
Downgrade Enterprise and Spark NLP to 4.3.2 for the workshop
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.0 Library Release
Open Source NLP:
- Bump version to 4.4.0
- Support for ConvNextForImageClassification, BartTransformer, BertForZeroShotClassification
Enterprise NLP:
- Bump version to 4.4.0
- Support for QuestionAnswering, TextGenerator, Summarizer, WindowedSentenceModel in Finance, Legal and Medical modules
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.5 Library Release
- fix bug that caused failure when installing enterprise-nlp 4.3.2
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.4 Library Release
- Better Secret Handling for Databricks
- Bump NLU Version to 4.2.0
- Bump NLP Version to 4.3.2
- Bump Enterprise NLP Version to 4.3.2
- Bump Visual Version to 4.3.3
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.3 Library Release
- Bump Visual NLP to 4.3.1
- Access to `visual.LightPipeline`
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.2 Library Release
- Bump NLU version
- Update client ID to dedicated jsl-lib secret for OAuth
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.1 Library Release
- Hotfix for bug that causes pip install to fail because of dependency conflicts from NLU
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.0 Library Release
- Bump Enterprise NLP and open source NLP to 4.3.0
- Generic Log Reg and Generic SVM available for the `finance`, `legal` and `medical` modules
- HuBERT, Swin Transformer, Zero-Shot NER, CamemBERT for QA for the `nlp` module
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.9 Library Release
- New TextSplitter annotator for Finance & Legal, which is just a use-case focused alias for `SentenceDetector`
- Fix bug with NLP module not properly refreshing attached classes after running `nlp.install()`
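The alias relationship can be pictured with a toy sketch (placeholder classes, not the real annotators): an alias adds a use-case focused name without changing behavior.

```python
class SentenceDetector:
    """Toy stand-in for the real annotator: splits text at sentence ends."""
    def split(self, text):
        return [s.strip() for s in text.split('.') if s.strip()]

# TextSplitter is the same component under a use-case focused name
TextSplitter = SentenceDetector

print(TextSplitter().split('Clause one. Clause two.'))  # ['Clause one', 'Clause two']
```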
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.5 Release
- Bump Visual NLP to 4.2.4
- Bump Enterprise NLP to 4.2.5
- Training log parser
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.3 Release
- version bumps
- docstring updates
- new nlp annotators from enterprise 4.2.1
https://github.com/JohnSnowLabs/johnsnowlabs/pull/12
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.4 Release
- Bump nlp-enterprise version
- Fix bad import mapping for some legal annotators
- Fix bug with setting license env variables on Databricks
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.8 Library Release
- DocMapper, DocMapperApproach, DocObfuscator, DocMlClassifier, Resolution2Chunk, ContextParserModel for finance, legal and healthcare
- Upgrade Enterprise NLP to 4.2.8
- Upgrade Open Source NLP to 4.2.8
- Upgrade Visual NLP to 4.3.0
- Better error messages when importing modules fails
- Improved various other error messages
- Fix bug causing a dependency on nbformat
- Fix bugs with incorrect path handling on Windows
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.2 - Spark NLP version Bump to 4.2.1
We are glad to announce johnsnowlabs 4.2.2 has been released!
Changes :
- Version Bump Spark NLP to 4.2.1
- Fix minor bug with type conversion during PyPI standard install
- Python
Published by C-K-Loan over 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.1 Release - No more restarts required after installing licensed libs
We are pleased to announce that version 4.2.1 of the johnsnowlabs library has been released!
It comes with one crucial improvement:
No more notebook restarts required after running jsl.install()
- Python
Published by C-K-Loan over 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.0 Release
We are announcing with incredible excitement the release of the John Snow Labs 4.2.0 library! It introduces:
- New Enterprise Syntax to easily access any feature of any JSL library.
- Highly configurable automatic installers with various authorization flows and installation targets, like 1-click OAuth, 1-line Databricks, 1 line for a new enterprise-compatible venv, and extended offline support.
- Easily run a Python function, raw Python code snippet, Python script or Python module on a Databricks cluster in 1 line of code, and create a cluster if one is missing.
- Smart license/jar/wheel caching: never type your license twice on the same machine when starting a SparkSession or re-installing licensed libs!
- Various safety mechanisms added and footguns removed, to reduce injuries :)
Introducing the new Enterprise Syntax for working with all of John Snow Labs libraries.
It bundles every relevant Function and Class you might ever need when working with JSL-Libraries into 1 simple import line.
from johnsnowlabs import *
This single import gets you through all of the certification notebooks, with the exception of a few third-party libraries.
The following modules will become available (links to the existing products; see Usage & Overview for more details on the import structure):
- `nlp.MyAnno()` and `nlp.my_function()` for every one of Spark NLP's Python functions/classes/modules
- `ocr.MyAnno()` and `ocr.my_function()` for every one of Spark OCR's Python functions/classes/modules
- `legal.MyAnno()` and `legal.my_function()` for every one of Spark for Legal's Python functions/classes/modules
- `finance.MyAnno()` and `finance.my_function()` for every one of Spark for Finance's Python functions/classes/modules
- `medical.MyAnno()` and `medical.my_function()` for every one of Spark for Medical's Python functions/classes/modules
- `viz.MyVisualizer()` for every one of Spark NLP-Display's classes
- `jsl.load()` and `jsl.viz()` from NLU
New Powerful Installation and Spark Session Start
The John Snow Labs library aims to make installing licensed libraries and starting a Spark session as easy as possible. See the Installation Docs & Launch a Spark Session Docs.
`jsl.install()` - Authorization Flows (prove you have a License):
- Auto-Detect Environment Variables
- Auto-Detect license files in the current working dir
- Auto-Detect cached license information that was stored in `~/.johnsnowlabs` from previous runs
- Auto-Inject Local Browser Based OAuth
- Auto-Inject Colab Button Based OAuth
- Manual Variable Definition
- Manual JSON Path
- Access Token
- Installation Targets (where to install to?):
- Currently running Python process
- Into a Python environment which is not the currently running process
- Into a provided venv
- Into a venv freshly created by the John Snow Labs library
- Airgap, by creating an easily copy-pastable zip file with all jars/wheels/licenses to run in an airgapped environment
- Databricks
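As a rough sketch of what resolving these targets involves (hypothetical helper and target names, not the library's actual logic), each install target ultimately maps to a different pip invocation:

```python
def pip_command(target, wheels):
    """Hypothetical sketch: build the pip command for an install target.
    'current' = the currently running Python process;
    'venv:<path>' = a provided or freshly created venv."""
    if target == 'current':
        return ['pip', 'install'] + wheels
    if target.startswith('venv:'):
        venv_path = target.split(':', 1)[1]
        return [venv_path + '/bin/pip', 'install'] + wheels
    raise ValueError(f'unknown install target {target!r}')

print(pip_command('current', ['spark_nlp.whl']))
print(pip_command('venv:/opt/env', ['spark_nlp.whl']))
```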
`jsl.start()`
- After having run `jsl.install()` you can just run `jsl.start()`. It remembers the license that was used to install and also has all jars pre-downloaded. Additionally, it gives very helpful logs when launching a session, telling you the loaded jars and their versions. You can even load a new license during `jsl.start()`, which supports all of the previously mentioned authorization flows.
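The caching behavior can be sketched generically (a hypothetical helper, not the library's implementation): reuse credentials cached by a previous run, otherwise fetch and cache them.

```python
import json
import pathlib
import tempfile

def load_or_cache_license(cache_dir, fetch_license):
    """Return the license cached by a previous run if present,
    otherwise call fetch_license() once and cache the result."""
    cache_file = pathlib.Path(cache_dir) / 'license.json'
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    license_data = fetch_license()
    cache_file.write_text(json.dumps(license_data))
    return license_data

with tempfile.TemporaryDirectory() as cache_dir:
    first = load_or_cache_license(cache_dir, lambda: {'secret': 'abc'})
    # Second call hits the cache; the new fetch function is never used
    second = load_or_cache_license(cache_dir, lambda: {'secret': 'xyz'})
    print(second['secret'])  # 'abc', reused from the first run
```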
License Management
List all of your usable jsl licenses with jsl.list_remote_licenses()
And your locally cached licenses with jsl.list_local_licenses()
Databricks Utils
Easily submit any task to a Databricks cluster, in various formats, see Utils for databricks Docs
Run a raw Python code string in a cluster, and also create one on the fly.
```python
from johnsnowlabs import *

script = """
import nlu
print(nlu.load('sentiment').predict('That was easy!'))
"""

cluster_id = jsl.install(json_license_path=my_license,
                         databricks_host=my_host,
                         databricks_token=my_token)
jsl.run_in_databricks(script,
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='Python Code String Example')
```
Run a Python function in a cluster.
```python
def my_function():
    import nlu
    medical_text = """A 28-year-old female with a history of gestational diabetes
    presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""
    df = nlu.load('en.med_ner.diseases').predict(medical_text)
    for c in df.columns:
        print(df[c])

# my_function will run on Databricks
jsl.run_in_databricks(my_function,
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='Function test')
```
Run a Python script in a cluster.
```python
jsl.run_in_databricks('path/to/my/script.py',
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='Script test')
```
Run a Python module in a cluster.
```python
import johnsnowlabs.auto_install.health_checks.nlp_test as nlp_test

jsl.run_in_databricks(nlp_test,
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='nlp_test')
```
Testing Utils
You can use the John Snow Labs library to automatically test 10000+ models and 100+ Notebooks in 1 line of code within a small machine like a single Google Colab Instance and generate very handy error reports of potentially broken Models, Notebooks or Models hub Markdown Snippets.
Automatically test Notebooks/Modelshub Markdown via URL, file path and many more options!
Workshop Notebook Testing Utils
See Utils for Testing Notebooks docs
```python
from johnsnowlabs.utils.notebooks import test_ipynb

# Test a local notebook file
test_ipynb('path/to/local/notebook.ipynb')

# Test a notebook via URL
test_ipynb('https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb')

# Test a folder of notebooks and generate a report file which captures all stderr/stdout
test_ipynb('my/notebook/folder')

# Test an array of URLs/paths to notebooks
test_ipynb(['https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb',
            'path/to/local/notebook.ipynb'])

# Run ALL notebooks in the Certification folder
test_result = test_ipynb('WORKSHOP')

# Only run Finance notebooks
test_result = test_ipynb('WORKSHOP-FIN')

# Only run Legal notebooks
test_result = test_ipynb('WORKSHOP-LEG')

# Only run Medical notebooks
test_result = test_ipynb('WORKSHOP-MED')

# Only run Open Source notebooks
test_result = test_ipynb('WORKSHOP-OS')
```
Modelshub Testing Utils
See Utils for Testing Models & Modelshub Markdown Snippets Docs
```python
from johnsnowlabs.utils.modelhub_markdown import test_markdown

# Test a local Markdown file with a Python snippet
test_markdown('path/to/my/file.md')

# Test a Modelshub Python Markdown snippet via URL
test_markdown('https://nlp.johnsnowlabs.com/2022/08/31/legpipe_deid_en.html')

# Test a folder of Markdown snippets and generate a report file which captures all stderr/stdout
test_markdown('my/markdown/folder')

# Test an array of URLs/paths to Markdown files
test_markdown(['legpipe_deid_en.html',
               'path/to/local/markdown_snippet.md'])
```
New Documentation Pages
- Installation
- Launch a Spark Session
- Usage and imports overview
- Settings & Caching
- Utils for databricks
- Utils for Testing Notebooks
- Utils for Testing Models & Modelshub Markdown Snippets
- Release Notes
- Python
Published by C-K-Loan over 3 years ago