Recent Releases of https://github.com/johnsnowlabs/johnsnowlabs
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.1.0 Release
John Snow Labs 6.1.0 has been released with the following upgrades:
- Bump Enterprise-NLP to 6.1.0
- Bump Visual-NLP to 6.1.0
- Bump Spark-NLP to 6.1.1
- Python
Published by C-K-Loan 6 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.4 Release
John Snow Labs 6.0.4 comes with the following upgrades:
- Bump Enterprise-NLP to 6.0.4
- Bump Spark-NLP to 6.0.4
- Python
Published by C-K-Loan 7 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.3 Release
This release comes with improvements to the Snowflake and Docker integrations, enabling you to deploy custom pipelines, any pretrained John Snow Labs pipeline, or any NLU pipeline as a Snowflake UDF or Docker container. Additionally, Spark-NLP and Enterprise-NLP are bumped to version 6.0.3.
Improvements to Docker integration
Instead of providing an NLU model reference for deploying a model, users now have two options:
Deploy Pipeline via Name, Language and Bucket
```python
nlp.build_image(pipeline_name=pipe_name, pipeline_language=pipe_lang, pipeline_bucket=pipe_bucket)
```
Deploy custom pipeline by providing a fitted pipeline object
```python
custom_pipe = nlp.Pipeline(stages=[...]).fit(df)
nlp.build_image(custom_pipe=custom_pipe)
```
**Deploy NLU pipelines as container**
Any NLU pipeline can still be deployed as a custom pipe:
```python
# Load model
pipe = nlp.load(model_nlu_ref)
# Predict so that under the hood the pipeline is fitted and the vanilla_transformer_pipe attribute is available
pipe.predict('')
# Now we can just handle it like a custom pipeline
nlp.build_image(custom_pipe=pipe.vanilla_transformer_pipe)
```
Improvements to Snowflake integration
- Docs have been updated to reflect changes
- Compute pool parameters `compute_pool_min_nodes`, `compute_pool_max_nodes`, and `compute_pool_instance_family` can now be configured in `nlp.snowflake_common_setup` with values defined in the Snowflake documentation
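As a rough illustration, a pure-Python sanity check of those compute-pool values before they are handed to the setup call (the `validate_compute_pool` helper and the instance-family value are hypothetical; valid families are defined in the Snowflake documentation):

```python
def validate_compute_pool(min_nodes: int, max_nodes: int, instance_family: str) -> dict:
    """Basic sanity checks on the compute-pool parameters before they are
    passed on; valid instance families are defined by Snowflake."""
    if min_nodes < 1 or max_nodes < min_nodes:
        raise ValueError("need 1 <= compute_pool_min_nodes <= compute_pool_max_nodes")
    return {
        "compute_pool_min_nodes": min_nodes,
        "compute_pool_max_nodes": max_nodes,
        "compute_pool_instance_family": instance_family,
    }
```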
Since the Snowflake integration uses Docker, all of the changes to the Docker utilities are reflected in Snowflake as well.
Version Bumps
The following dependency versions have been bumped:
- Spark NLP to 6.0.3
- Enterprise NLP to 6.0.3
Bug Fixes
- Fixed bug causing errors when creating a Databricks cluster with `nlp.install_to_databricks()`
- Python
Published by C-K-Loan 8 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.2 Release
John Snow Labs 6.0.2 comes with the following upgrades:
- Bump Enterprise-NLP to 6.0.2
- Python
Published by C-K-Loan 9 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.1 Release
John Snow Labs 6.0.1 comes with the following upgrades:
- Bump Spark-NLP to 6.0.1
- Bump Enterprise-NLP to 6.0.1
- Python
Published by C-K-Loan 9 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 6.0.0 Release
John Snow Labs 6.0.0 comes with the following upgrades:
- Bump Spark-NLP to 6.0.0
- Bump Enterprise-NLP to 6.0.0
- Bump Visual-NLP to 6.0.0
Additionally, a few bugfixes and improvements have been made:
- Deprecated pkg_resources usage and refactored modules depending on it, which caused import errors in some Python versions
- Made the import of _shared_pyspark_ml_param more robust, which previously caused import errors in some Python versions
- Python
Published by C-K-Loan 10 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.5 Release
John Snow Labs 5.5.5 comes with the following upgrades:
- Bump Medical NLP to 5.5.3
- Bump Spark-NLP to 5.5.3
- Python
Published by C-K-Loan 11 months ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.4 Release
John Snow Labs 5.5.4 comes with the following upgrades and fixes:
- Bump Visual NLP to 5.5.0
- Fix bug causing browser-based install to fail if you only have PAYG licenses in my.jsl
- Python
Published by C-K-Loan about 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.3 Release
John Snow Labs 5.5.3 comes with the following upgrades:
- Bump Enterprise NLP to 5.5.2
- Bump Spark NLP to 5.5.2
- Python
Published by C-K-Loan about 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.2 Release
John Snow Labs 5.5.2 releases with the following upgrades:
- Bump Spark-NLP to 5.5.1
- Bump Enterprise NLP to 5.5.1
- Bump Visual NLP to 5.4.2
- Python
Published by C-K-Loan about 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.1 Release
John Snow Labs 5.5.1 comes with various improvements to the Snowflake utilities and docs.
- New Snowflake Tutorial Notebook for creating Snowflake UDFs and calling them
- Updated Snowflake Utility Docs
- Automated Login to Docker Repo while creating Snowflake UDF
- `nlp.deploy_as_snowflake_udf` now blocks until the Snowflake UDF is successfully created or a timeout occurs
- Less verbose logging when creating a Snowflake UDF
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.5.0 Release
John Snow Labs 5.5.0 comes with the following upgrades:
- Bump Enterprise NLP to 5.5.0
- Bump Spark NLP to 5.5.0
- Support for Pydantic >= 2.0
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.5 Release
John Snow Labs 5.4.5 comes with the following upgrades:
- Bump Visual NLP to 5.4.1
- Bump NLU to 5.4.1
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.4 Release
John Snow Labs 5.4.4 is a hotfix release for import issues with Healthcare NLP 5.4.1
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.3 Release
John Snow Labs 5.4.3 has been released with the following upgrades:
- Bump Spark NLP to 5.4.1
- Bump Healthcare NLP to 5.4.1
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.2 Release
- Bugfix for creating Docker images via `nlp.build_image`
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.1 Release
John Snow Labs 5.4.1 has been released with the following changes:
- Bump NLU to 5.4.1rc1
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.4.0 Release
We are very excited to announce that John Snow Labs 5.4.0 has been released; the entire JSL suite has been upgraded to 5.4.0!
Changes:
- Bump Spark NLP to 5.4.0
- Bump Healthcare NLP to 5.4.0
- Bump Visual NLP to 5.4.0
- Bump NLU to 5.4.0
- Python
Published by C-K-Loan over 1 year ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.6 Library Release
- bump NLU to 5.3.2
- Fix detection of the Databricks environment (https://github.com/JohnSnowLabs/johnsnowlabs/pull/1222) and follow-up bugs in endpoint environments
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.5 Library Release
We are very excited to announce John Snow Labs 5.3.5 has been released! It features:
- bump NLU to 5.3.1
- bump Spark-NLP to 5.3.2
- bump Medical NLP to 5.3.2
- bump Visual NLP to 5.3.2
- fixed bug that caused nbformat import exceptions
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.4 Library Release
John Snow Labs 5.3.4 includes:
- Bump Visual NLP to 5.3.1
- Bump Medical NLP to 5.3.1
- Fix bug with test paths
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - Snowflake UDFs and Docker Image creation for NLP, Healthcare and Visual Models in John Snow Labs 5.3.3
John Snow Labs 5.3.3 has been released with support for creating Snowflake UDFs from John Snow Labs models, as well as easy containerization of any John Snow Labs model, bundled with a simple REST API that supports text and file prediction for any file type supported by nlp.load(), like PDFs, images, and more!
For more info, see the new official documentation pages: the Docker utility documentation and the Snowflake utility documentation.
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.1 Library Release
- fix bug causing unexpected browser pop-up during install
- fix bug causing databricks install to fail
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.1 Library Release
- bump spark-nlp to 5.3.1
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.3.0 Library Release
We are excited to announce johnsnowlabs 5.3.0 has been released!
It features:
- Bump Spark NLP to 5.3.0
- Bump Medical NLP to 5.3.0
- Bump Visual NLP to 5.3.0
- Bump NLU to 5.3.0
- Bump Spark NLP Display to 5.0.0
- Bugfixes for installing to existing Databricks clusters
- Python
Published by C-K-Loan almost 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.8 Library Release
Fix bug with pip install
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.7 Library Release
- Bump pyspark to 3.4.0
- NerConverterInternal accessible via medical.NerConverter, finance.NerConverter, legal.NerConverter
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.6 Library Release
Fix minor bugs on Databricks marketplace causing some models to load improperly
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.5 Library Release
Fix minor bugs on Databricks marketplace causing some models to load improperly
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.4 Library Release
Fix bug on marketplace when providing a Databricks host URL with a trailing comma
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.3 Library Release
New Databricks Model Marketplace Utils and Notebook updates and tweaks
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.2 Library Release
New Databricks Model Marketplace Utils and Notebook
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.1 Library Release
- bump Spark NLP to 5.2.2
- bump Enterprise NLP to 5.2.1
- bump NLU to 5.1.3
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.2.0 Library Release
- bump Spark NLP to 5.2.0
- bump Enterprise NLP to 5.2.0
- bump Visual NLP to 5.1.2
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.9 Library Release
- add IOB tagger to legal, finance and medical modules
- bugfix for missing metadata in haystack
- InternalDocumentSplitter for legal, finance and medical modules
- bump Visual NLP to 5.1.0
- bump Medical NLP to 5.1.4
- Python
Published by C-K-Loan about 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.8 Library Release
Johnsnowlabs Haystack Integrations
Johnsnowlabs provides the following nodes, which can be used inside the Haystack framework for scalable pre-processing and embedding on Spark clusters. With this you can create easy, scalable, production-grade LLM and RAG applications. See the Haystack with Johnsnowlabs Tutorial Notebook and the new Haystack + Johnsnowlabs documentation.
JohnSnowLabsHaystackProcessor
Pre-process your documents in a scalable fashion in Haystack. It is based on Spark-NLP's DocumentCharacterTextSplitter and supports all of its parameters.
```python
# Create pre-processor which is connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval

processor = embedding_retrieval.JohnSnowLabsHaystackProcessor(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process documents distributed on a Spark cluster
processor.process(some_documents)
```
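To make the chunking parameters concrete, here is a minimal pure-Python sketch of character-window splitting with overlap (an illustration only, not the library's implementation):

```python
def split_text(text: str, chunk_size: int = 20, chunk_overlap: int = 2) -> list:
    """Greedy character splitter: fixed-size windows that share chunk_overlap
    characters with the previous window."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must be larger than chunk_overlap")
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

With `chunk_size=4` and `chunk_overlap=1`, `"abcdefghij"` yields `["abcd", "defg", "ghij"]`; each chunk repeats the last character of the previous one.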
JohnSnowLabsHaystackEmbedder
Scalable embedding computation with any sentence embedding from John Snow Labs in Haystack. You must provide the NLU reference of a sentence embedding model to load it. If you want to use a GPU with the embedding model, set use_gpu=True; on localhost, this will start a Spark session with GPU jars. For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.
```python
from johnsnowlabs.llm import embedding_retrieval
from haystack.document_stores import InMemoryDocumentStore

# Write some processed data to the doc store, so we can retrieve it later
document_store = InMemoryDocumentStore(embedding_dim=512)
document_store.write_documents(some_documents)

# Create embedder which is connected to the Spark cluster
retriever = embedding_retrieval.JohnSnowLabsHaystackEmbedder(
    embedding_model='en.embed_sentence.bert_base_uncased',
    document_store=document_store,
    use_gpu=False,
)
# Compute embeddings distributed in a cluster
document_store.update_embeddings(retriever)
```
Johnsnowlabs Langchain Integrations
Johnsnowlabs provides the following components, which can be used inside the Langchain framework for scalable pre-processing and embedding on Spark clusters, as agent tools and pipeline components. With this you can create easy, scalable, production-grade LLM and RAG applications. See the Langchain with Johnsnowlabs Tutorial Notebook and the new Langchain + Johnsnowlabs documentation.
JohnSnowLabsLangChainCharSplitter
Pre-process your documents in a scalable fashion in Langchain. It is based on Spark-NLP's DocumentCharacterTextSplitter and supports all of its parameters.
```python
from langchain.document_loaders import TextLoader
from johnsnowlabs.llm import embedding_retrieval

loader = TextLoader('/content/state_of_the_union.txt')
documents = loader.load()

# Create pre-processor which is connected to the Spark cluster
jsl_splitter = embedding_retrieval.JohnSnowLabsLangChainCharSplitter(
    chunk_overlap=2,
    chunk_size=20,
    explode_splits=True,
    keep_seperators=True,
    patterns_are_regex=False,
    split_patterns=["\n\n", "\n", " ", ""],
    trim_whitespace=True,
)
# Process documents distributed on a Spark cluster
pre_processed_docs = jsl_splitter.split_documents(documents)
```
JohnSnowLabsLangChainEmbedder
Scalable embedding computation with any sentence embedding from John Snow Labs.
You must provide the NLU reference of a sentence embedding model to load it.
You can start a Spark session by setting hardware_target to one of cpu, gpu, apple_silicon, or aarch in localhost environments.
For clusters, you must set up the cluster environment correctly; using nlp.install_to_databricks() is recommended.
```python
# Create embedder which is connected to the Spark cluster
from johnsnowlabs.llm import embedding_retrieval
embeddings = embedding_retrieval.JohnSnowLabsLangChainEmbedder(
    'en.embed_sentence.bert_base_uncased',
    hardware_target='cpu',
)

# Compute embeddings distributed
from langchain.vectorstores import FAISS
retriever = FAISS.from_documents(pre_processed_docs, embeddings).as_retriever()

# Create a tool
from langchain.agents.agent_toolkits import create_retriever_tool
tool = create_retriever_tool(
    retriever,
    "search_state_of_union",
    "Searches and returns documents regarding the state-of-the-union.",
)

# Create an LLM agent with the tool
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(openai_api_key='YOUR_API_KEY')
agent_executor = create_conversational_retrieval_agent(llm, [tool], verbose=True)
result = agent_executor({"input": "what did the president say about going to east of Columbus?"})
result['output']
```
Example output:
```
> Entering new AgentExecutor chain...
Invoking: `search_state_of_union` with `{'query': 'going to east of Columbus'}`
[Document(page_content='miles east of', metadata={'source': '/content/state_of_the_union.txt'}),
 Document(page_content='in America.', metadata={'source': '/content/state_of_the_union.txt'}),
 Document(page_content='out of America.', metadata={'source': '/content/state_of_the_union.txt'}),
 Document(page_content='upside down.', metadata={'source': '/content/state_of_the_union.txt'})]
I'm sorry, but I couldn't find any specific information about the president's statement regarding going to the east of Columbus in the State of the Union address.
> Finished chain.
```
nlp.deploy_endpoint and nlp.query_endpoint
You can query and deploy John Snow Labs models as Databricks Model Serving endpoints with one line of code.
Data is passed to the predict() function and predictions are shaped accordingly.
You must create endpoints from a Databricks cluster created by nlp.install.
See Cluster Creation Notebook
and Databricks Endpoint Tutorial Notebook
These functions deprecate nlp.query_and_deploy_if_missing, which will be dropped in John Snow Labs 5.2.0.
```python
# You need mlflow_by_johnsnowlabs installed until the next mlflow is released
! pip install mlflow_by_johnsnowlabs
from johnsnowlabs import nlp
nlp.deploy_endpoint('bert')
nlp.query_endpoint('bert_ENDPOINT', 'My String to embed')
```
nlp.deploy_endpoint will register an MLflow model into your registry and deploy an endpoint with a JSL license.
It has the following parameters:
| Parameter | Description |
|-----------|-------------|
| `model` | Model to be deployed as an endpoint, which is converted into an NluPipeline. Supported classes: a string reference to an NLU pipeline name like 'bert', NLUPipeline, List[Annotator], Pipeline, LightPipeline, PretrainedPipeline, PipelineModel. In the case of an NLU reference, the endpoint name is auto-generated as `<nlu_ref>_ENDPOINT`, i.e. bert_ENDPOINT. '.' is replaced with '_' in the NLU reference for the endpoint name |
| `endpoint_name` | Name for the deployed endpoint. Optional if using an NLU model reference, but mandatory for custom pipelines |
| `re_create_endpoint` | If False, endpoint creation is skipped if one already exists. If True, it will delete the existing endpoint if it exists |
| `re_create_model` | If False, model creation is skipped if one already exists. If True, the model will be re-logged again, bumping the current version by 2 |
| `workload_size` | One of Small, Medium, Large |
| `gpu` | True/False to load GPU-optimized or CPU-optimized jars in the container. Must use a GPU-based `workload_type` if `gpu=True` |
| `new_run` | If True, mlflow will start a new run before logging the model |
| `block_until_deployed` | If True, this function will block until the endpoint is created |
| `workload_type` | `CPU` by default; use `GPU_SMALL` to spawn a GPU-based endpoint instead. Check the Databricks docs for alternative values |
| `db_host` | The Databricks host URL. If not specified, the DATABRICKS_HOST environment variable is used |
| `db_token` | The Databricks access token. If not specified, the DATABRICKS_TOKEN environment variable is used |
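The endpoint auto-naming rule described for the `model` parameter can be sketched as a small helper (hypothetical code, assuming dots in the NLU reference become underscores):

```python
def endpoint_name_for(nlu_ref: str) -> str:
    """Derive a Databricks endpoint name from an NLU reference:
    replace '.' (not valid in endpoint names) and append the suffix."""
    return nlu_ref.replace('.', '_') + '_ENDPOINT'
```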
nlp.query_endpoint translates your query to JSON, sends it to the endpoint, and returns the result as a pandas DataFrame.
It has the following parameters, which are forwarded to the model.predict() call inside the endpoint:
| Parameter | Description |
|-----------|-------------|
| `endpoint_name` | Name of the endpoint to query |
| `query` | A string, list of strings, or raw JSON string. If raw JSON, `is_json_query` must be True |
| `is_json_query` | If True, query is treated as a raw JSON string |
| `output_level` | One of token, chunk, sentence, relation, document to shape outputs |
| `positions` | Set True/False to include or exclude the character index position of predictions |
| `metadata` | Set True/False to include additional metadata |
| `drop_irrelevant_cols` | Set True/False to drop irrelevant columns |
| `get_embeddings` | Set True/False to include embeddings or not |
| `keep_stranger_features` | Set True/False to return columns not named "text", "image" or "file_type" from your input data |
| `multithread` | Set True/False to use multi-threading for inference. Auto-inferred if not set |
| `db_host` | The Databricks host URL. If not specified, the DATABRICKS_HOST environment variable is used |
| `db_token` | The Databricks access token. If not specified, the DATABRICKS_TOKEN environment variable is used |
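For intuition, the query-to-JSON translation might look like the following pure-Python sketch (the payload shape shown is a hypothetical `dataframe_split` layout, not the library's exact wire format):

```python
import json

def to_endpoint_payload(query, is_json_query: bool = False) -> str:
    """Wrap a string or list of strings as a JSON payload;
    raw JSON strings pass through untouched when is_json_query=True."""
    if is_json_query:
        return query
    rows = [query] if isinstance(query, str) else list(query)
    payload = {"dataframe_split": {"columns": ["text"], "data": [[r] for r in rows]}}
    return json.dumps(payload)
```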
nlp.query_endpoint and nlp.deploy_endpoint check the following mandatory env vars to resolve wheels for endpoints:
| Env Var Name | Description |
|--------------|-------------|
| HEALTHCARE_SECRET | Automatically set on your cluster if you run nlp.install() |
| VISUAL_SECRET | Automatically set if you run nlp.install(..., visual=True). You can only spawn a visual endpoint from a cluster created by nlp.install(..., visual=True) |
| JOHNSNOWLABS_LICENSE_JSON | JSON content of your John Snow Labs license to use for endpoints. Should be an airgap license |
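A quick pre-flight check for those env vars could look like this (the helper name is hypothetical; the variable names are the ones documented above):

```python
import os

def missing_endpoint_secrets(visual: bool = False) -> list:
    """Return the documented env vars that are not set;
    VISUAL_SECRET is only required for visual endpoints."""
    required = ["HEALTHCARE_SECRET", "JOHNSNOWLABS_LICENSE_JSON"]
    if visual:
        required.append("VISUAL_SECRET")
    return [name for name in required if not os.environ.get(name)]
```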
Version Bumps
Bug Fixes & Minor tweaks
- Fixed a bug causing cached jars in the johnsnowlabs home directory on Databricks to not be used
- Fixed bugs with boto3 imports in certain envs
- New parameter `nlp.run_in_databricks(return_job_url=True)` to optionally return the URL of the job
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.7 Library Release
- enterprise nlp bump to 5.1.2
- open source nlp bump to 5.1.2
- nlu bump to 5.0.4rc2
- support for deploying endpoints with GPU infrastructure in Databricks via the `workload_type` parameter in `nlp.query_and_deploy`
- yarn mode support for EMR configs
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.6 Library Release
- bump visual NLP to 5.0.2
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.5 Library Release
- bump NLU to 5.0.3
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.4 Library Release
- upgrade NLU to 5.0.2
- remove pandas >=2 downgrade for databricks clusters
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.3 Library Release
- Fix updating Databricks clusters
- nlp.install(med_license=...) should work without AWS keys for floating licenses
- Add `nlp.install_to_databricks` and add a deprecation warning for nlp.install() when creating a new Databricks cluster. Will be dropped next release
- Fixed pandas to 1.5.3 for newly created Databricks clusters until NLU supports pandas>=2
- New `parameters` parameter in `nlp.run_in_databricks` for parameterizing submitted Databricks jobs, and new documentation
- New parameter `extra_pip_installs`, which can be used to install additional PyPI dependencies when creating a Databricks cluster or installing to an existing cluster
Example of `extra_pip_installs`:
```python
nlp.install_to_databricks(
    databricks_cluster_id=cluster_id,
    databricks_host=host,
    databricks_token=token,
    extra_pip_installs=["farm-haystack==1.21.2", "langchain"],
)
```
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.2 Library Release
- bump Healthcare NLP to 5.1.1
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.1 Library Release
- bump Enterprise NLP to 5.1.1
- bump Healthcare NLP to 5.1.1
- support for submitting Jupyter notebooks in `nlp.run_in_databricks` and new docs for notebook submission
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.1.0 Library Release
- bump Enterprise NLP to 5.1.0
- bump Healthcare NLP to 5.1.0
- bump Visual NLP to 5.0.1
- AWS EMR auto install & utilities see EMR cluster creation notebook and EMR Workshop and John Snow Labs EMR Docs
- AWS GLUE auto install & utilities; see the GLUE cluster creation notebook, the GLUE Workshop, and the John Snow Labs GLUE Docs
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.8 Library Release
nlp.query_and_deploy_if_missing() has been upgraded with powerful new features!
- support for `gpu` jar injection into endpoint containers
- support for all parameters of model.predict():
| Parameter | Description |
|-----------------------------|----------------------------------------------------------------------------------------------------|
| output_level | One of token, chunk, sentence, relation, document to shape outputs |
| positions | Set True/False to include or exclude character index position of predictions |
| metadata | Set True/False to include additional metadata |
| drop_irrelevant_cols | Set True/False to drop irrelevant columns |
| get_embeddings | Set True/False to include embedding or not |
| keep_stranger_features | Set True/False to return columns not named "text", "image" or "file_type" from your input data |
| multithread | Set True/False to use multi-Threading for inference. Auto-inferred if not set |
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.7 Library Release
Hotfix for bad package
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.6 Library Release
- `clean_cluster` in nlp.install() to clean the Databricks cluster before installing johnsnowlabs software. Default True
- `write_db_credentials` in nlp.install() to write the Databricks host and access token into env variables, which will be used for endpoint creation. Default True
- fixed bug which caused the visual library to be installed to Databricks clusters even if visual=False
- updated documentation
- New powerful 1-liner `nlp.query_and_deploy_if_missing()`, which deploys a John Snow Labs model as a Databricks Serving endpoint and queries it. If the model is already deployed, it will not be deployed again. For more details, see the documentation.
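The "deploy only if missing" behavior boils down to a simple idempotency guard, sketched here with hypothetical names:

```python
def deploy_if_missing(name: str, registry: set, deploy) -> bool:
    """Deploy only when the endpoint is not already registered;
    returns True if a deployment actually happened."""
    if name in registry:
        return False
    deploy(name)
    registry.add(name)
    return True
```

Calling it twice with the same name triggers exactly one deployment.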
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.5 Library Release
- new `block_till_complete` parameter in `nlp.run_in_databricks` and logging of the Databricks task URL for monitoring
- optimized Databricks configs for Visual NLP clusters
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.4 Library Release
- bump spark-nlp to 5.0.2
- bump healthcare to 5.0.2
- bump ocr to 5.0.0
- bump nlu to 5.0.0
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.1 Library Release
- bump spark-nlp to 5.0.1
- bump healthcare to 5.0.1
- bump ocr to 4.4.4
- `nlp.install(hardware_target='m1')` is now `nlp.install(hardware_target='apple_silicon')`
- `nlp.start(hardware_target='m1')` is now `nlp.start(hardware_target='apple_silicon')`
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 5.0.0 Library Release
We are very excited to announce John Snow Labs 5.0.0 has been released.
- bump spark-nlp to 5.0.0
- bump enterprise-nlp to 5.0.0
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.11 Library Release
Hotfix: pin pydantic version to 1.10.11 because of a validation bug
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.10 Library Release
Add ChunkFiltererApproach to finance, legal and medical modules
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.9 Library Release
- bump Spark-NLP to 4.4.4
- bump Enterprise-NLP to 4.4.4
- bump Visual-NLP to 4.4.3
Improved nlp.install():
When providing credentials with outdated secrets for the library versions, they will automatically be upgraded to the latest recommended versions, as long as you have a valid license and settings.enforce_versions=True
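The upgrade decision can be pictured as a simple version comparison (a sketch with a hypothetical helper; the real logic lives inside nlp.install()):

```python
def should_upgrade(secret_version: str, recommended: str, enforce_versions: bool = True) -> bool:
    """An outdated secret version is upgraded to the recommended release
    only while version enforcement is enabled."""
    if not enforce_versions:
        return False
    parse = lambda v: tuple(int(part) for part in v.split("."))
    return parse(secret_version) < parse(recommended)
```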
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.8 Library Release
- upgrade NLU to 4.2.2
- support for `JOHNSNOWLABS_LICENSE_JSON` as a single env variable to provide credentials. This is the raw JSON string of your license file.
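One way to populate that variable from a license file on disk (a hedged sketch; the env var name comes from the release note, the helper itself is hypothetical):

```python
import json
import os

def export_license(license_path: str) -> None:
    """Load a John Snow Labs license file and expose its raw JSON string
    via the single JOHNSNOWLABS_LICENSE_JSON env variable."""
    with open(license_path) as fh:
        creds = json.load(fh)  # fails early on malformed JSON
    os.environ["JOHNSNOWLABS_LICENSE_JSON"] = json.dumps(creds)
```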
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.7 Library Release
- bump visual NLP to 4.4.2
- bump enterprise NLP to 4.4.3
- bump NLU to 4.2.1
Fixes https://github.com/JohnSnowLabs/johnsnowlabs/issues/333 and partially https://github.com/JohnSnowLabs/johnsnowlabs/issues/348
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.6 Library Release
We are very excited to announce johnsnowlabs 4.4.6 has been released!
Features :
- create_jsl_home_if_missing parameter added to nlp.start(), which can be set to False to disable the creation of the ~/.johnsnowlabs directory. This is useful when jars are provided directly via the jar_paths parameter.
- Dynamic wheel resolution for Spark NLP, enabling you to set settings.nlp_version='4.4.2'; it will automatically use the appropriate jars and wheels when starting a session or building an environment.
- Fixed erroneous handling of enterprise secrets which have the `<VERSION>.<PR-NUM>.<COMMIT_HASH>` pattern
- Bump Enterprise NLP version to 4.4.2
- Bump OCR version to 4.4.1
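The secret shape mentioned above can be validated with a small regex sketch (a hypothetical helper; the exact commit-hash alphabet and the example values are assumptions):

```python
import re

# <VERSION>.<PR-NUM>.<COMMIT_HASH>, e.g. "4.4.2.1234.ab12cd3"
SECRET_PATTERN = re.compile(r"^(\d+\.\d+\.\d+)\.(\d+)\.([0-9a-fA-F]+)$")

def parse_secret(secret: str):
    """Split an enterprise secret into (version, pr_number, commit_hash),
    or return None when it does not match the pattern."""
    match = SECRET_PATTERN.match(secret)
    if match is None:
        return None
    version, pr_num, commit = match.groups()
    return version, int(pr_num), commit
```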
- Python
Published by C-K-Loan over 2 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.5 Library Release
bump enterprise NLP to 4.4.1
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.4 Library Release
- bugfix in databricks installation
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.3 Library Release
- Bump Spark-NLP to 4.4.1, introducing `DistilBertForZeroShotClassification` to the nlp module
- Fix type hint bug
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.2 Library Release
Open Source NLP:
- Bump version to 4.4.0
- Support for ConvNextForImageClassification, BartTransformer, BertForZeroShotClassification
Enterprise NLP:
- Bump version to 4.4.0
- Support for QuestionAnswering, TextGenerator, Summarizer, and WindowedSentenceModel in Finance, Legal and Medical modules
Visual NLP:
- PretrainedPipeline support via visual.PretrainedPipeline
General:
Enhanced backward compatibility mechanisms which ensure all available classes, functions and submodules are usable for the finance, legal, and medical modules, even if an outdated version of Enterprise NLP with breaking changes is installed.
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.1 Library Release
Downgrade Enterprise and Spark NLP to 4.3.2 for the workshop
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.4.0 Library Release
Open Source NLP:
- Bump version to 4.4.0
- Support for ConvNextForImageClassification, BartTransformer, BertForZeroShotClassification
Enterprise NLP:
- Bump version to 4.4.0
- Support for QuestionAnswering, TextGenerator, Summarizer, WindowedSentenceModel in Finance, Legal and Medical modules
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.5 Library Release
- fix bug that caused failure when installing enterprise-nlp 4.3.2
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.4 Library Release
- Better Secret Handling for Databricks
- Bump NLU Version to 4.2.0
- Bump NLP Version to 4.3.2
- Bump Enterprise NLP Version to 4.3.2
- Bump Visual Version to 4.3.3
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.3 Library Release
- Bump Visual NLP to 4.3.1
- Access to `visual.LightPipeline`
- Python
Published by C-K-Loan almost 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.2 Library Release
- Bump NLU version
- Update client ID to dedicated jsl-lib secret for OAuth
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.1 Library Release
- Hotfix for bug that causes pip install to fail because of dependency conflicts from NLU
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.3.0 Library Release
- Bump Enterprise NLP and open source NLP to 4.3.0
- Generic Log Reg and Generic SVM available for the `finance`, `legal` and `medical` modules
- HuBERT, Swin Transformer, Zero-Shot NER, CamemBERT for QA for the `nlp` module
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.9 Library Release
- New TextSplitter annotator for Finance & Legal, which is just a use-case focused alias for `SentenceDetector`
- Fix bug with NLP module not properly refreshing attached classes after running `nlp.install()`
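The alias relationship can be pictured with a toy sketch (placeholder classes, not the real annotators): an alias adds a use-case focused name without changing behavior.

```python
class SentenceDetector:
    """Toy stand-in for the real annotator: splits text at sentence ends."""
    def split(self, text):
        return [s.strip() for s in text.split('.') if s.strip()]

# TextSplitter is the same component under a use-case focused name
TextSplitter = SentenceDetector

print(TextSplitter().split('Clause one. Clause two.'))  # ['Clause one', 'Clause two']
```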
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.5 Release
- Bump Visual NLP to 4.2.4
- Bump Enterprise NLP to 4.2.5
- Training log parser
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.3 Release
- version bumps
- docstring updates
- new nlp annotators from enterprise 4.2.1
https://github.com/JohnSnowLabs/johnsnowlabs/pull/12
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.4 Release
- Bump nlp-enterprise version
- Fix bad import mapping for some legal annotators
- Fix bug with setting license env variables on Databricks
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.8 Library Release
- DocMapper, DocMapperApproach, DocObfuscator, DocMlClassifier, Resolution2Chunk, ContextParserModel for finance, legal and healthcare
- Upgrade Enterprise NLP to 4.2.8
- Upgrade Open Source NLP to 4.2.8
- Upgrade Visual NLP to 4.3.0
- Better error messages when importing modules fails
- Improved various other error messages
- Fix bug causing a dependency on nbformat
- Fix bugs with incorrect path handling on Windows
- Python
Published by C-K-Loan about 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.2 - Spark NLP version Bump to 4.2.1
We are glad to announce johnsnowlabs 4.2.2 has been released!
Changes :
- Version Bump Spark NLP to 4.2.1
- Fix minor bug with type conversion during PyPI standard install
- Python
Published by C-K-Loan over 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs 4.2.1 Release - No more restarts required after installing licensed libs
We are pleased to announce that version 4.2.1 of the johnsnowlabs library has been released!
It comes with one crucial improvement:
No more notebook restarts required after running jsl.install()
- Python
Published by C-K-Loan over 3 years ago
https://github.com/johnsnowlabs/johnsnowlabs - John Snow Labs Library 4.2.0 Release
We are announcing with incredible excitement the release of the John Snow Labs 4.2.0 library! It introduces:
- New Enterprise Syntax to easily access any feature of any JSL library.
- Highly configurable automatic installers with various authorization flows and installation targets, like 1-click OAuth, 1-line Databricks, 1 line for a new enterprise-compatible venv, and extended offline support.
- Easily run a Python function, raw Python code snippet, Python script or Python module on a Databricks cluster in 1 line of code, and create a cluster if one is missing.
- Smart license/jar/wheel caching: never type your license twice on the same machine when starting a SparkSession or re-installing licensed libs!
- Various safety mechanisms added and footguns removed, to reduce injuries :)
Introducing the new Enterprise Syntax for working with all of John Snow Labs libraries.
It bundles every relevant Function and Class you might ever need when working with JSL-Libraries into 1 simple import line.
from johnsnowlabs import *
This single import gets you through all of the certification notebooks, with the exception of a few third-party libraries.
The following modules will become available (links to the existing products; see Usage & Overview for more details on the import structure):
- `nlp.MyAnno()` and `nlp.my_function()` for every one of Spark NLP's Python functions/classes/modules
- `ocr.MyAnno()` and `ocr.my_function()` for every one of Spark OCR's Python functions/classes/modules
- `legal.MyAnno()` and `legal.my_function()` for every one of Spark for Legal's Python functions/classes/modules
- `finance.MyAnno()` and `finance.my_function()` for every one of Spark for Finance's Python functions/classes/modules
- `medical.MyAnno()` and `medical.my_function()` for every one of Spark for Medical's Python functions/classes/modules
- `viz.MyVisualizer()` for every one of Spark NLP-Display's classes
- `jsl.load()` and `jsl.viz()` from NLU
New Powerful Installation and Spark Session Start
The John Snow Labs library aims to make installing licensed libraries and starting a Spark session as easy as possible. See the Installation Docs & Launch a Spark Session Docs.
`jsl.install()` - Authorization Flows (prove you have a License):
- Auto-Detect Environment Variables
- Auto-Detect license files in the current working dir
- Auto-Detect cached license information that was stored in `~/.johnsnowlabs` from previous runs
- Auto-Inject Local Browser Based OAuth
- Auto-Inject Colab Button Based OAuth
- Manual Variable Definition
- Manual JSON Path
- Access Token
- Installation Targets (where to install to?):
- Currently running Python process
- Into a Python environment which is not the currently running process
- Into a provided venv
- Into a venv freshly created by the John Snow Labs library
- Airgap, by creating an easily copy-pastable zip file with all jars/wheels/licenses to run in an airgapped environment
- Databricks
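As a rough sketch of what resolving these targets involves (hypothetical helper and target names, not the library's actual logic), each install target ultimately maps to a different pip invocation:

```python
def pip_command(target, wheels):
    """Hypothetical sketch: build the pip command for an install target.
    'current' = the currently running Python process;
    'venv:<path>' = a provided or freshly created venv."""
    if target == 'current':
        return ['pip', 'install'] + wheels
    if target.startswith('venv:'):
        venv_path = target.split(':', 1)[1]
        return [venv_path + '/bin/pip', 'install'] + wheels
    raise ValueError(f'unknown install target {target!r}')

print(pip_command('current', ['spark_nlp.whl']))
print(pip_command('venv:/opt/env', ['spark_nlp.whl']))
```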
`jsl.start()`
- After having run `jsl.install()` you can just run `jsl.start()`. It remembers the license that was used to install and also has all jars pre-downloaded. Additionally, it gives very helpful logs when launching a session, telling you the loaded jars and their versions. You can even load a new license during `jsl.start()`, which supports all of the previously mentioned authorization flows.
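The caching behavior can be sketched generically (a hypothetical helper, not the library's implementation): reuse credentials cached by a previous run, otherwise fetch and cache them.

```python
import json
import pathlib
import tempfile

def load_or_cache_license(cache_dir, fetch_license):
    """Return the license cached by a previous run if present,
    otherwise call fetch_license() once and cache the result."""
    cache_file = pathlib.Path(cache_dir) / 'license.json'
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    license_data = fetch_license()
    cache_file.write_text(json.dumps(license_data))
    return license_data

with tempfile.TemporaryDirectory() as cache_dir:
    first = load_or_cache_license(cache_dir, lambda: {'secret': 'abc'})
    # Second call hits the cache; the new fetch function is never used
    second = load_or_cache_license(cache_dir, lambda: {'secret': 'xyz'})
    print(second['secret'])  # 'abc', reused from the first run
```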
License Management
List all of your usable jsl licenses with jsl.list_remote_licenses()
And your locally cached licenses with jsl.list_local_licenses()
Databricks Utils
Easily submit any task to a Databricks cluster, in various formats, see Utils for databricks Docs
Run a raw Python code string in a cluster, and also create one on the fly.
```python
from johnsnowlabs import *

script = """
import nlu
print(nlu.load('sentiment').predict('That was easy!'))
"""

cluster_id = jsl.install(json_license_path=my_license,
                         databricks_host=my_host,
                         databricks_token=my_token)
jsl.run_in_databricks(script,
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='Python Code String Example')
```
Run a Python function in a cluster.
```python
def my_function():
    import nlu
    medical_text = """A 28-year-old female with a history of gestational diabetes
    presented with a one-week history of polyuria, polydipsia, poor appetite, and vomiting."""
    df = nlu.load('en.med_ner.diseases').predict(medical_text)
    for c in df.columns:
        print(df[c])

# my_function will run on Databricks
jsl.run_in_databricks(my_function,
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='Function test')
```
Run a Python script in a cluster.
```python
jsl.run_in_databricks('path/to/my/script.py',
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='Script test')
```
Run a Python module in a cluster.
```python
import johnsnowlabs.auto_install.health_checks.nlp_test as nlp_test

jsl.run_in_databricks(nlp_test,
                      databricks_cluster_id=cluster_id,
                      databricks_host=my_host,
                      databricks_token=my_token,
                      run_name='nlp_test')
```
Testing Utils
You can use the John Snow Labs library to automatically test 10000+ models and 100+ Notebooks in 1 line of code within a small machine like a single Google Colab Instance and generate very handy error reports of potentially broken Models, Notebooks or Models hub Markdown Snippets.
Automatically test Notebooks/Modelshub Markdown via URL, file path and many more options!
Workshop Notebook Testing Utils
See Utils for Testing Notebooks docs
```python
from johnsnowlabs.utils.notebooks import test_ipynb

# Test a local notebook file
test_ipynb('path/to/local/notebook.ipynb')

# Test a notebook via URL
test_ipynb('https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb')

# Test a folder of notebooks and generate a report file which captures all stderr/stdout
test_ipynb('my/notebook/folder')

# Test an array of URLs/paths to notebooks
test_ipynb(['https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/tutorials/Certification_Trainings/Healthcare/5.Spark_OCR.ipynb',
            'path/to/local/notebook.ipynb'])

# Run ALL notebooks in the Certification folder
test_result = test_ipynb('WORKSHOP')

# Only run Finance notebooks
test_result = test_ipynb('WORKSHOP-FIN')

# Only run Legal notebooks
test_result = test_ipynb('WORKSHOP-LEG')

# Only run Medical notebooks
test_result = test_ipynb('WORKSHOP-MED')

# Only run Open Source notebooks
test_result = test_ipynb('WORKSHOP-OS')
```
Modelshub Testing Utils
See Utils for Testing Models & Modelshub Markdown Snippets Docs
```python
from johnsnowlabs.utils.modelhub_markdown import test_markdown

# Test a local Markdown file with a Python snippet
test_markdown('path/to/my/file.md')

# Test a Modelshub Python Markdown snippet via URL
test_markdown('https://nlp.johnsnowlabs.com/2022/08/31/legpipe_deid_en.html')

# Test a folder of Markdown snippets and generate a report file which captures all stderr/stdout
test_markdown('my/markdown/folder')

# Test an array of URLs/paths to Markdown files
test_markdown(['legpipe_deid_en.html',
               'path/to/local/markdown_snippet.md'])
```
New Documentation Pages
- Installation
- Launch a Spark Session
- Usage and imports overview
- Settings & Caching
- Utils for databricks
- Utils for Testing Notebooks
- Utils for Testing Models & Modelshub Markdown Snippets
- Release Notes
- Python
Published by C-K-Loan over 3 years ago