Recent Releases of https://github.com/lablup/backend.ai
https://github.com/lablup/backend.ai - 25.13.4
Fixes
- Add missing scheduler options to AllowedScalingGroup and update related components (#5730)
Full Changelog
Check out the full changelog until this release (25.13.4).
Full Commit Logs
Check out the full commit logs between release (25.13.3) and (25.13.4).
- Python
Published by github-actions[bot] 6 months ago
https://github.com/lablup/backend.ai - 25.13.3
Fixes
- Improve HTTP request proxying in the webserver to be transparent with content-encoding (#5709)
- Add null-user check in resource usage query (#5712)
- Ensure id parameter of chown function is an int (#5713)
- Refresh agent fields in kernel when rescheduling (#5717)
- Fix issue where App-Proxy failed to query worker circuits due to incorrect variable reference (#5718)
- Add missing network cleanup when creating overlay network (#5721)
Full Changelog
Check out the full changelog until this release (25.13.3).
Full Commit Logs
Check out the full commit logs between release (25.13.2) and (25.13.3).
Published by github-actions[bot] 6 months ago
https://github.com/lablup/backend.ai - 25.13.2
Features
- Mouse-selected or copy-mode-selected text in the intrinsic ttyd app with tmux is now copied directly to the user-side clipboard, without needing to set `mouse=off` in the tmux session (#5688)
- Improve the redis keys command of the manager CLI to use `scan_iter` (#5704)
Fixes
- Add missing all-smi manpage file in the wheel packages (#5685)
- Update `RedisProfileTarget` to handle cases where `addr` is missing or `None` in the input data, preventing errors during address parsing (#5695)
- Fix a duplicate-join issue during serialization with pydantic by removing the join filter from the `_transform` method of `TOMLStringListField` (#5700)
- Fix coordinator not performing health check for all endpoints (#5702)
- Fix session creation failing with a `not allowed scaling group` error (#5706)
- Enhance endpoint creation logic to update existing records and handle circuits (#5707)
Full Changelog
Check out the full changelog until this release (25.13.2).
Full Commit Logs
Check out the full commit logs between release (25.13.1) and (25.13.2).
Published by github-actions[bot] 6 months ago
https://github.com/lablup/backend.ai - 25.13.1
Fixes
- Fix session ordering in the `session_pending_queue` query resolver (#5682)
- Ensure the Redis address is nullable (#5683)
Full Changelog
Check out the full changelog until this release (25.13.1).
Full Commit Logs
Check out the full commit logs between release (25.13.0) and (25.13.1).
Published by github-actions[bot] 6 months ago
https://github.com/lablup/backend.ai - 25.13.0
Features
- Introduce `strawberry` and strawberry-based `ArtifactRegistry` GQL types (#5232)
- Add `ModelDeployment` and `ModelRevision` strawberry GQL types migrated from the existing federated graphene schema (#5249)
- Open-source and integrate Backend.AI App Proxy into the main codebase (#5275)
- Add `storages` API to storage proxy (#5286)
- Add OpenTelemetry and service discovery configuration to appproxy (#5296)
- Implement connection monitoring and reconnection logic in ValkeyStandaloneClient (#5298)
- Implement Sokovan orchestrator architecture (#5361)
- Add `HuggingFace` scanner and API to storage proxy (#5362)
- Split out container log processing into a more concrete `ValkeyContainerLogClient` (based on `ValkeyClient` with default behavior) and use a separate Redis instance dedicated to log streaming (#5375)
- Implement scheduling prioritizers (#5378)
- Add validators for scheduling (#5380)
- Ship `all-smi` so that users can execute it inside any session container (#5381)
- Implement sokovan scheduler agent selectors (#5383)
- Integrate Agent selector with allocator in sokovan orchestrator (#5393)
- Add `UserNode` as a field of `ComputeSessionNode` (#5403)
- Enhance Scheduler allocation logic and add comprehensive tests (#5404)
- Add allocation methods in scheduler repository (#5406)
- Add TTL support to Redis key operations in AppProxy (#5416)
- Unify separate GraphQL subgraph endpoints into single Apollo Router supergraph with web-server proxy integration to enable single endpoint access for clients (#5419)
- Integrate sokovan orchestrator in manager (#5421)
- Add `source` field to roles table to distinguish system-defined roles from custom-defined roles, enabling automatic permission grants for system roles when new entity types or operations are introduced (#5440)
- Add phase tracking in scheduling (#5441)
- Implement scheduler coordinator in sokovan orchestrator (#5455)
- Change the behavior to terminate sessions in the "terminating" state via batch processing (#5467)
- Implement session sweeping functionality and related handlers (#5485)
- Inject `storages` config into storage-proxy (#5491)
- Add `object_storages` table to DB (#5498)
- Add `request_timeout` configuration for Redis clients (#5502)
- Add `decrement_keypair_concurrencies` method and update session termination logic (#5504)
- Add `hugging_registries` DB table and GQL schema (#5508)
- Replace the existing `ArtifactGroup` model with `Artifact`, and replace `Artifact` with `ArtifactRevision` (#5510)
- Integrate `Artifact` service into Manager (#5514)
- Add Valkey client for Background Task Manager (#5519)
- Improve `logging.BraceStyleAdapter` to support user-defined kwargs and access to `extra` data, including contextual fields (#5523)
- Add Background Task heartbeat loop to refresh TTL (#5531)
- Modify value reading to avoid cache-based scheduling (#5533)
- Implement scheduling controller (#5547)
- Implement kernel state engine (#5551)
- Add Background Task retry loop (#5555)
- Allow specifying multiple endpoint addresses in the etcd config (#5564)
- Update session limits to allow None and 0 as indicators for unlimited concurrent sessions (#5567)
- Add configuration option for Sokovan orchestrator usage (#5568)
- Implement health monitoring for scheduling operations (#5569)
- Enhance session management by adding checks for truly stuck pulling and creating sessions (#5570)
- Add Valkey Client TLS configuration (#5573)
- Implement generalized pagination on the Strawberry GQL API (#5575)
- Implement session transition hooks for various session types (#5579)
- Implement deployment management with Sokovan integration (#5580)
- Implement batch scheduling events and event propagation through Event Hub (#5589)
- Apply centralized distributed locking for Sokovan scheduling operations (#5592)
- Implement cache-through pattern for keypair concurrency management in SchedulerRepository (#5594)
- Apply READ COMMITTED isolation level for scheduler operations (#5600)
- Add Volume Pool field to `RootContext` of Storage-Proxy (#5603)
- Add Bgtask handler Registry (#5606)
- Implement Valkey-based leader election in manager (#5607)
- Apply retry feature to VFolder clone bgtask (#5611)
- Add `object_storage_meta` DB table for managing buckets (#5617)
- Add operation metrics observer for session termination tracking (#5623)
- Implement EventPropagatorMetricObserver for tracking event propagator metrics (#5630)
- Apply cache propagator when broadcasting scheduling event (#5638)
- Implement deployment controller and integrate with sokovan orchestrator (#5639)
- Add automated GraphQL supergraph generation using the Rover CLI to the CI pipeline for improved schema management (#5645)
- Add `--wait` option to `backend.ai events` command for easier scripting and automation (#5650)
- Implement session wait logic in AgentRegistry for improved scheduling handling (#5659)
- Manage object storage buckets using `storage_namespace` (#5667)
- Add scheduling detail info for pending sessions (#5676)
Fixes
- Correct the asyncio connection sharing pattern in alembic `env.py` so that the `alembic-rebase.py` script and other alembic-based automation work seamlessly (#5151)
- Use persistent `aiohttp.ClientSession` instances per route in App Proxy circuits to benefit from keep-alive connections and resource reuse (#5287)
- Add missing resolver for the VFolder permissions field in the compute session node (#5322)
- Let `inspect.signature` handle stringified types generated by `__future__` annotations by setting the `eval_str` option to `True` (#5325)
- Handle `None` user when setting up the request context in auth middleware (#5327)
- Add missing database transaction retry logic when setting network ID of new sessions (#5329)
- Apply memoization to the scheduler plugin loaders to reduce runtime overheads when running the scheduler loop (#5342)
- Fix broken Agent and Webserver in the HA development environment (#5343)
- Add missing components in HA development environment (#5345)
- Make `--log-level` and `--debug` flag behavior and description consistent across all `start-server` commands (#5366)
- Defer imports in the CLI and server entrypoints to reduce CLI startup times and avoid unnecessary cross-component imports (#5372)
- Fix and improve the optimization of glob-based BUILD file scanning when loading CLI entrypoints, reducing the CLI command initialization latency by about 15% (e.g., 3.5 sec -> 3.0 sec) (#5377)
- Fix missing `event_logs` table creation when populating the database schema with `mgr schema oneshot`, which may have caused issues in fresh installations (#5391)
- Add Docker image rescan exception handling logic when the image config is `None` (#5394)
- Serialize `ResourceSlot` type values in GQL resolvers (#5433)
- Remove wrong `ImageRef` stringification when pushing images (#5434)
- Fix wrong image type search logic used in `ImageNode` instance creation (#5435)
- Fix wrong error handling of `User` CRUD mutations (#5446)
- Fix App Proxy health checks being performed against sub-kernels due to misgenerated route information for cluster sessions, which had wrongly included sub-kernel service ports (#5447)
- Change `Circuit` query logic in AppProxy to keep `Circuit` objects serializable (#5448)
- Fix wrong request header validation error handling in the Webserver (#5449)
- Fix NUMA-aware affinity allocation to find the largest connected component with the most remaining resource capacity when grouping devices per NUMA node (#5454)
- Prevent the Agent from producing error events in the heartbeat loop to avoid loop termination due to Redis connection failures (#5469)
- Fix the session being incorrectly set on failed login attempts and ensure the `X-BackendAI-SessionID` header is always included when login succeeds (#5473)
- Respect the inherited ulimits when setting ulimits of new containers (#5489)
- Handle clone tasks through events to avoid clone status hanging caused by potential termination of clone-tracking tasks (#5493)
- Document the internal logic of affinity-aware device allocation and improve error messages (#5521)
- Update all import occurrences of `BraceStyleAdapter` in App Proxy to use the core `ai.backend.logging` package so that the App Proxy codebase is compatible with #5523 (#5550)
- Fix manager.wsproxy's HTTP requests to use relative URLs as the default factory sets `base_url` of `aiohttp.ClientSession` instances (#5576)
- Improve agent idempotency when provisioning kernel resources to better support the new Sokovan scheduler's retry mechanisms (#5584)
- Fix NUMA node alignment of subsequent device allocation (#5587)
- Fix regression of agent's per-package log-level configurations (#5614)
- Enhance kernel status handling for resource occupancy tracking (#5619)
- Fix hanging kernel creation in the new Sokovan scheduler when the host account's UID/GID is not 1000 (#5626)
- Rename `object_storage_meta` table to `object_storage_namespace` (#5666)
External Dependency Updates
- Upgrade the base CPython version from 3.13.3 to 3.13.7 (#5536)
- Update all-smi binaries to v0.8.0 (#5588)
- Update all-smi binaries to v0.9.0 (#5677)
Miscellaneous
- Refactor the import structure for `RepositoryArgs` by moving it to a dedicated `ai.backend.manager.repositories.types` module (#5409)
- Upgrade the CI toolchain, including Pantsbuild (2.23.1 -> 2.27.0), Ruff (0.8.5 -> 0.12.9), and Mypy (1.15.0 -> 1.17.1); merge BUILD files for faster dependency resolution and fewer human mistakes in managing them; and clean up various lint warnings (#5529)
- Add Strawberry GraphQL mypy plugin to fix mypy compatibility issues with custom types from Strawberry GraphQL (#5574)
Test Updates
- Add background task unit tests (#5625)
Full Changelog
Check out the full changelog until this release (25.13.0).
Full Commit Logs
Check out the full commit logs between release (25.12.1) and (25.13.0).
Published by github-actions[bot] 6 months ago
https://github.com/lablup/backend.ai - 25.11.3
No significant changes.
Full Changelog
Check out the full changelog until this release (25.11.3).
Full Commit Logs
Check out the full commit logs between release (25.11.2) and (25.11.3).
Published by github-actions[bot] 6 months ago
https://github.com/lablup/backend.ai - 25.13.0rc1
Features
- Introduce `strawberry` and strawberry-based `ArtifactRegistry` GQL types (#5232)
- Add `ModelDeployment` and `ModelRevision` strawberry GQL types migrated from the existing federated graphene schema (#5249)
- Open-source and integrate Backend.AI App Proxy into the main codebase (#5275)
- Add `storages` API to storage proxy (#5286)
- Add OpenTelemetry and service discovery configuration to appproxy (#5296)
- Implement connection monitoring and reconnection logic in ValkeyStandaloneClient (#5298)
- Implement Sokovan orchestrator architecture (#5361)
- Implement scheduling prioritizers (#5378)
- Add validators for scheduling (#5380)
- Ship `all-smi` so that users can execute it inside any session container (#5381)
- Implement sokovan scheduler agent selectors (#5383)
- Integrate Agent selector with allocator in sokovan orchestrator (#5393)
- Add `UserNode` as a field of `ComputeSessionNode` (#5403)
- Enhance Scheduler allocation logic and add comprehensive tests (#5404)
- Add allocation methods in scheduler repository (#5406)
- Add TTL support to Redis key operations in AppProxy (#5416)
- Integrate sokovan orchestrator in manager (#5421)
Fixes
- Correct the asyncio connection sharing pattern in alembic `env.py` so that the `alembic-rebase.py` script and other alembic-based automation work seamlessly (#5151)
- Add missing resolver for the VFolder permissions field in the compute session node (#5322)
- Let `inspect.signature` handle stringified types generated by `__future__` annotations by setting the `eval_str` option to `True` (#5325)
- Handle `None` user when setting up the request context in auth middleware (#5327)
- Add missing database transaction retry logic when setting network ID of new sessions (#5329)
- Apply memoization to the scheduler plugin loaders to reduce runtime overheads when running the scheduler loop (#5342)
- Fix broken Agent and Webserver in the HA development environment (#5343)
- Add missing components in HA development environment (#5345)
- Make `--log-level` and `--debug` flag behavior and description consistent across all `start-server` commands (#5366)
- Defer imports in the CLI and server entrypoints to reduce CLI startup times and avoid unnecessary cross-component imports (#5372)
- Fix and improve the optimization of glob-based BUILD file scanning when loading CLI entrypoints, reducing the CLI command initialization latency by about 15% (e.g., 3.5 sec -> 3.0 sec) (#5377)
- Fix missing `event_logs` table creation when populating the database schema with `mgr schema oneshot`, which may have caused issues in fresh installations (#5391)
- Add Docker image rescan exception handling logic when the image config is `None` (#5394)
Miscellaneous
- Refactor the import structure for `RepositoryArgs` by moving it to a dedicated `ai.backend.manager.repositories.types` module (#5409)
Full Changelog
Check out the full changelog until this release (25.13.0rc1).
Full Commit Logs
Check out the full commit logs between release (25.12.1) and (25.13.0rc1).
Published by github-actions[bot] 7 months ago
https://github.com/lablup/backend.ai - 25.12.1
Features
- Agent heartbeat handler queries kernel IDs instead of the agent ID (#4766)
- Implement ActionValidator (#5244)
- Implement reconnection logic in ValkeySentinelClient (#5276)
Improvements
- Apply simple model query pattern for readability (#4767)
Fixes
- Fix model service creation failure when `service-definition.toml` is missing (#5264)
- Fix model service deletion failure for non-super-admin users (#5266)
- Fix broken VFolder `Clone` service (#5269)
- Fix a problem with deserializing dataclasses (#5271)
- Fix broken VFolder `GetTaskLogs` service (#5272)
- Add missing TRACE log-level option in `ai.backend.logging` package (#5274)
- Fix `status_data` not being initialized properly when creating multi-node sessions (#5280)
- Apply a workaround to avoid a segfault upon fast termination of `mgr etcd` CLI commands that query and update etcd configurations (#5283)
Full Changelog
Check out the full changelog until this release (25.12.1).
Full Commit Logs
Check out the full commit logs between release (25.12.0) and (25.12.1).
Published by github-actions[bot] 7 months ago
https://github.com/lablup/backend.ai - 25.12.0
Breaking Changes
- Health check capability is temporarily broken on OSS AppProxy due to architectural changes. Users must disable the health check feature in `model-definition.yaml` to use model services on open-source Backend.AI. OSS AppProxy support will be restored in future releases (#5134)
Features
- Add VFolder share test verifying shared project vfolder has override permission to shared user (#4971)
- Add metadata support to event handling and message payloads (#4992)
- Add tests for both successful and failed purge group operations (#5006)
- Add RBAC DB schema (#5025)
- Apply valkey client for redis_image (#5031)
- Implement ValkeyLiveClient for Redis interactions and add related tests (#5032)
- Apply ValkeyStatClient (#5035)
- Implement ValkeyRateLimitClient (#5036)
- Add ValkeyStreamLockClient (#5039)
- Unify valkey client codes (#5053)
- Support OTEL to storage-proxy and webserver components (#5054)
- Add TRACE Level for log (#5092)
- Implement configuration management CLI for agent, storage, and webserver (#5103)
- Implement ValkeySessionClient for session management using Valkey-Glide (#5114)
- Add `triggered_by` to `AuditLog` table (#5115)
- Offload model service health check architecture to AppProxy with Redis-based route management for improved scalability and real-time endpoint monitoring (#5134)
- Implement role management service (#5159)
- Add layer-aware repository decorators to various layers (#5161)
- Add `type` field to `ImageNode` (#5207)
- Add chown feature to Agent to allow changing the owner of mount paths (#5213)
- Apply client pool in web component (#5223)
- Sync model service's health information with AppProxy in real time (#5230)
- Add `mount_ids` and `mount_id_map` fields to session creation config (#5237)
- Apply client pool to wsproxy client (#5253)
- Add retry metrics for layer observer (#5255)
Improvements
- Apply provisioner pattern to Agent kernel lifecycle "mount" stage (#4979)
- Apply provisioner pattern to Agent kernel lifecycle "image" stage (#4981)
- Apply provisioner pattern to Agent kernel lifecycle "network" stage (#4982)
- Apply provisioner pattern to Agent kernel lifecycle "resource" stage (#4983)
- Apply provisioner pattern to Agent kernel lifecycle "scratch" stage (#4984)
- Apply provisioner pattern to Agent kernel lifecycle "ssh" stage (#4985)
- Apply provisioner pattern to Agent kernel lifecycle "environ" stage (#4986)
- Apply pydantic config in storage-proxy (#5062)
- Apply pydantic config in webserver (#5064)
- Apply pydantic config in agent (#5068)
- Separate repository layer from auth service (#5071)
- Separate repository layer from model serving service (#5072)
- Separate repository layer from image service (#5074)
- Separate repository layer from user service (#5076)
- Separate repository layer from container registry service (#5095)
- Add repository and inject repositories dependency (#5097)
- Separate repository layer from domain service (#5099)
- Separate repository layer from scheduler (#5101)
- Separate repository layer from group service (#5107)
- Refactor endpoint status resolution to use `EndpointStatus` enum instead of string literals (#5109)
- Separate repository layer from session service (#5110)
- Separate repository layer from vfolder service (#5111)
- Separate repository layer from agent service (#5128)
- Separate repository layer from resource preset service (#5130)
- Separate agent client layer (#5157)
- Separate exceptions file (#5164)
- Separate wsproxy client (#5165)
- Implement Storage Proxy client layer (#5224)
Fixes
- Fix wrong service ports field name in the session creation response (#5047)
- Fix GraphQL resolver for compute session to return only a unique set of vfolder mount names (#5056)
- Fix unbound `vfolder_id` when a model-type folder is used in service `extra_mounts` (#5059)
- Manager correctly handles already-deleted VFolders (#5080)
- Make cloud provider detection work on Azure and remain future-compatible by using versioned metadata URLs instead of hacky sysfs DMI vendor information checks (#5086)
- Enable continuous code execution tasks to work properly in Agent (#5112)
- Allow the Agent to start even if the scratch directory was already cleaned before the container was destroyed (#5118)
- Handle empty consumer handlers in EventDispatcher to avoid retry (#5136)
- Relax Decimal serialization of Agent stats (#5142)
- Fix webserver 404 not found issue (#5170)
- Fix auth action to pass stoken param (#5211)
- Fix `status_data` not being initialized properly when creating single-node sessions (#5217)
- Fix potential consumer loop hang by handling `glide.TimeoutError` from Valkey-glide `xreadgroup` (#5222)
- Fix Grafana configuration for halfstack (#5248)
Miscellaneous
- Add a `[tool.pyright]` section to `pyproject.toml` so that IDEs using Pyright as the default LSP work out of the box by detecting Pants-specific configurations (#5088)
Test Updates
- Add session service unit test (#4265)
- Add unit test for Project Resource Policy service & repository (#5192)
- Add test code for container utilization metric service (#5194)
- Add test code for resource preset service (#5195)
- Add `Domain` service unit test (#5196)
- Add test code for user service (#5197)
- Add test code for user resource policy service (#5198)
- Add `KeypairResourcePolicy` service unit test (#5203)
- Add service-layer and repository-layer unit tests for Model Service (#5214)
- Add unit (service and repository layer) and integration tests for Auth (#5234)
- Add unit (service and repository layer) and integration tests for Container Registry (#5258)
Full Changelog
Check out the full changelog until this release (25.12.0).
Full Commit Logs
Check out the full commit logs between release (25.11.0) and (25.12.0).
Published by github-actions[bot] 7 months ago
https://github.com/lablup/backend.ai - 25.12.0rc1
Breaking Changes
- Health check capability is temporarily broken on OSS AppProxy due to architectural changes. Users must disable the health check feature in `model-definition.yaml` to use model services on open-source Backend.AI. OSS AppProxy support will be restored in future releases (#5134)
Features
- Add VFolder share test verifying shared project vfolder has override permission to shared user (#4971)
- Add metadata support to event handling and message payloads (#4992)
- Add tests for both successful and failed purge group operations (#5006)
- Add RBAC DB schema (#5025)
- Apply valkey client for redis_image (#5031)
- Implement ValkeyLiveClient for Redis interactions and add related tests (#5032)
- Apply ValkeyStatClient (#5035)
- Implement ValkeyRateLimitClient (#5036)
- Add ValkeyStreamLockClient (#5039)
- Unify valkey client codes (#5053)
- Support OTEL to storage-proxy and webserver components (#5054)
- Add TRACE Level for log (#5092)
- Implement configuration management CLI for agent, storage, and webserver (#5103)
- Implement ValkeySessionClient for session management using Valkey-Glide (#5114)
- Add `triggered_by` to `AuditLog` table (#5115)
- Offload model service health check architecture to AppProxy with Redis-based route management for improved scalability and real-time endpoint monitoring (#5134)
- Implement role management service (#5159)
- Add layer-aware repository decorators to various layers (#5161)
- Add `type` field to `ImageNode` (#5207)
- Add chown feature to Agent to allow changing the owner of mount paths (#5213)
- Apply client pool in web component (#5223)
- Sync model service's health information with AppProxy in real time (#5230)
Improvements
- Apply provisioner pattern to Agent kernel lifecycle "mount" stage (#4979)
- Apply provisioner pattern to Agent kernel lifecycle "image" stage (#4981)
- Apply provisioner pattern to Agent kernel lifecycle "network" stage (#4982)
- Apply provisioner pattern to Agent kernel lifecycle "resource" stage (#4983)
- Apply provisioner pattern to Agent kernel lifecycle "scratch" stage (#4984)
- Apply provisioner pattern to Agent kernel lifecycle "ssh" stage (#4985)
- Apply provisioner pattern to Agent kernel lifecycle "environ" stage (#4986)
- Apply pydantic config in storage-proxy (#5062)
- Apply pydantic config in webserver (#5064)
- Apply pydantic config in agent (#5068)
- Separate repository layer from auth service (#5071)
- Separate repository layer from model serving service (#5072)
- Separate repository layer from image service (#5074)
- Separate repository layer from user service (#5076)
- Separate repository layer from container registry service (#5095)
- Add repository and inject repositories dependency (#5097)
- Separate repository layer from domain service (#5099)
- Separate repository layer from scheduler (#5101)
- Separate repository layer from group service (#5107)
- Refactor endpoint status resolution to use `EndpointStatus` enum instead of string literals (#5109)
- Separate repository layer from session service (#5110)
- Separate repository layer from vfolder service (#5111)
- Separate repository layer from agent service (#5128)
- Separate repository layer from resource preset service (#5130)
- Separate agent client layer (#5157)
- Separate exceptions file (#5164)
- Separate wsproxy client (#5165)
- Implement Storage Proxy client layer (#5224)
Fixes
- Fix wrong service ports field name in the session creation response (#5047)
- Fix GraphQL resolver for compute session to return only a unique set of vfolder mount names (#5056)
- Fix unbound `vfolder_id` when a model-type folder is used in service `extra_mounts` (#5059)
- Manager correctly handles already-deleted VFolders (#5080)
- Make cloud provider detection work on Azure and remain future-compatible by using versioned metadata URLs instead of hacky sysfs DMI vendor information checks (#5086)
- Enable continuous code execution tasks to work properly in Agent (#5112)
- Allow the Agent to start even if the scratch directory was already cleaned before the container was destroyed (#5118)
- Handle empty consumer handlers in EventDispatcher to avoid retry (#5136)
- Relax Decimal serialization of Agent stats (#5142)
- Fix webserver 404 not found issue (#5170)
- Fix auth action to pass stoken param (#5211)
- Fix `status_data` not being initialized properly when creating single-node sessions (#5217)
- Fix potential consumer loop hang by handling `glide.TimeoutError` from Valkey-glide `xreadgroup` (#5222)
Miscellaneous
- Add a `[tool.pyright]` section to `pyproject.toml` so that IDEs using Pyright as the default LSP work out of the box by detecting Pants-specific configurations (#5088)
Test Updates
- Add session service unit test (#4265)
- Add unit test for Project Resource Policy service & repository (#5192)
- Add test code for container utilization metric service (#5194)
- Add test code for resource preset service (#5195)
- Add `Domain` service unit test (#5196)
- Add test code for user service (#5197)
- Add test code for user resource policy service (#5198)
- Add `KeypairResourcePolicy` service unit test (#5203)
- Add service-layer and repository-layer unit tests for Model Service (#5214)
- Add unit (service and repository layer) and integration tests for Auth (#5234)
Full Changelog
Check out the full changelog until this release (25.12.0rc1).
Full Commit Logs
Check out the full commit logs between release (25.11.0) and (25.12.0rc1).
Published by github-actions[bot] 7 months ago
https://github.com/lablup/backend.ai - 25.11.2
No significant changes.
Full Changelog
Check out the full changelog until this release (25.11.2).
Full Commit Logs
Check out the full commit logs between release (25.11.1) and (25.11.2).
Published by github-actions[bot] 7 months ago
https://github.com/lablup/backend.ai - 25.11.1
Fixes
- Fix wrong service ports field name in the session creation response (#5047)
- Fix GraphQL resolver for compute session to return only a unique set of vfolder mount names (#5056)
- Fix unbound `vfolder_id` when a model-type folder is used in service `extra_mounts` (#5059)
- Manager correctly handles already-deleted VFolders (#5080)
- Enable continuous code execution tasks to work properly in Agent (#5112)
- Allow the Agent to start even if the scratch directory was already cleaned before the container was destroyed (#5118)
- Handle empty consumer handlers in EventDispatcher to avoid retry (#5136)
- Relax Decimal serialization of Agent stats (#5142)
Full Changelog
Check out the full changelog until this release (25.11.1).
Full Commit Logs
Check out the full commit logs between release (25.11.0) and (25.11.1).
Published by github-actions[bot] 7 months ago
https://github.com/lablup/backend.ai - 25.11.0
Features
- Add Model Service Endpoint Health check, and Authentication test (#4774)
- Add Model Service Replicas Scale success, fail test (#4777)
- Implement `UserResourcePolicy` CRUD SDK functions (#4782)
- Implement ValkeyStreamClient for managing Valkey Streams (#4792)
- Add more filterspecs to `ImageNode` (#4803)
- Add `environ` to `service-definition.toml` schema (#4826)
- Print a warning message when `bootstrap.sh` exists but is not executable (#4829)
- Add a new webserver configuration option `default_file_browser_image` to specify a default file browser image (#4836)
- Implement the `VFolder.get_id` SDK function to query a vfolder's ID by its name (#4856)
- Add VFolder soft-delete, restore, and purge tests (#4873)
- Add VFolder upload, and download file tests (#4877)
- Add VFolder file list, rename, move, deletion tests (#4884)
- Add VFolder clone test (#4885)
- Add VFolder invitation test (#4903)
- Make Endpoint GQL Query filterable by endpoint owner's UUID (#4907)
- Reduce dangling kernel logging in Agent (#4912)
- Add `exclude_tags` to `tester.toml` to exclude tests requiring extra configuration from default runs, improving the accessibility of the Tester package (#4914)
- Observe the trigger and result counts of the Agent stat collection task (#4926)
- Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
- Add aliases to Agent `announce-addr` and `service-addr` configurations (#4940)
- Add Agent stat stage metric (#4944)
- Built-in WSProxy exposes advertised address (#4975)
- Implement ValkeySentinelClient with connection management and master monitoring (#4987)
- Add JSON log formatter for OTEL (#4991)
- Add configuration management CLI commands and sample generator (#5019)
Fixes
- Fix GQL endpoint list resolver sorting by lifecycle_stage (#4776)
- Improve logging for inspecting missing containers (#4784)
- Allow `None` as the session ID type in `RouteInfo`, preventing a pydantic type error when querying `ModelService.get_info` right after creating a model service (#4786)
- Fix missing status code when the `Accept` header is not set to `application/json` in the wsproxy exception middleware (#4788)
- Fix Agent Memory plugin to handle multiple IO device stats (#4789)
- Fix invalid state error when setting kernel termination future (#4791)
- Fix wrong exit code when `BaseRunner.query()` fails (#4798)
- Fix model service creation failing due to parameter validation by excluding `None` values in the model service creation SDK request (#4799)
- Name session test using `test_id` in failure cases (#4831)
- Fix incorrect query filter definition in GQL image node (#4843)
- Fix compute plugin's `cleanup()` method not being called upon agent shutdown (#4851)
- Prevent model service creation with project-type vfolder (#4852)
- Fix auto-scaling rule processing to skip deleted model services, preventing unnecessary operations (#4875)
- Allow GQL modifyuser mutation to update users `mainaccess_key` (#4879)
- Skip logging agent status heartbeat to event logs (#4895)
- Fix wrong error message when vfolder invitation row not found (#4906)
- Return null for empty `unmanaged_path` field in vfolder GQL type (#4913)
- Ensure endpoints are properly cleaned up during group purge operations (#4917)
- Handle `NoSuchProcess` properly when gathering process memory stats (#4922)
- Skip kernel destruction on agent shutdown (#4923)
- Check whether Agent is a daemon process before querying Docker netstat (#4929)
- Handle invalid moving statistics value (#4934)
- Skip exporter containers when detecting postgres container in dbshell command (#4935)
- Fix wrong indentation in Agent container stat function (#4946)
- Fix tester's broken `PlainTextFilesUploader` on macOS (#4953)
- Remove `session` and `kernel` foreign key constraints with `users` and `keypairs`, and fix the `PurgeUser` GQL mutation not working when active compute sessions exist (#4954)
- Use `json.dumps` in Agent to properly serialize yarl types (#4957)
- Fixed cache read failure in Message Queue (#4968)
- Handle unexpected errors in EventDispatcher consume and subscribe loops (#5009)
- Fix incorrect filetype check of `PureStorage` VFolder (#5018)
Documentation Updates
- Add extra documentation to `tester` package's `README.md` (#4845)
External Dependency Updates
- Fix aiodns version to 3.2.0 (#4950)
Miscellaneous
Full Changelog
Check out the full changelog until this release (25.11.0).
Full Commit Logs
Check out the full commit logs between release (25.10.1) and (25.11.0).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.6.12
Features
- Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
- Built-in WSProxy exposes advertised address (#4975)
Fixes
- Improve logging for inspecting missing containers (#4784)
- Fix missing status code when the `Accept` header is not set to `application/json` in the wsproxy exception middleware (#4788)
- Fix Agent Memory plugin to handle multiple IO device stats (#4789)
- Fix invalid state error when setting kernel termination future (#4791)
- Cool down Redis error logs for `AttributeError: 'NoneType' object has no attribute 'get'` (#4795)
- Fix wrong `Accept` header on `HarborRegistryV2._process_oci_index()` (#4807)
- Fix incorrect query filter definition in GQL image node (#4843)
- Prevent model service creation with project type vfolder (#4852)
- Allow GQL modifyuser mutation to update users `mainaccess_key` (#4879)
- Ensure endpoints are properly cleaned up during group purge operations (#4917)
- Handle `NoSuchProcess` properly when gathering process memory stats (#4922)
- Skip kernel destruction on agent shutdown (#4923)
- Check whether Agent is a daemon process before querying Docker netstat (#4929)
- Fix wrong indentation in Agent container stat function (#4946)
- Change to use json.dumps in Agent to properly serialize yarl types (#4957)
- Calculate correct VFolder permissions when admins query (#4962)
Full Changelog
Check out the full changelog until this release (25.6.12).
Full Commit Logs
Check out the full commit logs between release (25.6.11) and (25.6.12).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 24.09.12
Features
- Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
- Built-in WSProxy exposes advertised address (#4975)
Fixes
- Fix missing status code when the `Accept` header is not set to `application/json` in the wsproxy exception middleware (#4788)
- Fix Agent Memory plugin to handle multiple IO device stats (#4789)
- Fix invalid state error when setting kernel termination future (#4791)
- Fix wrong `Accept` header on `HarborRegistryV2._process_oci_index()` (#4807)
- Prevent model service creation with project type vfolder (#4852)
- Handle `NoSuchProcess` properly when gathering process memory stats (#4922)
- Skip kernel destruction on agent shutdown (#4923)
- Check whether Agent is a daemon process before querying Docker netstat (#4929)
- Fix wrong indentation in Agent container stat function (#4946)
- Calculate correct VFolder permissions when admins query (#4963)
- Fix issue preventing admins from leaving invited vfolders (#4964)
Full Changelog
Check out the full changelog until this release (24.09.12).
Full Commit Logs
Check out the full commit logs between release (24.09.11) and (24.09.12).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.11.0rc4
Features
- Add Model Service Endpoint Health check and Authentication test (#4774)
- Implement `UserResourcePolicy` CRUD SDK functions (#4782)
- Implement ValkeyStreamClient for managing Valkey Streams (#4792)
- Add more filterspecs to `ImageNode` (#4803)
- Add `environ` to `service-definition.toml` schema (#4826)
- Print warning message when `bootstrap.sh` exists but is not executable (#4829)
- Add a new webserver configuration option `default_file_browser_image` to specify a default file browser image (#4836)
- Implement the `VFolder.get_id` SDK function to query a vfolder's ID by its name (#4856)
- Add VFolder soft-delete, restore, and purge tests (#4873)
- Add VFolder file upload and download tests (#4877)
- Add VFolder file list, rename, move, deletion tests (#4884)
- Add VFolder clone test (#4885)
- Add VFolder invitation test (#4903)
- Make Endpoint GQL Query filterable by endpoint owner's UUID (#4907)
- Reduce dangling kernel logging in Agent (#4912)
- Add `exclude_tags` to tester.toml to exclude tests requiring extra config from default runs, improving the accessibility of the Tester package (#4914)
- Observe the trigger and result counts of the Agent stat collection task (#4926)
- Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
- Add aliases to Agent `announce-addr` and `service-addr` configurations (#4940)
- Add Agent stat stage metric (#4944)
- Implement ValkeySentinelClient with connection management and master monitoring (#4987)
Fixes
- Fix GQL endpoint list resolver sorting by lifecycle_stage (#4776)
- Improve logging for inspecting missing containers (#4784)
- Allow `None` as the session ID in `RouteInfo`, preventing a pydantic type error when querying `ModelService.get_info` right after creating a model service (#4786)
- Fix missing status code when the `Accept` header is not set to `application/json` in the wsproxy exception middleware (#4788)
- Fix Agent Memory plugin to handle multiple IO device stats (#4789)
- Fix invalid state error when setting kernel termination future (#4791)
- Fix wrong exit code when `BaseRunner.query()` fails (#4798)
- Fix model service creation failing due to parameter validation by excluding `None` values in the model service create SDK request (#4799)
- Name session tests using `test_id` in failure cases (#4831)
- Fix incorrect query filter definition in GQL image node (#4843)
- Fix compute plugin's `cleanup()` method not being called upon agent shutdown (#4851)
- Prevent model service creation with project type vfolder (#4852)
- Fixed auto scaling rule processing to skip deleted model services, preventing unnecessary operations (#4875)
- Allow GQL modifyuser mutation to update users `mainaccess_key` (#4879)
- Skip logging agent status heartbeat to event logs (#4895)
- Fix wrong error message when vfolder invitation row not found (#4906)
- Return null for empty `unmanaged_path` field in vfolder GQL type (#4913)
- Handle `NoSuchProcess` properly when gathering process memory stats (#4922)
- Skip kernel destruction on agent shutdown (#4923)
- Check whether Agent is a daemon process before querying Docker netstat (#4929)
- Handle invalid moving statistics value (#4934)
- Skip exporter containers when detecting postgres container in dbshell command (#4935)
- Fix wrong indentation in Agent container stat function (#4946)
- Fix tester's broken `PlainTextFilesUploader` on macOS (#4953)
- Remove `session` and `kernel` foreign key constraints with `users` and `keypairs`, and fix the `PurgeUser` GQL mutation not working when active compute sessions exist (#4954)
- Use `json.dumps` in Agent to properly serialize yarl types (#4957)
- Fixed cache read failure in Message Queue (#4968)
Documentation Updates
- Add extra documentation to `tester` package's `README.md` (#4845)
External Dependency Updates
- Fix aiodns version to 3.2.0 (#4950)
Miscellaneous
Full Changelog
Check out the full changelog until this release (25.11.0rc4).
Full Commit Logs
Check out the full commit logs between release (25.10.1) and (25.11.0rc4).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.11.0rc3
Features
- Add Model Service Endpoint Health check and Authentication test (#4774)
- Implement `UserResourcePolicy` CRUD SDK functions (#4782)
- Implement ValkeyStreamClient for managing Valkey Streams (#4792)
- Add more filterspecs to `ImageNode` (#4803)
- Add `environ` to `service-definition.toml` schema (#4826)
- Print warning message when `bootstrap.sh` exists but is not executable (#4829)
- Add a new webserver configuration option `default_file_browser_image` to specify a default file browser image (#4836)
- Implement the `VFolder.get_id` SDK function to query a vfolder's ID by its name (#4856)
- Add VFolder soft-delete, restore, and purge tests (#4873)
- Add VFolder file upload and download tests (#4877)
- Add VFolder file list, rename, move, deletion tests (#4884)
- Add VFolder clone test (#4885)
- Add VFolder invitation test (#4903)
- Make Endpoint GQL Query filterable by endpoint owner's UUID (#4907)
- Reduce dangling kernel logging in Agent (#4912)
- Add `exclude_tags` to tester.toml to exclude tests requiring extra config from default runs, improving the accessibility of the Tester package (#4914)
- Observe the trigger and result counts of the Agent stat collection task (#4926)
- Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
- Add aliases to Agent `announce-addr` and `service-addr` configurations (#4940)
- Add Agent stat stage metric (#4944)
Fixes
- Fix GQL endpoint list resolver sorting by lifecycle_stage (#4776)
- Improve logging for inspecting missing containers (#4784)
- Allow `None` as the session ID in `RouteInfo`, preventing a pydantic type error when querying `ModelService.get_info` right after creating a model service (#4786)
- Fix missing status code when the `Accept` header is not set to `application/json` in the wsproxy exception middleware (#4788)
- Fix Agent Memory plugin to handle multiple IO device stats (#4789)
- Fix invalid state error when setting kernel termination future (#4791)
- Fix wrong exit code when `BaseRunner.query()` fails (#4798)
- Fix model service creation failing due to parameter validation by excluding `None` values in the model service create SDK request (#4799)
- Name session tests using `test_id` in failure cases (#4831)
- Fix incorrect query filter definition in GQL image node (#4843)
- Fix compute plugin's `cleanup()` method not being called upon agent shutdown (#4851)
- Prevent model service creation with project type vfolder (#4852)
- Fixed auto scaling rule processing to skip deleted model services, preventing unnecessary operations (#4875)
- Allow GQL modifyuser mutation to update users `mainaccess_key` (#4879)
- Skip logging agent status heartbeat to event logs (#4895)
- Fix wrong error message when vfolder invitation row not found (#4906)
- Return null for empty `unmanaged_path` field in vfolder GQL type (#4913)
- Handle `NoSuchProcess` properly when gathering process memory stats (#4922)
- Skip kernel destruction on agent shutdown (#4923)
- Check whether Agent is a daemon process before querying Docker netstat (#4929)
- Handle invalid moving statistics value (#4934)
- Skip exporter containers when detecting postgres container in dbshell command (#4935)
- Fix wrong indentation in Agent container stat function (#4946)
- Fix tester's broken `PlainTextFilesUploader` on macOS (#4953)
- Use `json.dumps` in Agent to properly serialize yarl types (#4957)
Documentation Updates
- Add extra documentation to `tester` package's `README.md` (#4845)
External Dependency Updates
- Fix aiodns version to 3.2.0 (#4950)
Miscellaneous
Full Changelog
Check out the full changelog until this release (25.11.0rc3).
Full Commit Logs
Check out the full commit logs between release (25.10.1) and (25.11.0rc3).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.11.0rc2
Features
- Add Model Service Endpoint Health check and Authentication test (#4774)
- Implement `UserResourcePolicy` CRUD SDK functions (#4782)
- Implement ValkeyStreamClient for managing Valkey Streams (#4792)
- Add more filterspecs to `ImageNode` (#4803)
- Add `environ` to `service-definition.toml` schema (#4826)
- Print warning message when `bootstrap.sh` exists but is not executable (#4829)
- Add a new webserver configuration option `default_file_browser_image` to specify a default file browser image (#4836)
- Implement the `VFolder.get_id` SDK function to query a vfolder's ID by its name (#4856)
- Add VFolder soft-delete, restore, and purge tests (#4873)
- Add VFolder file upload and download tests (#4877)
- Add VFolder file list, rename, move, deletion tests (#4884)
- Add VFolder clone test (#4885)
- Add VFolder invitation test (#4903)
- Make Endpoint GQL Query filterable by endpoint owner's UUID (#4907)
- Reduce dangling kernel logging in Agent (#4912)
- Add `exclude_tags` to tester.toml to exclude tests requiring extra config from default runs, improving the accessibility of the Tester package (#4914)
- Observe the trigger and result counts of the Agent stat collection task (#4926)
- Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
- Add aliases to Agent `announce-addr` and `service-addr` configurations (#4940)
- Add Agent stat stage metric (#4944)
Fixes
- Fix GQL endpoint list resolver sorting by lifecycle_stage (#4776)
- Improve logging for inspecting missing containers (#4784)
- Allow `None` as the session ID in `RouteInfo`, preventing a pydantic type error when querying `ModelService.get_info` right after creating a model service (#4786)
- Fix missing status code when the `Accept` header is not set to `application/json` in the wsproxy exception middleware (#4788)
- Fix Agent Memory plugin to handle multiple IO device stats (#4789)
- Fix invalid state error when setting kernel termination future (#4791)
- Fix wrong exit code when `BaseRunner.query()` fails (#4798)
- Fix model service creation failing due to parameter validation by excluding `None` values in the model service create SDK request (#4799)
- Name session tests using `test_id` in failure cases (#4831)
- Fix incorrect query filter definition in GQL image node (#4843)
- Fix compute plugin's `cleanup()` method not being called upon agent shutdown (#4851)
- Prevent model service creation with project type vfolder (#4852)
- Fixed auto scaling rule processing to skip deleted model services, preventing unnecessary operations (#4875)
- Allow GQL modifyuser mutation to update users `mainaccess_key` (#4879)
- Skip logging agent status heartbeat to event logs (#4895)
- Fix wrong error message when vfolder invitation row not found (#4906)
- Return null for empty `unmanaged_path` field in vfolder GQL type (#4913)
- Handle `NoSuchProcess` properly when gathering process memory stats (#4922)
- Skip kernel destruction on agent shutdown (#4923)
- Check whether Agent is a daemon process before querying Docker netstat (#4929)
- Handle invalid moving statistics value (#4934)
- Skip exporter containers when detecting postgres container in dbshell command (#4935)
- Fix wrong indentation in Agent container stat function (#4946)
Documentation Updates
- Add extra documentation to `tester` package's `README.md` (#4845)
External Dependency Updates
- Fix aiodns version to 3.2.0 (#4950)
Miscellaneous
Full Changelog
Check out the full changelog until this release (25.11.0rc2).
Full Commit Logs
Check out the full commit logs between release (25.10.1) and (25.11.0rc2).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 24.09.12rc1
Features
- Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
Fixes
- Fix missing status code when the `Accept` header is not set to `application/json` in the wsproxy exception middleware (#4788)
- Fix Agent Memory plugin to handle multiple IO device stats (#4789)
- Fix invalid state error when setting kernel termination future (#4791)
- Fix wrong `Accept` header on `HarborRegistryV2._process_oci_index()` (#4807)
- Prevent model service creation with project type vfolder (#4852)
- Handle `NoSuchProcess` properly when gathering process memory stats (#4922)
- Skip kernel destruction on agent shutdown (#4923)
- Check whether Agent is a daemon process before querying Docker netstat (#4929)
- Fix wrong indentation in Agent container stat function (#4946)
Full Changelog
Check out the full changelog until this release (24.09.12rc1).
Full Commit Logs
Check out the full commit logs between release (24.09.11) and (24.09.12rc1).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.6.12rc1
Features
- Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
Fixes
- Improve logging for inspecting missing containers (#4784)
- Fix missing status code when the `Accept` header is not set to `application/json` in the wsproxy exception middleware (#4788)
- Fix Agent Memory plugin to handle multiple IO device stats (#4789)
- Fix invalid state error when setting kernel termination future (#4791)
- Cool down Redis error logs for `AttributeError: 'NoneType' object has no attribute 'get'` (#4795)
- Fix wrong `Accept` header on `HarborRegistryV2._process_oci_index()` (#4807)
- Fix incorrect query filter definition in GQL image node (#4843)
- Prevent model service creation with project type vfolder (#4852)
- Allow GQL modifyuser mutation to update users `mainaccess_key` (#4879)
- Handle `NoSuchProcess` properly when gathering process memory stats (#4922)
- Skip kernel destruction on agent shutdown (#4923)
- Check whether Agent is a daemon process before querying Docker netstat (#4929)
- Fix wrong indentation in Agent container stat function (#4946)
Full Changelog
Check out the full changelog until this release (25.6.12rc1).
Full Commit Logs
Check out the full commit logs between release (25.6.11) and (25.6.12rc1).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.11.0rc1
Features
- Add Model Service Endpoint Health check and Authentication test (#4774)
- Implement `UserResourcePolicy` CRUD SDK functions (#4782)
- Implement ValkeyStreamClient for managing Valkey Streams (#4792)
- Add more filterspecs to `ImageNode` (#4803)
- Add `environ` to `service-definition.toml` schema (#4826)
- Print warning message when `bootstrap.sh` exists but is not executable (#4829)
- Add a new webserver configuration option `default_file_browser_image` to specify a default file browser image (#4836)
- Implement the `VFolder.get_id` SDK function to query a vfolder's ID by its name (#4856)
- Add VFolder soft-delete, restore, and purge tests (#4873)
- Add VFolder file upload and download tests (#4877)
- Add VFolder file list, rename, move, deletion tests (#4884)
- Add VFolder clone test (#4885)
- Add VFolder invitation test (#4903)
- Make Endpoint GQL Query filterable by endpoint owner's UUID (#4907)
- Reduce dangling kernel logging in Agent (#4912)
- Add `exclude_tags` to tester.toml to exclude tests requiring extra config from default runs, improving the accessibility of the Tester package (#4914)
- Observe the trigger and result counts of the Agent stat collection task (#4926)
- Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
- Add aliases to Agent `announce-addr` and `service-addr` configurations (#4940)
- Add Agent stat stage metric (#4944)
Fixes
- Fix GQL endpoint list resolver sorting by lifecycle_stage (#4776)
- Improve logging for inspecting missing containers (#4784)
- Allow `None` as the session ID in `RouteInfo`, preventing a pydantic type error when querying `ModelService.get_info` right after creating a model service (#4786)
- Fix missing status code when the `Accept` header is not set to `application/json` in the wsproxy exception middleware (#4788)
- Fix Agent Memory plugin to handle multiple IO device stats (#4789)
- Fix invalid state error when setting kernel termination future (#4791)
- Fix wrong exit code when `BaseRunner.query()` fails (#4798)
- Fix model service creation failing due to parameter validation by excluding `None` values in the model service create SDK request (#4799)
- Name session tests using `test_id` in failure cases (#4831)
- Fix incorrect query filter definition in GQL image node (#4843)
- Fix compute plugin's `cleanup()` method not being called upon agent shutdown (#4851)
- Prevent model service creation with project type vfolder (#4852)
- Fixed auto scaling rule processing to skip deleted model services, preventing unnecessary operations (#4875)
- Allow GQL modifyuser mutation to update users `mainaccess_key` (#4879)
- Skip logging agent status heartbeat to event logs (#4895)
- Fix wrong error message when vfolder invitation row not found (#4906)
- Return null for empty `unmanaged_path` field in vfolder GQL type (#4913)
- Handle `NoSuchProcess` properly when gathering process memory stats (#4922)
- Skip kernel destruction on agent shutdown (#4923)
- Check whether Agent is a daemon process before querying Docker netstat (#4929)
- Handle invalid moving statistics value (#4934)
- Skip exporter containers when detecting postgres container in dbshell command (#4935)
- Fix wrong indentation in Agent container stat function (#4946)
Documentation Updates
- Add extra documentation to `tester` package's `README.md` (#4845)
External Dependency Updates
- Fix aiodns version to 3.2.0 (#4950)
Miscellaneous
Full Changelog
Check out the full changelog until this release (25.11.0rc1).
Full Commit Logs
Check out the full commit logs between release (25.10.1) and (25.11.0rc1).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.10.1
Features
- Configure the logging module's config file to use Pydantic (#2834)
- Add installed field to GQL image node query schema (#4757)
Fixes
- Fixed optional field handling in `AutoScalingRule` modify action by replacing `None` with `Undefined` (#1636)
- Cool down Redis error logs for `AttributeError: 'NoneType' object has no attribute 'get'` (#4795)
- Fix wrong `Accept` header on `HarborRegistryV2._process_oci_index()` (#4807)
Full Changelog
Check out the full changelog until this release (25.10.1).
Full Commit Logs
Check out the full commit logs between release (25.10.0) and (25.10.1).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 24.09.11
Fixes
- Fix Backend.AI agent to gracefully handle missing `Config.Labels` field in Docker image inspection (#4576)
- Include endpoint loading in route retrieval for delete_route function (#4595)
- Replace assert statements in `load_model_definition` with raised exceptions (#4599)
- Support more Accept headers in `BaseContainerRegistry.scan_tag` (#4627)
- Update container ports validation to catch omitted or empty list cases, preventing potential `IndexError` (#4656)
- Fix broken `untag_image_from_registry` SDK method (#4720)
- Skip gathering metrics of non-existent processes (#4753)
Full Changelog
Check out the full changelog until this release (24.09.11).
Full Commit Logs
Check out the full commit logs between release (24.09.10) and (24.09.11).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.10.0
Features
- Add `image purge` CLI command for hard-deleting `ImageRow` (#3951)
- Add `service-definition.toml` handling logic in the ModelServing creation service (#4220)
- Implement test specification management and execution framework (#4614)
- Add creation wrappers for each session type and test cases for session creation (#4654)
- Add Tester configuration and template for config injection (#4661)
- Add `ComputeSessionNode` query method in compute session SDK to offer detailed information about a session (#4669)
- Replace hardcoded bgtask status event codes with `BgtaskStatus` enum values (#4671)
- Agent sends container status through heartbeat (#4677)
- Add session rename test verifying successful renaming and prevention of duplicate renaming (#4680)
- Add parametrize functionality in tester spec (#4692)
- Add session status history retrieval test validating history not empty and contains valid statuses (#4697)
- Add session dependency graph retrieval test validating dependency graphs between sessions (#4698)
- Improve Exporter to easily check error information when an error occurs (#4701)
- Add metric for sync container lifecycle task (#4704)
- Add python code execution tests to Tester (#4709)
- Add Container purge RPC to agent (#4710)
- Add session imagify, commit test to Tester (#4713)
- Manager purges kernels and containers with mismatched status between DB and agent heartbeat (#4717)
- Add session creation test using various options (#4729)
- Add vfolder mounted session tests verifying file upload and downloads (#4732)
- Add additional logging to kernel creation and termination (#4737)
- Add `EndpointTemplate` for testing Model Service (#4744)
- Add `runtime_variant` parameter to Service creation SDK function and CLI command (#4749)
- Agent heartbeat handler queries Kernel ids instead of Agent id (#4766)
Improvements
- Refactor event dispatchers registration of idle checkers (#4516)
- Differentiate consume and subscribe events (#4620)
Fixes
- Add missing event handler for SessionCheckingPrecondEvent (#4619)
- Fix manager config models being validated only by alias (#4624)
- Support more Accept headers in `BaseContainerRegistry.scan_tag` (#4627)
- Resolve broken image rescanning on macOS due to `aiotools` upstream issue (#4628)
- Fix container utilization metric service config reload not working (#4640)
- Update container ports validation to catch omitted or empty list cases, preventing potential `IndexError` (#4656)
- Prevent remounting NFS path (#4663)
- Agent skips failure of code runner initialization (#4679)
- Fix Session Rename API to block duplicate session names for the same user (#4690)
- Update `endpoints.destroyed_at` column when serving is terminated (#4696)
- Fix license config not included in PluginConfig after Pydantic migration (#4718)
- Fix broken `untag_image_from_registry` SDK method (#4720)
- Include `image_id` in the `message` field of the `imagify` REST API's bgtask response (#4723)
- Close code runner of agent gracefully (#4740)
- Allow `ComputeSession` SDK methods to identify sessions by ID (#4750)
- Skip gathering metrics of non-existent processes (#4753)
- Fix kernel runner and agent RPC server return code completion result (#4754)
- Use correct attribute `routings` instead of `routes` for endpoints (#4756)
Miscellaneous
- Remove `enable2FA` config from webserver (#4653)
- Ensure the `Vite` build tool, required by the `backend.ai-ui` package, is installed before building the WebUI (#4725)
Full Changelog
Check out the full changelog until this release (25.10.0).
Full Commit Logs
Check out the full commit logs between release (25.9.1) and (25.10.0).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.10.0rc1
Features
- Add `service-definition.toml` handling logic in the ModelServing creation service (#4220)
- Implement test specification management and execution framework (#4614)
- Add creation wrappers for each session type and test cases for session creation (#4654)
- Add Tester configuration and template for config injection (#4661)
- Add `ComputeSessionNode` query method in compute session SDK to offer detailed information about a session (#4669)
- Replace hardcoded bgtask status event codes with `BgtaskStatus` enum values (#4671)
- Agent sends container status through heartbeat (#4677)
- Add parametrize functionality in tester spec (#4692)
- Add session dependency graph retrieval test validating dependency graphs between sessions (#4698)
- Improve Exporter to easily check error information when an error occurs (#4701)
- Add metric for sync container lifecycle task (#4704)
- Add python code execution tests to Tester (#4709)
- Add Container purge RPC to agent (#4710)
- Manager purges kernels and containers with mismatched status between DB and agent heartbeat (#4717)
- Add additional logging to kernel creation and termination (#4737)
Improvements
- Refactor event dispatchers registration of idle checkers (#4516)
- Differentiate consume and subscribe events (#4620)
Fixes
- Add missing event handler for SessionCheckingPrecondEvent (#4619)
- Fix manager config models being validated only by alias (#4624)
- Support more Accept headers in `BaseContainerRegistry.scan_tag` (#4627)
- Resolve broken image rescanning on macOS due to `aiotools` upstream issue (#4628)
- Fix container utilization metric service config reload not working (#4640)
- Update container ports validation to catch omitted or empty list cases, preventing potential `IndexError` (#4656)
- Prevent remounting NFS path (#4663)
- Agent skips failure of code runner initialization (#4679)
- Fix Session Rename API to block duplicate session names for the same user (#4690)
- Update `endpoints.destroyed_at` column when serving is terminated (#4696)
- Fix license config not included in PluginConfig after Pydantic migration (#4718)
- Fix broken `untag_image_from_registry` SDK method (#4720)
- Include `image_id` in the `message` field of the `imagify` REST API's bgtask response (#4723)
Miscellaneous
- Remove `enable2FA` config from webserver (#4653)
- Ensure the `Vite` build tool, required by the `backend.ai-ui` package, is installed before building the WebUI (#4725)
Full Changelog
Check out the full changelog until this release (25.10.0rc1).
Full Commit Logs
Check out the full commit logs between release (25.9.1) and (25.10.0rc1).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.6.10
Features
- Add Container purge RPC to agent (#4710)
Fixes
- Add missing event handler for SessionCheckingPrecondEvent (#4619)
- Support more Accept headers in `BaseContainerRegistry.scan_tag` (#4627)
- Resolve broken image rescanning on macOS due to `aiotools` upstream issue (#4628)
- Update container ports validation to catch omitted or empty list cases, preventing potential `IndexError` (#4656)
- Agent skips failure of code runner initialization (#4679)
- Fix broken `untag_image_from_registry` SDK method (#4720)
Full Changelog
Check out the full changelog until this release (25.6.10).
Full Commit Logs
Check out the full commit logs between release (25.6.9) and (25.6.10).
- Python
Published by github-actions[bot] 8 months ago
https://github.com/lablup/backend.ai - 25.6.9
Features
- Prevent batch kernel termination when an agent shuts down (#4587)
Fixes
- Include endpoint loading in route retrieval for delete_route function (#4594)
- Replace assert statements in `load_model_definition` with raised exceptions (#4599)
Full Changelog
Check out the full changelog until this release (25.6.9).
Full Commit Logs
Check out the full commit logs between release (25.6.8) and (25.6.9).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.9.1
Features
- Prevent batch kernel termination when an agent shuts down (#4587)
- Add Redis-based Service Discovery (#4609)
Fixes
- Fix `ModelServingService.delete_route()` to query the `RouteRow` with `load_endpoint=True`, ensuring the `endpoint` relationship is eagerly loaded (#4590)
- Replace assert statements in `load_model_definition` with raised exceptions (#4599)
- Resolve `generate-rpc-keypair` CLI command failure due to `rpc_auth_manager_keypair` not found error (#4612)
Documentation Updates
- Added clarifying comments to prevent `Auth` SDK config confusion (#4607)
Miscellaneous
- Add Etcd, Redis, and PostgreSQL exporters/scrapers in local development environment configuration. (#4606)
Full Changelog
Check out the full changelog until this release (25.9.1).
Full Commit Logs
Check out the full commit logs between release (25.9.0) and (25.9.1).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 24.09.10
Fixes
- Fix broken `Keypair` SDK methods (`activate`, `deactivate`) (#4547)
Full Changelog
Check out the full changelog until this release (24.09.10).
Full Commit Logs
Check out the full commit logs between release (24.09.9) and (24.09.10).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.6.8
Improvements
- Improve logging for error handling (#4544)
Fixes
- Fix broken Keypair SDK methods (activate, deactivate) (#4547)
- Fix missing entity id in processor (#4555)
- Fix Backend.AI agent to gracefully handle a missing Config.Labels field in Docker image inspection (#4576)
- Do null-check of kernel service-ports when querying direct access info of a compute session (#4581)
Full Changelog
Check out the full changelog until this release (25.6.8).
Full Commit Logs
Check out the full commit logs between release (25.6.7) and (25.6.8).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.9.0
Features
- Add Action Tests for Image. (#4048)
- Enable TOTP registration for anonymous users (#4354)
- Refactor event dispatcher and handlers directory structure (#4497)
- Add EventDomain.WORKFLOW enum value to support workflow-related event categorization (#4499)
- Create new manager CLI command backend.ai mgr scheduler last-execution-time to let administrators fetch each manager scheduler's last execution time (#4507)
- Add stage package to support deterministic step-by-step execution (#4509)
- Make resource fragmentation configurable (#4533)
- Add missing GET /status_history endpoint to the session REST API (#4543)
Improvements
- Refactor keypair_preparation from a classmethod of the Graphene class into a utility function to decouple logic from GraphQL (#4510)
- Introduce a service layer in auth APIs to apply audit logs for user login APIs (#4535)
- Improve logging for error handling in various modules (#4540)
- Initialize device env vars with or without restart so that sessions restart successfully (#4585)
Fixes
- Register the service on heartbeat when the service is dead (#4492)
- Fix missing log output of GraphQL top-level query fields by improving graphene's resolver info object usage (#4505)
- Fix Backend.AI agent equipped with mock accelerator refusing to allocate mock accelerator to session after agent restart (#4532)
- Fix broken list_presets API and SDK (#4541)
- Fix broken usage_per_month method in Resource SDK (#4546)
- Fix broken Keypair SDK methods (activate, deactivate) (#4547)
- Fix broken stream_pty method in Session SDK (#4548)
- Fix missing entity id in processor (#4550)
- Fix wrong idle checker init arguments (#4557)
- Fix broken Network SDK implementations to work properly (#4558)
- Skip processing messages with None data in RedisQueue (#4559)
- Add missing message field to BgTaskFailedEvent to provide information about the error that occurred (#4563)
- Fix AgentWatcher.get_status SDK API to use a query parameter instead of the request body (#4569)
- Correct the worker process IDs and names in the server log outputs, which had been unintentionally overridden with the main process information (#4572)
- Resolve session creation failure due to incorrect resource label loading (#4573)
- Remove duplicated error logging in Session service (#4574)
- Fix Backend.AI agent to gracefully handle a missing Config.Labels field in Docker image inspection (#4576)
- Do null-check of kernel service-ports when querying direct access info of a compute session (#4581)
Miscellaneous
- Remove outdated Image SDK methods (get_image_import_form, build) (#4537)
- Remove useless print in ScalingGroup.list_available (#4538)
Full Changelog
Check out the full changelog until this release (25.9.0).
Full Commit Logs
Check out the full commit logs between release (25.8.1) and (25.9.0).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.9.0rc3
Features
- Add Action Tests for Image. (#4048)
- Enable TOTP registration for anonymous users (#4354)
- Refactor event dispatcher and handlers directory structure (#4497)
- Add EventDomain.WORKFLOW enum value to support workflow-related event categorization (#4499)
- Create new manager CLI command backend.ai mgr scheduler last-execution-time to let administrators fetch each manager scheduler's last execution time (#4507)
- Add stage package to support deterministic step-by-step execution (#4509)
- Make resource fragmentation configurable (#4533)
- Add missing GET /status_history endpoint to the session REST API (#4543)
Improvements
- Refactor keypair_preparation from a classmethod of the Graphene class into a utility function to decouple logic from GraphQL (#4510)
- Introduce a service layer in auth APIs to apply audit logs for user login APIs (#4535)
- Improve logging for error handling in various modules (#4540)
Fixes
- Register the service on heartbeat when the service is dead (#4492)
- Fix missing log output of GraphQL top-level query fields by improving graphene's resolver info object usage (#4505)
- Fix Backend.AI agent equipped with mock accelerator refusing to allocate mock accelerator to session after agent restart (#4532)
- Fix broken list_presets API and SDK (#4541)
- Fix broken usage_per_month method in Resource SDK (#4546)
- Fix broken Keypair SDK methods (activate, deactivate) (#4547)
- Fix broken stream_pty method in Session SDK (#4548)
- Fix missing entity id in processor (#4550)
- Fix wrong idle checker init arguments (#4557)
- Fix broken Network SDK implementations to work properly (#4558)
- Skip processing messages with None data in RedisQueue (#4559)
Miscellaneous
- Remove outdated Image SDK methods (get_image_import_form, build) (#4537)
- Remove useless print in ScalingGroup.list_available (#4538)
Full Changelog
Check out the full changelog until this release (25.9.0rc3).
Full Commit Logs
Check out the full commit logs between release (25.8.1) and (25.9.0rc3).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.9.0rc2
Features
- Add Action Tests for Image. (#4048)
- Enable TOTP registration for anonymous users (#4354)
- Refactor event dispatcher and handlers directory structure (#4497)
- Add EventDomain.WORKFLOW enum value to support workflow-related event categorization (#4499)
- Create new manager CLI command backend.ai mgr scheduler last-execution-time to let administrators fetch each manager scheduler's last execution time (#4507)
- Add stage package to support deterministic step-by-step execution (#4509)
- Make resource fragmentation configurable (#4533)
- Add missing GET /status_history endpoint to the session REST API (#4543)
Improvements
- Refactor keypair_preparation from a classmethod of the Graphene class into a utility function to decouple logic from GraphQL (#4510)
- Introduce a service layer in auth APIs to apply audit logs for user login APIs (#4535)
- Improve logging for error handling in various modules (#4540)
Fixes
- Register the service on heartbeat when the service is dead (#4492)
- Fix missing log output of GraphQL top-level query fields by improving graphene's resolver info object usage (#4505)
- Fix Backend.AI agent equipped with mock accelerator refusing to allocate mock accelerator to session after agent restart (#4532)
- Fix broken list_presets API and SDK (#4541)
- Fix broken usage_per_month method in Resource SDK (#4546)
- Fix broken Keypair SDK methods (activate, deactivate) (#4547)
- Fix broken stream_pty method in Session SDK (#4548)
- Fix missing entity id in processor (#4550)
- Fix wrong idle checker init arguments (#4557)
Miscellaneous
- Remove outdated Image SDK methods (get_image_import_form, build) (#4537)
- Remove useless print in ScalingGroup.list_available (#4538)
Full Changelog
Check out the full changelog until this release (25.9.0rc2).
Full Commit Logs
Check out the full commit logs between release (25.8.1) and (25.9.0rc2).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.6.8rc1
Improvements
- Improve logging for error handling (#4544)
Fixes
- Fix broken Keypair SDK methods (activate, deactivate) (#4547)
- Fix missing entity id in processor (#4555)
Full Changelog
Check out the full changelog until this release (25.6.8rc1).
Full Commit Logs
Check out the full commit logs between release (25.6.7) and (25.6.8rc1).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.9.0rc1
Features
- Add Action Tests for Image. (#4048)
- Enable TOTP registration for anonymous users (#4354)
- Refactor event dispatcher and handlers directory structure (#4497)
- Add EventDomain.WORKFLOW enum value to support workflow-related event categorization (#4499)
- Create new manager CLI command backend.ai mgr scheduler last-execution-time to let administrators fetch each manager scheduler's last execution time (#4507)
- Add stage package to support deterministic step-by-step execution (#4509)
- Make resource fragmentation configurable (#4533)
- Add missing GET /status_history endpoint to the session REST API (#4543)
Improvements
- Refactor keypair_preparation from a classmethod of the Graphene class into a utility function to decouple logic from GraphQL (#4510)
- Introduce a service layer in auth APIs to apply audit logs for user login APIs (#4535)
- Improve logging for error handling in various modules (#4540)
Fixes
- Register the service on heartbeat when the service is dead (#4492)
- Fix missing log output of GraphQL top-level query fields by improving graphene's resolver info object usage (#4505)
- Fix Backend.AI agent equipped with mock accelerator refusing to allocate mock accelerator to session after agent restart (#4532)
- Fix broken list_presets API and SDK (#4541)
- Fix broken usage_per_month method in Resource SDK (#4546)
- Fix broken Keypair SDK methods (activate, deactivate) (#4547)
- Fix broken stream_pty method in Session SDK (#4548)
- Fix missing entity id in processor (#4550)
Miscellaneous
- Remove outdated Image SDK methods (get_image_import_form, build) (#4537)
- Remove useless print in ScalingGroup.list_available (#4538)
Full Changelog
Check out the full changelog until this release (25.9.0rc1).
Full Commit Logs
Check out the full commit logs between release (25.8.1) and (25.9.0rc1).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.8.1
Fixes
- Fixed client SDK method Service.create() signature to comply with the NewServiceRequestModel schema (#4449)
Full Changelog
Check out the full changelog until this release (25.8.1).
Full Commit Logs
Check out the full commit logs between release (25.8.0) and (25.8.1).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 24.09.9
Fixes
- Fix silent failure of DockerAgent.push_image() and DockerAgent.pull_image(). (#2572)
- Filter vfolders by status before initiating a vfolder deletion task (#3446)
Full Changelog
Check out the full changelog until this release (24.09.9).
Full Commit Logs
Check out the full commit logs between release (24.09.8) and (24.09.9).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.6.7
Fixes
- Fixed client SDK method Service.create() signature to comply with the NewServiceRequestModel schema (#4449)
- Fixed Service.create() in client SDK to truncate the default generated session name to the maximum allowed length (#4450)
Full Changelog
Check out the full changelog until this release (25.6.7).
Full Commit Logs
Check out the full commit logs between release (25.6.6) and (25.6.7).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.8.1rc1
Fixes
- Fixed client SDK method Service.create() signature to comply with the NewServiceRequestModel schema (#4449)
Full Changelog
Check out the full changelog until this release (25.8.1rc1).
Full Commit Logs
Check out the full commit logs between release (25.8.0) and (25.8.1rc1).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.8.0
Features
- Add Manager config implementations based on Pydantic. (#3994)
- Add Action Test Code for Group (#4051)
- Add Action Test Code for User (#4059)
- Add model service & processors (#4109)
- Apply error code in BackendError (#4245)
- Migrate manager config to Pydantic. (#4317)
- Introduce LabelName enum to avoid hardcoded image/container labels (#4328)
- Add error code to API exception message (#4336)
- Add error code to metric (#4337)
- Add etcd service discovery (#4343)
- Introduce ConfigLoaders, UnifiedConfig, and refactor existing config logic. (#4351)
- Add VFolder force-delete API to Python client SDK (#4353)
- Refactor event propagation (#4358)
- Integrate ManagerLocalConfig with ManagerSharedConfig, and make all manager configs share the same Chain Config Loader. (#4370)
- Add GQL APIs for querying and updating the current config status. (#4376)
- Introduce ProcessorPackage to list the action types that can be processed by ActionProcessor. (#4379)
- Add kernel last-seen event and handler (#4386)
- Add event logger for consumer handlers (#4387)
- Introduce ActionSpec. (#4393)
- Support relative paths for AutoDirectoryPath. (#4413)
- Register service discovery and add HTTP service discovery for Prometheus (#4438)
- Add OpenTelemetry dependencies for enhanced observability (#4479)
Improvements
- Refactor the alias of ManagerSharedConfig into validation_alias and serialization_alias. (#4365)
Fixes
- Resolve BgTaskFailedError not being propagated to the client. (#4272)
- Fix invalid msg_id type in hiredis message queue (#4309)
- Prevent invalid resource slot creation and mutation in ResourcePresetService. (#4314)
- Agent retries retrieving kernel service info if it fails during the kernel creation step (#4321)
- Add TypeError handling in redis_helper (#4339)
- Add default value for task_info (#4340)
- Use label's items for making resource info (#4341)
- Add missing KernelStatus.ERROR to dead kernel status set (#4371)
- Make BaseAction's entity_type() and operation_type() classmethods. (#4377)
- Revert addition of SessionStatus.ERROR and KernelStatus.ERROR to dead status sets (#4384)
- Fix event handling observer to report success or failure after handling completes (#4392)
- Revert sane default config update. (#4395)
- Add missing UserBgtaskEvent implementation. (#4404)
- Change pyzmq version in python-kernel to one compatible with Python 3.13 (#4405)
- Fix vfolder ls CLI command, which referred to deprecated response schema fields. (#4425)
- Fix backend.ai admin resource usage-per-period CLI command. (#4429)
- Add import statement for KernelLifecycleEventReason to load legacy kernels in agents (#4436)
- Increase blocking timeout for message retrieval in Redis message queue (#4441)
- Add UUID serialization support in ExtendedJSONEncoder (#4442)
- Improve error handling for token generation in ModelServingService (#4443)
- Fix issue preventing admins from leaving invited vfolders (#4446)
- Fixed session environment variable init during route creation when endpoint.environ is None (#4447)
- Check unregistered email and update error code when vfolder invitation conflicts, and enhance error handling with detailed debug responses in exception middleware (#4448)
- Fixed Service.create() in client SDK to truncate the default generated session name to the maximum allowed length (#4450)
- Add missing defaults to BootstrapConfig. (#4453)
- Fix issue preventing users from uploading files to compute sessions (#4457)
- Calculate correct VFolder permissions when admins query (#4459)
- Fix incorrect handling of disallowed permissions in GQL middleware. (#4463)
- Handle NoItems exception correctly in CLI framework. (#4465)
- Fix wrong method name in rpc call metric (#4475)
Documentation Updates
- Update Python Version Compatibility in README (#4306)
- Update towncrier command documentation (#4364)
Miscellaneous
- Add build wheel script (#4313)
- Add release script (#4316)
- Renamed RedisConfig to RedisTarget to avoid name conflicts with the existing config. (#4363)
- Remove subscribed_actions config, and change AuditLogReporter to AuditLogMonitor. (#4400)
Full Changelog
Check out the full changelog until this release (25.8.0).
Full Commit Logs
Check out the full commit logs between release (25.7.0) and (25.8.0).
- Python
Published by github-actions[bot] 9 months ago
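Python's standard json module raises TypeError when it meets a uuid.UUID value, which is the kind of gap the ExtendedJSONEncoder fix (#4442) closes. The following is a minimal sketch of the general technique, assuming a json.JSONEncoder subclass; the class name UUIDAwareJSONEncoder is hypothetical, and Backend.AI's actual ExtendedJSONEncoder may handle more types and differ in detail.

```python
import json
import uuid
from typing import Any


class UUIDAwareJSONEncoder(json.JSONEncoder):
    """Hypothetical encoder: serializes uuid.UUID as its canonical string form."""

    def default(self, o: Any) -> Any:
        if isinstance(o, uuid.UUID):
            return str(o)
        # Defer to the base class, which raises TypeError for unsupported types
        return super().default(o)


if __name__ == "__main__":
    session_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
    print(json.dumps({"session_id": session_id}, cls=UUIDAwareJSONEncoder))
```

Call sites pass the encoder explicitly via json.dumps(payload, cls=UUIDAwareJSONEncoder); any other unsupported type still fails loudly through the base class instead of being silently stringified.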
https://github.com/lablup/backend.ai - 25.6.6
Features
- Add VFolder force-delete API to Python client SDK (#4353)
Fixes
- Add missing KernelStatus.ERROR to dead kernel status set (#4371)
- Revert addition of SessionStatus.ERROR and KernelStatus.ERROR to dead status sets (#4384)
- Fix event handling observer to report success or failure after handling completes (#4392)
- Change pyzmq version in python-kernel to one compatible with Python 3.13 (#4405)
- Fix vfolder ls CLI command, which referred to deprecated response schema fields. (#4425)
- Fix backend.ai admin resource usage-per-period CLI command. (#4429)
- Fixed a loophole where the consume step could be missed at event dispatcher startup time (#4444)
- Fix issue preventing admins from leaving invited vfolders (#4446)
- Fixed session environment variable init during route creation when endpoint.environ is None (#4447)
- Fix issue preventing users from uploading files to compute sessions (#4457)
Miscellaneous
- Remove subscribed_actions config, and change AuditLogReporter to AuditLogMonitor. (#4400)
Full Changelog
Check out the full changelog until this release (25.6.6).
Full Commit Logs
Check out the full commit logs between release (25.6.5) and (25.6.6).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.6.6rc1
Features
- Add VFolder force-delete API to Python client SDK (#4353)
Fixes
- Add missing KernelStatus.ERROR to dead kernel status set (#4371)
- Revert addition of SessionStatus.ERROR and KernelStatus.ERROR to dead status sets (#4384)
- Fix event handling observer to report success or failure after handling completes (#4392)
- Change pyzmq version in python-kernel to one compatible with Python 3.13 (#4405)
- Fix vfolder ls CLI command, which referred to deprecated response schema fields. (#4425)
- Fix backend.ai admin resource usage-per-period CLI command. (#4429)
Miscellaneous
- Remove subscribed_actions config, and change AuditLogReporter to AuditLogMonitor. (#4400)
Full Changelog
Check out the full changelog until this release (25.6.6rc1).
Full Commit Logs
Check out the full commit logs between release (25.6.5) and (25.6.6rc1).
- Python
Published by github-actions[bot] 9 months ago
https://github.com/lablup/backend.ai - 25.6.5
Fixes
- Agent retries retrieving kernel service info if it fails during the kernel creation step (#4321)
- Add TypeError handling in redis_helper (#4339)
- Add default value for task_info (#4340)
- Use label's items for making resource info (#4341)
Full Changelog
Check out the full changelog until this release (25.6.5).
Full Commit Logs
Check out the full commit logs between release (25.6.4) and (25.6.5).
- Python
Published by github-actions[bot] 10 months ago
https://github.com/lablup/backend.ai - 25.6.4
Fixes
- Fix invalid msg_id type in hiredis message queue (#4309)
Full Changelog
Check out the full changelog until this release (25.6.4).
Full Commit Logs
Check out the full commit logs between release (25.6.3) and (25.6.4).
- Python
Published by github-actions[bot] 10 months ago
https://github.com/lablup/backend.ai - 25.7.0
No significant changes.
Full Changelog
Check out the full changelog until this release (25.7.0).
Full Commit Logs
Check out the full commit logs between release (25.6.3) and (25.7.0).
- Python
Published by github-actions[bot] 10 months ago
https://github.com/lablup/backend.ai - 25.6.3
No significant changes.
Full Changelog
Check out the full changelog until this release (25.6.3).
Full Commit Logs
Check out the full commit logs between release (25.6.2) and (25.6.3).
- Python
Published by github-actions[bot] 10 months ago
https://github.com/lablup/backend.ai - 25.6.2
Fixes
- Fix Images.supported_accelerators GQL field not containing every accelerator that the image supports (#4230)
- Resolve ImageAliasData type not found issue in ImageAliasData.to_dataclass(). (#4232)
- Filter out duplicate vFolder mounts when enqueueing sessions (#4247)
- Fix invalid logging format by replacing percent-style (%s) with brace-style ({}) for compatibility with BraceStyleAdapter (#4256)
- Added missing version field to the project section in pyproject.toml to resolve build failures when installing the package via pip from a Git repository (#4259)
- Add missing parser to AuditLogNode's status and duration fields in queryfilter. (#4260)
- Fix kernel cleanup by ensuring kernels and their containers are properly destroyed during initialization failures (#4264)
- Fix wrong fieldspec in User (#4268)
- Enable reading both Enum name and Enum value (#4269)
- Improve duplicate vfolder mount detection by checking both folder ID and subpath, allowing multiple mounts from the same folder with different subpaths (#4274)
- Fix wrong value of Session download-single file API result (#4276)
- Upgrade pyjwt version due to security vulnerability (#4287)
Full Changelog
Check out the full changelog until this release (25.6.2).
Full Commit Logs
Check out the full commit logs between release (25.6.1) and (25.6.2).
- Python
Published by github-actions[bot] 10 months ago
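The logging-format fix (#4256) matters because a brace-style adapter renders messages with str.format(), which silently ignores %s placeholders. The following is a minimal sketch of such an adapter, modeled on the LoggerAdapter recipe in the Python logging cookbook; the BraceMessage helper and this BraceStyleAdapter body are illustrative assumptions, not Backend.AI's actual implementation.

```python
import logging


class BraceMessage:
    """Illustrative helper: defers str.format() until a handler renders the record."""

    def __init__(self, fmt: str, args: tuple) -> None:
        self.fmt = fmt
        self.args = args

    def __str__(self) -> str:
        return self.fmt.format(*self.args)


class BraceStyleAdapter(logging.LoggerAdapter):
    """Illustrative adapter: log.info("{} items", n) instead of log.info("%s items", n)."""

    def __init__(self, logger: logging.Logger, extra=None) -> None:
        super().__init__(logger, extra or {})

    def log(self, level, msg, *args, **kwargs):
        if self.isEnabledFor(level):
            msg, kwargs = self.process(msg, kwargs)
            # Wrap the message and pass no positional args, so the stdlib's
            # own %-interpolation step in LogRecord.getMessage() is skipped.
            self.logger.log(level, BraceMessage(msg, args), **kwargs)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    log = BraceStyleAdapter(logging.getLogger("demo"))
    log.info("pulled {} of {} layers", 3, 5)
```

With this adapter, log.info("pulled {} of {} layers", 3, 5) renders as "pulled 3 of 5 layers", while a leftover percent-style call such as log.info("x=%s", 1) prints the literal "x=%s", because str.format() ignores unused positional arguments.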
https://github.com/lablup/backend.ai - 25.6.1
Features
- Add more filterspecs to EndPoint GQL. (#4210)
Fixes
- Fix wrong types in ResourcePolicy GQL modifier. (#4154)
- Fix VFolder clone by correcting access to user info (#4214)
- Skip destroying session's network if session does not have any network (#4215)
- Calculate RBAC admin role in project correctly (#4218)
- Fix extra mounts of model services (#4224)
- Add missing dtparse for created_at in queryfilter (#4225)
- Revert response schema of vfolder mkdir API (#4227)
Full Changelog
Check out the full changelog until this release (25.6.1).
Full Commit Logs
Check out the full commit logs between release (25.6.0) and (25.6.1).
- Python
Published by github-actions[bot] 10 months ago
https://github.com/lablup/backend.ai - 25.6.0
Breaking Changes
- Add force and noprune options to PurgeImages GQL API, and allow PurgeImages to be performed on multiple agents (breaking change). (#3987)
Features
- Add enable_model_folders option to control the visibility of the Models tab and MODEL usage mode on the Data page in the Backend.AI Web UI. (#3503)
- Add project_node field of type GroupConnection to GQL UserNode type (#3529)
- Make the images table's resource column contain only admin-customized values (not updated when rescanning the image), and make custom resource limits non-volatile. (#3986)
- Add AuditLog GQL interface. (#4001)
- Add vfolder services and processes (#4002)
- Add domain service & processors. (#4012)
- Add group service and processors (#4026)
- Add GQL query for user utilization metrics (#4027)
- Add Action Test Code for Domain (#4030)
- Add User Service & Processors (#4058)
- Refactor Redis message queue to follow ABC pattern (#4064)
- Use http.HTTPStatus enum for HTTP status codes (#4069)
- Add AuditLog ActionMonitor and Reporter. (#4087)
- (#4091)
- Add new metrics reporters that report raw utilization values without hook modifications (#4099)
- Align action architecture code (#4103)
- Add RequestID context & middleware (#4104)
- Add available min/max resource slot fields to Resource group GQL types (#4108)
- Add SMTP ActionMonitor and Reporter. (#4118)
- Update parser to process model definition YAML according to YAML 1.2 spec (#4124)
- Add action_id in ActionProcessor for tracking Action. (#4131)
- Make a reporter hub for configurable monitoring (#4165)
- Add prometheus monitor (#4167)
- Add sample configuration for audit_log and smtp reporters. (#4170)
- Validate SSH keypairs when uploading them (#4176)
- Customize SMTP mail template. (#4207)
Fixes
- Remove private value in kernel-feature label before committing images, so that committed images are listed on the session launcher (#3641)
- Unload the removed Docker images from Redis cache. (#3923)
- Fix customized image visibility issue. (#3939)
- Add Image service & processors. (#3997)
- Add Resource service & processors. (#4016)
- Add missing newline at end of customized dotfiles. (#4047)
- Add Session service & processors. (#4061)
- Remove wrong Accept header from Image rescanning logic. (#4066)
- Set up source at producer creation (#4068)
- Add missing AuditLog module import to ensure the AuditLog table is created at initial installation. (#4079)
- Change default values of domain table columns (#4081)
- Avoid kernel DB full scan when resolving GQL Agent queries (#4086)
- Filter Compute Session list query by project when project id scope is specified (#4089)
- Fix pydantic validation error from wrong type aliasing. (#4094)
- Replace aiodataloader-ng (a fork based on 0.2.1) with the managed upstream aiodataloader package (0.4.2) (#4098)
- Fix wrong name of quota scope type fields (#4123)
- Fix wrong project-id parsing when creating project vfolders (#4144)
- Fix broken model service creation logic due to field name change. (#4159)
- Fix broken single image rescan on HarborRegistryV2. (#4161)
- Fix broken UntagImageFromRegistry GQL mutation. (#4177)
- Replace aiosmtplib with smtplib in SMTP reporter. (#4179)
- Fix wrong custom image owner check logic. (#4181)
- Add missing application/vnd.oci.image.manifest.v1 type handling on HarborRegistryV2's single image rescan. (#4188)
- Fix the representation that shows invited VFolder users having VFolder deletion permissions (#4190)
- Fix wrong Accept header handling in image rescanning. (#4191)
- Fix VFolder permission update mutation not working. (#4194)
- Fix "chunk too big" error of admin image rescan command. (#4198)
- Unify BaseAction's operation_type convention. (#4200)
External Dependency Updates
- Update most native dependencies to make them compatible with Python 3.13, and downgrade multidict to avoid memory leak in the upstream (#4122)
- Retire etcetra in favor of etcd-client-py (#4152)
- Upgrade the base CPython from 3.12.8 to 3.13.3, which will bring huge performance improvements with asyncio (#4153)
Full Changelog
Check out the full changelog until this release (25.6.0).
Full Commit Logs
Check out the full commit logs between release (25.5.2) and (25.6.0).
- Python
Published by github-actions[bot] 10 months ago
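The switch to http.HTTPStatus (#4069) replaces bare integers with the standard library's IntEnum of status codes. A minimal illustration of what the enum provides; the helper names describe and is_client_error are hypothetical and not Backend.AI code.

```python
from http import HTTPStatus


def describe(status: HTTPStatus) -> str:
    # Each member carries both the numeric code and the standard reason phrase
    return f"{status.value} {status.phrase}"


def is_client_error(code: int) -> bool:
    # HTTPStatus(code) raises ValueError for codes the enum does not define
    return 400 <= HTTPStatus(code).value < 500


if __name__ == "__main__":
    print(describe(HTTPStatus.NOT_FOUND))
    # IntEnum members compare equal to plain ints, so existing checks keep working
    print(HTTPStatus.OK == 200)
```

Because HTTPStatus is an IntEnum, named members can replace literals such as 404 without changing comparisons elsewhere, while making the intent readable at the call site.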
https://github.com/lablup/backend.ai - 25.6.0rc4
Breaking Changes
- Add force and noprune options to PurgeImages GQL API, and allow PurgeImages to be performed on multiple agents (breaking change). (#3987)
Features
- Add enable_model_folders option to control the visibility of the Models tab and MODEL usage mode on the Data page in the Backend.AI Web UI. (#3503)
- Add project_node field of type GroupConnection to GQL UserNode type (#3529)
- Make the images table's resource column contain only admin-customized values (not updated when rescanning the image), and make custom resource limits non-volatile. (#3986)
- Add AuditLog GQL interface. (#4001)
- Add vfolder services and processes (#4002)
- Add domain service & processors. (#4012)
- Add group service and processors (#4026)
- Add GQL query for user utilization metrics (#4027)
- Add Action Test Code for Domain (#4030)
- Add User Service & Processors (#4058)
- Refactor Redis message queue to follow ABC pattern (#4064)
- Use http.HTTPStatus enum for HTTP status codes (#4069)
- Add AuditLog ActionMonitor and Reporter. (#4087)
- (#4091)
- Add new metrics reporters that report raw utilization values without hook modifications (#4099)
- Align action architecture code (#4103)
- Add RequestID context & middleware (#4104)
- Add available min/max resource slot fields to Resource group GQL types (#4108)
- Add SMTP ActionMonitor and Reporter. (#4118)
- Update parser to process model definition YAML according to YAML 1.2 spec (#4124)
- Add action_id in ActionProcessor for tracking Action. (#4131)
- Make a reporter hub for configurable monitoring (#4165)
- Add prometheus monitor (#4167)
Fixes
- Remove private value in kernel-feature label before committing images, so that committed images are listed on the session launcher (#3641)
- Unload the removed Docker images from Redis cache. (#3923)
- Fix customized image visibility issue. (#3939)
- Add Image service & processors. (#3997)
- Add Resource service & processors. (#4016)
- Add missing newline at end of customized dotfiles. (#4047)
- Add Session service & processors. (#4061)
- Remove wrong Accept header from Image rescanning logic. (#4066)
- Set up source at producer creation (#4068)
- Add missing AuditLog module import to ensure the AuditLog table is created at initial installation. (#4079)
- Change default values of domain table columns (#4081)
- Avoid kernel DB full scan when resolving GQL Agent queries (#4086)
- Filter Compute Session list query by project when project id scope is specified (#4089)
- Fix pydantic validation error from wrong type aliasing. (#4094)
- Replace aiodataloader-ng (a fork based on 0.2.1) with the managed upstream aiodataloader package (0.4.2) (#4098)
- Fix wrong name of quota scope type fields (#4123)
- Fix wrong project-id parsing when creating project vfolders (#4144)
- Fix broken model service creation logic due to field name change. (#4159)
- Fix broken single image rescan on HarborRegistryV2. (#4161)
External Dependency Updates
- Update most native dependencies to make them compatible with Python 3.13, and downgrade multidict to avoid memory leak in the upstream (#4122)
- Retire etcetra in favor of etcd-client-py (#4152)
- Upgrade the base CPython from 3.12.8 to 3.13.3, which will bring huge performance improvements with asyncio (#4153)
Full Changelog
Check out the full changelog until this release (25.6.0rc4).
Full Commit Logs
Check out the full commit logs between release (25.5.2) and (25.6.0rc4).
- Python
Published by github-actions[bot] 10 months ago
https://github.com/lablup/backend.ai - 25.6.0rc3
Breaking Changes
- Add force and noprune options to PurgeImages GQL API, and allow PurgeImages to be performed on multiple agents (breaking change). (#3987)
Features
- Add project_node field of type GroupConnection to GQL UserNode type (#3529)
- Add AuditLog GQL interface. (#4001)
- Add vfolder services and processes (#4002)
- Add domain service & processors. (#4012)
- Add group service and processors (#4026)
- Add Action Test Code for Domain (#4030)
- Add User Service & Processors (#4058)
- Refactor Redis message queue to follow ABC pattern (#4064)
- Use http.HTTPStatus enum for HTTP status codes (#4069)
- (#4091)
- Add new metrics reporters that report raw utilization values without hook modifications (#4099)
- Align action architecture code (#4103)
- Add RequestID context & middleware (#4104)
- Add action_id in ActionProcessor for tracking Action. (#4131)
Fixes
- Fix customized image visibility issue. (#3939)
- Add Image service & processors. (#3997)
- Add Resource service & processors. (#4016)
- Add missing newline at end of customized dotfiles. (#4047)
- Add Session service & processors. (#4061)
- Set up source at producer creation (#4068)
- Add missing AuditLog module import to ensure the AuditLog table is created at initial installation. (#4079)
- Change default values of domain table columns (#4081)
- Avoid kernel DB full scan when resolving GQL Agent queries (#4086)
- Filter Compute Session list query by project when project id scope is specified (#4089)
- Fix pydantic validation error from wrong type aliasing. (#4094)
- Replace aiodataloader-ng (a fork based on 0.2.1) with the managed upstream aiodataloader package (0.4.2) (#4098)
- Fix wrong name of quota scope type fields (#4123)
Full Changelog
Check out the full changelog until this release (25.6.0rc3).
Full Commit Logs
Check out the full commit logs between release (25.5.2) and (25.6.0rc3).
- Python
Published by github-actions[bot] 11 months ago
https://github.com/lablup/backend.ai - 25.6.0rc2
Breaking Changes
- Add force and noprune options to PurgeImages GQL API, and allow PurgeImages to be performed on multiple agents (breaking change). (#3987)
Features
- Add AuditLog GQL interface. (#4001)
- Refactor Redis message queue to follow ABC pattern (#4064)
- Use http.HTTPStatus enum for HTTP status codes (#4069)
- (#4091)
Fixes
- Fix customized image visibility issue. (#3939)
- Add missing newline at end of customized dotfiles. (#4047)
- Set up source at producer creation (#4068)
- Add missing AuditLog module import to ensure the AuditLog table is created at initial installation. (#4079)
- Change default values of domain table columns (#4081)
- Avoid kernel DB full scan when resolving GQL Agent queries (#4086)
- Fix pydantic validation error from wrong type aliasing. (#4094)
Full Changelog
Check out the full changelog until this release (25.6.0rc2).
Full Commit Logs
Check out the full commit logs between release (25.5.2) and (25.6.0rc2).
- Python
Published by github-actions[bot] 11 months ago
https://github.com/lablup/backend.ai - 25.6.0rc1
Breaking Changes
- Add force, noprune options to PurgeImages GQL API, and allow PurgeImages to be performed on multiple agents (breaking change). (#3987)
Features
- Add AuditLog GQL interface. (#4001)
- Refactor Redis message queue to follow ABC pattern (#4064)
- Use http.HTTPStatus enum for HTTP status codes (#4069)
Fixes
- Fix customized image visibility issue. (#3939)
- Add missing newline at end of customized dotfiles. (#4047)
- Set up source at producer creation (#4068)
Full Changelog
Check out the full changelog until this release (25.6.0rc1).
Full Commit Logs
Check out the full commit logs between release (25.5.2) and (25.6.0rc1).
- Python
Published by github-actions[bot] 11 months ago
https://github.com/lablup/backend.ai - 25.5.2
Features
- Split hanging_session_scanner_ctx into separate stale session and kernel sweepers to more robustly handle orphaned kernels caused by session status update mismatches (#3992)
- Fix typo in keypair resource policy affecting max_concurrent_sftp_sessions (#4050)
- Add exception raising to the action processor (#4056)
Fixes
- Fix image rescanning wrong exception handling logic. (#4057)
- Fix Docker network not being created when bootstrapping a multi-container session (#4062)
Full Changelog
Check out the full changelog until this release (25.5.2).
Full Commit Logs
Check out the full commit logs between release (25.5.1) and (25.5.2).
- Python
Published by github-actions[bot] 11 months ago
https://github.com/lablup/backend.ai - 25.5.1
Fixes
- Fix an excessive number of sessions being removed when scaling down a model service (#4037)
- Add missing migration revision history file (#4039)
- Fix missing watcher argument in storage volume init methods (#4041)
- Do not check storage host permission in VFolder RBAC function (#4045)
Full Changelog
Check out the full changelog until this release (25.5.1).
Full Commit Logs
Check out the full commit logs between release (25.5.0) and (25.5.1).
- Python
Published by github-actions[bot] 11 months ago
https://github.com/lablup/backend.ai - 25.5.0
Features
- Add GQL schema to query total resource slots of compute sessions in scopes or specified conditions. (#2849)
- Add AuditLog table. (#3712)
- Improve performance of vfolder list_host() API handler through parallel task execution. (#3935)
- Add accelerator quantum size field to GQL scaling group (#3940)
- Add container labels to simplify metric queries (#3980)
- Separate internal API port (#3989)
- Make action processor work with async functions (#3999)
- Enhance session status transition management (#4029)
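The #3935 entry above speeds up the vfolder list_host() handler by running work in parallel. The following is a minimal asyncio sketch of that general pattern, not the actual handler. fetch_volume_info and list_hosts are hypothetical names, and the sleep stands in for a real storage-proxy request.

```python
import asyncio
from typing import Dict, List


async def fetch_volume_info(host: str) -> Dict[str, str]:
    # Hypothetical stand-in for one per-volume request to a storage proxy.
    await asyncio.sleep(0.01)
    return {"host": host, "status": "ok"}


async def list_hosts(hosts: List[str]) -> List[Dict[str, str]]:
    # Start every per-host request at once instead of awaiting them one by one;
    # gather() returns the results in the same order as its arguments.
    results = await asyncio.gather(*(fetch_volume_info(h) for h in hosts))
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(list_hosts(["local:volume1", "nfs:volume2"])))
```

With N hosts, the total latency approaches that of the slowest single request rather than the sum of all requests.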
Fixes
- Properly handle zero-value unknown resource limits when creating session. (#3925)
- Fix PurgeImageById mutation not working when image_alias entries are present. (#3972)
- Fix wrong error message logging when model_path does not exist. (#3990)
- Allow superadmins to query all GQL agent nodes (#3996)
- Fix wrong JSON serialization for response of list presets API handler (#4006)
- Fix a potential race condition error in the kernel runner's OOM logger (#4008)
- Fix initialization of Storage proxy event dispatcher (#4010)
- Update common structure (remove request_id from BaseAction, add processors_ctx). (#4022)
- Fix wrong parsing of auto-mount vfolder inputs (#4025)
- Fix modify_compute_session GQL mutation error caused by missing kernel loading option. (#4032)
Full Changelog
Check out the full changelog until this release (25.5.0).
Full Commit Logs
Check out the full commit logs between release (25.4.0) and (25.5.0).
- Python
Published by github-actions[bot] 11 months ago
https://github.com/lablup/backend.ai - 25.5.0rc1
Features
- Add AuditLog table. (#3712)
- Improve performance of vfolder list_host() API handler through parallel task execution. (#3935)
- Add accelerator quantum size field to GQL scaling group (#3940)
- Add container labels to simplify metric queries (#3980)
- Separate internal API port (#3989)
- Make action processor work with async functions (#3999)
- Enhance session status transition management (#4029)
Fixes
- Properly handle zero-value unknown resource limits when creating session. (#3925)
- Fix PurgeImageById mutation not working when image_alias entries are present. (#3972)
- Fix wrong error message logging when model_path does not exist. (#3990)
- Allow superadmins to query all GQL agent nodes (#3996)
- Fix wrong JSON serialization for response of list presets API handler (#4006)
- Fix a potential race condition error in the kernel runner's OOM logger (#4008)
- Fix initialization of Storage proxy event dispatcher (#4010)
- Update common structure (remove request_id from BaseAction, add processors_ctx). (#4022)
- Fix wrong parsing of auto-mount vfolder inputs (#4025)
- Fix modify_compute_session GQL mutation error caused by missing kernel loading option. (#4032)
Full Changelog
Check out the full changelog until this release (25.5.0rc1).
Full Commit Logs
Check out the full commit logs between release (25.4.0) and (25.5.0rc1).
- Python
Published by github-actions[bot] 11 months ago
https://github.com/lablup/backend.ai - 25.4.0
Features
- Implement GQL API for scanning GPU allocation map (#2273)
- Add image_node and vfolder_node fields to ComputeSession schema (#2987)
- Cache gpu_alloc_map in Redis, and add RescanGPUAllocMaps mutation for updating the gpu_alloc_maps. (#3293)
- Collect metrics for the RPC server (#3555)
- Add status column to the Image table, and add ImageRow unique constraint. (#3619)
- Add status to Image, ImageNode GQL fields. (#3620)
- Update ForgetImage, ForgetImageById, ClearImages to perform soft delete and add PurgeImageById API for hard delete. (#3628)
- Implement noop storage backend (#3629)
- Assign the noop storage host to unmanaged vfolders (#3630)
- Update vfolder CLI cmd to support unmanaged vfolders (#3631)
- Implement Image status filtering logic (e.g. adding an optional argument to the Image, ImageNode GQL resolvers to enable querying deleted images as well). (#3647)
- Add enforce_spreading_endpoint_replica scheduling option to the ConcentratedAgentSelector, which prioritizes availability over available resource slots when selecting an agent for inference sessions. (#3693)
- Implement PurgeImages API for reclaiming storage space on a specific agent. (#3704)
- Add sum of resource slots field to scaling group schema (#3707)
- Add scaling_group_name column to resource_presets table. (#3718)
- Update resource preset APIs to support mapping a resource group (#3719)
- Split Redis config for each connection pool (#3725)
- Register v2 volume handler to router in storage-proxy (#3785)
- Support application/vnd.oci.image.manifest.v1+json type images. (#3814)
- Add enable_interactive_login_account_switch webserver option to control the visibility of the "Sign in with a different account" button on the interactive login page (#3835)
- Add centralized action processor design (#3859)
- Export utilization metrics to Prometheus (#3878)
- Add kernel bootstrap timeout config (#3880)
- Add GQL exception and metric middleware (#3891)
- Allow delegation of service ownership when purging a user (#3898)
- Add filter/order arguments to GQL resource_presets query (#3916)
Fixes
- Fix failure of the whole image rescanning task when there is a misconfigured container registry. (#3652)
- Insert default domain_name value to vfolders (#3767)
- Let Auth API handler check all keypairs owned by user (#3780)
- Fix description field in user update CLI command, ensuring it is properly omitted when not provided (#3831)
- Add missing vfolder mount permission mapping in new RBAC implementation (#3851)
- Aggregate multi-kernel session utilization in idle checker (#3861)
- Fix rescan only the latest tag in HarborRegistryV2. (#3871)
- Retry zmq socket connections from kernel (#3880)
- Fix intermittent image rescan DB serialization error due to parallel DB access of rescan_single_registry() calls. (#3883)
- Fix GQL vfolder_node query not resolving permissions field (#3900)
- Fix GQL Agent live_stat resolver to properly parse UUID keys in JSON data as strings (#3928)
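The #3928 fix above concerns UUID keys in JSON data. JSON object keys are always strings. Python's json.dumps() raises TypeError for UUID keys unless they are converted first. The sketch below is a generic illustration, not the actual resolver. encode_live_stat, decode_live_stat and the cpu_util field are hypothetical.

```python
import json
import uuid
from typing import Any, Dict, Mapping


def encode_live_stat(per_kernel: Mapping[uuid.UUID, Any]) -> str:
    # json.dumps() only accepts str, int, float, bool or None as object keys,
    # so each UUID key is converted to its canonical string form first.
    return json.dumps({str(kernel_id): stat for kernel_id, stat in per_kernel.items()})


def decode_live_stat(payload: str) -> Dict[str, Any]:
    # JSON object keys always decode as str, so they are kept as strings here
    # rather than being treated as UUID objects.
    return json.loads(payload)


if __name__ == "__main__":
    kernel_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
    payload = encode_live_stat({kernel_id: {"cpu_util": 12.5}})
    print(decode_live_stat(payload))
```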
Full Changelog
Check out the full changelog until this release (25.4.0).
Full Commit Logs
Check out the full commit logs between release (25.3.3) and (25.4.0).
- Python
Published by github-actions[bot] 12 months ago
https://github.com/lablup/backend.ai - 25.4.0rc1
Features
- Implement GQL API for scanning GPU allocation map (#2273)
- Add image_node and vfolder_node fields to ComputeSession schema (#2987)
- Cache gpu_alloc_map in Redis, and add RescanGPUAllocMaps mutation for updating the gpu_alloc_maps. (#3293)
- Collect metrics for the RPC server (#3555)
- Add status column to the Image table, and add ImageRow unique constraint. (#3619)
- Add status to Image, ImageNode GQL fields. (#3620)
- Update ForgetImage, ForgetImageById, ClearImages to perform soft delete and add PurgeImageById API for hard delete. (#3628)
- Implement noop storage backend (#3629)
- Assign the noop storage host to unmanaged vfolders (#3630)
- Update vfolder CLI cmd to support unmanaged vfolders (#3631)
- Implement Image status filtering logic (e.g. adding an optional argument to the Image, ImageNode GQL resolvers to enable querying deleted images as well). (#3647)
- Add enforce_spreading_endpoint_replica scheduling option to the ConcentratedAgentSelector, which prioritizes availability over available resource slots when selecting an agent for inference sessions. (#3693)
- Implement PurgeImages API for reclaiming storage space on a specific agent. (#3704)
- Add sum of resource slots field to scaling group schema (#3707)
- Add scaling_group_name column to resource_presets table. (#3718)
- Update resource preset APIs to support mapping a resource group (#3719)
- Split Redis config for each connection pool (#3725)
- Register v2 volume handler to router in storage-proxy (#3785)
- Support application/vnd.oci.image.manifest.v1+json type images. (#3814)
- Add centralized action processor design (#3859)
- Export utilization metrics to Prometheus (#3878)
- Add kernel bootstrap timeout config (#3880)
- Add GQL exception and metric middleware (#3891)
- Allow delegation of service ownership when purging a user (#3898)
- Add filter/order arguments to GQL resource_presets query (#3916)
Fixes
- Fix failure of the whole image rescanning task when there is a misconfigured container registry. (#3652)
- Insert default domain_name value to vfolders (#3767)
- Let Auth API handler check all keypairs owned by user (#3780)
- Fix description field in user update CLI command, ensuring it is properly omitted when not provided (#3831)
- Add missing vfolder mount permission mapping in new RBAC implementation (#3851)
- Aggregate multi-kernel session utilization in idle checker (#3861)
- Fix rescan only the latest tag in HarborRegistryV2. (#3871)
- Retry zmq socket connections from kernel (#3880)
- Fix intermittent image rescan DB serialization error due to parallel DB access of rescan_single_registry() calls. (#3883)
- Fix GQL vfolder_node query not resolving permissions field (#3900)
Full Changelog
Check out the full changelog until this release (25.4.0rc1).
Full Commit Logs
Check out the full commit logs between release (25.3.3) and (25.4.0rc1).
- Python
Published by github-actions[bot] 12 months ago
https://github.com/lablup/backend.ai - 25.3.3
Features
- Let endpoints with PROVISIONING routes be deleted without manual session removal (#3842)
Fixes
- Fix CreateNetwork GQL mutation not working (#3843)
- Fix EndpointAutoScalingRuleNode GQL query not working (#3845)
Full Changelog
Check out the full changelog until this release (25.3.3).
Full Commit Logs
Check out the full commit logs between release (25.3.2) and (25.3.3).
- Python
Published by github-actions[bot] 12 months ago
https://github.com/lablup/backend.ai - 25.3.3rc1
Features
- Let endpoints with PROVISIONING routes be deleted without manual session removal (#3842)
Fixes
- Fix CreateNetwork GQL mutation not working (#3843)
- Fix EndpointAutoScalingRuleNode GQL query not working (#3845)
Full Changelog
Check out the full changelog until this release (25.3.3rc1).
Full Commit Logs
Check out the full commit logs between release (25.3.2) and (25.3.3rc1).
- Python
Published by github-actions[bot] 12 months ago
https://github.com/lablup/backend.ai - 25.3.2
Fixes
- Add service_ports field resolver to GQL ComputeSessionNode type (#3782)
- Fix the GQL VirtualFolderNode resolver to accept a filter argument (#3799)
- Fix wrong Python interpreter embedded in the installer scie builds (#3810)
- Fix a DB migration script that fails when the system has a default domain with a name other than 'default' (#3816)
- Use correct lock ID for schedulers and event producers (#3817)
- Fix broken image rescanning on HarborRegistry_v1 due to type error of credential value. (#3821)
- Ensure that the scie build of install.config also includes the files in the folder and the yaml file. (#3824)
- Fix wrong alembic migration scripts (#3829)
Full Changelog
Check out the full changelog until this release (25.3.2).
Full Commit Logs
Check out the full commit logs between release (25.3.1) and (25.3.2).
- Python
Published by github-actions[bot] 12 months ago
https://github.com/lablup/backend.ai - 25.3.1
Features
- Add New API Logging Aligned with Pydantic (#3731)
- Configurable global lock lifetime (#3774)
- Let metric observers catch base exceptions in event handlers (#3779)
Fixes
- Fix missing argument in Redis event dispatcher initializer (#3773)
- Fix wrong Python interpreter versions included in the scie builds (#3793)
Full Changelog
Check out the full changelog until this release (25.3.1).
Full Commit Logs
Check out the full commit logs between release (25.3.0) and (25.3.1).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 25.3.0
Features
- Add project scope implementation to Image RBAC. (#3035)
- Implement ImageNode GQL resolver based on RBAC. (#3036)
- Implement AssociationContainerRegistriesGroups as m2m table of container_registries and groups. (#3065)
- Implement CRUD API for managing Harbor per-project quota. (#3090)
- Implement Image Rescanning using Harbor Webhook API. (#3116)
- Create KVS Interface (#3645)
- Add configurable setup for kernel initialization polling (#3657)
- Add a show_kernel_list config to webserver to show/hide kernel list in the session detail panel, and add configs not specified in sample.toml (#3671)
- Make security policy configurable (#3680)
- Make CSP configurable (#3682)
- Sort vfolder list fields in compute session GQL objects (#3751)
Improvements
- Add the skeleton interface of vfolder CRUD handlers in storage-proxy (#3516)
- Apply pydantic handling decorator to VFolder APIs in storage-proxy (#3565)
- Move abc.py and storage system modules to volumes package (#3567)
- Extract list_volumes and get_volume into pool.py (#3569)
- Add Service Layer to Avoid Direct Volume and Vfolder Operations in Storage-Proxy Handler (#3588)
- Change Absolute Imports to Relative Imports in Storage-Proxy (#3685)
Fixes
- Revamp ContainerRegistryNode API. (#3424)
- Change port numbers using ephemeral ports (#3614)
- Handle cancel and timeout when creating kernels (#3648)
- Correct the number of concurrent SFTP sessions queried from DB (#3654)
- Increase Backend.AI Kernel's app startup timeout (#3679)
- Fix ContainerRegistry per-project API misc bugs. (#3701)
- Fix model service not removed when auto scaling rules are set (#3711)
- Validate duplicate session names during compute session modification (#3715)
- Revert tmux version upgrade from 3.4 to 3.5a due to compatibility issues on aarch64 architecture (#3740)
- Fix compute session rename API handler to query DB correctly (#3746)
- Suppress the log message SELECT statement has a cartesian product between FROM element(s) "endpoints" and FROM element "endpoint_auto_scaling_rules" (#3747)
Full Changelog
Check out the full changelog until this release (25.3.0).
Full Commit Logs
Check out the full commit logs between release (25.3.0rc1) and (25.3.0).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.7
Features
- Add configurable setup for kernel initialization polling (#3657)
- Make security policy configurable (#3680)
- Make CSP configurable (#3682)
Fixes
- Handle cancel and timeout when creating kernels (#3648)
- Correct the number of concurrent SFTP sessions queried from DB (#3654)
Full Changelog
Check out the full changelog until this release (24.09.7).
Full Commit Logs
Check out the full commit logs between release (24.09.6) and (24.09.7).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 25.2.0
Features
- Update tmux version from 3.4 to 3.5a (#3000)
- Enable per-user UID/GID set for containers via user creation and update GraphQL APIs (#3352)
- Update SDK and CLI to support per-user UID/GID configuration (#3361)
- Add timeout configuration for Docker image push (#3412)
- Add configurable directory permission for vfolders to support mounting vfolders on containers with customized UID/GID (#3510)
- Add new Pydantic handling api decorator for Request/Response validation (#3511)
- Add force delete API for VFolder that bypasses the trash bin (#3546)
- Add storage-watcher API to delete VFolders with elevated permissions (#3548)
Improvements
- Add skeleton vfolder handler interface to the manager (#3493)
Fixes
- Add reject middleware for web security (#2937)
- Optimize the route selection in App Proxy using random.choices() based on the native C implementation in CPython (#3199)
- Fix GQL vfolder_mounts field resolver of compute_session type (#3461)
- Fix empty tag image scan error in docker registry. (#3513)
- Fix "permission denied" error by creating the grafana-data directory with 757 permissions (#3570)
- Fix broken CSS by allowing the unsafe-inline content security policy. (#3572)
- Update route pattern to allow any path ending with "login/" for POST requests to /pipeline/{path:.*login/$} (#3574)
- Fix vfolder delete SDK function to call 'delete by id' API rather than 'delete by name' API (#3581)
- Check that intrinsic time files exist before mounting (#3583)
- Ensure unique values in the mount list of the compute session (#3593)
- Change the installer to download one consolidated checksum file and use it for each package, instead of downloading a separate checksum file per package (#3597)
- Remove foreign key constraint from EndpointRow.image column. (#3599)
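The #3199 entry above uses random.choices() for App Proxy route selection. A related 24.12.0 fix (#3075) routes traffic by traffic ratio. The sketch below shows weighted route selection with the standard-library call under those assumptions. It is not the App Proxy code, and select_route with its route names are hypothetical.

```python
import random
from typing import Optional, Sequence


def select_route(
    routes: Sequence[str],
    traffic_ratios: Sequence[float],
    rng: Optional[random.Random] = None,
) -> str:
    # Hypothetical helper: pick one route with probability proportional
    # to its traffic ratio.
    if len(routes) != len(traffic_ratios):
        raise ValueError("each route needs exactly one traffic ratio")
    chooser = rng if rng is not None else random
    # choices() normalizes the weights by their total, so they need not sum to 1;
    # a route with weight 0 is never selected.
    return chooser.choices(routes, weights=traffic_ratios, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    picks = [select_route(["route-a", "route-b"], [3.0, 1.0], rng) for _ in range(1000)]
    print(picks.count("route-a") / len(picks))
```

Passing a seeded random.Random makes the selection reproducible in tests. Omitting it falls back to the module-level generator.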
Full Changelog
Check out the full changelog until this release (25.2.0).
Full Commit Logs
Check out the full commit logs between release (25.1.1) and (25.2.0).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.6
Features
- Update SDK to retrieve and use IDs for VFolder API operations instead of names (#3471)
Fixes
- Add reject middleware for web security (#2937)
- Refactor container registries' projects traversal logic of the image rescanning. (#2979)
- Fix regression of outdated vfolder GQL resolver. (#3047)
- Fix image without metadata label not working (#3341)
- Enforce VFolder name length restriction through the API schema, not by the DB column constraint (#3363)
- Fix password based SSH login not working on sessions based on certain images (#3387)
- Fix purge API to allow deletion of owner-deleted VFolders by directly retrieving VFolders using the folder ID (#3388)
- Fix formatting errors when logging exceptions raised from the current local process that did not pass our custom serialization step (#3410)
- Fix scanning and loading container images with no labels at all (null in the image manifests) (#3411)
- Fix missing CPU architecture name lookup in LocalRegistry to directly scan and load container images from the local Docker daemon in dev setups (#3420)
- Make the utilization idle checker compute kernel resource usage correctly (#3442)
- Fix a mis-implementation that has prevented using UUIDs to indicate an exact vfolder when invoking the vfolder REST API (#3451)
- Fix GQL vfolder_mounts field resolver of compute_session type (#3461)
- Raise exception if multiple VFolders exist in decorator (#3465)
- Fix empty tag image scan error in docker registry. (#3513)
- Fix broken CSS by allowing the unsafe-inline content security policy. (#3572)
- Check that intrinsic time files exist before mounting (#3583)
- Remove foreign key constraint from EndpointRow.image column. (#3599)
Documentation Updates
- Deprecate non-relay container registry GQL explicitly. (#3231)
Full Changelog
Check out the full changelog until this release (24.09.6).
Full Commit Logs
Check out the full commit logs between release (24.09.5) and (24.09.6).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 25.1.1
Features
- Implement fine-grained seccomp profile managed by Backend.AI Agent. (#3019)
- Enable image rescanning by project. (#3237)
- Support auto-scaling of model services by observing proxy and app-specific metrics as configured by autoscaling rules bound to each endpoint (#3277)
- Deprecate the JWT-based X-BackendAI-SSO header to reduce complexity in the authentication process for the pipeline service (#3353)
- Add Grafana and Prometheus to Docker Compose (#3458)
- Integrate Pyroscope with Backend.AI (#3459)
- Update SDK to retrieve and use IDs for VFolder API operations instead of names (#3471)
Fixes
- Refactor container registries' projects traversal logic of the image rescanning. (#2979)
- Fix regression of outdated vfolder GQL resolver. (#3047)
- Fix image without metadata label not working (#3341)
- Enforce VFolder name length restriction through the API schema, not by the DB column constraint (#3363)
- Fix password based SSH login not working on sessions based on certain images (#3387)
- Fix purge API to allow deletion of owner-deleted VFolders by directly retrieving VFolders using the folder ID (#3388)
- Fix certain customized images not being pushed to registry properly (#3391)
- Fix formatting errors when logging exceptions raised from the current local process that did not pass our custom serialization step (#3410)
- Fix scanning and loading container images with no labels at all (null in the image manifests) (#3411)
- Fix missing CPU architecture name lookup in LocalRegistry to directly scan and load container images from the local Docker daemon in dev setups (#3420)
- Make the utilization idle checker compute kernel resource usage correctly (#3442)
- Filter vfolders by status before initiating a vfolder deletion task (#3446)
- Fix a mis-implementation that has prevented using UUIDs to indicate an exact vfolder when invoking the vfolder REST API (#3451)
- Fix the required-state output logic in the OpenAPI reference documentation (#3460)
- Raise exception if multiple VFolders exist in decorator (#3465)
Documentation Updates
- Deprecate non-relay container registry GQL explicitly. (#3231)
Miscellaneous
- Upgrade pantsbuild from 2.21 to 2.23, replacing the scie plugin with the intrinsic pex's scie build support (#3377)
Full Changelog
Check out the full changelog until this release (25.1.1).
Full Commit Logs
Check out the full commit logs between release (24.12.1) and (25.1.1).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.5
Features
- Allow specifying a full shell script string in start_command of model-definition.yaml while preserving shell variable expansions to allow access to environment variables in service definitions (#3248)
- Add several commonly used GPU configuration environment variables defined in containers by default: GPU_TYPE, GPU_COUNT, GPU_CONFIG, GPU_MODEL_NAME and TF_GPU_MEMORY_ALLOC (#3275)
Fixes
- Fix the TUI installer to make the install path always visible (#3029)
- Fix broken session CLI commands due to invalid initialization of ComputeSession. (#3222)
- Fix CLI test failures caused by yarl.URL._val type change. (#3235)
- Prevent vfolder request-download API from accessing host filesystem. (#3241)
- Fix 1d42c726d8a3 revision execution failing (#3254)
Miscellaneous
- Upgrade the base CPython version from 3.12.6 to 3.12.8 (#3302)
Full Changelog
Check out the full changelog until this release (24.09.5).
Full Commit Logs
Check out the full commit logs between release (24.09.4) and (24.09.5).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.12.1
Fixes
- Fix broken session CLI commands due to invalid initialization of ComputeSession. (#3222)
- Fix a regression where modifying a model service endpoint's replica count always set it to 1 regardless of the user input (#3337)
Miscellaneous
- Fix the commit message format when assigning the PR number to an anonymous news fragment (#3309)
Full Changelog
Check out the full changelog until this release (24.12.1).
Full Commit Logs
Check out the full commit logs between release (24.12.0) and (24.12.1).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.12.0
Breaking Changes
- Add PREPARED status for compute sessions and kernels to indicate completion of pre-creation tasks such as image pull (#2647)
- Add new CREATING session status to represent container creation phase, and redefine PREPARING status to specifically indicate pre-container preparation phases (#3114)
Features
- Migrate container registry config storage from Etcd to PostgreSQL (#1917)
- Add background task that reports manager DB status. (#2566)
- Add manager DB stat API compatible with Prometheus. (#2567)
- Allow regular users to assign agent manually if hide-agent configuration is disabled (#2614)
- Implement ID-based client workflow to ContainerRegistry API. (#2615)
- Refactor base ContainerRegistry's scan_tag and implement MEDIA_TYPE_DOCKER_MANIFEST type handling. (#2620)
- Support GitHub Container Registry. (#2621)
- Support GitLab Container Registry. (#2622)
- Support AWS ECR Public Container Registry. (#2623)
- Support AWS ECR Private Container Registry. (#2624)
- Replace rescan command's --local flag with local container registry record. (#2665)
- Add public API webapp to allow external services to query non-sensitive metrics (#2695)
- Add project column to the images table and refactor ImageRef logic. (#2707)
- Check if agent has the required image before creating compute kernels (#2721)
- Introduce network feature (#2726)
- Support docker image manifest v2 schema1. (#2815)
- Support setting health check interval for model service. (#2825)
- Add session status checker GQL mutation. (#2836)
- Add filter and order parameters to Group GQL Relay API. (#2863)
- Add GQL agent type and resolver (#2873)
- Add vast_use_auth_token config to optionally use a VASTData API token. (#2901)
- Use a valid value for the id field in the GQL schema query resolver for ContainerRegistry. (#2908)
- Add GQL Relay domain query schema and resolver (#2934)
- Add namespace, base_image_name, tags and version fields to GQL image schema (#2939)
- Allow container user to join extra Linux groups. (#2944)
- Add filtering and ordering by open_to_public field in endpoint queries (#2954)
- Hide FastTrack (pipeline) menu by default on installation by install-dev.sh script. (#3010)
- Support batch session timeout. (#3066)
- Add a show_non_installed_images option to show all images regardless of installation status in the environment select section of the session/service launcher page. (#3124)
- Allow destroying sessions in PULLING status for all users (#3128)
- Show live stats from inference framework when supported (#3133)
- Allow specifying a full shell script string in start_command of model-definition.yaml while preserving shell variable expansions to allow access to environment variables in service definitions (#3248)
- Rename endpoint.desired_session_count to endpoint.replicas (#3257)
- Add several commonly used GPU configuration environment variables defined in containers by default: GPU_TYPE, GPU_COUNT, GPU_CONFIG, GPU_MODEL_NAME and TF_GPU_MEMORY_ALLOC (#3275)
- Populate BACKEND_MODEL_NAME environment variable automatically on inference session (#3281)
- Fix container cleanup process failing with error AttributeError: 'DockerKernel' object has no attribute 'network_driver' (#3286)
Improvements
- Convert VFolder deletion from blocking response to event-driven pattern (#3063)
Fixes
- Explicitly wait for readiness of the Docker daemon and the compose stack before pouring database fixtures in install-dev.sh for when installing at the provisioning stage of Codespaces and integration tests in CI. (#2378)
- Fix silent failure of DockerAgent.push_image(), DockerAgent.pull_image(). (#2572)
- Fix missing notification of cancellation or failure of background tasks when shutting down the server (#2579)
- Add missing implementation of wsproxy and manager CLI's log-level customization options (#2698)
- Add missing batch execution call after session starts (#2884)
- Fix a regression of the unicode-aware slug update that prevented creation of dot-prefixed (automount) vfolders (#2892)
- Fix invalid image format log spam in Agent (#2894)
- Fix wrong creation of raw_configs in _create_kernels_in_one_agent (#2896)
- Disallow None id encoding in AsyncNode.to_global_id(). (#2898)
- Assign valid value to id field in ContainerRegistryNode GQL schema query resolver. (#2899)
- Update VAST quota rather than raising an error when the quota exists. (#2900)
- Calculate correct expiration time of VAST auth token and add vast_force_login config to enable login before every REST API call (#2911)
- Update Dell EMC OneFS storage backend to correctly initialize volume object and fix wrong HTTP request arguments (#2918)
- Fix modify_endpoint() mutation to handle empty JSONString properly for environment variables (#2922)
- Fix order GQL query argument parser of group_nodes (#2927)
- Set the postgres_readonly flag to false when beginning generic sessions (#2946)
- Fix wrong container registry migration script. (#2949)
- Let GPFS client keep polling when GPFS job is running (#2961)
- Handle IndexError when parsing a string to BinarySize (#2962)
- Handle error when converting shmem string value into BinarySize (#2972)
- Make image, container_registry tables' project column nullable and improve container registry storage config migration script. (#2978)
- Fix a wrong parameter when calling recalc_agent_resource_occupancy() (#2982)
- Allow the modify_compute_session mutation to work without the priority field in the input argument and let the mutation validate the name value (#2985)
- Fix wrong password limit in container registry migration script. (#2986)
- Fix architecture condition not applied when querying images rows (#2989)
- Deprecate project_id GQL argument and add nullable scope_id GQL argument (#2991)
- Strengthen join condition between kernels and images to prevent incorrect matches (#2993)
- Enable session commit to different registry, project. (#2997)
- Fix wrong field reference in ImageNode resolver (#3002)
- Fix obsolete logic of untag() of HarborRegistry_v2. (#3004)
- Fix Agent.compute_containers GraphQL field by adding missing resolver (#3011)
- Fix Agent GQL regression error. (#3013)
- Fix backend.ai apps command's faulty argument handling logic. (#3015)
- Check whether a VAST Data quota with a given name exists before creating a quota and change default value of force_login config to true (#3023)
- Fix model service traffic not being distributed equally to all sessions when there are 10 or more replicas (#3027)
- Fix the TUI installer to make the install path always visible (#3029)
- Prevent Redis password from being logged. (#3031)
- Fix get_logs_from_agent() to raise InstanceNotFound exception for kernels not assigned to agents (#3032)
- Fix regression of ComputeContainer GraphQL queries due to newly introduced relationship fields (#3042)
- Fix regression of the AgentSummary resolver caused by an incorrect batch_load_func assignment. (#3045)
- Fix regression of LegacyComputeSession GraphQL queries. (#3046)
- Include missing legacy logging module in the pex. (#3054)
- Change the name of deleted vfolders with a timestamp suffix when sending them to DELETE_ONGOING status to allow reuse of the vfolder name, for cases when actual deletion takes a long time (#3061)
- Fix model service not routing traffics based on traffic ratio (#3075)
- Fix the broken ComputeContainer.batch_load_detail due to the misuse of selectinload as follow-up to #3042 (#3078)
- Fix session status_info not being updated correctly when batch executions fail, ensuring failed batch execution states are properly reflected in the sessions table (#3085)
- Fix agent not loading krunner-extractor image when Docker instance does not support loading XZ compressed images (#3101)
- Fix outdated image string join logic in ImageRow.image_ref. (#3125)
- Allow admins to delete other users' vfolders by enabling vfolder fetching for precondition checks (#3137)
- Fix Libc version not being detected on unlabeled images when the image has a custom entrypoint set (#3173)
- Fix service not starting when [logging].rotation-size config is set (#3174)
- Allow purging vfolders by enabling name-based queries of deleted VFolders (#3176)
- Fix the issue where the value of occupying slots abnormally multiplies when creating a compute session (#3186)
- Add missing extra field to ContainerRegistryNode GQL query and mutations. (#3208)
- Fix purge functionality that deletes VFolder records by allowing admins to query other users' VFolders (#3223)
- Fix CLI test failures caused by yarl.URL._val type change. (#3235)
- Prevent vfolder request-download API from accessing host filesystem. (#3241)
- Fix 1d42c726d8a3 revision execution failing (#3254)
- Ensure that BinarySize values containing subtle fractions are always formatted as floating-point numbers instead of scientific notation (#3272)
- Fix invalid API version checks in the session creation API of Manager. (#3291)
Documentation Updates
- Update the package installation documentation to include instructions on adding the manager's RPC key pair. (#2052)
Miscellaneous
- Upgrade the base CPython version from 3.12.6 to 3.12.8 (#3302)
## 24.09.0b1 (2024-09-30)
Features
- Add support for optional payload encryption in the client SDK and CLI as a follow-up to #484 (#493)
- Allow unicode characters in project (user group) name and domain name. (#1663)
- Improve exception logging stability by pre-formatting exception objects instead of pickling/unpickling them (#1759)
- Add new API to create new image from live session (#1973)
- Clear error_logs records in the clear-history command (#1989)
- Introduce mgr schema dump-history and mgr schema apply-missing-revisions commands to ease major upgrades involving deviation of database migration histories (#2002)
- Update image forget CLI command to untag image from registry before forgetting it from the database (#2010)
- Update etcd-client-py to 0.3.0 (#2014)
- Allow self-ssh in single-node single-container compute sessions. (#2032)
- Prevent deleting mounted folders. (#2036)
- Allow agent to report its internal registry snapshot via UNIX domain socket server (#2038)
- New redis client (experimental) (#2041)
- Expose user info to environment variables (#2043)
- Introduce the rolling_count GraphQL field to provide the current rate limit counter for a keypair within the designated time window slice (#2050)
- Deprecate the reliance on HTTP cookies for authenticating the pipeline service, switching to the use of HTTP headers instead (#2051)
- Allow user to explicitly set filename of model definition YAML (#2063)
- Add the backend.ai plugin scan command to inspect the plugin scan results from various entrypoint sources (#2070)
- Bring back etcetra-backed Etcd as an option for distributed lock backend (#2079)
- Enable distribute-lock configuration (#2080)
- Cache volume objects in RootContext.get_volume (#2081)
- Revamp images GQL query by changing image filtering from flag-based to feature set-based and add aliases field to customized image GQL schema (#2136)
- Add missing fields for keypair_resource_policy in client-py, models, etc. (#2146)
- Add parameters to check-presets SDK function (#2153)
- Add relay-aware VirtualFolderNode GQL Query (#2165)
- Also perform basic model service validation process when updating model service via ModifyEndpoint (#2167)
- Add support for mounting arbitrary VFolders on model service session (#2168)
- Add support for CentOS 8 based kernels (#2220)
- Clear zombie routes automatically (#2229)
- Add scaling_group.agent_count_by_status and scaling_group.agent_total_resource_slots_by_status GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
- Allow modifying model service session's environment variable setup (#2255)
- Add endpoint.runtime_variant column (#2256)
- Add new API to show list of supported inference runtimes (#2258)
- Add support for model service provisioning without model-definition.yaml (#2260)
- Allow superadmins to force-update session status through destroy API. (#2275)
- Add session status check & update API. (#2312)
- Add support for fetching container logs of a specific kernel. (#2364)
- Introduce Python native WSProxy (#2372)
- Implement scanning plugin entrypoints of external packages (#2377)
- Add row_id, type and container_registry fields to the GroupNode GQL schema. (#2409)
- Add support for PureStorage RapidFiles Toolkit v2 (#2419)
- Add API that extends lifespan of webserver's login session. (#2456)
- Allow bulk association and disassociation of scaling groups with domains, user groups, and key pairs. (#2473)
- Match container's timezone to container host OS when available (#2503)
- Add a pre-setup configuration menu to the TUI installer to allow setting the public-facing address of Backend.AI components (#2541)
- Now Backend.AI can run arbitrary container images without Backend.AI-specific metadata labels by introducing good default values and replacing intrinsic kernel-runner binaries with statically built ones (#2582)
- Allow Bearer as valid token type on model service authentication (#2583)
- Introduce automatic creation of a 'model-store' group upon inserting a new domain. (#2611)
- Add support for declaring custom description field for GraphQL relay edge types. (#2643)
- Add an enable_LLM_playground option to show/hide the LLM playground tab on the serving page. (#2677)
- Add max_gaudi2_devices_per_container config on webserver (#2685)
- Add max_atom_plus_device_per_container config on webserver (#2686)
- Introduce Account-manager component. (#2688)
- Add query depth limit config of GQL.
- Add page size limit config of GQL Connection.
- Set default page size of GQL Connection to 10. (#2709)
- Add compute session GQL Relay query schema. (#2711)
- Allow DataLoaderManager to get a loader function by the function itself rather than by the function name. (#2717)
- Allow filter and order in endpointlist GQL request. (#2723)
- Add new vfolder API to update sharing status. (#2740)
- Avoid raising a type error even if a particular table in the toml file is empty, as long as the default value for all settings exists. (#2782)
- Add an explicit configuration scaling-group-type to agent.toml so that the agent can distinguish whether it belongs to an SFTP resource group or not (#2796)
- Add per-session priority attributes and ModifyComputeSession GraphQL mutation to update session names and priorities (#2840)
- Add dependee/dependent/graph ComputeSessionNode connection queries (#2844)
- Implement the priority-aware scheduler that applies to any arbitrary scheduler plugin (#2848)
- Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)
Improvements
- Enable robust DB connection handling by allowing pool-pre-ping setting. (#1991)
- Enhance update mechanism of session & kernel status. (#2311)
- Remove database-level foreign key constraints in vfolders.{user,group} columns to decouple the timing of vfolder deletion and user/group deletion. (#2404)
- Implement storage-host RBAC interface. (#2505)
- Optimize the query latency when fetching a large number of agents with stat metrics from Redis (#2558)
- Split out ai.backend.logging package from ai.backend.common to improve reusability and reduce the startup time (i.e., import latencies) (#2760)
- Avoid using collections.OrderedDict when not necessary in the manager API and client SDK (#2842)
Deprecations
- Remove no longer used env-tester-{admin,user,user2}.sh scripts and all references (#1956)
Fixes
- Merge kernels.role into sessions.session_type and check the image compatibility based on comparison with the ai.backend.role label (#1587)
- Refactor PendingSession Scheduler into PendingSession scheduler and AgentSelector, and replace roundrobin flag with AgentSelectionStrategy.RoundRobin policy. (#1655)
- Ensure the session's occupying resources are updated in the DB when a kernel starts. (#1832)
- Fix DDN command output handling when exceeding quotas. (#1901)
- Explicitly specify the storage-side UID/GID when creating qtrees in the NetApp storage backend (#1983)
- Fix the mismatch between kernels.session_name and sessions.name and fix session-rename API to update session_name of sibling kernels atomically. (#1985)
- Change function default arguments from mutable objects to None. (#1986)
- Revert some VFolder API response types to remove mismatch between Content-Type header and body. (#1988)
- Upgrade pants to 2.21.0.dev4 for Python 3.12 support in their embedded pex/pip versions (#1998)
- Fix Graylog log adapter not working after upgrading to Python 3.12 (#1999)
- Fix compute_container GraphQL query resolver functions. (#2012)
- Fix harbor v2 image scanner skipping import of the rest of the artifacts when any of the items does not include a tag (#2015)
- Let external log viewers display more accurate, meaningful stack frames of logger invocations. (#2019)
- Fix handling of undefined values in the ModifyImage GraphQL mutation. (#2028)
- Fix container commit not working on certain docker engine versions (#2040)
- Add the omitted client-to-manager request for deleting vfolders in the trash bin. (#2042)
- Fix a buggy restriction on VFolder deletion due to wrong query condition (#2055)
- Fix wrong usage of dataloader in GQL group resolver. (#2056)
- Ensure that vfolders, including automount vfolders, are mounted during session creation only if their status is not set to "DEAD" (i.e., deleted). (#2059)
- Fix wrong calculation of resource usage (#2062)
- Fix VFolder file operation not working when user has been granted access to a shared but deleted VFolder that has the same name as a normal one (#2072)
- Add missing type argument in group query (#2073)
- Let the backend.ai mgr clear-history command clear session records as well as kernel records (#2077)
- Fix compute_session_list GQL query not responding when there is a large number of sessions (#2084)
- Fix VFolder invitation not being accepted when the invited VFolder shares its name with an already deleted one (#2093)
- Fix orphan model service routes being created (#2096)
- Fix initialization of the resource usage API's kernel-level usage aggregation (#2102)
- Fix model server starting on every kernel (including sub-role kernels) in multi-container inference sessions (#2124)
- Add missing commit_session_to_file to OP_EXC (#2127)
- Fix wrong SQL query build for GQL Relay node (#2128)
- Pass ImageRef.canonical in commit_session_to_file (#2134)
- Handle fileset-already-exists response of create-filset API request and make sure to wait between all GPFS job polling iterations (#2144)
- Skip any possible redundant quota update requests when creating new quota (#2145)
- Fix error when calling check_presets Client SDK API with an invalid group parameter
- Rewrite Client SDK to access all APIConfig fields (#2152)
- Ensure that all pending sessions are picked by schedulers (#2155)
- Fix user creation error when any model-store does not exist. (#2160)
- Fix buggy resolver of model_card GQL Query. (#2161)
- Fix security vulnerability for sudo_session_enabled (#2162)
- Rename endpoints.model_mount_destiation to model_mount_destination (#2163)
- Wait for real quota scope directory creation after NetApp create_qtree() call (#2170)
- Fix wrong per-user concurrency calculation logic (#2175)
- Keep sync_container_lifecycles() bgtask alive in a loop. (#2178)
- Fix missing check for group (project) vfolder count limit and error handling with an invalid group parameter (#2190)
- Fix model service persisting in degraded status forever in rare cases when trying to delete the service (#2191)
- Fix error when querying or mutating GraphQL using BigInt field type (#2203)
- Ensure that utilization idleness is checked after a set period. (#2205)
- Fix backend.ai ssh command execution when packaged as SCIE/PEX (#2226)
- Fix endpoints query not working when trying to load image_row.aliases
- Fix endpoints.status reporting PROVISIONING when its status is in DESTROYING state (#2233)
- Fix GQL occasionally raising an error when trying to resolve the endpoints.errors field (#2236)
- Fix ZeroDivisionError in volume usage calculation by returning 0% when volume capacity is zero (#2245)
- Fix GraphQL to support querying non-installed images (#2250)
- Add missing push_image method implementation to Dummy Agent (#2253)
- Rename no-op access_key parameter of endpoint_list GQL Query to user_uuid (#2287)
- Fix ai.backend.service-ports label syntax broken when image does not expose built-in service port (#2288)
- Improve stability of untag_image_from_registry mutation (#2289)
- SSH not working between kernels started with customized image (#2290)
- Invalid container memory capacity reported (#2291)
- Corrected an issue where the resource_policy field in the user model was incorrectly mapped to domain_name. (#2314)
- Skip cleaning up containerless kernels that are still creating their containers. (#2317)
- Fix model service sessions created before 24.03.5 failing to spawn (#2318)
- Image commit not working (#2319)
- Model service session scheduler (scale_services()) failing when sessions bound to an active route are already marked as terminated (#2320)
- Fix container metric collection being halted on systems with Cgroups v1 (#2321)
- Run batch execution after the batch session starts. (#2327)
- Add support for configuring sync_container_lifecycles() task. (#2338)
- Fix mismatches between responses of /services/_runtimes and new model service creation input (#2371)
- Fix incorrect check of values returned from docker stat API. (#2389)
- Shut down the agent properly by removing code that waits for a cancelled task. (#2392)
- Restrict GraphQL query to user_nodes field to require superadmin privilege (#2401)
- Handle all possible exceptions when scheduling a single-node session so that the status information of the pending session is not empty. (#2411)
- Utilize ExtendedJSONEncoder for error logging to handle UUID objects in extra_data (#2415)
- Change outdated references in event module from kernels to sessions. (#2421)
- Upgrade inquirer to remove dependency on deprecated distutils, which breaks execution of the scie builds (#2424)
- Allow querying vfolders in specific statuses for purging. (#2429)
- Update the install-dev scripts to use pnpm instead of npm to speed up installation and resolve some peculiar version resolution issues related to esbuild. (#2436)
- Fix a packaging issue in the backendai-webserver scie executable due to missing explicit requirement of setuptools (#2454)
- Improve pruning of non-physical filesystems when measuring disk usage in agents (#2460)
- Update the install-dev scripts to install pnpm if pnpm isn't installed. (#2472)
- Improve error handling of initialization failures in the kernel runner (#2478)
- Fix BACKEND_MODEL_NAME environment variable always being overwritten with the model name specified in the model definition (#2481)
- Do not allow assigning a preopen port that collides with the image's own service port definition (#2482)
- Fix GET requests with queryparams defined in API spec occasionally throwing 400 Bad Request error (#2483)
- Check null value of user mutation by Undefined sentinel value rather than None. (#2506)
- Do null check on groups.total_resource_slots and domains.total_resource_slots values. (#2509)
- Fix heartbeat processing failing when agent reports an image whose name is not compliant with Backend.AI's naming rule (#2516)
- Corrected a typo (maanger corrected to manager) in the check_status() API response of the storage component (#2523)
- Rename images.image_filters GQL Query argument to images.image_types (#2555)
- Prevent session status from transitioning to PULLING status when an image pull is not required (#2556)
- Prevent other users' customized images from being listed in the response of images GQL query (#2557)
- Skip resolving malformed ModelCard GQL item (#2570)
- Delete sessions DB records when purging project. (#2573)
- Initialize Redis connection pool objects with specified connection opts rather than ignoring them. (#2574)
- Fix GET /func/folders/{folderName} API returning string literal "null" instead of null value on user and group fields (#2584)
- Update GQLPrivilegeCheckMiddleware to align with upstream changes on graphql-core package (#2598)
- Robust type check when idle checker fetches utilization data. (#2601)
- Skip mounting zero-byte lxcfs files when lxcfs is activated to prevent crashes in session containers (#2604)
- Fix typo in minilang query field spec and column map. (#2605)
- Remove duplicate CPU quota arguments when creating containers (#2608)
- Increase MAX_CMD_LEN of dropbear to improve compatibility with PyCharm debugger (#2613)
- Silence spurious Redis timeout warnings when retrying blocking commands if the timeout does not exceed the expected command timeout (#2632)
- Fix a regression of #2483 in the session-download API used by the backend.ai ssh command (#2635)
- Implement missing StrEnumType handling in populate_fixture(). (#2648)
- Let GET /resource/usage/period request contain data in query parameters rather than JSON body. (#2661)
- Allow sudo-enabled container users to overwrite /usr/bin/scp and /usr/libexec/sftp-server by unifying the intrinsic ssh binaries to use the merged dropbearmulti executable. (#2667)
- Update webserver logout API to respond with HTTP 200 OK (#2681)
- Fix WSProxy not properly handling WebSocket request sent from Firefox (#2684)
- Scan parent directory of created qtree to avoid creating quota on non-existing directory. (#2696)
- Fix list_files, get_fstab_contents, get_performance_metric and shared_vfolder_info Python SDK functions not working with ValidationError exception printed (#2706)
- Resolve the issue where the vfolder id does not match in list_shared_vfolders. (#2731)
- Handle OS Error when deleting vfolders. (#2741)
- Fix typo in Virtual-folder status update code. (#2742)
- Correct msgpack deserialization of ResourceSlot. (#2754)
- Fix regression error of session create_from_template command. (#2761)
- Silence model_ namespace warnings with pydantic-based model classes (#2765)
- Change the initialization order of PackageContext to apply target_path correctly in the TUI installer (#2768)
- Make the regex patterns that update configuration files work correctly with multiline texts in the TUI installer (#2771)
- Omit null parameter when calling usage-per-period API. (#2777)
- Delete vfolder invitation and permission rows when deleting vfolders. (#2780)
- Handle container port mismatch when creating kernel. (#2786)
- Explicitly set the protected service ports depending on the resource group type and the service types (#2797)
- Correct session status determiner function. (#2803)
- Fix endpoint_list.total_count GQL field returning incorrect value (#2805)
- Fix Service.create() SDK method and service create CLI command not working with UnboundLocalError exception (#2806)
- Refresh expiration time of login session upon login. (#2816)
- Fix kernel_id assignment for main kernel log retrieval (#2820)
- Use a safer TLS version (v1.2) when creating SSL sockets in the logstash handler (#2827)
- Wrong count of concurrent compute sessions. (#2829)
- Create kernels with correct scaling_group value. (#2837)
- Fix a regression in progress bar rendering of the TUI installer after upgrading the Textual library (#2867)
Documentation Updates
- Add note about installing client library with same version as server (#1976)
- Remove deprecated version from the docker compose YAML templates in package installation docs. (#2035)
- Fix a typo in the agent.toml example of the package-based installation guide that had a duplicate double quote (#2069)
External Dependency Updates
- Upgrade the base runtime (CPython) version from 3.11.6 to 3.12.2 (#1994)
- Upgrade aiodocker to v0.22.0 with minor bug fixes found by improved type annotations (#2339)
- Update the halfstack containers to point the latest stable versions (#2367)
- Upgrade aiodocker to 0.22.1 to fix error handling when trying to extract the log of non-existing containers (#2402)
- Upgrade the base CPython from 3.12.2 to 3.12.4 (#2449)
- Upgrade Python (3.12.4 -> 3.12.6) and common/tool dependencies to prepare for Python 3.13 and apply latest fixes (#2851)
Miscellaneous
- Wrap RPC authentication error to custom error for better logging. (#1970)
- Add requested_slots field to compute session GQL type. (#1984)
- Allow pydantic.BaseModel as the API handler return schema. (#1987)
- Fix incorrect version notation of GQL Field. (#1993)
- Add max_pending_session_count field to Keypair resource policy GQL schema (#2013)
- Handle container creation exception and start exception in separate try-except contexts. (#2316)
- Fix the broken workflow call for the action that auto-assigns PR numbers to news fragments (#2358)
- Finally stabilize the hanging tests in our CI due to docker-internal races on TCP port mappings to concurrently spawned fixture containers by introducing monotonically increasing TCP port numbers (#2379)
- Further improve the monotonic port allocation logic for the test containers to remove maximum concurrency restrictions (#2396)
- Add PEX, SCIE binary build configs for the plugin subsystem. (#2422)
- Add POST /folders API endpoints to replace DELETE APIs that require request body.
- Allow DELETE requests to have body data. (#2571)
- Enhance type hints for potential None arguments (#2580)
- Add ai.backend.manager.models.graphql module for better code base management. (#2669)
- Remove Scheduler related types that are no longer used. (#2705)
- Allow adding required GQL field argument to schema. (#2712)
- Upgrade readthedocs build environment to Python 3.12 (#2814)
## 24.03.0rc1 (2024-03-31)
Features
- Allow filtering compute_session query by user_id. (#1805)
- Allow overriding vfolder mount permissions in API calls and CLI commands to create new sessions, with addition of a generic parser of comma-separated "key=value" list for CLI args and API params (#1838)
- Always enable ai.backend.accelerator.cuda_open in the scie-based installer (#1966)
- Use config["pipeline"]["endpoint"] as default value of config["pipeline"]["frontend-endpoint"] when not provided (#1972)
Fixes
- Set single agent per kernel resource usage. (#1725)
- Abort container creation when duplicate container port definition exists (#1750)
- To update image metadata, check if the min/max values in resource_limits are undefined. (#1941)
- Explicitly disable the user-site package detection in the krunner python commands to avoid potential conflicts with user-installed packages in .local directories (#1962)
- Fix caf54fcc17ab migration to drop a primary key only if it exists and in 589c764a18f1 migration, add missing table arguments. (#1963)
Documentation Updates
- Update docstrings in ai.backend.client.request.Request:fetch() and ai.backend.client.request.FetchContextManager as the support for synchronous context manager has been deprecated. (#1801)
- Resize font-size of footer text in ethical ads in documentation hosted by read-the-docs (#1965)
- Only resize font-size of footer text in ethical ads not in title of content in documentation (#1967)
Miscellaneous
- Revert response type of service create API. (#1979)
Full Changelog
Check out the full changelog until this release (24.12.0).
Full Commit Logs
Check out the full commit logs between release (24.09.0b1) and (24.12.0).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.4
Miscellaneous
- Add alembic revision history as of 24.03.11
Full Changelog
Check out the full changelog until this release (24.09.4).
Full Commit Logs
Check out the full commit logs between release (24.09.4rc1) and (24.09.4).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.4rc1
Miscellaneous
- Add alembic revision history as of 24.03.11
Full Changelog
Check out the full changelog until this release (24.09.4rc1).
Full Commit Logs
Check out the full commit logs between release (24.09.3) and (24.09.4rc1).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.3
Fixes
- Allow purging vfolders by enabling name-based queries of deleted VFolders (#3176)
- Fix the issue where the value of occupying slots abnormally multiplies when creating a compute session (#3186)
- Add missing extra field to ContainerRegistryNode GQL query and mutations. (#3208)
Full Changelog
Check out the full changelog until this release (24.09.3).
Full Commit Logs
Check out the full commit logs between release (24.09.3rc2) and (24.09.3).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.3rc2
Fixes
- Fix purge functionality that deletes VFolder records by allowing admins to query other users' VFolders (#3223)
Full Changelog
Check out the full changelog until this release (24.09.3rc2).
Full Commit Logs
Check out the full commit logs between release (24.09.3rc1) and (24.09.3rc2).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.3rc1
Fixes
- Allow purging vfolders by enabling name-based queries of deleted VFolders (#3176)
- Fix the issue where the value of occupying slots abnormally multiplies when creating a compute session (#3186)
- Add missing extra field to ContainerRegistryNode GQL query and mutations. (#3208)
Full Changelog
Check out the full changelog until this release (24.09.3rc1).
Full Commit Logs
Check out the full commit logs between release (24.09.2) and (24.09.3rc1).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.2
Fixes
- Allow the modify_compute_session mutation to work without the priority field in the input argument and let the mutation validate the name value (#2985)
- Prevent redis password from being logged. (#3031)
- Fix regression of the AgentSummary resolver caused by an incorrect batch_load_func assignment. (#3045)
- Fix outdated image string join logic in ImageRow.image_ref. (#3125)
- Allow admins to delete other users' vfolders by enabling vfolder fetching for precondition checks (#3137)
- Fix Libc version not being detected on unlabeled images when the image has a custom entrypoint set (#3173)
- Fix service not starting when [logging].rotation-size config is set (#3174)
Full Changelog
Check out the full changelog until this release (24.09.2).
Full Commit Logs
Check out the full commit logs between release (24.09.2rc2) and (24.09.2).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.2rc2
No significant changes.
Full Changelog
Check out the full changelog until this release (24.09.2rc2).
Full Commit Logs
Check out the full commit logs between release (24.09.2rc1) and (24.09.2rc2).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.2rc1
Fixes
- Allow the modify_compute_session mutation to work without the priority field in the input argument and let the mutation validate the name value (#2985)
- Prevent redis password from being logged. (#3031)
- Fix regression of the AgentSummary resolver caused by an incorrect batch_load_func assignment. (#3045)
- Fix outdated image string join logic in ImageRow.image_ref. (#3125)
- Allow admins to delete other users' vfolders by enabling vfolder fetching for precondition checks (#3137)
- Fix Libc version not being detected on unlabeled images when the image has a custom entrypoint set (#3173)
- Fix service not starting when [logging].rotation-size config is set (#3174)
Full Changelog
Check out the full changelog until this release (24.09.2rc1).
Full Commit Logs
Check out the full commit logs between release (24.09.1) and (24.09.2rc1).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.1
Features
- Allow regular users to assign agent manually if hide-agent configuration is disabled (#2614)
- Hide FastTrack (pipeline) menu by default on installation by install-dev.sh script. (#3010)
- Add a show_non_installed_images option to show all images regardless of installation on environment select section in session/service launcher page. (#3124)
Fixes
- Fix architecture condition not being applied when querying images rows (#2989)
- Fix missing notification of cancellation or failure of background tasks when shutting down the server (#2579)
- Disallow None id encoding in AsyncNode.to_global_id(). (#2898)
- Update Dellemc OneFS storage backend to correctly initialize volume object and fix wrong HTTP request arguments (#2918)
- Fix order GQL query argument parser of group_nodes (#2927)
- Set the postgres_readonly flag to false when beginning generic sessions (#2946)
- Fix wrong container registry migration script. (#2949)
- Let GPFS client keep polling when GPFS job is running (#2961)
- Handle IndexError when parsing a string to BinarySize (#2962)
- Handle error when converting shmem string value into BinarySize (#2972)
- Fix a wrong parameter when calling 'recalc_agent_resource_occupancy()' (#2982)
- Make the project column of the image and container_registry tables nullable and improve the container registry storage config migration script. (#2978)
- Fix wrong password limit in container registry migration script. (#2986)
- Strengthen join condition between kernels and images to prevent incorrect matches (#2993)
- Enable committing sessions to a different registry and project. (#2997)
- Wrong field reference in ImageNode resolver (#3002)
- Fix obsolete logic of untag() of HarborRegistry_v2. (#3004)
- Fix Agent.compute_containers GraphQL field by adding missing resolver (#3011)
- Fix backend.ai apps command's faulty argument handling logic. (#3015)
- Check whether a Vast Data quota with a given name exists before creating the quota and change the default value of force_login config to true (#3023)
- Fix get_logs_from_agent() to raise InstanceNotFound exception for kernels not assigned to agents (#3032)
- Fix regression of ComputeContainer GraphQL queries due to newly introduced relationship fields (#3042)
- Fix model service traffic not being distributed equally to all sessions when there are 10 or more replicas (#3043)
- Fix regression of LegacyComputeSession GraphQL queries. (#3046)
- Include missing legacy logging module in the pex. (#3054)
- Rename deleted vfolders with a timestamp suffix when moving them to DELETE_ONGOING status, so that the vfolder name can be reused when the actual deletion takes a long time (#3061)
- Fix model service not routing traffic based on the traffic ratio (#3075)
- Fix the broken ComputeContainer.batch_load_detail due to the misuse of selectinload as follow-up to #3042 (#3078)
- Fix session status_info not being updated correctly when batch executions fail, ensuring failed batch execution states are properly reflected in the sessions table (#3085)
- Fix agent not loading krunner-extractor image when Docker instance does not support loading XZ compressed images (#3101)
Full Changelog
Check out the full changelog until this release (24.09.1).
Full Commit Logs
Check out the full commit logs between release (24.09.1rc2) and (24.09.1).
- Python
Published by github-actions[bot] about 1 year ago
https://github.com/lablup/backend.ai - 24.09.1rc2
Fixes
- Fix architecture condition not being applied when querying images rows (#2989)
Full Changelog
Check out the full changelog until this release (24.09.1rc2).
Full Commit Logs
Check out the full commit logs between release (24.09.1rc1) and (24.09.1rc2).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.09.1rc1
Fixes
- Fix missing notification of cancellation or failure of background tasks when shutting down the server (#2579)
- Disallow None id encoding in AsyncNode.to_global_id(). (#2898)
- Update Dell EMC OneFS storage backend to correctly initialize the volume object and fix wrong HTTP request arguments (#2918)
- Fix order GQL query argument parser of group_nodes (#2927)
- Set the postgres_readonly flag to false when beginning generic sessions (#2946)
- Fix wrong container registry migration script. (#2949)
- Let GPFS client keep polling while a GPFS job is running (#2961)
- Handle IndexError when parsing a string into BinarySize (#2962)
- Handle error when converting a shmem string value into BinarySize (#2972)
- Fix a wrong parameter when calling 'recalc_agent_resource_occupancy()' (#2982)
Full Changelog
Check out the full changelog until this release (24.09.1rc1).
Full Commit Logs
Check out the full commit logs between release (24.09.0) and (24.09.1rc1).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.09.0
Features
- Add support for optional payload encryption in the client SDK and CLI as a follow-up to #484 (#493)
- Allow unicode characters in project(user group) name and domain name. (#1663)
- Improve exception logging stability by pre-formatting exception objects instead of pickling/unpickling them (#1759)
- Add new API to create new image from live session (#1973)
- Clear error_logs records in the clear-history command (#1989)
- Introduce mgr schema dump-history and mgr schema apply-missing-revisions commands to ease the major upgrade involving deviation of database migration histories (#2002)
- Update image forget CLI command to untag image from registry before forgetting it from the database (#2010)
- Update etcd-client-py to 0.3.0 (#2014)
- Allow self-ssh in single-node single-container compute sessions. (#2032)
- Prevent deleting mounted folders. (#2036)
- Allow agent to report its internal registry snapshot via UNIX domain socket server (#2038)
- New redis client (experimental) (#2041)
- Expose user info to environment variables (#2043)
- Introduce the rolling_count GraphQL field to provide the current rate limit counter for a keypair within the designated time window slice (#2050)
- Deprecate the reliance on HTTP cookies for authenticating the pipeline service, switching to the use of HTTP headers instead (#2051)
- Allow user to explicitly set filename of model definition YAML (#2063)
- Add the backend.ai plugin scan command to inspect the plugin scan results from various entrypoint sources (#2070)
- Bring back etcetra-backed Etcd as an option for the distributed lock backend (#2079)
- Enable distributed lock configuration (#2080)
- Cache volume objects in RootContext.get_volume (#2081)
- Revamp images GQL query by changing image filtering from flag-based to feature-set-based and add aliases field to customized image GQL schema (#2136)
- Add missing fields for keypair_resource_policy in client-py, models, etc. (#2146)
- Add parameters to check-presets SDK function (#2153)
- Add relay-aware VirtualFolderNode GQL Query (#2165)
- Also perform basic model service validation when updating a model service via ModifyEndpoint (#2167)
- Add support for mounting arbitrary VFolders on model service sessions (#2168)
- Add support for CentOS 8 based kernels (#2220)
- Clear zombie routes automatically (#2229)
- Add scaling_group.agent_count_by_status and scaling_group.agent_total_resource_slots_by_status GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
- Allow modifying model service session's environment variable setup (#2255)
- Add endpoint.runtime_variant column (#2256)
- Add new API to show the list of supported inference runtimes (#2258)
- Add support for model service provisioning without model-definition.yaml (#2260)
- Allow superadmins to force-update session status through the destroy API. (#2275)
- Add session status check & update API. (#2312)
- Add support for fetching container logs of a specific kernel. (#2364)
- Introduce Python native WSProxy (#2372)
- Implement scanning plugin entrypoints of external packages (#2377)
- Add row_id, type and container_registry fields to the GroupNode GQL schema. (#2409)
- Add support for PureStorage RapidFiles Toolkit v2 (#2419)
- Add API that extends lifespan of webserver's login session. (#2456)
- Allow bulk association and disassociation of scaling groups with domains, user groups, and key pairs. (#2473)
- Match container's timezone to container host OS when available (#2503)
- Add a pre-setup configuration menu to the TUI installer to allow setting the public-facing address of Backend.AI components (#2541)
- Now Backend.AI can run arbitrary container images without Backend.AI-specific metadata labels by introducing good default values and replacing intrinsic kernel-runner binaries with statically built ones (#2582)
- Allow Bearer as a valid token type for model service authentication (#2583)
- Introduce automatic creation of a 'model-store' group upon inserting a new domain. (#2611)
- Add support for declaring a custom description field for GraphQL relay edge types. (#2643)
- Add an enable_LLM_playground option to show/hide the LLM playground tab on the serving page. (#2677)
- Add max_gaudi2_devices_per_container config on webserver (#2685)
- Add max_atom_plus_device_per_container config on webserver (#2686)
- Introduce Account-manager component. (#2688)
- Add query depth limit config of GQL, add page size limit config of GQL Connection, and set default page size of GQL Connection to 10. (#2709)
- Add compute session GQL Relay query schema. (#2711)
- Allow DataLoaderManager to get a loader function by the function itself rather than the function name. (#2717)
- Allow filter and order in endpoint list GQL request. (#2723)
- Add new vfolder API to update sharing status. (#2740)
- Avoid raising a type error even if a particular table in the toml file is empty, as long as the default value for all settings exists. (#2782)
- Add an explicit configuration scaling-group-type to agent.toml so that the agent can distinguish whether it belongs to an SFTP resource group (#2796)
- Add per-session priority attributes and ModifyComputeSession GraphQL mutation to update session names and priorities (#2840)
- Add dependee/dependent/graph ComputeSessionNode connection queries (#2844)
- Implement the priority-aware scheduler that applies to any arbitrary scheduler plugin (#2848)
- Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)
Improvements
- Enable robust DB connection handling by allowing pool-pre-ping setting. (#1991)
- Enhance update mechanism of session & kernel status. (#2311)
- Remove database-level foreign key constraints in vfolders.{user,group} columns to decouple the timing of vfolder deletion and user/group deletion. (#2404)
- Implement storage-host RBAC interface. (#2505)
- Optimize the query latency when fetching a large number of agents with stat metrics from Redis (#2558)
- Split out ai.backend.logging package from the ai.backend.common to improve reusability and reduce the startup time (i.e., import latencies) (#2760)
- Avoid using collections.OrderedDict when not necessary in the manager API and client SDK (#2842)
Deprecations
- Remove no longer used env-tester-{admin,user,user2}.sh scripts and all references (#1956)
Fixes
- Merge kernels.role into sessions.session_type and check the image compatibility based on comparison with the ai.backend.role label (#1587)
- Refactor PendingSession Scheduler into PendingSession scheduler and AgentSelector, and replace roundrobin flag with AgentSelectionStrategy.RoundRobin policy. (#1655)
- Ensure the session's occupied resources are updated in the DB when a kernel starts. (#1832)
- Fix DDN command output handling when exceeding quotas. (#1901)
- Explicitly specify the storage-side UID/GID when creating qtrees in the NetApp storage backend (#1983)
- Sync mismatch between kernels.session_name and sessions.name and fix session-rename API to update session_name of sibling kernels atomically. (#1985)
- Change function default arguments from mutable objects to None. (#1986)
- Revert the response type of some VFolder APIs to remove mismatch between Content-Type header and body. (#1988)
- Upgrade pants to 2.21.0.dev4 for Python 3.12 support in their embedded pex/pip versions (#1998)
- Fix Graylog log adapter not working after upgrading to Python 3.12 (#1999)
- Fix compute_container GraphQL query resolver functions. (#2012)
- Fix Harbor v2 image scanner skipping the rest of the artifacts when any item does not include a tag (#2015)
- Let external log viewers display more accurate, meaningful stack frames of logger invocations. (#2019)
- Fix handling of undefined values in the ModifyImage GraphQL mutation. (#2028)
- Fix container commit not working on certain docker engine versions (#2040)
- Add the omitted client-to-manager request for deleting vfolders in the trash bin. (#2042)
- Fix a buggy restriction on VFolder deletion due to wrong query condition (#2055)
- Fix wrong usage of dataloader in GQL group resolver. (#2056)
- Ensure that vfolders, including automount vfolders, are mounted during session creation only if their status is not set to "DEAD" (i.e., deleted). (#2059)
- Fix wrong calculation of resource usage (#2062)
- Fix VFolder file operation not working when the user has been granted access to a shared but deleted VFolder that has the same name as a normal one (#2072)
- Add missing type argument in group query (#2073)
- Let the backend.ai mgr clear-history command clear session records as well as kernel records (#2077)
- Fix compute_session_list GQL query not responding when there are a large number of sessions (#2084)
- Fix VFolder invitation not being accepted when the invited VFolder shares its name with an already deleted one (#2093)
- Fix orphan model service routes being created (#2096)
- Fix initialization of the resource usage API's kernel-level usage aggregation (#2102)
- Fix model server starting on every kernel (including sub-role kernels) in multi-container inference sessions (#2124)
- Add missing commit_session_to_file to OP_EXC (#2127)
- Fix wrong SQL query build for GQL Relay node (#2128)
- Pass ImageRef.canonical in commit_session_to_file (#2134)
- Handle fileset-already-exists response of create-filset API request and make sure to wait between all GPFS job polling iterations (#2144)
- Skip any possible redundant quota update requests when creating new quota (#2145)
- Fix error when calling check_presets Client SDK API with an invalid group parameter, and rewrite Client SDK to access all APIConfig fields (#2152)
- Ensure that all pending sessions are picked by schedulers (#2155)
- Fix user creation error when any model-store does not exist. (#2160)
- Fix buggy resolver of model_card GQL Query. (#2161)
- Fix security vulnerability for sudo_session_enabled (#2162)
- Rename endpoints.model_mount_destiation to model_mount_destination (#2163)
- Wait for real quota scope directory creation after NetApp create_qtree() call (#2170)
- Fix wrong per-user concurrency calculation logic (#2175)
- Keep sync_container_lifecycles() bgtask alive in a loop. (#2178)
- Fix missing check for group (project) vfolder count limit and error handling with an invalid group parameter (#2190)
- Fix model service persisting in degraded status forever on rare occasions when trying to delete the service (#2191)
- Fix error when querying or mutating GraphQL using BigInt field type (#2203)
- Ensure that utilization idleness is checked after a set period. (#2205)
- Fix backend.ai ssh command execution when packaged as SCIE/PEX (#2226)
- Fix endpoints query not working when trying to load image_row.aliases, and fix endpoints.status reporting PROVISIONING when its status is in DESTROYING state (#2233)
- Fix GQL occasionally raising an error when trying to resolve endpoints.errors field (#2236)
- Fix ZeroDivisionError in volume usage calculation by returning 0% when volume capacity is zero (#2245)
- Fix GraphQL to support queries to non-installed images (#2250)
- Add missing push_image method implementation to Dummy Agent (#2253)
- Rename no-op access_key parameter of endpoint_list GQL Query to user_uuid (#2287)
- Fix ai.backend.service-ports label syntax broken when image does not expose built-in service port (#2288)
- Improve stability of untag_image_from_registry mutation (#2289)
- SSH not working between kernels started with customized image (#2290)
- Invalid container memory capacity reported (#2291)
- Corrected an issue where the resource_policy field in the user model was incorrectly mapped to domain_name. (#2314)
- Skip cleaning up containerless kernels that are still creating their containers. (#2317)
- Fix model service sessions created before 24.03.5 failing to spawn (#2318)
- Image commit not working (#2319)
- Model service session scheduler (scale_services()) failing when sessions bound to an active route are already marked as terminated (#2320)
- Fix container metric collection halting on systems with cgroups v1 (#2321)
- Run batch execution after the batch session starts. (#2327)
- Add support for configuring sync_container_lifecycles() task. (#2338)
- Fix mismatches between responses of /services/_runtimes and new model service creation input (#2371)
- Fix incorrect check of values returned from docker stat API. (#2389)
- Shut down agent properly by removing code that waits for a cancelled task. (#2392)
- Restrict GraphQL query to user_nodes field to require superadmin privilege (#2401)
- Handle all possible exceptions when scheduling single node session so that the status information of pending session is not empty. (#2411)
- Utilize ExtendedJSONEncoder for error logging to handle UUID objects in extra_data (#2415)
- Change outdated references in event module from kernels to sessions. (#2421)
- Upgrade inquirer to remove dependency on deprecated distutils, which breaks execution of the scie builds (#2424)
- Allow vfolders with specific statuses to be queried for purging. (#2429)
- Update the install-dev scripts to use pnpm instead of npm to speed up installation and resolve some peculiar version resolution issues related to esbuild. (#2436)
- Fix a packaging issue in the backendai-webserver scie executable due to missing explicit requirement of setuptools (#2454)
- Improve pruning of non-physical filesystems when measuring disk usage in agents (#2460)
- Update the install-dev scripts to install pnpm if pnpm isn't installed. (#2472)
- Improve error handling of initialization failures in the kernel runner (#2478)
- Fix BACKEND_MODEL_NAME environment variable always being overwritten with the model name specified in the model definition (#2481)
- Do not allow assigning a preopen port that collides with the image's own service port definition (#2482)
- Fix GET requests with queryparams defined in API spec occasionally throwing 400 Bad Request error (#2483)
- Check null value of user mutation by Undefined sentinel value rather than None. (#2506)
- Do null check on groups.total_resource_slots and domains.total_resource_slots value. (#2509)
- Fix heartbeat processing failing when agent reports an image whose name is not compliant with Backend.AI's naming rule (#2516)
- Corrected a typo (maanger corrected to manager) in the check_status() API response of the storage component (#2523)
- Rename images.image_filters GQL Query argument to images.image_types (#2555)
- Prevent session status from transitioning to PULLING status if image pull is not required (#2556)
- Prevent other users' customized images from being listed in the response of images GQL query (#2557)
- Skip resolving malformed ModelCard GQL item (#2570)
- Delete sessions DB records when purging project. (#2573)
- Initialize Redis connection pool objects with specified connection opts rather than ignoring them. (#2574)
- Fix GET /func/folders/{folderName} API returning string literal "null" instead of null value on user and group fields (#2584)
- Update GQLPrivilegeCheckMiddleware to align with upstream changes in graphql-core package (#2598)
- Robust type check when idle checker fetches utilization data. (#2601)
- Skip mounting zero-byte lxcfs files when lxcfs is activated to prevent crashes in session containers (#2604)
- Fix typo in minilang query field spec and column map. (#2605)
- Remove duplicate CPU quota arguments when creating containers (#2608)
- Increase MAX_CMD_LEN of dropbear to improve compatibility with PyCharm debugger (#2613)
- Silence spurious Redis timeout warnings when retrying blocking commands if the timeout does not exceed the expected command timeout (#2632)
- Fix a regression of #2483 in the session-download API used by the backend.ai ssh command (#2635)
- Implement missing StrEnumType handling in populate_fixture(). (#2648)
- Let GET /resource/usage/period request contain data in query parameter rather than JSON body. (#2661)
- Allow sudo-enabled container users to overwrite /usr/bin/scp and /usr/libexec/sftp-server by unifying the intrinsic ssh binaries to use the merged dropbearmulti executable. (#2667)
- Update webserver logout API to respond with HTTP 200 OK (#2681)
- Fix WSProxy not properly handling WebSocket request sent from Firefox (#2684)
- Scan parent directory of created qtree to avoid creating quota on non-existing directory. (#2696)
- Fix list_files, get_fstab_contents, get_performance_metric and shared_vfolder_info Python SDK functions not working, with ValidationError exception printed (#2706)
- Resolve the issue where the vfolder id does not match in list_shared_vfolders. (#2731)
- Handle OS Error when deleting vfolders. (#2741)
- Fix typo in Virtual-folder status update code. (#2742)
- Correct msgpack deserialization of ResourceSlot. (#2754)
- Fix regression error of session create_from_template command. (#2761)
- Silence model_ namespace warnings with pydantic-based model classes (#2765)
- Change the initialization order of PackageContext to apply target_path correctly in the TUI installer (#2768)
- Make the regex patterns used to update configuration files work correctly with multiline text in the TUI installer (#2771)
- Omit null parameter when calling usage-per-period API. (#2777)
- Delete vfolder invitation and permission rows when deleting vfolders. (#2780)
- Handle container port mismatch when creating kernel. (#2786)
- Explicitly set the protected service ports depending on the resource group type and the service types (#2797)
- Correct session status determiner function. (#2803)
- Fix endpoint_list.total_count GQL field returning incorrect value (#2805)
- Fix Service.create() SDK method and service create CLI command not working with UnboundLocalError exception (#2806)
- Refresh expiration time of login session upon login. (#2816)
- Fix kernel_id assignment for main kernel log retrieval (#2820)
- Use a safer TLS version (v1.2) when creating SSL sockets in the logstash handler (#2827)
- Wrong count of concurrent compute sessions. (#2829)
- Create kernels with correct scaling_group value. (#2837)
- Fix a regression in progress bar rendering of the TUI installer after upgrading the Textual library (#2867)
Documentation Updates
- Add note about installing client library with same version as server (#1976)
- Remove deprecated version from the docker compose YAML templates in package installation docs. (#2035)
- Fix a typo in the agent.toml example of the package-based installation guide that had a duplicate double quote (#2069)
External Dependency Updates
- Upgrade the base runtime (CPython) version from 3.11.6 to 3.12.2 (#1994)
- Upgrade aiodocker to v0.22.0 with minor bug fixes found by improved type annotations (#2339)
- Update the halfstack containers to point to the latest stable versions (#2367)
- Upgrade aiodocker to 0.22.1 to fix error handling when trying to extract the log of non-existing containers (#2402)
- Upgrade the base CPython from 3.12.2 to 3.12.4 (#2449)
- Upgrade Python (3.12.4 -> 3.12.6) and common/tool dependencies to prepare for Python 3.13 and apply latest fixes (#2851)
Miscellaneous
- Wrap RPC authentication error to custom error for better logging. (#1970)
- Add requested_slots field to compute session GQL type. (#1984)
- Allow pydantic.BaseModel as the API handler return schema. (#1987)
- Fix incorrect version notation of GQL Field. (#1993)
- Add max_pending_session_count field to Keypair resource policy GQL schema (#2013)
- Handle container creation exception and start exception in separate try-except contexts. (#2316)
- Fix the broken workflow call for the action that auto-assigns PR numbers to news fragments (#2358)
- Finally stabilize the hanging tests in our CI due to docker-internal races on TCP port mappings to concurrently spawned fixture containers by introducing monotonically increasing TCP port numbers (#2379)
- Further improve the monotonic port allocation logic for the test containers to remove maximum concurrency restrictions (#2396)
- Add PEX, SCIE binary build configs for the plugin subsystem. (#2422)
- Add POST /folders API endpoints to replace DELETE APIs that require a request body, and allow DELETE requests to have body data. (#2571)
- Enhance type hints for potential None arguments (#2580)
- Add ai.backend.manager.models.graphql module for better code base management. (#2669)
- Remove Scheduler related types that are no longer used. (#2705)
- Allow adding required GQL field argument to schema. (#2712)
- Upgrade readthedocs build environment to Python 3.12 (#2814)
## 24.03.0rc1 (2024-03-31)
Features
- Allow filtering compute_session query by user_id. (#1805)
- Allow overriding vfolder mount permissions in API calls and CLI commands to create new sessions, with addition of a generic parser of comma-separated "key=value" list for CLI args and API params (#1838)
- Always enable ai.backend.accelerator.cuda_open in the scie-based installer (#1966)
- Use config["pipeline"]["endpoint"] as default value of config["pipeline"]["frontend-endpoint"] when not provided (#1972)
- Migrate container registry config storage from Etcd to PostgreSQL (#1917)
- Implement ID-based client workflow to ContainerRegistry API. (#2615)
- Refactor Base ContainerRegistry's scan_tag and implement MEDIA_TYPE_DOCKER_MANIFEST type handling. (#2620)
- Support GitHub Container Registry. (#2621)
- Support GitLab Container Registry. (#2622)
- Support AWS ECR Public Container Registry. (#2623)
- Support AWS ECR Private Container Registry. (#2624)
- Replace rescan command's --local flag with local container registry record. (#2665)
- Add project column to the images table and refactor ImageRef logic. (#2707)
- Support docker image manifest v2 schema1. (#2815)
- Add filter and order parameters to Group GQL Relay API. (#2863)
- Add vast_use_auth_token config to utilize VASTData API token optionally. (#2901)
- Use a valid value for the id field in the GQL schema query resolver for ContainerRegistry. (#2908)
Fixes
- Set single agent per kernel resource usage. (#1725)
- Abort container creation when duplicate container port definition exists (#1750)
- To update image metadata, check if the min/max values in resource_limits are undefined. (#1941)
- Explicitly disable the user-site package detection in the krunner python commands to avoid potential conflicts with user-installed packages in .local directories (#1962)
- Fix caf54fcc17ab migration to drop a primary key only if it exists and, in 589c764a18f1 migration, add missing table arguments. (#1963)
- Explicitly wait for readiness of the Docker daemon and the compose stack before pouring database fixtures in install-dev.sh for when installing at the provisioning stage of Codespaces and integration tests in CI. (#2378)
- Add missing implementation of wsproxy and manager CLI's log-level customization options (#2698)
- Add missing batch execution call after session starts (#2884)
- Fix a regression of the unicode-aware slug update that prevented creation of dot-prefixed (automount) vfolders (#2892)
- Fix invalid image format log spam in Agent (#2894)
- Fix wrong creation of raw_configs in _create_kernels_in_one_agent (#2896)
- Assign valid value to id field in ContainerRegistryNode GQL schema query resolver. (#2899)
- Update vast quota rather than raise error when quota exists. (#2900)
- Calculate correct expiration time of VAST auth token and add vast_force_login config to enable login before every REST API call (#2911)
Documentation Updates
- Update docstrings in ai.backend.client.request.Request:fetch() and ai.backend.client.request.FetchContextManager as the support for synchronous context manager has been deprecated. (#1801)
- Resize font-size of footer text in ethical ads in documentation hosted by read-the-docs (#1965)
- Only resize font-size of footer text in ethical ads not in title of content in documentation (#1967)
Miscellaneous
- Revert response type of service create API. (#1979)
Full Changelog
Check out the full changelog until this release (24.09.0).
Full Commit Logs
Check out the full commit logs between release (24.09.0rc1) and (24.09.0).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.11
Features
- Add vast_use_auth_token config to utilize VASTData API token optionally. (#2901)
Fixes
- Explicitly wait for readiness of the Docker daemon and the compose stack before pouring database fixtures in install-dev.sh for when installing at the provisioning stage of Codespaces and integration tests in CI. (#2378)
- Fix invalid image format log spam in Agent (#2894)
- Update vast quota rather than raise error when quota exists. (#2900)
- Calculate correct expiration time of VAST auth token and add vast_force_login config to enable login before every REST API call (#2911)
Full Changelog
Check out the full changelog until this release (24.03.11).
Full Commit Logs
Check out the full commit logs between release (24.03.10) and (24.03.11).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.09.0rc1
Features
- Migrate container registry config storage from Etcd to PostgreSQL (#1917)
- Implement ID-based client workflow to ContainerRegistry API. (#2615)
- Refactor Base ContainerRegistry's scan_tag and implement MEDIA_TYPE_DOCKER_MANIFEST type handling. (#2620)
- Support GitHub Container Registry. (#2621)
- Support GitLab Container Registry. (#2622)
- Support AWS ECR Public Container Registry. (#2623)
- Support AWS ECR Private Container Registry. (#2624)
- Replace rescan command's --local flag with local container registry record. (#2665)
- Add project column to the images table and refactor ImageRef logic. (#2707)
- Support docker image manifest v2 schema1. (#2815)
- Add filter and order parameters to Group GQL Relay API. (#2863)
- Add vast_use_auth_token config to utilize VASTData API token optionally. (#2901)
- Use a valid value for the id field in the GQL schema query resolver for ContainerRegistry. (#2908)
Fixes
- Explicitly wait for readiness of the Docker daemon and the compose stack before pouring database fixtures in install-dev.sh for when installing at the provisioning stage of Codespaces and integration tests in CI. (#2378)
- Add missing implementation of wsproxy and manager CLI's log-level customization options (#2698)
- Add missing batch execution call after session starts (#2884)
- Fix a regression of the unicode-aware slug update that prevented creation of dot-prefixed (automount) vfolders (#2892)
- Fix invalid image format log spam in Agent (#2894)
- Fix wrong creation of raw_configs in _create_kernels_in_one_agent (#2896)
- Assign valid value to id field in ContainerRegistryNode GQL schema query resolver. (#2899)
- Update vast quota rather than raise error when quota exists. (#2900)
- Calculate correct expiration time of VAST auth token and add vast_force_login config to enable login before every REST API call (#2911)
Full Changelog
Check out the full changelog until this release (24.09.0rc1).
Full Commit Logs
Check out the full commit logs between release (24.09.0b1) and (24.09.0rc1).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.10
Features
- Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)
- Allow DataLoaderManager to get a loader function by the function itself rather than the function name. (#2717)
- Add an explicit configuration scaling-group-type to agent.toml so that the agent can distinguish whether it belongs to an SFTP resource group (#2796)
Improvements
- Avoid using collections.OrderedDict when not necessary in the manager API and client SDK (#2842)
Fixes
- Merge kernels.role into sessions.session_type and check the image compatibility based on comparison with the ai.backend.role label (#1587)
- Delete vfolder invitation and permission rows when deleting vfolders. (#2780)
- Fix kernel_id assignment for main kernel log retrieval (#2820)
- Wrong count of concurrent compute sessions. (#2829)
- Create kernels with correct scaling_group value. (#2837)
- Fix a regression in progress bar rendering of the TUI installer after upgrading the Textual library (#2867)
- Add scaling_group.agent_count_by_status and scaling_group.agent_total_resource_slots_by_status GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
- Fix handling of undefined values in the ModifyImage GraphQL mutation. (#2028)
- Silence model_ namespace warnings with pydantic-based model classes (#2765)
- Change the initialization order of PackageContext to apply target_path correctly in the TUI installer (#2768)
- Make the regex patterns used to update configuration files work correctly with multiline text in the TUI installer (#2771)
- Omit null parameter when calling usage-per-period API. (#2777)
- Handle container port mismatch when creating kernel. (#2786)
- Explicitly set the protected service ports depending on the resource group type and the service types (#2797)
- Correct session status determiner function. (#2803)
- Fix endpoint_list.total_count GQL field returning incorrect value (#2805)
External Dependency Updates
- Upgrade Python (3.12.4 -> 3.12.6) and common/tool dependencies to prepare for Python 3.13 and apply latest fixes (#2851)
Miscellaneous
- Enhance type hints for potential None arguments (#2580)
- Upgrade readthedocs build environment to Python 3.12 (#2814)
Full Changelog
Check out the full changelog until this release (24.03.10).
Full Commit Logs
Check out the full commit logs between release (24.03.10rc1) and (24.03.10).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.10rc1
Features
- Add support for setting a timeout when pulling Docker images and upgrade aiodocker to version 0.23.0. (#2852)
Improvements
- Avoid using collections.OrderedDict when not necessary in the manager API and client SDK (#2842)
Fixes
- Merge kernels.role into sessions.session_type and check the image compatibility based on comparison with the ai.backend.role label (#1587)
- Delete vfolder invitation and permission rows when deleting vfolders. (#2780)
- Fix kernel_id assignment for main kernel log retrieval (#2820)
- Wrong count of concurrent compute sessions. (#2829)
- Create kernels with correct scaling_group value. (#2837)
- Fix a regression in progress bar rendering of the TUI installer after upgrading the Textual library (#2867)
External Dependency Updates
- Upgrade Python (3.12.4 -> 3.12.6) and common/tool dependencies to prepare for Python 3.13 and apply latest fixes (#2851)
Miscellaneous
- Enhance type hints for potential None arguments (#2580)
- Upgrade readthedocs build environment to Python 3.12 (#2814)
Full Changelog
Check out the full changelog until this release (24.03.10rc1).
Full Commit Logs
Check out the full commit logs between release (24.03.10b3) and (24.03.10rc1).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.10b3
No significant changes.
Full Changelog
Check out the full changelog until this release (24.03.10b3).
Full Commit Logs
Check out the full commit logs between release (24.03.10b2) and (24.03.10b3).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.10b2
Fixes
- Fix Service.create() SDK method and service create CLI command failing with an UnboundLocalError exception (#2806)
Full Changelog
Check out the full changelog until this release (24.03.10b2).
Full Commit Logs
Check out the full commit logs between release (24.03.10b1) and (24.03.10b2).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.10b1
Features
- Allow DataLoaderManager to get a loader function by the function itself rather than by its name. (#2717)
- Add an explicit configuration scaling-group-type to agent.toml so that the agent can distinguish whether it belongs to an SFTP resource group or not (#2796)
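The #2717 entry describes looking up a loader by passing the function object instead of its name. A minimal sketch of why a function-keyed cache is more robust, assuming a hypothetical LoaderRegistry class and a dict standing in for a real loader object (this is not Backend.AI's actual DataLoaderManager API):

```python
from typing import Any, Callable, Dict, List

BatchFn = Callable[[List[Any]], List[Any]]


class LoaderRegistry:
    """Hypothetical registry that caches one loader per batch function."""

    def __init__(self) -> None:
        self._loaders: Dict[BatchFn, Dict[str, Any]] = {}

    def get_loader(self, fn: BatchFn) -> Dict[str, Any]:
        # Key on the function object itself rather than fn.__name__, so
        # distinct functions that share a name never get each other's loader.
        loader = self._loaders.get(fn)
        if loader is None:
            loader = {"batch_fn": fn}  # stand-in for a real DataLoader object
            self._loaders[fn] = loader
        return loader


def make_batch_fn(table: str) -> BatchFn:
    def batch_load(keys: List[Any]) -> List[Any]:
        return [f"{table}:{k}" for k in keys]
    return batch_load


load_users = make_batch_fn("users")
load_groups = make_batch_fn("groups")  # same __name__ ("batch_load") as load_users

registry = LoaderRegistry()
users_loader = registry.get_loader(load_users)
groups_loader = registry.get_loader(load_groups)
```

A registry keyed by `fn.__name__` would have handed back the same loader for both functions here, since both are named `batch_load`.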
Fixes
- Fix handling of undefined values in the ModifyImage GraphQL mutation. (#2028)
- Silence model_ namespace warnings with pydantic-based model classes (#2765)
- Change the initialization order of PackageContext to apply target_path correctly in the TUI installer (#2768)
- Make the regex patterns that update configuration files work correctly with multiline texts in the TUI installer (#2771)
- Omit null parameters when calling the usage-per-period API. (#2777)
- Handle container port mismatch when creating kernels. (#2786)
- Explicitly set the protected service ports depending on the resource group type and the service types (#2797)
- Correct session status determiner function. (#2803)
- Fix endpoint_list.total_count GQL field returning incorrect value (#2805)
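The #2777 entry is about not sending parameters whose value is null. A minimal standard-library sketch of dropping unset values before building a query string; the parameter names (start_date, end_date, group_ids) are hypothetical and not taken from the usage-per-period API specification:

```python
from typing import Any, Mapping
from urllib.parse import urlencode


def build_query(params: Mapping[str, Any]) -> str:
    # Drop parameters the caller left unset (None); urlencode() would
    # otherwise serialize them as the literal text "None".
    present = {key: value for key, value in params.items() if value is not None}
    return urlencode(present)


# Hypothetical parameter names, for illustration only.
query = build_query({"start_date": "20240101", "end_date": None, "group_ids": "g1"})
```

Here `query` is `start_date=20240101&group_ids=g1`; without the filter, `end_date=None` would have been sent as a string.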
Full Changelog
Check out the full changelog until this release (24.03.10b1).
Full Commit Logs
Check out the full commit logs between release (24.03.9) and (24.03.10b1).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.9
Features
- Allow filtering and ordering in endpointlist GQL requests. (#2723)
- Add scaling_group.agent_count_by_status and scaling_group.agent_total_resource_slots_by_status GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
- Allow bulk association and disassociation of scaling groups with domains, user groups, and key pairs. (#2473)
- Add new vfolder API to update sharing status. (#2740)
Improvements
- Enable robust DB connection handling by allowing the pool-pre-ping setting. (#1991)
Fixes
- Correct msgpack deserialization of ResourceSlot. (#2754)
- Fix regression error of the session create_from_template command. (#2761)
- Fix GET /func/folders/{folderName} API returning string literal "null" instead of null value on user and group fields (#2584)
- Fix list_files, get_fstab_contents, get_performance_metric and shared_vfolder_info Python SDK functions failing with a ValidationError exception (#2706)
- Handle OS errors when deleting vfolders. (#2741)
- Fix typo in Virtual-folder status update code. (#2742)
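The #2584 entry concerns a response carrying the string "null" where a JSON null was expected. One common way to get that shape is encoding a value to JSON text before the final encoding step; this is an illustration of the failure mode, not necessarily the cause of the original bug. A minimal sketch, with a hypothetical folder_fields helper, that keeps optional fields as Python None until the single json.dumps call:

```python
import json
import uuid
from typing import Optional


def folder_fields(user: Optional[uuid.UUID], group: Optional[uuid.UUID]) -> dict:
    # Stringify only real UUIDs; leave None untouched so json.dumps emits null.
    return {
        "user": str(user) if user is not None else None,
        "group": str(group) if group is not None else None,
    }


uid = uuid.UUID("12345678-1234-5678-1234-567812345678")
good = json.dumps(folder_fields(uid, None))
# '{"user": "12345678-1234-5678-1234-567812345678", "group": null}'

# Anti-pattern: encoding None to JSON text first yields the string "null".
bad = json.dumps({"user": json.dumps(None)})
# '{"user": "null"}'
```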
Full Changelog
Check out the full changelog until this release (24.03.9).
Full Commit Logs
Check out the full commit logs between release (24.03.9rc1) and (24.03.9).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.9rc1
Features
- Allow filtering and ordering in endpointlist GQL requests. (#2723)
Fixes
- Correct msgpack deserialization of ResourceSlot. (#2754)
- Fix regression error of the session create_from_template command. (#2761)
Full Changelog
Check out the full changelog until this release (24.03.9rc1).
Full Commit Logs
Check out the full commit logs between release (24.03.9b1) and (24.03.9rc1).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.9b1
Features
- Add scaling_group.agent_count_by_status and scaling_group.agent_total_resource_slots_by_status GQL fields to query the count and the resource allocation of agents that belong to a scaling group. (#2254)
- Allow bulk association and disassociation of scaling groups with domains, user groups, and key pairs. (#2473)
- Add new vfolder API to update sharing status. (#2740)
Improvements
- Enable robust DB connection handling by allowing the pool-pre-ping setting. (#1991)
Fixes
- Fix GET /func/folders/{folderName} API returning string literal "null" instead of null value on user and group fields (#2584)
- Fix list_files, get_fstab_contents, get_performance_metric and shared_vfolder_info Python SDK functions failing with a ValidationError exception (#2706)
- Handle OS errors when deleting vfolders. (#2741)
- Fix typo in Virtual-folder status update code. (#2742)
Full Changelog
Check out the full changelog until this release (24.03.9b1).
Full Commit Logs
Check out the full commit logs between release (24.03.8) and (24.03.9b1).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.8
Features
- Add an enable_LLM_playground option to show/hide the LLM playground tab on the serving page. (#2677)
- Add max_atom_plus_device_per_container config on webserver (#2686)
Fixes
- Remove duplicate CPU quota arguments when creating containers (#2608)
- Allow sudo-enabled container users to overwrite /usr/bin/scp and /usr/libexec/sftp-server by unifying the intrinsic ssh binaries to use the merged dropbearmulti executable. (#2671)
- Update webserver logout API to respond with HTTP 200 OK (#2681)
- Scan parent directory of created qtree to avoid creating quota on non-existing directory. (#2696)
Full Changelog
Check out the full changelog until this release (24.03.8).
Full Commit Logs
Check out the full commit logs between release (24.03.8rc2) and (24.03.8).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.8rc2
Fixes
- Skip resolving malformed ModelCard GQL items (#2570)
- Use robust type checks when the idle checker fetches utilization data. (#2601)
- Let GET /resource/usage/period requests carry data in query parameters rather than a JSON body. (#2661)
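The #2661 entry moves request data for GET /resource/usage/period from a JSON body into the query string; a body on a GET request has no generally defined semantics in HTTP (RFC 9110) and may be dropped by intermediaries. A minimal sketch that only builds the request without sending it; the host manager.example.com and the parameter names start_date and end_date are hypothetical placeholders:

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request


def build_usage_request(base_url: str, start_date: str, end_date: str) -> Request:
    # Encode the data into the URL; no request body is attached.
    query = urlencode({"start_date": start_date, "end_date": end_date})
    return Request(f"{base_url}/resource/usage/period?{query}", method="GET")


req = build_usage_request("https://manager.example.com", "20240101", "20240131")
params = parse_qs(urlsplit(req.full_url).query)
```

Authentication headers and any real parameter names would have to follow the manager's API documentation.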
Full Changelog
Check out the full changelog until this release (24.03.8rc2).
Full Commit Logs
Check out the full commit logs between release (24.03.8rc1) and (24.03.8rc2).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.8rc1
Fixes
- Rename images.image_filters GQL query argument to images.image_types (#2555)
- Prevent other users' customized images from being listed in the response of the images GQL query (#2557)
- Delete sessions DB records when purging a project. (#2573)
- Implement missing StrEnumType handling in populate_fixture(). (#2648)
External Dependency Updates
- Upgrade the intrinsic kernel-runner binaries (dropbear, scp, sftp-server, su-exec and tmux) to use statically built executables based on the latest Alpine Linux and the latest source codes (#2625)
Full Changelog
Check out the full changelog until this release (24.03.8rc1).
Full Commit Logs
Check out the full commit logs between release (24.03.7) and (24.03.8rc1).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.7
Features
- Allow Bearer as a valid token type in model service authentication (#2583)
- Add a pre-setup configuration menu to the TUI installer to allow setting the public-facing address of Backend.AI components (#2541)
- Add row_id, type and container_registry fields to the GroupNode GQL schema. (#2409)
- Add support for PureStorage RapidFiles Toolkit v2 (#2419)
- Add support for fetching container logs of a specific kernel. (#2364)
- Allow superadmins to force-update session status through destroy API. (#2275)
- Introduce Python native WSProxy (#2372)
Improvements
- Optimize the query latency when fetching a large number of agents with stat metrics from Redis (#2558)
- Remove database-level foreign key constraints in vfolders.{user,group} columns to decouple the timing of vfolder deletion and user/group deletion. (#2404)
Fixes
- Update GQLPrivilegeCheckMiddleware to align with upstream changes in the graphql-core package (#2598)
- Check null value of user mutation by Undefined sentinel value rather than None. (#2506)
- Do null check on groups.total_resource_slots and domains.total_resource_slots values. (#2509)
- Fix heartbeat processing failing when an agent reports an image whose name is not compliant with Backend.AI's naming rule (#2516)
- Corrected a typo (maanger corrected to manager) in the check_status() API response of the storage component (#2523)
- Prevent session status from transitioning to PULLING if image pull is not required (#2556)
- Initialize Redis connection pool objects with specified connection opts rather than ignoring them. (#2574)
- Fix BACKEND_MODEL_NAME environment variable always being overwritten with the model name specified in the model definition (#2481)
- Do not allow assigning a preopen port that collides with the image's own service port definition (#2482)
- Fix GET requests with query params defined in the API spec occasionally throwing a 400 Bad Request error (#2483)
- Fix incorrect check of values returned from docker stat API. (#2389)
- Handle all possible exceptions when scheduling single node sessions so that the status information of pending sessions is not empty. (#2411)
- Improve error handling of initialization failures in the kernel runner (#2478)
- Add missing commit_session_to_file to OP_EXC (#2127)
- Pass ImageRef.canonical in commit_session_to_file (#2134)
- Skip cleaning up containerless kernels that are still creating their containers. (#2317)
- Run batch execution after the batch session starts. (#2327)
- Add support for configuring sync_container_lifecycles() task. (#2338)
- Restrict GraphQL query to user_nodes field to require superadmin privilege (#2401)
- Utilize ExtendedJSONEncoder for error logging to handle UUID objects in extra_data (#2415)
- Change outdated references in event module from kernels to sessions. (#2421)
- Upgrade inquirer to remove dependency on deprecated distutils, which breaks execution of the scie builds (#2424)
- Allow querying vfolders with specific statuses for purging. (#2429)
- Update the install-dev scripts to use pnpm instead of npm to speed up installation and resolve some peculiar version resolution issues related to esbuild. (#2436)
- Fix a packaging issue in the backendai-webserver scie executable due to missing explicit requirement of setuptools (#2454)
- Improve pruning of non-physical filesystems when measuring disk usage in agents (#2460)
- Fix buggy resolver of model_card GQL query. (#2161)
- Keep sync_container_lifecycles() bgtask alive in a loop. (#2178)
- Shut down agent properly by removing code that awaits a cancelled task. (#2392)
- Fix user creation error when any model-store does not exist. (#2160)
- Ensure that utilization idleness is checked after a set period. (#2205)
- Fix ZeroDivisionError in volume usage calculation by returning 0% when volume capacity is zero (#2245)
- Fix GraphQL to support queries on non-installed images (#2250)
- Add missing push_image method implementation to Dummy Agent (#2253)
- Corrected an issue where the resource_policy field in the user model was incorrectly mapped to domain_name. (#2314)
- Fix mismatches between responses of /services/_runtimes and new model service creation input (#2371)
- Skip mounting zero-byte lxcfs files when lxcfs is activated to prevent crashes in session containers (#2604)
- Fix typo in minilang query field spec and column map. (#2605)
- Silence falsy Redis timeout warnings when retrying blocking commands if the timeout does not exceed the expected command timeout (#2632)
- Fix a regression of #2483 in the session-download API used by the backend.ai ssh command (#2635)
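The #2245 entry states the fix directly: report 0% instead of dividing by a zero capacity. A minimal sketch of that guard; the function name volume_usage_percent is hypothetical, not the storage proxy's actual code:

```python
def volume_usage_percent(used_bytes: int, capacity_bytes: int) -> float:
    # A volume reporting zero capacity would raise ZeroDivisionError;
    # return 0% instead, matching the behavior described in #2245.
    if capacity_bytes == 0:
        return 0.0
    return used_bytes / capacity_bytes * 100.0
```

For example, `volume_usage_percent(50, 200)` gives 25.0, and any call with a zero capacity gives 0.0.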
Miscellaneous
- Add POST /folders API endpoints to replace DELETE APIs that require a request body, and allow DELETE requests to have body data. (#2571)
- Handle container creation exception and start exception in separate try-except contexts. (#2316)
- Finally stabilize the hanging tests in our CI due to docker-internal races on TCP port mappings to concurrently spawned fixture containers by introducing monotonically increasing TCP port numbers (#2379)
- Further improve the monotonic port allocation logic for the test containers to remove maximum concurrency restrictions (#2396)
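The #2379 and #2396 entries avoid Docker-internal port races by giving each concurrently spawned fixture container a distinct, monotonically increasing TCP port. A minimal thread-safe sketch of that idea; the MonotonicPortAllocator class and the 20000-30000 range are hypothetical, not Backend.AI's actual test fixture code:

```python
import itertools
import threading
from concurrent.futures import ThreadPoolExecutor


class MonotonicPortAllocator:
    """Hands out strictly increasing ports, never reusing one within a run."""

    def __init__(self, start: int = 20000, end: int = 30000) -> None:
        self._counter = itertools.count(start)
        self._end = end
        self._lock = threading.Lock()

    def next_port(self) -> int:
        # Serialize access so concurrent fixtures never receive the same port.
        with self._lock:
            port = next(self._counter)
        if port >= self._end:
            raise RuntimeError(f"test port range exhausted at {port}")
        return port


allocator = MonotonicPortAllocator()
with ThreadPoolExecutor(max_workers=8) as pool:
    ports = list(pool.map(lambda _: allocator.next_port(), range(32)))
```

This only guarantees uniqueness among callers in one process and does not check whether a port is already in use on the host; parallel test processes would need a shared counter.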
External Dependency Updates
- Upgrade aiodocker to 0.22.1 to fix error handling when trying to extract the log of non-existing containers (#2402)
- Upgrade the base CPython from 3.12.2 to 3.12.4 (#2449)
- Upgrade aiodocker to v0.22.0 with minor bug fixes found by improved type annotations (#2339)
Full Changelog
Check out the full changelog until this release (24.03.7).
Full Commit Logs
Check out the full commit logs between release (24.03.7rc2) and (24.03.7).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.7rc2
Miscellaneous
- Add POST /folders API endpoints to replace DELETE APIs that require a request body, and allow DELETE requests to have body data. (#2571)
Full Changelog
Check out the full changelog until this release (24.03.7rc2).
Full Commit Logs
Check out the full commit logs between release (24.03.7rc1) and (24.03.7rc2).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.7rc1
Features
- Allow Bearer as a valid token type in model service authentication (#2583)
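The #2583 entry accepts Bearer as a token type in model service authentication. A minimal sketch of parsing an Authorization header against a configurable set of schemes; the function name extract_token and the accepted_schemes parameter are assumptions for illustration, not the actual proxy code. Authentication scheme names are case-insensitive per RFC 9110, so the comparison is done in lowercase:

```python
from typing import Iterable, Optional


def extract_token(header_value: Optional[str],
                  accepted_schemes: Iterable[str] = ("bearer",)) -> Optional[str]:
    # Return the credential part of an Authorization header, or None if the
    # header is missing, uses an unaccepted scheme, or carries no token.
    if not header_value:
        return None
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() not in {s.lower() for s in accepted_schemes}:
        return None
    token = token.strip()
    return token or None
```

For example, `extract_token("Bearer abc123")` returns `"abc123"`, while a `Basic` header returns None unless that scheme is passed in explicitly.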
Fixes
- Update GQLPrivilegeCheckMiddleware to align with upstream changes in the graphql-core package (#2598)
Full Changelog
Check out the full changelog until this release (24.03.7rc1).
Full Commit Logs
Check out the full commit logs between release (24.03.7b4) and (24.03.7rc1).
- Python
Published by github-actions[bot] over 1 year ago
https://github.com/lablup/backend.ai - 24.03.7b4
Features
- Add a pre-setup configuration menu to the TUI installer to allow setting the public-facing address of Backend.AI components (#2541)
Improvements
- Optimize the query latency when fetching a large number of agents with stat metrics from Redis (#2558)
Fixes
- Check null value of user mutation by Undefined sentinel value rather than None. (#2506)
- Do null check on groups.total_resource_slots and domains.total_resource_slots values. (#2509)
- Fix heartbeat processing failing when an agent reports an image whose name is not compliant with Backend.AI's naming rule (#2516)
- Corrected a typo (maanger corrected to manager) in the check_status() API response of the storage component (#2523)
- Prevent session status from transitioning to PULLING if image pull is not required (#2556)
- Initialize Redis connection pool objects with specified connection opts rather than ignoring them. (#2574)
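The #2506 entry switches the user-mutation null check from None to an Undefined sentinel. The distinction matters because an explicit None can legitimately mean "clear this field", while a sentinel marks a field the client never sent. A minimal sketch with a hypothetical module-level sentinel and a hypothetical modify_user input; the project's real sentinel comes from its GraphQL stack, not from this code:

```python
from typing import Any, Dict, Final


class _UndefinedType:
    """Hypothetical sentinel meaning "the client did not send this field"."""

    def __repr__(self) -> str:
        return "Undefined"


Undefined: Final = _UndefinedType()


def collect_updates(**fields: Any) -> Dict[str, Any]:
    # Compare by identity against the sentinel, not against None:
    # an explicit None means "set this column to NULL" and must be kept.
    return {name: value for name, value in fields.items() if value is not Undefined}


def modify_user(full_name: Any = Undefined, description: Any = Undefined) -> Dict[str, Any]:
    # Hypothetical mutation input with two optional fields.
    return collect_updates(full_name=full_name, description=description)
```

A filter written as `value is not None` would silently drop `modify_user(description=None)`, so a client could never clear the description.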
Full Changelog
Check out the full changelog until this release (24.03.7b4).
Full Commit Logs
Check out the full commit logs between release (24.03.7b3) and (24.03.7b4).
- Python
Published by github-actions[bot] over 1 year ago