
Releases: CentaurusInfra/arktos

Release v1.0

15 Mar 04:24
391c199

This release makes Arktos a truly multi-tenant platform by adopting the Mizar networking solution, including its CNI plugin and VPC/Subnet support. The release also supports selected OpenStack APIs for VM management in Arktos. Main features in this release are:

Integrated network solution for multi-tenancy clusters (requires Ubuntu 18.04 and up):

  • Automatically set up a default VPC and subnet for each tenant (a client-side creation sketch follows this list)
  • Pods and services inherit connectivity and isolation from the VPC solution
  • Automated Arktos scale-out cluster setup with the fully isolated network solution, Mizar, in both kube-up and local dev environments
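
For illustration, here is a minimal client-side sketch of tenant creation, after which the default VPC and subnet are expected to be provisioned automatically. It uses the upstream Kubernetes dynamic client; the GroupVersionResource for the tenant resource and all field names are assumptions for the sketch, not the exact Arktos API.

```go
// Sketch: create a tenant with the dynamic client and rely on Arktos/Mizar to
// provision its default VPC and subnet. The GVR and field names are assumptions.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn := dynamic.NewForConfigOrDie(cfg)

	// Assumed GVR: "tenant" is described as a new top-level API resource in
	// Release v0.1, so a core-group, cluster-scoped "tenants" resource is used here.
	tenantGVR := schema.GroupVersionResource{Group: "", Version: "v1", Resource: "tenants"}

	tenant := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "v1",
		"kind":       "Tenant",
		"metadata":   map[string]interface{}{"name": "demo-tenant"},
	}}

	created, err := dyn.Resource(tenantGVR).Create(context.TODO(), tenant, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	// After creation, the tenant controller and Mizar are expected to set up the
	// tenant's default network, VPC, and subnet asynchronously.
	fmt.Println("created tenant:", created.GetName())
}
```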

Cluster initialization tool:

  • Support the scale-out architecture in the admin cluster
  • Support Ubuntu 20.04
  • Update flannel to v0.14.0

Unified VM/Container:

  • Arktos API extension to support selected OpenStack APIs

Features in the Arktos scale-out architecture:

  • Support deployment of system pods, daemonsets, and services from all tenant partitions
  • Limit daemonset deployment to the system tenant only

Scalability:

  • 60K nodes (3 TPs x 3 RPs), with pod startup latency p50 < 2s, p90 < 3s, p99 < 7s

Miscellaneous enhancements and bug fixes:

  • Make KubeClientManager in kubelet safe for concurrent use (PR 1209)
  • Reduce network creation requests in the tenant controller (PR 1291)
  • Reduce service creation requests in the network controller (PR 1313)

Release v0.9

01 Oct 20:14
306c447
Pre-release

This release focuses on doubling the throughput of large Arktos scale-out clusters, minimizing management cost, and enabling service support.

Some highlights include:

  • Arktos now supports 50,000 nodes in a cluster with only two tenant partitions (TP) and two resource partitions (RP), with similar pod startup latency. This significantly reduces the management cost of a 50,000-node Arktos cluster.
  • This release also doubles Arktos system throughput, thanks to many optimizations in the API Server, Controller Manager, and Kubelet. (Taking the 60% management cost reduction into consideration, each TP actually has 5 times the system throughput with a 2.5-times cluster size increase: in v0.8 each of the 5 TPs served 10K nodes at 20 server QPS, while in v0.9 each of the 2 TPs serves 25K nodes at 100 server QPS.)
  • Services are now supported in both the Arktos scale-out and scale-up architectures. Customers can now create and deploy services and associate pods with a service in Arktos.
  • Pod startup latency and system throughput:

    | Release                                  | v0.8 (June 2021) | v0.9 (September 2021) | v0.9 (September 2021) | v0.9 (September 2021) |
    |------------------------------------------|------------------|-----------------------|-----------------------|-----------------------|
    | System Scalability (nodes in a cluster)  | 50K              | 50K                   | 50K                   | 25K                   |
    | System Architecture Partition (TP x RP)  | 5x5              | 5x5                   | 2x2                   | 1x1                   |
    | System Throughput (server/client QPS)    | 100/25           | 200/50                | 200/50                | 100/25                |
    | Pod Startup Latency P50 (seconds)        | 1.8278           | 1.7879                | 1.8307                | 1.7987                |
    | Pod Startup Latency P90 (seconds)        | 2.7846           | 2.5756                | 2.7759                | 2.6265                |
    | Pod Startup Latency P99 (seconds)        | 5.7178           | 3.7062                | 7.3256                | 4.9631                |

Features/Improvements/Bug fixes:

Service support:

  • Scale-out clusters can use the flannel CNI
  • Service support is enabled in the local dev cluster by default

Scalability and performance tuning changes:

  • Avoid a GET of the node object for each node PATCH in kubelet (PR 835); a client-go patch sketch follows this list
  • Refresh resource version with idle watchers upon watch session renewal (PR 1183)
  • Reduce pod list requests in perf test (PR 1187)
  • Cherry pick performance related community changes:
    • Use watch instead of list pods in node controller (PR 1129, 1173)
    • Disable watchcache for events (PR 1184)
  • Cherry pick perf test changes:
    • Add channel for events to PodStartupLatency (PR 1187)
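
To illustrate the first item in the list above: building the node update as a strategic-merge PATCH of only the fields the caller owns removes the need to GET the full Node object first. This is a minimal sketch against upstream client-go, not the Arktos kubelet code; the node name and label are made-up examples, and the Arktos fork's client signatures may differ.

```go
// Sketch: patch a node label directly, without a preceding GET of the node.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	nodeName := "worker-0" // illustrative node name

	// A strategic-merge patch only carries the fields we own, so no prior GET
	// of the full node object is needed to construct it.
	patch := []byte(`{"metadata":{"labels":{"example.io/heartbeat":"ok"}}}`)

	if _, err := client.CoreV1().Nodes().Patch(
		context.TODO(), nodeName, types.StrategicMergePatchType, patch, metav1.PatchOptions{},
	); err != nil {
		panic(err)
	}
	fmt.Println("patched node", nodeName)
}
```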

Perf test tool changes:

  • Decouple proxy operation in kube-up and kubemark (PR 1105)
  • Fix Prometheus config to include HAProxy metrics (PR 1103)
  • Kubemark cluster starts partition servers in parallel (PR 1113)
  • Support skipping pod deletion phase in perf test (PR 1159)
  • Perf test config for large cluster (PR 1187)

Security fixes:

  • Bump gorilla/websocket to v1.4.2 (PR 1127)

Bug fixes:

  • Fix a bug where the event client was created with the wrong user agent (PR 1120)
  • Set the user agent for clients when talking to an API server in another partition (PR 1125, 1186)


Release v0.8

05 Jun 00:49
fb66dab
Pre-release

This release continues the Arktos scale-out architecture work on scalability, achieving 50K-node cluster scale with support for secure communication in the scale-out framework. Main features in this release are:

Scalability scale-out architecture:

  • Support multiple resource managers. A tenant can utilize resources in any of the resource managers.
  • A 50K-node cluster composed of 5 tenant partitions and 5 resource managers passed the density test with a maximum pod startup latency of 6s.
  • Communication among Arktos components supports both secure and insecure modes for the desired GW-cluster deployments.

Performance improvements ported from Kubernetes 1.18:

  • New scheduling process logic and improved throughput

Test and Monitoring Improvement:

  • Double the garbage collection QPS in the perf test teardown process, reducing perf test teardown time by 50%
  • Enable profiling for scheduler and controller manager

Critical bug fixes:

  • Fix the GC controller pod deletion issue for multi-tenancy (Issue #1026)
  • Make etcd key paths consistent between the system tenant and regular tenants (Issue #339)

Mizar integration:

  • Add Namespace and Network Policy controllers to the Mizar controllers to support network policy features in the Arktos-Mizar integration

Release v0.7

06 Feb 00:07
e234a19
Pre-release

This release is mainly for scalability improvements, with some functionality stabilization and build/test toolchain improvements.

Scalability scale-out architecture:

  • Introduce the new scale-out architecture in addition to the existing scale-up design. The new architecture supports multiple partitions, with each partition supporting the existing scale-up configuration.
  • Initial implementation of the scale-out architecture that:
    • Supports multiple tenant partitions and a single resource manager. Multiple resource managers will be supported in subsequent releases.
    • Passes the 20K-node density scalability test with 2 tenant partitions and 1 resource manager.

Performance improvements ported from Kubernetes 1.18:

  • API Server cache index improvement
  • List and watch performance improvement
  • Fix watcher memory leak upon closing
  • Object serialization improvement
  • Reduce node status update frequency and pod patch update frequency

Upgrade build and runtime to Go 1.13

Test Improvements:

  • Make density test cases tenancy-aware
  • Improve scalability test suite for optimized configurations and better logging and tracing

Critical bug fixes:

  • Fix PV/PVC bugs for multi-tenancy (#937)
  • Fix a concurrent map write issue when stopping AggregatedWatcher (#825)
  • Fix a memory leak in AggregatedWatcher (#787)

Release v0.6

24 Sep 23:42
40fd8c8
Pre-release

This release is for the integration with Mizar.

Feature enhancements and bug fixes include:

  • Implemented a gRPC client in the network controller for communication with the Mizar gRPC server (a watch-and-forward sketch follows this list).
  • Implemented a Pod Controller that watches for Pod Create, Update, Resume, and Delete events and sends the corresponding message to Mizar.
  • Implemented a Node Controller that watches for Node Create, Update, Resume, and Delete events and sends the corresponding message to Mizar.
  • Implemented an Arktos Network Controller that watches for Arktos Network Create, Update, and Resume events and sends the corresponding message to Mizar.
  • Implemented a Service Controller that watches for Service Create, Update, Resume, and Delete events and sends the corresponding message to Mizar.
  • Update the cluster IP once Mizar assigns an IP address to the service. The controller detects the service type, and if the service is a DNS service, it updates the Arktos Network object with the same IP address. It also sends the Kubernetes service endpoint port mapping to Mizar.
  • Implemented a Service Endpoint Controller that watches for Service Endpoint Create, Update, and Resume events and sends the corresponding message to Mizar.
  • Pods and services respect network isolation.
  • DNS and Kubernetes services are deployed per-network.
  • Added an instruction document for setting up the playground lab.
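
A rough sketch of the watch-and-forward pattern these controllers share, using an upstream client-go informer for Pods. The mizarClient type and its NotifyPodEvent method are hypothetical stand-ins for the generated Mizar gRPC stubs, which are not shown here.

```go
// Sketch: watch Pod events with a shared informer and forward each event to a
// (placeholder) Mizar gRPC client.
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

// mizarClient is a placeholder for the generated Mizar gRPC client.
type mizarClient struct{}

func (m *mizarClient) NotifyPodEvent(event string, pod *corev1.Pod) {
	// In the real controller this would send a gRPC message to the Mizar server.
	fmt.Printf("mizar <- %s pod %s/%s\n", event, pod.Namespace, pod.Name)
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	mizar := &mizarClient{}

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { mizar.NotifyPodEvent("Create", obj.(*corev1.Pod)) },
		UpdateFunc: func(_, obj interface{}) { mizar.NotifyPodEvent("Update", obj.(*corev1.Pod)) },
		DeleteFunc: func(obj interface{}) {
			if pod, ok := obj.(*corev1.Pod); ok {
				mizar.NotifyPodEvent("Delete", pod)
			}
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // run until the process is killed
}
```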

Release v0.5

31 Aug 23:48
6c648e7
Pre-release

Summary

This release contains new core features, scalability improvements, and many stabilization improvements.

Some highlights include:

  • New core features such as etcd partitioning, multi-tenancy controllers & CRD, in-place vertical scaling of both container and VM, multi networking, etc.
  • Verified scalability improvement of 300K pods and 10K nodes.
  • Stabilization, improved build and test infrastructure, and more test coverage.

The detailed enhancements are listed below.

Key Features and Improvements

Unified VM/Container:

  • Add in-place container vertical scaling.
  • Add the initial implementation of in-place VM vertical scaling, based on vCPU and memory hotplug.
  • Bump to a newer libvirt version in the Arktos VM runtime.
  • Enable standalone deployment of the Arktos VM runtime for edge scenarios.
  • Use the same cgroup hierarchy for container pods and VM pods.
  • Refactor and simplify the runtime manager code in kubelet.

Multi Tenancy:

  • Add the new feature of per-tenant CRD. Each tenant can install their own CRDs without impacting each other.
  • Add the new feature of tenant-shared CRD. Tenants can share a CRD installed in system space.
  • Add support for tenant.All in client-go.
  • Add support for patch and other commands with the "--tenant" option in kubectl.
  • Update the most commonly used controllers to be tenancy-aware, in addition to the controllers already updated in previous releases:
    • Job controller
    • Volume controller, including corresponding changes in the scheduler and kubelet
    • StatefulSet controller
    • Service controller
    • Resource quota controller
    • Daemonset controller
    • Cronjob controller
  • Update Tenant controller to:
    • Initialize default tenant role and role binding during tenant creation.
    • Initialize default network object during tenant creation.
    • Support tenant deletion.
  • Stabilization and various bug fixes.

Multi-Tenancy Networking:

  • Add the new CRD object "Network".
  • Initial support of per-network service IP allocation.
  • Add pod network annotations and check readiness in kubelet.
  • Initial implementation of the flat network controller, with the flannel network provider (a creation sketch for the Network object follows this list).
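
As an illustration, here is a minimal sketch of creating a Network object through the dynamic client once the CRD is installed. The group, version, and spec fields below are assumptions made for the sketch; the actual Arktos Network CRD schema may differ.

```go
// Sketch: create a "Network" custom object via the dynamic client.
// The GVR and spec fields are illustrative assumptions.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn := dynamic.NewForConfigOrDie(cfg)

	// Hypothetical GroupVersionResource for the Network CRD.
	gvr := schema.GroupVersionResource{Group: "arktos.example.io", Version: "v1", Resource: "networks"}

	network := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "arktos.example.io/v1",
		"kind":       "Network",
		"metadata":   map[string]interface{}{"name": "default"},
		"spec": map[string]interface{}{
			"type": "flat", // flat network backed by the flannel provider
		},
	}}

	if _, err := dyn.Resource(gvr).Create(context.TODO(), network, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```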

Scalability:

  • Verified support of 300K pods with 10K nodes within the performance constraints.
  • Support multiple etcd clusters to shard cluster data.
  • Bump to the latest etcd version, customized for multi-etcd partitioning.
  • Stabilization and enhancement of partitioning across multiple API Servers.
  • Stabilization and enhancement of partitioning across multiple controllers.
  • Add test infrastructure support for AWS.
  • Improved build and test infrastructure.

Release v0.2

05 Apr 06:03
ede0558
Pre-release

This release focuses on the stabilization of Arktos as well as new features in multi-tenancy, scalability and unified VM/Container. Major improvements include:

  • Multi-tenancy: virtualized multi-tenancy clusters based on tenancy short paths and access control.
  • Scalability: API server data partitioning and performance test in AWS.
  • Unified VM/Container: Partial runtime services readiness and storage volume support.

Key Features and Improvements

Multi-tenancy:

  • Multi-tenancy design update #101
  • Tenancy short-path support #50
  • Add Tenant Controller #124
  • Tenancy-aware token Authenticator #129
  • Tenancy-aware Cert Authenticator #99
  • Tenancy-aware RBAC Authorizer #20
  • Tenancy in kubeconfig context #69
  • Stabilization, more test and workaround fixes #92

Scalability:

  • API Server Data Partitioning #105, #65
  • Tools and guidance for setting up data partitioned Arktos cluster #62
  • Add kube-up and start-kubemark for AWS #127

Unified VM/Container:

  • Add support for primary runtime #126
  • Add volume driver for OpenStack Cinder #93
  • Fix issues on VM pod vCPU settings #139

Release v0.1

04 Apr 04:52
b474e79
Pre-release

This release is the first release of the project. It includes the following new components:

  • Unified Node Agent
  • Unified Scheduler
  • Partitioned and Scalable Controller Manager
  • API Server with Multi-Tenancy and Unified Pod Support
  • Arktos VM Runtime Server

Key Features and Improvements:

  • Multi-tenancy

    • Introduce a new layer, “tenant”, before “namespace” in the API resource URL schema, to provide a clear and isolated resource hierarchy for tenants (an illustrative URL-path sketch appears at the end of these notes).
    • Introduce a new API resource, “tenant”, to hold tenant-level configurations and properties.
    • The metadata section of all existing API resources has a new member: tenantName.
    • API Server, client-go, Scheduler, Controllers, and CLI changes for the new resource model.
  • Unified VM/Container:

    • Extend the “pod” definition to cover both containers and VMs. A pod can now contain one VM, or one or more containers.
    • Enhance scheduler to schedule container pods and VM pods in the same way (unified scheduling).
    • Enhance kubelet to support multiple CRI runtimes (unified agent).
    • Implement a VM runtime server evolved from project Virtlet, with new features like VM reboot, snapshot, restore, etc.
    • Enhance kubelet to handle VM state changes and configuration changes.
    • Introduce a new API resource, “action”, and the corresponding handlers (the action framework) to support VM-specific actions that are not well expressed as state machine changes, such as reboot and snapshot.
    • Arktos integration with OpenStack Neutron.
    • Arktos integration with Mizar.
  • Scalability

    • Implement a controller framework that supports multiple controller instances running in active-active mode.
    • Add a new component "workload controller manager" to host controllers migrated to the new framework. In this release it includes replicaset controller and deployment controller.
    • Support preliminary workload auto rebalancing based on the number of controller instances that are currently running.
    • Implement filter by range in API server to support multiple controller instances without increasing traffic.
    • API Server data partitioning (partial support).
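
To make the tenant layer in the URL schema concrete (see the first multi-tenancy item in the v0.1 notes above), here is a small illustrative sketch of how a tenant scope could appear in a resource path. The exact path segments are an assumption, not a quote from the Arktos API server.

```go
// Sketch: illustrative resource paths with and without the tenant layer.
// The segment names are assumptions for illustration only.
package main

import "fmt"

func podListPath(tenant, namespace string) string {
	if tenant == "" {
		// Legacy (Kubernetes-style) path without the tenant layer.
		return fmt.Sprintf("/api/v1/namespaces/%s/pods", namespace)
	}
	// Tenant layer inserted before the namespace layer.
	return fmt.Sprintf("/api/v1/tenants/%s/namespaces/%s/pods", tenant, namespace)
}

func main() {
	fmt.Println(podListPath("", "default"))     // /api/v1/namespaces/default/pods
	fmt.Println(podListPath("acme", "default")) // /api/v1/tenants/acme/namespaces/default/pods
}
```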