Kexec handover and the live update orchestrator

Published: 2025-09-01

Rebooting a computer ordinarily brings an abrupt end to any state built up by the old system; the new kernel starts from scratch. There are, however, people who would like to be able to reboot their systems without disrupting the workloads running therein. Various developers are currently partway through the project of adding this capability, in the form of "kexec handover" and the "live update orchestrator", to the kernel.

Normally, rebooting a computer is done out of the desire to start fresh, but sometimes the real objective is to refresh only some layers of the system. Consider a large machine running deep within some cloud provider's data center. A serious security or performance issue may bring about a need to update the kernel on that machine, but the kernel is not the only thing running there. The user-space layers are busily generating LLM hallucinations and deep-fake videos, and the owner of the machine would much rather avoid interrupting that flow of valuable content. If the kernel could be rebooted without disturbing the workload, there would be great rejoicing.

Preserving a workload across a reboot requires somehow saving all of its state, from user-space memory to device-level information within the kernel. Simply identifying all of that state can be a challenge, preserving it even more so, as a look at the long effort behind the Checkpoint/Restore in Userspace project will make clear. All of that state must then be properly restored after the kernel is swapped out from underneath the workload. All told, it is a daunting challenge.

The problem becomes a little easier, though, in the case of a system running virtualized guests. The state of the guests themselves is well encapsulated within the virtual machines, and there is relatively little hardware state to preserve. So it is not surprising that t

