sched_ext is a Linux kernel feature which enables implementing kernel thread schedulers in BPF and dynamically loading them. This repository contains various scheduler implementations and support utilities.
sched_ext enables safe and rapid iterations of scheduler implementations, thus radically widening the scope of scheduling strategies that can be experimented with and deployed, even in massive and complex production environments.
You can find more information, including links to blog posts and recordings, in the wiki. The following are a few highlights of this repository.
- The scx_layered case study concretely demonstrates the power and benefits of sched_ext.
- For a high-level but thorough overview of sched_ext (especially its motivation), please refer to the overview document.
- For a description of the schedulers shipped with this tree, please refer to the schedulers document.
- The following video shows the scx_rustland scheduler, which makes most scheduling decisions in userspace Rust code, delivering better FPS in Terraria while a kernel is being compiled. This doesn't mean that scx_rustland is a better scheduler, but it does demonstrate how safe and easy it is to implement a scheduler that is generally usable and can outperform the default scheduler in certain scenarios.
scx_rustland-terraria.mp4
sched_ext is supported by the upstream kernel starting from version 6.12. Both Meta and Google are fully committed to sched_ext and Meta is in the process of mass production deployment. See #kernel-feature-status for more details.
In all example shell commands, $SCX refers to the root of this repository.
All that's necessary for running sched_ext schedulers is a kernel with sched_ext support and the scheduler binaries along with the libraries they depend on. Switching to a sched_ext scheduler is as simple as running a sched_ext binary:
    root@test ~# cat /sys/kernel/sched_ext/state /sys/kernel/sched_ext/*/ops 2>/dev/null
    disabled
    root@test ~# scx_simple
    local=1 global=0
    local=74 global=15
    local=78 global=32
    local=82 global=42
    local=86 global=54
    ^Zfish: Job 1, 'scx_simple' has stopped
    root@test ~# cat /sys/kernel/sched_ext/state /sys/kernel/sched_ext/*/ops 2>/dev/null
    enabled
    simple
    root@test ~# fg
    Send job 1 (scx_simple) to foreground
    local=635 global=179
    local=696 global=192
    ^CEXIT: BPF scheduler unregistered

scx_simple is a very simple global vtime scheduler which can behave acceptably on CPUs with a simple topology (single socket and single L3 cache domain).
Above, we switch the whole system to use scx_simple by running the binary, suspend it with ctrl-z to confirm that it's loaded, and then switch back to the kernel default scheduler by terminating the process with ctrl-c. For scx_simple, suspending the scheduler process doesn't affect scheduling behavior because all that the userspace component does is print statistics. This doesn't hold for all schedulers.
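The state check from the session above can be wrapped in a small helper. This is only a sketch: the function name `scx_status` and its optional directory argument are hypothetical (the argument exists so the function can be pointed at a scratch directory), and it assumes the same sysfs layout that the `cat` commands above rely on.

```shell
#!/bin/sh
# Print the sched_ext state and, when a scheduler is loaded, its ops name.
# $1 (hypothetical, optional): directory to read instead of the real sysfs path.
scx_status() {
    root="${1:-/sys/kernel/sched_ext}"
    if [ ! -r "$root/state" ]; then
        echo "sched_ext: not supported by this kernel"
        return 1
    fi
    state=$(cat "$root/state")
    # Same glob as in the session above; it matches nothing when no scheduler is loaded.
    ops=$(cat "$root"/*/ops 2>/dev/null || true)
    if [ -n "$ops" ]; then
        echo "sched_ext: $state ($ops)"
    else
        echo "sched_ext: $state"
    fi
}
```

Calling `scx_status` with no argument reads the real sysfs files, so it reports the same information as the two `cat` invocations shown earlier.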
Note: C schedulers like scx_simple were previously included in this repository but have since been moved to scx-c-examples. The schedulers in this repository now use Rust for userspace components.
In addition to terminating the program, there are two more ways to disable a sched_ext scheduler - sysrq-S and the watchdog timer. Ignoring kernel bugs, the worst damage a sched_ext scheduler can do to a system is starving some threads until the watchdog timer triggers.
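On a machine without a physical keyboard, the sysrq-S path can probably be exercised from a shell as well. The sketch below assumes that writing an uppercase `S` to `/proc/sysrq-trigger` reaches the same handler as the keyboard chord; the function name `scx_sysrq_reset` and its path argument are hypothetical, and the argument exists only so the function can be tried against a scratch file.

```shell
#!/bin/sh
# Ask the kernel to drop the running sched_ext scheduler, as sysrq-S would.
# $1 (hypothetical, optional): file to write to instead of the real trigger.
# Writing to the real trigger requires root.
scx_sysrq_reset() {
    trigger="${1:-/proc/sysrq-trigger}"
    printf 'S' > "$trigger"
}
```

After calling it as root, reading `/sys/kernel/sched_ext/state` is a simple way to confirm whether the scheduler was actually disabled.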
As illustrated, once the kernel and binaries are in place, using sched_ext schedulers is straightforward and safe. While developing and building schedulers in this repository isn't complicated either, sched_ext makes use of many new BPF features, some of which require build tools which are newer than what many distros are currently shipping. This should become less of an issue in the future. For the time being, the following custom repositories are provided for select distros.
    scx
    |-- scheds               : Sched_ext scheduler implementations
    |   |-- include          : Shared BPF and user C include files including vmlinux.h
    |   \-- rust             : Example schedulers - userspace code written in Rust
    \-- rust                 : Rust support code
        \-- scx_utils        : Common utility library for Rust schedulers

Rust schedulers: use cargo.
Dependencies:
- clang: >=16 required, >=17 recommended
- libbpf: >=1.2.2 required, >=1.3 recommended
- bpftool: Usually available in linux-tools-common or similar packages
- libelf, libz, libzstd: For linking against libbpf
- pkg-config: For finding system libraries
- Rust toolchain: >=1.82
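The minimum versions above can be checked before starting a build. The following is a rough sketch rather than part of the repository: the helper names (`version_ge`, `first_version`, `report`, `check_deps`) are hypothetical, and it assumes GNU `sort -V`, a `grep` that supports `-o`, and that libbpf ships a pkg-config file.

```shell
#!/bin/sh
# version_ge A B: succeed when version A >= version B.
# Relies on `sort -V` (version sort), a GNU coreutils extension.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the first dotted version number found on stdin,
# e.g. "Ubuntu clang version 18.1.3 (1ubuntu1)" gives 18.1.3.
first_version() {
    grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1
}

# report NAME DETECTED MINIMUM: print one line per dependency.
report() {
    if [ -n "$2" ] && version_ge "$2" "$3"; then
        echo "ok: $1 $2 (>= $3)"
    else
        echo "FAIL: $1 ${2:-not found} (need >= $3)"
    fi
}

# Check the tools whose minimum versions are listed above.
check_deps() {
    report clang  "$("${BPF_CLANG:-clang}" --version 2>/dev/null | first_version)" 16
    report libbpf "$(pkg-config --modversion libbpf 2>/dev/null | first_version)" 1.2.2
    report rustc  "$(rustc --version 2>/dev/null | first_version)" 1.82
}
```

Running `check_deps` prints one `ok:` or `FAIL:` line per tool. It only covers the dependencies with a stated minimum version; bpftool, libelf, libz, libzstd and pkg-config still need to be installed separately.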
The kernel has to be built with the following configuration:
    CONFIG_BPF=y
    CONFIG_BPF_SYSCALL=y
    CONFIG_BPF_JIT=y
    CONFIG_DEBUG_INFO_BTF=y
    CONFIG_BPF_JIT_ALWAYS_ON=y
    CONFIG_BPF_JIT_DEFAULT_ON=y
    CONFIG_SCHED_CLASS_EXT=y
The scx/kernel.config file includes all required and other recommended options for using sched_ext. You can append its contents to your kernel .config file to enable the necessary features.
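Before rebuilding a kernel, it can help to see which of the required options an existing configuration already has. The helper below is a minimal sketch: the name `check_scx_config` is hypothetical, and it checks only the seven options listed above, not the additional recommended ones in the kernel.config file.

```shell
#!/bin/sh
# check_scx_config FILE: list required sched_ext options that are not set to =y.
# Returns 0 when every option is present, 1 otherwise.
check_scx_config() {
    config="$1"
    missing=0
    for opt in CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_BPF_JIT \
               CONFIG_DEBUG_INFO_BTF CONFIG_BPF_JIT_ALWAYS_ON \
               CONFIG_BPF_JIT_DEFAULT_ON CONFIG_SCHED_CLASS_EXT; do
        # Anchor both ends so CONFIG_BPF=y does not match CONFIG_BPF_SYSCALL=y.
        if ! grep -q "^${opt}=y\$" "$config"; then
            echo "missing: ${opt}=y"
            missing=1
        fi
    done
    return "$missing"
}
```

It can be run against the `.config` in a kernel source tree, or against the configuration of the running kernel where the distro installs one; the location of that file varies between distros, so it is left as an argument.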
    $ cd $SCX
    $ cargo build --release                  # Build all Rust schedulers
    $ cargo build --release -p scx_rusty     # Build specific scheduler

Rust schedulers are also published on crates.io:

    $ cargo install scx_rusty

See: CARGO BUILD
- Rust schedulers: target/release/scx_rusty
cargo supports these environment variables for BPF compilation:
- BPF_CLANG: The clang command to use. (Default: clang)
- BPF_CFLAGS: Override all compiler flags for BPF compilation
- BPF_BASE_CFLAGS: Override base compiler flags (non-include)
- BPF_EXTRA_CFLAGS_PRE_INCL: Extra flags before include paths
- BPF_EXTRA_CFLAGS_POST_INCL: Extra flags after include paths
Examples:
    # Use specific clang version for Rust schedulers
    $ BPF_CLANG=clang-17 cargo build --release

With the implementation of scx_stats, schedulers no longer display statistics by default. To display the statistics from the currently running scheduler, a manual user action is required. Below are examples of how to do this.
- To check the scheduler statistics, use:

      $ scx_SCHEDNAME --monitor $INTERVAL

  For example, an $INTERVAL of 0.5 prints the output every half a second:

      $ scx_bpfland --monitor 0.5

Some schedulers may implement different or multiple monitoring options. Refer to --help of each scheduler for details. Most schedulers also accept --stats $INTERVAL to print the statistics directly from the scheduling instance.
scx_bpfland
    $ scx_bpfland --monitor 5
    [scx_bpfland] tasks -> run: 3/4 int: 2 wait: 3 | nvcsw: 3 | dispatch -> dir: 0 prio: 73 shr: 9
    [scx_bpfland] tasks -> run: 4/4 int: 2 wait: 2 | nvcsw: 3 | dispatch -> dir: 1 prio: 3498 shr: 1385
    [scx_bpfland] tasks -> run: 4/4 int: 2 wait: 2 | nvcsw: 3 | dispatch -> dir: 1 prio: 2492 shr: 1311
    [scx_bpfland] tasks -> run: 4/4 int: 2 wait: 3 | nvcsw: 3 | dispatch -> dir: 2 prio: 3270 shr: 1748

scx_rusty
    $ scx_rusty --monitor 5
    ###### Thu, 29 Aug 2024 14:42:37 +0200, load balance @ -265.1ms ######
    cpu= 0.00 load= 0.17 mig=0 task_err=0 lb_data_err=0 time_used= 0.0ms
    tot= 15 sync_prev_idle= 0.00 wsync= 0.00
    prev_idle= 0.00 greedy_idle= 0.00 pin= 0.00
    dir= 0.00 dir_greedy= 0.00 dir_greedy_far= 0.00
    dsq=100.00 greedy_local= 0.00 greedy_xnuma= 0.00
    kick_greedy= 0.00 rep= 0.00
    dl_clamp=33.33 dl_preset=93.33
    slice=20000us
    direct_greedy_cpus=f
    kick_greedy_cpus=f
    NODE[00] load= 0.17 imbal= +0.00 delta= +0.00
    DOM[00] load= 0.17 imbal= +0.00 delta= +0.00

scx_lavd
    $ scx_lavd --monitor 5
    | 12 | 1292 | 3 | 1 | 8510 | 37.6028 | 2.42068 | 99.1304 | 100 | 62.8907 | 100 | 100 | 62.8907 | performance | 100 | 0 | 0 |
    | 13 | 2208 | 3 | 1 | 6142 | 33.3442 | 2.39336 | 98.7626 | 100 | 60.2084 | 100 | 100 | 60.2084 | performance | 100 | 0 | 0 |
    | 14 | 941 | 3 | 1 | 5223 | 31.323 | 1.704 | 99.215 | 100.019 | 59.1614 | 100 | 100.019 | 59.1614 | performance | 100 | 0 | 0 |

scx_rustland
    $ scx_rustland --monitor 5
    [RustLand] tasks -> r: 1/4 w: 3 /3 | pf: 0 | dispatch -> u: 4 k: 0 c: 0 b: 0 f: 0 | cg: 0
    [RustLand] tasks -> r: 1/4 w: 2 /2 | pf: 0 | dispatch -> u: 28385 k: 0 c: 0 b: 0 f: 0 | cg: 0
    [RustLand] tasks -> r: 0/4 w: 4 /0 | pf: 0 | dispatch -> u: 25288 k: 0 c: 0 b: 0 f: 0 | cg: 0
    [RustLand] tasks -> r: 0/4 w: 2 /0 | pf: 0 | dispatch -> u: 30580 k: 0 c: 0 b: 0 f: 0 | cg: 0
    [RustLand] tasks -> r: 0/4 w: 2 /0 | pf: 0 | dispatch -> u: 30824 k: 0 c: 0 b: 0 f: 0 | cg: 0
    [RustLand] tasks -> r: 1/4 w: 1 /1 | pf: 0 | dispatch -> u: 33178 k: 0 c: 0 b: 0 f: 0 | cg: 0

See: services
sched_ext has been fully upstreamed as of kernel 6.12.
A list of the breaking changes in the sched_ext kernel tree and the associated commits for the schedulers in this repo.
Want to learn how to develop a scheduler or find some useful tools for working with schedulers? See the developer guide for more details.
- scx_horoscope - An astrological CPU scheduler that makes scheduling decisions based on real-time planetary positions and zodiac signs. Tasks get boosted or penalized depending on cosmic conditions. Built for educational and entertainment purposes.
We aim to build a friendly and approachable community around sched_ext. You can reach us through the following channels:
- GitHub: https://github.com/sched-ext/scx
- Discord: https://discord.gg/b2J8DrWa7t
- Mailing List: sched-ext@lists.linux.dev (for kernel development)
We also hold weekly office hours every Tuesday. Please see the #office-hours channel on Discord for details.
There are articles and videos about sched_ext that can help you explore it in various ways. The following are some examples:
- 2025 Linux Plumbers Conference MC
- 2024 Linux Plumbers Conference MC
- Sched_ext YT playlist
- LWN: The extensible scheduler class (February, 2023)
- arighi's blog: Implement your own kernel CPU scheduler in Ubuntu with sched_ext (July, 2023)
- David Vernet's talk: Kernel Recipes 2023 - sched_ext: pluggable scheduling in the Linux kernel (September, 2023)
- Changwoo's blog: sched_ext: a BPF-extensible scheduler class (Part 1) (December, 2023)
- arighi's blog: Getting started with sched_ext development (April, 2024)
- Changwoo's blog: sched_ext: scheduler architecture and interfaces (Part 2) (June, 2024)
- arighi's YT channel: scx_bpfland Linux scheduler demo: topology awareness (August, 2024)
- David Vernet's talk: Kernel Recipes 2024 - Scheduling with superpowers: Using sched_ext to get big perf gains (September, 2024)
- arighi's talk: Kernel Recipes 2025 - Schedule Recipes (September, 2025)