I have a systemd service that occasionally needs to run an external tool in a sub-process.
Because the tool is sometimes memory intensive, the service runs the tool in a dedicated cgroup using a cgexec shell command:

cgexec --sticky -g memory:<cgroup> <command> 

Previously, this service was running on Ubuntu 20.04 which uses cgroups v1. I'm now migrating this service to Ubuntu 24.04 which uses cgroups v2.

The above method no longer seems to work. Here is the error message I commonly get:

cgroup change of group failed 

I'll get into possible solutions from previous answers next, but my main question is this: Is there a canonical way for a systemd service to spawn a sub-process that uses an alternate cgroup using cgroups v2?

From what I understand from previous discussions, my problem may stem from the permissions on the cgroup.procs file in the common ancestor of the service's cgroup and the cgroup targeted by cgexec's -g option. In my case, I believe this would be the cgroup.procs of the root cgroup.

The top answer from discussion #1 ("Using cgroups v2 without root") suggests using cgcreate to create a group under the correct slice. Their example uses user.slice. In my case it would be something like this:

cgcreate -a service-user:service-user -t service-user:service-user -g memory:system.slice/foo.service/tool 

This doesn't seem to work because the cgroups created under system.slice seem to be ephemeral, being removed when the service stops.

The top answer from discussion #2 suggests making the root cgroup.procs other-writable (e.g., sudo chmod o+w /sys/fs/cgroup/cgroup.procs). This does actually seem to work, but there was an open question about security.

1 Answer

After digging through additional documentation, I found a solution.

"Delegation" is the tool that cgroups and systemd provide for this type of use case. With delegation a service is allowed to manage its own cgroup sub-hierarchy. This includes the ability to create additional "helper" cgroups. Below is an example of the situation described in the original question followed by a solution that uses delegation.

Error example

Let's say we have this foo service:

    # foo.service
    [Unit]
    Description=Foo without delegation

    [Service]
    User=foo-user
    ExecStart=/bin/bash /usr/bin/foo.sh

    [Install]
    WantedBy=multi-user.target

With /usr/bin/foo.sh defined as:

    #!/usr/bin/bash
    #
    # /usr/bin/foo.sh

    set -euo pipefail

    echo "Starting foo service"

    while true; do
        echo "Calling tool in helper cgroup"
        cgexec --sticky -g memory:helper bash /usr/bin/memory-intensive-helper.sh
    done

Note that we'd like to execute memory-intensive-helper.sh in a "helper" cgroup we've naively created using cgcreate:

    sudo cgcreate -a foo-user:foo-user -t foo-user:foo-user -g memory:helper
    sudo bash -c 'echo $((512 * 1024 * 1024)) > /sys/fs/cgroup/helper/memory.max'

If we start the foo.service:

sudo systemctl restart foo.service 

And watch logs:

journalctl -f -u foo.service 

We get the familiar "cgroup change of group failed" error:

    systemd[1]: Started foo.service - Foo without delegation.
    bash[15202]: Starting foo service
    bash[15202]: Calling tool in helper cgroup
    bash[15203]: cgroup change of group failed
    systemd[1]: foo.service: Main process exited, code=exited, status=87/n/a
    systemd[1]: foo.service: Failed with result 'exit-code'.

The reason for the error is explained by cgroup v2's "containment" behavior (see "Delegation and Containment"). Notice that our cgroup hierarchy looks like this:

    /sys/fs/cgroup
    |
    +-- helper
    |
    +-- system.slice
        |
        +-- foo.service

Systemd runs the foo.sh process in the cgroup system.slice/foo.service. foo.sh then tries to run memory-intensive-helper.sh in the helper cgroup. To do that, the memory-intensive-helper.sh process would need to be moved from system.slice/foo.service to helper. For this move, cgroups v2 requires foo-user to have write access to the cgroup.procs file of the cgroup that is the common ancestor of system.slice/foo.service and helper. That would be the root cgroup in this case. By default only root has write access to the root cgroup, so the move fails. (Granting non-root users access to the root cgroup is discouraged.)
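To make the "common ancestor" rule concrete, here is a minimal sketch that computes the nearest common ancestor of two cgroup paths given relative to /sys/fs/cgroup. The function name common_ancestor is my own (hypothetical) helper and is not part of libcgroup or systemd; it only does string comparison and does not touch the cgroup filesystem.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the nearest common ancestor of two cgroup
# paths, both given relative to /sys/fs/cgroup. "/" means the root cgroup.
common_ancestor() {
    local -a a b
    local i out=""
    # Split both paths into their components.
    IFS=/ read -r -a a <<< "${1#/}"
    IFS=/ read -r -a b <<< "${2#/}"
    # Keep leading components for as long as they match.
    for ((i = 0; i < ${#a[@]} && i < ${#b[@]}; i++)); do
        [[ "${a[i]}" == "${b[i]}" ]] || break
        out+="/${a[i]}"
    done
    printf '%s\n' "${out:-/}"
}

# The two cgroups from the error example share only the root cgroup:
common_ancestor system.slice/foo.service helper
```

For the error example this prints "/", i.e. the root cgroup. On the affected machine, something like stat -c '%U %A' /sys/fs/cgroup/cgroup.procs should then show whether foo-user could write to that file.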

A solution using delegation

In the error example, foo.sh's cgexec command failed because foo-user did not have write access to the relevant part of the cgroup hierarchy. With the Delegate and DelegateSubgroup options we can create a sub-hierarchy that the foo service (running as foo-user) can completely control itself:

    # foo.service
    [Unit]
    Description=Foo with delegation

    [Service]
    User=foo-user
    Delegate=yes
    DelegateSubgroup=main
    ExecStart=/bin/bash /usr/bin/foo.sh

    [Install]
    WantedBy=multi-user.target

When we start the foo service with those additional options, the resulting cgroup hierarchy will look like this:

    /sys/fs/cgroup
    |
    +-- system.slice
        |
        +-- foo.service      <-- owned by foo-user
            |
            +-- main         <-- owned by foo-user
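One way to confirm from inside the service that it really landed in the main subgroup is to read /proc/self/cgroup, where the cgroup v2 entry has the form "0::<path>". The sketch below assumes that format; cgroup2_path is a hypothetical helper name of mine, not an existing tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the cgroup v2 path recorded in a
# /proc/<pid>/cgroup style file. Defaults to the calling process.
cgroup2_path() {
    local file="${1:-/proc/self/cgroup}"
    # The v2 entry is the line starting with "0::"; strip that prefix.
    sed -n 's/^0:://p' "$file"
}
```

Called without arguments near the top of foo.sh, I would expect this to print /system.slice/foo.service/main when the delegation options above are in effect.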

We can then have foo.sh create any cgroups that it may need itself:

    #!/usr/bin/bash
    #
    # /usr/bin/foo.sh

    set -euo pipefail

    echo "Starting foo service shell"

    delegated_subtree_root="/sys/fs/cgroup/system.slice/foo.service"

    # Enable memory controllers in child cgroups. According to the docs [1]:
    #
    # > Resources are distributed top-down and a cgroup can further distribute a
    #   resource only if the resource has been distributed to it from the parent.
    #   This means that all non-root “cgroup.subtree_control” files can only contain
    #   controllers which are enabled in the parent’s “cgroup.subtree_control” file.
    #
    # [1]: https://docs.kernel.org/admin-guide/cgroup-v2.html#top-down-constraint
    echo "+memory" > "${delegated_subtree_root}/cgroup.subtree_control"

    # Create and configure the "helper" cgroup
    helper_cgroup_path="${delegated_subtree_root}/helper"
    mkdir -p "$helper_cgroup_path"
    echo $((512 * 1024 * 1024)) > "${helper_cgroup_path}/memory.max"

    while true; do
        echo "Calling tool in helper cgroup"
        cgexec --sticky -g memory:system.slice/foo.service/helper bash /usr/bin/memory-intensive-helper.sh
        sleep 2
    done
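If writing memory.max fails, a likely cause is that the memory controller was not actually enabled for the child cgroup. As a rough check, each cgroup has a cgroup.controllers file listing the controllers available to it, separated by spaces. The sketch below reads that file; cgroup_has_controller is a hypothetical helper of mine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given controller is listed in the
# cgroup.controllers file of the given cgroup directory.
cgroup_has_controller() {
    local cgroup_dir="$1" controller="$2"
    local -a available=()
    local c
    # cgroup.controllers is a single space-separated line (possibly empty).
    read -r -a available < "${cgroup_dir}/cgroup.controllers" || true
    for c in "${available[@]}"; do
        if [[ "$c" == "$controller" ]]; then
            return 0
        fi
    done
    return 1
}
```

In foo.sh this could be used right after the mkdir, for example: cgroup_has_controller "$helper_cgroup_path" memory || echo "memory controller not available in helper" >&2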

With foo.sh now creating its own helper cgroup, here is the final cgroup hierarchy:

    /sys/fs/cgroup
    |
    +-- system.slice
        |
        +-- foo.service      <-- owned by foo-user
            |
            +-- main         <-- owned by foo-user
            |
            +-- helper       <-- owned by foo-user

With foo-user in control of the entire /sys/fs/cgroup/system.slice/foo.service sub-tree, it's now free to move processes among the leaf cgroups (i.e., main, helper).
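At the file level, moving a process into a cgroup just means writing its PID to that cgroup's cgroup.procs file. So, as an alternative sketch that avoids the dependency on cgexec, the helper could be launched like this. run_in_cgroup is a hypothetical function of mine, and I have only reasoned about it for the delegated layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command inside the cgroup at the given
# directory, e.g. /sys/fs/cgroup/system.slice/foo.service/helper.
run_in_cgroup() {
    local cgroup_dir="$1"
    shift
    (
        # $BASHPID is the PID of this subshell. Writing it to cgroup.procs
        # asks the kernel to move the subshell into that cgroup.
        echo "$BASHPID" > "${cgroup_dir}/cgroup.procs"
        # exec replaces the subshell but keeps its PID, so the command
        # starts out already inside the target cgroup.
        exec "$@"
    )
}
```

In foo.sh the cgexec line would then become: run_in_cgroup "$helper_cgroup_path" bash /usr/bin/memory-intensive-helper.sh. The calling shell itself stays in main, because only the subshell is moved.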

Helpful documentation

The above solution elides some details. To fill in the gaps, I recommend the following key pieces of documentation:
