
I'm running into an issue with CPU throttling that only seems to trigger under a specific workload of running the Kythe indexer. Detailed repro steps at the end of the question. I'm going to give a high level summary here.

Kythe is a tool for extracting indexes from source code. I'm running Kythe under GNU Parallel for each compilation unit in LLVM (parallel will automatically run 32 processes).

The following workloads are able to max out all cores continuously for 10min+:

  • Clang compilation using Ninja. This workload is somewhat similar to indexing, as it should perform a similar number of input operations; Kythe uses Clang internally to index the code. CPU temperatures hover around 75C - 80C. One probably irrelevant difference from running Kythe is that Kythe can generate indexes of around 300MB ~ 2.5GB per compilation unit, so I'm running Kythe under a small Python wrapper which creates a temporary file, lets Kythe write to it, and then deletes the file.
  • GNU Parallel running a simple busy loop (incrementing a 1K vector element-wise for 5-15 seconds, which is approximately how long Kythe takes to index a compilation unit). Temperature is similar to above.
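The busy-loop workload can be sketched as follows (a minimal reconstruction for illustration; the filename `busy_loop.py` and the exact loop body are assumptions, not the script actually used):

```python
#!/usr/bin/env python3
# busy_loop.py (hypothetical) -- CPU-only workload: increment a 1K vector
# element-wise for a random 5-15 second interval, mimicking how long Kythe
# takes to index one compilation unit.
import random
from datetime import datetime, timedelta

def busy_loop(seconds):
    """Spin on a 1024-element vector until the deadline; return iteration count."""
    vec = list(range(1024))
    deadline = datetime.now() + timedelta(seconds=seconds)
    iterations = 0
    while datetime.now() < deadline:
        for i in range(len(vec)):
            vec[i] = (vec[i] + 1) % 1024
        iterations += 1
    return iterations

if __name__ == "__main__":
    duration = random.randint(5, 15)
    start = datetime.now()
    busy_loop(duration)
    print("busy for {} sec".format((datetime.now() - start).seconds))
```

Running 32 copies of this under `parallel` keeps every core pinned without doing any disk I/O, which is what makes it a useful control workload.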

However, running Kythe under GNU Parallel causes some kind of throttling: the CPUs get underclocked and work is not assigned. Using sudo cpupower frequency-set -g performance didn't help, so it seems like the Kythe processes are getting penalized/deprioritized after a while; the browser also sees a slowdown. The temperature drops to around 60C.

[Image: CPU throttling with Kythe]

In the above picture, the early part of the graph shows the GNU Parallel/busy loop workload. I then terminate those processes and start the GNU Parallel/Kythe workload. For some reason, the Kythe workload runs into throttling issues which neither Clang nor the busy loop workload run into. What could be causing this, and how do I debug further?
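One way to confirm the underclocking while a workload runs is to sample per-core frequencies from /proc/cpuinfo. Here is a minimal sketch (the script name `freq_watch.py` is hypothetical, and this assumes a Linux /proc/cpuinfo that reports "cpu MHz" lines):

```python
#!/usr/bin/env python3
# freq_watch.py (hypothetical) -- periodically sample per-core CPU
# frequencies so you can see whether cores are actually underclocked.
import re
import time

def parse_mhz(cpuinfo_text):
    """Extract all 'cpu MHz' values from /proc/cpuinfo contents."""
    return [float(m) for m in
            re.findall(r"^cpu MHz\s*:\s*([0-9.]+)", cpuinfo_text, re.M)]

if __name__ == "__main__":
    while True:
        with open("/proc/cpuinfo") as f:
            freqs = parse_mhz(f.read())
        if freqs:
            print("min={:.0f} MHz, max={:.0f} MHz".format(min(freqs), max(freqs)))
        time.sleep(2)
```

If the minimum frequency drops well below the base clock only during the Kythe run, that corroborates the throttling seen in the graph.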

Reproduction steps

  1. (Prep) Run the CMake command on the LLVM repo:

    git clone https://github.com/llvm/llvm-project.git --depth=1
    cd llvm-project/llvm
    CC=/usr/bin/clang-14 CXX=/usr/bin/clang++-14 cmake -B ../build -DCMAKE_BUILD_TYPE=Release -G Ninja -DLLVM_ENABLE_PROJECTS=clang

    This will prepare a ../build/compile_commands.json file.

  2. (Prep) Download the Kythe release (e.g. under $HOME/code) and run the extractor as described in the Kythe docs.

    wget https://github.com/kythe/kythe/releases/download/v0.0.60/kythe-v0.0.60.tar.gz -P $HOME/code
    tar xzf $HOME/code/kythe-v0.0.60.tar.gz -C $HOME/code
    cd ../build
    mkdir kythe-v0.0.60-output
    KYTHE_ROOT_DIRECTORY=$PWD KYTHE_OUTPUT_DIRECTORY=$PWD/kythe-v0.0.60-output/ KYTHE_CORPUS=my-llvm $HOME/code/kythe-v0.0.60/tools/runextractor compdb -extractor $HOME/code/kythe-v0.0.60/extractors/cxx_extractor
  3. (Actual workload) Run Kythe in parallel:

    #!/usr/bin/env python3
    # code/timing.py
    import sys
    import tempfile
    import subprocess
    import os
    from datetime import datetime

    input_file = sys.argv[1]
    _, output_file = tempfile.mkstemp(prefix="entries-")

    start = datetime.now()
    subprocess.run(["/home/varun/code/kythe-v0.0.60/indexers/cxx_indexer",
                    "--ignore_unimplemented", input_file, "-o", output_file])
    end = datetime.now()

    delta = end - start
    input_size = os.stat(input_file).st_size
    output_size = os.stat(output_file).st_size
    print("{} bytes to {} bytes in {} sec from {}".format(
        input_size, output_size, delta.seconds, input_file))
    os.remove(output_file)
    parallel ~/code/timing.py ::: kythe-v0.0.60-output/*.kzip | tee timings.txt 
  • Could you write tmp-files to a ram disk? Commented Nov 1, 2022 at 16:31

1 Answer


One probably irrelevant difference from running Kythe is that Kythe can generate indexes of around 300MB ~ 2.5GB per compilation unit, so I'm running Kythe under a small Python wrapper which creates a temporary file, lets Kythe write to it, and then deletes the file.

Turns out, this is very relevant. It is easy to check whether this is causing the problem: create a dummy script which mimics Kythe's high disk output and see if it triggers similar throttling. Here is an example script:

    #!/usr/bin/env python3
    import tempfile
    import os
    import random
    from datetime import timedelta
    from datetime import datetime

    output_fd, output_file = tempfile.mkstemp(prefix="entries-")
    size_100M = random.randint(5, 15)

    start = datetime.now()
    # Kythe can end up writing about 500MB - 1.5GB in a span of 5-15s.
    # We mimic that workload by writing 1M every 0.01s, and just wasting
    # some CPU if we're done writing early.
    dummyvec = list(range(1024))
    for i in range(size_100M * 100):
        iter_start = datetime.now()
        iter_end = iter_start + timedelta(milliseconds=10)
        os.write(output_fd, os.urandom(1024 * 1024))
        while datetime.now() < iter_end:
            # Waste CPU (use j here so we don't shadow the outer loop variable)
            for j in range(len(dummyvec)):
                dummyvec[j] = (dummyvec[j] + 1) % 1024
    end = datetime.now()

    delta = end - start
    output_size = os.stat(output_file).st_size
    print("Wrote {} bytes in {} sec".format(output_size, delta.seconds))
    os.remove(output_file)

This script can be run under parallel again (the .kzip arguments are ignored; they just give parallel one job per compilation unit):

parallel high_output.py ::: kythe-v0.0.60-output/*.kzip 

If you start seeing timings which exceed 15 seconds, you know for sure that there is throttling happening.

Sure enough, I started seeing times of 20-25 seconds after the parallel command had run for a little while.


It is important to look at other temperatures too, not just the CPU temperature. For example, you can use the lm-sensors package on Ubuntu to see temperatures across all the different sensors.

    # Re-run sensors from lm-sensors every 2 seconds
    watch -n 2 sensors

Turns out, the problem was with temperatures of the NVMe drive.

    nvme-pci-0400
    Adapter: PCI adapter
    Composite:    +55.9°C  (low = -273.1°C, high = +81.8°C)
                           (crit = +84.8°C)
    Sensor 1:     +55.9°C  (low = -273.1°C, high = +65261.8°C)
    Sensor 2:     +87.8°C  (low = -273.1°C, high = +65261.8°C)

In a full-speed run, Kythe can end up writing 100MB/s of output per compilation unit. That isn't so bad on its own, but with 32 processes it adds up to 3.2GB/s of output, which seemed to overwhelm the NVMe drive in my situation, leading to overheating. When the NVMe drive overheats, Linux seems to throttle all running processes (which explains the browser slowdown).
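Following the ram-disk suggestion from the comments, one possible mitigation is to point the temporary index file at a tmpfs mount such as /dev/shm, so the writes never touch the NVMe drive. This is a hedged sketch of a modified timing wrapper, not something tested against Kythe itself; it assumes each index (up to ~2.5GB) fits in RAM, and the script name `ram_output.py` is hypothetical:

```python
#!/usr/bin/env python3
# ram_output.py (hypothetical) -- variant of the timing wrapper that creates
# the temporary output file on tmpfs (/dev/shm on most Linux distros)
# instead of the NVMe drive.
import os
import subprocess
import sys
import tempfile

def run_indexer(indexer, input_file, tmp_dir="/dev/shm"):
    """Run the indexer with its output on tmp_dir; return the output size."""
    fd, output_file = tempfile.mkstemp(prefix="entries-", dir=tmp_dir)
    os.close(fd)
    try:
        subprocess.run([indexer, "--ignore_unimplemented",
                        input_file, "-o", output_file])
        return os.stat(output_file).st_size
    finally:
        os.remove(output_file)

if __name__ == "__main__":
    size = run_indexer(
        os.path.expanduser("~/code/kythe-v0.0.60/indexers/cxx_indexer"),
        sys.argv[1])
    print("{} bytes from {}".format(size, sys.argv[1]))
```

The trade-off is RAM pressure: 32 concurrent indexes of up to 2.5GB each could exceed available memory, so this likely needs a lower parallelism level.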

For Kythe specifically, based on this Google Groups thread, Kythe has a flag --experimental_dynamic_claim_cache which can potentially help reduce disk output by utilizing memcached under the hood.

  • iostat -dkx 1 is often good at seeing if disk IO is the bottleneck. Commented Nov 1, 2022 at 16:29
