6,075 questions
3 votes
1 answer
190 views
Why does sequential array access have a high cache miss rate?
I have the following C code that I am testing to understand perf and caching. It sequentially accesses an array of doubles. // test.c #include <stdio.h> #include <stdlib.h> #include <...
2 votes
1 answer
77 views
Finding out why a program is slow at processing files from a network share using Valgrind
I have an open-source C/C++ program on Linux amd64 that processes a PDF input file. I did not write it myself, so I'm not familiar with its code. Processing a PDF file read from local disk ...
1 vote
1 answer
78 views
Perf callgraph output doesn't look as I would expect for a test program with a delay loop that should take nearly 100% of the time
I'm experimenting with perf record --control to profile select sections of a program. Here's a Rust program that uses perf to profile the call to a function waste_time(): use libc; use log::info; use ...
1 vote
0 answers
160 views
Why does “Command Buffer Full” appear in PyTorch CUDA kernel launches?
I’m using the PyTorch profiler to analyze sglang, and I noticed that in the CUDA timeline, some kernels show “Command Buffer Full”. This causes the cudaLaunchKernel time to become very long, as shown ...
0 votes
0 answers
35 views
Scalene in WSL: no web UI
I'm trying to profile a Python FastAPI application (which uses LangGraph) using Scalene on Windows. Since Scalene's Windows version doesn't support multithreading, I'm running it in WSL instead. When ...
2 votes
0 answers
61 views
Recovering a perf.data file with size field 0 after perf report terminated improperly
I had a multi-process application to profile using perf with the following command: sudo perf record -a -g -F 99 -e cycles:u -- sleep 50000 & The sleep time is over 13 hours. The program should ...
1 vote
0 answers
85 views
Fastest way to convert a float array to a string in Python
This question came up while I was saving a large number of model-inferred embeddings to plain text. To do so, I needed to convert lists of float embeddings into strings, and I found this conversion to ...
1 vote
0 answers
34 views
Tracking Per-Channel Memory Traffic on AMD Zen 2 (Rome)
I am using perf to profile workloads on my system, and I need to track the memory traffic generated by my workload on each NUMA node. Currently, I only have perf results for LLC cache misses, which ...
1 vote
1 answer
44 views
How to create an HTML file with a link that automatically opens chrome://tracing with a particular JSON file?
I have a JSON file that contains profiling data that can be opened with Chrome's trace-viewer. I can do it manually by opening chrome://tracing, then selecting 'load' and then loading the JSON file. ...
0 votes
0 answers
105 views
RISC-V vs C Code Comparison for Simple Multiply and Accumulate (MAC) Operation
We tried profiling a simple MAC operation using both RISC-V Vector (RVV) intrinsics and plain C code. Surprisingly, the C version performs better, even though the intrinsics code processes 16 ...
0 votes
0 answers
99 views
How to filter by module in heaptrack_gui so that the whole GUI shows only my module?
I have only just started using heaptrack and cannot set up filtering by module. It is possible to do from the GUI like this: Heap track, but the output is very noisy and this filter does not affect the other tabs. Does there exist ...
1 vote
1 answer
62 views
CPU Sampling/Profiling of Helidon app in VisualVM
I have a Helidon app and would like to take CPU samples and/or start a CPU profiler. This does not work. With the same setup, it works for a simple (non-Helidon) app. Trying to start the CPU (and also ...
1 vote
1 answer
31 views
jax.numpy profiling: time spent in "ufunc_api.py:173(__call__)"
I am analyzing my NumPy/Python code by running it with "-m cProfile". Snakeviz shows the entry with the most time spent to be 20895038 calls to ufunc_api.py:173(__call__) with the majority of the ...
1 vote
0 answers
148 views
What's the `perf stat` equivalent for MacOS?
On Linux, I often find myself using perf stat to figure out whether a code change improved things like cache miss rate. (I'm specifically interested in cache miss rates and page faults.) Now I'm ...
1 vote
1 answer
170 views
Why is EmojiCompat consuming significant retained memory in my Flutter Android app without explicit usage?
I'm developing a Flutter application that doesn't utilize emojis in any part of the UI or logic. However, upon profiling the app using Android Studio's Memory Profiler, I observed that androidx.emoji2....