Finding a VS Code Memory Leak

In 2021 I found a huge memory leak in VS Code, totaling around 64 GB when I first saw it, but with no actual limit on how high it could go. I found this leak despite two obstacles that should have made the discovery impossible:

  1. The memory leak didn’t show up in Task Manager – there was no process whose memory consumption was increasing.
  2. I had never used VS Code. In fact, I have still never used it.

So how did this work? How did I find an invisible memory leak in a tool that I have never used?

This was during lockdown and my whole team was working from home. In order to maintain connections between teammates, and to continue transferring knowledge from senior developers to junior developers, we were doing regular pair-programming sessions. I was watching a coworker use VS Code for… I don’t remember what… and I noticed something strange.

So many of my blog posts start this way. “This doesn’t look right”, or “huh – that’s weird”, or some variation on that theme. In this case I noticed that the process IDs on her system had seven digits.

That was it. And as soon as I saw that I knew that there was a process-handle leak on her system and I was pretty sure that I would find it. Honestly, the rest of this story is pretty boring because it was so easy.

You see, Windows process IDs are just numbers. For obscure technical reasons they are always multiples of four. When a process goes away its ID is eligible for reuse immediately. Even if there is a delay before the process ID (PID) is reused there is no reason for the highest PID to be much more than four times the maximum number of processes that were running at one time. If we assume a system with 2,000 processes running (according to pslist my system currently has 261) then PIDs should be four decimal digits. Five decimal digits would be peculiar. But seven decimal digits? That implies at least a quarter-million processes. The PIDs I was seeing on her system were mostly around four million, which implies a million processes. Nope. I do not believe that there were that many processes.
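(If you want to sanity-check your own machine, here is a little sketch – not the method I actually used, just an illustration built on EnumProcesses – that prints the highest PID currently handed out and the implied process count:)

// Sketch: report how many processes are visible and the highest PID in use.
// Assumes fewer than 4,096 processes; link with psapi.lib on older SDKs.
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main() {
  DWORD pids[4096];
  DWORD bytes_returned = 0;
  if (!EnumProcesses(pids, sizeof(pids), &bytes_returned))
    return 1;
  DWORD count = bytes_returned / sizeof(DWORD);
  DWORD max_pid = 0;
  for (DWORD i = 0; i < count; ++i) {
    if (pids[i] > max_pid)
      max_pid = pids[i];
  }
  printf("%lu processes visible, highest PID is %lu.\n", count, max_pid);
  // PIDs are multiples of four, so this gives a rough lower bound on how many
  // process IDs have been needed at one time.
  printf("That PID implies roughly %lu processes (or zombies) at some point.\n",
         max_pid / 4);
  return 0;
}

On a healthy system that second number should normally be in the hundreds or low thousands, not the hundreds of thousands.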

It turns out that “when a process goes away its ID is eligible for reuse” is not quite right. If somebody still has a handle to that process then its PID will be retained by the OS. Forever. So it was quite obvious what was happening. Somebody was getting a handle to processes and then wasn’t closing them. It was a handle leak.
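(Here is a tiny sketch – hypothetical code, nothing to do with VS Code itself – that shows the mechanics: the child process exits, but because we keep its handle open the OS keeps the process object, and its PID, around as a zombie:)

// Sketch: holding a process handle after the process exits creates a zombie.
#include <windows.h>
#include <stdio.h>

int main() {
  STARTUPINFOW si = { sizeof(si) };
  PROCESS_INFORMATION pi = {};
  wchar_t cmd[] = L"cmd.exe /c exit";
  if (!CreateProcessW(nullptr, cmd, nullptr, nullptr, FALSE, CREATE_NO_WINDOW,
                      nullptr, nullptr, &si, &pi))
    return 1;
  CloseHandle(pi.hThread);
  WaitForSingleObject(pi.hProcess, INFINITE);  // The child has now exited.
  printf("Child PID %lu has exited, but we still hold its process handle.\n",
         pi.dwProcessId);
  // Until CloseHandle(pi.hProcess) is called the OS keeps the process object
  // (and its PID) alive - that is a zombie process. Multiply by thousands of
  // missing CloseHandle calls and you get seven-digit PIDs and wasted memory.
  Sleep(60000);  // Plenty of time to go look at the zombie in Process Explorer.
  CloseHandle(pi.hProcess);
  return 0;
}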

The first time I dealt with a process-handle leak it was a complicated investigation, as I learned the necessary techniques. That time I only realized that it was a handle leak through pure luck. Since then I’ve shipped tools to find process-handle and thread-handle leaks, and have documented the techniques to investigate handle leaks of all kinds. Therefore this time I just followed my own recipe. Task Manager showed me which process was leaking handles:

And an ETW trace gave me a call stack for the leaking code within the hour (this image was stolen from the GitHub issue):

The bug was pretty straightforward. A call to OpenProcess was made, and there was no corresponding call to CloseHandle. And because of this a boundless amount of memory – roughly 64 KiB for each missing CloseHandle call – was leaked. A tiny mistake, with consequences that could easily consume all of the memory on a high-end machine.

This is the buggy code (yay open source!):

void GetProcessMemoryUsage(ProcessInfo process_info[1024], uint32_t* process_count) {
  DWORD pid = process_info[*process_count].pid;
  HANDLE hProcess;
  PROCESS_MEMORY_COUNTERS pmc;
  hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, false, pid);
  if (hProcess == NULL) {
    return;
  }
  if (GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc))) {
    process_info[*process_count].memory = (DWORD)pmc.WorkingSetSize;
  }
}

And this is the code with the fix – the final line, the call to CloseHandle, is what was added to fix the leak:

void GetProcessMemoryUsage(ProcessInfo& process_info) {
  DWORD pid = process_info.pid;
  HANDLE hProcess;
  PROCESS_MEMORY_COUNTERS pmc;
  hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, false, pid);
  if (hProcess == NULL) {
    return;
  }
  if (GetProcessMemoryInfo(hProcess, &pmc, sizeof(pmc))) {
    process_info.memory = (DWORD)pmc.WorkingSetSize;
  }
  CloseHandle(hProcess);
}

That’s it. One missing line of code is all that it takes to waste tens of GB of memory.

The bug was found back when I still used Twitter so I reported my findings there (broken link, cached copy found in the Wayback Machine) and somebody else then filed a GitHub issue based on my report. I stopped using Twitter a couple of years later, and then my account got banned (due to not being used?) and then deleted, so now that bug report, along with everything else I ever posted, is gone. That’s pretty sad actually. Yet another reason for me to dislike the owner of Twitter.

The bug was fixed within a few days of the report. Maybe The Great Software Quality Collapse hadn’t quite started then. Or maybe I got lucky.

Anyway, if you don’t want me posting embarrassing stories about your software on my blog or on bsky then be sure to leave the Handles column open in Task Manager and pay attention if you ever see it getting too high in a process that you are responsible for.
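(If you would rather check from code than from Task Manager, GetProcessHandleCount reports the same number. Here is a minimal sketch of the sort of sanity check a test harness could make – the 10,000 threshold is just an arbitrary example:)

// Sketch: flag a probable handle leak in the current process.
#include <windows.h>
#include <stdio.h>

bool HandleCountLooksSane(DWORD limit) {
  DWORD handle_count = 0;
  if (!GetProcessHandleCount(GetCurrentProcess(), &handle_count))
    return true;  // Couldn't tell, so don't fail the check.
  if (handle_count > limit) {
    printf("Handle count is %lu - probable handle leak!\n", handle_count);
    return false;
  }
  return true;
}

int main() {
  // Generous for most well-behaved programs when operating correctly.
  return HandleCountLooksSane(10000) ? 0 : 1;
}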

Sometimes I think it would be nice to have limits on resources in order to more automatically find mistakes like this. If processes were automatically crashed (with crash dumps) whenever memory or handles exceeded some limit then bugs like this would be found during testing. The limits could be set higher for software that needs it, but 10,000 handles and 4 GiB RAM would be more than enough for most software when operating correctly. The tradeoff would be more crashes in the short term but fewer leaks in the long term. I doubt it will ever happen, but if this mode existed as a per-machine opt-in then I would enable it.
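(Windows job objects can already do part of this today, at least for memory. Here is a rough sketch – the limit value is made up and notepad.exe is just a stand-in for the process you care about – that launches a child with a hard per-process memory cap. As far as I know there is no equivalent job-object limit for handle counts:)

// Sketch: run a child process under a job object with a 4 GiB memory limit.
// Assumes a 64-bit build.
#include <windows.h>
#include <stdio.h>

int main() {
  HANDLE job = CreateJobObjectW(nullptr, nullptr);
  if (!job)
    return 1;

  JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits = {};
  limits.BasicLimitInformation.LimitFlags =
      JOB_OBJECT_LIMIT_PROCESS_MEMORY | JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
  limits.ProcessMemoryLimit = 4ULL * 1024 * 1024 * 1024;  // 4 GiB, for example.
  SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                          &limits, sizeof(limits));

  STARTUPINFOW si = { sizeof(si) };
  PROCESS_INFORMATION pi = {};
  wchar_t cmd[] = L"notepad.exe";  // Stand-in for the process to constrain.
  if (CreateProcessW(nullptr, cmd, nullptr, nullptr, FALSE, CREATE_SUSPENDED,
                     nullptr, nullptr, &si, &pi)) {
    AssignProcessToJobObject(job, pi.hProcess);
    ResumeThread(pi.hThread);
    CloseHandle(pi.hThread);
    WaitForSingleObject(pi.hProcess, INFINITE);  // Wait for the child to exit.
    CloseHandle(pi.hProcess);  // Practicing what this post preaches.
  }
  CloseHandle(job);  // Kills anything left in the job, thanks to the flag above.
  return 0;
}

Exceeding ProcessMemoryLimit makes allocations in the child fail rather than crashing it with a dump, so this is only an approximation of the crash-on-limit mode described above.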


About brucedawson

I'm a retired programmer, ex-Google/Valve/Microsoft/etc., who was focused on optimization and reliability. Nothing's more fun than making code run 10x as fast. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And sled hockey. And juggle. And worry about whether this blog should have been called randomutf-8. 2010s in review tells more: https://twitter.com/BruceDawson0xB/status/1212101533015298048

17 Responses to Finding a VS Code Memory Leak

  1. joyful255a8a71aa says:

    This is an identical problem I ran into: a process had opened a handle to a subprocess to collect some information and had one code path that failed to close the handle. (Which is why we should all start using RAII objects in C++.) This went out in a commercial product! Instead of developing a tool like you :-), I used Sysinternals Process Explorer to find the dangling handles. While the tool doesn’t pinpoint where the handle leaks from, knowing the code, it was pretty straightforward to home in on it.

    • joyful255a8a71aa says:

      Oh, and a one-line fix solved the problem too 😉

    • Jon says:

      +1 to RAII, that naked OpenProcess() call is like a naked “new”. I’d hope that static analysis could’ve picked this up.

      Now that we have AI, I asked ChatGPT to review the code:

      Process handle not closed

      You call OpenProcess, but never call CloseHandle(hProcess).
      That leaks a handle every time the function is called.

      (It didn’t like the *process_count dereference without bounds checking either)

      • The indexing into the process_info array was indeed weird, as was the declaration of the parameter as having 1024 entries (meaningless!) since it suggests a misunderstanding of the contract.

        And yeah, a naked OpenProcess does feel just as bad as a naked “new” which is a dirty feeling.

        The new code is much better, although some RAII would make it even better.
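        For reference, a minimal sketch of the sort of RAII wrapper being discussed (not the code that actually shipped) might look like this:

        // Sketch: a tiny RAII wrapper so the handle is closed on every return path.
        #include <windows.h>
        #include <psapi.h>

        struct ProcessInfo { DWORD pid; DWORD memory; };  // Same shape as in the post.

        class ScopedHandle {
         public:
          explicit ScopedHandle(HANDLE h) : h_(h) {}
          ~ScopedHandle() { if (h_) CloseHandle(h_); }
          ScopedHandle(const ScopedHandle&) = delete;
          ScopedHandle& operator=(const ScopedHandle&) = delete;
          HANDLE get() const { return h_; }
          explicit operator bool() const { return h_ != nullptr; }
         private:
          HANDLE h_;
        };

        void GetProcessMemoryUsage(ProcessInfo& process_info) {
          ScopedHandle process(OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                                           FALSE, process_info.pid));
          if (!process)
            return;
          PROCESS_MEMORY_COUNTERS pmc;
          if (GetProcessMemoryInfo(process.get(), &pmc, sizeof(pmc)))
            process_info.memory = (DWORD)pmc.WorkingSetSize;
          // No CloseHandle needed - the destructor handles it on every path.
        }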

  2. Andrija says:

    Good to see another blog post! FWIW, one of the Web archive versions has the Twitter thread, even some pictures are there: https://web.archive.org/web/20220506224017/https://twitter.com/BruceDawson0xB/status/1447668569626476548

  3. crafty348bba429f says:

    Thanks Bruce, I love your posts! And you made me curious. I rebooted my machine and went straight into proc exp. Highest PID is 2242 and 260 processes are running.

    So I suppose something is leaking handles, a lot. And I haven’t even started VS Code yet. 😜

    • Maybe there were briefly about 550 processes, or maybe there are slight delays in reusing PIDs. You could run the FindZombieHandles tool to find out for sure:

      https://github.com/randomascii/blogstuff/tree/main/FindZombieHandles

      My system shows 261 total zombie processes, with half of those held by HPPrintScanDoctorService.exe. The leaking of handles is slow enough to not be tragic – it’s only wasting about 16 MB, which is “cheap enough” now.

      • crafty348bba429f says:

        Sorry Bruce, there was a typo and because I could not post the entire proc exp image here, I just mistyped the number. The highest PID was 22424, which indicated about 5600 processes, right after the boot.

        Will run FindZombieHandles right now.

        • crafty348bba429f says:

          I did run it. Nothing explains this case:

          23 total zombie processes.
          14 total zombie threads.
          11 zombies held by HPPrintScanDoctorService.exe(5080)
          11 zombies of HPSUPD-Win32Exe.exe – process handle count: 11 – thread handle count: 11
          2 zombies held by WMIRegistrationService.exe(5636)
          2 zombies of mofcomp.exe – process handle count: 2 – thread handle count: 0
          1 zombie held by com.docker.backend.exe(21500)
          1 zombie of wsl.exe – process handle count: 1 – thread handle count: 0
          1 zombie held by devenv.exe(8620)
          1 zombie of PerfWatson2.exe – process handle count: 1 – thread handle count: 1
          1 zombie held by vmcompute.exe(4592)
          1 zombie of vmwp.exe – process handle count: 1 – thread handle count: 0
          1 zombie held by NVDisplay.Container.exe(2728)
          1 zombie of dbInstaller.exe – process handle count: 1 – thread handle count: 1
          1 zombie held by svchost.exe(2580)
          1 zombie of userinit.exe – process handle count: 1 – thread handle count: 0

          @Bruce do you have a consolidated approach to identify the abundance of PIDs during boot?

          Is it running Windows Performance Recorder through a boot cycle and then using some Randomascii view in WPA? I clicked through the links in your post but was not sure what the latest “how I’ve done it and it worked” actually was.

          • It may be that there is no problem, which is why you can’t find it. A PID of 22424 implies that there _were_ about 5,600 processes or zombies at some point. Which is odd. But maybe they’ve all gone away – maybe the program that was holding on to all the process handles has disappeared. You could find out by writing code that creates a few hundred processes at the same time to see what sort of PID range they get.
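            Something like this rough sketch (untested, just to show the idea) would do it:

            // Sketch: create a couple hundred suspended processes at once, report
            // the range of PIDs they get, then clean them all up.
            #include <windows.h>
            #include <stdio.h>

            int main() {
              const int kCount = 200;
              PROCESS_INFORMATION procs[kCount] = {};
              DWORD min_pid = 0xFFFFFFFF, max_pid = 0;
              int created = 0;
              for (int i = 0; i < kCount; ++i) {
                STARTUPINFOW si = { sizeof(si) };
                wchar_t cmd[] = L"cmd.exe /c exit";
                if (!CreateProcessW(nullptr, cmd, nullptr, nullptr, FALSE,
                                    CREATE_SUSPENDED | CREATE_NO_WINDOW,
                                    nullptr, nullptr, &si, &procs[created]))
                  continue;
                DWORD pid = procs[created].dwProcessId;
                if (pid < min_pid) min_pid = pid;
                if (pid > max_pid) max_pid = pid;
                ++created;
              }
              printf("Created %d processes, PIDs ranged from %lu to %lu.\n",
                     created, min_pid, max_pid);
              for (int i = 0; i < created; ++i) {
                TerminateProcess(procs[i].hProcess, 0);
                CloseHandle(procs[i].hThread);
                CloseHandle(procs[i].hProcess);  // No zombies left behind.
              }
              return 0;
            }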

            • crafty348bba429f says:

              By chance I was looking at the handle count of the System process and it is leaking handles. After about an hour I am already at 6,715 handles and it keeps increasing. It does drop sometimes, but I will keep an eye on it and see if it is slowly rising.

  4. I have an idea: job objects. Create a Windows job object, set some limits on it, then create a new process within that job object.

    I’m using this with Visual Studio (not VS Code). Because I’m doing a lot of C++ constexpr programming, sometimes the C++ compiler starts eating all my RAM (and swap) and, as a side effect, other apps on my computer might suffer or crash. Yeah, I’m poor and I have little RAM. Also, after exiting VS, some processes might still be lurking around, blocking folder renames and such.

    Not with job objects! When I make a mistake and the compiler starts to eat all my RAM, it hits an artificial wall and dies. The rest of my computer survives just fine. When I’m done for a while I can exit the IDE, and I can then instruct the job object to kill the remaining processes that survived longer than they should. Typically this is some PDB server writer and telemetry apps.

    I’m using the github / lowleveldesign / process-governor app for this.

    Marek Knápek.

  5. Cory says:

    I followed your advice and added the Handles column to Task Manager and the biggest offender on my machine right now is OUTLOOK.EXE at 14,537 handles!

    I wish there was an online course that taught this stuff and WinDbg without having extensive knowledge of assembly and Windows internals as a prerequisite.

  6. ulfi says:

    Another +1 to RAII here. Not only does it help release resources, it also communicates intent.

    For example here it sort of looks like *buffer is never released: https://github.com/microsoft/vscode-windows-process-tree/blob/bc0ee891ca3df19dad46b023e3bb1266dfd1a205/src/process_commandline.cc#L51C11-L70C12

    Is that true? I don’t know and I don’t really have time to investigate. It was just the first line of the first file I looked at after clicking your link. I’ve spent more time on this comment. With RAII it would be obvious and no more thought needed.

