Expose MLX memory management APIs #98
Merged
polvalente merged 4 commits into elixir-nx:main on Feb 22, 2026
Conversation
Add bindings for MLX's memory management functions:
- EMLX.memory_info/0 - returns active, peak, and cache memory usage
- EMLX.clear_cache/0 - releases unused GPU memory back to the system
- EMLX.reset_peak_memory/0 - resets the peak memory counter
- EMLX.set_memory_limit/1 - sets the memory limit guideline
- EMLX.set_cache_limit/1 - sets the cache size limit

Without clear_cache, repeated model inference causes GPU memory to grow unbounded, because MLX caches freed buffers. On a 24 GB Apple M5 running an 823M-parameter model, memory usage grew from 3 GB to 18 GB after just 4 batches, causing severe system-wide slowdowns and GPU swapping. Calling clear_cache + :erlang.garbage_collect() between batches keeps memory stable and inference throughput consistent.
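The pattern described above can be sketched as follows (illustrative only; `run_batch/1` and `batches` are hypothetical stand-ins for your model call and input data):

```elixir
# Hypothetical inference loop. After each batch, drop Elixir-side
# tensor references, then ask MLX to release its cached GPU buffers.
for batch <- batches do
  _result = run_batch(batch)

  :erlang.garbage_collect()
  EMLX.clear_cache()
end
```

Without the two calls at the end of each iteration, MLX keeps freed buffers in its cache and resident GPU memory grows with every batch.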
polvalente reviewed Feb 22, 2026
}

NIF(clear_cache) {
  mlx::core::clear_cache();
polvalente reviewed Feb 22, 2026
test/emlx/memory_test.exs Outdated
t = Nx.iota({1024, 1024}, type: :f32, backend: EMLX.Backend)
EMLX.eval(EMLX.Backend.from_nx(t))
after_alloc = EMLX.memory_info().active_memory
assert after_alloc > before
Collaborator
This assertion could be more strict since we know the tensor size
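A 1024×1024 f32 tensor occupies exactly 4_194_304 bytes, so the delta could be asserted directly. One way to tighten it (a sketch, not the reviewer's exact suggestion):

```elixir
t = Nx.iota({1024, 1024}, type: :f32, backend: EMLX.Backend)
EMLX.eval(EMLX.Backend.from_nx(t))
after_alloc = EMLX.memory_info().active_memory

# 1024 * 1024 elements * 4 bytes per f32 = 4_194_304 bytes
assert after_alloc >= before + 1024 * 1024 * 4
```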
EMLX.eval(EMLX.Backend.from_nx(t))
Nx.backend_deallocate(t)
EMLX.clear_cache()
info = EMLX.memory_info()
Collaborator
Do we have assertions on what info returns?
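For example, the test could pin down the shape of the returned map (a sketch, assuming the keys listed in the PR description):

```elixir
info = EMLX.memory_info()

# The map should expose non-negative byte counts under the documented keys.
assert is_integer(info.active_memory) and info.active_memory >= 0
assert is_integer(info.cache_memory) and info.cache_memory >= 0
# Peak is monotonic, so it can never be below current active usage.
assert info.peak_memory >= info.active_memory
```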
{EMLX.Backend, device: device}
end

@doc """
Collaborator
These docs could use some examples, even if they aren't doctests per se.
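One possible shape for such an example (illustrative; the exact return value is an assumption based on the PR description, with specifics elided):

```elixir
@doc """
Returns MLX memory usage in bytes.

## Examples

    EMLX.memory_info()
    #=> %{active_memory: ..., peak_memory: ..., cache_memory: ...}
"""
```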
polvalente approved these changes Feb 22, 2026
Adds bindings for MLX's memory management functions:
- EMLX.memory_info/0 — returns %{active_memory, peak_memory, cache_memory} in bytes
- EMLX.clear_cache/0 — releases unused GPU memory back to the system
- EMLX.reset_peak_memory/0 — resets the peak memory counter
- EMLX.set_memory_limit/1 — sets the memory limit guideline
- EMLX.set_cache_limit/1 — sets the cache size limit

Why this is needed
Without clear_cache, repeated model inference causes GPU memory to grow unbounded as MLX caches freed buffers. On a 24 GB Apple M5 running ai-forever/FRIDA (an 823M-parameter T5 encoder), memory grows from 3 GB to 18 GB after just 4 inference batches, causing severe system-wide slowdowns as the GPU starts swapping.

With EMLX.clear_cache() + :erlang.garbage_collect() between batches, memory usage stays stable.

All 2117 tests pass (2115 existing + 6 new memory tests).
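The two limit setters could be exercised similarly (a sketch; the byte values are arbitrary examples, and per the list above both functions take a size in bytes):

```elixir
# Cap MLX's buffer cache at 512 MiB so freed buffers are returned sooner.
EMLX.set_cache_limit(512 * 1024 * 1024)

# Advise MLX to keep total GPU memory under 8 GiB (a guideline, not a hard cap).
EMLX.set_memory_limit(8 * 1024 * 1024 * 1024)
```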
EMLX.clear_cache()+:erlang.garbage_collect()between batches:All 2117 tests pass (2115 existing + 6 new memory tests).