According to Bryant and O'Hallaron's (somewhat abstracted) model of the page table on Linux-like systems, each page table entry (PTE) has an address field that holds one of three kinds of value: (1) the physical address (actually, the page number) to which the virtual page is mapped; (2) an equivalent location identifier for a long-term storage device -- where the page can be found; or (3) 0, for an unallocated page. Suppose a page has been swapped in from long-term storage to main memory. In this account, the kernel replaces the long-term storage address (2) with the physical address (1). Now suppose that same page in physical memory needs to be evicted to make room for some other page to be swapped in. How does the original long-term storage address get restored to the PTE, if it has been overwritten with the physical address? Thanks.
- I am referring to the most simplified picture of the page table. Real page tables are, of course, broken up into pages and arranged as pruned trees, etc., but the question remains of how retrieval information for uncached pages gets preserved when a page gets mapped to physical memory. – Amittai Aviram, Apr 20, 2024 at 15:46
2 Answers
It's not entirely clear to me from your question whether you are asking about mmap()ped files, swap, or passively mapped files in the page cache, so I'll answer for all.
In the non-mmap case, eviction is eviction: once the page is made clean, it can simply be dropped. When a page backed by a file is evicted from main memory, there is typically no need to restore any address, because the next access to the file will simply page fault into a new cache entry. A similar thing happens in the swap case: if we fault in a page from swap and later have to evict the same page, we will likely just allocate a new swap address at that point and set that in the PTE (or use the swap cache if the page is present there and is clean).*
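To make the swap case concrete, here is a minimal toy model in C. It is only a sketch: real PTEs are architecture-specific bit fields, every name here (toy_pte, toy_swap_in, toy_swap_out, toy_alloc_swap_slot) is hypothetical, and the swap cache shortcut is deliberately left out. The point it shows is that nothing is "restored" on eviction; a new slot is allocated and written into the PTE.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: real PTEs are architecture-specific bit fields. */
enum pte_kind { PTE_NONE, PTE_PRESENT, PTE_SWAPPED };

struct toy_pte {
    enum pte_kind kind;
    uint64_t value;   /* PFN if PTE_PRESENT, swap slot if PTE_SWAPPED */
};

/* Hypothetical bump allocator standing in for the swap slot allocator. */
static uint64_t next_free_slot = 100;

static uint64_t toy_alloc_swap_slot(void)
{
    return next_free_slot++;
}

/* Swap-in: the slot number held in the PTE is overwritten by the PFN. */
static void toy_swap_in(struct toy_pte *pte, uint64_t pfn)
{
    assert(pte->kind == PTE_SWAPPED);
    pte->kind = PTE_PRESENT;
    pte->value = pfn;
}

/* Eviction: the old slot is not looked up; a fresh one is recorded. */
static void toy_swap_out(struct toy_pte *pte)
{
    assert(pte->kind == PTE_PRESENT);
    pte->kind = PTE_SWAPPED;
    pte->value = toy_alloc_swap_slot();
}
```

In this model, a PTE that started out pointing at slot 7 and is then swapped in and evicted again ends up pointing at whichever slot the allocator hands out next, not at slot 7.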
In the mmap case, the lifecycle is controlled by the mmap() system call. The range is explicitly mapped to a contiguous portion of virtual memory on mmap(), and the metadata for this mapping (like the backing FD which we have a reference count on, the offset, the size, etc) is stored in the relevant virtual memory area (VMA). Even when pages are evicted, the VMA retains the mapping information, allowing the kernel to know where to fault in from when next accessed.
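As a rough sketch of why the PTE does not need to remember the file location, the following toy C code recomputes it from VMA-like metadata at fault time. The field names vm_start, vm_end and vm_pgoff echo the kernel's struct vm_area_struct, but the struct itself, the fd field (a stand-in for the kernel's file reference), the 4 KiB page size, and the helper toy_fault_pgoff are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12  /* assumes 4 KiB pages */

/* Toy stand-in for the few VMA fields that matter here. */
struct toy_vma {
    uint64_t vm_start;  /* first virtual address of the mapping */
    uint64_t vm_end;    /* one past the last virtual address */
    uint64_t vm_pgoff;  /* file offset of vm_start, in pages */
    int      fd;        /* stand-in for the kernel's file reference */
};

/* Page index within the backing file for a faulting address. */
static uint64_t toy_fault_pgoff(const struct toy_vma *vma, uint64_t addr)
{
    assert(addr >= vma->vm_start && addr < vma->vm_end);
    return vma->vm_pgoff + ((addr - vma->vm_start) >> TOY_PAGE_SHIFT);
}
```

Because the file page index is derived from the faulting address and the VMA alone, the PTE can be cleared entirely on eviction without losing anything.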
* In reality neither swapping nor general paging activity typically happens at the level of a single page, but instead often happens at some coarser granularity, like a swap cluster or readahead batch.
- I thought there was still a "swap cache". I.e. if the page in the swap device is still up to date, then it is possible to simply free the in-memory page, without allocating and writing a new page in the swap device. At least, /proc/vmstat is still showing non-zero nr_swapcached on my system. I think the OP wants to know what happens in this situation (and I have tried to answer accordingly). – sourcejedi, Apr 23, 2024 at 16:07
- @sourcejedi Thanks, rereading maybe that's what they were referring to indeed. I'll add a short bit and try to do it without duplicating your answer too much. – Chris Down, Apr 23, 2024 at 18:00
- @ChrisDown Thank you! I did mean cases involving mmap where there is a backing file descriptor, which you cover in your second paragraph. I appreciate your having covered the other case as well. – Amittai Aviram, Jun 2, 2024 at 21:08
PTEs represent virtual pages. As you say, when a virtual page is present in main memory, the PTE's address field will hold the physical Page Frame Number (PFN).
Each physical page has a corresponding struct page. This has:
Following flags is:
struct address_space *mapping;
For pages that are in the page cache (a large portion of the pages on most systems), mapping points to the information needed to access the file that backs up the page. If, however, the page is an anonymous page (user-space memory backed by swap), then mapping will point to an anon_vma structure [...]
-- Cramming more into struct page, LWN.net
Then, the page→index field is used to store the swp_entry_t structure for anonymous pages.
swp_entry_t holds the index of a swap device, and a location within that swap device.
For pages in the page cache, this holds a file offset...
-- Rephrased from Understanding the Linux Virtual Memory Manager, Mel Gorman, 2004.
...and for pages in the page cache, I think some versions of Linux did rely on the page→index field, using it to implement non-linear mappings.
However, as non-linear mappings are no longer supported, page→index appears to be redundant in this case? So you should read the other answer about VMAs.
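To illustrate what "the index of a swap device, and a location within that swap device" can look like when packed into a single word, here is a small C sketch. It is not the kernel's definition: the 5-bit/59-bit split and all the toy_ names are assumptions chosen for the example, and the real layout varies by architecture and kernel version.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_TYPE_BITS   5   /* assumption: illustrative split, not the kernel's */
#define TOY_OFFSET_BITS (64 - TOY_TYPE_BITS)
#define TOY_OFFSET_MASK ((UINT64_C(1) << TOY_OFFSET_BITS) - 1)

/* One opaque word, in the spirit of swp_entry_t. */
typedef struct { uint64_t val; } toy_swp_entry_t;

/* Pack a swap device index ("type") and a slot offset into one word. */
static toy_swp_entry_t toy_swp_entry(unsigned type, uint64_t offset)
{
    assert(type < (1u << TOY_TYPE_BITS));
    assert(offset <= TOY_OFFSET_MASK);
    toy_swp_entry_t e = { ((uint64_t)type << TOY_OFFSET_BITS) | offset };
    return e;
}

/* Recover the swap device index from the high bits. */
static unsigned toy_swp_type(toy_swp_entry_t e)
{
    return (unsigned)(e.val >> TOY_OFFSET_BITS);
}

/* Recover the location within that swap device from the low bits. */
static uint64_t toy_swp_offset(toy_swp_entry_t e)
{
    return e.val & TOY_OFFSET_MASK;
}
```

Packing both parts into one word is what lets the same value sit in a field that would otherwise hold a plain offset or address.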