All you can really do is profile—any general advice is going to be very, well, general. You should try to keep your working set small so that it fits in the first levels of cache, and avoid redundant memory accesses. If it’s expensive to compute an intermediate value, precompute it and store the result. If you know you will need data, prefetch it from RAM.

For cache control, you can use compiler intrinsics such as GCC’s __builtin_prefetch:

    void __builtin_prefetch (const void *addr, ...)

This function is used to minimize cache-miss latency by moving data into a cache before it is accessed. You can insert calls to __builtin_prefetch into code for which you know addresses of data in memory that is likely to be accessed soon. If the target supports them, data prefetch instructions will be generated. If the prefetch is done early enough before the access then the data will be in the cache by the time it is accessed.

It allows you to specify whether you expect to read or write, and what degree of temporal locality you expect the accesses to have.
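For instance, a minimal sketch of prefetching ahead of a linear scan (the function name and the prefetch distance of 16 elements are just illustrative; the right distance depends on your access pattern and hardware, so measure it):

    #include <stddef.h>

    /* Sum two arrays, prefetching a little ahead of the current index.
       The second argument of __builtin_prefetch is 0 for a read, 1 for a write;
       the third is the expected temporal locality, from 0 (none) to 3 (high). */
    void add_arrays(float *dst, const float *src, size_t n)
    {
        const size_t ahead = 16;  /* prefetch distance: a guess, tune by profiling */
        for (size_t i = 0; i < n; i++) {
            if (i + ahead < n) {
                __builtin_prefetch(&dst[i + ahead], 1, 1);  /* will be written */
                __builtin_prefetch(&src[i + ahead], 0, 1);  /* will only be read */
            }
            dst[i] += src[i];
        }
    }

Note that for a simple sequential loop like this the hardware prefetcher often does just as well; explicit prefetching tends to pay off for less predictable access patterns, which again is something only profiling will tell you.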

For virtual memory control, you can load data using mmap and tell the virtual memory manager how you expect to access the mapped pages using madvise, e.g.:

MADV_SEQUENTIAL 

Expect page references in sequential order. (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)

MADV_WILLNEED 

Expect access in the near future. (Hence, it might be a good idea to read some pages ahead.)
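A minimal sketch of the mmap/madvise approach (error handling is abbreviated, process_file is a made-up name, and real code should also check the return value of madvise):

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Map a file read-only and tell the kernel we intend to scan it sequentially. */
    int process_file(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }

        char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) {
            close(fd);
            return -1;
        }

        /* Hint that we'll read the mapping front to back, so the kernel can
           read ahead aggressively and drop pages once we're past them. */
        madvise(data, st.st_size, MADV_SEQUENTIAL);

        /* ... scan data[0 .. st.st_size) sequentially here ... */

        munmap(data, st.st_size);
        close(fd);
        return 0;
    }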

You’ll need to benchmark and profile to determine where to actually use such strategies. Intel VTune can give you useful stats on how your application is making use of the cache and pipeline.