Linked Questions

Question 1

I'm new to learning about low-level constructs and have a simple question about how they work. My understanding is that if I have a piece of code int* arr = new int[100000000]; this will be a section ...

Question 2

I am new to multithreaded programming, and I knew coming into it that there are some weird side effects if you are not careful, but I didn't expect to be THIS puzzled about code I wrote. I am writing ...

Question 3

Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop. const int n = 100000; for (int j = 0; j < n; j++) { a1[j] += b1[j]; c1[j] += d1[j]; } ...

Question 4

What is the difference between "cache unfriendly code" and "cache friendly" code? How can I make sure I write cache-efficient code?

Question 5

I understand that the processor brings data into the cache via cache lines, which - for instance, on my Atom processor - brings in about 64 bytes at a time, whatever the size of the actual data being ...

Question 6

I would like to use enhanced REP MOVSB (ERMSB) to get a high bandwidth for a custom memcpy. ERMSB was introduced with the Ivy Bridge microarchitecture. See the section "Enhanced REP MOVSB and ...

Question 7

Why is the size of L1 cache smaller than that of the L2 cache in most of the processors?

Question 8

I have found something unexpected (to me) using the Intel® Architecture Code Analyzer (IACA). The following instruction using [base+index] addressing addps xmm1, xmmword ptr [rsi+rax*1] does not ...

Question 9

I am having a problem understanding locality of reference. Can anyone please help me understand what it means, and what spatial locality of reference and temporal locality of reference are?

Question 10

I keep seeing people claim that the MOV instruction can be free in x86, because of register renaming. For the life of me, I can't verify this in a single test case. Every test case I try debunks ...

Question 11

When trying to understand assembly (with compiler optimization on), I see this behavior: a very basic loop like this: outside_loop; while (condition) { statements; } is often compiled into (...

Question 12

I am starting to study algorithms and data structures seriously, and I am interested in learning how to compare the performance of the different ways I can implement A&DTs. For simple tests, I can get ...

Question 13

I'm struggling to understand what happens when the first two levels of the Translation Lookaside Buffer result in misses. I am unsure whether "page walking" occurs in special hardware circuitry, or ...

Question 14

I have learned about different cache mapping techniques like direct mapping and fully associative or set associative mapping, and the trade-offs between those. (Wikipedia) But I am curious which one ...

Question 15

I am having an alignment issue while using ymm registers, with some snippets of code that seem fine to me. Here is a minimal working example: #include <iostream> #include <immintrin.h> ...

Linked Questions

N00b question about main memory and cpu register memory [duplicate]

I have no idea why changing variable access/storage type in pthread subroutine sharply increases performance [duplicate]

Why are elementwise additions much faster in separate loops than in a combined loop?

What does it mean for code to be "cache-friendly"?

How do cache lines work?

Enhanced REP MOVSB for memcpy

Why is the size of L1 cache smaller than that of the L2 cache in most of the processors?

Micro fusion and addressing modes

What is locality of reference?

Can x86's MOV really be "free"? Why can't I reproduce this at all?

Why are loops always compiled into "do...while" style (tail jump)?

How can I benchmark the performance of C++ code? [closed]

What happens after a L2 TLB miss?

Which cache mapping technique is used in intel core i7 processor?

How to solve the 32-byte-alignment issue for AVX load/store operations?
