Linked Questions

0 votes
0 answers
127 views

I'm new to learning about low-level constructs and have a simple question about how they work. My understanding is that if I have a piece of code int* arr = new int[100000000]; this will be a section ...
Brocolli Rob's user avatar
0 votes
0 answers
113 views

I am new to multithreaded programming, and I knew coming into it that there are some weird side effects if you are not careful, but I didn't expect to be THIS puzzled about code I wrote. I am writing ...
SpaceFace102's user avatar
2451 votes
11 answers
262k views

Suppose a1, b1, c1, and d1 point to heap memory, and my numerical code has the following core loop. const int n = 100000; for (int j = 0; j < n; j++) { a1[j] += b1[j]; c1[j] += d1[j]; } ...
Johannes Gerer's user avatar
892 votes
10 answers
225k views

What is the difference between "cache unfriendly code" and "cache friendly" code? How can I make sure I write cache-efficient code?
Noah Roth's user avatar
  • 9,280
238 votes
5 answers
147k views

I understand that the processor brings data into the cache via cache lines, which - for instance, on my Atom processor - brings in about 64 bytes at a time, whatever the size of the actual data being ...
Norswap's user avatar
  • 12.3k
113 votes
6 answers
39k views

I would like to use enhanced REP MOVSB (ERMSB) to get a high bandwidth for a custom memcpy. ERMSB was introduced with the Ivy Bridge microarchitecture. See the section "Enhanced REP MOVSB and ...
Z boson's user avatar
  • 34k
47 votes
7 answers
40k views

Why is the size of the L1 cache smaller than that of the L2 cache in most processors?
Karthik Balaguru's user avatar
64 votes
4 answers
10k views

I have found something unexpected (to me) using the Intel® Architecture Code Analyzer (IACA). The following instruction using [base+index] addressing addps xmm1, xmmword ptr [rsi+rax*1] does not ...
Z boson's user avatar
  • 34k
35 votes
4 answers
24k views

I am having a problem understanding locality of reference. Can anyone please help me understand what it means, and what spatial locality of reference and temporal locality of reference are?
50 votes
2 answers
9k views

I keep seeing people claim that the MOV instruction can be free in x86, because of register renaming. For the life of me, I can't verify this in a single test case. Every test case I try debunks ...
user541686's user avatar
  • 213k
44 votes
1 answer
8k views

When trying to understand assembly (with compiler optimization on), I see this behavior: a very basic loop like this: outside_loop; while (condition) { statements; } is often compiled into (...
iBug's user avatar
  • 37.6k
33 votes
3 answers
48k views

I am starting to study algorithms and data structures seriously, and am interested in learning how to compare the performance of the different ways I can implement A&DTs. For simple tests, I can get ...
37 votes
1 answer
10k views

I'm struggling to understand what happens when the first two levels of the Translation Lookaside Buffer result in misses. I am unsure whether "page walking" occurs in special hardware circuitry, or ...
user997112's user avatar
  • 31.1k
21 votes
1 answer
10k views

I have learned about different cache mapping techniques like direct mapping and fully associative or set associative mapping, and the trade-offs between those. (Wikipedia) But I am curious which one ...
Subhadip's user avatar
  • 461
22 votes
3 answers
16k views

I am having an alignment issue while using ymm registers, with some snippets of code that seem fine to me. Here is a minimal working example: #include <iostream> #include <immintrin.h> ...
romeric's user avatar
  • 2,385
