What adding a lookup table can do is to add the load latency to the dependency chain involving the calculation, which seems to be what you are talking about here. To be fair, replacing a series of ALU ops with a lookup table doesn't usually add a "data dependency" - the data dependency probably already existed, but perhaps flowed through registers rather than memory.

---

Loads can affect performance system-wide.

> What adding a lookup table can do is to add the load latency to the dependency chain involving the calculation, which seems to be what you are talking about here.

If it's not vectorizable, the LUT result is often used for an indirect jump/call (like a large switch statement) or for a memory access (say, a histogram).

> To be fair, replacing a series of ALU ops with a lookup table doesn't usually add a "data dependency"

If it's vectorized, then the whole equation changes! The comparison with 384 FOPs seems a bit off: I guess you are talking about some 32-FOP-per-cycle SIMD implementation (AVX-512?) - but the assumption of data dependencies kind of rules that out; one would assume it's scalar code here.

If the method is really hot, e.g., in a tight(ish) loop, then you are mostly going to be getting L1 hits. On the other hand, in that case performance isn't that critical by definition. If the involved method isn't that hot, then L1 misses (like your example) or worse are definitely a possibility. How much that actually matters depends on whether the code is latency-bound and the involved lookup is on the critical path: in many cases, where there is enough ILP, it won't be (a general rule is that in most code, most instructions are not on a critical dependency chain). An L1 hit is usually 4 or 5 cycles, and L2 hits and beyond are worse, as you point out.