Adventures in Caching Algorithms: Part 1, Getting Input

The subject of caching recently piqued my interest. I ran into caching in my undergrad OS class (though I suppose that was really about paging algorithms), but otherwise haven’t encountered it much. It seems to me that caching is hugely important in applications of computer science (in my one hardware course, the professor shared some famous quote to the effect of “caching is the one good idea computer science has had”…) but isn’t taught in traditional undergrad curricula, at least not as part of algorithms.

My impression so far is that caching algorithms have been deemed theoretically uninteresting. Belady’s algorithm is a nice result (and certainly approachable for an undergrad unit on greedy algorithms, I think…), but otherwise it seems that what makes an algorithm “good” is mostly workload-dependent. Still, I want to learn more about them. I’ve been on a kick for learning more empirical things, and as with the SAT solver, I’ve taken to problems (and solutions) that show improvements empirically but elude a natural proof of optimality.

Getting Input: PIN

So I knew I wanted to simulate a cache. For that I need input. Sure, I could use an RNG with various distributions and everything, but I was curious whether there was a way to get more “natural” input. So I did some quick searching (that is, I asked a friendly AI) and learned about PIN. I think it’s really cool! It provides hooks to “record” what goes on during execution at the assembly level, so I was able to basically follow their example and create a tool that emits all the instruction addresses (for the icache) along with all the memory reads and writes.

So with my tool written (more on that later), I can run it like this:

 ./../pin/pin -t obj-intel64/trace_tool.so -- /bin/ls

Working inside-out, that will run the “usual” ls command (like, just list the directory contents), but the binary will be intercepted and instrumented by pin. Pin will then run the code in my utility (which I’ve named trace_tool), which writes its output to a designated file. That output basically looks like this:

i:0x7fa5505b8c21
r:0x7fa550744b60
i:0x7fa5505b8c28
i:0x7fa5505b8c2a
i:0x7fa5505b8c00
w:0x7fa550744b60
i:0x7fa5505b8c0b
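
Each line is just a one-character tag, a colon, and a hex address, which should be easy to consume later. As a rough sketch of how the eventual cache simulator might read this back in (the TraceRecord struct and the trace.out filename are placeholders I’m using for illustration, not part of the actual tool):

// Sketch: read the i:/r:/w: trace format shown above into memory.
// TraceRecord and "trace.out" are illustrative names, not from trace_tool itself.
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

struct TraceRecord {
    char kind;          // 'i' = instruction fetch, 'r' = memory read, 'w' = memory write
    std::uint64_t addr; // address parsed from the hex string after the colon
};

int main() {
    std::ifstream in("trace.out");
    std::vector<TraceRecord> records;
    std::string line;
    while (std::getline(in, line)) {
        if (line.size() < 3 || line[1] != ':') continue;      // skip malformed lines
        TraceRecord rec;
        rec.kind = line[0];
        rec.addr = std::stoull(line.substr(2), nullptr, 16);  // base 16 accepts the 0x prefix
        records.push_back(rec);
    }
    std::cout << "read " << records.size() << " records\n";
}

(std::stoull with base 16 is happy to take the 0x prefix, so the lines can be parsed as-is.)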

I basically have three functions in my PIN tool. All three of them just print the provided address, but one prefixes it with “i”, another with “r”, and another with “w”. I then ask pin to attach the first to every instruction, the second to every read operand, and the third to every write operand. That gives me the output I want! I haven’t validated things in any more rigorous way, but a quick scan makes this “look good” to me.
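
For the curious, here’s roughly the shape of such a tool, following Pin’s stock itrace/pinatrace examples (which is basically what I did); the function names and the trace.out output filename here are illustrative stand-ins rather than the literal contents of trace_tool:

// Sketch of a pintool with three analysis routines, modeled on Pin's
// itrace/pinatrace examples; names and output file are illustrative.
#include <stdio.h>
#include "pin.H"

static FILE* out;

// The three analysis routines: each just prints a prefix and an address.
static VOID RecordIns(VOID* ip)   { fprintf(out, "i:%p\n", ip); }
static VOID RecordRead(VOID* ea)  { fprintf(out, "r:%p\n", ea); }
static VOID RecordWrite(VOID* ea) { fprintf(out, "w:%p\n", ea); }

// Instrumentation routine: Pin calls this for each instruction it encounters,
// and we register which analysis routines should fire when it executes.
static VOID Instruction(INS ins, VOID* v)
{
    // Every instruction: record its own address (the icache stream).
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordIns, IARG_INST_PTR, IARG_END);

    // Every memory operand: record the effective address of reads and writes.
    UINT32 memOps = INS_MemoryOperandCount(ins);
    for (UINT32 op = 0; op < memOps; op++) {
        if (INS_MemoryOperandIsRead(ins, op))
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordRead,
                                     IARG_MEMORYOP_EA, op, IARG_END);
        if (INS_MemoryOperandIsWritten(ins, op))
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordWrite,
                                     IARG_MEMORYOP_EA, op, IARG_END);
    }
}

static VOID Fini(INT32 code, VOID* v) { fclose(out); }

int main(int argc, char* argv[])
{
    if (PIN_Init(argc, argv)) return 1;
    out = fopen("trace.out", "w");               // illustrative output file name
    INS_AddInstrumentFunction(Instruction, 0);   // instrument every instruction
    PIN_AddFiniFunction(Fini, 0);                // close the file when the target exits
    PIN_StartProgram();                          // hands control to the target; never returns
    return 0;
}

That split is the key idea in Pin: Instruction runs once per instruction Pin encounters to decide what to record, while the three little Record* routines are the ones that actually run (many, many times) as the target executes.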

Of course, the way these hobby projects work, I can’t just say I got what I want and dive into caching algorithms. I have to do this part even better. More on (obvious…) performance improvements to come!