We have solved the mystery described in yesterday’s entry…
…mostly. I’ve found down the years that inside any big mystery are likely one or more smaller mysteries. And so it is. I would have figured this problem out a whole lot sooner if the symptoms had been consistent.
They weren’t. And those symptoms made me nuts for several days. Eventually I decided to yell for help.
I got a lot of very good help. If you haven’t read yesterday’s entry (and if you’re actually interested in assembly language programming) go read it now. I won’t repeat all the details here.
In short: I wrote a small demo program for my new book, x64 Assembly Language Step By Step. It didn’t work. Several of my readers took the code I posted in yesterday’s entry, built the executable, and…it worked.
That’s what made me nuts. I ran the damned thing on three different Linux instances, and the problem manifested on all three of them. But a couple of my friends ran the executable and had no trouble at all. It worked perfectly.
WTF?
That’s actually the small mystery inside the big mystery. The big mystery we figured out fairly quickly. Bruce, a new Contra commenter, built the executable and it failed. He changed one line in the program, and it worked. I tried his fix. It worked. Mystery solved.
But…why? Bruce cleared register RDI to null (i.e., 0) before calling the libc time function. I had cleared RAX, as part of an earlier test to try and pin down the symptoms. I intended to remove that line from the program. But it gave Bruce an idea: clear RDI instead. He did. It worked. I tried it, and…victory! Clearing RDI to 0 completely eliminated the problem, and I spent another hour trying various things to crash the executable. No luck. It was a consistent fix, in that once I cleared RDI to 0, nothing else would make the executable malfunction.
I think it dawned on several of us at once. Based on my reading, I had assumed that the time function takes no parameters. That was wrong. The libc time function on Linux takes one little-mentioned parameter, which can be either 0 or an address. If it's an address, time stores the current time_t timestamp at that address. Either way, time also returns the time_t value in RAX.
While stepping through the demo program in a debugger, I noticed that after a call to the puts function, register RDI contained a memory address. It wasn't always the same, and it wasn't useful. It was so un-useful, in fact, that the garbage addresses left in RDI would cause either a hang or a segmentation fault. In the System V x64 calling convention that Linux uses, the first integer or pointer parameter is always passed to a function in RDI, and RDI is not preserved across calls, so puts is free to leave whatever it likes there. I didn't think of time as having any parameters at all. However, clearing RDI to 0 before calling time guaranteed that time would not try to write through a garbage address, and would simply return the time_t value in register RAX…instead of crashing.
So the big mystery was solved. I spent an hour and a half trying to get the program to crash, and as long as RDI was 0 when time was called, it did not crash. Hallelujah!
The small mystery remained: Why did some of my readers build the executable and have it work perfectly, while the exact same program went belly-up on my Linux machines? That remains an open service ticket. I'm mildly curious, but as long as I know that RDI has to be either 0 (preferably) or the address of a suitable buffer to hold the time_t value, all will be well.
Let me wrap up by abundantly thanking everyone who took part in the bug hunt:
- My friend and SFF collaborator Jim Strickland
- Linux expert Bill Buhler
- New commenter Bruce
- Long-time reader Jason Bucata
- X64 programming expert Jonathan O’Neal
- Contra regular Keith
You guys were brilliant. I will cite you all on the Acknowledgements page in the book when it comes out (with some luck) next summer.
Again, thanks. In a weird but satisfying way, it was fun. Now I have to get back to work.
A small proofreading comment. In the first paragraph after "WTF?" you've written "mysetery" instead of "mystery".
Regards
Good catch. Fixed. Thanks! As is my policy, I’ll leave your comment here to remind me that I need to proof a little better, heh.
Bruce wrote: “I then changed xor rax, rax to be xor edi,edi.”
But you then referred in your article to rdi. As I am not an x86 assembly guy, I am confused. Clearing edi? Or rdi?
> Clearing edi? Or rdi?
In x86-64, an instruction that writes a 32-bit register automatically zeroes the upper 32 bits of the corresponding 64-bit register (which avoids a partial register stall). Thus, while
xor rdi,rdi
will zero the 64-bit rdi register,
xor edi,edi
will do the same thing, while saving one byte of code (the REX prefix that extends an x86 instruction to 64 bits).