Thursday, June 27, 2019

Funny C++ Initialization

If you have a class (call it "myclass") that has Plain Old Data (POD) members and does not define a constructor (i.e., it uses the compiler-supplied default constructor), then "new myclass;" and "new myclass();" will do different things.  The parenthesis-free version default-initializes the object, leaving the POD members uninitialized, but the with-parenthesis version value-initializes it, setting the POD members to zero.

If you define a constructor in the class, then the difference disappears.  Both forms of "new" simply run your constructor, so any POD members it doesn't touch are left uninitialized (as you may have originally expected).

Here's a test case that demonstrates the difference:

funny-init.cpp

Compiling (with g++) and running this little program produces:

Default constructor, no parenthesis:  42
Default constructor, parenthesis:     0
Explicit constructor, no parenthesis: 42
Explicit constructor, parenthesis:    42

An additional bit of hilarity: the default-constructor variant compiles to an actual object constructor that does nothing, but when you invoke "new" with parenthesis, the site of the "new" invocation is littered with extra instructions just to write zeros over the POD members in the class.

One surprising place this difference shows up is with "struct" and placement new.  If you use placement new with a "struct" and you add the parenthesis, then the underlying storage is wiped clean.  Here's a test case for that:

placement-wipeout.cpp

And the corresponding output:

Default constructor, no parenthesis:  42
Default constructor, parenthesis:     0

It's hard to see how this is a helpful state of affairs, but forewarned is forearmed.

Tuesday, November 6, 2018

RTLD_DEEPBIND has deep surprises

Linux RTLD_DEEPBIND has at least one very deep problem: if you load a dynamic object with it, calls made from within that one object will ignore LD_PRELOAD and go directly to the dependency, but symbols referenced from within the dependency itself are still resolved via LD_PRELOAD.

A real-world example of this failure involves a dynamically loaded object that invokes malloc(3), strdup(3), and free(3).  Suppose we have an application using an LD_PRELOAD library that interposes on malloc and free.  Calls to malloc, strdup, and free from within the loaded object go straight to libc, bypassing the preload, as expected.  But the implementation of strdup inside libc invokes malloc on its own:

https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strdup.c;hb=HEAD

That invocation of malloc will go to the LD_PRELOAD library, not libc's local definition.  As a result, the pointer that the dynamic object gets back comes from the LD_PRELOAD library's implementation of malloc.  If the dynamic object tries to free that pointer, the call will go straight to the libc definition of free().  Unless the preload is "just" an innocuous wrapper on the libc functions, and doesn't replace them outright, this will fail in spectacular ways.

Here's a demo of the sort of hilarity this causes:

https://www.workingcode.com/deep-disaster.tar

This program produces the following output with "make test":

LD_PRELOAD=./preload.so ./main
Doing normal test: in preload: Inside the normal library
Doing bound test: in libbound: Inside the normal library
Doing inside test: from inside: in preload: Inside the normal library
Doing bound inside test: in libbound: from inside: in preload: Inside the normal library

The first two test results are as expected.  The main program goes through the preload to get to the common library, and the deeply-bound library does not.  The third result is also fine, and represents the main program invoking a function inside the common library that invokes another library function, which redirects through the preload.  The fourth result is the problem.  The deeply-bound library invokes a function in the common library that in turn invokes another library function.  In this case, it (somewhat surprisingly) goes through the preload, even though the user probably expected that RTLD_DEEPBIND would avoid the use of the preload.

At best, it does so "sometimes."

Note that this means that almost any non-trivial use of RTLD_DEEPBIND is incompatible with (at least) the usual LD_PRELOAD=libtcmalloc.so type of wrapper.  Anything you load with RTLD_DEEPBIND is hopelessly compromised if it invokes libc functions that internally use malloc or free.  Or it means that any such wrapper must carefully wrap all exposed libc interfaces (such as strdup and fopen) that can allocate memory, and supply its own implementation -- a feat that may be impossible.

Friday, January 12, 2018

GNU make reports "process_begin: CreateProcess(NULL, pwd, ...) failed." on Cygwin

I very rarely build anything on Windows.  My day job doesn't call for it much, and I certainly wouldn't want to do that for "fun."  So, when I do, I often run into problems that (a) I don't understand and (b) nobody else seems to have seen before.  This is one of those sorts of problems.

My build on a new machine failed almost immediately after typing "make" with this error:
process_begin: CreateProcess(NULL, pwd, ...) failed.
I have no idea what that means.  The "pwd" command, of course, works just fine for me at the command line, and "whence" tells me it comes from /usr/bin.  Everything looks fine there.

After a lot of debugging and comparisons with others who had working configurations, I found this: near the front of my PATH, I had an entry like this:
/cygdrive/c/this/does/../not/exist
That's not the actual path, of course, but it gives the idea.  The bug I encountered is this: if a PATH entry contains "/.." and the directory component just before the "/.." doesn't physically exist on that machine, then make's path-search function stops right there.  It doesn't look at the rest of the PATH entries at all.  So, if you have something like this before "/usr/bin", you're sunk.  Such an entry works fine on all UNIX and Unix-like systems, and it even works fine in the Cygwin shells.  But, for some reason, it doesn't work within GNU make's code that deals with path searches.

Changing that path so that it didn't have "/.." in it fixed the problem.

Sunday, July 2, 2017

Not Hertz

This happened many years ago, but I still think about it at times, and a recent exchange on Twitter made me remember that I really should have written it up a long time ago.  Better late than never, I suppose.

In 2009, my father was being treated for cancer, and his health was up and down.  I had been planning for some time to take a day of vacation from work on Friday, November 6th, 2009, and stay the weekend.  I had a flight booked on Jet Blue and a car reserved with Hertz.

On Monday, November 2nd, I got word that my father had taken a turn for the worse.  The radiation treatment had caused swelling in his throat, and he was rushed to the hospital to insert a trach tube.  After talking to my brother in Pittsburgh, I decided to change my plans.  I would fly out as soon as possible so I could be with my brother, father, and step-mother.

I called Jet Blue first.  No problem.  It was a $50 fee to change the flight, but I could have the first plane out in the morning on Tuesday.  That was pretty easy, and I was encouraged.

Then I called Hertz reservations through the 800 number.  I'd booked with them because I had a good bit of experience with them.  They were one of the preferred providers when I was at Sun, and I traveled a lot for work when I was there.  The customer service folks couldn't change my previous reservation, but they did offer an alternative: they could book a second car from the 3rd to the 6th, so I had a car the whole time, and they told me that the reservations counter in Pittsburgh would be able to help when I got there.  Nothing they could do by phone; it had to be done in person.  That was less than ideal, but what do I know about their reservations system?

The next morning, I flew to Pittsburgh, and got to the reservations counter.  They told me, no, they couldn't change the two contiguous reservations into one.  They had no idea why the 800 people told me that.  They told me I should call the 800 number again and ask for help.

So, I stepped out of line and into the waiting area, and called the 800 number again from my cell phone.  No dice.  They couldn't do anything for me and told me to talk to the people at the desk again.  So, I got back in line and waited again at the desk.  When I got up there, I asked if there was a manager I could talk to.  The answer was just "no."  They helpfully said that I should call the desk on Friday and ask to have it fixed before having to drive in.  They assured me there was no way I'd have to return just to swap cars; that would be silly.

I took the keys and went off to visit my father.  We had to make some really tough (and possibly wrong) decisions over the next few days.  It was a difficult time, but I'm very glad I made the trip.

On Friday, I called the Hertz desk in Pittsburgh.  I was told that, no, they could not help me.  I could not extend the current reservation.  I could not combine reservations.  The only thing I could do would be to return the car as agreed on Friday and then take another one out.  So, that night, I drove the 20 miles / 30 minutes to the airport, dropped off one ugly champagne-colored Elantra, picked up a nearly identical champagne-colored Elantra -- the only difference was that the XM radio worked in one and not in the other -- then drove back to my father's place.

That Sunday, November 8th, I dropped off the second car and returned to Boston.  That's the last time I've ever done business with Hertz, and the last time I ever will.

This sort of thing, to me, reeks of a corporate culture problem.  None of the customer service representatives had the remit to fix things.  In a company that is serious about customer service, the employees are given the power to "make things right" -- even if this means breaking company policies.  The service I got indicates the reverse.  Nobody I dealt with had the power to fix anything.

The problem is not that any individual customer service representative treated me like dirt; in fact, the representatives were pleasant to deal with, but completely unhelpful.  The problem is having no prospect of things ever getting better, because it wasn't just one or two people having a bad day or not knowing how to make changes.  A systemic problem like that is something I can't put up with.  So when I need ground transport, it's anyone but them.

Thursday, November 24, 2016

"A start job is running for dev-disk-by" and other horrors

My desktop system at home is currently running OpenSUSE Tumbleweed.  It used to run Debian until I got caught in an upgrade version-locked disaster.  Then it ran OpenSolaris until Oracle made that an unlikely proposition.

I've been reasonably happy with OpenSUSE until I made the mistake of trying to do a "zypper dup" recently, and the reboot showed me this:

[***   ] A start job is running for dev-disk-by\x2duuid-1a0dc1c5\x2d26cc\x2d45ff\x2da7b1\x2d1f827c971ff9.device (15s / no limit)

As long as one might care to sit there and watch, it never completed whatever task it was trying to perform.

I was able to boot up with a rescue CD (thank goodness I downloaded that first), and was able to mount the disks with no trouble.  But no amount of fooling around would make it boot.  It was not a happy evening.

After quite a bit more experimentation, I discovered that there were two serious problems that I had to fix manually, and I'm writing this up for those who might have run into similar problems:

1. dracut is missing bits

Crucial kernel drivers go missing when dracut builds a new initrd image, and other bits get included whether you like it or not.  I have a mix of file systems in use, and here are the new configuration bits I had to add to /etc/dracut.conf.d:

add_dracutmodules+="btrfs"
add_drivers+="btrfs zlib_deflate xor raid6_pq"
add_drivers+="md-mod raid1 raid456"
omit_drivers+="nouveau"

That it would exclude the RAID and btrfs drivers by default was very surprising.

2. udevd is broken by default

The default configuration of udevd simply doesn't work right.  It limits itself to an absurdly tiny number of processes, and ends up failing to run trivial scripts needed by the Linux "MD" disk subsystem.  That's a big part of my boot problem.  The solution is to create a file named /etc/systemd/system/systemd-udevd.service with this inside:

[Unit]
Description=udev Kernel Device Manager
Documentation=man:systemd-udevd.service(8) man:udev(7)
DefaultDependencies=no
Wants=systemd-udevd-control.socket systemd-udevd-kernel.socket
After=systemd-udevd-control.socket systemd-udevd-kernel.socket systemd-sysusers.service
Before=sysinit.target
ConditionPathIsReadWrite=/sys

[Service]
Type=notify
OOMScoreAdjust=-1000
Sockets=systemd-udevd-control.socket systemd-udevd-kernel.socket
Restart=always
RestartSec=0
ExecStart=/usr/lib/systemd/systemd-udevd
MountFlags=slave
KillMode=mixed
WatchdogSec=3min
TasksMax=infinity

The important part is that "TasksMax=infinity" line.  That's what fixes the system so that it will actually boot again.

Saturday, August 29, 2015

Missing Windows users? The badly-named "net" command is your friend.

I have a lousy old laptop that I use for IMC Club presentations and the like.  I wish I could afford something better, but it mostly works almost well enough to keep me from bothering to look around.

Except every once in a while, it falls apart.  Windows is, unfortunately, really terrible in that way.  The latest problem it had is so strange and so obscure that I felt I had to write about it just in case someone else runs into it.

First symptom: all user accounts are gone from the login screen.  In fact, the login screen itself is skipped, and it goes straight to asking for the Admin password on boot.

On getting in, the "user accounts" tool shows nothing but Admin and Guest.  All user accounts are just plain gone.  Attempting to add the user accounts back results in an error message saying that the user "already has permission to access this computer."  Well, that's unhelpful.

The parental controls section shows the accounts.  The files are still there under C:\Users.  Everything seems in place, but nobody can log in.  Regedit shows nothing interesting.  Googling around for all sorts of related phrases shows that quite a few people have experienced this, but nobody has solved it.

I just solved it.  Typing "net user Jim", I can see output that ends like this:

    Local Group Memberships
    Global Group memberships   *None
    The command completed successfully.

I tried adding a dummy account, and it showed up with "Local Group Memberships" set to *Users.  That's the key.  For some reason, all of the accounts had been kicked out of the "Users" group, and that's why they were gone from the login screen.  Adding them back in looks like this:

    net localgroup Users Jim /add
    net localgroup Users Madeline /add

After doing that, the system was back to normal.  Ah, Windows.  Thanks for wasting so many hours of my life.

Tuesday, June 3, 2014

Unfortunate Result

I had high hopes for AMC's new "Halt and Catch Fire."  It's great to see the pie man back on TV.  And the subject is one that I'm very interested in, as I've been involved with computers since well before the introduction of the PC.

From the start, though, the show went off the rails.  They tried to explain the title with a "definition" that was complete gibberish.  That phrase doesn't refer to an actual computer instruction or program; it's an old joke.  And, sadly, it seems that the folks working on that show didn't get the joke.

Back then, writing in assembly was more common than it is today.  Just for fun, many people made lists of fake instructions and passed them around as photocopies pinned to bulletin boards in office hallways.  Typical entries were "Rewind and Stretch Tape," "Jump To Random Location," and "Execute Operator."  No machine had those instructions.  They were jokes, as was "Halt and Catch Fire."  But the show missed that entirely.

That was just the start.  They missed the boat in so many ways that it's hard to count them.  I don't want to just pick nits -- the pattern "on, on, off, on" or 1101 is a hexadecimal D, not a B -- but the basic plot points are way off the mark.  The BIOS wasn't in any way the "secret sauce."  It just does basic hardware initialization, a little bit of self-test, and then loads bootstrap code from a floppy drive or (back then) a cassette recorder.  Heck, IBM published the source code in the technical reference manual!  And if you wanted to extract the ROM contents, even back then, it wouldn't be as crazily hard as it was portrayed for dramatic purposes.  You can just read the memory.

So, it's already headed in a bad direction, and it's hard to see why that's the case.  The real story has all sorts of interesting twists and turns -- just google "Gary Kildall" for part of it -- and it's a shame to see the opportunity squandered.  I'll keep watching for a few episodes, hoping that they'll turn it around, but it's not looking good so far.  Instead of portraying life in a start-up, though, it seems to be mimicking the failure of one.