Tuesday, November 6, 2018

RTLD_DEEPBIND has deep surprises

Linux RTLD_DEEPBIND has at least one very deep problem: if you load a dynamic object with it, the symbols called from within that one object will ignore LD_PRELOAD and go directly to the dependency, but symbols within the dependency itself are still resolved via LD_PRELOAD.
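
For reference, loading an object this way looks roughly like the sketch below.  The helper name load_deepbound is made up for illustration; only dlopen(3), dlerror(3), and the RTLD_* flags are real glibc interfaces.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* RTLD_DEEPBIND is a glibc extension, not POSIX */
#endif
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical helper: open a shared object with RTLD_DEEPBIND so that
 * its own symbol references prefer its own dependencies over the
 * global scope (which is where LD_PRELOAD libraries live).
 * Returns the dlopen handle, or NULL on failure.
 */
void *load_deepbound(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL | RTLD_DEEPBIND);

    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}
```

A caller would pass the path of the object to load and then look up entry points with dlsym(3) on the returned handle.  On glibc releases before 2.34, this needs -ldl at link time.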

A real-world example of this failure is with a dynamically loaded object that invokes malloc(3), strdup(3), and free(3).  Suppose we have an application using an LD_PRELOAD that interposes on malloc and free.  Calls to malloc, strdup, and free from within the loaded object will go straight to libc, bypassing the preload as expected.  But the implementation of strdup inside libc invokes malloc on its own:

https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strdup.c;hb=HEAD

That invocation of malloc will go to the LD_PRELOAD library, not libc's local definition.  As a result, the pointer that the dynamic object gets back comes from the LD_PRELOAD library's implementation of malloc.  When the dynamic object later frees that pointer, the call goes straight to the libc definition of free().  Unless the preload is "just" an innocuous wrapper on the libc functions, and doesn't replace them outright, this will fail in spectacular ways.
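
The pattern inside the loaded object can be as ordinary as the sketch below.  The function name plugin_copy_and_release is hypothetical, and the comments describe how each call would resolve under the scenario above (object loaded with RTLD_DEEPBIND, replacement allocator in LD_PRELOAD); run on its own, with no preload, the code is harmless.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical function living in an object loaded with RTLD_DEEPBIND.
 * Returns 0 on success, -1 if the copy could not be allocated.
 */
int plugin_copy_and_release(const char *text)
{
    /* Bound directly to libc's strdup.  But strdup's own internal call
     * to malloc is resolved by libc, which still honors the preload. */
    char *copy = strdup(text);

    if (copy == NULL)
        return -1;

    /* Bound directly to libc's free.  With a replacement allocator
     * preloaded, this hands libc a pointer that libc never allocated. */
    free(copy);
    return 0;
}
```

Nothing in this code looks wrong in isolation, which is what makes the mismatch hard to spot: the allocation and the release simply end up in two different allocators.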

Here's a demo of the sort of hilarity this causes:

https://www.workingcode.com/deep-disaster.tar

This program produces the following output with "make test":

LD_PRELOAD=./preload.so ./main
Doing normal test: in preload: Inside the normal library
Doing bound test: in libbound: Inside the normal library
Doing inside test: from inside: in preload: Inside the normal library
Doing bound inside test: in libbound: from inside: in preload: Inside the normal library

The first two test results are as expected.  The main program goes through the preload to get to the common library, and the deeply-bound library does not.  The third result is also fine, and represents the main program invoking a function inside the common library that invokes another library function, which redirects through the preload.  The fourth result is the problem.  The deeply-bound library invokes a function in the common library that in turn invokes another library function.  In this case, it (somewhat surprisingly) goes through the preload, even though the user probably expected that RTLD_DEEPBIND would avoid the use of the preload.

At best, it does so "sometimes."

Note that this means that almost any non-trivial use of RTLD_DEEPBIND is incompatible with (at least) the usual LD_PRELOAD=libtcmalloc.so type of wrapper.  Anything you load with RTLD_DEEPBIND is hopelessly compromised if it invokes libc functions that internally use malloc or free.  Or it means that any such wrapper must carefully wrap all exposed libc interfaces (such as strdup and fopen) that can allocate memory, and supply its own implementation -- a feat that may be impossible.

Friday, January 12, 2018

GNU make reports "process_begin: CreateProcess(NULL, pwd, ...) failed." on Cygwin

I very rarely build anything on Windows.  My day job doesn't call for it much, and I certainly wouldn't want to do that for "fun."  So, when I do, I often run into problems that (a) I don't understand and (b) nobody else seems to have seen before.  This is one of those sorts of problems.

My build on a new machine failed almost immediately after typing "make" with this error:
process_begin: CreateProcess(NULL, pwd, ...) failed.
I have no idea what that means.  The "pwd" command, of course, works just fine for me at the command line, and "whence" tells me it comes from /usr/bin.  Everything looks fine there.

After a lot of debugging and comparisons with others who had working configurations, I found this: near the front of my PATH, I had an entry like this:
/cygdrive/c/this/does/../not/exist
That's not the actual path, of course, but it gives the idea.  The bug I encountered is this: if a PATH entry includes "/.." and the directory component just before the "/.." doesn't physically exist on that machine, then the path search function stops right there.  It doesn't look at the rest of the path entries at all.  So, if you have something like this before "/usr/bin", you're sunk.  An entry like that works fine on all UNIX and Unix-like systems, and it even works fine for the Cygwin shells.  But, for some reason, it doesn't work within GNU make's code that deals with path searches.

Changing that path so that it didn't have "/.." in it fixed the problem.
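
To spot entries like this ahead of time, a small checker along the following lines might help.  This is only a sketch: the function names are invented for illustration, it has nothing to do with GNU make's actual search code, and it flags every entry containing a ".." component without checking whether the directories involved exist.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if the first len bytes of entry contain a ".." path
 * component (as opposed to a name that merely starts with dots). */
static int has_dotdot_component(const char *entry, size_t len)
{
    size_t i = 0;

    while (i < len) {
        size_t start = i;

        while (i < len && entry[i] != '/')
            i++;
        if (i - start == 2 && entry[start] == '.' && entry[start + 1] == '.')
            return 1;
        i++;    /* step over the '/' separator */
    }
    return 0;
}

/* Hypothetical helper: scan a colon-separated PATH value, print each
 * entry that contains a ".." component, and return how many there were. */
int report_dotdot_entries(const char *path_value)
{
    int flagged = 0;
    const char *p = path_value;

    while (*p != '\0') {
        size_t len = strcspn(p, ":");

        if (has_dotdot_component(p, len)) {
            printf("suspicious PATH entry: %.*s\n", (int)len, p);
            flagged++;
        }
        p += len;
        if (*p == ':')
            p++;
    }
    return flagged;
}
```

A caller would pass the result of getenv("PATH"), after checking it for NULL.  Any entry reported that sits ahead of "/usr/bin" is a candidate for the failure described above.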