SEGMENTATION-VIOLATION error embedding ecl in programs that use boehm gc
/* -*- mode: c; -*-
file: main.c
*/
#include <stdlib.h>
#include <math.h>
#include <ecl/ecl.h>
#include <gc.h>
void say_hello();
char str_my_t[] =
"t";
char str_hello[] =
"\"hello, from ecl!~%\"";
char* argv;
char** pargv;
int main() {
    GC_set_no_dls(1);
    GC_set_all_interior_pointers(1);
    GC_set_finalize_on_demand(0);
    GC_INIT();
    argv = "app";
    pargv = &argv;
    cl_boot(1, pargv);
    atexit(cl_shutdown);
    /* Set up handler for Lisp errors to prevent buggy Lisp (an */
    /* impossibility, I know!) from killing the app. */
    const cl_env_ptr l_env = ecl_process_env();
    CL_CATCH_ALL_BEGIN(l_env) {
        CL_UNWIND_PROTECT_BEGIN(l_env) {
            say_hello();
        }
        CL_UNWIND_PROTECT_EXIT {}
        CL_UNWIND_PROTECT_END;
    }
    CL_CATCH_ALL_END;
    return 0;
}
void say_hello() {
    cl_object my_t = c_string_to_object(str_my_t);
    cl_object hello = c_string_to_object(str_hello);
    cl_object cl_load = ecl_make_symbol("FORMAT","CL");
    cl_funcall(3, cl_load, my_t, hello);
    return;
}
Copy the ecl shared library to the local directory (not required, but it makes various testing scenarios easier), then compile using the following commands:
cp /usr/local/lib/libecl.so.16.1 .
export ldflags="-L. -Wl,-R -Wl,."
export cflags="-DGC_LINUX_THREADS -D_REENTRANT -fPIC -g -pipe -Wall"
gcc main.c $cflags $ldflags -lecl -lgc -o hello
Run hello using the following:
export LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH; ./hello
We should have the output:
hello, from ecl!
Instead, a segmentation fault is triggered:
;;; Unhandled lisp initialization error
;;; Message:
undefined-function
;;; Arguments:
(:name ext::segmentation-violation)
Internal or unrecoverable error in:
Lisp initialization error.
;;; ECL C Backtrace
;;; ./libecl.so.16.1(si_dump_c_backtrace+0x39) [0x7fc2a468e619]
;;; ./libecl.so.16.1(ecl_internal_error+0x44) [0x7fc2a46779c4]
;;; ./libecl.so.16.1(+0x19db22) [0x7fc2a4677b22]
;;; ./libecl.so.16.1(cl_funcall+0x86) [0x7fc2a4656dd6]
;;; ./libecl.so.16.1(cl_error+0xf1) [0x7fc2a46789e1]
;;; ./libecl.so.16.1(+0x19ecf8) [0x7fc2a4678cf8]
;;; ./libecl.so.16.1(ecl_function_dispatch+0x58) [0x7fc2a4656d38]
;;; ./libecl.so.16.1(+0x1c9a3a) [0x7fc2a46a3a3a]
;;; ./libecl.so.16.1(+0x1ca416) [0x7fc2a46a4416]
;;; /lib/x86_64-linux-gnu/libc.so.6(+0x35860) [0x7fc2a3ed6860]
;;; ./libecl.so.16.1(si_signal_simple_error+0x108) [0x7fc2a46134b8]
;;; ./libecl.so.16.1(FEreader_error+0x152) [0x7fc2a46782e2]
;;; ./libecl.so.16.1(+0x195789) [0x7fc2a466f789]
;;; ./libecl.so.16.1(+0x197c3b) [0x7fc2a4671c3b]
;;; ./libecl.so.16.1(+0x199025) [0x7fc2a4673025]
;;; ./libecl.so.16.1(+0x1993d0) [0x7fc2a46733d0]
;;; ./libecl.so.16.1(+0x197ac8) [0x7fc2a4671ac8]
;;; ./libecl.so.16.1(ecl_init_module+0x37a) [0x7fc2a4674faa]
;;; ./libecl.so.16.1(init_lib__ECLJUI5KMCU6PXN9_7R2JX731+0x2ae) [0x7fc2a45728be]
;;; ./libecl.so.16.1(ecl_init_module+0x3d2) [0x7fc2a4675002]
;;; ./libecl.so.16.1(cl_boot+0x96d) [0x7fc2a457170d]
;;; ./hello(+0xd04) [0x55ec758d3d04]
;;; /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf1) [0x7fc2a3ec13f1]
;;; ./hello(+0xb9a) [0x55ec758d3b9a]
Aborted (core dumped)
(lisp-implementation-version)
"16.1.3"
;; note, I've confirmed this is a problem with the packaged version shipped with
;; Ubuntu 16.04 and with a freshly compiled ecl, built off of master.
(ext:lisp-implementation-vcs-id)
"0476bca326ede723e883fe06c19c2cb748b1deec"
(software-type)
"Linux"
(software-version)
"4.8.0-41-generic"
(machine-type)
"x86_64"
*features*
(:WALKER :CDR-1 :CDR-5 :LINUX :FORMATTER :CDR-7 :ECL-WEAK-HASH :LITTLE-ENDIAN
:ECL-READ-WRITE-LOCK :LONG-LONG :UINT64-T :UINT32-T :UINT16-T
:RELATIVE-PACKAGE-NAMES :LONG-FLOAT :UNICODE :DFFI :CLOS-STREAMS :CMU-FORMAT
:UNIX :ECL-PDE :DLOPEN :CLOS :THREADS :BOEHM-GC :ANSI-CL :COMMON-LISP
:IEEE-FLOATING-POINT :CDR-14 :PREFIXED-API :FFI :X86_64 :COMMON :ECL)
The underlying cause of the error seems to be conflicting usage of the boehm gc library. In particular, ecl tries to configure and initialize that library. However, in an embedded context the main application may also initialize the library. Multiple initializations do not seem to be a problem for gc in themselves; rather, issues arise when certain calls, such as GC_set_all_interior_pointers, are made after GC_init has been called and conflict with the main program's configuration. The specific scenario implemented in the code above results in the following sequence of conflicting gc calls:
GC_set_all_interior_pointers(1);
GC_init();
GC_set_all_interior_pointers(0);
For additional information on the API see: https://www.hboehm.info/gc/gcinterface.html and https://www.hboehm.info/gc/gc_source/gch.txt
One way of fixing the issue is simply to comment out ecl's initialization of the library, since the library doesn't necessarily need to be initialized, depending on the platform and features used. I've attached a patch that is an example of this approach. Some thoughts on a more complete solution:
- Attempt to detect whether the application in question has already called GC_init and, if it has, don't apply any additional configuration or call GC_init again. Unfortunately, there doesn't appear to be any way to do this through the documented interface, though there is an external variable declared in gc_priv.h: GC_is_initialized.
- Parameterize the build to suppress calling GC_init if a build variable is set.
- Only call GC_init if ecl is compiled in stand-alone mode.
- Introduce a dynamic setting in the runtime that allows you to suppress GC_init and any associated gc configuration when some flag is set or a parameter is passed to cl_boot.
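To make the last idea more concrete, here is a rough sketch of what a runtime flag could look like. It is only a toy model: the names boot_options, host_owns_gc and runtime_boot are hypothetical and are not part of ecl's API, and the collector calls are replaced by stubs so the snippet builds without libgc. A real implementation would call GC_set_all_interior_pointers and GC_INIT where the stubs are.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the collector so the sketch builds without libgc.
 * A real runtime would call GC_set_all_interior_pointers() and GC_INIT(). */
static int stub_gc_init_calls = 0;
static int stub_interior_pointers = -1;  /* -1: never configured */

static void stub_gc_set_all_interior_pointers(int value)
{
    stub_interior_pointers = value;
}

static void stub_gc_init(void)
{
    stub_gc_init_calls++;
}

/* Hypothetical boot options; NOT part of ecl's API. */
struct boot_options {
    bool host_owns_gc;  /* true: the embedding program already set up the collector */
};

/* Hypothetical boot routine: configure and initialize the collector
 * only when the host has not claimed ownership of it. */
static void runtime_boot(const struct boot_options *opts)
{
    assert(opts != NULL);
    if (!opts->host_owns_gc) {
        stub_gc_set_all_interior_pointers(0);
        stub_gc_init();
    }
    /* ... the rest of the runtime start-up would follow here ... */
}
```

With host_owns_gc set, the host's earlier configuration (interior pointers enabled, one initialization) is left untouched; without it, the runtime applies its own settings as it does today. Whether such a flag is best passed to cl_boot or set through some other option mechanism is left open here.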
0001-Commenting-out-gc-funcs-to-allow-ecl-be-embedded-in-.patch