I have an R package with compiled C code that has been relatively stable for quite a while and is frequently tested against a broad variety of platforms and compilers (Windows/OS X/Debian/Fedora, gcc/clang).
More recently, a new platform was added to test the package against:
Logs from checks with gcc trunk aka 10.0.1 compiled from source on Fedora 30. (For some archived packages, 10.0.0.)

    x86_64 Fedora 30 Linux
    FFLAGS="-g -O2 -mtune=native -Wall -fallow-argument-mismatch"
    CFLAGS="-g -O2 -Wall -pedantic -mtune=native -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -fstack-clash-protection -fcf-protection"
    CXXFLAGS="-g -O2 -Wall -pedantic -mtune=native -Wno-ignored-attributes -Wno-deprecated-declarations -Wno-parentheses -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -fstack-clash-protection -fcf-protection"

At that point the compiled code promptly started segfaulting along these lines:
    *** caught segfault ***
    address 0x1d00000001, cause 'memory not mapped'

I've been able to reproduce the segfault consistently using the rocker/r-base Docker container with gcc-10.0.1 at optimization level -O2. Compiling at a lower optimization level makes the problem go away. No other setup shows any problems at all, including valgrind (at both -O0 and -O2) and UBSAN (gcc and clang). I'm also reasonably sure this ran fine under gcc-10.0.0, but I don't have the data.
I ran the gcc-10.0.1 -O2 version with gdb and noticed something that seems odd to me:
While stepping through the highlighted section, it appears that the initialization of the second element of each array is skipped (R_alloc is a wrapper around malloc whose memory is garbage-collected automatically when control returns to R; the segfault happens before control returns to R). Later, the program crashes when the uninitialized element (in the gcc-10.0.1 -O2 build) is accessed.
I worked around this by explicitly initializing the element in question everywhere in the code that eventually led to its use, but it really should already have been initialized to an empty string, or at least that's what I would have assumed.
Am I missing something obvious or doing something stupid? Both are reasonably likely, as C is by far my second language. It's just strange that this only cropped up now, and I can't figure out what the compiler is trying to do.
UPDATE: Instructions to reproduce this follow, although they will only work as long as the debian:testing Docker image ships gcc-10 at version 10.0.1. Also, don't run these commands blindly if you don't trust me.
Sorry, this is not a minimal reproducible example.
    docker pull rocker/r-base
    docker run --rm -ti --security-opt seccomp=unconfined \
      rocker/r-base /bin/bash

    apt-get update
    apt-get install gcc-10 gdb
    gcc-10 --version  # confirm 10.0.1
    # gcc-10 (Debian 10-20200222-1) 10.0.1 20200222 (experimental)
    # [master revision 01af7e0a0c2:487fe13f218:e99b18cf7101f205bfdd9f0f29ed51caaec52779]

    mkdir ~/.R
    touch ~/.R/Makevars
    echo "CC = gcc-10
    CFLAGS = -g -O2 -Wall -pedantic -mtune=native -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong -fstack-clash-protection -fcf-protection
    " >> ~/.R/Makevars

    R -d gdb --vanilla

Then in the R console, after typing run to get gdb to run the program:
    f.dl <- tempfile()
    f.uz <- tempfile()

    github.url <- 'https://github.com/brodieG/vetr/archive/v0.2.8.zip'
    download.file(github.url, f.dl)
    unzip(f.dl, exdir=f.uz)
    install.packages(
      file.path(f.uz, 'vetr-0.2.8'), repos=NULL,
      INSTALL_opts="--install-tests", type='source'
    )
    # minimal set of commands to segfault
    library(vetr)
    alike(pairlist(a=1, b="character"), pairlist(a=1, b=letters))
    alike(pairlist(1, "character"), pairlist(1, letters))
    alike(NULL, 1:3)                  # not a wild card at top level
    alike(list(NULL), list(1:3))      # but yes when nested
    alike(list(NULL, NULL), list(list(list(1, 2, 3)), 1:25))
    alike(list(NULL), list(1, 2))
    alike(list(), list(1, 2))
    alike(matrix(integer(), ncol=7), matrix(1:21, nrow=3))
    alike(matrix(character(), nrow=3), matrix(1:21, nrow=3))
    alike(
      matrix(integer(), ncol=3, dimnames=list(NULL, c("R", "G", "B"))),
      matrix(1:21, ncol=3, dimnames=list(NULL, c("R", "G", "B")))
    )
    # Adding tests from docs
    mx.tpl <- matrix(
      integer(), ncol=3, dimnames=list(row.id=NULL, c("R", "G", "B"))
    )
    mx.cur <- matrix(
      sample(0:255, 12), ncol=3, dimnames=list(row.id=1:4, rgb=c("R", "G", "B"))
    )
    mx.cur2 <- matrix(sample(0:255, 12), ncol=3, dimnames=list(1:4, c("R", "G", "B")))

    alike(mx.tpl, mx.cur2)

Inspecting in gdb pretty quickly shows (if I understand correctly) that CSR_strmlen_x is trying to access the string that was not initialized.
UPDATE 2: This is a highly recursive function, and on top of that the string initialization code gets called many, many times. This is mostly because I was being lazy: we only need the strings initialized for the one time we actually encounter something we want to report in the recursion, but it was easier to initialize them every time it is possible to encounter something. I mention this because what you'll see next shows multiple initializations, but only one of them (presumably the one with address <0x1400000001>) is being used.
I can't guarantee that what I'm showing here is directly related to the element that caused the segfault (though it is the same illegal address access), but, as @nate-eldredge asked, it does show that the array element is not initialized either just before the return or just after it in the calling function. Note that the calling function initializes 8 of these; I show them all, and all of them are filled with either garbage or inaccessible memory.
UPDATE 3: disassembly of the function in question:
    Breakpoint 1, ALIKEC_res_strings_init () at alike.c:75
    75        return res;
    (gdb) p res.current[0]
    $1 = 0x7ffff46a0aa5 "%s%s%s%s"
    (gdb) p res.current[1]
    $2 = 0x1400000001 <error: Cannot access memory at address 0x1400000001>
    (gdb) disas /m ALIKEC_res_strings_init
    Dump of assembler code for function ALIKEC_res_strings_init:
    53      struct ALIKEC_res_strings ALIKEC_res_strings_init() {
       0x00007ffff4687fc0 <+0>:     endbr64

    54        struct ALIKEC_res_strings res;
    55
    56        res.target = (const char **) R_alloc(5, sizeof(const char *));
       0x00007ffff4687fc4 <+4>:     push   %r12
       0x00007ffff4687fc6 <+6>:     mov    $0x8,%esi
       0x00007ffff4687fcb <+11>:    mov    %rdi,%r12
       0x00007ffff4687fce <+14>:    push   %rbx
       0x00007ffff4687fcf <+15>:    mov    $0x5,%edi
       0x00007ffff4687fd4 <+20>:    sub    $0x8,%rsp
       0x00007ffff4687fd8 <+24>:    callq  0x7ffff4687180 <R_alloc@plt>
       0x00007ffff4687fdd <+29>:    mov    $0x8,%esi
       0x00007ffff4687fe2 <+34>:    mov    $0x5,%edi
       0x00007ffff4687fe7 <+39>:    mov    %rax,%rbx

    57        res.current = (const char **) R_alloc(5, sizeof(const char *));
       0x00007ffff4687fea <+42>:    callq  0x7ffff4687180 <R_alloc@plt>

    58
    59        res.target[0] = "%s%s%s%s";
       0x00007ffff4687fef <+47>:    lea    0x1764a(%rip),%rdx        # 0x7ffff469f640
       0x00007ffff4687ff6 <+54>:    lea    0x18aa8(%rip),%rcx        # 0x7ffff46a0aa5
       0x00007ffff4687ffd <+61>:    mov    %rcx,(%rbx)

    60        res.target[1] = "";
    61        res.target[2] = "";
       0x00007ffff4688000 <+64>:    mov    %rdx,0x10(%rbx)

    62        res.target[3] = "";
       0x00007ffff4688004 <+68>:    mov    %rdx,0x18(%rbx)

    63        res.target[4] = "";
       0x00007ffff4688008 <+72>:    mov    %rdx,0x20(%rbx)

    64
    65        res.tar_pre = "be";
    66
    67        res.current[0] = "%s%s%s%s";
       0x00007ffff468800c <+76>:    mov    %rax,0x8(%r12)
       0x00007ffff4688011 <+81>:    mov    %rcx,(%rax)

    68        res.current[1] = "";
    69        res.current[2] = "";
       0x00007ffff4688014 <+84>:    mov    %rdx,0x10(%rax)

    70        res.current[3] = "";
       0x00007ffff4688018 <+88>:    mov    %rdx,0x18(%rax)

    71        res.current[4] = "";
       0x00007ffff468801c <+92>:    mov    %rdx,0x20(%rax)

    72
    73        res.cur_pre = "is";
    74
    75        return res;
    => 0x00007ffff4688020 <+96>:    lea    0x14fe0(%rip),%rax        # 0x7ffff469d007
       0x00007ffff4688027 <+103>:   mov    %rax,0x10(%r12)
       0x00007ffff468802c <+108>:   lea    0x14fcd(%rip),%rax        # 0x7ffff469d000
       0x00007ffff4688033 <+115>:   mov    %rbx,(%r12)
       0x00007ffff4688037 <+119>:   mov    %rax,0x18(%r12)
       0x00007ffff468803c <+124>:   add    $0x8,%rsp
       0x00007ffff4688040 <+128>:   pop    %rbx
       0x00007ffff4688041 <+129>:   mov    %r12,%rax
       0x00007ffff4688044 <+132>:   pop    %r12
       0x00007ffff4688046 <+134>:   retq
       0x00007ffff4688047:          nopw   0x0(%rax,%rax,1)

    End of assembler dump.

UPDATE 4:
So, trying to parse through the standard, here are the parts that seem relevant (C11 draft):
6.3.2.3 Par7 Conversions > Other Operands > Pointers
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned 68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
6.5 Par6 Expressions
The effective type of an object for an access to its stored value is the declared type of the object, if any. 87) If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access and for subsequent accesses that do not modify the stored value. If a value is copied into an object having no declared type using memcpy or memmove, or is copied as an array of character type, then the effective type of the modified object for that access and for subsequent accesses that do not modify the value is the effective type of the object from which the value is copied, if it has one. For all other accesses to an object having no declared type, the effective type of the object is simply the type of the lvalue used for the access.
87) Allocated objects have no declared type.
IIUC, R_alloc returns an offset into a malloc'ed block that is guaranteed to be double-aligned, and the block after the offset has the requested size (there is also space before the offset for R-specific data). R_alloc casts that pointer to (char *) on return.
Section 6.2.5 Par 29
A pointer to void shall have the same representation and alignment requirements as a pointer to a character type. 48) Similarly, pointers to qualified or unqualified versions of compatible types shall have the same representation and alignment requirements. All pointers to structure types shall have the same representation and alignment requirements as each other. All pointers to union types shall have the same representation and alignment requirements as each other. Pointers to other types need not have the same representation or alignment requirements.

48) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
So the question is: "are we allowed to recast the (char *) to (const char **) and write to it as (const char **)?" My reading of the above is that as long as pointers on the systems the code runs on have alignment requirements compatible with double alignment, it's okay.
Are we violating "strict aliasing"? i.e.:
6.5 Par 7
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 88)
— a type compatible with the effective type of the object ...
88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
So, what should the compiler think the effective type of the object pointed to by res.target (or res.current) is? Presumably the declared type (const char **), or is this actually ambiguous? It feels to me that it isn't in this case only because there is no other 'lvalue' in scope that accesses the same object.
I'll admit I'm struggling mightily to extract sense from these sections of the standard.


-mtune=native optimizes for the particular CPU that your machine has. That will be different for different testers and may be part of the issue. If you run the compilation with -v you should be able to see which CPU family that is on your machine (e.g. -mtune=skylake on my computer).
disassemble instruction inside gdb.