DBG

Undefined Behavior Explained with Examples


This is a draft; please do not share it publicly yet. I would appreciate feedback, though.

It's probably undefined, but it appears to work as expected. Can we ship it?

No. Let's see what it takes to break it.

Prerequisite: Compiler optimizations

An ELI5 explanation of how compiler optimizations work, and a comparison of alias analysis in C++ and Rust.

Optimizations work by allowing the compiler to make more assumptions about your code. For example, in Rust, there can be only one active mutable reference to a given place in memory at a time. So when a function takes two mutable references, the compiler assumes they always point to different addresses. This means that after writing through one reference, the compiler doesn't have to reload values it has already read into registers through the other reference: the write cannot have changed them.

[Rust version](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:rust,selection:(endColumn:1,endLineNumber:9,positionColumn:1,positionLineNumber:9,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:'pub+fn+aa(a:+%26mut+i32,+b:+%26mut+i32)+-%3E+i32+%7B%0A++++let+mut+sum+%3D+*a+%2B+*b%3B%0A++++if+*a+%3E+10+%7B%0A++++++++*a+/%3D+2%3B%0A++++%7D%0A++++sum+%2B%3D+*b%3B%0A++++sum%0A%7D%0A'),l:'5',n:'0',o:'Rust+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:r1730,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:rust,libs:!(),options:'-Copt-level%3D3',overrides:!(),selection:(endColumn:12,endLineNumber:11,positionColumn:12,positionLineNumber:11,selectionStartColumn:12,selectionStartLineNumber:11,startColumn:12,startLineNumber:11),source:1),l:'5',n:'0',o:'+rustc+1.73.0+(Editor+%231)',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)

```rust
pub fn aa(a: &mut i32, b: &mut i32) -> i32 {
    let mut sum = *a + *b;
    if *a > 10 {
        *a /= 2;
    }
    sum += *b;
    sum
}
```

Compiled with rustc 1.73.0 and `-Copt-level=3`:

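```asm
example::aa:
        mov     eax, dword ptr [rdi]        # eax = *a
        mov     ecx, dword ptr [rsi]        # ecx = *b (read only once)
        cmp     eax, 10
        jle     .LBB0_2                     # skip the division if *a <= 10
        mov     edx, eax
        shr     edx                         # edx = *a / 2 (*a > 10, so a shift is exact)
        mov     dword ptr [rdi], edx        # store the new *a
.LBB0_2:
        lea     eax, [rax + 2*rcx]          # old *a + 2 * old *b; *b is never reloaded
        ret
```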
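To make the assembly easier to follow, here is a hand-written Rust function that mirrors what the optimized code does. The name `aa_as_optimized` is hypothetical, and this is a sketch of the compiler's reasoning, not actual compiler output: it reads `*a` and `*b` once, conditionally stores `*a / 2`, and returns the old `*a` plus the old `*b` twice. It is only equivalent to `aa` because two `&mut i32` arguments can never point to the same place.

```rust
/// The original function from the example above.
pub fn aa(a: &mut i32, b: &mut i32) -> i32 {
    let mut sum = *a + *b;
    if *a > 10 {
        *a /= 2;
    }
    sum += *b;
    sum
}

/// Hypothetical hand-written equivalent of the `-Copt-level=3` assembly.
/// Valid only because `a` and `b` are guaranteed not to alias.
pub fn aa_as_optimized(a: &mut i32, b: &mut i32) -> i32 {
    let a0 = *a; // mov eax, dword ptr [rdi]
    let b0 = *b; // mov ecx, dword ptr [rsi]: *b is read exactly once
    if a0 > 10 {
        *a = a0 / 2; // shr edx; mov dword ptr [rdi], edx
    }
    // lea eax, [rax + 2*rcx]: uses the cached b0, no reload of *b
    a0 + b0 + b0
}
```

For example, with `*a == 20` and `*b == 3`, both functions return 26 and leave `*a == 10`.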

[C++ version](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,selection:(endColumn:14,endLineNumber:6,positionColumn:14,positionLineNumber:6,selectionStartColumn:14,selectionStartLineNumber:6,startColumn:14,startLineNumber:6),source:'int+aa(int+*+a,+int+*+b)+%7B%0A++++int+sum+%3D+*a+%2B+*b%3B%0A++++if+(*a+%3E+10)+%7B%0A++++++++*a+/%3D+2%3B%0A++++%7D%0A++++sum+%2B%3D+*b%3B%0A++++return+sum%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang1701,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:c%2B%2B,libs:!(),options:'-O3',overrides:!(),selection:(endColumn:34,endLineNumber:10,positionColumn:34,positionLineNumber:10,selectionStartColumn:34,selectionStartLineNumber:10,startColumn:34,startLineNumber:10),source:1),l:'5',n:'0',o:'+x86-64+clang+17.0.1+(Editor+%231)',t:'0')),k:50,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)

```cpp
int aa(int * a, int * b) {
    int sum = *a + *b;
    if (*a > 10) {
        *a /= 2;
    }
    sum += *b;
    return sum;
}
```

Compiled with x86-64 clang 17.0.1 and `-O3`:

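```asm
aa(int*, int*):                              # @aa(int*, int*)
        mov     ecx, dword ptr [rdi]         # ecx = *a
        mov     eax, dword ptr [rsi]         # eax = *b
        mov     edx, eax                     # edx = *b (used if the branch is skipped)
        cmp     ecx, 11
        jl      .LBB0_2                      # skip the division if *a < 11, i.e. *a <= 10
        mov     edx, ecx
        shr     edx                          # edx = *a / 2
        mov     dword ptr [rdi], edx         # store the new *a
        mov     edx, dword ptr [rsi]         # reload *b: a and b might point to the same int
.LBB0_2:
        add     eax, ecx                     # eax = old *b + old *a
        add     eax, edx                     # eax += *b (reloaded if *a was written)
        ret
```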

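The difference is in how `*b` is used after the write to `*a`. The Rust version reads `*b` once and computes the result as the old `*a` plus twice the old `*b` with a single `lea`, because `a` and `b` cannot point to the same place. The C++ version has to assume that `a` and `b` may alias, so after storing to `*a` it reads `*b` from memory again (`mov edx, dword ptr [rsi]`).

More generally, unoptimized code may load from and store to memory every time a reference is used. Optimized code can load the values into registers at the beginning, do all computations using only registers (which is usually faster), and write the new values back to memory only at the end. But if the assumption is wrong (for example, someone used `unsafe` to mistakenly create two `&mut` references to the same address), the behavior can differ depending on the optimization level. In unoptimized code, a write through one reference is immediately visible when reading through the other, whereas optimized code keeps using the old value from a register because it never notices that the memory changed.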
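The sketch below contrasts aliasing that Rust allows with aliasing that is undefined behavior. It assumes two hypothetical helpers that are not part of the original example, `aa_raw` and `aliased_sum`. Raw pointers (`*mut i32`) carry no "no other reference" guarantee, so passing the same pointer twice is allowed, and the compiler must re-read `*b` after writing `*a`, just like the C++ version. Turning that pointer into two `&mut i32` at once, as in the commented-out line, is undefined behavior, and its result may depend on the optimization level.

```rust
/// Same logic as `aa`, but through raw pointers, which are allowed to alias.
///
/// # Safety
/// `a` and `b` must be valid, aligned pointers to initialized `i32`s that
/// nothing else accesses during the call. They MAY be equal.
pub unsafe fn aa_raw(a: *mut i32, b: *mut i32) -> i32 {
    unsafe {
        let mut sum = *a + *b;
        if *a > 10 {
            *a /= 2;
        }
        // The write above may have changed *b, so it is read again here.
        sum += *b;
        sum
    }
}

/// The original safe version from the example above.
pub fn aa(a: &mut i32, b: &mut i32) -> i32 {
    let mut sum = *a + *b;
    if *a > 10 {
        *a /= 2;
    }
    sum += *b;
    sum
}

/// Hypothetical helper: calls `aa_raw` with two pointers to the same i32.
/// This is well-defined. Returns (result, final value of the i32).
pub fn aliased_sum(start: i32) -> (i32, i32) {
    let mut x = start;
    let p: *mut i32 = &mut x;
    // For start = 20: 20 + 20 = 40, then *a becomes 10,
    // then the re-read *b (now 10) is added, giving 50.
    let sum = unsafe { aa_raw(p, p) };

    // Undefined behavior, do NOT do this: two live `&mut` to the same place.
    // let ub = unsafe { aa(&mut *p, &mut *p) };
    // Code like the -Copt-level=3 assembly above may return 20 + 2 * 20 = 60
    // here, while an unoptimized build would typically re-read memory and get 50.

    (sum, x)
}
```

For example, `aliased_sum(20)` returns `(50, 10)`: the raw-pointer version sees its own write through the aliased pointer, which is exactly the behavior the `&mut` version is allowed to skip.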