Undefined LLVM IR depending on array size



I am seeing undefined behaviour that depends on the size of a local array. For the following code:


    int wbExecute_simple(char nInput, char add_pattern)
    {
        char test_array[4] = { 0xa, 0xb, 0xc, 0xd };
        int i = 0;
        for (; i < 4; ++i)
        {
            test_array[i] ^= nInput;
        }
        return (test_array[nInput] + add_pattern);
    }
The LLVM IR generated for the first line (the array initialization) is:

    lbl_0_wb3954:
    %local_0_wb3954 = alloca [4 x i8], align 1
    %local_1_wb3954 = bitcast [4 x i8]* %local_0_wb3954 to i32*, !dbg !7
    %local_2_wb3954 = bitcast [4 x i8]* @global_0_wb3954 to i32*
    %local_3_wb3954 = load i32* %local_2_wb3954, align 1, !dbg !7
    store i32 %local_3_wb3954, i32* %local_1_wb3954, align 1, !dbg !7
    br label %lbl_1_wb3954, !dbg !8

An array of size 2 produces a similar result. However, changing the array size from 4 to 3, as below,


    int wbExecute_simple(char nInput, char add_pattern)
    {
        char test_array[3] = { 0xa, 0xb, 0xc };
        int i = 0;
        for (; i < 3; ++i)
        {
            test_array[i] ^= nInput;
        }
        return (test_array[nInput] + add_pattern);
    }

yields


    define i32 @wbExecute_simple(i8 signext %nInput, i8 signext %add_pattern) #0 {
        lbl_0_wb3954:
        %local_0_wb3954 = alloca [3 x i8], align 1
        %local_1_wb3954 = getelementptr inbounds [3 x i8]* %local_0_wb3954, i32 0, i32 0, !dbg !7
        %local_2_wb3954 = getelementptr [3 x i8]* @global_0_wb3954, i32 0, i32 0
        call void @llvm.memcpy.p0i8.p0i8.i32(i8* %local_1_wb3954, i8* %local_2_wb3954, i32 3, i32 1, i1 false), !dbg !7
        br label %lbl_1_wb3954, !dbg !8

Is there an optimisation flag that makes the LLVM IR the same for both array sizes?

I am not sure what you mean by undefined behaviour. This looks like a legitimate compiler optimization.

When the array has length 4, the compiler replaces the array copy with a copy of a single 32-bit integer, because an `i32` is also 4 bytes wide and can be moved with one load and one store. I assume that for size 2 it copies a 16-bit integer.

Your system probably does not support 24-bit integers, so the compiler decided not to apply this optimization for size 3 and kept the memcpy intrinsic. Without an "int24" type supported by the processor and memory system, this optimization would not make sense for a size-3 array. The compiler backend may still optimize the remaining memcpy intrinsic, depending on what makes sense on the target machine.

Here is the generated IR for size 4, with comments explaining what each instruction does:

    lbl_0_wb3954:
    ; allocate the local array
    %local_0_wb3954 = alloca [4 x i8], align 1
    ; cast the address of the local array to an i32 pointer
    %local_1_wb3954 = bitcast [4 x i8]* %local_0_wb3954 to i32*, !dbg !7
    ; cast the address of the constant array { 0xa, 0xb, 0xc, 0xd } to an i32 pointer
    %local_2_wb3954 = bitcast [4 x i8]* @global_0_wb3954 to i32*
    ; load the constant array as a single 32-bit integer
    %local_3_wb3954 = load i32* %local_2_wb3954, align 1, !dbg !7
    ; store that value into the local array
    store i32 %local_3_wb3954, i32* %local_1_wb3954, align 1, !dbg !7
    br label %lbl_1_wb3954, !dbg !8

I don't think this type of optimization can easily be forced for length 3.

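Passing -O2, however, fully unrolls the length-3 loop and then removes the static initialization in favor of folding the constants 0xa, 0xb and 0xc directly into the code. The code for size 4 looks similar. For length 3: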

    %3 = alloca [3 x i8], align 1
    ; compute the address of the first array element
    %4 = getelementptr inbounds [3 x i8], [3 x i8]* %3, i64 0, i64 0, !dbg !22
    ; compute 0xa ^ nInput (%0 is nInput)
    %5 = xor i8 %0, 10, !dbg !25
    ; store the result
    store i8 %5, i8* %4, align 1, !dbg !25, !tbaa !29
    ; do the same for the second and third elements
    %6 = getelementptr inbounds [3 x i8], [3 x i8]* %3, i64 0, i64 1, !dbg !32
    %7 = xor i8 %0, 11, !dbg !25
    store i8 %7, i8* %6, align 1, !dbg !25, !tbaa !29
    %8 = getelementptr inbounds [3 x i8], [3 x i8]* %3, i64 0, i64 2, !dbg !32
    %9 = xor i8 %0, 12, !dbg !25
    store i8 %9, i8* %8, align 1, !dbg !25, !tbaa !29
