C++ Boost base64 decoder fails when newlines are present



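When the base64 'text' being decoded contains newlines, the code below throws an exception complaining about a non-base64 character, namely the newline: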

terminate called after throwing an instance of 'boost::archive::iterators::dataflow_exception'
  what():  attempt to decode a value not in base64 char set

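Does anyone know how to tell Boost to handle newlines gracefully? I realize I could strip them from the string myself before decoding, but I'm hoping (and guessing) there is a leaner way.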

typedef transform_width<binary_from_base64<remove_whitespace<char*>>, 8, 6> base64_dec;
unsigned int size = s.size(); // 's' holds the base64 text, with a newline after every 76th character
std::string decoded_token(base64_dec(s.c_str()), base64_dec(s.c_str() + size));

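The newlines aren't really the problem. The problem is that Boost Archive's filter_iterator is fundamentally broken.

As soon as the input sequence ends in a character that doesn't satisfy the filter predicate (here, a whitespace character), it invokes undefined behavior: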

Live On Compiler Explorer

#include <boost/archive/iterators/remove_whitespace.hpp>
#include <iomanip>
#include <iostream>
#include <string>
namespace bai = boost::archive::iterators;

int main() {
    using It = bai::remove_whitespace<const char*>;
    std::string const s = "oops "; // ends in whitespace, causes UB

    std::string filtered(It(s.c_str()), It(s.c_str() + s.length()));
    std::cout << std::quoted(filtered) << std::flush;
}

Prints, with ASan enabled (without ASan it just segfaults):

=================================================================
==1==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffecf132b0 at pc 0x00000040390d bp 0x7fffecf12fd0 sp 0x7fffecf12fc8
READ of size 1 at 0x7fffecf132b0 thread T0
#0 0x40390c in dereference_impl /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:105
#1 0x40390c in dereference /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:113
#2 0x40390c in dereference<boost::archive::iterators::filter_iterator<(anonymous namespace)::remove_whitespace_predicate<char>, char const*> > /opt/compiler-explorer/libs/boost_1_78_0/boost/iterator/iterator_facade.hpp:550
#3 0x40390c in operator* /opt/compiler-explorer/libs/boost_1_78_0/boost/iterator/iterator_facade.hpp:656
#4 0x40390c in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<boost::archive::iterators::remove_whitespace<char const*> >(boost::archive::iterators::remove_whitespace<char const*>, boost::archive::iterators::remove_whitespace<char const*>, std::input_iterator_tag) /opt/compiler-explorer/gcc-trunk-20220419/include/c++/12.0.1/bits/basic_string.tcc:204
#5 0x40390c in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string<boost::archive::iterators::remove_whitespace<char const*>, void>(boost::archive::iterators::remove_whitespace<char const*>, boost::archive::iterators::remove_whitespace<char const*>, std::allocator<char> const&) /opt/compiler-explorer/gcc-trunk-20220419/include/c++/12.0.1/bits/basic_string.h:756
#6 0x40390c in main /app/example.cpp:11
#7 0x7fb625b560b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x240b2)
#8 0x4041ed in _start (/app/output.s+0x4041ed)
Address 0x7fffecf132b0 is located in stack of thread T0 at offset 576 in frame
#0 0x40245f in main /app/example.cpp:7
This frame has 21 object(s):
[32, 33) '<unknown>'
[48, 49) '<unknown>'
[64, 65) '<unknown>'
[80, 81) '<unknown>'
[96, 97) '<unknown>'
[112, 113) '<unknown>'
[128, 136) 'start'
[160, 168) 'start'
[192, 200) '__guard'
[224, 232) 'start'
[256, 264) 'start'
[288, 296) '__capacity'
[320, 328) '__guard'
[352, 368) '<unknown>'
[384, 400) '<unknown>'
[416, 432) '<unknown>'
[448, 464) '<unknown>'
[480, 496) '<unknown>'
[512, 528) '<unknown>'
[544, 576) 's' (line 9) <== Memory access at offset 576 overflows this variable
[608, 640) 'filtered' (line 11)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /opt/compiler-explorer/libs/boost_1_78_0/boost/archive/iterators/remove_whitespace.hpp:105 in dereference_impl
Shadow bytes around the buggy address:
0x10007d9da600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1
0x10007d9da610: f1 f1 f8 f2 01 f2 f8 f2 f8 f2 f8 f2 01 f2 00 f2
0x10007d9da620: f2 f2 00 f2 f2 f2 f8 f2 f2 f2 00 f2 f2 f2 00 f2
0x10007d9da630: f2 f2 00 f2 f2 f2 00 f2 f2 f2 00 00 f2 f2 00 00
0x10007d9da640: f2 f2 00 00 f2 f2 00 00 f2 f2 00 00 f2 f2 00 00
=>0x10007d9da650: f2 f2 00 00 00 00[f2]f2 f2 f2 00 00 00 00 f3 f3
0x10007d9da660: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da670: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da690: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10007d9da6a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable:           00
Partially addressable: 01 02 03 04 05 06 07 
Heap left redzone:       fa
Freed heap region:       fd
Stack left redzone:      f1
Stack mid redzone:       f2
Stack right redzone:     f3
Stack after return:      f5
Stack use after scope:   f8
Global redzone:          f9
Global init order:       f6
Poisoned by user:        f7
Container overflow:      fc
Array cookie:            ac
Intra object redzone:    bb
ASan internal:           fe
Left alloca redzone:     ca
Right alloca redzone:    cb
==1==ABORTING

Consider yourself lucky that you noticed a symptom at all, rather than having it eat your puppy or launch nukes in production code.

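This should be reported. There are no standalone tests for filter_iterator (or even remove_whitespace), and a previous ticket suggests a "works for me" stance (where "me" is the Boost Serialization library). See https://github.com/boostorg/serialization/issues/135.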

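Interestingly, the analysis on that ticket doesn't make sense, because Boost Archive's filter_iterator has not had a two-iterator constructor since, well, forever. I can only guess that Robert was mistakenly looking at the filter_iterator from Boost Iterator rather than the one in Boost Archive.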

So my instinct is to suggest using the filter_iterator from Boost Iterator (boost/iterator/filter_iterator.hpp) instead. Ironically, that took a few attempts (and a trip to cppslack/GitHub) to get right.

Fixing It With Boost Iterator's filter_iterator

We should be able to fix it by using a working filter_iterator implementation:

using FiltIt =
    boost::iterators::filter_iterator<IsGraph, std::string::const_iterator>;
using base64_dec = bai::transform_width<bai::binary_from_base64<FiltIt>, 8, 6>;

Even so, it is still tricky to get right. Notably, the naive approach simply fails with UB again:

// CAUTION: this invokes UB:
std::string filtered(base64_dec(s.begin()), base64_dec(s.end()));

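This is the curse of implicit conversions plus default arguments. base64_dec(s.begin()) turns the bare string iterator into a FiltIt through filter_iterator's (Iterator x, Iterator end = Iterator()) constructor. The filter therefore never learns where the input really ends. We have to construct the FiltIt end points explicitly and separately: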

FiltIt f(IsGraph{}, s.begin(), s.end()),   // !!
       l(f.predicate(), f.end(), f.end()); // !!

Now we can use f and l with base64_dec:
std::string filtered(base64_dec{f}, base64_dec{l});

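Note the uniform {} initialization of the base64_dec temporaries. It sidesteps the most vexing parse, which would otherwise turn std::string filtered(base64_dec(f), base64_dec(l)); into a function declaration.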

Live On Compiler Explorer

#include <boost/archive/iterators/binary_from_base64.hpp>
#include <boost/archive/iterators/transform_width.hpp>
#include <boost/iterator/filter_iterator.hpp>
#include <cctype>
#include <iomanip>
#include <iostream>
#include <string>
namespace bai = boost::archive::iterators;

// base64 of "hello world\n", with a line break in the middle
static std::string const s = "aGVsbG8g\nd29ybGQK";

int main() {
    std::cout << std::unitbuf;

    struct IsGraph {
        // unsigned char prevents sign extension
        bool operator()(unsigned char ch) const {
            return std::isgraph(ch); // rather than !std::isspace
        }
    };

    using FiltIt =
        boost::iterators::filter_iterator<IsGraph, std::string::const_iterator>;
    using base64_dec = bai::transform_width<bai::binary_from_base64<FiltIt>, 8, 6>;

    //// CAUTION: this invokes UB:
    // std::string filtered(base64_dec(s.begin()), base64_dec(s.end()));

    FiltIt f(IsGraph{}, s.begin(), s.end()),   // !!
           l(f.predicate(), f.end(), f.end()); // !!
    std::string filtered(base64_dec{f}, base64_dec{l});
    std::cout << "OUT:" << std::quoted(filtered) << std::endl;
}

Prints (with the sample base64 shown above):

OUT:"hello world
"

Summary / TL;DR

Given the latent bugs and undocumented limitations, consider using a correct, simpler base64 implementation.

Beast has one among its implementation details, so it is unsupported as well, but chances are it's at least less brittle.

Otherwise, pick a library that has proper tests and documentation.
