Boost Spirit: parsing without a skipper



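Consider a preprocessor that reads raw text (no significant whitespace or tokens).

There are three rules.

  • resolve_para_entry should resolve an argument inside a call. Top-level text is returned as a string.

  • resolve_para should parse the whole parameter list and put all top-level parameters into a string list.

  • resolve is the entry point.

Along the way I track the iterator and grab the text portion.

Samples:

  • sometext(para) → expect para in the string list

  • sometext(para1,para2) → expect para1 and para2 in the string list

  • sometext(call(a)) → expect call(a) in the string list

  • sometext(call(a,b)) ← here it fails; it seems that the "!lit(',')" won't make the parser step outside.

Rules: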

resolve_para_entry = +(  
(iter_pos >> lit('(') >> (resolve_para_entry | eps) >> lit(')') >> iter_pos) [_val=  phoenix::bind(&appendString, _val, _1,_3)]
| (!lit(',') >> !lit(')') >> !lit('(') >> (wide::char_ | wide::space))         [_val = phoenix::bind(&appendChar, _val, _1)]
);
resolve_para = (lit('(') >> lit(')'))[_val = std::vector<std::wstring>()]  // empty para -> old style
| (lit('(') >> resolve_para_entry >> *(lit(',') >> resolve_para_entry) > lit(')'))[_val = phoenix::bind(&appendStringList, _val, _1, _2)]
| eps;
;
resolve = (iter_pos >> name_valid >> iter_pos >> resolve_para >> iter_pos);

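In the end this does not look very elegant. Maybe there is a better way to parse such things without a skipper.

Indeed, this should be a lot simpler.

First off, I fail to see why the absence of a skipper is relevant at all.

Second, exposing the raw input is best done using qi::raw[] instead of juggling iter_pos and clumsy semantic actions.

Among the other observations I see:

  • negating a character set is done with ~, so e.g. ~char_(",()")
  • (p|eps) would be better spelled -p
  • (lit('(') >> lit(')')) could be just "()" (after all, there's no skipper, right)
  • p >> *(',' >> p) is equivalent to p % ','
  • With the above, resolve_para simplifies to this: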

    resolve_para = '(' >> -(resolve_para_entry % ',') >> ')';
    
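  • resolve_para_entry seems weird to me. It appears that any nested parentheses are simply swallowed. Why not actually parse a recursive grammar so you detect syntax errors?


Here's my take on it:

Define an AST

I prefer to make this the first step because it helps me think about the parser productions: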

namespace Ast {
    using ArgList = std::list<std::string>;

    struct Resolve {
        std::string name;
        ArgList     arglist;
    };

    using Resolves = std::vector<Resolve>;
}

Creating the grammar rules

qi::rule<It, Ast::Resolves()> start;
qi::rule<It, Ast::Resolve()>  resolve;
qi::rule<It, Ast::ArgList()>  arglist;
qi::rule<It, std::string()>   arg, identifier;

And their definitions:

identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
arg        = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
arglist    = '(' >> -(arg % ',') >> ')';
resolve    = identifier >> arglist;
start      = *qr::seek[hold[resolve]];
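
To make it easier to see what arg and arglist accept, here is a hand-written recursive-descent sketch in plain C++17 that mirrors those two rules. It is not Spirit code and not how the library implements them; the names match_arg and match_arglist are made up for illustration, and the greedy, non-backtracking behaviour is my reading of the rules above.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for:  arg = raw[ +( '(' >> -arg >> ')' | +~char_(",)(") ) ]
// Advances pos past one argument and returns true; leaves pos untouched on failure.
bool match_arg(std::string_view in, std::size_t& pos) {
    std::size_t const start = pos;
    for (;;) {
        std::size_t const before = pos;
        if (pos < in.size() && in[pos] == '(') {
            std::size_t p = pos + 1;
            (void)match_arg(in, p);                  // -arg: optional nested argument
            if (p < in.size() && in[p] == ')')
                pos = p + 1;                         // a balanced group was consumed
        } else {
            while (pos < in.size() && in[pos] != ',' && in[pos] != '(' && in[pos] != ')')
                ++pos;                               // +~char_(",)(")
        }
        if (pos == before) break;                    // neither alternative made progress
    }
    return pos != start;
}

// Hypothetical stand-in for:  arglist = '(' >> -(arg % ',') >> ')'
// Returns the arguments and advances pos on success; returns nullopt otherwise.
std::optional<std::vector<std::string>> match_arglist(std::string_view in, std::size_t& pos) {
    std::size_t p = pos;
    if (p >= in.size() || in[p] != '(') return std::nullopt;
    ++p;

    std::vector<std::string> args;
    std::size_t q = p;
    if (match_arg(in, q)) {                          // first element of arg % ','
        args.emplace_back(in.substr(p, q - p));
        p = q;
        while (p < in.size() && in[p] == ',') {
            q = p + 1;
            std::size_t const arg_start = q;
            if (!match_arg(in, q)) break;            // a dangling ',' is not consumed
            args.emplace_back(in.substr(arg_start, q - arg_start));
            p = q;
        }
    }
    if (p >= in.size() || in[p] != ')') return std::nullopt;
    pos = p + 1;
    return args;
}
```

With this sketch, "(call(a),b)" yields the two arguments call(a) and b, while "(call(a,b))" is rejected: the inner group is not a single balanced argument because of the comma, so the outer list never reaches its closing parenthesis.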

Notes:

  • No more semantic actions
  • No more eps
  • No more iter_pos
  • I've opted to make arglist non-optional. If you really want that, change it back:

    resolve    = identifier >> -arglist;
    

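    But in our sample that would generate a lot of noisy output.

  • Of course, your entry point (start) will be different. I just did the simplest thing that could possibly work, using another handy parser directive from the Spirit Repository (like the iter_pos you were already using): seek[]

  • The hold[] is there because of the issue described in "boost::spirit::qi duplicate parsing on the output". You might not need it in your actual parser.

Live On Coliru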

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <iostream>
#include <list>
#include <string>
#include <vector>

namespace Ast {
    using ArgList = std::list<std::string>;

    struct Resolve {
        std::string name;
        ArgList     arglist;
    };

    using Resolves = std::vector<Resolve>;
}

BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)

namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;

template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
    Parser() : Parser::base_type(start) {
        using namespace qi;

        identifier = char_("a-zA-Z_") >> *char_("a-zA-Z0-9_");
        // an argument is raw text in which any parentheses are balanced
        arg        = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
        arglist    = '(' >> -(arg % ',') >> ')';
        resolve    = identifier >> arglist;
        // skip ahead to every position where a resolve matches
        start      = *qr::seek[hold[resolve]];
    }

  private:
    qi::rule<It, Ast::Resolves()> start;
    qi::rule<It, Ast::Resolve()>  resolve;
    qi::rule<It, Ast::ArgList()>  arglist;
    qi::rule<It, std::string()>   arg, identifier;
};

int main() {
    using It = std::string::const_iterator;
    std::string const samples = R"--(
Samples:
sometext(para)        → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a))     → expect call(a) in the string list
sometext(call(a,b))   ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
    It f = samples.begin(), l = samples.end();

    Ast::Resolves data;
    if (parse(f, l, Parser<It>{}, data)) {
        std::cout << "Parsed " << data.size() << " resolves\n";
    } else {
        std::cout << "Parsing failed\n";
    }

    // print every extracted call with its arguments, one per line
    for (auto& resolve: data) {
        std::cout << " - " << resolve.name << "\n   (\n";
        for (auto& arg : resolve.arglist) {
            std::cout << "       " << arg << "\n";
        }
        std::cout << "   )\n";
    }
}

Prints

Parsed 6 resolves
 - sometext
   (
       para
   )
 - sometext
   (
       para1
       para2
   )
 - sometext
   (
       call(a)
   )
 - call
   (
       a
   )
 - call
   (
       a
       b
   )
 - lit
   (
       '
       '
   )

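More ideas

That last output shows a problem with your current grammar: lit(',') should obviously not be seen as a call with two parameters.

I recently did an answer on extracting (nested) function calls with parameters which might do things more neatly:

  • Boost spirit parse rule is not applied
  • or this one: boost spirit reporting semantic error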
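
As a rough illustration of the lit(',') problem, here is a plain C++17 sketch (not Spirit, and not taken from the linked answers) that splits the text between a call's outer parentheses while treating single-quoted literals as opaque. The helper name split_args and the quoting rule (single quotes, no escape sequences) are assumptions made for this example only.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: split `body` on commas that are neither nested inside
// parentheses nor inside a '...' literal. Assumes single quotes without escapes
// and does not check that parentheses or quotes are balanced.
std::vector<std::string> split_args(std::string_view body) {
    std::vector<std::string> args;
    std::string current;
    int  depth  = 0;      // nesting level of parentheses outside quotes
    bool quoted = false;  // true while inside a '...' literal

    for (char c : body) {
        if (c == '\'')                 quoted = !quoted;
        else if (!quoted && c == '(')  ++depth;
        else if (!quoted && c == ')')  --depth;

        if (c == ',' && !quoted && depth == 0) {
            args.push_back(current);   // top-level separator: finish this argument
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty() || !args.empty())
        args.push_back(current);       // last argument (none for an empty body)
    return args;
}
```

Under these assumptions, split_args("','") gives a single argument ',' instead of two stray quote characters. In the Spirit grammar the corresponding change would be an extra alternative in arg for quoted literals, which is left out here.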

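Bonus

Bonus version that uses string_view and also shows exact line/column information for all extracted words.

Note that it still doesn't require any phoenix or semantic actions. Instead it simply defines the necessary trait to assign to boost::string_view from an iterator range.

Live On Coliru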

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/repository/include/qi_seek.hpp>
#include <boost/utility/string_view.hpp>
#include <algorithm>
#include <iostream>
#include <list>
#include <string>
#include <vector>

namespace Ast {
    using Source  = boost::string_view;
    using ArgList = std::list<Source>;

    struct Resolve {
        Source  name;
        ArgList arglist;
    };

    using Resolves = std::vector<Resolve>;
}

BOOST_FUSION_ADAPT_STRUCT(Ast::Resolve, name, arglist)

// Teach Spirit how to assign an iterator range to a boost::string_view
namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<boost::string_view, It, void> {
        static void call(It f, It l, boost::string_view& attr) {
            // base() is not a standard member of std::string::const_iterator;
            // this relies on the standard library exposing it (as libstdc++ does)
            attr = boost::string_view { f.base(), size_t(std::distance(f.base(),l.base())) };
        }
    };
} } }

namespace qi = boost::spirit::qi;
namespace qr = boost::spirit::repository::qi;

template <typename It>
struct Parser : qi::grammar<It, Ast::Resolves()>
{
    Parser() : Parser::base_type(start) {
        using namespace qi;

        // raw[] exposes the matched range, which the trait above turns into a view
        identifier = raw [ char_("a-zA-Z_") >> *char_("a-zA-Z0-9_") ];
        arg        = raw [ +('(' >> -arg >> ')' | +~char_(",)(")) ];
        arglist    = '(' >> -(arg % ',') >> ')';
        resolve    = identifier >> arglist;
        start      = *qr::seek[hold[resolve]];
    }

  private:
    qi::rule<It, Ast::Resolves()> start;
    qi::rule<It, Ast::Resolve()>  resolve;
    qi::rule<It, Ast::ArgList()>  arglist;
    qi::rule<It, Ast::Source()>   arg, identifier;
};

// Prints a fragment together with its 1-based line and column within context
struct Annotator {
    using Ref = boost::string_view;

    struct Manip {
        Ref fragment, context;

        friend std::ostream& operator<<(std::ostream& os, Manip const& m) {
            return os << "[" << m.fragment << " at line:" << m.line() << " col:" << m.column() << "]";
        }

        // number of newlines before the fragment, plus one
        size_t line() const {
            return 1 + std::count(context.begin(), fragment.begin(), '\n');
        }
        // byte offset of the fragment within its line, plus one
        size_t column() const {
            return 1 + (fragment.begin() - start_of_line().begin());
        }
        // the context from the start of the fragment's line onwards
        Ref start_of_line() const {
            return context.substr(context.substr(0, fragment.begin()-context.begin()).find_last_of('\n') + 1);
        }
    };

    Ref context;

    Manip operator()(Ref what) const { return {what, context}; }
};

int main() {
    using It = std::string::const_iterator;
    std::string const samples = R"--(
Samples:
sometext(para)        → expect para in the string list
sometext(para1,para2) → expect para1 and para2 in string list
sometext(call(a))     → expect call(a) in the string list
sometext(call(a,b))   ← here it fails; it seams that the "!lit(',')" wont make the parser step outside
)--";
    It f = samples.begin(), l = samples.end();

    Ast::Resolves data;
    if (parse(f, l, Parser<It>{}, data)) {
        std::cout << "Parsed " << data.size() << " resolves\n";
    } else {
        std::cout << "Parsing failed\n";
    }

    Annotator annotate{samples};

    for (auto& resolve: data) {
        std::cout << " - " << annotate(resolve.name) << "\n   (\n";
        for (auto& arg : resolve.arglist) {
            std::cout << "       " << annotate(arg) << "\n";
        }
        std::cout << "   )\n";
    }
}

Prints

Parsed 6 resolves
 - [sometext at line:3 col:1]
   (
       [para at line:3 col:10]
   )
 - [sometext at line:4 col:1]
   (
       [para1 at line:4 col:10]
       [para2 at line:4 col:16]
   )
 - [sometext at line:5 col:1]
   (
       [call(a) at line:5 col:10]
   )
 - [call at line:5 col:34]
   (
       [a at line:5 col:39]
   )
 - [call at line:6 col:10]
   (
       [a at line:6 col:15]
       [b at line:6 col:17]
   )
 - [lit at line:6 col:62]
   (
       [' at line:6 col:66]
       [' at line:6 col:68]
   )
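
The line/column arithmetic that Annotator performs can also be looked at in isolation. The following is a minimal sketch using std::string_view from C++17 instead of boost::string_view; the function name line_col and the use of a byte offset as input are assumptions for this example and are not part of the program above.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical helper: 1-based {line, column} of the byte at `offset` in `text`.
// Assumes offset <= text.size(). Columns count bytes, not characters.
std::pair<std::size_t, std::size_t> line_col(std::string_view text, std::size_t offset) {
    std::string_view const before = text.substr(0, offset);

    // line = number of newlines preceding the offset, plus one
    std::size_t const line =
        1 + static_cast<std::size_t>(std::count(before.begin(), before.end(), '\n'));

    // column = distance from the character after the last preceding newline, plus one
    std::size_t const nl         = before.rfind('\n');
    std::size_t const line_start = (nl == std::string_view::npos) ? 0 : nl + 1;

    return {line, 1 + offset - line_start};
}
```

Because columns are counted in bytes, a multi-byte UTF-8 character such as the arrow in the samples occupies three columns. That is why the second call on line 5 is reported at col:34 in the output above.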

Boost Spirit: "Semantic actions are evil"?
