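I want to extract the sentences from a text. The text is full of paragraphs and of !, , and . characters, or any other line delimiters. I can do this with regular expressions, but I want a solution that does not depend on a regex library. Is there a C++ class that splits text into sentences?
Otherwise, the alternative would be to compare each character against the delimiter characters, but I don't know how to do that with a vector. Any help is appreciated.
Here is a version that works with regular expressions: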
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/regex.hpp>

int main()
{
    /* Input. */
    std::string input = "Here is a short sentence. Here is another one. And we say \"this is the final one.\", which is another example.";
    /* Define sentence boundaries. mod_x ignores unescaped whitespace in the pattern. */
    boost::regex re("(?: [\\.\\!\\?]\\s+" // case 1: punctuation followed by whitespace
                    "|   \\.\\\",?\\s+"   // case 2: end of quotation
                    "|   \\s+\\\")",      // case 3: start of quotation
                    boost::regex::perl | boost::regex::mod_x);
    /* Iterate through sentences: submatch -1 selects the text between boundaries. */
    boost::sregex_token_iterator it(begin(input), end(input), re, -1);
    boost::sregex_token_iterator endit;
    /* Copy them onto a vector. */
    std::vector<std::string> vec;
    std::copy(it, endit, std::back_inserter(vec));
    /* Output the vector, one sentence per line, so we can check. */
    std::copy(begin(vec), end(vec),
              std::ostream_iterator<std::string>(std::cout, "\n"));
    return 0;
}
Using a brute-force approach... I hope I understood your requirements correctly...
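#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::string input = "Here is a short sentence. Here is another one. And we say \"this is the final one.\", which is another example.";
    std::vector<std::string> sentences;
    std::string current;
    std::size_t i = 0;
    while (i < input.length())
    {
        current += input[i];
        if (input[i] == '"')
        {
            /* Copy quoted text verbatim so punctuation inside the quotes does not end the sentence. */
            std::size_t j = i + 1;
            while (j < input.length() && input[j] != '"')
            {
                current += input[j];
                j++;
            }
            if (j < input.length())
                current += input[j]; /* closing quote, if there is one */
            i = j; /* the i++ below moves past the closing quote */
        }
        else if (input[i] == '.' || input[i] == '!' || input[i] == '?')
        {
            /* Sentence terminator: store the sentence and start a new one. */
            sentences.push_back(current);
            current = "";
        }
        i++;
    }
    /* Keep any trailing text that has no terminating punctuation. */
    if (!current.empty())
        sentences.push_back(current);
    for (i = 0; i < sentences.size(); i++)
    {
        std::cout << i << " -> " << sentences[i] << std::endl;
    }
}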
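If the special handling of quotes is not needed, the character-by-character comparison can probably be left to std::string::find_first_of, which searches for the next character that belongs to a given set. The following is only a minimal sketch under that assumption. The helper name split_sentences and its terminators parameter are made up for illustration and are not part of any library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): splits `text` after each
// terminator character. Unlike the loop above, quotes get no special treatment.
std::vector<std::string> split_sentences(const std::string& text,
                                         const std::string& terminators = ".!?")
{
    std::vector<std::string> sentences;
    std::size_t start = 0;
    while (start < text.size())
    {
        // Index of the next terminator at or after `start`, or npos if none is left.
        const std::size_t end = text.find_first_of(terminators, start);
        if (end == std::string::npos)
        {
            // Trailing text without a terminator still counts as a sentence.
            sentences.push_back(text.substr(start));
            break;
        }
        // Include the terminator itself in the sentence.
        sentences.push_back(text.substr(start, end - start + 1));
        start = end + 1;
    }
    return sentences;
}
```

Calling split_sentences(input) returns a std::vector<std::string> that can be printed with the same output loop as above. Each piece keeps the whitespace that followed the previous terminator.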
Obviously it needs further refinement, such as removing repeated spaces...
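The whitespace cleanup mentioned above could look roughly like the sketch below. It assumes that "removing repeated spaces" means collapsing every run of whitespace into a single space and dropping whitespace at both ends of a sentence. The name normalize_spaces is hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: collapses each run of whitespace into one space and
// removes leading and trailing whitespace.
std::string normalize_spaces(const std::string& s)
{
    std::string out;
    bool pending_space = false;
    for (unsigned char c : s)
    {
        if (std::isspace(c))
        {
            // Remember the gap, but emit it only if more text follows.
            // Leading whitespace is ignored because `out` is still empty.
            pending_space = !out.empty();
        }
        else
        {
            if (pending_space)
                out += ' ';
            out += static_cast<char>(c);
            pending_space = false;
        }
    }
    return out;
}
```

It could be applied to each sentence just before the push_back, for example sentences.push_back(normalize_spaces(current)).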