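I have a Chinese news feed, and I want to split its sentences into smaller chunks to pass to an API.
How can I do this on iOS? For English I set the chunk length to 50 characters.
Currently I'm using rangeOfString: to look for question marks, exclamation marks, commas, periods and spaces as break points: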
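// final: the full text (assumes final.length >= MAX_CHAR_Private, otherwise substringToIndex: throws)
// remaining: holds the unprocessed tail (manual reference counting, hence retain)
NSString *str = [final substringToIndex:MAX_CHAR_Private];   // first MAX_CHAR_Private characters
NSString *rem = [final substringFromIndex:MAX_CHAR_Private]; // everything after them

// Find the first break character after the cut, in order of preference (ASCII characters only).
NSRange rng = NSMakeRange(NSNotFound, 0);
for (NSString *delimiter in @[@"?", @"!", @",", @".", @" "]) {
    rng = [rem rangeOfString:delimiter];
    if (rng.location != NSNotFound) {
        break;
    }
}
// If that break would push the chunk past the hard limit, fall back to the next space.
if (rng.location == NSNotFound ||
    rng.location + 1 + MAX_CHAR_Private > MAXIMUM_LIMIT_Private) {
    rng = [rem rangeOfString:@" "];
}
if (rng.location == NSNotFound) {
    // No break character found: cut at exactly MAX_CHAR_Private.
    remaining = [[final substringFromIndex:MAX_CHAR_Private] retain];
}
else {
    // Extend the chunk up to (not including) the break character, then skip it.
    str = [str stringByAppendingString:[rem substringToIndex:rng.location]];
    remaining = [[final substringFromIndex:MAX_CHAR_Private + rng.location + 1] retain];
}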
This doesn't work properly for Chinese and Japanese text.
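A likely cause is that Chinese text usually ends clauses and sentences with full-width marks such as 。，！？ rather than ASCII . , ! ?, and it doesn't put spaces between words. As a result, none of the rangeOfString: lookups above match. The following is a minimal sketch, not the asker's code, of one way to handle this. ChunkText and its delimiter set are hypothetical choices you should adjust for your feed. The sketch searches backwards inside each window of maxLength UTF-16 units for either ASCII or full-width punctuation. The original code extends past the limit to the next delimiter; this version cuts back to the last delimiter, so no chunk exceeds maxLength. When a window contains no delimiter, it hard-cuts on a composed-character boundary so surrogate pairs are never split.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits text into chunks of at most maxLength UTF-16
// units, preferring to break right after a delimiter. The delimiter set
// (ASCII plus full-width CJK punctuation, plus space) is an assumption.
static NSArray<NSString *> *ChunkText(NSString *text, NSUInteger maxLength) {
    NSCharacterSet *delimiters =
        [NSCharacterSet characterSetWithCharactersInString:@".,!?;: 。，、！？；："];
    NSMutableArray<NSString *> *chunks = [NSMutableArray array];
    NSUInteger start = 0;
    while (start < text.length) {
        if (text.length - start <= maxLength) {
            // The rest fits in one chunk.
            [chunks addObject:[text substringFromIndex:start]];
            break;
        }
        // Last delimiter inside the current window, searching backwards.
        NSRange hit = [text rangeOfCharacterFromSet:delimiters
                                            options:NSBackwardsSearch
                                              range:NSMakeRange(start, maxLength)];
        NSUInteger end;
        if (hit.location != NSNotFound) {
            end = NSMaxRange(hit);  // keep the delimiter with this chunk
        } else {
            // No delimiter: hard cut, moved back to a composed-character boundary.
            end = [text rangeOfComposedCharacterSequenceAtIndex:start + maxLength].location;
            if (end <= start) {
                // A single composed sequence is longer than the window: take it whole.
                end = NSMaxRange([text rangeOfComposedCharacterSequenceAtIndex:start]);
            }
        }
        [chunks addObject:[text substringWithRange:NSMakeRange(start, end - start)]];
        start = end;
    }
    return chunks;
}
```

In the question's code, this would replace the str/rem/remaining bookkeeping. For example, you could iterate over ChunkText(final, MAX_CHAR_Private) and send each element to the API. Note that NSString lengths count UTF-16 units. Common CJK characters take one unit each, while emoji and rarer CJK Extension B characters take two.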
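Take a look at NSLinguisticTagger; it should support Chinese.
From Apple: "The NSLinguisticTagger class is used to automatically segment natural-language text and tag it with information, such as parts of speech. It can also tag languages, scripts, stem forms of words, etc."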
Apple documentation NSLinguisticTagger Class Reference
See also NSHipster: NSLinguisticTagger
See also objc.io Issue #7
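A minimal sketch, assuming all you need from the tagger is sentence boundaries. NSLinguisticTagger's token-enumeration method reports, for every token, the range of the sentence that contains it, so collecting the distinct sentence ranges yields the sentences. SentencesUsingTagger is a hypothetical helper name. Check on real data whether the tagger's segmentation suits your Chinese feed. NSLinguisticTagger has since been deprecated; on newer systems, Apple's NaturalLanguage framework (NLTokenizer with the sentence unit) replaces it.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the sentences NSLinguisticTagger finds in text.
static NSArray<NSString *> *SentencesUsingTagger(NSString *text) {
    NSLinguisticTagger *tagger =
        [[NSLinguisticTagger alloc] initWithTagSchemes:@[NSLinguisticTagSchemeTokenType]
                                               options:0];
    tagger.string = text;
    NSMutableArray<NSString *> *sentences = [NSMutableArray array];
    __block NSRange lastSentence = NSMakeRange(NSNotFound, 0);
    [tagger enumerateTagsInRange:NSMakeRange(0, text.length)
                          scheme:NSLinguisticTagSchemeTokenType
                         options:NSLinguisticTaggerOmitWhitespace
                      usingBlock:^(NSString *tag, NSRange tokenRange,
                                   NSRange sentenceRange, BOOL *stop) {
        // The block runs once per token, and every token of a sentence
        // reports the same sentenceRange: record each sentence only once.
        if (!NSEqualRanges(sentenceRange, lastSentence)) {
            lastSentence = sentenceRange;
            [sentences addObject:[text substringWithRange:sentenceRange]];
        }
    }];
    return sentences;
}
```

The returned sentences can then be grouped into API-sized chunks, with any sentence longer than the limit split further.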
NSString provides the NSStringEnumerationBySentences enumeration option out of the box:
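NSCharacterSet *whiteSpaceSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
[string enumerateSubstringsInRange:NSMakeRange(0, [string length])
                           options:NSStringEnumerationBySentences
                        usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
    // substring is one sentence; enclosingRange also covers the whitespace that follows it
    NSString *sentence = [substring stringByTrimmingCharactersInSet:whiteSpaceSet];
    // process sentence
}];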
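Sentence enumeration alone doesn't enforce the 50-character budget from the question. The following is a minimal sketch, assuming you want whole sentences packed greedily into chunks of at most maxLength UTF-16 units. PackSentences is a hypothetical name. Each chunk is assembled from enclosingRange slices, which keeps the original spacing between sentences, and only the chunk edges are trimmed. A single sentence longer than maxLength is emitted on its own and would still need a hard split.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: groups consecutive sentences into chunks whose
// untrimmed length is at most maxLength (unless one sentence alone is longer).
static NSArray<NSString *> *PackSentences(NSString *text, NSUInteger maxLength) {
    NSCharacterSet *whiteSpaceSet = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSMutableArray<NSString *> *chunks = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationBySentences
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Sentence plus its trailing whitespace, exactly as in the original text.
        NSString *piece = [text substringWithRange:enclosingRange];
        if (current.length > 0 && current.length + piece.length > maxLength) {
            // Adding this sentence would overflow: flush the current chunk first.
            [chunks addObject:[current stringByTrimmingCharactersInSet:whiteSpaceSet]];
            [current setString:@""];
        }
        [current appendString:piece];
    }];
    if (current.length > 0) {
        [chunks addObject:[current stringByTrimmingCharactersInSet:whiteSpaceSet]];
    }
    return chunks;
}
```

For the feed in the question, PackSentences(final, 50) would produce chunks made of whole sentences, each within the 50-unit budget unless a single sentence is longer.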