<tag k="addr:street" v="St. Croix gate"/>
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            String cb = itr.nextToken();
            if (cb.contains("k=\"addr:street\"")) {
                String roadName = itr.nextToken();
                while (!roadName.contains("\"/>")) {
                    roadName = roadName + itr.nextToken();
                }
                word.set(roadName);
                context.write(word, one);
            }
        }
    }
}
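As you can see, I am trying to get the string inside v="St. Croix gate"/>, but because the tokenizer creates a new token at every space, the only output I get is gate">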
This worked for me:
String element = "<tag k=\"addr:street\" v=\"St. Croix gate\"/>";
String searchAtt = "v";
StringTokenizer itr = new StringTokenizer(element);
while (itr.hasMoreTokens()) {
    // split by '='
    String s = itr.nextToken("=");
    // the text was split at '=', so the attribute name is the last word of the token
    if (s.endsWith(searchAtt)) {
        // the next token is the '=' itself, followed by the attribute value;
        // switch the delimiter to the double quote and discard the '='
        itr.nextToken("\"");
        // the next token is the content between the quotes
        String content = itr.nextToken();
        System.out.println("Searched attribute: " + content);
    }
}
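One caveat with the snippet above: endsWith("v") also matches any other attribute whose name happens to end in "v". Here is a minimal sketch of the same tokenizer idea wrapped in a method that requires whitespace before the attribute name. The class and method names (AttributeTokenizer, attributeValue) are made up for illustration, and the sketch assumes the value is non-empty and that no attribute value contains '=' or a double quote.

```java
import java.util.StringTokenizer;

class AttributeTokenizer {
    // Hypothetical helper: returns the value of the named attribute,
    // or null if the attribute is not found.
    static String attributeValue(String element, String name) {
        StringTokenizer itr = new StringTokenizer(element);
        while (itr.hasMoreTokens()) {
            // split at '='; the attribute name is the last word of the token
            String s = itr.nextToken("=");
            // require whitespace before the name so "v" does not match e.g. "rev"
            if (s.equals(name) || s.endsWith(" " + name)) {
                if (!itr.hasMoreTokens()) {
                    return null;
                }
                // switch the delimiter to the double quote; this token is the '=' itself
                itr.nextToken("\"");
                // the next token is the text between the quotes
                return itr.hasMoreTokens() ? itr.nextToken() : null;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String element = "<tag k=\"addr:street\" v=\"St. Croix gate\"/>";
        System.out.println(attributeValue(element, "v"));
    }
}
```

In the mapper you would call it once per input line instead of tokenizing on spaces, and only write to the context when the result is not null.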
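Let me start by saying that parsing XML without an XML parser is a very bad idea, for many reasons.
However, if you want to extract the value of v using only string operations, here is one way to do it: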
String s = "<tag k=\"addr:street\" v=\"St. Croix gate\"/>";
int vIndex = s.indexOf("v=\"");
int vendQuotesIndex = s.indexOf("\"", vIndex + 3);
System.out.println(s.substring(vIndex + 3, vendQuotesIndex)); // prints St. Croix gate
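For comparison, the parser route recommended above could look roughly like this, using the DOM parser that ships with the JDK (javax.xml.parsers). This is only a sketch: the class and method names (TagValueParser, valueFor) are made up, and it assumes each input string is one complete, well-formed element, which will not be true if the input split hands you arbitrary fragments of a larger file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class TagValueParser {
    // Hypothetical helper: parses one <tag .../> element and returns its v
    // attribute, or null when the k attribute is not the wanted key.
    static String valueFor(String xmlElement, String wantedKey) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element tag = builder
                    .parse(new InputSource(new StringReader(xmlElement)))
                    .getDocumentElement();
            if (!wantedKey.equals(tag.getAttribute("k"))) {
                return null;
            }
            return tag.getAttribute("v");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // assumption for this sketch: malformed input is reported to the caller
            throw new IllegalArgumentException("Not a well-formed element: " + xmlElement, e);
        }
    }

    public static void main(String[] args) {
        String s = "<tag k=\"addr:street\" v=\"St. Croix gate\"/>";
        System.out.println(valueFor(s, "addr:street")); // prints St. Croix gate
    }
}
```

The parser takes care of quoting, attribute order and entity escapes such as &amp;quot;, which the indexOf version does not.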