I have text like this:
This is a test text. <span> with bold </span> and with <span> italic </span> and so on and so forth.
Now I use this regular expression to identify all the HTML tags: <[^>]*>
Then I replace every tag with an empty string, so the result looks like this:
This is a test text. with bold and with italic and so on and so forth.
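A minimal sketch of that stripping step, using only the pattern quoted above. The class and method names (`TagStripper`, `stripTags`) are placeholders chosen for illustration. Note that whitespace next to a removed tag is left in place, so the real output may contain double spaces.

```java
import java.util.regex.Pattern;

final class TagStripper {
    // Same pattern as in the question: "<", any run of characters other than ">", then ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Returns the input with every match of TAG replaced by the empty string. */
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "This is a test text. <span> with bold </span> and with "
                + "<span> italic </span> and so on and so forth.";
        System.out.println(stripTags(html));
    }
}
```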
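In the text above, I want to locate some text, for example "italic", insert a special tag around it, and then rebuild the original text. So the result would be: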
This is a test text. <span> with bold </span> and with <span> <span class='special'>italic</span> </span> and so on and so forth.
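I am writing code that records matcher.start() and matcher.end() to build a list of all the HTML tags, and I am thinking of doing the reconstruction based on that list. Is there a better way? How would you solve it?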
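The bookkeeping described here could be sketched roughly as follows. This is only an illustration under assumptions: `TagRoundTrip`, `TagAt`, `collectTags` and `rebuild` are hypothetical names, and each tag is stored with its offset in the tag-free text rather than in the original HTML, so that the tags can be reinserted after the search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagRoundTrip {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Hypothetical holder: a tag's text and its offset in the tag-free text. */
    static final class TagAt {
        final String text;
        final int plainOffset;

        TagAt(String text, int plainOffset) {
            this.text = text;
            this.plainOffset = plainOffset;
        }
    }

    /** Records every tag together with the position it would have once all tags are removed. */
    static List<TagAt> collectTags(String html) {
        List<TagAt> tags = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        int removed = 0; // total length of the tags seen so far
        while (m.find()) {
            tags.add(new TagAt(m.group(), m.start() - removed));
            removed += m.end() - m.start();
        }
        return tags;
    }

    /** Reinserts the recorded tags into the tag-free text. */
    static String rebuild(String plain, List<TagAt> tags) {
        StringBuilder sb = new StringBuilder(plain);
        // Insert from last to first so that earlier offsets stay valid.
        for (int i = tags.size() - 1; i >= 0; i--) {
            sb.insert(tags.get(i).plainOffset, tags.get(i).text);
        }
        return sb.toString();
    }
}
```

Stripping the tags and then calling `rebuild` with the collected list should give back the original string, which makes a convenient sanity check before any extra tags are added to the list.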
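EDIT
The reason for searching the text after removing the HTML is that the HTML gets in the way of the text I am looking for. For example, it could look like this: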
This is a test text. <span> with bold </span> and with <span> it</span>al<span>ic </span> and so on and so forth.
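To make the interference concrete, here is a small sketch comparing a search in the raw HTML with a search in the tag-free text. `FragmentedSearch` and its two methods are illustrative names only, not part of any library.

```java
final class FragmentedSearch {
    /** Index of the word in the raw HTML, or -1 when tags split it apart. */
    static int indexInRaw(String html, String word) {
        return html.indexOf(word);
    }

    /** Index of the word in the text that remains after the tags are removed, or -1. */
    static int indexInPlain(String html, String word) {
        return html.replaceAll("<[^>]*>", "").indexOf(word);
    }

    public static void main(String[] args) {
        String html = "This is a test text. <span> with bold </span> and with "
                + "<span> it</span>al<span>ic </span> and so on and so forth.";
        System.out.println(indexInRaw(html, "italic"));   // -1: the word is interrupted by tags
        System.out.println(indexInPlain(html, "italic")); // a non-negative offset into the tag-free text
    }
}
```

The offset returned by the second search refers to the tag-free text, so it still has to be mapped back to a position in the original HTML.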
EDIT2
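This is not a duplicate question, as has been suggested. Imagine a scenario where you need to highlight text in HTML that you see on screen, simply by wrapping the selected text in a span with a yellow background. Now suppose the text is the word italic, but in the markup it appears as <span>ita</span>l<span>ic</span>. My question is: how do you find this word and then put a span around it?
EDIT3
Final edit to simplify the problem statement. I hope this makes the problem clear. This is the input: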
This is a test text with <span>it<span>al<span>ic</span> and etc.
This is the expected output:
This is a test text with <span class='highlight'><span>it<span>al<span>ic</span></span> and etc.
This will do what you want, but it does not detect or prevent the generation of malformed HTML.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlHighlighter {
    // The input with every tag removed; searches run against this text.
    private final String inputWithoutTags;
    // The removed tags, in document order.
    private final List<Tag> tags;

    // A tag's text plus the offset in inputWithoutTags where it must be reinserted.
    private static class Tag {
        private final String text;
        private final int startPos;

        private Tag(final String text, final int startPos) {
            this.text = text;
            this.startPos = startPos;
        }
    }

    public HtmlHighlighter(final String input, final String tagRegex) {
        final Pattern p = Pattern.compile(tagRegex);
        tags = new ArrayList<>();
        final Matcher m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        int cursor = 0;              // position in the original input
        int cursorExcludingTags = 0; // the same position with all earlier tags removed
        while (m.find()) {
            cursorExcludingTags += m.start() - cursor;
            tags.add(new Tag(input.substring(m.start(), m.end()), cursorExcludingTags));
            cursor = m.end();
            m.appendReplacement(sb, ""); // copy the text before the tag and drop the tag itself
        }
        m.appendTail(sb);
        inputWithoutTags = sb.toString();
    }

    public String highlightText(String regexToFind, String openingTag, String closingTag) {
        final List<Tag> allTags = getAllTags(regexToFind, openingTag, closingTag);
        return combineTags(allTags);
    }

    // Merges the original tags with an opening/closing pair for every match in the tag-free text.
    private List<Tag> getAllTags(final String regexToFind, final String openingTag, final String closingTag) {
        final List<Tag> ret = new ArrayList<>(tags);
        final Pattern p = Pattern.compile(regexToFind);
        final Matcher m = p.matcher(inputWithoutTags);
        while (m.find()) {
            addTag(new Tag(openingTag, m.start()), true, ret);
            addTag(new Tag(closingTag, m.end()), false, ret);
        }
        return ret;
    }

    // Keeps allTags ordered by startPos. With beforeIgnored set, the new tag goes in front of
    // existing tags at the same offset (opening tag); otherwise it goes after them (closing tag).
    private void addTag(final Tag tag, final boolean beforeIgnored, final List<Tag> allTags) {
        for (int i = 0; i < allTags.size(); i++) {
            if (allTags.get(i).startPos >= tag.startPos && beforeIgnored) {
                allTags.add(i, tag);
                return;
            }
            if (allTags.get(i).startPos > tag.startPos) {
                allTags.add(i, tag);
                return;
            }
        }
        allTags.add(allTags.size(), tag);
    }

    // Reinserts the tags from last to first so that earlier offsets remain valid.
    private String combineTags(final List<Tag> allTags) {
        final StringBuilder sb = new StringBuilder(inputWithoutTags);
        for (int i = allTags.size() - 1; i >= 0; i--) {
            final Tag tag = allTags.get(i);
            sb.insert(tag.startPos, tag.text);
        }
        return sb.toString();
    }

    public static void main(String... args) {
        // "<" and ">" need no escaping in a Java regex; a single backslash before them would not compile.
        final HtmlHighlighter highlighter = new HtmlHighlighter("This is a test text with <span>it<span>al<span>ic</span> and etc.", "<.*?>");
        System.out.println(highlighter.highlightText("italic", "<span class='highlight'>", "</span>"));
    }
}
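If well-formed output matters more than having a single wrapper element, one possible variation (a different technique from the class above, shown only as a sketch) is to wrap each text fragment of a match separately, so an inserted tag never crosses an existing tag boundary. `PerFragmentHighlighter` and `highlight` are hypothetical names, the search term is treated as a literal string rather than a regex, and the sketch assumes the input HTML is itself well-formed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PerFragmentHighlighter {
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String highlight(String html, String word, String open, String close) {
        String plain = TAG.matcher(html).replaceAll("");
        // marked[i] is true when character i of the tag-free text belongs to a match.
        boolean[] marked = new boolean[plain.length()];
        if (!word.isEmpty()) {
            for (int i = plain.indexOf(word); i >= 0; i = plain.indexOf(word, i + word.length())) {
                for (int j = i; j < i + word.length(); j++) {
                    marked[j] = true;
                }
            }
        }
        StringBuilder out = new StringBuilder();
        Matcher m = TAG.matcher(html);
        int pos = 0;      // position in html
        int plainPos = 0; // corresponding position in plain
        while (pos < html.length()) {
            int textEnd = m.find(pos) ? m.start() : html.length();
            boolean inside = false; // whether a highlight wrapper is currently open
            for (; pos < textEnd; pos++, plainPos++) {
                if (marked[plainPos] != inside) {
                    out.append(inside ? close : open);
                    inside = !inside;
                }
                out.append(html.charAt(pos));
            }
            if (inside) {
                out.append(close); // close the wrapper before the next tag or the end of input
            }
            if (textEnd < html.length()) {
                out.append(m.group()); // copy the original tag unchanged
                pos = m.end();
            }
        }
        return out.toString();
    }
}
```

The trade-off is that one match may end up in several wrapper elements, which is usually fine for background-colour highlighting but differs from the single-span output requested in the question.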