I am trying to extract the softwareVersion value from the HTML below.
<div class="title">Current Version</div> <div class="content" itemprop="softwareVersion"> 1.1.3 </div> </div> <div class="meta-info"> <div class="title">Requires Android</div> <div class="content" itemprop="operatingSystems"> 2.2 and up </div> </div>
I used the following code:
String Html = GetHtml("https://play.google.com/store/apps/details?id=" + AppID);
Pattern pattern = Pattern.compile("softwareVersion\">[^<]*</dd");
Matcher matcher = pattern.matcher(Html);
matcher.find();
String GetHtml(String url1)
{
    String str = "";
    try
    {
        URL url = new URL(url1);
        URLConnection spoof = url.openConnection();
        spoof.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");
        BufferedReader in = new BufferedReader(new InputStreamReader(
                spoof.getInputStream()));
        String strLine = "";
        // Loop through every line in the source
        while ((strLine = in.readLine()) != null)
        {
            str = str + strLine;
        }
    }
    catch (Exception e)
    {
    }
    return str;
}
But matcher.find() always returns false. I think something is wrong with my pattern. Can anyone help? Thanks.
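As others have commented, I would normally use an HTML parser to extract content from HTML. However, since you are only pulling one small piece of information out of a string, I can understand why you would reach for a regex.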
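What you need is something like the code below. The problem with your regex is the extra d: the pattern ends in </dd, but the text is followed by </div>, so </dd can never match. Also, if you put the part you care about in parentheses, you can retrieve it with .group(1).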
import java.util.regex.*;

public class R {
    public static void main(String[] args) {
        // Sample input copied from the question; inner quotes are escaped for Java.
        String Html = "<div class=\"title\">Current Version</div> <div class=\"content\" itemprop=\"softwareVersion\"> 1.1.3 </div> </div> <div class=\"meta-info\"> <div class=\"title\">Requires Android</div> <div class=\"content\" itemprop=\"operatingSystems\"> 2.2 and up </div> </div>";
        // The parentheses capture the text between '>' and the closing tag.
        Pattern pattern = Pattern.compile("softwareVersion\">([^<]*)</d");
        Matcher matcher = pattern.matcher(Html);
        System.out.println(matcher.find());   // prints true
        System.out.println(matcher.group(1)); // prints " 1.1.3 " (surrounding spaces included)
    }
}
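Two small follow-ups, offered as a sketch rather than the definitive approach: group(1) returns the text with its surrounding spaces, and calling group() when find() returned false throws an IllegalStateException. The helper below handles both cases. The class name VersionExtractor and the method extractVersion are hypothetical names chosen for illustration, and the pattern assumes the page keeps the itemprop="softwareVersion" attribute directly before the closing '>' of the tag.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionExtractor {
    // Assumption: the version text sits between the tag carrying
    // itemprop="softwareVersion" and the next '<'.
    private static final Pattern VERSION =
            Pattern.compile("itemprop=\"softwareVersion\"\\s*>([^<]*)<");

    // Hypothetical helper: returns the trimmed version text,
    // or an empty Optional when the pattern does not match.
    public static Optional<String> extractVersion(String html) {
        Matcher matcher = VERSION.matcher(html);
        if (!matcher.find()) {
            return Optional.empty();
        }
        return Optional.of(matcher.group(1).trim());
    }

    public static void main(String[] args) {
        String html = "<div class=\"content\" itemprop=\"softwareVersion\"> 1.1.3 </div>";
        System.out.println(extractVersion(html).orElse("not found")); // prints 1.1.3
    }
}
```

Returning an Optional makes the "no match" case explicit for the caller, which matters here because GetHtml returns an empty string when the download fails.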