So, this is my code:
' Requires "Imports System.Text.RegularExpressions" at the top of the file.
' DownloadString needs an absolute URL, so the scheme is included.
Dim sourceString As String = New System.Net.WebClient().DownloadString("http://www.example.com")
TextBox2.Text = sourceString
Dim findtext2 As String = "(?<=<div class=""books"">)(.*?)(?=</div>)"
Dim myregex2 As String = TextBox2.Text
Dim doregex2 As MatchCollection = Regex.Matches(myregex2, findtext2)
Dim matches2 As String = ""
For Each match2 As Match In doregex2
    ' Collect each captured value on its own line.
    matches2 = matches2 & match2.ToString() & Environment.NewLine
Next
MsgBox(matches2)
It gets all the values between <div class="books"> and </div>, but there is one big problem. After "books" there are 3 more characters (for example <div class="books672">).
On example.com, the HTML looks like this:
<div class="books321">Book1</div>
<div class="books785">Book2</div>
<div class="books547">Book3</div>
<div class="books182">Book4</div>
<div class="books317">Book5</div>
<div class="books970">Book6</div>
How can I get "Book1, Book2, ..."? Is there something in regular expressions that matches arbitrary characters?
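Adding \w{1} makes the pattern accept one arbitrary word character (a letter, digit, or underscore) at that position. In this case I need 3 such characters, so the solution is:
(?<=<div class="books\w{3}">)(.*?)(?=</div>)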
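For reference, here is a minimal sketch of how that pattern might be used from VB.NET. It runs against a hard-coded copy of the sample HTML from the question instead of downloading the page, and the module name BooksDemo and the variable names are only illustrative assumptions. Note that the double quotes inside the pattern have to be doubled in a VB string literal.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module BooksDemo ' hypothetical name, for illustration only
    Sub Main()
        ' Sample HTML copied from the question; a real page would be downloaded instead.
        Dim html As String =
            "<div class=""books321"">Book1</div>" & Environment.NewLine &
            "<div class=""books785"">Book2</div>" & Environment.NewLine &
            "<div class=""books547"">Book3</div>"

        ' \w{3} matches exactly three word characters after "books".
        Dim pattern As String = "(?<=<div class=""books\w{3}"">)(.*?)(?=</div>)"

        ' Gather the text found between each opening tag and its </div>.
        Dim titles As New List(Of String)
        For Each m As Match In Regex.Matches(html, pattern)
            titles.Add(m.Value)
        Next

        Console.WriteLine(String.Join(", ", titles)) ' prints "Book1, Book2, Book3"
    End Sub
End Module
```

If the three characters are always digits, as they are in the sample HTML, \d{3} would probably be a tighter choice than \w{3}.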