Parsing a file in PowerShell



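I have the following raw content in a file, and I am trying to print a list of all the URLs. I wrote a small script that reads the file with Get-Content and loops over the lines with ForEach, but I don't know how to filter out only the URLs from the content. Any ideas?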

Line 18942:         "url": "http://harvardpolitics.com/tag/brussels/",
Line 18994:         "url": "http://203.36.101.164/4f64555b4217b47b7c64b3fec19e389b/1502455203/Telstra/Foxtel-Vod/fxmultismvod5256/store2/ON307529/ON307529_hss.ism/QualityLevels(791000)/Fragments(video=9900000000)"
Line 19044:         "url": "https://www.gucci.com/int/en/ca/women/handbags/womens-shoulder-bags-c-women-handbags-shoulder-bags?filter=%3ANewest%3Acolors%3AGold%7Ccb9822",
Line 19096:         "url": "https://bagalio.cz/batohy-10l?cat=3p%3D1urceni%3D2582p%3D1kapsa_ntb_velikost%3D2179p%3D1manufacturer%3D1302p%3D1color%3D84p=1kapsa_ntb_velikost=2192",
Line 19148:         "url": "http://www.csillagjovo.gportal.hu/gindex.php?pg=31670155",
Line 19200:         "url": "http://www.copiersupplystore.com/hp/color-laserjet-4700dn/j7934a-j7934ar",

One approach is the Substring method; another is a regular expression.

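$Text = Get-Content 'D:\Test\test.txt'
foreach ($Line in $Text) {
    # Substring version: take everything from 'http' up to the last double quote.
    $FirstIndex = $Line.IndexOf('http')
    if ($FirstIndex -ge 0) {
        # Skip lines without 'http'; Substring would throw on a negative index.
        $URLLength = $Line.LastIndexOf('"') - $FirstIndex
        $Line.Substring($FirstIndex, $URLLength)
    }
    # Regex version: a scheme, then '://', then everything up to whitespace or a comma.
    $Regex = '(http[s]?|[s]?ftp[s]?)(://)([^\s,]+)'
    ([regex]::Matches($Line, $Regex)).Value.TrimEnd('"')
}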
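Note that the `[^\s,]+` part of the regex version stops at the first comma, so a URL that itself contains a comma would be cut short. A minimal sketch of one way around that, assuming every line of interest has the `"url": "..."` shape shown in the question: capture whatever sits between the quotes after the `url` key. The pattern and the variable names (`$sample`, `$pattern`, `$found`) are illustrative and not part of the answer above, and the third sample line is a made-up line without a URL.

```powershell
# Two lines copied from the question, plus one hypothetical line without a URL.
$sample = @(
    'Line 18942:         "url": "http://harvardpolitics.com/tag/brussels/",'
    'Line 19148:         "url": "http://www.csillagjovo.gportal.hu/gindex.php?pg=31670155",'
    'Line 19149:         "name": "hypothetical line without a url key"'
)

# Capture group 1 is everything between the quotes that follow the "url" key.
$pattern = '"url":\s*"([^"]+)"'

$found = foreach ($line in $sample) {
    $m = [regex]::Match($line, $pattern)
    # Only emit a value when the line actually matched.
    if ($m.Success) { $m.Groups[1].Value }
}

$found
```

With a real file, `$sample` would be replaced by the output of `Get-Content`.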

Try this to get the URLs:

$content = Get-Content <file-with-output> # or other way of getting the data
$urls = $content | ForEach-Object { ($_ -replace ".+?(?=http.+)","").Trim('",')}

Edit: added $urls to capture the results.
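One thing to watch with the `-replace` one-liner: a line that contains no `http` does not match the pattern, so it passes through unchanged (apart from the trim) and ends up in `$urls`. A small sketch of a possible guard, assuming the file may contain such lines; the `Where-Object` filter is my addition, and the second sample line is hypothetical.

```powershell
# One line copied from the question, plus one hypothetical line without a URL.
$content = @(
    'Line 18942:         "url": "http://harvardpolitics.com/tag/brussels/",'
    'Line 18943:         "title": "hypothetical line without a link"'
)

$urls = $content |
    # Keep only lines that contain 'http' before stripping the prefix.
    Where-Object { $_ -match 'http' } |
    # Remove everything before 'http', then trim trailing quotes and commas.
    ForEach-Object { ($_ -replace '.+?(?=http.+)', '').Trim('",') }

$urls
```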

$Urls = Get-Content file.txt | ForEach-Object { $_.Split('"')[3] }
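To see why index 3 is the right one, here is a short sketch that splits a single line from the question on the double-quote character and picks the fourth piece. It assumes each line has exactly the `"url": "..."` layout shown above; the count check is an added guard for lines with fewer quotes, not part of the one-liner.

```powershell
# One line copied from the question.
$line = 'Line 19200:         "url": "http://www.copiersupplystore.com/hp/color-laserjet-4700dn/j7934a-j7934ar",'

# Splitting on the double quote yields five pieces:
#   [0] the 'Line 19200:' prefix and padding
#   [1] url
#   [2] the colon and space between the key and the value
#   [3] the URL itself
#   [4] the trailing comma
$parts = $line.Split('"')

# Only read index 3 when the line really has that many pieces.
$url = if ($parts.Count -gt 3) { $parts[3] }

$url
```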
