Get website metadata such as title and description from a given URL using PowerShell



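How can I use PowerShell to retrieve a website's metadata, such as its title, description, and keywords, from a given URL?

For example, given the following URLs:

Input: www.amazon.com

Output: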

title: "Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more",
description: "Online shopping from the earth's biggest selection of books, magazines, music, DVDs, videos, electronics, computers, software, apparel & accessories, shoes, jewelry, tools & hardware, housewares, furniture, sporting goods, beauty & personal care, broadband & dsl, gourmet food & just about anything else.",
keywords: "Amazon, Amazon.com, Books, Online Shopping, Book Store, Magazine, Subscription, Music, CDs, DVDs, Videos, Electronics, Video Games, Computers, Cell Phones, Toys, Games, Apparel, Accessories, Shoes, Jewelry, Watches, Office Products, Sports & Outdoors, Sporting Goods, Baby Products, Health, Personal Care, Beauty, Home, Garden, Bed & Bath, Furniture, Tools, Hardware, Vacuums, Outdoor Living, Automotive Parts, Pet Supplies, Broadband, DSL"

Input: www.youtube.com

Output:

title: YouTube
description: Enjoy the videos and music you love, upload original content and share it all with friends, family and the world on YouTube.
keywords: video, sharing, camera phone, video phone, free, upload

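Note: this only works in PowerShell 5.1 and earlier (the ParsedHtml property used below is not available in PowerShell 6 and later).

As @StephenP said:

There's no guarantee that the website you access will expose the data you want in any practical way. You can easily retrieve a web page with Invoke-WebRequest and Invoke-RestMethod, but you will then need to parse the returned headers/data.

There's also no guarantee that the website won't try to block what you are doing.

Below is a parsing example that uses the .NET HTML DOM. @tim-aitken gave an example that uses a regex to find the same information, but as he mentions, that depends on getting the regular expression right. On the other hand, the HTML DOM sometimes fails to parse a page as well.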

# First retrieve the website
$result = Invoke-WebRequest -Uri http://www.youtube.com/ -Method Get
$resultTable = @{}
# Get the title
$resultTable.title = $result.ParsedHtml.title
# Get the HTML tag
$htmlTag = $result.ParsedHtml.childNodes | Where-Object {$_.nodeName -eq 'HTML'}
# Get the HEAD tag
$headTag = $htmlTag.childNodes | Where-Object {$_.nodeName -eq 'HEAD'}
# Get the META tags
$metaTags = $headTag.childNodes | Where-Object {$_.nodeName -eq 'META'}
# You can view these using $metaTags | select outerhtml | fl
# Get the content value of the meta tag whose name attribute is 'keywords'
$resultTable.keywords = $metaTags | Where-Object {$_.name -eq 'keywords'} | Select-Object -ExpandProperty content
# Do the same for description
$resultTable.description = $metaTags | Where-Object {$_.name -eq 'description'} | Select-Object -ExpandProperty content
# Return the table we have built as an object
# (New-Object emits the object to the pipeline; wrapping it in Write-Output would fail to bind -TypeName)
New-Object -TypeName PSCustomObject -Property $resultTable

You can use Invoke-WebRequest and then match the strings you need with a regular expression:

PS C:\> $response = Invoke-WebRequest -Uri www.amazon.com -UseBasicParsing
PS C:\> $response.Content -match "<title>(?<title>.*)</title>" | out-null
PS C:\> $matches['title']
Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more
PS C:\> $response.Content -match "<meta name=`"description`" content=`"(?<description>.*)`">" | out-null
PS C:\> $matches['description']
Online shopping from the earth's biggest selection of books, magazines, music, DVDs, videos, electronics, computers, software, apparel & accessories, shoes, jewelry, tools & hardware, housewares, furniture, sporting goods, beauty & personal care, broadband & dsl, gourmet food & just about anything else.
PS C:\> $response.Content -match "<meta name=`"keywords`" content=`"(?<keywords>.*)`">" | out-null
PS C:\> $matches['keywords']
Amazon, Amazon.com, Books, Online Shopping, Book Store, Magazine, Subscription, Music, CDs, DVDs, Videos, Electronics, Video Games, Computers, Cell Phones, Toys, Games, Apparel, Accessories, Shoes, Jewelry, Watches, Office Products, Sports & Outdoors, Sporting Goods, Baby Products, Health, Personal Care, Beauty, Home, Garden, Bed & Bath, Furniture, Tools, Hardware, Vacuums, Outdoor Living, Automotive Parts, Pet Supplies, Broadband, DSL

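This depends on every website using the same pattern for its meta fields. For example, the above does not work for Stack Overflow's site, because it closes its meta fields with "/>".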
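
One way to loosen that dependence on a single pattern is to look at each meta tag on its own, so that attribute order, quote style, and a trailing "/>" stop mattering. The following is only a sketch under stated assumptions, not a robust HTML parser: Get-HtmlMetadata is a hypothetical helper name that does not appear in the original answers, and it assumes attribute values are quoted and contain no ">" character.

```powershell
# Hypothetical helper (not from the original answers): extract <title> and
# <meta name="..."> values from an HTML string using tolerant regexes.
function Get-HtmlMetadata {
    param(
        [Parameter(Mandatory)]
        [string]$Html,
        [string[]]$Name = @('description', 'keywords')
    )

    $options = [System.Text.RegularExpressions.RegexOptions]'IgnoreCase, Singleline'
    $result = [ordered]@{}

    # Title: allow attributes on the tag and line breaks inside the text.
    $t = [regex]::Match($Html, '<title[^>]*>(?<value>.*?)</title>', $options)
    $result['title'] = if ($t.Success) {
        [System.Net.WebUtility]::HtmlDecode($t.Groups['value'].Value.Trim())
    } else { $null }

    # Collect every <meta ...> tag once; assumes no ">" inside attribute values.
    $metaTags = [regex]::Matches($Html, '<meta\b[^>]*>', $options)

    foreach ($n in $Name) {
        $value = $null
        $namePattern = '\sname\s*=\s*["'']' + [regex]::Escape($n) + '["'']'
        foreach ($tag in $metaTags) {
            # -match is case-insensitive by default.
            if ($tag.Value -match $namePattern) {
                # \1 refers back to the opening quote, so "..." and '...' both work.
                $c = [regex]::Match($tag.Value, 'content\s*=\s*(["''])(?<value>.*?)\1', $options)
                if ($c.Success) {
                    $value = [System.Net.WebUtility]::HtmlDecode($c.Groups['value'].Value)
                }
                break
            }
        }
        $result[$n] = $value
    }

    [pscustomobject]$result
}
```

It could then be called on a downloaded page, for example with Get-HtmlMetadata -Html (Invoke-WebRequest -Uri 'https://stackoverflow.com' -UseBasicParsing).Content. Fields the page does not provide come back as $null rather than raising an error.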

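There's no guarantee that the website you access will expose the data you want in any practical way. You can easily retrieve a web page with Invoke-WebRequest and Invoke-RestMethod, but you will then need to parse the returned headers/data.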
