PowerShell: scrape HTTP and return specific lines as variables



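I'm fairly new to PowerShell and have reached the limit of my knowledge. I'm writing a script that scrapes backup data from an internal web page, then extracts information from the scrape so I can work with it and later display it in Excel.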

$Yesterday = [DateTime]::Now.AddDays(-1)
$datestr = $Yesterday.ToString("dd-MMM-yyyy")
$WebClient = New-Object System.Net.WebClient
$Results = $WebClient.DownloadString("http://fakeurl")

This produces a large amount of output containing the HTML markup as well as the data I'm interested in, all run together. I then do this:

[StringSplitOptions]$option = "None"
[string[]]$separator = "</td>"
$SPL = $Results.Split($separator, $option)

This splits the data into a more readable format. Here is a snippet of that part from $SPL:

<tr><td headers="HOST_NAME" class="t13dataalt">server01
<td headers="AUTOSYS_JOB" class="t13dataalt">nbu.os.wn.135b.server01
<td headers="START_TIME" class="t13dataalt">01-Aug-2011 21:23
<td headers="END_TIME" class="t13dataalt">01-Aug-2011 21:51
<td headers="BACKUP_TYPE" class="t13dataalt">differential
<td headers="SCHEDULE" class="t13dataalt">daily
<td align="right"  headers="SIZE_MB" class="t13dataalt">       2,091.18
<td headers="IMAGES" class="t13dataalt">1
<td headers="EXIT_STATUS" class="t13dataalt">0
</tr><tr><td headers="HOST_NAME" class="t13data">server02
<td headers="AUTOSYS_JOB" class="t13data">nbu.os.wn.135b.server02
<td headers="START_TIME" class="t13data">31-Jul-2011 21:22
<td headers="END_TIME" class="t13data">31-Jul-2011 21:41
<td headers="BACKUP_TYPE" class="t13data">differential
<td headers="SCHEDULE" class="t13data">daily
<td align="right"  headers="SIZE_MB" class="t13data">       2,496.31
<td headers="IMAGES" class="t13data">1
<td headers="EXIT_STATUS" class="t13data">0

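I need to extract START_TIME and END_TIME from this to calculate the elapsed time, and return the EXIT_STATUS of the most recent backup. I tried the following, but I think I may be barking up the wrong tree: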

$Position = select-string -inputobject $SPL -pattern $datestr

$Position.Matches returns:

PS C:\Scripts> $Position.matches
Groups   : {03-Aug-2011}
Success  : True
Captures : {03-Aug-2011}
Index    : 12056
Length   : 11
Value    : 03-Aug-2011

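My theory was to take a substring, using the Index plus the Length, to extract the time value that follows the date, but I don't know how to do that. It also seems a bit crude to me. Surely there is a simpler way to return the line information I need from that variable, without counting to a position and then ripping out the rest of the line?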


OK, since I'm not sure how to add a section like this at the bottom of the page, I'll add it here.

This is my current script. It runs without any errors but returns no results.

# Get yesterday's date and convert it to the required search format
    $Yesterday = [DateTime]::Now.AddDays(-1)
    $datestr = $Yesterday.ToString("dd-MMM-yyyy")
# Scrape the webpage
    $url = "http://fake-url"
    $WebClient = New-Object System.Net.WebClient
    $Results = $WebClient.DownloadString($url)
# Determine if the previous day is listed in the backups
    $IsDateThere = $Results.Contains($datestr)
        If ($IsDateThere){
            # split the data into rows
            [StringSplitOptions]$option = "None"
            [string[]]$separator = "</td>"
            $SPL = $Results.Split($separator, $option)
            #strip the data into a hash table
            $SPL | 
                Foreach-Object {
                    where {$_ -match 'headers="(.*)" class.*>(.*)'} |
                        ForEach-Object { 
                        @{
                                $matches[1] = ($matches[2]).trim() 
                            }
                        }
                }           
        }
        Else{
            Write-Host "Yesterday's date not found"
        }

Any ideas? I'm not sure what to do next to get the start time, end time, and exit code of the most recent backup as variables.

I would approach it like this:

$html = @"
<tr><td headers="HOST_NAME" class="t13dataalt">server01
<td headers="AUTOSYS_JOB" class="t13dataalt">nbu.os.wn.135b.server01
<td headers="START_TIME" class="t13dataalt">01-Aug-2011 21:23
<td headers="END_TIME" class="t13dataalt">01-Aug-2011 21:51
<td headers="BACKUP_TYPE" class="t13dataalt">differential
<td headers="SCHEDULE" class="t13dataalt">daily
<td align="right"  headers="SIZE_MB" class="t13dataalt">       2,091.18
<td headers="IMAGES" class="t13dataalt">1
<td headers="EXIT_STATUS" class="t13dataalt">0
</tr><tr><td headers="HOST_NAME" class="t13data">server02
<td headers="AUTOSYS_JOB" class="t13data">nbu.os.wn.135b.server02
<td headers="START_TIME" class="t13data">31-Jul-2011 21:22
<td headers="END_TIME" class="t13data">31-Jul-2011 21:41
<td headers="BACKUP_TYPE" class="t13data">differential
<td headers="SCHEDULE" class="t13data">daily
<td align="right"  headers="SIZE_MB" class="t13data">       2,496.31
<td headers="IMAGES" class="t13data">1
<td headers="EXIT_STATUS" class="t13data">0
"@
# Split on CRLF or LF so this works however the here-string was entered
$html -split '\r?\n' | where {$_ -match 'start_time|end_time'} |
    ForEach {
        $pos = $_.IndexOf("headers")
        $begin = $pos+9
        $end = $_.IndexOf('"', $begin)
        new-object PSObject -Property @{
            Key   = $_.SubString($begin, $end-$begin)
            Value = Get-Date( $_.SubString( $_.IndexOf(">")+1 ) )
        }
    }
Result
Key        Value               
---        -----               
START_TIME 8/1/2011 9:23:00 PM 
END_TIME   8/1/2011 9:51:00 PM 
START_TIME 7/31/2011 9:22:00 PM
END_TIME   7/31/2011 9:41:00 PM
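To get from those two values to the elapsed time the question asks about, one option is to subtract the dates, which gives a TimeSpan. This is only a minimal sketch: the variable names and the two sample strings are assumptions taken from the first row of the snippet, and it parses with an explicit format rather than Get-Date so the result does not depend on the machine's regional settings.

```powershell
# Hypothetical sample values copied from the first row of the snippet
$startText = '01-Aug-2011 21:23'
$endText   = '01-Aug-2011 21:51'

# Parse with an explicit format so the result is independent of the current culture
$format  = 'dd-MMM-yyyy HH:mm'
$culture = [Globalization.CultureInfo]::InvariantCulture
$start = [DateTime]::ParseExact($startText, $format, $culture)
$end   = [DateTime]::ParseExact($endText,   $format, $culture)

# Subtracting one DateTime from another yields a TimeSpan
$elapsed = $end - $start
$elapsed.TotalMinutes   # 28
```

The TimeSpan also exposes Hours, Minutes and TotalSeconds if a different unit suits the Excel output better.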

This isn't an original answer, just another version of Doug's that uses a regex to capture all the data:

$html -split "`n" | where {$_ -match 'headers="(.*)" class.*>(.*)'} |
    % { 
        @{
                $matches[1] = ($matches[2]).trim() 
            }
    }
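The pipeline above emits one single-entry hashtable per cell, so the values of one backup are not tied together. If one object per table row is more convenient, something along these lines might work. It is a sketch under assumptions: the sample is trimmed to four columns, it assumes the raw page still contains the closing </td> tags (the question splits on them), and $cellPattern, $rows and $record are names made up for the illustration.

```powershell
# Two rows in the same shape as the sample data (columns trimmed for brevity)
$html = @'
<tr><td headers="HOST_NAME" class="t13dataalt">server01</td>
<td headers="START_TIME" class="t13dataalt">01-Aug-2011 21:23</td>
<td headers="END_TIME" class="t13dataalt">01-Aug-2011 21:51</td>
<td headers="EXIT_STATUS" class="t13dataalt">0</td>
</tr><tr><td headers="HOST_NAME" class="t13data">server02</td>
<td headers="START_TIME" class="t13data">31-Jul-2011 21:22</td>
<td headers="END_TIME" class="t13data">31-Jul-2011 21:41</td>
<td headers="EXIT_STATUS" class="t13data">0</td>
</tr>
'@

# headers="NAME" ...>VALUE, where VALUE stops at the next tag
$cellPattern = 'headers="([^"]+)"[^>]*>([^<]*)'

$rows = @(
    foreach ($rowText in ($html -split '</tr>')) {
        # Collect every header/value pair found in this row
        $record = @{}
        foreach ($m in [regex]::Matches($rowText, $cellPattern)) {
            $record[$m.Groups[1].Value] = $m.Groups[2].Value.Trim()
        }
        # Skip fragments that contained no data cells
        if ($record.Count -gt 0) { New-Object PSObject -Property $record }
    }
)

$rows | Select-Object HOST_NAME, START_TIME, END_TIME, EXIT_STATUS
```

Each element of $rows then carries the start time, end time and exit status of one backup as properties, for example $rows[0].END_TIME.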

EDIT: using the code from the question:

$Yesterday = [DateTime]::Now.AddDays(-1)
$datestr = $Yesterday.ToString("dd-MMM-yyyy")
$WebClient = New-Object System.Net.WebClient
$Results = $WebClient.DownloadString("http://fakeurl")
[StringSplitOptions]$option = "None"
[string[]]$separator = "</td>"
$SPL = $Results.Split($separator, $option)
# Pipe $SPL straight into where; a where nested inside Foreach-Object gets no pipeline input
$SPL | 
    where {$_ -match 'headers="(.*)" class.*>(.*)'} |
    % { 
        @{
            $matches[1] = ($matches[2]).trim() 
        }
    }

EDIT 2:

    $SPL | 
        where {$_ -match 'headers="(.*)" class.*>(.*)'} |
        % { 
            # The cell holds a date and a time, so compare only the leading date part
            if (($matches[2]).trim().StartsWith($datestr)) { "$($matches[1]) is yesterday's back up" }
        }
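For the last part of the question, getting the most recent backup's start time, end time and exit status as variables, a rough sketch could look like this. It assumes the cells have already been gathered into one object per backup; the $backups array is hypothetical sample data shaped like the snippet, and the variable names are made up for the illustration.

```powershell
# Hypothetical records, shaped like the cells in the sample data
$backups = @(
    New-Object PSObject -Property @{ START_TIME = '01-Aug-2011 21:23'; END_TIME = '01-Aug-2011 21:51'; EXIT_STATUS = '0' }
    New-Object PSObject -Property @{ START_TIME = '31-Jul-2011 21:22'; END_TIME = '31-Jul-2011 21:41'; EXIT_STATUS = '0' }
)

$format  = 'dd-MMM-yyyy HH:mm'
$culture = [Globalization.CultureInfo]::InvariantCulture

# Sort on the parsed start time (not the raw text) and keep the newest record
$latest = $backups |
    Sort-Object { [DateTime]::ParseExact($_.START_TIME, $format, $culture) } -Descending |
    Select-Object -First 1

# Pull the three values out as typed variables
$startTime  = [DateTime]::ParseExact($latest.START_TIME, $format, $culture)
$endTime    = [DateTime]::ParseExact($latest.END_TIME,   $format, $culture)
$exitStatus = [int]$latest.EXIT_STATUS
```

Sorting on the parsed DateTime matters here: sorted as plain text, '31-Jul-2011' would come after '01-Aug-2011' and the wrong record would be picked.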
