PowerShell regex: match content and write it back



I want to write part of a script that uses a regex to replace the lines that match.

The input looks like this:

Name, type, ADDRESSES
“Aaa”, “bbb”, “19 S 149TH $NEWPORT NEWS, WA 96332”
“Aaa”, “bbb”,  “851 16TH AVE #365$SALISH, WA 98402-4410”
“Aaa”, “bbb”,  “2445 E BROADWAY #204$YELM WA 98653”

This is what I tried:

$regex = '\d{5}([ -]\d{4})?'
## get the data
$people = Get-Content 'C:\test.csv'
## let's convert the data first
foreach ($p in $people) {
    if ($p -match $regex) { $p | Out-File -Append C:\test.csv }
}

This is the result I expect:

Name, type, ADDRESSES
“Aaa”, “bbb”,  “96332”
“Aaa”, “bbb”,  “98402-4410”
“Aaa”, “bbb”,  “98653”

This is the result I actually get:

Name, type, ADDRESSES
“Aaa”, “bbb”, “19 S 149TH $NEWPORT NEWS, WA 96332”
“Aaa”, “bbb”,  “851 16TH AVE #365$SALISH, WA 98402-4410”
“Aaa”, “bbb”,  “2445 E BROADWAY #204$YELM WA 98653”
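
A likely reason the attempt leaves every line unchanged: -match only tests whether the line matches and, for a single string, fills the automatic $Matches variable. $p itself is never modified before it is written out. A minimal sketch, assuming the regex was meant to contain \d and using one sample line with the quotes omitted:

```powershell
$regex = '\d{5}([ -]\d{4})?'
# Single quotes keep $SALISH literal instead of expanding it as a variable.
$line = 'Aaa, bbb, 851 16TH AVE #365$SALISH, WA 98402-4410'

if ($line -match $regex) {
    # -match returned $true but did not change $line;
    # the matched text is available in $Matches[0].
    $zip = $Matches[0]
}

$zip
```

To change what is written, the matched value (or the result of -replace) has to be sent to the output, not the original line.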

Try this:

$text = @'
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb,  851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb,  2445 E BROADWAY #204$YELM WA 98653
'@ -split '\r?\n' | Select-Object -Skip 1

$result = $text.ForEach({
    # Split into at most 3 parts so commas inside the address are kept
    $name, $type, $addresses = $_.Split(',', 3)
    # Take the run of digits and dashes at the end of the address
    $addresses = [regex]::Matches($addresses, '[\d-]+(?=$)').Value
    [pscustomobject]@{
        Name      = $name
        Type      = $type
        Addresses = $addresses
    }
})
$result

Output:

Name Type Addresses 
---- ---- --------- 
Aaa   bbb 96332     
Aaa   bbb 98402-4410
Aaa   bbb 98653     

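Continuing from my comment: because the CSV data is poorly formatted, it is probably best to use a different regex together with -replace to modify the data.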


$file = 'c:\temp\test.csv'
# add test data to a file
@'
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb,  851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb,  2445 E BROADWAY #204$YELM WA 98653
'@ | Set-Content $file
$regex = ',[ \w$#]+,?[ \w]+(\d{5}(?:-\d+)?)$'
# This line reads in the file, skipping the header line.
# Then it performs a replace using the regex above,
# substituting whatever is matched with the first capture group, (\d{5}(?:-\d+)?).
# Finally the lines are appended to the end of the file.
(Get-Content $file | Select-Object -Skip 1) -replace $regex, ', $1' | Add-Content -Path $file
# Get-Content to check our file
Get-Content $file

Output:
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb,  851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb,  2445 E BROADWAY #204$YELM WA 98653
Aaa, bbb, 96332
Aaa, bbb, 98402-4410
Aaa, bbb, 98653
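
The run above appends the converted lines below the originals. If the goal is the expected result from the question (header plus converted lines only), one possible variation is to overwrite the file instead. This is a minimal sketch, not part of the original answer: the scratch file path is hypothetical, and it relies on the header line containing no ZIP code so that -replace leaves it unchanged.

```powershell
# Hypothetical scratch file, used only for this sketch.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'zip-demo.csv'

@'
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb,  851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb,  2445 E BROADWAY #204$YELM WA 98653
'@ | Set-Content $file

$regex = ',[ \w$#]+,?[ \w]+(\d{5}(?:-\d+)?)$'

# The parentheses make Get-Content read the whole file before
# Set-Content reopens it for writing. The header does not match
# the regex, so it passes through unchanged.
(Get-Content $file) -replace $regex, ', $1' | Set-Content $file

Get-Content $file
```

Set-Content replaces the file contents, while Add-Content (used above) appends to them.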

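This works for me. Just replace everything up to and including the 5 digits with the 5 digits. If the 5 digits are at the very beginning, it still works. https://javascript.info/regexp-greedy-and-lazy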

Import-Csv file.csv |
  % { $_.addresses = $_.addresses -replace '.*(\d{5})', '$1'; $_ }
Name type ADDRESSES
---- ---- ---------
Aaa  bbb  96332
Aaa  bbb  98402-4410
Aaa  bbb  98653
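
The greedy behavior this answer relies on can be checked on plain strings. A small sketch, with sample strings chosen for illustration (the second one is an assumption, not taken from the question):

```powershell
$address = '851 16TH AVE #365$SALISH, WA 98402-4410'

# Greedy .* consumes as much as it can, then backtracks until \d{5} fits,
# so the last run of five digits is captured. The tail "-4410" is not part
# of the match and therefore stays in the result.
$replaced = $address -replace '.*(\d{5})', '$1'

# If the five digits are at the very start, .* simply matches nothing.
$atStart = '98653' -replace '.*(\d{5})', '$1'

# Greedy vs. lazy on a string with two five-digit runs.
$sample = '12345 MAIN ST, WA 98402'
$greedy = [regex]::Match($sample, '.*(\d{5})').Groups[1].Value
$lazy   = [regex]::Match($sample, '.*?(\d{5})').Groups[1].Value

$replaced, $atStart, $greedy, $lazy
```

With the greedy pattern the capture is the last five-digit run, which is why a house number earlier in the address does not get picked up by mistake.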

My take:

$csv = Import-Csv -Path 'theOriginal.csv' | ForEach-Object {
    $_ | Select-Object *, @{Name = 'ADDRESSES'; Expression = { $_.ADDRESSES -replace '.*\s([-\d]+)$', '$1' }} -ExcludeProperty ADDRESSES
}
# output on screen
$csv
# write to file
$csv | Export-Csv -Path 'theUpdated.csv' -NoTypeInformation

Result:

Name type ADDRESSES 
---- ---- --------- 
Aaa  bbb  96332     
Aaa  bbb  98402-4410
Aaa  bbb  98653

Regex details:

.             Match any single character that is not a line break character
   *          Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\s            Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.)
(             Match the regular expression below and capture its match into backreference number 1
   [-\d]      Match a single character present in the list below
                 The character "-"
                 A single digit 0..9
      +       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
$             Assert position at the end of the string (or before the line break at the end of the string, if any)
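
The breakdown above can be tried directly against the sample addresses. A minimal sketch, assuming the address values have already been separated from the other columns:

```powershell
$pattern = '.*\s([-\d]+)$'

# Single quotes keep the $ characters in the addresses literal.
$addresses = @(
    '19 S 149TH $NEWPORT NEWS, WA 96332'
    '851 16TH AVE #365$SALISH, WA 98402-4410'
    '2445 E BROADWAY #204$YELM WA 98653'
)

$zips = foreach ($a in $addresses) {
    $m = [regex]::Match($a, $pattern)
    # Group 1 is the run of digits and dashes after the last whitespace.
    if ($m.Success) { $m.Groups[1].Value }
}

$zips
```

Because the pattern is anchored with $, only the digits and dashes at the very end of each address end up in group 1.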
