I'm trying to write part of a script that uses a regex to replace the matching lines.
The input looks like this:
Name, type, ADDRESSES
“Aaa”, “bbb”, “19 S 149TH $NEWPORT NEWS, WA 96332”
“Aaa”, “bbb”, “851 16TH AVE #365$SALISH, WA 98402-4410”
“Aaa”, “bbb”, “2445 E BROADWAY #204$YELM WA 98653”
This is what I've tried:
$regex = '\d{5}([ -]\d{4})?'
## get the data
$people = Get-Content 'C:\test.csv'
## let's convert the data first
foreach ($p in $people) {
    if ($p -match $regex) { $p | Out-File -Append C:\test.csv }
}
This is the result I expect:
Name, type, ADDRESSES
“Aaa”, “bbb”, “96332”
“Aaa”, “bbb”, “98402-4410”
“Aaa”, “bbb”, “98653”
This is what I actually get:
Name, type, ADDRESSES
“Aaa”, “bbb”, “19 S 149TH $NEWPORT NEWS, WA 96332”
“Aaa”, “bbb”, “851 16TH AVE #365$SALISH, WA 98402-4410”
“Aaa”, “bbb”, “2445 E BROADWAY #204$YELM WA 98653”
Try this:
$text = @'
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb, 851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb, 2445 E BROADWAY #204$YELM WA 98653
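'@ -split '\r?\n' | Select-Object -Skip 1
$result = $text.ForEach({
    # split each line into at most three fields, so commas inside the address survive
    $name, $type, $addresses = $_.Split(',',3)
    # keep only the run of digits and hyphens at the very end of the address
    $addresses = [regex]::Matches($addresses, '[\d-]+(?=$)').Value
    [pscustomobject]@{
        Name = $name
        Type = $type
        Addresses = $addresses
    }
})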
Name Type Addresses
---- ---- ---------
Aaa bbb 96332
Aaa bbb 98402-4410
Aaa bbb 98653
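Continuing from my comment: since the CSV data is poorly formatted, it's probably best to use a different regex together with -replace to modify the data.
$file = 'c:\temp\test.csv'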
# add test data to a file
@'
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb, 851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb, 2445 E BROADWAY #204$YELM WA 98653
'@ | Set-Content $file
$regex = ',[ \w$#]+,?[ \w]+(\d{5}(?:-\d+)?)$'
# This line will read in the file, skipping the header line.
# Then it will perform a replace using the regex above
# substituting whatever is matched with the first matching group (\d{5}(?:-\d+)?).
# Finally the lines are appended to the end of the file
(Get-Content $file | Select-Object -Skip 1) -replace $regex, ', $1' | Add-Content -Path $file
# Get-Content to check our file
Get-Content $file
Output:
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb, 851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb, 2445 E BROADWAY #204$YELM WA 98653
Aaa, bbb, 96332
Aaa, bbb, 98402-4410
Aaa, bbb, 98653
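This works for me. Just replace everything up to and including the 5 digits with the 5 digits themselves. Because .* is greedy, it matches the last run of 5 digits, and it still works if the address starts with 5 digits. See https://javascript.info/regexp-greedy-and-lazy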
import-csv file.csv |
% { $_.addresses = $_.addresses -replace '.*(\d{5})', '$1'; $_ }
Name type ADDRESSES
---- ---- ---------
Aaa bbb 96332
Aaa bbb 98402-4410
Aaa bbb 98653
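To make the greedy behaviour concrete, here is a minimal sketch comparing the greedy pattern with its lazy counterpart. The address is a made-up example that starts with a 5-digit number, and the variable names are only illustrative:

```powershell
# Hypothetical address that starts with a 5-digit number and ends with a ZIP code.
$address = '12345 MAIN ST $YELM WA 98653'

# Greedy .* consumes the whole string, then backtracks to the last 5-digit run.
$greedy = [regex]::Match($address, '.*(\d{5})').Groups[1].Value

# Lazy .*? expands only as far as needed, so it stops at the first 5-digit run.
$lazy = [regex]::Match($address, '.*?(\d{5})').Groups[1].Value

"greedy: $greedy"   # greedy: 98653
"lazy:   $lazy"     # lazy:   12345
```

With the greedy form, the leading house number is skipped and the ZIP code at the end is what gets captured.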
Here's my take:
$csv = Import-Csv -Path 'theOriginal.csv' | ForEach-Object {
    $_ | Select-Object *, @{Name = 'ADDRESSES'; Expression = { $_.ADDRESSES -replace '.*\s([-\d]+)$', '$1'}} -ExcludeProperty ADDRESSES
}
# output on screen
$csv
# write to file
$csv | Export-Csv -Path 'theUpdated.csv' -NoTypeInformation
Result:
Name type ADDRESSES
---- ---- ---------
Aaa bbb 96332
Aaa bbb 98402-4410
Aaa bbb 98653
Regex details:
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
( Match the regular expression below and capture its match into backreference number 1
[-\d] Match a single character present in the list below
The character “-”
A single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
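As a quick check of the breakdown above, this small sketch applies the same pattern directly to the three addresses from the question, without going through a CSV file. The variable names $addresses and $zips are only illustrative:

```powershell
# Addresses taken from the sample data in the question.
$addresses = @(
    '19 S 149TH $NEWPORT NEWS, WA 96332'
    '851 16TH AVE #365$SALISH, WA 98402-4410'
    '2445 E BROADWAY #204$YELM WA 98653'
)

# .*\s consumes everything up to the last whitespace before the trailing
# run of digits and hyphens; $1 keeps only that run.
$zips = $addresses -replace '.*\s([-\d]+)$', '$1'
$zips
```

An address that does not end in digits or hyphens would not match at all, so -replace would return it unchanged.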