The OUTLOOK.HOL (holidays) file is structured as follows:
[Portugal] 207
All Saints' Day,2021/11/1
All Saints' Day,2022/11/1
Assumption,2021/8/15
Assumption,2022/8/15
Carnival,2021/2/16
Carnival,2022/3/1
[Puerto Rico] 489
Birthday of Eugenio María de Hostos,2021/1/11
Birthday of Eugenio María de Hostos,2022/1/10
Birthday of José de Diego,2021/4/19
Birthday of José de Diego,2022/4/18
Birthday of Don Luis Muñoz Rivera,2021/7/19
Birthday of Don Luis Muñoz Rivera,2022/7/18
[Qatar] 118
...
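How can I use PowerShell to parse this file into structured data, so that I can convert it to a CSV file with a header row like this:
Country;Number;Holiday name;Date
/Michal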
You need to loop through the lines of the file one by one and use regex to parse the different "fields".
$result = switch -Regex -File 'D:\Test\outlook.hol' {
    '^\[([^\]]+)\]\s+(\d+)' {
        # found a header line like "[Portugal] 207"; remember its values
        $country = $matches[1]
        $number  = $matches[2]
    }
    '^([^,]+),(\d{4}/\d{1,2}/\d{1,2})$' {
        # found a data line, output a PSObject
        [PsCustomObject]@{
            Country      = $country
            Number       = $number
            Holiday_name = $matches[1]
            Date         = $matches[2]
        }
    }
}
# output on screen
$result | Format-Table -AutoSize
# output to CSV file
$result | Export-Csv -Path 'D:\Test\OutlookHolidays.csv' -NoTypeInformation -Encoding UTF8
Output (on screen):
Country     Number Holiday_name                        Date
-------     ------ ------------                        ----
Portugal    207    All Saints' Day                     2021/11/1
Portugal    207    All Saints' Day                     2022/11/1
Portugal    207    Assumption                          2021/8/15
Portugal    207    Assumption                          2022/8/15
Portugal    207    Carnival                            2021/2/16
Portugal    207    Carnival                            2022/3/1
Puerto Rico 489    Birthday of Eugenio María de Hostos 2021/1/11
Puerto Rico 489    Birthday of Eugenio María de Hostos 2022/1/10
Puerto Rico 489    Birthday of José de Diego           2021/4/19
Puerto Rico 489    Birthday of José de Diego           2022/4/18
Puerto Rico 489    Birthday of Don Luis Muñoz Rivera   2021/7/19
Puerto Rico 489    Birthday of Don Luis Muñoz Rivera   2022/7/18
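The question asks for semicolon-separated columns, while Export-Csv writes commas unless told otherwise. The sketch below is one possible way to get that: it runs the same switch over a few in-memory sample lines (the $lines array is a stand-in for the real file, so nothing is read from disk) and passes -Delimiter ';' to ConvertTo-Csv. The variable names $lines, $rows and $csv are only illustrative.

```powershell
# Sample lines standing in for the real OUTLOOK.HOL content (assumption: same layout)
$lines = @(
    '[Portugal] 207'
    'Carnival,2021/2/16'
    'Carnival,2022/3/1'
    '[Qatar] 118'
)

# Same two patterns as in the answer, applied to the array instead of a file
$rows = switch -Regex ($lines) {
    '^\[([^\]]+)\]\s+(\d+)' {
        $country = $matches[1]
        $number  = $matches[2]
    }
    '^([^,]+),(\d{4}/\d{1,2}/\d{1,2})$' {
        [PsCustomObject]@{
            Country      = $country
            Number       = $number
            Holiday_name = $matches[1]
            Date         = $matches[2]
        }
    }
}

# Convert to CSV text using a semicolon as the field separator
$csv = $rows | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
$csv
```

The same -Delimiter ';' parameter can be added to the Export-Csv call when writing to a file.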
Regex 1 details:
^         Assert position at the beginning of the string
\[        Match the character "[" literally
(         Match the regular expression below and capture its match into backreference number 1
  [^\]]   Match any character that is NOT a "]" character
  +       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\]        Match the character "]" literally
\s        Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.)
+         Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(         Match the regular expression below and capture its match into backreference number 2
  \d      Match a single digit 0..9
  +       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
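To see the header pattern at work outside the switch, it can be tried with the -match operator on a single line. This is only a quick check, assuming the pattern '^\[([^\]]+)\]\s+(\d+)'; the variable names are illustrative.

```powershell
# Header pattern: "[Country] number"
$headerPattern = '^\[([^\]]+)\]\s+(\d+)'

# A header line matches and fills the automatic $Matches table
if ('[Puerto Rico] 489' -match $headerPattern) {
    $country = $Matches[1]   # text between the brackets
    $number  = $Matches[2]   # digits after the whitespace
}

# A data line does not start with "[", so it is not treated as a header
$isHeader = 'Carnival,2021/2/16' -match $headerPattern

"$country / $number / $isHeader"
```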
Regex 2 details:
^         Assert position at the beginning of the string
(         Match the regular expression below and capture its match into backreference number 1
  [^,]    Match any character that is NOT a ","
  +       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,         Match the character "," literally
(         Match the regular expression below and capture its match into backreference number 2
  \d      Match a single digit 0..9
  {4}     Exactly 4 times
  /       Match the character "/" literally
  \d      Match a single digit 0..9
  {1,2}   Between one and 2 times, as many times as possible, giving back as needed (greedy)
  /       Match the character "/" literally
  \d      Match a single digit 0..9
  {1,2}   Between one and 2 times, as many times as possible, giving back as needed (greedy)
)
$         Assert position at the end of the string (or before the line break at the end of the string, if any)
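The data-line pattern can be checked the same way. The answer keeps the date as a string; if a real date value is wanted, one option (an addition here, not part of the answer's code) is [datetime]::ParseExact with the format 'yyyy/M/d', which accepts one- or two-digit months and days. Variable names are illustrative.

```powershell
# Data pattern: "Holiday name,yyyy/m/d"
$dataPattern = '^([^,]+),(\d{4}/\d{1,2}/\d{1,2})$'
$sample = "All Saints' Day,2021/11/1"

if ($sample -match $dataPattern) {
    $name = $Matches[1]
    # Optional: turn the captured text into a [datetime] (assumption: dates are always year/month/day)
    $date = [datetime]::ParseExact($Matches[2], 'yyyy/M/d', [cultureinfo]::InvariantCulture)
}

"$name -> $($date.ToString('yyyy-MM-dd'))"
```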