How to extract and sort specific elements from a huge txt file into a CSV?



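I'm trying to create a PowerShell script that extracts all the lines containing "ERROR", together with their database path entries, from a huge log TXT file and sorts them into a CSV file. Example of an error: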

2022-04-17 00:00:00.9999|ERROR|texte:texte|texte \\DATABASE\Path\Path\Path\Path\Item[Item Name] (ID:########-####-####-###-############ Rank:#). description of the error. 

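Then I want to retrieve the date, the full path of the element in error (\\DATABASE\Path\Path\Path\Path\Item[Item Name]), and the description of the error, and remove duplicates. I also don't know whether the date, path, and message can be split directly into three columns in the CSV file.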

Log sample (excerpt):

2022-04-17 00:00:00.9999|ERROR|ANDataCache:Configuration|################# Error when adding input attributes to data cache (Failed:8/Total:12) [99.9999999999999 ms].
2022-04-17 00:00:00.9999|ERROR|ANCalculationEngine:Configuration|Failed to initialize \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name]  (ID:########-####-####-###-############ Rank:#). Failed to resolve required input 'input A name'
Failed to resolve required input 'input B name'
No output is defined.
2022-04-17 00:00:00.9999|WARN|ANTimeClassManagerHelper:Configuration|Ignoring partial cache signup errors for \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name]  (ID:########-####-####-###-############ Rank:#). Failed to signup some input(s) for receiving updates. 
 Net Volume in Tank: Point not found 'Point Name'.
2022-04-17 00:00:00.9999|ERROR|ANCalculationEngine:Configuration|Failed to initialize \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] (ID:########-####-####-###-############ Rank:#). Failed to resolve required input 'input name'
There is no time rule configured for this analysis.
No output is defined.
2022-04-17 00:00:00.9999|WARN|###########:#########|############[#####] Ignoring attempt to remove non-existent calculation '\\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] (ID:########-####-####-###-############ Rank:#)'
2022-04-17 00:00:00.9999|ERROR|ANDataCache:Configuration|DataCache:################ Error when adding input attributes to data cache (Failed:8/Total:12) [99.9999999999999 ms]. 

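Example of the expected result (based on the sample above):

(I only want to retrieve the errors that have a path ("\\DATABASE\Path\Path\Path\Path\Item[Item Name]"), not the warning entries or the errors without a path.)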

I started writing this:

$File = "logs.txt"
$Pattern = '(\[ERROR[^\\]+(?<DatabasePath>[^\\\]]+\])(?<ErrorText>[^\r\n]+=)'
$Content = Get-Content $File
[regex]::Matches($Content, $Pattern).Value | Set-Content "output.csv" 

Or, to retrieve only the path:

$File = "logs.txt"
$Pattern = '(?<=\\DATABASE\\).+?(?=\])'
$Content = Get-Content $File
[regex]::Matches($Content, $Pattern).Value | Set-Content "output.csv"

But in the second case, "DATABASE" does not appear in the output file.

Thanks in advance for your answers.

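The regex can probably be improved, but for now this may help you get what you're looking for. If something doesn't work, I encourage you to test the current regex on regex101 (and possibly improve it).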

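$File = "logs.txt"
# date | ERROR | ... path ... first period after the path ... description (may span several lines)
$re = [regex]"(?m)(?<date>^[\d-]+\s[\d:.]+)\|ERROR\|.*?(?<path>\\[\\\w\s\[.\]]+).*?\.(?<description>[\w\s'\r?\n.]+$)"
& {
    # -Raw reads the file as a single string so a match can span continuation lines
    $content = Get-Content $File -Raw
    foreach($match in $re.Matches($content)) {
        $date, $path, $description = $match.Groups['date','path','description']
        [pscustomobject]@{
            Date = $date.Value -as [datetime]
            Path = $path.Value.Trim()
            # Collapse the line breaks inside a multi-line description
            Description = ($description.Value -replace '\r?\n', ' ').Trim()
        }
    }
} | Export-Csv "output.csv" -NoTypeInformation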

With the sample data provided in the question, the output looks like this, and it can be exported to a proper CSV:

PS /> $output | Format-Table
Date                  Path                                                  Description
----                  ----                                                  -----------
4/17/2022 12:00:00 AM \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] Failed to resolve required input 'input A name'. F…
4/17/2022 12:00:00 AM \\DATABASE\Path1\Path2\Path3\Path4\Item[1. Item Name] Failed to resolve required input 'input name'. The…
PS /> $output[0].Description
Failed to resolve required input 'input A name' Failed to resolve required input 'input B name' No output is defined.
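
The question also asks for duplicates to be removed, which the script above does not do. A minimal sketch of one way to handle it, assuming two rows count as duplicates when Date, Path and Description all match. The $rows array and its shortened paths are hypothetical stand-ins for the objects the script emits:

```powershell
# Hypothetical rows standing in for the objects produced by the script above
$rows = @(
    [pscustomobject]@{ Date = '2022-04-17 00:00:00.9999'; Path = '\\DATABASE\Path1\Item[1. Item Name]'; Description = 'No output is defined.' }
    [pscustomobject]@{ Date = '2022-04-17 00:00:00.9999'; Path = '\\DATABASE\Path1\Item[1. Item Name]'; Description = 'No output is defined.' }
    [pscustomobject]@{ Date = '2022-04-17 00:00:01.0000'; Path = '\\DATABASE\Path2\Item[2. Item Name]'; Description = 'No output is defined.' }
)

# Sort on all three columns and keep one row per distinct combination
$unique = @($rows | Sort-Object -Property Date, Path, Description -Unique)

# Each property becomes its own CSV column (header line plus one line per row)
$csv = @($unique | ConvertTo-Csv -NoTypeInformation)
$csv
```

In the real script, the same Sort-Object call could sit between the script block and Export-Csv. Note that Sort-Object compares strings case-insensitively by default, so rows that differ only in letter case would be merged.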

If you want to keep the date format as it is in the file, you can remove -as [datetime].
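
As for the second attempt in the question, "DATABASE" is most likely missing from the output because a lookbehind (?<=...) only asserts that the text precedes the match; it is not part of the match itself. The same applies to the lookahead on the closing bracket. A small sketch of the difference, using a hypothetical, shortened sample line:

```powershell
# Hypothetical, shortened sample line (not taken from the real log)
$line = 'Failed to initialize \\DATABASE\Path1\Item[1. Item Name] (ID:# Rank:#). No output is defined.'

# Lookbehind and lookahead only assert context: "\DATABASE\" and "]" are not captured
$withoutPrefix = [regex]::Match($line, '(?<=\\DATABASE\\).+?(?=\])').Value

# Matching the prefix and the closing bracket directly keeps them in the result
$withPrefix = [regex]::Match($line, '\\\\DATABASE\\.+?\]').Value

$withoutPrefix   # Path1\Item[1. Item Name
$withPrefix      # \\DATABASE\Path1\Item[1. Item Name]
```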
