Finding table names in SSIS .dtsx packages



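I am trying to scan SSIS .dtsx packages for table names. Yes, I know I should use [xml] and a tool that parses the SQL language. That does not seem possible at this time. PowerShell understands [xml], but SQL parsers are typically ++, and using ANTLR is an investment that is not currently acceptable. I am open to suggestions, but I am not asking for a tool recommendation.

There are two (2) problems.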

1) `&.*;` does not appear to be recognized as separate from the table name capture item
2) TABLE5 does not appear to be found

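Yes, I also know that schema names should not be hardcoded into the source code. It makes it difficult or impossible for the DBAs to manage the database. That is how it is done here.

How can I make the regex omit `&.*;` from the capture and recognize dbo.TABLE5?

Here is the code I am using to scan the .dtsx files.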

PS C:\src\sql> Get-Content .\Find-FromJoinSql.ps1
Get-ChildItem -File -Filter '*.dtsx' |
    ForEach-Object {
        $Filename = $_.Name
        Select-String -Pattern '(FROM|JOIN)(\s|&.*;)+(\S+)(\s|&.*;)+' -Path $_ -AllMatches |
            ForEach-Object {
                if ($_.Matches.Groups.captures[3].value -match 'dbo') {
                    "$Filename === $($_.Matches.Groups.captures[3].value)"
                }
            }
    }

Here is a small sample of the kind of text found in a .dtsx file.

PS C:\src\sql> Get-Content .\sls_test.dtsx
USE ADATABASE;
SELECT * FROM dbo.TABLE1 WHERE F1 = 3;
SELECT * FROM dbo.TABLE2 T2
FULL OUTER JOIN dbo.TABLEJ TJ
ON T2.KEY = TJ.KEY;
SELECT * FROM dbo.TABLE3 T3
INNER JOIN ADATABASE2.dbo.TABLEK
TK ON
T3.user_id = TK.user_id

SELECT * FROM dbo.TABLE4 T4 FULL OUTER JOIN dbo.TABLE5 T5
ON T4.F1 = T5.F1;
EXIT

Running the script on this data produces:

PS C:\src\sql> .\Find-FromJoinSql.ps1
sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
TK
sls_test.dtsx === dbo.TABLE4
PS C:\src\sql> $PSVersionTable.PSVersion.ToString()
7.1.5

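It is indeed strange that some entities (line-feed entities such as `&#xA;`) are not replaced in these files.

Change the regex pattern slightly to capture the dbo table names, as follows:

Using Get-Content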

$regex = [regex] '(?im)(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*'
Get-ChildItem -Path D:\Test -File -Filter '*.dtsx' |
    ForEach-Object {
        # read the whole file as one string and walk through every match
        $match = $regex.Match((Get-Content -Path $_.FullName -Raw))
        while ($match.Success) {
            "$($_.Name) === $($match.Groups[1].Value)"
            $match = $match.NextMatch()
        }
    }
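
To illustrate why the capture group changes from `\S+` to `[^\s&]+`, here is a minimal sketch that runs both patterns against one string. The sample text is made up for this illustration, and `&#xA;` is only an assumed stand-in for whatever entity separates the tokens in the real files. Because an entity contains no whitespace, `\S+` keeps consuming straight through it, whereas `[^\s&]+` stops at the `&`.

```powershell
# Made-up sample text; '&#xA;' is an assumed line-feed entity, not taken from a real package.
$text = 'INNER JOIN ADATABASE2.dbo.TABLEK&#xA;TK ON&#xA;T3.user_id = TK.user_id'

# Pattern from the question: the table name is capture group 3.
$old = [regex] '(?i)(FROM|JOIN)(\s|&.*;)+(\S+)(\s|&.*;)+'
# Adjusted pattern: the table name is capture group 1.
$new = [regex] '(?i)(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*'

$oldCapture = $old.Match($text).Groups[3].Value
$newCapture = $new.Match($text).Groups[1].Value

"old: $oldCapture"   # old: ADATABASE2.dbo.TABLEK&#xA;TK
"new: $newCapture"   # new: ADATABASE2.dbo.TABLEK
```

The old capture running on into `TK` corresponds to the `ADATABASE2.dbo.TABLEK` / `TK` pair in the question's output.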

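Using Select-String

As for why Select-String -AllMatches skips your TABLE5:
From the docs: "When Select-String finds more than one match in a line of text, it still emits only one MatchInfo object for the line, but the Matches property of the object contains all the matches."

This means you need another loop to get all the $Matches out of each $MatchInfo object so that they all appear in your output: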

$pattern = '(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)(?:\s|&[^;]+;)*'
Get-ChildItem -Path 'D:\Test' -File -Filter '*.dtsx' |
    ForEach-Object {
        $Filename = $_.Name
        Select-String -Pattern $pattern -Path $_.FullName -AllMatches |
            ForEach-Object {
                # loop again, because each $MatchInfo object may contain multiple
                # $Matches objects if more matches were found in the same line
                foreach ($match in $_.Matches) {
                    if ($match.Groups[1].value -match 'dbo') {
                        "$Filename === $($match.Groups[1].value)"
                    }
                }
            }
    }

Output:

sls_test.dtsx === dbo.TABLE1
sls_test.dtsx === dbo.TABLE2
sls_test.dtsx === dbo.TABLEJ
sls_test.dtsx === dbo.TABLE3
sls_test.dtsx === ADATABASE2.dbo.TABLEK
sls_test.dtsx === dbo.TABLE4
sls_test.dtsx === dbo.TABLE5
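
The one-MatchInfo-per-line behaviour can also be checked in isolation. The following is a small sketch that pipes a single made-up line (modelled on the TABLE4/TABLE5 line in the sample) into Select-String; the variable names are chosen only for this illustration.

```powershell
# One input line that contains two table references (made-up sample text).
$line = 'SELECT * FROM dbo.TABLE4 T4 FULL OUTER JOIN dbo.TABLE5 T5'
$pattern = '(?:FROM|JOIN)(?:\s|&[^;]+;)+([^\s&]+)'

# @() forces an array so that .Count is reliable even for a single result.
$info = @($line | Select-String -Pattern $pattern -AllMatches)

$info.Count              # 1  (one MatchInfo object for the one line)
$info[0].Matches.Count   # 2  (both matches live inside that object)

# Walk the inner collection to reach every captured table name.
$tables = @($info[0].Matches | ForEach-Object { $_.Groups[1].Value })
$tables                  # dbo.TABLE4, dbo.TABLE5
```

Reading only the first match of each MatchInfo would report dbo.TABLE4 and miss dbo.TABLE5, which is why the inner loop is needed.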

Regex details:

(?im)              Use case-insensitive matching and have '^' and '$' match at linebreaks
(?:                Match the regular expression below
                   Match either the regular expression below (attempting the next alternative only if this one fails)
      FROM         Match the characters “FROM” literally
   |               Or match regular expression number 2 below (the entire group fails if this one fails to match)
      JOIN         Match the characters “JOIN” literally
)
(?:                Match the regular expression below
                   Match either the regular expression below (attempting the next alternative only if this one fails)
      \s           Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   |               Or match regular expression number 2 below (the entire group fails if this one fails to match)
      &            Match the character “&” literally
      [^;]         Match any character that is NOT a “;”
         +         Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ;            Match the character “;” literally
)+                 Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                  Match the regular expression below and capture its match into backreference number 1
   [^\s&]          Match a single character NOT present in the list below
                      A whitespace character (spaces, tabs, line breaks, etc.)
                      The character “&”
      +            Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
(?:                Match the regular expression below
                   Match either the regular expression below (attempting the next alternative only if this one fails)
      \s           Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   |               Or match regular expression number 2 below (the entire group fails if this one fails to match)
      &            Match the character “&” literally
      [^;]         Match any character that is NOT a “;”
         +         Between one and unlimited times, as many times as possible, giving back as needed (greedy)
      ;            Match the character “;” literally
)*                 Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
