PowerShell script to group records by overlapping start and end dates



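I am working with a CSV file that has start and end dates, and the requirement is to group records by date whenever the dates overlap. For example, in the table below, the Start_Date and End_Date of Bill_Number 177835 overlap with the Start_Date and End_Date of 178682, 179504, and 178990, so all of those records should be grouped together, and so on.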

Bill_Number,Start_Date,End_Date
177835,4/14/20 3:00 AM,4/14/20 7:00 AM
178682,4/14/20 3:00 AM,4/14/20 7:00 AM
179504,4/14/20 3:29 AM,4/14/20 6:29 AM
178662,4/14/20 4:30 AM,4/14/20 5:30 AM
178990,4/14/20 6:00 AM,4/14/20 10:00 AM
178995,4/15/20 6:00 AM,4/15/20 10:00 AM
178998,4/15/20 6:00 AM,4/15/20 10:00 AM

I tried different combinations, such as "Group by" and "for loop", but none of them produced the result. For the sample CSV above, the expected result is:

Group1: 177835,178682,179504, 178990
Group2: 177835,178682,179504, 178662
Group3: 178995, 178998

This is the code I have so far. Any help would be appreciated; thanks in advance.

$array = @('ab','bc','cd','df')

for ($y = 0; $y -lt $array.count) {
    for ($x = 0; $x -lt $array.count) {
        if ($array[$y] -ne $array[$x]) {
            Write-Host $array[$y],$array[$x]
        }
        $x++
    }
    $y++
}

You can do the following. There is probably a cleaner solution, but finding it could take a lot of time.

$csv = Import-Csv file.csv

# Creates all inclusive groups where times overlap
$csvGroups = foreach ($row in $csv) {
    $start = [datetime]$row.Start_Date
    $end = [datetime]$row.End_Date
    ,($csv | where { ($start -ge [datetime]$_.Start_Date -and $start -le [datetime]$_.End_Date) -or ($end -ge [datetime]$_.Start_Date -and $end -le [datetime]$_.End_Date) })
}

# Removes duplicates from $csvGroups
$groups = $csvGroups | Group {$_.Bill_number -join ','} |
    Foreach-Object { ,$_.Group[0] }

# Compares current group against all groups except itself
$output = for ($i = 0; $i -lt $groups.count; $i++) {
    $unique = $true # indicates if the group's bill_numbers are in another group
    $group = $groups[$i]
    $list = $groups -as [system.collections.arraylist]
    $list.RemoveAt($i) # Removes self
    foreach ($innerGroup in $list) {
        # If current group's bill_numbers are in another group, skip to next group
        if ((compare $group.Bill_Number $innergroup.Bill_Number).SideIndicator -notcontains '<=') {
            $unique = $false
            break
        }
    }
    if ($unique) {
        ,$group
    }
}

$groupCounter = 1
# Output formatting
$output | Foreach-Object { "Group{0}:{1}" -f $groupCounter++,($_.Bill_Number -join ",")}

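Explanation:

I added some comments to give an idea of what is going on.

The ,$variable syntax uses the unary comma operator (,). It wraps the output in an array. Normally, PowerShell unrolls an array into its individual items when it is written to the output. Unrolling is a problem here because we want the groups to stay as groups (arrays). Otherwise there would be many duplicate bill numbers, and we would lose track of which numbers belong to which group.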
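The effect of the unary comma can be seen in isolation. The following is a minimal sketch; the function names Get-PairsUnrolled and Get-PairsWrapped are made up for illustration and do not appear in the answer above.

```powershell
# Hypothetical helper names, for illustration only.
function Get-PairsUnrolled {
    # Each array is unrolled into the output stream item by item.
    foreach ($i in 1..2) { @("a$i", "b$i") }
}

function Get-PairsWrapped {
    # The unary comma wraps each array in a one-element array;
    # only that wrapper is unrolled, so the inner array survives intact.
    foreach ($i in 1..2) { ,@("a$i", "b$i") }
}

$unrolled = Get-PairsUnrolled
$wrapped  = Get-PairsWrapped

$unrolled.Count          # 4 (four separate strings)
$wrapped.Count           # 2 (two arrays of two strings each)
$wrapped[0] -join ','    # a1,b1
```

Without the comma the pairing is lost, which is the same thing that would happen to the groups of bill numbers.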

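An ArrayList is used for $list. This gives us access to the RemoveAt() method. A typical array has a fixed size and cannot be manipulated that way. The same thing can be done with a plain array, but the code is different: you either have to select the index ranges on both sides of the item you want to skip, or build a new array with a conditional statement that excludes the target item. The ArrayList is simply easier for me (personal preference).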
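Both ways of "everything except item $i" can be sketched side by side. The sample values below are placeholders, and the plain-array variant is only one possible form of the conditional approach mentioned above.

```powershell
# Placeholder data standing in for the $groups collection.
$groups = @('g0', 'g1', 'g2', 'g3')
$i = 1   # index of the item to leave out

# ArrayList: the cast makes a resizable copy, then the item is removed in place.
$list = [System.Collections.ArrayList]$groups
$list.RemoveAt($i)

# Plain array: build a new array from every index except $i.
$others = @(
    for ($n = 0; $n -lt $groups.Count; $n++) {
        if ($n -ne $i) { $groups[$n] }
    }
)

$list -join ','      # g0,g2,g3
$others -join ','    # g0,g2,g3
$groups.Count        # 4 (the original array is unchanged in both cases)
```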

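So this is a pretty dirty approach. I think there are a number of ways to determine whether a particular comparison, one record against another, has an overlap. However, you may need a list of the bill numbers whose date ranges collide with each bill's date range. Using a function call inside a Select-Object calculated property, I added a Collisions property to your objects.

The function is wordy and could probably be improved, but the gist is this: each record is compared against all other records, and whenever its start or end date falls within another record's range, that record's bill number is reported in the Collisions property.

This is of course just demo code. I'm sure it can be made better for your purposes, but it may be a starting point for you.

Obviously, change the path to the CSV file.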

Function Get-Collisions
{
    Param(
        [Parameter(Mandatory = $true)]
        [Object]$ReferenceObject,

        [Parameter(Mandatory = $true)]
        [Object[]]$CompareObjects
    ) # End Parameter Block

    ForEach($Object in $CompareObjects)
    {
        # Skip the reference object itself.
        If( !($ReferenceObject.Bill_Number -eq $Object.Bill_Number) )
        {
            If(
                ( $ReferenceObject.Start_Date -ge $Object.Start_Date -and $ReferenceObject.Start_Date -le $Object.End_Date   ) -or
                ( $ReferenceObject.End_Date   -ge $Object.Start_Date -and $ReferenceObject.End_Date   -le $Object.End_Date   ) -or
                ( $ReferenceObject.Start_Date -le $Object.Start_Date -and $ReferenceObject.End_Date   -ge $Object.Start_Date )
            )
            {
                $Object.Bill_Number
            }
        }
    }
} # End Get-Collisions

$Objects = Import-Csv 'C:\temp\DateOverlap.CSV'

# Convert the date strings to [DateTime] so the comparisons are chronological.
$Objects |
    ForEach-Object{
        $_.Start_Date = [DateTime]$_.Start_Date
        $_.End_Date   = [DateTime]$_.End_Date
    }

$Objects = $Objects |
    Select-Object *,@{Name = 'Collisions'; Expression = { Get-Collisions -ReferenceObject $_ -CompareObjects $Objects }}

$Objects | Format-Table -AutoSize

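Let me know how it goes. Thanks.

@Shan, I saw your comments, so I wanted to respond with some additional code and discussion. I may have gone overboard, but you expressed a desire to learn so that you can maintain these code snippets in the future. So I put a good amount of time into it.

I may mention some of what @AdminOfThings is doing too. That is not criticism so much as collaboration. His example is both clever and dynamic: it gets the job done and uses the right tool at each point on the way to the desired output.

I originally side-stepped the grouping question because I didn't feel that naming or numbering the groups had any meaning. For example, "Group 1" tells you that all of its members have overlapping billing periods, but says nothing about what overlaps or when. Maybe I rushed through it; I may have been reading too much into it, or even letting my own biases get in the way. At any rate, I chose to establish the relationship from the perspective of each bill number, and that was my first answer.

Since then, and because of your comments, I have worked to extend and document the first example I gave. The revised code is Example 1 below. I have commented it heavily, and most of the comments apply to the original example as well. The extended grouping functionality brings a few differences, but the comments should reflect those.

Note: You will also see that I stopped calling them "collisions" and now call them "overlaps".

Example 1: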

Function Get-Overlaps
{
    <#
    .SYNOPSIS
        Given an object (reference object) compare to a collection of other objects of the same
        type.  Return an array of billing numbers for which the billing period overlaps that of
        the reference object.
    .DESCRIPTION
        Given an object (reference object) compare to a collection of other objects of the same
        type.  Return an array of billing numbers for which the billing period overlaps that of
        the reference object.
    .PARAMETER ReferenceObject
        This is the current object you wish to compare to all other objects.
    .PARAMETER CompareObjects
        The collection of objects you want to compare with the reference object.
    .NOTES
        > The date time casting could probably have been done by further preparing
          the objects in the calling code.  However, given this is for a
          StackOverflow question I can polish that later.
    #>
    Param(
        [Parameter(Mandatory = $true)]
        [Object]$ReferenceObject,

        [Parameter(Mandatory = $true)]
        [Object[]]$CompareObjects
    ) # End Parameter Block

    [Collections.ArrayList]$Return = @()

    $R_StartDate = [DateTime]$ReferenceObject.Start_Date
    $R_EndDate   = [DateTime]$ReferenceObject.End_Date

    ForEach($Object in $CompareObjects)
    {
        $O_StartDate = [DateTime]$Object.Start_Date
        $O_EndDate   = [DateTime]$Object.End_Date

        # The first if statement skips the reference object's bill_number
        If( !($ReferenceObject.Bill_Number -eq $Object.Bill_Number) )
        {
            # This logic can use some explaining.  So far as I could tell there were 2 cases to look for:
            # 1) Either or both the start and end dates fell inside the timespan of the comparison
            #    object.  This case is handled by the first 2 conditions.
            # 2) The reference object's timespan covers the entire timespan of the comparison object.
            #    Meaning the start date is before and the end date is after, so the entire
            #    comparison timespan is within the bounds of the reference timespan.  I elected to use
            #    the 3rd condition below to detect that case because once the start date is earlier I
            #    only have to care if the end date is greater than the start date.  It's a little more
            #    inclusive and partially covered by the previous conditions, but whatever, you gotta
            #    pick something...
            #
            # Note: This was a deceptively difficult thing to comprehend, I missed that last condition
            #       in my first example (later corrected) and I think @AdminOfThings also overlooked it.
            If(
                ( $R_StartDate -ge $O_StartDate -and $R_StartDate -le $O_EndDate   ) -or
                ( $R_EndDate   -ge $O_StartDate -and $R_EndDate   -le $O_EndDate   ) -or
                ( $R_StartDate -le $O_StartDate -and $R_EndDate   -ge $O_StartDate )
            )
            {
                [Void]$Return.Add( $Object.Bill_Number )
            }
        }
    }

    Return $Return
} # End Get-Overlaps
# Read the CSV into its own variable first, so the full collection already exists
# when it is passed to Get-Overlaps inside the pipeline below.
$Csv = Import-Csv 'C:\temp\DateOverlap.CSV'

$Objects =
    $Csv |
    ForEach-Object{
        # Consider overlap as a relationship from the perspective of a given Object.
        $Overlaps = [Collections.ArrayList]@(Get-Overlaps -ReferenceObject $_ -CompareObjects $Csv)

        # Knowing the overlaps I can infer the group, by adding the current object's bill_number to its overlaps.
        If( $Overlaps )
        {   # Don't calculate a group unless you actually have overlaps:
            $Group = $Overlaps.Clone()
            [Void]$Group.Add( $_.Bill_Number ) # Could be done in the line above, but for readability I separated it.
        }
        Else { $Group = $null } # Ensures we are not reusing the group from a previous iteration of the loop.

        # Create a new PSCustomObject with the data so far.
        [PSCustomObject][Ordered]@{
            Bill_Number = $_.Bill_Number
            Start_Date  = [DateTime]$_.Start_Date
            End_Date    = [DateTime]$_.End_Date
            Overlaps    = $Overlaps
            Group       = $Group | Sort-Object # Sorting will make it a lot easier to get unique lists later.
        }
    }

# The reason I recreated the objects from the CSV file instead of using Select-Object as I had
# previously is that I simply couldn't get Select-Object to maintain the ArrayList type that was being
# returned from the function.  I know that's a documented problem or circumstance somewhere.

# Now I'll add one more property called Group_ID, a comma delimited string that we can later use
# to echo the groups according to your original request.
$Objects =
    $Objects |
    Select-Object *,@{Name = 'Group_ID'; Expression = { $_.Group -join ', ' } }

# This output is just for the sake of showing the new objects:
$Objects | Format-Table -AutoSize -Wrap

# Now create an array of unique Group_ID strings; this is possible because of the sorts and joins done earlier.
$UniqueGroups = $Objects.Group_ID | Select-Object -Unique

$Num = 1
ForEach($UniqueGroup in $UniqueGroups)
{
    "Group $Num : $UniqueGroup"
    ++$Num # Increment $Num, using the convenient unary operator, so the next group is echoed properly.
}

# Below is a traditional for loop that does the same thing.  I did that first before deciding the ForEach
# was cleaner.  Leaving it commented below, because you're on a learning-quest, so just more demo code...
# For($i = 0; $i -lt $UniqueGroups.Count; ++$i)
# {
#     $Num = $i + 1
#     $UniqueGroup = $UniqueGroups[$i]
#     "Group $Num : $UniqueGroup"
# }

Example 2:

$Objects =
    Import-Csv 'C:\temp\DateOverlap.CSV' |
    Select-Object Bill_Number,
        @{ Name = 'Start_Date'; Expression = { [DateTime]$_.Start_Date } },
        @{ Name = 'End_Date';   Expression = { [DateTime]$_.End_Date } }

# The above select statement converts the Start_Date & End_Date properties to [DateTime] objects.
# While you had asked to pack everything into the nested loops, that would have resulted in
# unnecessary recasting of object types to ensure proper comparison.  Often this is a matter of
# preference, but in this case I think it's better.  I did have it working well without the
# above select, but the code is more readable / concise with it.  So even if you treat the
# Select-Object command as a black box the rest of the code should be easier to understand.
#
# Of course, and if you couldn't tell from my samples, Select-Object is incredibly useful.  I
# recommend taking the time to learn it thoroughly.  The MS documentation can be found here:
# https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-object?view=powershell-5.1

:Outer ForEach( $ReferenceObject in $Objects )
{
    # In other revisions I had assigned the date values to some shorter variable names.
    # I took that out.  Again, since you're learning, I wanted all the dot referencing
    # to be on full display.
    [Collections.ArrayList]$TempArrList = @() # Reset this on each iteration of the outer loop.

    :Inner ForEach( $ComparisonObject in $Objects )
    {
        If( $ComparisonObject.Bill_Number -eq $ReferenceObject.Bill_Number )
        {   # Skip the current reference object in the $Objects collection! This prevents the duplication of
            # the current Bill's number within its group, helping to ensure unique-ification.
            #
            # By now you should have seen across all revisions, including AdminOfThings' demo, that there was some
            # need to skip the current item when searching for overlaps.  And, that there are a number of ways to
            # accomplish that.  In this case I simply go back to the top of the loop when the current record
            # is encountered, effectively skipping it.
            Continue Inner
        }

        # The below logic needs some explaining.  So far as I could tell there were 2 cases to look for:
        # 1) Either or both the start and end dates fell inside the timespan of the comparison
        #    object.  This case is handled by the first 2 conditions.
        # 2) The reference object's timespan covers the entire timespan of the comparison object.
        #    Meaning the start date is before and the end date is after, so the entire
        #    comparison timespan is within the bounds of the reference timespan.  I elected to use
        #    the 3rd condition below to detect that case because once the start date is earlier I
        #    only have to care if the end date is greater than the other start date.  It's a little
        #    more inclusive and partially covered by the previous conditions, but whatever, you gotta
        #    pick something...
        #
        # Note: This was a deceptively difficult thing to comprehend, I missed that last condition
        #       in my first example (later corrected) and I think @AdminOfThings also overlooked it.
        If(
            ( $ReferenceObject.Start_Date -ge $ComparisonObject.Start_Date -and $ReferenceObject.Start_Date -le $ComparisonObject.End_Date   ) -or
            ( $ReferenceObject.End_Date   -ge $ComparisonObject.Start_Date -and $ReferenceObject.End_Date   -le $ComparisonObject.End_Date   ) -or
            ( $ReferenceObject.Start_Date -le $ComparisonObject.Start_Date -and $ReferenceObject.End_Date   -ge $ComparisonObject.Start_Date )
        )
        {
            [Void]$TempArrList.Add( $ComparisonObject.Bill_Number )
        }
    }

    # Now add the properties!
    $ReferenceObject | Add-Member -Name Overlaps -MemberType NoteProperty -Value $TempArrList
    If( $ReferenceObject.Overlaps )
    {
        # Build the group from a copy.  Overlaps holds a reference to $TempArrList, so adding the
        # bill's own number to $TempArrList directly would also add it to the Overlaps property.
        $GroupList = $TempArrList.Clone()
        [Void]$GroupList.Add( $ReferenceObject.Bill_Number )
        $ReferenceObject | Add-Member -Name Group -MemberType NoteProperty -Value ( $GroupList | Sort-Object )
        $ReferenceObject | Add-Member -Name Group_ID -MemberType NoteProperty -Value ( $ReferenceObject.Group -join ', ' )
        # Below a script property also works, but I think the above is easier to follow:
        # $ReferenceObject | Add-Member -Name Group_ID -MemberType ScriptProperty -Value { $this.Group -join ', ' }
    }
    Else
    {
        $ReferenceObject | Add-Member -Name Group -MemberType NoteProperty -Value $null
        $ReferenceObject | Add-Member -Name Group_ID -MemberType NoteProperty -Value $null
    }
}

# This output is just for the sake of showing the new objects:
$Objects | Format-Table -AutoSize -Wrap

# Now create an array of unique Group_ID strings; this is possible because of the sorts and joins done earlier.
#
# It's important to point out I chose to sort because I saw the clever solution that AdminOfThings
# used.  There's a need to display only groups that have unique memberships, not necessarily unique
# ordering of the members.  He identified these by doing some additional loops and using the
# Compare-Object cmdlet.  Again, I must say that was very clever, and Compare-Object is another tool very much
# worth getting to know.  However, the code didn't seem like it cared which of the various orderings it
# ultimately output.  Therefore I could conclude the order wasn't really important, and it's fine if the
# groups are sorted.  With the objects sorted it's much easier to derive the truly unique lists with the
# simple Select-Object command below.
$UniqueGroups = $Objects.Group_ID | Select-Object -Unique

# Finally loop through the UniqueGroups
$Num = 1
ForEach($UniqueGroup in $UniqueGroups)
{
    "Group $Num : $UniqueGroup"
    ++$Num # Increment $Num, using the convenient unary operator, so the next group is echoed properly.
}
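
As a side note on the three-condition test used in both examples: assuming every record's Start_Date is no later than its End_Date, the same check can be written as a single two-part comparison. The sketch below is not taken from either answer; Test-Overlap is a hypothetical helper name, and the inputs are rows from the sample CSV.

```powershell
# Hypothetical helper, not part of the original answers.
function Test-Overlap {
    param(
        [datetime]$StartA, [datetime]$EndA,
        [datetime]$StartB, [datetime]$EndB
    )
    # Assumes StartA <= EndA and StartB <= EndB.
    # Two ranges overlap exactly when each one starts no later than the other one ends.
    ($StartA -le $EndB) -and ($StartB -le $EndA)
}

# 177835 vs 178990: the ranges share 6:00-7:00 AM
$partial  = Test-Overlap '4/14/20 3:00 AM' '4/14/20 7:00 AM' '4/14/20 6:00 AM' '4/14/20 10:00 AM'
# 178662 vs 178990: 178662 ends before 178990 starts
$disjoint = Test-Overlap '4/14/20 4:30 AM' '4/14/20 5:30 AM' '4/14/20 6:00 AM' '4/14/20 10:00 AM'
# 177835 vs 178662: the first range fully contains the second
$contains = Test-Overlap '4/14/20 3:00 AM' '4/14/20 7:00 AM' '4/14/20 4:30 AM' '4/14/20 5:30 AM'

$partial     # True
$disjoint    # False
$contains    # True
```

The containment case is the one that the third condition in the examples was added to catch; the two-part form covers it without a separate clause.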

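Additional discussion:

Hopefully these examples are helpful. I wanted to mention a few more points:

  1. Using ArrayLists ([System.Collections.ArrayList]) instead of native arrays. The typical reason for doing this is the ability to add and remove elements quickly. If you search the internet you will find hundreds of articles explaining why it is faster. It is so common that you will often find experienced PowerShell users implementing it instinctively. The main reasons are speed and the flexibility to add and remove elements easily.
  2. You will notice I relied heavily on the ability to attach new properties to objects. There are several ways to do this: Select-Object, creating your own objects, and, in Example 2 above, Add-Member. The main reason I used Add-Member is that I could not get the ArrayList type to stay intact when using Select-Object.
  3. Regarding loops. This is specific to your request for nested loops. My first answer still had loops, except some were implied by the pipeline and others were tucked away in a helper function. The latter is really a preference too; for readability it sometimes helps to keep some code out of view of the main code body. That said, all the same concepts were there from the beginning. You should get comfortable with the implied loops that come with pipelining.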
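
Point 2 can be checked with a small experiment. This is a minimal sketch using a made-up record rather than the CSV data; it shows only the Add-Member side, where the stored value keeps its ArrayList type.

```powershell
# Hypothetical sample record, standing in for one row of the CSV.
$record = [PSCustomObject]@{ Bill_Number = '177835' }

$overlaps = [System.Collections.ArrayList]@('178682', '179504')

# Add-Member stores the value as given, so the property is still an ArrayList.
$record | Add-Member -MemberType NoteProperty -Name Overlaps -Value $overlaps

$typeName = $record.Overlaps.GetType().Name
$typeName                                # ArrayList

[void]$record.Overlaps.Add('178990')     # the list can still grow through the property
$record.Overlaps.Count                   # 3
```

Note that the property and $overlaps refer to the same list, so a change made through one is visible through the other.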

I think that's all I have to say. I really hope this was helpful; it was certainly fun for me. If you have questions or feedback, let me know. Thanks.
