PowerShell: Sorting / removing duplicates in a CSV file



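First off, I'm new to PowerShell, and I want to thank everyone on this site for helping me in more ways than one! Thanks to this site, I've gotten a lot done in a short time!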

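Here's the problem; I'll do my best to explain it. I have a CSV file that is used to create student accounts. Our student management system generates a record every time a student enrolls in, changes, or withdraws from a program. If a student has "tried out" several different programs, they will have multiple records in the CSV file. So my goal is to sort the CSV file by UserID (the UserID never changes) and CurrentStatusDate (the date the record was created), using this command: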

Import-CSV "C:\students.csv" | sort UserID,CurrentStatusDate
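Import-Csv returns every column as a string, so sorting on CurrentStatusDate directly compares text, not dates. A minimal sketch, assuming the M/d/yyyy format seen in the sample records: sort on a calculated property that casts the column to [datetime]. The $csv here-string is a hypothetical stand-in for C:\students.csv with only a few of its columns; with the real file, start the pipeline with Import-Csv instead of ConvertFrom-Csv.

```powershell
# Hypothetical sample rows (a subset of the question's columns), standing in for C:\students.csv
$csv = @'
"UserID","AccountStatus","Course","CurrentStatusDate"
"acampbell6499","Delete","Retail Business Management","11/1/2011"
"abrown10165","Add","Simulated Electronic Navigation Level 1, Part B","11/10/2011"
"acampbell6499","Add","Complete the Accounting Cycle - Part II","9/26/2011"
"abrown10165","Add","Advanced Firefighting STCW VI/3","10/24/2011"
'@

# Sort by UserID first, then by the date parsed from the string column
$sorted = $csv | ConvertFrom-Csv |
    Sort-Object UserID, @{ Expression = { [datetime]$_.CurrentStatusDate } }

$sorted | Format-Table UserID, CurrentStatusDate
```

Sorting on the plain string instead would put acampbell6499's "11/1/2011" record ahead of its "9/26/2011" record, because the text "1..." sorts before "9...".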

Sample CSV records:

"UserID","AccountStatus","PersonID","PIN","FirstName","LastName","IDEXPIRY","Term","Role","Course","SectionName","locationDescription","Location","CurrentStatusDate"
"aboggs","Add","xxxxxxx","xxxxxxx","Ashley","Baggs","5/11/2013","xxxxxx","Student","Accounting Technology","xxxxxx","xxxxxx","xxxxxx","9/12/2011"
"aboutilier","Add","xxxxxxx","xxxxxxx","Amelia","Boutilier","5/3/2012","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/15/2011"
"abowtle","Delete","xxxxxxx","xxxxxxx","Aleisha","Bowtle","7/31/2013","xxxxxx","Student","Business Administration","xxxxxx","xxxxxx","xxxxxx","2/1/2011"
"abowtle","Add","xxxxxxx","xxxxxxx","Aleisha","Bowtle","7/31/2012","xxxxxx","Student","General Studies","xxxxxx","xxxxxx","xxxxxx","9/9/2011"
"abradley","Delete","xxxxxxx","xxxxxxx","Anna","Bradley","10/25/2011","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/17/2011"
"abridges","Delete","xxxxxxx","xxxxxxx","Ashley","Bridges","10/5/2011","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/15/2011"
"abrown10165","Add","xxxxxxx","xxxxxxx","Adam","Brown","10/28/2011","xxxxxx","Student","Advanced Firefighting STCW VI/3","xxxxxx","xxxxxx","xxxxxx","10/24/2011"
"abrown10165","Add","xxxxxxx","xxxxxxx","Adam","Brown","12/16/2011","xxxxxx","Student","Simulated Electronic Navigation Level 1, Part B","xxxxxx","xxxxxx","xxxxxx","11/10/2011"
"abrown8081","Add","xxxxxxx","xxxxxxx","Alex","Brown","5/25/2013","xxxxxx","Student","Culinary Arts","xxxxxx","xxxxxx","xxxxxx","9/6/2011"
"abrown8950","Delete","xxxxxxx","xxxxxxx","Ashley","Brown","9/13/2012","xxxxxx","Student","Medical Support Services","xxxxxx","xxxxxx","xxxxxx","9/14/2011"
"acameron2637","Delete","xxxxxxx","xxxxxxx","Anne","Cameron","10/14/2011","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","10/14/2011"
"acameron4368","Add","xxxxxxx","xxxxxxx","Amanda","Cameron","4/20/2013","xxxxxx","Student","Applied Degree in Culinary Operations","xxxxxx","xxxxxx","xxxxxx","10/12/2011"
"acampbell10266","Add","xxxxxxx","xxxxxxx","Amanda","Campbell","5/4/2012","xxxxxx","Student","Adult Education","xxxxxx","xxxxxx","xxxxxx","11/7/2011"
"acampbell6499","Delete","xxxxxxx","xxxxxxx","Aaron","Campbell","10/31/2012","xxxxxx","Student","Retail Business Management","xxxxxx","xxxxxx","xxxxxx","11/1/2011"
"acampbell6499","Add","xxxxxxx","xxxxxxx","Aaron","Campbell","12/13/2011","xxxxxx","Student","Complete the Accounting Cycle - Part II","xxxxxx","xxxxxx","xxxxxx","9/26/2011"

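This should group all the records that share a UserID and then sort them by creation date. Then I want to remove the duplicates and keep the most recently created record. I'm familiar with -Unique, but it doesn't work with the command above, because it only removes records where both the UserID and the CurrentStatusDate are duplicated.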

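I've been "Googling" for two days and banging my head against the wall... I'm starting to think there is no "easy" answer, but my programming skills are weak... I'm just looking for a "nudge" in the right direction.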

Thanks!

Chris

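As Andy said, this is a bit difficult because we don't have a sample of the CSV format. However, I think something like the following is what you want: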

Import-CSV "C:\students.csv" | Group-Object userid | foreach-object { $_.group | sort-object currentstatusdate | select -last 1}

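As you described, we group by ID, sort each group by CurrentStatusDate, and then select the most recent record. I'm not sure how CurrentStatusDate is formatted, so I don't know whether sorting on it directly is good enough.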
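If CurrentStatusDate is in the M/d/yyyy form shown in the question, sorting on it directly is not good enough, because the values are strings. A sketch of the same Group-Object pipeline with the sort key cast to [datetime], run against a hypothetical inline sample (a few columns from the question) instead of the real file:

```powershell
# Hypothetical sample rows (a subset of the question's columns)
$csv = @'
"UserID","AccountStatus","Course","CurrentStatusDate"
"acampbell6499","Delete","Retail Business Management","11/1/2011"
"abrown10165","Add","Simulated Electronic Navigation Level 1, Part B","11/10/2011"
"acampbell6499","Add","Complete the Accounting Cycle - Part II","9/26/2011"
"abrown10165","Add","Advanced Firefighting STCW VI/3","10/24/2011"
'@

$latest = $csv | ConvertFrom-Csv |
    Group-Object UserID |
    ForEach-Object {
        # Within one user's group, order by the parsed date and keep the newest row
        $_.Group |
            Sort-Object { [datetime]$_.CurrentStatusDate } |
            Select-Object -Last 1
    }

$latest | Format-Table UserID, AccountStatus, CurrentStatusDate
```

With a plain string sort, this sample would keep acampbell6499's older "9/26/2011" record instead of the "11/1/2011" Delete. For the real data, replace `$csv | ConvertFrom-Csv` with Import-Csv on the file and pipe the result to `Export-Csv -NoTypeInformation`.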

How about:

  • Join the fields together on a separator (http://www.johndcook.com/PowerShellCookbook.html#a19)
  • Use -Unique
  • Split them apart again

Untested:

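 $new_csv = @()
 $temp = $null   # reset in case $temp is left over from an earlier run
 Import-CSV "C:\students.csv" | sort UserID |
  foreach {
    # First record: start tracking it
    if ($null -eq $temp) { $temp = $_ }
    if ($_.UserID -ne $temp.UserID) {
       # UserID changed: $temp holds the newest record of the previous user
       $new_csv += $temp
       $temp = $_
    }
    elseif ([datetime]$_.CurrentStatusDate -gt [datetime]$temp.CurrentStatusDate) {
       # Same user, newer record: keep it instead
       $temp = $_
    }
  }
 # The last user never triggers a UserID change, so add it after the loop
 $new_csv += $temp
 $new_csv | Export-Csv "C:\somedir\new_csv.csv" -NoTypeInformation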

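When the first record comes through, $temp is set to that record. As each new record comes through, if it has the same user ID, its timestamp is checked against the record in $temp; if it is newer, it replaces the record in $temp. When the loop sees the UserID change, it appends $temp (which should now hold the latest record for the previous user) to $new_csv, then sets $temp to the current record and starts over for the next UserID. Because there is no UserID change after the last account, you have to add that account after the loop ends, and then export the CSV.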
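The same "keep the newest record per user" idea can also be written without sorting first, by keying a hashtable on UserID. This is a swapped-in technique, not part of the answer above: a sketch assuming M/d/yyyy dates, using a hypothetical inline sample in place of the real file.

```powershell
# Hypothetical sample rows (a subset of the question's columns)
$csv = @'
"UserID","AccountStatus","Course","CurrentStatusDate"
"acampbell6499","Delete","Retail Business Management","11/1/2011"
"abrown10165","Add","Simulated Electronic Navigation Level 1, Part B","11/10/2011"
"acampbell6499","Add","Complete the Accounting Cycle - Part II","9/26/2011"
"abrown10165","Add","Advanced Firefighting STCW VI/3","10/24/2011"
'@

# Hashtable keyed by UserID; each value is the newest row seen so far for that user
$latestByUser = @{}
foreach ($row in ($csv | ConvertFrom-Csv)) {
    $current = $latestByUser[$row.UserID]
    if ($null -eq $current -or
        [datetime]$row.CurrentStatusDate -gt [datetime]$current.CurrentStatusDate) {
        $latestByUser[$row.UserID] = $row
    }
}

# Hashtable order is arbitrary, so sort the survivors by UserID before output
$result = $latestByUser.Values | Sort-Object UserID
$result | Format-Table UserID, AccountStatus, CurrentStatusDate
```

Because each row is compared only with the one stored for its UserID, there is no need to catch the last user after the loop.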

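I'm not sure of the exact timestamp format, but I'm assuming it will parse correctly as [datetime]. Since the values come from a .csv they will be strings, and it seems unlikely that a string sort would put them in date order, so I didn't bother sorting on them.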
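A quick check of why a string sort is unreliable here, using two dates in the M/d/yyyy form from the sample records. PowerShell casts strings to [datetime] with the invariant culture, so month/day/year is assumed regardless of the machine's regional settings.

```powershell
# Compared as strings, '1' sorts before '9', so 1 November 2011 looks "earlier" than 26 September 2011
$asString = '11/1/2011' -lt '9/26/2011'

# Cast to [datetime] and the comparison follows the calendar
$asDateTime = [datetime]'11/1/2011' -lt [datetime]'9/26/2011'

"string comparison:   $asString"    # string comparison:   True
"datetime comparison: $asDateTime"  # datetime comparison: False
```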
