Organizing CSV files by identifier and opening/processing the subsets in R



I have the following CSV files:

files = c("C:\\Users\\sh\\/2018/April 17 2018/user_22226.csv", 
"C:\\Users\\sh\\/2018/April 17 2018/user_22227.csv", 
"C:\\Users\\sh\\/2018/April 17 2018/user_22228.csv", 
"C:\\Users\\sh\\/2018/April 17 2018/user_22232.csv", 
"C:\\Users\\sh\\/2018/April 2 2018/user_21785.csv", 
"C:\\Users\\sh\\/2018/April 2 2018/user_21815.csv", 
"C:\\Users\\sh\\/2018/April 2 2018/user_21821.csv", 
"C:\\Users\\sh\\/2018/April 2 2018/user_21822.csv", 
"C:\\Users\\sh\\/2018/April 2 2018/user_22226.csv", 
"C:\\Users\\sh\\/2018/April 2 2018/user_22227.csv", 
"C:\\Users\\sh\\/2018/April 2 2018/user_22228.csv", 
"C:\\Users\\sh\\/2018/April 2 2018/user_22230.csv", 
"C:\\Users\\sh\\/2018/April 2 2018/user_22232.csv", 
"C:\\Users\\sh\\/2018/April 23 2018/user_22921.csv", 
"C:\\Users\\sh\\/2018/April 9 2018/user_22226.csv", 
"C:\\Users\\sh\\/2018/April 9 2018/user_22227.csv", 
"C:\\Users\\sh\\/2018/April 9 2018/user_22228.csv", 
"C:\\Users\\sh\\/2018/April 9 2018/user_22230.csv", 
"C:\\Users\\sh\\/2018/April 9 2018/user_22232.csv", 
"C:\\Users\\sh\\/2018/August 13 2018/user_29607.csv")

I can order them by their trailing identifier like this:

files_sorted = files[order(gsub('.*_(\\d{5})[.].*', '\\1', files))]

which gives me:

 [1] "C:\\Users\\sh\\/2018/April 2 2018/user_21785.csv"
 [2] "C:\\Users\\sh\\/2018/April 2 2018/user_21815.csv"
 [3] "C:\\Users\\sh\\/2018/April 2 2018/user_21821.csv"
 [4] "C:\\Users\\sh\\/2018/April 2 2018/user_21822.csv"
 [5] "C:\\Users\\sh\\/2018/April 17 2018/user_22226.csv"
 [6] "C:\\Users\\sh\\/2018/April 2 2018/user_22226.csv"
 [7] "C:\\Users\\sh\\/2018/April 9 2018/user_22226.csv"
 [8] "C:\\Users\\sh\\/2018/April 17 2018/user_22227.csv"
 [9] "C:\\Users\\sh\\/2018/April 2 2018/user_22227.csv"
[10] "C:\\Users\\sh\\/2018/April 9 2018/user_22227.csv"
[11] "C:\\Users\\sh\\/2018/April 17 2018/user_22228.csv"
[12] "C:\\Users\\sh\\/2018/April 2 2018/user_22228.csv"
[13] "C:\\Users\\sh\\/2018/April 9 2018/user_22228.csv"
[14] "C:\\Users\\sh\\/2018/April 2 2018/user_22230.csv"
[15] "C:\\Users\\sh\\/2018/April 9 2018/user_22230.csv"
[16] "C:\\Users\\sh\\/2018/April 17 2018/user_22232.csv"
[17] "C:\\Users\\sh\\/2018/April 2 2018/user_22232.csv"
[18] "C:\\Users\\sh\\/2018/April 9 2018/user_22232.csv"
[19] "C:\\Users\\sh\\/2018/April 23 2018/user_22921.csv"
[20] "C:\\Users\\sh\\/2018/August 13 2018/user_29607.csv"
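As a side note on the sort itself: the key here is the ID as a string, and the pattern assumes exactly five digits. That is fine for this file list, but if IDs could ever have different lengths, a numeric key is safer. A minimal sketch, assuming every file name ends in `_<digits>.csv`; `extract_id` and `sort_by_id` are hypothetical helper names, and `example` is made-up data:

```r
# Hypothetical helper: pull the digits between the last "_" and ".csv"
# out of each path and convert them to integers.
extract_id <- function(paths) {
  as.integer(sub(".*_(\\d+)\\.csv$", "\\1", paths))
}

# Hypothetical helper: order paths numerically by ID.
# order() is stable, so files with the same ID keep their original order.
sort_by_id <- function(paths) {
  paths[order(extract_id(paths))]
}

# Made-up paths with IDs of different lengths.
example <- c("x/user_100000.csv", "x/user_22226.csv", "y/user_22226.csv")
sort_by_id(example)
```

With a string key, "100000" would sort before "22226"; the integer key puts it last.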

My goal now is to process the CSV files that share the same ID, for example those ending in "22226.csv".

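Ideally I would end up with a separate data frame/table, or a list, of the CSV files that share an ID. I would then run each of those data frames/lists through a function I wrote that preprocesses the data.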

I tried using group_by() and unique(), but they returned NA.

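We can use split to create a list of files. The grouping key is the substring obtained by removing every character up to and including the last _, so each key looks like "22232.csv".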

lst1 <- split(files, sub('.*_', '', files))

Extract an element with [[:

lst1[["22232.csv"]]
#[1] "C:\\Users\\sh\\/2018/April 17 2018/user_22232.csv"
#[2] "C:\\Users\\sh\\/2018/April 2 2018/user_22232.csv"
#[3] "C:\\Users\\sh\\/2018/April 9 2018/user_22232.csv"
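
From there, each group of paths can be read and passed to the preprocessing function. The following is only a sketch under assumptions: `preprocess` is a hypothetical stand-in for the function mentioned in the question (here it just drops incomplete rows), `read_group` and `process_groups` are made-up helper names, and all files for one ID are assumed to have the same columns so that they can be stacked with rbind.

```r
# Hypothetical stand-in for the asker's own preprocessing function;
# this placeholder only drops rows that contain NA.
preprocess <- function(df) {
  df[complete.cases(df), , drop = FALSE]
}

# Read every file in one group and stack the rows into one data frame.
# Assumes all files for the same ID share the same columns.
read_group <- function(paths) {
  frames <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, frames)
}

# Return one preprocessed data frame per ID, keeping the names of the
# input list (e.g. "22232.csv").
process_groups <- function(groups) {
  lapply(groups, function(paths) preprocess(read_group(paths)))
}
```

With the list built by split, this could be called as `results <- process_groups(lst1)`, after which `results[["22232.csv"]]` would hold the combined, preprocessed data for that ID.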
