U-SQL: reading the most recently modified file from a folder



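How can we read the most recently modified file from two different folders in U-SQL? Note: each folder will contain many files, but we only want the latest one (a single file).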

First folder: E:\mysystem\dailyfiles\Daily
Second folder: E:\mysystem\weeklyfiles\weekly

DECLARE @file1 string = "dailyfiles/daily/LATESTMODIFIEDfilename.csv";
DECLARE @file2 string = "weeklyfiles/weekly/LATESTMODIFIEDfilename.csv";

DECLARE @out string = "/output/result.csv";

@data =
    EXTRACT col1 string,
            Col2 string,
            Col3 string,
            col4 string
    FROM @file1, @file2
    USING Extractors.Csv();

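So I take it you want a single file, the most recently modified one, out of two different folders that each contain many files (I assume the files all share the same format). You can use the file property functions together with virtual columns in a dynamic path: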

@allData =
    EXTRACT col1 string,
            col2 string,
            col3 string,
            DateModified = FILE.MODIFIED(), // file property: last-modified time of the source file
            folder1 string, // virtual column
            folder2 string  // virtual column
    FROM "mysystem/{folder1}/{folder2}/{*}.csv"
    USING Extractors.Csv();

OUTPUT
(
    SELECT a.col1,
           a.col2,
           a.col3
    FROM @allData AS a
         SEMIJOIN
         (
             // Newest modification time among the files in the two folders of interest
             SELECT MAX(DateModified) AS MaxFileDate
             FROM @allData
             WHERE (folder1 == "dailyfiles" AND folder2 == "daily") OR (folder1 == "weeklyfiles" AND folder2 == "weekly")
             GROUP BY DateModified
             ORDER BY DateModified DESC
             FETCH 1 ROWS
         ) AS b
         ON a.DateModified == b.MaxFileDate
    WHERE (a.folder1 == "dailyfiles" AND a.folder2 == "daily") OR (a.folder1 == "weeklyfiles" AND a.folder2 == "weekly")
)
TO "/output/result.csv" // output path as declared in the question (@out)
USING Outputters.Csv();
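
The query above keeps only the one newest file across both folders. If what you actually need is the newest file from each folder, a variation along the following lines might work. This is only a sketch: it assumes the same @allData rowset as above, and the rowset names @latestPerFolder and @result as well as the output path are made up for illustration.

```usql
// Sketch (assumption): newest modification time per folder pair.
@latestPerFolder =
    SELECT folder1,
           folder2,
           MAX(DateModified) AS MaxFileDate
    FROM @allData
    WHERE (folder1 == "dailyfiles" AND folder2 == "daily")
          OR (folder1 == "weeklyfiles" AND folder2 == "weekly")
    GROUP BY folder1, folder2;

// Keep only the rows that come from the newest file of each folder.
@result =
    SELECT a.col1,
           a.col2,
           a.col3,
           a.folder1
    FROM @allData AS a
         INNER JOIN @latestPerFolder AS b
         ON a.folder1 == b.folder1
            AND a.folder2 == b.folder2
            AND a.DateModified == b.MaxFileDate;

// Hypothetical output path, chosen for this example only.
OUTPUT @result
TO "/output/latest_per_folder.csv"
USING Outputters.Csv();
```

The folder1 column is kept in the result so you can tell which rows came from the daily file and which from the weekly one. Note that if two files in the same folder share the exact same modification time, rows from both will be returned.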
