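I have two CSV files, each with n columns. I need to merge them into a single CSV file by joining on a column that is unique and present in both input files.
I have searched blogs and sites thoroughly, and they all point to using a custom .NET activity, so I went through this site.
But I still cannot figure out the C# part. Can anyone share code showing how to merge these two CSV files with a custom .NET activity in Azure Data Factory?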
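Here is an example of how to join two tab-delimited files on the ZIP_Code column using U-SQL. The example assumes both files are stored in Azure Data Lake Storage (ADLS). The script can easily be incorporated into a Data Factory pipeline: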
// Get raw input from file A
@inputA =
    EXTRACT
        Date_received string,
        Product string,
        Sub_product string,
        Issue string,
        Sub_issue string,
        Consumer_complaint_narrative string,
        Company_public_response string,
        Company string,
        State string,
        ZIP_Code string,
        Tags string,
        Consumer_consent_provided string,
        Submitted_via string,
        Date_sent_to_company string,
        Company_response_to_consumer string,
        Timely_response string,
        Consumer_disputed string,
        Complaint_ID string
    FROM "/input/input48A.txt"
    USING Extractors.Tsv();
// Get raw input from file B
@inputB =
    EXTRACT
        Provider_ID string,
        Hospital_Name string,
        Address string,
        City string,
        State string,
        ZIP_Code string,
        County_Name string,
        Phone_Number string,
        Hospital_Type string,
        Hospital_Ownership string,
        Emergency_Services string,
        Meets_criteria_for_meaningful_use_of_EHRs string,
        Hospital_overall_rating string,
        Hospital_overall_rating_footnote string,
        Mortality_national_comparison string,
        Mortality_national_comparison_footnote string,
        Safety_of_care_national_comparison string,
        Safety_of_care_national_comparison_footnote string,
        Readmission_national_comparison string,
        Readmission_national_comparison_footnote string,
        Patient_experience_national_comparison string,
        Patient_experience_national_comparison_footnote string,
        Effectiveness_of_care_national_comparison string,
        Effectiveness_of_care_national_comparison_footnote string,
        Timeliness_of_care_national_comparison string,
        Timeliness_of_care_national_comparison_footnote string,
        Efficient_use_of_medical_imaging_national_comparison string,
        Efficient_use_of_medical_imaging_national_comparison_footnote string,
        Location string
    FROM "/input/input48B.txt"
    USING Extractors.Tsv();
// Join the two files on the ZIP_Code column, keeping only one zip code
@output =
    SELECT b.Provider_ID,
           b.Hospital_Name,
           b.Address,
           b.City,
           b.State,
           b.ZIP_Code,
           a.Complaint_ID
    FROM @inputA AS a
         INNER JOIN
             @inputB AS b
         ON a.ZIP_Code == b.ZIP_Code
    WHERE a.ZIP_Code == "36033";
// Output the joined rows as a tab-delimited file
OUTPUT @output
TO "/output/output.txt"
USING Outputters.Tsv(quoting : false);
This could also be converted into a U-SQL stored procedure with parameters for the file names and the zip code.
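As a rough sketch of what such a procedure might look like: the procedure name dbo.JoinOnZipCode and the parameter names @fileA, @fileB, @zipCode and @outputFile are made up for illustration, and the column lists are simply copied from the script above. It has not been run against a real Data Lake Analytics account, so treat it as a starting point rather than a finished implementation.

```usql
// Hypothetical procedure name and parameter names; adjust to your own catalog.
CREATE PROCEDURE IF NOT EXISTS dbo.JoinOnZipCode(
    @fileA string,
    @fileB string,
    @zipCode string,
    @outputFile string)
AS
BEGIN
    // File A: same schema as in the script above
    @inputA =
        EXTRACT Date_received string, Product string, Sub_product string,
                Issue string, Sub_issue string, Consumer_complaint_narrative string,
                Company_public_response string, Company string, State string,
                ZIP_Code string, Tags string, Consumer_consent_provided string,
                Submitted_via string, Date_sent_to_company string,
                Company_response_to_consumer string, Timely_response string,
                Consumer_disputed string, Complaint_ID string
        FROM @fileA
        USING Extractors.Tsv();

    // File B: same schema as in the script above
    @inputB =
        EXTRACT Provider_ID string, Hospital_Name string, Address string,
                City string, State string, ZIP_Code string, County_Name string,
                Phone_Number string, Hospital_Type string, Hospital_Ownership string,
                Emergency_Services string,
                Meets_criteria_for_meaningful_use_of_EHRs string,
                Hospital_overall_rating string,
                Hospital_overall_rating_footnote string,
                Mortality_national_comparison string,
                Mortality_national_comparison_footnote string,
                Safety_of_care_national_comparison string,
                Safety_of_care_national_comparison_footnote string,
                Readmission_national_comparison string,
                Readmission_national_comparison_footnote string,
                Patient_experience_national_comparison string,
                Patient_experience_national_comparison_footnote string,
                Effectiveness_of_care_national_comparison string,
                Effectiveness_of_care_national_comparison_footnote string,
                Timeliness_of_care_national_comparison string,
                Timeliness_of_care_national_comparison_footnote string,
                Efficient_use_of_medical_imaging_national_comparison string,
                Efficient_use_of_medical_imaging_national_comparison_footnote string,
                Location string
        FROM @fileB
        USING Extractors.Tsv();

    // Inner join on ZIP_Code, filtered by the zip code passed in
    @output =
        SELECT b.Provider_ID, b.Hospital_Name, b.Address, b.City,
               b.State, b.ZIP_Code, a.Complaint_ID
        FROM @inputA AS a
             INNER JOIN @inputB AS b
             ON a.ZIP_Code == b.ZIP_Code
        WHERE a.ZIP_Code == @zipCode;

    OUTPUT @output
    TO @outputFile
    USING Outputters.Tsv(quoting : false);
END;

// Intended call, from a separate script, using the paths from the example above:
// dbo.JoinOnZipCode("/input/input48A.txt", "/input/input48B.txt", "36033", "/output/output.txt");
```

For comma-separated input like the files in the question, Extractors.Csv() and Outputters.Csv() would take the place of the Tsv variants.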
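Of course, there are several ways to achieve this, each with its own pros and cons. For example, a .NET custom activity might feel more comfortable to someone with a .NET background, but you need some compute to run it on. Importing the files into an Azure SQL Database would be a good option for someone with a SQL/database background and an Azure SQL Database in their subscription.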