I am trying to read the latest blob file (CSV) using Azure Data Factory V2. The file name also contains a date (yyyy-mm-dd mm:ss-abcd.csv). I need to read the data from the latest file and load it into Table Storage. Can you help me with how to do this using ADF?
Hello faiz rahman, thank you for your question. The date format you chose has the useful property that lexicographic ordering matches chronological ordering. This means that once you have the list of blobs, you only need to extract the dates and compare them as strings.
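For example, because '2019-08-08' sorts after '2019-06-23' both as a string and as a date, a plain string comparison is enough. This also holds for full blob names, assuming each name starts with the date portion (the two file names here are made up for illustration):

@greater('2019-08-08 10:15-abcd.csv', '2019-06-23 09:42-abcd.csv')

This expression evaluates to true without any date parsing.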
If you have a very large list of blobs, this may be impractical. In that case, every time you write a new blob, record its name somewhere, for example in a "maxblobname.txt" file, and have the pipeline read that file to get the name of the latest blob.
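As a sketch of that alternative, a Lookup activity can read the tracking file and expose its contents to later activities. The dataset name "MaxBlobNameDataset" below is hypothetical; it would be a delimited-text dataset pointing at maxblobname.txt:

{
    "name": "Read max blob name",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource"
        },
        "dataset": {
            "referenceName": "MaxBlobNameDataset",
            "type": "DatasetReference"
        },
        "firstRowOnly": true
    }
}

Later activities would then reference the value through @activity('Read max blob name').output.firstRow, indexing into whatever column name the dataset exposes.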
Here is some sample code that compares the date portions of blob names. To adapt it to your purpose, you will need a Get Metadata activity to fetch the blob names, plus some string functions to extract just the date portion of each name for the comparison; a sketch of those changes follows the pipeline JSON.
{
    "name": "pipeline9",
    "properties": {
        "activities": [
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "init array",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@variables('list')",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "If Condition1",
                            "type": "IfCondition",
                            "typeProperties": {
                                "expression": {
                                    "value": "@greater(item(),variables('max'))",
                                    "type": "Expression"
                                },
                                "ifTrueActivities": [
                                    {
                                        "name": "write new max",
                                        "type": "SetVariable",
                                        "typeProperties": {
                                            "variableName": "max",
                                            "value": {
                                                "value": "@item()",
                                                "type": "Expression"
                                            }
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            },
            {
                "name": "init array",
                "type": "SetVariable",
                "typeProperties": {
                    "variableName": "list",
                    "value": {
                        "value": "@split(pipeline().parameters.input,',')",
                        "type": "Expression"
                    }
                }
            }
        ],
        "parameters": {
            "input": {
                "type": "string",
                "defaultValue": "2019-07-25,2018-06-13,2019-06-24,2019-08-08,2019-06-23"
            }
        },
        "variables": {
            "max": {
                "type": "String",
                "defaultValue": "0001-01-01"
            },
            "list": {
                "type": "Array"
            }
        }
    }
}
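To adapt this, a Get Metadata activity with childItems in its field list returns the blob names in the folder and can replace the "init array" step. The dataset name "BlobFolderDataset" below is hypothetical; it would point at the container or folder holding your CSV files:

{
    "name": "Get blob list",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "BlobFolderDataset",
            "type": "DatasetReference"
        },
        "fieldList": [
            "childItems"
        ]
    }
}

The ForEach items property would become @activity('Get blob list').output.childItems. Each child item is an object with a name property, so the If Condition expression would change to something like @greater(substring(item().name, 0, 16), variables('max')), where 16 is the length of the "yyyy-mm-dd mm:ss" prefix in your naming scheme, and the SetVariable value would use the same substring expression.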