How to concatenate 3 CSV columns into one



I'm trying to create a unique ID for the data in a CSV file. I have 3 columns in the CSV and want to concatenate those 3 columns into one, with the result in a fourth column. How can I do this? Example:

Col1  Col2  Col3  Col4
ab    cd    ef    ab_cd_ef

This should also be done for all the other rows; I have around 90k rows.
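For reference, the transformation being asked for can be sketched outside Logstash as well; a minimal Python sketch (file handles shown as in-memory strings, so it is self-contained):

```python
import csv
import io

def add_id_column(src, dst, sep="_"):
    """Read CSV rows from src, append Col4 = Col1_Col2_Col3, write to dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        # Join the first three columns with the separator as the new column.
        writer.writerow(row + [sep.join(row[:3])])

src = io.StringIO("ab,cd,ef\n")
dst = io.StringIO()
add_id_column(src, dst)
# dst now holds the row "ab,cd,ef,ab_cd_ef"
```

This streams row by row, so 90k rows is not a problem.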

You can use the Logstash CSV filter to import the data into Elasticsearch. The following configuration parses the CSV file and sends it to Elasticsearch with a specific ingest pipeline.

input {
  beats {
    port => 5044
    type => "myindex"
  }
}
filter {
  csv {
    separator => ","
    columns => ["Ref","ID","Case_Number","Date","Block","IUCR","Primary_Type","Description","Location_Description","Arrest","Domestic","Beat","District","Ward","Community_Area","FBI_Code","X_Coordinate","Y_Coordinate","Year","Updated_On","Latitude","Longitude","Location"]
    remove_field => ["Location"]
  }
}
output {
  stdout {}
  elasticsearch
  {
    hosts => ["elasticsearch:9200"]
    index => "myindex"
    template_name => "myindex"
    template => "/etc/logstash/conf.d/template.json"
    template_overwrite => true
    document_type => "mytype"
    pipeline => "my_pipeline"
  }
}

In this example, Logstash parses the CSV file, and then we create an ingest pipeline to combine fields. For example, the following pipeline concatenates the coordinate fields into a Location field.

PUT _ingest/pipeline/my_pipeline
{
  "description" : "Crimes in Chicago Pipeline",
  "processors" : [
    {
      "script": {
        "lang": "painless",
        "inline": "ctx.Location = ctx.X_Coordinate + '_' + ctx.Y_Coordinate"
      }
    }
  ]
}
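The Painless script above just reads the two coordinate fields from the document and writes the joined value back. A Python sketch of the same processor logic (assuming X_Coordinate and Y_Coordinate are top-level string fields on the document, as in the CSV filter's column list):

```python
def apply_concat_processor(doc, sep="_"):
    """Mimic the ingest processor: build Location from the coordinate fields.

    Assumes X_Coordinate and Y_Coordinate exist as top-level fields.
    """
    doc = dict(doc)  # leave the input document untouched
    doc["Location"] = str(doc["X_Coordinate"]) + sep + str(doc["Y_Coordinate"])
    return doc

doc = apply_concat_processor({"X_Coordinate": "1176", "Y_Coordinate": "1842"})
# doc["Location"] is "1176_1842"
```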

For a complete example, you can check this repo: https://github.com/hkulekci/es5-devnot

On the other hand, Logstash can also handle the concatenation itself. For this you can use Logstash's mutate filter. You would update your Logstash filter configuration:

filter {
  csv {
    separator => ","
    columns => ["Ref","ID","Case_Number","Date","Block","IUCR","Primary_Type","Description","Location_Description","Arrest","Domestic","Beat","District","Ward","Community_Area","FBI_Code","X_Coordinate","Y_Coordinate","Year","Updated_On","Latitude","Longitude","Location"]
    remove_field => ["Location"]
  }
  mutate {
    add_field => { "your_new_field" => "%{col_1}_%{col_2}_%{col_3}" }
  }
}
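The %{field} placeholders in add_field are Logstash's sprintf-style references to fields of the current event. A small Python sketch of that substitution (the col_1/col_2/col_3 names are the hypothetical placeholders from the config above, not real columns of the crimes dataset):

```python
import re

def sprintf(template, event):
    """Mimic Logstash's %{field} substitution in add_field values."""
    # Replace each %{name} with the event's value for that field (or "").
    return re.sub(r"%\{(\w+)\}", lambda m: str(event.get(m.group(1), "")), template)

event = {"col_1": "ab", "col_2": "cd", "col_3": "ef"}
new_field = sprintf("%{col_1}_%{col_2}_%{col_3}", event)
# new_field is "ab_cd_ef", matching the Col4 value asked for in the question
```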
