I am trying to create a unique ID for the data in a CSV file. The CSV has 3 columns, and I want to concatenate those 3 columns and write the result into a fourth column. How can I do this? Example:

Col1  Col2  Col3  Col4
ab    cd    ef    ab_cd_ef

This should also be done for every other row; I have around 90k rows.
You can use Logstash with its csv filter to import the data into Elasticsearch. The following configuration parses the CSV lines (received through the beats input) and sends them to Elasticsearch via a specific ingest pipeline.
input {
  beats {
    # Receive the CSV lines shipped by a Beats agent (e.g. Filebeat)
    port => 5044
    type => "myindex"
  }
}

filter {
  csv {
    # Split each line into named fields
    separator => ","
    columns => ["Ref","ID","Case_Number","Date","Block","IUCR","Primary_Type","Description","Location_Description","Arrest","Domestic","Beat","District","Ward","Community_Area","FBI_Code","X_Coordinate","Y_Coordinate","Year","Updated_On","Latitude","Longitude","Location"]
    # Drop the original Location column; the ingest pipeline rebuilds it
    remove_field => ["Location"]
  }
}

output {
  # Print each event to the console for debugging
  stdout { }
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "myindex"
    template_name => "myindex"
    template => "/etc/logstash/conf.d/template.json"
    template_overwrite => true
    document_type => "mytype"
    # Route every document through the ingest pipeline defined below
    pipeline => "my_pipeline"
  }
}
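Assuming this configuration is saved as /etc/logstash/conf.d/csv-import.conf (that file name is only an assumption for illustration), you would start the import with:

bin/logstash -f /etc/logstash/conf.d/csv-import.conf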
In this example, Logstash parses the CSV file, and an ingest pipeline then transforms the fields on the Elasticsearch side. For example, the following pipeline concatenates the coordinate fields into the Location field.
PUT _ingest/pipeline/my_pipeline
{
  "description" : "Crimes in Chicago Pipeline",
  "processors" : [
    {
      "script": {
        "lang": "painless",
        "inline": "ctx.Location = ctx.X_Coordinate + ',' + ctx.Y_Coordinate"
      }
    }
  ]
}
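Before sending real data through it, you can verify the pipeline with Elasticsearch's ingest simulate API. A minimal sketch, with made-up coordinate values standing in for a real row:

POST _ingest/pipeline/my_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "X_Coordinate": "1150000",
        "Y_Coordinate": "1900000"
      }
    }
  ]
}

The response should show the generated field, here Location: "1150000,1900000".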
For a complete example, you can check this repo: https://github.com/hkulekci/es5-devnot
On the other hand, Logstash can also handle the concatenation for you. For that, you can use Logstash's mutate filter. You should update the Logstash filter configuration:
filter {
  csv {
    separator => ","
    columns => ["Ref","ID","Case_Number","Date","Block","IUCR","Primary_Type","Description","Location_Description","Arrest","Domestic","Beat","District","Ward","Community_Area","FBI_Code","X_Coordinate","Y_Coordinate","Year","Updated_On","Latitude","Longitude","Location"]
    remove_field => ["Location"]
  }
  mutate {
    # Join the source columns with underscores; replace col_1, col_2 and
    # col_3 with the actual column names from the columns list above
    add_field => { "your_new_field" => "%{col_1}_%{col_2}_%{col_3}" }
  }
}
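To illustrate with the sample row from the question (assuming col_1, col_2 and col_3 are mapped to your three columns), the event printed by the stdout output would contain roughly:

{
  "col_1": "ab",
  "col_2": "cd",
  "col_3": "ef",
  "your_new_field": "ab_cd_ef"
}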