How to parse the contents of a file using sed/awk

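The contents of my input file are in the following format, with each column separated by a single space:

string1<space>string2<space>string3<space>YYYY-mm-dd<space>hh:mm:ss.SSS<space>string4<space>10:1234567890<space>0e:Apple 1.2.3.4<space><space>string5<space>HEX

There are 2 spaces after "0e:Apple 1.2.3.4" because the 14th character of this field/column is blank. The whole "0e:Apple 1.2.3.4<space>" is treated as a single value for that column.

In column 7, 10: gives the number of characters in the string that follows.

In column 8, 0e: is the hexadecimal value of 14. So the hex value gives the number of characters in the string that follows it.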

Like this:

"0e:Apple 1.2.3.4 " --> this is the actual value in the 8th column, without the quotes
(the quotes are only there to show that the 14th character is a space)
It is counted as (␣ marks a space):
0e: A  p  p  l  e  ␣  1  .  2  .  3  .  4  ␣
    |  |  |  |  |  |  |  |  |  |  |  |  |  |
    1  2  3  4  5  6  7  8  9  10 11 12 13 14

Let's take the first line of the input file to be:

string1 string2 string3 yyyy-mm-dd 23:50:45.999 string4 10:1234567890 0e:Apple 1.2.3.4  string5 001e

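where:

string1 is the value in column 1
string2 is the value in column 2
string3 is the value in column 3
yyyy-mm-dd is in column 4
23:50:45.999 is in column 5
string4 is in column 6
10:1234567890 is in column 7 // no trailing space, because it has all 10 digits
0e:Apple 1.2.3.4 is in column 8 // with a trailing space
string5 is in column 9
001e is in column 10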

Expected output:

string1,string2,string3,yyyy-mm-dd,23:50:45.999,string4,1234567890,Apple_1.2.3.4,string5,30

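Requirements:

Remove the counts from columns 7 and 8 (10: and 0e:)
The space between Apple and 1.2.3.4 should be replaced with "_"
The hex value in the last column should be converted to decimal
Replace the spaces between columns with ","
I have a hex value only in column 10. What if there are hex values in several columns? Is there a way to convert only specific columns?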

I tried using this:

$ cat input.txt |sed 's/[a-z0-9].*://g'

which outputs:

string1,string2,string3,yyyy-mm-dd,45.999,string4,1234567890,Apple,1.2.3.4,,string5,001e

This will do what you want for the example input:

awk -F "[ ]" '{sub(/.*:/, "", $7); sub(/.*:/, "", $8); printf "%s,%s,%s,%s,%s,%s,%s,%s_%s,%s,%d\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $11, "0x"$12}' input.txt

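Explanation of the parts:

awk's printf lets you specify the output format, so you can spell out by hand which fields are delimited with , and which with _.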

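-F "[ ]" forces the field separator to be a single space, so awk knows there is an empty field between two consecutive spaces. The default behavior treats a run of spaces as one separator, which, judging by the question, is not what you want.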
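A quick way to see the difference is to count the fields under each separator. This is only a check on the sample line from the question; the `sample` variable name is made up for the illustration.

```shell
sample='string1 string2 string3 yyyy-mm-dd 23:50:45.999 string4 10:1234567890 0e:Apple 1.2.3.4  string5 001e'

# Default splitting: a run of blanks counts as one separator.
printf '%s\n' "$sample" | awk '{ print NF }'                               # prints 11

# -F "[ ]": every single space separates, so the double space
# produces an empty field 10 and pushes string5 and 001e to 11 and 12.
printf '%s\n' "$sample" | awk -F "[ ]" '{ print NF; print "[" $10 "]" }'   # prints 12, then []
```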

The sub function performs a regular-expression substitution, in this case removing the ..: prefix from fields 7 and 8.

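For field 12, we tell printf to output a number (%d) and pass it the string prefixed with 0x, so that it is interpreted as hexadecimal. Whether this works depends on the awk implementation: GNU awk does not parse hex strings by default and prints 0, in which case strtonum("0x" $12) does the conversion instead.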
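For hex values in several columns, one option is a small filter that converts only the columns you name. The following is a minimal sketch, not part of the command above: `hex_cols` and `hex2dec` are hypothetical names, the input is assumed to be comma-separated already, and the conversion is done digit by digit so it does not rely on how a given awk treats `0x` strings.

```shell
# hex_cols COLS: read comma-separated lines on stdin and convert the
# 1-based columns listed in COLS (for example "2,3") from hex to decimal.
hex_cols() {
  awk -F, -v OFS=, -v cols="$1" '
    # Hypothetical helper: hex digits -> number.
    # Empty or non-hex input is returned unchanged.
    function hex2dec(s,    t, i, d, n) {
      t = tolower(s)
      if (t == "") return s
      n = 0
      for (i = 1; i <= length(t); i++) {
        d = index("0123456789abcdef", substr(t, i, 1))
        if (d == 0) return s
        n = n * 16 + d - 1
      }
      return n
    }
    BEGIN { k = split(cols, want, ",") }
    {
      for (j = 1; j <= k; j++) {
        c = want[j] + 0
        if (c >= 1 && c <= NF) $c = hex2dec($c)
      }
      print
    }
  '
}

printf 'a,001e,ff\n' | hex_cols 2,3    # prints a,30,255
```

Columns that are out of range or do not contain hex digits are left as they are, so the filter can be pointed at a column list without checking every line first.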

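Note: if the output will not always be $8_$9, you actually need to parse the hex prefix and count characters to determine where the field ends. If that is the case, I would personally prefer to write the whole thing in something else, such as Python.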
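For reference, the prefix-driven parsing can still be sketched in awk. This is a rough illustration under several assumptions that the question does not fully pin down: columns 1 to 7 contain no spaces, the column 8 prefix is a hex count followed by a colon, that count covers the whole value including any padding space, and exactly one separator space follows the value. `parse_line` and `hex2dec` are hypothetical names, and there is no error handling.

```shell
# parse_line: read lines in the question format on stdin, print them as CSV.
parse_line() {
  awk '
    # Hypothetical helper: hex digits -> number (non-hex input returned unchanged).
    function hex2dec(s,    t, i, d, n) {
      t = tolower(s)
      if (t == "") return s
      n = 0
      for (i = 1; i <= length(t); i++) {
        d = index("0123456789abcdef", substr(t, i, 1))
        if (d == 0) return s
        n = n * 16 + d - 1
      }
      return n
    }
    {
      rest = $0
      out = ""
      # Columns 1-7: plain tokens, each ended by a single space.
      for (c = 1; c <= 7; c++) {
        p = index(rest, " ")
        tok = substr(rest, 1, p - 1)
        rest = substr(rest, p + 1)
        if (c == 7) sub(/^[^:]*:/, "", tok)   # drop the count prefix
        out = out tok ","
      }
      # Column 8: read the hex count, then take exactly that many characters.
      p = index(rest, ":")
      cnt = hex2dec(substr(rest, 1, p - 1))
      val = substr(rest, p + 1, cnt)
      rest = substr(rest, p + 1 + cnt + 1)    # skip the value and one separator space
      sub(/ +$/, "", val)                     # drop padding spaces
      gsub(/ /, "_", val)                     # inner spaces become underscores
      # Remaining columns: a plain string and the hex value.
      split(rest, tail, " ")
      print out val "," tail[1] "," hex2dec(tail[2])
    }
  '
}

printf '%s\n' 'string1 string2 string3 yyyy-mm-dd 23:50:45.999 string4 10:1234567890 0e:Apple 1.2.3.4  string5 001e' | parse_line
# prints string1,string2,string3,yyyy-mm-dd,23:50:45.999,string4,1234567890,Apple_1.2.3.4,string5,30
```

Because the length of column 8 comes from its prefix rather than from counting spaces, a value with more or fewer inner spaces is still read as one column.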
