Can anyone provide a sed or awk way to delete the last two columns of a CSV file?



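Edit: Hi all, thanks for your replies. My problem is not how to handle the sample.csv I provided here. The situation is that I have more than 100 similar files and I want to process them quickly and efficiently. I solved it with Python, but I would prefer sed, because I know sed can modify files directly. I do not want to run a similar command hundreds of times.

The files have been generated daily for about 4 months. Each file contains 9 columns, and now I want to remove the last two columns from all of them.

I planned to use sed with -i to delete the last two columns, so that all files could be modified directly without writing new files. Unfortunately I could not find a way to do that, so I wrote a Python script to do all the work. Here is my code: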

    def remove_last_two_columns(input_dir, output_dir, file_name):
        # Copy file_name from input_dir to output_dir, keeping the first 7 columns.
        with open(input_dir + file_name, "r") as inputs, \
                open(output_dir + file_name, "w") as writer:
            for line in inputs:
                parts = line.strip().split(",")
                # Columns 0..6 are kept, so the last two of the nine are dropped.
                writer.write(",".join(parts[:7]) + "\n")

    remove_last_two_columns("/home/haifzhan/input/", "/home/haifzhan/output/", "sample.csv")

Input:

C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N,2014-07-01 00:37:22,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N,2014-07-01 00:25:55,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13,N,N
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24,N,N
C1,C2,2014-06-30 18:41:27,N,N,N,N,2014-07-01 00:52:22,N

My output:

C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24
C1,C2,2014-06-30 18:41:27,N,N,N,N

Can anyone provide a sed/awk way to do this? I would like to use sed/awk in my future work. Thanks in advance.

Awk solution

awk 'BEGIN{FS=OFS=","}NF=(NF-2)' file
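In the one-liner above, the BEGIN block sets both the input and output field separators to a comma. Assigning NF-2 to NF shortens the record, and because the assignment is used as the pattern, the shortened line is printed whenever the new field count is non-zero. The command writes to standard output and leaves the file unchanged. As a rough sketch of how the same idea could be applied to many files, the function below spells out the field loop instead of assigning to NF and copies the result back over each file. The name drop_last_two_awk and the temporary file prefix are made up for illustration.

```shell
# Hypothetical helper: rewrite each file given as an argument
# without its last two comma-separated fields.
drop_last_two_awk() {
    for f in "$@"; do
        [ -f "$f" ] || continue   # skip unmatched globs and non-files
        tmp=$(mktemp "${TMPDIR:-/tmp}/droplast.XXXXXX") || return 1
        # Join fields 1..NF-2 with commas. Lines with two or fewer
        # fields become empty lines rather than being dropped.
        awk -F, -v OFS=, '{
            line = ""
            for (i = 1; i <= NF - 2; i++)
                line = line (i > 1 ? OFS : "") $i
            print line
        }' "$f" > "$tmp" &&
            cat "$tmp" > "$f"     # copy back so the file keeps its permissions
        rm -f "$tmp"
    done
}

# Example call (path taken from the question):
# drop_last_two_awk /home/haifzhan/input/*.csv
```

Going through a temporary file is needed because awk reads and writes as a stream; redirecting its output straight onto the input file would truncate the file before it is read.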

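The following sed command removes the last two columns, where sample.csv is the name of the input file. The expression is quoted so that the shell does not try to expand the brackets and asterisks:

sed 's/,[^,]*,[^,]*$//g' sample.csv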

My result is:

C1,C2,2014-06-30 13:11:46,2014-07-01 00:19:12,43,N,N
C1,C2,2014-06-30 13:37:40,N,N,N,N
C1,C2,2014-06-30 15:35:40,2014-07-01 00:23:14,36,N,N
C1,C2,2014-06-30 16:54:07,2014-07-01 00:08:38,35,N,N
C1,C2,2014-06-30 17:13:33,N,N,N,N
C1,C2,2014-06-30 17:23:05,N,N,2014-07-01 00:26:03,13
C1,C2,2014-06-30 17:49:59,2014-07-01 02:46:20,11,N,N
C1,C2,2014-06-30 18:16:51,2014-07-01 06:15:25,20,N,N
C1,C2,2014-06-30 18:18:07,N,N,2014-07-01 00:02:22,24
C1,C2,2014-06-30 18:41:27,N,N,N,N

In your example you removed the last 3 columns. You can do that by modifying the original statement to the following:

sed 's/,[^,]*,[^,]*,[^,]*$//g' sample.csv
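The question asks for an in-place edit across 100+ files, which the commands above do not do on their own since they only print to standard output. A minimal sketch of one way to do it with the two-column expression follows. The function name strip_last_two_inplace is hypothetical, and the .bak suffix is an arbitrary choice for the backup copies.

```shell
# Hypothetical helper: strip the last two comma-separated fields
# from every file given as an argument, editing the files in place.
strip_last_two_inplace() {
    for f in "$@"; do
        [ -f "$f" ] || continue   # skip unmatched globs and non-files
        # -i.bak edits the file in place and keeps the original as "$f.bak".
        # The attached-suffix form is accepted by both GNU sed and BSD sed.
        sed -i.bak 's/,[^,]*,[^,]*$//' "$f"
    done
}

# Example call (path taken from the question):
# strip_last_two_inplace /home/haifzhan/input/*.csv
```

Once the results have been checked, the .bak copies can be deleted. With GNU sed a bare -i works without leaving backups, but BSD sed requires an argument after -i, so the suffix form is the more portable choice.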

cut is without doubt the simplest tool for this:

cat input | cut -d, -f8,9 --complement

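Note that the version of cut that ships with OS X is outdated and does not support --complement, so it is best to install a current GNU version (Homebrew installs it as gcut):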

brew install coreutils
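Since every file in the question has exactly 9 columns, dropping fields 8 and 9 is the same as keeping fields 1 to 7, and a plain field range does not need --complement. A minimal sketch under that fixed-layout assumption (keep_first_seven is a hypothetical name):

```shell
# Hypothetical helper: keep only the first seven comma-separated fields.
# Reads the files given as arguments, or standard input if there are none.
keep_first_seven() {
    cut -d, -f1-7 "$@"
}

# Example call (file names are placeholders):
# keep_first_seven sample.csv > sample.trimmed.csv
```

Unlike the --complement form, this breaks if the number of columns ever changes, and cut cannot edit a file in place, so the output has to be redirected to a new file.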
