Extract information from a column and print it in a separate, aligned column



I am trying to extract the year and print it in a separate new column, keeping that new column aligned.

This is the input file:

0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980) 
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest (1975)
0000000124  733447   8.7  Inception (2010)
0000000233  411397   8.7  Goodfellas (1990)
0000000123  519051   8.7  Star Wars (1977)
0000000124  146841   8.7  Shichinin no samurai (1954)
0000000123  618195   8.7  Forrest Gump (1994)
0000000123  680520   8.7  The Matrix (1999)
0000000123  604519   8.7  The Lord of the Rings: The Two Towers (2002)
0000000233  309137   8.7  Cidade de Deus (2002)
0000000232  548307   8.6  Se7en (1995)
0000000232  459707   8.6  The Silence of the Lambs (1991)

How can I get the year into a separate column like this?

0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back                  1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring               2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                                 1975
0000000124  733447   8.7  Inception                                                       2010
0000000233  411397   8.7  Goodfellas                                                      1990
0000000123  519051   8.7  Star Wars                                                       1977
0000000124  146841   8.7  Shichinin no samurai                                            1954
0000000123  618195   8.7  Forrest Gump                                                    1994
0000000123  680520   8.7  The Matrix                                                      1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers                           2002
0000000233  309137   8.7  Cidade de Deus                                                  2002
0000000232  548307   8.6  Se7en                                                           1995
0000000232  459707   8.6  The Silence of the Lambs                                        1991

sed 's/)\s*$//' file | column -s '(' -t

will process the given input and give you the expected output.

Tested here:

kent$  echo "0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980) 
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest (1975)
0000000124  733447   8.7  Inception (2010)
0000000233  411397   8.7  Goodfellas (1990)
0000000123  519051   8.7  Star Wars (1977)
0000000124  146841   8.7  Shichinin no samurai (1954)
0000000123  618195   8.7  Forrest Gump (1994)
0000000123  680520   8.7  The Matrix (1999)
0000000123  604519   8.7  The Lord of the Rings: The Two Towers (2002)
0000000233  309137   8.7  Cidade de Deus (2002)
0000000232  548307   8.6  Se7en (1995)
0000000232  459707   8.6  The Silence of the Lambs (1991)"|sed 's/)\s*$//'|column -s '(' -t
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back      1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring   2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                     1975
0000000124  733447   8.7  Inception                                           2010
0000000233  411397   8.7  Goodfellas                                          1990
0000000123  519051   8.7  Star Wars                                           1977
0000000124  146841   8.7  Shichinin no samurai                                1954
0000000123  618195   8.7  Forrest Gump                                        1994
0000000123  680520   8.7  The Matrix                                          1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers               2002
0000000233  309137   8.7  Cidade de Deus                                      2002
0000000232  548307   8.6  Se7en                                               1995
0000000232  459707   8.6  The Silence of the Lambs                            1991
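
For readers without column, the same idea (strip the closing parenthesis, split off the year, pad to the widest row) can be sketched in Python 3. This is only an illustration under stated assumptions: the names align_years, YEAR_AT_END and the gap parameter are made up here, and every line is assumed to end in a four-digit year in parentheses.

```python
import re

SAMPLE = [
    "0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980) ",
    "0000000124  733447   8.7  Inception (2010)",
    "0000000232  548307   8.6  Se7en (1995)",
]

# Everything up to the last "(YYYY)" on the line, then the year itself.
# Trailing whitespace after ")" is tolerated, like the sed expression.
YEAR_AT_END = re.compile(r"^(.*\S)\s*\((\d{4})\)\s*$")


def align_years(lines, gap=3):
    """Return the lines with the trailing "(YYYY)" moved to an aligned column.

    Lines that do not end in "(YYYY)" are skipped (an assumption of this sketch).
    """
    rows = []
    for line in lines:
        m = YEAR_AT_END.match(line)
        if m:
            rows.append((m.group(1), m.group(2)))
    if not rows:
        return []
    # Like column -t, size the text column from the longest entry.
    width = max(len(text) for text, _ in rows)
    return [text.ljust(width + gap) + year for text, year in rows]


for row in align_years(SAMPLE):
    print(row)
```

Because the pattern anchors on the last parenthesised group, a title that itself contains "(" should not be split in the wrong place, which is a case the column -s '(' approach does not cover.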

Here is an awk solution that works on your sample data:

$ awk -F'(' '{printf("%-77s %d\n", $1, $2)}' movies.txt

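Adjust the format to your liking. Here the text before the year is padded to 77 characters and followed by one space, so the year starts at column 79. You can change this in the format specifier; for example, if you want the year to start at column 100, use %-98s.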
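
The fixed-width idea carries over directly to Python 3 format specifiers. A minimal sketch, assuming (as the awk command does) that the only "(" on a line is the one that opens the year; format_fixed and its width parameter are hypothetical names chosen for illustration:

```python
def format_fixed(line, width=77):
    """Pad everything before the first "(" to `width` characters, then add the year.

    Mirrors printf("%-77s %d"): the left part is left-justified, one space
    follows, and the year is printed as an integer.
    """
    left, _, rest = line.partition("(")
    # rest looks like "1980) " here; drop blanks and the closing parenthesis.
    # int() raises ValueError if the title itself contained a "(".
    year = int(rest.strip().rstrip(")"))
    return f"{left:<{width}} {year}"


print(format_fixed("0000000124  733447   8.7  Inception (2010)"))
print(format_fixed("0000000232  548307   8.6  Se7en (1995)", width=40))
```

Unlike the column-based approach, the width here is chosen up front, so a title longer than the width pushes its year to the right instead of widening the whole table.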

Here is a quick hack:

$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF}1' file | column -s'{' -t 
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back      1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring   2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest                     1975
0000000124 733447 8.7 Inception                                           2010
0000000233 411397 8.7 Goodfellas                                          1990
0000000123 519051 8.7 Star Wars                                           1977
0000000124 146841 8.7 Shichinin no samurai                                1954
0000000123 618195 8.7 Forrest Gump                                        1994
0000000123 680520 8.7 The Matrix                                          1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers               2002
0000000233 309137 8.7 Cidade de Deus                                      2002
0000000232 548307 8.6 Se7en                                               1995
0000000232 459707 8.6 The Silence of the Lambs                            1991

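awk is used to strip the parentheses from the last field and to insert a { character in front of it. The output is piped to column, which builds the table using { as the delimiter. I chose { because I think it is unlikely to appear anywhere else in the data; if that is not the case, pick a different character.

If I were you, I would also quote the movie titles: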

$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF;$4=q$4;$(NF-1)=$(NF-1)q}1' q='"' file | ..
0000000124 462910 8.8 "Star Wars: Episode V - The Empire Strikes Back"      1980
0000000124 698356 8.8 "The Lord of the Rings: The Fellowship of the Ring"   2001
0000000233 393855 8.8 "One Flew Over the Cuckoo's Nest"                     1975
0000000124 733447 8.7 "Inception"                                           2010
0000000233 411397 8.7 "Goodfellas"                                          1990
0000000123 519051 8.7 "Star Wars"                                           1977
0000000124 146841 8.7 "Shichinin no samurai"                                1954
0000000123 618195 8.7 "Forrest Gump"                                        1994
0000000123 680520 8.7 "The Matrix"                                          1999
0000000123 604519 8.7 "The Lord of the Rings: The Two Towers"               2002
0000000233 309137 8.7 "Cidade de Deus"                                      2002
0000000232 548307 8.6 "Se7en"                                               1995
0000000232 459707 8.6 "The Silence of the Lambs"                            1991
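
The quoting step can also be done without a placeholder delimiter by parsing each line into fields first. The following Python 3 sketch assumes three whitespace-separated leading fields followed by the title and a "(YYYY)" suffix; RECORD, quoted_table and the field names first, second and third are invented for this example and say nothing about what the columns actually mean.

```python
import re

# Three leading fields, the title, then the year in parentheses.
RECORD = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(.*\S)\s*\((\d{4})\)\s*$")


def quoted_table(lines, gap=3):
    """Return rows with the title in double quotes and the year aligned."""
    rows = []
    for line in lines:
        m = RECORD.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        first, second, third, title, year = m.groups()
        left = " ".join([first, second, third, '"%s"' % title])
        rows.append((left, year))
    # Pad to the widest left-hand part, as column -t would.
    width = max((len(left) for left, _ in rows), default=0)
    return [left.ljust(width + gap) + year for left, year in rows]


for row in quoted_table([
    "0000000232  548307   8.6  Se7en (1995)",
    "0000000123  519051   8.7  Star Wars (1977)",
]):
    print(row)
```

Joining the fields with single spaces reproduces the side effect of the awk version, where assigning to a field rebuilds the record with the default output separator.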

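A better approach is to use a language like Python.

You can use the string method rfind() to compute the padding. If you have Python available, you could use the following script: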

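from __future__ import print_function  # same print() behaviour on Python 2 and 3
import os
import sys
# Optional second argument: position of the year column (defaults to 78).
try:
    n = int(sys.argv[2])
except IndexError:
    n = 78
try:
    if os.path.isfile(sys.argv[1]):
        with open(sys.argv[1], 'r') as f:
            for line in f:
                line = line.strip()
                # Measure from the last "(" so parentheses in a title do not matter.
                pad = n - line.rfind("(")
                # line[:-7] drops the trailing " (YYYY)"; line[-5:-1] is the year.
                print(line[:-7], ' ' * pad, line[-5:-1])
    else:
        print("Please provide a file.")
except IndexError:
    print("Please provide a file.")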

Save it to a file such as table.py and run it like this:

$ python table.py file
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back        1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring     2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                       1975
0000000124  733447   8.7  Inception                                             2010
0000000233  411397   8.7  Goodfellas                                            1990
0000000123  519051   8.7  Star Wars                                             1977
0000000124  146841   8.7  Shichinin no samurai                                  1954
0000000123  618195   8.7  Forrest Gump                                          1994
0000000123  680520   8.7  The Matrix                                            1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers                 2002
0000000233  309137   8.7  Cidade de Deus                                        2002
0000000232  548307   8.6  Se7en                                                 1995
0000000232  459707   8.6  The Silence of the Lambs                              1991
0000000123  123456   9.9  The best file (of all time)                           2025

Note the additional film, whose title itself contains parentheses:

0000000123  123456   9.9  The best file (of all time) (2025)
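
A short Python 3 illustration of why rfind() matters for a line like this one: find() stops at the parenthesis inside the title, while rfind() locates the one that opens the year. The variable names are chosen only for this example.

```python
line = "0000000123  123456   9.9  The best file (of all time) (2025)"

first = line.find("(")   # parenthesis that belongs to the title
last = line.rfind("(")   # parenthesis that opens the year

title_part = line[:last].rstrip()  # everything before the year, title intact
year = line[last + 1:-1]           # text between the last "(" and the final ")"

print(title_part)
print(year)
```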

If the year column needs to sit further to the right, pass the column position as the second argument, like this:

$ python table.py file 100 

Here is a Python 2.x solution:

$ python --version
Python 2.7.3
$ echo "0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980)" | python -c "import sys;s=sys.stdin.readlines()[0]; print '%s\t%s' % (s[:-7], s[-6:-2])"
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back    1980

If your strings are in tmpfile, then:

$ cat tmpfile | python -c "import sys;map(lambda i: sys.stdout.write('%s %s %s\n' % (i[:-8], ' '*(100-len(i)), i[-6:-2])), sys.stdin.readlines())"
0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back                      1980
0000000124  698356   8.8  The Lord of the Rings: The Fellowship of the Ring                   2001
0000000233  393855   8.8  One Flew Over the Cuckoo's Nest                                     1975
0000000124  733447   8.7  Inception                                                           2010
0000000233  411397   8.7  Goodfellas                                                          1990
0000000123  519051   8.7  Star Wars                                                           1977
0000000124  146841   8.7  Shichinin no samurai                                                1954
0000000123  618195   8.7  Forrest Gump                                                        1994
0000000123  680520   8.7  The Matrix                                                          1999
0000000123  604519   8.7  The Lord of the Rings: The Two Towers                               2002
0000000233  309137   8.7  Cidade de Deus                                                      2002
0000000232  548307   8.6  Se7en                                                               1995
0000000232  459707   8.6  The Silence of the Lambs                                            1991
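
The slicing in the one-liners depends on every line ending in exactly ")" plus a newline, and the first line of the sample input appears to carry a trailing space. A hedged Python 3 sketch of the same slicing idea that strips trailing whitespace first; move_year and its column parameter are hypothetical names, and the default of 100 is just an example value:

```python
def move_year(line, column=100):
    """Return `line` with its trailing "(YYYY)" moved to start at `column`."""
    line = line.rstrip()                 # drop the newline and trailing blanks
    text, year = line[:-7], line[-5:-1]  # " (YYYY)" is the last 7 characters
    return text.ljust(column - 1) + year


sample = [
    "0000000124  462910   8.8  Star Wars: Episode V - The Empire Strikes Back (1980) \n",
    "0000000232  548307   8.6  Se7en (1995)\n",
]
for raw in sample:  # in a real pipeline, iterate over sys.stdin instead
    print(move_year(raw))
```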
