I am trying to extract the year and print it in a separate new column, keeping that new column aligned.
This is the input file:
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back (1980)
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest (1975)
0000000124 733447 8.7 Inception (2010)
0000000233 411397 8.7 Goodfellas (1990)
0000000123 519051 8.7 Star Wars (1977)
0000000124 146841 8.7 Shichinin no samurai (1954)
0000000123 618195 8.7 Forrest Gump (1994)
0000000123 680520 8.7 The Matrix (1999)
0000000123 604519 8.7 The Lord of the Rings: The Two Towers (2002)
0000000233 309137 8.7 Cidade de Deus (2002)
0000000232 548307 8.6 Se7en (1995)
0000000232 459707 8.6 The Silence of the Lambs (1991)
How can I get the year into a separate column, like this?
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back      1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring   2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest                     1975
0000000124 733447 8.7 Inception                                           2010
0000000233 411397 8.7 Goodfellas                                          1990
0000000123 519051 8.7 Star Wars                                           1977
0000000124 146841 8.7 Shichinin no samurai                                1954
0000000123 618195 8.7 Forrest Gump                                        1994
0000000123 680520 8.7 The Matrix                                          1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers               2002
0000000233 309137 8.7 Cidade de Deus                                      2002
0000000232 548307 8.6 Se7en                                               1995
0000000232 459707 8.6 The Silence of the Lambs                            1991
sed 's/)\s*$//' file | column -s '(' -t
will handle the given input and give you the expected output.
Tested here:
kent$ echo "0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back (1980)
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring (2001)
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest (1975)
0000000124 733447 8.7 Inception (2010)
0000000233 411397 8.7 Goodfellas (1990)
0000000123 519051 8.7 Star Wars (1977)
0000000124 146841 8.7 Shichinin no samurai (1954)
0000000123 618195 8.7 Forrest Gump (1994)
0000000123 680520 8.7 The Matrix (1999)
0000000123 604519 8.7 The Lord of the Rings: The Two Towers (2002)
0000000233 309137 8.7 Cidade de Deus (2002)
0000000232 548307 8.6 Se7en (1995)
0000000232 459707 8.6 The Silence of the Lambs (1991)" | sed 's/)\s*$//' | column -s '(' -t
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back      1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring   2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest                     1975
0000000124 733447 8.7 Inception                                           2010
0000000233 411397 8.7 Goodfellas                                          1990
0000000123 519051 8.7 Star Wars                                           1977
0000000124 146841 8.7 Shichinin no samurai                                1954
0000000123 618195 8.7 Forrest Gump                                        1994
0000000123 680520 8.7 The Matrix                                          1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers               2002
0000000233 309137 8.7 Cidade de Deus                                      2002
0000000232 548307 8.6 Se7en                                               1995
0000000232 459707 8.6 The Silence of the Lambs                            1991
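One caveat with the sed/column pipeline: column -s '(' treats every "(" as a separator, so a title that itself contains parentheses would be split into extra columns. As a rough sketch of a stricter match (in Python, with the hypothetical names YEAR_AT_END and split_year, and assuming the year is always four digits), a regular expression anchored at the end of the line only picks up the final "(YYYY)":

```python
import re

# Assumption: the year is a four-digit number in parentheses at the end of the line.
YEAR_AT_END = re.compile(r"\s*\((\d{4})\)\s*$")


def split_year(line):
    """Return (text_before_year, year); year is None when no trailing "(YYYY)" exists."""
    match = YEAR_AT_END.search(line)
    if match is None:
        # No trailing year: hand the line back unchanged (minus trailing whitespace).
        return line.rstrip(), None
    # Everything before the match is the record without its year.
    return line[:match.start()], match.group(1)
```

Parentheses that appear earlier in the title are left untouched, because the pattern can only match at the end of the line.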
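Here is an awk solution that handles your sample data:
$ awk -F'(' '{printf("%-77s %d\n", $1, $2)}' movies.txt
Adjust the format to taste. Here the text before the year is padded to 77 characters and followed by a space, so the year starts at column 79. You can change the width in the format specifier; for example, %-99s moves the year to column 101.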
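The width in the awk command has to be picked by hand. As a minimal sketch of the same padding idea in Python, the width can instead be derived from the longest line. The function name align_years and the gap parameter are made up for illustration, and every non-blank line is assumed to end in "(YYYY)":

```python
def align_years(lines, gap=2):
    """Move the trailing "(YYYY)" of each line into one aligned column."""
    rows = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        # rpartition splits on the LAST "(", so parentheses inside a title survive.
        title, _, year = line.rpartition("(")
        rows.append((title.rstrip(), year.rstrip(")")))
    if not rows:
        return []
    # Pad every title to the longest one, plus a small gap before the year.
    width = max(len(title) for title, _ in rows) + gap
    return [title.ljust(width) + year for title, year in rows]
```

A possible way to use it is print("\n".join(align_years(open("movies.txt")))), where movies.txt is the file name from the awk example.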
Here is a quick hack:
$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF}1' file | column -s'{' -t
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back      1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring   2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest                     1975
0000000124 733447 8.7 Inception                                           2010
0000000233 411397 8.7 Goodfellas                                          1990
0000000123 519051 8.7 Star Wars                                           1977
0000000124 146841 8.7 Shichinin no samurai                                1954
0000000123 618195 8.7 Forrest Gump                                        1994
0000000123 680520 8.7 The Matrix                                          1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers               2002
0000000233 309137 8.7 Cidade de Deus                                      2002
0000000232 548307 8.6 Se7en                                               1995
0000000232 459707 8.6 The Silence of the Lambs                            1991
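awk is used to strip the parentheses from the last field and prepend a { character to it. The output is piped to column, which builds the table using { as the delimiter. I chose the { character because I think it is unlikely to appear anywhere else in the data; if that is not the case, choose a different character.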
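Whether { really is absent from the data can be checked rather than assumed. The following is only a sketch in Python; the function name pick_delimiter and the list of candidate characters are illustrative choices, not part of the answer above:

```python
def pick_delimiter(lines, candidates="{|~^"):
    """Return the first candidate character that appears in none of the lines."""
    lines = list(lines)  # allow iterating several times, even over a file object
    for ch in candidates:
        if not any(ch in line for line in lines):
            return ch
    raise ValueError("every candidate delimiter occurs in the data")
```

The returned character could then be substituted for { in both the awk program and the column -s option.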
If I were you, I would also quote the movie titles:
$ awk '{gsub(/[()]/,"",$NF);$NF="{"$NF;$4=q$4;$(NF-1)=$(NF-1)q}1' q='"' file | ..
0000000124 462910 8.8 "Star Wars: Episode V - The Empire Strikes Back" 1980
0000000124 698356 8.8 "The Lord of the Rings: The Fellowship of the Ring" 2001
0000000233 393855 8.8 "One Flew Over the Cuckoo's Nest" 1975
0000000124 733447 8.7 "Inception" 2010
0000000233 411397 8.7 "Goodfellas" 1990
0000000123 519051 8.7 "Star Wars" 1977
0000000124 146841 8.7 "Shichinin no samurai" 1954
0000000123 618195 8.7 "Forrest Gump" 1994
0000000123 680520 8.7 "The Matrix" 1999
0000000123 604519 8.7 "The Lord of the Rings: The Two Towers" 2002
0000000233 309137 8.7 "Cidade de Deus" 2002
0000000232 548307 8.6 "Se7en" 1995
0000000232 459707 8.6 "The Silence of the Lambs" 1991
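A better approach is to use a language like Python. You can use the string method rfind() to compute the padding. If you have Python available, you could use the following script: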
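import os
import sys

# Optional second argument: the offset used to compute the padding.
try:
    n = int(sys.argv[2])
except IndexError:
    n = 78

try:
    if os.path.isfile(sys.argv[1]):
        with open(sys.argv[1], 'r') as f:
            for line in f:
                line = line.strip()
                # rfind() locates the last "(", so parentheses in a title are left alone.
                pad = n - line.rfind("(")
                # Drop the trailing " (YYYY)" and print the year after the padding.
                print("%s %s %s" % (line[:-7], ' ' * pad, line[-5:-1]))
    else:
        print("Please provide a file.")
except IndexError:
    print("Please provide a file.")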
Save it to a file such as table.py and run it like this:
$ python table.py file
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back 1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring 2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest 1975
0000000124 733447 8.7 Inception 2010
0000000233 411397 8.7 Goodfellas 1990
0000000123 519051 8.7 Star Wars 1977
0000000124 146841 8.7 Shichinin no samurai 1954
0000000123 618195 8.7 Forrest Gump 1994
0000000123 680520 8.7 The Matrix 1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers 2002
0000000233 309137 8.7 Cidade de Deus 2002
0000000232 548307 8.6 Se7en 1995
0000000232 459707 8.6 The Silence of the Lambs 1991
0000000123 123456 9.9 The best file (of all time) 2025
Note the film I added to the input:
0000000123 123456 9.9 The best file (of all time) (2025)
If the year column needs to sit further to the right, pass a larger value as the second argument, like this:
$ python table.py file 100
Here is a Python 2.x solution:
$ python --version
Python 2.7.3
$ echo "0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back (1980)" | python -c "import sys;s=sys.stdin.readlines()[0]; print '%s\t%s' % (s[:-7], s[-6:-2])"
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back 1980
If your strings are in tmpfile, then:
$ cat tmpfile | python -c "import sys;map(lambda i: sys.stdout.write('%s %s %s\n' % (i[:-8], ' '*(100-len(i)), i[-6:-2])), sys.stdin.readlines())"
0000000124 462910 8.8 Star Wars: Episode V - The Empire Strikes Back 1980
0000000124 698356 8.8 The Lord of the Rings: The Fellowship of the Ring 2001
0000000233 393855 8.8 One Flew Over the Cuckoo's Nest 1975
0000000124 733447 8.7 Inception 2010
0000000233 411397 8.7 Goodfellas 1990
0000000123 519051 8.7 Star Wars 1977
0000000124 146841 8.7 Shichinin no samurai 1954
0000000123 618195 8.7 Forrest Gump 1994
0000000123 680520 8.7 The Matrix 1999
0000000123 604519 8.7 The Lord of the Rings: The Two Towers 2002
0000000233 309137 8.7 Cidade de Deus 2002
0000000232 548307 8.6 Se7en 1995
0000000232 459707 8.6 The Silence of the Lambs 1991
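The one-liner above relies on Python 2, where map() runs eagerly and print is a statement; under Python 3, map() is lazy, so the same command would print nothing. A rough Python 3 equivalent might look like the sketch below. The function name format_line and the width parameter are illustrative, and each line is assumed to end in " (YYYY)":

```python
def format_line(line, width=100):
    """Re-emit one record with the year moved to a fixed column."""
    line = line.rstrip("\n")
    # Drop the trailing " (YYYY)" and keep the four digits between the parentheses.
    title, year = line[:-7], line[-5:-1]
    # Same padding as the one-liner, which measured the line including its newline.
    pad = " " * (width - len(line) - 1)
    return "%s %s %s" % (title, pad, year)
```

It could be driven from standard input with a loop such as: for line in sys.stdin: print(format_line(line)), after importing sys.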