每个单词在单独的行上

>我有一句话像

例如

我想把它写到一个文件中，这样这句话中的每个单词都写到单独的行中。

如何在 shell 脚本中执行此操作？

有几种方法可以做到，选择你最喜欢的！

echo "This is for example" | tr ' ' 'n' > example.txt

或者只是这样做以避免不必要地使用echo：

tr ' ' 'n' <<< "This is for example" > example.txt

<<<表示法与此处字符串一起使用

或者，使用sed而不是tr：

sed "s/ /n/g" <<< "This is for example" > example.txt

有关更多选择，请查看其他人的答案=)

$ echo "This is for example" | xargs -n1
This
is
for
example

尝试使用：

string="This is for example"
printf '%sn' $string > filename.txt

或利用 bash分词

string="This is for example"
for word in $string; do
echo "$word"
done > filename.txt

example="This is for example"
printf "%sn" $example

注意我在一些简化正则表达式的草稿中写了这个，所以如果有任何不一致，那可能是原因。

你关心标点符号吗？例如，在某些调用中，您会看到例如"单词"(等)与括号完全相同。或者这个词将是"括号"而不是"括号"。如果您正在解析具有正确句子的文件，这可能是一个问题，尤其是如果您想按单词排序甚至每个单词的字数统计。

有一些方法可以解决这个问题，但也有一些警告，当然还有改进的余地。这些恰好与数字、破折号(数字)和小数点/点(数字)有关。也许有一套精确的规则将有助于解决这个问题，但以下示例可以为您提供一些工作。我做了一些人为的输入示例来演示这些缺陷(或者你想怎么称呼它们)。

$ echo "This is an example sentence with punctuation marks and digits i.e. , . ; ! 7 8 9" | grep -o -E '<[A-Za-z0-9.]*>'
This
is
an
example
sentence
with
punctuation
marks
and
digits
i.e
7
8
9

如您所见，即结果只是即，否则标点符号未显示。好的，但这省略了 major.minor.revision-release 形式的版本号等内容，例如0.0.1-1;这也可以显示吗？是的：

$ echo "The current version is 0.0.1-1. The previous version was current from 2017-2018."|grep -o -E '<[-A-Za-z0-9.]*>'
The
current
version
is
0.0.1-1
The
previous
version
was
current
from
2017-2018

请注意句子不以句号结尾。如果在年份和破折号之间添加空格会发生什么？你不会有破折号，但每一年都会有自己的行：

$ echo "2017 - 2018" | grep -o -E '<[-A-Za-z0-9.]*>'
2017
2018

那么问题就变成了你是否希望-本身被计算在内;根据分隔单词的本质，如果有空格，你就不会将年份作为单个字符串。因为它本身不是一个词，我认为不是。

我相信这些可以进一步简化。此外，如果您根本不想要任何标点符号或数字，您可以将其更改为：

$ echo "The current version is 0.0.1-1. The previous version was current from 2017-2018."|grep -o -E '<[A-Za-z]*>'
The
current
version
is
The
previous
version
was
current
from

如果你想得到数字：

$ echo "The current version is 0.0.1-1. The previous version was current from 2017-2018."|grep -o -E '<[A-Za-z0-9]*>'
The
current
version
is
0
0
1
1
The
previous
version
was
current
from
2017
2018

至于同时带有字母和数字的"单词"，这是另一件事，可能会或可能不会考虑，但可以证明上述内容：

$ echo "The current version is 0.0.1-1. test1."|grep -o -E '<[A-Za-z0-9]*>'
The
current
version
is
0
0
1
1
test1

输出它们。但以下内容没有(因为它根本不考虑数字)：

$ echo "The current version is 0.0.1-1. test1."|grep -o -E '<[A-Za-z]*>'
The
current
version
is

忽略标点符号很容易，但在某些情况下可能需要或渴望它们。例如，我想您可以使用 sed 来更改诸如例如的行，但我想这将是个人喜好。

我可以总结它是如何工作的，但只是;我太累了，甚至无法考虑太多：

它是如何工作的？

我只会解释调用grep -o -E '<[-A-Za-z0-9.]*>'但其中大部分在其他调用中是相同的(扩展 grep 中的垂直条/管道符号允许多个模式)：

-o选项仅用于打印匹配项，而不是整行。-E用于扩展的grep(也可以使用egrep)。至于正则表达式本身：

<和>是单词边界(分别开始和结束 - 如果需要，您只能指定一个);我相信-w选项与指定两者相同，但调用可能有点不同(我实际上不知道)。

'<[-A-Za-z0-9.]*>'表示破折号，大写和小写字母以及零次或多次点。至于为什么它会变成例如.e.g，我目前只能说这是模式，但我没有能力更多地考虑它。

字频计数奖励脚本

#!/bin/bash
if [ $# -eq 0 ]; then
echo "Usage: $(basename ${0}) <FILE> [FILE...]"
exit 1
fi
for file do
if [ -e "${file}" ]
then
echo "** ${file}: "
grep -o -E '<[-A-Za-z0-9.]*>' "${file}"|sort | uniq -c | sort -rn
else
echo >&2 "${1}: file not found"
continue
fi
done

例：

$ cat example 
The current version is 0.0.1-1 but the previous version was non-existent.
This sentence contains an abbreviation i.e. e.g. (so actually two abbreviations).
This sentence has no numbers and no punctuation  
$ ./wordfreq example 
** example: 
2 version
2 sentence
2 no
2 This
1 was
1 two
1 the
1 so
1 punctuation
1 previous
1 numbers
1 non-existent
1 is
1 i.e
1 has
1 e.g
1 current
1 contains
1 but
1 and
1 an
1 actually
1 abbreviations
1 abbreviation
1 The
1 0.0.1-1

注：注：我没有将大写字母音译为小写，因此单词"The"和"the"显示为不同的单词。如果您希望它们全部小写，则可以在排序之前将脚本中的 grep 调用更改为通过管道传输到 tr：

grep -o -E '<[-A-Za-z0-9.]*>' "${file}"|tr '[A-Z]' '[a-z]'|sort | uniq -c | sort -rn

哦，既然你问你是否要把它写到一个文件中，你可以添加到命令行(这适用于原始调用)：

> output_file

对于脚本，您可以像这样使用它：

$ ./wordfreq file1 file2 file3 > output_file

使用fmt命令

>> echo "This is for example" | fmt -w1 > textfile.txt ; cat textfile.txt
This
is
for
example

有关fmt及其选项的完整说明，请查看相关的手册页。

尝试使用：

str="This is for example"
echo -e ${str// /\n} > file.out

输出

> cat file.out 
This
is
for
example

没有人建议bash的内置read命令：

s='This is for example'
read -ra words <<< "$s"
printf '%sn' "${words[@]}"

This
is
for
example

数据始终被完全引用，因此不受文件名扩展的影响。

$IFS的当前值将控制拆分。默认值为空格制表符换行符：IFS=$' tn'

注意我在一些简化正则表达式的草稿中写了这个，所以如果有任何不一致，那可能是原因。

它是如何工作的？

字频计数奖励脚本

相关内容

最新更新

热门标签：