Selecting variable names from a .txt file using the terminal and regular expressions



I am trying to create a file containing the names of the variables.

I am using regular expressions and the bash terminal. The main .txt file contains the following:

" 1. symboling:                -3, -2, -1, 0, 1, 2, 3.
2. normalized-losses:        continuous from 65 to 256.
3. make:                     alfa-romero, audi, bmw, chevrolet, dodge, honda,
4. fuel-type:                diesel, gas.
5. aspiration:               std, turbo.
6. num-of-doors:             four, two.
7. body-style:               hardtop, wagon, sedan, hatchback, convertible.
8. drive-wheels:             4wd, fwd, rwd.
9. engine-location:          front, rear.
10. wheel-base:               continuous from 86.6 120.9.
11. length:                   continuous from 141.1 to 208.1.
12. width:                    continuous from 60.3 to 72.3.
13. height:                   continuous from 47.8 to 59.8.
14. curb-weight:              continuous from 1488 to 4066.
15. engine-type:              dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
16. num-of-cylinders:         eight, five, four, six, three, twelve, two.
17. engine-size:              continuous from 61 to 326.
18. fuel-system:              1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.
19. bore:                     continuous from 2.54 to 3.94.
20. stroke:                   continuous from 2.07 to 4.17.
21. compression-ratio:        continuous from 7 to 23.
22. horsepower:               continuous from 48 to 288.
23. peak-rpm:                 continuous from 4150 to 6600.
24. city-mpg:                 continuous from 13 to 49.
25. highway-mpg:              continuous from 16 to 54.
26. price:                    continuous from 5118 to 45400."

I want a file like this:

"symboling               
normalized-losses       
make
fuel-type
.
.
.
"

My attempt:

I know that the regular expression that selects the right information (but with the numbers) is:

([0-9]+\.\s([a-z]+-[a-z]+-[a-z]+))|([0-9]+\.\s[a-z]+-[a-z]+)|([0-9]+\.\s[a-z]+)
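The three alternatives differ only in how many hyphenated parts the name has (two hyphens, one, or none). A minimal sketch: a single extended regex with a repeated group, (-[a-z]+)*, covers all three cases. It uses the POSIX class [[:space:]] in place of \s, which is a GNU extension. The names numbered_name_re and check_numbered_names are only illustrative.

```shell
# The three alternatives differ only in how many "-word" parts follow the
# first word (two, one or none). A repeated group, (-[a-z]+)*, covers all
# of them in one extended regex (ERE). [[:space:]] is the POSIX form of \s.
numbered_name_re='[0-9]+\.[[:space:]]+[a-z]+(-[a-z]+)*'

# Print only the lines that consist entirely of "N. name" (a quick check).
check_numbered_names() {
  grep -xE "$numbered_name_re"
}

printf '%s\n' '1. symboling' '2. normalized-losses' '16. num-of-cylinders' \
  | check_numbered_names
```

All three sample lines are printed, so the one pattern matches the one-word, two-word and three-word names alike.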

Then I tried running the following command in bash:

egrep "([0-9]+\.\s([a-z]+-[a-z]+-[a-z]+))|([0-9]+\.\s[a-z]+-[a-z]+)|([0-9]+\.\s[a-z]+)" file.txt > names_col.txt

But it does not work the way I expected. Any suggestions would be great!
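egrep (the same as grep -E) prints every line that contains a match, not just the matched text, so names_col.txt receives the full descriptions. A minimal sketch of one fix, assuming a grep that supports -o (GNU and BSD grep do; it is not in POSIX): print only the "N. name" part at the start of each line, then strip the leading quote, indentation, number and dot with sed. The function name extract_names is hypothetical.

```shell
# grep -E prints whole matching lines; -o prints only the matched text.
extract_names() {
  # 1) keep only "N. name" at the start of each line, allowing an optional
  #    leading double quote or indentation before the number;
  # 2) delete everything before the first letter (quote, spaces, number, dot).
  grep -oE '^[[:space:]"]*[0-9]+\.[[:space:]]+[a-z]+(-[a-z]+)*' "$@" \
    | sed -E 's/^[^a-z]+//'
}
```

For example: extract_names file.txt > names_col.txt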

Using sed

$ sed '/^$/d;s/ [^[:alpha:]]*\([^:]*\)[^"]*/\1/' input_file
"symboling
normalized-losses
make
fuel-type
aspiration
num-of-doors
body-style
drive-wheels
engine-location
wheel-base
length
width
height
curb-weight
engine-type
num-of-cylinders
engine-size
fuel-system
bore
stroke
compression-ratio
horsepower
peak-rpm
city-mpg
highway-mpg
price"
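This command starts matching at the first space on a line, so it seems to assume that every numbered line begins with a space or indentation (on an unindented line such as "2. normalized-losses: ..." the number would stay attached), and it keeps the surrounding double quotes. A minimal sketch of a variant anchored at the start of the line, assuming the name is the first run starting with a letter and ending before the first colon; the function name names_from_description is hypothetical.

```shell
# Variant anchored at the start of each line (BRE, POSIX sed):
#   ^[^[:alpha:]]*        skip a leading quote, indentation, the number and dot
#   \([[:alpha:]][^:]*\)  capture the name: a letter, then up to the colon
#   :.*                   drop the colon and the description
# -n with the p flag prints only lines where the substitution happened,
# so blank lines disappear without a separate /^$/d.
names_from_description() {
  sed -n 's/^[^[:alpha:]]*\([[:alpha:]][^:]*\):.*/\1/p' "$@"
}
```

For example: names_from_description input_file > names_col.txt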

sed -En "s/^(.*[0-9]\.\s)([a-z-]*)(:.*$)/\2/p" file.txt > names_col.txt
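This answer captures the name in the second group and prints only that group. Since \s is a GNU sed extension, a minimal sketch of a more portable spelling, assuming a sed that supports -E (GNU and BSD sed both do), replaces it with [[:space:]]+. The function name names_sed_ere is hypothetical.

```shell
# Same idea as the sed -En answer, with \s replaced by the POSIX class.
#   group 1  ^(.*[0-9]\.[[:space:]]+)  everything up to and including "N. "
#   group 2  ([a-z-]*)                 the name: lowercase letters and hyphens
#   group 3  (:.*)$                    the colon and the rest of the line
# Only group 2 is kept; -n plus the p flag prints just the matching lines.
names_sed_ere() {
  sed -En 's/^(.*[0-9]\.[[:space:]]+)([a-z-]*)(:.*)$/\2/p' "$@"
}
```

For example: names_sed_ere file.txt > names_col.txt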

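With your shown samples, please try the following awk program. Simple explanation: it uses awk's gsub (global substitution). The regex :[^"]*|[[:space:]]+[^.]*\.[[:space:]]+ has two alternatives: the first matches everything from the colon up to (but not including) a " or the end of the line; the second matches a run of whitespace, everything up to the first ., and the whitespace after it. Both are replaced with NULL (an empty string). Finally the NF condition prints a line only if it is not blank.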

awk '{gsub(/:[^"]*|[[:space:]]+[^.]*\.[[:space:]]+/,"")} NF' Input_file
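The second alternative of that regex needs whitespace before the number, so this answer also appears to expect indented lines; on an unindented line such as "2. normalized-losses: ..." the number would remain. A minimal sketch using a different technique, field splitting on the colon instead of gsub; the function name names_awk_fields is hypothetical, and [^A-Za-z] is used instead of a character class for older awks.

```shell
# Field-splitting variant (a different technique from the gsub answer):
# with -F: everything before the first colon, i.e. "N. name", lands in $1.
names_awk_fields() {
  awk -F: '
    NF > 1 {                          # only lines that contain a colon
      name = $1
      sub(/^[^A-Za-z]+/, "", name)    # drop a leading quote, indentation and "N."
      print name
    }
  ' "$@"
}
```

For example: names_awk_fields Input_file > names_col.txt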
