Import single columns from a txt file in Python, skipping the headers



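I'm new to Python, so I apologize for my beginner question. I have a txt file as shown below, and I would like to import three columns (the 1st, 2nd and 6th) and store the data in three different vectors, skipping all the headers.

I know there are similar questions, but I couldn't get it to work :(

Cheers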

Name: Z1836_Tb10-TbCoTb_DL_MzDown_FS_Phi-90∞_I-665uA_Offs-0uA_Avg-5s_(0)_S1.dat
Date: Samstag, 1. Juni 2019 - new scaling
Scan Type: Field Scan
Angle [∞]: 90
Current [uA]: 665
Frequency [Hz]: 10
Offset [uA]: 0
Sampling Rate [Hz]: 204800
Averaging Duration [s]: 5
Measurement Duration: 00:30:44
----------
Field   R+(1f) Real    R+(1f) Img R+(1f) Mag R+(1f) Phase   R+(2f) Real    R+(2f) Img R+(2f) Mag R+(2f) Phase   Field Set
mT  g(W)   g(W)   g(W)   ∞   g(W)   g(W)   g(W)   ∞   A
1019.14 -0.135007229    0.015354704 -0.135877588    173.51149082    -2.776103401E-6 -2.996982259E-6 -4.085174752E-6 -132.808926217     20
1000.95 -0.134959525    0.015398131 -0.135835105    173.491016631   -1.41565391E-5  4.583223348E-6  -1.487997096E-5 162.060479267    19.6
982.67  -0.134951253    0.015396305 -0.13582668 173.491386228   1.196964996E-5  -3.522605161E-6 1.247722995E-5  -16.398879321    19.2
964.17  -0.134935857    0.015381909 -0.135809751    173.496684402   5.150366012E-7  -2.854084284E-5 2.854548953E-5  -88.966175556    18.8
945.27  -0.134941957    0.015372557 -0.135814754    173.500895888   -7.177408364E-6 -2.703168168E-5 -2.796832146E-5 -104.869975658   18.4
926.12  -0.134916606    0.01535581  -0.13578767 173.506706039   -1.599523307E-5 1.704176103E-5  -2.337240039E-5 133.18562213       18
906.81  -0.134895719    0.015356654 -0.135767013    173.505355181   -7.367897986E-6 2.807593732E-6  -7.884700584E-6 159.14027614     17.6
887.36  -0.134877099    0.015409203 -0.135754468    173.482430298   -1.011942317E-5 -1.588290362E-5 -1.883266717E-5 -122.502303207   17.2

TXT file

Sample data:

Lorem ipsum dolor sit amet consectetur adipisicing elit.
Cum repudiandae ipsam repellendus quas facere quidem 
sit saepe libero ut pariatur consectetur ad at nisi consequatur,
minima cupiditate iusto? Aut, quibusdam. Lorem ipsum, dolor sit amet consectetur adipisicing elit. Nulla consequuntur hic tempora nobis libero nihil maxime magnam ratione voluptatum veritatis ipsum ducimus enim, sequi beatae suscipit laboriosam maiores mollitia soluta.
col1 col2 col3 col4
1 2 3 4 
1 2 3 4 
1 2 3 4   
1 2 3 4  
1 2 3 4 
1 2 3 4

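Code:

import pandas as pd
# Set skiprows according to your text file: the sample above has 4 lines of text before the column names.
# sep=r'\s+' splits on any run of whitespace (delim_whitespace is deprecated in recent pandas versions).
df = pd.read_csv('sample.txt', sep=r'\s+', skiprows=4)
# tolist() returns plain Python numbers, so the lists print as shown below.
vector_col_2 = df.iloc[:, 1].tolist()
vector_col_4 = df.iloc[:, 3].tolist()
print('V2: ', vector_col_2)
print('V4: ', vector_col_4)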

Output:

V2:  [2, 2, 2, 2, 2, 2]
V4:  [4, 4, 4, 4, 4, 4]
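The same idea could be applied to the measurement file from the question. The following is only a sketch under some assumptions: the file always has 13 non-data lines before the first numeric row, and the in-memory `text` built here (with placeholder metadata lines) stands in for the real file. Passing `header=None` tells pandas that no column-name row remains after skipping, and `usecols=[0, 1, 5]` keeps only the 1st, 2nd and 6th columns.

```python
import io

import pandas as pd

# Stand-in for the real file: 10 placeholder metadata lines, a separator,
# and two column-header rows (13 non-data lines in total), then two data rows.
header_lines = [f"Key {i}: value" for i in range(10)]
header_lines += ["----------", "Field R+(1f) Real ...", "mT g(W) ..."]
rows = [
    "1019.14 -0.135007229 0.015354704 -0.135877588 173.51149082 "
    "-2.776103401E-6 -2.996982259E-6 -4.085174752E-6 -132.808926217 20",
    "1000.95 -0.134959525 0.015398131 -0.135835105 173.491016631 "
    "-1.41565391E-5 4.583223348E-6 -1.487997096E-5 162.060479267 19.6",
]
text = "\n".join(header_lines + rows) + "\n"

# Skip the 13 non-data lines, treat the rest as headerless data,
# and keep only the 1st, 2nd and 6th columns (0-based positions 0, 1, 5).
df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",
    skiprows=13,
    header=None,
    usecols=[0, 1, 5],
)

# The kept columns are addressed by their position in the reduced frame.
field = df.iloc[:, 0].tolist()
real_1f = df.iloc[:, 1].tolist()
real_2f = df.iloc[:, 2].tolist()
print(field, real_1f, real_2f)
```

With a real file, the `io.StringIO(text)` argument would be replaced by the file path. Unlike the string-based answers further down, the values come back as floats, so no separate conversion step is needed.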

You can do the following:

import re

with open("file.txt", "r") as f:
    lines = f.readlines()

data = []
for line in lines:
    # Don't allow any letter except E or e (needed for scientific notation)
    if not re.search(r'[A-DF-Za-df-z]', line):
        # The line must contain at least one digit
        if re.search(r'\d', line):
            data.append(line.replace("\n", "").split())

# Transpose the rows into columns
data = list(zip(*data))
print(data[0])
print(data[1])
print(data[5])

Output:

('1019.14', '1000.95', '982.67', '964.17', '945.27', '926.12', '906.81', '887.36')
('-0.135007229', '-0.134959525', '-0.134951253', '-0.134935857', '-0.134941957', '-0.134916606', '-0.134895719', '-0.134877099')
('-2.776103401E-6', '-1.41565391E-5', '1.196964996E-5', '5.150366012E-7', '-7.177408364E-6', '-1.599523307E-5', '-7.367897986E-6', '-1.011942317E-5')

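How it works:

We go through the file line by line. If a line contains any letter other than E/e, we skip it. That takes care of the headers. (E is allowed because of scientific notation.)

Then, if the line contains no digits, we skip it as well. (This covers empty lines and the ---------- line in your file.)

If the line has no letters other than E and contains at least one digit, we split it and append the result to data.

After that, we use zip to turn the rows into columns, and then you can print whichever columns you need.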
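The filter described above can be tried on a few individual lines. The sketch below is only an illustration: the helper name `is_data_line` and the shortened sample lines are made up for the example. It also converts the values to floats, since `split()` returns strings and the question asks for numeric vectors.

```python
import re


def is_data_line(line):
    """Return True for lines that look like rows of numbers."""
    # Reject any letter other than E/e (kept for scientific notation).
    if re.search(r"[A-DF-Za-df-z]", line):
        return False
    # Require at least one digit; this rejects blank and "----------" lines.
    return re.search(r"\d", line) is not None


# Shortened sample lines (six columns instead of ten).
lines = [
    "Scan Type: Field Scan\n",
    "----------\n",
    "\n",
    "1019.14 -0.135007229 0.015354704 -0.135877588 173.51149082 -2.776103401E-6\n",
    "1000.95 -0.134959525 0.015398131 -0.135835105 173.491016631 -1.41565391E-5\n",
]

rows = [line.split() for line in lines if is_data_line(line)]
# Transpose the rows into columns and convert every value from str to float.
columns = [[float(value) for value in column] for column in zip(*rows)]

print(columns[0])  # first column
print(columns[5])  # sixth column
```

Only the last two sample lines pass the filter, so each entry of `columns` holds two floats.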

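Constraints

  1. If any column contains text, this method won't work. It works in your case because you only have numbers.

  2. It also fails if a header line has no letters other than E and contains at least one digit, because that line is then mistaken for data.

  3. This is a flexible approach, so you don't need to know the number of header lines in advance. But you have to be aware of constraints 1 and 2. If you can't guarantee them, you can skip a fixed number of lines to achieve the same effect. Change the for loop like this: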

# This might change according to your needs; in your example it's 13.
skip_lines = 13
for line in lines[skip_lines:]:
    data.append(line.replace("\n", "").split())
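If the header length is not known and constraint 2 is a concern, one possible alternative to the regex filter is to keep a line only when every token on it parses as a number. This is a different technique from the answer's regex, shown here only as a sketch; the helper name `parse_numeric_row` and the sample lines are made up for the example. Note that `float()` also accepts tokens such as "nan" and "inf", and a header line made up entirely of numbers would still be taken for data.

```python
def parse_numeric_row(line):
    """Return the line's values as floats, or None if any token is not a number."""
    tokens = line.split()
    if not tokens:
        return None  # blank line
    try:
        return [float(token) for token in tokens]
    except ValueError:
        return None  # at least one token is not a valid number


lines = [
    "00:30:44\n",  # no letters and has digits, so the regex filter would keep it
    "----------\n",
    "\n",
    "964.17 -0.134935857 5.150366012E-7\n",
]

# Keep only the lines where every token could be converted.
data = [row for row in map(parse_numeric_row, lines) if row is not None]
print(data)
```

Constraint 1 still applies: a data row that contains a text column is dropped entirely.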

I have implemented a sample script for you. The code contains comments in the relevant parts.

Code:

# Open the related file.
with open("ci/common/python_utils/test_text.txt", "r") as opened_file:
    # Read all lines of the file. readlines() returns a list of strings.
    lines = opened_file.readlines()

# Cut the unrelated lines (the 13-line header).
related_content = lines[13:]
# Init the vectors (lists).
col1, col2, col6 = [], [], []
for row in related_content:
    # Split each row once.
    fields = row.split()
    if not fields:
        continue  # Skip blank lines, e.g. at the end of the file.
    # Use the expected column number minus 1 for the list indexing.
    col1.append(fields[0])
    col2.append(fields[1])
    col6.append(fields[5])

# Print the content of the columns.
print("First column content: {}".format(col1))
print("Second column content: {}".format(col2))
print("Sixth column content: {}".format(col6))

The txt file used:

Name: Z1836_Tb10-TbCoTb_DL_MzDown_FS_Phi-90∞_I-665uA_Offs-0uA_Avg-5s_(0)_S1.dat
Date: Samstag, 1. Juni 2019 - new scaling
Scan Type: Field Scan
Angle [∞]: 90
Current [uA]: 665
Frequency [Hz]: 10
Offset [uA]: 0
Sampling Rate [Hz]: 204800
Averaging Duration [s]: 5
Measurement Duration: 00:30:44
----------
Field   R+(1f) Real    R+(1f) Img R+(1f) Mag R+(1f) Phase   R+(2f) Real    R+(2f) Img R+(2f) Mag R+(2f) Phase   Field Set
mT  g(W)   g(W)   g(W)   ∞   g(W)   g(W)   g(W)   ∞   A
1019.14 -0.135007229    0.015354704 -0.135877588    173.51149082    -2.776103401E-6 -2.996982259E-6 -4.085174752E-6 -132.808926217     20
1000.95 -0.134959525    0.015398131 -0.135835105    173.491016631   -1.41565391E-5  4.583223348E-6  -1.487997096E-5 162.060479267    19.6
982.67  -0.134951253    0.015396305 -0.13582668 173.491386228   1.196964996E-5  -3.522605161E-6 1.247722995E-5  -16.398879321    19.2
964.17  -0.134935857    0.015381909 -0.135809751    173.496684402   5.150366012E-7  -2.854084284E-5 2.854548953E-5  -88.966175556    18.8
945.27  -0.134941957    0.015372557 -0.135814754    173.500895888   -7.177408364E-6 -2.703168168E-5 -2.796832146E-5 -104.869975658   18.4
926.12  -0.134916606    0.01535581  -0.13578767 173.506706039   -1.599523307E-5 1.704176103E-5  -2.337240039E-5 133.18562213       18
906.81  -0.134895719    0.015356654 -0.135767013    173.505355181   -7.367897986E-6 2.807593732E-6  -7.884700584E-6 159.14027614     17.6
887.36  -0.134877099    0.015409203 -0.135754468    173.482430298   -1.011942317E-5 -1.

Output:

>>> python ci/common/python_utils/test_file.py 
First column content: ['1019.14', '1000.95', '982.67', '964.17', '945.27', '926.12', '906.81', '887.36']
Second column content: ['-0.135007229', '-0.134959525', '-0.134951253', '-0.134935857', '-0.134941957', '-0.134916606', '-0.134895719', '-0.134877099']
Sixth column content: ['-2.776103401E-6', '-1.41565391E-5', '1.196964996E-5', '5.150366012E-7', '-7.177408364E-6', '-1.599523307E-5', '-7.367897986E-6', '-1.011942317E-5']
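The script above hard-codes the 13 header lines. If files from the same instrument can have a different number of metadata lines, the start of the data could instead be located from the ---------- separator. The sketch below assumes that the separator is always followed by exactly two column-header rows, as in the sample file. The function name `read_columns` and its parameters are hypothetical, and the sample lines are shortened to six columns.

```python
def read_columns(lines, wanted=(0, 1, 5), separator="----------", header_rows=2):
    """Return the wanted columns (0-based indexes) as lists of floats.

    Assumes the data starts `header_rows` lines after the separator line.
    """
    # Find the separator line, then skip it and the column-header rows.
    start = next(i for i, line in enumerate(lines) if line.strip() == separator)
    start += 1 + header_rows

    columns = [[] for _ in wanted]
    for line in lines[start:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        for column, index in zip(columns, wanted):
            column.append(float(fields[index]))
    return columns


lines = [
    "Scan Type: Field Scan\n",
    "Frequency [Hz]: 10\n",
    "----------\n",
    "Field R+(1f) Real ...\n",
    "mT g(W) ...\n",
    "1019.14 -0.135007229 0.015354704 -0.135877588 173.51149082 -2.776103401E-6\n",
    "1000.95 -0.134959525 0.015398131 -0.135835105 173.491016631 -1.41565391E-5\n",
    "\n",
]

col1, col2, col6 = read_columns(lines)
print(col1)
print(col2)
print(col6)
```

With a real file, `lines` would come from `readlines()` as in the script above. If no separator line is found, `next()` raises `StopIteration`, which makes a change in the file layout easy to notice.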
