Selecting index values with pandas/numpy


A Python question. Below is a formatted table (the asterisks are only there to draw attention; they are not really in the table):
Step  Time          Apple_price         fluctuation 
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43   *-6442.899477        5.8484*
Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    1 18:21:53    -6442.540975       10.6048
BFGS:    2 18:21:53    -6443.107163        7.6685
BFGS:    3 18:21:53    -6443.565044        6.2186
BFGS:    4 18:21:54    *-6443.954663        5.7485*
Step     Time      Apple_price         fluctuation
BFGS:    0 18:27:00    -6440.611426       24.6802
BFGS:    1 18:27:00    -6441.602767       21.3009
BFGS:    2 18:27:00    -6442.446886       15.6698
BFGS:    3 18:27:01    -6443.084822       11.6312
BFGS:    4 18:27:01    -6443.582671        8.6795
BFGS:    5 18:27:01    -6444.019236        7.4906
BFGS:    6 18:27:01    -6444.389951        6.7435
BFGS:    7 18:27:02   *-6444.732455        6.5221*

I want to extract the values between the "*" marks, as shown below:

-6442.899477        5.8484
-6443.954663        5.7485
-6444.732455        6.5221

My code is as follows:

import pandas as pd
import numpy as np

all_lines = []
file_name = input("What's the file name with extension?: ")
with open(f'{file_name}', 'r') as file:
    for each_line in file:
        all_lines.append(each_line.strip())

#print(all_lines)
for j in all_lines:
    if j == 0:
        j = j + 1
    if 'fluctuation' in i:
        all_lines.index(j-1)
print(j)

Unfortunately, the output is only the first line of the answer:

-6442.899477 5.8484

Please let me know how to extract the values at certain indices of a list.
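One possible way to pick list items by index, as a rough sketch only. It assumes every block starts with a header line containing "fluctuation" and that the wanted row is the one just before the next header (or the very last line). The function name `last_rows_before_headers` and the shortened `sample` list are made up for illustration.

```python
def last_rows_before_headers(lines, header_word="fluctuation"):
    """Return (price, fluctuation) for the last data row of each block."""
    if not lines:
        return []
    # Indices of every header line.
    header_idx = [i for i, line in enumerate(lines) if header_word in line]
    # The row just before each later header closes a block;
    # the final line of the input closes the last block.
    end_idx = [i - 1 for i in header_idx if i > 0] + [len(lines) - 1]
    rows = []
    for i in end_idx:
        # Drop any asterisks, then keep the last two whitespace-separated fields.
        fields = lines[i].replace("*", " ").split()
        rows.append((float(fields[-2]), float(fields[-1])))
    return rows


# Shortened, hypothetical sample in the same layout as the table above.
sample = [
    "Step  Time          Apple_price         fluctuation",
    "BFGS:    0 18:21:43    -6442.333161        7.4744",
    "BFGS:    1 18:21:43    -6442.899477        5.8484",
    "Step     Time       Apple_price         fluctuation",
    "BFGS:    0 18:21:53    -6441.911200       16.3190",
    "BFGS:    4 18:21:54    -6443.954663        5.7485",
]
print(last_rows_before_headers(sample))
```

The key point is that `enumerate` gives the position of each header, so "the line before it" is simply `lines[i - 1]`; comparing the line itself to 0, as in the loop above, never matches because the items are strings.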

Import the regular expression module:

import re

Prepare the data:

text = """   Step  Time          Apple_price         fluctuation 
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43   *-6442.899477        5.8484*
Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    1 18:21:53    -6442.540975       10.6048
BFGS:    2 18:21:53    -6443.107163        7.6685
BFGS:    3 18:21:53    -6443.565044        6.2186
BFGS:    4 18:21:54    *-6443.954663        5.7485*
Step     Time      Apple_price         fluctuation
BFGS:    0 18:27:00    -6440.611426       24.6802
BFGS:    1 18:27:00    -6441.602767       21.3009
BFGS:    2 18:27:00    -6442.446886       15.6698
BFGS:    3 18:27:01    -6443.084822       11.6312
BFGS:    4 18:27:01    -6443.582671        8.6795
BFGS:    5 18:27:01    -6444.019236        7.4906
BFGS:    6 18:27:01    -6444.389951        6.7435
BFGS:    7 18:27:02   *-6444.732455        6.5221*"""

Define the regular expression: the characters that may appear between the asterisks

p = re.compile(r'\*[- 0-9.]*\*')

Match the regular expression against the text:

a = p.findall(text)

a is the list of matches. Use enumerate to get each index and its content:

for k, v in enumerate(a):
    print(k, v)

Output:

0 *-6442.899477        5.8484*
1 *-6443.954663        5.7485*
2 *-6444.732455        6.5221*
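If numbers are needed rather than strings, a variant with capture groups might help. This is only a sketch and assumes the asterisks really are present in the text, as in the sample above. The names `pattern`, `starred_values` and the shortened `sample` are illustrative, not from the original post.

```python
import re

# Escaped asterisks delimit the row; each group captures one signed decimal number.
pattern = re.compile(r"\*\s*(-?\d+\.\d+)\s+(-?\d+\.\d+)\s*\*")


def starred_values(text):
    """Return (price, fluctuation) float pairs for every starred row in text."""
    return [(float(a), float(b)) for a, b in pattern.findall(text)]


# Shortened, hypothetical sample with two starred rows.
sample = (
    "BFGS:    0 18:21:43    -6442.333161        7.4744\n"
    "BFGS:    1 18:21:43   *-6442.899477        5.8484*\n"
    "BFGS:    4 18:21:54    *-6443.954663        5.7485*"
)
print(starred_values(sample))
```

With groups in the pattern, `findall` returns tuples of the captured parts only, so the asterisks and the padding between the columns are dropped.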

I think I found a simple solution:

1- In bash:

awk '{$1=$2=$3=""; print $0}' filename.out > filename.out2

2- Type "Error" on the last line of the file

3- Run the following code:

import numpy as np
import pandas as pd

all_lines = []
# Use a context manager so the file is closed after reading
with open('filename.out2', 'r') as f:
    for each_line in f:
        all_lines.append(each_line.strip())

#for j in all_lines:
#    print(j)

df = pd.DataFrame(all_lines)
count_row = df.shape[0]         # Gives number of rows
print("count_row=", count_row)
count_col = df.shape[1]         # Gives number of columns
print("count_col=", count_col)

max_sw = 'Error'
lines = [i for i in range(len(all_lines)) if all_lines[i] == max_sw]
#print([i for i in range(len(all_lines)) if all_lines[i]== max_sw])
print(lines)

lines2 = []
for i in lines:
    i = i - 1
    lines2.append(i)
print(lines2)

lines3 = []
for i in lines2:
    if i != -1:
        #     print(i)
        #      lines3 = [i for i in all_lines[i]]
        #      return
        lines3.append(all_lines[i])
print(lines3)

4- The result:

count_row= 19
count_col= 1
[0, 3, 9, 18]
[-1, 2, 8, 17]
['-6442.899477 5.8484', '-6443.954663 5.7485', '-6444.732455 6.5221']
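The awk step and the hand-typed "Error" line could probably be avoided by doing everything in one pass in Python. The following is a sketch under a few assumptions: header lines contain the word "fluctuation", the first three fields of each data row (label, step, time) are not needed, and the name `last_row_per_block` and the shortened `sample` are hypothetical.

```python
def last_row_per_block(lines, header_word="fluctuation"):
    """Yield the last data row of each block as a list of floats.

    The first three fields are dropped, mirroring what the awk command does.
    """
    previous = None  # most recent data row seen in the current block
    for line in lines:
        fields = line.replace("*", " ").split()
        if not fields:
            continue  # skip blank lines
        if header_word in fields:
            # A header starts a new block, so the previous row closed the old one.
            if previous is not None:
                yield previous
            previous = None
        else:
            previous = [float(x) for x in fields[3:]]
    if previous is not None:
        yield previous  # the last block has no header after it


# Shortened, hypothetical sample; a real run would pass an open file instead.
sample = """Step  Time          Apple_price         fluctuation
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43    -6442.899477        5.8484
Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    4 18:21:54    -6443.954663        5.7485""".splitlines()

print(list(last_row_per_block(sample)))
```

Because the function only iterates over lines, it should also work on a file object directly, for example `with open('filename.out') as f: rows = list(last_row_per_block(f))`.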

In any case, I welcome any further help.
