Conditionally changing rows and columns in a CSV file



I have data like this:

idx A B C D
0 0.0 0.0 0.0 apple
1 0.5 0.5 0.6 car
2 0.7 0.7 0.2 vegetables
3 0.8 0.9 0.4 fruits
4 0.9 1.0 0.8 metal
idx E 
0 0.000006
idx A B C D
0 1.0 1.1 0.1 computer
1 0.8 1.6 1.0 books
2 0.9 1.9 1.1 textile
idx E
0 1.000009
idx A B C D
0 0.7 2.5 2 mouse
1 0.6 2.9 3 animals
2 0.5 3.0 2 birds
3 0.9 3.3 4 flower
4 1.0 3.4 5 garden
5 1.0 3.8 1 desk
6 0.85 3.9 8 tea
7 0.2 4.2 9 bread
8 0.1 4.9 3 paper
9 0.7 7.6 6 butter
idx E
0 0.9

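Wherever there is an idx E line, I want to remove the repeated headers, repeat the last row above it with a dot instead of the column D value, and turn E into a column that holds its value, repeated for every row of the corresponding block. I want to make this change conditionally with Python, so the result looks like this: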

idx A B C D E
0 0.0 0.0 0.0 apple 0.000006
1 0.5 0.5 0.6 car 0.000006
2 0.7 0.7 0.2 vegetables 0.000006
3 0.8 0.9 0.4 fruits 0.000006
4 0.9 1.0 0.8 metal 0.000006
5 0.9 1.0 0.0 . 0.000006
6 1.0 1.1 0.1 computer 1.000009
7 0.8 1.6 1.0 books 1.000009
8 0.9 1.9 1.1 textile 1.000009
9 0.9 1.9 0.0 . 1.000009
10 0.7 2.5 2 mouse 0.9
11 0.6 2.9 3 animals 0.9
12 0.5 3.0 2 birds 0.9
13 0.9 3.3 4 flower 0.9
14 1.0 3.4 5 garden 0.9
15 1.0 3.8 1 desk 0.9
16 0.85 3.9 8 tea 0.9
17 0.2 4.2 9 bread 0.9
18 0.1 4.9 3 paper 0.9
19 0.7 7.6 6 butter 0.9
20 0.7 7.6 0.0 . 0.9

Is it possible to do this with a conditional loop, or with a DataFrame like this?

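First remove the repeated header rows, i.e. the rows where column A holds the string A, using Series.isin and boolean indexing with the inverted mask (~), and create a default index. The idx E rows (where A is E) must be kept, because the next step uses them as markers:

df = df[~df['A'].isin(['A'])].reset_index(drop=True)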
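The answer does not show how df is built. A minimal sketch, assuming the data is a whitespace-delimited text file read with pd.read_csv(sep=r'\s+', index_col=0); here an io.StringIO holding the first two blocks of the sample stands in for the file. It shows why filtering on column A works: a repeated idx A B C D header becomes a row whose A value is the string 'A', while an idx E line becomes a row whose A value is 'E' and is kept.

```python
import io

import pandas as pd

# First two blocks of the sample data; io.StringIO stands in for the real file.
raw = """idx A B C D
0 0.0 0.0 0.0 apple
1 0.5 0.5 0.6 car
idx E
0 0.000006
idx A B C D
0 1.0 1.1 0.1 computer
idx E
0 1.000009
"""

# Assumption: whitespace-delimited input, with the idx column used as the index.
df = pd.read_csv(io.StringIO(raw), sep=r'\s+', index_col=0)

# Repeated headers are rows with A == 'A': drop them, keep the 'E' marker rows.
df = df[~df['A'].isin(['A'])].reset_index(drop=True)
print(df)
```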

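Then build two masks: m marks the row that follows each idx E row (the row holding the E value) and m1 marks the idx E rows themselves. Series.where keeps only the E values (everything else becomes NaN) and bfill back-fills them into a new column E; DataFrame.mask sets A, B and C to missing values in the marker rows and ffill forward-fills them from the last data row of the block. Finally, D is set to . and C to 0 in the repeated rows, and the idx E rows are dropped: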

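m = df['A'].shift().eq('E')      # rows holding the E value (the row after "idx E")
m1 = df['A'].eq('E')             # the "idx E" rows themselves
df['E'] = df['A'].where(m).bfill()                             # E value back-filled over its block
df[['A','B', 'C']] = df[['A','B', 'C']].mask(m | m1).ffill()   # marker rows copy the last data row
df.loc[m, 'D'] = '.'             # dot instead of the D value
df.loc[m, 'C'] = 0               # C set to 0 in the repeated row
df = df[~m1].reset_index(drop=True)                           # drop the "idx E" rows
print(df)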
       A    B    C           D         E
0    0.0  0.0  0.0       apple  0.000006
1    0.5  0.5  0.6         car  0.000006
2    0.7  0.7  0.2  vegetables  0.000006
3    0.8  0.9  0.4      fruits  0.000006
4    0.9  1.0  0.8       metal  0.000006
5    0.9  1.0    0           .  0.000006
6    1.0  1.1  0.1    computer  1.000009
7    0.8  1.6  1.0       books  1.000009
8    0.9  1.9  1.1     textile  1.000009
9    0.9  1.9    0           .  1.000009
10   0.7  2.5    2       mouse       0.9
11   0.6  2.9    3     animals       0.9
12   0.5  3.0    2       birds       0.9
13   0.9  3.3    4      flower       0.9
14   1.0  3.4    5      garden       0.9
15   1.0  3.8    1        desk       0.9
16  0.85  3.9    8         tea       0.9
17   0.2  4.2    9       bread       0.9
18   0.1  4.9    3       paper       0.9
19   0.7  7.6    6      butter       0.9
20   0.7  7.6    0           .       0.9
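The steps above can be collected into one function and run on the full sample. This is a sketch assuming the same whitespace-delimited input read with pd.read_csv; reshape_blocks is a hypothetical helper name, not part of pandas. Only the 'A' header rows are filtered out, since the masks need the 'E' rows.

```python
import io

import pandas as pd


def reshape_blocks(text):
    """Apply the answer's steps to whitespace-delimited text (hypothetical helper)."""
    df = pd.read_csv(io.StringIO(text), sep=r'\s+', index_col=0)
    df = df[~df['A'].isin(['A'])].reset_index(drop=True)   # drop repeated headers
    m = df['A'].shift().eq('E')                             # rows holding the E value
    m1 = df['A'].eq('E')                                    # the "idx E" rows
    df['E'] = df['A'].where(m).bfill()                      # E value over its block
    df[['A', 'B', 'C']] = df[['A', 'B', 'C']].mask(m | m1).ffill()
    df.loc[m, 'D'] = '.'
    df.loc[m, 'C'] = 0
    return df[~m1].reset_index(drop=True)


sample = """idx A B C D
0 0.0 0.0 0.0 apple
1 0.5 0.5 0.6 car
2 0.7 0.7 0.2 vegetables
3 0.8 0.9 0.4 fruits
4 0.9 1.0 0.8 metal
idx E
0 0.000006
idx A B C D
0 1.0 1.1 0.1 computer
1 0.8 1.6 1.0 books
2 0.9 1.9 1.1 textile
idx E
0 1.000009
idx A B C D
0 0.7 2.5 2 mouse
1 0.6 2.9 3 animals
2 0.5 3.0 2 birds
3 0.9 3.3 4 flower
4 1.0 3.4 5 garden
5 1.0 3.8 1 desk
6 0.85 3.9 8 tea
7 0.2 4.2 9 bread
8 0.1 4.9 3 paper
9 0.7 7.6 6 butter
idx E
0 0.9
"""

result = reshape_blocks(sample)
print(result)
```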

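I wouldn't use pandas here; I would fall back to the good old csv module, which I find more versatile and which can process files that are not really in CSV format: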

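import csv

input_csv = 'input.csv'     # placeholder: put here the actual input file name
output_csv = 'output.csv'   # placeholder: put here the actual output file name
delimiter = ' '             # put here the actual delimiter

with open(input_csv, newline='') as infile, open(output_csv, 'w', newline='') as outfile:
    rd = csv.reader(infile, delimiter=delimiter)
    wr = csv.writer(outfile, delimiter=delimiter)
    wr.writerow(['idx', 'A', 'B', 'C', 'D', 'E'])    # write a header
    nb = 0                                           # record number
    pool = []                                        # rows of the current block
    flag = False                                     # True when the next row holds the E value

    for row in rd:
        if flag:
            e = row[-1]           # use last value
            for r in pool:        # copy it for the whole block
                r.append(e)
            wr.writerows(pool)    # write the block
            flag = False
        elif row[0] == 'idx':
            if row[1] == 'E':
                pool.append(pool[-1][:])   # repeat the last row of the block
                pool[-1][-1] = '.'         # dot instead of the D value
                pool[-1][0] = nb           # renumber the repeated row
                nb += 1
                flag = True
            else:
                pool = []                  # an "idx A B C D" header starts a new block
        else:
            row[0] = nb                    # renumber the data row
            pool.append(row)
            nb += 1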

With your input, it gives:

idx A B C D E
0 0.0 0.0 0.0 apple 0.000006
1 0.5 0.5 0.6 car 0.000006
2 0.7 0.7 0.2 vegetables 0.000006
3 0.8 0.9 0.4 fruits 0.000006
4 0.9 1.0 0.8 metal 0.000006
5 0.9 1.0 0.8 . 0.000006
6 1.0 1.1 0.1 computer 1.000009
7 0.8 1.6 1.0 books 1.000009
8 0.9 1.9 1.1 textile 1.000009
9 0.9 1.9 1.1 . 1.000009
10 0.7 2.5 2 mouse 0.9
11 0.6 2.9 3 animals 0.9
12 0.5 3.0 2 birds 0.9
13 0.9 3.3 4 flower 0.9
14 1.0 3.4 5 garden 0.9
15 1.0 3.8 1 desk 0.9
16 0.85 3.9 8 tea 0.9
17 0.2 4.2 9 bread 0.9
18 0.1 4.9 3 paper 0.9
19 0.7 7.6 6 butter 0.9
20 0.7 7.6 6 . 0.9
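The loop above writes straight to files. To try it in memory, here is a sketch that wraps the same logic in a function over file-like objects (io.StringIO here); convert_blocks and its dot_c parameter are hypothetical names. When dot_c is given, it replaces column C in the repeated row, e.g. '0.0' to match the question's expected output; by default the last C value is copied, as in the output above.

```python
import csv
import io


def convert_blocks(infile, outfile, delimiter=' ', dot_c=None):
    """Same block logic as the loop above, on any file-like objects (hypothetical helper)."""
    rd = csv.reader(infile, delimiter=delimiter)
    wr = csv.writer(outfile, delimiter=delimiter, lineterminator='\n')
    wr.writerow(['idx', 'A', 'B', 'C', 'D', 'E'])
    nb, pool, flag = 0, [], False
    for row in rd:
        if flag:                          # the row after "idx E" carries the E value
            for r in pool:
                r.append(row[-1])
            wr.writerows(pool)
            flag = False
        elif row[0] == 'idx':
            if row[1] == 'E':             # end of a block: repeat its last row
                last = pool[-1][:]
                last[0], last[-1] = nb, '.'
                if dot_c is not None:
                    last[3] = dot_c       # optional value for column C
                pool.append(last)
                nb += 1
                flag = True
            else:                         # "idx A B C D": a new block starts
                pool = []
        else:                             # data row: renumber and keep it
            row[0] = nb
            pool.append(row)
            nb += 1


sample = (
    "idx A B C D\n0 0.0 0.0 0.0 apple\n1 0.9 1.0 0.8 metal\n"
    "idx E\n0 0.000006\n"
    "idx A B C D\n0 1.0 1.1 0.1 computer\n"
    "idx E\n0 1.000009\n"
)
out = io.StringIO()
convert_blocks(io.StringIO(sample), out, dot_c='0.0')
print(out.getvalue())
```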
