Python: why can't I copy strings from one df to another df?



I'm a beginner and self-taught.

I have this initial matrix:

,,1,2,3,4,5,6,7,8,9,10,11,12
,,C,O,O,C,H,H,H,C,C,H,H,H
1,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,3.22738313640333,2.640130058756468,2.6401484355574363,1.4784953771865779,2.4427526711622995,3.4404701049315856,2.6506415109695562,2.173942147030341
2,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,3.702905546101439,2.6354536594179083,2.6355724561170515,2.3918864536893496,2.871975783234887,3.9479515489105172,2.5936449600745437,3.2896946757332293
3,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.015476882415432,2.0693088134923188,2.0692958839669946,2.3236193736523485,3.560975969980456,4.431347320573397,3.951843753512012,2.4366421143893597
4,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,1.0868846056358739,1.0921261760040055,1.092126228351473,3.6419246237091034,4.772348473634059,5.725281935435472,4.948741644534887,3.855293676517857
5,H,3.22738313640333,3.702905546101439,2.015476882415432,1.0868846056358739,0.0,1.7916118321336392,1.7916073980710447,4.336840746006843,5.570012200282658,6.44436962662531,5.876935928592363,4.304036910039309
6,H,2.640130058756468,2.6354536594179083,2.0693088134923188,1.0921261760040055,1.7916118321336392,0.0,1.774322615322816,3.999843247699306,5.001451201004137,5.992370839831868,5.038926795069471,4.349546588337786
7,H,2.6401484355574363,2.6355724561170515,2.0692958839669946,1.092126228351473,1.7916073980710447,1.774322615322816,0.0,3.9999029642804302,5.001556219427222,5.992449776200327,5.039085741282741,4.349558376763068
8,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,4.336840746006843,3.999843247699306,3.9999029642804302,0.0,1.324770443414403,2.107792016824585,2.085364895492881,1.079295724832157
9,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,5.570012200282658,5.001451201004137,5.001556219427222,1.324770443414403,0.0,1.0763707503087891,1.0781013610472885,2.1192372863195152
10,H,3.4404701049315856,3.9479515489105172,4.431347320573397,5.725281935435472,6.44436962662531,5.992370839831868,5.992449776200327,2.107792016824585,1.0763707503087891,0.0,1.8418880170159488,2.4949700018092598
11,H,2.6506415109695562,2.5936449600745437,3.951843753512012,4.948741644534887,5.876935928592363,5.038926795069471,5.039085741282741,2.085364895492881,1.0781013610472885,1.8418880170159488,0.0,3.067298402780731
12,H,2.173942147030341,3.2896946757332293,2.4366421143893597,3.855293676517857,4.304036910039309,4.349546588337786,4.349558376763068,1.079295724832157,2.1192372863195152,2.4949700018092598,3.067298402780731,0.0

I want to get:

,,1.0,2.0,3.0,4.0,8.0,9.0
,,C,O,O,C,C,C
1.0,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,1.4784953771865779,2.4427526711622995
2.0,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,2.3918864536893496,2.871975783234887
3.0,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.3236193736523485,3.560975969980456
4.0,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,3.6419246237091034,4.772348473634059
8.0,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,0.0,1.324770443414403
9.0,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,1.324770443414403,0.0

What I tried:

aa=0
bb=0
for i in range(0,a):
    if matrix.iloc[i,1] != "H":
        for j in range(0,a):
            if matrix.iloc[1,j] != "H":
                #########################
                d_nH2.iloc[i-aa][j-bb]=matrix.iloc[i,j]
                #########################
            else:
                bb=bb+1
        bb=0
    else:
        aa=aa+1

What I got:

     0    1         2         3         4         5         6         7
0  NaN  NaN  1.000000  2.000000  3.000000  4.000000  8.000000  9.000000
1  NaN  NaN  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000
2  1.0  0.0  0.000000  1.205475  1.342932  2.343014  1.478495  2.442753
3  2.0  0.0  1.205475  0.000000  2.245468  2.644316  2.391886  2.871976
4  3.0  0.0  1.342932  2.245468  0.000000  1.418916  2.323619  3.560976
5  4.0  0.0  2.343014  2.644316  1.418916  0.000000  3.641925  4.772348
6  8.0  0.0  1.478495  2.391886  2.323619  3.641925  0.000000  1.324770
7  9.0  0.0  2.442753  2.871976  3.560976  4.772348  1.324770  0.000000

The rows and columns containing letters are not copied. Why?

I need to work on this mixed table of numbers and letters and then do the following:

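  1. Get a new matrix from the initial one by filtering out "H".
  2. For the new table, I want to write two nested loops: for each letter != 'H', count how many times that letter's row meets an "H" column with a value below 1.6.
  3. What I had in mind was to treat any table the way I would in bash: a mix of strings and numbers where every value is easy to access.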

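You should avoid treating your data like structures from other languages (such as bash). A key point of using numpy/pandas is that you can vectorize operations: most of the time your code will be shorter, simpler and faster. Every time you think you need to loop over values, ask yourself whether there is a simpler vectorized way.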
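A minimal sketch of that advice applied to the question's own loop: two boolean masks replace the nested loop and the aa/bb counters. It assumes the file is read with header=None, so the label row and label column stay ordinary data as in the question, and to keep it short it uses only atoms 3, 4, 5, 6, 9 and 10 of the question's matrix (the CSV excerpt is copied from it).

```python
import io

import pandas as pd

# Excerpt of the question's matrix (atoms 3, 4, 5, 6, 9, 10), labels kept as plain data
CSV = """,,3,4,5,6,9,10
,,O,C,H,H,C,H
3,O,0.0,1.418915551312475,2.015476882415432,2.0693088134923188,3.560975969980456,4.431347320573397
4,C,1.418915551312475,0.0,1.0868846056358739,1.0921261760040055,4.772348473634059,5.725281935435472
5,H,2.015476882415432,1.0868846056358739,0.0,1.7916118321336392,5.570012200282658,6.44436962662531
6,H,2.0693088134923188,1.0921261760040055,1.7916118321336392,0.0,5.001451201004137,5.992370839831868
9,C,3.560975969980456,4.772348473634059,5.570012200282658,5.001451201004137,0.0,1.0763707503087891
10,H,4.431347320573397,5.725281935435472,6.44436962662531,5.992370839831868,1.0763707503087891,0.0
"""

raw = pd.read_csv(io.StringIO(CSV), header=None)

# Column 1 holds the element of each row, row 1 holds the element of each column.
# The top-left label cells are NaN, and NaN != "H" is True,
# so the two label rows and the two label columns are kept automatically.
keep_rows = raw.iloc[:, 1] != "H"
keep_cols = raw.iloc[1] != "H"

no_h = raw.loc[keep_rows, keep_cols]
print(no_h)
```

The distances in no_h are still stored as strings, because every column mixes a label with numbers; moving the labels into a MultiIndex, as the answer describes next, avoids that.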

Input data

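The first two columns and the first two rows should be treated as a (multi-)index (pd.MultiIndex), which is the most logical choice semantically. That way the data is just floats, and selecting rows/columns by their index labels is much easier. Think about what your data really represents: here, a matrix of numbers with labels.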
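A hand-built sketch of that structure, using pd.MultiIndex.from_arrays instead of read_csv and the same six-atom excerpt of the question's matrix (atoms 3, 4, 5, 6, 9, 10); the variable names ids, elements and values are only illustrative. Both axes carry (atom number, element) labels, so "carbon rows against hydrogen columns" becomes a label lookup with xs instead of index arithmetic.

```python
import pandas as pd

# Excerpt of the question's matrix: atom numbers, element symbols, distances
ids = [3, 4, 5, 6, 9, 10]
elements = ["O", "C", "H", "H", "C", "H"]
values = [
    [0.0, 1.418915551312475, 2.015476882415432, 2.0693088134923188, 3.560975969980456, 4.431347320573397],
    [1.418915551312475, 0.0, 1.0868846056358739, 1.0921261760040055, 4.772348473634059, 5.725281935435472],
    [2.015476882415432, 1.0868846056358739, 0.0, 1.7916118321336392, 5.570012200282658, 6.44436962662531],
    [2.0693088134923188, 1.0921261760040055, 1.7916118321336392, 0.0, 5.001451201004137, 5.992370839831868],
    [3.560975969980456, 4.772348473634059, 5.570012200282658, 5.001451201004137, 0.0, 1.0763707503087891],
    [4.431347320573397, 5.725281935435472, 6.44436962662531, 5.992370839831868, 1.0763707503087891, 0.0],
]

# Both axes get the same two-level labels: (atom number, element symbol)
labels = pd.MultiIndex.from_arrays([ids, elements])
df = pd.DataFrame(values, index=labels, columns=labels)

# Carbon rows (level 1 == "C"), then hydrogen columns (level 1 == "H").
# xs drops the matched level, so only the atom numbers remain as labels.
c_to_h = df.xs("C", level=1).xs("H", axis=1, level=1)
print(c_to_h)
```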

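Below, the data is imported from a string, but you will probably want to import it directly from a csv file by replacing the io.StringIO(...) block with your file name: pd.read_csv('file.csv', index_col=[0,1], header=[0,1])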

import io
import pandas as pd
df = pd.read_csv(io.StringIO(''',,1,2,3,4,5,6,7,8,9,10,11,12
,,C,O,O,C,H,H,H,C,C,H,H,H
1,C,0.0,1.205475107329386,1.3429319010227962,2.3430136323519886,3.22738313640333,2.640130058756468,2.6401484355574363,1.4784953771865779,2.4427526711622995,3.4404701049315856,2.6506415109695562,2.173942147030341
2,O,1.205475107329386,0.0,2.245467917547002,2.6443156030953032,3.702905546101439,2.6354536594179083,2.6355724561170515,2.3918864536893496,2.871975783234887,3.9479515489105172,2.5936449600745437,3.2896946757332293
3,O,1.3429319010227962,2.245467917547002,0.0,1.418915551312475,2.015476882415432,2.0693088134923188,2.0692958839669946,2.3236193736523485,3.560975969980456,4.431347320573397,3.951843753512012,2.4366421143893597
4,C,2.3430136323519886,2.6443156030953032,1.418915551312475,0.0,1.0868846056358739,1.0921261760040055,1.092126228351473,3.6419246237091034,4.772348473634059,5.725281935435472,4.948741644534887,3.855293676517857
5,H,3.22738313640333,3.702905546101439,2.015476882415432,1.0868846056358739,0.0,1.7916118321336392,1.7916073980710447,4.336840746006843,5.570012200282658,6.44436962662531,5.876935928592363,4.304036910039309
6,H,2.640130058756468,2.6354536594179083,2.0693088134923188,1.0921261760040055,1.7916118321336392,0.0,1.774322615322816,3.999843247699306,5.001451201004137,5.992370839831868,5.038926795069471,4.349546588337786
7,H,2.6401484355574363,2.6355724561170515,2.0692958839669946,1.092126228351473,1.7916073980710447,1.774322615322816,0.0,3.9999029642804302,5.001556219427222,5.992449776200327,5.039085741282741,4.349558376763068
8,C,1.4784953771865779,2.3918864536893496,2.3236193736523485,3.6419246237091034,4.336840746006843,3.999843247699306,3.9999029642804302,0.0,1.324770443414403,2.107792016824585,2.085364895492881,1.079295724832157
9,C,2.4427526711622995,2.871975783234887,3.560975969980456,4.772348473634059,5.570012200282658,5.001451201004137,5.001556219427222,1.324770443414403,0.0,1.0763707503087891,1.0781013610472885,2.1192372863195152
10,H,3.4404701049315856,3.9479515489105172,4.431347320573397,5.725281935435472,6.44436962662531,5.992370839831868,5.992449776200327,2.107792016824585,1.0763707503087891,0.0,1.8418880170159488,2.4949700018092598
11,H,2.6506415109695562,2.5936449600745437,3.951843753512012,4.948741644534887,5.876935928592363,5.038926795069471,5.039085741282741,2.085364895492881,1.0781013610472885,1.8418880170159488,0.0,3.067298402780731
12,H,2.173942147030341,3.2896946757332293,2.4366421143893597,3.855293676517857,4.304036910039309,4.349546588337786,4.349558376763068,1.079295724832157,2.1192372863195152,2.4949700018092598,3.067298402780731,0.0'''),
index_col=[0,1],
header=[0,1])

Now your DataFrame holds floats only; you can check this with df.dtypes, which will print float64 for every column.
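A quick check on the six-atom excerpt (atoms 3, 4, 5, 6, 9, 10, in the same two-header-row layout as the question's file) that read_csv with index_col=[0, 1] and header=[0, 1] leaves only floats in the cells and moves the element symbols into the index. This is a sketch on a subset, not the full matrix.

```python
import io

import pandas as pd

# Excerpt of the question's matrix in the original layout
CSV = """,,3,4,5,6,9,10
,,O,C,H,H,C,H
3,O,0.0,1.418915551312475,2.015476882415432,2.0693088134923188,3.560975969980456,4.431347320573397
4,C,1.418915551312475,0.0,1.0868846056358739,1.0921261760040055,4.772348473634059,5.725281935435472
5,H,2.015476882415432,1.0868846056358739,0.0,1.7916118321336392,5.570012200282658,6.44436962662531
6,H,2.0693088134923188,1.0921261760040055,1.7916118321336392,0.0,5.001451201004137,5.992370839831868
9,C,3.560975969980456,4.772348473634059,5.570012200282658,5.001451201004137,0.0,1.0763707503087891
10,H,4.431347320573397,5.725281935435472,6.44436962662531,5.992370839831868,1.0763707503087891,0.0
"""

# Two header rows -> MultiIndex columns; two index columns -> MultiIndex rows
df = pd.read_csv(io.StringIO(CSV), index_col=[0, 1], header=[0, 1])

print(df.dtypes)  # one float64 entry per column
print(df.index.get_level_values(1).tolist())  # element symbols of the rows
```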

How to filter out the 'H' rows/columns

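With the index this is very simple and needs no manual counting. The level option means you are matching values on the second level of the index (the second level is 1 in Python, because counting starts at 0).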

df.drop(columns='H', index='H', level=1)

Output:

            1         2         3         4         8         9
            C         O         O         C         C         C
1 C  0.000000  1.205475  1.342932  2.343014  1.478495  2.442753
2 O  1.205475  0.000000  2.245468  2.644316  2.391886  2.871976
3 O  1.342932  2.245468  0.000000  1.418916  2.323619  3.560976
4 C  2.343014  2.644316  1.418916  0.000000  3.641925  4.772348
8 C  1.478495  2.391886  2.323619  3.641925  0.000000  1.324770
9 C  2.442753  2.871976  3.560976  4.772348  1.324770  0.000000
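To make level concrete, a sketch on a hand-built six-atom excerpt of the question's matrix (atoms 3, 4, 5, 6, 9, 10; variable names illustrative): dropping "H" on level 1 (element symbol) and dropping the hydrogen atom numbers 5, 6 and 10 on level 0 remove the same rows and columns.

```python
import pandas as pd

# Excerpt of the question's matrix: atom numbers, element symbols, distances
ids = [3, 4, 5, 6, 9, 10]
elements = ["O", "C", "H", "H", "C", "H"]
values = [
    [0.0, 1.418915551312475, 2.015476882415432, 2.0693088134923188, 3.560975969980456, 4.431347320573397],
    [1.418915551312475, 0.0, 1.0868846056358739, 1.0921261760040055, 4.772348473634059, 5.725281935435472],
    [2.015476882415432, 1.0868846056358739, 0.0, 1.7916118321336392, 5.570012200282658, 6.44436962662531],
    [2.0693088134923188, 1.0921261760040055, 1.7916118321336392, 0.0, 5.001451201004137, 5.992370839831868],
    [3.560975969980456, 4.772348473634059, 5.570012200282658, 5.001451201004137, 0.0, 1.0763707503087891],
    [4.431347320573397, 5.725281935435472, 6.44436962662531, 5.992370839831868, 1.0763707503087891, 0.0],
]
labels = pd.MultiIndex.from_arrays([ids, elements])
df = pd.DataFrame(values, index=labels, columns=labels)

# Level 1 = element symbol: drop every row and column labelled "H"
by_element = df.drop(columns="H", index="H", level=1)

# Level 0 = atom number: dropping the hydrogens by number selects the same frame
hydrogens = [5, 6, 10]
by_number = df.drop(columns=hydrogens, index=hydrogens, level=0)

print(by_element)
print(by_element.equals(by_number))  # True
```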

How to count how many times a non-H row meets an H column with a value < 1.6

Here is the logic step by step (you only need to run the last step):

  1. Drop the H rows and the non-H columns: df.drop(index='H', columns=list('CO'), level=1)
            5         6         7        10        11        12
            H         H         H         H         H         H
1 C  3.227383  2.640130  2.640148  3.440470  2.650642  2.173942
2 O  3.702906  2.635454  2.635572  3.947952  2.593645  3.289695
3 O  2.015477  2.069309  2.069296  4.431347  3.951844  2.436642
4 C  1.086885  1.092126  1.092126  5.725282  4.948742  3.855294
8 C  4.336841  3.999843  3.999903  2.107792  2.085365  1.079296
9 C  5.570012  5.001451  5.001556  1.076371  1.078101  2.119237
  2. Check which values are < 1.6: (df.drop(index='H', columns=list('CO'), level=1)<1.6)
         5      6      7     10     11     12
         H      H      H      H      H      H
1 C  False  False  False  False  False  False
2 O  False  False  False  False  False  False
3 O  False  False  False  False  False  False
4 C   True   True   True  False  False  False
8 C  False  False  False  False  False   True
9 C  False  False  False   True   True  False
  3. Count the True values in each row: (df.drop(index='H', columns=list('CO'), level=1)<1.6).sum(axis=1)
1  C    0
2  O    0
3  O    0
4  C    3
8  C    1
9  C    2
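The columns=list('CO') part works here because C and O are the only non-H elements; if another element (say N) appeared, it would survive as a column. A possible alternative, sketched on a six-atom excerpt of the question's matrix (atoms 3, 4, 5, 6, 9, 10; names illustrative), builds boolean masks from the element level with get_level_values, so no symbols need to be listed.

```python
import pandas as pd

# Excerpt of the question's matrix: atom numbers, element symbols, distances
ids = [3, 4, 5, 6, 9, 10]
elements = ["O", "C", "H", "H", "C", "H"]
values = [
    [0.0, 1.418915551312475, 2.015476882415432, 2.0693088134923188, 3.560975969980456, 4.431347320573397],
    [1.418915551312475, 0.0, 1.0868846056358739, 1.0921261760040055, 4.772348473634059, 5.725281935435472],
    [2.015476882415432, 1.0868846056358739, 0.0, 1.7916118321336392, 5.570012200282658, 6.44436962662531],
    [2.0693088134923188, 1.0921261760040055, 1.7916118321336392, 0.0, 5.001451201004137, 5.992370839831868],
    [3.560975969980456, 4.772348473634059, 5.570012200282658, 5.001451201004137, 0.0, 1.0763707503087891],
    [4.431347320573397, 5.725281935435472, 6.44436962662531, 5.992370839831868, 1.0763707503087891, 0.0],
]
labels = pd.MultiIndex.from_arrays([ids, elements])
df = pd.DataFrame(values, index=labels, columns=labels)

# Boolean masks on the element level instead of listing the non-H symbols
rows_not_h = df.index.get_level_values(1) != "H"
cols_h = df.columns.get_level_values(1) == "H"

# For each non-H atom: how many H columns hold a distance below 1.6
counts = (df.loc[rows_not_h, cols_h] < 1.6).sum(axis=1)

# The drop-based version from the answer, for comparison on this excerpt
via_drop = (df.drop(index="H", columns=list("CO"), level=1) < 1.6).sum(axis=1)
print(counts)
```

On this excerpt both versions count 0, 2 and 1 for atoms 3 (O), 4 (C) and 9 (C).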

Since the matrix is symmetric, the transposed operation (df.drop(index=['C', 'O'], columns='H', level=1)<1.6).sum() gives exactly the same output.
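The transposed shortcut relies on that symmetry, so it can be worth checking first. A sketch on a six-atom excerpt of the question's matrix (atoms 3, 4, 5, 6, 9, 10; names illustrative) that tests symmetry with np.allclose and compares the two counts:

```python
import numpy as np
import pandas as pd

# Excerpt of the question's matrix: atom numbers, element symbols, distances
ids = [3, 4, 5, 6, 9, 10]
elements = ["O", "C", "H", "H", "C", "H"]
values = [
    [0.0, 1.418915551312475, 2.015476882415432, 2.0693088134923188, 3.560975969980456, 4.431347320573397],
    [1.418915551312475, 0.0, 1.0868846056358739, 1.0921261760040055, 4.772348473634059, 5.725281935435472],
    [2.015476882415432, 1.0868846056358739, 0.0, 1.7916118321336392, 5.570012200282658, 6.44436962662531],
    [2.0693088134923188, 1.0921261760040055, 1.7916118321336392, 0.0, 5.001451201004137, 5.992370839831868],
    [3.560975969980456, 4.772348473634059, 5.570012200282658, 5.001451201004137, 0.0, 1.0763707503087891],
    [4.431347320573397, 5.725281935435472, 6.44436962662531, 5.992370839831868, 1.0763707503087891, 0.0],
]
labels = pd.MultiIndex.from_arrays([ids, elements])
df = pd.DataFrame(values, index=labels, columns=labels)

# The shortcut is only valid if the matrix equals its transpose
arr = df.to_numpy()
print(np.allclose(arr, arr.T))  # True

# Non-H rows against H columns, summed per row ...
by_rows = (df.drop(index="H", columns=["C", "O"], level=1) < 1.6).sum(axis=1)
# ... versus H rows against non-H columns, summed per column
by_cols = (df.drop(index=["C", "O"], columns="H", level=1) < 1.6).sum()
print(by_rows.tolist(), by_cols.tolist())  # [0, 2, 1] [0, 2, 1]
```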
