Fixing the index when appending DataFrames

I am appending three CSVs:


import pandas as pd

df = pd.read_csv("places_1.csv")
temp = pd.read_csv("places_2.csv")
df = df.append(temp)
temp = pd.read_csv("places_3.csv")
df = df.append(temp)
print(df.head(20))

The combined table looks like this:

  location  device_count  population
0        A            11         NaN
1        B            12         NaN
2        C            13         NaN
3        D            14         NaN
4        E            15         NaN
0        F            21         NaN
1        G            22         NaN
2        H            23         NaN
3        I            24         NaN
4        J            25         NaN
0        K            31         NaN
1        L            32         NaN
2        M            33         NaN
3        N            34         NaN
4        O            35         NaN

As you can see, the index is not unique.

When I use iloc to fill the population column with device_count multiplied by 2:

df2 = df.copy()
for index, row in df.iterrows():
    df.iloc[index, df.columns.get_loc('population')] = row['device_count'] * 2

I get the following incorrect result:

  location  device_count  population
0        A            11        62.0
1        B            12        64.0
2        C            13        66.0
3        D            14        68.0
4        E            15        70.0
0        F            21         NaN
1        G            22         NaN
2        H            23         NaN
3        I            24         NaN
4        J            25         NaN
0        K            31         NaN
1        L            32         NaN
2        M            33         NaN
3        N            34         NaN
4        O            35         NaN

For every CSV, it overwrites the rows that came from the first CSV. I also tried creating a new integer column and calling df.set_index(). That did not work.

Any tips?

First, use ignore_index. Second, don't use append; use pd.concat([temp1, temp2, temp3], ignore_index=True).
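
A minimal sketch of that suggestion, assuming small in-memory frames in place of the three CSV files (the data here is made up for illustration; with real files each frame would come from pd.read_csv):

```python
import pandas as pd

# Stand-ins for places_1.csv, places_2.csv, places_3.csv (illustrative data only).
frames = [
    pd.DataFrame({"location": ["A", "B"], "device_count": [11, 12]}),
    pd.DataFrame({"location": ["F", "G"], "device_count": [21, 22]}),
    pd.DataFrame({"location": ["K", "L"], "device_count": [31, 32]}),
]

# ignore_index=True discards each frame's own 0..n-1 labels and
# gives the combined result a fresh 0..N-1 index.
df = pd.concat(frames, ignore_index=True)

print(df.index.tolist())
print(df.index.is_unique)
```

Because the labels now match the row positions, label-based and position-based lookups refer to the same rows.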

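As others have said, you can use ignore_index, and you should probably use pd.concat here. Alternatively, for other cases where you are not combining DataFrames, you can use df = df.reset_index(drop=True) to change the index after the fact.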
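
As a rough illustration of the reset_index route, here is a sketch that starts from a frame which already has duplicate labels (the data and the name fixed are hypothetical):

```python
import pandas as pd

# A frame whose index repeats, as happens after appending without ignore_index.
df = pd.DataFrame(
    {"location": ["A", "B", "F", "G"], "device_count": [11, 12, 21, 22]},
    index=[0, 1, 0, 1],
)

# drop=True throws the old labels away instead of keeping them as a new column.
fixed = df.reset_index(drop=True)

print(fixed.index.tolist())
```

Without drop=True, the old labels would be kept as an extra column named index.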

Also, you should avoid iterrows() for the reasons listed in the documentation. The following works better:

df.loc[:, 'population'] = df.loc[:, 'device_count'].astype('int') * 2
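
To see why the vectorized form sidesteps the original problem, here is a small sketch on a frame that still has duplicate labels. The data is illustrative, and plain column assignment is used as a variant of the .loc line above:

```python
import numpy as np
import pandas as pd

# Duplicate index labels, with population not yet filled in.
df = pd.DataFrame(
    {
        "location": ["A", "B", "F", "G"],
        "device_count": [11, 12, 21, 22],
        "population": [np.nan] * 4,
    },
    index=[0, 1, 0, 1],
)

# The whole column is computed and assigned in one step, so no
# per-row label is ever passed to iloc as if it were a position.
df["population"] = df["device_count"] * 2

print(df)
```

In the loop from the question, iterrows() yields index labels (0 to 4, three times over), while iloc expects positions, so only the first five rows were ever written and the last CSV's values won.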
