How to insert a new column into a DataFrame with data generated by iteration



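I'm trying to insert a column named Price Label after the Price column of my AppleStore apps DataFrame. My approach is to iterate over the DataFrame and append a string to each row: "Free" for apps with price = $0.00 and "Non-Free" otherwise. This is the code I tried: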

for i, row in df.iterrows():
    price = row.Price.replace('$','')
    if price == '0.0':
        row.append("Free")
    else:
        row.append("Non-Free")
df[column].append("price_label")   # in an attempt to add a header to column.

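But then I got the error message below. Does pandas have a specific way to append strings to a DataFrame Series/column? As always, I appreciate the community's help. You're the best.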


TypeError                                 Traceback (most recent call last)
<ipython-input-191-c6a90a84d57b> in <module>
6         row.append("Free")
7     else:
----> 8         row.append("Non-Free")
9 
10 df.head()
~\anaconda3\lib\site-packages\pandas\core\series.py in append(self, to_append, ignore_index, verify_integrity)
2580             to_concat = [self, to_append]
2581         return concat(
-> 2582             to_concat, ignore_index=ignore_index, verify_integrity=verify_integrity
2583         )
2584 
~\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
279         verify_integrity=verify_integrity,
280         copy=copy,
--> 281         sort=sort,
282     )
283 
~\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
355                     "only Series and DataFrame objs are valid".format(typ=type(obj))
356                 )
--> 357                 raise TypeError(msg)
358 
359             # consolidate
TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

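Collect the labels in a plain list while iterating, then assign that list as the new column:

price_label = []
for i, row in df.iterrows():
    price = row.Price.replace('$', '')
    # float() treats "0.0" and "0.00" as the same value
    if float(price) == 0:
        price_label.append("Free")
    else:
        price_label.append("Non-Free")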

Then:

df["price_label"] = price_label
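A minimal sketch of the same list-building idea, written as a list comprehension over the Price column instead of iterrows. The sample data is hypothetical and assumes Price holds strings such as "$0.00"; converting to float makes "$0.00" and "$0.0" both count as free.

```python
import pandas as pd

# Hypothetical sample data: Price stored as strings with a leading "$"
df = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Price": ["$0.00", "$2.99", "$0.0"],
})

# Build the labels in a plain list; float() makes "0.00" and "0.0" compare equal to 0
price_label = [
    "Free" if float(price.replace("$", "")) == 0 else "Non-Free"
    for price in df["Price"]
]

# Assigning a list with one entry per row creates the new column
df["price_label"] = price_label
print(df)
```

Note that plain assignment adds the column at the end of the DataFrame; it does not place it after Price.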

Try adding a new column with a default value, then updating the rows where the price is 0:

df['price_label'] = 'Non-Free'  # add a new column with the default value for every row
df.loc[df['Price'] == '$0.0', 'price_label'] = 'Free'  # overwrite price_label where Price == '$0.0'

The second line of the code selects the rows to update using "boolean indexing":

https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#boolean-indexing

This article explains it in detail with examples:

https://appdividend.com/2019/01/25/pandas-boolean-indexing-example-python-tutorial/

It uses loc (selection by row and column labels): https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-label
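To show what the boolean index actually is, here is a small sketch on hypothetical sample data, again assuming Price is a string with a leading "$". The comparison produces a Series of True/False values, and .loc[mask, column] writes only to the rows where the mask is True. Parsing the prices to floats first is an added assumption so that "$0.00" and "$0.0" are both treated as free.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Price": ["$0.00", "$2.99", "$0.0"],
})

# Strip the "$" literally (regex=False) and convert to float
numeric_price = df["Price"].str.replace("$", "", regex=False).astype(float)

# The boolean mask: True for rows whose price is zero
is_free = numeric_price == 0
print(is_free.tolist())  # [True, False, True]

# Default value for every row, then overwrite only the rows selected by the mask
df["price_label"] = "Non-Free"
df.loc[is_free, "price_label"] = "Free"
```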

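You can use numpy.where(condition, x, y) to pick x or y element-wise depending on the condition. Then use the df.columns.get_loc method to get the position of the Price column, and finally specify the column order so that the new column comes right after Price.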

Use this:

import numpy as np
# --> make new column named `Price-Label`
df["Price-Label"] = np.where(df["Price"].eq("$0.0"), "Free", "Non-Free")
#--> get the location of `Price` column
price_col_loc = df.columns.get_loc("Price")
#--> Obtain the resulting dataframe by specifying the order of columns
#--> such that Price-Label column appear after the Price column
result = df[list(df.columns[:price_col_loc + 1]) + [df.columns[-1]] + list(df.columns[price_col_loc + 1:-1])]
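As an alternative to rebuilding the column list by hand, pandas' DataFrame.insert(loc, column, value) places a new column at a given integer position, modifying the DataFrame in place. A sketch on hypothetical sample data, assuming free apps are stored as "$0.0" as in the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "App": ["A", "B", "C"],
    "Price": ["$0.0", "$2.99", "$0.0"],
    "Rating": [4.5, 4.0, 3.5],
})

# Element-wise choice between the two labels
labels = np.where(df["Price"].eq("$0.0"), "Free", "Non-Free")

# Insert the labels directly after the Price column (in place, returns None)
df.insert(df.columns.get_loc("Price") + 1, "Price-Label", labels)
print(df.columns.tolist())  # ['App', 'Price', 'Price-Label', 'Rating']
```

DataFrame.insert raises a ValueError if a column with that name already exists, so run it only once per DataFrame.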
