我有一个DataFrame的列,看起来像这样:
df=pd.DataFrame(columns=['(NYSE_close, close)','(NYSE_close, open)','(NYSE_close, volume)', '(NASDAQ_close, close)','(NASDAQ_close, open)','(NASDAQ_close, volume)'])
df:
(NYSE_close, close) (NYSE_close, open) (NYSE_close, volume) (NASDAQ_close, close) (NASDAQ_close, open) (NASDAQ_close, volume)
我想删除下划线之后的所有内容,并在逗号之后添加任何内容,以得到以下内容:
df:
NYSE_close NYSE_open NYSE_volume NASDAQ_close NASDAQ_open NASDAQ_volume
我试图去掉列名,但它用nan代替了它。有什么建议吗?
提前谢谢你。
您可以使用re.sub
提取列名的适当部分,并将其替换为:
import re
df=pd.DataFrame(columns=['(NYSE_close, close)','(NYSE_close, open)','(NYSE_close, volume)', '(NASDAQ_close, close)','(NASDAQ_close, open)','(NASDAQ_close, volume)'])
df.columns = [re.sub(r'(([^_]+_)w+, (w+))', r'12', c) for c in df.columns]
输出:
Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []
你可以:
import re
def cvt_col(x):
s = re.sub('[()_,]', ' ', x).split()
return s[0] + '_' + s[2]
df.rename(columns = cvt_col)
Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []
使用列表推导式,两次:
step1 = [ent.strip('()').split(',') for ent in df]
df.columns = ["_".join([left.split('_')[0], right.strip()])
for left, right in step1]
df
Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []