I have received some data in the following format:
lvl1 - desc
    lvl2 - desc
    lvl2 - desc
        lvl3 - desc
        lvl3 - desc
        lvl3 - desc
            lvl4 - desc
            lvl4 - desc
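The levels are indented in multiples of 4 spaces (4, 8, 12, etc.) and are followed by some descriptive text. Could someone show me a way to split this single column into multiple columns based on the leading whitespace? The output should look like this: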
lvl1 - desc
lvl2 - desc
lvl2 - desc
lvl3 - desc
lvl3 - desc
lvl3 - desc
lvl4 - desc
lvl4 - desc
Thank you for your help.
Are you looking for something like this:
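import pandas as pd
from io import StringIO

# sample data: each level is indented by a further 4 spaces, as described in the question
s = """lvl1 - desc1
    lvl2 - desc2
    lvl2 - desc3
        lvl3 - desc4
        lvl3 - desc5
        lvl3 - desc6
            lvl4 - desc7
            lvl4 - desc8"""
# the data contains no commas, so each line is read into a single column named 0
df = pd.read_csv(StringIO(s), header=None)
# strip whitespace
df[0] = df[0].str.strip()
# group by the first 4 characters of the string (the level label), then collect each group into a list
# convert to a list of lists, then back to a DataFrame, and transpose so each level becomes a column
df2 = pd.DataFrame(df.groupby(df[0].str[:4])[0].apply(list).values.tolist()).T
print(df2)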
              0             1             2             3
0  lvl1 - desc1  lvl2 - desc2  lvl3 - desc4  lvl4 - desc7
1          None  lvl2 - desc3  lvl3 - desc5  lvl4 - desc8
2          None          None  lvl3 - desc6          None
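Note that the approach above groups on the label text (the first 4 characters after stripping), not on the whitespace itself. If the labels are not reliable, a variation that derives the level from the indentation width might look like the sketch below. It assumes the indentation uses spaces only, in steps of 4 starting at 0; the names raw, level, pos and wide are made up for this illustration.

```python
import pandas as pd

# Hypothetical sample; the 0/4/8/12-space indentation is an assumption
lines = [
    "lvl1 - desc1",
    "    lvl2 - desc2",
    "    lvl2 - desc3",
    "        lvl3 - desc4",
    "        lvl3 - desc5",
    "        lvl3 - desc6",
    "            lvl4 - desc7",
    "            lvl4 - desc8",
]
raw = pd.Series(lines)

# Text with the leading spaces removed
text = raw.str.lstrip(" ")

# Indentation depth: number of leading spaces divided by 4
level = (raw.str.len() - text.str.len()) // 4

# Position of each row within its own level, used as the new row index
pos = level.groupby(level).cumcount()

# One column per level, one row per position within the level
wide = pd.DataFrame({"level": level, "pos": pos, "text": text}).pivot(
    index="pos", columns="level", values="text"
)
print(wide)
```

Here the column labels are the indentation levels (0 to 3), and levels with fewer entries are padded with NaN rather than None.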