How to remove brackets from multi-valued keys when converting to a DataFrame, or extend key values without the extra characters


import pandas as pd

df = pd.DataFrame.from_dict(dict_name, orient='index')  # one row per outer key
df.fillna('NaN', inplace=True)
df.to_csv('taxonomy_3.csv', index=True, header=True)

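The code above handles converting a nested dictionary to a DataFrame very well. However, if the nested dictionary was built with the .append() or .extend() methods, its values are lists, so the output contains extra brackets [] and quotes '', which makes downstream analysis difficult.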

For example:

{'Ceratopteris richardii': {'superkingdom': ['Eukaryota'], 'kingdom': ['Viridiplantae'], 'phylum': ['Streptophyta'], 'subphylum': ['Streptophytina'], 'clade': ['Embryophyta', 'Tracheophyta', 'Euphyllophyta'], 'class': ['Polypodiopsida'], 'subclass': ['Polypodiidae'], 'order': ['Polypodiales'], 'suborder': ['Pteridineae'], 'family': ['Pteridaceae'], 'subfamily': ['Parkerioideae'], 'genus': ['Ceratopteris']}, 'Arabidopsis thaliana': {'superkingdom': ['Eukaryota'], 'kingdom': ['Viridiplantae'], 'phylum': ['Streptophyta'], 'subphylum': ['Streptophytina'], 'clade': ['Embryophyta', 'Tracheophyta', 'Euphyllophyta', 'Spermatophyta', 'Mesangiospermae', 'eudicotyledons', 'Gunneridae', 'Pentapetalae', 'rosids', 'malvids'], 'class': ['Magnoliopsida'], 'order': ['Brassicales'], 'family': ['Brassicaceae'], 'tribe': ['Camelineae'], 'genus': ['Arabidopsis']}}

The dictionary was created with this setup:

line = line.strip()  # remove the newline character
words = line.split("\t", 1)  # split the line at the first tab
if words[0] in taxonomy[name]:  # add value if key already exists
    taxonomy[name][words[0]].append(words[1])
else:  # add key and value if key does not exist
    taxonomy[name][words[0]] = [words[1]]
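
For reference, the same accumulation logic can be written as a self-contained sketch. The sample lines, the `parse_ranks` helper, and the "Example species" key are hypothetical and only stand in for the real input file; `collections.defaultdict` replaces the explicit if/else check.

```python
from collections import defaultdict

# Hypothetical sample input: one "rank<TAB>value" pair per line.
sample_lines = [
    "kingdom\tViridiplantae\n",
    "clade\tEmbryophyta\n",
    "clade\tTracheophyta\n",
]


def parse_ranks(lines):
    """Collect the values for each rank into a list, in input order."""
    ranks = defaultdict(list)
    for line in lines:
        line = line.strip()  # remove the newline character
        if not line:
            continue  # skip blank lines
        rank, value = line.split("\t", 1)  # split at the first tab only
        ranks[rank].append(value)
    return dict(ranks)


taxonomy = {"Example species": parse_ranks(sample_lines)}
print(taxonomy)
```

Every value is a list, even when a rank occurs only once, which is where the brackets and quotes in the DataFrame come from. A line without a tab would raise a ValueError in this sketch.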

It was then converted to a DataFrame using pd.DataFrame.from_dict().

This creates a table like the following:

First column    Second column
Key1            [Value1, Value2, 'value3']
Key2            [Value2, 'value4', 'value5']

One option is to stack the columns, join the strings, then unstack:

out = pd.DataFrame(my_data).stack().map(', '.join).unstack()
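
A runnable sketch of this approach on a small, made-up dictionary (the species names and values below are placeholders, not the real data). One assumption worth flagging: depending on the pandas version, `stack()` may keep the NaN cells produced by missing keys, and `', '.join` fails on NaN, so this sketch adds an explicit `dropna()` before joining.

```python
import pandas as pd

# Hypothetical miniature of the nested dictionary from the question.
my_data = {
    "Species A": {
        "kingdom": ["Viridiplantae"],
        "clade": ["Embryophyta", "Tracheophyta"],
        "subclass": ["Polypodiidae"],
    },
    "Species B": {
        "kingdom": ["Viridiplantae"],
        "clade": ["Embryophyta"],
    },
}

out = (
    pd.DataFrame(my_data)  # rows are ranks, columns are species, cells are lists
    .stack()               # long format: one (rank, species) entry per cell
    .dropna()              # drop cells for ranks a species does not have
    .map(", ".join)        # turn each list into a comma-separated string
    .unstack()             # back to wide format; missing cells become NaN
)
print(out)
```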

However, it may be more efficient to modify the input dictionary in plain Python first and then construct the DataFrame:

for d in my_data.values():
    for k, v in d.items():
        d[k] = ', '.join(v)  # replace each list with a comma-separated string
out = pd.DataFrame(my_data)
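
If the original lists are still needed afterwards, a variant that builds a new dictionary instead of overwriting the values in place might look like the sketch below. The sample data and the names `joined` and `by_species` are placeholders; `from_dict(..., orient='index')` is the call from the question and gives one row per species.

```python
import pandas as pd

# Hypothetical miniature of the nested dictionary from the question.
my_data = {
    "Species A": {"kingdom": ["Viridiplantae"], "clade": ["Embryophyta", "Tracheophyta"]},
    "Species B": {"kingdom": ["Viridiplantae"], "tribe": ["Camelineae"]},
}

# Build a new nested dict with joined strings; my_data itself is left untouched.
joined = {
    species: {rank: ", ".join(values) for rank, values in ranks.items()}
    for species, ranks in my_data.items()
}

out = pd.DataFrame(joined)  # ranks as rows, species as columns
by_species = pd.DataFrame.from_dict(joined, orient="index")  # species as rows
print(by_species)
```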

Output:

                                Ceratopteris richardii                               Arabidopsis thaliana
superkingdom                                 Eukaryota                                          Eukaryota
kingdom                                  Viridiplantae                                      Viridiplantae
phylum                                    Streptophyta                                       Streptophyta
subphylum                               Streptophytina                                     Streptophytina
clade         Embryophyta, Tracheophyta, Euphyllophyta  Embryophyta, Tracheophyta, Euphyllophyta, Sper...
class                                   Polypodiopsida                                      Magnoliopsida
subclass                                  Polypodiidae                                                NaN
order                                     Polypodiales                                        Brassicales
suborder                                   Pteridineae                                                NaN
family                                     Pteridaceae                                       Brassicaceae
subfamily                                Parkerioideae                                                NaN
genus                                     Ceratopteris                                        Arabidopsis
tribe                                              NaN                                         Camelineae
