df = pd.DataFrame.from_dict(dict_name, orient='index')
df.fillna('NaN', inplace=True)
df.to_csv('taxonomy_3.csv', index=True, header=True)
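The code above converts a nested dictionary to a DataFrame nicely, but if the nested dictionary was built with the .append() or .extend() methods, every value is a list, so the output contains extra brackets [] and quotes '', which makes downstream analysis difficult.
For example: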
{'Ceratopteris richardii': {'superkingdom': ['Eukaryota'], 'kingdom': ['Viridiplantae'], 'phylum': ['Streptophyta'], 'subphylum': ['Streptophytina'], 'clade': ['Embryophyta', 'Tracheophyta', 'Euphyllophyta'], 'class': ['Polypodiopsida'], 'subclass': ['Polypodiidae'], 'order': ['Polypodiales'], 'suborder': ['Pteridineae'], 'family': ['Pteridaceae'], 'subfamily': ['Parkerioideae'], 'genus': ['Ceratopteris']}, 'Arabidopsis thaliana': {'superkingdom': ['Eukaryota'], 'kingdom': ['Viridiplantae'], 'phylum': ['Streptophyta'], 'subphylum': ['Streptophytina'], 'clade': ['Embryophyta', 'Tracheophyta', 'Euphyllophyta', 'Spermatophyta', 'Mesangiospermae', 'eudicotyledons', 'Gunneridae', 'Pentapetalae', 'rosids', 'malvids'], 'class': ['Magnoliopsida'], 'order': ['Brassicales'], 'family': ['Brassicaceae'], 'tribe': ['Camelineae'], 'genus': ['Arabidopsis']}}
It was created with:
line = line.strip()  # remove the trailing newline character
words = line.split("\t", 1)  # split the line at the first tab
if words[0] in taxonomy[name]:  # key already exists: append the value
    taxonomy[name][words[0]].append(words[1])
else:  # key does not exist yet: create it with a one-element list
    taxonomy[name][words[0]] = [words[1]]
and converted to a DataFrame with pd.DataFrame.from_dict(), which produces a table like this:
Column 1 | Column 2 |
---|---|
Key1 | ['Value1', 'Value2', 'Value3'] |
Key2 | ['Value2', 'Value4', 'Value5'] |
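To make the problem reproducible, here is a minimal sketch of how such a dictionary ends up with list cells. It assumes the input is a sequence of tab-separated rank/name lines for a single species; the `lines` sample is made up for illustration, and `setdefault` stands in for the if/else shown above.

```python
import pandas as pd

# Hypothetical sample input: one "rank<TAB>name" pair per line.
lines = [
    "kingdom\tViridiplantae\n",
    "clade\tEmbryophyta\n",
    "clade\tTracheophyta\n",
]

name = "Ceratopteris richardii"
taxonomy = {name: {}}

for line in lines:
    rank, value = line.strip().split("\t", 1)
    # setdefault creates the list the first time a rank is seen,
    # then the value is appended to it.
    taxonomy[name].setdefault(rank, []).append(value)

df = pd.DataFrame.from_dict(taxonomy, orient="index")
cell = df.loc[name, "clade"]
print(type(cell), cell)  # each cell holds a Python list, not a string
```

Because each cell holds a list object, to_csv() writes its repr, which is where the brackets and quotes in the file come from.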
One option is to stack the columns, join the strings, and then unstack:
out = pd.DataFrame(my_data).stack().map(', '.join).unstack()
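If you would rather not depend on how stack() treats missing cells, a column-wise map is an alternative. This is only a sketch: `join_if_list` is a helper name introduced here, and the small `my_data` is a stand-in for the full dictionary from the question.

```python
import pandas as pd

# Small stand-in for the nested dictionary shown in the question.
my_data = {
    "Ceratopteris richardii": {
        "clade": ["Embryophyta", "Tracheophyta"],
        "subclass": ["Polypodiidae"],
    },
    "Arabidopsis thaliana": {
        "clade": ["Embryophyta", "Spermatophyta"],
        "tribe": ["Camelineae"],
    },
}

def join_if_list(cell):
    # Lists become comma-separated strings; missing cells (NaN) pass through.
    return ", ".join(cell) if isinstance(cell, list) else cell

# Apply the helper to every cell, one column at a time.
out = pd.DataFrame(my_data).apply(lambda col: col.map(join_if_list))
print(out)
```

Ranks that exist for only one species stay NaN in the other column instead of raising an error inside join.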
However, it is probably more efficient to modify the input dictionary in plain Python first and then construct the DataFrame:
for d in my_data.values():
    for k, v in d.items():
        d[k] = ', '.join(v)
out = pd.DataFrame(my_data)
Output:
              Ceratopteris richardii                    Arabidopsis thaliana
superkingdom  Eukaryota                                 Eukaryota
kingdom       Viridiplantae                             Viridiplantae
phylum        Streptophyta                              Streptophyta
subphylum     Streptophytina                            Streptophytina
clade         Embryophyta, Tracheophyta, Euphyllophyta  Embryophyta, Tracheophyta, Euphyllophyta, Sper...
class         Polypodiopsida                            Magnoliopsida
subclass      Polypodiidae                              NaN
order         Polypodiales                              Brassicales
suborder      Pteridineae                               NaN
family        Pteridaceae                               Brassicaceae
subfamily     Parkerioideae                             NaN
genus         Ceratopteris                              Arabidopsis
tribe         NaN                                       Camelineae
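The loop above overwrites the lists inside my_data. If you still need them afterwards, a non-mutating variant might look like the sketch below. The names `joined` and `csv_text` are introduced here for illustration, the small `my_data` is a stand-in for the real dictionary, and orient="index" is used to get one species per row, as in the question's first snippet.

```python
import pandas as pd

# Small stand-in for the nested dictionary shown in the question.
my_data = {
    "Ceratopteris richardii": {
        "clade": ["Embryophyta", "Tracheophyta"],
        "subclass": ["Polypodiidae"],
    },
    "Arabidopsis thaliana": {
        "clade": ["Embryophyta", "Spermatophyta"],
        "tribe": ["Camelineae"],
    },
}

# Build a new nested dict so the original lists stay intact.
joined = {
    species: {rank: ", ".join(names) for rank, names in ranks.items()}
    for species, ranks in my_data.items()
}

# One species per row, one rank per column.
df = pd.DataFrame.from_dict(joined, orient="index")

# Without a path, to_csv returns the CSV text; na_rep fills missing ranks.
csv_text = df.to_csv(index=True, header=True, na_rep="NaN")
print(csv_text)
```

Since the joined clade strings contain commas, to_csv wraps those fields in double quotes by default, so the file still parses back into the same columns.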