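I'm looping through a large list of .mp3 links to read their metadata tags and save them to an Excel file. It results in the error below. Thanks for your help.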
#print is_connected();
# Create a Pandas dataframe from the data.
df = pd.DataFrame({'Links' : lines ,'Titles' : titles , 'Singers': finalsingers , 'Albums':finalalbums , 'Years' : years})
# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter(xlspath, engine='xlsxwriter')
# Convert the dataframe to an XlsxWriter Excel object.
df.to_excel(writer, sheet_name='Sheet1')
#df.to_excel(writer, sheet_name='Sheet1')
# Close the Pandas Excel writer and output the Excel file.
writer.save()
Traceback (most recent call last):
  File "mp.py", line 87, in <module>
    df = pd.DataFrame({'Links' : lines ,'Titles' : titles , 'Singers': finalsingers , 'Albums':finalalbums , 'Years' : years})
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 266, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 402, in _init_dict
    return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 5409, in _arrays_to_mgr
    index = extract_index(arrays)
  File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 5457, in extract_index
    raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length
You can do this to avoid the error:
a = {'Links' : lines ,'Titles' : titles , 'Singers': finalsingers , 'Albums':finalalbums , 'Years' : years}
df = pd.DataFrame.from_dict(a, orient='index')
df = df.transpose()
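Explanation:
This creates the DataFrame with each key (e.g. 'Links') as a row. That way the missing values are actually missing columns, which is no problem for pandas (during creation, only missing rows raise a ValueError). Afterwards, you transpose the DataFrame (flip the axes) so the rows become columns, which gives you the DataFrame you originally wanted.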
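As a minimal sketch of the approach above, the following uses made-up sample lists of unequal length (the file names, titles, and years are placeholders, not data from the question) to show the intermediate and final shapes:

```python
import pandas as pd

# Hypothetical sample data with unequal lengths (not from the original post).
data = {
    'Links': ['a.mp3', 'b.mp3', 'c.mp3'],
    'Titles': ['Song A', 'Song B'],
    'Years': [2001],
}

# Each key becomes a row; shorter lists are padded with missing values.
wide = pd.DataFrame.from_dict(data, orient='index')
print(wide.shape)  # one row per key, one column per list position

# Transposing turns the keys back into columns.
df = wide.transpose()
print(df)
```

Note that the padded cells are missing values, so they will show up as empty cells once written to Excel.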
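It's telling you that the arrays (lines, titles, finalsingers, etc.) are not all the same length. You can test this with:
print(len(lines), len(titles), len(finalsingers))  # Print all of them out here
This will show you which list is the wrong length. From there you need to investigate why and work out the right way to correct it.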
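If there are many lists, a small helper can do the comparison for you. This is only a sketch: `report_lengths` and the sample dictionary are hypothetical names introduced here for illustration, and it assumes the dictionary is not empty.

```python
def report_lengths(columns):
    """Return {name: length} and print every column shorter than the longest."""
    lengths = {name: len(values) for name, values in columns.items()}
    longest = max(lengths.values())
    for name, n in lengths.items():
        if n != longest:
            print('%s has %d items, expected %d' % (name, n, longest))
    return lengths


# Hypothetical sample data standing in for lines, titles, finalsingers, ...
sample = {'Links': [1, 2, 3], 'Titles': [1, 2, 3], 'Singers': [1, 2]}
lengths = report_lengths(sample)
```

Calling it right before `pd.DataFrame(...)` should point at the list that lost entries during scraping.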
You can pad the shorter lists with a filler element:
def pad_dict_list(dict_list, padel):
    lmax = 0
    for lname in dict_list.keys():
        lmax = max(lmax, len(dict_list[lname]))
    for lname in dict_list.keys():
        ll = len(dict_list[lname])
        if ll < lmax:
            dict_list[lname] += [padel] * (lmax - ll)
    return dict_list
dict_list = {'Links': [1, 2, 3], 'Titles': [1, 2, 3, 4], 'Singers': [1, 2], 'Albums': [1, 2, 3], 'Years': [1, 2, 3, 4]}
dict_list = pad_dict_list(dict_list, 0)
print(dict_list)
Output: {'Links': [1, 2, 3, 0], 'Titles': [1, 2, 3, 4], 'Singers': [1, 2, 0, 0], 'Albums': [1, 2, 3, 0], 'Years': [1, 2, 3, 4]}
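A related option, not part of the answer above, is to let pandas do the padding by wrapping each list in a `pd.Series`. A minimal sketch, using a cut-down version of the sample dictionary:

```python
import pandas as pd

dict_list = {'Links': [1, 2, 3], 'Titles': [1, 2, 3, 4], 'Singers': [1, 2]}

# Series are aligned on their index, so shorter ones are filled with NaN.
df = pd.DataFrame({name: pd.Series(values) for name, values in dict_list.items()})
print(df)
```

Unlike `pad_dict_list`, this leaves the original lists untouched and marks the gaps as missing rather than as 0. Integer columns that receive NaN are converted to floats.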
Duplicate variable names caused this problem.
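I got the same error when reading a JSON file into a pandas DataFrame. Passing lines=True (the lines parameter is a bool and defaults to False) solved the problem.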
from io import StringIO
import pandas as pd
StringData = StringIO(obj.get()['Body'].read().decode('utf-8'))
mydata = pd.read_json(StringData, lines=True)
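To try this without the remote object, here is a self-contained sketch that feeds `read_json` an in-memory string instead. The two records are invented sample data, and it assumes the input is in JSON Lines form (one JSON object per line):

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON Lines content: one JSON object per line.
raw = '{"title": "a", "year": 2001}\n{"title": "b", "year": 2002}\n'

# lines=True tells pandas to parse each line as a separate record.
df = pd.read_json(StringIO(raw), lines=True)
print(df)
```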