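I first generate some data vertically (one value per line), but I want to turn it into rows and then stack those rows into a table such as a pandas DataFrame. How can I end up with a pandas DataFrame that has 4 columns ('fr', 'en', 'ir', 'ab') and three rows?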
import pandas as pd

# Load the CSV file into a DataFrame (read_csv already returns one)
df = pd.read_csv("FamilySearchData_All_OCT2015_newEthnicity_filledEthnicity_processedName_trimmedCol.csv",
                 header=0, encoding="utf-8")

columns = ['fr', 'en', 'ir', 'ab']
classes = ['ethnicity2', 'Ab_group', 'Ab_tribe']
df_count = pd.DataFrame(columns=columns)

for j in classes:
    for i in columns:
        ethnicity_tar = i
        try:
            # Number of rows in column j whose value equals ethnicity_tar
            count = df[j].value_counts()[ethnicity_tar]
        except KeyError:
            # ethnicity_tar never occurs in column j
            count = ''
        print(ethnicity_tar, count)
Output:
fr 1554455
en 1196932
ir 941852
ab 95131
fr 1554444
en 16000
ir 940850
ab 9371
fr 1554600
en 2196931
ir 940957
ab 9399
The end result I want:
fr       en       ir      ab
1554455  1196932  941852  95131
1554444  16000    940850  9371
1554600  2196931  940957  9399
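A minimal sketch of a pandas-native way to get that shape, assuming each column listed in `classes` holds the ethnicity codes as strings. It calls `value_counts()` once per class column, uses `reindex` to put the four target codes in a fixed order (codes that never occur get a count of 0 rather than the question's empty string), and stacks the resulting Series as rows. The small `df` below is made-up stand-in data, not the real CSV.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; real data would come from pd.read_csv(...)
df = pd.DataFrame({
    'ethnicity2': ['fr', 'fr', 'en', 'ir', 'ab'],
    'Ab_group':   ['fr', 'en', 'en', 'ir', 'fr'],
    'Ab_tribe':   ['ir', 'ir', 'ab', 'fr', 'en'],
})

columns = ['fr', 'en', 'ir', 'ab']
classes = ['ethnicity2', 'Ab_group', 'Ab_tribe']

# One Series per class column: counts of each target code, in the order of `columns`.
# fill_value=0 gives codes that never occur a count of 0 and keeps the dtype integer.
rows = [df[j].value_counts().reindex(columns, fill_value=0) for j in classes]

# Each Series becomes one row; its index labels ('fr', 'en', ...) line up with `columns`.
df_count = pd.DataFrame(rows, index=classes, columns=columns)
print(df_count)
```

Using `index=classes` labels each row with the class column it came from; drop it if a plain 0..2 index is preferred.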
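To do this, I would create a dictionary (hash) keyed by column name, where each entry holds a list. Then, as I loop over the lines in the file, I would use the first value on each line to index into the dictionary and get the right list, and append the numeric value to that list.
Once this temporary data structure is built, you can loop over the lists, take the value at the same index from each one for every row, and print them: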
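# counts: dict mapping each column name to its list of values, built as described above
# (named counts rather than hash to avoid shadowing the built-in hash())
n = len(counts['fr'])  # number of rows; every list has the same length
for i in range(n):
    print(str(counts['fr'][i]) + " " +
          str(counts['en'][i]) + " " +
          str(counts['ir'][i]) + " " +
          str(counts['ab'][i]))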
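The same dictionary of lists can also be handed straight to pandas: `pd.DataFrame` accepts a dict whose keys become column names and whose equal-length lists become the column values, which yields the 4-column, 3-row table asked for. A minimal sketch, feeding in the `(code, count)` pairs exactly as the question's loop printed them; the name `printed` is hypothetical.

```python
import pandas as pd

columns = ['fr', 'en', 'ir', 'ab']

# The (code, count) pairs printed by the question's loop, in print order
printed = [
    ('fr', 1554455), ('en', 1196932), ('ir', 941852), ('ab', 95131),
    ('fr', 1554444), ('en', 16000), ('ir', 940850), ('ab', 9371),
    ('fr', 1554600), ('en', 2196931), ('ir', 940957), ('ab', 9399),
]

# Dictionary keyed by column name, each value a list with one count per class
counts = {c: [] for c in columns}
for code, count in printed:
    counts[code].append(count)  # the first value selects the list, the second is appended

# Keys become columns and list positions become rows
df_count = pd.DataFrame(counts, columns=columns)
print(df_count)
```

In the real script, the `print(ethnicity_tar, count)` line inside the nested loop could be replaced by `counts[ethnicity_tar].append(count)`, with the DataFrame built once after the loop finishes.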