How to turn the vertical output of a loop into rows instead of stacking the values on top of each other



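I first generate some data vertically, but I want to turn it into row data and then stack those rows into a table such as a pandas DataFrame. How can I end up with a pandas DataFrame that has 4 columns ('fr', 'en', 'ir', 'ab') and three rows?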

# coding=utf-8
import pandas as pd
from pandas import DataFrame, Series
import numpy as np
import nltk
import re
import random
from random import randint
import csv
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
# Get csv file into data frame
data = pd.read_csv("FamilySearchData_All_OCT2015_newEthnicity_filledEthnicity_processedName_trimmedCol.csv", header=0, encoding="utf-8")
df = DataFrame(data)
columns = ['fr', 'en', 'ir', 'ab']
classes = ['ethnicity2', 'Ab_group', 'Ab_tribe']
df_count = DataFrame(columns=columns)
for j in classes:
    for i in columns:
        ethnicity_tar = str(i)
        count = 0
        try:
            count = df[str(j)].value_counts()[ethnicity_tar]
        except Exception as e:
            count = ''
        print ethnicity_tar, count
Output:

fr 1554455
en 1196932
ir 941852
ab 95131
fr 1554444
en 16000
ir 940850
ab 9371
fr 1554600
en 2196931
ir 940957
ab 9399

The result I want:

fr       en       ir      ab
1554455  1196932  941852  95131
1554444  16000    940850  9371
1554600  2196931  940957  9399

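To do this, I would create a dictionary (hash) keyed by column name, where each key holds a list. Then, while looping over the rows of the file, I would use the first value on each row to look up the matching list in the dictionary and append the numeric value to that list.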
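A minimal sketch of that accumulation step, written for Python 3. The `pairs` list stands in for the (label, count) values the question's loop prints, and the names `pairs` and `table` are hypothetical, chosen only for this illustration.

```python
# Sample (label, count) pairs, in the order the question's loop prints them.
pairs = [
    ("fr", 1554455), ("en", 1196932), ("ir", 941852), ("ab", 95131),
    ("fr", 1554444), ("en", 16000), ("ir", 940850), ("ab", 9371),
    ("fr", 1554600), ("en", 2196931), ("ir", 940957), ("ab", 9399),
]

columns = ["fr", "en", "ir", "ab"]

# One empty list per column name (`table` is a hypothetical name).
table = {name: [] for name in columns}

# Use the label to find the right list, then append the count to it.
for label, count in pairs:
    table[label].append(count)
```

After the loop, each list holds one value per pass of the outer loop, so position `i` in every list belongs to the same output row.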

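Once this temporary data structure is built, you can loop over the lists, pull the value at the same index out of each one, and print them together as a row: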

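# hash is the dictionary of lists described above (the name shadows the built-in hash()).
n = len(hash['fr'])  # number of rows; every list has the same length
for i in range(n):
    # The parentheses let the expression span several lines.
    print(str(hash['fr'][i]) + " " +
          str(hash['en'][i]) + " " +
          str(hash['ir'][i]) + " " +
          str(hash['ab'][i]))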
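Since the question asks for a pandas DataFrame rather than printed text, the same dictionary of lists can probably be handed straight to the `DataFrame` constructor. The sketch below assumes the dictionary has already been filled with the values from the question's output; `table` is a hypothetical name, and `df_count` reuses the name from the question.

```python
import pandas as pd

columns = ["fr", "en", "ir", "ab"]

# Dictionary of lists, one list per column, already filled in
# (values copied from the output shown in the question).
table = {
    "fr": [1554455, 1554444, 1554600],
    "en": [1196932, 16000, 2196931],
    "ir": [941852, 940850, 940957],
    "ab": [95131, 9371, 9399],
}

# Each key becomes a column and each list position becomes a row;
# passing `columns` fixes the column order.
df_count = pd.DataFrame(table, columns=columns)
print(df_count)
```

This skips the manual printing loop entirely: the row alignment comes from the list positions, which is why every list needs the same length.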
