Pandas: converting a dataset into features through a dictionary with multiple columns



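Let me start with the horrible, inelegant code I wrote... I had to write out every entry by hand to convert the 10+1 column dataset into a 50+1 column dataset.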

import numpy as np
import pandas as pd
import csv
import os
with open("dataset_feature_champion_number.csv","r") as source:
    reader = csv.reader(source)
    with open("predataset_champ_rating.csv","w",newline='') as result:
        writer = csv.writer(result)
        
        for r in reader:
            writer.writerow((r[1],r[1],r[1],r[1],r[1],
                             r[2],r[2],r[2],r[2],r[2],
                             r[3],r[3],r[3],r[3],r[3],
                             r[4],r[4],r[4],r[4],r[4],
                             r[5],r[5],r[5],r[5],r[5],
                             r[6],r[6],r[6],r[6],r[6],
                             r[7],r[7],r[7],r[7],r[7],
                             r[8],r[8],r[8],r[8],r[8],
                             r[9],r[9],r[9],r[9],r[9],
                             r[10],r[10],r[10],r[10],r[10],r[11]))

This code converts the raw_dataset into the pre-dataset_feature. After that, I convert the pre-dataset_feature into the "true" dataset_feature. Everything is stored in CSV files.

My raw_dataset:

    blue1   blue2   blue3   blue4   blue5   red1    red2    red3    red4    red5    winner
0   125 11  59  70  124 36  129 20  135 111 0
1   23  40  77  53  95  67  73  37  132 91  0
.    .   .   .   .   .   .   .   .   .   .  .
39501   54  112 11  27  92  7   23  87  49  66  1

pre-data_set:

blue1   blue1   blue1   blue1   blue1   blue2   blue2 . red5 red5 red5 red5 red5 winner
125 125 125 125 125 11  11 . 111 111 111 111 111 0
23  23  23  23  23  40  40 .  91  91  91  91  91 0

My dictionary:

champNum    Damage  Toughness   Control Escape  Utility
1   2   2   2   2   0
2   3   1   2   3   0
3   3   1   1   3   1
.   .   .   .   .   .
125 3   2   1   1   1
.   .   .   .   .   .
137 2   1   2   2   3   2
138 3   0   3   0   1   2

I expect my pre-dataset to become the "true" dataset:

blue1   blue1   blue1   blue1   blue1   blue2   blue2 . red5 red5 red5 red5 red5 winner
3   2   1   1   1   1   1 . 3   0   2   1   2   0
3   0   1   2   0   3   2 . 3   1   2   2   0   0

I got these values by converting the columns manually: columns 1, 6, 11, 16, 21, 26, 31, 36, 41, 46, and so on...

Not satisfied, I tried again to write slightly more efficient code...

def createDictionary2(csvfile):
    # Map each champion number to its list of five ratings (kept as strings).
    with open(csvfile, mode='r') as data:
        reader = csv.reader(data)
        next(reader, None)  # skip the header row
        ratings = {int(rows[0]): [rows[1], rows[2], rows[3], rows[4], rows[5]] for rows in reader}
    return ratings

def convertDataframeToAnotherFeature(csvfile, dictionary):
    df = pd.read_csv(csvfile)
    temp1 = df.iloc[:, 1:11]  # the ten champion columns
    temp2 = df['winner']
    temp3 = temp1.applymap(dictionary.get)  # each cell becomes a list of ratings
    champNum = temp3.join(temp2)
    return champNum

def saveAsCSV5(dataframe):
    dataframe.to_csv("dataset_feature_champ_rating.csv")

def feature5():
    diction = createDictionary2("champRating1.csv")
    dataset = convertDataframeToAnotherFeature("dataset_feature_champion_number.csv", diction)
    saveAsCSV5(dataset)

feature5()

The result looks like this:

    blue1   blue2   blue3   blue4   blue5   red1    red2    red3    red4    red5    winner
0   ['3', '2', '1', '1', '1']   ['1', '1', '3', '2', '3']   ['3', '0', '1', '0', '1']   ['3', '1', '0', '2', '0']   ['3', '1', '2', '0', '0']   ['3', '2', '1', '2', '2']   ['3', '2', '3', '1', '0']   ['3', '2', '2', '0', '0']   ['3', '0', '3', '1', '0']   ['1', '2', '3', '1', '3']   0
1   ['3', '0', '1', '2', '0']   ['3', '2', '2', '2', '0']   ['3', '1', '0', '3', '2']   ['3', '1', '1', '1', '3']   ['3', '1', '2', '2', '2']   ['1', '3', '3', '1', '0']   ['2', '1', '3', '0', '2']   ['2', '2', '2', '2', '0']   ['3', '1', '2', '3', '2']   ['3', '2', '2', '3', '0']   0

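I think this is a bit better, but now I am stuck, because I don't know how to expand the lists in each column so that each one fills 4 additional columns...

I don't know the precise terms to describe this problem, so I can't search effectively for the right approach, whether by browsing, watching free online courses, or reading the documentation.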

Edit: included the "convert" function that was missing before.

This can be done with the csv module as follows:

from itertools import chain
import csv
with open('champRating1.csv', 'r', newline='') as f_champs:
    csv_champs = csv.reader(f_champs)
    header_champs = next(csv_champs)
    champs_dict = {int(rows[0]) : list(map(int, rows[1:])) for rows in csv_champs}
repeat = len(next(iter(champs_dict.values())))  # length of a champ rating
with open('dataset_feature_champion_number.csv', 'r', newline='') as f_dataset, \
    open('predataset_champ_rating.csv', 'w', newline='') as f_output:
    csv_dataset = csv.reader(f_dataset)
    csv_output = csv.writer(f_output)
    header = next(csv_dataset)     
    csv_output.writerow(list(chain.from_iterable([v] * repeat for v in header[1:-1])) + [header[-1]])
    for input_row in csv_dataset:
        output_row = chain.from_iterable(champs_dict.get(int(value), [0] * repeat) for value in input_row[1:-1])
        csv_output.writerow(list(output_row) + [input_row[11]])

This gives you an output CSV file that looks like:

blue1,blue1,blue1,blue1,blue1,blue2,blue2,blue2,blue2,blue2,blue3,blue3,blue3,blue3,blue3,blue4,blue4,blue4,blue4,blue4,blue5,blue5,blue5,blue5,blue5,red1,red1,red1,red1,red1,red2,red2,red2,red2,red2,red3,red3,red3,red3,red3,red4,red4,red4,red4,red4,red5,red5,red5,red5,red5,winner
3,2,1,1,1,3,1,1,3,1,3,1,1,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,0,0,0,3,1,1,3,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
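
If you would rather stay in pandas, something along these lines might work. It is only a sketch: the two small in-memory frames stand in for champRating1.csv and dataset_feature_champion_number.csv, the function name expand_champions is made up for illustration, and the output columns are named like blue1_Damage (my own choice, so that the names stay unique) instead of repeating blue1 five times. As in the csv version, champion numbers missing from the ratings table are filled with zeros.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two CSV files in the question.
ratings = pd.DataFrame(
    {
        "Damage": [3, 1, 3],
        "Toughness": [2, 1, 0],
        "Control": [1, 3, 1],
        "Escape": [1, 2, 2],
        "Utility": [1, 3, 0],
    },
    index=pd.Index([125, 11, 23], name="champNum"),
)
matches = pd.DataFrame({"blue1": [125, 23], "red1": [11, 99], "winner": [0, 1]})


def expand_champions(matches, ratings, label="winner"):
    """Replace every champion column with one column per rating."""
    parts = []
    for col in matches.columns.drop(label):
        # Look up the rating row for each champion number in this column;
        # numbers not present in `ratings` come back as NaN and become 0.
        block = ratings.reindex(matches[col]).fillna(0).astype(int)
        block.index = matches.index  # realign with the match rows
        block.columns = [f"{col}_{name}" for name in ratings.columns]
        parts.append(block)
    parts.append(matches[[label]])  # keep the label as the last column
    return pd.concat(parts, axis=1)


features = expand_champions(matches, ratings)
print(features)
```

With the real files you would presumably load `ratings` with `pd.read_csv("champRating1.csv", index_col=0)` and `matches` with `pd.read_csv("dataset_feature_champion_number.csv", index_col=0)`, then write the result out with `features.to_csv(...)`.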
