Splitting a dataframe into 3 new dataframes



My goal is to first split a dataframe into 3 categories, and then create 3 new dataframes, one per category. Here is my code:

from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
SPECIES = ['Setosa', 'Versicolor', 'Virginica']
train_path = tf.keras.utils.get_file(
    "iris_training.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv")
test_path = tf.keras.utils.get_file(
    "iris_test.csv", "https://storage.googleapis.com/download.tensorflow.org/data/iris_test.csv")
train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0)
test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0)
train.pop('SepalWidth')
train.pop('PetalWidth')
flower0 = pd.DataFrame(columns=['SepalLength', 'PetalLength'])
flower1 = pd.DataFrame(columns=['SepalLength', 'PetalLength'])
flower2 = pd.DataFrame(columns=['SepalLength', 'PetalLength'])
for row in range(len(train)):
    species = train.iloc[row]['Species']
    info = train.iloc[row]
    info.pop('Species')
    if species == 0.0:
        flower0.append(info)
    elif species == 1.0:
        flower1.append(info)
    else:
        flower2.append(info)
print(flower0)
plt.scatter(flower0.pop('SepalLength'), flower0.pop('PetalLength'), color='Red')
plt.scatter(flower1.pop('SepalLength'), flower1.pop('PetalLength'), color='Blue')
plt.scatter(flower2.pop('SepalLength'), flower2.pop('PetalLength'), color='Green')
plt.show()

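I'm very new to machine learning and data engineering, so I wanted to visualize my data on a scatter plot. Since I can't plot this data in 4 dimensions (I have 4 features: sepal width/length and petal width/length), I decided to plot only 2 of them, sepal length and petal length. I removed the unneeded columns with the .pop() method, and I'm stuck on this block of code: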

flower0 = pd.DataFrame(columns=['SepalLength', 'PetalLength'])
flower1 = pd.DataFrame(columns=['SepalLength', 'PetalLength'])
flower2 = pd.DataFrame(columns=['SepalLength', 'PetalLength'])
for row in range(len(train)):
    species = train.iloc[row]['Species']
    info = train.iloc[row]
    info.pop('Species')
    if species == 0.0:
        flower0.append(info)
    elif species == 1.0:
        flower1.append(info)
    else:
        flower2.append(info)
print(flower0)
plt.scatter(flower0.pop('SepalLength'), flower0.pop('PetalLength'), color='Red')
plt.scatter(flower1.pop('SepalLength'), flower1.pop('PetalLength'), color='Blue')
plt.scatter(flower2.pop('SepalLength'), flower2.pop('PetalLength'), color='Green')
plt.show()

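Here I create 3 empty dataframes with the 2 columns I later want to use as the plot axes, and then iterate over the full dataset in a for loop. The loop sorts the rows by species and appends each one to the corresponding dataframe. The append doesn't seem to work, because when I print one of the new dataframes I get: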

Empty DataFrame
Columns: [SepalLength, PetalLength]
Index: []

Does anyone know how I should add these rows to the specific new dataframes? Thanks in advance!!

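Side question: is this the best way to display a scatter plot? I looked online, and the advice was to plot the data as separate scatter sets so that I can change the color of each group independently. My whole goal is just to see the petal length and sepal length of each flower visually, with a different color per species.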

I don't think you need a for loop here. Iterating over a dataframe row by row is very inefficient for a large dataset.

Drop the for loop and replace the definitions of flower0, flower1, and flower2 with boolean selections via loc.

# select the rows of each species with a boolean mask via loc
cols = ['SepalLength', 'PetalLength']
flower0 = train.loc[train.Species == 0.0, cols]
flower1 = train.loc[train.Species == 1.0, cols]
flower2 = train.loc[train.Species > 1, cols]
# no for loop needed; plot each species as its own scatter set
plt.scatter(flower0['SepalLength'], flower0['PetalLength'], color='Red')
plt.scatter(flower1['SepalLength'], flower1['PetalLength'], color='Blue')
plt.scatter(flower2['SepalLength'], flower2['PetalLength'], color='Green')
plt.show()
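
As for the side question, one scatter call per species is a reasonable way to get one color per group. Here is a rough sketch of the same split done with groupby, which avoids writing one mask per species. The small train frame and the plot_flowers helper are made up for illustration, they are not part of the original code:

```python
import pandas as pd

# Small stand-in for the iris training data (values are illustrative only).
train = pd.DataFrame({
    "SepalLength": [5.1, 6.4, 5.9, 4.9, 6.9],
    "PetalLength": [1.4, 4.5, 5.1, 1.5, 5.4],
    "Species": [0, 1, 2, 0, 2],
})

# groupby yields one (key, sub-frame) pair per distinct Species value.
flowers = {
    species: group[["SepalLength", "PetalLength"]]
    for species, group in train.groupby("Species")
}


def plot_flowers(flowers, colors=("red", "blue", "green")):
    """Hypothetical helper: one scatter series per species, one color each."""
    import matplotlib.pyplot as plt  # imported here so the split works without it

    for (species, frame), color in zip(sorted(flowers.items()), colors):
        plt.scatter(frame["SepalLength"], frame["PetalLength"],
                    color=color, label=f"species {species}")
    plt.xlabel("SepalLength")
    plt.ylabel("PetalLength")
    plt.legend()
    plt.show()
```

Calling plot_flowers(flowers) should then draw the three groups in red, blue, and green with a legend.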

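In any case, I believe you get an empty dataframe back because you are trying to "append" a Series object (info = train.iloc[row]) to a dataframe. Note also that DataFrame.append never modified the frame in place: it returned a new frame, so the loop throws every result away (and the method was removed in pandas 2.0). To add a Series to an existing dataframe, use df = pd.concat([df, s.to_frame().T])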
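
A minimal sketch of that concat pattern, assuming a tiny made-up train frame in place of the iris data. The point to notice is that the result is assigned back to flower0:

```python
import pandas as pd

# Tiny stand-in for the iris training data (values are illustrative only).
train = pd.DataFrame({
    "SepalLength": [5.1, 6.4],
    "PetalLength": [1.4, 4.5],
    "Species": [0, 1],
})

flower0 = pd.DataFrame(columns=["SepalLength", "PetalLength"])

# A single row is a Series, like `info` in the question.
row = train.iloc[0].drop("Species")

# to_frame().T turns the Series into a one-row frame; concat returns a
# new frame, so the result has to be assigned back to flower0.
flower0 = pd.concat([flower0, row.to_frame().T])
print(flower0)
```

Doing this once per row still copies the frame on every iteration, so for the full dataset the boolean selection is the better fit.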
