"Can not convert a ndarray into a Tensor or Operation" when using a pandas DataFrame in TensorFlow



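I'm learning some basic TensorFlow and have run into a problem. I'm loading data from a file with pandas and then running k-nearest neighbors on the dataset, but I keep getting an error.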

It looks as if TensorFlow can't work with NumPy ndarrays, and I've been stuck on this for two days. What is the best way to load a CSV file into TensorFlow?

TypeError: Fetch argument array([-70.837845, -62.241467, -37.82856, -55.596767], dtype=float32) has invalid type, must be a string or Tensor. (Can not convert a ndarray into a Tensor or Operation.)

import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import time
Neighbors = 4
Training_step=1
data_frame=pd.read_csv('./data/breast-cancer-wisconsin.csv',encoding='gbk')
# replace missing data with outlier inplace
data_frame.replace('?',-99999,inplace=True)
Y=np.array(data_frame['class'])
data_frame.drop(['id'],axis=1,inplace=True)
X=np.array(data_frame.drop(['class'],axis=1))
# splits dataset for cross validation
x_train,x_test,y_train,y_test=train_test_split(X,Y,test_size=0.3,random_state=0)
y_train=y_train.reshape(-1,1)  # (489, 1) for this dataset, matching the [None,1] placeholder
# tf Graph Input
x_training = tf.placeholder("float",[None,9],name="x_training_ph")
y_training = tf.placeholder("float",[None,1],name="y_training_ph")
x_testing = tf.placeholder("float",[9],name="x_testing_ph")
eucli_distance = tf.negative(tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x_training, x_testing)), axis=0)))
values, indices = tf.nn.top_k(eucli_distance, k=Neighbors, sorted=False)
nearest_neighbors = []
for i in range(Neighbors):
    #Returns the index with the largest value across axes of a tensor.
    nearest_neighbors.append(tf.argmax(y_training[indices[i]], 0))
#stack the tensor together
neighbors_tensor = tf.stack(nearest_neighbors)
# Returns a tensor y containing all of the unique elements of x, in the same order that they occur in x.
# This operation also returns a tensor idx the same size as x that contains the index of each value of x in the unique output y
y, idx, count = tf.unique_with_counts(neighbors_tensor)
# This operation extracts a slice of size `size` from a tensor input, starting at the location specified by begin.
# Get the closest neighbor
pred = tf.slice(y, begin=[tf.argmax(count, 0)], size=tf.constant([1], dtype=tf.int64))[0]
accuracy = 0.
# Initializing the variables
init = tf.global_variables_initializer()
start_time=time.time()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # loop over test data
    for i in range(len(x_test)):
        # Get nearest neighbor
        # feed the placeholders
        nn_index = sess.run(pred, feed_dict={x_training: x_train, y_training: y_train, x_testing: x_test[i, :]})
        distance = sess.run(eucli_distance, feed_dict={x_training: x_train, y_training: y_train, x_testing: x_test[i, :]})
        print("Distance is ", len(distance), " ", distance)
        values = sess.run(values, feed_dict={x_training: x_train, y_training: y_train, x_testing: x_test[i, :]})
        print("Value is ", len(values), " ", values)
        print("Case:", i, "Prediction:", nn_index,
              "True label", np.argmax(y_test[i]))
        # Calculate accuracy
        if nn_index == np.argmax(y_test[i]):
            accuracy += 1. / len(x_test)
        else:
            print("Not matched")
print("==========================================")
print('Neighbors:',Neighbors)
print('Training step:',Training_step)
print("Time used: %s second" % (time.time() - start_time))
print("Accuracy:", accuracy)
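For cross-checking what the graph should return, here is a NumPy-only sketch of the same idea: for one test row, find the Neighbors closest training rows and take a majority vote over their labels. It is an illustration, not the question's graph, and knn_predict is a hypothetical helper name. It sums squared differences over the feature axis (axis=1), which gives one distance per training row; as written, the graph above reduces over axis=0, which yields one value per feature (9 values) instead.

```python
import numpy as np


def knn_predict(x_train, y_train, x_query, k=4):
    """Majority-vote k-NN for a single query row (hypothetical helper, a sketch).

    x_train: (n_samples, n_features) float array
    y_train: (n_samples,) or (n_samples, 1) label array
    x_query: (n_features,) float array
    """
    labels = np.asarray(y_train).reshape(-1)
    # One Euclidean distance per training row: reduce over the feature axis.
    dists = np.sqrt(np.sum((x_train - x_query) ** 2, axis=1))
    # Indices of the k smallest distances (their order among themselves is arbitrary).
    nearest = np.argpartition(dists, k - 1)[:k]
    # Majority vote over the neighbours' labels; np.unique sorts the classes,
    # so a tie goes to the smallest label.
    classes, counts = np.unique(labels[nearest], return_counts=True)
    return classes[np.argmax(counts)]
```

np.argpartition is used instead of a full sort because only the k smallest distances are needed; it assumes k is at most the number of training rows.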

The dataset I'm using is the one from UCI; here is a sample:

id,clump_thickness,unif_cell_size,unif_cell_shape,marg_adhesion,single_epith_cell_size,bare_nuclei,bland_chromatin,normal_nucleoli,mitoses,class
1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2
1017122,8,10,10,8,7,10,9,7,1,4
1018099,1,1,1,1,2,10,3,1,1,2
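The ndarrays that pandas produces can be fed to placeholders directly through feed_dict. Converting the columns to numbers up front still helps, because a column that contains '?' is read as strings, and replace('?', -99999) leaves it as an object-dtype column. A minimal sketch, assuming the column layout shown above: it parses '?' as missing while reading (via read_csv's na_values), substitutes the same -99999 outlier, and returns float32 arrays shaped for the x_training and y_training placeholders. load_features_and_labels is a hypothetical helper, not part of pandas or TensorFlow.

```python
import numpy as np
import pandas as pd


def load_features_and_labels(csv_source):
    """Read the UCI breast-cancer CSV into arrays shaped for the placeholders.

    csv_source: a path or file-like object whose header matches the sample.
    Returns float32 features of shape (n_rows, 9) and labels of shape (n_rows, 1).
    """
    # na_values="?" turns the dataset's '?' markers into NaN while parsing,
    # so every feature column comes out numeric instead of object/str.
    df = pd.read_csv(csv_source, na_values="?")
    # Same outlier substitution as the question's replace('?', -99999).
    features = df.drop(columns=["id", "class"]).fillna(-99999)
    x = features.to_numpy(dtype=np.float32)
    y = df["class"].to_numpy(dtype=np.float32).reshape(-1, 1)
    return x, y


# Usage with the file from the question:
# X, Y = load_features_and_labels("./data/breast-cancer-wisconsin.csv")
```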

You are redefining values, which is initially the top_k tensor:

values, indices = tf.nn.top_k(eucli_distance, k=Neighbors, sorted=False)

...and then rebinding it to the evaluated result, which is an np.ndarray:

values = sess.run(values, feed_dict={...})

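So on the second loop iteration, TensorFlow can't work out what sess.run(values) is supposed to fetch: values is no longer a tensor but the array computed in the previous iteration. Just pick a different variable name for the result.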
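To see why the first iteration succeeds and only the second fails, here is a TensorFlow-free sketch of the same pattern. GraphHandle and run are hypothetical stand-ins for a tensor and Session.run, not TensorFlow APIs; they model only the rule that every fetch must be a graph object, never an already computed array.

```python
import numpy as np


class GraphHandle:
    """Hypothetical stand-in for a tf.Tensor: a recipe evaluated only when run."""

    def __init__(self, compute):
        self.compute = compute


def run(fetch, feed):
    """Hypothetical stand-in for Session.run: fetches must be graph handles."""
    if not isinstance(fetch, GraphHandle):
        raise TypeError("Can not convert a %s into a Tensor or Operation."
                        % type(fetch).__name__)
    return fetch.compute(feed)


def loop_with_rebinding(values, rows):
    """Mirrors the question: the fetched result is stored back into `values`."""
    for row in rows:
        values = run(values, row)  # from the 2nd row on, `values` is an ndarray
    return values


def loop_with_new_name(values, rows):
    """The fix: keep the handle in `values`, put each result in `values_out`."""
    outputs = []
    for row in rows:
        values_out = run(values, row)
        outputs.append(values_out)
    return outputs


# Plays the role of the top_k `values` tensor: the two largest entries of a row.
top2 = GraphHandle(lambda row: np.sort(row)[-2:])
rows = [np.array([3.0, 1.0, 2.0]), np.array([5.0, 4.0, 6.0])]

try:
    loop_with_rebinding(top2, rows)  # raises TypeError on the second row
except TypeError as err:
    print("rebinding:", err)
print("new name:", loop_with_new_name(top2, rows))
```

Applied to the question's loop, that means assigning the result of sess.run(values, ...) to a new name (top_values, say; any unused name works) and printing that instead, so values keeps referring to the top_k tensor on every iteration.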
