AttributeError: module 'pandas' has no attribute 'to_csv'



I extracted some rows from a CSV file like this:

pd.DataFrame(CV_data.take(5), columns=CV_data.columns) 

and performed some operations on them. Now I want to save the result as CSV again, but I get the error module 'pandas' has no attribute 'to_csv'. I am trying to save it like this:

pd.to_csv(CV_data, sep='\t', encoding='utf-8')

Here is my complete code. How can I save the resulting data as CSV or Excel?

# Disable warnings, set Matplotlib inline plotting and load Pandas package
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
import pandas as pd
pd.options.display.mpl_style = 'default' 
CV_data = sqlContext.read.load('Downloads/data/churn-bigml-80.csv', 
                          format='com.databricks.spark.csv', 
                          header='true', 
                          inferSchema='true')
final_test_data = sqlContext.read.load('Downloads/data/churn-bigml-20.csv', 
                          format='com.databricks.spark.csv', 
                          header='true', 
                          inferSchema='true')
CV_data.cache()
CV_data.printSchema() 
pd.DataFrame(CV_data.take(5), columns=CV_data.columns) 
from pyspark.sql.types import DoubleType
from pyspark.sql.functions import UserDefinedFunction
binary_map = {'Yes':1.0, 'No':0.0, True:1.0, False:0.0} 
toNum = UserDefinedFunction(lambda k: binary_map[k], DoubleType())
CV_data = CV_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(CV_data['Churn'])) \
    .withColumn('International plan', toNum(CV_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(CV_data['Voice mail plan'])).cache()
final_test_data = final_test_data.drop('State').drop('Area code') \
    .drop('Total day charge').drop('Total eve charge') \
    .drop('Total night charge').drop('Total intl charge') \
    .withColumn('Churn', toNum(final_test_data['Churn'])) \
    .withColumn('International plan', toNum(final_test_data['International plan'])) \
    .withColumn('Voice mail plan', toNum(final_test_data['Voice mail plan'])).cache()
pd.DataFrame(CV_data.take(5), columns=CV_data.columns) 
from pyspark.mllib.regression import LabeledPoint
from pyspark.mllib.tree import DecisionTree
def labelData(data):
    # label: row[end], features: row[0:end-1]
    return data.map(lambda row: LabeledPoint(row[-1], row[:-1]))
training_data, testing_data = labelData(CV_data).randomSplit([0.8, 0.2])
model = DecisionTree.trainClassifier(training_data, numClasses=2, maxDepth=2,
                                     categoricalFeaturesInfo={1:2, 2:2},
                                     impurity='gini', maxBins=32)
print (model.toDebugString())  
print ('Feature 12:', CV_data.columns[12])
print ('Feature 4: ', CV_data.columns[4] ) 
from pyspark.mllib.evaluation import MulticlassMetrics
def getPredictionsLabels(model, test_data):
    predictions = model.predict(test_data.map(lambda r: r.features))
    return predictions.zip(test_data.map(lambda r: r.label))
def printMetrics(predictions_and_labels):
    metrics = MulticlassMetrics(predictions_and_labels)
    print ('Precision of True ', metrics.precision(1))
    print ('Precision of False', metrics.precision(0))
    print ('Recall of True    ', metrics.recall(1))
    print ('Recall of False   ', metrics.recall(0))
    print ('F-1 Score         ', metrics.fMeasure())
    print ('Confusion Matrix\n', metrics.confusionMatrix().toArray())
predictions_and_labels = getPredictionsLabels(model, testing_data)
printMetrics(predictions_and_labels)  
CV_data.groupby('Churn').count().toPandas() 
stratified_CV_data = CV_data.sampleBy('Churn', fractions={0: 388./2278, 1: 1.0}).cache()
stratified_CV_data.groupby('Churn').count().toPandas() 
pd.to_csv(CV_data, sep='\t', encoding='utf-8')

to_csv is a method of a DataFrame object, not a function of the pandas module.

df = pd.DataFrame(CV_data.take(5), columns=CV_data.columns)
# whatever manipulations on df
df.to_csv(...)
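A minimal sketch contrasting the two calls side by side. The rows and column names are made up for illustration (they stand in for CV_data.take(5)), and the output filename churn_sample.tsv is hypothetical; the file goes to the system temp directory so the example runs anywhere.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows standing in for CV_data.take(5)
rows = [("KS", 128, "No"), ("OH", 107, "Yes")]
df = pd.DataFrame(rows, columns=["State", "Account length", "Churn"])

# Module-level call: pandas has no to_csv function, so this raises AttributeError
error_message = None
try:
    pd.to_csv(df, sep="\t")
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Method call on the DataFrame instance: this works
out_path = os.path.join(tempfile.gettempdir(), "churn_sample.tsv")
df.to_csv(out_path, sep="\t", encoding="utf-8", index=False)
print(pd.read_csv(out_path, sep="\t"))
```

Note that the separator for a tab-separated file is "\t"; a plain "t" would split on the letter t.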

You also have this line in your code: pd.DataFrame(CV_data.take(5), columns=CV_data.columns)

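This line creates a DataFrame and then throws it away. Even if your to_csv call had succeeded, none of your changes to CV_data would have been reflected in that DataFrame (and therefore not in the output CSV file either).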
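To make the point concrete, here is a minimal sketch that keeps the DataFrame in a variable, applies the same Yes/No and True/False to 1.0/0.0 mapping that the question's binary_map UDF performs, and only then writes the file. It is done in pandas rather than Spark, assuming the data fits in memory; the sample rows and the filename churn_clean.csv are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Same mapping as the question's binary_map
binary_map = {'Yes': 1.0, 'No': 0.0, True: 1.0, False: 0.0}

# Hypothetical sample rows; the DataFrame is kept in df instead of being discarded
rows = [("KS", "No", "Yes", False), ("OH", "No", "No", True)]
df = pd.DataFrame(rows, columns=["State", "International plan", "Voice mail plan", "Churn"])

# All manipulations are applied to df itself
df = df.drop(columns=["State"])
for col in ["International plan", "Voice mail plan", "Churn"]:
    df[col] = df[col].map(lambda k: binary_map[k])

# The written file now reflects those changes
out_path = os.path.join(tempfile.gettempdir(), "churn_clean.csv")
df.to_csv(out_path, index=False)
print(pd.read_csv(out_path))
```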

Solution: you should write df.to_csv instead of pd.to_csv.

Justification: to_csv is a method on an object that is a DataFrame (df), while pd is the pandas module.

That is why your code does not work and throws this error: "AttributeError: module 'pandas' has no attribute 'to_csv'"
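A quick check, using only pandas, of where these functions live. One caveat: CV_data in the question is a Spark DataFrame, not a pandas one, so it would first need converting (for example with its toPandas() method, which the question's code already calls and which pulls all rows onto the driver) before df.to_csv applies. The check also shows that to_excel, which covers the "or Excel" part of the question, is likewise a DataFrame method; actually writing an .xlsx file additionally requires an engine such as openpyxl to be installed.

```python
import pandas as pd

# Writers are methods on DataFrame (and Series), not functions on the module
print(hasattr(pd, "to_csv"))              # False
print(hasattr(pd.DataFrame, "to_csv"))    # True
print(hasattr(pd.Series, "to_csv"))       # True
print(hasattr(pd.DataFrame, "to_excel"))  # True

# Readers, by contrast, are module-level functions
print(hasattr(pd, "read_csv"))            # True
```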

This will do the job!

import pandas as pd

# Create a DataFrame:
new_df = pd.DataFrame({'id': [1,2,3,4,5], 'LETTERS': ['A','B','C','D','E'], 'letters': ['a','b','c','d','e']})
# Save it as CSV in your folder (raw string, so the backslashes are not treated as escape sequences):
new_df.to_csv(r'C:\Users\You\Desktop\new_df.csv')
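A variant of the same idea that avoids hard-coding a Windows path: pathlib builds the output path portably. The target folder is an assumption; this sketch writes to a fresh temporary directory so it runs on any OS. Passing index=False drops the pandas row index, which is otherwise written as an unnamed first column.

```python
import tempfile
from pathlib import Path

import pandas as pd

new_df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                       'LETTERS': ['A', 'B', 'C', 'D', 'E'],
                       'letters': ['a', 'b', 'c', 'd', 'e']})

# Build the path with pathlib; a temporary directory stands in for your own folder
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "new_df.csv"

# index=False omits the row index column
new_df.to_csv(out_path, index=False)
print(out_path.read_text().splitlines()[0])  # id,LETTERS,letters
```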
