How to fix "AttributeError: 'float' object has no attribute 'lower'"



I'm getting an error with my code and can't figure out what to do next. Can anyone help me out?

# Importing the libraries
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
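from tensorflow.keras.layers import Dense, Embedding, LSTM, SpatialDropout1D, Bidirectional
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical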
import pickle
import re
# Importing the dataset
filename = "MoviePlots.csv"
data = pd.read_csv(filename, encoding= 'unicode_escape')
# Keeping only the necessary columns
data = data[['Plot']]
# Clean the data
data['Plot'] = data['Plot'].apply(lambda x: x.lower())
data['Plot'] = data['Plot'].apply((lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', x)))
# Create the tokenizer
tokenizer = Tokenizer(num_words=5000, split=" ")
tokenizer.fit_on_texts(data['Plot'].values)
# Save the tokenizer
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
# Create the sequences
X = tokenizer.texts_to_sequences(data['Plot'].values)
X = pad_sequences(X)
# Create the model
model = Sequential()
model.add(Embedding(5000, 256, input_length=X.shape[1]))
model.add(Bidirectional(LSTM(256, return_sequences=True, dropout=0.1, recurrent_dropout=0.1)))
model.add(LSTM(256, return_sequences=True, dropout=0.1, recurrent_dropout=0.1))
model.add(LSTM(256, dropout=0.1, recurrent_dropout=0.1))
model.add(Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dense(5000, activation='softmax'))
# Compile the model
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.01), metrics=['accuracy'])
# Train the model
model.fit(X, X, epochs=100, batch_size=128, verbose=1)
# Saving the model
model.save('visioniser.h5')

This is my code; a screenshot of the error is attached.

Can someone help me fix this? Please diagnose my code.

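The error seems to occur at data['Plot'] = data['Plot'].apply(lambda x: x.lower()). You are calling apply on a DataFrame column, and one of the values in that column is not a string, so it has no lower method.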
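To confirm this diagnosis before changing anything, it can help to list the rows whose Plot value is not a string. The sketch below uses a small made-up DataFrame as a stand-in for MoviePlots.csv; the sample values are assumptions, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real CSV: the second plot is missing.
data = pd.DataFrame({"Plot": ["A hero rises.", np.nan, "The END"]}, dtype=object)

# True where the value is a real Python string, False otherwise.
is_text = data["Plot"].map(lambda v: isinstance(v, str))

# Rows that would make x.lower() fail inside apply.
non_text_rows = data[~is_text]
print(non_text_rows)

# Number of missing values in the column.
print(data["Plot"].isna().sum())
```

If the first print shows any rows, those are the values that trigger the AttributeError.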

You can fix this by checking whether the value is actually a string:

data['Plot'] = data['Plot'].apply(lambda x: x.lower() if isinstance(x, str) else x)

Or, without a lambda function:

data['Plot'] = data['Plot'].str.lower()

pandas' str.lower does not raise on non-string values; it returns NaN for them.
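A quick way to see how the .str accessor treats a missing value is to run it on a tiny Series. The values below are made up for illustration and are not taken from the question's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two strings and one missing value.
plots = pd.Series(["A Hero Rises", np.nan, "THE END"], dtype=object)

# .str.lower() lowercases the strings and leaves the missing entry missing.
lowered = plots.str.lower()
print(lowered.tolist())

# The missing entry is still detectable afterwards.
print(lowered.isna().tolist())
```

Because the missing entry survives as NaN, it still has to be dropped or filled before the text is passed to a tokenizer.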

It looks like your Plot column contains some NaN values (which pandas treats as float), hence the error. Try converting the column to str with pandas.Series.astype before calling pandas.Series.apply:

data['Plot'] = data['Plot'].astype(str).apply(lambda x: x.lower())

Or use pandas.Series.str.lower directly:

data['Plot'] = data['Plot'].astype(str).str.lower()

The same goes for re.sub: you can use pandas.Series.replace instead:

data['Plot'] = data['Plot'].astype(str).replace(r'[^a-zA-Z0-9\s]', '', regex=True)
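Putting the pieces together, a minimal cleaning sketch might look like the following. It assumes that rows with a missing Plot can simply be dropped (the question does not say), and that the intent of the regular expression was to keep only letters, digits and whitespace, hence the character class [^a-zA-Z0-9\s]. The clean_plots helper and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_plots(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with missing plots dropped and text normalised."""
    # Assumption: a row without a plot is of no use for training, so drop it.
    out = df.dropna(subset=["Plot"]).copy()
    out["Plot"] = (
        out["Plot"]
        .astype(str)   # guard against stray non-string values such as numbers
        .str.lower()   # vectorised lowercase, no lambda needed
        .str.replace(r"[^a-zA-Z0-9\s]", "", regex=True)  # strip punctuation
    )
    return out.reset_index(drop=True)


# Made-up sample standing in for MoviePlots.csv.
sample = pd.DataFrame({"Plot": ["A Hero, Rises!", np.nan, "The END_2"]})
cleaned = clean_plots(sample)
print(cleaned["Plot"].tolist())
```

Dropping the missing rows first is a design choice: depending on the pandas version, calling astype(str) on a NaN may produce the literal text 'nan', which a tokenizer would then treat as an ordinary word. If the rows must be kept, filling them with an empty string via fillna('') is an alternative.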
