Error after reinstalling sklearn



I get the following error as soon as I update sklearn to a newer version, and I don't know why.

    Traceback (most recent call last):
      File "/Users/X/Courses/Project/SupportVectorMachine/main.py", line 95, in <module>
        y, x = dmatrices(formula, data=finalDataFrame, return_type='matrix')
      File "/Library/Python/2.7/site-packages/patsy/highlevel.py", line 297, in dmatrices
        NA_action, return_type)
      File "/Library/Python/2.7/site-packages/patsy/highlevel.py", line 156, in _do_highlevel_design
        return_type=return_type)
      File "/Library/Python/2.7/site-packages/patsy/build.py", line 947, in build_design_matrices
        value, is_NA = evaluator.eval(data, NA_action)
      File "/Library/Python/2.7/site-packages/patsy/build.py", line 85, in eval
        return result, NA_action.is_numerical_NA(result)
      File "/Library/Python/2.7/site-packages/patsy/missing.py", line 135, in is_numerical_NA
        mask |= np.isnan(arr)
    TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'

Here is the corresponding code. I reinstalled everything from NumPy to SciPy, but nothing helped.

    import numpy
    import pandas
    from patsy import dmatrices

    # Merge the two dataframes (users and tweets) on UserID
    finalDataFrame = pandas.merge(twitterDataFrame.reset_index(), twitterUserDataFrame.reset_index(), on=['UserID'], how='inner')
    finalDataFrame = finalDataFrame.drop_duplicates()
    finalDataFrame['FrequencyOfTweets'] = numpy.all(numpy.isfinite(finalDataFrame['FrequencyOfTweets']))
    # Model formula: ~ separates the response from the predictors, and C() marks a column as categorical.
    formula = 'Classifier ~ InReplyToStatusID + InReplyToUserID + RetweetCount + FavouriteCount + Hashtags + UserMentionID + URL + MediaURL + C(MediaType) + UserMentionID + C(PossiblySensitive) + C(Language) + TweetLength + Location + Description + UserAccountURL + Protected + FollowersCount + FriendsCount + ListedCount + UserAccountCreatedAt + FavouritesCount + GeoEnabled + StatusesCount + ProfileBackgroundImageURL + ProfileUseBackgroundImage + DefaultProfile + FrequencyOfTweets'
    # Build regression-friendly matrices: y holds the class labels, x holds the features,
    # with categorical columns expanded into one column per level.
    y, x = dmatrices(formula, data=finalDataFrame, return_type='matrix')
    # Select which features we would like to analyze
    X = numpy.asarray(x)

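I have found that this error sometimes shows up when `np.isnan` is called on an array that contains strings or other non-float values. Try converting your array with `arr.astype(float)` before passing it to `dmatrices`.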
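
To illustrate the point, here is a minimal sketch that reproduces the same `TypeError` without patsy. The array contents are made up for the example; the only assumption is that the offending column ends up with `object` dtype, which is what NumPy produces when numbers and strings are mixed.

```python
import numpy as np

# Made-up data: an object-dtype array mixing floats, a numeric string and NaN.
arr = np.array([1.0, "2.5", np.nan], dtype=object)

try:
    np.isnan(arr)
    isnan_failed = False
except TypeError:
    # np.isnan has no implementation for object dtype, hence the TypeError.
    isnan_failed = True

# After an explicit conversion to float, np.isnan works element-wise.
as_float = arr.astype(float)
nan_mask = np.isnan(as_float)
```

Note that `astype(float)` only succeeds if every value can be parsed as a number; a free-text column such as `Description` would raise a `ValueError` instead.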

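Also, your `FrequencyOfTweets` column is being set to all `False` or all `True`, because `numpy.all` returns a single boolean that is then assigned to every row.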
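
A small sketch of what happens, using a made-up four-row frame. Filtering with the element-wise mask is only one possible fix, and it assumes the intent was to drop rows with non-finite frequencies.

```python
import numpy as np
import pandas as pd

# Made-up data standing in for the real column.
df = pd.DataFrame({"FrequencyOfTweets": [0.5, np.inf, 2.0, np.nan]})

# numpy.all collapses the whole column into one boolean ...
all_finite = np.all(np.isfinite(df["FrequencyOfTweets"]))

# ... so assigning it back overwrites every row with that same value.
overwritten = df.assign(FrequencyOfTweets=all_finite)

# Possible alternative: keep the element-wise mask and use it to filter rows.
finite_mask = np.isfinite(df["FrequencyOfTweets"])
cleaned = df[finite_mask]
```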

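After spending a lot of time going through the code, I found that the problem was the formula I was passing, which asks the program to use all of the features listed below. The `UserAccountCreatedAt` column has dtype `datetime64[ns]`. For now I have taken it out of the formula and the error is gone, but I would like to know how best to convert it to numeric data so that I can actually pass it in. The cause is that categorical data is handled by the `C()` wrapped around some columns, as shown below, whereas a datetime column is treated as numeric by patsy.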

    formula = 'Classifier ~ UserAccountCreatedAt + InReplyToStatusID + InReplyToUserID + RetweetCount + FavouriteCount + Hashtags + UserMentionID + URL + MediaURL + C(MediaType) + UserMentionID + C(PossiblySensitive) + C(Language) + TweetLength + Location + Description + UserAccountURL + Protected + FollowersCount + FriendsCount + ListedCount + FavouritesCount + GeoEnabled + StatusesCount + ProfileBackgroundImageURL + ProfileUseBackgroundImage + DefaultProfile + FrequencyOfTweets'
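
One way the datetime column could be turned into a plain number is to express it as an account age in days. This is only a sketch: the sample dates, the reference timestamp and the new column name `AccountAgeDays` are all hypothetical, and whether account age is a useful feature depends on the model.

```python
import pandas as pd

# Made-up sample standing in for the real datetime64[ns] column.
df = pd.DataFrame({
    "UserAccountCreatedAt": pd.to_datetime(["2010-01-01", "2012-06-15", "2014-03-02"]),
})

# Hypothetical reference point; any fixed timestamp would work.
reference = pd.Timestamp("2015-01-01")

# The subtraction yields timedelta64[ns]; dividing by one day gives float days.
df["AccountAgeDays"] = (reference - df["UserAccountCreatedAt"]) / pd.Timedelta(days=1)
```

The formula would then use `AccountAgeDays` in place of `UserAccountCreatedAt`, so that patsy only ever sees a float column.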
