ValueError: setting an array element with a sequence (LogisticRegression with an array-based feature)



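Thanks in advance for any guidance. I am trying to use scikit-learn to classify with logistic regression, where X is discrete and one of the fields is a series of heart-rate data called heartrate. After researching others who have hit this same error, I made sure the arrays I feed in all have the same shape/size.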

The ValueError is raised in check_array in sklearn/utils/validation.py, line 382, on the line array = np.array(array, dtype=dtype, order=order, copy=copy). I suspect my arrays are not contiguous in memory and that this is what causes the problem, but I'm not sure...
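
One way to test the contiguity theory is to reproduce the conversion by hand. The sketch below uses a small, hypothetical stand-in for X (an object array whose second column holds whole heart-rate arrays) and attempts roughly the same float64 conversion that check_array performs; the helper name to_float64_error is made up for this illustration. If it behaves as expected, the array is C-contiguous and the conversion still fails, which suggests the nested arrays, not the memory layout, are the cause.

```python
import numpy as np

# Hypothetical stand-in for the question's X: column 0 is the intercept,
# column 1 holds an entire (padded) heart-rate array per row.
X_obj = np.empty((2, 2), dtype=object)
X_obj[0, 0], X_obj[0, 1] = 1, np.array([74, 74, 77, 0])
X_obj[1, 0], X_obj[1, 1] = 1, np.array([66, 67, 69, 0])


def to_float64_error(arr):
    """Try a float64 conversion like check_array's; return the error text or None."""
    try:
        np.array(arr, dtype=np.float64)
    except ValueError as exc:
        return str(exc)
    return None


print(X_obj.dtype)                    # object
print(X_obj.flags["C_CONTIGUOUS"])    # True: the memory layout is not the issue
print(to_float64_error(X_obj))        # the ValueError message about a sequence
```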

Here are some code snippets that may help diagnose the problem:

    def get_training_set(self):
        training_set = []
        after_date = datetime.utcnow() - timedelta(weeks=8)
        before_date = datetime.utcnow() - timedelta(hours=12)
        activities = self.strava_client.get_activities(after=after_date, before=before_date)
        for act in activities:
            if act.has_heartrate:
                streams = self.strava_client.get_activity_streams(activity_id=act.id, types=['heartrate'])
                heartrate = np.array(list(filter(lambda x: x is not None, streams['heartrate'].data)))
                fixed_heartrate = np.pad(heartrate, (0, 15000 - len(heartrate)), 'constant')
                item = {'activity_type': self.classes.index(act.type),'heartrate': fixed_heartrate}
                training_set.append(item)
        return pd.DataFrame(training_set)
    def train(self):
        df = self.get_training_set()
        df['Intercept'] = np.ones((len(df),))
        y = df[['activity_type']]
        X = df[['Intercept', 'heartrate']]
        y = np.ravel(y)
        #
        model = LogisticRegression()
        self.debug('y={}'.format(y))
        model = model.fit(X,y)

The exception occurs in fit...
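
Separately from the exception in fit, the np.pad call in get_training_set assumes every stream has at most 15000 samples; np.pad rejects negative pad widths, so a longer activity would fail there instead. A hypothetical helper (fix_length is a made-up name, and the float64 conversion is an assumption) that truncates first and then pads might look like this:

```python
import numpy as np


def fix_length(values, length=15000):
    """Hypothetical helper: zero-pad or truncate a 1-D stream to exactly `length` samples."""
    arr = np.asarray(values, dtype=np.float64)[:length]      # drop samples past `length`
    return np.pad(arr, (0, length - len(arr)), "constant")   # pad width is never negative now


print(fix_length([74, 75, 76], length=5).tolist())   # [74.0, 75.0, 76.0, 0.0, 0.0]
print(fix_length(range(10), length=4).tolist())      # [0.0, 1.0, 2.0, 3.0]
```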

Thanks in advance for any guidance.

Regards,

Mike

Copied from the comments to improve formatting:

    /python3.5/site-packages/sklearn/linear_model/logistic.py", line 1173, in fit
        order="C")
    File "/python3.5/site-packages/sklearn/utils/validation.py", line 521, in check_X_y
        ensure_min_features, warn_on_dtype, estimator)
    File "/lib/python3.5/site-packages/sklearn/utils/validation.py", line 382, in check_array
        array = np.array(array, dtype=dtype, order=order, copy=copy)
    ValueError: setting an array element with a sequence

And another comment:

X and y look like this:

    X.shape=(29, 2)
    y.shape=(29,)
    X=[[1 array([74, 74, 77, ..., 0, 0, 0])]
       [1 array([66, 67, 69, ..., 0, 0, 0])]
       ...
       [1 array([92, 92, 91, ..., 0, 0, 0])]
       [1 array([79, 79, 79, ..., 0, 0, 0])]]
    y=[ 0 11 11 0 1 0 11 0 11 1 0 11 0 0 11 0 0 0 0 0 11 0 11 0 0 0 11 0 0]
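
The printout shows the shape of the problem: X is 29 × 2, with each heartrate cell holding a whole array, so that DataFrame column has object dtype and cannot be converted to a float matrix directly. A minimal sketch, assuming rows built like the question's training_set items (the short arrays here are made-up stand-ins for the 15000-sample streams), shows the object column and one way to stack it into a regular 2-D matrix with one column per sample:

```python
import numpy as np
import pandas as pd

# Made-up rows shaped like the question's training_set items.
training_set = [
    {"activity_type": 0, "heartrate": np.array([74, 74, 77, 0, 0])},
    {"activity_type": 11, "heartrate": np.array([66, 67, 69, 0, 0])},
]
df = pd.DataFrame(training_set)

print(df["heartrate"].dtype)   # object: each cell is a whole array

# np.vstack turns the list of equal-length arrays into an (n_rows, n_samples) matrix.
X = np.vstack(list(df["heartrate"])).astype(np.float64)
print(X.shape)                 # (2, 5)
```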

Does it work better if you change train() like this?

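    def train(self):
        df = self.get_training_set()
        df['Intercept'] = 1                       # (a)
        y = df['activity_type'].values            # (b)
        # Each row of .values is an (intercept, heartrate array) pair; prepend the
        # intercept to the array and stack the rows into a 2-D numeric matrix.
        X = np.vstack([np.concatenate((np.array([col1]), col2))
                       for col1, col2 in df[['Intercept', 'heartrate']].values])
        model = LogisticRegression()
        model.fit(X, y)                           # (c)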

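(a) assigning the scalar 1 broadcasts it into a column of the correct length
(b) .values returns a NumPy array instead of another DataFrame
(c) fit works in place, so there is no need to reassign model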
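
Putting these pieces together, a minimal end-to-end sketch might look like the following. It assumes synthetic heart-rate data (random values, not real Strava streams, with two made-up classes labelled 0 and 11 like the y printout) and drops the Intercept column, because LogisticRegression fits an intercept by default (fit_intercept=True). The max_iter value is an arbitrary choice to give the solver room to converge on unscaled inputs.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples = 50   # stand-in for the 15000-sample padded streams

# Synthetic training set: two made-up classes with different mean heart rates.
rows = []
for label, mean_hr in [(0, 120), (11, 160)] * 10:
    stream = rng.normal(mean_hr, 5, size=n_samples)
    rows.append({"activity_type": label, "heartrate": stream})
df = pd.DataFrame(rows)

# Stack the object column into a real (n_rows, n_samples) float matrix.
X = np.vstack(list(df["heartrate"]))
y = df["activity_type"].values

model = LogisticRegression(max_iter=1000)   # fit_intercept=True by default
model.fit(X, y)                             # fits in place (and also returns self)
print(model.classes_)                       # [ 0 11]
print(model.predict(X[:2]))
```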
