python - Grid search cross-validation on SVC probability output in scikit-learn
I'd like to run grid search cross-validation on the probability outputs of the SVC classifier. In particular, I'd like to minimize the negative log likelihood. From the documentation, it seems that GridSearchCV calls the predict() method of the estimator it is passed, and the predict() method of SVC returns class predictions, not probabilities (predict_proba() returns class probabilities).
1) Do I need to subclass SVC and give it a predict() method that returns probabilities rather than classes to accomplish my log likelihood cross-validation? I guess I then need to write my own score_func or loss_func?
2) Is cross-validating on negative log likelihood dumb? I'm doing it because the dataset is: a) imbalanced 5:1, and b) not at all separable, i.e. even the "worst" observations have a > 50% chance of being in the "good" class. (I will post this 2nd question on the stats Q&A as well.)
Yes, you would, on both counts: subclass SVC so that predict() returns probabilities, and write your own score function.
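    from sklearn.svm import SVC

    class ProbSVC(SVC):
        # predict() returns class probabilities; construct with probability=True
        def predict(self, X):
            return super(ProbSVC, self).predict_proba(X)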
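In more recent scikit-learn releases the subclass may not be needed, because GridSearchCV accepts a scoring argument and the built-in "neg_log_loss" scorer calls predict_proba() itself. A minimal sketch, assuming a version that ships sklearn.model_selection and that scorer. The synthetic data, the parameter grid and the fold count are made up for illustration and are not from the question:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic, roughly 5:1 imbalanced data standing in for the real dataset.
X, y = make_classification(
    n_samples=300,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    weights=[5 / 6, 1 / 6],
    random_state=0,
)

# probability=True is required for SVC to expose predict_proba().
svc = SVC(probability=True, random_state=0)

# Illustrative grid only; tune the ranges for the real problem.
param_grid = {"C": [0.1, 1.0, 10.0], "gamma": ["scale", 0.1]}

search = GridSearchCV(
    svc,
    param_grid,
    scoring="neg_log_loss",  # higher is better, so this is minus the log loss
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)  # mean negative log loss over the folds
```

Stratified folds are used here so that every fold keeps roughly the same class ratio, which seems sensible for imbalanced data but is a choice, not a requirement.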
I'm not sure this will work, since the majority class may still dominate the log-likelihood scores, and the final estimator might still not produce > .5 for positive samples of the minority class. I'm not sure, though, so please do post it on stats.
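To make the concern about the majority class concrete, here is a small sketch with made-up numbers (the two "models" are hypothetical probability vectors, not fitted estimators). It compares the summed negative log likelihood of a model that only ever predicts the class prior against one that pushes minority samples above .5 at some cost on the majority class:

```python
import math


def negative_log_likelihood(y_true, p_positive):
    """Sum of -log p(true class) over samples, for binary labels in {0, 1}."""
    total = 0.0
    for label, p in zip(y_true, p_positive):
        total -= math.log(p if label == 1 else 1.0 - p)
    return total


# Hypothetical 5:1 dataset: 50 majority (0) samples, 10 minority (1) samples.
y_true = [0] * 50 + [1] * 10

# Model A: always predicts the class prior, so no sample ever reaches 0.5.
model_a = [1 / 6] * 60

# Model B: lifts minority samples to 0.6 but also raises majority samples to 0.4.
model_b = [0.4] * 50 + [0.6] * 10

nll_a = negative_log_likelihood(y_true, model_a)
nll_b = negative_log_likelihood(y_true, model_b)
print(f"model A: {nll_a:.2f}, model B: {nll_b:.2f}")
```

With these numbers model A gets the lower (better) negative log likelihood even though it never assigns more than .5 to a minority sample, because the 50 majority samples outweigh the 10 minority ones in the sum.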