java - Multivariate Gaussian classifier implementation: trouble understanding, going from naive Gaussian


Thank you for checking this question out. I'm trying to understand how to use a multivariate Gaussian classifier.

To introduce the problem better, I'll show how I classify the data at the moment.

I have a library of these objects:

    public class accfeat {
        int id;
        double[] mean = new double[3];
        double[] sd = new double[3];
        double[] avpeakdistance = new double[3];
        int[][] histogram = new int[3][10];
        int[][] ffthistogram = new int[3][10];
        int[] crossingcount = new int[3];
        double resultantacc;
        int type;
    }

and an object whose type field indicates that it has not been identified yet.

Procedure:

  1. Load the library of 30 training samples per class.

  2. Calculate the mean and variance of each feature for each sample class, and store these values in arrays: one array of 73 mean/variance pairs per class (there are 73 features in total, including 6 histograms of 10 bins each, i.e. 60 numbers).

  3. Create an array of 73 values that correspond to the features of the accfeat object to be identified.

  4. Calculate the probability using what I understand to be a naive Bayesian classifier.

We check classes 0 to 8, because there are 9 sample classes.

    for (int i = 0; i < 9; i++) {
        result = 1;
        for (int j = 0; j < samplefeatures.get(i).size(); j++) {
            // p(): the first argument is the value of the feature,
            // the second is the mean-variance pair of that feature in class i.
            result = result * p(queryfeatures.get(j), samplefeatures.get(i).get(j));
        }
        results[i] = result;
    }

The p(x) function is this:

  1. http://i.stack.imgur.com/1iws5.jpg

Then I have 9 probability values, one for each class, and the classifier picks the class that corresponds to the highest probability value.
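A product of 73 densities can underflow to zero in double precision, so a common variation on the loop above is to sum log-densities instead. The following is only a sketch of that idea: it assumes that p is the univariate Gaussian density from the linked image, and the class name, method names and array layout are hypothetical, not taken from the original program.

```java
// Sketch only: assumes p(x) is the univariate Gaussian density and that each
// class is described by per-feature means and variances (hypothetical layout).
final class NaiveGaussianSketch {

    // Log of the univariate Gaussian density N(x; mean, variance).
    static double logGaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance)
                - diff * diff / (2.0 * variance);
    }

    // Sum of the per-feature log densities of the query for one class.
    static double logScore(double[] query, double[] means, double[] variances) {
        double sum = 0.0;
        for (int j = 0; j < query.length; j++) {
            sum += logGaussian(query[j], means[j], variances[j]);
        }
        return sum;
    }

    // Index of the class with the highest log score.
    // means[c][j] and variances[c][j] describe feature j in class c.
    static int classify(double[] query, double[][] means, double[][] variances) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < means.length; c++) {
            double score = logScore(query, means[c], variances[c]);
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```

Because the logarithm is monotonic, the class with the highest log score is also the class with the highest product. A feature whose variance is exactly zero within a class (an always-empty histogram bin, for instance) would make the division fail, so such a sketch would need a small variance floor in practice.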


Now I want to create a multivariate Gaussian classifier.

This is the formula used to calculate the probability in that case:

  1. http://i.stack.imgur.com/jniwt.jpg

  2. So I create a variance-covariance matrix for each of the 9 classes. Here I'm not sure if I'm doing it right: I take all 73 features, which, again, include 6 histograms of 10 bins each, so 60 of these features are histograms of acceleration frequencies and acceleration values.

This I find a bit dodgy: should I put all of these values into one matrix? Calculating the covariance between the frequencies of accelerations in the range 10-20 on the x-axis and the peak distance of the y accelerations seems a bit... odd.

But I do it anyway, and create a 73x73 matrix for each class using this formula for each cell:

cov(featureA, featureB) = sum_i( (featureA[i] - mean_featureA) * (featureB[i] - mean_featureB) ) / (n - 1)
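The per-cell formula can be sketched in Java as follows. This is a minimal illustration, not the original program: the class name, the method names and the layout `samples[i][k]` (feature k of training sample i, for a single class) are assumptions.

```java
// Sketch only: sample covariance matrix of one class, dividing by (n - 1).
final class CovarianceSketch {

    // Mean of each feature over all samples of the class.
    static double[] meanVector(double[][] samples) {
        int n = samples.length;
        int d = samples[0].length;
        double[] mean = new double[d];
        for (double[] sample : samples) {
            for (int k = 0; k < d; k++) {
                mean[k] += sample[k] / n;
            }
        }
        return mean;
    }

    // cov[a][b] = sum_i (x_ia - mean_a) * (x_ib - mean_b) / (n - 1)
    static double[][] sampleCovariance(double[][] samples) {
        int n = samples.length;
        int d = samples[0].length;
        double[] mean = meanVector(samples);
        double[][] cov = new double[d][d];
        for (int a = 0; a < d; a++) {
            for (int b = a; b < d; b++) {
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    sum += (samples[i][a] - mean[a]) * (samples[i][b] - mean[b]);
                }
                cov[a][b] = sum / (n - 1);
                cov[b][a] = cov[a][b]; // the matrix is symmetric
            }
        }
        return cov;
    }
}
```

Note the parentheses around `n - 1`: written as `sum / n-1`, Java would divide by n and then subtract 1. The diagonal entries are the per-feature variances, so they can never be negative; off-diagonal entries can have either sign.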

  3. The next thing I need is the mean vector, so I create a 73-element vector of the means of each feature for each group of samples associated with each class, for a total of 9 vectors.

  4. From what I understand, x in the formula is, in the case of my program, the 73-element vector of feature values of the unidentified accfeat object.

  5. So to implement the formula, I'm thinking: I have to run it using the covariance matrices and means of each class, and the one with the highest outcome is the candidate for identification.
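The evaluation step can be sketched as below, under the assumption that the linked formula is the standard multivariate normal density. Instead of computing a determinant and an inverse directly, this sketch works with the log-density and a Cholesky factorisation, which also detects a covariance matrix that is not positive definite. All names are hypothetical.

```java
// Sketch only: log of the multivariate normal density via Cholesky factorisation.
final class MvnSketch {

    // Lower-triangular L with cov = L * L^T; throws if cov is not positive definite.
    static double[][] cholesky(double[][] cov) {
        int d = cov.length;
        double[][] l = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = cov[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    if (sum <= 0.0) {
                        throw new IllegalArgumentException(
                                "covariance matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        return l;
    }

    // log N(x; mean, cov) = -0.5 * (d*log(2*pi) + log|cov| + (x-mean)^T cov^-1 (x-mean))
    static double logDensity(double[] x, double[] mean, double[][] cov) {
        int d = x.length;
        double[][] l = cholesky(cov);
        double[] y = new double[d]; // solves L * y = (x - mean) by forward substitution
        double logDet = 0.0;
        double quad = 0.0;
        for (int i = 0; i < d; i++) {
            double sum = x[i] - mean[i];
            for (int k = 0; k < i; k++) {
                sum -= l[i][k] * y[k];
            }
            y[i] = sum / l[i][i];
            quad += y[i] * y[i];              // accumulates (x-mean)^T cov^-1 (x-mean)
            logDet += 2.0 * Math.log(l[i][i]); // log|cov| = 2 * sum of log(L_ii)
        }
        return -0.5 * (d * Math.log(2.0 * Math.PI) + logDet + quad);
    }
}
```

Calling `logDensity` once per class with that class's mean vector and covariance matrix, and picking the largest value, corresponds to the procedure described in the list. With a singular covariance matrix the call throws instead of returning a meaningless number.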

Problems:

  • The covariance matrix is full of negative values; only around 5% of them are positive, and when they are, they are extremely large.

  • The determinant of the matrix is, in these cases, extremely close to 0, or negative, which breaks the formula.

What is wrong with the way I'm using the classifier? Unfortunately I have nobody to help me with this, and I base all of my weak understanding on online lecture slides...

Can anyone help me with how to use it?

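I haven't thoroughly checked whether what you're doing is valid, but I can explain the error you're getting. You don't have enough data points to estimate the covariance matrix, so the maximum likelihood estimate is singular. With 30 samples per class and 73 features, the sample covariance matrix of a class has rank at most 29, so its determinant is zero up to rounding error, which is why you see values extremely close to 0 or slightly negative.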
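The rank argument can be checked on a toy example. The sketch below, with hypothetical names and made-up data, builds the sample covariance of 3 samples with 4 features and counts its rank by Gaussian elimination with a small tolerance; a covariance matrix estimated from n samples cannot have rank above n - 1.

```java
// Sketch only: shows that few samples and many features give a singular covariance.
final class RankSketch {

    // Sample covariance (dividing by n - 1) of samples[i][k].
    static double[][] sampleCovariance(double[][] samples) {
        int n = samples.length;
        int d = samples[0].length;
        double[] mean = new double[d];
        for (double[] s : samples) {
            for (int k = 0; k < d; k++) {
                mean[k] += s[k] / n;
            }
        }
        double[][] cov = new double[d][d];
        for (double[] s : samples) {
            for (int a = 0; a < d; a++) {
                for (int b = 0; b < d; b++) {
                    cov[a][b] += (s[a] - mean[a]) * (s[b] - mean[b]) / (n - 1);
                }
            }
        }
        return cov;
    }

    // Numerical rank: number of pivots larger than tol in Gaussian elimination.
    static int rank(double[][] matrix, double tol) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        double[][] m = new double[rows][];
        for (int i = 0; i < rows; i++) {
            m[i] = matrix[i].clone(); // work on a copy
        }
        int rank = 0;
        for (int col = 0; col < cols && rank < rows; col++) {
            int pivot = rank;
            for (int r = rank + 1; r < rows; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(m[pivot][col]) <= tol) {
                continue; // no usable pivot in this column
            }
            double[] tmp = m[pivot];
            m[pivot] = m[rank];
            m[rank] = tmp;
            for (int r = rank + 1; r < rows; r++) {
                double factor = m[r][col] / m[rank][col];
                for (int c = col; c < cols; c++) {
                    m[r][c] -= factor * m[rank][c];
                }
            }
            rank++;
        }
        return rank;
    }

    public static void main(String[] args) {
        // 3 samples, 4 features (made-up numbers): the rank cannot exceed 3 - 1 = 2.
        double[][] samples = {{1, 2, 0, 1}, {2, 1, 3, 0}, {0, 4, 1, 2}};
        double[][] cov = sampleCovariance(samples);
        System.out.println("rank = " + rank(cov, 1e-9) + " of " + cov.length);
    }
}
```

The same reasoning applies to the 73x73 case: the matrix has 73 rows, but only 29 of them can be linearly independent when it is estimated from 30 samples.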

One option is to move to a variant of linear discriminant analysis, where you use a pooled covariance matrix across all classes. This may give you a non-singular covariance matrix, if you have enough data when pooled across classes. The other option is naive Bayes, which you already have, where you have a diagonal covariance matrix.
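A pooled covariance matrix can be sketched as follows, assuming the usual estimate in which the within-class scatter of every class is summed and divided by N - C (total number of samples minus number of classes). The class name and the layout `samplesByClass[c][i][k]` are hypothetical.

```java
// Sketch only: pooled within-class covariance, as used in linear discriminant analysis.
final class PooledCovarianceSketch {

    // samplesByClass[c][i][k] = feature k of training sample i in class c.
    static double[][] pooledCovariance(double[][][] samplesByClass) {
        int d = samplesByClass[0][0].length;
        double[][] pooled = new double[d][d];
        int total = 0;
        for (double[][] samples : samplesByClass) {
            int n = samples.length;
            total += n;
            // Each class is centred on its own mean before accumulating scatter.
            double[] mean = new double[d];
            for (double[] s : samples) {
                for (int k = 0; k < d; k++) {
                    mean[k] += s[k] / n;
                }
            }
            for (double[] s : samples) {
                for (int a = 0; a < d; a++) {
                    for (int b = 0; b < d; b++) {
                        pooled[a][b] += (s[a] - mean[a]) * (s[b] - mean[b]);
                    }
                }
            }
        }
        int degreesOfFreedom = total - samplesByClass.length; // N - C
        for (int a = 0; a < d; a++) {
            for (int b = 0; b < d; b++) {
                pooled[a][b] /= degreesOfFreedom;
            }
        }
        return pooled;
    }
}
```

With 9 classes of 30 samples each, the pooled estimate has 270 - 9 = 261 degrees of freedom, which is more than the 73 features, so it can be non-singular as long as no feature is an exact linear combination of the others. The same matrix is then used for every class, with only the mean vector changing.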

Otherwise, is there a reason you're looking to use an MVN classifier? You could use something else, for example an SVM with kernels designed to compare histograms.

