Postgresql k-nearest neighbor (KNN) on multidimensional cube -

August 15, 2010

i have cube has 8 dimensions. want nearest neighbor matching. i'm totally new postgresql. read 9.1 supports nearest neighbor matching on multidimensions. i'd appreciate if give complete example:

how create table 8d cube ?
sample insert
lookup - exact matching
lookup - nearest neighbor matching

sample data:

for simplicity sake, can assume values range 0-100.

point1: (1,1,1,1, 1,1,1,1)

point2: (2,2,2,2, 2,2,2,2)

look value: (1,1,1,1, 1,1,1,2)

this should match against point1 , not point2.

refs:

what's_new_in_postgresql_9.1

https://en.wikipedia.org/wiki/k-d_tree#nearest_neighbour_search

postgresql supports distance operator <-> , understand it, can used analyzing text (with pg_trgrm module) , geometry data type.

i not know how can use more 1 dimension. maybe have define own distance function or somehow convert data 1 column text or geometry type. example if have table 8 columns (8-dimensional cube):

c1 c2 c3 c4 c5 c6 c7 c8  1  0  1  0  1  0  1  2

you can convert to:

c1 c2 c3 c4 c5 c6 c7 c8   b   b   b   c

and table 1 column:

c1 abababac

then can use (after creating gist index):

select c1, c1 <-> 'ababab'  test_trgm   order c1 <-> 'ababab';

example

create sample data

-- create temporary data -- ! note table created in tmp schema (change sql scheme) , deleted if exists ! drop table if exists tmp.test_data;  -- random integer matrix 100*8  create table tmp.test_data (    select        trunc(random()*100)::int input_variable_1,       trunc(random()*100)::int input_variable_2,        trunc(random()*100)::int input_variable_3,       trunc(random()*100)::int input_variable_4,        trunc(random()*100)::int input_variable_5,        trunc(random()*100)::int input_variable_6,        trunc(random()*100)::int input_variable_7,        trunc(random()*100)::int input_variable_8           generate_series(1,100,1) );

transform input data text

drop table if exists tmp.test_data_trans;  create table tmp.test_data_trans ( select     input_variable_1 || ';' ||    input_variable_2 || ';' ||    input_variable_3 || ';' ||    input_variable_4 || ';' ||    input_variable_5 || ';' ||    input_variable_6 || ';' ||    input_variable_7 || ';' ||    input_variable_8 trans_variable     tmp.test_data );

this give 1 variable trans_variable 8 dimensions stored:

trans_variable 40;88;68;29;19;54;40;90 80;49;56;57;42;36;50;68 29;13;63;33;0;18;52;77 44;68;18;81;28;24;20;89 80;62;20;49;4;87;54;18 35;37;32;25;8;13;42;54 8;58;3;42;37;1;41;49 70;1;28;18;47;78;8;17

instead of || operator can use following syntax (shorter, more cryptic):

select     array_to_string(string_to_array(t.*::text,''),'') trans_variable     tmp.test_data t

add index

create index test_data_gist_index on tmp.test_data_trans using gist(trans_variable);

test distance note: i've selected 1 row table - 52;42;18;50;68;29;8;55 - , used changed value (42;42;18;52;98;29;8;55) test distance. of course, have different values in test data, because random matrix.

select     *,     trans_variable <->  '42;42;18;52;98;29;8;55' distance,    similarity(trans_variable, '42;42;18;52;98;29;8;55') similarity,     tmp.test_data_trans  order    trans_variable <-> '52;42;18;50;68;29;8;55';

you can use distance operator <-> or similiarity function. distance = 1 - similarity

Search This Blog

KHS

Postgresql k-nearest neighbor (KNN) on multidimensional cube -

Comments

Post a Comment

Popular posts from this blog

blackberry 10 - how to add multiple markers on the google map just by url? -

php - guestbook returning database data to flash -

java - Using an Integer ArrayList in Android -