Postgresql k-nearest neighbor (KNN) on multidimensional cube -
i have cube has 8 dimensions. want nearest neighbor matching. i'm totally new postgresql. read 9.1 supports nearest neighbor matching on multidimensions. i'd appreciate if give complete example:
how create table 8d cube ?
sample insert
lookup - exact matching
lookup - nearest neighbor matching
sample data:
for simplicity sake, can assume values range 0-100.
point1: (1,1,1,1, 1,1,1,1)
point2: (2,2,2,2, 2,2,2,2)
look value: (1,1,1,1, 1,1,1,2)
this should match against point1 , not point2.
refs:
https://en.wikipedia.org/wiki/k-d_tree#nearest_neighbour_search
postgresql supports distance operator <->
, understand it, can used analyzing text (with pg_trgrm module) , geometry data type.
i not know how can use more 1 dimension. maybe have define own distance function or somehow convert data 1 column text or geometry type. example if have table 8 columns (8-dimensional cube):
c1 c2 c3 c4 c5 c6 c7 c8 1 0 1 0 1 0 1 2
you can convert to:
c1 c2 c3 c4 c5 c6 c7 c8 b b b c
and table 1 column:
c1 abababac
then can use (after creating gist
index):
select c1, c1 <-> 'ababab' test_trgm order c1 <-> 'ababab';
example
create sample data
-- create temporary data -- ! note table created in tmp schema (change sql scheme) , deleted if exists ! drop table if exists tmp.test_data; -- random integer matrix 100*8 create table tmp.test_data ( select trunc(random()*100)::int input_variable_1, trunc(random()*100)::int input_variable_2, trunc(random()*100)::int input_variable_3, trunc(random()*100)::int input_variable_4, trunc(random()*100)::int input_variable_5, trunc(random()*100)::int input_variable_6, trunc(random()*100)::int input_variable_7, trunc(random()*100)::int input_variable_8 generate_series(1,100,1) );
transform input data text
drop table if exists tmp.test_data_trans; create table tmp.test_data_trans ( select input_variable_1 || ';' || input_variable_2 || ';' || input_variable_3 || ';' || input_variable_4 || ';' || input_variable_5 || ';' || input_variable_6 || ';' || input_variable_7 || ';' || input_variable_8 trans_variable tmp.test_data );
this give 1 variable trans_variable
8 dimensions stored:
trans_variable 40;88;68;29;19;54;40;90 80;49;56;57;42;36;50;68 29;13;63;33;0;18;52;77 44;68;18;81;28;24;20;89 80;62;20;49;4;87;54;18 35;37;32;25;8;13;42;54 8;58;3;42;37;1;41;49 70;1;28;18;47;78;8;17
instead of ||
operator can use following syntax (shorter, more cryptic):
select array_to_string(string_to_array(t.*::text,''),'') trans_variable tmp.test_data t
add index
create index test_data_gist_index on tmp.test_data_trans using gist(trans_variable);
test distance note: i've selected 1 row table - 52;42;18;50;68;29;8;55
- , used changed value (42;42;18;52;98;29;8;55
) test distance. of course, have different values in test data, because random matrix.
select *, trans_variable <-> '42;42;18;52;98;29;8;55' distance, similarity(trans_variable, '42;42;18;52;98;29;8;55') similarity, tmp.test_data_trans order trans_variable <-> '52;42;18;50;68;29;8;55';
you can use distance operator <-> or similiarity function. distance = 1 - similarity
Comments
Post a Comment