suffix tree - how to find all the possible longest common subsequence from the same position -
i trying find possible longest common subsequence same position of multiple fixed length strings (there 700 strings in total, each string have 25 alphabets ). longest common subsequence must contain @ least 3 alphabets , belong @ least 3 strings. if have:
string test1 = "abcdeug"; string test2 = "abxdopq"; string test3 = "abydnpq"; string test4 = "hzsdwpq"; i need answer be:
string[] answer = ["abd", "dpq"]; my 1 problem needs fast possible. trying find answer suffix tree, solution of suffix tree method ["ab","pq"].suffix tree can find continuous substring multiple strings.the common longest common subsequence algorithm cannot solve problem. have idea on how solve low time cost? thanks
i suggest cast known computational problem before try use algorithm sounds might want.
here suggestion: convert graph problem. each position in string create set of nodes (one each unique letter @ position amongst strings in collection... 700 nodes if 700 strings differ in same position). once have created nodes each position in string go through set of strings looking @ how 2 positions share more 3 equal connections. in example first @ position 1 , 2 , see 3 strings contain "a" in position 1 , "b" in position 2, add directed edge between node "a" in first set of nodes of graph , "b" in second group of nodes (continue doing pairs of positions , combinations of letters in 2 positions). each combination of positions until have added necessary links.
once have final graph, must longest path; recommend looking @ wikipedia article here: longest path. in our case have directed acyclic graph , can solve in linear time! preprocessing should quadratic in number of string positions since imagine alphabet of fixed size.
p.s: sent me email biclustering algorithm working on; not yet published available sometime year (fingers crossed). interest though :)
Comments
Post a Comment