r - find all disjoint (non-overlapping) sets from a set of sets -
my problem: need find disjoint (non-overlapping) sets set of sets.
background: using comparative phylogenetic methods study trait evolution in birds. have tree ~300 species. tree can divided subclades (i.e. subtrees). if 2 subclades not share species, independent. i'm looking algorithm (and r implementation if possible) find possible subclade partitions each subclade has greater 10 taxa , independent. each subclade can considered set , when 2 subclades independent (do not share species) these subclades disjoint sets.
hope clear , can help.
cheers, glenn
the following code produces example dataset. subclades list of possible subclades (sets) i'd sample x disjoint sets, length of set y.
################################### # example dataset ################################### library(ape) library(phangorn) library(treesim) library(phytools) ##simulate tree n.taxa <- 300 tree <- sim.bd.taxa(n.taxa,1,lambda=.5,mu=0)[[1]][[1]] tree$tip.label <- seq(n.taxa) ##extract monophyletic subclades get.all.subclades <- function(tree){ tmp <- vector("list") nodes <- sort(unique(tree$edge[,1])) <- 282 for(i in 1:length(nodes)){ x <- descendants(tree,nodes[i],type="tips")[[1]] tmp[[i]] <- tree$tip.label[x] } tmp } tmp <- get.all.subclades(tree) ##set bounds on maximum , mininum number of tips of subclades include min.subclade.n.tip <- 10 max.subclade.n.tip <- 40 ##function replace trees of tip length exceeding max , min na replace.trees <- function(x, min, max){ if(length(x) >= min & length(x)<= max) x else na } #apply testntip across subclades tmp2 <- lapply(tmp, replace.trees, min = min.subclade.n.tip, max = max.subclade.n.tip) ##remove elements list na, ##all remaining elements subclades number of tips between ##min.subclade.n.tip , max.subclade.n.tip subclades <- tmp2[!is.na(tmp2)] names(subclades) <- seq(length(subclades))
here's example of how might test each pair of list elements 0 overlap, extracting indices of non-overlapping pairs.
finddisjointpairs <- function(x) { ## form 2-column matrix enumerating pairwise combos of x's elements ij <- t(combn(length(x),2)) ## function tests 0 overlap between pair of vectors aredisjoint <- function(i, j) length(intersect(x[[i]], x[[j]])) == 0 ## use mapply test overlap between each pair , extract indices ## of pairs no matches ij[mapply(aredisjoint, ij[,1], ij[,2]),] } ## make reproducible data , test function on set.seed(1) <- replicate(sample(letters, 5), n=5, simplify=false) finddisjointpairs(a) # [,1] [,2] # [1,] 1 2 # [2,] 1 4 # [3,] 1 5
Comments
Post a Comment