r - find all disjoint (non-overlapping) sets from a set of sets -

February 15, 2012

my problem: need find disjoint (non-overlapping) sets set of sets.

background: using comparative phylogenetic methods study trait evolution in birds. have tree ~300 species. tree can divided subclades (i.e. subtrees). if 2 subclades not share species, independent. i'm looking algorithm (and r implementation if possible) find possible subclade partitions each subclade has greater 10 taxa , independent. each subclade can considered set , when 2 subclades independent (do not share species) these subclades disjoint sets.

hope clear , can help.

cheers, glenn

the following code produces example dataset. subclades list of possible subclades (sets) i'd sample x disjoint sets, length of set y.

################################### # example dataset ###################################      library(ape)     library(phangorn)     library(treesim)     library(phytools)      ##simulate tree      n.taxa <- 300     tree <- sim.bd.taxa(n.taxa,1,lambda=.5,mu=0)[[1]][[1]]     tree$tip.label <- seq(n.taxa)      ##extract monophyletic subclades      get.all.subclades <- function(tree){     tmp <- vector("list")     nodes <- sort(unique(tree$edge[,1]))     <- 282     for(i in 1:length(nodes)){     x <- descendants(tree,nodes[i],type="tips")[[1]]     tmp[[i]] <- tree$tip.label[x]     }     tmp     }     tmp <- get.all.subclades(tree)      ##set bounds on maximum , mininum number of tips of subclades include      min.subclade.n.tip <- 10     max.subclade.n.tip <- 40       ##function replace trees of tip length exceeding max , min na      replace.trees <- function(x, min, max){     if(length(x) >= min & length(x)<= max) x else na     }       #apply testntip across subclades      tmp2 <- lapply(tmp, replace.trees, min = min.subclade.n.tip, max = max.subclade.n.tip)      ##remove elements list na,      ##all remaining elements subclades number of tips between  ##min.subclade.n.tip , max.subclade.n.tip      subclades <- tmp2[!is.na(tmp2)]      names(subclades) <- seq(length(subclades))

here's example of how might test each pair of list elements 0 overlap, extracting indices of non-overlapping pairs.

finddisjointpairs <- function(x) {     ## form 2-column matrix enumerating pairwise combos of x's elements     ij <- t(combn(length(x),2))         ## function tests 0 overlap between pair of vectors     aredisjoint <- function(i, j) length(intersect(x[[i]], x[[j]])) == 0          ## use mapply test overlap between each pair , extract indices      ## of pairs no matches     ij[mapply(aredisjoint, ij[,1], ij[,2]),] }  ## make reproducible data , test function on set.seed(1) <- replicate(sample(letters, 5), n=5, simplify=false)     finddisjointpairs(a) #      [,1] [,2] # [1,]    1    2 # [2,]    1    4 # [3,]    1    5

Search This Blog

KHS

r - find all disjoint (non-overlapping) sets from a set of sets -

Comments

Post a Comment

Popular posts from this blog

blackberry 10 - how to add multiple markers on the google map just by url? -

php - guestbook returning database data to flash -

java - Using an Integer ArrayList in Android -