Efficiently returning a field of all query hits in Lucene


I have a large Lucene index, and queries can hit 5000 documents or so. I am storing application metadata in a field in Lucene (apart from the text contents), and I need this small metadata field for all 5000 hits. Currently, my code looks like this:

    // Load only the "metadata" field instead of the whole stored document
    MapFieldSelector field = new MapFieldSelector("metadata");
    ScoreDoc[] hits = searcher.search(query, null, 10000).scoreDocs;
    for (int i = 0; i < hits.length; i++) {
        int index_doc_id = hits[i].doc;
        Document hitDoc = searcher.doc(index_doc_id, field); // expensive, esp. for a disk-based Lucene index
        String metadata = hitDoc.getFieldable("metadata").stringValue();
    }

However, this is terribly slow because each call to searcher.doc() is pretty expensive. Is there a way to "batch" fetch the field for all hits that may be more responsive? Or any other way to make this work faster? (The only thing inside ScoreDoc appears to be the Lucene doc ID, which I understand should not be relied upon. Otherwise I would have maintained a Lucene doc ID -> metadata map on my own.) Thanks!
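One commonly used mitigation, not mentioned in the question and offered here only as a hedged sketch, is to fetch the field for the hits in increasing doc ID order rather than in score order. Stored fields within a Lucene segment are written in doc ID order, so visiting hits that way tends to turn scattered disk reads into more sequential ones. The sketch below is plain Java with no Lucene dependency: Hit is a hypothetical stand-in for ScoreDoc, and the fetchMetadata function is a hypothetical placeholder for the expensive searcher.doc(...) call. It only shows the reordering and how to put the results back into rank order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntFunction;

/** Hypothetical stand-in for Lucene's ScoreDoc: a doc ID plus its score. */
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class DocOrderFetch {
    /**
     * Fetches one value per hit, visiting the hits in ascending doc ID order,
     * but returns the values aligned with the original (rank-ordered) hits array.
     * fetchMetadata is a hypothetical placeholder for searcher.doc(docId, selector).
     */
    static String[] fetchInDocOrder(Hit[] hits, IntFunction<String> fetchMetadata) {
        // Rank positions 0..n-1, sorted by the doc ID each position points at.
        Integer[] order = new Integer[hits.length];
        for (int i = 0; i < hits.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt(i -> hits[i].doc));

        String[] result = new String[hits.length];
        for (int pos : order) {
            // Fetch in doc ID order, but store the value at the hit's rank position.
            result[pos] = fetchMetadata.apply(hits[pos].doc);
        }
        return result;
    }
}
```

This does not reduce the number of fetches, so it is at best a partial fix; whether it helps noticeably depends on the index size and storage.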

Update: I am trying to use FieldCache like this:

    String[] metadatas = org.apache.lucene.search.FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), "metadata");

when I open the index, and then, for each query:

    int lDocId = hits[i].doc;
    String metadata = metadatas[lDocId];

This is working for me.
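The FieldCache approach is fast because the field is loaded once into an array indexed by doc ID, so each hit costs an array lookup instead of a stored-field read. Since doc IDs are only meaningful for a given reader, that array has to belong to one reader and be rebuilt when the index is reopened; as far as I know, Lucene's own FieldCache keys its entries by reader for this reason. Note also that FieldCache builds the array from the indexed terms, so the field should be indexed, untokenized, and single-valued per document. The plain-Java sketch below models only the caching idea; ReaderSnapshot and PerReaderFieldArray are hypothetical names, not Lucene classes, and the fieldValue function stands in for however the per-document value is obtained.

```java
import java.util.function.IntFunction;

/** Hypothetical stand-in for an opened IndexReader: doc IDs 0..maxDoc-1 are valid only for this snapshot. */
final class ReaderSnapshot {
    final int maxDoc;
    final IntFunction<String> fieldValue; // hypothetical source of each document's "metadata" value

    ReaderSnapshot(int maxDoc, IntFunction<String> fieldValue) {
        this.maxDoc = maxDoc;
        this.fieldValue = fieldValue;
    }
}

/** Caches one field as an array indexed by doc ID, rebuilt whenever a different snapshot is used. Not thread-safe. */
final class PerReaderFieldArray {
    private ReaderSnapshot cachedFor; // the snapshot the current array belongs to
    private String[] values;
    int loads = 0;                    // how many times the array has been (re)built

    String get(ReaderSnapshot reader, int docId) {
        if (reader != cachedFor) {
            // Doc IDs from another snapshot may refer to different documents, so rebuild.
            values = new String[reader.maxDoc];
            for (int d = 0; d < reader.maxDoc; d++) {
                values[d] = reader.fieldValue.apply(d);
            }
            cachedFor = reader;
            loads++;
        }
        return values[docId]; // constant-time lookup per hit after the one-time load
    }
}
```

The trade-off is memory: the array has one slot per document in the index (maxDoc), not one per hit.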

Your best bet for improving performance is to reduce the stored data as much as you can. If you have a large content field stored in the index, setting it to be indexed only, rather than stored, will improve performance. Storing the content outside Lucene, to be fetched after a hit is found in the index, may be a better idea.
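A minimal sketch of the "store it outside Lucene" idea, assuming each document carries a small stored key (a hypothetical "id" field) and the large content lives in some external store. Here that store is just an in-memory map, and ExternalContentStore is a hypothetical name; in practice it might be a database or files on disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical external store for large content, keyed by an application ID kept in the index. */
final class ExternalContentStore {
    private final Map<String, String> contentById = new HashMap<>();

    void put(String id, String content) {
        contentById.put(id, content);
    }

    /** Resolves full content only for the hit IDs the caller actually needs, in the given order. */
    List<String> fetchAll(List<String> hitIds) {
        List<String> out = new ArrayList<>(hitIds.size());
        for (String id : hitIds) {
            out.add(contentById.get(id)); // null if the ID is unknown to the store
        }
        return out;
    }
}
```

On the Lucene 3.x side, the large field could be added with Field.Store.NO and Field.Index.ANALYZED so it stays searchable without being stored, while the small key uses Field.Store.YES with Field.Index.NOT_ANALYZED; being untokenized, the key also works with FieldCache.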

There is also a possibility that a better way exists to get the end result you are looking for. I'm guessing the 5000 sets of metadata aren't the end result here. The analysis may be better handled on the indexed data in Lucene, instead of pulling the data out of the index first. I have no idea, based on what you've provided, whether that is possible in your case, but it is worth a look.
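As one hedged illustration, suppose (hypothetically) that the real end result is a count of hits per metadata value. That can be computed while visiting the hits, straight from a doc-ID-indexed array like the one FieldCache.DEFAULT.getStrings returns, without first building a list of 5000 results. The sketch below is plain Java; MetadataTally is a hypothetical name. In Lucene 2.9 and later, this kind of per-hit work could also live in a custom Collector, keeping in mind that the doc IDs a Collector sees are relative to the current segment.

```java
import java.util.Map;
import java.util.TreeMap;

final class MetadataTally {
    /**
     * Counts hits per metadata value (a hypothetical end result) directly from an
     * array indexed by doc ID. Documents with no value (null) are skipped.
     */
    static Map<String, Integer> countByValue(int[] hitDocIds, String[] metadataByDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : hitDocIds) {
            String value = metadataByDoc[doc];
            if (value != null) {
                counts.merge(value, 1, Integer::sum); // increment this value's count
            }
        }
        return counts;
    }
}
```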

