jdbc - Querying Large Datasets in Cassandra -


I am an experienced RDBMS programmer working on a scientific research problem involving genomic data. I was assigned to explore Cassandra since we needed a big-data, scalable, and cheap (free) solution. Setting up Cassandra and loading it with data was seductively trivial and similar to my experience with traditional databases such as Oracle and MySQL. My problem is finding a simple strategy to query the data, since this is a fundamental requirement of all data repositories. The data I am working with are mutation datasets that contain positional information and calculated numerical measures regarding the data. I set up an initial static column family that looks like this:

create table variant (
    chrom text,
    pos int,
    ref text,
    alt text,
    aa text,
    ac int,
    af float,
    afr_af text,
    amr_af text,
    int,
    asn_af text,
    avgpost text,
    erate text,
    eur_af text,
    ldaf text,
    mutation_id text,
    patient_id int,
    rsq text,
    snpsource text,
    theta text,
    vt text,
    primary key (chrom, pos, ref, alt)
) with bloom_filter_fp_chance=0.010000
  and caching='KEYS_ONLY'
  and comment=''
  and dclocal_read_repair_chance=0.000000
  and gc_grace_seconds=864000
  and read_repair_chance=0.100000
  and replicate_on_write='true'
  and populate_io_cache_on_flush='false'
  and compaction={'class': 'SizeTieredCompactionStrategy'}
  and compression={'sstable_compression': 'SnappyCompressor'};

create index af_variant_idx on variant (af);

As you can see, there is a natural primary key of positional data (chrom, pos, ref, alt). These data are not meaningful from a querying point of view. It is more interesting for clients to extract the data whose 'af' value is below some value. I am using Java RESTful services to interact with the database through the CQL JDBC driver. It became apparent that directly querying this table by af would not work, since it seems the select statement must identify the row keys you want to look at. I found confusing discussions on this point, but decided that since there are fewer than 100 distinct values of af, I would build a lookup table that looks like this:

create table af_lookup (
    af_id float,
    column1 text,
    column2 text,
    value text,
    primary key (af_id, column1, column2)
) with compact storage
  and bloom_filter_fp_chance=0.010000
  and caching='KEYS_ONLY'
  and comment=''
  and dclocal_read_repair_chance=0.000000
  and gc_grace_seconds=864000
  and read_repair_chance=0.100000
  and replicate_on_write='true'
  and populate_io_cache_on_flush='false'
  and compaction={'class': 'SizeTieredCompactionStrategy'}
  and compression={'sstable_compression': 'SnappyCompressor'};

This is meant to be a dynamic table with wide rows. I populated this table from the data stored in the static column family. The 'af' value is the key, and the compound key from the other table is concatenated with '-' (i.e. 1-129-t-g) and stored as a string in the dynamic column name. This worked OK, but I still do not understand how all of these things work together. Dynamic column families seem to work as advertised using CQL -2, but I need to utilize the functions >, <, >=, <=. It seems this is theoretically possible, but I have not found a solution in the last 4 weeks of trying a number of different tools (I also tried the Astyanax JDBC driver).
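
To make the column-name scheme above concrete, here is a minimal Java sketch of packing the compound key (chrom, pos, ref, alt) into a single '-'-separated string and unpacking it again. The class name VariantColumnName and its methods are hypothetical, not part of any driver, and the sketch assumes that none of the four parts contains a '-' itself.

```java
import java.util.Objects;

/** Hypothetical helper: packs (chrom, pos, ref, alt) into one column name and back. */
final class VariantColumnName {
    final String chrom;
    final int pos;
    final String ref;
    final String alt;

    VariantColumnName(String chrom, int pos, String ref, String alt) {
        this.chrom = Objects.requireNonNull(chrom);
        this.pos = pos;
        this.ref = Objects.requireNonNull(ref);
        this.alt = Objects.requireNonNull(alt);
    }

    /** Joins the four key parts with '-', e.g. 1-129-t-g. */
    String encode() {
        return chrom + "-" + pos + "-" + ref + "-" + alt;
    }

    /** Splits an encoded name back into its parts; assumes no part contains '-'. */
    static VariantColumnName decode(String name) {
        String[] parts = name.split("-", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 '-'-separated parts: " + name);
        }
        return new VariantColumnName(parts[0], Integer.parseInt(parts[1]), parts[2], parts[3]);
    }

    public static void main(String[] args) {
        VariantColumnName key = new VariantColumnName("1", 129, "t", "g");
        String encoded = key.encode();
        VariantColumnName back = decode(encoded);
        System.out.println(encoded + " -> " + back.chrom + " " + back.pos + " " + back.ref + " " + back.alt);
    }
}
```

One thing to keep in mind with this encoding: text column names compare as strings, so 1-129-... sorts before 1-13-..., which is not numeric order by position.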

I have 2 primary problems. The first is the RPC timeout limitation, since querying these data will produce tens of thousands to millions of records. The second is how to present these data in HTML by fetching only the data that has not yet been presented (previous/next links), similar to the way OpsCenter displays column family record data. This doesn't seem possible given the functional limitation of not being able to use >, <, >=, <=. Based on my experience, this is a lack of understanding on my part of how the product works rather than a lack of capability in the product (databases wouldn't be very useful if they were only capable of handling writes well).
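
Both problems can be sketched together, under some assumptions. Because there are fewer than 100 distinct af values, the "af below some value" range can be resolved on the client by picking the matching keys, and each page can resume strictly after the last column name already shown instead of rescanning. The Java below is only an in-memory stand-in (nested sorted maps playing the role of af_lookup); the class AfLookupPager and its methods are hypothetical. Against a real cluster, each step of the inner loop would roughly correspond to a query of the form `select ... from af_lookup where af_id = ? and column1 > ? limit ?`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** In-memory stand-in for af_lookup: af value -> sorted column names. Hypothetical. */
final class AfLookupPager {
    // Outer key: af value (row key). Inner key: encoded variant name, kept sorted.
    private final NavigableMap<Float, NavigableMap<String, String>> rows = new TreeMap<>();

    void put(float af, String columnName, String value) {
        rows.computeIfAbsent(af, k -> new TreeMap<>()).put(columnName, value);
    }

    /** Position of the last entry already shown; null means "start from the beginning". */
    static final class Position {
        final float af;
        final String columnName;

        Position(float af, String columnName) {
            this.af = af;
            this.columnName = columnName;
        }
    }

    /** Returns up to pageSize entries with af < afMax, starting strictly after 'last'. */
    List<Position> nextPage(float afMax, Position last, int pageSize) {
        List<Position> page = new ArrayList<>();
        if (last != null && last.af >= afMax) {
            return page; // the resume point is already outside the requested range
        }
        // Client-side range: only the af keys below the bound, resuming at the last one seen.
        NavigableMap<Float, NavigableMap<String, String>> candidates =
                (last == null) ? rows.headMap(afMax, false)
                               : rows.subMap(last.af, true, afMax, false);
        for (Map.Entry<Float, NavigableMap<String, String>> row : candidates.entrySet()) {
            NavigableMap<String, String> columns = row.getValue();
            if (last != null && Float.compare(row.getKey(), last.af) == 0) {
                // Skip everything up to and including the last column already shown.
                columns = columns.tailMap(last.columnName, false);
            }
            for (String name : columns.keySet()) {
                if (page.size() == pageSize) {
                    return page;
                }
                page.add(new Position(row.getKey(), name));
            }
        }
        return page;
    }

    public static void main(String[] args) {
        AfLookupPager pager = new AfLookupPager();
        pager.put(0.01f, "1-129-t-g", "");
        pager.put(0.01f, "1-200-a-c", "");
        pager.put(0.05f, "2-10-g-t", "");
        pager.put(0.50f, "3-5-c-a", "");
        Position last = null;
        List<Position> page;
        while (!(page = pager.nextPage(0.10f, last, 2)).isEmpty()) {
            for (Position p : page) {
                System.out.println(p.af + " " + p.columnName);
            }
            last = page.get(page.size() - 1);
        }
    }
}
```

The "next" link only needs to carry the last (af, column name) pair of the current page. A "previous" link could be supported by having the web layer remember the start position of each page it has already served. Since each request fetches at most one page, no single call has to return millions of records.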

Is there anyone out there who has encountered this issue and solved it before? I would really appreciate it if someone could share an example of how to implement a C* solution using Java web services that displays a large number of results that have to be paginated through.

You may want to explore and use PlayOrm for Cassandra, which can resolve your problems with the timeout limitation and pagination. PlayOrm returns a cursor when you query. The first page reads in the first 20 results and displays them, and the next page can use the same cursor in your session, which picks up right where it left off without rescanning the first 20 rows again.
Visit http://buffalosw.com/wiki/an-example-to-begin-with-playorm/ to see an example of the cursor, and http://buffalosw.com/products/playorm/ for features and more details on PlayOrm.
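
The cursor idea described above can be illustrated with plain Java. This is not PlayOrm's API; PageCursor is a hypothetical class that wraps any Iterator and hands out fixed-size pages, so that a second call continues where the first one stopped. In a web application the cursor object would presumably be kept in the user's session between requests.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical cursor: wraps any Iterator and hands out fixed-size pages on demand. */
final class PageCursor<T> {
    private final Iterator<T> source;
    private final int pageSize;

    PageCursor(Iterator<T> source, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.source = source;
        this.pageSize = pageSize;
    }

    /** True while the underlying iterator still has unread results. */
    boolean hasNextPage() {
        return source.hasNext();
    }

    /** Reads at most pageSize further results; earlier results are never rescanned. */
    List<T> nextPage() {
        List<T> page = new ArrayList<>(pageSize);
        while (page.size() < pageSize && source.hasNext()) {
            page.add(source.next());
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> results = new ArrayList<>();
        for (int i = 0; i < 45; i++) {
            results.add(i);
        }
        PageCursor<Integer> cursor = new PageCursor<>(results.iterator(), 20);
        while (cursor.hasNextPage()) {
            System.out.println("page of " + cursor.nextPage().size() + " results");
        }
    }
}
```

Compared with passing the last key in each request, a cursor keeps state on the server, so it fits the session-based flow the answer describes but needs the session to stay alive between page views.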

