java - Extracting text from specific areas using PDF Clown

May 15, 2015

I am trying to highlight text in a PDF with two columns. The problem is that the extractor extracts the text row-wise, across both columns, so the queried text doesn't get matched. Is there a function in PDF Clown that can help me extract the first half of the page (i.e., the first column) and then the second one by selecting areas?

Thanks.

As you talk about text extraction with PDF Clown, I assume you are using the TextExtractor class of that library.

This class offers a number of attributes that help restrict the parsing area:

    public void setAreas(List<Rectangle2D> value);
    public void setAreaTolerance(double value);
    public void setAreaMode(AreaModeEnum value);

setAreas allows you to set the page areas to extract text from, setAreaTolerance allows you to add some tolerance to these areas (essentially enlarging the areas by that value in all directions), and setAreaMode is used to control whether a text string must be contained by an area (Containment) or merely needs to intersect an area (Intersection) to be included in the scan results.
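For the two-column case from the question, the areas could be two rectangles that each cover one half of the page. The following is only a minimal sketch using java.awt.geom: the class name ColumnAreas and the helper columns are hypothetical, the page size is a hard-coded A4 assumption (595 x 842 points) rather than a value read from the document, and the PDF Clown calls appear only as comments because the library is not on the classpath here.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds one rectangle per column of a page.
final class ColumnAreas {

    // Splits a page of the given size into equal-width, full-height
    // column rectangles, ordered left to right.
    static List<Rectangle2D> columns(double pageWidth, double pageHeight, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be at least 1");
        }
        double columnWidth = pageWidth / count;
        List<Rectangle2D> areas = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            areas.add(new Rectangle2D.Double(i * columnWidth, 0, columnWidth, pageHeight));
        }
        return areas;
    }

    public static void main(String[] args) {
        // Assumption: an A4 page measured in points; a real program
        // would take the size from the page being processed.
        List<Rectangle2D> areas = columns(595, 842, 2);
        for (Rectangle2D area : areas) {
            System.out.println(area);
        }

        // With PDF Clown available, the areas would be handed to the
        // extractor roughly like this (untested here):
        //   TextExtractor extractor = new TextExtractor();
        //   extractor.setAreas(areas);
        //   Map<Rectangle2D, List<ITextString>> textByColumn = extractor.extract(page);
    }
}
```

Splitting at the exact middle assumes symmetric columns. If the gutter is off-center, or words touch the dividing line, the tolerance and the area mode decide which column a string ends up in.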

How these attributes work can be seen in the TextExtractor method

    public Map<Rectangle2D,List<ITextString>> filter(
        List<? extends ITextString> textStrings,
        Rectangle2D... areas
    );

which filters a list of text strings on a page.
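The matching rule described above (enlarge the area by the tolerance, then test for containment or intersection) can be illustrated without the library. This is a sketch of that rule as stated in this answer, not PDF Clown's source code: the class AreaMatchSketch, its Mode enum and the matches method are hypothetical names, and it works on plain bounding boxes instead of ITextString objects.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical illustration of area tolerance and area mode.
final class AreaMatchSketch {

    enum Mode { CONTAINMENT, INTERSECTION }

    // Enlarges the area by the tolerance on all four sides.
    static Rectangle2D tolerated(Rectangle2D area, double tolerance) {
        return new Rectangle2D.Double(
            area.getX() - tolerance,
            area.getY() - tolerance,
            area.getWidth() + 2 * tolerance,
            area.getHeight() + 2 * tolerance);
    }

    // Returns true if the text box counts as belonging to the area
    // under the given tolerance and mode.
    static boolean matches(Rectangle2D area, double tolerance, Mode mode, Rectangle2D textBox) {
        Rectangle2D grown = tolerated(area, tolerance);
        return mode == Mode.CONTAINMENT
            ? grown.contains(textBox)
            : grown.intersects(textBox);
    }

    public static void main(String[] args) {
        Rectangle2D leftColumn = new Rectangle2D.Double(0, 0, 300, 800);
        // A word whose box sticks out 10 units past the column's right edge.
        Rectangle2D word = new Rectangle2D.Double(290, 100, 20, 12);

        System.out.println(matches(leftColumn, 0, Mode.CONTAINMENT, word));  // false
        System.out.println(matches(leftColumn, 0, Mode.INTERSECTION, word)); // true
        System.out.println(matches(leftColumn, 15, Mode.CONTAINMENT, word)); // true
    }
}
```

For a two-column page this suggests Containment with a small tolerance when the columns are cleanly separated, and Intersection only when losing a string that crosses the boundary would be worse than picking it up in both columns.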
