Java - Extracting text from specific areas using PDF Clown
I am trying to highlight text in a PDF that has two columns. The problem is that the extractor extracts text row-wise across the whole page, so the queried text is not matched. I was wondering whether there is a function in PDF Clown that can help me extract the first half of the page (i.e., the first column) and then the second one by selecting areas.
Thanks.
As you talk about text extraction with PDF Clown, I assume you are using the TextExtractor class of that library.
This class offers several attributes that help to restrict the parsing area:
    public void setAreas(List<Rectangle2D> value);
    public void setAreaTolerance(double value);
    public void setAreaMode(AreaModeEnum value);

setAreas allows you to set the page areas to extract text from. setAreaTolerance allows you to add some tolerance to these areas (essentially enlarging the areas by that value in all directions). setAreaMode is used to control whether a string must be fully contained in an area (Containment) or merely needs to intersect the area (Intersection) to be included in the scan results.
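To make the difference between the two area modes and the effect of the tolerance more concrete, here is a minimal sketch that uses only java.awt.geom from the JDK. It is not PDF Clown code: the class AreaMatch, its Mode enum and the grow and matches helpers are hypothetical names, and the logic simply follows the description above (enlarge the area by the tolerance in every direction, then test for containment or intersection). The library's actual implementation may differ in details.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical illustration of the area semantics; not part of PDF Clown.
final class AreaMatch {
    enum Mode { CONTAINMENT, INTERSECTION }

    // Enlarge the area by the tolerance on all four sides.
    static Rectangle2D grow(Rectangle2D area, double tolerance) {
        return new Rectangle2D.Double(
            area.getX() - tolerance,
            area.getY() - tolerance,
            area.getWidth() + 2 * tolerance,
            area.getHeight() + 2 * tolerance);
    }

    // Decide whether a text box counts as belonging to the (enlarged) area.
    static boolean matches(Rectangle2D textBox, Rectangle2D area,
                           double tolerance, Mode mode) {
        Rectangle2D grown = grow(area, tolerance);
        return mode == Mode.CONTAINMENT
            ? grown.contains(textBox)
            : grown.intersects(textBox);
    }

    public static void main(String[] args) {
        // Illustrative coordinates: a left column 300 units wide and a
        // word whose box straddles the column's right edge at x = 300.
        Rectangle2D leftColumn = new Rectangle2D.Double(0, 0, 300, 800);
        Rectangle2D word = new Rectangle2D.Double(290, 100, 20, 12);

        System.out.println(matches(word, leftColumn, 0, Mode.CONTAINMENT));
        System.out.println(matches(word, leftColumn, 0, Mode.INTERSECTION));
        System.out.println(matches(word, leftColumn, 10, Mode.CONTAINMENT));
    }
}
```

In this sketch the straddling word is rejected under containment without tolerance, but accepted either under intersection or once the tolerance is large enough to cover the overhang. For a two-column page, that suggests keeping the tolerance smaller than the gap between the columns.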
How these attributes work can be seen in the TextExtractor method

    public Map<Rectangle2D,List<ITextString>> filter(
        List<? extends ITextString> textStrings,
        Rectangle2D... areas
    );

which filters a list of text strings on a page by the given areas.
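For the two-column case in the question, the areas would be the left and the right half of the page. The following sketch shows that idea with plain JDK classes only. It is not PDF Clown code: ColumnSplit, TextBox, columns and groupByArea are hypothetical names, TextBox stands in for an extracted text string with its bounding box, and the page size and coordinates are made up. The result is keyed by area, similar in shape to the map returned by filter.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of splitting a page into two columns.
final class ColumnSplit {
    // Stand-in for an extracted text string and its bounding box.
    static final class TextBox {
        final String text;
        final Rectangle2D box;
        TextBox(String text, Rectangle2D box) {
            this.text = text;
            this.box = box;
        }
    }

    // Build one rectangle per column: left half first, then right half.
    static List<Rectangle2D> columns(double pageWidth, double pageHeight) {
        double half = pageWidth / 2;
        List<Rectangle2D> areas = new ArrayList<>();
        areas.add(new Rectangle2D.Double(0, 0, half, pageHeight));
        areas.add(new Rectangle2D.Double(half, 0, half, pageHeight));
        return areas;
    }

    // Group text boxes by the area that fully contains them, keeping
    // both the area order and the original order of the boxes.
    static Map<Rectangle2D, List<TextBox>> groupByArea(
            List<TextBox> boxes, List<Rectangle2D> areas) {
        Map<Rectangle2D, List<TextBox>> result = new LinkedHashMap<>();
        for (Rectangle2D area : areas) {
            List<TextBox> hits = new ArrayList<>();
            for (TextBox t : boxes) {
                if (area.contains(t.box)) {
                    hits.add(t);
                }
            }
            result.put(area, hits);
        }
        return result;
    }

    public static void main(String[] args) {
        List<TextBox> boxes = new ArrayList<>();
        // Row-wise order, as a whole-page extraction would deliver it.
        boxes.add(new TextBox("alpha", new Rectangle2D.Double(50, 100, 80, 12)));
        boxes.add(new TextBox("beta", new Rectangle2D.Double(350, 100, 80, 12)));
        boxes.add(new TextBox("gamma", new Rectangle2D.Double(60, 120, 80, 12)));

        for (List<TextBox> column : groupByArea(boxes, columns(600, 800)).values()) {
            StringBuilder line = new StringBuilder();
            for (TextBox t : column) {
                line.append(t.text).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

With PDF Clown itself, the same two rectangles could be handed to setAreas before extracting, or to filter together with strings that were already extracted, so that each column's text stays together and can be searched on its own.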