Java - Extracting text from specific areas using PDF Clown
I am trying to highlight text in a PDF that has two columns. The problem is that the extractor extracts text row-wise, so the queried text isn't matched. I was wondering whether there is a function in PDF Clown
that can help me extract the first half of the page (i.e., the first column) and then the second one by selecting areas.
Thanks.
As you talk about text extraction with PDF Clown, I assume you are using the TextExtractor
class of the library.
This class offers several attributes that help restrict the parsing area:
public void setAreas(List<Rectangle2D> value);
public void setAreaTolerance(double value);
public void setAreaMode(AreaModeEnum value);
setAreas allows you to set the page areas to extract text from, setAreaTolerance allows you to add some tolerance to these areas (essentially enlarging the areas by the value in all directions), and setAreaMode is used to control whether a string must be contained in an area (Containment) or merely needs to intersect an area (Intersection) to be included in the scan results.
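For the two-column case in the question, the areas could simply be the left and right halves of the page. The following is a minimal sketch of that geometry using only java.awt.geom, so it runs without PDF Clown. The class ColumnAreas and all of its methods are hypothetical helpers, not library API. The tolerance handling follows the description above (growing the area on every side) and is an assumption about the behaviour, not a copy of the library's code.

```java
import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PDF Clown: builds one rectangle per column
// and mimics the two area modes described above.
class ColumnAreas {
    /** Splits a page of the given size into a left and a right half. */
    static List<Rectangle2D> halves(double pageWidth, double pageHeight) {
        double half = pageWidth / 2;
        Rectangle2D left = new Rectangle2D.Double(0, 0, half, pageHeight);
        Rectangle2D right = new Rectangle2D.Double(half, 0, half, pageHeight);
        return Arrays.asList(left, right);
    }

    /** Grows an area by the tolerance on every side (assumed semantics). */
    static Rectangle2D enlarge(Rectangle2D area, double tolerance) {
        return new Rectangle2D.Double(
                area.getX() - tolerance, area.getY() - tolerance,
                area.getWidth() + 2 * tolerance, area.getHeight() + 2 * tolerance);
    }

    /** Containment: the whole text box must lie inside the enlarged area. */
    static boolean contained(Rectangle2D area, Rectangle2D box, double tolerance) {
        return enlarge(area, tolerance).contains(box);
    }

    /** Intersection: any overlap with the enlarged area is enough. */
    static boolean intersecting(Rectangle2D area, Rectangle2D box, double tolerance) {
        return enlarge(area, tolerance).intersects(box);
    }
}
```

The list returned by halves has the List<Rectangle2D> type that setAreas expects. A text box that slightly overhangs the column boundary shows why the tolerance and the mode matter: it fails the containment test for the left half unless the tolerance is large enough, while it passes the intersection test for both halves.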
How these attributes work can be seen in the TextExtractor
method
public Map<Rectangle2D,List<ITextString>> filter(List<? extends ITextString> textStrings, Rectangle2D... areas);
which filters a list of text strings on a page.
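To illustrate why a result keyed by area helps with the two-column problem, here is a rough stand-in for that kind of filtering, again using only the JDK. AreaGrouping, Fragment and group are hypothetical names. Assigning each fragment to the first overlapping area is an assumption of this sketch and may differ from what PDF Clown does.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for area filtering; it does not call PDF Clown.
class AreaGrouping {
    /** Minimal text fragment: its characters plus its bounding box. */
    static final class Fragment {
        final String text;
        final Rectangle2D box;

        Fragment(String text, Rectangle2D box) {
            this.text = text;
            this.box = box;
        }
    }

    /** Groups fragments under the first area that overlaps them, keeping area order. */
    static Map<Rectangle2D, List<String>> group(List<Fragment> fragments, Rectangle2D... areas) {
        Map<Rectangle2D, List<String>> result = new LinkedHashMap<>();
        for (Rectangle2D area : areas) {
            result.put(area, new ArrayList<>());
        }
        for (Fragment fragment : fragments) {
            for (Rectangle2D area : areas) {
                if (area.intersects(fragment.box)) {
                    result.get(area).add(fragment.text);
                    break; // each fragment goes to one area only
                }
            }
        }
        return result;
    }
}
```

Fed with fragments in row-wise order (left, right, left, right) and the two column rectangles, the map yields the left column's fragments together under the first key and the right column's under the second, which is the reading order needed to match a query within a column.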