freepascal - How to get the real file contents using TFileStream?


I am trying to read a file's contents using TFileStream:

    procedure ShowFileCont(MyFile: string);
    var
      tr: string;
      fs: TFileStream;
    begin
      fs := TFileStream.Create(MyFile, fmOpenRead or fmShareDenyNone);
      try
        SetLength(tr, fs.Size);
        if fs.Size > 0 then            // tr[1] is invalid for an empty file
          fs.Read(tr[1], fs.Size);     // copies the raw bytes, no conversion
        ShowMessage(tr);
      finally
        fs.Free;
      end;
    end;

I have a small text file whose only contents are: aaaaaaaj“њРЉtщЂ®8ЈЏvд"Ј¦aИaaaaaaa

I saved it twice (using AkelPad):

  1. with code page 1251 (ANSI)
  2. with code page 65001 (UTF-8)

These files have different sizes, but their contents are equal: I opened them both in Notepad, and both show the same contents.

But when I run the ShowFileCont procedure, it shows me different results:

  1. aaaaaaaj?Њt?8?v?"?a?aaaaaaa
  2. aaaaaaaj“њРЉtщЂ®8ЈЏvд"Ј¦aИaaaaaaa

Questions:

  1. How do I get the real file contents using TFileStream?
  2. How can these two files have different sizes when their contents (in Notepad) are equal?

Added: sorry, I didn't mention that I use Lazarus/FPC, where string = UTF8String.

Why do the files have different sizes?

Because they use different encodings. The 1251 encoding maps each character to a single byte. UTF-8 uses a variable number of bytes for each character.
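As a small illustration of the size difference, the sketch below stores the Cyrillic letter Ж (U+0416) as raw bytes in each encoding and prints the byte counts. The program and constant names are made up for this example, and it assumes FPC 3.0 or later (for RawByteString).

```pascal
program EncodingSizes;
{$mode objfpc}{$H+}

const
  // The letter 'Ж' (U+0416) written out as raw bytes (illustrative constants)
  Zhe1251: RawByteString = #$C6;       // code page 1251: one byte
  ZheUtf8: RawByteString = #$D0#$96;   // UTF-8: two bytes

begin
  WriteLn('1251 bytes:  ', Length(Zhe1251));   // prints 1
  WriteLn('UTF-8 bytes: ', Length(ZheUtf8));   // prints 2
end.
```

The same text therefore occupies more bytes on disk in UTF-8 whenever it contains non-ASCII characters, even though an editor displays both files identically.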

How do I get the true file contents?

You need to use a string type that matches the encoding used in the file. So, for example, if the content is UTF-8 encoded, which is the best choice, load it into a UTF-8 string. You are using FPC in a mode where string is UTF-8 encoded, in which case the code in the question already does what you need.

Loading an MBCS encoded file with a code page of 1251, say, is more tricky. You can load it into an AnsiString variable and, so long as your system's locale is 1251, any conversions will be performed correctly.

But the code will behave differently when run on a machine with a different locale. And if you wanted to load text that uses a different MBCS encoding, for example 1252, you cannot use this approach. You would need to load the file into a byte array, convert from 1252, say, to UTF-8, and then store that UTF-8 in a string variable.

In order to do that you can use the LConvEncoding unit from the LCL. For example, you can use CP1251ToUTF8, CP1252ToUTF8 etc. to convert from MBCS to UTF-8.
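A minimal sketch of that approach follows, assuming the file is known to be encoded as code page 1251 and that the project can see the LConvEncoding unit (in current Lazarus versions it is part of the LazUtils package). ReadRawBytes is a hypothetical helper written for this example, and the file name is taken from the command line.

```pascal
program LoadCp1251;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding;

// Hypothetical helper: returns the file's bytes unchanged, with no conversion.
function ReadRawBytes(const FileName: string): string;
var
  fs: TFileStream;
begin
  Result := '';
  fs := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    SetLength(Result, fs.Size);
    if fs.Size > 0 then
      fs.ReadBuffer(Result[1], fs.Size);
  finally
    fs.Free;
  end;
end;

begin
  // Assumption: the file named in ParamStr(1) is encoded as code page 1251.
  WriteLn(CP1251ToUTF8(ReadRawBytes(ParamStr(1))));
end.
```

Because the conversion names its source code page explicitly, the result does not depend on the locale of the machine running the code. How the console renders the UTF-8 output depends on the terminal; in an LCL application the converted string could be passed to ShowMessage instead.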

How can I determine which file encoding was used?

You cannot. You can make a guess that will be accurate in many cases. But in general, it is impossible to identify the encoding of an array of bytes that is meant to represent text.

It is possible to take a file and rule out some encodings. For example, not all byte streams are valid UTF-8 or UTF-16 text, and so you can rule out those encodings for such files. But for encodings like 1251, 1252 etc., any byte stream is valid. There is no way to tell 1251 encoded streams apart from 1252 encoded streams with 100% accuracy.
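To make "ruling out UTF-8" concrete, here is a rough sketch of a structural check. LooksLikeUTF8 is a hypothetical function written for this illustration, not a library routine, and it is deliberately simplified: it only verifies that each lead byte is followed by the right number of continuation bytes, and does not reject overlong forms or surrogates.

```pascal
program Utf8Check;
{$mode objfpc}{$H+}

// Structural check only (illustrative, not a complete validator).
function LooksLikeUTF8(const s: RawByteString): Boolean;
var
  i, need: Integer;
  b: Byte;
begin
  Result := False;
  i := 1;
  while i <= Length(s) do
  begin
    b := Ord(s[i]);
    if b < $80 then need := 0                 // ASCII
    else if (b and $E0) = $C0 then need := 1  // lead byte of a 2-byte sequence
    else if (b and $F0) = $E0 then need := 2  // lead byte of a 3-byte sequence
    else if (b and $F8) = $F0 then need := 3  // lead byte of a 4-byte sequence
    else Exit;                                // stray continuation or invalid lead byte
    Inc(i);
    while need > 0 do
    begin
      if (i > Length(s)) or ((Ord(s[i]) and $C0) <> $80) then
        Exit;                                 // missing or malformed continuation byte
      Inc(i);
      Dec(need);
    end;
  end;
  Result := True;
end;

begin
  WriteLn(LooksLikeUTF8(#$D0#$96));  // 'Ж' in UTF-8: prints TRUE
  WriteLn(LooksLikeUTF8(#$C6));      // 'Ж' in code page 1251: prints FALSE
end.
```

A FALSE result rules UTF-8 out. A TRUE result does not prove the file is UTF-8, since the same bytes are also acceptable text in a single-byte code page.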

The LConvEncoding unit has GuessEncoding, which sounds like it may be of use.
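One possible way to combine GuessEncoding with a conversion is sketched below. It assumes the LConvEncoding unit is available to the project and takes the file name from the command line; the variable names are illustrative. The guess is a heuristic, so the result should be treated as a best effort rather than a guarantee.

```pascal
program GuessAndConvert;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils, LConvEncoding;

var
  ms: TMemoryStream;
  raw, enc: string;
begin
  raw := '';
  ms := TMemoryStream.Create;
  try
    ms.LoadFromFile(ParamStr(1));               // file name from the command line
    SetString(raw, PChar(ms.Memory), ms.Size);  // copy the bytes unchanged
  finally
    ms.Free;
  end;

  enc := GuessEncoding(raw);                    // returns an encoding name as a string
  WriteLn('Guessed encoding: ', enc);
  WriteLn(ConvertEncoding(raw, enc, EncodingUTF8));
end.
```

If the guess is wrong, the converted text will contain the wrong characters, so letting the user override the detected encoding is a sensible addition in a real application.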

