C - sizeof character and strlen string mismatch
As per my code, I assume each Greek character is stored in 2 bytes. sizeof returns 4 as the size of each character (i.e. sizeof(int)).

How does strlen return 16? That makes me think each character occupies 2 bytes. Shouldn't it be 4*8 = 32, since strlen counts the number of bytes?

Also, how does printf("%c",bigstring[i]); print each character properly? Because of %c, shouldn't it read 1 byte (a char) and display that? Why is the Greek character not split in this case?
    strcpy(bigstring, "ειδικούς"); // Greek
    slen = strlen(bigstring);
    printf("size %zu\n ", sizeof('ε')); // sizeof yields size_t, so %zu rather than %d
    // printing each character
    printf("%s of length %d\n", bigstring, slen);
    int k1 = 0, k2 = slen - 2;
    for (i = 0; i < slen; i++)
        printf("%c", bigstring[i]);
Output:

    size 4
     ειδικούς of length 16
    ειδικούς
Character literals in C have type int, so sizeof('ε') is the same as sizeof(int).

You're playing with fire in that statement, a bit. 'ε' is a multicharacter literal here, because the source file encodes ε as more than one byte. Its value is implementation-defined, so it isn't portable and might come back to bite you. Be careful relying on behaviour like this. Clang, for example, won't accept this program with that literal in it. GCC gives a warning, but will still compile it.

strlen returns 16, since that's the number of bytes in your string before the null terminator. Each of your Greek characters is 16 bits (2 bytes) long in UTF-8, so your string looks something like:

    c0c0 c1c1 c2c2 c3c3 c4c4 c5c5 c6c6 c7c7 0

in memory, where c0c0, for example, is the 2 bytes of the first character. There is a single null-termination byte in the string.

The printf appears to work because your terminal is UTF-8 aware. You are printing each byte separately, but the terminal is interpreting the first 2 prints as a single character, and so on. If you change that printf call to:

    printf("%d: %02x\n", i, (unsigned char)bigstring[i]);

you'll see the byte-by-byte behaviour you're expecting.
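If you want to see the difference between bytes and characters without relying on what the terminal does, here is a minimal sketch. The helper names utf8_length and dump_bytes are hypothetical (they are not from the question or from any library), and utf8_length assumes its input is valid UTF-8: it simply counts every byte that is not a continuation byte of the form 10xxxxxx.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts UTF-8 code points by skipping
   continuation bytes (bit pattern 10xxxxxx).
   Assumption: s points to a valid, null-terminated UTF-8 string. */
size_t utf8_length(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical helper: prints one line per byte (index and hex value),
   the same idea as the modified printf call above. */
void dump_bytes(const char *s)
{
    size_t len = strlen(s); /* number of bytes, not characters */
    for (size_t i = 0; i < len; i++)
        printf("%zu: %02x\n", i, (unsigned char)s[i]);
}
```

For the UTF-8 encoded string in the question, strlen gives 16 while utf8_length gives 8, and dump_bytes prints sixteen lines, one per byte. Note that this counts code points, which matches what you see on screen here only because none of these characters uses combining marks.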