Objective-C - Find and replace long words in an NSString?
I'm trying to write a method that searches an NSString, determines whether any individual word within the string is over 6 characters long, and replaces that word with some other word (something arbitrary like 'hello').
I am starting with a long paragraph and need to end up with a single NSString object whose format and spacing have not been affected by the find and replace.
Why another answer?
There are a couple of subtle problems with the simple solutions that use componentsSeparatedByString: to split the text:
- Punctuation is not handled as a word delimiter.
- Whitespace other than the space character (newline, tab) is dropped.
- On long strings a lot of memory is wasted.
- It's slow.
Example
Assuming a substitution word of "–", the string ...
“Essentially,” the D.H.C. concluded,
”bokanovskification consists of a series of arrests of development.”
... would result in ...
– the D.H.C. – – of a series of – of –
... while the correct output would be:
“–,” the D.H.C. –,
”– – of a series of – of –.”
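To make the failure mode concrete, here is a minimal sketch of the naive approach. The helper name NaiveReplaceLongWords and its parameters are made up for illustration and do not come from any of the answers; it only assumes that the text is split on the space character.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper illustrating the naive approach (name and signature are assumptions).
static NSString *NaiveReplaceLongWords(NSString *text, NSUInteger maxChar, NSString *replaceWord) {
    // Splits on the space character only, so a tab or newline stays inside a token.
    NSArray<NSString *> *tokens = [text componentsSeparatedByString:@" "];
    NSMutableArray<NSString *> *output = [NSMutableArray arrayWithCapacity:tokens.count];
    for (NSString *token in tokens) {
        // Punctuation counts toward the length and is replaced together with the word.
        // A newline inside a long token is lost when the whole token is replaced.
        [output addObject:(token.length > maxChar ? replaceWord : token)];
    }
    return [output componentsJoinedByString:@" "];
}
```

Every token is copied into an intermediate array before the result is joined again, which is where the extra memory and time go on long inputs.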
Solution
Fortunately there's a better, yet simple solution in Cocoa: -[NSString enumerateSubstringsInRange:options:usingBlock:]
It provides fast iteration over substrings defined by the options argument. One possibility is NSStringEnumerationByWords, which enumerates substrings that are real words (in the current locale). It even detects individual words in languages that don't use delimiters (spaces) to separate words, like Japanese.
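As a small illustration of what the enumeration reports, the following sketch collects the enumerated words into an array. The helper name WordsInString is hypothetical; the method and the NSStringEnumerationByWords option are the ones named above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the substrings reported by NSStringEnumerationByWords.
static NSArray<NSString *> *WordsInString(NSString *text) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // substringRange covers the word only; enclosingRange also includes
        // the adjacent punctuation and whitespace.
        [words addObject:substring];
    }];
    return words;
}
```

Punctuation and whitespace never show up as words, so they can be copied through unchanged by working with the reported ranges.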
Comparing solutions
Here's a simple demo project that works on the Jargon File (1.6 MB, 237,239 words). It compares three different solutions:
- componentsSeparatedByString: 270 ms
- enumerateSubstringsInRange: 125 ms
- stringByReplacingOccurrencesOfString, as described by @monolo: 200 ms
Implementation
The core of the replacement loop:
    NSMutableString *result = [NSMutableString stringWithCapacity:[originalString length]];
    __block NSUInteger location = 0;
    [originalString enumerateSubstringsInRange:(NSRange){0, [originalString length]}
                                       options:NSStringEnumerationByWords | NSStringEnumerationLocalized | NSStringEnumerationSubstringNotRequired
                                    usingBlock:^(NSString *substring, NSRange substringRange, NSRange enclosingRange, BOOL *stop) {
        if (substringRange.length > maxChar) {
            // Copy everything between the previous long word and this one unchanged.
            NSString *charactersBetweenLongWords = [originalString substringWithRange:(NSRange){ location, substringRange.location - location }];
            [result appendString:charactersBetweenLongWords];
            [result appendString:replaceWord];
            location = substringRange.location + substringRange.length;
        }
    }];
    // Append whatever follows the last long word.
    [result appendString:[originalString substringFromIndex:location]];
Caveat
As pointed out by monolo, the proposed code uses NSString's length to determine the number of characters of a word. That's a questionable approach, to say the least. In fact, a string's length specifies the number of UTF-16 code units used to encode the string, a value that can differ from what a human would assume to be the number of characters.
As the term "character" has different meanings in various contexts, and the OP didn't specify which kind of character count to use, I left the code as it was. If you want a different count, please refer to the documentation that discusses the topic:
- Apple's String Programming Guide, Characters and Grapheme Clusters
- Unicode FAQ: How are characters counted when measuring the length or position of a character in a string?
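If the count should be closer to what a reader perceives as characters, one option is to count composed character sequences instead of relying on length. The sketch below is only one possible interpretation of "character", and the helper name ComposedCharacterCount is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts composed character sequences inside the given range.
static NSUInteger ComposedCharacterCount(NSString *text, NSRange range) {
    __block NSUInteger count = 0;
    [text enumerateSubstringsInRange:range
                             options:NSStringEnumerationByComposedCharacterSequences |
                                     NSStringEnumerationSubstringNotRequired
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // One callback per composed character sequence, e.g. a base letter
        // together with its combining accent.
        count++;
    }];
    return count;
}
```

In the replacement loop, the word's range could be passed to such a helper and the returned count compared against the limit in place of the range's length. This costs an extra pass over every word, so it is a trade-off between speed and accuracy.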