jQuery - Text matching not working for Arabic; the issue may be due to the regex for Arabic

I have been working to add functionality to a multilingual website where I have to highlight matching tag keywords.

This functionality works for the English version but doesn't fire for the Arabic version.

I have set up a sample on jsFiddle.

Sample code

    function highlightKeywords(keywords)
    {
        var el = $("#article-detail-desc");
        var language = "ar-ae";
        var pid = 32;
        var issueid = 18;

        $(keywords).each(function()
        {
            // var pattern = new RegExp("("+this+")", ["gi"]); // breaks HTML
            var pattern = new RegExp("(\\b"+this+"\\b)(?![^<]*?>)", ["gi"]); // looks for a match outside HTML tags
            var rs = "<a class='ad-keyword-selected' href='http://www.alshindagah.com/ar/search.aspx?language="+language+"&pageid="+pid+"&issue="+issueid+"&search=$1' title='Search website for:  $1'><span style='color:#990044; text-decoration:none;'>$1</span></a>";
            el.html(el.html().replace(pattern, rs));
        });
    }
    highlightKeywords(["you","الهدف","طهران","سيما","حاليا","hello","34","english"]);

    // Popup tooltip for article keywords
    $(function() {
        $("#article-detail-desc").tooltip({
            position: {
                my: "center bottom-20",
                at: "center top",
                using: function( position, feedback ) {
                    $( this ).css( position );
                    $( "<div>" )
                        .addClass( "arrow" )
                        .addClass( feedback.vertical )
                        .addClass( feedback.horizontal )
                        .appendTo( this );
                }
            }
        });
    });

I store the keywords in an array and match them against the text in a particular div.

I am not sure whether the problem is due to Unicode or something else. Any help in this respect is appreciated.

There are three sections to this answer:

Why it's not working
An example of how you could approach it in English (meant to be adapted to Arabic by someone with a clue about Arabic)
A stab at doing an Arabic version by someone (me) who hasn't a clue about Arabic :-)

why it's not working

At least part of the problem is that you're relying on the \b assertion, which (like its counterparts \B, \w, and \W) is English-centric. You can't rely on it in other languages (or even, really, in English; see below).

Here's the definition of \b in the spec:

The production Assertion :: \ b evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following:

Let e be x's endIndex.

Call IsWordChar(e–1) and let a be the Boolean result.

Call IsWordChar(e) and let b be the Boolean result.

If a is true and b is false, return true.

If a is false and b is true, return true.

Return false.

...where IsWordChar is defined further down as meaning one of these 63 characters:

a  b  c  d  e  f  g  h  i  j  k  l  m  n  o  p  q  r  s  t  u  v  w  x  y  z  A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z  0  1  2  3  4  5  6  7  8  9  _

E.g., the 26 English letters a to z in upper or lower case, the digits 0 to 9, and _. (This means you can't even rely on \b, \B, \w, or \W in English, because English has loan words like "voilà", but that's another story.)
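To make the quoted steps concrete, here is a minimal sketch that mirrors them in plain JavaScript and compares the result with what the regex engine does. The helper names `isWordChar` and `isWordBoundary` and the sample sentences are my own, made up for illustration; they are not part of the spec or of the question.

```javascript
// The 63 characters the spec counts as "word characters".
var WORD_CHARS =
    "abcdefghijklmnopqrstuvwxyz" +
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
    "0123456789_";

// Hypothetical helper mirroring IsWordChar(e): positions outside the
// string are never word characters.
function isWordChar(str, e) {
    if (e < 0 || e >= str.length) {
        return false;
    }
    return WORD_CHARS.indexOf(str.charAt(e)) !== -1;
}

// Hypothetical helper mirroring the \b steps: there is a boundary at
// index e when exactly one of the two neighbours is a word character.
function isWordBoundary(str, e) {
    var a = isWordChar(str, e - 1);
    var b = isWordChar(str, e);
    return a !== b;
}

var english = "say hello there";
var arabic = "في طهران اليوم";

// Index 4 is the "h" of "hello": space on one side, word character on the other.
var englishBoundary = isWordBoundary(english, 4);
// Index 3 is the first letter of "طهران": neither the space before it
// nor the Arabic letter itself is a word character, so no boundary.
var arabicBoundary = isWordBoundary(arabic, 3);

// The regex engine agrees with the hand-rolled version.
var englishMatch = /\bhello\b/.test(english);
var arabicMatch = /\bطهران\b/.test(arabic);
// "à" is not a word character either, so \b "sees" a word ending inside "voilà".
var loanWordMatch = /\bvoil\b/.test("voil\u00E0");
```

With these inputs the English keyword is found and the Arabic one is not, which matches the behaviour described in the question.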

A first example using English

You'll have to use a different mechanism for detecting word boundaries in Arabic. If you can come up with a character class that includes all of the Arabic "code points" (as Unicode puts it) that make up words, then you could use code a bit like this:

    var keywords = {
        "laboris": true,
        "laborum": true,
        "pariatur": true
        // ...and so on...
    };
    var text = ""; // ...replace with the text to work on...
    text = text.replace(
        /([abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)([^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)?/g,
        replacer);

    function replacer(m, c0, c1) {
        if (keywords[c0]) {
            c0 = '<a href="#">' + c0 + '</a>';
        }
        // c1 is undefined when no non-word text follows the word
        return c0 + (c1 || "");
    }

Notes on that:

I've used the class [abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_] to mean "a word character". Obviously you'd have to change this (markedly) for Arabic.
I've used the class [^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_] to mean "not a word character". It's the same as the previous class with the negation (^) at the outset.
The regular expression finds a series of "word characters" followed by an optional series of non-word characters, using capture groups ((...)) for both.
String#replace calls the replacer function with the full text that matched, followed by each capture group as arguments.
The replacer function looks up the first capture group (the word) in the keywords map to see if it's a keyword. If so, it wraps it in an anchor.
The replacer function returns that possibly-wrapped word plus the non-word text that followed it.
String#replace uses the return value from replacer to replace the matched text.
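To see exactly what String#replace hands to the callback, here is a small sketch that records the arguments for each match. The sample string and the shortened [a-z] class are just for illustration. Note that when the optional second group doesn't match (a word at the very end of the text), the callback receives undefined for it, so a replacer that concatenates that argument should guard against it.

```javascript
// Record the (match, group 1, group 2) arguments of every callback call.
var calls = [];

var result = "foo, bar".replace(/([a-z]+)([^a-z]+)?/g, function (m, c0, c1) {
    calls.push([m, c0, c1]);
    // Fall back to "" so a missing trailing group doesn't become "undefined".
    return c0.toUpperCase() + (c1 || "");
});

// First call:  m = "foo, ", c0 = "foo", c1 = ", "
// Second call: m = "bar",   c0 = "bar", c1 = undefined
```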

Here's a full example of doing that:

    <!DOCTYPE html>
    <html>
    <head>
    <meta charset=utf-8 />
    <title>Replacing Keywords</title>
    </head>
    <body>
      <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>

      <script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
      <script>
        (function() {
          // Our keywords. There are lots of ways you can produce
          // this map, here I've just done it literally
          var keywords = {
            "laboris": true,
            "laborum": true,
            "pariatur": true
          };

          // Loop through all our paragraphs (okay, so we only have one)
          $("p").each(function() {
            var $this, text;

            // We'll use jQuery on `this` more than once,
            // so grab the wrapper
            $this = $(this);

            // Get the text of the paragraph
            // Note that this strips off HTML tags, a
            // real-world solution might need to loop
            // through the text nodes rather than act
            // on the full text all at once
            text = $this.text();

            // Do the replacements
            // These character classes match JavaScript's
            // definition of a "word" character and so are
            // English-centric, obviously you'd change that
            text = text.replace(
              /([abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)([^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)?/g,
              replacer);

            // Update the paragraph
            $this.html(text);
          });

          // Our replacer. We define it separately rather than
          // inline because we use it more than once
          function replacer(m, c0, c1) {
            // Is the word in our keywords map?
            if (keywords[c0]) {
              // Yes, wrap it
              c0 = '<a href="#">' + c0 + '</a>';
            }
            // c1 is undefined when no non-word text follows the word
            return c0 + (c1 || "");
          }
        })();
      </script>
    </body>
    </html>

A stab at doing it in Arabic

I took a stab at an Arabic version. According to the Arabic script in Unicode page on Wikipedia, there are several code ranges used, but all of the text in your example fell into the primary range of U+0600 to U+06FF.

Here's what I came up with: Fiddle (I prefer JSBin, which is what I used above, but I couldn't get the text to come out the right way around.)

    (function() {
        // Our keywords. There are lots of ways you can produce
        // this map, here I've just done it literally
        var keywords = {
            "الهدف": true,
            "طهران": true,
            "سيما": true,
            "حاليا": true
        };

        // Loop through all our paragraphs (okay, so we only have two)
        $("p").each(function() {
            var $this, text;

            // We'll use jQuery on `this` more than once,
            // so grab the wrapper
            $this = $(this);

            // Get the text of the paragraph
            // Note that this strips off HTML tags, a
            // real-world solution might need to loop
            // through the text nodes rather than act
            // on the full text all at once
            text = $this.text();

            // Do the replacements
            // These character classes just use the primary
            // Arabic range of U+0600 to U+06FF, you may
            // need to add others.
            text = text.replace(
                /([\u0600-\u06FF]+)([^\u0600-\u06FF]+)?/g,
                replacer);

            // Update the paragraph
            $this.html(text);
        });

        // Our replacer. We define it separately rather than
        // inline because we use it more than once
        function replacer(m, c0, c1) {
            // Is the word in our keywords map?
            if (keywords[c0]) {
                // Yes, wrap it
                c0 = '<a href="#">' + c0 + '</a>';
            }
            // c1 is undefined when no non-word text follows the word
            return c0 + (c1 || "");
        }
    })();

All I did to the English function above was:

Use [\u0600-\u06FF] to be "a word character" and [^\u0600-\u06FF] to be "not a word character". You may need to add some of the other ranges listed on that Wikipedia page (such as the appropriate style of numerals), but again, all of the text in your example fell into that range.
Change the keywords to the Arabic ones from your example (only two of which seem to be in the text).

To my very non-Arabic-reading eyes, it seems to work.
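If you do need more than the primary range, one way to keep things manageable is to build the regular expression from a string of ranges, so the English and Arabic versions share the same code. This is only a sketch: `makeHighlighter` and `ARABIC_RANGES` are names I made up, and the list of blocks is my assumption about what you might want to include, so check it against the Unicode charts before relying on it.

```javascript
// Assumed set of Arabic-script blocks, written as character-class ranges.
var ARABIC_RANGES =
    "\\u0600-\\u06FF" + // Arabic (the primary range used above)
    "\\u0750-\\u077F" + // Arabic Supplement
    "\\u08A0-\\u08FF" + // Arabic Extended-A
    "\\uFB50-\\uFDFF" + // Arabic Presentation Forms-A
    "\\uFE70-\\uFEFC";  // Arabic Presentation Forms-B (stops before U+FEFF, the byte order mark)

// Hypothetical factory: returns a function that wraps every keyword
// found in a piece of plain text in an anchor.
function makeHighlighter(keywords, wordRanges) {
    var pattern = new RegExp(
        "([" + wordRanges + "]+)([^" + wordRanges + "]+)?", "g");

    return function (text) {
        return text.replace(pattern, function (m, word, rest) {
            // hasOwnProperty avoids false hits on inherited names
            // such as "constructor"
            if (Object.prototype.hasOwnProperty.call(keywords, word)) {
                word = '<a href="#">' + word + '</a>';
            }
            // rest is undefined when the word ends the text
            return word + (rest || "");
        });
    };
}

var highlightArabic = makeHighlighter(
    { "طهران": true, "الهدف": true }, ARABIC_RANGES);
var highlightEnglish = makeHighlighter(
    { "laboris": true }, "A-Za-z0-9_");

var arabicResult = highlightArabic("في طهران اليوم");
var englishResult = highlightEnglish("ullamco laboris nisi");
```

One caveat: U+0600 to U+06FF also contains Arabic punctuation such as the Arabic comma (U+060C) and question mark (U+061F), so a keyword immediately followed by one of those would be treated as part of a longer "word" and not match. Excluding those code points from the class would be a reasonable refinement.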
