jQuery - Text matching not working for Arabic; the issue may be due to the regex for Arabic
I have been working on adding functionality to a multilingual website where I have to highlight matching tag keywords.
This functionality works for the English version but doesn't fire for the Arabic version.
I have set up a sample on JSFiddle.
Sample code:
    function highlightkeywords(keywords) {
        var el = $("#article-detail-desc");
        var language = "ar-ae";
        var pid = 32;
        var issueid = 18;

        $(keywords).each(function() {
            // var pattern = new RegExp("("+this+")", ["gi"]); // breaks HTML
            var pattern = new RegExp("(\\b"+this+"\\b)(?![^<]*?>)", ["gi"]); // looks for a match outside HTML tags
            var rs = "<a class='ad-keyword-selected' href='http://www.alshindagah.com/ar/search.aspx?language="+language+"&pageid="+pid+"&issue="+issueid+"&search=$1' title='Search website for: $1'><span style='color:#990044; text-decoration:none;'>$1</span></a>";
            el.html(el.html().replace(pattern, rs));
        });
    }

    highlightkeywords(["you","الهدف","طهران","سيما","حاليا","hello","34","english"]);

    // Popup tooltip for article keywords
    $(function() {
        $("#article-detail-desc").tooltip({
            position: {
                my: "center bottom-20",
                at: "center top",
                using: function( position, feedback ) {
                    $( this ).css( position );
                    $( "<div>" )
                        .addClass( "arrow" )
                        .addClass( feedback.vertical )
                        .addClass( feedback.horizontal )
                        .appendTo( this );
                }
            }
        });
    });
I store the keywords in an array and match them against the text in a particular div.
I am not sure whether the problem is due to Unicode or something else. Any help in this respect is appreciated.
There are three sections to this answer:
- Why it's not working
- An example of how you might approach it in English (meant to be adapted to Arabic by someone with a clue about Arabic)
- A stab at doing the Arabic version by someone (me) who hasn't a clue about Arabic :-)
Why it's not working
At least part of the problem is that you're relying on the \b assertion, which (like its counterparts \B, \w, and \W) is English-centric. You can't rely on it in other languages (or even, really, in English; see below).
Here's the definition of \b in the spec:
The production Assertion :: \ b evaluates by returning an internal AssertionTester closure that takes a State argument x and performs the following:
- Let e be x's endIndex.
- Call IsWordChar(e–1) and let a be the Boolean result.
- Call IsWordChar(e) and let b be the Boolean result.
- If a is true and b is false, return true.
- If a is false and b is true, return true.
- Return false.
...where IsWordChar is defined further down as meaning one of these 63 characters:
a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _
That is, the 26 English letters a to z in upper or lower case, the digits 0 to 9, and _. (This means you can't even rely on \b, \B, \w, or \W in English, because English has loan words like "voilà", but that's another story.)
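To see the effect of that definition concretely, here is a small sketch. The helper isWordBoundary is a hypothetical name of mine that mirrors the spec steps quoted above; it is not part of any library, and the sample strings are just illustrations.

```javascript
// The 63 characters that IsWordChar accepts, written as a character class.
var WORD_CHAR = /[a-zA-Z0-9_]/;

// Hypothetical helper mirroring the spec steps for \b at index e of str:
// a boundary exists only when exactly one neighbour is a "word" character.
function isWordBoundary(str, e) {
    var a = e > 0 && WORD_CHAR.test(str.charAt(e - 1));
    var b = e < str.length && WORD_CHAR.test(str.charAt(e));
    return a !== b;
}

var english = "say hello there";
var arabic = "في طهران اليوم";

// "hello" has a space on one side and a word character on the other: \b matches.
var englishMatch = /\bhello\b/.test(english);

// Arabic letters are not among the 63 characters, so the space and the
// letter next to it are both "non-word" and \b never matches between them.
var arabicMatch = /\bطهران\b/.test(arabic);
var boundaryBeforeArabicWord = isWordBoundary(arabic, arabic.indexOf("طهران"));

console.log(englishMatch, arabicMatch, boundaryBeforeArabicWord);
```

The keyword is clearly present in the Arabic string, but the pattern built around \b cannot find it, which is consistent with the English version working and the Arabic one never firing.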
A first example using English
You'll have to use a different mechanism for detecting word boundaries in Arabic. If you can come up with a character class that includes all of the Arabic "code points" (as Unicode puts it) that make up words, you could use code a bit like this:
    var keywords = {
        "laboris": true,
        "laborum": true,
        "pariatur": true
        // ...and so on...
    };

    var text = /*... get the text to work on... */;

    text = text.replace(
        /([abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)([^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)?/g,
        replacer);

    function replacer(m, c0, c1) {
        if (keywords[c0]) {
            c0 = '<a href="#">' + c0 + '</a>';
        }
        // c1 is undefined when nothing follows the last word
        return c0 + (c1 || "");
    }
Notes on that:
- I've used the class [abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_] to mean "a word character". You'd have to change this (markedly) for Arabic.
- I've used the class [^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_] to mean "not a word character". It's the same as the previous class with the negation (^) at the outset.
- The regular expression finds any series of "word characters" followed by an optional series of non-word characters, using capture groups ((...)) for both.
- String#replace calls the replacer function with the full text that matched, followed by each capture group as arguments.
- The replacer function looks up the first capture group (the word) in the keywords map to see if it's a keyword. If so, it wraps it in an anchor.
- The replacer function returns that possibly-wrapped word plus the non-word text that followed it.
- String#replace uses the return value from replacer to replace the matched text.
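If the way String#replace hands arguments to the callback is unfamiliar, the following sketch records what it receives. The names recordingReplacer and calls are mine, purely for illustration, and the uppercase transformation just stands in for the anchor wrapping.

```javascript
// Collects the arguments String#replace passes to the callback on each match.
var calls = [];

function recordingReplacer(m, c0, c1) {
    calls.push({ match: m, word: c0, rest: c1 });
    // When the optional second group does not take part in the match
    // (for example at the very end of the string), c1 is undefined,
    // so fall back to an empty string before concatenating.
    return c0.toUpperCase() + (c1 || "");
}

// A shortened word class, assumed here only to keep the example readable.
var result = "foo, bar".replace(/([a-z]+)([^a-z]+)?/g, recordingReplacer);

console.log(result);
console.log(calls);
```

The first call receives the word "foo" and the trailing ", "; the second receives "bar" with the second group undefined, which is why a replacer that concatenates the two groups may want a guard like the one above.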
Here's a full example of doing that:
    <!DOCTYPE html>
    <html>
    <head>
    <meta charset=utf-8 />
    <title>Replacing Keywords</title>
    </head>
    <body>
      <p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
      <script src="http://code.jquery.com/jquery-1.9.1.min.js"></script>
      <script>
        (function() {
          // Our keywords. There are lots of ways you can produce
          // this map, here I've just done it literally
          var keywords = {
            "laboris": true,
            "laborum": true,
            "pariatur": true
          };

          // Loop through our paragraphs (okay, so we only have one)
          $("p").each(function() {
            var $this, text;

            // We'll use jQuery on `this` more than once,
            // so grab the wrapper
            $this = $(this);

            // Get the text of the paragraph
            // Note that this strips off HTML tags, a
            // real-world solution might need to loop
            // through the text nodes rather than act
            // on the full text all at once
            text = $this.text();

            // Do the replacements
            // These character classes match JavaScript's
            // definition of a "word" character and so are
            // English-centric, obviously you'd change that
            text = text.replace(
              /([abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)([^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_]+)?/g,
              replacer);

            // Update the paragraph
            $this.html(text);
          });

          // Our replacer. We define it separately rather than
          // inline because we use it more than once
          function replacer(m, c0, c1) {
            // Is the word in our keywords map?
            if (keywords[c0]) {
              // Yes, wrap it
              c0 = '<a href="#">' + c0 + '</a>';
            }
            // c1 is undefined when nothing follows the last word
            return c0 + (c1 || "");
          }
        })();
      </script>
    </body>
    </html>
A stab at doing it in Arabic
I took a stab at an Arabic version. According to the "Arabic script in Unicode" page on Wikipedia, there are several code ranges used, but all of the text in your example fell into the primary range of U+0600 to U+06FF.
Here's what I came up with: Fiddle (I prefer JSBin, which is what I used above, but I couldn't get the text to come out the right way around.)
    (function() {
        // Our keywords. There are lots of ways you can produce
        // this map, here I've just done it literally
        var keywords = {
            "الهدف": true,
            "طهران": true,
            "سيما": true,
            "حاليا": true
        };

        // Loop through our paragraphs (okay, so we only have two)
        $("p").each(function() {
            var $this, text;

            // We'll use jQuery on `this` more than once,
            // so grab the wrapper
            $this = $(this);

            // Get the text of the paragraph
            // Note that this strips off HTML tags, a
            // real-world solution might need to loop
            // through the text nodes rather than act
            // on the full text all at once
            text = $this.text();

            // Do the replacements
            // These character classes just use the primary
            // Arabic range of U+0600 to U+06FF, you may
            // need to add others.
            text = text.replace(
                /([\u0600-\u06FF]+)([^\u0600-\u06FF]+)?/g,
                replacer);

            // Update the paragraph
            $this.html(text);
        });

        // Our replacer. We define it separately rather than
        // inline because we use it more than once
        function replacer(m, c0, c1) {
            // Is the word in our keywords map?
            if (keywords[c0]) {
                // Yes, wrap it
                c0 = '<a href="#">' + c0 + '</a>';
            }
            // c1 is undefined when nothing follows the last word
            return c0 + (c1 || "");
        }
    })();
All I did to the English function above was:
- Use [\u0600-\u06FF] to be "a word character" and [^\u0600-\u06FF] to be "not a word character". You may need to add some of the other ranges listed on that Wikipedia page (such as the appropriate style of numerals), but again, all of the text in your example fell into those ranges.
- Change the keywords to ones taken from your example (only two of which seem to be in the text).
To my very non-Arabic-reading eyes, it seems to work.
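As an alternative to listing code ranges by hand, and only as a sketch: engines that support ES2018 Unicode property escapes (the u flag with \p{...}) let you describe "a word character" by Unicode category instead. This is a different technique from the range-based classes above, not something from the original fiddle. The function name wrapKeywords is hypothetical, and the code assumes the input is plain text with no HTML in it.

```javascript
// Hypothetical helper: wraps each keyword in an anchor, treating any run of
// Unicode letters (\p{L}), combining marks (\p{M}), digits (\p{N}) or "_"
// as a word. Requires an engine with ES2018 Unicode property escapes.
function wrapKeywords(text, keywords) {
    return text.replace(/[\p{L}\p{M}\p{N}_]+/gu, function (word) {
        // Own-property check so inherited names such as "constructor"
        // are never mistaken for keywords.
        if (Object.prototype.hasOwnProperty.call(keywords, word)) {
            return '<a href="#">' + word + '</a>';
        }
        return word;
    });
}

var arabicResult = wrapKeywords("في طهران، اليوم", { "طهران": true });
var englishResult = wrapKeywords("hello world", { "hello": true });
var loanWordResult = wrapKeywords("voilà!", { "voilà": true });

console.log(arabicResult);
console.log(englishResult);
console.log(loanWordResult);
```

One difference worth knowing about: the Arabic comma (U+060C) sits inside U+0600 to U+06FF, so a plain range class treats it as part of the word next to it, whereas the category-based class above does not. Whether that matters depends on your text.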