sublimetext3 - Translate accented to unaccented characters in Sublime Text snippet using regex


I'm writing an ST3 snippet that inserts a \subsection{} with a label. The label is created by converting the header text so that it conforms to LaTeX standards for labels, using a (rather lengthy) regular expression:

${1/(?:([ \t_]+)?|\b)(?:([ÅÄÆÁÀÃ])?|\b)(?:([åäæâàáã])?|\b)(?:([ÉÈÊË])?|\b)(?:([éèëê])?|\b)(?:([ÍÌÎÏ])?|\b)(?:([íìïî])?|\b)(?:([Ñ])?|\b)(?:([ñ])?|\b)(?:([ÖØÓÒÔÕ])?|\b)(?:([öøóòôõ])?|\b)(?:([ÜÛÚÙ])?|\b)(?:([üûúù])?|\b)/(?1:-)(?2:a)(?3:a)(?4:e)(?5:e)(?6:i)(?7:i)(?8:n)(?9:n)(?10:o)(?11:o)(?12:u)(?13:u)/g}

Actually, it should be longer. But if I add more groups, like this, ST3 crashes when I execute the snippet:

${1/(?:([ \t_]+)?|\b)(?:([ÅÄÆÁÀÃ])?|\b)(?:([åäæâàáã])?|\b)(?:([Ç])?|\b)(?:([ç])?|\b)(?:([ÉÈÊË])?|\b)(?:([éèëê])?|\b)(?:([ÍÌÎÏ])?|\b)(?:([íìïî])?|\b)(?:([Ñ])?|\b)(?:([ñ])?|\b)(?:([ÖØÓÒÔÕ])?|\b)(?:([öøóòôõ])?|\b)(?:([ÜÛÚÙ])?|\b)(?:([üûúù])?|\b)(?:([Ý])?|\b)(?:([ÿý])?|\b)/(?1:-)(?2:a)(?3:a)(?4:c)(?5:c)(?6:e)(?7:e)(?8:i)(?9:i)(?10:n)(?11:n)(?12:o)(?13:o)(?14:u)(?15:u)(?16:y)(?17:y)/g}

Is there a more efficient way of doing this? Preferably one that won't cause ST3 to crash ;)

Edit: here are some example strings:

flygande bæckasiner søka hwila på mjuka tuvor
Åke staël hade en överflödig idé

And the results (with the current, working regex):

flygande-backasiner-soka-hwila-pa-mjuka-tuvor
ake-stael-hade-en-overflodig-ide

But I would also like to replace the characters (ÇçÝÿý) with their unaccented counterparts (ccyyy), e.g.

comment ça va 

becomes

comment-ca-va 

I don't know the snippet syntax, but I suspect the problem comes from the many optional groups combined with a lot of alternatives, which makes the processing complex.

So you can try to design your pattern like this, and you can add other groups of letters in the same way (take a look at a Unicode table to find the character ranges):

${1/([ \t_]+)|([À-Å])|([à-å])|([È-Ë])|([è-ë])|([Ì-Ï])|([ì-ï])|([Ò-ÖØ])|([ò-öø])|([Ù-Ü])|([ù-ü])|(Æ)|(æ)|(Œ)|(œ)|(Ñ)|(ñ)/(?1:-)(?2:a)(?3:a)(?4:e)(?5:e)(?6:i)(?7:i)(?8:o)(?9:o)(?10:u)(?11:u)(?12:ae)(?13:ae)(?14:oe)(?15:oe)(?16:n)(?17:n)/g} 
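
To check the alternation idea outside Sublime Text, here is a minimal Python sketch of the same approach: one capture group per target, and the replacement chosen by which group matched. Python's `re` module has no `(?n:...)` conditional replacement, so a callback using `m.lastindex` stands in for it. The names `RULES` and `to_label` are hypothetical, the last two rules (for Ç/ç and Ý/ý/ÿ) are an assumed extension that is not in the pattern above, and the input is assumed to use precomposed (NFC) characters.

```python
import re

# (character class, replacement) pairs, in the same order as the
# capture groups of the snippet pattern above.
RULES = [
    (r"[ \t_]+", "-"),
    (r"[À-Å]", "a"), (r"[à-å]", "a"),
    (r"[È-Ë]", "e"), (r"[è-ë]", "e"),
    (r"[Ì-Ï]", "i"), (r"[ì-ï]", "i"),
    (r"[Ò-ÖØ]", "o"), (r"[ò-öø]", "o"),
    (r"[Ù-Ü]", "u"), (r"[ù-ü]", "u"),
    (r"Æ", "ae"), (r"æ", "ae"),
    (r"Œ", "oe"), (r"œ", "oe"),
    (r"Ñ", "n"), (r"ñ", "n"),
    # Assumed extension, added "in the same way" as the groups above.
    (r"[Çç]", "c"), (r"[Ýýÿ]", "y"),
]

# One top-level capture group per rule, joined by alternation.
PATTERN = re.compile("|".join(f"({cls})" for cls, _ in RULES))


def to_label(text):
    """Replace each match with the replacement of the group that matched."""
    # m.lastindex is the 1-based index of the single group that matched.
    return PATTERN.sub(lambda m: RULES[m.lastindex - 1][1], text)


if __name__ == "__main__":
    print(to_label("comment ça va"))  # prints "comment-ca-va"
```

With these rules æ becomes "ae", so "bæckasiner" turns into "baeckasiner" rather than the "backasiner" produced by the regex in the question.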

If the lookahead feature is available, you can improve the pattern to prevent non-accented characters from being tested against each alternative:

${1/(?=[ \t_À-ÆÈ-ÏÑ-ÖØ-Üà-æè-ïñ-öø-üŒœ])(?:([ \t_]+)|([À-Å])|([à-å])|([È-Ë])|([è-ë])|([Ì-Ï])|([ì-ï])|([Ò-ÖØ])|([ò-öø])|([Ù-Ü])|([ù-ü])|(Æ)|(æ)|(Œ)|(œ)|(Ñ)|(ñ))/(?1:-)(?2:a)(?3:a)(?4:e)(?5:e)(?6:i)(?7:i)(?8:o)(?9:o)(?10:u)(?11:u)(?12:ae)(?13:ae)(?14:oe)(?15:oe)(?16:n)(?17:n)/g} 
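
As a sanity check, the sketch below (Python, with hypothetical names `PLAIN`, `GUARDED` and `convert`) builds both variants and confirms that the lookahead only acts as a filter: it rejects ordinary characters up front, and the result is the same as without it. Whether the guard is actually faster depends on the regex engine, so that part is an assumption to measure rather than a given.

```python
import re

# The seventeen capture groups from the patterns above, unchanged.
GROUPS = (
    r"([ \t_]+)|([À-Å])|([à-å])|([È-Ë])|([è-ë])|([Ì-Ï])|([ì-ï])"
    r"|([Ò-ÖØ])|([ò-öø])|([Ù-Ü])|([ù-ü])|(Æ)|(æ)|(Œ)|(œ)|(Ñ)|(ñ)"
)

# Replacement for group 1, group 2, ... in order.
REPLACEMENTS = ["-", "a", "a", "e", "e", "i", "i", "o", "o",
                "u", "u", "ae", "ae", "oe", "oe", "n", "n"]

# Lookahead listing every character that any group can start with.
LOOKAHEAD = r"(?=[ \t_À-ÆÈ-ÏÑ-ÖØ-Üà-æè-ïñ-öø-üŒœ])"

PLAIN = re.compile(GROUPS)
GUARDED = re.compile(LOOKAHEAD + "(?:" + GROUPS + ")")


def convert(pattern, text):
    """Apply the per-group replacements using the given compiled pattern."""
    return pattern.sub(lambda m: REPLACEMENTS[m.lastindex - 1], text)


if __name__ == "__main__":
    sample = "flygande bæckasiner søka hwila på mjuka tuvor"
    # Both variants should produce the same label.
    print(convert(PLAIN, sample) == convert(GUARDED, sample))
```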

Note: Æ (aelig) must be transliterated as ae (and likewise Œ => oe).

