sublimetext3 - Translate accented characters to unaccented ones in a Sublime Text snippet using regex
I'm writing an ST3 snippet that inserts a \subsection{} with a label. The label is created by converting the header text to conform to the LaTeX standards for labels, using a (rather lengthy) regular expression:
${1/(?:([ \t_]+)?|\b)(?:([ÅÄÆÁÀÃ])?|\b)(?:([åäæâàáã])?|\b)(?:([ÉÈÊË])?|\b)(?:([éèëê])?|\b)(?:([ÍÌÎÏ])?|\b)(?:([íìïî])?|\b)(?:([Ñ])?|\b)(?:([ñ])?|\b)(?:([ÖØÓÒÔÖÕ])?|\b)(?:([öøóòôõ])?|\b)(?:([ÜÛÚÙ])?|\b)(?:([üûúù])?|\b)/(?1:-)(?2:a)(?3:a)(?4:e)(?5:e)(?6:i)(?7:i)(?8:n)(?9:n)(?10:o)(?11:o)(?12:u)(?13:u)/g}
Actually, it's even longer than that. If I add more groups like this, ST3 crashes when I execute the snippet:
${1/(?:([ \t_]+)?|\b)(?:([ÅÄÆÁÀÃ])?|\b)(?:([åäæâàáã])?|\b)(?:([Ç])?|\b)(?:([ç])?|\b)(?:([ÉÈÊË])?|\b)(?:([éèëê])?|\b)(?:([ÍÌÎÏ])?|\b)(?:([íìïî])?|\b)(?:([Ñ])?|\b)(?:([ñ])?|\b)(?:([ÖØÓÒÔÖÕ])?|\b)(?:([öøóòôõ])?|\b)(?:([ÜÛÚÙ])?|\b)(?:([üûúù])?|\b)(?:([Ý])?|\b)(?:([ÿý])?|\b)/(?1:-)(?2:a)(?3:a)(?4:c)(?5:c)(?6:e)(?7:e)(?8:i)(?9:i)(?10:n)(?11:n)(?12:o)(?13:o)(?14:u)(?15:u)(?16:y)(?17:y)/g}
Is there a more efficient way of doing this? Preferably one that won't cause ST3 to crash ;)
Edit: here are some example strings:
flygande bæckasiner søka hwila på mjuka tuvor
Åke staël hade en överflödig idé
and the results (with the current, working regex):
flygande-backasiner-soka-hwila-pa-mjuka-tuvor
ake-stael-hade-en-overflodig-ide
But I'd also like to replace characters such as ÇçÝÿý with their unaccented counterparts (ccyyy), e.g.
comment ça va
becomes
comment-ca-va
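For reference outside Sublime, the conversion the question asks for can be written as a plain lookup table. This is a minimal Python sketch, not snippet syntax; make_label, _GROUPS and _TABLE are hypothetical names, the character groups mirror the question's regex (assuming Í was meant where Ì appears twice), and Æ/æ map to "a" only so that the results match the current output.

```python
import re

# Hypothetical reference table mirroring the question's character groups.
# Æ/æ -> "a" follows the question's current regex, not proper transliteration.
_GROUPS = {
    "ÅÄÆÁÀÃ": "a", "åäæâàáã": "a",
    "Ç": "c", "ç": "c",
    "ÉÈÊË": "e", "éèëê": "e",
    "ÍÌÎÏ": "i", "íìïî": "i",
    "Ñ": "n", "ñ": "n",
    "ÖØÓÒÔÕ": "o", "öøóòôõ": "o",
    "ÜÛÚÙ": "u", "üûúù": "u",
    "Ý": "y", "ÿý": "y",
}
# One entry per single character, as required by str.maketrans.
_TABLE = str.maketrans({ch: out for chars, out in _GROUPS.items() for ch in chars})


def make_label(header: str) -> str:
    """Hypothetical helper: the label the snippet is meant to produce."""
    # Runs of spaces, tabs and underscores become one hyphen,
    # like ([ \t_]+) -> (?1:-) in the snippet; then accents are replaced.
    return re.sub(r"[ \t_]+", "-", header).translate(_TABLE)


print(make_label("comment ça va"))
```

Here make_label("comment ça va") returns "comment-ca-va", and the two Swedish examples come out exactly as listed in the question.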
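I don't know this snippet syntax, but I suspect the problem comes from the many optional groups: each (?:(...)?|\b) can match the empty string, so at every position of the text the engine has to walk through all the groups and their alternatives, which makes processing very expensive.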
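The point about optional groups can be checked on a reduced version of the question's pattern. A minimal Python sketch, assuming Python's re module behaves like Sublime's engine here; the three groups are a shortened stand-in for the real seventeen:

```python
import re

# Question style: every group is optional, so each (?:(...)?|\b) piece
# can succeed without consuming anything.
optional_style = re.compile(r"(?:([ \t_]+)?|\b)(?:([åä])?|\b)(?:([ö])?|\b)")
# Answer style: the same groups as one plain alternation.
alternation_style = re.compile(r"([ \t_]+)|([åä])|([ö])")

text = "hej"  # contains nothing that needs replacing

# Empty match at every position, including the end of the string:
# [(0, 0), (1, 1), (2, 2), (3, 3)]
optional_spans = [m.span() for m in optional_style.finditer(text)]
# No match at all: []
alternation_spans = [m.span() for m in alternation_style.finditer(text)]

print(optional_spans)
print(alternation_spans)
```

With the full pattern, each of those zero-length matches means going through every optional group, which is consistent with (though not proof of) the slowdown.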
So you can try to design your pattern like this; other groups of letters can be added the same way (take a look at a Unicode table to find the character ranges):
${1/([ \t_]+)|([À-Å])|([à-å])|([È-Ë])|([è-ë])|([Ì-Ï])|([ì-ï])|([Ò-ÖØ])|([ò-öø])|([Ù-Ü])|([ù-ü])|(Æ)|(æ)|(Œ)|(œ)|(Ñ)|(ñ)/(?1:-)(?2:a)(?3:a)(?4:e)(?5:e)(?6:i)(?7:i)(?8:o)(?9:o)(?10:u)(?11:u)(?12:ae)(?13:ae)(?14:oe)(?15:oe)(?16:n)(?17:n)/g}
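To try this pattern outside Sublime, the format string's (?n:...) conditionals can be emulated with a replacement function. A minimal Python sketch, assuming Python's re treats these character ranges the same way as Sublime's engine; PATTERN, REPLACEMENTS and to_label are hypothetical names:

```python
import re

# The answer's pattern, unchanged: 17 capturing groups in one alternation.
PATTERN = re.compile(
    r"([ \t_]+)|([À-Å])|([à-å])|([È-Ë])|([è-ë])|([Ì-Ï])|([ì-ï])"
    r"|([Ò-ÖØ])|([ò-öø])|([Ù-Ü])|([ù-ü])|(Æ)|(æ)|(Œ)|(œ)|(Ñ)|(ñ)"
)

# REPLACEMENTS[n] plays the role of (?n:...) in the format string:
# the text emitted when group n is the one that matched (index 0 unused).
REPLACEMENTS = [None, "-", "a", "a", "e", "e", "i", "i", "o", "o",
                "u", "u", "ae", "ae", "oe", "oe", "n", "n"]


def to_label(header: str) -> str:
    # In a flat alternation exactly one group takes part in each match,
    # so m.lastindex is the number of the group that matched.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.lastindex], header)


print(to_label("Åke staël hade en överflödig idé"))
```

This prints "ake-stael-hade-en-overflodig-ide", and æ now comes out as "ae" ("baeckasiner"). Ç/ç and Ý/ý/ÿ are not in this pattern yet, so they pass through unchanged until groups for them are added.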
If the lookahead feature is available, you can improve the pattern so that non-accented characters are not tested against each of the alternatives:
${1/(?=[ \t_À-ÆÈ-ÏÑ-ÖØ-Üà-æè-ïñ-öø-üŒœ])(?:([ \t_]+)|([À-Å])|([à-å])|([È-Ë])|([è-ë])|([Ì-Ï])|([ì-ï])|([Ò-ÖØ])|([ò-öø])|([Ù-Ü])|([ù-ü])|(Æ)|(æ)|(Œ)|(œ)|(Ñ)|(ñ))/(?1:-)(?2:a)(?3:a)(?4:e)(?5:e)(?6:i)(?7:i)(?8:o)(?9:o)(?10:u)(?11:u)(?12:ae)(?13:ae)(?14:oe)(?15:oe)(?16:n)(?17:n)/g}
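What the lookahead buys can be seen by running the guard on its own: it succeeds only at positions whose next character could start one of the alternatives, so ordinary letters are rejected after a single character-class test. A minimal Python sketch, using Python's re as a stand-in for Sublime's engine; GUARD is a hypothetical name:

```python
import re

# The lookahead from the answer, on its own: a zero-width test that the
# next character is a separator or one of the accented letters handled.
GUARD = re.compile(r"(?=[ \t_À-ÆÈ-ÏÑ-ÖØ-Üà-æè-ïñ-öø-üŒœ])")

text = "på ja Åke"
# Positions where the full alternation would actually be attempted:
# "å" (1), the two spaces (2 and 5) and "Å" (6) -> [1, 2, 5, 6]
candidates = [m.start() for m in GUARD.finditer(text)]
print(candidates)
```

At the remaining positions (p, j, a, k, e) the seventeen alternatives are never entered. The guard's class is exactly the union of the alternatives' classes, so it should not change the result.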
Note: Æ (AElig) must be transliterated to ae (the same goes for Œ => oe).
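The note about Æ and Œ is also why a generic "strip the accents" approach is not enough on its own. A common alternative outside Sublime (a swapped-in technique, not something a snippet can do) is Unicode NFD decomposition followed by dropping the combining marks. This minimal Python sketch uses only the standard unicodedata module; strip_marks is a hypothetical name:

```python
import unicodedata


def strip_marks(text: str) -> str:
    # NFD splits e.g. "é" into "e" + U+0301 (combining acute accent).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop characters with a non-zero canonical combining class
    # (the accents, cedillas and diaereses here).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_marks("comment ça va, staël, ÿý"))  # comment ca va, stael, yy
# Æ, Œ and Ø have no canonical decomposition, so they come back unchanged.
print(strip_marks("Æ Œ Ø"))  # Æ Œ Ø
```

So Æ, Œ and Ø still need explicit mappings (ae, oe, o), which is what the separate groups in the pattern above provide.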