Channel: Adobe Community : Popular Discussions - InDesign Scripting
Viewing all articles
Browse latest Browse all 15932

Filter text.contents (removing special characters)

Hi guys,

 

I want to extract a string from a piece of text (a selection, for example). This text is XML-tagged.

 

If I read selection[0].contents, I get the text plus all the special characters (XML tags, carriage returns). I can tell something is "wrong" because contents.length is greater than expected: "John Smith" should be 10 characters, but contents.length is greater than 14. I am not really surprised, because I already knew about this behaviour.

 

So I tried to filter it to remove anything that is not an alphanumeric character, but this is where I fail.

If I use GREP with contents.match(/[\w]+/g), it is almost perfect. But if the contents include diacritics, this pattern fails to catch them.

I could add them to the pattern, but I would very probably miss a lot of them.
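To make the problem concrete, here is a small sketch in plain JavaScript that works on a hard-coded string rather than a real selection. The sample string and the helper name extractWords are hypothetical. The added character range only covers the Latin-1 Supplement and Latin Extended-A blocks, which is an assumption about which accented letters matter, not a complete list.

```javascript
// Hypothetical sample: "Loïc Müller", with an accented letter in each word.
var sample = "Lo\u00EFc M\u00FCller";

// \w only matches [A-Za-z0-9_], so accented letters break the words apart.
var asciiOnly = sample.match(/[\w]+/g);
// asciiOnly is ["Lo", "c", "M", "ller"]

// Sketch: extend the class with the accented letters in U+00C0-U+017F
// (skipping U+00D7 and U+00F7, the multiplication and division signs).
function extractWords(text) {
    var found = text.match(/[\w\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F]+/g);
    return found ? found : [];
}

var words = extractWords(sample);
// words is ["Loïc", "Müller"]
```

This still misses anything outside those two blocks (Greek, Cyrillic, Vietnamese letters with stacked accents and so on), which is exactly the weakness of a whitelist.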

 

So my question is: how can I extract the pure text from the contents, making sure I keep all the diacritics (if any) but without carrying over the special characters?
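One possible direction, sketched here only as an illustration, is to turn the filter around: instead of listing every character to keep, remove only the characters known to be "special". The code below assumes the XML tag markers show up in contents as U+FEFF and anchored objects as U+FFFC. That is an assumption to check against the real contents (for example by looking at charCodeAt for each character). The sample string and the function name stripSpecialCharacters are hypothetical.

```javascript
// Hypothetical raw contents: U+FEFF stands in for the XML tag markers,
// "\r" for the carriage return at the end of the paragraph.
var raw = "\uFEFFJohn Sm\u00EFth\uFEFF\r";

// Remove only the characters assumed to be "special":
// control characters U+0000-U+001F (includes \r, \n, \t), U+007F,
// U+FEFF and U+FFFC. Everything else, diacritics included, is kept.
function stripSpecialCharacters(text) {
    return text.replace(/[\u0000-\u001F\u007F\uFEFF\uFFFC]/g, "");
}

var clean = stripSpecialCharacters(raw);
// raw.length is 13, clean.length is 10
```

Note that this also drops carriage returns, so text spanning several paragraphs would be joined together. Replacing "\r" with a space first would avoid that.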

 

TIA,
Loic

