Recently had an issue where a series of XML files were refusing to import into a program due to invalid characters; invalid from the importing software’s perspective. I needed to find characters that weren’t in the normal ASCII character set. Simple solution to this is a search for the following pattern:
[^\x00-\x7F]
That will give you matches on characters that fall outside of ASCII characters 0-127, which is the range of things causing me problems, including 2-byte chars.