.NET Framework - Detect non-standard characters in string

Asked By sippyucon on 24-Oct-07 11:09 AM
Hi

I have a project to take an MS Word doc and reformat the text into text files
that are built into my app.

The only issue I have is that sometimes there are characters in MS Word that
are not printable when viewed in Notepad. I usually catch them by looking at
the text in my app. Usually the problem is one of these:
an extra-long hyphen (the em dash, U+2014)
a dagger (†, U+2020)

When I debug the string, I usually see a square block in place of the character.

Is there some way to trap the characters that will not be printable/viewable
in, say, Notepad?

Thanks




PRSoC replied on 24-Oct-07 11:26 AM
You could probably use Char.IsSymbol() in this case.

--
Browse http://connect.microsoft.com/VisualStudio/feedback/ and vote.
http://www.peterRitchie.com/blog/
Microsoft MVP, Visual Developer - Visual C#

Nicholas Paldino [.NET/C# MVP] replied on 24-Oct-07 11:41 AM
I would just check each character's numeric value to see if the
character is outside the range of ASCII characters.  Most likely, what is
happening is that the text is being placed on the clipboard as Unicode, but
then when you paste it into Notepad (which is using ASCII), it does
its best by using the square character to indicate that it couldn't perform
a conversion.
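A minimal sketch of the range check described above, assuming "outside ASCII" means any UTF-16 code unit above 0x7F. The AsciiCheck class and its FindNonAscii and CodePoint methods are hypothetical names for illustration. It does not flag ASCII control characters below 0x20, and a character outside the Basic Multilingual Plane is reported as two positions (one per surrogate).

```csharp
using System;
using System.Collections.Generic;

static class AsciiCheck
{
    // Hypothetical helper: returns the index of every UTF-16 code unit whose
    // numeric value is above 0x7F, i.e. outside the 7-bit ASCII range.
    public static List<int> FindNonAscii(string text)
    {
        var positions = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] > '\u007F')
                positions.Add(i);
        }
        return positions;
    }

    // Hypothetical helper: formats a code unit as "U+XXXX" for logging, which
    // identifies what the debugger shows as a square block.
    public static string CodePoint(char c)
    {
        return string.Format("U+{0:X4}", (int)c);
    }
}
```

For example, FindNonAscii("a\u2014b") reports only position 1, and CodePoint('\u2014') gives "U+2014".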

--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com

PRSoC replied on 24-Oct-07 12:03 PM
I don't know how the OP has configured Notepad or Word, but Notepad supports
Unicode.

The "square character" could be the glyph that is displayed for a Unicode
character not supported by the current font.  Char.IsSymbol should still
catch it, at least in the case of dagger and em dash.  I don't know how well
most fonts support "printable" characters, but what is "printable/viewable"
does depend on the font.
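Before relying on Char.IsSymbol, it may be worth checking the Unicode general category of the characters in question. According to the Unicode Character Database, the em dash (U+2014) is DashPunctuation and the dagger (U+2020) is OtherPunctuation; neither is one of the symbol categories (MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol) that Char.IsSymbol tests for, so Char.IsSymbol returns false for both while Char.IsPunctuation returns true. The minimal sketch below prints the category and both tests for a character; the CategoryCheck class and Describe method are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;

static class CategoryCheck
{
    // Hypothetical helper: reports the Unicode general category of a character
    // together with the results of Char.IsSymbol and Char.IsPunctuation.
    public static string Describe(char c)
    {
        UnicodeCategory category = Char.GetUnicodeCategory(c);
        return string.Format("U+{0:X4} {1}: IsSymbol={2}, IsPunctuation={3}",
            (int)c, category, Char.IsSymbol(c), Char.IsPunctuation(c));
    }
}
```

For example, Describe('\u2014') returns "U+2014 DashPunctuation: IsSymbol=False, IsPunctuation=True".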

Anthony Jones replied on 24-Oct-07 12:32 PM

You need to use an Encoding object obtained via the static Encoding.GetEncoding
method.  This method allows you to specify the EncoderFallback class
to use (this defaults to the EncoderReplacementFallback, which simply
replaces unencodable characters with "?").

If you supply an EncoderExceptionFallback object instead, then any
characters that cannot be encoded will cause an EncoderFallbackException to
be thrown when you use the Encoding to convert your content.

The EncoderFallbackException has properties (such as CharUnknown and Index)
that you can use to discover which character caused the problem and where it is.
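A minimal sketch of this approach, assuming the target is plain ASCII ("us-ascii"); the EncodingCheck class and TryEncode method are hypothetical names. Because the exception is thrown at the first character that cannot be encoded, this reports only the first problem. If the text files are written in a Windows code page such as windows-1252 instead, substitute that encoding name; note that windows-1252 does contain the em dash and the dagger, so the result depends on which encoding you check against (and on .NET Core and later, code-page encodings also need CodePagesEncodingProvider registered).

```csharp
using System;
using System.Text;

static class EncodingCheck
{
    // ASCII encoding that throws EncoderFallbackException instead of
    // silently substituting '?' for characters it cannot encode.
    static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii",
        new EncoderExceptionFallback(),
        new DecoderExceptionFallback());

    // Hypothetical helper: returns true if the whole string can be encoded.
    // Otherwise returns false and reports the first offending character and
    // its position, taken from the exception's CharUnknown and Index properties.
    public static bool TryEncode(string text, out char badChar, out int index)
    {
        try
        {
            StrictAscii.GetBytes(text);
            badChar = '\0';
            index = -1;
            return true;
        }
        catch (EncoderFallbackException ex)
        {
            // For a surrogate pair, CharUnknown is '\0'; check
            // ex.IsUnknownSurrogate() and ex.CharUnknownHigh/CharUnknownLow.
            badChar = ex.CharUnknown;
            index = ex.Index;
            return false;
        }
    }
}
```

For example, TryEncode("a\u2014b", out c, out i) returns false with c set to the em dash.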

--
Anthony Jones - MVP ASP/ASP.NET
wawan replied on 24-Oct-07 09:10 PM
I agree with Anthony here.

Some more references:

System.Text.Encoding Should be Avoided
http://blogs.msdn.com/shawnste/archive/2006/01/19/515047.aspx

http://msdn2.microsoft.com/en-us/library/tt6z1500(VS.80).aspx

Hope this helps.


Regards,
Walter Wang (wawang@online.microsoft.com, remove 'online.')
Microsoft Online Community Support
