Multi-Threaded App

Asked By Robert Sheppard
14-Feb-08 03:42 PM
I am new to C# and am trying to build a multi-threaded web crawler. I want
to crawl many sites all at once. I know how to use IHTMLDocument2 to parse
the document object, but I want to launch multiple threads to parse each
individual web page.

With the WebBrowser control I can start parsing when I get the
Document_Complete event, but how can I do this with each web site on a
different thread? How are the Document_Complete events
handled in a multi-threaded environment?

This is an asynchronous operation, so I cannot see how it can be
done.

Asked By Nicholas Paldino [.NET/C# MVP]
14-Feb-08 04:18 PM
Robert,

This would be difficult in this situation.  You couldn't use the
WebBrowser control, because it needs to be tied to a UI thread.

You could use MSHTML through COM interop.  However, you would have to
make sure that every thread that you use MSHTML on is set up so that the
ApartmentState for that thread is STA.  I am not sure about this, but I also
believe you would have to pump messages in order for the events to work
correctly.
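
As a rough illustration of the STA requirement above, the following minimal sketch starts a worker thread whose apartment state is set to STA before it runs. The class and method names (StaThreadSketch, StartSta) are made up for this example, it assumes Windows, and it does not show the message pumping that may also be needed.

```csharp
using System;
using System.Threading;

static class StaThreadSketch
{
    // Hypothetical helper: creates a thread, marks it STA, then starts it.
    // The apartment state has to be set before Start() is called.
    public static Thread StartSta(ThreadStart work)
    {
        var thread = new Thread(work);
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        return thread;
    }
}
```

Any MSHTML work would then go inside the delegate passed to StartSta, so that the COM objects are created and used on that same STA thread.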

    It's a better idea in this case to use HttpWebRequest/HttpWebResponse
to download the page, and then load the downloaded content into a new MSHTML
instance on your thread.  This way, you don't have to wait for MSHTML to
download the document, and you can work with it right away.
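
A minimal sketch of that approach, under some assumptions: each URL gets its own thread, the page is downloaded with HttpWebRequest/HttpWebResponse, and the HTML string is handed to a parse callback. The names CrawlSketch, Download and CrawlAll are hypothetical, and loading the string into MSHTML is left to the callback because it needs a COM interop reference that is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;

static class CrawlSketch
{
    // Downloads one page synchronously. Blocking is acceptable here
    // because every URL runs on its own thread.
    public static string Download(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Starts one thread per URL, runs fetch then parse on it,
    // and waits for all threads to finish.
    public static void CrawlAll(
        IEnumerable<string> urls,
        Func<string, string> fetch,
        Action<string, string> parse)
    {
        var threads = new List<Thread>();
        foreach (string url in urls)
        {
            // Copy so each closure captures its own URL
            // (required on older C# compilers).
            string current = url;
            var thread = new Thread(() =>
            {
                try
                {
                    parse(current, fetch(current));
                }
                catch (Exception ex)
                {
                    // One failing site should not stop the others.
                    Console.Error.WriteLine(current + ": " + ex.Message);
                }
            });
            thread.Start();
            threads.Add(thread);
        }
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
    }
}
```

It could be called as CrawlSketch.CrawlAll(urls, CrawlSketch.Download, (url, html) => { /* parse html here */ }). One thread per URL is a simplification; for a large number of sites, a bounded number of worker threads would probably be more practical.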


--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com

Asked By Robert Sheppard
14-Feb-08 05:54 PM
Thanks... I will look at HttpWebRequest/HttpWebResponse. The old VB6 crawler
that I am porting from used the WebBrowser control, which works fine
but is very slow. Let me stress SLOW.
Thanks again for the help.

Asked By Nicholas Paldino [.NET/C# MVP]
14-Feb-08 10:19 PM
Robert,

Do you have a specific need to parse the entire document, or are you
looking for specific parts?  If you don't need to parse the entire document,
and what you are looking to scrape from the HTML is specific, then using
HttpWebRequest and HttpWebResponse will probably simplify things
considerably.
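
As an example of scraping only specific parts, here is a minimal sketch that pulls link targets out of downloaded HTML with a regular expression instead of building a full document object. It assumes the links are what you are after, which the thread does not say; LinkScraper and ExtractLinks are hypothetical names, and the pattern only handles quoted href attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkScraper
{
    // Matches href="..." or href='...' inside <a ...> tags.
    // Deliberately simple; this is not a full HTML parser.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']*)[\"']",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the href values in the order they appear in the HTML.
    public static List<string> ExtractLinks(string html)
    {
        var links = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            links.Add(match.Groups[1].Value);
        }
        return links;
    }
}
```

Because this works on a plain string, it can run on any thread without the STA and message pump concerns that come with MSHTML.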


--
- Nicholas Paldino [.NET/C# MVP]
- mvp@spam.guard.caspershouse.com