After a long (longer than one day 🙂 ) and very hard search on the internet, I have decided to write a short post on how to automate page downloads in a Visual Studio project that uses the Browser control. A couple of days ago I started a new personal project, an offline feed reader named ‘Crocodile’ (more on this in the near future), and after googling here and there I decided to write the app in C#. The choice of platform and language was based mainly on the fact that I couldn’t find anything like the ‘Browser control’ in Java, and mastering another language was also very tempting.
The Browser control in VS is just a stripped-down version of IE and gives easy access to many features of the normal IE browser straight from C# code. Everything looked very nice and things went smoothly for a couple of development hours, until I tried to implement automatic page saving. Using only the methods exposed by the Browser control, a page can be saved only by first displaying the “Save As...” dialog, which is definitely not what I wanted.
I have decided that my new shiny ‘Crocodile’ will download pages together with all their goodness/badness (CSS, JS etc.) so that they look exactly as they do online, and I definitely don’t want to click ‘OK’ for every single feed I download (about 100 a day).
I found a simple solution on one of the VB or C# forums (I don’t remember where exactly):
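    // Fetch the page at i.Link and pack it, with its resources, into an MHTML message.
    // The two empty strings are the user name and password (none needed here).
    CDO.MessageClass message = new CDO.MessageClass();
    message.CreateMHTMLBody(i.Link, CDO.CdoMHTMLFlags.cdoSuppressObjects, "", "");
    // Write the whole message to disk, overwriting any existing file.
    ADODB.Stream st = message.GetStream();
    st.SaveToFile("file.mht", ADODB.SaveOptionsEnum.adSaveCreateOverWrite);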
All you have to do is add a reference to Microsoft CDO and you are ready to go! There are a number of CdoMHTMLFlags values to experiment with, to suppress CSS stylesheets, images etc.
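To give an idea of how this could be used for a whole batch of feeds, here is a minimal sketch. It assumes the project references both the Microsoft CDO and the ADODB COM libraries; the PageSaver class and its FileNameFor and SaveAll methods are names I made up for illustration and are not part of CDO. The flag values are bit flags, so it should be possible to combine several of them with the | operator. Depending on the Visual Studio version, the CDO reference may need "Embed Interop Types" switched off for MessageClass to compile.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

// Sketch only: needs COM references to Microsoft CDO and ADODB.
static class PageSaver
{
    // Hypothetical helper: builds a disk-safe file name from a URL by
    // replacing every character that is not a letter or digit with '_'.
    public static string FileNameFor(string url)
    {
        char[] chars = url.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (!char.IsLetterOrDigit(chars[i]))
            {
                chars[i] = '_';
            }
        }
        return new string(chars) + ".mht";
    }

    // Saves every URL as an .mht file in the given folder, without any dialog.
    public static void SaveAll(IEnumerable<string> urls, string folder)
    {
        // Combine flags with |: here embedded objects and background sounds are skipped.
        CDO.CdoMHTMLFlags flags = CDO.CdoMHTMLFlags.cdoSuppressObjects
                                | CDO.CdoMHTMLFlags.cdoSuppressBGSounds;

        foreach (string url in urls)
        {
            try
            {
                CDO.MessageClass message = new CDO.MessageClass();
                message.CreateMHTMLBody(url, flags, "", "");

                ADODB.Stream st = message.GetStream();
                st.SaveToFile(Path.Combine(folder, FileNameFor(url)),
                              ADODB.SaveOptionsEnum.adSaveCreateOverWrite);
                st.Close();
            }
            catch (COMException ex)
            {
                // One unreachable page should not stop the remaining downloads.
                Console.WriteLine("Failed to save " + url + ": " + ex.Message);
            }
        }
    }
}
```

Giving each page its own file name, instead of the fixed "file.mht", keeps one download from overwriting another, and catching the COM exception per page means a single broken link does not abort a run of about 100 feeds.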
The drawback of this approach is the MHT format, which can be opened only by IE.