Tuesday, December 15, 2015

Web Image Downloader - HtmlParser

Before I started today, I decided to go back for a moment and look at my tests so far. Everything seems fine except for the WebRetriever class I extracted yesterday: its tests were still part of HeaderRetrieverTest and one case was missing. Let me quickly show you the changes.
The first thing was to merge WebRequestCreatorForContentStub with WebRequestCreatorForHeaderStub. You can see that the merged creator returns three stub requests. The last two are there to test my actual implementations of HeaderRetriever and ContentRetriever, which I have already covered. The one with an error simulates the situation when a response comes back with an error code, let's say NotFound - that was the case I was missing. The other test just makes sure that if the StatusCode is alright, the abstract method is able to return.
Completing those two tests required a few more stubs.
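The stubs themselves were shown as code in the original post; below is only a rough sketch of how the pieces might fit together. The names (IWebRequestCreator, WebRequestCreatorStub, WebRequestStub, WebResponseStub) and the url-based switch are my assumptions - the post only says that one creator stub now hands out three stub requests: one failing with an error status and two backing the HeaderRetriever and ContentRetriever tests.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Assumed abstraction that the retrievers use to obtain a WebRequest.
public interface IWebRequestCreator
{
    WebRequest Create(string url);
}

// Merged stub replacing WebRequestCreatorForContentStub and WebRequestCreatorForHeaderStub.
public class WebRequestCreatorStub : IWebRequestCreator
{
    public WebRequest Create(string url)
    {
        switch (url)
        {
            case "http://error":   // simulates a response coming back with an error code
                return new WebRequestStub(HttpStatusCode.NotFound, string.Empty);
            case "http://header":  // used by the HeaderRetriever tests
                return new WebRequestStub(HttpStatusCode.OK, string.Empty);
            case "http://content": // used by the ContentRetriever tests
                return new WebRequestStub(HttpStatusCode.OK, "<html></html>");
            default:
                throw new ArgumentException("Unknown stub url: " + url);
        }
    }
}

// Request stub that returns a canned response instead of touching the network.
public class WebRequestStub : WebRequest
{
    private readonly WebResponse _response;

    public WebRequestStub(HttpStatusCode statusCode, string body)
    {
        _response = new WebResponseStub(statusCode, body);
    }

    public override WebResponse GetResponse()
    {
        return _response;
    }
}

// Response stub exposing a status code and a fixed body.
public class WebResponseStub : WebResponse
{
    private readonly string _body;

    public WebResponseStub(HttpStatusCode statusCode, string body)
    {
        StatusCode = statusCode;
        _body = body;
    }

    public HttpStatusCode StatusCode { get; private set; }

    public override Stream GetResponseStream()
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(_body));
    }
}
```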
Alright, enough enhancements for now. Let's get back to the main quest. Today I'll take care of the following step, or rather its second part, as the first one is already complete: content is being read from the website and strings matching the img pattern are extracted.
To accomplish this I decided to go the easy way and use the HtmlAgilityPack library instead of writing anything from scratch on my own.
There are just four cases I want to check for now, so here they are:
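The cases were shown as code in the original post; the sketch below is my guess at what they could have looked like, assuming an HtmlParser that takes the page source and exposes GetTitle() and GetImageLinks(), NUnit-style tests, and the two missing-node exceptions introduced further down - all of those names are assumptions, not necessarily the ones used in the project.

```csharp
using NUnit.Framework;

[TestFixture]
public class HtmlParserTest
{
    private const string _link1 = "http://example.com/images/cat.png";
    private const string _link2 = "/images/dog.png";   // relative on purpose

    private const string _html =
        "<html><head><title>Sample page</title></head>" +
        "<body><img src=\"" + _link1 + "\" /><img src=\"" + _link2 + "\" /></body></html>";

    [Test]
    public void GetTitle_ReturnsPageTitle()
    {
        var parser = new HtmlParser(_html);
        Assert.AreEqual("Sample page", parser.GetTitle());
    }

    [Test]
    public void GetImageLinks_ReturnsAllImgSources()
    {
        var parser = new HtmlParser(_html);
        CollectionAssert.AreEquivalent(new[] { _link1, _link2 }, parser.GetImageLinks());
    }

    [Test]
    public void GetTitle_ThrowsWhenTitleIsMissing()
    {
        var parser = new HtmlParser("<html><body></body></html>");
        Assert.Throws<TitleNotFoundException>(() => parser.GetTitle());
    }

    [Test]
    public void GetImageLinks_ThrowsWhenNoImagesFound()
    {
        var parser = new HtmlParser("<html><body></body></html>");
        Assert.Throws<ImagesNotFoundException>(() => parser.GetImageLinks());
    }
}
```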
Making those pass will assure me that I'm really close to starting to build models. There will be one more step before that, though. You can notice that _link2 is relative. Some webpages provide full URLs while others show just relative paths, and to download anything I need an absolute path to the resource. It's nothing to worry about this time; I just need to make sure it comes back from my parser.
Thanks to HtmlAgilityPack, implementing this class is a piece of cake. Both SelectSingleNode and SelectNodes take an XPath expression to look for nodes; you can read about it here. It was helpful to look into the comments on those methods: I discovered that if no nodes are found, they simply return null, which made the check in GetImageLinks() easier, as I didn't have to verify whether the collection is empty. Throwing and later handling exceptions in those cases will allow me to apply a default title for a website, or to discard it from further processing while notifying the user that there is nothing to download there... And that made me realise it would come in handy to derive from HtmlWebException and create two new exceptions that tell me directly what's missing, so I can avoid any further checks when this occurs.
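Here is a minimal sketch of how the parser and the two exceptions might look, under the same assumptions as the tests above. The class name, the exception names and the constructor taking the page source are my guesses; HtmlDocument, SelectSingleNode/SelectNodes, GetAttributeValue and HtmlWebException come from the HtmlAgilityPack library itself.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Two exceptions derived from HtmlWebException that say directly what is missing.
public class TitleNotFoundException : HtmlWebException
{
    public TitleNotFoundException() : base("No <title> node found in the document.") { }
}

public class ImagesNotFoundException : HtmlWebException
{
    public ImagesNotFoundException() : base("No <img> nodes found in the document.") { }
}

public class HtmlParser
{
    private readonly HtmlDocument _document = new HtmlDocument();

    public HtmlParser(string html)
    {
        _document.LoadHtml(html);
    }

    public string GetTitle()
    {
        // SelectSingleNode returns null when the XPath matches nothing.
        var titleNode = _document.DocumentNode.SelectSingleNode("//title");
        if (titleNode == null)
            throw new TitleNotFoundException();

        return titleNode.InnerText;
    }

    public IEnumerable<string> GetImageLinks()
    {
        // SelectNodes also returns null (not an empty collection) when nothing matches,
        // so a single null check is enough.
        var imageNodes = _document.DocumentNode.SelectNodes("//img[@src]");
        if (imageNodes == null)
            throw new ImagesNotFoundException();

        return imageNodes.Select(node => node.GetAttributeValue("src", string.Empty));
    }
}
```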
With those two wonderful exceptions and four more green lights on my path, I can say that today's goal is accomplished.
