JackHammer Web Mining Component
JackHammer - the lean and mean web mining machine!
Writing screen scrapers and spiders that consume large amounts of bandwidth, guess passwords, or grab information from a site and use it somewhere else may well be a violation of someone’s rights and could eventually land you in trouble. Before writing a screen scraper, first see if the website offers an RSS feed or an API for the data you seek. If not, and you have to use a scraper, first check the website's policies regarding the use of automated tools. I am not advocating screen scraping, and you can't say that I didn't warn you. So, with all that being said...
Go scrape some web pages!
JackHammer is a non-visual, slimmed-down, partial rewrite of the MozNET library. Unlike MozNET, JackHammer doesn't support custom XPCOM components, nor does it offer the same ease of access to most DOM elements. Instead, JackHammer is focused on web automation and data mining. In place of the unsupported features, both HtmlAgilityPack and FizzlerEx have been fully integrated into the component. In addition, there are extension methods that let you convert between DOMElement instances and HtmlNode instances and so forth. Also integrated is the XulRunner 184.108.40.206 runtime, à la the MozNET AIO library. JackHammer is a single library component and contains everything it needs to function. After all the dieting we have one hell of a lean, mean, web ripping machine!
JackHammer was born out of the need for a component that is easy to automate but also offers the ability to collect web data in the easiest manner possible. The "Utah-nian marriage" of MozNET, HtmlAgilityPack and FizzlerEx just seemed like the most natural progression. I almost wonder why neither I nor anyone else, for that matter, thought of it before. Since MozNET is fully usable in a non-visual environment, a non-visual version should be available to address the need for such a component when mining data from other web sites - hence the birth of JackHammer (although non-visual, like MozNET it still requires a message pump).
Now, I know what you *purists* may be thinking: I've gone and *stolen* the code for both HtmlAgilityPack and FizzlerEx. But, alas, that is not the case.
I haven't changed a single line of code in either library (Edit: I fixed and annotated a bug in the HtmlAgilityPack.HtmlNode -> Ancestors() method). I've simply meshed them into JackHammer and made interaction between them and JackHammer as easy as apple pie. The only modification, if you want to call it that, made to the meshed libraries is that I've condensed all the classes in the Fizzler.Systems.HtmlAgilityPack namespace into a single file. Even the namespace names have remained the same. The source code for the meshed libraries will be made freely available; the Fizzler.Systems.HtmlAgilityPack combined class file will be made freely available as well. The source code for the original, meshed libraries remains under their respective licenses - MS-PL and LGPL.
Features At A Glance
Check the new help manual for more information, specifically the JScript class description.
* These are extension methods of the DOMNode class and provide support to go from DOMNode to HtmlNode.
* These are implemented as direct members of the DOMNode class.
JackHammer also retains the new Click() method on DOMElement-based instances as well as the DrawToBitmap method and its overloads. Essentially, anything that isn't needed for automation or data collection has been removed. At the present time, JackHammer weighs in at a scant 10,800 KB. Considering that it's packing the weight of the XulRunner runtime, I'd have to say it's quite slender. When it comes down to it, you, like me, will be left wondering "Why the !*?#& hasn't someone done this before!?".
Select Your Preferred Build:
JackHammer updates are free forever!
This download includes a basic, full-source example project. The example project is written in WPF/C#, but JackHammer may be used with any .NET language. The included library is only usable with the bundled example project, although it can still be used to freely access both the HtmlAgilityPack and FizzlerEx meshed libraries when referenced in a 3rd party (your) application. The example project, like the one bundled with MozNET, may be modified and experimented with to your heart's content. It simply may not be used as a commercial application nor redistributed as your own.
Unlike MozNET, there isn't a two-week trial build for JackHammer. If you wish to evaluate the library in your own project, it is recommended that you try MozNET and base your needs upon that. Although JackHammer doesn't have MozNET's extravagant feature set, you will get a good idea of how JackHammer will work in your own program - keep in mind that JackHammer is focused on data scraping and isn't intended for use as a "Web Browser".
This download includes HtmlAgilityPack, FizzlerEx and the condensed Fizzler.Systems.HtmlAgilityPack namespace file.
The Se7enSoft.JackHammer.dll library is compiled targeting the Microsoft .NET Framework 3.5 Client Edition. Once your purchase of the compiled binary is complete, you will receive a key to unlock the component for use in your own applications. The key used in the example application will not work outside of that project.