boilerpipeR

Interface to the boilerpipe Java library by Christian Kohlschütter (http://code.google.com/p/boilerpipe/)
Mario Annau [aut, cre]
Apache License (== 2.0)
Generic extraction of the main text content from HTML files: ads, sidebars, and headers are removed using the boilerpipe Java library. The boilerpipe extraction heuristics perform robustly across a wide range of website templates.
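A minimal usage sketch, assuming boilerpipeR, its rJava dependency, and a Java runtime are installed. The HTML string is made up for illustration, and the `ArticleExtractor()` call follows the package documentation as commonly cited; the function set in version 1.0 may differ, so check the installed package's help pages.

```r
# Assumption: boilerpipeR exposes ArticleExtractor(content), which takes
# the HTML source as a character string and returns the extracted text.
library(boilerpipeR)

# Hypothetical page: navigation and footer are the "boilerplate",
# the article div holds the main content.
html <- paste0(
  "<html><head><title>Example page</title></head><body>",
  "<div id='nav'><a href='/'>Home</a> | <a href='/about'>About</a></div>",
  "<div id='article'><h1>Extracting main text</h1>",
  "<p>This paragraph stands in for the main article text. It is long ",
  "enough to look like real content rather than a menu entry, which is ",
  "the kind of signal that extraction heuristics tend to rely on.</p></div>",
  "<div id='footer'>Copyright notice and other footer links</div>",
  "</body></html>"
)

# Run the extractor on the raw HTML and print whatever text it keeps.
main_text <- ArticleExtractor(html)
cat(main_text, "\n")
```

On a page this short the heuristics may keep or drop different parts than they would on a real article, so the sketch shows only the call pattern, not a representative result.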
Versions
Package Version Released
boilerpipeR 1.0 2 years 10 hours ago