Sisyphus repository
Last update: 23 May 2019 | SRPMs: 17691 | Visits: 13583288

Group :: Development/Java
RPM: boilerpipe

Current version: 1.2.0-alt1_12jpp8
Build date: 4 February 2019, 23:53 (15.4 weeks ago)
Size: 66.96 Kb

Home page:

License: ASL 2.0
Summary: Boilerplate Removal and Fulltext Extraction from HTML pages

The boilerpipe library provides algorithms to detect and
remove the surplus "clutter" (boilerplate, templates)
around the main textual content of a web page.

The library already provides specific strategies
for common tasks (for example: news article extraction) and
may also be easily extended for individual problem settings.

Extraction is very fast (milliseconds), needs only the
input document (no global or site-level information is required), and
is usually quite accurate.

Current maintainer: Igor Vlasenko

List of rpms provided by this srpm:

  • boilerpipe
  • boilerpipe-javadoc