HTML::TextCruft CPAN module released

We have extracted a piece of the Media Cloud code base and released it as HTML::TextCruft, a standalone CPAN module. HTML::TextCruft is the first stage of our code for extracting article text from HTML, and it strips out ads, navigation, and other cruft.

Media Cloud has always been free and open source, but because it is a large code base, not everyone is able to install it. By releasing this piece as a separate module, we hope to make its functionality more accessible to the wider community.

More information on HTML::TextCruft is available on its CPAN page.
