D2.6 Crawler ready tagging tools
The LoCloud Crawler Ready Tagging Tools (henceforth CRTT) are a set of experimental tools for automatically extracting structured metadata from the HTML mark-up of web documents. The objective is to verify whether the crawling and indexing method applied by mainstream search engines could be a viable, simplified supplement to the comprehensive Europeana ingestion process. To this end, the CRTT have been validated using small institutions as a test case.
This deliverable describes the rationale, technology, validation testing and next steps for the LoCloud CRTT.
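As a rough illustration of what extracting structured metadata from HTML mark-up can look like, the sketch below reads the page title and the `<meta>` elements of a document using only the Python standard library. It is not the CRTT implementation: the names `MetaTagExtractor` and `extract_metadata`, the sample page and the choice of fields are assumptions made for this example only.

```python
from html.parser import HTMLParser


class MetaTagExtractor(HTMLParser):
    """Hypothetical helper: collects <title> text and <meta> name/property values."""

    def __init__(self):
        super().__init__()
        self.metadata = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Accept both name="..." (e.g. Dublin Core) and property="..." (e.g. Open Graph).
            key = attrs.get("name") or attrs.get("property")
            content = attrs.get("content")
            if key and content:
                self.metadata[key.lower()] = content.strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is recorded; body text is ignored.
        if self._in_title and data.strip():
            self.metadata["title"] = self.metadata.get("title", "") + data.strip()


def extract_metadata(html):
    """Return a dict of metadata fields found in the given HTML string."""
    parser = MetaTagExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata


# Assumed sample page, standing in for a small institution's web document.
sample = """<html><head>
<title>Village Museum: Bronze Age Axe</title>
<meta name="DC.creator" content="Village Museum">
<meta property="og:type" content="article">
</head><body><p>Object description.</p></body></html>"""

if __name__ == "__main__":
    for field, value in extract_metadata(sample).items():
        print(f"{field}: {value}")
```

A crawler-based approach would apply a step of this kind to each fetched page and then map the collected fields to the target metadata schema; that mapping is outside the scope of this sketch.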