Web archiving support and consultancy
We help our clients create their own web and social media archiving centre by installing, customising and maintaining the Webternity platform.
Cloud-based web and social media archiving
Our clients are able to create and manage their own web archiving repositories through our online platform.
On-demand web content analytics
We help our clients derive valuable insights from their repositories, tailored to their areas of interest.
The ephemeral nature of web and social media sites, especially blogs (comments, online discussions, etc.), leaves them at substantial risk of being lost. Memory institutions (libraries, museums, archives) and other organisations are looking for ways to ensure the long-term preservation and reuse of web content.
At Webternity we have the answer: we have developed an exciting system to harvest, preserve, manage and reuse web content. The system performs an intelligent harvesting operation that retrieves and parses hypertext together with all associated content (images, linked files, etc.) from websites. The parsing step renders the captured content as structured data, expressed in XML, in accordance with our data model.
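As a rough illustration of the harvesting step described above, the following minimal Python sketch parses a page and lists the associated content (links and images) that a harvester would also need to retrieve. It is not the Webternity implementation: the class name `AssociatedContentParser`, the function `extract_associated_content`, the sample page and the example URLs are all hypothetical, and only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssociatedContentParser(HTMLParser):
    """Hypothetical helper: collects hyperlinks and embedded images from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Resolve relative references against the URL of the page itself.
        if tag == "a" and attributes.get("href"):
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "img" and attributes.get("src"):
            self.images.append(urljoin(self.base_url, attributes["src"]))


def extract_associated_content(html, base_url):
    """Return the absolute URLs of links and images referenced by the page."""
    parser = AssociatedContentParser(base_url)
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "images": parser.images}


# Illustrative input only; a real harvester would download the page first.
SAMPLE_PAGE = """
<html><body>
  <h1>Hello</h1>
  <img src="/media/photo.png">
  <a href="files/report.pdf">Report</a>
  <a href="https://example.org/other">Other</a>
</body></html>
"""

result = extract_associated_content(SAMPLE_PAGE, "https://blog.example.com/posts/1")
for kind, urls in result.items():
    print(kind, urls)
```

A harvester built along these lines would then queue the collected URLs for download so that the archived copy includes the images and linked files as well as the hypertext.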
This step carves semantic entities out of web content at an unprecedented micro-level: author names, comments, subjects, tags, categories, dates, links and many other elements are expressed within a hierarchical structure. The content is then imported into the Webternity repository (based on CERN’s Invenio platform), a public-facing web archiving mechanism that provides facilities to preserve, view, interrogate and reuse the content in unprecedented detail.
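To make the idea of a hierarchical structure more concrete, here is a minimal sketch of how the entities extracted from a blog post could be expressed as XML. The element names (`post`, `tags`, `comments` and so on), the function `post_to_xml` and the sample data are assumptions chosen for illustration; they do not reproduce the actual Webternity data model.

```python
import xml.etree.ElementTree as ET


def post_to_xml(post):
    """Build a hierarchical XML element from a dictionary of extracted entities."""
    root = ET.Element("post")
    ET.SubElement(root, "title").text = post["title"]
    ET.SubElement(root, "author").text = post["author"]
    ET.SubElement(root, "date").text = post["date"]

    # Tags and comments are nested under their own container elements.
    tags = ET.SubElement(root, "tags")
    for tag in post["tags"]:
        ET.SubElement(tags, "tag").text = tag

    comments = ET.SubElement(root, "comments")
    for comment in post["comments"]:
        node = ET.SubElement(comments, "comment", author=comment["author"])
        node.text = comment["text"]
    return root


# Illustrative data only.
sample_post = {
    "title": "Preserving blogs",
    "author": "Alice",
    "date": "2013-05-01",
    "tags": ["archiving", "blogs"],
    "comments": [{"author": "Bob", "text": "Great post!"}],
}

xml_text = ET.tostring(post_to_xml(sample_post), encoding="unicode")
print(xml_text)
```

Because each entity has its own element, a repository can index and query authors, tags or comments individually rather than treating the page as one block of text.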
Anthony Minas Krasakis
State-of-the-art web and social media archiving
The Webternity platform is one of the major results of the BlogForever project. It is a simple, state-of-the-art web archiving platform for preserving web content and ensuring its authenticity, integrity, completeness, usability and long-term accessibility as a valuable cultural, social and intellectual resource. The platform is composed of two major components:
- the Webternity spider component, and
- the Webternity repository.
The Webternity spider crawls all the web data and characteristics designated for preservation. It places special emphasis on real-time crawling and extraction of web data, so that it delivers both the original content for preservation and a structured feed of enriched XML containing metadata and semantics.
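The source does not say how the spider achieves real-time crawling. One common approach for blogs, shown here purely as an assumption, is to poll a site's RSS feed and pick up only the entries that have not been seen before. The function `new_feed_items` and the sample feed below are hypothetical.

```python
import xml.etree.ElementTree as ET


def new_feed_items(feed_xml, seen_guids):
    """Return feed entries not seen before and record them in seen_guids."""
    root = ET.fromstring(feed_xml.strip())
    fresh = []
    for item in root.iter("item"):
        # Fall back to the link when an entry has no <guid> element.
        guid = item.findtext("guid") or item.findtext("link")
        if guid and guid not in seen_guids:
            seen_guids.add(guid)
            fresh.append({
                "guid": guid,
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            })
    return fresh


# Illustrative RSS 2.0 feed; a real spider would download it periodically.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>First post</title>
      <link>https://blog.example.com/posts/1</link>
      <guid>post-1</guid>
    </item>
    <item>
      <title>Second post</title>
      <link>https://blog.example.com/posts/2</link>
    </item>
  </channel>
</rss>
"""

seen = set()
first_poll = new_feed_items(SAMPLE_FEED, seen)
second_poll = new_feed_items(SAMPLE_FEED, seen)
print(len(first_poll), len(second_poll))
```

Each new entry returned by a poll would then be passed to the harvesting and parsing steps, so that fresh posts are captured soon after publication.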
The Webternity repository is based on CERN’s Invenio platform. It is a public-facing web archiving mechanism that provides facilities to preserve, organise, view, query and reuse web content in unprecedented detail.