WebART activities in 2014: search, research & the unarchived Web

posted Oct 14, 2014
2014 has been an eventful year for WebART: several papers were accepted at major conferences, and we presented at a number of events. This post summarizes some of our work up to September 2014.

In May 2014, we presented work at the IIPC General Assembly: "Scholarly use and issues of web archives". Anat Ben-David presented joint work on Search as Research, and the methodological implications of using searchable Web archives for research. A paper on this topic is published in a special issue of the Alexandria journal on web archiving [1].

We have also explored the contents of the Dutch Web archive, moving from studying only the archived pages to studying the unarchived Web. Using the archive's contents and link structure, we categorized the archived contents and also estimated the size of the unarchived parts of the Dutch Web. In addition, we created representations of unarchived contents using aggregated anchor text (the text describing links found in the archive). This work has been documented in a short paper at the SIGIR 2014 conference [2] and a full paper at the IEEE/ACM Joint Conference on Digital Libraries 2014 [3]. At the latter conference, our paper was shortlisted for the best paper award.
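To make the anchor-text idea concrete, here is a minimal sketch in Python. It is not the pipeline used in the papers: the function name `aggregate_anchor_text`, the `(source, target, anchor text)` link format, and the example URLs are all assumptions made for illustration. It only shows the basic principle of collecting the anchor text of links that point outside the archive and joining it into one textual representation per unarchived URL.

```python
from collections import defaultdict


def aggregate_anchor_text(links, archived_urls):
    """Build a text representation for each unarchived link target.

    links: iterable of (source_url, target_url, anchor_text) tuples
           extracted from archived pages (hypothetical format).
    archived_urls: set of URLs that are present in the archive.
    """
    anchors_by_target = defaultdict(list)
    for source_url, target_url, anchor_text in links:
        # Targets that are already archived need no surrogate representation.
        if target_url in archived_urls:
            continue
        text = anchor_text.strip()
        if text:  # skip links with empty anchor text (e.g. image links)
            anchors_by_target[target_url].append(text)
    # Join all anchor texts pointing at the same unarchived URL.
    return {url: " ".join(texts) for url, texts in anchors_by_target.items()}


# Toy data, invented for this example.
archived = {"http://a.example/", "http://b.example/"}
links = [
    ("http://a.example/", "http://c.example/", "city museum"),
    ("http://b.example/", "http://c.example/", "museum opening hours"),
    ("http://a.example/", "http://b.example/", "partner site"),
    ("http://b.example/", "http://d.example/", "  "),
]

reps = aggregate_anchor_text(links, archived)
print(reps)
```

In this toy example only `http://c.example/` receives a representation, because `http://b.example/` is archived and the link to `http://d.example/` carries no anchor text. A real setting would also need URL normalization and deduplication of links, which this sketch leaves out.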

On the system and data side, joint work on efficient retrieval and suggestions has been accepted to the ECIR conference [4,5]. Furthermore, research on supporting cognitive search stages in search systems and interfaces was presented at the Information Interaction in Context (IIiX) 2014 conference [6], where it received a best presentation award.

The next event on the agenda is the "Webarchivering in Nederland" (Web archiving in the Netherlands) symposium, which features international and Dutch speakers on Web archiving and the use of Web archives. The WebART presentation at this event is entitled "Enabling Scholarly Research in the KB Web Archive".

