
WebART activities in 2014: search, research & the unarchived Web

posted Oct 14, 2014, 5:24 AM by Hugo Huurdeman   [ updated Oct 31, 2014, 6:14 AM ]
2014 has been an eventful year for WebART: several of our papers were accepted at major conferences, and we presented at numerous events. This post summarizes our work through September 2014.

In May 2014, we presented work at the IIPC General Assembly as part of "Scholarly use and issues of web archives". Anat Ben-David presented joint work on "search as research": the methodological implications of using searchable Web archives for research. A paper on this topic will appear in a special issue of the Alexandria journal on web archiving [1].

Second, we explored the contents of the Dutch Web archive and moved from studying only the archived pages to studying the unarchived Web. Using the archive's contents and link structure, we categorized the archived content and also estimated the size of the unarchived parts of the Dutch Web. In addition, we created representations of unarchived pages by aggregating anchor text (the text of links on archived pages that point to them). This work is documented in a short paper at the SIGIR 2014 conference [2] and a full paper at the IEEE/ACM Joint Conference on Digital Libraries 2014 [3]. At the latter conference, our paper was shortlisted for the best paper award.
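The anchor-text idea can be illustrated with a small sketch. This is not the WebART implementation: it assumes links have already been extracted from archived pages as (source URL, target URL, anchor text) triples, and the function name `aggregate_anchor_text` and the input format are hypothetical. Each link found on an archived page that points to a URL missing from the archive contributes its anchor text to a pseudo-document for that missing URL.

```python
from collections import defaultdict


def aggregate_anchor_text(links, archived):
    """Build a text representation for each unarchived target URL.

    links:    iterable of (source_url, target_url, anchor_text) triples
              (hypothetical format, assumed to come from archived pages)
    archived: set of URLs that are present in the archive
    """
    texts = defaultdict(list)
    for source, target, anchor in links:
        # Keep only links on archived pages that point outside the archive
        if source in archived and target not in archived:
            anchor = anchor.strip()
            if anchor:
                texts[target].append(anchor)
    # Concatenate all anchor texts into one pseudo-document per unarchived URL
    return {url: " ".join(parts) for url, parts in texts.items()}


# Toy example with made-up URLs
links = [
    ("http://a.nl/", "http://b.nl/", "Dutch news"),
    ("http://a.nl/x", "http://b.nl/", "news site"),
    ("http://a.nl/", "http://a.nl/x", "more"),  # target is archived, skipped
]
archived = {"http://a.nl/", "http://a.nl/x"}
reps = aggregate_anchor_text(links, archived)
print(reps)  # {'http://b.nl/': 'Dutch news news site'}
```

The keys of the result also give a simple lower bound on the number of unarchived URLs discovered through the archive's link structure; a real analysis would additionally need to normalize URLs and handle duplicate links.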

On the system and data side, joint work on efficient retrieval and contextual suggestion was accepted at the ECIR 2014 conference [4,5]. Furthermore, research on supporting cognitive search stages in search systems and interfaces was presented at the Information Interaction in Context (IIiX) 2014 conference [6], where it received a best presentation award.

The next event on the agenda is the "Webarchivering in Nederland" (Web archiving in the Netherlands) symposium, where international and Dutch speakers will present on Web archiving and the use of Web archives. WebART's presentation at this event is entitled "Enabling Scholarly Research in the KB Web Archive".

[1] Anat Ben-David and Hugo C. Huurdeman. Web Archive Search as Research: Methodological and Theoretical Implications. Alexandria, Volume 25, No. 1 (2014). Manchester University Press. In press.
[2] Thaer Samar, Hugo C. Huurdeman, Anat Ben-David, Jaap Kamps, and Arjen P. de Vries. Uncovering the Unarchived Web. In Proceedings of the 37th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM Press, New York NY, 2014.
[3] Hugo C. Huurdeman, Anat Ben-David, Jaap Kamps, Thaer Samar, and Arjen P. de Vries. Finding Pages on the Unarchived Web. In DL'14: Proceedings of the Digital Library Conference. ACM Press, New York NY, 2014. Nominated for best paper award.
[4] Hannes Mühleisen, Thaer Samar, Jimmy Lin, and Arjen P. de Vries. Column Stores as an IR Prototyping Tool. In Advances in Information Retrieval: 36th European Conference on IR Research (ECIR 2014), Amsterdam, The Netherlands, April 13-16, 2014.
[5] Alejandro Bellogín Kouki, Thaer Samar, Arjen P. de Vries, and Alan Said. Challenges on Combining Open Web and Dataset Evaluation Results: The Case of the Contextual Suggestion Track. In Proceedings of the European Conference on Information Retrieval (ECIR 2014), Lecture Notes in Computer Science, 2014.
[6] Hugo C. Huurdeman and Jaap Kamps. From Multistage Information-Seeking Models to Multistage Search Systems. In IIiX'14: Proceedings of the Fifth Information Interaction in Context Conference. ACM Press, New York NY, 2014.