This week is my last week at the British Library. After more than eight years, the time has come to say goodbye.
First of all, I think about the many kind and talented people I have worked with, especially those in my team. I have learnt so much from them and will miss them a lot. I know that in the next few weeks, out of habit, I will dial their numbers, trying to pick their brains as I used to, only to realise that I can no longer do so.
I also think about the decisions we’ve made, things we’ve achieved and work that I have not managed to do.
In the early days of web archiving, we focused on learning the trade and building capability, knowing that one day we would be expected to archive the entire UK web domain. We also started to realise that there was more than one way of using web archives. We explored this with the help of a group of pioneering researchers, many of them historians, who saw value in archived websites and regarded them as useful evidence for understanding life in the late 20th and early 21st centuries.
We had to work very hard to scale up and archive the UK web after Non-print Legal Deposit became effective in April 2013. Having completed two UK domain crawls and made available approximately six billion resources (sadly only within the Legal Deposit Libraries’ reading rooms), the challenge of scalability remains and will only get bigger. I sometimes worry about where this is all going: as our collection rapidly grows, will it reach a point where the data become totally unmanageable or unusable?
My concern is directly related to the hardest challenge for memory organisations: the chronic lack of resources, or the constant struggle of deciding where to spend resources that are never enough. This dilemma applies even to relatively well-funded institutions such as the British Library, where there is no question about the strategic importance of web archiving. Collecting content, developing software, requesting permission for open access, seeking external funding and working on research projects… When everything seems a priority, one can only firefight and respond to the most urgent operational needs. For many of us, this means collecting content before it disappears off the web.
Perhaps something drastic has to happen for us to truly understand what it means to work at web scale. We may need to fundamentally change the way we do web archiving, and this will involve hard decisions: accepting that we cannot do things perfectly and that we can no longer do things manually.
Both undertaking web archiving and using web archives ought to be easier. While modern browsers are capable of obtaining and displaying most web content, our purpose-built crawler and replay software leave many gaps; web archives lack accessibility and flexibility, and frustrate researchers. We urgently need better tools (not necessarily developed by ourselves). While many of us still have to address limited use and justify the cause of web archiving, an obvious and powerful thing to do would be to pull together and highlight the content in all web archives which has disappeared from the live web since the 1990s. Consider the former my top priority and the latter my number one wish.
I will be joining the Internet Archive later in the month, as Director of Global Web Services, to expand collaboration with libraries and archives around the world and help advance the practices of web archiving.
The work will continue.