Access and Scholarly Use of Web Archives

[This is the manuscript of the article published in Alexandria, Volume 25, No. 1/2(2014), pp. 113-127
http://manchester.metapress.com/content/5g4270j8l6472pl8/?p=bb47b9c8113747d38a9e9daf250bf446&pi=7. It contains small variations from the published version. That particular issue of Alexandria was devoted to web archiving and co-edited by Richard Gibby and myself.]

ABSTRACT

This article provides an overview of current access arrangements for web archives. Using data from a user survey of the UK Web Archive, it attempts to analyse the reasons for limited scholarly use of web archives. It also examines the key characteristics of scholarly use of digital sources and translates these into a set of requirements for web archives. It discusses how web archives currently fail to meet evolving scholarly requirements, offering some thoughts on a way forward from the perspective of a national research library and a web archive service provider.

KEYWORDS

Web archives, copyright, legal deposit, national libraries, scholarly use, digital humanities.

INTRODUCTION

Since the mid-1990s, efforts have been under way around the world to archive the World Wide Web. Ainsworth et al. (2012) estimate that between 35% and 90% of the web has been archived at least once. A survey of web archiving initiatives conducted by the Portuguese Web Archive in 2011 identified 42 web archiving initiatives across 26 countries (Gomes et al., 2011). The current membership of the International Internet Preservation Consortium (IIPC) (http://netpreserve.org/) amounts to 48, indicating increased web archiving effort worldwide in recent years. Many of the IIPC members are national libraries, which carry out regular broad crawls of their respective national web domains. These are intended to capture a snapshot of the state of the entire national domain (or a subset) at a given point in time, resulting in large scale national web archives. The Internet Archive, also a member of the IIPC, operates the largest web archive to date, with content going back as early as 1996. The Wayback Machine (http://archive.org/web/) contains nearly 400 billion web pages and covers the global web.

Web archiving institutions have traditionally focused their attention on developing the processes and technology for data collection. The use of web archives has not been a key element of their strategy, and they struggle to identify specific patterns of scholarly use, making it difficult to assess whether scholarly requirements are met.

Large scale national web archives often have restricted access, especially those collected under the protection of legislative frameworks such as Legal Deposit or exemptions to copyright law. In order to gain access, users are often required to be physically present at the web archiving institution's premises. A good example is the Danish national web archive, which contains over 280 TB of data collected since 2005. The access condition is so restrictive that fewer than 30 researchers have so far accessed the archive online (Schostag and Fønss-Jørgensen, 2012). Some web archives are completely 'dark', i.e. they cannot be accessed by anyone except occasionally by staff for curatorial purposes.

There are also web archives which provide online access to their collections, generally under licence or through research projects. As a result, there have been increasing interactions with scholars in recent years, with a number of research groups emerging which devote effort and attention to web archives.

ACCESS TO WEB ARCHIVES

Access to web archives is problematic at two levels: on one hand it is often restricted by legal requirements, imposed in exchange for permission to reproduce copyrighted material for cultural heritage purposes; on the other, it is constrained by the (single) envisaged use case.

Websites are copyrighted, and archiving them without permission breaches copyright law. Frameworks such as Legal Deposit are exemptions to copyright law and provide legislative permission for memory institutions to collect publications systematically and at scale, for the benefit of future generations. Current or ongoing access to Legal Deposit content has not been an intended element of such legislation, as this could potentially damage the interests of copyright owners. A common practice therefore is to limit access to publications collected under such frameworks, as recommended by UNESCO in its guidelines for Legal Deposit legislation:

‘Considering that it is widely recognized, at both the national and international level, that a copyright owner has an exclusive right to communicate a protected work to the public and that most electronic publications need to be “communicated to the public” in order to be seen and read, the deposit copy of such electronic publications might require a specific exception allowing access to the clientele of the national legal deposit institution’ (Larivière, 2000, p. 14).

While library users are not unfamiliar with similar access restrictions applied to printed Legal Deposit publications, the ubiquitous and open nature of the web creates an expectation of being able to access archived websites 24/7, just like the live web. It is difficult to make users understand why archived websites cannot be accessed online, as they have been, and quite often still are, publicly available. The misalignment between legal requirements and user expectation is a difficult problem for web archives, as the choice seems to be between the comprehensiveness of the archive and online access, not both. Those opening up access either seek permissions from the IPR holders, which often involves high costs, or decide to take and manage a certain level of risk. Making archived websites publicly available, even under licence, is for example regarded as republishing in the UK and thus transfers certain legal risks, such as libel, from the original publishers to the archiving institutions. Among the 29 web archives listed on the website of the IIPC (http://netpreserve.org/resources/member-archives), nineteen (66%) have full or partial online access, most being permission-based, small-scale selective web archives.

The predominant use case envisaged for web archives is that they consist of historical documents (web pages) used for reference. Researchers access previous states of individual web pages and websites in a web archive, which are selected, described and grouped together by curators, in the same way as printed books and journals. This static, 'document-centric' approach dominates current web archiving practice in many ways: in how content is collected and presented, and in the design of user interfaces, where the main mode of navigation is browsing individual websites page by page. Among the 29 IIPC members' archives, URL search is the standard, near-universal access method. This requires users to know the URL of the website they are looking for. For many archives, full-text search is the next challenge on the roadmap. The table below provides an overview of the search and browse functions provided by the IIPC members' web archives.

Table 1: Search and browse functions offered by web archives

Search or browse functions Number of archives offering the function
URL search 26
Keyword search 15
Full-text search 11
Thematic collections 11
Subject browsing 9
Alphabetical browsing 14
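To make the URL-based access model concrete, the sketch below (in Python) looks up the captures of a given URL using the Internet Archive's public CDX API, one of the better known machine interfaces of this kind. It is a minimal illustration rather than a description of any IIPC member's own interface; the endpoint, parameters and example URL are assumptions based on the Internet Archive's documented service.

    # A minimal sketch of URL-based lookup against the Internet Archive's
    # public CDX API. Requires network access; the target URL is arbitrary.
    import json
    import urllib.parse
    import urllib.request

    def list_captures(url, limit=10):
        """Return (timestamp, original URL, HTTP status) for archived captures of a URL."""
        query = urllib.parse.urlencode({"url": url, "output": "json", "limit": limit})
        endpoint = "https://web.archive.org/cdx/search/cdx?" + query
        with urllib.request.urlopen(endpoint) as response:
            rows = json.load(response)
        if not rows:
            return []
        header, data = rows[0], rows[1:]   # first row lists the field names
        ts = header.index("timestamp")
        orig = header.index("original")
        status = header.index("statuscode")
        return [(row[ts], row[orig], row[status]) for row in data]

    if __name__ == "__main__":
        for capture in list_captures("bl.uk"):
            print(capture)

The point of the sketch is simply that the user must already know the URL of interest; there is no obvious way to discover relevant content without it.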

The common user interface to web archives works well with small, curated collections but does not scale up to provide users with a functional way of using larger collections. In addition, it generally does not display any contextual information other than the 'text', i.e. the web pages as they were captured by the web crawler, which are often incomplete and do not include the linked content.

In the last few years some web archives have started to develop alternative access methods which focus on web archive collections in their entirety. The UK Web Archive, described in the next section, is an example. This demonstrates a shift of focus in web archiving, from human access to machine access, and from the level of single web pages or websites to the entire web archive collection. Using visualisation and data analytic techniques, new ways have been developed to view web archives, offering opportunities to unlock embedded patterns and trends, relationships and contexts, which cannot be discovered by consulting websites individually. This echoes the recommendation made by Thomas et al. (2010) to 'move away from costly and time-consuming attempts to identify a priori the content (the 'needles') likely to be of interest to web researcher' towards 'collecting the haystacks', placing less effort on selection and collection strategies and more on 'ways for users rapidly to survey, annotate, contextualise, and visualise those repositories, and to find and select the thematic elements of interest to them'.
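As a hedged illustration of this kind of analytical, 'haystack-level' access, the sketch below counts how often a phrase occurs in archived page text per crawl year, the basic operation behind an N-gram style trend view. The input records are hypothetical; a real web archive would derive them from its full-text index or from the underlying WARC files.

    # A minimal sketch of trend analysis over a web archive collection:
    # counting occurrences of a phrase per crawl year.
    from collections import Counter

    def term_trend(records, phrase):
        """records: iterable of (capture timestamp 'YYYYMMDDhhmmss', page text) pairs."""
        counts = Counter()
        for timestamp, text in records:
            counts[timestamp[:4]] += text.lower().count(phrase.lower())
        return dict(sorted(counts.items()))

    # Hypothetical sample data, for illustration only.
    sample = [
        ("20090512093000", "The credit crunch dominated the news."),
        ("20100701120000", "Recovery from the credit crunch was slow."),
        ("20100815083000", "Another story about the credit crunch."),
    ]
    print(term_trend(sample, "credit crunch"))   # {'2009': 1, '2010': 2}

Plotted over time, such counts give exactly the kind of 'big picture' view of patterns and trends that cannot be obtained by consulting archived websites one by one.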

ACCESS AND USE: THE UK WEB ARCHIVE

The Open UK Web Archive is the selective archive provided by the British Library and partners. It currently contains over 14,000 UK websites and over 60,000 instances of these websites, archived since 2004. The majority of the content in the archive was collected before April 2013, under licence. Supported by the Legal Deposit Act and Regulations, the British Library started archiving the UK web at scale in April 2013. In comparison with the Legal Deposit UK Web Archive, the Open UK Web Archive will continue to grow but will remain selective and small, containing the highlights of, and an overview of, the larger web archive collection.

The UK Web Archive offers standard access methods as well as a few innovative ones 'outside the box' to improve the user experience, including:

  • A Google-like N-gram search, which visualises the occurrence of search terms or phrases over time (http://www.webarchive.org.uk/ukwa/ngramia/).
  • A full-text search at single-website level, replacing the original search function on a website, which ceases to function once archived. This is extremely useful for large sites, which would otherwise take much longer to browse in order to find the content users are looking for.
  • The Mementos service, which allows users to look up which archives across the world hold copies of any particular web page. It also provides basic visualisations of this information, including breakdowns of how many copies each archival organisation holds (http://www.webarchive.org.uk/ukwa/info/mementos). This is a first step towards linking and integrated search across physically dispersed web archives; a lookup along these lines is sketched after this list.
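The sketch below illustrates the kind of lookup behind the Mementos service: fetching a TimeMap, the machine-readable list of archived copies defined by the Memento protocol (RFC 7089), and counting the copies held by each archive. The aggregator endpoint used here is an assumption based on the public Memento 'Time Travel' service, not a description of the UK Web Archive's own implementation.

    # A minimal sketch of a Memento TimeMap lookup: list how many archived
    # copies of a URL each web archive holds. Requires network access.
    import re
    import urllib.request
    from collections import Counter
    from urllib.parse import urlparse

    TIMEMAP = "http://timetravel.mementoweb.org/timemap/link/"  # assumed aggregator endpoint

    def copies_per_archive(url):
        with urllib.request.urlopen(TIMEMAP + url) as response:
            timemap = response.read().decode("utf-8", errors="replace")
        hosts = Counter()
        # Each entry in a link-format TimeMap looks like:
        #   <archived-copy-uri>; rel="memento"; datetime="...",
        for uri in re.findall(r'<([^>]+)>;[^,]*rel="[^"]*memento[^"]*"', timemap):
            hosts[urlparse(uri).netloc] += 1
        return hosts

    if __name__ == "__main__":
        for host, count in copies_per_archive("http://www.bl.uk/").most_common():
            print(host, count)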

Google Analytics is used to track the usage of the UK Web Archive and to help analyse traffic and content. The table and figures below show an overview of audiences in 2012 and 2013. We notice a significant increase in usage since April 2013, when the non-print Legal Deposit Regulations became effective and the UK Web Archive was frequently mentioned in the media. Two special collections, containing almost exclusively websites of police authorities and National Health Service bodies which were discontinued or taken offline due to administrative changes or reform, also contributed to the increase in use. Without exception, the most frequently accessed content in the UK Web Archive consists of websites which have disappeared from the live web.

Table 2: Statistics of visits to the UK Web Archive in 2012 and 2013

Time Visits Unique Visitors Page Views Avg. Visit Duration % New Visits
1 Jan 2012 – 31 Dec 2012 170,432 133,892 692,878 00:02:19 77.36%
1 Jan 2013 – 31 Dec 2013 510,715 347,403 1,444,367 00:02:14 67.55%

Figure 1: Usage statistics of the UK Web Archive, January – December 2012

Figure 2: Usage statistics of the UK Web Archive, January – December 2013

It would be useful to compare usage statistics among web archives. Unfortunately, very little has been published about access statistics. ISO Technical Committee ISO/TC 46 (Information and documentation), Subcommittee SC 8 (Quality – statistics and performance evaluation), recently devoted some attention to this topic and published a Technical Report on statistics and quality indicators for web archiving. [1] The proposed core statistics for measuring and benchmarking collection usage of web archives include measures such as the number of pages viewed and the duration of each visit. The Technical Report also includes a set of quality indicators related to the accessibility and usage of web archives, including the percentage of resources accessible to end users and the percentage of library users (onsite or online) using the web archive (pp. 32-43).

While useful for evaluating and comparing the performance of web archives, some of these measures would be of little relevance to 'dark archives'.

SCHOLARLY FEEDBACK ON THE UK WEB ARCHIVE

Google Analytics provides useful and detailed statistics but it cannot individually identify users, and may even include visits to the website by robots.  Therefore we cannot be certain that visits to the UK Web Archive are all for scholarly purposes.

A user survey was commissioned in 2012 to gather scholarly perspectives on the UK Web Archive. We received feedback on the Archive's perceived research value, and particularly on the content and access mechanisms which should be further developed to support research use. The findings of the survey remain very much valid despite dating from 2012.

The feedback came from 94 users, divided into two groups: those who had already used the Archive for research (26%) and those who had not (74%). The overwhelming majority were from Arts and Humanities or Social Sciences disciplines. The participants were interviewed over the telephone, and a small group also undertook a second phase in which they searched the Archive based on specific case studies, detailing each step of the search and the results.

Table 3: Participants in the UK Web Archive Survey (2012)

Subject Non-users Users
Arts and Humanities 33 10
Social Sciences 27 11
Science Technology Medicine 4 3
Total 64 24
Unclassified 6

All participants appreciated the potential scholarly value of the Archive. Those interested in web history, statistics and digital preservation research value the Archive particularly highly.

Many first-time users are unsure about the usefulness of the visualisation tools, especially the N-gram search. However, a small group of users is extremely enthusiastic about them. There is more interest in visualisation tools from existing users, suggesting the need to add better explanations of the functions and features of the Archive.

Special Collections were thought by all users to be useful. However, users would like to understand our selection criteria and how the themes for Special Collections are established. There is a desire to see more Special Collections and the facility to nominate themes. ‘UK politics’ and ‘Contemporary British History’ are the two broad themes which have been suggested.

All users expressed a requirement for including more images and rich media, as well as more blogs. The selective nature of the Archive seems to affect the perception of those using it for the first time, in that they could not find content relevant to their research. The most valuable lesson learnt from the survey is that the relevance of content determines whether researchers use a web archive. A selective web archive will please some researchers but disappoint others. It is also clear that there is still a significant target group within the research community yet to be reached.

SCHOLARLY USE OF SOURCES: CHARACTERISTICS

Offering web archives as scholarly sources requires an understanding of scholarly practices and requirements. The definition of 'scholarly sources' has been changing significantly and is no longer underpinned solely by the traditional 'peer-review' criterion. In addition to formal publications in books and journals, scholarly sources also include works written by scholars in the press or online, in blogs, and uploaded to institutional repositories. There is an additional and much larger field of sources which may be analysed or used by scholars, for example content generated on social networks. When these sources are delivered by the same access mechanism, the web, the boundary becomes blurred.

Any source used for scholarly purposes, i.e. to advance knowledge by answering research questions, can be defined as a scholarly source. In the context of web archiving, it is more helpful to discuss and analyse the characteristics of scholarly use of digital sources, rather than arguing what web resources are ‘scholarly’ and what are ‘popular’.

New methods of scholarship are also emerging which go beyond digital texts or documents (digitised or born-digital). The conjunction of the 'digital' and traditional scholarship gives rise to computationally engaged research, teaching and publication, which is redrawing boundaries between the humanities, the arts, the social sciences and the natural sciences (Burdick et al., 2012). Digital humanities, for example, is a field that has quickly gained momentum and is becoming a mainstream research approach, where scholars are starting to take advantage of the possibilities offered by technology. The web has played, and will continue to play, a central role in this less text-based, multimedia-driven approach to scholarship. Will web archives, the historical records of the web, do the same?

Some fundamental characteristics of scholarly use can be observed, most of which also apply to printed sources.

Availability or accessibility tops the list. There is a general expectation among scholars of being able to access digital sources online, anywhere, anytime, and to carry out basic operations such as searching, following hyperlinks, and copying and pasting. This also applies to the use of referenced sources, necessary to allow independent assessment of the strength, validity and reliability of authors' arguments. In addition to accessibility and ease of use, scholarly sources should be of good quality, conforming to a certain level of expectation. In the context of web archiving, quality mainly refers to the extent to which the archival copy resembles the live version of the website. Quality can be defined according to the following attributes, developed from Pinsent et al. (2010):

  1. Completeness of capture: whether the intended content has been captured as part of the harvesting process.
  2. Intellectual content: whether the intellectual content (as opposed to styling and layout) can be replayed in the Access Tool (user interface).
  3. Behaviour: whether the archival copy can be replayed including the behaviour present on the live site, such as the ability to browse between links interactively.
  4. Appearance: look and feel of a website.

While 'text' used to be the object of research in many disciplines, its primacy no longer exists. Scholars are interested not only in 'texts' but also in 'paratexts', defined by Gérard Genette as 'accompaniments' that 'surround or prolong the text' (Genette and Maclean, 1991, pp. 261-272). Niels Brügger applied this concept to websites, arguing that paratexts differ from texts in form and function and play a crucial role in the textual coherence of a website (Brügger, 2010). Paratexts of websites include elements such as the header and footer of a web page, referencing words or phrases pointing to the text itself, drop-down menus, site maps, breadcrumbs, and the title of the web page in the browser window.

Another literary concept which moves beyond the established canon of 'text' is that of 'distant reading', put forward by Franco Moretti. He argues that '…distance… is a condition of knowledge: it allows you to focus on units that are much smaller or much larger than the text: devices, themes, tropes—or genres and systems. And if, between the very small and the very large, the text itself disappears, well, it is one of those cases when one can justifiably say, Less is more. If we want to understand the system in its entirety, we must accept losing something' (Moretti, 2000).

If we think of the conventional access method to web archives as support for close reading of individual texts, distant reading allows us to focus on the 'big picture', or, as Moretti puts it, to learn 'how not to read them [texts]' (Moretti, 2000). Like 'paratext', this provides a relevant and interesting theoretical basis for using web archives as scholarly sources. The role of web archives is not to pass judgement on scholarly approaches but to provide services supporting scholars who read texts differently.

While information can be organised and managed in many different ways in a repository or library, by discipline, by geographical area, or by format, scholars do not welcome arbitrarily imposed boundaries on sources relevant and specific to their research questions. They often need or prefer to assemble their own research corpora and to have the flexibility to apply their own methods of analysis.

Another recent and significant change in scholarly practice is the analysis and use of social media content for research. Social media is the collective name given to Internet-based or mobile applications which allow users to form online networks or communities based on common interests or social or ideological orientations. Such applications take many forms, but their main purpose is to support interaction and communication among the members of a community, including the creation and exchange of user-generated content. Twitter and Facebook are key examples of large social networking platforms, which aggregate many forms of media into one place and are used globally for business, research and personal communications. Social media have become increasingly prevalent in people's lives and are also important sources for scholars seeking to understand our time. Content generated on social media has increasingly been used for scholarly research in recent years, and its almost indispensable role can be observed in humanities and social science research (Hockx-Yu, 2013).

REQUIREMENTS FOR WEB ARCHIVES

It is not difficult to map the key characteristics described in the previous section onto a set of very basic requirements for web archives. It is also useful to discuss how the current state of play meets these requirements.

Table 4: Requirements for web archives

Characteristics of Scholarly use Requirements for web archives
Availability Accessible online, supporting basic operations such as search, browse, copy and paste. Geographically dispersed web archives must find ways to link to each other and support seamless access so that the web is not arbitrarily divided into national domains for researchers.
Text, paratext and context Web archives should not just replay archived websites (text) but should also provide access, as integral parts of the archive, to information which has traditionally been omitted, such as collection policy and scope, crawl configuration, crawl logs and any other contextual information relevant to the text.
Persistence and citability Web archives need to commit to long-term availability and enable persistent identification of the sources within. Support and services that improve citability include persistent identifiers, agreed standards for citing archived websites and integration with common bibliographical management tools such as Zotero.
Quality Archival versions should resemble the live website as much as possible in terms of completeness, intellectual content, behaviour and look and feel. Content in web archives currently has many quality issues and mostly does not include rich content such as videos. Once archived, interactions on the live version are also lost. There must be information about what is missing from the 'originals' to help researchers interpret incomplete sources.
Close and distant readings Web archives should not be limited to the currently dominant single, linear, document-centric access method. There should be means to view and manipulate sources at different levels of granularity, to allow researchers to mix, match and reassemble corpora, and even to archive research corpora on demand. Web archives as 'big data' must be further explored to develop data analytics and visualisations.
Format-independent sources Integration with other digital and printed holdings such as books and electronic journals is also desirable, so that researchers can cross-search and reference all sources relevant to the research questions in hand.
Analysis of social network data for research Web archives typically contain some social media content but there is currently no systematic archiving arrangement for this. Researchers have to collect and assemble their own corpora, which involves many challenges such as data ownership and privacy. Web archives should work together with researchers to put in place arrangements with service providers and continue to develop technical and legal solutions for archiving social network data.

The web is a fast evolving, interactive, multi-dimensional, open and highly participatory and interlinked collective system. Web archiving institutions have done a great job and have preserved a (partial) snapshot of the early history of the web. However, legal and technical issues have contributed to the current state of web archiving, which in essence deconstructs the web and turns it into static, flat, exclusive, individual systems with boundaries and limitations. Both Brügger (2012) and Hockx-Yu (2011) have discussed extensively the many issues and challenges related to web archiving, the former from the scholarly perspective of using the archived web as historical sources and the latter providing an account of a national institution tasked with archiving one of the largest top-level domains in the world. Many of the requirements described above are yet to be met. It is not surprising that scholars do not use web archives when sources are still on the live web. The fact that web archives hold copies of resources which are no longer on the live web is currently the most compelling use case for web archives.

A WAY FORWARD

As the web changes, so do scholarship practices and methods. Providers of web archives need to respond to these changes to stay relevant.

It needs to be recognised that web archiving institutions have been attempting to engage with scholars to understand their requirements, and that for many scholars there remains a great deal of unfamiliarity with web archives and a lack of common methods and tools for using them. The interaction between web archive providers and scholars has been a process of getting to understand and know each other over time. A general trend with three phases can be observed:

Phase 1: Building collections

In the early stages of interaction, scholars are involved in scoping collections, selecting and describing websites relevant to their research interests. This effort often results in the creation of specific, if sometimes narrow, topical collections. An example of this is the 'French in London' special collection in the Open UK Web Archive, containing websites selected by Saskia Huc-Hepher around the subject of the French community in London (http://www.webarchive.org.uk/ukwa/collection/63275098/page/1). Huc-Hepher is a Senior Lecturer and a PhD candidate at the University of Westminster, and the collection is a fundamental component of her thesis.

Phase 2: Formulating research questions

Interactions with scholars in this phase often take the form of brainstorming sessions, workshops and projects, where researchers are made aware of web archives and asked the question: which research questions might web archives help you answer? The IIPC, for example, funded a project in 2011 to understand the potential types of research that could be done using web archives. The Analytical Access to the Domain Dark Archive (AADDA) project, funded by the UK Joint Information Systems Committee (JISC) (http://www.jisc.ac.uk/), worked with researchers in the arts, humanities and social sciences to obtain feedback on the feasibility of using large scale web archives at an analytical level. The project is an example of a much more bilateral interaction. It not only enhanced the scholars' understanding of the UK web domain dataset, but also provided concrete requirements for the development of an interface and tools which will allow researchers to use it effectively. While this approach helps encourage new users and uses of web archives, and new modes of engaging with researchers, it suffers, however, from requiring scholars to define the unknown, and it is also time- and resource-consuming.

Phase 3: Independent use of web archives

This type of scholarly interaction has only just begun to emerge. It is the desired 'go-to' state, where interfaces to web archives already meet the most common scholarly requirements. Scholars are able to use web archives without having to depend on (personal) interactions with the providers. This requires user interfaces to be self-explanatory, jargon-free and to contain baseline information about the archive, such as its scope, its coverage and lacunae, how it was collected, and how a particular website was crawled. Archived websites can also be served as datasets, ready for researchers to download and manipulate using their own tools and methods.
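As one hedged example of what 'serving archived websites as datasets' could mean in practice, the sketch below iterates over a locally held WARC file, the standard container format for crawled content, and extracts the URL, capture date and payload size of each response record. It assumes the open-source warcio library and a file named example.warc.gz; neither is specific to any particular archive's dataset service.

    # A minimal sketch of working with a web archive dataset: summarising the
    # response records in a WARC file. Assumes the 'warcio' library is installed.
    from warcio.archiveiterator import ArchiveIterator

    def summarise(warc_path):
        """Yield (URL, capture date, payload size) for each response record."""
        with open(warc_path, "rb") as stream:
            for record in ArchiveIterator(stream):
                if record.rec_type != "response":
                    continue
                uri = record.rec_headers.get_header("WARC-Target-URI")
                date = record.rec_headers.get_header("WARC-Date")
                payload = record.content_stream().read()
                yield uri, date, len(payload)

    if __name__ == "__main__":
        for uri, date, size in summarise("example.warc.gz"):
            print(date, size, uri)

From such simple building blocks researchers can assemble their own corpora and apply their own methods of analysis, without depending on the archive's user interface.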

While recognising the current limitations of web archives, it is also important to realise that some of these limitations are necessary, and that without them it would have been impossible to establish large scale national web archive collections. It is also clear that the archived web is not a replica of, and cannot compete with, the live web, due to the many legal and technical obstacles which will take time to overcome. What archiving institutions can do, however, is to think about and focus on what differentiates web archives from the live web. There are a few areas where 'quick gains' can be achieved which will bring web archives closer to what researchers require.

  • Web archives hold the only copies of some web resources that have vanished from the live web. These should be highlighted and grouped together for easy access.
  • Web archives hold periodic snapshots of individual websites showing evolution and change over time. Again this should be highlighted and made more obvious.
  • Web archives in their entirety are comprehensive historical datasets which lend themselves to opportunities for analytical access.

It has become clear that there is not just one way of using web archives. Narrow, pre-selected collections will only meet the requirements of small groups of researchers and disappoint the rest. Large scale, national collections with limited access methods will equally fail to meet scholarly requirements, being in danger of 'one size fits nobody'.

The development of analytical access to web archives in recent years, which addresses researchers' requirements beyond web archives as individual texts, is an area with huge potential yet to be fully explored. This, however, does not remove the basic requirement for web archives to capture and bring together good quality websites, which may otherwise disappear from the live web. Texts are still a fundamental building block of any web archive. Having explored data analytics at the UK Web Archive in recent years, we are fully aware that it is not a panacea. We have also learnt about a number of challenging issues:

  • There is scepticism or suspicion among researchers about the 'hidden' algorithms behind the analysis.
  • Biases in the data, and in how the data was collected, lead to variances in analytical outputs.
  • There is a need to manage expectations. Analysis and visualisation may be finished products or destinations, but they could also be first steps, leading researchers to new questions or new areas of attention, or requiring detailed access to the individual items included in the analysis.
  • The aggregation and analysis of large scale datasets containing personal information may lead to ethical and privacy issues.

In the UK, Legal Deposit legislation has allowed large scale domain-level archiving of the UK web. The key challenges for the British Library are not only to collect, periodically, the web resources of one of the largest domains in the world, but also to maintain momentum in developing scholarly use of the web archives, fulfilling its statutory obligation to collect the national heritage as well as its role as a world-class research library.

REFERENCES

Ainsworth, Scott G., AlSum, Ahmed, SalahEldeen, Hany, Weigle, Michele C., Nelson, Michael L. (2012) ‘How much of the web is archived?’, http://arxiv.org/abs/1212.6177 (visited 24.04.14).

Brügger, Niels (2010) Website Analysis: Elements of a conceptual architecture. Aarhus: The Centre for Internet Research.

Brügger, Niels (2012) ‘Web History and the Web as a Historical Source’. Zeithistorische Forschungen, 9(2), http://www.zeithistorische-forschungen.de/site/40209295/default.aspx (visited 24.04.14).

Burdick, Anne, Drucker, Johanna, Lunenfeld, Peter, Presner, Todd, Schnapp, Jeffrey (2012) ‘Digital humanities’, p. 122, http://mitpress.mit.edu/sites/default/files/titles/content/9780262018470_Open_Access_Edition.pdf (visited 24.04.14).

Genette, Gérard, Maclean, Marie (1991) ‘Introduction to the paratext’. New Literary History, 22(2), pp. 261-272, http://www.jstor.org/stable/469037 (visited 24.04.14).

Gomes, Daniel, Miranda, João, Costa, Miguel (2011) ‘A survey on web archiving initiatives’, http://dl.acm.org/citation.cfm?id=2042590 (visited 24.04.14).

Hockx-Yu, Helen (2011) ‘The past issue of the web’, http://journal.webscience.org/440/ (visited 24.04.14).

Hockx-Yu, Helen (2013) ‘Archiving Social Media’, Off the Record, 6, pp. 13-15, http://www.archives.org.uk/images/documents/SNfP/otr6.pdf (visited 24.04.14).

ISO/TR 14873 ‘Information and documentation — statistics and quality indicators for web archiving’, http://netpreserve.org/sites/default/files/resources/SO_TR_14873__E__2012-10-02_DRAFT.pdf (visited 24.04.14).

Larivière, Jules (2000) ‘Guidelines for legal deposit legislation’, http://unesdoc.unesco.org/images/0012/001214/121413eo.pdf (visited 24.04.14).

Moretti, Franco (2000) ‘Conjectures on world literature’. New Left Review, 1, http://newleftreview.org/II/1/franco-moretti-conjectures-on-world-literature (visited 22.04.14).

Pinsent, E., Davis, R., Ashley, K., Kelly, B., Guy, M. and Hatcher, J. (2010) ‘PoWR: The Preservation of Web Resources Handbook’, http://jiscpowr.jiscinvolve.org/wp/files/2008/11/powrhandbookv1.pdf.

Schostag, Sabine, Fønss-Jørgensen, Eva (2012) ‘Webarchiving: legal deposit of internet in Denmark. a curatorial perspective’. Microform & Digitization Review, 41, pp. 110–120, http://www.degruyter.com/view/j/mdr.2012.41.issue-3-4/mir-2012-0018/mir-2012-0018.xml (visited 24.04.14).

Thomas, Arthur, Meyer, Eric T.,  Dougherty, Meghan, Van den Heuvel, Charles, Madsen, Christine McCarthy, Wyatt, Sally (2010) ‘Researcher engagement with web archives: challenges and opportunities for investment’, http://www.researchgate.net/publication/228295361_Researcher_Engagement_with_Web_Archives_Challenges_and_Opportunities_for_Investment (visited 24.04.14).

NOTES

[1] A formally published, nearly identical and paid-for version of the report can be found at http://www.iso.org/iso/home/store/catalogue_tc/catalogue_detail.htm?csnumber=55211.
