Page 94 - AC/E Digital Culture Annual Report 2014
THE INNER CIRCLE ‐ QUALITY ASSURANCE AND ANALYSIS

websites that were preventable by the webmasters, and he was surprised in talking to the webmasters how little they knew about the inner workings of their websites (personal correspondence and conversation with David Crawford, July 2012). To try to prevent data capture surprises, Archive‐It allows partners to use a test crawl feature that produces reports on what would be captured without actually capturing any data. This option allows institutions to see what they would have archived without using their resources unnecessarily. The recent Archive‐It partner survey shows that 69% of respondents always or often run test crawls when adding new seeds or starting a new collection.

3d. Quality Assurance and Analysis

After institutions capture data from their desired sites, they should review what they archived and assess its quality and completeness. This can be done through reports generated by crawlers or by clicking through the archives themselves by way of an access tool such as the Wayback software. The process of web archiving can include trial and error. Like most aspects of web archiving, no single best practice for quality assurance has emerged among institutions that archive the web. However, there are some common trends among Archive‐It partners in the types of crawl information they review.

Archive‐It survey data shows that a majority of partners often or always review the post‐crawl reports generated as part of the service. This reflects the fact that institutions starting a web archiving program tend to be interested in how much material, and exactly what kind of material, they are collecting. Findings from the 2012 summer survey of Archive‐It partners show that 68% of responding institutions review their host reports on a regular basis; only 11% rarely or never do so.
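Part of this report review can be automated. The Python sketch below flags hosts in a host report whose counts suggest an incomplete capture. It is an illustration only: Archive‐It's real host reports have their own layout, so the CSV columns used here (`docs_captured`, `docs_blocked`, `docs_queued`) and the thresholds are hypothetical assumptions, not the service's actual schema.

```python
# Hypothetical sketch: scan a simplified host-report CSV for signs of an
# incomplete capture. Column names and thresholds are illustrative only,
# not Archive-It's actual report format.
import csv
import io

SAMPLE_REPORT = """host,docs_captured,docs_blocked,docs_queued
example.org,5120,3,0
media.example.org,14,210,0
cdn.example.net,0,0,1800
"""

def flag_anomalies(report_text, min_docs=50, max_blocked_ratio=0.1):
    """Return (host, reason) pairs for hosts that may need a patch crawl."""
    flagged = []
    for row in csv.DictReader(io.StringIO(report_text)):
        captured = int(row["docs_captured"])
        blocked = int(row["docs_blocked"])
        queued = int(row["docs_queued"])
        total = captured + blocked
        if captured < min_docs:
            flagged.append((row["host"], "few documents captured"))
        elif total and blocked / total > max_blocked_ratio:
            flagged.append((row["host"], "many documents blocked (robots.txt?)"))
        if queued:
            flagged.append((row["host"], "documents left queued at crawl end"))
    return flagged

for host, reason in flag_anomalies(SAMPLE_REPORT):
    print(f"{host}: {reason}")
```

A script like this does not replace clicking through the archive in Wayback, but it narrows a long host report down to the handful of hosts worth inspecting by hand.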
Reviewing reports can take time, and reviewers need to know what anomalies to look for. Three survey respondents said that a lack of staff and resources makes it difficult to analyze reports after every crawl. In 2011 the service implemented a QA tool and the ability to run a patch crawl on top‐level URLs that had not been captured completely the first time around. The response has been positive, and the service has been working on extending the QA tool's capabilities. At the time of this writing there is little anecdotal knowledge about exactly how Archive‐It partners perform quality assurance on their crawls, and it is one of our objectives to learn more about this area as partners' needs become more tangible.

CONCLUSIONS AND NEXT STEPS

The web archiving life cycle model is one step on the road to creating a set of best practices for creating and maintaining a web archiving program. After more than seven years of running the service and working with forward‐thinking partners, it is clear to the Archive‐It team that the web does remain "a mess" and that it is in all of our best interests to continue to work together to find solutions for capturing and displaying web content. As technology continues to develop and as information is increasingly published exclusively online, more institutions of all sizes will need to be archiving web content. Many of the Archive‐It partners have been pioneers in web archiving.