Revealing watermarks – a remote collaboration between Conservation and Imaging

As for so many people, lockdown has meant huge changes to our working lives. As conservation and imaging staff on a digitisation project, we rely on physical access to collection items, studios, and equipment for many aspects of our work, and at first it seemed difficult to reimagine a working life so rooted in practical tasks. While this moment of reimagining felt both exciting and confusing, it did give us the chance to reallocate our time. With the removal of ‘business as usual’ came a rare opportunity to devote time normally reserved for essential tasks to the wider elements of what constitutes ‘work’. One way we navigated this was through a collaborative project based on the watermarks in some of the documents we have been digitising for the Qatar Digital Library (QDL): a series of ship’s journals from the East India Company’s earliest voyages (1605–1705).

The idea began within the conservation team (Heather Murphy and Camille Dekeyser), who initially intended to use these watermarks to trace historic routes of the paper trade and commerce within Europe. We had hoped to use the watermarks to uncover specifics about papers and documents, such as their date and place of manufacture, but quickly found that establishing these details depends on a wide range of variables. It soon became obvious that the project could grow in multiple directions. As well as revealing the watermarks’ value for academic research, we wanted to highlight their other enticing qualities: their curious symbols, their aesthetic appeal, and their ability to appear and disappear. This rich combination of factors seemed worth investigating, to see whether we could help people experience these often hidden parts of the collection (especially in their digital form, where watermarks are invisible most of the time).

The first step was to make our own watermarks out of wire, and trial these by making paper. After researching how to make a mould and deckle, we were able to sew the watermarks onto the frame and begin making our first watermarked papers. This proved both fun and instructive, so much so that we went on to run a papermaking workshop for other colleagues at the British Library.


It felt logical that a project with multiple elements would benefit from multiple inputs, so we sought out collaborators from among our talented colleagues. Before anything else, we needed to create good quality images of the watermarks which could be easily viewed. Until then, we had been working from handmade tracings, which we had been compiling, researching, and comparing with online databases.

These were a helpful starting point, but lacked accuracy and clarity. With this in mind, we began collaborating with Senior Imaging Technician Jordi Clopés Masjuan and Senior Imaging Support Technician Matt Lee, to discuss the practicalities of creating clearer images. Jordi suggested creating a series of images through which the watermarks could be ‘revealed’: one image capturing the watermarks as they appear on the digitised image (almost or completely invisible), and another showing them illuminated by backlighting.

Although the imaging studio we use is equipped with high-quality lights, sensors, lenses and so on, the technique Jordi used to capture the watermarks was quite simple. We first designed and made a triangular structure from Vivak (a plastic sheet commonly used for exhibition mounts and stands), which supported the page safely and ensured that it would not move during capture.

With the camera on a tripod to avoid any movement, we took two consecutive images, each lit by a single light: for the first image, the light was strategically placed behind the camera (to light the ‘original’ view of the folio); for the second, it was placed behind the document as a backlight (to highlight the watermark). It was crucial that neither the camera nor the document moved, so that the two images could be compared exactly. Once they were captured, Jordi worked on the images in Adobe Photoshop to accentuate key points of contrast. While the first image needed no editing, the second required custom adjustments to levels, curves, saturation, and brightness to bring out the watermarks.
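The article does not record the exact adjustments Jordi made, but the core of a Levels adjustment is simple to express. The sketch below is a generic, hypothetical illustration (the function name apply_levels and its parameters are illustrative assumptions, and it works on a flat list of 8-bit greyscale values rather than a real image file): values at or below a chosen black point become black, values at or above a white point become white, and the range in between is stretched, which is how faint density differences in a backlit watermark can be pushed apart.

```python
def apply_levels(pixels, black=0, white=255, gamma=1.0):
    """Remap 8-bit greyscale values the way a basic Levels adjustment does.

    Values at or below `black` become 0, values at or above `white` become 255,
    and the range in between is stretched; `gamma` > 1 brightens the mid-tones.
    """
    if not 0 <= black < white <= 255:
        raise ValueError("expected 0 <= black < white <= 255")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    out = []
    for value in pixels:
        # Normalise into 0..1 relative to the chosen black and white points.
        t = (value - black) / (white - black)
        t = min(max(t, 0.0), 1.0)
        # Apply gamma correction to the normalised value.
        t = t ** (1.0 / gamma)
        out.append(round(t * 255))
    return out
```

For example, if the watermark and the surrounding paper only span grey values 100 to 160, setting those as the black and white points spreads them across the full 0 to 255 range: apply_levels([100, 120, 140, 160], black=100, white=160) gives [0, 85, 170, 255].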

Jordi then further suggested that we could overlay the two images online, using a digital tool that would allow the user to slide one of the images across to reveal the other, enabling an interactive comparison.
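A reveal slider of this kind can be sketched in a few lines, assuming both captures are already loaded as equally sized grids of pixel values (here, plain nested lists). The function name slider_composite and the choice of which image appears on which side are illustrative assumptions, not a description of Jordi's tool. The sketch also shows why the fixed tripod and page mattered: the composite only lines up if each pixel position refers to the same point on the folio in both images.

```python
def slider_composite(before, after, position):
    """Combine two equally sized images (lists of pixel rows) for a reveal slider.

    Columns to the left of `position` come from `after` (e.g. the backlit view);
    the remaining columns come from `before` (the standard view).
    """
    if len(before) != len(after) or any(len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("images must have identical dimensions")
    width = len(before[0]) if before else 0
    # Clamp the slider position to the image width.
    split = max(0, min(position, width))
    return [row_after[:split] + row_before[split:] for row_before, row_after in zip(before, after)]
```

Moving the slider simply changes `position` and recomputes the composite; at 0 the viewer sees only the standard view, and at the full width only the backlit one.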

Fortunately, the watermark images had been captured before the first national lockdown in March 2020. Working from home, Matt imported these digital images onto an iPad and traced the outlines of the watermarks using the Procreate drawing and painting app. The task was time-intensive, but proved a welcome distraction.

These digital drawings gave the watermark designs a more tangible form and enabled us to compare and categorise them by type. The fleur-de-lis is one of several common motifs, while another features a jug (see below). Compiling the different iterations has revealed subtle differences in the design, shape, and lettering.

In these earlier examples, it is harder to discern the origins of the designs, but they often draw on imagery related to trade guilds and religious symbols, as well as incorporating lettering and abbreviations. Many early watermarks appear almost identical but exhibit small differences that are likely imperceptible unless you know what to look for. These may result from distortions caused by wear and tear to the moulds, but could also be due to early papermaking techniques that used a pair of moulds, or double mould, to create pairs of watermarks referred to as ‘twins’. Even twins may contain minor differences, perhaps because they were made by different workers or were placed on opposite sides of the mould. A design might have been reversed on different sides of the mould, or placed differently in relation to the laid and chain lines. Some include abbreviations of names and initials, or differences in countermarks. Matt’s drawings of the variations in our watermark designs offer a great way of studying and comparing their motifs.

With these digital tracings, we decided to add a third ‘view’ to Jordi’s interactive comparison tool, incorporating Matt’s drawings to further illuminate the watermarks.

As the GIF shows, Jordi was able to combine our images with this ‘slider’ tool, allowing people to unveil the invisible watermarks by moving the arrows. Our hope is to incorporate this into the QDL, along with contextual articles about the watermarks, but integrating such a tool requires considerable back-end coding, and at the time of writing it has not yet been possible.

The latest addition to this work emerged from conversations with a close friend, Eva Sbaraini, about her work in 3D printing, after which we decided to collaborate to investigate potential uses for 3D printing within conservation. We started by trialling a 3D print of one of the designs in an attempt to give these partially hidden images a physical form.

From these first tests, we are hopeful that 3D-printed watermarks could be used as tactile visual objects for tours, demonstrations, presentations, or workshops, and have been eagerly gathering input from colleagues across different specialisms on other applications. Moving forward, we see possible uses in the realms of teaching, learning, and engagement.

We have also sewn 3D-printed watermarks onto our mould, to test them in the papermaking process. This has allowed us to adapt and study elements of existing designs.

This 3D-printed watermark is an enlarged replica of one that appears in our collection. It was created by converting Matt’s vector image into an SGV file from which to 3D print.

We have even created and 3D printed our own entirely new and intricate design, which is next in line for a papermaking trial. It is made up of the initials of everyone involved in this project.

This collaboration has taught us about each other’s distinct specialisms, and is a remarkable testament to what can be achieved together while working remotely. We have seen the project move from practical, physical elements into the digital realm, and from the digital creations back into new physical manifestations. When we are back in our respective studios at the British Library, we plan to continue working on digitising the watermarks of other series, perhaps finding more ways to make these available for audiences to study and enjoy.

Further reading:

To read more about the process we followed in digitising the watermarks, see the blogpost ‘Making Watermarks Visible’, written for the British Library Digital Scholarship blog.


Alternative Angles: Considering different approaches to 2D digitisation

Working in digitisation as an Imaging Technician at the British Library, I know that the digitisation process is typically a standardised and uniform procedure. Metamorfoze, the National Programme for the Preservation of Paper Heritage in the Netherlands, provides a set of standards and guidelines to adhere to when digitising. Aspects of the guidelines include colour accuracy, exposure and white balance. All images must be checked against the guidelines’ criteria in order to be classed as what Metamorfoze defines as a ‘digital copy’. The aim is to ensure that the process produces reliable images.
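Colour accuracy is usually assessed by photographing a colour target and measuring how far each patch deviates from its reference value. As a hedged illustration, the sketch below uses the simple CIE76 ΔE*ab formula (Euclidean distance in CIELAB space). The patch names, the tolerance value and the choice of CIE76 are assumptions; the actual metric and thresholds should be taken from the Metamorfoze guidelines themselves.

```python
import math


def delta_e_76(lab1, lab2):
    """CIE76 colour difference: Euclidean distance between two CIELAB (L*, a*, b*) values."""
    return math.dist(lab1, lab2)


def check_patches(measured, reference, tolerance):
    """Return the names of target patches whose difference from the reference exceeds `tolerance`.

    `measured` and `reference` map hypothetical patch names to (L*, a*, b*) tuples.
    """
    failures = []
    for name, ref_lab in reference.items():
        if delta_e_76(measured[name], ref_lab) > tolerance:
            failures.append(name)
    return failures
```

An empty list means every patch is within the chosen tolerance; any names returned point to a colour problem in the set-up that should be corrected before digitising continues.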

It is incredibly important for heritage organisations to comply with these standardised criteria. Digitisation is now a huge part of the heritage sector and its core ethos of making a digital copy of a physical item is dependent on the same criteria being met across the sector. Deviation from these criteria runs the risk of unreliable representations and defeats the purpose of digitally preserving archives. 

Meeting these requirements is an essential part of my role, and it involves a range of techniques, some more obvious than others. Working with the blinds down to control the light and carrying out regular equipment tests, such as sensor tests, are among the more routine requirements. Selecting the correct piece of equipment for each item is also essential. This can depend on whether the item is loose-leaf or bound, how wide a bound item can be opened, and what condition the item is in.

It can also involve using unusual pieces of equipment. I have used scanners with curious names such as the ‘Dragon’ and the ‘Cobra’. I use münchener bücherfingers, or ‘Munich fingers’, to hold down pages where the glass plate I normally use would be inappropriate. Dog grooming tables have also proved incredibly useful for digitising, as their height can be raised and lowered to suit each item. Foam blocks, weights and Velcro straps can also be found in our digitisation studio.

While we make very mindful choices about how we digitise, we also consider carefully what we digitise. In trying to digitise the experience of looking at or reading an item, we photograph the book bindings, the top, bottom and edges of books, and any blank pages. Anything we find folded up, such as a map, is photographed in both its folded and unfolded states.

The practices I have listed above are often rooted in the assumption that most people viewing the digital copy will want to read it or look at it from a 2D perspective. However, we should not assume that all online users are academics. Viewing any item in an archive is a sensory encounter, and there are many facets to a collection that are often forgotten. During my time in the heritage sector, items have interested me for very different reasons, such as the pastel colours of a series of governmental papers or the particular texture of certain documents. How do you digitise or record these aspects?

Let us look at texture in more detail. Could a written description convey it, or could a photograph be used to show it off? Standard lighting set-ups prioritise capturing words and images clearly, but altering the lighting set-up allows us to capture the texture of the page. The same is true of the gold foil detail often found in manuscripts. Standard lighting set-ups can fail to capture it, presenting it as murky and brown instead, but altering the position of the lights and camera can bring this detail to life. This sensitive, multi-faceted approach to digitising can reveal aspects of an item that would be immediately noticeable to someone holding it in real life.

A more unusual example is Hans Holbein’s ‘The Ambassadors’. Although paintings are 3D objects, they are digitised in a way that represents them in 2D. An important feature of ‘The Ambassadors’ is the anamorphic skull. At first glance, the skull appears as a strange, abstract grey shape spread across the bottom of the canvas – it seems completely out of place until you step to the side of the frame. This intentionally distorted image relies on the viewer changing their point of view, as the skull only reveals itself when seen from an oblique angle. How can you digitise this movement from straight-on to side-on, so that online users can experience the same moment of realisation that visitors to the National Gallery enjoy? Doing so would require deviating from the way that paintings, or indeed any flat surface, are usually captured.

One organisation that has begun to explore this issue is the Science Museum Group. Their ‘One Collection’ project has seen vast amounts of their collections digitised. A quick browse of their collections website shows that they are taking a more creative approach to digitisation. Their holdings range widely, from thimbles to steam trains, which goes some way to explaining the reasoning behind their process. Notably, some of their 2D items have been photographed in the same way as their objects, which addresses some of the issues I have discussed above. One criticism is that, as shown in the examples below, capturing the physicality of the documents has been prioritised over their readability, so it is debatable to what extent these images are ‘digital copies’. Yet they do provide a range of examples of images that could be captured and included alongside digital copies to represent a 2D item fully.

This image from the Geoffrey Perry archive provides a better idea of how disparate items interact with each other within the same collection. Such ‘group’ images appear frequently on the Science Museum Group collections website. They provide useful information about the physical aspects of a group of documents whilst also displaying visually interesting information about how colours and graphic design interact within one archive. 

Image by Science Museum Group (unaltered)

Returning to our example of texture, the documents here have been laid out in a way that gives information on the texture and transparency of the paper. In comparison, the photographs of building plans shown below have completely rejected the standard digitisation procedure, prioritising the communication of detail and the delicate nature of the larger documents over their legibility.

In conclusion, although digital copies of 2D works usually consist of one or two images, a sensitive approach to digitisation can provide a more realistic digital representation of documents, maps, photographs and artwork. The inclusion of additional images that show other aspects or details of the item, such as its texture, can ensure that the digital form truly reflects the original object. This, in turn, leads to a more engaging and interesting experience for online users.



Images by Science Museum Group, licensed under CC BY-NC-SA 4.0

Archives ‘on the go’: ‘What Was Here?’ uses technology to bring content from the ‘research room’ into the wider world, for self-directed exploration.


In June 2019, the East Riding Archives (Beverley, East Yorkshire, England) officially launched its new app, called ‘What Was Here?’. This marked the culmination of a 4-year journey in which we had sought to find new ways of engaging audiences with archives in a digital age.

It was June 2015 when I first conceived the idea for a mobile app that would allow people to see what a place looked like while standing in that location, using archive photographs. I was on my way to work when I passed a beautiful meadow and recognised it as the location of an image from our collections featuring some buildings that no longer survive. Immediately, I thought that not many people would realise ‘what was here’ and, in that moment, with smartphone in hand, an idea was born.


Generally, if someone is interested in viewing material preserved in an archive it is necessary to either visit in person or request that copies be posted or emailed.  Whilst some items may be online, it can sometimes take diligent research to identify relevant web resources and items of interest.  This arguably creates barriers to access, primarily physical, but also cognitive, that can cause archival material to be the preserve of the discerning researcher and preclude many from ever seeing historic items they would otherwise have found fascinating.  One of the key drivers of ‘What Was Here?’ has always been to remove those barriers and appeal to broader audiences on the premise that we have a fundamental curiosity about the past, which makes archives relevant to everyone.

Concept & functionality

The ‘What Was Here?’ app plots archive photographs onto a Google Maps base map, allowing users to explore points of interest and compare past with present by viewing each historic image from the spot where it was taken. This self-directed exploration is combined with guided heritage trails that include route maps, directions, and GPS push notifications. An augmented reality feature in the trails, called ‘Camera View’, enhances the comparison by using the device camera to overlay and align the historic image with the modern scene, with a slider to toggle the transparency. If users are particularly fond of an image, they also have the option to ‘Buy Prints’, linking back to our e-commerce website ‘East Riding Photos’ and so facilitating the purchase of copies of archival content as gifts, souvenirs, or wall décor for local businesses. The platform intentionally supports a range of potential uses, including tourism, education, family history, exercise, reminiscence, or simply general interest. Having a commercial offer available is also important to the user experience, as it enhances engagement and a sense of ownership.
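The article does not describe how the GPS notifications are triggered. One common approach is a circular geofence: measure the great-circle distance from the user to each point of interest and notify when it falls within a set radius. The sketch below assumes that approach; the function names, the 50-metre radius and the coordinates in the usage example are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_points(user_lat, user_lon, points, radius_m=50):
    """Return names of points of interest within `radius_m` of the user (hypothetical trigger radius)."""
    return [
        name
        for name, (lat, lon) in points.items()
        if haversine_m(user_lat, user_lon, lat, lon) <= radius_m
    ]
```

For instance, nearby_points(53.8401, -0.4300, {"near": (53.84, -0.43), "far": (53.90, -0.43)}) returns ["near"], since the first point is roughly 11 metres away and the second several kilometres.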

The photographic element marks Phase 1 of the app’s rollout, with Phase 2 currently in development. This second phase focuses on historic maps overlaid onto the base map in layers of ‘time’ according to date (e.g. 1700s, 1800s), and again uses a transparency slider to phase between past and present for instant visual comparison.
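A transparency slider of this kind is typically a linear blend between two aligned layers. As a minimal sketch, assuming the historic and modern layers are already aligned and stored as grids of RGB tuples (an assumption, since the app's actual rendering is not described), each output pixel is a weighted average controlled by the slider's opacity value. The function names are hypothetical.

```python
def blend_pixel(historic, modern, opacity):
    """Blend two RGB pixels: opacity 1.0 shows only the historic layer, 0.0 only the modern one."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return tuple(round(opacity * h + (1 - opacity) * m) for h, m in zip(historic, modern))


def blend_image(historic, modern, opacity):
    """Apply blend_pixel across two equally sized grids of RGB tuples."""
    return [
        [blend_pixel(h, m, opacity) for h, m in zip(h_row, m_row)]
        for h_row, m_row in zip(historic, modern)
    ]
```

Dragging the slider only changes `opacity`, so the same two layers can be re-blended instantly at any point between past and present.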


From the outset, my concept has been to provide access via a base map, with archive items geo-referenced onto it.  Historic photographs and maps were earmarked as baseline content because I felt these lent themselves most readily to the geo-referencing aspect and intuitively believed them to be most popular with mainstream audiences.
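For the Phase 2 maps, geo-referencing means relating positions on a scanned map to real-world coordinates. The article does not say which method is used; a common starting point is an affine transform fitted to ground control points, i.e. features that can be identified both on the historic map and on the modern base map. The sketch below fits that transform exactly from three hypothetical control points (fit_affine and its helpers are illustrative names). Production tools usually fit more points by least squares, and older, less accurate maps may need higher-order corrections.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, v):
    """Solve the 3x3 linear system m @ x = v using Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points must not be collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        solution.append(_det3(replaced) / d)
    return solution


def fit_affine(control_points):
    """Fit pixel (x, y) -> (lon, lat) from exactly three ((x, y), (lon, lat)) control points."""
    if len(control_points) != 3:
        raise ValueError("this sketch expects exactly three control points")
    m = [[x, y, 1.0] for (x, y), _ in control_points]
    a, b, c = _solve3(m, [lon for _, (lon, lat) in control_points])
    d, e, f = _solve3(m, [lat for _, (lon, lat) in control_points])

    def to_geo(x, y):
        # lon = a*x + b*y + c, lat = d*x + e*y + f
        return (a * x + b * y + c, d * x + e * y + f)

    return to_geo
```

Once fitted, to_geo converts any pixel on the scanned map into a longitude and latitude, which is what allows the map to be laid over the base map at the right position.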

One of the key challenges was identifying suitable photographs with no known copyright restrictions and plotting their coordinates.  This was achieved with over 1400 images across the East Yorkshire region. Obviously, it is not possible to be 100% accurate with every image, which is partly why a ‘Contribute’ feature was added to the platform, allowing users to make suggestions for refinement to the map coordinates.  This should also allow us to tap into the rich photographic collections of private individuals by encouraging the donation of their material.  Alongside the photographic content, the proposed method of presenting archival maps has produced the most significant developmental challenge as the source material is high resolution and data-heavy, so needs to be condensed and packaged in a user-friendly manner.  Once this work is complete it promises an exciting new way of accessing maps from the Archives.

The process of moving from concept to development, and finally to delivery, has taken four years, the first three of which were spent convincing stakeholders that my concept was a viable solution. Our Archives service operates within a local government setting, where budgets are often constrained, so a rigorous procedure had to be followed to win corporate approval before any consideration could be given to procurement and development. This was a test of my persistence and belief in the concept, but in this digital age, where access to information is driven by engagement with apps and the internet, I consider it remiss for an archive not to have the means of engaging users digitally. It was the belief that this was vital for service provision, as much as my passion for the concept, that saw this project through to fruition.


The ‘What Was Here?’ concept appeals to our fundamental curiosity about the past and its relationship with our surroundings, placing archives at the centre of that user experience. However, as with any digital innovation, public awareness of the app’s availability is vital to its success. In its first six weeks, ‘What Was Here?’ received over 1,000 downloads on Google Play (Android), where it trended at No. 10 in the ‘Travel & Local’ category, placing it above some major commercial apps. This is an encouraging start that affirms the concept’s popular appeal, and statistics from the App Store (iOS) have yet to be added to this figure. Marketing so far has been low-budget and small-scale, so with plans to increase promotion, we anticipate that uptake could grow. My hope is that historic photographs and maps will help the app gain traction with a mainstream audience and allow a diverse range of archival content, including audio and video, to be hosted on the platform, with other heritage organisations getting involved and ultimately expanding the base map. Enjoyment and learning should be at the heart of the user experience, and it is a pleasure to think that we are using archives to deliver that to people in the wider world. At the same time, the ‘What Was Here?’ app is merely the tip of the iceberg in terms of the archival content held in the repository, so it should also act as a useful advertisement, pointing people towards the resources available in the research room. With this technology, we now have the ability and opportunity to transform how people engage with archives, creating a mainstream tool for learning and exploration.

‘What Was Here?’ is available to download free on Google Play and the App Store (search ‘what was here’).  For more information, visit


Featured image was taken by Samuel Bartle

A Blockchain For Archives: Trust Through Technology

At a time when the fragility and vulnerability of digital records are increasingly evident, maintaining the trustworthiness of public archives is more important than ever.

Video and sound recordings can be manipulated to put words into the mouths of people who never said them, photographs can be doctored, content can be added to or removed from videos and, recently, AI technology has “written” news articles that can mimic any writer’s style. All of these media and many other “born-digital” formats will come to form the public record. If archives are to remain an essential resource for democracy, able to hold governments to account, the records they hold must be considered trustworthy.

But is this really a problem for archives?

Until recently, this has not been a concern for archives. People trust archives, especially public archives. We are seen as experts, preserving and providing access to our holdings freely and over a lengthy period (since 1838 in the case of The National Archives in the UK). We could rest on our laurels. But the challenges that digital technologies bring to our practice must lead us to question whether this institutional or inherited trust is enough in the face of the forces of fakery that have emerged in the 21st century.

In 2017, The National Archives of the UK, in partnership with the Centre for Vision, Speech and Signal Processing (CVSSP) at the University of Surrey and Tim Berners-Lee’s non-profit Open Data Institute, started to research how a new technology could be harnessed to serve on the side of archives. The ARCHANGEL project is investigating how blockchain can provide a genuine guarantee of the authenticity of the digital records held in archives: a way of publicly demonstrating our trustworthiness by proving that those records are authentic and unchanged.

Often considered synonymous with Bitcoin, blockchain is the technology that underpins a number of digital currencies but it has the potential for far wider application. At root, it is the digital equivalent of a ledger, like a database but with two features that set it apart from standard databases. Firstly, the blockchain is append only, meaning that data cannot be overwritten, amended or deleted; it can only be added. Secondly, it is distributed. No central authority or organisation has sole possession of the data. Instead, a copy of the whole database is held by each member of the blockchain and they collaborate to validate each new block before it is written to the ledger. As a result, there is no centralised authority in control of the data and each participant has an equal status in the network: equal responsibility, equal rights and an equal stake.
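The append-only property is commonly achieved by having each block store a cryptographic hash of the block before it, so that altering an earlier block breaks every later link. The sketch below illustrates only that hash-chaining idea on a single machine. It is not ARCHANGEL's implementation and leaves out the distribution and validation among members described above; the block fields and function names are illustrative assumptions.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 of a block's contents, serialised deterministically."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data):
    """Append a new block that records the hash of the previous block."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "previous_hash": previous, "data": data})
    return chain


def verify_chain(chain):
    """Return True if every block still points at an unaltered predecessor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )
```

In this single-copy sketch, a change to the most recent block would not be caught, because no later block points at it. That gap is one reason the copies held by other members matter: they can compare their own copy of the ledger against a tampered one.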

As with any new technology, there are issues to be researched and resolved. The most common criticism is that 51% of the participants could collude to change the data written on the blockchain. This is less likely in the case of ARCHANGEL because it is a permissioned blockchain: every member has been invited and their identity is known, unlike on the Bitcoin network, where many participants are anonymous.

A more practical issue that arose early on was what information about the records could be shared on an immutable, publicly available database to prove that they were unchanged from the point of receipt by the archives. Every public archive holds records that are closed because of their sensitive content. This sensitivity sometimes extends to their filenames or descriptions, so adding these metadata fields to the blockchain would not be appropriate. We settled on a selection of fields that included an archival reference and the checksum: an effectively unique alphanumeric string, generated by a mathematical algorithm, that changes completely if even one byte of the file is altered. In this way, researchers can compare the checksum of the record they download against the checksum on the blockchain (written when the record was first received, potentially many years earlier) and see for themselves that the two match. As archives sometimes convert formats in order to preserve or present records to the public, the project has also developed a way of generating a checksum based on the content of a video file rather than its bytes. This enables users to check that a video has not been altered for unethical reasons while in the archive’s custody.
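The researcher-side check can be sketched as follows. The article does not name the checksum algorithm; SHA-256 is assumed here, and the ledger value is simply passed in as a string rather than fetched from any blockchain. The function names are hypothetical. This byte-level comparison would not cover the content-based video checksum the project developed, which is designed to survive format conversion.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_ledger(path, ledger_checksum):
    """Compare a downloaded file's checksum with the value recorded on the ledger."""
    return sha256_of_file(path) == ledger_checksum.strip().lower()
```

Reading in chunks keeps memory use constant, which matters for large audiovisual records.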

So, the ARCHANGEL blockchain enables an archive to upload metadata that uniquely identifies specific records, have that data sealed into a “block” that cannot be altered or deleted without detection, and share a copy of the data with each of the other trusted members of the network for as long as the archives (some of the oldest organisations in the world) maintain it.

In the prototype testing, we found that the key to engaging other archives is in emphasising the shared nature of the network. Only by collaborating with partners can the benefits of an archival blockchain be realised by any of us. It is blockchain’s distributed nature that underpins the trustworthiness of the system; that enables it to be more reliable, more transparent and more secure, and therefore effective in providing a barrier against the onslaught of synthetic content.

At the same time, the effort of the organisations to make the prototype work demonstrates their trustworthiness: in wanting to share the responsibility for proving the authenticity of the records they hold, they demonstrate their expertise and honesty.

The arms race with the forces of fakery that archives find themselves in is the reason why The National Archives is thinking about trust. We do not want people to trust archives only because of their longevity and expertise. Instead, we want to demonstrate their trustworthiness. We want to provide what Baroness Onora O’Neill said was needed in the BBC Reith Lectures in 2002:

“In judging whether to place our trust in others’ words or undertakings, or to refuse that trust, we need information and we need the means to judge the information.” O’Neill, A Question of Trust

This is what we think blockchain gives us as a profession: by being part of a network of trusted organisations which assure the authenticity of each other’s records, we demonstrate the trustworthiness of all of our records.



The ARCHANGEL Project would like to acknowledge the funding received from the EPSRC Grant Ref EP/P03151X/1.


Header image: ‘Crown copyright 2019 courtesy of The National Archives’

Further details:

The project website is here:

For a more detailed paper about the project see:

The journey from a records management system to a digital preservation system

“People have had a lot of trouble getting stuff out of RecordPoint.”

This sentence was a little worrying to hear. It was 2015, and our archive was contemplating digital preservation for the first time. We didn’t really know what it was, or how it worked. Neither did anyone else: the idea of having a “digital preservation system” received blank stares around the office. “Is it like a database? Why not use one of our CMSs instead? Why do we need this?”

And so it was that I realised I was in over my head and needed outside help. I looked up state records offices to find out what they were doing, and realised there is such a thing as the job title “Digital Preservation Officer”. I contacted one of these “Digital Preservation Officers” to get on the right path.

The Digital Preservation Officer’s knowledge in that early conversation was invaluable, and helped us get over those early hurdles. She explained the basics: why digital preservation is important for an archive. How to get started. Breaking down jargon. Convincing non-archivists that yes, it is necessary. And – the importance of figuring out what you want to preserve.

“We will need to preserve digital donations,” I listed, “and digitizations of our physical inventory. Plus, I manage our digital records management system, RecordPoint – if we’re serious about our permanent records we will need to preserve those as well.” (The international digital records management system standard, ISO 16175 Part 2, says that “long-term preservation of digital records… should be addressed separately within a dedicated framework for digital preservation or ‘digital archiving’”.)

It was at this point that the Digital Preservation Officer replied with the quote that began this article.

I don’t think she was quite right – getting digital objects and metadata out of RecordPoint was quite easy. The challenge, it turned out, would be getting the exported digital objects into our digital preservation system, Archivematica.

In the image shown below, the folders on the left represent the top level of a RecordPoint export of two digital objects. The folders on the right are what Archivematica expects in a transfer package.

In the example above, there are three folders for ‘binaries’ (digital objects) and two folders for ‘records’ (metadata). Immediately something doesn’t make sense – why are there three binary folders for two objects?

The reason is that the export includes not only the final version of the digital object but also all previous drafts. In my example there is only a single draft, but if a digital object had 100 drafts, they would all be included here. This is great for compliance, but not so great for digital preservation where careful appraisal is necessary. The priority when doing an ‘extract, transform, load’ (ETL) from RecordPoint to Archivematica would be to ensure that the final version of each binary made it across to the ‘objects’ folder on the right.
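The appraisal step described above, keeping only the final version of each record, can be sketched in a few lines of Python. This is a minimal sketch, not RPEAT's code: it assumes, hypothetically, that each exported binary can be tied to a record identifier and a numeric version, with the highest number being the final version. The field names and paths are illustrative; the article does not describe RecordPoint's export schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """One binary file in an export (hypothetical fields, for illustration)."""
    record_id: str  # the record this binary belongs to
    version: int    # assumed: higher number = later draft
    path: str       # location of the file inside the export


def final_versions(binaries: list[Binary]) -> dict[str, Binary]:
    """Keep only the latest version of each record, dropping earlier drafts."""
    latest: dict[str, Binary] = {}
    for b in binaries:
        current = latest.get(b.record_id)
        if current is None or b.version > current.version:
            latest[b.record_id] = b
    return latest


# Three binaries for two records, mirroring the shape of the example export.
export = [
    Binary("R1", 1, "binaries/0001/draft.docx"),
    Binary("R1", 2, "binaries/0002/final.docx"),
    Binary("R2", 1, "binaries/0003/final.pdf"),
]

for record_id, b in sorted(final_versions(export).items()):
    print(record_id, b.path)
```

However many drafts a record has, only one binary per record survives, which is what needs to land in the ‘objects’ folder.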

An Archivematica transfer package should not only consist of digital objects themselves, of course – you are not truly preserving digital objects unless you also preserve their descriptive metadata. This is why the ‘metadata’ folder on the right exists: you can optionally create a single CSV file, ‘metadata.csv’, which contains the metadata for every digital object in the submission as a separate line. Archivematica uses this CSV file as part of its metadata preservation process.
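For reference, Archivematica's metadata.csv uses a first column named `filename`, holding paths relative to the transfer such as `objects/...`, followed by one column per metadata element (for example Dublin Core elements such as `dc.title`). The sketch below writes such a file with Python's standard `csv` module; the file names, titles and dates are made-up sample values.

```python
import csv
import io


def build_metadata_csv(rows: list[dict[str, str]], columns: list[str]) -> str:
    """Build metadata.csv text: a 'filename' column first, then one column per element."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["filename", *columns], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # one row per digital object
    return out.getvalue()


# Made-up sample values; filenames are relative to the transfer root.
rows = [
    {"filename": "objects/final.docx", "dc.title": "Board minutes", "dc.date": "2015-03-01"},
    {"filename": "objects/final.pdf", "dc.title": "Annual report", "dc.date": "2016-07-12"},
]
csv_text = build_metadata_csv(rows, ["dc.title", "dc.date"])
print(csv_text)
```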

In contrast, RecordPoint creates a separate metadata XML file for every digital object it exports. If you wanted to pull metadata across into the metadata CSV file for the Archivematica submission, you would need to go through every single metadata XML in the export and copy and paste each individual metadata element. Based on a test, sorting the final record from the drafts and preparing its metadata for Archivematica might take two to four minutes per record. Assuming we have 70,000 records requiring preservation, transforming these records manually would take roughly 2,300 to 4,700 hours. Although technically possible, this is too much work to be achievable, and the tedious, detail-oriented nature of the task would make errors highly likely.
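The manual estimate follows directly from the per-record times given above; a quick check of the arithmetic:

```python
RECORDS = 70_000
MINUTES_PER_RECORD = (2, 4)  # lower and upper bound from the timed test


def manual_hours(records: int, minutes_per_record: float) -> float:
    """Total hours if each record takes a fixed number of minutes by hand."""
    return records * minutes_per_record / 60


low, high = (manual_hours(RECORDS, m) for m in MINUTES_PER_RECORD)
print(f"{low:,.0f} to {high:,.0f} hours")  # 2,333 to 4,667 hours
```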

Fortunately, I knew the R programming language. R is used by statisticians to solve data transformation problems – and this was a data transformation problem! I created an application using a tool called R Shiny, providing a graphical interface that sits on the Archivematica server. I creatively called it RecordPoint Export to Archivematica Transfer (RPEAT). After running a RecordPoint export, you select the export to be transformed from a drop-down list in RPEAT and select the metadata to be included from a checklist. RPEAT then copies the final version of each digital object from the export into an ‘objects’ folder and trawls through each XML file to extract the required metadata. Finally, RPEAT creates a CSV file that contains all of the required metadata, and moves it into the ‘metadata’ folder. Everything is then ready for transfer into Archivematica.
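RPEAT itself is an R Shiny application; the following Python sketch only illustrates the general shape of its XML step: read one record's metadata XML, pull out the elements ticked in the checklist, and produce a row keyed by metadata.csv column names. The element names (`Title`, `DateCreated`, `Author`) and their mapping to Dublin Core columns are hypothetical, since the article does not describe RecordPoint's XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from export XML element names to metadata.csv columns.
FIELD_MAP = {"Title": "dc.title", "DateCreated": "dc.date", "Author": "dc.creator"}


def extract_row(xml_text: str, filename: str, selected: list[str]) -> dict[str, str]:
    """Pull the selected elements out of one record's XML as a metadata.csv row."""
    root = ET.fromstring(xml_text)
    row = {"filename": f"objects/{filename}"}
    for element in selected:
        node = root.find(f".//{element}")  # first matching element anywhere in the tree
        # Missing or empty elements become empty cells rather than errors.
        row[FIELD_MAP[element]] = node.text.strip() if node is not None and node.text else ""
    return row


sample = """<Record>
  <Title>Board minutes</Title>
  <DateCreated>2015-03-01</DateCreated>
  <Author>Secretariat</Author>
</Record>"""

print(extract_row(sample, "final.docx", ["Title", "DateCreated"]))
```

Running this once per record and collecting the rows gives exactly the input a CSV writer needs for the ‘metadata’ folder.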

Pushing 212 records exported from RecordPoint through RPEAT, selecting the correct metadata from the checklist, and doing some quick human quality assurance took 7 minutes. Scaled up, transforming all 70,000 records this way would take fewer than 39 hours. RPEAT reduces the time taken to prepare records for Archivematica by more than 98% compared to manual processes.
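Scaling the timed sample up is a simple proportion, assuming throughput stays linear at larger volumes; comparing it against the two-to-four-minute manual estimate gives the saving:

```python
SAMPLE_RECORDS = 212   # records pushed through RPEAT in the timed test
SAMPLE_MINUTES = 7     # including checklist selection and quick QA
TOTAL_RECORDS = 70_000

# Assumes time per record stays constant at scale (linear scaling).
rpeat_hours = TOTAL_RECORDS * SAMPLE_MINUTES / SAMPLE_RECORDS / 60
print(f"RPEAT: {rpeat_hours:.1f} hours")

savings = {}
for manual_minutes in (2, 4):  # manual estimate per record, from the earlier test
    manual_hours = TOTAL_RECORDS * manual_minutes / 60
    savings[manual_minutes] = 1 - rpeat_hours / manual_hours
    print(f"vs {manual_minutes} min/record by hand: {savings[manual_minutes]:.1%} less time")
```

With these inputs the script reports about 38.5 hours for RPEAT and a saving of roughly 98.3% (two minutes per record) to 99.2% (four minutes per record).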

The advice that the Digital Preservation Officer provided all those years ago was invaluable, and I think in particular the warning on “getting stuff out of RecordPoint” was pertinent – but I wish to expand on her point. The challenge is not unique to RecordPoint – the challenge is ETL in general. At a meeting of Australia and New Zealand’s digital preservation community of practice, Australasia Preserves, in early 2019, other archivists shared their struggle to do ETL from records management systems into their digital archive. This ability is an important addition to the growing suite of technical skills valuable to us digital preservation practitioners.


International Organisation for Standardisation. (2011). Information and documentation – Principles and functional requirements for records in electronic office environments – Part 2: Guidelines and functional requirements for digital records management systems (ISO 16175-2).

Header image: Artem Sapegin on Unsplash

Translating for a Digital Archive

The Qatar Digital Library

Since 2012, the British Library has been working with the Qatar Foundation and Qatar National Library to create and maintain the Qatar Digital Library. Launched in 2014, this free, bilingual portal hosts a growing archive of previously un-digitised material primarily from the BL’s collections. Focusing on content relevant to the history and culture of the Persian Gulf, items include India Office Records, maps, visual arts, sound and video, and personal papers. The portal also features selected Arabic scientific manuscripts. Alongside these items, the QDL also offers expert articles to help contextualise the collections.

As part of the BL’s translation team, I work to produce and edit the Arabic language content for the QDL. While the collection items themselves are displayed solely in their original language, all of the portal’s supporting and descriptive content is translated, as are the expert articles, meaning that the catalogue can be searched and used just as easily in Arabic as in English.

The bilinguality of the portal has been a key part of increasing the visibility and accessibility of the collections. Users of the QDL are just as likely to access the site in Arabic as they are in English, if not more so: the most frequently visited individual page on the site is the Arabic homepage, and users more often land on one of the Arabic pages than the English ones. Moreover, the terms users enter to search the collections are just as often written in Arabic as they are in English. Consequently, we have a responsibility to maintain the same high standards and make sure that all of the QDL’s features function equally well in both languages.

Our Toolbox: Translation management software

Like many large-scale translation projects, ours involves multiple translators and several rounds of proofing and quality checks to ensure accuracy and consistency. To manage this, we use a piece of software called memoQ that includes two essential tools: a translation memory (TM) and a term base (TB). The TM functions as a bilingual database of previously translated segments of text; it works by storing each segment of source-language content alongside its approved translation. When a new text is imported, memoQ breaks it into smaller segments on the basis of punctuation and line breaks, and automatically searches the TM for exact and partial matches. These are then presented to the translator for approval and/or review.
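The TM behaviour described above (store approved source/target pairs, then offer exact and partial matches for each new segment) can be illustrated with a toy lookup. memoQ's own match scoring is proprietary; this sketch swaps in `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in similarity measure, and the TM entries and the 75% threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Toy TM: source segment -> approved translation (illustrative entries only).
TM = {
    "Letter from the Political Resident.": "رسالة من المقيم السياسي.",
    "Report on the pearl fisheries.": "تقرير عن مصائد اللؤلؤ.",
}


def tm_lookup(segment: str, tm: dict[str, str], threshold: float = 0.75):
    """Return (score %, source, translation) for entries at or above the threshold, best first."""
    hits = []
    for source, target in tm.items():
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold:
            hits.append((round(score * 100), source, target))
    return sorted(hits, reverse=True)


print(tm_lookup("Letter from the Political Resident.", TM))  # exact match, scored 100
print(tm_lookup("Letter from the Political Agent.", TM))     # partial match, below 100
```

As in memoQ, the partial match is only a suggestion: the translator still decides whether the old translation can be adapted.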

Caption: A segment in memoQ with an exact match (100%) in the TM

Caption: A segment in memoQ with a partial match (85%) in the TM

While a human expert still has the final say on whether to accept any suggestion from the TM, frequently only a minor edit is needed to make the old translation suitable for the new context. This serves the double purpose of saving time and maintaining consistency across the catalogue as a whole. Translation memories tend to prove their worth the larger they are and the more repetition there is in the content. Having grown over the years since the start of the project, our TM now routinely recognises a third of content in a new file, and often much more.

While the TM grows organically over time by compiling and storing translation segments, the term base is maintained manually. It works as a glossary for key terms, allowing us to suggest preferred equivalents for individual words or phrases, and/or to blacklist translations that should be avoided. As the TB is visible to all parties at all stages of translation and proofing, it helps to ensure the consistency of these terms in Arabic.
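A term base check can be sketched as a lookup of each TB term found in the source segment: flag the target if the preferred rendering is missing or a forbidden one appears. The entry below is hypothetical (the forbidden variant is an invented example of an over-literal rendering), and this sketch only uses exact substring matching.

```python
# Hypothetical TB entry: one preferred Arabic rendering plus renderings to avoid.
TERM_BASE = {
    "Political Resident": {
        "preferred": "المقيم السياسي",
        "forbidden": ["الساكن السياسي"],  # invented example of an over-literal rendering
    },
}


def check_terms(source: str, target: str, tb: dict) -> list[str]:
    """Flag TB terms in the source whose preferred rendering is missing or forbidden one is used."""
    issues = []
    for term, entry in tb.items():
        if term not in source:
            continue  # term does not occur in this segment
        if entry["preferred"] not in target:
            issues.append(f"{term}: preferred rendering missing")
        for bad in entry["forbidden"]:
            if bad in target:
                issues.append(f"{term}: forbidden rendering used")
    return issues


print(check_terms("Letter from the Political Resident.", "رسالة من المقيم السياسي.", TERM_BASE))  # []
print(check_terms("Letter from the Political Resident.", "رسالة من الساكن السياسي.", TERM_BASE))
```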

Caption: A segment in memoQ with terms recognised by the TB highlighted in blue

Caption: Terms recognised by the TB, with approved translations in blue and forbidden ones in black

Authorities: making the most of memoQ

The TB has proved especially useful when it comes to translating authority files. An authority record serves to identify and describe a person, corporate body, family, place name, or subject term that is featured in a catalogue description. Each term is authorised and unique. As every record and every expert article on the QDL is linked to at least one authority file, the authorities form an index through which users can search for all the content related to a specific term.

Caption: Authorities displayed as filters on the QDL

Caption: Authorities displayed at the end of a record on the QDL

To be effective, authorities must be reproduced in exactly the same way for every record. For the English side of the portal, they are extracted from the same central database each time, with no opportunity for them to mutate or change before arriving on the portal – but not so with the Arabic!

For every record, the linked authorities are included as part of the English text to be translated, no matter how many times they may have been translated in the past. This repetition of the process creates an opportunity for discrepancies to creep in. If, for instance, several new records linked to the same new authority are sent to several different translators, it is not only possible but quite likely that each translator will produce a valid but slightly different version of the term in Arabic. If the same records then also go to different proofreaders, there is a good chance that the discrepancies will slip through unnoticed, rendering the Arabic authority much less useful than the English equivalent, as any one variant will not be linked to all the related content.
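Catching such discrepancies after the fact amounts to grouping every Arabic rendering by authority and flagging any authority with more than one distinct version. A minimal sketch, assuming (hypothetically) that translated authorities can be exported as (authority ID, record ID, Arabic text) triples; the IDs and data are invented:

```python
from collections import defaultdict


def find_discrepancies(translations):
    """Group Arabic renderings by authority ID; return authorities with more than one variant.

    `translations` is an iterable of (authority_id, record_id, arabic_text) triples.
    """
    variants = defaultdict(set)
    for authority_id, _record_id, arabic in translations:
        variants[authority_id].add(arabic.strip())
    return {a: v for a, v in variants.items() if len(v) > 1}


# Hypothetical data: one new authority translated by two translators in different records.
batch = [
    ("auth-001", "rec-17", "المقيم السياسي في الخليج"),
    ("auth-001", "rec-18", "المقيم السياسي بالخليج"),
    ("auth-002", "rec-18", "البحرين"),
]
print(find_discrepancies(batch))
```

A check like this can only report variants once they exist; it does not decide which one is correct.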

After spending much time and energy on trying (and sometimes failing) to catch these discrepancies at the end of the proofing process, we now make sure to pre-translate any new authority and add it to the TB, along with a unique identifying number (arkID), before sending the related files for translation. This means that when the term appears for translation, it is displayed in the TB along with its arkID, adding an extra means of checking whether this is the approved and appropriate translation for this specific context. Once confirmed and thereby added to the TM, it registers as a 101% match, meaning that there is an exact match not only in the text, but also in the metadata.
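The 101% match can be pictured as an exact text match plus a matching piece of metadata. This toy model, which is not memoQ's actual matching logic, stores an arkID alongside each TM entry and scores a lookup 101 when both the text and the arkID agree, 100 for text alone, and 0 otherwise; the identifiers are placeholders, not real arkIDs.

```python
from typing import NamedTuple, Optional


class TMEntry(NamedTuple):
    source: str
    target: str
    ark_id: Optional[str]  # metadata stored with the segment (placeholder IDs below)


def match_level(segment: str, ark_id: Optional[str], entries: list[TMEntry]) -> int:
    """101 = text and arkID both match; 100 = text only; 0 = no exact text match."""
    best = 0
    for entry in entries:
        if entry.source == segment:
            level = 101 if ark_id is not None and entry.ark_id == ark_id else 100
            best = max(best, level)
    return best


tm = [TMEntry("Political Resident", "المقيم السياسي", "ARK-PLACEHOLDER-1")]
print(match_level("Political Resident", "ARK-PLACEHOLDER-1", tm))  # 101
print(match_level("Political Resident", "ARK-PLACEHOLDER-2", tm))  # 100
```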

Caption: Authority term with arkID displayed in TB, registering as 101% match in memoQ

Cataloguing for Translation

Working in-house at the BL alongside the cataloguers allows the translation team to understand and appreciate their processes and standards, and has also allowed us to show them the impact of their decisions and choices on translation. Over time, we have developed guidelines to help them create the English records with translation in mind. For example, where possible, the cataloguers now use stock phrases for repeated content, leading to a much higher hit rate in the TM, and they understand that their use of punctuation can make a big difference to the likelihood of a match appearing.
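Why punctuation helps can be seen with a simplified segmenter that breaks after sentence-ending punctuation or semicolons and at line breaks. These rules and the sample list are assumptions for illustration; memoQ's segmentation rules are configurable and more detailed. A list of correspondents separated by semicolons yields short, reusable segments, whereas the same list joined with commas stays as one long segment that is less likely to match the TM.

```python
import re

# Simplified rule (an assumption): break after . ; ? ! followed by whitespace, and at line breaks.
SEGMENT_BREAK = re.compile(r"(?<=[.;?!])\s+|\n+")


def segment(text: str) -> list[str]:
    """Split text into translation segments, dropping empty pieces."""
    return [s.strip() for s in SEGMENT_BREAK.split(text) if s.strip()]


with_semicolons = "Correspondents include the Political Resident; the Political Agent, Bahrain; and the Government of India."
with_commas = "Correspondents include the Political Resident, the Political Agent, Bahrain, and the Government of India."

print(segment(with_semicolons))  # three short segments
print(segment(with_commas))      # one long segment
```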

Caption: Stock phrase with multiple TM hits in memoQ

Caption: List of correspondents written using punctuation marks to help break the text into smaller translation segments in memoQ

Small changes like this help to streamline the translation process, so we can focus on maintaining the QDL’s high standards across the Arabic side of the portal and make sure the content is just as accessible in either language.

Translation in Digitisation

In my work as a freelancer, I have found more often than not that clients arrive at translation as something of an afterthought. It is frustratingly common to find that they have budgeted neither the time nor the funds required for the work – the deadline tends to be yesterday, and the fee mere pennies. Pleasingly, this is not the case on this project, where translation has been built into the process from the beginning and is understood to take time, thought, research, and expertise. Moreover, the decision to have an on-site team, working in the same office as the cataloguers, affords a rare opportunity to consult the specialists about their writing when queries inevitably arise, and to reciprocate by sharing our linguistic, cultural, and technical knowledge. We could of course always do more in our efforts to create bi- and multilingual resources for ever wider audiences, and with more and more institutions planning and investing in digitisation, there are deeper and broader questions about how, for whom, and in which languages we do so. Bilinguality has been a vital part of the QDL’s success in opening up the collections to new users and ought to be part of the ongoing discussions in digitisation.


Banner: Brief Principles of the Arabic Language ‎[F-1-14] (14/184), Qatar National Library, 10680, in Qatar Digital Library. Author: Filippo Guadagnoli. ©Qatar National Library. Usage Terms: Creative Commons Attribution Licence

memoQ Images:  ©memoQ.

QDL Images: ©Qatar National Library. Terms: Creative Commons Attribution Licence



‘[Insta]Poetry is not a luxury’: On the Urgency of Archiving the Diverse Voices of Social Media

In today’s digital age, verse has gone viral. Since 2013, young poets have turned to photo-sharing platforms like Instagram to self-publish their texts and gain a wide readership.

Fitting their words into the square Instagram picture frames, paying attention to font and sizing, writing texts that are accessible both in style and content, instapoets are making poetry visually appealing, relatable, scrollable, accessible, and portable to followers.

The repercussions have been significant: recent UK and US surveys reveal that poetry is on the way to becoming a bestselling literary genre. In 2017, a report by the National Endowment for the Arts (the largest survey of American adults’ participation in the arts) and UK statistics from book sales monitor Nielsen BookScan showed that the number of poetry readers had nearly doubled since 2012. The biggest increase was among young adults, especially young women and people of colour – a significant shift from the traditional middle-aged male poetry buyer, who now represents only 18% of poetry readership in the UK. In 2017, the top-selling poetry collection was Milk and Honey by 26-year-old Punjabi-Sikh Canadian “instapoet” Rupi Kaur, which has sold 3.5 million copies and been translated into 40 languages. Eleven other instapoets made it into the top twenty bestselling poets list that year. While these figures indicate the financial success of instapoetry in book format, they can only hint at the size of its online readership. Therefore, to counter this focus on the physical object of the published book, I want to initiate a discussion on social media that is responsive to the ways that the poetry itself appears online in order to think about alternative ways of archiving “beyond the book”.

Despite the media attention instapoetry has received since its beginnings, the academic and literary world has remained suspicious, and continues to question its literary legitimacy. Very little academic material has been published on the topic, and apart from a very small number of papers on Rupi Kaur’s writing, instapoetry is treated as a footnote. Is this simply the result of the slow-paced process of conventional academic publishing, or might it also be because writing an academic article on instapoetry would give it some cultural and academic legitimacy? The most virulent criticism was voiced in a 2018 article by British poet Rebecca Watts, who condemned “social media’s dumbing effect,” compared instapoets to social media influencers, and denounced “the open denigration of intellectual engagement and rejection of craft” that she argued is characteristic of the genre. Her concerns undoubtedly reflect those of many poets and academics who worry that poetry is no longer being taken seriously. Yet there are historical parallels. In 1856, as more and more middle-class women started writing for a living, George Eliot wrote an essay entitled “Silly Novels by Lady Novelists,” lamenting the affectation and clichés of women writers, and the “absence of rigid requirements” of the novel form which, she contended, “constitutes the fatal seduction of novel-writing to incompetent women” (Eliot, 324). Being a new genre, the novel was accessible and socially acceptable for women writers, as it had relatively low status, was easy to read, bore “no long and intimidating tradition of ‘great masters’”, and “did not demand a knowledge of the classics, of rhetoric, or of poetic devices” (Eagleton, 60). The similarities between the novel then and instapoetry now are striking, as this new generation of “outsiders” is carving out a space to be heard outside mainstream publishing.

Most successful instapoets are young, feminist, from minority or immigrant backgrounds, and many of them are women, and/or queer, and/or working-class, and/or with disabilities. They often resort to Instagram after being turned down by publishers; indeed a report by the UK development agency Spread the Word in 2015 found that Western publishing tends to favour books by white male writers, resulting in the BAME writer community being underrepresented. Instagram thus offers a creative outlet for sections of the population who are usually barred from more traditional forms of poetry. In this sense, Watts’ piece voices a general sense of uncertainty that is created by instapoetry’s flouting of hidebound poetic traditions. Clearly this raises questions as to what exactly qualifies as “literature”, and more crucially, who has the authority to define a text as “literary”. In an essay entitled “Poetry Is Not A Luxury” written in 1977, African-American feminist poet Audre Lorde suggests that our definitions of literariness are a white Western male invention. Poetry, she writes, “is a revelatory distillation of experience, not the sterile word play that, too often, the white fathers distorted the word poetry to mean” (Lorde, 37). Challenging the emphasis on intellect over emotion in Western culture (reiterated by Watts, who contrasts the media-celebrated “honesty” of instapoets with actual “poetic craft”), Lorde convincingly argues that for minority women, “poetry is not a luxury”. Instead, it is vital for survival as it allows women of colour to access and communicate their emotions. It is a socially meaningful medium through which to think of new ways of being and to initiate “tangible action” towards social transformation. For instapoets such as Rupi Kaur, Lang Leav, Nayyirah Waheed, Yrsa Daley-Ward and Amanda Lovelace, “honesty” is the way they raise awareness of political matters like sexuality, abuse, gender, race and immigration.
The debate about the “literariness” of instapoetry has overshadowed a more important fact: through social media, minorities and women not only have a creative voice but are also being heard, and are experimenting with language, rhetoric and form in innovative ways.

I first became interested in instapoetry while assessing the British Library’s holdings of contemporary North American migrant narratives as part of a PhD placement. Taking a literary perspective initially focusing solely on books, I started investigating the Library’s practices regarding online writing. I was surprised to learn that the British Library cannot acquire and collect websites or Instagram pages as easily as it can a book. The UK Web Archive, hosted at the Library, is confined to UK-based or UK-associated websites (either hosted on a UK domain or authored by UK residents) which it can collect through non-print legal deposit. It can also pull information from web pages that have no UK-based domain if there is a clear link with the UK, but permission is needed from the creators of the content and rights need to be cleared. Social media content is even more difficult to archive, as many sites also block attempts to “harvest” content. The legal restrictions are a real barrier for the archive, leading me to wonder whether the UK Web Archive in its current form might be inadequate. As instapoets regularly delete older posts, their work is lost forever if it is not collected in time. A search on the UK Web Archive and the American Internet Archive’s Wayback Machine suggests little trace remains of the years 2014-2015, the period when instapoetry peaked. If we think of a book as a final product, then an Instagram page can be viewed as a digital manuscript, the followers’ comments as editor’s feedback, and each deleted post as a draft that ends up in the bin. I prefer to see the published book and the Instagram page as two separate entities, especially given that instapoets continue to use Instagram even after publication. Rupi Kaur’s bestseller Milk and Honey may find its place in an archive someday, but then only half her work will have been recorded, as her books and her Instagram page simply cannot be compared with each other.
Rather than appearing one per page, poems on Instagram are interspersed with pictures and positioned on a grid, rather like the page of a comic. The concept of time is also different: the reader starts with the most recent poem and scrolls down (or back) in time, and this creates a sense of ongoing progression that cannot be reproduced in a conventional book with a beginning and an end.

Instapoetry evidently poses challenges to current academic practices regarding interpretation and archiving. Unfortunately, the archiving system in its current form continues to prioritise literature published in books over self-published online writing, and although the reasons are primarily technical and legal, this contributes significantly to the marginalisation of minority voices.

Cover photo: Rupi Kaur’s Instagram page, picture taken by Matteo D’Ambrosio

Works Cited
  • Eagleton, M. (1989). Gender and Genre. In C. Hanson (Ed.), Re-reading the Short Story. London: Macmillan Press, pp. 55-68.
  • Eliot, G. (1967). Silly Novels by Lady Novelists. In T. Pinney (Ed.), Essays of George Eliot. London: Routledge, pp. 300-324.
  • Lorde, A. (1984). Poetry is Not a Luxury. In Sister Outsider. Freedom, CA: The Crossing Press, pp. 36-39.
  • Watts, R. (2018). The Cult of the Noble Amateur. PN Review 239, 44(3).