Google Cloud Vision

Sally Carter and Becky Brumbill, 2 September 2021

When the collection of questionnaires was first assessed it became clear that there were too many documents for volunteers to transcribe during the length of the project. We considered simply digitising the remaining documents and adding them to the Museum’s Collections Online website, but we really wanted the content of the questionnaires to be searchable. We therefore concluded that a blended solution where we supplemented the work of volunteer transcribers with an automated solution might be the best way forward.

After discussions with our software developers, we decided to look at the Google Cloud Vision API as a possible solution to automating the transcription of the documents. This is a powerful machine learning tool that, among other things, can read printed and hand-written text, and create a metadata set that is indexed and therefore searchable by members of the public. It also records the properties of the document and assesses the nature of the text transcribed.

Proposed Workflow

Currently our object records are stored in our Collections databases and our images are stored in our Digital Asset Management System (DAMs). Images and object records are automatically extracted into a middleware product called the CIIM, designed by Knowledge Integration. Within the CIIM the object records are linked to their images and pushed through to Collections Online.

We needed to create a workflow pipeline that allowed us to identify the required images in our DAMs (iBase), push them through to Google Cloud Vision in the correct order and then store the enhanced media records in the CIIM so that the marked-up document and the transcription metadata can be displayed alongside the image online.

Software developments

None of the developers we work with had any experience of using Google Vision in this way so there was a certain amount of experimentation required to refine the workflow.

We soon discovered that to access the required functionality we needed to upgrade to the latest version of the CIIM middleware and expand our metadata framework, with new corresponding lookup fields within iBase.

Currently, all images are stored within the DAMs on one level, with no hierarchical relationships that allow us to define relationships such as a book and its pages.  In addition to this we have only ever been able to identify the ‘main image’ for groups of images, with no functionality in place to sequence multiple pages. We therefore needed to develop a new framework dedicated to pages or ‘connected items’ as we have termed them. When the software developments are complete, we will be able to link pages via the accession number and store a dedicated set of metadata specifically for the connected items.
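As a rough illustration of the ‘connected items’ idea, the sketch below groups page images by accession number and orders them by a sequence tag. It is not the CIIM or iBase data model: the class, the field names and the example identifiers are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageImage:
    image_id: str          # hypothetical DAMS image identifier
    accession_number: str  # links the page to its parent object record
    sequence: int          # hypothetical page-order tag


def group_connected_items(images):
    """Group page images by accession number, each group sorted by sequence."""
    grouped = defaultdict(list)
    for image in images:
        grouped[image.accession_number].append(image)
    return {
        accession: sorted(pages, key=lambda page: page.sequence)
        for accession, pages in grouped.items()
    }


# Made-up example data: two pages of one answer book and one single-page item.
pages = [
    PageImage("img-003", "EXAMPLE.1/1", 2),
    PageImage("img-001", "EXAMPLE.1/1", 1),
    PageImage("img-002", "EXAMPLE.2/1", 1),
]
connected = group_connected_items(pages)
```

Keeping the sequence on each page record, rather than relying on file names, means pages can be re-ordered without renaming any images.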

Results

The workflow is now in place to automatically select images from the DAMs and push them through Google Cloud Vision and back into the middleware.

We have put tags in place to identify which items should be sent to Google Cloud Vision and in what order they should be sequenced, allowing us to test the pipeline and confirm ‘proof of concept’.

CIIM pushes these marked images through Google Vision in batches. We have clearly defined limits on amounts to push through (both in the CIIM and our Google Cloud Vision settings) that prevent us exceeding the free service offered by Google.
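The batching and limit logic can be sketched roughly as follows. This is not the CIIM's own code. The batch size of 16 and the monthly allowance of 1,000 units are assumed values for illustration and should be checked against Google's current documentation and pricing. The function name and the image identifiers are hypothetical.

```python
def plan_batches(image_ids, batch_size=16, monthly_limit=1000, used_this_month=0):
    """Split tagged images into batches without exceeding a monthly allowance.

    Returns (batches, deferred): batches can be sent now, deferred must
    wait until the allowance resets.
    """
    remaining = max(monthly_limit - used_this_month, 0)
    allowed = image_ids[:remaining]    # images that still fit in the allowance
    deferred = image_ids[remaining:]   # images held back for a later run
    batches = [
        allowed[start:start + batch_size]
        for start in range(0, len(allowed), batch_size)
    ]
    return batches, deferred


# Made-up example: 40 tagged images with 970 units already used this month.
ids = [f"img-{n:04d}" for n in range(40)]
batches, deferred = plan_batches(ids, used_this_month=970)
```

In this example only 30 images fit within the assumed allowance, so they are sent as one batch of 16 and one of 14, and the remaining 10 are deferred.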

This data is then modelled and published on the CIIM User Interface.

Initial results for the transcribed documents are mixed. The documents contain both typed and handwritten text in both Welsh and English, and some contain small sketches. Google Vision coped well with printed text in both languages, reasonably well with English handwritten but less well with Welsh handwritten.

Google Cloud Vision created metadata for both the text and the document itself, and this latter mark-up metadata was enormous, with thousands of lines of code. When displaying this within the CIIM middleware we need to decide how much of the metadata should be made available within the record itself, and where to store the rest.
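One possible way to handle this split is sketched below: a compact summary stays in the record while the bulky mark-up is stored separately. The field names (`fullTextAnnotation`, `text`, `pages`, `property.detectedLanguages`) follow the documented shape of a Vision text-detection response, but the sample response is hand-made and heavily shortened, and the function is hypothetical rather than part of the CIIM.

```python
import json


def split_annotation(response):
    """Separate a compact summary from the full page mark-up."""
    full = response.get("fullTextAnnotation", {})
    pages = full.get("pages", [])
    languages = sorted({
        lang["languageCode"]
        for page in pages
        for lang in page.get("property", {}).get("detectedLanguages", [])
    })
    summary = {
        "text": full.get("text", ""),  # the transcription, for search and display
        "languages": languages,        # languages detected on the pages
        "page_count": len(pages),
    }
    markup = json.dumps(pages)         # large structure, to be stored elsewhere
    return summary, markup


# Hand-made, shortened sample; a real response is far larger.
sample = {
    "fullTextAnnotation": {
        "text": "Holiadur\n",
        "pages": [{
            "property": {
                "detectedLanguages": [{"languageCode": "cy", "confidence": 0.9}]
            },
            "width": 100,
            "height": 200,
            "blocks": [],
        }],
    }
}
summary, markup = split_annotation(sample)
```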

Volunteer transcribers

Whilst all the software development work was going on, a group of volunteer transcribers started working through batches of documents and producing transcriptions in Word document format. It is not possible to edit the Google Cloud Vision documents, so the work the volunteers are doing cannot be integrated into the enhanced metadata.

Instead, we are pasting the transcriptions directly into our back-end CMS in the ‘Description’ field. This text is then pulled through directly into Collections Online where it can be displayed alongside the document images.

Next steps

The next phase of the project will be to display the Google Cloud Vision enhancements through Collections Online so that the text can be searched using our standard search interface. We also need to ensure that both the volunteer transcriptions and the automated transcriptions can be viewed and searched in the same way.

We will deliver zoom images using IIPImage, a server system for web-based streamed viewing and zooming of high-resolution images. These images will comply with IIIF (the International Image Interoperability Framework), a widely adopted image standard, allowing us to use one of the many available IIIF web viewers to display the Google Cloud Vision enhancements through Collections Online.

In addition, we are keen to see if Google Cloud Vision’s transcriptions improve over time. We are experimenting with specifying the language of the document by turning on ‘languageHints’ rather than just enabling auto-language detection. Will this improve the results for the Welsh handwritten material? We are also keen to see if pushing thousands of Welsh documents through the Machine Learning interface has any effect on the quality of the transcription. We plan to run the first 100 documents through Google Vision again at the end of the project to compare the quality of the transcription and identify any noticeable improvements.
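For readers curious about what turning on ‘languageHints’ looks like, here is a minimal sketch of a request body in the format used by the Vision REST API's `images:annotate` method. It only builds the JSON and does not call the API. The function name and placeholder image bytes are hypothetical, and whether the hint improves results for Welsh handwriting is exactly what remains to be tested.

```python
import base64


def build_annotate_request(image_bytes, language_hints=None):
    """Build an images:annotate request body, optionally with language hints."""
    request = {
        "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
        "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
    }
    if language_hints:
        # Without this block the API falls back to automatic language detection.
        request["imageContext"] = {"languageHints": list(language_hints)}
    return {"requests": [request]}


# "cy" and "en" are the ISO 639-1 codes for Welsh and English.
body = build_annotate_request(b"placeholder image bytes", ["cy", "en"])
auto_body = build_annotate_request(b"placeholder image bytes")
```

Sending the same page once with hints and once without would give a like-for-like comparison of the two settings.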

Long term this could be a hugely important development allowing us to digitise thousands of pages of documents held within our collections and automatically transcribe and make the text searchable in both Welsh and English.

Collecting Covid: revisiting collecting methods of the past

Sioned Williams, 1 September 2021

In March 2020, museums across the world started to capture stories and collect objects relating to the Covid-19 pandemic. At Amgueddfa Cymru-National Museum Wales, we launched a digital questionnaire in May 2020 as a first step towards creating a national Covid collection to be archived at St Fagans National Museum of History. This method of collecting information through questionnaires is rooted in the Museum’s history, dating back to 1937.

We soon realised that the continuity in collecting methods of the past also offered an opportunity to revisit some of the early questionnaire responses and to inspire new collecting initiatives in post-Covid Wales. To enable this work, we applied for funding to the Museums Association Esmée Fairbairn Collections Fund which allowed us to start on a 12-month project to digitise the historic questionnaires and experiment with new models of collecting through engaging with communities.

We began the project in September 2020, at a time when Covid restrictions had started to ease. The first task was to digitise the hundreds of pages of responses from the historic questionnaires and answer books.

The earliest questionnaire was published by the Museum in December 1937 and sent out to 493 respondents across Wales. This may have been inspired by the Mass Observation project of 1937 recording the everyday lives of people across Britain. Launched in a decade largely defined by economic hardship and unemployment, the ‘Questionnaire of Folk Culture’ asked participants to provide information about a variety of subjects such as the domestic, public and cultural life of their local area. It also encouraged people to send photographs and drawings and to become regular informants to assist with developing a collection which formed the basis for the Welsh Folk Museum at St Fagans, established in 1948. After that, questionnaires and ‘answer books’ were regularly used by the Museum as a way of capturing information on a range of subjects, up until the 1980s.

After the first batch of questionnaires had been digitised, we made a call-out to volunteer organisations in various parts of Wales, inviting people to transcribe the handwritten responses. By this time (December 2020), Wales was in yet another lockdown and it seemed as good a time as any for the Museum to offer e-volunteering roles as part of its programme for the first time. We recruited eleven volunteers, all Welsh speakers or learners, as the material was mostly in Welsh. The digitised images were sent directly to each volunteer to be transcribed and, where possible, the volunteers received material of interest to their locality. We met over Zoom and Teams to discuss the work and to share experiences. To date, the volunteers have contributed at least 180 hours of their time and we are extremely grateful to them for their input. The transcribed material will eventually be seen alongside the digitised questionnaire responses through the museum’s online collections database.

The next stage of the project involved creating a new Covid questionnaire for 2021. As Wales had been through a fire-break, lockdown and mass-vaccination programme, we felt that there was a lot of information we still needed to capture that wasn’t included in the 2020 questionnaire. This time, though, we wanted to consult with our community partners first to help us co-produce the questions so that we could be more inclusive. With the help of our partners, our aim is to reach those communities that may not have taken part in the 2020 questionnaire and who have been particularly affected by the pandemic.

With the launch of the 2021 Collecting Covid questionnaire in June, we are now trying to get as many people as possible to take part so that everyone’s voices are captured and recorded as part of a ‘national memory’ for Wales. We also hope to engage with local history societies from some of the areas that responded to the Museum’s original call to action, to invite them to research some of the local stories. What we hope to achieve through this project is a blueprint for future collecting and engagement: learning from the collecting methods of the past to enable agile, rapid and responsive collecting in the future.

Recording everyday life in 20th Century Wales

Lowri Jenkins, 1 September 2021

For the past eight months I have been gathering information about the questionnaires sent out by St Fagans National Museum of History in various decades during the twentieth century. The questionnaires focus on documenting everyday life in Wales and collecting information about what respondents knew about various subjects relating to the social and cultural life of the Welsh people. They open up a fascinating insight into how people viewed the world around them and what they did in their everyday lives, as well as describing items they used for work or had in their homes. The responses are often very detailed about local knowledge. The earliest questionnaires in the collection are from 1937. Many respondents included sketches of the items they were describing and took time to give explanations about how they were used.

One respondent in the 1937 collection was W. Beynon Davies who gave information on the area of Dyffryn Aeron and central Cardiganshire. In this sketch he describes two different types of Gambo cart used in Dyffryn Aeron, one with sides and the other with lower sides for different uses. G. Elfed Jones of Maesteg also gives many interesting facts about domestic items and interesting comments about his area, including a sketch of a box iron and rubbings of a Crown Copper Penny dated 1811. Islwyn Ffowc Elis, who later became a Welsh author and was a schoolchild at the time of the first questionnaires, includes pencil sketches of a goffering iron and other household items used in his home.

Other fascinating versions are questionnaires sent out in 1957 in collaboration with Anglesey Rural Community Council and the Welsh Folk Museum, as St Fagans was known at the time. They listed in detail a typical day of work and eating patterns of rural workers. Questions included what time did respondents get up in the morning? What time did they have breakfast? What times of day did they work? What type of work did they do? What type of meals did they eat and what were the foods or meals called? It gives a unique picture of the hard work and long days often worked by rural workers. Frances Grace Hughes of Llanfachreth records that her working day would start at 5am, and bedtime would be at 9pm. Servants would rise first and then the master and mistress of the house would wake. Before breakfast, the stables and cow sheds would be cleaned. Breakfast would be at 6am and would consist of bread and milk. Lunch would be at 12pm and consist of potatoes and bacon. Various tasks would be completed throughout the day, including milking at 7am and 5pm, and the evening meal would be at 8pm. After supper they would amuse themselves by singing and telling stories or preparing meals for the next day.

In 1958, Lewis Williams, who was living in Treharris, vividly remembers Christmas traditions he enjoyed as a child in the Corris area and gives a detailed account of all the activities undertaken by the community including Calennig, and also the tradition of ‘ciga’ during the hard winter of 1894 where neighbouring farms would give a donation of meat to struggling families.

During the 1970s and 1980s the questionnaires focused on dialect and language and housekeeping and food preparation. On the subjects of foods and cooking respondents were asked about the area they were giving information about e.g. on a farm, in a rural area, in a slate quarrying area or in a coal mining area. Respondents were asked to give information about foods eaten in different seasons, on different days and at every meal.

This is just a flavour of the information held in this collection and a window onto a period of Welsh life. The responses to the questionnaires and answer books can all be seen here.

E-volunteer guest blog

Rhodri Edwards, 1 September 2021

Hello, I’m Rhodri Edwards and I’ve been e-volunteering with the National Museum Wales since January 2021. I’m currently studying A Levels in History, English Literature and Geography at Aberaeron Comprehensive School. I’ve been helping the History and Archaeology Department at St Fagans National Museum of History to transcribe some questionnaires that detail life in Wales from the 1930s to the 1980s.

Everyone at the National Museum Wales has been extremely helpful and friendly, especially my supervisor Sioned Williams who always responds quickly to any queries I have and kindly offers lots of advice and support. Sioned gave a training session when I began which introduced us to the volunteering work, giving helpful tips such as using the word [sic] after any spelling, punctuation and grammar mistakes and putting question marks after any words that we are unsure of, which allows us to work effectively, independently and at our own pace.

Through transcribing some of the questionnaires, I’ve learned a lot about the social history and heritage of my local area such as the agricultural and farming way of life, farmhouse features, and what types of food farming families often produced and ate, for example dairy products like cream and cheese. Reading about people’s experiences has brought history to life for me and it is interesting to see how Welsh dialect has changed since the 1930s.

I enjoy volunteering and working with the National Museum Wales, attending Zoom and Microsoft Teams meetings where I listen to the experiences of other volunteers about what they have learned and discovered when they have been transcribing the historical questionnaires. Meeting other people working and volunteering for the National Museum Wales, joining events thoughtfully arranged by the National Museum Wales such as the virtual parties on Eventbrite and receiving a volunteering pack have all helped make me feel a valued part of the National Museum Wales.

I’ve developed my proofreading skills by transcribing questionnaires and I’ve gained a greater insight into the subject of History which has provided me with valuable experience. Thank you very much to everyone for all their help and support. I appreciate the opportunity to be a volunteer at the National Museum Wales.

Introducing the Art Detectives: sitter of Augustus John painting identified by online network

Jennifer Dudley, 17 August 2021

Amgueddfa Cymru is home to almost 1,400 paintings and drawings by Augustus John (1878-1961). A prolific portraitist, John painted many notable figures such as the poet and writer Dylan Thomas and the musician Guilhermina Suggia. He also made frequent sketches – in both pencil and oil paint – of unnamed people he encountered in everyday life. One such work in our collection has recently had its sitter identified thanks to the crowd-sourced resource Art Detective, where art lovers and experts can discuss artworks in public UK collections.

The work in question depicts a distinctive-looking woman with cropped hair and a full fringe, sporting an inquisitive expression on her face. While the model’s dress and lower body are loosely sketched out, her face is richly detailed, suggesting that she was known to the artist.

A discussion about this painting was launched on Art Detective after Dr. Margot Schwass wrote in to share her research into Greville Texidor (1902-1964) and her belief that this is the “lost” Augustus John portrait of the author and world traveler. Schwass comments that: “When I chanced across an image of the portrait in the Amgueddfa Cymru collection, I knew straight away that it was Greville”. This prompted a lively and well-researched discussion among other Art Detective users, leading to our curatorial team being utterly convinced that this is in fact a portrait of Texidor, who, it was uncovered, worked as John’s secretary in the early 1920s.

We would like to thank Dr. Schwass for contributing her research and helping us learn more about this work in our collection. Her 2019 book All the Juicy Pastures is the first to tell the story of Texidor's extraordinary life.

You can read more about Art UK’s Art Detective Network here.