
A Platform for AI-Assisted Archival Metadata Generation

  • Conference paper
Culture and Computing (HCII 2025)

Abstract

This paper presents our latest work on Computational Linguistics Applications for Multimedia Services (CLAMS), an open-source artificial intelligence (AI) and machine learning (ML) platform for cultural institutions in the GLAM (galleries, libraries, archives, and museums) sector. CLAMS provides a framework for developing and deploying ML-based computational multimedia analysis tools, and streamlines the processing of audiovisual archival material by seamlessly integrating tools across media types, including text, audio, video, and images. CLAMS's primary function, automated content analysis and information extraction, provides archivists with an AI-assisted environment for metadata refinement. This enables the cataloging of extensive audiovisual collections, a task that would be impossible to complete manually, ultimately increasing the usability of audiovisual archives and allowing library patrons and media researchers to discover and search them more easily.

At the core of CLAMS's interoperability is the Multi-Media Interchange Format (MMIF), a structured, JSON-based data abstraction that provides a consistent data exchange layer between different computational analysis tools, including AI and ML applications. This allows annotations produced by one tool to be consumed directly by others, enabling complex automated content analysis workflows.
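To make the exchange layer concrete, the sketch below constructs a minimal MMIF-style document in Python. The top-level shape (a metadata block, source documents, and tool-produced views) follows the published MMIF specification, but the specific version string, vocabulary URIs, file paths, and property values here are illustrative assumptions, not output from an actual CLAMS app.

```python
import json

# A minimal MMIF-style document: one source video plus one "view" of
# annotations contributed by a hypothetical scene-detection tool.
# URIs and property values are illustrative, not real app output.
mmif_doc = {
    "metadata": {"mmif": "http://mmif.clams.ai/1.0.0"},
    "documents": [
        {
            "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
            "properties": {
                "id": "d1",
                "mime": "video/mp4",
                "location": "file:///archive/aapb-sample.mp4",
            },
        }
    ],
    "views": [
        {
            "id": "v1",
            "metadata": {
                "app": "http://apps.clams.ai/example-scene-detector/v1",
                "contains": {"http://mmif.clams.ai/vocabulary/TimeFrame/v1": {}},
            },
            "annotations": [
                {
                    "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
                    "properties": {"id": "tf1", "start": 0, "end": 4500,
                                   "frameType": "slate"},
                }
            ],
        }
    ],
}

print(json.dumps(mmif_doc, indent=2))
```

Because each tool appends a new view rather than overwriting earlier ones, a downstream application can consume annotations from any mix of upstream tools, which is what makes multi-step CLAMS workflows composable.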

The paper describes the specifics of MMIF, the CLAMS platform and ecosystem, and case studies of CLAMS workflows and evaluation schemes using data from the American Archive of Public Broadcasting (AAPB). These use cases illustrate how CLAMS can generate enhanced metadata for the mass-digitized multimedia collections held in archives and libraries, metadata that is often only implicitly available within the digitized media and therefore largely unsearchable.


Notes

  1. Available at https://vocabulary.clams.ai/.

  2. Available at https://apps.clams.ai.

  3. Due to copyright, not all the images are included in the final dataset artifact.

  4. Specifically the model at https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf, 4-bit quantized for processing speed; see the loading sketch after these notes.

  5. https://metaflow.org/.
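For readers who want to reproduce the setup in note 4, the sketch below shows one plausible way to load that checkpoint with 4-bit quantization using Hugging Face Transformers and bitsandbytes. The paper does not publish its loading code here, so the quantization settings, prompt, and input frame are assumptions, not the authors' pipeline.

```python
import torch
from PIL import Image
from transformers import (
    BitsAndBytesConfig,
    LlavaNextForConditionalGeneration,
    LlavaNextProcessor,
)

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"

# 4-bit quantization via bitsandbytes; the quant-type and compute-dtype
# choices are common defaults, assumed rather than taken from the paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# Hypothetical input: a single frame extracted from an archival video.
frame = Image.open("frame_000123.png")
prompt = "[INST] <image>\nDescribe the content of this video frame. [/INST]"

inputs = processor(images=frame, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The 4-bit weights trade a small amount of output quality for a large reduction in GPU memory and latency, which matters when a vision-language model is run over thousands of frames in a batch archival workflow.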


Acknowledgments

The authors gratefully acknowledge the generous support of the Andrew W. Mellon Foundation, whose funding made this research possible. We appreciate their commitment to advancing scholarship in this area and enabling the development of the free and open-source resources and methodologies described herein.

Author information

Corresponding author

Correspondence to Kyeongmin Rim.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rim, K., King, O.C., Lynch, K., Verhagen, M., Pustejovsky, J. (2025). A Platform for AI-Assisted Archival Metadata Generation. In: Rauterberg, M. (eds) Culture and Computing. HCII 2025. Lecture Notes in Computer Science, vol 15800. Springer, Cham. https://doi.org/10.1007/978-3-031-93160-4_12


  • DOI: https://doi.org/10.1007/978-3-031-93160-4_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-93159-8

  • Online ISBN: 978-3-031-93160-4

  • eBook Packages: Computer Science, Computer Science (R0)
