
A Platform for AI-Assisted Archival Metadata Generation

  • Conference paper
Culture and Computing (HCII 2025)

Abstract

This paper presents our latest work on Computational Linguistics Applications for Multimedia Services (CLAMS), an open-source artificial intelligence (AI) and machine learning (ML) platform for cultural institutions in the GLAM (galleries, libraries, archives, and museums) sector. CLAMS provides a framework for developing and deploying ML-based computational multimedia analysis tools, and streamlines the processing of audiovisual archival material by seamlessly integrating tools across media types, including text, audio, video, and images. CLAMS's primary function, automated content analysis and information extraction, provides archivists with an AI-assisted environment for metadata refinement. This enables the cataloging of extensive audiovisual collections, a task that would be impossible to complete manually, ultimately increasing the usability of audiovisual archives and allowing library patrons and media researchers to discover and search them more easily.

At the core of CLAMS's interoperability is the Multi-Media Interchange Format (MMIF), a structured, JSON-based data abstraction that provides a consistent data exchange layer between different computational analysis tools, including AI and ML applications. This allows annotations produced by one tool to be consumed directly by others, enabling complex automated content analysis workflows.
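To make the exchange layer concrete, the sketch below constructs a minimal MMIF-style document in Python. The top-level shape (a metadata block, source documents, and tool-produced views) follows the published MMIF specification, but the specific version string, vocabulary URIs, file paths, and property values here are illustrative assumptions, not output from an actual CLAMS app.

```python
import json

# A minimal MMIF-style document: one source video plus one "view" of
# annotations contributed by a hypothetical scene-detection tool.
# URIs and property values are illustrative, not real app output.
mmif_doc = {
    "metadata": {"mmif": "http://mmif.clams.ai/1.0.0"},
    "documents": [
        {
            "@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
            "properties": {
                "id": "d1",
                "mime": "video/mp4",
                "location": "file:///archive/aapb-sample.mp4",
            },
        }
    ],
    "views": [
        {
            "id": "v1",
            "metadata": {
                "app": "http://apps.clams.ai/example-scene-detector/v1",
                "contains": {"http://mmif.clams.ai/vocabulary/TimeFrame/v1": {}},
            },
            "annotations": [
                {
                    "@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v1",
                    "properties": {"id": "tf1", "start": 0, "end": 4500,
                                   "frameType": "slate"},
                }
            ],
        }
    ],
}

print(json.dumps(mmif_doc, indent=2))
```

Because each tool appends a new view rather than overwriting earlier ones, a downstream application can consume annotations from any mix of upstream tools, which is what makes multi-step CLAMS workflows composable.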

The paper describes the specifics of MMIF, the CLAMS platform and ecosystem, and case studies of CLAMS workflows and evaluation schemes using data from the American Archive of Public Broadcasting (AAPB). These use cases illustrate how CLAMS can generate enhanced metadata for the mass-digitized multimedia collections held in archives and libraries, metadata that is often only implicitly available within the digitized media and therefore largely unsearchable.


Notes

  1. Available at https://vocabulary.clams.ai/.

  2. Available at https://apps.clams.ai.

  3. Due to copyright, not all the images are included in the final dataset artifact.

  4. Specifically the model at https://huggingface.co/llava-hf/llava-v1.6-mistral-7b-hf, 4-bit quantized for processing speed; see the loading sketch after these notes.

  5. https://metaflow.org/.
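For readers who want to reproduce the setup in note 4, the sketch below shows one plausible way to load that checkpoint with 4-bit quantization using Hugging Face Transformers and bitsandbytes. The paper does not publish its loading code here, so the quantization settings, prompt, and input frame are assumptions, not the authors' pipeline.

```python
import torch
from PIL import Image
from transformers import (
    BitsAndBytesConfig,
    LlavaNextForConditionalGeneration,
    LlavaNextProcessor,
)

MODEL_ID = "llava-hf/llava-v1.6-mistral-7b-hf"

# 4-bit quantization via bitsandbytes; the quant-type and compute-dtype
# choices are common defaults, assumed rather than taken from the paper.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

processor = LlavaNextProcessor.from_pretrained(MODEL_ID)
model = LlavaNextForConditionalGeneration.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)

# Hypothetical input: a single frame extracted from an archival video.
frame = Image.open("frame_000123.png")
prompt = "[INST] <image>\nDescribe the content of this video frame. [/INST]"

inputs = processor(images=frame, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```

The 4-bit weights trade a small amount of output quality for a large reduction in GPU memory and latency, which matters when a vision-language model is run over thousands of frames in a batch archival workflow.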


Acknowledgments

The authors gratefully acknowledge the generous support of the Andrew W. Mellon Foundation, whose funding made this research possible. We appreciate their commitment to advancing scholarship in this area and enabling the development of the free and open-source resources and methodologies described herein.

Author information

Corresponding author

Correspondence to Kyeongmin Rim.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Rim, K., King, O.C., Lynch, K., Verhagen, M., Pustejovsky, J. (2025). A Platform for AI-Assisted Archival Metadata Generation. In: Rauterberg, M. (eds) Culture and Computing. HCII 2025. Lecture Notes in Computer Science, vol 15800. Springer, Cham. https://doi.org/10.1007/978-3-031-93160-4_12


  • DOI: https://doi.org/10.1007/978-3-031-93160-4_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-93159-8

  • Online ISBN: 978-3-031-93160-4

  • eBook Packages: Computer Science, Computer Science (R0)
