Digitizing Vietnam: New Directions in Digital Humanities and AI

Weatherhead East Asian Institute · April 28, 2026

On April 18, 2026, scholars, librarians, technologists, and students gathered at Butler Library for a full-day workshop titled “Digitizing Vietnam: Vietnamese Studies in the Age of Digital Humanities and Artificial Intelligence,” which explored how digital humanities (DH) and artificial intelligence (AI) are reshaping Vietnamese Studies. More than a presentation of tools, the workshop became a sustained inquiry into a deeper question: what happens to a field grounded in language, history, and interpretation when computational systems increasingly mediate its materials?

 

Digitizing Vietnam as an Indra’s Net

 

From the outset, the conversation was framed not around technology alone, but around relationships.

 


Prof. John Phan shares the conceptual framework of Digitizing Việt Nam, titled "Indra's Net".

 

Prof. John Phan introduced a conceptual lens that would echo throughout the day, describing the emerging ecosystem of digital collections as something like Indra’s Net: a structure in which each node reflects all others, and meaning emerges through connection. The issue, as he emphasized, is not simply access. It is that materials, though digitized, still do not fully “speak to each other.” What is needed is not just aggregation, but a relational infrastructure.

 

It is within this frame that Digitizing Vietnam was introduced, not only as a platform, but as a network built through collaboration.

 


Dr. Hoang Vu elaborates on the Columbia-Fulbright collaboration of the project.

 

In the opening remarks, Dr. Hoang Vu emphasized that the project is fundamentally sustained through partnerships across institutions, organizations, and individuals working at different scales of preservation and research. These include collaborations with Vietnamese institutions such as Viện Hán Nôm and Đại học Khoa học Tự nhiên (Đại học Quốc gia TP.HCM), as well as international partners including the Southeast Asian Digital Library (SEADL), the Vatican Digital Library, and the University of Washington. Alongside institutional partnerships, he underscored the importance of individual scholars, archivists, and contributors whose work enables collections to be digitized, interpreted, and shared. Taken together, these relationships form the underlying infrastructure of the platform, one that extends beyond any single institution.

 

The morning session then unfolded the platform across four intertwined dimensions: collections, research, pedagogy, and outreach. These were not presented as separate modules, but as a continuous system. Collections anchor the work in large-scale digitized corpora, including Hán-Nôm texts, periodicals, and multimedia archives. Pedagogy and outreach extend these materials into classrooms and public engagement. Research supplies the digital tools, OCR pipelines, search systems, and experimental AI interfaces that mediate between them, enabling new forms of interaction.

 

Yet a key point emerged early: digitization is not simply extraction.

 


Dr. Tram Nguyen highlights the central principle of the project: "to be human-centered".

 

As Dr. Tram Nguyen emphasized, the project aims not only to preserve cultural heritage, but to “bring the archive back to life.” This involves transforming static materials into active sites of engagement—spaces where historical texts enter into dialogue with contemporary users, classrooms, and creative practices.

 


Vân Lê expands on the evolution of extraction into regeneration and re:generation.

 

Vân Lê pushed this idea further by reframing outreach as a space of re:generation. If collections are where data is aggregated and validated, outreach is where that data is transformed—reworked into narrative, recombined into new forms, and used to generate meaning. Human experience, in this framing, is not structured by data alone, but by narrative and poetry. AI, therefore, is not the endpoint of interpretation, but a tool for opening new interpretive pathways.

 


Phúc Lê explains the AI Diplomat Vision and the distributed digital Indra's net.

 

It is precisely at this point that Phúc Lê’s presentation extends the argument outward.

 

If data can be regenerated within a system, what happens when systems themselves begin to communicate?

 

Phúc Lê proposed the idea of an “AI diplomat”: a network of specialized, collection-grounded AI systems, each rooted in its own corpus but capable of querying and collaborating with others. Rather than isolated tools, these systems would form an ecosystem, routing questions across collections, synthesizing responses, and enabling research that spans domains and geographies.

 

In many ways, this vision operationalizes Phan’s earlier metaphor. Indra’s Net becomes not just a way of thinking about collections, but a model for how computational systems themselves might interact.

 


Albert Errickson showcases several of his personal projects as a humanities researcher, including Nôm Flow, a learning tool for Nôm.

 

Albert Errickson demonstrated how AI is reshaping the playing field for humanities scholars. He presented several projects he has developed with the support of AI, including the Han Nôm Research Hubs, which integrate a wide range of Hán Nôm collections with tools for working directly on the texts, such as OCR and document comparison. He also engaged the audience with his NomFlow app, a learning tool that uses spaced repetition to guide users through canonical Nôm works line by line, with stroke order provided for each character.

 

From here, the workshop moved into broader discussions across Vietnamese Studies and beyond, with Tuan Hoang and Virginia Shih, in which the tension between scale and meaning became more concrete.

 

Constraints and Tensions: Data, Bias, and Interpretation

 

Speakers in the early afternoon highlighted both the promise and the unevenness of computational approaches. Large-scale corpus analysis enables new forms of pattern recognition and contextualization, yet these capabilities are constrained by incomplete datasets, linguistic complexity, and uneven digitization. The challenge is not only technical but also epistemological.

 


 

Prof. Cindy Nguyen captured this tension clearly. While emphasizing the power of turning words into vectors and uncovering semantic patterns across corpora, she also returned to the importance of the question space itself. Human inquiry, slow, dialogic, and situated, remains central. The proliferation of AI tools does not eliminate that space; it makes its role more visible.

 

The conversation then shifted toward the infrastructures that make these systems possible.


Emily Zinger speaks about SEADL and how she has been using and questioning AI in archival work.

 

In the session on digital collections and curatorship, speakers emphasized that AI is only as strong as the data it is built upon. Emily Zinger introduced the concept of ground-truth diversity, noting that many existing models are trained on datasets that marginalize Southeast Asian materials. What appears as technical success often conceals systemic gaps. Training data, in this sense, is not neutral; it is a form of curatorial practice that shapes what AI systems can recognize and reproduce.

 


Dr. Judith Henchy presents early efforts of preservation and digitization in Vietnam.

 

Dr. Judith Henchy extended this perspective historically, situating digitization within a longer continuum of preservation technologies. From microfilm to early digital archives, each stage has involved trade-offs between access, control, and sustainability. Her observation that “digitization is not the solution, it’s part of the problem” underscored the need for critical engagement rather than technological optimism.

 


Prof. Shimizu Masaaki recounts initiatives to digitize Hán Nôm collections at the University of Osaka and the University of Kyoto.

 

These questions became even more concrete in Prof. Shimizu Masaaki’s discussion of Chữ Nôm. OCR technologies remain limited in their ability to process complex historical scripts. Yet digitization still fundamentally reshapes pedagogy. High-resolution images enable students to engage texts analytically, comparing character structures, tracing phonetic components, and uncovering patterns across sources. In this context, AI functions not as a replacement for expertise, but as a potential collaborator within a human-led interpretive process.

 

By the late afternoon, the scope expanded to situate these discussions within broader East Asian digital humanities. Shared challenges, including multilingual corpora, classical texts, and uneven archival landscapes, suggested that Vietnamese Studies is part of a larger transformation across the humanities.

 

The closing roundtable returned to the central questions that had surfaced throughout the day. What constitutes meaningful knowledge in an AI-mediated environment? How do we balance scale with nuance? And what kinds of collaborations, between scholars, institutions, and systems, will define the next phase of the field?

 


Prof. Peter Bol discusses how AI has brought the humanities back to itself by moving beyond quantitative methods and relying instead on language as the primary medium of inquiry.

 

As Prof. Peter Bol observed, the significance of AI may lie in a subtle but important shift. Where earlier digital humanities emphasized quantification, AI brings the field back toward language, toward interpretation, ambiguity, and dialogue.

 

The workshop closed with a dinner reception, but its core insight remained open-ended. The future of Vietnamese Studies will not be determined by technology alone, but by how it is shaped, through curation, critical engagement, and creative use.

 

In this sense, Digitizing Vietnam is both a platform and a proposition: that archives can become dynamic spaces of interaction, where data is not the endpoint of knowledge, but the beginning of new forms of understanding.