Cho Myung-dae, CEO of Linked Data Center, demonstrated how a digital archive works by taking the Korean War records as an example. By clicking on the keyword, a number of search results instantly pop up in a well-arranged list of related subcategories such as specific battles, locations and people. This way, users don’t have to dig through piles of data without having a clue what they will get in the end.
“Searching through string matches is essentially limited,” Cho said. “We have to put together as much data as possible and filter out unnecessary information that is not needed by users.”
Cho, a digital archive specialist who is also an adjunct professor of information and archival science at Hankuk University of Foreign Studies’ graduate school, outlined what should be done to build up an efficient and viable digital archive at an online forum titled “Digital Silk Road,” held on Sunday evening.
With digital technology specialists and professors in attendance via Zoom, Cho stressed that digital archive developers should bear in mind that the process of filtering, or what is known as “disambiguation,” is key to increasing the chance that users can discover more closely related pieces of data.
“Amazon is famous for doing such simplified search, reducing the number of multiple search results to a couple of final clicks,” Cho said. “We have to use semantic priming based on human psychology in offering a guided online navigation for digital archives.”
Both state agencies and private institutions have been establishing a wide range of digital archives, a treasure trove of social, cultural and historical assets in a digital format.
Provincial governments are spearheading various digital archive programs to preserve local cultural assets, artworks and documents so that they could be better shared and studied online.
Museums and universities are also actively setting up digital archives in hopes that they will not fall behind in the shift toward web-based information search and academic endeavors.
The push for such digital archives has gained a strong momentum after COVID-19 forced people to stay at home and browse the internet for entertainment and information. The question is what data should be stored and which technology is more efficient in organizing public and private databases.
Participants attend an online academic seminar on digital archives, titled “Digital Silk Road,” hosted by the information and archival science department of Hankuk University of Foreign Studies’ graduate school on Sunday. (Hankuk University of Foreign Studies)
Treasure trove of hidden discoveries
An archive refers to “a place where people can go to gather firsthand facts, data, and evidence from letters, reports, notes, memos, photographs, and other primary sources,” according to the US National Archives and Records Administration, which manages a vast amount of official government data with some 3,000 employees.
The US archive agency says all people also have personal archives, or a collection of material that records important events from their family’s history.
Thanks to digital technologies, the proliferation of personal and official data in cyberspace is accelerating at a breakneck pace. The use of digital archives is also on the rise. But what digital archives really are is often misunderstood.
Researchers in the field of digital humanities say that a simple digitalization of data -- a digital extension of the existing offline archives -- is far from enough. The primary task behind the build-up of digital archives is to provide a venue where basic data and resources are readily available online so that both the public and professional researchers can use them with ease and understand their social, cultural and historical contexts.
While traditional education takes place in separate offline spaces such as universities, museums and historical sites, today’s students and researchers are moving to digital platforms that offer a rapidly increasing storage of all sorts of archives, a mixture that appears to provide more chances to identify hidden relations among facts.
The Korean government, recognizing the implications of such digital archives, is helping establish a variety of online archives. For instance, the National Museum of Korean Contemporary History runs a digital archive that includes historically important photographs, documents and related information. It has 44 categories of data ranging from politics and culture to Japanese colonial rule and pro-democracy movements. The museum’s digital archive also provides what is called an “open application programming interface” that allows users and third-party companies to develop application programs and services based on its database.
The need for digitalization is not limited to tangible documents and assets. Intangible data such as traditional songs are likely to get lost over time without special care. The Intangible Heritage Digital Archive is an example of a digital archive specializing in the preservation of intangible cultural assets such as traditional performing arts, music albums and festivals. Semantic web and its challenges
In constructing digital archives, the term “semantic web,” coined by Tim Berners-Lee, who is also the inventor of the World Wide Web, is often mentioned. The term refers to an envisioned system that enables machines to readily “understand” data and respond to complex human requests based on their meanings. To achieve this, the relevant information resources should be semantically structured. An intricate and logical design of ontology -- a core structure to link information systematically for the sematic web -- is also needed to ensure database operability and better knowledge management.
At the online forum, Kim Hyeon, a professor of cultural informatics at the Academy of Korean Studies, said during the discussion session that there are a host of challenges in building open and flexible archives. “If we use semantic-based search, we can get meaningful information in various ways,” said Kim. “However, the user interface is not friendly enough.”
Another problem, Kim said, is that even though the same word is used in a digital archive, people have different ideas about the term, whose actual meaning can diverge and then create conflictions in the database.
Kim, who leads a project to help recreate the ancient capital of Korea in cyberspace, said that 32 researchers on the project team map out their own ontology schemes and databases.
“The aim is eventually to interconnect all of the different databases, but a problem can rise when a single concept refers to two different definitions,” Kim said. “This type of problem cannot be resolved by the use of artificial intelligence.”
By Yang Sung-jin (firstname.lastname@example.org