Application of corpus in the study of the Yellow River in the Qing Dynasty from the perspective of digital humanities

Journal of Chang'an University (Social Science Edition) [ISSN:1671-6248/CN:61-1391/C]

Issue:
2024, Issue 03
Page:
125-138
Research Field:
Digital Humanities
Publishing date:
2024-06-10

Info

Title:
Application of corpus in the study of the Yellow River in the Qing Dynasty from the perspective of digital humanities
Author(s):
PAN Wei, XU Juan
(School of History and Archives, Yunnan University, Kunming 650091, Yunnan, China)
Keywords:
digital humanities; corpus application; analysis and part-of-speech tagging; Qing Dynasty; Yellow River; “Financial Records of River Canals”
CLC number:
K061
Abstract:
The study of the Yellow River during the Qing Dynasty is a significant area of research in historical geography. The growing digitization of historical geography has made a wealth of digital source materials available for this period, while also raising new challenges in organizing and analyzing these materials and extracting knowledge from them. Introducing corpus analysis into the study of the Yellow River in the Qing Dynasty can support deeper research by enabling multilingual, diachronic, and large-scale data processing and analysis. Starting from word segmentation and part-of-speech tagging, this study explains why a corpus is necessary for research on the Yellow River in the Qing Dynasty and proposes a set of feasible technical specifications for constructing an annotated corpus. An empirical study is then conducted on part of the Qing Dynasty “Financial Records of River Canals” corpus. Using annotated parts of speech such as dates, locations, organizations, numerals, and quantifiers, the study extracts from the annotated corpus the source, destination, quantity, and timing of the subsidies that the government allocated to the Yellow River over the years; the sources are recorded as locations and silver items. These indicator data support rapid knowledge organization and in-depth knowledge discovery, and they reflect changes in the river construction subsidy system across different periods of the Qing Dynasty and the impact of those changes on Yellow River construction works.
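
The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tag names (`t` for dates, `ns` for locations, `nt` for organizations, `m` for numerals, `q` for quantifiers), the `word/tag` token format, the `SubsidyRecord` structure, and the sample sentence are all assumptions made up for illustration and are not taken from the article or from the “Financial Records of River Canals”.

```python
from dataclasses import dataclass, field

# Assumed part-of-speech tags (hypothetical; the article's tag set may differ).
TIME, PLACE, ORG, NUM, QUANT = "t", "ns", "nt", "m", "q"


@dataclass
class SubsidyRecord:
    """Fields pulled from one annotated sentence (illustrative structure)."""
    dates: list = field(default_factory=list)
    places: list = field(default_factory=list)
    orgs: list = field(default_factory=list)
    amounts: list = field(default_factory=list)  # (numeral, quantifier) pairs


def parse_tagged(line):
    """Split a 'word/tag word/tag ...' string into (word, tag) tuples."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if not sep:
            continue  # skip tokens that carry no tag
        pairs.append((word, tag))
    return pairs


def extract_record(tokens):
    """Collect dates, places, organizations, and numeral+quantifier amounts."""
    rec = SubsidyRecord()
    i = 0
    while i < len(tokens):
        word, tag = tokens[i]
        if tag == TIME:
            rec.dates.append(word)
        elif tag == PLACE:
            rec.places.append(word)
        elif tag == ORG:
            rec.orgs.append(word)
        elif tag == NUM and i + 1 < len(tokens) and tokens[i + 1][1] == QUANT:
            # A numeral directly followed by a quantifier is read as an amount.
            rec.amounts.append((word, tokens[i + 1][0]))
            i += 1  # consume the quantifier as well
        i += 1
    return rec


# Invented, English-glossed sample sentence; not a quotation from any source.
SAMPLE = "Qianlong-18/t Henan/ns Board-of-Revenue/nt allocate/v 300000/m taels/q silver/n"

if __name__ == "__main__":
    print(extract_record(parse_tagged(SAMPLE)))
```

Pairing each numeral with the quantifier that immediately follows it is a deliberately simple rule; real annotated text would likely need further handling, for example of multi-token dates or amounts split across several numerals.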

Last Update: 2024-06-10