Presenters
Language Big Data Approaches to COVID-19
Jinghang GU
Title: Extreme Multi-label Classification on COVID-19 Literature.
Abstract
Extreme multi-label classification (XMC) problems arise in many biomedical
applications, such as document indexing and disease categorization. Recently, with
the rapid development of deep neural networks, deep learning methods have
achieved outstanding performance in XMC tasks. This paper describes our work on
the XMC task for COVID-19 articles, where the objective is to tag each document
with the most relevant semantic labels from an extremely large label set. We first
constructed the COVID-19 Semantic Indexing Corpus (CSIC), which consists of more
than 80,000 COVID-19 articles annotated with MeSH terms. We then proposed to
leverage a correlation neural network to represent latent label correlations and
thereby enhance the model predictions. Experimental results show that the correlation
neural network can significantly improve prediction performance and can be easily
extended to other existing deep XMC models.
Short Biography
Dr. Jinghang Gu is a postdoctoral researcher in the Department of Chinese and
Bilingual Studies (CBS), The Hong Kong Polytechnic University (PolyU), under the
supervision of Prof. Chu-Ren Huang. He received his M.S. and Ph.D. degrees from
the Department of Computer Science and Technology, Soochow University, China,
in 2014 and 2017, respectively. Before he joined PolyU CBS, Dr. Gu worked as a
senior natural language processing engineer in the Big Data Group at Baidu. His
research focuses on Natural Language Understanding and Big Data Mining. He is
currently working on biomedical information extraction, knowledge discovery and
extreme multi-label classification.
Christine M. JI
Title: Bayesian Network analysis of the Australian Bureau of Statistics COVID-19 Household Survey.
Short Biography
Christine Meng Ji specialises in empirical translation studies, especially data-driven
multilingual corpus analyses. She has published on environmental translation,
healthcare translation, statistical translation stylistics/authorship attribution, and
international multilingual education (statistical translation quality evaluation). She is
the author/editor of more than two dozen research books (with Cambridge
University Press, Oxford University Press, Routledge, Palgrave, Springer, John
Benjamins, Waseda University Press in Tokyo, University of Montréal Press), the
editor of two special journal issues published by the MIT Press (Leonardo), USA,
and the University of Montréal Press in Canada (Meta: Journal des traducteurs), and
the author of more than 50 journal papers and book chapters on empirical translation studies.
She is the editor of The Oxford Handbook of Translation and Social Practices, New
York: Oxford University Press (with Professor Sara Laviosa) (2020); editor of
Advances in Empirical Translation Studies, Cambridge University Press (with
Professor Michael Oakes) (2019); guest special section editor of Leonardo:
TransCreation: Creativity and Innovation in Translation, Cambridge: The MIT Press
(2020); founding series editor of the Cambridge Studies in Language Practices and
Social Development, Cambridge University Press; founding editorial board member
of the series of Cambridge Elements of Translation and Interpreting, Cambridge
University Press; and the founding editor of Routledge Studies of Empirical
Translation and Multilingual Communication, New York/Oxon: Routledge. Her
research has been supported by the British Academy, the Japan Society for the
Promotion of Science, the Australian Research Council, the Economic and Social
Research Council of the UK, the Toshiba International Foundation, the Worldwide
Universities Network Research Development Fund, and a number of leading
universities in Europe, North America, Japan, South Korea and Brazil. She is also a
qualified professional translator between English, Spanish and Chinese and
previously worked for major international organisations before teaching at
universities.
Menghan JIANG
Title: Epidemic or Memetic: Modelling Chinese Neologisms with internet usage data.
Abstract
This paper adopts models from epidemiology and memetics to account for the
development and decline of neologisms based on internet usage. The model fitting
research design focuses on the important issue of whether a meme-driven memetic
model or a host-driven epidemic model is better suited to explain human behavior
regarding neologisms. We extract search frequency data from Google Trends
covering the ninety most influential Chinese neologisms from 2008 to 2016 and find
that most of them (62 out of 90) share a similar pattern of rapid rise and decay.
Memetic and epidemic models are utilized to fit the evolution of these internet-based
neologisms. Although both models fit the rapid growth well, the epidemic model is
able to predict the peak point in the neologism’s life
cycle. This result underlines the role of human agents in the life cycle of neologisms
and supports the macro-theory that the evolution of human languages mirrors the
biological evolution of human beings.
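To make the rise-peak-decay contrast concrete, the sketch below uses a standard SIR (susceptible-infected-recovered) compartmental model as a stand-in for a host-driven epidemic model, reading I, as an assumption, as the share of speakers currently using a neologism. The abstract does not give the authors' equations, so this is an illustration only; the parameters (beta, gamma, i0) and helper names (simulate_sir, peak_day) are hypothetical. The point it shows is that such a model produces an explicit peak in the life cycle rather than only growth.

```python
def simulate_sir(beta, gamma, i0, days, dt=0.1):
    """Forward-Euler integration of the SIR equations
    dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I
    (population fractions). Returns the infected fraction once per day."""
    s, i, r = 1.0 - i0, i0, 0.0
    steps_per_day = round(1 / dt)
    daily_infected = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i * dt  # new adopters in this step
            new_recoveries = gamma * i * dt     # users who stop using the word
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        daily_infected.append(i)
    return daily_infected


def peak_day(series):
    """Index of the first maximum, i.e. the predicted peak of the life cycle."""
    return max(range(len(series)), key=series.__getitem__)


# Hypothetical parameters; in a fitting study they would be estimated from data.
curve = simulate_sir(beta=0.5, gamma=0.1, i0=0.001, days=120)
day = peak_day(curve)
print(f"peak on day {day}, infected fraction {curve[day]:.3f}")
```

In a fitting study one would typically search over beta and gamma to minimise the squared error between the simulated curve and normalised search-frequency data; that step is not shown here.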
Short Biography
Dr. Menghan Jiang is currently a postdoctoral researcher under a joint programme
between the Department of Chinese Language and Literature, Peking University, and the
Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic
University. Her research interests are corpus linguistics, language variation and
language change, language modeling (of human collective behaviours), Chinese
syntax, and conceptual metaphor.
Siyu LEI
Title: Contagious words and epidemic behaviors: A usage-based exploration of internet neologisms related to COVID-19.
Abstract
Internet neologisms are contagious and reflect collective human behavior. Can this
observation be leveraged to reflect an epidemic situation by tracking the
development of internet neologisms over time? This paper proposes an innovative
approach to correlate the competition of neologisms with the development of an
epidemic as collective human behavior changes, enhancing our understanding of
these two phenomena. Specifically, this study tracks the use of COVID-19
neologisms from late December 2019 to the end of June 2020 based on the Baidu
Index. The neologisms are grouped into five categories: under-specified
references, pre-official names, pejorative names, official names, and English
abbreviations. Qualitative analyses based on these categories show the impact of
language-internal factors (i.e., frequency) and the changes of social psychological
situation (i.e., policy and emotion) on lexical competition and evolution. Quantitative
analyses show a strong correlation between neologisms and pandemic development
that can be expressed by a binomial formulation. These results are summarized in a
flowchart showing the different developmental stages of COVID-19, neologism uses and
the changes in collective behaviors, especially in terms of emotion. In sum, this
innovative approach of leveraging internet usage data to study emergent events is
shown to be effective when observational data are inadequate or inaccessible.
Keywords: neologisms; internet usage data; COVID-19 pandemic; frequency; social psychological factors
Short Biography
Ms. Siyu Lei is a student in a dual PhD award programme. She is currently a third-year PhD
student at Xi’an Jiaotong University under the supervision of Prof. Ruiying Yang
and a first-year PhD student at The Hong Kong Polytechnic University under
the supervision of Prof. Chu-Ren Huang. Her research interests are corpus
linguistics, English for academic purposes, second language acquisition and genre
analysis. To date, she has published one paper in the Journal of English for Academic
Purposes.
Jing LI
Title: Social Media Keyphrase Prediction for COVID-19 Context Modelling.
Abstract
As social media continues its worldwide expansion, the way we communicate with
each other has been profoundly revolutionized. The exposure to new information
and the exchange of personal opinions have been mediated through online
platforms. Especially during the COVID-19 crisis, social media has become the only
channel for individuals to stay in touch with the outside world. Facing the explosive
growth of online messages, which far outpace human beings’ reading and
understanding capacity, how shall we help individuals quickly access the key
information they need? In this talk, I will present our recent work to automatically
predict keyphrases concerning the salient content on social media. Most existing
keyphrase prediction methods, though they work well on formally written and well-edited
texts, suffer from the data sparseness widely exhibited in short and
informal social media texts. This talk presents two possible ways to enrich contexts for
short messages: one is to exploit explicit contexts formed with user responses; the
other is to explore implicit contexts via discovering latent topic clusters underlying
the corpus. Beyond the text signals, we also strive to understand cross-modality
contexts from texts and images and their joint effects to indicate keyphrases. Finally,
I will discuss the big picture of how social media keyphrase prediction methods will
help us fight against the COVID-19 crisis.
Short Biography
Dr. Jing Li has been an Assistant Professor in the Department of Computing, The Hong
Kong Polytechnic University (PolyU), since 2019. She is a member of the Data Science
and AI Lab (DaSAIL) of the Department of Computing and the COMP representative for
Doctor of Applied Language Sciences Programme (DALS). Before joining PolyU,
she worked in the Natural Language Processing Center, Tencent AI Lab as a senior
researcher from 2017 to 2019. Jing obtained her PhD degree from the Department
of Systems Engineering and Engineering Management, The Chinese University of
Hong Kong in 2017. Before that, she received her B.S. degree from the Department of
Machine Intelligence, Peking University, in 2013. Jing has broad research interests
in natural language processing, computational social science, and machine
learning. Particularly, she works on novel algorithms for topic modeling, information
extraction, discourse analysis, and their applications on social interaction
understanding. Jing regularly publishes in top-tier NLP conferences and journals,
such as ACL, EMNLP, NAACL, TACL, and CL. For academic service, she served
as a publication co-chair for EMNLP 2020, a tutorial co-chair for ICONIP 2020, and a
program committee member for many premier conferences (e.g., ACL, EMNLP,
NAACL, AAAI, IJCAI), and she received a best reviewer award at EMNLP 2018.
Cindy SB NGAI
Title: Grappling with the COVID-19 health crisis: Analysis of communication strategies and their effects on public engagement on social media.
Abstract
Background: COVID-19 has posed an unprecedented challenge to governments
worldwide. Effective government communication of COVID-19 information with the
public is of crucial importance.
Objective: We investigated how the most-read state-owned newspaper in China, People’s Daily, utilized an online social networking site, Sina Weibo, to communicate about COVID-19 and whether this could engage the public. The objective of this study was to develop an integrated framework to examine the content, message style, and interactive features of COVID-19-related posts, and determine their effects on public engagement in the largest social media network in China.
Methods: Content analysis was employed to scrutinize 608 COVID-19 posts, and coding was performed on three main dimensions: content, message style, and interactive features. The content dimension was coded into six sub-dimensions: (C1) Action, (C2) New evidence, (C3) Reassurance, (C4) Disease prevention, (C5) Healthcare services, and (C6) Uncertainty, while the style dimension was coded into the sub-dimensions of (S1) Narrative and (S2) Non-narrative. Interactive features were coded into (I1) Links to external sources, (I2) Use of hashtags, (I3) Use of questions to solicit feedback, and (I4) Use of multimedia. Public engagement was measured as the number of shares, comments, and likes on People’s Daily’s Sina Weibo account from 20 January 2020 to 11 March 2020, to reveal the association between different levels of public engagement and communication strategies. One-way ANOVA followed by post hoc Tukey tests, and negative binomial regression analysis, were employed to generate the results.
Results: We found that although the content frames of (C1) Action, (C2) New evidence, and (C3) Reassurance delivered in a (S2) Non-narrative style were predominant in COVID-19 communication by the government, posts related to (C2) New evidence and a (S2) Non-narrative style were strong negative predictors of the number of shares. In terms of generating a high number of shares, it was found that (C4) Disease prevention posts delivered in a (S1) Narrative style were able to achieve this purpose. Additionally, an interaction effect was found between content and style. The use of a (S1) Narrative style in (C4) Disease prevention posts had a significant positive effect on generating comments and likes by the Chinese public while links to external sources fostered sharing.
Conclusions: These results have implications for governments, health organizations, medical professionals, the media, and researchers in their epidemic communication to engage the public. Selecting suitable communication strategies may foster active liking and sharing of posts on social media, which, in turn, might raise the public’s awareness of COVID-19 and motivate them to take preventive measures. The sharing of COVID-19 posts is particularly important because this action can reach a large audience, potentially helping to contain the spread of the virus.
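The one-way ANOVA step named in the Methods can be illustrated with a minimal standard-library sketch of the F statistic, for example comparing share counts across content frames. This is an illustration under stated assumptions, not the study's analysis: the function name and the share counts below are hypothetical, and the post hoc Tukey test and the negative binomial regression are not shown. In practice, scipy.stats.f_oneway returns the same F statistic together with its p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical share counts for posts in three content frames (not the study's data).
shares = {
    "C1 Action": [120, 95, 140, 110],
    "C2 New evidence": [60, 75, 50, 70],
    "C4 Disease prevention": [210, 180, 260, 230],
}
f_stat, df_b, df_w = one_way_anova_f(list(shares.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F indicates that mean engagement differs across frames more than chance variation within frames would suggest; a post hoc test is then needed to say which frames differ.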
Short Biography
Cindy SB Ngai (PhD) is an Assistant Professor cum Programme Leader of MA in
Bilingual Corporate Communication in The Hong Kong Polytechnic University
(PolyU). By adopting an interdisciplinary research approach, she integrates her
knowledge of language, media and communication into the business, health and
science disciplines. Her work has appeared in SCI and SSCI journals
including Discourse and Communication, English for Specific Purposes, International
Journal of Business Communication, Journal of Business and Technical
Communication, Journal of Medical Internet Research, PLOS One, Public Relations
Review and Studies in Higher Education.
Mingyu WAN
Title: Understanding and Combating ‘Infodemic’: A Corpus Linguistic Approach to Analyzing COVID-19 Misinformation.
Abstract
COVID-19 misinformation, also known as the ‘infodemic’, presents a serious risk to
public health and public policies. The urgency of identifying COVID-19 misinformation
is attested by scores of already published papers (e.g. Brennen et al. 2020,
Pennycook et al. 2020) and constant discussion in the press and on social media.
Misinformation refers to fabricated, deceptive or distorted information at various
degrees, which can mislead people’s decision-making, harm the public trust, and
even lead to global tragedies (e.g. Grinberg et al. 2019, Su et al. 2020). To mitigate
its risks to society, it is of vital importance for us to understand its key properties
(linguistic generalization patterns in particular) before taking the right actions. There
has not yet been any major corpus linguistic work on the analysis of COVID-19
misinformation. A few published papers in corpus linguistics, such as Wolfer et al.
(2020), focus on applying corpus linguistics tools to analyzing COVID-19 texts
without dealing with information quality issues. Past work on misinformation relied
mostly on computational ways of misinformation detection without in-depth analysis
of how misinformation is constructed (e.g. Guacho et al. 2018, Torabi & Taboada
2019). As such, these studies do not contribute to our understanding of language and
cannot be viewed as corpus linguistic research. In addition, automatic textual
classification studies by themselves do not help to pinpoint the fake part of the news
or to explain how fake news misinforms. These studies therefore contribute little to
ameliorating the negative effect of misinformation. To combat the negative
impact of the COVID-19 infodemic without delay, we propose a corpus-driven analysis of misinformation aiming
to analyze the intrinsic (linguistic) properties of texts containing COVID-19
misinformation. Preliminary analysis found that, in addition to the frequently
mentioned keywords “virus, coronavirus, China, Chinese, spread, death, kill”,
misinformation tends to employ more negative emotion words (e.g. fear, worthless,
deadliest), expressions of exclusion (e.g. cannot, without, except), references to vulnerable
population groups (e.g. children, young people, older adults), and verbs of elimination
(e.g. reduce, die, kill). Besides, false information is less formal,
linguistically simpler, and less specific than factual information. Our
provisional theory is that misinformation typically focuses on 1) people’s inherent
fear for particular people and/or especially vulnerable groups; or 2) both fear and
anger against specific groups of the society. These groups could be elite or could
be socially marginalized, but it is crucial that they are easily separated from the
identities of the speakers. Based on our hypothesis and our data, we will adopt the
theory of Bronstein et al. (2019) to identify constructions, logical incongruities,
metaphorical expressions, etc., in order to examine the salient cognitive mechanisms
of information generators.
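Keyword comparisons of this kind are commonly computed with the log-likelihood (G2) keyness statistic, which compares a word's frequency in a target corpus (here, misinformation texts) with its frequency in a reference corpus (factual texts). The abstract does not state which statistic the authors used, so the sketch below is an assumption; the function names and the toy counts are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 keyness for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # a zero count contributes nothing (limit of x*log x as x -> 0)
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target, reference, top_n=10):
    """Rank words over-represented in `target` relative to `reference` by G2.
    Both arguments are Counters of token frequencies."""
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word in set(target) | set(reference):
        a, b = target[word], reference[word]  # Counter returns 0 for missing words
        if a / c > b / d:  # keep only positive keywords
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical toy token counts, not data from the study.
misinfo = Counter({"fear": 20, "virus": 10, "the": 70})
factual = Counter({"fear": 2, "virus": 10, "the": 88})
print(keywords(misinfo, factual))
```

Here "virus" occurs at the same rate in both toy corpora and is therefore not a keyword, whereas "fear" is strongly over-represented in the misinformation counts.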
Short Biography
Dr. Mingyu Wan currently works with Prof. Chu-Ren Huang and Dr. Qi Su under the
Boya Joint Postdoctoral Project. Her recent research focuses on misinformation
detection, metaphor detection, complexity analysis and Mandarin Alphabetical
Words (code-mixing words) with linguistics-motivated, corpus-based and NLP-
oriented methodologies.
Vincent Xian WANG
Title: Two Tales of One City: Unveiling the sentiments and conceptual metaphors for a pandemic in Macao.
Abstract
This study seeks to understand Macao residents’ lives during the COVID-19
pandemic. We gathered data from two main sources: (a) articles published in
Macao Daily (MD) and (b) postings on Chuchu Channel (CCC) on YouTube. The
results showed that Macao Daily systematically laid out the measures initiated by
the local government to combat the disease and to ease the financial distress of
both the residents and local businesses, while the netizens who posted on Chuchu
Channel discussed casinos, the local economy and politics, making
critical remarks on the Hong Kong and Macao governments. MD and CCC
used conceptual metaphors in markedly different ways. MD predominantly
employed WAR, TIDING and JOURNEY metaphors, which entailed a collective and lasting
battle against the pandemic, and compared the virus to a PERSON or a MONSTER.
By contrast, CCC favoured only some subtypes of WAR metaphors – e.g., 撐 chēng
‘endure, hold on’ – and 執(笠) zhí (lì) ‘close down, shut down (business)’. MD
tended to convey positive sentiments consistently, whereas the viewers of Chuchu
Channel exhibited sharply divided views on political matters and on the shutdown of
casinos, provoking bitter arguments between the posters. We interpret our findings
in relation to the two-dimensional model proposed by Bentley, O’Brien and Brock
(2014) for mapping collective behaviour. From the tales told by the two groups, we
lend support to the central argument advanced in Chater’s (2020) editorial that,
amid unprecedented uncertainty, policy makers need to consider multiple
interpretations of the pandemic rather than a single one, and that distinct
conceptual metaphors can effectively capture these interpretations.
Short Biography
Vincent X. Wang, associate professor of the University of Macau and a NAATI-
certified translator, received his PhD in Applied Linguistics from the University of
Queensland (2006). He has published journal articles in Sage Open, Target, Journal of
Language, Literature and Culture and TESOL-related periodicals, book chapters
with Springer, Routledge and Brill, among others, and a monograph Making
Requests by Chinese EFL Learners (John Benjamins).
Xiaowen WANG
Title: From Contact Prevention to Social Distancing: The Co-evolution of Bilingual Neologisms and Public Health Campaigns in Two Cities in the Time of COVID-19.
Abstract
This paper investigates the evolution of social distancing expressions in Chinese
and English in two geographically close yet culturally distinct metropolitan cities:
Hong Kong and Guangzhou. Our study of bilingual public health campaign posters
during the COVID-19 pandemic focuses on how neologisms and linguistic
strategies in public health campaigns evolve and adapt to different societal contexts.
Baseline meanings of the re-purposed linguistic expressions were established
based on the British National Corpus (BNC) for English and the Chinese Gigaword Corpus for
Chinese. To establish the links between linguistic expressions and public health
events, we converted them to eventive structures using the Module-Attribute
Representation of Verbs and added interpersonal meaning interpretations based on
Systemic Functional Linguistics. The two cities are found to take divergent
approaches. Guangzhou prefers “contact prevention” with behavior-inhibiting
imperatives and high value modality. By contrast, the original use of “contact
prevention” in Hong Kong was gradually replaced by the neologism “social
distancing” in English, triggering competing loan translations in Chinese.
Hong Kong predominantly uses behavior-encouraging expressions with positive polarity
and with modality and mood devices that vary to map the epidemic curve of
COVID-19. We conclude that lexical evolution interacts with social realities.
Different speech acts, prohibition in Guangzhou but advice and warning in Hong
Kong, are constructed with careful bilingual reconfiguration of eventive information,
mood, modality and polarity to tactfully cope with the social dynamics in the two
cities.
Key words: COVID-19, social distancing, event representation, health communication, bilingual communication
Short Biography
Xiaowen (Annie) Wang is an Associate Professor of Applied Linguistics and director
of the Research Center for English Education and Linguistic Studies at the School
of English Education, Guangdong University of Foreign Studies. She also serves as
the associate editor for the Asian English for Specific Purposes Journal. Currently,
she is pursuing her doctoral study under the supervision of Professor Chu-Ren Huang
at The Hong Kong Polytechnic University. Her research interests cover English for
Medical Purposes, corpus linguistics, computational linguistics, discourse
analysis, lexical semantics, translation and English education. She has been the
principal investigator of six provincial-level or university-level projects in China.