Analisis Korpus Jaringan Leksikal Seksual Menggunakan Perisian Graphcoll Berdasarkan Laporan Jenayah BH Online (2004–2020)
Corpus-Based Analysis of a Sexual Lexical Network Using GraphColl, Based on BH Online Crime Reports (2004–2020)
DOI:
https://doi.org/10.15282/ijleal.v15i2.11411
Keywords:
Crime, Corpus linguistics, Sexual, Newspaper texts, Thematic
Abstract
In the context of crime, sexual acts are conduct that violates the law as provided under the Penal Code, Act 792, and Act 840, and they are often associated with serious harm to the mental and social well-being of society. This study analyses the collocational patterns of the word seksual ('sexual') in the BHonline Crime News Corpus (2004–2020) to understand how sexual crimes are represented in Malaysian media discourse. The corpus comprises 3,064,134 tokens, 48,351 word types, and 7,950 texts, and was analysed with a quantitative corpus linguistics approach using the #LancsBox X software. The findings show that seksual collocates significantly with words such as kanak-kanak ('children'), penderaan ('abuse'), and gangguan ('harassment'), reflecting a close link between sexual crime and victim vulnerability. The collocates are grouped into three main themes: conflict [gangguan 'harassment' (11.9), penderaan 'abuse' (11.7), pedofilia 'paedophilia' (8)], culture [gejala 'symptom' (8.1)], and characterisation [kanak-kanak 'children' (11.2), pemangsa 'predator' (8.4)]. Other collocates, such as amang ('molestation'), penganiayaan ('maltreatment'), bapa ('father'), anak ('child'), polis ('police'), and guru ('teacher'), also appear in the collocation network and point to the roles of perpetrators, victims, and the institutions involved. Overall, these patterns show how the media frames sexual crime through consistent lexical choices. The findings can help stakeholders and relevant agencies design more effective policies and prevention strategies for addressing sexual crime in Malaysia.
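As a rough illustration of what a collocation score measures, the sketch below counts words that fall within a fixed window around a node word and scores them with a simple mutual information (MI) formula. The abstract does not name the statistic behind the bracketed values, and this is not the #LancsBox X implementation; the function name `collocates`, the window size, and the toy token list are all assumptions made for the example.

```python
import math
from collections import Counter


def collocates(tokens, node, span=5):
    """Score words that co-occur with `node` within +/- `span` tokens.

    Uses a simple mutual information score, log2(observed / expected),
    where expected = f(node) * f(word) / N. This formula is an
    illustrative assumption, not the statistic reported in the abstract.
    """
    freq = Counter(tokens)
    n = len(tokens)
    observed = Counter()

    # Count every token inside the window around each node occurrence.
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(n, i + span + 1)
        for j in range(lo, hi):
            if j != i:
                observed[tokens[j]] += 1

    # Compare observed co-occurrence with what chance would predict.
    scores = {}
    for word, o in observed.items():
        expected = freq[node] * freq[word] / n
        scores[word] = math.log2(o / expected)
    return scores


if __name__ == "__main__":
    # Invented toy data, not drawn from the BHonline corpus.
    toy = ("polis siasat kes gangguan seksual terhadap kanak-kanak "
           "selepas laporan penderaan seksual diterima polis").split()
    ranked = sorted(collocates(toy, "seksual", span=2).items(),
                    key=lambda kv: kv[1], reverse=True)
    for word, score in ranked:
        print(f"{word}\t{score:.2f}")
```

Words that appear near the node more often than their overall frequency predicts receive higher scores. A collocation network of the kind the abstract describes repeats this step with each strong collocate as a new node.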
License
Copyright (c) 2025 The Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

