Modeling the impacts of climate change and air pollutants on the agricultural production yields in Malaysia using Random-Effects Error Components Regression model

ABSTRACT - Climate change is attributable to anthropogenic emissions of greenhouse gases (GHG), which have affected the agricultural production yields of C3 plants in past decades. This article therefore aims to model the linear association between the agricultural production yields of these C3 plants and several climatic and non-climatic explanatory variables using a one-way random-effects error components regression model. To this end, a balanced longitudinal dataset for the period 1980 to 2018, classified as big data, was acquired. The analysis revealed that only maximum temperature (Tmax) had a statistically significant detrimental effect on agricultural production yields, whereas the other explanatory variables were not statistically significant. From a scientific perspective, temperature extremes can severely harm the pollination and developmental stages of C3 plants; consequently, high temperatures decrease agricultural production yields. The proposed model can be beneficial to national policymakers and smallholder plantations.


INTRODUCTION
Malaysia is a developing country whose top three economic activities, namely services, manufacturing, and agriculture, are the main contributors to the national Gross Domestic Product (GDP). Agricultural commodities such as oil palm (Elaeis guineensis), natural rubber (Hevea brasiliensis), cocoa (Theobroma cacao), and rice (Oryza sativa) contributed RM48.31 billion, RM14.53 billion, RM1.64 billion, and RM2.44 billion, respectively, to the agricultural sector of the national GDP in 2020 [1][2]. Oil palm and natural rubber made Malaysia the second- and fourth-largest exporter worldwide, respectively, in 2020. According to the United Nations COMTRADE International Trade Statistics Database, Malaysia's exports of cocoa and cocoa preparations were worth US$1.48 billion in 2020 [3], which indicates that Malaysia plays a substantial role as a global cocoa exporter. Rice, on the other hand, is the staple food for Malaysians. However, the self-sufficiency rate (SSR) dropped from 69% in 2019 to 63% in 2020 owing to population growth [4][5], which poses a threat to Malaysian food security.
For Malaysia to remain prosperous in the agricultural sector and to achieve food security, extensive research on climate change and air pollution is urgently needed. The yields of agricultural commodities can be severely affected by climatic factors and air pollution. Specifically, climate change is mostly attributable to anthropogenic emissions of greenhouse gases (GHG), which have affected society and the economy through food insecurity, harm to human health, and damage to infrastructure [6]. Temperature rises, changes in rainfall patterns, high carbon dioxide (CO2) emissions, and extreme climatic events, including droughts, floods, heatwaves, and landslides, have affected agricultural commodities both favorably and detrimentally, creating vulnerability in the food supply. Unfortunately, only a limited number of previous studies involving Malaysia have modeled the impact of climatic variables such as temperature [7][8], rainfall amount [7][8], and the occurrence of extreme climatic events such as drought, flood, and sea-level rise [9] on agricultural production yields.
For instance, Murad et al. [10] analyzed a time series dataset to investigate the linear association between climate change, CO2 emissions, and agricultural production yields using the ordinary least squares (OLS) regression model. Their empirical results revealed that climate change detrimentally affected the agricultural growth rate, whereas CO2 emissions had a beneficial effect on agricultural production yields. However, their publication does not state which agricultural production yields were considered. Moreover, Chizari et al. [7] also analyzed a time series dataset to model the linear association between climatic and non-climatic variables and cocoa production yields using the Auto-regressive Distributed Lag (ARDL) model. The climatic variables considered in their study were temperature and average annual rainfall, while the non-climatic variables were the cocoa farm price, the fertilizer price, and the technology trend. Their empirical results showed that the rainfall amount had a statistically significant beneficial effect on cocoa production yields, while the fertilizer price had a statistically significant detrimental effect. In contrast, temperature, cocoa farm price, and technology trend were statistically insignificant.
Hazir et al. [9] carried out a preliminary study projecting the effect of drought, flood, and sea-level rise on rubber production yields in Peninsular Malaysia. Their study indicated that these hydrometeorological events may not severely affect rubber production yields in Peninsular Malaysia. On the other hand, Tan et al. [8] carried out an empirical study of the linear association between rice production yields and three climatic variables, namely minimum temperature, maximum temperature, and precipitation. Their study found that precipitation is not statistically significantly associated with rice production yields in the wet and dry seasons, whereas both minimum and maximum temperatures are. In particular, minimum and maximum temperatures have, respectively, a favorable and a detrimental impact on rice production yields during the wet and dry seasons. Houma et al. [11] also conducted an empirical study of the association between climatic variables, farming practices, and rice production yields for the lowland on the coastal plain of Kuala Selangor district. Their results, obtained using the AquaCrop model, revealed that rising temperature, water scarcity, and poor weed control detrimentally affected rice production yields. Their findings also indicated that rice production yields may increase under rising temperature if appropriate weed control and water management are adopted.
In the same year, Roslan et al. [12] proposed predictive models for forecasting paddy production yields using elliptical (normal and t) and Archimedean (Joe, Clayton, and Gumbel) copula families for five selected Association of Southeast Asian Nations (ASEAN) countries, namely Indonesia, Malaysia, Myanmar, Thailand, and Vietnam. The climatic and non-climatic variables considered in their models were the average annual maximum temperature, annual rainfall amount, planted areas, and fertilizer usage. Their findings indicated that the best-fitted copula models for modeling and forecasting rice production yields in Indonesia, Malaysia, Myanmar, Thailand, and Vietnam are the normal, Gumbel, Clayton, Gumbel, and Gumbel copulas, respectively. Moreover, their study indicated that these best-fitted predictive models need to include all the aforementioned climatic and non-climatic variables. Meanwhile, Abubakar et al. [13] carried out a systematic review of studies on the effect of climate change on oil palm production yields. Their results showed that climatic and non-climatic variables, including temperature, rainfall, extreme climatic events such as El Niño, La Niña, drought, and flooding, soil fertility, water management, fertilizer use, and the emergence of disease, can detrimentally affect oil palm production yields.
In brief, previous Malaysian studies each focused on a single type of agricultural production yield, such as oil palm, natural rubber, cocoa, or rice. Moreover, most of the Malaysian studies found in reputable databases focus on oil palm and rice yields [13] rather than on natural rubber and cocoa. Therefore, the main purpose of this article is to model the linear association between several annual agricultural production yields (tonnes per hectare) and several climatic, air pollutant, and planted area (PA) (in thousand hectares) variables using an error components regression model, where the dataset employed is classified as big data [14][15]. Specifically, the agricultural production yields considered in this study are those of oil palm, natural rubber, cocoa, and rice, four agricultural commodities that are categorised as C3 plants and are the main contributors to the agricultural sector of Malaysia's national GDP. Moreover, with the exception of rice, Malaysia plays a substantial role as a global exporter of these commodities. Meanwhile, the climatic and air pollutant variables considered in this study are the average annual minimum (Tmin) and maximum (Tmax) temperatures (in degrees Celsius), the annual rainfall amount (RA) (in millimetres), and annual CO2 emissions (in kilotonnes, kt). The regression model obtained in this study can provide insight into the national impacts of recent climate trends on agricultural production yields and would aid in anticipating the impacts of climate change on food security in the future. Furthermore, the proposed model can also help smallholder agricultural plantations adopt future plantation strategies based on the predicted effects of climate change on the C3 plants' agricultural production yields.

RESEARCH METHODOLOGY AND THEORETICAL BACKGROUND
To achieve the main purpose of this study, this section provides an overview of the research methodology and the theoretical background of the predictive model employed in this article. In particular, Figure 1 depicts the schematic of the research methodology.

DATA SOURCES AND STUDY AREAS DESCRIPTION
As the preliminary step of this study, a balanced longitudinal dataset classified as big data, comprising climatic and non-climatic explanatory variables for the period 1980 to 2018, was acquired from the Department of Statistics Malaysia (DOSM) and the World Bank Open Data website. In particular, all the C3 plants' agricultural production yields and the corresponding PA variables were acquired from the DOSM website, while Tmin, Tmax, RA, and CO2 were acquired from the World Bank Open Data website. The acquired longitudinal dataset comprises all four C3 plants, and the corresponding agricultural production yields are the response variable for the error components regression model.
Globally, Malaysia is the 66th largest nation, with total land and water areas of approximately 328,657 and 1,190 square kilometres, respectively, and it is located close to the equator. Malaysia therefore experiences an equatorial climate, which is hot and humid with a uniform annual ground temperature [8], [16].
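The structure of a balanced longitudinal dataset can be illustrated with a short sketch. It assumes a long-format layout with one row per (crop, year) pair; the identifiers `CROPS`, `YEARS`, and `is_balanced` are hypothetical and are not taken from the study. With four C3 plants observed annually from 1980 to 2018, a balanced panel has N = 4 cross-sectional units, T = 39 time periods, and N × T = 156 observations.

```python
from collections import Counter

# Hypothetical identifiers; the study does not name its columns or units.
CROPS = ["oil_palm", "natural_rubber", "cocoa", "rice"]
YEARS = range(1980, 2019)  # 1980 to 2018 inclusive


def is_balanced(observations):
    """Return True if every crop is observed exactly once in every year."""
    counts = Counter(observations)
    expected = {(crop, year) for crop in CROPS for year in YEARS}
    return set(counts) == expected and all(n == 1 for n in counts.values())


# Index of the panel only: one (crop, year) pair per row of the dataset.
panel_index = [(crop, year) for crop in CROPS for year in YEARS]

print(len(CROPS), len(YEARS), len(panel_index))
print(is_balanced(panel_index))
```

A panel with a missing or duplicated (crop, year) pair would fail this check and would be unbalanced.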

Figure 1. Schematic of the research methodology: (1) acquire annual agricultural production yields, planted areas, air pollutant, and climatic datasets from the Department of Statistics Malaysia and the World Bank Open Data website; (2) conduct a statistical test to evaluate the stationarity of the dataset, and identify the most appropriate approach for the error components regression model; (3) fit the error components regression model by identifying the statistically significant variables that affect the agricultural production yields.

Malaysia also experiences dry and wet seasons during the Southwest (May-August) and Northeast (November-March) monsoons, respectively. In particular, Malaysia receives a relatively small rainfall amount during the Southwest monsoon. In contrast, Malaysia faces the risk of extreme rainfall events, which are frequently associated with short-duration thunderstorms, especially in the East Coast region [6], [16].
Geographically, Malaysia is composed of two main non-contiguous regions, Peninsular Malaysia and Malaysian Borneo, which are separated by the South China Sea, as depicted in Figure 2. Peninsular Malaysia can be sub-divided into four main regions: Central (Kuala Lumpur, Putrajaya, Selangor), East Coast (Kelantan, Pahang, Terengganu), Northern (Kedah, Perak, Perlis, Pulau Pinang), and Southern (Johor, Melaka, Negeri Sembilan). Meanwhile, Malaysian Borneo can be sub-divided into two main regions: Sabah (Labuan, Sabah) and Sarawak. The agricultural production considered in this article comes mainly from selected states of Malaysia, owing to their biophysical suitability for plantations [17][18]. For instance, the main plantation areas for oil palm and cocoa are in Malaysian Borneo, as summarised in Figure 3 [1]. In contrast, the main plantation areas for natural rubber and paddy are in Peninsular Malaysia, which accounts for 85.5% of the paddy [18][19].

ONE-WAY RANDOM-EFFECTS ERROR COMPONENTS REGRESSION MODEL
In previous studies worldwide [8], [20][21][22][23], the error components regression model has been widely employed to analyse longitudinal datasets, with the effect of climate change on agricultural yields as one of the main research areas. This is because the model can take both cross-sectional and temporal dimensions into account simultaneously. In principle, there are three main approaches to error components regression: the pooled least squares, fixed-effects, and random-effects models. To identify the appropriate error components regression model, this study employed the Hausman [24] and Lagrange Multiplier [25] statistical tests. The results of these two tests (Hausman test: p-value = 0.9956; Lagrange Multiplier test: p-value = 0.0372) consistently led to the random-effects model being fitted to the longitudinal dataset in this study. In addition, this study employed the Fisher Augmented Dickey-Fuller (Fisher-ADF) statistical test [26] to investigate the panel unit root, with and without trend, for each individual series in the acquired dataset. From a statistical perspective, failure to reject the null hypothesis of the Fisher-ADF test implies that the individual series requires a differencing transformation to become stationary.
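The way the two test results point to the random-effects model can be sketched as a simple decision rule. This is an illustrative simplification under a common convention, not the authors' procedure: the Lagrange Multiplier test is read as testing the null hypothesis of no unit-specific variance component (pooled least squares adequate), and the Hausman test as testing the null hypothesis that the random-effects estimator is consistent. The function name `select_panel_model` and the 0.05 significance level used as default are assumptions.

```python
def select_panel_model(hausman_p, lm_p, alpha=0.05):
    """Choose among pooled, fixed-effects and random-effects models.

    Simplified convention (an assumption, not the study's exact procedure):
    - LM test H0: no unit-specific variance component, so pooling suffices.
    - Hausman test H0: the random-effects estimator is consistent.
    """
    if lm_p >= alpha:
        # No evidence of unit-specific effects.
        return "pooled least squares"
    if hausman_p < alpha:
        # Unit effects appear correlated with the explanatory variables.
        return "fixed effects"
    return "random effects"


# p-values reported in the text above.
print(select_panel_model(hausman_p=0.9956, lm_p=0.0372))
```

With the reported p-values, the Lagrange Multiplier test rejects pooling and the Hausman test does not reject the random-effects specification, which agrees with the model choice stated in the text.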
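The one-way random-effects error components regression model can be written in matrix form as

$$y = X\beta + u, \qquad (1)$$

where $y$ is the $NT \times 1$ vector of the response variable, $X$ is the $NT \times K$ matrix of explanatory variables, $\beta$ represents the corresponding regression coefficients of $X$, and $\beta = [\beta_k]_{K \times 1}$; $u$, with elements $u_{it} = \mu_i + \nu_{it}$, represents the vector of length $NT \times 1$ for the disturbance. In particular, $\mu_i$ is the unobservable specific effect of the C3 plants, which is constant through time, and $\nu_{it}$ is the remainder disturbance, which varies across the C3 plants and through time.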
In this study, $\beta$ in equation (1) is estimated using the best linear unbiased estimator, namely the generalised least squares (GLS) estimator, in conjunction with the Swamy-Arora estimator of the error components [27][28]. The Swamy-Arora estimator is employed because it has been widely used in previous socio-economic studies [29][30][31] involving the random-effects error components regression model. Furthermore, the error components regression model is applied in this study rather than a time series model because of its several advantages: the capability to control for cross-sectional and time-invariant variables, more efficient and reliable parameter estimation, and the capability to detect unobserved heterogeneity when the heterogeneity is constant over time and uncorrelated with the explanatory variables [29], [31][32].
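For a balanced panel, the random-effects GLS estimator is equivalent to ordinary least squares applied to quasi-demeaned data, in which a fraction θ = 1 − sqrt(σ²_ν / (T σ²_μ + σ²_ν)) of each unit's time mean is subtracted from its observations. The sketch below illustrates only this transformation; the variance components are taken as given (estimating them, for example by the Swamy-Arora method, is not shown), and the function names and toy values are hypothetical.

```python
import math


def gls_theta(sigma2_mu, sigma2_nu, n_periods):
    """Quasi-demeaning weight for a balanced one-way random-effects model."""
    return 1.0 - math.sqrt(sigma2_nu / (n_periods * sigma2_mu + sigma2_nu))


def quasi_demean(series_by_unit, theta):
    """Subtract theta times the unit-specific time mean from each value."""
    transformed = {}
    for unit, values in series_by_unit.items():
        unit_mean = sum(values) / len(values)
        transformed[unit] = [v - theta * unit_mean for v in values]
    return transformed


# Hypothetical yields (tonnes per hectare) for two units over three years.
toy_yields = {"crop_a": [18.0, 19.0, 20.0], "crop_b": [1.0, 1.5, 2.0]}

theta = gls_theta(sigma2_mu=1.0, sigma2_nu=1.0, n_periods=3)
print(theta)
print(quasi_demean(toy_yields, theta))
```

When σ²_μ = 0 the weight is 0 and the estimator reduces to pooled least squares; as T σ²_μ grows relative to σ²_ν the weight approaches 1 and the transformation approaches the fixed-effects (within) transformation.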

DIAGNOSTICS CHECKING
In equation (1), the disturbance is assumed to be $u_{it} = \mu_i + \nu_{it}$, such that $E(\mu_i) = 0$, $E(\nu_{it}) = 0$,

$$E(\mu_i \mu_j) = \begin{cases} \sigma_\mu^2, & i = j \\ 0, & i \neq j \end{cases} \qquad \text{and} \qquad E(\nu_{it} \nu_{js}) = \begin{cases} \sigma_\nu^2, & i = j,\ t = s \\ 0, & \text{otherwise,} \end{cases}$$

where $i, j = 1, 2, \cdots, N$ and $t, s = 1, 2, \cdots, T$, and $\mu_i$ and $\nu_{it}$ are assumed to be unknown and independent of each other. In particular, $\mu_i \sim \mathrm{IID}(0, \sigma_\mu^2)$ and $\nu_{it} \sim \mathrm{IID}(0, \sigma_\nu^2)$, that is, each is independent and identically distributed (IID) with zero mean and constant variance. To validate the adequacy of the best-fitted random-effects error components regression model, this study employed the Breusch-Godfrey and Pesaran's CD statistical tests to diagnose serial correlation in the idiosyncratic component of the disturbance and cross-sectional dependence, respectively. Moreover, this study employed the Breusch-Pagan statistical test to verify that the assumption of homogeneity of variance is fulfilled. The main objective of this diagnostic checking is to ensure that all the aforementioned assumptions of the best-fitted one-way random-effects error components regression model are fulfilled, since failure of the fitted model to fulfil these statistical assumptions can be detrimental to its prediction performance. The results of this diagnostic checking are presented in the next section.
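These assumptions imply a specific covariance structure for the disturbances of a single C3 plant: the variance of each disturbance is σ²_μ + σ²_ν, and the covariance between two different years of the same plant is σ²_μ, so that the within-plant correlation is σ²_μ / (σ²_μ + σ²_ν). The following sketch builds this T × T block for illustration; the function names and the numerical variance components are hypothetical.

```python
def unit_covariance(sigma2_mu, sigma2_nu, n_periods):
    """T x T covariance block of u_it = mu_i + nu_it for a single unit."""
    return [
        [sigma2_mu + (sigma2_nu if t == s else 0.0) for s in range(n_periods)]
        for t in range(n_periods)
    ]


def intraclass_correlation(sigma2_mu, sigma2_nu):
    """Correlation between two disturbances of the same unit."""
    return sigma2_mu / (sigma2_mu + sigma2_nu)


# Hypothetical variance components, chosen only to make the pattern visible.
for row in unit_covariance(sigma2_mu=2.0, sigma2_nu=1.0, n_periods=3):
    print(row)
print(intraclass_correlation(sigma2_mu=2.0, sigma2_nu=1.0))
```

Disturbances belonging to different plants are uncorrelated under these assumptions, so the full NT × NT covariance matrix is block diagonal with N copies of this block.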

RESULTS AND DISCUSSION
In line with the main purpose and methodology of this article, this section presents the interpretation and discussion of the research findings. Specifically, this section is subdivided into two parts: exploratory data analysis, and statistical modelling and diagnostic checking. All the statistical analyses presented in this section were carried out using the R statistical software for computing and graphics. Figure 4 and Table 1 present, respectively, the time series graphs and the descriptive summaries for all individual series considered in this study for the period 1980 to 2018. These include the C3 plants' production yields (oil palm, natural rubber, cocoa, and rice), the climatic variables (Tmin, Tmax, RA, and CO2), and the non-climatic variables (PA corresponding to each C3 plant). A climatic variable such as solar radiation is excluded as an explanatory variable in the regression analysis, although it can affect agricultural production yields. This is because a previous study [33] reported that including this explanatory variable in the regression model introduces multicollinearity, as solar radiation is highly positively associated with temperature.

Exploratory data analysis
Figure 4 shows that the production yields of oil palm (Figure 4(a)) and rice (Figure 4(d)) have, on average, increased over the years, whereas those of natural rubber (Figure 4(b)) and cocoa (Figure 4(c)) have decreased. As a result of the collapse in natural rubber and cocoa prices, together with the growing global demand for oil palm over the same period, smallholder farmers have shifted their plantations primarily to oil palm [17], [34]. In addition, the high worldwide demand for oil palm has resulted in deforestation for the expansion of oil palm plantations [17], [34]. At the same time, deforestation in Malaysia has brought an increase in temperature and CO2 emissions, as evidenced in Figures 4(e), 4(f), and 4(h), respectively. In line with the expansion of oil palm plantations and the contraction of natural rubber and cocoa plantations, the PA for oil palm (Figure 4(i)) has increased, while the PA for natural rubber (Figure 4(j)) and cocoa (Figure 4(k)) has decreased. On the other hand, Figure 4(l) shows that the PA for rice has increased owing to population growth and because rice is the staple of Malaysian cuisine. Moreover, Figure 4(g) illustrates that the total annual rainfall amount has, on average, increased during the period 1980 to 2018.
Meanwhile, Table 1 presents the descriptive summaries of the individual series, characterised using the median (MD), interquartile range (IQR), skewness (SKEW), and kurtosis (KURT). This study characterises the average and dispersion of each series using the MD and IQR, respectively. Because the Shapiro-Wilk statistical test (SWST) indicated that some of the individual series, including oil palm and cocoa production yields, CO2, and PA for natural rubber and cocoa, are not normally distributed, the MD and IQR are more robust descriptive measures than the arithmetic mean and variance, respectively. For instance, the skewness value for the oil palm production yields suggests that the distribution of this series is negatively skewed, whereas the box-and-whisker plot indicates that it is positively skewed with potential outliers. This contradiction reflects the lack of robustness of the arithmetic mean to potential outliers, since the arithmetic mean is the reference point in computing the moment coefficient of skewness. In practice, Malaysia plays a substantial role as a world exporter of both oil palm and natural rubber. Therefore, the average oil palm and natural rubber production yields are higher than those of cocoa and rice. Although Table 1 shows that rice production yields are lower than those of natural rubber, the descriptive analysis indicates that the average planted area for rice is higher than that for natural rubber, because rice is the staple food for most of Malaysia's population. Specifically, Malaysian adults consume 2.5 plates of white rice per day on average [35]. On the other hand, temperature (Tmin and Tmax) shows the lowest variation among the individual series in Table 1.
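The sensitivity of mean-based measures to outliers can be illustrated with a small sketch. The toy series below is hypothetical and is not taken from Table 1; the moment coefficient of skewness is computed here as m3 / m2^(3/2), where m2 and m3 are the second and third central moments about the arithmetic mean, which is one common definition and may differ from the exact variant used in the study.

```python
import statistics


def moment_skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5, about the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def median_and_iqr(values):
    """Median and interquartile range, the robust summaries used in Table 1."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


# Hypothetical series; the last value plays the role of a potential outlier.
toy = [3.1, 3.3, 3.4, 3.5, 3.6, 3.7, 9.0]

print(statistics.mean(toy), statistics.variance(toy))
print(median_and_iqr(toy))
print(moment_skewness(toy))
```

In this toy series the single large value pulls the arithmetic mean well above the median and dominates the skewness coefficient, whereas the median and IQR are determined by the central observations.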
This is because Malaysia experiences a uniform annual ground temperature owing to its location near the equator. Furthermore, it is necessary to verify that the statistical properties of each individual series are constant over time, that is, that the series is stationary, because stationarity is one of the preliminary statistical assumptions required for fitting the one-way random-effects error components regression model. Hence, this study employed the Fisher-ADF statistical test, with and without trend, to verify this preliminary assumption for the individual series; the results are presented in Table 2. Based on Table 2, all the individual series except CO2 fulfil the stationarity assumption, with p-values less than the significance level of 0.05. To make the CO2 series stationary, this study applied a differencing transformation. The Fisher-ADF statistical test confirmed that this series fulfils the stationarity assumption after first-order differencing.
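First-order differencing replaces each value of a series with its change from the previous period. A minimal sketch is given below; the function name and the toy values are hypothetical and do not reproduce the CO2 figures used in the study.

```python
def first_difference(series):
    """Return the first-order differences x[t] - x[t-1] of a series."""
    return [current - previous for previous, current in zip(series, series[1:])]


# Hypothetical upward-trending values; not the CO2 data used in the study.
toy_series = [100.0, 110.0, 125.0, 145.0]
print(first_difference(toy_series))
```

Differencing shortens a series by one observation, so an annual series for 1980 to 2018 (39 values) yields 38 differences; the differenced CO2 variable is the one denoted ΔCO2 in the modelling results.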

Statistical modelling and diagnostics checking
The fulfilment of the preliminary stationarity assumption in the previous sub-section allows further analysis, namely fitting the one-way random-effects error components regression model. The analysis result for the full model is presented in Table 3.
*Note: "**" indicates that the estimated regression coefficient is approximately zero after rounding to four decimal places; "*" indicates statistical significance in the statistical test.
In particular, the analysis showed that only Tmax has a statistically significant detrimental impact on agricultural production yields after successively eliminating explanatory variables from the model, starting from the most statistically insignificant one. Mathematically, the best reduced one-way random-effects error components regression model can be expressed as equation (2).
This result implies that the average agricultural production yield is expected to decrease by 0.5485 tonnes per hectare for a 1 degree Celsius rise in Tmax, while Tmin, RA, ΔCO2, and PA do not have statistically significant effects on agricultural production yields. However, the diagnostic checking indicated that some assumptions of the one-way random-effects error components regression model are violated, as shown in Table 4. These concern serial correlation in the idiosyncratic component of the disturbance and cross-sectional dependence. To overcome these issues, this study employed the Box-Cox transformation technique.
The resulting model after the Box-Cox transformation fulfilled all the assumptions of the one-way random-effects error components regression model (Table 4). In particular, Tmax has detrimentally affected agricultural yields, a finding in line with previous studies [8], [11], [13]. This finding is also supported from a practical perspective. In natural science, pollination is one of the phenological stages most susceptible to temperature extremes, which affect pollen viability, fertilisation, and grain or fruit formation [36]. In addition, extreme temperatures can severely harm C3 plants' agricultural production yields during the developmental stage [36]. As a result, more effective strategies, such as a more appropriate irrigation system, are much needed to mitigate the impacts of the more extreme temperature events associated with climate change.
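For reference, the Box-Cox transformation of a positive value y with parameter λ is (y^λ − 1) / λ for λ ≠ 0 and ln(y) for λ = 0. The sketch below is a generic illustration only: the text does not report which variable was transformed or the value of λ that was used, so the function names and the inputs are hypothetical.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(values, lam):
    """Map transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(z) for z in values]
    return [(lam * z + 1.0) ** (1.0 / lam) for z in values]


# Hypothetical positive values and lambda, for illustration only.
toy = [1.0, 4.0, 9.0]
transformed = box_cox(toy, lam=0.5)
print(transformed)
print(inverse_box_cox(transformed, lam=0.5))
```

Because the transformation changes the scale of the variable it is applied to, coefficients estimated after the transformation are interpreted on the transformed scale, and predictions need to be mapped back with the inverse transformation.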

CONCLUSION
In summary, this study proposed a model for predicting agricultural production yields using the one-way random-effects error components regression model. The analysis revealed that Tmax has a statistically significant effect on the agricultural production yields of oil palm, natural rubber, cocoa, and rice. The proposed model can be beneficial to national policymakers and smallholder farmers. In particular, national policymakers can use the proposed model to predict the impact of future climate change on agricultural production yields and thereby anticipate its impact on food security, while smallholder farmers can use it to adapt their future plantation strategies. However, the goodness of fit, and hence the prediction accuracy, of the proposed model could be improved by taking into account more climatic and non-climatic variables, given that the intercept is statistically significant in the resulting predictive model. These climatic variables include relative humidity, wind speed, and sunshine duration, while the non-climatic variables include the emergence of disease, labour force skills, soil fertility, water management, weed control, and fertiliser use. This is because previous empirical Malaysian studies [7], [11][12][13], [22] indicated that these variables can also affect agricultural production yields. Future work could conduct a comparative study of models that take into account the temporal dimension only and both the cross-sectional and temporal dimensions, focusing on a single agricultural production yield.