Redesigning Post-Operative Processes Using Data Mining Classification Techniques

This paper develops and compares data mining (DM) classification models and uses them to redesign several hospital business processes based on post-operative data. Post-operative data were collected and analyzed with the Waikato Environment for Knowledge Analysis (WEKA) to investigate the factors influencing patients' admission after surgery and to compare the developed DM classification models. The results reveal that each implemented DM technique identifies different attributes as affecting patients' post-surgery admission status. The comparison suggests that neural networks outperform the other classification techniques. Further, the optimal number of beds required to accommodate post-operative patients is investigated. A simulation was conducted using queuing theory software to compute the expected number of beds required to achieve zero waiting time. The results indicate that the queue of post-surgery patients waiting for a bed has a length of 1, which means that one bed will become available through patient discharge.


INTRODUCTION
Data mining has become a research area of increasing importance. Organizations of all sizes have started to develop and deploy data mining technologies to leverage data resources and enhance their decision-making capabilities [1]. Data analysis capability is a crucial success factor for business today. Major parts of the business world are based on IT and deal with huge amounts of electronic data. Existing approaches to modeling data mining in business contexts are limited in addressing a major current trend: data mining is increasingly an integral part of executing a business. Tasks like placing advertisements, recommending products, or detecting fraud have become standard application fields of data mining and have serious implications for business products. Frequent re-engineering of business processes is a consequence of this development [2].
Data mining (also known as database exploration and knowledge discovery) is the process of extracting hidden patterns from large databases and using them to build predictive models [3]. This method of extracting concealed knowledge from large datasets is effective in solving business and scientific problems and in obtaining a competitive advantage. Traditional data analysis methods, largely based on humans working directly with data, simply do not scale to large datasets. Big data in healthcare institutions has led to an urgent need for new techniques and resources that can automatically turn processed data into valuable information and knowledge [4], [5]. In the healthcare sector, several approaches to gathering, storing, and preparing data for mining are available. These include standardization of clinical terminology and the exchange of data across organizations to improve the value and utilization of healthcare data through mining applications [6].
A high bed occupancy rate poses several challenges for hospital management. Variation in the arrival of surgical patients and in their duration of stay can also lead to a discrepancy between the demand for and supply of staffed beds. The effects of this imbalance are compounded during times of high bed occupancy. Patient care is potentially compromised when demand exceeds the supply of staffed beds [7].
The goal of the present study is threefold. First, we propose the use of DM classification techniques to better support the redesign of some hospital management business processes by retrieving the required information from large datasets recorded by a hospital via DM models. The redesign is based on post-operative data, which affect management concerns in the sector such as the bed occupancy rate and human resources in hospitals. To build these DM models, different algorithms are proposed, such as neural networks, Bayes' net, decision tree, and rule induction methods. These models enable hospitals to reduce their costs by automating hospital management functions that determine the number of beds and human resources in the hospital [8].
Second, each classification algorithm is implemented one at a time using WEKA based on post-surgery patient status attributes, and the resultant models are compared and tested to determine which factors have the most influence on whether a patient will stay after surgery or check out from the hospital, which eventually affects bed occupancy and human resources [9].
journal.ump.edu.my/ijsecs
Third, simulation experiments were conducted using queuing theory to evaluate the optimal number of beds required to accommodate post-operative patients. These experiments were based on modeling the number of patients expected to stay after surgery (arrivals) and the number of days they are expected to stay (service time) as statistical distributions fitted to real data. The goal is to compute the expected number of beds required to achieve a strategy of zero waiting time.
The paper is organized as follows. Section 2 describes the data and research methodology; Section 3 presents the selected classification techniques implemented via WEKA software to build data mining models; Section 4 discusses the Queueing Theory software used to determine the beds needed for post-surgery patients, and the conclusion and suggestions for future work are presented in Section 5.

RELATED WORK
Work on DM techniques in healthcare is related to and motivated by studies in data analysis in general and DM techniques in particular. These studies appeared mostly in the last decade. One of the first studies, released in 2010 [10], deals with healthcare administration and specifically covers operating room planning and waiting list management. First, the paper analyses whether the Master Surgery Schedule (MSS) may be changed to better accommodate shifting surgery demand using an optimization-based method. Second, it describes an expanded optimization-based strategy that includes postoperative beds, in which alternative policies linked to priority rules are simulated to show their impact on average waiting time [10]. The findings suggest that implementing a specific strategy reduces average patient waiting time and operation cancellations while increasing operating room utilization.
Agent-Based Simulation (ABS) appears to be a promising approach to many challenges involving the simulation of complex systems of interacting entities, according to the applications covered in the paper. However, current ABS tools and platforms are rarely used; simulation software is instead created from the ground up using a standard programming language [11]. A recent study looked at how classification-based data mining approaches such as rule-based methods, decision trees, Naive Bayes, and artificial neural networks may be applied to large amounts of healthcare data [11]. The authors investigated the topic of limiting and summarising alternative data mining methods as part of their research. They also concentrated on predicting combinations of numerous target attributes using various techniques, and they provided an innovative and effective heart attack prediction approach utilizing data mining [12].
In 2009, a survey of current KDD strategies for healthcare and public health utilizing data mining technologies was released [13]. It goes through some of the most important topics and challenges surrounding data mining and healthcare in general. The study discovered a rising number of data mining applications, including health care centre analysis for better health policy-making, disease outbreak identification and preventable hospital fatalities, and the detection of fraudulent insurance claims. In general, the study aims to give a survey of current data mining approaches for knowledge discovery in databases that are used in medical research and public health today.
The results of this study show that, before commencing data mining, an organization must first establish clear policies on patient privacy and security. It is responsible for enforcing these policies with its partners, as well as its branches and agencies. Rapid pandemic outbreaks, the need to detect the onset of illness in a non-invasive, painless manner, and the need to be more responsive to consumers all contribute to a growing requirement for health organizations to integrate data and use data mining to analyze these datasets [13]. Another recent survey covered different DM techniques in the healthcare sector, such as classification, clustering, association, and regression [14]. Several examples of each of these DM techniques were introduced and investigated based on their accuracy and performance. For example, the performance of classification techniques is affected by noisy data. Therefore, the success of data mining starts with the availability of clean data. Other important issues discussed are the importance of patients' data privacy and how to use security measures to protect patients' data from being accessed by unauthorized users. Finally, the survey also highlighted the importance of accuracy in disease prediction, which can be improved by combining different DM techniques [14].

RESEARCH METHODOLOGY AND DATASETS
Data mining classification algorithms were implemented via the WEKA software to determine the factors affecting the post-surgery stay duration of a patient. Patient data were obtained from the UCI Machine Learning Repository [15]. The number of beds that need to be made available to accommodate post-operative patients and the waiting time for each patient in the queue were determined using data from 2-3 months of hospital records of daily operations. The data were analyzed using statistical analysis and a queuing simulation software.

Data mining classification techniques and the factors affecting patients staying after a surgery
In this section, the use of the decision tree (J48), rule induction (PART), Bayes' net, and neural network (multilayer perceptron) algorithms is described. These were implemented via the WEKA software, which is used to build data mining models to predict post-surgery patient status. WEKA is a common suite of machine learning algorithms for data mining tasks [16]. It is free software licensed under the GNU General Public License [17]. WEKA's native storage format is ARFF, so a conversion was carried out to make the test data accessible for analysis via WEKA. The developed data mining models were used to predict whether a patient stays after surgery or goes home based on several attributes. The main goal is to determine, via the data mining models, the attribute of the dataset that has the most effect on this decision. This decision affects the management body of a hospital by creating a need to offer more services to these patients, such as providing more beds, nurse care, and nurse shifts.

Factors affecting patients staying after a surgery
The dataset for post-operative patients was collected from the UCI Machine Learning Repository. It was preprocessed, cleaned, and prepared for use in building classification models. These models are developed based on the following attributes of post-operative patients:
1. L-CORE (internal temperature in °C): high (> 37), mid (>= 36 and <= 37), low (< 36).
5. SURF-STBL (surface temperature stability): stable, mod-stable, unstable.
8. COMFORT (patient's perceived comfort at discharge, measured as an integer between 0 and 20).
9. ADM-DECS (discharge decision): I (patient sent to Intensive Care Unit), S (patient prepared to go home), A (patient sent to general hospital floor).
WEKA was used to build classification models based on the above-mentioned attributes from 50 records of the post-operative patient dataset by implementing the selected classification techniques and algorithms mentioned earlier.
Following this, 40 records of the post-operative patient dataset were used to test the created models, to ensure that these algorithms and models work properly and fulfill their potential. The purpose of building these classification models is to identify the factors that determine whether a patient stays or leaves after surgery. After the implementation of each classification technique, the results of patients staying and leaving after surgery were computed as percentages.
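As a rough illustration of the ARFF conversion mentioned earlier, the sketch below writes a few records to ARFF text. It is only a sketch under stated assumptions: the helper `to_arff`, the relation name, and the two example records are hypothetical, and only the four attributes spelled out in this section are included.

```python
# Hypothetical helper: serialize post-operative records to ARFF text.
# Only the attributes described in this section are declared; the example
# records are invented for illustration and are not taken from the dataset.

ATTRIBUTES = [
    ("L-CORE", ["high", "mid", "low"]),
    ("SURF-STBL", ["stable", "mod-stable", "unstable"]),
    ("COMFORT", "numeric"),          # integer between 0 and 20
    ("ADM-DECS", ["I", "S", "A"]),   # class attribute (discharge decision)
]


def to_arff(relation, attributes, rows):
    """Return ARFF text for rows given as value lists in attribute order."""
    lines = [f"@relation {relation}", ""]
    for name, spec in attributes:
        if spec == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: list the allowed values in braces.
            lines.append(f"@attribute {name} {{{','.join(spec)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


example_rows = [
    ["mid", "stable", 15, "A"],
    ["high", "unstable", 10, "S"],
]
arff_text = to_arff("postoperative", ATTRIBUTES, example_rows)
print(arff_text)
```

The class attribute is declared last, which is the position WEKA's Explorer selects as the class by default.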

Decision tree
Algorithm J48 was implemented via WEKA on the post-operative patient dataset of 50 records [18]. The results presented at the end of this section compare the implemented classification techniques using the WEKA software (Table 2). We are concerned with the statistics representing the numbers of correctly and incorrectly classified instances. Except for the Kappa statistic, the remaining statistics quantify specific errors based on the class probabilities assigned by the tree. The confusion matrix shows, for each class, how instances from that class were classified. For class "b", 2 instances were correctly classified but 14 were put into class "c", while for class "c", 33 were correctly classified. To determine which attributes most influence patients' decision to stay at a hospital after surgery, we consider the model of 50 records created by WEKA using the decision tree algorithm J48. The unpruned and pruned decision trees illustrate how attributes are used by the classifier to make a decision. The leaf nodes show the class to which an instance is allocated if the node is reached. The numbers in brackets after the leaf nodes give the number of instances assigned to that node, followed by how many of those instances are incorrectly classified. The attribute that has the most effect on the decision of a patient staying or leaving after surgery is COMFORT, which is the root of the tree.

Neural network
A neural network algorithm (multilayer perceptron) was used via the WEKA software on the post-operative patient dataset of 50 records [19]. Here, too, we are concerned with the numbers of correctly and incorrectly classified instances. Except for the Kappa statistic, the remaining statistics quantify specific errors based on the class probabilities assigned by the neural network. The confusion matrix shows, for each class, how instances from that class were classified. For class "b", 15 instances were correctly classified but 1 was put into class "c", while for class "c", 31 were correctly classified but 2 were put into class "b". For class "I", 1 instance was correctly classified.
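The statistics discussed above can be recomputed from a confusion matrix. The following is a minimal sketch that uses the neural network counts quoted in the text; it assumes that the single class "I" instance produces no off-diagonal entries (these are not reported), and the function name `accuracy_and_kappa` is illustrative rather than part of WEKA.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are actual classes and columns are predicted classes.
    """
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Rows/columns in the order b, c, I (counts as quoted in the text;
# zeros in the "I" row and column are an assumption).
nn_matrix = [
    [15, 1, 0],
    [2, 31, 0],
    [0, 0, 1],
]
nn_accuracy, nn_kappa = accuracy_and_kappa(nn_matrix)
print(f"accuracy={nn_accuracy:.2f}, kappa={nn_kappa:.3f}")
```

Under that assumption, 47 of the 50 instances lie on the diagonal, so the accuracy works out to 0.94.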
To determine the exact attributes influencing the decision to stay or leave after surgery, the first model created by WEKA from 50 records using the neural network (multilayer perceptron) algorithm was considered, as illustrated in Figure 1. Figure 1 shows the entire network with all its attributes. To determine which attributes most influence the patient's decision, attributes were isolated one at a time and the error rate was monitored. If the error rate is higher than that of the neural network without isolation, the isolated attribute affects the decision. This procedure was carried out by running the neural network algorithm on the patient dataset of 50 records in WEKA once for each isolated attribute, and once without isolating any attribute. The comparison of the neural network algorithm with and without attribute isolation is listed in Table 1.

Table 1. Comparison of the NN algorithm before and after attribute isolation

Table 1 compares the error rate of the NN algorithm before isolating any attribute with its error rate after isolating each attribute. This comparison identified the main attributes affecting the decision of a patient staying at a hospital; the rest of the attributes were ignored as they have no effect on the patient's decision.
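The attribute-isolation procedure described above can be sketched generically. In the sketch below, `error_rate` stands in for training and evaluating the multilayer perceptron in WEKA on a given attribute subset; the attribute names and the toy error function are placeholders invented for illustration and do not reproduce the values in Table 1.

```python
def influential_attributes(attributes, error_rate):
    """Isolate one attribute at a time and record any increase in error.

    error_rate is a callable that takes a tuple of attribute names and
    returns the classification error of a model built on those attributes.
    """
    baseline = error_rate(tuple(attributes))
    increases = {}
    for name in attributes:
        reduced = tuple(a for a in attributes if a != name)
        delta = error_rate(reduced) - baseline
        if delta > 0:  # error rose, so the isolated attribute matters
            increases[name] = delta
    return baseline, increases


def toy_error_rate(active_attributes):
    """Invented stand-in for a real train/evaluate run (not real results)."""
    penalties = {"attr_a": 0.10, "attr_c": 0.04}
    missing = (p for name, p in penalties.items() if name not in active_attributes)
    return 0.06 + sum(missing)


baseline, increases = influential_attributes(
    ["attr_a", "attr_b", "attr_c"], toy_error_rate
)
print(baseline, sorted(increases))
```

Passing the evaluation step in as a callable keeps the comparison logic separate from whichever tool actually trains the network.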

Rule induction
The rule induction algorithm (PART) was used via WEKA on the post-operative patient dataset of 50 records [20]. Here, too, we are concerned with the numbers of correctly and incorrectly classified instances. Except for the Kappa statistic, the remaining statistics quantify specific errors based on the class probabilities assigned by the rules. The confusion matrix shows, for each class, how instances from that class were classified. For class "b", 10 instances were correctly classified but 6 were put into class "c", while for class "c", 29 were correctly classified but 4 were put into class "b". To determine the attributes that have the most effect on the decision of patients staying at the hospital after surgery, the first model of 50 records was created using the rule induction algorithm PART. The results show that the rules that affect the patient's decision on post-surgery stay at a hospital are as follows:
1. COMFORT > 7 AND BP-STBL = mod-stable.
2. CORE-STBL = stable AND COMFORT > 7 AND L-CORE = mid.
The attributes that affect the decision are therefore COMFORT, BP-STBL, CORE-STBL, and L-CORE.

Bayes networks
The Bayesian algorithm (Bayes' net) was used via WEKA on the post-operative patient dataset of 50 records [21], [22]. Here, too, the focus is on the numbers of correctly and incorrectly classified instances. Except for the Kappa statistic, the remaining statistics quantify specific errors based on the class probabilities assigned by Bayes' net. The confusion matrix shows, for each class, how instances from that class were classified. For class "b", 6 instances were correctly classified but 10 were put into class "c", while for class "c", 30 were correctly classified but 3 were put into class "b". To determine the exact attributes that most affect patients' decision to stay at the hospital after surgery, the first model of 50 records was built via WEKA using the Bayes' net algorithm.
The produced ADtree shows the main attributes that affect the decision of patients staying after surgery, together with the probabilities of those attributes. Selecting any attribute's circle displays the probability distribution of a patient staying, leaving, or being sent to the ICU, as illustrated in Figure 2. In the present paper, the focus was only on the probability distribution of the attributes that affect the patient's decision. Based on the probability distribution table of each attribute, which shows the probability of a patient staying after surgery, these attributes are L-CORE, L-SURF, L-O2, L-BP, SURF-STBL, CORE-STBL, BP-STBL, and COMFORT.

ANALYSIS OF THE RESULTS
As mentioned above, four techniques were implemented to build classification models using 50 records of the post-operative patient dataset via WEKA. Table 2 compares these four classification techniques. It shows the results of implementing the data mining classification techniques used to build models that predict whether patients stay at or leave a hospital after surgery. Furthermore, it compares the four classification techniques and their algorithms in terms of correctly classified records, from which the best and worst classification techniques can be identified. This ranking is determined by the error level of each classification technique, and the ability to make such a comparison can be considered an advantage of data mining classification techniques.
In the present experiment, the best-performing classification technique is the neural network, since each implementation of its algorithm yields a higher rate of correctly classified records and fewer errors. Neural networks are one of many data mining analytical tools that can be used to make predictions about key healthcare indicators such as cost or facility utilization. Neural networks are known to produce highly accurate results in practical applications. A typical neural network may have several hidden layers, and each layer can have several neurons. A neuron receives several signals from its input links, computes a new activation level, and sends it as an output signal through the output links. The input signal can be raw data or the outputs of other neurons. The output signal can be either a final solution to the problem or an input to other neurons. The worst-performing classification technique is the decision tree, as its algorithm yields the highest error rate and the fewest correctly classified records. Another advantage of using data mining classification techniques is in determining the exact attributes that have the most influence on a patient's decision. As illustrated above, the attributes were examined using the models created via WEKA.

DETERMINING BED REQUIREMENTS FOR POST-OPERATIVE PATIENTS
After determining the attributes that most affect the decision of a patient to stay or leave after surgery using data mining classification techniques, this section concentrates on post-surgery patients staying at a hospital. These patients need beds, nurses, and other services, all of which affect the management side of a hospital.

Chi-square test
A chi-square goodness-of-fit test was used to evaluate the hypothesis that a random sample of size n of the random variable X follows a particular distributional form. This test formalizes the intuitive concept of comparing the data histogram to the shape of the candidate density or mass function [23]. The test is valid for large sample sizes, for both discrete and continuous distributional assumptions, when the parameters are estimated by maximum likelihood. The test procedure starts with the arrangement of the n observations into a set of K class intervals or cells, and the statistic is given by the following equation:

$$\chi_0^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected frequencies, respectively, for the i-th class interval. The expected frequency for each class interval is computed as $E_i = n p_i$, where $p_i$ is the theoretical, hypothesized probability associated with the i-th class interval [24]. It can be shown that $\chi_0^2$ approximately follows the chi-square distribution with K - S - 1 degrees of freedom, where S represents the number of parameters of the hypothesized distribution estimated by the sample statistics [24]. The hypotheses are as follows: $H_0$: the random variable X conforms to the distributional assumption with the parameter(s) given by the parameter estimate(s); and $H_1$: the random variable X does not conform.
In applying the test, if expected frequencies are too small, $\chi_0^2$ will reflect not only the departure of the observed frequency from the expected one but also the smallness of the expected frequency. Although there is no general agreement regarding the minimum size of $E_i$, values of 3, 4, and 5 have been widely used. If an $E_i$ value is too small, it can be combined with the expected frequencies in adjacent class intervals. The corresponding $O_i$ values should also be combined, and K should be reduced by one for each combined cell [25].
If the distribution being tested is discrete, each value of the random variable should be a class interval, unless it is necessary to combine adjacent class intervals to meet the minimum expected cell-frequency requirement. For the discrete case, if combining adjacent cells is not required, $p_i = p(x_i) = P(X = x_i)$. Otherwise, $p_i$ is determined by summing the probabilities of the appropriate adjacent cells.
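The procedure above, including the merging of cells with small expected frequencies, can be sketched as follows. This is a minimal sketch, not the statistical software used in the study: the function names, the threshold of 5, the left-to-right merging order, and the example frequencies are all assumptions made for illustration.

```python
def merge_small_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells left to right until each expected count
    reaches min_expected; any leftover tail joins the last merged cell."""
    merged_o, merged_e = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0:
        if merged_e:
            merged_o[-1] += acc_o
            merged_e[-1] += acc_e
        else:
            merged_o.append(acc_o)
            merged_e.append(acc_e)
    return merged_o, merged_e


def chi_square_statistic(observed, expected, estimated_params, min_expected=5.0):
    """Return (statistic, degrees of freedom) with K counted after merging."""
    o, e = merge_small_cells(observed, expected, min_expected)
    statistic = sum((oi - ei) ** 2 / ei for oi, ei in zip(o, e))
    dof = len(o) - estimated_params - 1  # K - S - 1
    return statistic, dof


# Invented frequencies for illustration only (n = 50).
obs = [4, 6, 10, 14, 9, 5, 2]
exp = [3.0, 7.0, 11.0, 13.0, 9.0, 4.5, 2.5]
stat, dof = chi_square_statistic(obs, exp, estimated_params=1)
print(stat, dof)
```

The hypothesis $H_0$ is rejected when the statistic exceeds the critical chi-square value for the chosen significance level and the resulting degrees of freedom (for example, 7.815 at the 0.05 level with 3 degrees of freedom).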

PATIENTS STAYING AFTER SURGERY
Beds needed for each patient after surgery are decided using the queuing theory, which requires analyzing the number of patients staying after surgery and the number of days they will stay.
The Poisson distribution is named after the French mathematician Siméon Denis Poisson, who published its derivation in 1837 [26]. It is a discrete probability distribution in which the likelihood of an outcome occurring within a short period of time is very small and the likelihood of two or more such outcomes within the same short period is negligible. Occurrences in non-overlapping spans of time are independent of each other. The Poisson distribution formula is:

$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

where:
λ is the average rate of occurrence in a given period of time;
x is the value taken by the Poisson random variable;
e is the base of the natural logarithm (e ≈ 2.718).

The tabulated data were collected from 2-3 months of hospital records on the number of operations per day and the patients' admission and discharge dates. The data were randomly picked from the population and are presented in the form of observed and expected frequencies. The chi-square test was applied to the number of patients staying after surgery. The results of the chi-square test presented in Table 4 indicate that the data (the arrivals at the service location) indeed follow the Poisson distribution. The values derived from 2-3 months of hospital records on the number of operations and the patients' admission and discharge dates, based on daily patient arrivals, are given in Table 3.
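To make the Poisson formula concrete, the sketch below computes the expected frequencies that a chi-square test would compare against the observed daily arrival counts. The rate λ = 3.78 is the arrival rate reported in the Results section; the number of observed days (60) and the cut-off for the open-ended last cell are assumptions chosen only for illustration.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def expected_frequencies(lam, n_days, max_count):
    """Expected number of days with 0, 1, ..., max_count - 1 arrivals,
    plus a final open-ended cell for max_count or more arrivals."""
    freqs = [n_days * poisson_pmf(x, lam) for x in range(max_count)]
    freqs.append(n_days - sum(freqs))
    return freqs


LAMBDA = 3.78   # patients per day, as reported in the Results section
N_DAYS = 60     # assumed sample size (about two months of records)

freqs = expected_frequencies(LAMBDA, N_DAYS, max_count=10)
for arrivals, freq in enumerate(freqs):
    print(arrivals, round(freq, 2))
```

The last cell absorbs the tail probability so that the expected frequencies sum to the number of observed days.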

Duration of patients staying after surgery
Service time in a queuing model usually follows a negative exponential distribution. The data on the number of days a patient stays after surgery were therefore tested against the negative exponential distribution. In probability theory and statistics, the exponential distribution (also known as the negative exponential distribution) is a family of continuous probability distributions. It describes the time between events in a Poisson process, i.e. a process in which events take place continuously and independently at a constant average rate [27]. The exponential distribution is defined by:

$$p(x) = \lambda e^{-\lambda x} \quad \text{for positive } \lambda \text{ and nonnegative } x \qquad (4)$$

Table 4. Testing for the exponential distribution.

The data in Table 4 were collected from 2-3 months of hospital records on the number of operations per day and the admission and discharge dates of patients. The data were randomly picked from the population and are presented in the form of observed and expected frequencies. The results of the chi-square test presented in Table 4 indicate that the data (the number of days a patient stays after surgery) indeed follow an exponential distribution. The accompanying figure illustrates the probability of a patient occupying a bed after surgery (bed service time, which is the waiting time for a patient to get a bed after surgery), obtained from the exponential distribution formula. The curve represents values from the fit of the exponential distribution given in columns 1 and 5 of Table 4. The service times derived from 2-3 months of hospital records on the number of operations per day, based on daily patient arrivals, were calculated as given in Table 3.
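For a continuous distribution, the cell probabilities needed by the chi-square test come from differences of the cumulative distribution function. The sketch below illustrates this for the exponential case, using the service rate reported in the Results section (0.303951 per day); the one-day class intervals are an assumption made for illustration and need not match the intervals of Table 4.

```python
import math


def exponential_cdf(x, rate):
    """P(X <= x) for an exponential random variable with the given rate."""
    return 1.0 - math.exp(-rate * x)


def interval_probabilities(edges, rate):
    """P(a <= X < b) for consecutive edges, plus an open-ended last cell."""
    probs = [
        exponential_cdf(b, rate) - exponential_cdf(a, rate)
        for a, b in zip(edges, edges[1:])
    ]
    probs.append(1.0 - exponential_cdf(edges[-1], rate))
    return probs


SERVICE_RATE = 0.303951          # per day, as reported in the Results section
EDGES = [0, 1, 2, 3, 4, 5, 6, 7]  # assumed one-day class intervals

cell_probs = interval_probabilities(EDGES, SERVICE_RATE)
mean_stay = 1.0 / SERVICE_RATE
print([round(p, 4) for p in cell_probs], round(mean_stay, 2))
```

Multiplying each cell probability by the sample size gives the expected frequencies for the test.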

RESULTS
The results obtained from fitting both distributions were used by a simulation software (Queueing Theory Software for Calc) to obtain the number of beds required for patients staying after surgery. Following this, a patient's waiting time in the queue after surgery was obtained. Queueing Theory Software for Calc provides a set of OpenOffice Calc spreadsheets that solve various queueing models. Collectively these spreadsheets are known as QtsPlus4Calc. The queueing software project is based on the Microsoft Excel QtsPlus software package. The QtsPlus4Calc collection of spreadsheets encompasses the following areas: basic probability calculators; solvers for finite, linear difference equations; single-server, multiple-server, bulk, priority, and network analytic queueing models; and several simulation models. Using a queuing model with exponential interarrival and service time distributions, the following parameters were fed into the simulation software to obtain the number of beds needed for patients staying after surgery:
• Arrival rate (λ = 3.78 patients/day);
• Service rate (μ = 0.303951 per day);
• Number of servers (c = 13).
In the present paper, we considered an M/M/c queue in which the number of beds is fixed. The patients arrive according to a Poisson process at a rate of λ = 3.78 patients/day. The service is provided by c servers, which serve the patients on a first-come-first-served (FCFS) basis. The service time of each patient is exponentially distributed with a mean of 1/μ = 3.29 days. The results show that the probability that an arriving patient is delayed in the queue is about 0.826303, meaning that a patient's wait in the queue is approximately 1 with 13 servers and 15 beds in the whole hospital. The hospital should thus provide a bed for a patient who is waiting in the queue by discharging recovered patients.
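The delay probability of an M/M/c queue can also be computed directly with the Erlang C formula. The sketch below is an independent illustration using the parameters listed above, not the QtsPlus4Calc spreadsheet itself; the function names and the helper `min_servers` are assumptions introduced here.

```python
def erlang_b(servers, offered_load):
    """Blocking probability of an M/M/c/c system, via the stable recursion."""
    b = 1.0
    for k in range(1, servers + 1):
        b = offered_load * b / (k + offered_load * b)
    return b


def erlang_c(servers, arrival_rate, service_rate):
    """Probability that an arriving customer must wait in an M/M/c queue."""
    offered_load = arrival_rate / service_rate
    utilization = offered_load / servers
    if utilization >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    b = erlang_b(servers, offered_load)
    return b / (1.0 - utilization * (1.0 - b))


def min_servers(arrival_rate, service_rate, max_delay_probability):
    """Smallest number of servers whose delay probability meets the target."""
    c = int(arrival_rate / service_rate) + 1  # first c with utilization < 1
    while erlang_c(c, arrival_rate, service_rate) > max_delay_probability:
        c += 1
    return c


ARRIVAL_RATE = 3.78       # patients per day
SERVICE_RATE = 0.303951   # per day (mean stay of about 3.29 days)

p_delay = erlang_c(13, ARRIVAL_RATE, SERVICE_RATE)
print(round(p_delay, 4))
```

With these inputs the computed delay probability is roughly 0.826, in line with the value quoted above. A call such as `min_servers(ARRIVAL_RATE, SERVICE_RATE, 0.05)` shows how many servers a stricter waiting target would require.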

CONCLUSION
Data mining technologies can be of great value to the healthcare industry. However, healthcare data mining can be constrained by the availability of data, as the raw inputs for data mining frequently occur in various settings and structures, such as administration, hospitals, laboratories, and more. Data must also be obtained and prepared before data mining can be undertaken. In this paper, we used data mining classification techniques to build an analytical procedure based on raw data containing several attributes on patient status after surgery, collected from the surgical department and the UCI Machine Learning Repository. These data mining models helped determine which patients will stay and which will go home after surgery, and also the attributes that contribute to the patient's decision to stay. Further, the present work indicates which of the four classification techniques (neural network, decision tree, rule induction, and Bayesian network) is the most appropriate. The study also helps determine which of the nine attributes have the most effect on the decision of a patient staying or leaving after surgery. The results showed that neural networks performed better than the other DM techniques in terms of accuracy and prediction. After determining the attributes affecting the stay of a patient, the next task was to determine the number of beds required to accommodate these patients by analyzing the duration of the stay. The queuing model used by the Queueing Theory software allowed us to determine the number of beds that will keep the queue length to 1. It is assumed that if the length is 1, then one of the current patients occupying a room will be discharged, which is a common hospital policy. The study can further help healthcare organizations benefit from the developed data mining application in the management aspect of health care and could potentially contribute to research on data mining and the healthcare industry.
Future work will investigate other areas where data mining techniques can be used in the healthcare industry to help with the computerization of business and decision-making processes.