Machine learning and deep learning-based approaches on various biomarkers for Alzheimer’s disease early detection: A review

ABSTRACT - Alzheimer's disease (AD) is a progressive neurodegenerative disorder that can severely affect a patient's memory and mobility. As this disease is irreversible, early diagnosis is crucial for delaying the symptoms and adjusting the patient's lifestyle. Many machine learning (ML) and deep learning (DL) based approaches have been proposed to accurately predict AD before symptom onset. However, finding the most effective approach for early AD prediction is still challenging. This review explored 24 papers published from 2018 to 2021. These papers proposed different approaches that apply state-of-the-art machine learning and deep learning algorithms to different biomarkers to detect AD early. The review examined them from different perspectives to identify potential research gaps and draw conclusions and recommendations. It classified these recent approaches in terms of the learning technique used and the AD biomarkers employed. It summarized and compared their findings, and identified their strengths and limitations. It also provided a summary of the common AD biomarkers. The review found that some approaches strove to increase prediction accuracy regardless of complexity, for example by using heterogeneous datasets. Others sought the most practical and affordable ways to predict the disease while still achieving good accuracy, for example by using audio data. It was also observed that DL-based approaches with image biomarkers remarkably surpassed ML-based approaches. However, they performed poorly with genetic variant data. Despite the great importance of genetic variant biomarkers, their large variance and complexity can lead to either a complex approach or poor accuracy. These data are crucial for uncovering the underlying structure of AD and detecting it at early stages. However, an effective pre-processing approach is still needed to refine these data and exploit them efficiently with powerful DL algorithms.

INTRODUCTION
Alzheimer's disease (AD) is a fatal disease that slowly destroys brain cells, causing serious mental and physical damage to the patient. The symptoms appear gradually. They start with memory loss, confusion, and depression, and end with the loss of the ability to eat and walk [1]. The projected prevalence of AD over the next 30 years is alarming. A study conducted in 2013 estimated AD prevalence in the United States from 2010 to 2050. It projected that the number of elderly people with AD dementia will increase from 4.7 million to 13.8 million [2]. In fact, the continuous increase in deaths due to AD dementia in the US has made it the fifth leading cause of death for people aged 65 and older [1]. The global impact of AD is even more severe. According to the latest World Health Organization fact sheet on dementia, there are nearly 10 million new cases of dementia worldwide every year, and 60 to 70% of them are caused by Alzheimer's disease. Extensive research has been conducted to discover the underlying causes of this poorly understood disease and to find a cure that can halt the rapid increase in AD cases [3]. Unfortunately, an effective cure for AD has not been discovered yet [4]. However, because cognitive impairment increases progressively, early prediction of AD can greatly help reduce its impact through early therapeutic intervention [5]. It also gives the patient more time to adjust to the symptoms and adapt their lifestyle [6]. Therefore, several machine learning and deep learning techniques have been proposed to detect AD at early stages. Nevertheless, designing an optimal approach that can efficiently predict AD with high accuracy is still a major challenge.
Artificial intelligence (AI) is one of the modern technologies widely used to build intelligent systems that simulate human ways of thinking. Machine learning is a subset of AI. It was defined in 1959 by Arthur Samuel, a pioneer in AI and computer gaming, as the "field of study that gives computers the ability to learn without being explicitly programmed". Machine learning algorithms go beyond static program instructions by building computational models that learn automatically from data and derive decisions and predictions [7]. A wide variety of ML algorithms have been used successfully in fields such as healthcare, marketing, and education [8]. However, with the advent of big data, deep learning, which is a subset of machine learning, has remarkably surpassed traditional methods [9]. DL algorithms have achieved high levels of accuracy in areas such as voice and face recognition [10].
journal.ump.edu.my/ijsecs ◄
Machine learning algorithms have been increasingly used to analyse medical data. They extract features that help explain many aspects of a disease, such as its pathology and the related brain malfunctions [11]. With the current progress in machine learning technology, new techniques have been developed to predict AD and model its progression [12]. Among them, supervised machine learning algorithms have proven efficient at learning from massive amounts of data within a short time. They have also demonstrated their capability to help doctors accurately predict diseases at early stages [13]. However, breakthroughs in neuroimaging technologies have produced highly complex, large-scale data. Deep learning has shown a clear advantage over traditional ML methods at interpreting such high-dimensional neuroimaging data and precisely detecting AD [14]. In fact, it was recently reported that DL technology has become the foundation for AD prediction [15].
Although ML and DL have achieved high precision at detecting AD with neuroimaging data, most of these approaches may not detect susceptibility to the disease early enough [16]. Therefore, some researchers have supplemented their analysis by adding genetic data to neuroimaging modalities [17]-[19]. This is because AD is considered a complex disease with a genetic basis [20]. Its complexity stems from the many factors that can contribute to it, such as environmental and genetic (inherited) factors [21]. The genetic factors play a fundamental role in the disease pathogenesis, as they may account for 70% of the risk [22].
According to the age of onset, AD is divided into two subsets: early-onset AD (EOAD) and late-onset AD (LOAD). The first affects people younger than 65 and accounts for 5% of all AD cases. The second accounts for 90% to 95% of cases and affects people older than 65 [23]. EOAD is less complex and better understood than LOAD, which can be associated with many genes [21], [23]. Hence, many technologies have emerged to decode the human genome and convert it into a readable format. Researchers can then study it closely and extract vital genomic biomarkers [24], [25]. These markers can enrich knowledge about the characteristics and etiology of LOAD, and lead to early diagnosis and therapy development [26].
Machine learning algorithms can handle high-dimensional data well. As a result, they have proved effective at increasing prediction precision for complex diseases using genetic markers, usually single nucleotide polymorphisms (SNPs) [27]. They have been used in many approaches and have helped discover many genes associated with AD dementia.
The wide adoption of machine learning in the health sector has made a broad range of medical datasets available to researchers [28]. In fact, many open-access data repositories and medical datasets can be easily accessed, such as massive electronic health records, neuroimaging datasets, and genomic biomarkers.
In this survey, we explored 24 papers published from 2018 to 2021 that used machine learning and deep learning techniques to predict AD early. We gathered the latest AD prediction approaches and divided them into ML-based and DL-based approaches. We further classified them by the type of medical data, and discussed their workflows and results to identify their advantages and disadvantages. The review summarizes recent ML and DL technologies and their findings in the context of early AD prediction. It also gives researchers insights into research gaps and future research directions.
The rest of the paper is organized as follows. Section 2 describes the review methodology. Section 3 presents some types of AD biomarkers. Section 4 explores recent machine learning and deep learning based approaches for detecting AD. This section is divided into two subsections: machine learning based approaches and deep learning based approaches. In each subsection, the approaches are grouped into categories based on the data type used. Section 5 discusses the findings and compares the results. Finally, the conclusion and future work are presented in Section 6.

MATERIALS AND METHODS
Continuous advances in machine learning and deep learning technologies, together with the large diversity of biological and medical data, have opened the way for a wide range of research on AD classification and prediction. In this review, we focused on 24 papers published from 2018 to 2021. These papers were selected to explore recent findings in AD prediction using machine learning and deep learning algorithms on various biomarkers. We first outlined some types of AD biomarkers, as illustrated in Fig 1. Then, we objectively summarized the selected papers by dividing them into two main categories based on the AI learning technique used: approaches using ML algorithms and approaches using DL algorithms, as illustrated in Fig 2. Each category was split by the type of data exploited in the approach. In the first category, the approaches were split into six subcategories: image data, large-scale health data, gene expression data, genetic variant data, mobility and cognitive data, and audio data. In the second category, the approaches were split into four subcategories: image data, large-scale health data, genetic variant data, and heterogeneous data. We explained each approach in terms of its workflow, algorithms, data type, and performance results. After that, we discussed all approaches from different perspectives, outlined their pros and cons, and briefly compared their findings. We used the area under the ROC curve (AUC) and accuracy (ACC) as evaluation metrics, since these two metrics were the ones commonly reported across the papers. Lastly, we summarized all of them in two tables listing the dataset name, data type, algorithm(s), evaluation technique, and testing results.
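The comparison relies on AUC and ACC. The minimal pure-Python sketch below computes both metrics to help readers interpret the reported numbers. It computes AUC through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one, with ties counting as half. The function names and the toy labels and scores are illustrative assumptions, not data from the reviewed papers.

```python
def accuracy(labels, predictions):
    """ACC: fraction of predicted class labels that match the true labels."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and of equal length")
    return sum(1 for y, p in zip(labels, predictions) if y == p) / len(labels)


def roc_auc(labels, scores):
    """AUC via the Mann-Whitney statistic.

    labels: 1 for the positive class (e.g. AD), 0 for the negative class (e.g. NC).
    scores: higher means "more likely positive"; tied pairs count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]               # hypothetical model outputs
y_pred = [1 if s >= 0.5 else 0 for s in y_score]         # threshold at 0.5 for ACC
print(accuracy(y_true, y_pred), roc_auc(y_true, y_score))  # 0.6666666666666666 0.8888888888888888
```

AUC is threshold-free, whereas ACC depends on the chosen cut-off. This is one reason the two metrics can rank the same models differently. The pairwise loop is O(P·N) and is written for clarity. Library implementations such as scikit-learn's `roc_auc_score` sort the scores instead.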

BIOMARKERS OF ALZHEIMER'S DISEASE DETECTION
AD biomarkers play an integral role in understanding the disease and monitoring its progression, which supports its early detection and treatment [29], [30]. This section summarizes some types of these biomarkers.

Image biomarkers
Many imaging techniques have emerged for AD diagnosis. Structural magnetic resonance imaging (sMRI) is one of the neuroimaging techniques used to support doctors' diagnosis of AD. It helps measure the extent of degeneration in some brain areas, which can lead to early detection [31]. sMRI scans, shown in Fig 3, produce high-contrast images used to measure the volume of the grey and white matter of the patient's brain [32]. They can detect atrophic alterations in the brain that could lead to the severe damage that causes AD [33]. Another neuroimaging technique used for detecting AD is amyloid and tau positron emission tomography (PET). It has been shown that the accumulation of amyloid and tau, two types of protein, in the brain can severely damage its cells and lead to AD [34]. Amyloid and tau PET biomarkers have become increasingly important for studying the abnormal accumulation of these proteins and understanding the disease pathology [35]. There are many open-access databases for sMRI and PET, such as the Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets [36] and the Open Access Series of Imaging Studies (OASIS) dataset [37]. MRI and PET can also be expensive and of limited accessibility [38], [39]. For this reason, a new non-invasive and inexpensive technique known as retinal imaging, shown in Fig 4, has recently been used as a source of AD biomarkers. These biomarkers capture abnormal changes in the retinal vasculature that could be associated with AD [38]. An example of a database containing retinal images is the UK Biobank [40].

Genomics biomarkers
With the advance of DNA sequencing technologies and with the help of genome-wide association studies (GWAS), many genes associated with AD have been discovered [21]. Next generation sequencing (NGS) is a DNA sequencing technology. It is a cost-effective, highly accurate deep sequencing technique. It can sequence the whole human genome, producing millions of sequences written in the four-letter nucleotide alphabet (A, C, G, T), within a single day [24]. Various platforms using NGS technology have paved the way for a wide range of studies that explore different regions of the human genome and discover many genetic variants contributing to complex diseases [42]. One such study type is GWAS, which analyses human genetic variation to identify the genetic risk factors of a complex disease [43]. A genetic variant can be a single-base alteration in the DNA sequence, known as a single nucleotide polymorphism (SNP). It can also be a longer alteration, such as an insertion or deletion variation (indel) or a copy number variation (CNV) [44]. Moreover, gene expression is another type of genomic biomarker. Gene expression is the process by which the instructions encoded in DNA are used to build protein molecules (gene products) [45]. DNA microarray is one of many technologies used for gene expression profiling [25]. It is a powerful tool that can monitor the expression of thousands of genes at the same time and profile valuable information about the gene expression process. Gene expression profiles can help explain the basic genetic structure of a disease by revealing the genes involved in its formation [46]. They can reveal the physiological changes in an AD patient and guide researchers in understanding the biological aspects of the disease pathology [47]. There is a wide range of genome datasets, such as ADNI [36] and the Dementia and Traumatic Brain Injury (TBI) Study. ADNI provides two sources of genetic variants: GWAS genetic variants and a whole genome sequencing (WGS) dataset. In the WGS dataset, genetic variants are stored in the variant call format (VCF), as shown in Fig 5. VCF is a standard representation of genetic variants including SNPs, indels and other structural variants [48].
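The reviewed genetic approaches consume variants as numeric features. The following sketch shows one common way to turn VCF genotype calls into per-sample alternate-allele dosages (0, 1 or 2). It assumes tab-separated records with a GT entry in the FORMAT column. The toy record, the sample names, and the function name `parse_vcf_dosages` are hypothetical. Real pipelines would normally use a dedicated VCF library and handle multi-allelic sites more carefully.

```python
def parse_vcf_dosages(lines):
    """Parse VCF text into (sample_ids, {variant_id: [dosage per sample]}).

    Dosage = number of non-reference alleles in the GT field (0, 1 or 2 for
    diploid calls); missing genotypes such as './.' become None. Multi-allelic
    sites are collapsed to "any ALT allele", which is a simplification.
    """
    samples, dosages = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos, variant_id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")  # position of GT within FORMAT
        key = variant_id if variant_id != "." else f"{chrom}:{pos}:{ref}:{alt}"
        row = []
        for sample_field in fields[9:]:
            # Treat phased ('|') and unphased ('/') genotypes the same way.
            alleles = sample_field.split(":")[gt_index].replace("|", "/").split("/")
            row.append(None if "." in alleles else sum(a != "0" for a in alleles))
        dosages[key] = row
    return samples, dosages


toy_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n"
    "1\t12345\tvar1\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1|1\t./.\n"
)
print(parse_vcf_dosages(toy_vcf.splitlines()))
# (['S1', 'S2', 'S3', 'S4'], {'var1': [0, 1, 2, None]})
```

The resulting samples-by-variants matrix is the kind of input that the SNP-based ML and DL models discussed later operate on, after quality control and imputation of missing calls.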

Vocal and Gait biomarkers
Vocal biomarkers can be collected in a non-invasive and inexpensive manner. They are used to analyse audio segments of a subject's speech and extract risk features associated with AD [49]. Gait (walking) biomarkers, in contrast, can be used to monitor a subject's movements and reactions to extract risk features associated with AD [50], as shown in

ML AND DL-BASED APPROACHES FOR ALZHEIMER'S DISEASE DETECTION
In this section, the latest approaches proposed to predict AD dementia at early stages are described. They are divided into two main parts: approaches using machine learning techniques and approaches using deep learning techniques. Within each part, the approaches are grouped by the type of biomarker used.

Machine learning-based approaches
This section delineates recent machine learning-based approaches based on the type of data used to predict AD.

Medical imaging data
Machine learning algorithms use a variety of imaging data, such as MRI scans, PET scans, and retinal images. The following subsections explore ML approaches that use two of these: MRI data and retinal image data.

Magnetic resonance imaging (MRI) data
In [51], longitudinal structural MRI (sMRI) data of 150 participants were used for AD classification through a 4-stage automated pipeline. The first stage was data pre-processing. The second stage divided the data into two main sets: a training set and a test set. The main training set was further divided into three subsets: train, validate, and test sets. In the third stage, 17 supervised machine learning algorithms used the three subsets to build predictive models. The best model from this stage was then tested in the final stage on the main test set. The best algorithm was random forest (RF), with an AUC of 0.8722. Another approach, in [52], employed structural MRI to differentiate people with AD dementia from people with vascular dementia (VD). The researchers used a collection of MRI scans of 58 subjects with AD and 35 subjects with VD. The approach went through multiple image pre-processing steps, such as skull-stripping and alignment, in order to select appropriate features for the training stage. Four machine learning algorithms were applied: support vector machine (SVM), K-nearest neighbours (KNN), RF, and logistic regression (LR). The SVM with a radial basis function (RBF) kernel attained the best outcome, with an AUC of 0.861. Moreover, researchers in [53] used 1,167 sMRI scans to classify the normal cognition (NC) state and three stages of dementia: early MCI, late MCI, and probable AD. The approach trained six ML classifiers: KNN, decision tree, RF, naïve Bayes (NB), linear SVM, and nonlinear SVM with an RBF kernel. The testing results showed that the nonlinear SVM with an RBF kernel achieved the best classification performance across all stages, with an AUC of 0.76.
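The nested data split in the 4-stage pipeline of [51] can be sketched as follows. The split fractions, the seed, the candidate scores, and the helper names are assumptions for illustration, since the review does not report the proportions used. Splitting by subject ID rather than by scan is deliberate here. Longitudinal data contain several scans per participant, and splitting individual scans could place the same person in both the training and test sets.

```python
import random


def nested_split(subject_ids, test_frac=0.2, val_frac=0.2, inner_test_frac=0.2, seed=0):
    """Two-level split by subject: a held-out main test set, then train / validate /
    inner-test subsets carved out of the remaining development subjects."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_frac)
    main_test, development = ids[:n_test], ids[n_test:]
    n_val = round(len(development) * val_frac)
    n_inner = round(len(development) * inner_test_frac)
    return {
        "validate": development[:n_val],
        "inner_test": development[n_val:n_val + n_inner],
        "train": development[n_val + n_inner:],
        "main_test": main_test,  # touched only once, by the final selected model
    }


def pick_best(validation_auc):
    """Name of the candidate algorithm with the highest validation AUC."""
    return max(validation_auc, key=validation_auc.get)


splits = nested_split(range(150))
print({name: len(ids) for name, ids in splits.items()})
# {'validate': 24, 'inner_test': 24, 'train': 72, 'main_test': 30}
print(pick_best({"RF": 0.87, "SVM": 0.84, "LR": 0.81}))  # RF (hypothetical scores)
```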

Retinal vasculature imaging data
A recent study in [38] exploited retinal biomarkers to predict AD through a three-stage machine learning pipeline. The first stage selected images of sufficient quality. The second stage generated vessel maps and used a t-test for feature selection. The final stage built the model with an SVM classifier. The classifier achieved an overall accuracy of 0.824.
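A t-test filter like the one in the second stage of [38] ranks each candidate feature by how strongly its mean differs between the AD and control groups. This sketch uses Welch's t statistic, which is an assumption since the review only says "t-test". As a simplification, it ranks features by the absolute value of t instead of computing p-values. The feature names and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / se


def rank_features_by_t(ad_values, nc_values, top_k):
    """Return the top_k feature names with the largest |t| between AD and NC groups.

    ad_values / nc_values: dict mapping feature name -> list of per-subject values.
    """
    scores = {name: abs(welch_t(ad_values[name], nc_values[name])) for name in ad_values}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical retinal vessel features for four AD and four NC subjects.
ad = {"vessel_width": [1.0, 1.1, 0.9, 1.05], "noise": [1.0, 2.0, 3.0, 4.0]}
nc = {"vessel_width": [2.0, 2.1, 1.9, 2.05], "noise": [2.0, 1.0, 4.0, 3.0]}
print(rank_features_by_t(ad, nc, top_k=1))  # ['vessel_width']
```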

Large scale health data
Because machine learning algorithms perform well on big data, many approaches have used them to analyse massive amounts of health data and extract important features for predicting AD. The researchers in [54] developed several models for predicting definite AD and probable AD within 4 years. They applied three machine learning algorithms (RF, SVM, and LR) to large-scale data including clinical tests, participant and family information, and prescribed medications. RF classifiers surpassed the other classifiers in 1-year to 4-year prediction of definite AD, with AUC values ranging from 0.775 to 0.677. In addition, researchers in [55] used an extensive collection of clinical tests, neuropsychological tests, and social and demographic information. Their aim was to predict a patient's conversion from mild cognitive impairment (MCI) to AD dementia within three years. Using a weighted rank average ensemble technique, they built an ensemble ML model of 13 supervised machine learning algorithms, such as KNN, LR, RF and NB, and achieved an AUROC of 0.88. Another approach, in [4], used large-scale health data for early prediction of AD. The researchers collected multiple attributes from different tests, such as the Mini-Mental State Examination, clinical dementia rating, and estimated total intracranial volume. They also collected information on the participants' socioeconomic status and educational background. They trained and validated their models with several machine learning classifiers, such as RF, LR, NB, and SVM with a linear kernel. The linear SVM achieved the best accuracy, 0.95.
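The weighted rank average ensemble mentioned for [55] can be illustrated with a small sketch. Each model's risk scores are converted to ranks, which makes models with differently calibrated outputs comparable, and the ranks are then averaged with per-model weights. Using validation AUCs as weights is an assumption, as are the function names and toy scores. The exact weighting used in [55] is not described here.

```python
def average_ranks(scores):
    """Rank scores from 1 (lowest) to n (highest); tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def weighted_rank_average(model_scores, weights):
    """Combine per-model risk scores into one score in (0, 1] by weighted rank averaging."""
    n_samples = len(next(iter(model_scores.values())))
    total_weight = sum(weights[name] for name in model_scores)
    combined = [0.0] * n_samples
    for name, scores in model_scores.items():
        for i, rank in enumerate(average_ranks(scores)):
            combined[i] += weights[name] * rank / total_weight
    return [value / n_samples for value in combined]  # normalize ranks to (0, 1]


# Hypothetical MCI-to-AD risk scores from two models for three patients.
scores = {"lr": [0.1, 0.7, 0.4], "rf": [0.2, 0.9, 0.3]}
print(weighted_rank_average(scores, weights={"lr": 0.8, "rf": 0.9}))
```

Because only the ordering of each model's scores matters, a model whose probabilities are compressed into a narrow range contributes as much as a well-spread one. This is the usual motivation for averaging ranks rather than raw probabilities.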

Gene expression profiles data
In [46], gene expression data were exploited to classify AD and discover new genomic biomarkers associated with AD. The researchers first ranked the expressed genes by P-value using a T-test. They removed genes with a P-value less than 0.5, as their expression values differ significantly between the two sample classes, AD and NC. After these differentially expressed genes were removed, 2000 genes were selected for training and testing, and five machine learning classifiers were employed. The best classifier was SVM with a linear kernel. In addition, three techniques were used for feature selection: principal component analysis (PCA), RF, and an extra trees classifier. After analysing the extracted features (genes), the researchers chose the 9 genes selected by PCA and joined them with the overlapping set of genes selected by the three methods. The resulting set of 14 genes was tested with the SVM classifier, since it had achieved the best classification results, and the results improved. This new set of genomic biomarkers was considered an influential set associated with AD. In contrast, the approach in [56] used differentially expressed genes (DEGs) extracted from four regions of the human brain to study their connection with the disease. The researchers believe that such genes from different regions are correlated with AD. They started by removing redundant data from each sample, since gene expression data were taken from four regions of the same person. Then, the expressed genes were ranked by P-value using a linear mixed effects model (LMM). After that, the differentially expressed genes with the smallest P-values were enriched using gene ontology to explain their biological implications. Lastly, all genes from both classes, AD and non-AD, were enriched using a gene ontology database to find the functional connections (pathways) between them and the DEGs. The top ten and top six DEGs were chosen and tested with four ML algorithms. The RF algorithm achieved the best accuracy, 0.73 and 0.83 respectively.

Genetic variations
In [57], researchers used SNP data to classify AD and extract the genetic variants associated with it. They proposed a new approach that improves classification accuracy by exploiting the misclassified samples. First, they trained three ML classifiers: BSWiMS, GALGO, and LASSO. From the best classifier, LASSO, they selected the misclassified testing samples. Then, they extracted the SNPs related to these samples and retrained the model with the LASSO classifier. After that, they merged the features extracted from all samples with those extracted from the misclassified samples, and used them to train the model. This last feature set yielded the best testing performance, with an AUC of 0.842. Another approach using SNP data was proposed in [16]. The researchers designed an ensemble model to predict AD. They first pre-processed the genetic variants by applying quality control procedures. Then, they picked the top 2,500 SNPs to build an ensemble of five ML classifiers using a benchmarking tool called feature selection algorithm for computer aided diagnosis (FRESA.CAD). After validation and testing, the individual classifiers achieved AUCs ranging from 0.6 to 0.7, whereas the ensemble model achieved a better AUC of 0.719. However, when the ensemble model was trained with the top 1000 SNPs, it attained an AUC of only 0.554. Moreover, the ensemble model identified eight genes that were the most frequently selected across all classifications, and these genes are known for their strong association with AD. Another approach, proposed in [58], studied the effect of genetic mutations related to AD by extracting the most influential features and using them to separate harmful SNPs from harmless ones. In this approach, a two-stage feature selection was applied to select the most important features. In the first stage, recursive feature elimination with cross-validation (RFECV) was used to select 39 features. In the second stage, forward feature selection was used to select the best feature combination. After the best combination of 11 features was selected, a model was trained on these features using a random forest algorithm, achieving an AUROC of 0.8949.
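The second stage of [58], greedy forward feature selection, can be sketched in a few lines. `score_fn` stands in for whatever evaluation is used, for instance a cross-validated AUROC of a random forest trained on the candidate subset. It is a hypothetical callback. The toy scoring function and feature names below are invented purely to make the example runnable.

```python
def forward_selection(candidates, score_fn, max_features=None):
    """Greedy forward selection.

    Starting from an empty set, repeatedly add the candidate feature that gives the
    highest score_fn(subset); stop when no addition improves the best score so far.
    """
    selected, remaining = [], list(candidates)
    best_score = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every one-feature extension of the current subset.
        score, feature = max(((score_fn(selected + [f]), f) for f in remaining),
                             key=lambda pair: pair[0])
        if score <= best_score:
            break  # no remaining candidate improves the subset
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Toy score: each feature has a fixed usefulness, minus a small penalty per feature.
usefulness = {"feat_a": 0.3, "feat_b": 0.2, "feat_c": 0.0}
toy_score = lambda subset: sum(usefulness[f] for f in subset) - 0.05 * len(subset)
print(forward_selection(usefulness, toy_score))
```

Each round costs one model evaluation per remaining feature. This is why, in [58], RFECV first prunes the candidates to a smaller pool before the more expensive forward search.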

Mobility and cognitive data
Researchers in [50] used dual-task gait assessment data to classify AD, MCI, and NC. The gait features were extracted with gait analysis software and a pressure-sensitive carpet. The subjects underwent dual-task evaluations in which they walked while their cognitive abilities, such as memory, language and attention, were tested at the same time. An SVM classifier was trained on the data and achieved an average accuracy of 0.78.

Audio data
A new approach was proposed in [49] that employs speech data to predict AD at early stages. The audio data were collected and divided into 1-second segments. Spectrogram features were then extracted and used to train models with five ML algorithms. The classifiers were tested on two datasets. A logistic regression CV classifier (logistic regression with built-in cross-validation) achieved the best results, with accuracies of 0.833 and 0.844 on the two datasets.
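The segmentation and spectrogram steps described for [49] can be sketched with NumPy. Several parameters are assumptions chosen because they are common in speech processing: the 16 kHz sample rate, the 25 ms Hann-windowed frames with a 10 ms hop, and the log-magnitude scaling. The review does not state the parameters used in [49], and a synthetic tone stands in for real speech.

```python
import numpy as np


def one_second_segments(signal, sample_rate):
    """Split a 1-D waveform into non-overlapping 1-second segments, dropping the remainder."""
    n_segments = len(signal) // sample_rate
    trimmed = np.asarray(signal[: n_segments * sample_rate], dtype=float)
    return trimmed.reshape(n_segments, sample_rate)


def log_spectrogram(segment, frame_len=400, hop=160):
    """Log-magnitude spectrogram (frames x frequency bins) with a Hann window."""
    window = np.hanning(frame_len)
    starts = range(0, len(segment) - frame_len + 1, hop)
    frames = np.stack([segment[s:s + frame_len] * window for s in starts])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


sample_rate = 16_000
t = np.arange(int(2.5 * sample_rate)) / sample_rate
audio = np.sin(2 * np.pi * 1000 * t)  # synthetic 1 kHz tone instead of real speech
segments = one_second_segments(audio, sample_rate)
spec = log_spectrogram(segments[0])
print(segments.shape, spec.shape)  # (2, 16000) (98, 201)
```

Each segment's spectrogram, possibly flattened or summarized per frequency bin, would then form one training example for the classifiers.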

Deep learning-based approaches
This section outlines recent deep learning-based approaches based on the type of data used to predict AD.

Magnetic resonance imaging (MRI) data
Researchers in [41] employed a two-dimensional convolutional neural network (2D CNN) to predict AD. They trained their model on MRI data. To find the best performance, they tried between 120 and 130 units in the last hidden layer with dropout rates from 0.1 to 0.5. The best configuration was 121 units with a dropout rate of 0.2. The model attained a testing accuracy of 99.30%. Another approach, in [59], exploited MRI data to predict AD. The researchers used structural MRI features extracted from the hippocampus of 933 subjects. They designed a lightweight three-dimensional (3D) CNN that combines two kinds of attributes. The first are deep visual attributes extracted by another model, called 3D Dense CNN. The second are global shape attributes extracted from hippocampus segmentations. These features were then combined in a fully connected layer followed by a softmax layer. The model achieved an accuracy of 92.52%.
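The hyperparameter search described for [41] amounts to a grid search over 120 to 130 units in the last hidden layer and dropout rates from 0.1 to 0.5. In this sketch, `train_and_evaluate` is a hypothetical callback that would build, train, and score the 2D CNN for one configuration. The 0.1 dropout step is an assumption. The search selects on a validation split rather than on the test set, which keeps the reported test accuracy from being optimistically biased.

```python
from itertools import product


def grid_search(train_and_evaluate, units_options=range(120, 131),
                dropout_options=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Try every (units, dropout) pair and return the best one with its score.

    train_and_evaluate(units, dropout) must return a validation score (higher is better).
    """
    results = {
        (units, dropout): train_and_evaluate(units, dropout)
        for units, dropout in product(units_options, dropout_options)
    }
    best_config = max(results, key=results.get)
    return best_config, results[best_config]


# Stand-in evaluator for illustration only; a real one would train the network.
fake_eval = lambda units, dropout: 1.0 - 0.01 * abs(units - 121) - abs(dropout - 0.2)
print(grid_search(fake_eval))  # ((121, 0.2), 1.0)
```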

Positron emission tomography (PET) data
Researchers in [60] used amyloid or tau PET features for AD classification. First, they trained a 3D CNN to classify AD and NC. Then, they used the trained model to predict conversion from the MCI state to the AD state. A subject with a probability close to 1 was classified as an AD converter, whereas a subject with a probability close to 0 was classified as a non-AD converter. After that, a layer-wise relevance propagation (LRP) algorithm was used to extract the features the model relied on. These were visualized as a heat map showing the brain areas most closely related to AD. The average accuracy of classifying AD and NC was 90.8%.
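Layer-wise relevance propagation redistributes a network's output score backwards, layer by layer, onto its inputs. The sketch below implements the widely used LRP-epsilon rule for a single dense layer in NumPy. Applying such rules through every layer of the 3D CNN would yield the voxel-level heat maps described for [60]. The toy weights are hypothetical, and [60] may have used a different LRP variant or different rules for convolutional layers.

```python
import numpy as np


def lrp_epsilon_dense(a, W, b, relevance_out, eps=1e-6):
    """Propagate relevance through a dense layer z = a @ W + b with the LRP-epsilon rule.

    R_i = a_i * sum_j W[i, j] * R_j / (z_j + eps * sign(z_j))
    a: input activations (n_in,), W: (n_in, n_out), b: (n_out,),
    relevance_out: relevance of each output unit (n_out,).
    """
    z = a @ W + b
    stabilizer = eps * np.where(z >= 0, 1.0, -1.0)  # keeps the denominator away from 0
    s = relevance_out / (z + stabilizer)
    return a * (W @ s)


a = np.array([1.0, 2.0])
W = np.array([[1.0, -0.5], [0.5, 1.0]])
b = np.zeros(2)
R_in = lrp_epsilon_dense(a, W, b, relevance_out=np.array([1.0, 0.0]))
print(np.round(R_in, 4))  # [0.5 0.5]
```

With zero bias and a small epsilon, the input relevances sum approximately to the output relevance. This conservation property is what lets the final heat map be read as a decomposition of the model's AD score over brain regions.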

MRI data + PET data
The approach in [61] employed two image modalities, MRI scans and amyloid PET scans, to predict AD. After both modalities were pre-processed, two identical CNNs, one per modality, were trained at the same time. The weights of both networks were merged at the last hidden layer, which consists of 128 inputs, to form a fused network with one output layer. This network achieved a testing accuracy of 92.34%.
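One common way to merge two modality-specific networks at the last hidden layer is to concatenate their hidden representations and feed the result to a shared output layer. The NumPy forward pass below sketches that idea under stated assumptions. Each branch is taken to emit 64 features, so the fused layer has 128 inputs, and the output is a single sigmoid unit giving the probability of AD. The branch outputs and weights are random placeholders, not values from [61].

```python
import numpy as np


def fused_output(mri_hidden, pet_hidden, w_out, b_out):
    """Concatenate the last hidden layers of the MRI and PET branches and apply a
    single sigmoid output unit, returning one AD probability per subject."""
    fused = np.concatenate([mri_hidden, pet_hidden], axis=1)  # (batch, 64 + 64)
    logits = fused @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
mri_hidden = rng.normal(size=(4, 64))  # placeholder MRI-branch features for 4 subjects
pet_hidden = rng.normal(size=(4, 64))  # placeholder PET-branch features
w_out = rng.normal(scale=0.1, size=128)
probs = fused_output(mri_hidden, pet_hidden, w_out, b_out=0.0)
print(probs.shape)  # (4,)
```

During training, the loss on this shared output would be backpropagated into both branches at once. That is one way to read "trained at the same time".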

Large scale health data
In [62], researchers used longitudinal electronic health records from 2007 to 2017, including features such as the subject's age, background, and clinical test results. Three models were trained on these data to predict MCI and AD within three to eight years: a recurrent neural network (RNN), an RNN that reused the trained weights of another model, and a feed-forward network. In the feed-forward network, the researchers fed three features (sex, age, and days of data collection) directly into the last hidden layer to ensure that their weights were included. The best results ranged from 0.81 to 0.84.

Genetic variations data
In [63], researchers exploited SNP data alone to predict AD. They used whole genome sequencing data of 42,908,833 SNPs. After applying a quality control pipeline to remove low-quality SNPs, they used 1,884 SNPs to build their predictive models. They proposed two neural network architectures, a DNN and a 1D CNN. To evaluate performance, they divided the SNPs into several subsets based on their p-values, which were taken from the International Genomics of Alzheimer's Project (IGAP) report. The best performance was achieved by the DNN on a subset of 200 SNPs, with an AUC of ≈ 0.62.
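The nested SNP subsets described for [63] can be built from association p-values by sorting SNPs by p-value and taking the top k for each subset size. That the subsets are top-k lists, rather than, say, p-value thresholds, is an assumption. The SNP identifiers and p-values below are hypothetical.

```python
def snp_subsets_by_pvalue(snp_pvalues, sizes):
    """Return {size: the `size` SNP IDs with the smallest p-values}, most significant first."""
    ordered = sorted(snp_pvalues, key=snp_pvalues.get)
    return {size: ordered[:size] for size in sizes}


pvalues = {"snp_a": 1e-2, "snp_b": 1e-8, "snp_c": 0.5, "snp_d": 1e-3}
print(snp_subsets_by_pvalue(pvalues, sizes=(1, 2, 3)))
# {1: ['snp_b'], 2: ['snp_b', 'snp_d'], 3: ['snp_b', 'snp_d', 'snp_a']}
```

Each subset would then be used to train and evaluate a separate model, so the best subset size (200 SNPs in [63]) is found empirically.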

Heterogeneous data
Some approaches have used different types of biomarkers in order to improve the prediction accuracy either by merging them into one unified form, or by using them separately and merging final results.

Neuroimaging + Genetic variants data
Researchers in [17] proposed to merge SNPs data with brain region of interests (ROIs) data. This is because they believe that this kind of data can directly describe the disease, whereas genetic biomarkers can describe its etiology, and as a result, the neural network could fail when dealing with these biomarkers only. Thus, they assumed that the structural information of brain regions can help the network understand genome data and improve its accuracy. At first, SNPs and ROIs information were normalized and ranked based on their degree of importance by using random forest algorithms. After merging them, the total number of features was 542 features. After that, a deep learning model was trained and tested on these features, and the results showed an improvement in the network performance. The best result was for the top 10 SNPs and ROIs with AUC of 0.80. Another approach employing images data and SNPs data was suggested in [18]. The researchers suggested a method to improve the accuracy of a conventional neural network (CNN) used to predict AD by merging its predictions with another network's predictions. In the approach, the two networks, CNN with MRI data and multilayer perceptron network (MLP) with SNPs data, were trained separately. After getting the output of both networks, an ensemble gate merged them to form the final prediction result if the prediction accuracy of CNN was low. Otherwise, the final result would be for CNN prediction only. After the approach evaluation, the prediction accuracy for 75 subjects improved from AUC of 0.9232 when using MRI scans only to AUC of 0.936 when using both MRI and SNPs data. Furthermore, in [19] researchers used MRI data and SNPs data of APOE ε4 allele and 19 SNPs known for their strong contribution to AD. They suggested merging MRI features and SNPs features and building predictive models to predict the conversion from MCI state to AD state. 
They trained 100 models using a DNN and 100 models using logistic regression (LR). The models were trained to classify AD and NC states and then tested on predicting MCI conversion. The DNN outperformed LR, with an AUC of 0.835.
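The ensemble gate described for [18] can be sketched as follows. The source only states that the MLP output is merged in when the CNN's prediction is poor. This sketch makes two assumptions that are not taken from [18]. First, "poor" is judged per subject by the CNN's confidence, with a hypothetical threshold of 0.6. Second, the merge is a simple average of the two probabilities. The function names are also hypothetical.

```python
# Hypothetical sketch of a confidence-gated ensemble in the spirit of [18]:
# the CNN (MRI) output is used alone unless it is uncertain, in which case
# it is averaged with the MLP (SNPs) output. Confidence measure, threshold
# and averaging rule are illustrative assumptions.

def cnn_confidence(prob_ad):
    """Rescaled distance from 0.5: 0 for a coin flip, 1 for a certain prediction."""
    return abs(prob_ad - 0.5) * 2.0


def gated_prediction(prob_cnn, prob_mlp, threshold=0.6):
    """Final AD probability for one subject."""
    if cnn_confidence(prob_cnn) >= threshold:
        return prob_cnn                    # CNN is confident: keep it alone
    return (prob_cnn + prob_mlp) / 2.0     # CNN is unsure: merge with the MLP


if __name__ == "__main__":
    subjects = [
        (0.95, 0.40),  # confident CNN, so the CNN output is kept
        (0.55, 0.90),  # uncertain CNN, so it is merged with the MLP
    ]
    for p_cnn, p_mlp in subjects:
        print(round(gated_prediction(p_cnn, p_mlp), 3))
```

Gating means the genetic network only influences the uncertain cases, so the strong imaging predictions are left untouched.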

DNA methylation profiles + gene expression profiles
DNA methylation is a process involved in regulating gene expression [64]. It typically acts at DNA regions known as CpG islands. Some studies suggest a correlation between gene expression and DNA methylation. Accordingly, researchers in [65] used both to predict AD, drawing on gene expression and DNA methylation profiles extracted from the prefrontal cortex. Because the two profiles cannot be merged directly owing to their different behaviour and characteristics, the researchers proposed a two-step feature selection method that reduces them to two feature sets, one of genes and the other of CpG probes. The first step filtered differentially expressed genes (DEGs) and differentially methylated positions (DMPs). Since every DMP has related genes, the second step merged both feature sets by intersecting the genes that were both differentially expressed and differentially methylated, as these genes are believed to have a strong connection to the disease. A DNN was then built and optimized with Bayesian hyper-parameter optimization, achieving an accuracy of 82.3% and an AUC of 0.797.
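The two-step selection described for [65] can be sketched as a set intersection. In this minimal sketch, a single p-value cut-off (alpha = 0.05) stands in for whatever differential-analysis criteria [65] actually applied. The gene names, probe IDs, p-values and the probe-to-gene mapping are all made up for illustration.

```python
# Hypothetical sketch of DEG/DMP intersection for feature selection.
# Step 1: keep genes (DEGs) and CpG probes (DMPs) passing a significance cut-off.
# Step 2: keep genes that are both differentially expressed and linked to a DMP,
#         plus the CpG probes pointing to those genes.

def select_features(gene_pvalues, probe_pvalues, probe_to_genes, alpha=0.05):
    degs = {g for g, p in gene_pvalues.items() if p < alpha}
    dmps = {c for c, p in probe_pvalues.items() if p < alpha}

    # Genes related to at least one differentially methylated probe.
    methylated_genes = set()
    for probe in dmps:
        methylated_genes.update(probe_to_genes.get(probe, ()))

    selected_genes = degs & methylated_genes
    selected_probes = {c for c in dmps
                       if selected_genes.intersection(probe_to_genes.get(c, ()))}
    return sorted(selected_genes), sorted(selected_probes)


if __name__ == "__main__":
    gene_pvalues = {"GENE_A": 0.01, "GENE_B": 0.20, "GENE_C": 0.03}
    probe_pvalues = {"cg001": 0.02, "cg002": 0.04, "cg003": 0.30}
    probe_to_genes = {"cg001": ["GENE_A"], "cg002": ["GENE_B"], "cg003": ["GENE_C"]}
    print(select_features(gene_pvalues, probe_pvalues, probe_to_genes))
```

With this toy input, only GENE_A is both differentially expressed and linked to a significant probe (cg001). The result is therefore one gene feature and one CpG-probe feature.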

DISCUSSION
This section discusses all approaches from different perspectives to draw implications about their results, strengths and limitations, and make recommendations for future work.
Few machine learning studies have tried to predict AD using types of data other than neuroimaging and genetic variants, which are the modalities most commonly used by ML methods. One recent study [50] analysed gait movement and patients' cognitive responses to extract features capable of distinguishing AD patients from cognitively normal persons. Analysing such data with machine learning could greatly contribute to discovering the subtle cognitive or physical changes that a patient may exhibit long before AD onset, which would help doctors detect the disease early enough. However, as the utility of these biomarkers is supported by only a few research studies, further research might be needed to confirm their significance. Another study used audio data to predict AD [49]. These data are inexpensive and more accessible than other modalities such as neuroimaging data. Like physical and cognitive features, some audio features could in future become important indicators of AD development, helping doctors and patients' families assess AD susceptibility from the way a patient talks or sounds. Nevertheless, further research might also be required to establish their relevance to early AD prediction.
Moreover, although DL technologies have shown higher performance than ML technologies [66], ML-based approaches employing genetic variants alone to predict AD were found to outnumber DL-based ones. In fact, only one study was found that used SNPs data as the only modality to classify the disease. This might be due to the complex nature of these features, on which a neural network usually achieves poor classification accuracy. It could also be due to the limited number of samples compared to the enormous number of features in most SNPs datasets, which may affect network performance because DL algorithms require a huge number of instances [33]. Therefore, most researchers have tended to use ML algorithms for feature selection and classification. On the other hand, many DL-based approaches have combined genetic variants with other modalities [17]-[19], mostly neuroimaging modalities, as a way to improve network performance. Despite the added complexity, most of them gained an accuracy improvement of only 2% to 3%. As genome biomarkers such as genetic variants play an unarguable role in understanding the disease's underlying structure [26], and because of the promising capability of DL technology with genetic data [67], further research on applying this technology to genetic variants could help explore them more deeply and identify the vital regions in human DNA that are strongly related to AD development. However, an effective pre-processing and quality control pipeline could be the decisive step for reducing the complexity and variety of SNPs data, leading to a noticeable improvement in network performance.
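As an illustration of what such a quality control step can look like, the Python sketch below applies two filters that are common in SNP quality control. It removes SNPs with a high rate of missing calls and SNPs with a low minor allele frequency (MAF). The thresholds (10% missing, MAF 0.05), the 0/1/2 genotype coding and the function names are illustrative assumptions, not a pipeline used by any of the reviewed studies. For real datasets, tools such as PLINK provide these filters.

```python
# Illustrative SNP quality-control filter (hypothetical thresholds).
# Genotypes count copies of one allele (0, 1 or 2); None marks a missing call.

def snp_passes_qc(genotypes, max_missing_rate=0.1, min_maf=0.05):
    """Return True if one SNP (one column of genotypes) passes both filters."""
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    missing_rate = 1.0 - len(called) / len(genotypes)
    if not called or missing_rate > max_missing_rate:
        return False
    # Frequency of the counted allele among the 2 * n called alleles.
    allele_freq = sum(called) / (2.0 * len(called))
    maf = min(allele_freq, 1.0 - allele_freq)
    return maf >= min_maf


def filter_snps(genotype_matrix, snp_ids, **thresholds):
    """Keep the IDs of SNPs passing QC; genotype_matrix[i][j] is subject i at SNP j."""
    kept = []
    for j, snp_id in enumerate(snp_ids):
        column = [row[j] for row in genotype_matrix]
        if snp_passes_qc(column, **thresholds):
            kept.append(snp_id)
    return kept


if __name__ == "__main__":
    # Hypothetical toy data: 4 subjects x 3 SNPs.
    genotypes = [
        [0, 0, 1],
        [1, 0, None],
        [2, 0, 1],
        [1, 0, None],
    ]
    snp_ids = ["snp1", "snp2", "snp3"]
    print(filter_snps(genotypes, snp_ids))
```

In this toy example, snp2 is monomorphic (MAF of 0) and snp3 is missing half of its calls, so only snp1 survives. On real SNP arrays, the same kind of filtering can remove many uninformative or unreliable features before any model is trained.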
In addition, when the results of ML-based approaches using genetic variants data and neuroimaging data were compared (Table 1 and Fig 7), they were found to be relatively close, with an average AUC of 0.82. In contrast, the results differed widely among DL-based approaches (Table 2 and Fig 8): methods using neuroimaging data achieved an average accuracy (ACC) of 93.74%, while those using SNPs data achieved an average AUC of only 0.67.

Table 1
Ref.   Data type      AUC
[51]   sMRI           0.8722
[52]   sMRI           0.861
[53]   sMRI           0.76
[57]   SNPs (482)     0.842
[16]   SNPs (2500)    0.719
[58]   SNPs (11)      0.8949

Table 2
Ref.   Data type             *ACC/AUC
[41]   MRI                   0.993
[59]   sMRI                  0.9252
[60]   amyloid PET           0.908
[61]   amyloid PET + MRI     0.9234
[63]   SNPs (200)            0.62
[17]   SNPs (20 & 50)        0.68
[18]   SNPs (41)             0.6807
[19]   SNPs (20)             0.689
*Note: results for image data were measured by ACC; results for SNPs data were measured by AUC.

Furthermore, when the results of ML- and DL-based approaches were compared in terms of genetic variants data and neuroimaging data (Table 3), DL-based approaches were found to perform better than ML-based approaches on neuroimaging data, but relatively poorly on SNPs data. Additionally, Table 4 presents the strengths and limitations of ML- and DL-based approaches by data type, and Tables 5 and 6 summarize all approaches covered in this survey in terms of the algorithms used, dataset name, modality type, evaluation technique, and results. Together, these tables provide a comparative analysis of the latest accuracy levels of AD prediction and of the modalities and algorithms associated with them.

Lastly, as genomic biomarkers account for 70% of risk features, using them to predict AD would be essential. So far, many approaches have used ML algorithms with genetic data, and most of them achieved good classification accuracy. In contrast, few approaches have applied DL algorithms to genetic data such as SNPs, and those achieved poor accuracy. In fact, neuroimaging data and genetic variants were found to be the modalities most utilized by DL technology. However, using genetic variants alone to predict AD is still limited, and further research is needed.

Table 4
Data type: …
Strengths: … The technique achieves good accuracy, and its extracted risk factors could help doctors and patients' families detect the disease at early stages.
Limitations: Although these data can be easily adopted in clinics, they are still limited, and few research studies have explored them to predict AD. Therefore, the extracted risk features might still be inconclusive for early AD detection, especially since brain and genetic variations are not considered.

Data type: Audio data
Strengths: Data acquisition is inexpensive and non-invasive. The technique achieves very good accuracy, and its extracted risk factors could help doctors and patients' families detect the disease at early stages.
Limitations: Although these data are non-invasive and easily collected, they are still limited, and few research studies have explored them to predict AD. Therefore, the extracted risk features might still be inconclusive for early AD detection, especially since brain and genetic variations are not considered.

Data type: …
Strengths: … can handle larger sets of genetic variants and automatically recognize related patterns without prior feature extraction.
Limitations: … SNPs to avoid overfitting and achieve high accuracy. However, most of the available datasets are highly complex and variable, with a huge number of features and a limited number of samples. Besides, very few studies have used this modality without merging it with other modalities, and those achieved relatively poor accuracy. Therefore, further research is needed.

Data type: Images & genetic variants data
Strengths: A wide range of research studies have used this combination and achieved excellent accuracy.
Limitations: The approach is highly complex and computationally expensive, and data acquisition is difficult and expensive.
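The averages quoted in this discussion can be checked directly against the per-study values in Tables 1 and 2. The short sketch below recomputes them. The grouping into imaging and SNPs rows follows the "Data type" column, and the imaging ACC average is expressed as a percentage, as in the text. The table and function names are introduced here for illustration only.

```python
from statistics import mean

# (reference, data type, score) as listed in Table 1 (ML-based, AUC).
TABLE_1 = [
    ("[51]", "sMRI", 0.8722),
    ("[52]", "sMRI", 0.861),
    ("[53]", "sMRI", 0.76),
    ("[57]", "SNPs (482)", 0.842),
    ("[16]", "SNPs (2500)", 0.719),
    ("[58]", "SNPs (11)", 0.8949),
]

# Table 2 (DL-based): imaging rows report ACC, SNPs rows report AUC.
TABLE_2 = [
    ("[41]", "MRI", 0.993),
    ("[59]", "sMRI", 0.9252),
    ("[60]", "amyloid PET", 0.908),
    ("[61]", "amyloid PET + MRI", 0.9234),
    ("[63]", "SNPs (200)", 0.62),
    ("[17]", "SNPs (20 & 50)", 0.68),
    ("[18]", "SNPs (41)", 0.6807),
    ("[19]", "SNPs (20)", 0.689),
]


def scores(table, snps):
    """Scores of the SNPs rows (snps=True) or of the imaging rows (snps=False)."""
    return [s for _, dtype, s in table if dtype.startswith("SNPs") == snps]


ml_avg_auc = round(mean(s for _, _, s in TABLE_1), 2)
dl_imaging_avg_acc = round(100 * mean(scores(TABLE_2, snps=False)), 2)
dl_snps_avg_auc = round(mean(scores(TABLE_2, snps=True)), 2)

print(ml_avg_auc, dl_imaging_avg_acc, dl_snps_avg_auc)
```

Running it prints 0.82 93.74 0.67, which matches the averages stated in the text.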

CONCLUSION
This survey explored recent approaches that employ machine learning and deep learning algorithms to predict Alzheimer's disease (AD) early and contribute to its therapeutic development. These approaches were categorized in terms of the learning technique and data modality used. In addition, they were discussed from different aspects, and their strengths, limitations and outcomes were compared. Despite the great diversity of these approaches, almost all of them have endeavoured to offer the best model that could efficiently exploit the medical dataset and successfully diagnose the disease. Nevertheless, some types of biomarkers, such as genetic biomarkers, were highly variable and complex. This kind of data could therefore dictate the type of algorithms used and the complexity of the proposed model. Most deep learning (DL) based approaches using genetic variants data tended to merge them with other modalities to improve prediction accuracy, and this combination increased their complexity. On the other hand, the DL-based approaches that used only genetic variants data did not achieve high accuracy. Improving the prediction accuracy for AD using deep learning techniques with genetic variants data therefore remains challenging. In the near future, we will propose a deep learning model for predicting AD using genetic variants data. We will also offer a pre-processing pipeline that seeks to reduce the complexity of these data and improve prediction precision. This survey can serve as a coherent and informative reference for researchers without a solid background in the latest AI technologies used for early AD diagnosis.