A Perspective: Use of Machine Learning Models to Predict the Risk of Multimorbidity

Machine Learning (ML) is a common Artificial Intelligence (AI) method. ML offers the opportunity to develop better data mining techniques for analysing complex clinical interactions involving a large number of variables. ML models should provide “real-time” clinical support that reduces clinical risk to patients, with model-agnostic interpretation that helps clinicians arrive at a more specific clinical decision. Whilst ML algorithms are still the relatively “new kid on the block” in healthcare practice, they have shown promising results in predicting outcomes or risks in a variety of diseases such as depressive disorder, Type 2 diabetes mellitus, postoperative complications and cardiovascular diseases. However, patients suffering from a chronic condition are likely to have more than one condition requiring simultaneous attention and care. Therefore, a risk assessment model developed using ML methods would, in theory, be suitable for evaluating multimorbid populations. While there are many AI/ML algorithms and methods for building such a risk assessment tool, an optimal ‘fit-for-purpose’ model is chosen by comparing and contrasting many possible alternatives. Further, given the high-stakes decisions associated with health, it is also important that the model is interpretable and explainable to the clinicians who are expected to use it as their decision support system. In this paper, we provide a perspective on the current landscape of multimorbidity treatment, the potential benefit of employing AI/ML to enhance holistic care of multimorbid patients, and the associated challenges and concerns that need to be addressed as we make progress in this direction.

f) Vulnerable due to poor health, cognitive impairment, advancing age, and comorbidities such as depression or anxiety.
g) Limited health literacy.
Multimorbidity has been on the rise in recent decades, and the WHO attributes this to the increase in overall life expectancy. People are living longer, and those with long-term diseases are more likely to suffer from multiple conditions than from a single condition. The burden of multimorbidity varies drastically with population growth, genetic susceptibility and age. For example, young adults aged 18-39 years were grouped in a cluster with a high prevalence of anxiety and depression, whereas older adults were grouped in a cluster with polypharmacy, heart failure, PAD, osteoporosis, atrial fibrillation, CHD, CKD, stroke/TIA, and dementia as the most common co-occurring conditions [8]. This is further complicated by the growing number of people with or susceptible to conditions such as Human Immunodeficiency Virus/Acquired Immunodeficiency Syndrome (HIV/AIDS), diabetes and cancer. Gender is also a determinant of multimorbidity, with men being more prone to cardiovascular and metabolic diseases and women to psychogeriatric conditions. The WHO reports that the prevalence of multimorbidity is higher among disadvantaged populations, attributing this to health inequalities. A study conducted in Scotland by Barnett, et al. showed that those living in the most deprived areas are likely to develop multimorbid conditions around 10 to 15 years earlier than those in the least deprived areas [9]. With an increased childhood survival rate, 86% of the world's adolescent population resides in low- and middle-income countries [1]. So, there is a pressing need for an efficient and effective healthcare system that can deliver holistic care to people from all sections of society, while ensuring that individual differences in age, gender, ethnicity, location, social status, etc., are leveraged to make the delivery of healthcare optimal rather than biased and prejudiced.

Multimorbidity Diagnoses and Treatment
Despite the challenges of managing multimorbidity clinically, the research landscape over the past two decades shows a stepwise increase in multimorbidity studies. This could be due to multiple factors, including growth in the ageing population and awareness of the high prevalence of patients with multiple chronic conditions. The increased appreciation may also be due to the dissemination of information on patient-reported outcomes by advocacy groups, as well as awareness campaigns on social media. There is now more patient-reported data demonstrating the presence of comorbidity and multimorbidity as well as longitudinal outcomes of chronic conditions [1]. However, there remains a knowledge and practice gap in understanding disease sequelae, aetiology and causation, and in the subsequent evidence synthesis. Current clinical guidelines to address these are "not fit for purpose". Interestingly, Vitry, et al. evaluated 17 Australian regulatory-approved guidelines and found that half of them addressed treatments for patients with only one comorbid condition and only one addressed multimorbid conditions [10]. In the USA, Boyd and colleagues showed that following Clinical Practice Guidelines (CPGs) for individual conditions in a patient with multimorbid conditions may have undesirable effects [11]. In an attempt to summarize the existing indices available to measure multimorbidity in a general sense, Stirland, et al. conducted a systematic review and found 35 original articles, each introducing a new multimorbidity index with differing components and outcomes [12]. The review prompted the authors to call for clinicians and researchers to examine current indices for suitability before introducing new ones. Primary care clinicians may face more challenges than acute care clinicians in dealing with these uncertainties and in managing adherence to polypharmacy-based regimens.
The limited availability of Randomised Clinical Trials (RCTs), small sample sizes and short-term trials without longitudinal data could further reduce the validity of most guidelines. Duncan, et al. developed the Multimorbidity Treatment Burden Questionnaire (MTBQ), a list of concise, simply worded questions with which health service researchers can measure the impact of their interventions on multimorbid patients [13]. A large RCT was conducted in England and Scotland by Salisbury, et al. to validate the MTBQ and to examine the impact of a patient-centered intervention for patients with multimorbidity compared with usual care delivered independently for their individual conditions [14]. The intervention showed no additional improvement in patients' health-related quality of life as measured in terms of mobility, self-care, pain, discomfort, anxiety and depression. However, patients who received the intervention experienced a higher quality of care. So, RCT data alone are insufficient for developing clinical guidelines, as evidence-based medicine (EBM) requires treatments to have "real-world" applicability.

Machine Learning and Precision Medicine
Individuals differ in disease risk and treatment response based on multiple factors, including their internal biological characteristics. Precision medicine is a medical model that personalises clinical applications based on evidence generated by genomics and other data-driven analytical methods. It is positioned as a disruptive method to promote individualisation of care [15]. An important facet of precision medicine is the use of evidence-based medicine (EBM) principles so that the treatment, whether preventive or therapeutic, is optimal. This enables clinicians to provide tailored interventions that are uniquely suited to each individual, rather than follow a generic guideline developed for a disease. But this requires all data to be analysed and interpreted in an automated, precise and adaptive manner that produces a tangible, patient-specific clinical outcome [16]. Misdiagnosis is not common but remains a possibility, with studies citing 10% to 15% of clinical diagnoses as false negatives [17]. The misdiagnosis rate varies across diseases, ranging from 2.2% for myocardial infarction to 62.1% for spinal abscess according to one study [18]. This study also reported that 53.9% of the patients who had a misdiagnosis died or were permanently disabled because of the error. Artificial Intelligence (AI) provides a potential solution as an informatics technique that could outperform classical statistical analyses in handling the large volume of healthcare data available in a variety of forms [19]. Many Machine Learning (ML) techniques have been shown to improve disease classification and risk prediction. If applied at scale, with issues around feasibility, validity, fidelity, data security, interpretability, etc., appropriately addressed, clinicians around the world could use these tools as a diagnostic, prognostic and/or treatment aid by automating the identification of morbid diseases. ML uses large datasets to detect patterns, inferences and correlations through iterative processing or a system-based algorithm. Algorithms built and trained in this way on large, representative data should provide validated predictions on any subsequent dataset with similar characteristics, with high accuracy and less susceptibility to human cognitive biases or errors.

Machine Learning and Risk Prediction
ML could therefore serve as a form of aggressive risk factor management to improve outcomes. Precision medicine in combination with ML has shown early success in some areas such as oncology and rheumatology [20,21]. Genetic profiling used to personalise chemotherapy regimens is based on the mutation status of the patient's cancer, thereby improving outcomes in comparison with traditional regimen allocation [22]. In another study, Piccialli, et al. compared multiple ML methods for early detection of Celiac Disease (CD) in those who show limited clinical symptoms but possess a genetic predisposition [23]. Uddin, et al. urge the medical community to explore the possible application of AI for precision medicine in neurodevelopmental disorders, where progress is lacking [24]. In a large study of 3.4 million patients in the USA conducted by Lip, et al., stroke risk prediction was compared across three assessment choices: a common clinical risk score, a clinical multimorbidity index and ML-based algorithms [25]. The study demonstrated that the ML-based gradient boosting/neural network logistic regression formulation provided the best discriminant values among the three assessment tools. In another study by Lip, et al., an ML-based algorithm demonstrated improved discriminatory validity in the prediction of new-onset Atrial Fibrillation (AF) amongst incident COVID-19 cases compared with a statistical main effect model [26]. The study also indicated that, in decision curve analysis, the ML-based formulation performed better than both the 'treat-all' strategy and the main effect model. Personalised therapeutic strategies combined with precision medicine efforts using AI could refine the diagnostic and predictive capabilities for any disease [27]. This could be enhanced in the presence of a number of clinical features, as ML techniques are able to assess disease behavioural patterns, comorbidities and individual biologics. Majnarić, et al. propose an interactive research framework using AI and advanced big data analytics to tackle multimorbidity [28]. ML models could be developed with the influx of "real-world" patient data residing within Electronic Healthcare Record (EHR) systems and research studies, both retrospective and prospective. This pool of information could become a meta-dataset used to adapt and refine the predictive capabilities of the algorithms.
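As a rough illustration of the decision curve analysis mentioned above, the following sketch computes net benefit at a chosen threshold probability and compares a model against the 'treat-all' strategy. It assumes the usual net benefit definition (true positives minus false positives weighted by the threshold odds, per patient). The function names, outcomes and predicted risks are hypothetical and are not taken from the cited studies.

```python
def net_benefit(y_true, y_risk, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    # Patients flagged for treatment who did / did not have the event.
    tp = sum(1 for y, r in zip(y_true, y_risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, y_risk) if r >= threshold and y == 0)
    # The threshold odds express how many false positives one true positive is worth.
    odds = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated regardless of predicted risk."""
    return net_benefit(y_true, [1.0] * len(y_true), threshold)


# Hypothetical outcomes (1 = event) and predicted risks, for illustration only.
outcomes = [1, 1, 1, 0, 0, 0, 0, 1]
risks = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

for pt in (0.2, 0.5):
    print(pt, net_benefit(outcomes, risks, pt), net_benefit_treat_all(outcomes, pt))
```

Plotting net benefit across a range of thresholds for each strategy gives the decision curve; a model is preferable over the range of thresholds where its curve lies above the alternatives.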
The development of risk assessment models may require the use of several ML algorithms, including Classification and Regression Tree (CART), Logistic Regression (LR), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Random Forest (RF), Artificial Neural Networks (ANN), etc., to handle the complexities of the clinical variables. Hassaine and colleagues provide an overview of recent advances in the use of various ML methods to understand evolving patterns of multimorbidity [29]. Model performance is assessed using one or more evaluation metrics such as accuracy, sensitivity, specificity, the Area Under the Receiver Operating Characteristic (AUROC) curve, the area under the precision-recall curve, etc. Feature engineering techniques are used to select and transform input features so that the model produces the best possible result. Explanation techniques such as the Shapley additive explanations method help evaluate the significance of input features in model prediction. This helps ensure that the importance of variables in the predicted output aligns with the expectations of subject matter experts in the domain. Models whose performance plateaus or deteriorates should be excluded, so that the most competent and robust model is used to gain insight into the risk factors without an a priori assumption of causality.
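To make the evaluation metrics named above concrete, the following minimal sketch computes sensitivity, specificity and AUROC for a binary risk model using only the Python standard library. AUROC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels, scores and the 0.5 threshold are made up for illustration; in practice a library implementation would normally be used.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, y_score):
    """AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = disease present) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

print(sensitivity_specificity(labels, scores))
print(auroc(labels, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarises discrimination across all thresholds, which is one reason several metrics are usually reported together.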
An interesting example where ML risk prediction models have demonstrated usefulness is Type 2 diabetes mellitus (T2DM), a common chronic disease reported by the WHO to have a projected disease burden of approximately 350 million people by 2030. A systematic review conducted in 2011 by Collins, et al. identified 43 different ML prediction models across 39 published studies, but concluded that the widespread use of poor methods calls into question the reliability and usefulness of such models [30]. Fortunately, a few years down the line, there are reports of ML-based frameworks that perform better than state-of-the-art algorithms in predicting the risk of developing T2DM [28][29][30]. Dalakleidi and colleagues demonstrated that their model achieved its best performance with ensembles of artificial neural networks, whilst Zheng, et al. achieved the best accuracy measures with Decision Trees, the Random Forest algorithm and Support Vector Machines [31,32]. Zhang and colleagues used a sample of ~36k from a rural Chinese area and found that a Gradient Boosting Machine (GBM) based prediction model gave the best performance of the six ML algorithms compared within the study [33].

ML Interpretability
ML interpretability is often cumbersome, and data scientists and clinicians continue to debate the applicability of ML models within clinical practice and healthcare. Despite the widespread adoption of ML algorithms and tools in other industries and a significant pool of research papers on the use of ML in healthcare, meaningful application development has faced various challenges [34]. Therefore, it is important to explore and provide clarity for atypical specialists across healthcare by demonstrating evidence of the effectiveness and reliability of ML applications.
The success of ML models is predicated on the quality of the data used to build them. Understanding the features of the data that are relevant to a model is equally important, as it is these features that allow models to provide accurate predictions. Raw data initially gathered to train an ML model are not necessarily used as is. During the testing and validation phase, features could be added or deleted using data transformation techniques to improve model performance. Testing feature performance is another vital component in ensuring that the model is able to support evidence-based decisions (EBD). It is because of pre-processing steps such as data transformation and feature engineering that ML models are perceived as black boxes, despite their widespread adoption. For models built using the Python programming language, for example, basic statistical modules such as statsmodels and scikit-learn also provide methods to measure feature importance. In addition, novel explanation techniques such as Local Interpretable Model-agnostic Explanations (LIME) [35], SHapley Additive exPlanations (SHAP) [36], Quantitative Input Influence (QII) [37], etc., are being designed to provide easily interpretable insights about the models and to describe the importance of observations, which in the case of multimorbidity would be symptoms or even characteristics of comorbidities. Feature importance can be estimated globally, as relevant to the model as a whole, or locally, for each specific record. SHAP differs from the scikit-learn measures, which assess global feature importance, because the contribution of a specific feature may not be uniform across all data points. This becomes a pivotal point for patients with endometriosis who present with chronic pain. Whilst this is a simple example of a specific use of an ML model component, development of these methods could take significant time.
Therefore, the developmental landscape of ML models would require multidisciplinary teams that could provide input on each component of the diseases explored, together with simultaneous testing and validation using various methods, including clinical trials.
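The distinction between local and global feature importance can be sketched with exact Shapley values on a toy model. The example below is only an illustration under stated assumptions: the risk function, its coefficients and the three binary features are hypothetical, and each patient is compared against a single all-zero baseline, which is a simplification of what Shapley-based explanation libraries do when they average over background data.

```python
from itertools import combinations
from math import factorial

FEATURES = ("age_over_65", "diabetes", "chronic_pain")
BASELINE = {"age_over_65": 0, "diabetes": 0, "chronic_pain": 0}


def toy_risk(x):
    """Hypothetical risk score with an age-diabetes interaction; not a clinical model."""
    return (0.10 + 0.20 * x["age_over_65"] + 0.15 * x["diabetes"]
            + 0.05 * x["chronic_pain"] + 0.30 * x["age_over_65"] * x["diabetes"])


def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for one record (local importance)."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                # Features in the coalition take the patient's value, the rest the baseline.
                without_f = {g: (x[g] if g in subset else baseline[g]) for g in FEATURES}
                with_f = dict(without_f)
                with_f[f] = x[f]
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


def global_importance(model, cohort, baseline):
    """Mean absolute Shapley value per feature across a cohort (global importance)."""
    totals = {f: 0.0 for f in FEATURES}
    for x in cohort:
        for f, value in shapley_values(model, x, baseline).items():
            totals[f] += abs(value)
    return {f: t / len(cohort) for f, t in totals.items()}


patient_a = {"age_over_65": 1, "diabetes": 1, "chronic_pain": 1}
patient_b = {"age_over_65": 0, "diabetes": 1, "chronic_pain": 1}

print(shapley_values(toy_risk, patient_a, BASELINE))
print(shapley_values(toy_risk, patient_b, BASELINE))
print(global_importance(toy_risk, [patient_a, patient_b], BASELINE))
```

In this toy setting, diabetes contributes more to the score of the older patient than to that of the younger one because of the interaction term, which a single global importance figure would conceal.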
Explanation techniques that describe feature importance are model-agnostic, which makes them flexible and allows developers to compare and contrast different models. Additionally, some AI/ML models such as linear regression, logistic regression, decision trees, Naive Bayes, etc., have an easily interpretable structure that can be leveraged to provide additional insights about the model. Though developers might be tempted to use only interpretable models, they might lose out on the predictive performance of other, more advanced models that are not easily interpretable. If it is possible to explain what an algorithm is doing and which aspects of the input data it is processing to arrive at the output, this will boost the confidence of those using such algorithms in clinical practice. However, not all AI models are easily interpretable, and advanced algorithms that are less interpretable often provide more accurate results. It is also common to combine several individual algorithms to create a more robust ensemble model. One such example is the Gradient Boosting Machine (GBM), where the primary aim is to train a series of weak classifiers on the same training data and combine them into a stronger final classifier. Through a series of iterations, the classification results can be optimised; thus, GBM is less prone to overfitting in comparison to most learning algorithms. The consequences of trading off accuracy on sample data against the interpretability of a model need to be further evaluated. Rudin and colleagues proposed that ML scientists should be constrained to developing inherently interpretable models for high-stakes problems such as those in healthcare or criminal justice [38].
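The iterative idea behind gradient boosting can be sketched in a few lines. The example below is a deliberate simplification and not the formulation used in any of the cited studies: it uses a single feature, squared-error loss and one-split regression "stumps" as weak learners instead of classifiers, and the ages and condition counts are invented. Each round fits a stump to the current residuals and adds a damped version of it to the ensemble.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of xs, predicting the mean residual on each side."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean outcome, then repeatedly fit stumps to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        thr, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((thr, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= thr else right_mean)
                 for x, p in zip(xs, preds)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}


def predict(model, x):
    """Sum the baseline and the damped contribution of every stump."""
    total = model["base"]
    for thr, left_mean, right_mean in model["stumps"]:
        total += model["learning_rate"] * (left_mean if x <= thr else right_mean)
    return total


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical data: age and number of chronic conditions.
ages = [30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
conditions = [0, 0, 1, 0, 1, 1, 2, 2, 3, 3, 4]

for rounds in (0, 5, 50):
    print(rounds, round(mse(fit_gbm(ages, conditions, n_rounds=rounds), ages, conditions), 4))
```

The training error shrinks as rounds are added, which also shows why the number of rounds and the learning rate need to be tuned on held-out data rather than on the training sample alone.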
Equally, interpretability of ML systems is vital in clinical consultations, where the use of such methods is discussed with patients as part of any diagnosis and treatment. For clinicians, ML algorithms would serve to augment their EBD process. Multimorbid conditions will involve multidisciplinary clinicians, including those from image-intensive disciplines. Whilst AI may not replace clinicians, there is a real need for clinicians to become well versed in the design and validation of the AI applications they would use, so that they understand their applicability to patients. Scott and colleagues have developed an interesting checklist to refine the use of AI applications in the clinical domain in the first instance, before wide-scale rollout [39]. The checklist focuses on 10 specific questions, each standing on its own merit, although further refinement would be required for it to become useful in the multimorbidity context. Additionally, this questionnaire requires large-scale validation across various specialist areas, with the possibility that adaptations will be needed under specific healthcare frameworks. Nonetheless, this type of open-source method is a significant and positive step towards better integrating ML models into clinical practice.

Machine Learning Disease Risk Prediction and Ethics
AI and ethics often share a cumbersome dilemma. Transparency is a significant issue wherever algorithmic decisions are used in healthcare. Thimbleby and colleagues advocated for open-source software associated with ML applications, allowing reproducibility of the programming methods developed [40]. However, publishing such details could itself give rise to further ethical issues, whereby a developer's code becomes a subject of public scrutiny and work that could otherwise lead to a patent or licensable intellectual property is marginalised. de Laat expresses similar sentiments, arguing against full transparency on the basis of additional factors such as the divulgence of sensitive data to the public, the opacity of algorithms whose interpretation is cumbersome even for experts, and the misuse of this information, which could introduce further issues, such as inaccuracy, in systems that utilise algorithms [41]. On the other hand, Ananny and colleagues argued that full algorithmic transparency would still be inadequate to address the ethical dimensions associated with their modus operandi [42]. However, a social understanding of how algorithms are developed and how they interface with human beings could provide the necessary transparency and understanding to the non-AI scientific community. Raji and colleagues suggested an algorithm auditing process that developers could use to address some of the ethical issues [43]. A similar approach is already available, for example, in the UK, where national ethics committees independently review the algorithm whilst obtaining advice from the MHRA for those algorithms that fall within the medical devices category [44]. The MHRA also requires such algorithms to be UKCA marked, which involves a conformity assessment process similar to that used to obtain CE marking. Additionally, the UK has a rigorous data code of practice to protect the use of patient data in developing AI applications, which includes additional guidelines for developing ethical algorithms [45]. The authors of this paper also argue that by including a multidisciplinary team when developing AI applications, ethical complications associated with patients could be addressed early on, in addition to ensuring that algorithms are tested in clinical trials.
Using algorithms on electronic healthcare record-based data alone would not give patients or clinicians confidence in the interpretability or relevance of the algorithms, or in how ethical issues have been addressed.

Conclusion
With growing research in the field of precision medicine and promising results from the adoption of AI/ML models in healthcare in general and precision medicine in particular, there is a unique window of opportunity to enhance healthcare and enable clinicians to provide holistic, personalized treatments to patients with multimorbidity. Such a system, if implemented at scale, could reduce the burden on clinicians while improving the quality of care for patients, particularly those suffering from multimorbid conditions. This noble vision of using AI is not short of hurdles, including feasibility. Testing AI algorithms and tools in clinical trials, together with the involvement of patients, clinicians and scientists, would ensure that the applications developed are ethically, clinically and scientifically sound. Hence, there is all the more reason for collaborators from the medical and AI communities to work together from the initial developmental phase to ensure smooth adoption and avoid costly mistakes.