Introduction: The world has witnessed a surge of COVID-19 cases since the first case was reported in December 2019, and despite the large number of cases seen across the world, there are still many gaps in the understanding of the course of the disease in different people. Several scoring systems and early warning signs have been developed to prognosticate the disease process. Clustering the patients into specific clinical phenotypes is one such strategy. In this study, we clustered COVID-19 patients into phenotypes using different variables and studied the outcome based on this classification. Aim and Objectives: To derive clinical phenotypes based on demographic, clinical, and laboratory data of COVID-19 patients and to assess the efficiency of the phenotypes as a model for predicting the course of disease. Materials and Methods: A retrospective cohort study was conducted on COVID-19 patients admitted to a tertiary care hospital in South India between July 2020 and October 2020. Nine hundred and eighty-seven subjects fulfilling the inclusion criteria were enrolled. Results: Three clinical phenotypes were derived using 43 independent variables, which included epidemiological data, symptoms, comorbidities, and laboratory values. The 987 patients studied were clustered into three phenotypes named A, B, and C, with 379 patients in phenotype A, 313 in phenotype B, and 295 in phenotype C. Males predominated in phenotypes C and B, with 218 (73.9%) and 204 (65.2%) male patients, respectively. Mild disease was predominant in phenotype A (89.2%), followed by moderate (10.3%) and severe (0.5%) COVID-19 disease. In phenotype B, 93.3% of patients had mild disease and the remaining 21 (6.7%) had moderate disease. In phenotype C, 177 (60%) patients had severe COVID-19 disease. Mortality was seen only in phenotype C (23.1%). Conclusions: It can be inferred that among the phenotypes, phenotype C was the hyperinflammatory group.
The independent predictive association of each variable, such as age, male gender, and comorbidity, is an important factor in determining the outcome. However, because these variables are distributed differently in each patient, it is not possible to consider each of them independently and deduce the outcome; hence, phenotypes that cluster patients based on all these variables are associated with predictable outcomes. The phenotypes can thus be used as a tool to aid in the clinical management of COVID-19.
Keywords: Clinical phenotypes, COVID-19, severe COVID illness
How to cite this URL:
Krishnamurthy V, Suhail K M, Raj MP, Basu E, Aslam S S, Kumar S. Study of clinical phenotypes and its outcomes in patients of COVID-19 in a tertiary care hospital. APIK J Int Med [Epub ahead of print] [cited 2022 Oct 6]. Available from: https://www.ajim.in/preprintarticle.asp?id=347194
Introduction
COVID-19 presents with a myriad of clinical features, and despite the large number of cases seen across the world, each solution in its management brings an equal number of challenges. There are still large gaps in the understanding of the course of the disease and the way it manifests in different people. Clinicians predict whether a patient will improve or deteriorate using specific variables. Although this approach has largely been helpful in managing patients, predictions based on these individual variables remain unreliable. Several scoring systems, early warning signs, and predictive models have been developed to address this problem. Clustering patients into specific clinical phenotypes is one such strategy to predict outcomes. Few studies on this subject are currently available in the literature, especially from our country. The main objective of this study was to derive clinical phenotypes by clustering patients using demographic, clinical, and laboratory data available at admission and to assess the efficiency of the phenotypes as a model for predicting the course of disease.
Materials and Methods
This was a retrospective cohort study of COVID-19-positive patients. A total of 987 subjects were included. Patients aged 18 years or older who were admitted to a tertiary care medical college hospital and diagnosed with COVID-19 (positive reverse transcription-polymerase chain reaction test on a swab) were included as cases. The study cohort was selected by systematic random sampling from the list of patients fulfilling the inclusion criteria who were admitted to the medical college hospital between July 2020 and October 2020.
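Systematic random sampling picks a random starting point and then every k-th record from an ordered list. The sketch below assumes the eligible admissions are available as an ordered Python list; the frame size (2,500 here) and the helper name `systematic_random_sample` are hypothetical, since the text reports neither the size of the sampling frame nor the sampling interval. It uses a fractional interval N/n so that exactly n records are drawn and the whole list is spanned.

```python
import random


def systematic_random_sample(population, n, seed=None):
    """Draw exactly n records: random start, then a fixed interval k = N / n.

    A fractional interval is used so the sample spans the whole list even
    when N is not a multiple of n.
    """
    size = len(population)
    if not 0 < n <= size:
        raise ValueError("n must be between 1 and the population size")
    k = size / n                                  # sampling interval (may be fractional)
    start = random.Random(seed).random() * k      # random start within the first interval
    # int() floors each position; min() guards against float round-off at the end
    return [population[min(int(start + i * k), size - 1)] for i in range(n)]


# Hypothetical sampling frame: the real number of eligible admissions is not reported.
eligible = [f"patient_{i:04d}" for i in range(2500)]
cohort = systematic_random_sample(eligible, 987, seed=1)
print(len(cohort), cohort[:3])
```

Because the interval is at least 1, successive positions never repeat, so the sample contains no duplicates.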
Patients' data were retrieved from the medical records department of the hospital. Basic demographic details such as name, age, sex, and address were noted. Symptoms, oxygen saturation (SpO2) at admission, and existing comorbidities were then recorded, followed by the levels of various laboratory markers: hematological markers (complete blood count and neutrophil-lymphocyte ratio [NLR]), inflammatory markers (lactate dehydrogenase [LDH], C-reactive protein [CRP], D-dimer, and ferritin), liver function tests (LFT: serum albumin, aspartate transaminase [AST], and alanine transaminase), and renal function tests (RFT: serum creatinine, serum uric acid, and blood urea nitrogen [BUN]). Outcomes were recorded as disease severity according to the Ministry of Health and Family Welfare (MOHFW) guidelines, and mortality among the cases was recorded. The extracted data were tabulated in an MS Excel worksheet and analyzed using PASW Statistics for Windows, version 18.0 (SPSS Inc., Chicago, Illinois, USA, 2009).
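The NLR listed above is a simple ratio. A minimal sketch, assuming the ratio is formed from the neutrophil and lymphocyte values of the same differential count; the text does not say whether absolute counts or percentages were used, but both give the same ratio. The function name and the sample values are hypothetical.

```python
def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """Return NLR = neutrophils / lymphocytes.

    Both values must share a unit: absolute counts (cells/cu mm) or
    percentages of the same total WBC count.
    """
    if lymphocytes <= 0:
        raise ValueError("lymphocyte value must be positive")
    return neutrophils / lymphocytes


# Hypothetical differential count: WBC 8000 cells/cu mm, 75% neutrophils, 15% lymphocytes.
wbc = 8000
nlr_from_counts = neutrophil_lymphocyte_ratio(0.75 * wbc, 0.15 * wbc)
nlr_from_percent = neutrophil_lymphocyte_ratio(75, 15)
print(nlr_from_counts, nlr_from_percent)
```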
This was a substudy; the cohort was drawn from a larger study on the clinical and laboratory profile and outcomes of COVID-19 patients at this tertiary care center, for which institutional ethics committee approval was obtained.
Sample size with justification
A study by Gutiérrez-Gutiérrez et al. demonstrated the usefulness and reproducibility of clinical phenotypes. Based on the findings of that study, with 95% confidence and 2.78% precision, it was estimated that a minimum of 951 subjects had to be recruited.
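The text reports 95% confidence and 2.78% precision but not the formula or the anticipated proportion taken from the study by Gutiérrez-Gutiérrez et al. Assuming the common single-proportion formula n = z²·p(1 − p)/d², with d as absolute precision, a sketch follows. The value of p passed in is a placeholder, not the authors' input, so the printed n is not expected to equal 951.

```python
import math
from statistics import NormalDist


def sample_size_single_proportion(p, precision, confidence=0.95):
    """Minimum n = z^2 * p * (1 - p) / d^2, rounded up.

    p         -- anticipated proportion (an assumption: not reported in the text)
    precision -- absolute precision d as a fraction (2.78% -> 0.0278)
    """
    if not 0 < p < 1 or precision <= 0:
        raise ValueError("need 0 < p < 1 and precision > 0")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value, ~1.96
    return math.ceil(z ** 2 * p * (1 - p) / precision ** 2)


# p = 0.5 is only a placeholder (the most conservative choice), not the authors' value.
print(sample_size_single_proportion(p=0.5, precision=0.0278))
```

With p = 0.5 and d = 0.05 the function returns the familiar textbook value of 385.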
All quantitative variables, such as age and laboratory values, were summarized as descriptive statistics (mean, and median with interquartile range). All categorical variables, such as symptoms and comorbidities, were expressed as percentages.
In this study, we analyzed all available data on epidemiology, symptoms, comorbidities, basic examination findings, and laboratory values collected at the time of admission.
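The descriptive statistics described above can be sketched with Python's standard `statistics` module. The helper names and the eight-patient data below are hypothetical. Quartile definitions differ between software packages; `quantiles()` defaults to the "exclusive" (n + 1)p convention, which may not match the convention used in the study's SPSS output.

```python
from statistics import mean, median, quantiles


def summarize_continuous(values):
    """Mean, median and interquartile range (Q1, Q3) of a continuous variable.

    quantiles() defaults to the 'exclusive' method, i.e. the (n + 1)p
    convention; other software may define quartiles differently.
    """
    q1, _, q3 = quantiles(values, n=4)
    return {"mean": mean(values), "median": median(values), "iqr": (q1, q3)}


def percentage(flags):
    """Percentage of patients with a yes/True value for a categorical variable."""
    return 100 * sum(flags) / len(flags)


# Hypothetical data for eight patients
ages = [34, 45, 52, 61, 67, 29, 73, 58]
diabetes = [True, False, True, True, False, False, True, False]
age_summary = summarize_continuous(ages)
print(age_summary, percentage(diabetes))
```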
A dissimilarity matrix was constructed for continuous variables using Pearson correlation, and for categorical variables using the chi-square test. Based on the dissimilarity matrix, variables with a high degree of correlation were excluded. A total of 43 variables, consisting of demographic details, symptoms, comorbidities, and laboratory values, were then included to create the clusters.
Next, a two-step cluster analysis was performed using these variables to derive the optimal number of clusters, which was found to be three. These three clusters represent the three clinical phenotypes in this cohort. To assess the quality of phenotype derivation, silhouette analysis was performed.
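The SPSS two-step procedure itself is not reproduced here. The silhouette check, however, is a standard measure: for each patient it compares the mean distance to their own cluster (a) with the mean distance to the nearest other cluster (b). The sketch below assumes Euclidean distance on numeric, already-scaled features, which may differ from the distance the study's software used; the four points are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_scores(points, labels):
    """Silhouette s(i) = (b - a) / max(a, b) for each point (Euclidean distance).

    a: mean distance to the other members of the point's own cluster
    b: lowest mean distance to the members of any other cluster
    Points in single-member clusters get s(i) = 0 by convention.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points; values near 1 mean well-separated clusters."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Hypothetical standardized features for four patients
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = mean_silhouette(points, ["A", "A", "B", "B"])
poor = mean_silhouette(points, ["A", "B", "A", "B"])
print(round(good, 3), round(poor, 3))
```

A sensible clustering scores close to 1, while the deliberately mixed labeling scores below 0.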
The data of the three phenotypes were then compared. Categorical variables were compared using tests of significance, and Student's t-test was used to compare continuous variables.
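Student's t-test compares the means of two groups using a pooled variance. Because there are three phenotypes and the text does not say which pairs were compared, this sketch assumes a pairwise comparison. The CRP values are hypothetical, and the function returns only the t statistic and degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups.
    """
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2


# Hypothetical CRP values (mg/dL) for two phenotypes
crp_a = [1, 2, 3, 4, 5]
crp_b = [2, 3, 4, 5, 6]
t, df = student_t(crp_a, crp_b)
print(t, df)
```

To obtain a P value, `scipy.stats.ttest_ind(x, y)` performs the same pooled-variance test by default (`equal_var=True`).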
Results
In the cohort of 987 cases, 659 (66.8%) patients were <60 years of age and the remaining 328 (33.2%) were aged above 60 years. Six hundred and thirty-six (64.4%) were males and 351 (35.6%) were females [Table 1].
Among the cases, 625 (63.3%) had mild disease, 188 (19.1%) had moderate disease, and 174 (17.6%) had severe disease [Table 1]. Nine hundred and nineteen patients (93.1%) recovered from COVID-19; 68 patients (6.9%) died in hospital.
After the patients were assigned to phenotypes as described above, 379 patients were in phenotype A, 313 in phenotype B, and 295 in phenotype C.
Phenotype A consisted predominantly of younger patients (91.5% aged <60 years), with a male-to-female distribution of 54.8%:45.2%. Most cases were of mild severity (89.2%), and the recovery rate in phenotype A was 100%. The mean SpO2 (96.88%) and the mean values of most other laboratory parameters were within normal limits in phenotype A. However, increased levels of D-dimer (1.3 ng/mL), CRP (3 mg/dL), and IL-6 (59.65 pg/mL) and a mildly elevated NLR (3.18) were observed. Although the mean values of LDH (269.72 IU/L) and ferritin (233.14 ng/mL) were within normal limits, they were on the higher side of normal. None of the patients in phenotype A had respiratory, endocrine, cardiovascular, renal, or neurological comorbidities [Table 2].
Phenotype B had a roughly equal distribution between age groups (age <60:age >60, 52.7%:47.3%) but had nearly twice as many males as females (65.2%:34.8%). Most cases were of mild severity (93.3%), and the recovery rate was 100%. The mean SpO2 was within normal limits (97.44%). Mild elevation of the mean levels of LDH (285.43 IU/L) and ferritin (276.31 ng/mL) was seen in phenotype B, along with increased levels of D-dimer (1.31 ng/mL), CRP (3.95 mg/dL), and IL-6 (59.29 pg/mL). Serum creatinine (1.1 mg/dL) was on the higher side of normal. The majority of patients in phenotype B had known endocrine (225 patients [71.9%]) or cardiovascular (178 patients [56.9%]) comorbidities [Table 2].
Phenotype C had a comparable age distribution (age <60:age >60, 49.8%:50.2%) and consisted predominantly of males (73.9%). Sixty percent of the patients in phenotype C had severe COVID-19 disease. The mortality rate was 23.1% in phenotype C, in contrast to 0% in phenotypes A and B. Mean SpO2 was approximately 86%. A high NLR (10.6) and a low eosinophil count (0.54%) were also observed in phenotype C. The mean platelet count (2.38 lakh cells/cu mm), though not elevated beyond the normal range, was on the higher side. Positive acute phase reactants, namely CRP (12.77 mg/dL), interleukin (IL)-6 (60.73 pg/mL), LDH (446.29 IU/L), and serum ferritin (451.74 ng/mL), were elevated. Serum albumin (3.57 g/dL) was at the lower limit of normal. The most common comorbidities in phenotype C were endocrine (168 patients [56.9%]) and cardiovascular (120 patients [40.7%]) conditions, followed by chronic kidney disease (29 patients [9.8%]) [Table 2].
The variables of the three phenotypes were compared.
The distribution of elderly (>60 years) versus younger (<60 years) patients was compared across phenotypes. In the age group <60 years, there were 347 (91.5%) patients in phenotype A, 165 (52.7%) in phenotype B, and 147 (49.8%) in phenotype C; the rest were aged more than 60 years. The difference in age distribution was statistically significant (P ≤ 0.005) [Table 2].
Males predominated in phenotypes C and B, and the difference in sex distribution among phenotypes was statistically significant (P ≤ 0.005) [Table 2].
While a large proportion of patients in phenotypes A (338 patients [89.2%]) and B (292 patients [93.3%]) had mild COVID-19, the majority of patients in phenotype C had severe disease (177 patients [60%]). No patient in phenotype B had severe COVID-19 illness. This difference in severity was statistically significant (P < 0.005) [Table 2].
The mortality rates among the phenotypes (0% in phenotypes A and B vs. 23.1% in phenotype C) were compared, and the difference was statistically significant (P ≤ 0.005).
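The text does not name the test used for the mortality comparison. Assuming a Pearson chi-square test of independence, it can be sketched on a phenotype-by-outcome table. The counts below are derived from the reported figures: 68 in-hospital deaths in total, none in phenotypes A and B, so all 68 were in phenotype C (68/295 = 23.1%). For a 3 × 2 table there are 2 degrees of freedom, for which the chi-square upper-tail probability is exactly exp(−x/2).

```python
from math import exp


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n  # count expected under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Rows: phenotypes A, B, C; columns: died, survived (derived from the reported counts).
mortality = [[0, 379], [0, 313], [68, 227]]
chi2, df = chi_square_independence(mortality)
# With 2 degrees of freedom the chi-square survival function is exactly exp(-x / 2).
p_value = exp(-chi2 / 2)
print(f"chi2 = {chi2:.1f}, df = {df}, p = {p_value:.1e}")
```

All expected counts in this table are above 5, so the chi-square approximation is reasonable here.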
Oxygen saturation and multiple blood parameters were compared among the phenotypes, with the following results. Mean SpO2 was highest in phenotype B and lowest in phenotype C. Both WBC and neutrophil counts were highest in phenotype C and lowest in phenotype B. Lymphocytes (%), platelets, eosinophils (%), and serum albumin were highest in phenotype A and lowest in phenotype C. NLR, LDH, ferritin, CRP, AST, and serum creatinine were lowest in phenotype A and highest in phenotype C. Monocytes (%) were highest in phenotype B and lowest in phenotype C. D-dimer and IL-6 were highest in phenotype C and lowest in phenotype B. Of these, all laboratory parameters except monocytes (%), platelets, and serum albumin differed significantly among phenotypes (P < 0.005) [Table 2].
Comorbidities among the phenotypes were compared and tested for statistical significance. The prevalence of endocrine conditions, cardiovascular comorbidities, and chronic kidney disease differed significantly among phenotypes (P < 0.005) [Table 2].
Discussion
At present, several important parameters guide the triage stratification of patients into three groups (mild, moderate, and severe) as per the MOHFW guidelines, the most important of which is oxygen saturation. Despite this classification of patients coming to hospital, the in-hospital disease course and final outcomes often differ, and predicting them is difficult. This has been attributed to the large number of independent variables in each of these groups (demographic, clinical, and laboratory), which do not necessarily reflect the same stage of disease.
Clinical phenotyping, that is, clustering patients into particular groups, further refines the identification of patients who may have severe outcomes. A “clinical phenotype” is “a single or combination of disease attributes (phenotypic traits) which describe differences between individuals with the disease of interest as they relate to clinically meaningful outcomes.”
It is becoming increasingly popular to identify patient cohorts by trait for clinical research. Although patient data obtained at triage in hospitals are often incomplete, they convey the minimum required information for creating clinically important groups or clinical phenotypes that can help in management.
In this study, our cases were divided into three clinical phenotypes, modeled on the study by Gutiérrez-Gutiérrez et al., who identified three distinct phenotypes using multiple demographic, clinical, and laboratory variables, developed a simplified probabilistic model, and assigned patients to the phenotypes. We identified three phenotypes based on demographic, clinical, and laboratory variables recorded at admission, and patients were assigned to phenotypes as described above. Of the 987 cases studied, 379, 313, and 295 patients were in phenotypes A, B, and C, respectively.
In phenotype C, nearly 60% of patients were categorized as severe and mortality was 23.1%. In phenotypes A and B, only 10.3% and 6.7% of patients, respectively, had moderate disease (0.5% of phenotype A and none of phenotype B had severe disease), and no mortality was recorded in either group. Phenotypes A and B thus had more favorable outcomes than phenotype C, and it can be inferred that phenotype C was the hyperinflammatory group. These results are consistent with the study by Gutiérrez-Gutiérrez et al.
Phenotyping thus provided a useful model for understanding the severity of the disease and predicting the outcome of COVID-19 patients.
The mean values of certain inflammatory markers were significantly elevated in phenotype C, and these elevations were classified as deviations from normal according to international standards for reference intervals. Furthermore, the NLR, an established marker of COVID-19 severity, was significantly elevated in absolute value in phenotype C, the severe group. Mean NLR values showed only a mild elevation in phenotypes A and B, compared with a marked elevation in phenotype C.
Interestingly, although 47.3% of the subjects in phenotype B were elderly, only 21 patients (6.7%) had moderate disease and the remaining 93.3% had mild disease. There were no severe cases or deaths, implying a favorable outcome.
In contrast, phenotype C, which had a similar proportion of elderly patients (50.2%), had more patients in the severe category (60%) and high mortality (23.1%). A similar pattern was seen in sex distribution: the proportions of males were comparable in these two groups (65.2% in B vs. 73.9% in C), yet their outcomes differed markedly. Ye et al. identified three distinct clusters of COVID-19 patients and also showed that age alone could not be used to assess a patient's condition, and that immune function is more important.
In a retrospective analysis of patients with COVID-19, Lusczek et al. identified three clinical phenotypes reflecting adverse, moderate, and favorable outcomes. Patients in each phenotype presented with different comorbidities and developed different complications. In this regard, our study had a notable finding regarding comorbidities among the phenotypes: in phenotype B (the favorable outcome group), nearly 72% had endocrine comorbidities, whereas in phenotype C (the poor outcome group), 57% had endocrine comorbidities. However, phenotype C had significantly more patients with chronic kidney disease than phenotypes A or B.
A study by Palmieri et al. established a relation between the presence of comorbidity and mortality among COVID-19 patients in Italy regardless of age (younger or older than 65 years). This contrasts with our clinical phenotypes, in which age was a significant factor in determining the outcome of COVID-19 patients. A study by Raparelli et al., on the other hand, showed a significant relation between sex and clinical outcomes in COVID-19 patients, which is consistent with our study.
The phenotypes might reflect different profiles of pathogen-host interaction, which may result from differences in viral load as well as individual response. These observations emphasize that even though each variable, such as age, male gender, and comorbidity, has an independent predictive association with the outcome, the varied distribution of these and many other variables in each patient makes it impossible to consider every value separately and deduce the outcome. Hence, phenotypes that cluster patients based on all these variables are more consistently associated with predictable outcomes. They also reflect the host's innate and humoral responses coupled with the patient's genetic susceptibility.
One limitation of our study concerns symptomatology: symptoms were collected once, as recalled by the patient, rather than as a time series as in some studies, which may introduce bias.
In the future, segregating patients into phenotypes may give the treating team a clearer idea of the prognosis, course, and outcomes of patients. It could also help modify treatment strategies, for example, early aggressive therapy and immunomodulators in the hyperinflammatory group, which in turn could alter the course of disease and improve outcomes.
Conclusions
The derivation of clinical phenotypes among COVID-19 patients proved to be a simple and effective model to predict the course of disease and outcomes. The phenotypes can be used as a tool to aid in the clinical management of COVID-19. Patients who appear asymptomatic or have mild disease may require vigilant monitoring, while older patients or those with certain comorbidities, who are presumed to be at high risk, may not always have an increased risk of mortality. Classifying patients into phenotypes thus aids both the understanding of the disease and its clinical management.
The authors are grateful to the late Prof. Dr. NS Murthy, Ex-emeritus Medical Scientist (ICMR), Research Director, Division of Research and Patents, MS Ramaiah Medical College, for all his support in this research work.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
Gutiérrez-Gutiérrez B, del Toro MD, Borobia AB, Carcas A, Jarrín I, Yllescas M, et al. Identification and validation of clinical phenotypes with prognostic implications in patients admitted to hospital with COVID-19: A multicentre cohort study. Lancet Infect Dis 2021;21:783-92.
Agusti A. The path to personalised medicine in COPD. Thorax 2014;69:857-64.
Horn PS, Pesce AJ. Reference intervals: An update. Clin Chim Acta 2003;334:5-23.
Ye W, Lu W, Tang Y, Chen G, Li X, Ji C, et al. Identification of COVID-19 clinical phenotypes by principal component analysis-based cluster analysis. Front Med 2020;7:570614.
Lusczek ER, Ingraham NE, Karam BS, Proper J, Siegel L, Helgeson ES, et al. Characterizing COVID-19 clinical phenotypes and associated comorbidities and complication profiles. PLoS One 2021;16:e0248956.
Palmieri L, Vanacore N, Donfrancesco C, Lo Noce C, Canevelli M, Punzo O, et al. Clinical characteristics of hospitalized individuals dying with COVID-19 by age group in Italy. J Gerontol A Biol Sci Med Sci 2020;75:1796-800.
Raparelli V, Palmieri L, Canevelli M, Pricci F, Unim B, Lo Noce C, et al. Sex differences in clinical phenotype and transitions of care among individuals dying of COVID-19 in Italy. Biol Sex Differ 2020;11:57.
Magleby R, Westblade LF, Trzebucki A, Simon MS, Rajan M, Park J, et al. Impact of severe acute respiratory syndrome coronavirus 2 viral load on risk of intubation and mortality among hospitalized patients with coronavirus disease 2019. Clin Infect Dis 2021;73:e4197-205.
Mateus J, Grifoni A, Tarke A, Sidney J, Ramirez SI, Dan JM, et al. Selective and cross-reactive SARS-CoV-2 T cell epitopes in unexposed humans. Science 2020;370:89-94.
Severe Covid-19 GWAS Group; Ellinghaus D, Degenhardt F, Bujanda L, Buti M, Albillos A, et al. Genomewide association study of severe Covid-19 with respiratory failure. N Engl J Med 2020;383:1522-34.
No 45/2, Evantha Cinnamon, 3rd Main Vyalikaval, Bengaluru - 560 003, Karnataka
Source of Support: None, Conflict of Interest: None
[Table 1], [Table 2]