Age Structured Mixture Model for Early COVID-19 Spread: A Zimbabwean Risk Factor Analysis

Prevention measures for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2/COVID-19) that are unique to distinct age, geographical and community groupings can only be implemented effectively and efficiently with a clear understanding of the dynamics of the disease. These dynamics include disease spread, the different risk factors and their level of influence, and the individual attributes that aid the spread. The paper aims to determine the major risk factors for COVID-19 spread in Zimbabwe by identifying individual, age and community groupings and their risk levels, given the complex heterogeneous population. COVID-19 data for 37 individuals, as provided by the Ministry of Health and Child Care (MoHCC) for the period from 20 March to 14 May 2020, are used. Generalised mixture models were implemented to achieve the objectives. Results show that gender, age, mode of infection and history of travel were the main predictors of COVID-19 spread in Zimbabwe. However, their effects were distributed differently across two clusters. Children (0-14 years), females and those with imported infections were among the high-risk groups for spread, whilst the low-risk groups consisted of non-travelers, males and those infected by local transmission. We thus recommend that the Zimbabwean government prioritise children, females and non-travelers when implementing prevention measures.


INTRODUCTION
As severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to be a problem the world over, it is paramount to understand the major determinants of the spread of the virus in Zimbabwe. To date, coronavirus disease 2019 (COVID-19) has no cure or vaccine. The COVID-19 pandemic has given rise to many fears, myths, misrepresentations and misconceptions [1,2]. Governments and authorities have been putting different measures in place to help limit the spread of the disease [3,4]. However, sufficient knowledge is needed for these measures to be effective. Adequate and effective measures require capturing the complexity of COVID-19 at many levels, including individual-level attributes and community-level attributes, together with their interactions as risk factors in unique economies and environments [5].
There have been different schools of thought and myths on the dynamics of COVID-19 in children, their risk of spreading the disease and how to keep them healthy in COVID-19 times [5,6]. We intend to assess the risk of spreading COVID-19 by children as well as by all other age categories in Zimbabwe. Although deterministic age-structured COVID-19 models have been developed in several countries to inform the implementation of isolation strategies per age group [7,8], the same control measures in different countries have resulted in different courses of the disease dynamics and contrasting impacts, as noted by [9]. Understanding the age-structure risks peculiar to Zimbabwe is therefore of paramount importance for enhancing Zimbabwe's COVID-19 prevention strategies. We divided the age variable into five (5) categories to capture the heterogeneous age structure in Zimbabwe. These categories are the same as those used by the Ministry of Health and Child Care (MoHCC) in Zimbabwe, and they allow us to determine the risk associated with each age category. Zimbabwe is divided into different regions which offer different lifestyles, varying hygienic standards and health facilities. These regions include rural, urban and peri-urban areas, farms, mines, etc. Therefore, efficient and effective measures for Zimbabwe can only be implemented if there is an understanding of the COVID-19 dynamics and risks in these different regions.
It is therefore our aim to demystify the fears and misconceptions in Zimbabwe by determining the major predictors of COVID-19 by age and region, using the data made available in the Ministry of Health and Child Care daily reports on www.mohcc.gov.zw/ and the corresponding social media platforms (https://twitter.com/MoHCCZim). Our objectives are to:
• Determine the major predictors of COVID-19 in Zimbabwe,

• Identify the different COVID-19 risk groups, and hence the higher-risk and lower-risk heterogeneous populations in Zimbabwe.

Data
The work considered thirty-seven (37) individual data profiles, as provided by the MoHCC in Zimbabwe, recorded in the period from 20 March to 14 May 2020. Individual profiles included eleven (11) variables, as given in Table 1. The number of active cases per day of diagnosis was used as the dependent variable to assess how it is affected by the predictor/risk variables. COVID-19 associated risks cannot be treated as a one-size-fits-all scenario. To assess the effects on children, the age variable was further categorised using the five (5) age layers provided by the MoHCC in Zimbabwe, as shown in Table 1.
Although all 10 provinces in Zimbabwe were considered when capturing the number of tests conducted, only provinces with active cases are used, since the analysis is based on the number of active cases per diagnosis date. The total number of tests per day combines the Polymerase Chain Reaction (PCR) tests and Rapid Results Tests done per day. History of travel captures whether or not an active case had travelled outside Zimbabwe since December 2019, whilst mode of transmission has three categories: imported due to travel, locally transmitted as a contact case, or unknown if there is no evidence of the first two scenarios. Location refers to the residential region in which someone resides and the province thereof.
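The age categorisation described above can be sketched in a few lines of Python. This is only an illustration: the band limits are the ones named in the results (0-14, 15-29, 30-44, 45-59 and over 60 years), the function name `age_category` is hypothetical, and placing age 60 itself in category 5 is an assumption, since Table 1 is not reproduced here.

```python
from bisect import bisect_right

# Lower bounds of categories 2 to 5; category 1 covers ages 0-14.
# Assumption: an individual aged exactly 60 falls in category 5.
AGE_LOWER_BOUNDS = [15, 30, 45, 60]
AGE_LABELS = {1: "0-14", 2: "15-29", 3: "30-44", 4: "45-59", 5: "60+"}


def age_category(age_years):
    """Return the age category (1 to 5) for an age given in years."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # Count how many lower bounds the age has reached, then shift to 1-based.
    return bisect_right(AGE_LOWER_BOUNDS, age_years) + 1


for age in (3, 14, 15, 44, 59, 60, 78):
    cat = age_category(age)
    print(age, "->", cat, AGE_LABELS[cat])
```

Coding the categories as integers keeps children as category 1, matching the labelling used in the figures.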

Summary by age and province
The study group had more middle-aged people (categories 2 and 3), few elderly people and only 2 children (0-14 years). The median number of cases for the children was 5, whilst for those over 60 years it was 3. Active cases in Zimbabwe were distributed across 5 provinces for the period under review. Of the 5 provinces, Harare had the highest number of cases at 14, followed by Bulawayo (12), Mashonaland East (6), Mashonaland West (4) and lastly Matabeleland North, which recorded only a single case. The highest median number of cases was 3, in Bulawayo and Mashonaland, whilst Mashonaland East had a median of 1. The descriptive statistics on the number of COVID-19 active cases in Zimbabwe by location and age category are shown in Table 2.
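As a quick consistency check on the provincial counts quoted above, the following Python sketch tallies them and computes each province's share of the total. Only the five counts come from the text; the variable names are illustrative.

```python
# Active cases per province as quoted in the text (20 March to 14 May 2020).
cases_by_province = {
    "Harare": 14,
    "Bulawayo": 12,
    "Mashonaland East": 6,
    "Mashonaland West": 4,
    "Matabeleland North": 1,
}

total_cases = sum(cases_by_province.values())
shares = {p: n / total_cases for p, n in cases_by_province.items()}

# Print provinces from the highest to the lowest count.
for province, n in sorted(cases_by_province.items(), key=lambda kv: -kv[1]):
    print(f"{province:<20} {n:>2}  {shares[province]:.1%}")
print(f"{'Total':<20} {total_cases:>2}")
```

The counts add up to the 37 individual profiles used in the study.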

Statistical Analysis Models
In this study, the risk of increasing the number of active COVID-19 cases (spread) associated with each of the predictor variables is modelled via generalized linear mixture (GLM mixture) models. We intend to explore the effects of age and location separately to arrive at a holistic understanding of how age structure and location affect the spread of COVID-19 in Zimbabwe. GLM mixture models allow us to measure risk factors by considering heterogeneous risk groups comprising individuals with similar attributes, as in [10]. The groupings also enable us to infer the level of risk (high, medium or low) based on their individual composition. All the statistical analysis was done in R version 3.6.9 at the 5% level of significance, using the flexmix package [11] for the GLM mixture models.

GLM mixture model
A GLM mixture regression model is used to identify the risk groups and the individual-level, community/location-level and age-level risk effects of each predictor, together with the associated level of risk. The model was chosen for its ability to capture heterogeneous attributes without imposing many, sometimes unrealistic, assumptions on the data. Mixture models also work well for diseases with complex diagnosis. The model is specified as

$$h(y \mid x, \psi) = \sum_{k=1}^{K} \pi_k f(y \mid x, \theta_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,$$

where $y$ represents the response variable (in this case, the number of active COVID-19 cases diagnosed per day) with conditional mixture density $h$, $x$ is a vector of risk predictor variables, $\pi_k$ is the prior probability of an individual being in component $k$, $\theta_k$ is the component-specific parameter vector of the density $f$, and $\psi = (\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K)$ is the vector containing all parameters. The parameters are estimated using the Expectation Maximisation (EM) algorithm.
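To make the estimation step concrete, the following is a minimal Python sketch of the EM algorithm for a two-component mixture of Gaussian regressions with a single predictor. It is not the flexmix implementation used in the study: the data are synthetic, the function names (`em_mixture_regression`, `weighted_line_fit`) are hypothetical, and the random soft initialisation and fixed iteration count are simplifying assumptions. The E-step computes the posterior probability of each component for each observation, and the M-step updates the prior weights and refits each component by weighted least squares.

```python
import math
import random


def log_normal_pdf(y, mean, sd):
    """Log density of a normal distribution evaluated at y."""
    z = (y - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def weighted_line_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b, sd)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return a, b, max(math.sqrt(var), 1e-6)  # floor keeps the density finite


def em_mixture_regression(x, y, n_components=2, n_iter=200, seed=0):
    """EM for a mixture of Gaussian simple linear regressions."""
    rng = random.Random(seed)
    n = len(y)
    # Start from random soft assignments of observations to components.
    post = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(n_components)]
        total = sum(row)
        post.append([r / total for r in row])
    for _ in range(n_iter):
        # M-step: update prior weights pi_k and component parameters theta_k.
        pi, theta = [], []
        for k in range(n_components):
            w = [post[i][k] + 1e-12 for i in range(n)]  # guard against empty components
            pi.append(sum(post[i][k] for i in range(n)) / n)
            theta.append(weighted_line_fit(x, y, w))
        # E-step: posterior component probabilities, computed in log space.
        loglik = 0.0
        for i in range(n):
            logs = [
                math.log(max(pi[k], 1e-300))
                + log_normal_pdf(y[i], theta[k][0] + theta[k][1] * x[i], theta[k][2])
                for k in range(n_components)
            ]
            m = max(logs)
            log_h = m + math.log(sum(math.exp(v - m) for v in logs))
            loglik += log_h  # log of the mixture density h(y | x, psi)
            post[i] = [math.exp(v - log_h) for v in logs]
    return pi, theta, post, loglik


# Synthetic data from two regression lines (illustrative only, not study data).
data_rng = random.Random(42)
x, y = [], []
for _ in range(60):
    xi = data_rng.uniform(0.0, 5.0)
    if data_rng.random() < 0.6:
        yi = 1.0 + 2.0 * xi + data_rng.gauss(0.0, 0.5)
    else:
        yi = 20.0 - 1.0 * xi + data_rng.gauss(0.0, 0.5)
    x.append(xi)
    y.append(yi)

pi, theta, post, loglik = em_mixture_regression(x, y)
print("prior weights:", [round(p, 3) for p in pi])
print("component (intercept, slope, sd):", [tuple(round(t, 2) for t in th) for th in theta])
```

EM can converge to a local optimum, so in practice the fit is usually repeated from several random starts and the solution with the highest log-likelihood is retained.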

ANALYSIS AND RESULTS
The number of active cases by provincial location and age category shows that there are indeed differences in how age groups relate to the number of cases per diagnosis date (Figure 1). Figure 1A shows that more females were infected in Bulawayo than in any other province, whilst more males were infected in all the other provinces, with the highest number in Harare. Figure 1B shows that more males were infected across all the age categories except category 1, which consists of children (0-14 years), where only females were infected.

GLM Mixture Model Results
The GLM mixture model enables us not only to identify the main COVID-19 predictors but also to capture the complexity of the individual-level and group-level heterogeneous characteristics. In this case, individual-level characteristics across age groups and the community-level risk of spread could be identified. A two-component risk model based on individual characteristics was fitted under both Poisson and Gaussian specifications. Results showed that a Gaussian mixture model with two components was more appropriate because of its lower AIC value. Table 3 shows that the total number of tests does not affect the spread of COVID-19, so this factor was removed from the analysis. Parameter estimates for component 1 in Table 3 showed the following: 1) children were significantly more likely to spread COVID-19, by 2.13, [...] significantly more likely to spread COVID-19 than those whose mode of infection was unknown, 5) those infected via local transmission were 0.20 times less likely to spread the disease than those whose mode of infection was unknown, although the difference was insignificant, 6) Bulawayo residents in component 1 were more likely [...]. Individuals from component 2 showed the following: 1) although children were more likely to spread the disease than the 15-29, 30-44 and over-60 categories, only the difference from the over-60 category (1.99 times) was significant.
Children were, however, 0.27 times less likely to spread COVID-19 than the 45-59 year category, although this was insignificant, 2) males in component 2 were significantly more likely (1.17 times) to increase the number of active cases than females, 3) travelers were still less likely (by 2.18) to spread the disease than non-travelers, 4) those who had imported infections and those infected locally were significantly more likely (by 2.70 and 1.01, respectively) to spread COVID-19 than those with unknown transmission, 5) Bulawayo residents were more likely (by 3.25, 4.93, 4.15 and 4.67) to increase the number of COVID-19 cases than residents of Harare, Mashonaland East, Mashonaland West and Matabeleland North provinces, respectively. We can therefore conclude that the level of risk in component 2 is somewhat lower than in component 1, based on the magnitude of the parameter values. Secondly, looking at the individual attributes of those in component 2, we observed that the groups with a high potential to spread COVID-19 were children and the elderly, males, non-travelers, those who had imported infections and those who were infected through local transmission. These were the main characteristics of Bulawayo residents.
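The AIC comparison used to choose between the Poisson and Gaussian two-component mixtures can be illustrated with a short Python sketch. The log-likelihood values and the number of regression coefficients below are hypothetical placeholders, not the fitted values from Table 3; only the formula AIC = 2p - 2 log L and the parameter count of a K-component mixture are standard.

```python
def mixture_param_count(n_components, n_coefs, has_dispersion):
    """Free parameters: (K - 1) prior weights plus the per-component parameters."""
    per_component = n_coefs + (1 if has_dispersion else 0)
    return (n_components - 1) + n_components * per_component


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Hypothetical inputs for illustration only (not the study's fitted values).
N_COEFS = 10  # assumed number of regression coefficients per component
candidates = {
    "gaussian": {"loglik": -80.0, "has_dispersion": True},   # extra variance parameter
    "poisson": {"loglik": -95.0, "has_dispersion": False},
}

scores = {}
for name, spec in candidates.items():
    p = mixture_param_count(2, N_COEFS, spec["has_dispersion"])
    scores[name] = aic(spec["loglik"], p)

best = min(scores, key=scores.get)
print(scores, "preferred:", best)
```

Because the Gaussian and Poisson likelihoods are defined on different scales (continuous versus count), such a comparison is best read as a rough guide rather than a formal test.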

The confidence intervals for the model parameters of both components are shown in Figure 2B. Whilst age differences mainly characterize individuals in component 1, location differences mainly characterize individuals in component 2. We observed that the component 1 parameter estimates were more positive than those in component 2, so we can conclude that component 1 individuals pose more risk of spreading COVID-19 than those in component 2. Although in general children and non-travelers are more likely to spread the disease in both components, the high-risk cluster was uniquely associated with children, females and those with imported infections. Overall, we can associate high risk with Harare, Mashonaland West and Matabeleland North residents. Bulawayo and Mashonaland East residents can be categorized under the low-risk cluster. This is a clear indication that in Zimbabwe, effective measures may have to give priority to children and gender, and make sure that non-travelers are protected from the spread. The low-risk cluster, however, was characterized mainly by Bulawayo and Mashonaland East residents, where spread was most likely to come from males, the elderly and those with either imported or locally transmitted infections. The distribution of age groups by high-risk or low-risk group is shown in Figure 3A, whilst the distribution by province is given in Figure 3B. It is evident from Figure 3A that children (represented by 1) are in cluster 2, which is the high-risk cluster and has only imported cases, mainly in Mashonaland West and Matabeleland provinces, as indicated by Figure 3B. The low-risk cluster, which is characterized by local transmissions, consists mainly of Bulawayo and Mashonaland East residents.

DISCUSSION
A GLM Gaussian mixture model was fitted to Zimbabwean COVID-19 data for the period from 20 March to 14 May 2020, as provided by the Ministry of Health and Child Care in Zimbabwe. The primary goal was to determine the major risk factors associated with the spread of COVID-19, given the heterogeneous age structure and locations found in Zimbabwe. The model was fitted to data for 37 individuals and the following 10 variables were considered: number of cases per day, total number of tests conducted per day, gender, age, history of travel, mode of infection, location, province and date of diagnosis. A mixture model was preferred because of its flexibility in handling complex heterogeneous problems. This model enabled us to identify different risk groups and their associated levels of risk. Age-structure models were considered so that preventative measures can be better targeted at higher-risk age groups and provinces/locations. Results from the Gaussian mixture model classified individuals into two (2) groups based on their individual characteristics and hence their risk levels of spreading the disease. We termed cluster 1 the high-risk cluster and cluster 2 the low-risk cluster. Whilst the major risk factors remain the same (gender, history of travel, mode of infection, province and age category) across clusters, their risk contribution was distributed differently depending on whether an individual is in the high-risk or low-risk cluster. Overall, the model showed the risk group predictors to be being female, being a non-traveler, being a child, local infections and imported cases. The probability of falling into the high-risk cluster (0.628) was much higher than that of the low-risk cluster (0.372), an indication that in Zimbabwe COVID-19 was more likely to be spread than controlled (a difference of 0.256). Considering cluster 1 attributes, we noticed that all age categories were likely to spread the disease, although children had a much higher potential to spread COVID-19 than any other age group.
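For reference, the comparison of the two cluster probabilities quoted above amounts to the following arithmetic, where $\pi_1$ and $\pi_2$ denote the prior probabilities of the high-risk and low-risk clusters. The ratio is added here only as a complementary way of reading the same two numbers; it is not reported in the original results.

```latex
\begin{align*}
\pi_1 + \pi_2 &= 0.628 + 0.372 = 1, \\
\pi_1 - \pi_2 &= 0.628 - 0.372 = 0.256, \\
\frac{\pi_1}{\pi_2} &= \frac{0.628}{0.372} \approx 1.69 .
\end{align*}
```

On this reading, an individual was roughly 1.7 times as likely to be assigned to the high-risk cluster as to the low-risk cluster.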
Females and those with imported infections were also among the high-risk groups in cluster 1, an observation associated mostly with residents of Harare (which had the highest number of active cases), Mashonaland West and Matabeleland North provinces. We observed, as noted by [5], that a mixture model distinguishes between the individual-level, community-level and group-level risks associated with each individual in the spread of COVID-19. Whilst it is generally believed that men are at greater risk of worse outcomes and death given the same prevalence as women [12], our results showed that in Zimbabwe women tend to spread COVID-19 more than men, even when age is also considered as a very important factor in the spread. It is evident, therefore, that in Zimbabwe children (0-14 years), those with imported infections and females have a higher risk of spreading COVID-19. Based on our findings, we can conclude that the age structure of the population is important in understanding COVID-19 dynamics, as noted by [13,14]. In Zimbabwe, major prevention measures should also target children, females and the management of imported infections. It is also interesting to note that in Zimbabwe the major risk of COVID-19 spread comes from those infected outside the country, who are concentrated in the capital city, Harare.
Thus, the government may need to keep the borders closed to avoid imported infections and, in cases where entry is unavoidable, travelers entering Zimbabwe must be strictly quarantined and monitored. Measures also need to be implemented that target different gender groups, as the model predicted that in Zimbabwe females are more likely to spread COVID-19 than their male counterparts. The low-risk cluster, however, consisted mainly of individuals from Bulawayo and Mashonaland East, males, non-travelers and those who had local transmissions. Again, measures that minimize local transmission, such as isolation centers to cater for those infected until they recover, may be implemented. The differences between the provinces, although they appear as a risk to a lesser extent, should also be explored. Considering the heterogeneity of Zimbabwe's residential set-up, these differences could be attributable to differences in sanitary conditions in the different areas. Since COVID-19 spread is highly related to hygienic conditions, improving hygiene in residential areas predicted to have a high risk may curb the spread of the infection.

CONCLUSION
COVID-19 in Zimbabwe has been largely due to imported cases and, to a lesser extent, local transmission. High-risk groups for the spread of the disease include children, women and non-travelers. These social groupings should therefore be thoroughly considered when authorities design any meaningful prevention measures. Overall, although differences in residential location contribute to spread, they pose a lesser risk to the spread of COVID-19 than age differences.