Projects

PhD Thesis: Statistical methods for characterising the severity of an emerging pathogen: case studies of the COVID-19 pandemic

My PhD thesis can be viewed here.

The following three projects are part of my PhD thesis.

Gaussian process nowcasting: application to COVID-19 mortality reporting

Published in Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, 2021.

Why? COVID-19 mortality reporting is delayed, so the most recent data are incomplete and often misleadingly suggest an improving situation.
What? We developed a nowcasting (forecasting of ‘now’ or very near future) method to correct for reporting delays in COVID-19 mortality data.
How? We used a latent Gaussian process to describe the auto-correlation structure in the reporting delays.
Result? Our method outperformed comparable approaches and human expert predictions, providing accurate nowcasts with robust uncertainty estimates.
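
To make the idea of correcting for reporting delays concrete, here is a minimal sketch of a much simpler multiplicative correction, not the Gaussian process method from the paper. It assumes the reporting-delay distribution is known and fixed; the function name `naive_nowcast` and the `delay_cdf` values are hypothetical and chosen only for illustration.

```python
def naive_nowcast(reported, delay_cdf):
    """Scale up incomplete recent counts by the assumed fraction reported so far.

    reported[i]  : deaths occurring on day i that have been reported to date
                   (the last element is the most recent day).
    delay_cdf[d] : assumed probability that a death is reported within d days.
    """
    n_days = len(reported)
    nowcast = []
    for day, count in enumerate(reported):
        age = n_days - 1 - day  # days elapsed since this day of death
        fraction_reported = delay_cdf[min(age, len(delay_cdf) - 1)]
        nowcast.append(count / fraction_reported)
    return nowcast


# Hypothetical data: raw counts appear to fall only because recent days are incomplete.
reported = [100, 90, 60, 20]
delay_cdf = [0.2, 0.6, 0.9, 1.0]
estimates = naive_nowcast(reported, delay_cdf)
```

In this toy example the raw counts decline sharply while the corrected estimates stay roughly flat, which is the kind of misleading signal a nowcast is meant to remove. The sketch ignores uncertainty and time-varying delays, both of which the published method is designed to handle.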

Read the publication

Code on GitHub

Tech: Python, Stan, R, Pandas, Matplotlib

Inference of COVID-19 epidemiological distributions from Brazilian hospital data

Published in Journal of the Royal Society Interface, 2020.

Why? Effective care planning and pandemic modelling require accurate COVID-19 epidemiological distributions, especially in low- and middle-income settings.
What? We determined epidemiological distributions, such as onset to death time, for COVID-19 patients using a large hospitalisation dataset from Brazil.
How? We applied a joint Bayesian subnational hierarchical model with partial pooling to describe data across Brazilian states, and used a model selection method to choose the best-fitting underlying probability density functions.
Result? We found significant geographical variation in epidemiological distributions, providing essential estimates for COVID-19 modelling in Brazil.

Read the publication

Code on GitHub

Tech: Python, Stan, R, Pandas, Matplotlib, scikit-learn

Application of referenced thermodynamic integration to Bayesian model selection

Published in PLoS ONE, 2023.

Why? Normalising constants are crucial for Bayesian model selection but are often difficult to compute for high-dimensional distributions.
What? We applied referenced thermodynamic integration (TI) to efficiently compute normalising constants.
How? We proposed solutions for constructing an effective reference density for an arbitrary problem, and used a reference density in the TI method to perform model selection for a complex Bayesian model of COVID-19 transmission.
Result? We demonstrated the practical utility of the method with a successful application to a high-dimensional real-world problem.
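
As a rough illustration of the thermodynamic integration identity, the sketch below estimates a log normalising constant via log z = log z_ref + ∫₀¹ E_λ[log q(θ) − log q_ref(θ)] dλ, where the expectation is taken under the path density proportional to q^λ · q_ref^(1−λ). It is a toy one-dimensional example, not the paper's implementation: the Gaussian target, the standard Gaussian reference, and all function names are assumptions chosen so that each path density can be sampled exactly.

```python
import math
import random


def log_q(theta, scale=2.0):
    """Unnormalised log target: zero-mean Gaussian kernel with sd `scale`."""
    return -0.5 * (theta / scale) ** 2


def log_q_ref(theta):
    """Unnormalised log reference: standard Gaussian kernel, z_ref = sqrt(2*pi)."""
    return -0.5 * theta ** 2


def referenced_ti_log_z(scale=2.0, n_lambda=21, n_samples=5000, seed=0):
    rng = random.Random(seed)
    log_z_ref = 0.5 * math.log(2.0 * math.pi)
    lambdas = [i / (n_lambda - 1) for i in range(n_lambda)]
    expectations = []
    for lam in lambdas:
        # q^lam * q_ref^(1 - lam) is a Gaussian kernel with this precision,
        # so it can be sampled directly instead of with MCMC.
        precision = lam / scale ** 2 + (1.0 - lam)
        sd = 1.0 / math.sqrt(precision)
        draws = [rng.gauss(0.0, sd) for _ in range(n_samples)]
        mean_diff = sum(log_q(t, scale) - log_q_ref(t) for t in draws) / n_samples
        expectations.append(mean_diff)
    # Trapezoidal rule over the evenly spaced lambda grid.
    step = lambdas[1] - lambdas[0]
    integral = step * (sum(expectations) - 0.5 * (expectations[0] + expectations[-1]))
    return log_z_ref + integral


estimate = referenced_ti_log_z()
exact = 0.5 * math.log(2.0 * math.pi) + math.log(2.0)  # log(sqrt(2*pi) * scale)
```

Here `estimate` should land close to `exact`, up to Monte Carlo and quadrature error. In a realistic problem the path densities would be sampled with MCMC, and the benefit of a well-chosen reference is that the integrand stays small and smooth along the path.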

Read the publication

Code on GitHub

Tech: Python, Stan, NumPyro, Pandas, Matplotlib, scikit-learn, SciPy

Peer-group Behaviour Analytics of Windows Authentications Events Using Hierarchical Bayesian Modelling

Presented at the AAAI 2023 Artificial Intelligence for Cybersecurity workshop, 2023.

Why? Cybersecurity analysts are overwhelmed by false positives from existing threat detection methods.
What? We proposed a new approach for modelling peer-group behaviour of Windows authentication events.
How? We used hierarchical Bayesian models in a two-stage approach involving data-driven peer-group formation and Poisson distribution modelling.
Result? We showed empirical evidence of reduced false positives on a real-world dataset, aiding in more efficient threat detection.
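
The intuition behind pooling information within a peer group can be sketched with a conjugate Gamma-Poisson model. This is an assumption made for illustration only, since the paper's model and code are not public: the Gamma prior, its parameters, and the function names `pooled_rate` and `poisson_upper_tail` are all hypothetical.

```python
import math


def pooled_rate(user_counts, prior_shape, prior_rate):
    """Posterior mean of a user's Poisson rate under a Gamma(shape, rate) prior.

    The prior stands in for the peer group: users with little history are
    shrunk towards the group's typical rate, prior_shape / prior_rate.
    """
    return (prior_shape + sum(user_counts)) / (prior_rate + len(user_counts))


def poisson_upper_tail(count, rate):
    """P(X >= count) for X ~ Poisson(rate); small values flag unusual activity."""
    cdf_below = sum(
        math.exp(-rate) * rate ** k / math.factorial(k) for k in range(count)
    )
    return 1.0 - cdf_below


# Hypothetical daily authentication counts for one user in a peer group
# whose prior implies a typical rate of 2 events per day.
rate = pooled_rate([3, 5, 4], prior_shape=2.0, prior_rate=1.0)
score = poisson_upper_tail(12, rate)
```

Scoring each user against a rate informed by similar users, rather than against a single global threshold, is one plausible way such an approach could reduce false positives.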

Read the arXiv preprint

Code and data were proprietary and could not be shared.

Tech: Python, NumPyro, Pandas, Matplotlib, Spark, SQL

I contributed substantially to the projects listed below but did not lead them.

Spatial and temporal fluctuations in COVID-19 fatality rates in Brazilian hospitals

Published in Nature Medicine, 2022.

Why? The Gamma variant of COVID-19 caused extreme mortality shocks, highlighting the need to understand fatality rate variations.
What? We documented sweeping shocks in hospital fatality rates in Brazilian state capitals during the Gamma variant spread.
How? We developed a Bayesian multi-strain fatality model, which we then used to analyse individual patient records from a large dataset, focusing on temporal and geographical variations.
Result? Our analysis revealed significant mortality shocks and suggested that health service strain and variant spread exacerbate mortality risks.

Read the publication

Code on GitHub

Tech: R, Stan, ggplot

The association between mechanical ventilator compatible bed occupancy and mortality risk in intensive care patients with COVID-19: a national retrospective cohort study

Published in BMC Medicine, 2021.

Why? Understanding the impact of ICU strain on mortality risk is crucial for managing resources during pandemics.
What? We investigated the association between ICU bed occupancy and mortality risk in COVID-19 patients.
How? We employed a Bayesian hierarchical model to analyse data from English hospital trusts, adjusting for various factors.
Result? Our analysis revealed that high ICU occupancy is associated with increased mortality risk, emphasising the need to manage ICU strain.

Read the publication

Code and data were proprietary and could not be shared.

Tech: R, Stan, brms, ggplot

Use of Contrastive Learning to Predict the Prevalence of Malaria in Africa Using Satellite Imagery

Paper in preparation.

Why? Accurate malaria prevalence prediction is vital for effective intervention, especially where household survey data are lacking.
What? We estimated malaria prevalence in Africa using remote sensing and deep learning techniques.
How? We applied contrastive learning to satellite imagery to predict malaria prevalence.
Result? The model achieved a low prediction error, demonstrating its effectiveness in predicting malaria prevalence even without household survey data.

Tech: Python, TensorFlow, PyTorch, NumPyro, Google Earth Engine, GDAL, Weights & Biases