NOTE: this position listing has expired and may no longer be relevant!
This exciting PhD project will use state-of-the-art machine learning and causal inference methods to unravel the role of genetics and ion channels (for example, potassium or sodium channels) in determining heart rhythm, increasing our understanding of the causal pathways from genetic variants to heartbeat patterns.
The electrical signals that regulate the contractions of the heart result from the flow of ions into heart muscle through ion channels. Changes in the electrical signal are associated with abnormal heart rhythms and sudden cardiac arrest.
Genetic studies have identified DNA variations in many genes that are associated with changes in heart rhythm; however, it is often unclear whether these work through direct effects on the ion channels or through other pathways. The number of genetic variants and blood- or heart-measured electrolytes of interest, together with other epidemiological and lifestyle data, makes this a “Big Data” problem.
Available statistical methods are not suited to the complexities of this problem, which requires adjustment for a large number of potential confounders (other genetic variants that may affect both the ion channel of interest and heart rhythm). Using machine learning or other data-adaptive procedures to select confounders (e.g. stepwise selection or the lasso) is problematic in such high-dimensional settings, as it tends to bias both the effect estimates and their confidence intervals (which are then typically too narrow).
Targeted maximum likelihood estimation (TMLE) incorporates machine learning, and was designed to attenuate these biases, but can be complicated to develop. Moreover, it results in biased standard errors under model misspecification. Therefore, to overcome both concerns (high-dimensional confounders and potential model misspecification), we will extend the so-called bias-reduced double-robust estimators to incorporate penalisation and other machine learning methods, with the aim of attaining valid causal inferences in these high-dimensional settings, even under (partial) model misspecification.
The project will involve implementing and evaluating these methods (which belong to the class of causal inference methods known as “double machine learning methods”) through computer simulations. We will also apply the methods to large UK cohort data sets (e.g. the TwinsUK cohort and the UK Biobank).
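As a rough illustration of the kind of simulation study described above, the following Python sketch evaluates a standard augmented inverse-probability-weighted (AIPW) double-robust estimator on simulated data. It is not the bias-reduced double-robust estimator the project will develop, and it uses no penalisation or machine learning. The binary exposure, the three confounders, the true effect of 0.5 and all function names are assumptions made purely for illustration.

```python
import numpy as np


def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (a - p)
        hess = X.T @ (X * w[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def aipw(y, a, X):
    """Standard AIPW estimate of E[Y(1)] - E[Y(0)] and its standard error."""
    n = len(y)
    Xi = np.column_stack([np.ones(n), X])
    # Exposure (propensity score) model
    ps = 1.0 / (1.0 + np.exp(-Xi @ fit_logistic(Xi, a)))
    # Outcome models, fitted separately in exposed and unexposed individuals
    b1 = np.linalg.lstsq(Xi[a == 1], y[a == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(Xi[a == 0], y[a == 0], rcond=None)[0]
    m1, m0 = Xi @ b1, Xi @ b0
    # Estimated influence-function contributions
    phi = m1 - m0 + a * (y - m1) / ps - (1 - a) * (y - m0) / (1 - ps)
    return phi.mean(), phi.std(ddof=1) / np.sqrt(n)


def simulate(n, rng, effect=0.5):
    """Hypothetical data: 3 confounders, binary exposure, continuous outcome."""
    X = rng.normal(size=(n, 3))
    ps = 1.0 / (1.0 + np.exp(-(0.4 * X[:, 0] - 0.3 * X[:, 1])))
    a = rng.binomial(1, ps)
    y = effect * a + X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)
    return y, a, X


rng = np.random.default_rng(0)
true_effect, n_rep = 0.5, 200
estimates, covered = [], 0
for _ in range(n_rep):
    y, a, X = simulate(2000, rng, true_effect)
    est, se = aipw(y, a, X)
    estimates.append(est)
    # Does the nominal 95% confidence interval contain the true effect?
    covered += abs(est - true_effect) <= 1.96 * se

mean_est = float(np.mean(estimates))
coverage = covered / n_rep
print(f"mean estimate: {mean_est:.3f}, 95% CI coverage: {coverage:.2f}")
```

Here both working models are correctly specified and low-dimensional, so the estimator should be nearly unbiased with close to nominal coverage. The settings of interest in the project are those where this breaks down, for example with many candidate confounders or a misspecified working model.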
This project is part of a new and exciting collaboration between geneticists at the Cardiogenetics Lab at St George’s, University of London, and statistical methodologists working in causal inference at the London School of Hygiene & Tropical Medicine (LSHTM).
The successful candidate will be funded for 3.5 years and will be based in the Department of Medical Statistics, LSHTM. In addition, there is an opportunity to undertake a fully funded 3-month internship at the Wellcome Trust Sanger Institute, a world leader in genomic research, located in Cambridge, UK.
Funding is available for UK/EU nationals from the UK Medical Research Council London Intercollegiate Doctoral programme. For the duration of the award, the studentship will cover:
– full fees at the UK/EU rate,
– an annual stipend at the MRC stipend rate (with London weighting for those eligible; check the rules),
– an annual research training and support grant of GBP 5,000.00, and
– an annual travel and conference allowance of GBP 300.00.
Additional flexible funding is available for students for training, and for the 3-month internship.
Applicants must have (or expect to complete by September 2018) a Master’s degree (or equivalent, e.g. MMath) in Statistics or a closely related field, with strong programming skills or some background in machine learning.