Biases in electronic health records - Revision history

Awaash at 04:53, 11 October 2018

2018-10-11T04:53:18Z

Awaash at 14:27, 10 October 2018

2018-10-10T14:27:57Z

Awaash at 14:24, 10 October 2018

2018-10-10T14:24:38Z

Awaash at 14:24, 10 October 2018

2018-10-10T14:24:25Z

Awaash at 14:19, 10 October 2018

2018-10-10T14:19:26Z

Awaash at 14:15, 10 October 2018

2018-10-10T14:15:47Z

Awaash at 14:15, 10 October 2018

2018-10-10T14:15:04Z

Awaash: Created page with "{{StudentProjectTemplate |Summary=To evaluate the impact of sample bias on the predictive value of machine learning models built using EHR data |Keywords=Machine Learning, Ele..."

2018-10-10T14:13:52Z

New page

{{StudentProjectTemplate
|Summary=To evaluate the impact of sample bias on the predictive value of machine learning models built using EHR data
|Keywords=Machine Learning, Electronic health records, Sample bias
|TimeFrame=Spring 2019
|References=1. Verheij, Robert A., et al. "Possible Sources of Bias in Primary Care Electronic Health Record Data Use and Reuse." Journal of Medical Internet Research 20.5 (2018).
2. Gianfrancesco, Milena A., et al. "Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data." JAMA Internal Medicine (2018).
3. Johnson, Alistair E. W., et al. "MIMIC-III, a freely accessible critical care database." Scientific Data 3 (2016): 160035.
|Prerequisites=Good knowledge of applied mathematics. An ability to implement state-of-the-art algorithms in a suitable programming environment. An interest in machine learning algorithms and medical data analysis.
|Supervisor=Awais Ashfaq, Sławomir Nowaczyk
|Level=Master
|Status=Open
}}
Predictive modeling with electronic health records (EHRs) is considered an essential step towards precision medicine and improving care quality. However, there is a risk of building biased and incorrect prediction models if the complexities and limitations of EHR data are not thoroughly understood. For instance, data collection in EHRs depends on individual patient needs and health state: sicker patients tend to have more data in EHRs than healthier patients. Prediction models built on EHR data are therefore likely to be biased towards the sicker population. In machine learning, this is referred to as 'sample bias' because the distribution of the available data does not reflect the true environment.
The goal of the project is to evaluate the impact of sample bias on different prediction models trained on EHR data. You will use the MIMIC-III database (see references) for this project.
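
The effect described above can be illustrated with a small simulation. This is only a sketch on synthetic data, not on MIMIC-III: the function name `simulate_records`, the share of sick patients and the number of records per patient are all assumptions chosen for illustration.

```python
import random

def simulate_records(n_patients=10_000, p_sick=0.2,
                     records_sick=8, records_healthy=2, seed=0):
    """Return (patients, records); each entry is True for 'sick'.

    Assumption: a sick patient generates more EHR records than a
    healthy one (8 vs. 2 here, purely illustrative numbers).
    """
    rng = random.Random(seed)
    patients = [rng.random() < p_sick for _ in range(n_patients)]
    records = []
    for is_sick in patients:
        n_records = records_sick if is_sick else records_healthy
        records.extend([is_sick] * n_records)
    return patients, records

patients, records = simulate_records()

# Share of sick patients in the population vs. share of records
# that belong to sick patients.
patient_share = sum(patients) / len(patients)
record_share = sum(records) / len(records)

print(f"sick share among patients: {patient_share:.2f}")
print(f"sick share among records:  {record_share:.2f}")
```

With these assumed settings, about 20% of patients are sick, yet roughly half of all records come from sick patients, so a model trained at record level sees a much sicker population than the one it is meant to describe.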

Tentative project plan:

1- Scan through sources of potential bias in EHRs. (See references for a start)

2- Build simple predictive models (logistic regression, neural networks, random forests, etc.) to predict an outcome of interest (such as in-hospital death) using the MIMIC-III database.

3- Sub-sample the training and testing sets based on your knowledge (from step 1) and re-run the developed models. For instance, you may sub-sample the data based on age, gender, frequency of visits, or time or place of visit.

4- Evaluate the change (if any) in model performance and discuss it in detail.

5- Suggest possible solutions to overcome the impact of sample bias when building EHR-driven prediction models.
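
Steps 2 to 4 of the plan could be prototyped along the following lines. This is a minimal sketch on a synthetic cohort, not the project's method and not MIMIC-III data: the features `x1` and `x2`, the age-dependent outcome and all helper names are hypothetical. In practice a library such as scikit-learn would replace the hand-written model and metric.

```python
import math
import random

def make_cohort(n, rng):
    """Synthetic patients as (is_old, x1, x2, died). Purely illustrative."""
    rows = []
    for _ in range(n):
        is_old = rng.random() < 0.5
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Assumption: outcome is driven by x1 in older and x2 in younger patients.
        logit = 2.0 * (x1 if is_old else x2)
        died = rng.random() < 1 / (1 + math.exp(-logit))
        rows.append((is_old, x1, x2, int(died)))
    return rows

def fit_logreg(rows, epochs=200, lr=0.5):
    """Logistic regression on (x1, x2) by full-batch gradient descent."""
    w1 = w2 = b = 0.0
    n = len(rows)
    for _ in range(epochs):
        g1 = g2 = gb = 0.0
        for _, x1, x2, y in rows:
            err = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b))) - y
            g1 += err * x1
            g2 += err * x2
            gb += err
        w1 -= lr * g1 / n
        w2 -= lr * g2 / n
        b -= lr * gb / n
    return w1, w2, b

def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def evaluate(model, rows):
    """AUROC of a fitted (w1, w2, b) model on the given rows."""
    w1, w2, b = model
    scores = [w1 * x1 + w2 * x2 + b for _, x1, x2, _ in rows]
    labels = [row[3] for row in rows]
    return auroc(scores, labels)

rng = random.Random(42)
train, test = make_cohort(2000, rng), make_cohort(2000, rng)

# Step 3: sub-sample the training set, here by keeping older patients only.
train_old = [row for row in train if row[0]]

model_full = fit_logreg(train)
model_sub = fit_logreg(train_old)

# Step 4: compare performance on the same, unrestricted test set.
auc_full = evaluate(model_full, test)
auc_sub = evaluate(model_sub, test)
print(f"AUROC, trained on full cohort:    {auc_full:.3f}")
print(f"AUROC, trained on older patients: {auc_sub:.3f}")
```

The same filter can be applied to the test set to study how a model behaves on a population it was not trained on. Other sub-sampling criteria from step 3 (gender, frequency of visits, time or place of visit) would slot in where `train_old` is built.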

Deliverables:

1- A succinct review of different biases in electronic health records and their impact on predictive models.

2- A summary of recent (2017-2018) studies designed for predicting in-hospital deaths using EHR data.

3- Results: Prediction performance of developed models using complete EHR data and its sub-samples.

4- A critical analysis of the bias problem in light of your results.

Contact: Awais Ashfaq (awais.ashfaq@hh.se)

@@ Line 24: / Line 24: @@
 - Sub-sample the training and testing set based on your knowledge (from step 1) and re-run the developed models. For instance, you may sub-sample the data based on age, gender, the frequency of visits, time, place of visit etc.
-- Evaluate the change in model performance and discuss it in detail.
+- Evaluate the change (if any) in model performance and discuss it in detail.
 - Suggest possible solutions to overcome the impact of sample bias when building EHR driven prediction models.

@@ Line 13: / Line 13: @@
 |Status=Open
 }}
-Predictive modeling with electronic health records (EHR) is considered an essential step towards precision medicine and improving care quality. However, there is a potential risk of building biased and incorrect prediction models if the complexities and limitations of EHR data are not completely studied. For instance, data collection in EHRs depends on individual patient needs and health state. Sicker patients tend to have more data in EHR than normal patients. Thus prediction models built on EHR data are likely to be biased towards the sicker population. In the realm of machine learning, this is referred to as 'sample bias' because the distribution of the available data does not reflect the true environment.
+Predictive modeling with electronic health records (EHRs) is considered an essential step towards precision medicine and improving care quality. However, there is a potential risk of building biased and incorrect prediction models if the complexities and limitations of EHR data are not completely studied. For instance, data collection in EHRs depends on individual patient needs and health state. Sicker patients tend to have more data in EHR than normal patients. Thus prediction models built on EHR data are likely to be biased towards the sicker population. In the realm of machine learning, this is referred to as 'sample bias' because the distribution of the available data does not reflect the true environment.
 The goal of the project is to evaluate the impact of sample bias on different prediction models trained on EHR data. You will use the MIMIC-III database (see references) for this project.

@@ Line 13: / Line 13: @@
 |Status=Open
 }}
-Predictive modeling with electronic health records is considered an essential step towards precision medicine and improving care quality. However, there is a potential risk of building biased and incorrect prediction models if the complexities and limitations of EHR data are not completely studied. For instance, data collection in EHRs depends on individual patient needs and health state. Sicker patients tend to have more data in EHR than normal patients. Thus prediction models built on EHR data are likely to be biased towards the sicker population. In the realm of machine learning, this is referred to as 'sample bias' because the distribution of the available data does not reflect the true environment.
+Predictive modeling with electronic health records (EHR) is considered an essential step towards precision medicine and improving care quality. However, there is a potential risk of building biased and incorrect prediction models if the complexities and limitations of EHR data are not completely studied. For instance, data collection in EHRs depends on individual patient needs and health state. Sicker patients tend to have more data in EHR than normal patients. Thus prediction models built on EHR data are likely to be biased towards the sicker population. In the realm of machine learning, this is referred to as 'sample bias' because the distribution of the available data does not reflect the true environment.
 The goal of the project is to evaluate the impact of sample bias on different prediction models trained on EHR data. You will use the MIMIC-III database (see references) for this project.

@@ Line 13: / Line 13: @@
 |Status=Open
 }}
-Predictive modeling with electronic health records (EHRs) is considered an essential step towards precision medicine and improving care quality. However, there is a potential risk of building biased and incorrect prediction models if the complexities and limitations of EHR data are not completely studied. For instance, data collection in EHRs depends on individual patient needs and health state. Sicker patients tend to have more data in EHR than normal patients. Thus prediction models built on EHR data are likely to be biased towards the sicker population. In the realm of machine learning, this is referred to as 'sample bias' because the distribution of the available data does not reflect the true environment.
+Predictive modeling with electronic health records (EHRs) is considered an essential step towards precision medicine and improving care quality. However, there is a potential risk of building biased and incorrect prediction models if the complexities and limitations of EHR data are not completely studied. For instance, data collection in EHRs depends on individual patient needs and health state. Sicker patients tend to have more data in EHRs than normal patients. Thus prediction models are likely to be biased towards the sicker population. In the realm of machine learning, this is referred to as 'sample bias' because the distribution of the available data does not reflect the true environment.
-The goal of the project is to evaluate the impact of sample bias on different prediction models trained on EHR data. You will use the MIMIC-III database (see references) for this project.
+The goal of the project is to evaluate the impact of sample bias on different prediction models built using EHR data. You will use the MIMIC-III database (see references) for this project.
 Tentative project plan:
@@ Line 20: / Line 20: @@
 - Scan through sources of potential bias in EHRs. (See references for a start)
-- Build simple predictive models (Logistic regression, Neural nets, random forests etc.) to predict an outcome of interest (like in-hospital death) using the MIMIC-III database.
+- Build models (Logistic regression, Neural nets, random forests etc.) to predict an outcome of interest (like in-hospital death) using the MIMIC-III database.
 - Sub-sample the training and testing set based on your knowledge (from step 1) and re-run the developed models. For instance, you may sub-sample the data based on age, gender, the frequency of visits, time, place of visit etc.

@@ Line 4: / Line 4: @@
 |TimeFrame=Spring 2019
 |References=1. Verheij, Robert A., et al. "Possible Sources of Bias in Primary Care Electronic Health Record Data Use and Reuse." Journal of medical Internet research 20.5 (2018).
 2. Gianfrancesco, Milena A., et al. "Potential Biases in Machine Learning Algorithms Using Electronic Health Record Data." JAMA internal medicine (2018).
 3. Johnson, Alistair EW, et al. "MIMIC-III, a freely accessible critical care database." Scientific data 3 (2016): 160035.
 |Prerequisites=Good knowledge of applied mathematics. An ability to implement state-of-the-art algorithms in a suitable programming environment. An interest in machine learning algorithms and medical data analysis.
 |Supervisor=Awais Ashfaq, Sławomir Nowaczyk,
 |Level=Master
 |Status=Open