“Creating computers that can understand human language and make intelligent decisions based on programmed reasoning skills is a more challenging computational problem than most people can appreciate,” said Georgia D. Tourassi, Director, Health Data Sciences Institute (HDSI) at Oak Ridge National Laboratory (ORNL, www.ornl.gov) and Co-Principal for a research project between ORNL and the National Cancer Institute (NCI, www.cancer.gov).
She adds, “Cancer experts agree that having computational tools automatically collect key data from medical records could be a gold mine for researchers seeking to rapidly assess what is working and what is not working in cancer care.”
When a cancer patient visits an oncologist, has a prescription filled, or gets a lab test, a record is generated. However, capturing such day-to-day cancer treatment data has proven elusive. Most data is recorded by doctors as free-text narratives, but doctors very often have their own, slightly different shorthand.
“A lot of intermediate data in terms of recurrence, metastases, and treatment response isn’t currently captured in the cancer surveillance data,” reports Dr. Tourassi. “Having people capture this data manually from EHRs is not scalable. We need to monitor more people for longer periods, which is too much data for human curators. In addition, data collection is done across different clinical sites because people get care in several places.”
Dr. Tourassi is directing a pilot project to integrate cancer data with large-scale computing. The Oak Ridge-centered project will collect data nationally through the NCI’s “Surveillance, Epidemiology, and End Results” (SEER, https://seer.cancer.gov) program. The goal is to track cancer incidence data over time and identify patterns for various populations.
Today, the pilot is working with data already collected from registries in Kentucky, Louisiana, Georgia, and Washington. Eventually, the SEER data will connect with other data sources so researchers will be able to examine costs and determine the quality of care.
Besides developing natural language tools, the team is building scalable visual analytics for cancer surveillance data and large-scale computational tools to allow predictive modeling for individual patients.
In the future, high performance computing will be used to solve biomedical research questions by assembling big data, machine learning, and modeling and simulation. “We are using high performance computing to bridge the gap between precision medicine and population health,” said Dr. Tourassi.