The Wright Center’s Amy Olex, M.S., and Evan French were co-authors on a poster that won top honors at a recent conference.
The senior bioinformatics specialist and informatics system analyst, respectively, helped produce COVID-19 in Solid Organ Transplantation (SOT): Results of the National COVID Cohort Collaborative (N3C). The poster was accepted to the Cutting Edge of Transplantation (CEoT) 2021 conference, which took place Feb. 25-27.
The poster’s primary author, Vinson, an associate professor at Dalhousie University and the Nova Scotia Health Authority, won the Young Innovator Award for the submission.
The research identified a cohort of solid organ transplant recipients who received COVID-19 tests last year and evaluated health outcomes using N3C data. The collaborative, which VCU joined last summer, securely collects and organizes clinical and diagnostic data from patients across the country to create a dataset broad enough to engage in meaningful study of the novel coronavirus. VCU researchers can access the data for their studies.
Olex leads the national Immunosuppressed or Compromised Clinical Domain Team, which mines N3C data to identify how different types, levels and durations of immunocompromise may affect the severity and outcomes of infected patients. The team was instrumental to the research by Vinson, Olex and French.
Organ transplant recipients, people with HIV, those with autoimmune diseases like rheumatoid arthritis and multiple sclerosis – the COVID-19 pandemic has been especially scary for people whose immune systems are compromised or suppressed.
They’ve fought or are fighting battles against other diseases – or even their own immune systems. And the newness of the virus means no one is sure how they would fare against it.
“There’s very little data on how immunocompromised patients will respond to COVID-19,” said Amy Olex, M.S., senior bioinformatics specialist at the VCU Wright Center for Clinical and Translational Research. “It’s resulted in patients wondering if they should suspend life-altering treatments.”
To help fill that gap in data, Olex is leading a team that will leverage a national platform of COVID-19 clinical data to guide and support research into immunocompromised patients.
The National COVID Cohort Collaborative, or N3C, led by the National Institutes of Health (NIH), securely collects and organizes clinical and diagnostic data from patients across the country to create a dataset broad enough to engage in meaningful study of the novel coronavirus.
“The N3C initiative and data repository has sparked national collaborations with the goal of answering many of these yet unanswered questions about COVID-19,” Olex said. “It’s already yielding vital research.”
Within N3C, Clinical Domain Teams enable researchers with shared interests to analyze N3C data and collaborate efficiently. The teams give researchers an opportunity to collect pilot data for grant submissions, train algorithms on larger datasets and learn how to use N3C tools. Through the teams, researchers can build on each other’s work to improve outcomes for patients affected by COVID-19.
Olex leads the Immunosuppressed or Compromised (ISC) Clinical Domain Team. Initial research will focus on people with HIV, organ transplant recipients and those with autoimmune disorders, including skin diseases such as atopic dermatitis and eczema. The team will also identify areas that warrant additional study.
“The ISC Team will mine the N3C data to identify how different types, levels, and durations of immunocompromise may affect severity and outcomes of infected patients,” Olex said. “The hope is that our research brings much needed clarity to healthcare providers and people who are immunosuppressed or compromised.”
Teams like ISC welcome new members. They feature researchers and experts like statisticians, informaticists and machine learning specialists who collaborate across disciplines to tackle COVID-19 and its health impacts.
N3C is hosting an open house to engage CTSA members, newcomers, and the wider translational research community beginning on Jan. 19. The event will kick off with a 1-hour symposium, followed by a week of open Clinical Domain Team meetings, including the Immunosuppressed/Compromised Domain Team.
VCU researchers can contact Amy Olex at firstname.lastname@example.org with questions about immunosuppressed or compromised COVID-19 research.
The machines are learning. But that’s OK, because Amy Olex, M.S., is there to teach them.
The senior bioinformatics specialist at the Wright Center is extracting de-identified information from troves of clinical notes so that health researchers at VCU and VCU Health can create meaningful studies and bring research results to patients more quickly.
What is the technology behind NLP text analysis and when did it come about?
NLP’s been around for a little while, but the algorithms and science have improved. It started in linguistics, analyzing properties of texts, looking at frequencies of word distributions. The earliest NLP application that became widely available is your digital spellcheck. From there, it’s grown to where now it can predict your sentences. You have NLP applications that can write actual papers that get accepted to journals.
NLP in recent years has really taken off because of the ability to store and process massive amounts of information. We haven’t had the capacity until recently. That’s why Google is so advanced, because they’ve had this massive database of text, and they have the ability to run statistics and machine learning algorithms over it and process all the linguistic patterns from everybody that enters data into their system. That’s how you get text prediction in Gmail. Now, more of us have the technology to store and process large volumes of data.
Before we had all of that data available, NLP algorithms were rule-based, built on manually written rules. So, if you’re looking for terms associated with cancer, you build out manual dictionaries of all the different ways the word ‘cancer’ can be represented in clinical notes. Then you’d have regular expressions and rules go through the text and find the patterns you manually specified. Honestly, rule-based systems are still very popular because they’re easily interpretable. When they miss something, we immediately know why: ‘Oh, we didn’t have a rule for that instance.’
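The dictionary-plus-rules approach Olex describes can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not a real clinical lexicon: the term list, patterns and sample note are all invented for the example. The point it demonstrates is the interpretability she mentions, since every match is tied to the rule that produced it.

```python
import re

# Hypothetical miniature dictionary of surface forms for one concept
# ("cancer"), each mapped to a hand-written pattern. Illustrative only.
CANCER_TERMS = {
    "cancer": r"\bcancers?\b",
    "carcinoma": r"\bcarcinomas?\b",
    "malignancy": r"\bmalignan(?:cy|cies|t)\b",
    "neoplasm": r"\bneoplasms?\b",
}

def find_concepts(note: str):
    """Return (rule_name, matched_text, position) for every hit.

    Because each hit carries the rule that fired, a miss is easy to
    explain: no rule covered that surface form.
    """
    hits = []
    for rule_name, pattern in CANCER_TERMS.items():
        for m in re.finditer(pattern, note, flags=re.IGNORECASE):
            hits.append((rule_name, m.group(0), m.start()))
    return sorted(hits, key=lambda h: h[2])

note = "Pt with hx of breast carcinoma; no new malignancies noted."
for rule, text, pos in find_concepts(note):
    print(f"{pos:>3}  {text!r}  (rule: {rule})")
```

Running this on the sample note flags “carcinoma” and “malignancies” but misses nothing silently: if a note spelled the concept some other way, the fix is simply adding another entry to the dictionary.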
Now, with machine learning, when you’re using the large datasets and something is wrong, you don’t necessarily know why it found what it found. It’s a black box. Clinicians don’t like black boxes, especially when you’re pulling out medical concepts. It’s gotten a whole lot better over the years, for sure. But that’s an active research field, trying to figure out these machine learning algorithms and how they’re making decisions as to what is, for example, a cancer concept and what is not.
What’s a typical research project that you might employ NLP for?
One of the most frequent types of projects is extracting, from unstructured clinical notes, de-identified concepts or information that aren’t stored in the structured Electronic Health Record (EHR) data. In structured EHR data, there are set fields and values, so there’s a specific line for ‘blood pressure’ and a space for the clinician to enter those numbers.
But if, say, a doctor types up a narrative of a person’s history, with all their symptoms and their symptom progression, that’s not going to be in the structured EHR data. That’s going to be an unstructured blob of notes.
For example, in cancer research, they’d want to pull out notes on a tumor’s stage. A lot of the time, tumor stage is written into the unstructured, narrative text. But some of that information may be important for defining the cohort for a clinical trial or for your study. You’re going to want to pull out those notes, extract specific terms, measurements and so on, and discretize them into structured data for processing.
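A toy version of that discretization step might look like the following Python sketch. The stage pattern, note IDs and field names are illustrative assumptions, not part of any real pipeline at the Wright Center; the idea is just turning one narrative note into one structured row.

```python
import re

# Hypothetical pattern for TNM-style stage mentions such as
# "Stage IIIB" or "stage IV". Illustrative only, not clinically complete.
STAGE_RE = re.compile(
    r"\bstage\s+(0|I{1,3}V?|IV)([A-C])?\b",
    flags=re.IGNORECASE,
)

def extract_stage(note_id: str, text: str):
    """Turn one unstructured note into a structured row (a dict)."""
    m = STAGE_RE.search(text)
    if not m:
        return {"note_id": note_id, "stage": None, "substage": None}
    return {
        "note_id": note_id,
        "stage": m.group(1).upper(),
        "substage": m.group(2).upper() if m.group(2) else None,
    }

row = extract_stage("n001", "Biopsy consistent with stage IIIb adenocarcinoma.")
print(row)  # {'note_id': 'n001', 'stage': 'III', 'substage': 'B'}
```

Rows like this can then land in a regular database table, where defining a cohort becomes an ordinary query instead of a manual read-through of every note.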
Fifty years ago, how might a researcher have conducted a study where they knew there was important information they needed inside clinical notes?
Manually going through the notes! They would pull out the information by hand, and honestly, they still do that to this day, because NLP is not widely available. And it’s still not generalizable, especially in the clinical domain. Before NLP, if you needed stuff that was in written text, you hired students to dig through the data and pull out the information you needed.
How has NLP played into COVID-19 research?
There’s a group of informatics people like myself working with NLP as part of the National COVID Cohort Collaborative (N3C), a national, centralized data resource for researchers studying COVID-19.
The NLP group is working to build out the algorithmic infrastructure that a researcher could use to study not only COVID-19 but also how it interacts with comorbidities like obesity or depression, where descriptors for those conditions are not going to be in the structured data. It will be very impactful once we can get that additional information into the N3C repository.
How does being at a Clinical and Translational Science Award (CTSA) institution matter when it comes to NLP work?
It’s a big advantage. The Wright Center is like a large networking hub. We have connections on the medical campus. We have connections on the academic campus. So, working in NLP, I have access to the clinical data that someone on an academic campus does not have easy access to, and I have access to the clinicians who understand that data. But I also have access to those in the College of Engineering’s Department of Computer Science, who develop the NLP algorithms and are really knowledgeable about machine learning.
I’m not a clinician. I came from a computer science background. So when I’m working on a project that’s dealing with a specific type of condition like cancer or depression, I need access to subject matter experts to figure out what concepts I need to pull out. What’s important for differentiating patient A from patient B?
That network all at one university is a really valuable thing to have for any researcher that comes to us. I’ve had researchers on the clinical side, and I’ve connected them with people on the academic side, and vice versa. The Wright Center is a bridge between worlds – two worlds that sometimes speak different languages. And researchers need to know that it’s here, and they can take advantage of it.
What else would you want a researcher to know about NLP and how it can help them?
NLP can be very powerful. It can do a lot of cool things, but it’s still not easy. The projects require very close collaboration between me, who’s the coder and the one developing the algorithm, and the PI in charge of the research. The PI understands the clinical background to the problem they’re studying.
NLP is not a quick ‘Hey, I need these concepts.’ It’s ‘okay, you need these concepts. Our pipeline isn’t set up for that right now, so we need to work together to figure out what we need to change – to get you the answers that you need.’ It’s very project specific. You can’t throw data at us, and we throw answers back.
And we’re always happy to talk with people if they have questions. People sometimes hesitate to contact us, because they don’t have a specific research project. But sometimes you don’t know what your research project is going to be, unless you know what services we can provide. It helps to call us before you even write a grant and to say, ‘Hey, what do you guys do?’
Virginia Commonwealth University researchers can now tap into a national resource to further their study of COVID-19.
The National COVID Cohort Collaborative, or N3C, led by the National Institutes of Health (NIH), securely collects and organizes clinical and diagnostic data from patients across the country to create a dataset broad enough to engage in meaningful research of the novel coronavirus.
VCU, led by the Wright Center, joined N3C this summer. As of last week, the collaborative contained de-identified data for more than 958,000 patients from across the country, meaning researchers have access to hundreds of millions of lab results, procedures and observations.
VCU researchers have two options for accessing the data.
Virginia Commonwealth University has joined a national, centralized data resource designed to help researchers study COVID-19 for years to come.
The National COVID Cohort Collaborative, or N3C, led by the National Institutes of Health (NIH), will securely collect and organize clinical and diagnostic data from patients across the country to create a dataset broad enough to engage in meaningful study of the novel coronavirus.
VCU joins the more than 55 institutions that have so far agreed to participate in this national collaboration among hospitals and research centers. The institutions will send data, and researchers at those institutions will be able to request it.
“It’s difficult to draw conclusions from smaller data sets,” says Tamas Gal, Ph.D., the project lead for VCU at the C. Kenneth and Dianne Wright Center for Clinical and Translational Research. “Early on in the pandemic, there were observations about symptoms and outcomes, but no one really had the way to draw statistical conclusions, because there was no single center that had enough data.”
Patient identifiers are not included in the dataset. Researchers are able to interrogate the data through a centralized analytics platform without downloading it, providing additional privacy protection. The cloud-based N3C was certified secure by the Federal Risk and Authorization Management Program.
The project taps the capacity of research hubs like VCU’s Wright Center, which operate with the help of the NIH’s Clinical and Translational Science Awards. The 58 hubs, of which VCU’s was the first in Virginia, provide a base of N3C data, but the NIH is casting a wide net for a diverse set of data.
Research institutions collaborating on clinical data is nothing new. VCU participates in national and global data-sharing networks like ACT and TriNetX. But N3C is unique in the breadth of data it’s collecting and in the timeliness of its launch during the pandemic.
“Clinical data is critical for understanding effective COVID-19 interventions,” says F. Gerard Moeller, director of the Wright Center. “N3C gives researchers an important foundation for future investigations, and I’m pleased that VCU is joining it.”
The Wright Center also participates in committees and working groups within N3C that help further develop the dataset and formulate research questions about COVID-19. The questions will inform future data points and selection criteria for the database.
“Inclusion criteria and the collected variables are continuously reassessed as COVID-19 research develops,” says Gal. “We want to make sure that all data required for future research are going to be available, so the work of the governance committee, as well as the data acquisition, harmonization and analysis working groups is very important.”
As the Wright Center’s director of research informatics, Gal oversees a team of data experts whose experience in biomedical informatics will be crucial to this project. The informatics team has activated quickly during the pandemic to build and maintain the data infrastructure that researchers need to make their projects happen.