Digging deep into data

Aaron Wolen, left, and Tim York, right, analyze data and show others how to collaborate using Open Science Framework
By: Krista Hutchins

July 12, 2017

Hovering over a computer, VCU Data Science Lab Director, Timothy P. York, PhD, and Wright Center Bioinformatics Specialist, Aaron Wolen, PhD, scrutinize, analyze, and interpret the emerging and rapidly growing field of big data. Much of their day is spent digging deep into data in an effort to make research more transparent and reproducible.

“The Data Science Lab is an idea that grew out of conversations Tim and I were having about how the data science movement has produced amazing solutions to some of the most common pain points researchers experience while working with their data,” Wolen said.

“One of our passions,” York added, “is facilitating moving raw data to publication more efficiently while ensuring the robustness of the research product.” York, who is also an Associate Professor in the VCU Departments of Human and Molecular Genetics and OB/GYN and the Virginia Institute for Psychiatric and Behavioral Genetics, believes this is the wave of the future.

What is Data Science?

Data Science is both the science and art of working with data. The VCU Data Science Lab supports best practices for reproducible research using modern computational tools. The program, sponsored by the Office of the Vice President for Research and Innovation, and supported by the Wright Center for Clinical and Translational Research, aims to help researchers manage their projects and workflows.

“Reproducibility of research is key to advancing knowledge and maintaining public trust in science,” said Francis Macrina, PhD, Vice President for Research, Office of Research and Innovation.

Collaborating and sharing your research matters, according to York and Wolen.  “These techniques can dramatically improve the reproducibility and transparency of your research, which helps others understand exactly what you did to produce a result,” Wolen said.

So what exactly is Data Science? According to a recent Harvard Business Review article, it is “The Sexiest Job of the 21st Century,” and according to Glassdoor, it is one of the ‘Best Jobs in America.’ But it is much more than that, according to York. The Data Science Lab hopes to solve the ongoing problem of managing, tracking and sharing research with easy-to-use tools for storage, data analysis and collaboration.

Open Science Framework

One of the tools in the scientific computing toolbox is the Open Science Framework (OSF). It is a free, open source application built to help researchers manage their projects and workflows. “The OSF is a great, free tool that provides an entry point to researchers, regardless of their technical background, to learn and adopt best practices for reproducible research,” York said. The OSF is part collaboration tool, part version control software, and part data archive.

A recent workshop, part of an ongoing hands-on OSF educational series, provided an overview of the Open Science Framework. The workshop also demonstrated how VCU researchers can use it for securely storing data and materials, organizing projects, coordinating with collaborators and making all or part of their data public and citable.

Workshop attendee Michael Broda, PhD, Assistant Professor, Department of Foundations of Education, found the information extremely helpful: “A deeper understanding of reproducibility in research is absolutely critical for education researchers. The OSF training provided by the VCU Data Science Lab is a wonderful introduction to these issues, as well as an invaluable tool for research management and dissemination.”

Michael Broda learning how to use the Open Science Framework for research management

Roxann Roberson-Nay, PhD, Associate Professor of Psychiatry, thought the workshop contained the ideal balance of content depth and efficiency.  “I now feel much more confident in my ability to use the OSF to manage the research activity of my NIH funded grants.”

Roxann Roberson-Nay asking questions on how OSF can track her NIH funded grants

Other participants reflected on the value of sharing data. “To be able to have access to all of your data and analyses, as well as sharing, it is amazing. I wish I had the Open Science Framework when I started 20 years ago,” said Rita Shiang, Associate Professor, Human and Molecular Genetics.

Laura Padilla and Rita Shiang, right, are collaborating to learn how to manage and analyze their data

Rigor and Reproducibility

The OSF module also falls in line with new National Institutes of Health (NIH) guidelines for Rigor and Reproducibility.  The NIH Guidelines advocate a commitment to promoting rigorous and transparent research in all areas of sciences…it is key to the successful application of knowledge toward improving health outcomes.

“The VCU Data Science Initiative is a university-wide solution that aims to educate and assist our community of researchers in implementing and sustaining best practices in data science,” Macrina said.

The next workshop, in which participants will learn how to create a reproducible project from start to finish, will be held July 25 from 10 a.m. to 12 p.m. at the Molecular Medicine Research Building. It is aimed at faculty, graduate students and postdocs across disciplines who are actively engaged in research. If you would like to attend, please register here: https://training.vcu.edu/course_detail.asp?ID=16013.