Alumni Profile: Bradley Malin ('06)

Bradley Malin is Professor of Biomedical Informatics, Biostatistics, & Computer Science at Vanderbilt University.

A 2006 PhD graduate of the Societal Computing (formerly Computation, Organizations, & Society) program, Bradley also holds an MS in Machine Learning, an MPhil in Public Policy and Management, and a BS in Biological Sciences from Carnegie Mellon University.

Bradley spoke with us about how his time at CMU prepared him to take up a leading role in the development of big data biomedical informatics and how his collaborative spirit continues to drive his research.

Bradley, you’ve been around the block at CMU a few times. You did your undergrad degree here as well as your PhD. What brought you to CMU and what led you to the then brand-new PhD program in Computation, Organizations & Society?

Ever since middle school, I was fascinated by the concept of recombination, particularly as it pertains to genomics. It amazed me that you could take the genetic material of two different species, splice them together, and create something new. I attended the APEA (Advanced Placement Early Admission) Carnegie Mellon summer program while in high school and learned about all of the truly great geneticists. So, after high school I joined the Department of Biological Sciences. During my undergraduate studies, I took a variety of classes, including various computational biology courses, while completing a minor in Technology Policy in the Engineering and Public Policy Department.

In my senior year, I met Dr. Latanya Sweeney, who would go on to help found the Computation, Organizations & Society PhD program (now Societal Computing). After conducting research in genetics and privacy with her group, I joined the PhD program in the Heinz College - where she also held a joint appointment. At the same time, I joined the master’s program in the Center for Automated Learning and Discovery in the School of Computer Science (which would eventually become the Machine Learning Department) to hone my skills in analytics.

After I completed my master’s in both programs, I realized that my interests had shifted. I was still interested in public policy, privacy, and information systems management, but I became really interested in the analytics and the computational approaches for invading, as well as maintaining, privacy.

As a result, I moved to the PhD program in Software Engineering, and a year later I moved once again into the Computation, Organizations & Society (COS) program as part of the first class of doctoral students. CMU is one of those places where everybody’s so multidisciplinary and so collaborative that it just seemed natural to take the leap and join such an innovative environment. There were actually several of us in various programs across the university who transitioned into COS. All of us had a pretty similar mindset: we wanted to work across disciplinary boundaries to conduct computer science research in the real world.

It sounds like the multidisciplinary/interdisciplinary current is something that runs in your blood. How has that played out over your career?

When you’re at CMU, what really pushes you is looking at people like Herb Simon. Herb was involved in so many different aspects of society. He conducted foundational research in computer science (and artificial intelligence in particular), behavioral psychology, and economics. And as an artifact, he founded or became a part of numerous departments at CMU.

While immersed in the CMU culture, you begin to recognize that, while diving deep into a specific discipline is important, there’s much to be gained by synthesizing perspectives from seemingly disparate disciplines to solve real problems and create something new in the process.

I believe that the interdisciplinary component is what drives me on a daily basis. It’s definitely influenced how I’ve approached the research problems that I tackle at Vanderbilt and is how I encourage my collaborators and students to investigate and innovate. I believe that several examples can illustrate how such a

This was one of the reasons why I established the Big Biomedical Data Science (BIDS) program at Vanderbilt. It brings together faculty from over 10 different departments and sponsors students from multiple colleges across the university. It’s in this big mixer where you get cross-fertilization forging the basis of a new field as you move forward.

I also established the Center for Genetic Privacy and Identity in Community Settings (GetPreCiSe), which is an NIH Center of Excellence in Ethical, Legal and Social Implications research. We have faculty from Communications, Media Studies, Economics, Sociology, Computer Science, Pharmacology, Biomedical Informatics, and Law working together to study privacy issues and identity problems with respect to genomic data. We’re developing new technologies for the generation of genomic data, its interpretation, and its sharing on a mass scale.

What was it about computer science at Vanderbilt that drew you there?

I was drawn to Vanderbilt because of the Department of Biomedical Informatics in the School of Medicine. It is a highly collaborative, multidisciplinary group. This department is actually the largest of its kind in the world. And when people want to see how you do computer science and medicine, they use Vanderbilt as the example.

We have faculty with backgrounds in computer and information science, anthropologists with a focus on information technology, as well as library scientists who focus on knowledge management and exchange between clinicians and patients. We have a large number of physician scientists, with a wide range of specializations, including oncology, surgery, pediatrics, and neonatology.

And it turned out that was the best decision I could have made, because we ended up becoming the progenitor for a lot of different programs around the country that are now trying to reuse all of this data, and do it in an ethically viable, computationally sophisticated manner in order to have immediate health impacts. For instance, in the mid-2010s, President Obama established the All of Us Research Program, a multi-billion-dollar program to create a cohort of 1 million individuals contributing their electronic medical records data, biospecimens, mobile computing information, and survey data to a large, nationwide repository for research purposes. And building on over a decade of experience in electronic medical records and genomics research, Vanderbilt became the head of the Data and Research Center for the program.

How has your research evolved in the 15 years you’ve been at Vanderbilt?

When I first arrived at Vanderbilt, I established a lab to focus on health information privacy from a data science perspective. However, over time, I was afforded opportunities to initiate hypothesis-driven research, collaborating with biomedical researchers who had very specific questions. For instance, I’ve worked with nephrologists on trying to understand whether you could predict if an individual would have issues with their kidneys. I’ve also had the good fortune of working with oncologists to understand how social media data can provide insight into whether individuals will remain on treatment regimens, using the way they talk about things to build inferred personality type models.

As I grew these projects and brought on additional faculty, this led to the establishment of the Vanderbilt Health Data Science Center. The center has cultivated an environment that blends a hypothesis-driven approach with data science - bringing together all the facets of what’s needed to get data to sing for you.

You’ve done a lot of work at Vanderbilt to bring computational approaches to biomedical science there. There had to be some really significant challenges along the way, yes?

When I was at CMU, the School of Computer Science brought together everyone who worked on computation in some way, shape, or form. Vanderbilt, by contrast, is a biomedical research powerhouse. They are one of the largest recipients of NIH grant funding in the country. They have a relationship with this community for healthcare that really drives a sense of community spirit and community well-being at the university. However, what this means is that computational expertise is scattered all over the institution. There are certainly a large number of faculty in the Department of Biomedical Informatics, but there are also computational experts in just about every basic science and clinical department. Simply understanding where all of the experts are and what they are up to can be quite a challenge at times, but it’s also what makes working here so exhilarating.

That’s interesting. We do tend to pride ourselves on our interdisciplinary bent here in the School of Computer Science. But, as you say, that approach can take a number of different forms. Based on what you’ve seen and done at Vanderbilt, how can we do better?

CMU’s strong point is obviously computer science. But, if you look at the researchers that are really pushing the boundaries, they are often the ones who are going over to Social and Decision Sciences or Biology or Psychology; the ones that are getting multidisciplinary.

There were a lot of people who I worked with that I saw building those bridges, and it was very inspiring. Bob Murphy from the Biology Department was a perfect example of somebody who would spend time in the Department of Biology and then walk up the street to the School of Computer Science. Or I’d see Alan Montgomery, who was working in marketing, suddenly turn up to teach a course in Machine Learning.

And I actually have to give Tom Mitchell a lot of credit for establishing the Center for Automated Learning and Discovery, or what’s now the Machine Learning Department. Tom really created this environment that said, “Whatever background you come from, we’re all working on computational challenges. But we’re going to need all the domain expertise. And if you can bring that into this environment, we’ll create an opportunity to work together.” Tom’s leadership and perspective on research were a big influence on me and many others. It’s not uncommon for data scientists at CMU to collaborate with healthcare organizations, such as the University of Pittsburgh Medical Center (UPMC). It’s moving in the right direction.

So, what are you working on now? What's the current hot research topic you are taking on?

There are many different projects that I am working on, ranging from computational infrastructure to process large amounts of biomedical data to cryptographic strategies to compute over information in untrusted (such as public cloud) environments. One of the projects I am particularly excited about is focused on the development and evaluation of risk-based approaches to privacy.

We began this line of research based on the observation that, over the past several decades, the computer science community has developed numerous privacy enhancing technologies that can enable data sharing with provable guarantees of protection. Yet despite the abundance of such advanced technologies, there is very little technology transfer into practice. So we started looking at what people do currently when they want to work with, say, a large, potentially risky data set. And what we found was, the data recipients tend to just sign contracts and data use agreements. So, would I use data that has been manipulated through some computational mechanism when I can just sign this contract and promise to be a Good Samaritan?

And it dawned on us that whether the protection strategy is some form of data manipulation, contractual agreements, or reinforced social norms, all of them can be modeled as deterrents. Based on this observation, we asked the question: how do you bring all of these concepts together and use this as a framework to design data management policies that both the technologists and the policy makers can understand?

We were particularly concerned about biomedical data. The main reason was that the medical space is one of the only places where you have regulation on the books that is quite prescriptive about the way you have to control data before you share it. Specifically, if you are going to share medical data, you have the option of performing a risk assessment, whereby you need to prove that the risk of identifying an individual in a data set is small given who your anticipated recipient is under reasonable means. This is really fascinating because the anticipated recipient component implies that you should be modeling your adversary and their capabilities, which suggests we’re dealing with an optimization problem here. Your adversary is not necessarily everybody in the world with every dollar available to them, but they have resource constraints.

We found game theoretic frameworks provided an excellent representation of the problem. And, given the very large space of possible ways in which data could be manipulated or contracts could be defined, we had to design efficient computational methods to sift through all of the options to find the policies that maximized utility while sufficiently mitigating privacy risk.

Then, we were fortunate in demonstrating the potential of such an approach with real data in real systems. We have been using data from Vanderbilt, but also a consortium of about 10 other academic medical centers around the country that are hungry for data sharing and supporting biomedical research. We are now in the process of adapting this approach to developing data sharing policies for large biomedical research programs, such as the All of Us Research Program of the National Institutes of Health, which is aiming to collect and share data from the electronic medical records, surveys, and biospecimens of over one million US residents.
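
[ Editor's illustration: The game-theoretic framing Dr. Malin describes can be sketched, very roughly, as a two-player game in which a data publisher picks a policy (how much to generalize the data, and what contractual penalty to attach) and a resource-constrained adversary attacks only when it pays to do so. This is a minimal sketch and not Dr. Malin's actual method. Every name and number below (Policy, UTILITY, REID_PROB, DETECT_PROB, and so on) is a hypothetical assumption chosen only to make the example run. ]

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Policy:
    generalization: int  # 0 = raw data, higher = coarser (more manipulated) data
    penalty: float       # contractual fine charged if a misuse is detected


# All values below are made up for illustration; none come from the interview.
UTILITY = {0: 100.0, 1: 80.0, 2: 50.0}   # research value of the shared data
REID_PROB = {0: 0.5, 1: 0.125, 2: 0.0}   # chance a re-identification attack succeeds
PENALTIES = (0.0, 40.0)                  # contract options the publisher can offer
DETECT_PROB = 0.5       # chance a successful attack is detected and fined
ATTACK_GAIN = 40.0      # adversary's benefit from a successful attack
ATTACK_COST = 4.0       # adversary's cost to attack (the resource constraint)
PUBLISHER_LOSS = 120.0  # publisher's loss per successful re-identification


def adversary_attacks(policy: Policy) -> bool:
    """The adversary attacks only if the expected net payoff is positive."""
    p = REID_PROB[policy.generalization]
    expected = p * (ATTACK_GAIN - DETECT_PROB * policy.penalty) - ATTACK_COST
    return expected > 0


def publisher_payoff(policy: Policy) -> float:
    """Data utility, minus expected privacy loss if the adversary attacks."""
    payoff = UTILITY[policy.generalization]
    if adversary_attacks(policy):
        payoff -= REID_PROB[policy.generalization] * PUBLISHER_LOSS
    return payoff


def best_policy() -> Policy:
    """Exhaustively search the (tiny) policy space for the best payoff."""
    candidates = [Policy(g, f) for g, f in product(UTILITY, PENALTIES)]
    return max(candidates, key=publisher_payoff)


if __name__ == "__main__":
    for g, f in product(UTILITY, PENALTIES):
        pol = Policy(g, f)
        print(pol, "attack:", adversary_attacks(pol), "payoff:", publisher_payoff(pol))
    print("best:", best_policy())
```

[ With these made-up numbers, neither the contract alone nor the data manipulation alone deters the attack, but a modest amount of generalization combined with a contract does, while keeping most of the data's utility. Real policy spaces are far larger than this toy example, which is why the efficient search methods mentioned above matter. ]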

The potential there has to be exciting. After all, this data doesn’t do us any good if it’s sitting on a dusty server rack in the basement of some building. It’s only once it gets into the hands of oncologists and other teams of collaborative researchers that we can actually do some societal good with this, correct?

That's true. You can never guarantee complete privacy in a system when people are going to continue to work with data. However, we can build technology that enables an assessment of what privacy we are trading off when sharing and using the data.

Still, how do you define what is the optimal balance? That’s something that’s not solely the computer scientists’ decision, but rather it’s up to society to come to agreement on. And I am proud to build technology that enables society to begin the dialogue and put this data to good use, for all of us.

[ To learn more about the vital work Dr. Malin is doing, visit him at his website. To learn more about the critical research our current students and faculty are doing in Societal Computing, check out our research page! ]