
2012-06-22

Sine Qua Non

Sine qua non -- a Latin expression that, literally translated, means "without which not" -- is the theme for this post. It's often defined as "something absolutely indispensable or essential." It's a bit of Latin I learned in high school. 

My first year Latin teacher was Father Bernard, a burly man who taught us classical Latin with a hardbound book usually administered to our skulls when we made a mistake. My second year teacher, Father Terence, chain-smoked cigarettes in every class. (Very different times, indeed.) Every Lent, however, he took a vow of tobacco abstinence for 40 days until Easter. The joy of Easter had a special meaning for the class.

Nevertheless, despite the challenges of book-wielding and nicotine-deprived teachers, I did learn a bit of Latin. I learned enough to appreciate the literal meaning of sine qua non, with its sudden stop at the end. But it is more than just a negative: it means an absence -- nothingness -- rather than negation. Whatever it was, if you did not have it, you had nothing. 

That truth pertains to our project, for without data we have nothing. Yet data owners hold on to their data because they consider them too valuable, or too dangerous, to be seen by outsiders. So, how do we gain access to something so precious to its owners that they do not want anyone else to see it? Companies and agencies view data as vital to their existence. Data are often the power that informs and guides their operations, and data relating to health or business issues are even more closely held because of concerns about privacy or competitiveness. How can we crack the code to gain access? After a few years working in this area, there seem to be three major schools of thought.

First, require it by law. Quite often people think that passing a law can solve a problem. It can help, but experience shows that it's difficult to achieve uniform compliance. I'll avoid the issue of speed limit compliance and focus on an area more germane to our mission: notifiable diseases. These are diseases that healthcare personnel are legally obligated to report to public health authorities, yet compliance is far from perfect. A number of studies have investigated what happens. One classic study (Rosenberg ML, Marr JS, Gangarosa EJ, et al. Shigella surveillance in the United States, 1975. J Infect Dis 1977;136:458–6) looked at how Shigella infections actually got reported (Shigella is a bacterium that causes severe gastrointestinal distress--don't ask--and is often spread by contaminated food). Here's the breakdown of cases and reporting: 
  • 100 people infected 
  • 76 became symptomatic
  • 28 consulted a physician
  • 9 submitted stool cultures
  • 7 had positive culture results
  • 6 were reported to the local health department
  • 5 were reported to CDC
Why the attrition? Other studies have observed many reasons, such as concerns for patient privacy, the difficulty of filing the report, the type of disease involved, and clinicians' limited understanding of the public health value of reporting. A countervailing factor is when clinicians see how reporting helps them serve their patients better. 
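
For a sense of scale, here's a quick back-of-the-envelope calculation -- a minimal Python sketch using the stage counts from the study above -- showing how little of the original burden survives each reporting stage.

```python
# Reporting funnel for Shigella, using the stage counts from the study cited above.
stages = [
    ("infected", 100),
    ("became symptomatic", 76),
    ("consulted a physician", 28),
    ("submitted stool cultures", 9),
    ("had positive culture results", 7),
    ("reported to local health department", 6),
    ("reported to CDC", 5),
]

total = stages[0][1]
for name, count in stages:
    print(f"{name:38s} {count:3d}  ({count / total:6.1%} of all infections)")
# Only about one infection in twenty ever reaches the CDC.
```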

Second, rely on volunteerism. Perhaps appealing to people's better natures will encourage them to provide data. That approach presents its own challenges: it asks the data provider to do the lifting with no pragmatic incentive. It takes extra effort, and without a legal requirement there is little motivation to make it. It also relies on the provider sharing the mission of the organization or agency seeking the contribution. Moreover, voluntary efforts tend to capture only the most serious cases. Even when the data are considered valuable and important, as with ILINet (the voluntary influenza-like illness reporting network), providers still have to spend up to 30 minutes to compile each report -- a lot of time for a busy clinician. If legal requirements and broad calls for voluntary participation fail to achieve full and timely data, perhaps looking more closely at the data owners is the path forward. 

Third, rely on data providers' values. Understanding a potential data provider's values seems to offer a more successful approach to gaining access to data. The value proposition can span the tangible to the intangible -- all are usually in play. Tangible values draw upon the pragmatic side of our organizations, yet intangible values can work too, such as appeals to the public good -- even patriotism. A value proposition that centers on the data provider leverages volunteerism by offering a range of approaches from the intangible to the pragmatic. It is never purely one or the other; in our experience, some combination of the following seems to work:
  • Helping a group fulfill its mission and goals with information derived in whole or in part by their data can be a powerful incentive to participate. 
  • All organizations want to optimize their operations. For example, health care providers are under constant pressure to deliver services more efficiently. Improved and earlier situational awareness can directly benefit them in their response. 
  • Many organizations are more concerned with risk management when thinking about health issues. The food industry is a case in point. Their value proposition is partly to manage risks to their bottom line from contaminated foods. Earlier warnings, even non-specific ones, can offer them greater knowledge of the situation and guide their efforts to protect their firms. Food retailers detecting and responding to food safety issues can avoid unwanted attention from regulators, courts, governments and media. Companies have lost millions in single instances of a food contamination outbreak. At the same time, by addressing the issues of tainted foods, they have also helped protect the public’s health. A classic win-win. 
  • All organizations recognize the value of their data, and many will be receptive to the notion that participating in our effort can increase that data's utility and value. Deriving information across an industry from data drawn from many separate companies can put an individual firm's data in context and surface value that has gone unexamined or untapped. And when multiple sources are combined, individual providers tend to feel more comfortable sharing their data. 
  • Many organizations are open to the notion that information plus their data helps them develop new tools and capabilities. 
  • Many organizations are open to opportunities where our mission can relate to theirs. They can leverage our use of their data to create new products and services to expand their businesses. 
  • Finally, the obvious one. We can pay for access to data. With all these other incentives in the pipeline, however, this should be our last and least resort.
Our project's approach is to leverage the value that data providers perceive in their data and to open them to possibilities in those data that they usually cannot unlock on their own. Our efforts rest on three value concepts built into our operation.

We believe that the first order of business is to understand and serve the value proposition of the data owners on their terms. Our initial engagement we call data discovery. We provide an initial data quality assessment to the data provider. This is partly our process of learning about their data (What do all the elements mean? Do we have a complete set? How are the data used in the provider’s operation? How often are the data generated or collected?). We have found that delivering a data quality assessment on many millions of records within days sends a message of how much we value their data. Next, we work with the data providers’ subject matter experts to understand their data, its purpose to them, how they use it, and what the values mean. During this step, we often begin applying our analytics tools to establish norms and baselines, so we can work with them to explain and understand the anomalies that surface. Finally, we can take the opportunity to conduct limited-scope investigations to address small questions or issues that the provider has. These have often related to their uses of the data and can prompt their thinking about future capabilities -- even products -- derived from their data.
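
For the technically inclined, here is a minimal sketch of the kind of first-pass data quality assessment described above: reporting completeness, distinct values, and the date range covered back to the provider. The function name, the CSV input, and the column names are hypothetical; this is an illustration, not our production tooling.

```python
import pandas as pd  # assumes the provider's extract arrives as a flat file

def quality_assessment(path: str, date_column: str) -> pd.DataFrame:
    """First-pass profile of a provider's data extract:
    completeness, distinct values, and the date range covered."""
    df = pd.read_csv(path, parse_dates=[date_column])
    profile = pd.DataFrame({
        "non_null": df.notna().sum(),
        "pct_complete": (df.notna().mean() * 100).round(1),
        "distinct_values": df.nunique(),
    })
    print(f"{len(df):,} records spanning "
          f"{df[date_column].min():%Y-%m-%d} to {df[date_column].max():%Y-%m-%d}")
    return profile

# Hypothetical usage:
# print(quality_assessment("provider_extract.csv", date_column="visit_date"))
```

A profile like this is simply a conversation starter for the follow-on sessions with the provider's subject matter experts.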

Of course, our purpose is delivering earlier situational awareness of health threats. We have a lot of powerful technology to speed access to and processing of the data and to rapidly produce signals without the usual data noise. Our goal is real-time response to the data as they are delivered. But the value to the data provider is being able to integrate data across similar and dissimilar domains. Because we can deliver earlier signals than other approaches, these organizations can address issues as they appear rather than depending on the public media for situational awareness.
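
To make "earlier signals without the usual data noise" a little more concrete, here is a minimal sketch of a generic aberration detector: flag any day whose count rises well above a short trailing baseline. This is a simple moving-average illustration under assumed inputs, not a description of our actual analytics.

```python
import pandas as pd

def flag_aberrations(daily_counts: pd.Series, window: int = 28, z: float = 3.0) -> pd.Series:
    """Flag days whose count exceeds the trailing mean by z standard deviations.

    `daily_counts` is a date-indexed series of, for example, daily syndrome counts.
    """
    baseline = daily_counts.shift(1).rolling(window)  # exclude today from its own baseline
    threshold = baseline.mean() + z * baseline.std()
    return daily_counts > threshold

# Hypothetical usage:
# counts = pd.Series(data, index=pd.date_range("2012-01-01", periods=len(data)))
# alerts = flag_aberrations(counts)
```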

Finally, our third value is serving as a trusted third party and a data safe haven. Governmental agencies often have regulatory or oversight functions that they must execute. Moreover, with exceptions such as human clinical or student data, many forms of data held by government are potentially discoverable by outside parties under the Freedom of Information Act of 1966 and its extension to electronic records in 1996. While these mechanisms are right and necessary in our democracy, they can chill the willingness of a company or organization to share its data with governmental agencies. Moreover, the federal and state governments assert the legal doctrine of sovereign immunity -- meaning that they usually cannot be sued. As an organization outside that framework, we offer a paradoxical form of protection: data providers have recourse through the courts if they can prove that we misused or wrongfully accessed their data.

These are the value-driven concepts we have identified. We suspect that more will emerge as we widen the scope of our activities and add new and novel types of data to our system. Stay tuned.

2012-04-02

Data, Data, Everywhere, But....

Those who know me well may look at the title of this post and think of Brent Spiner. Actually, I'm paraphrasing Samuel Taylor Coleridge's "Rime of the Ancient Mariner". It's cheesy, but it does capture the challenge of getting access to the data our project needs to provide early detection and situational awareness of threats to health. 

[Figure: Biosurveillance and other factors in health threats]
But before we can get there, let's raise a happy topic--death. Or more properly, what sickens and kills us. All mortals die, and so must each of us one day. We hope to delay that day, and hope further to delay the illness that may be the proximate cause of our demise. So, I've listed the things that can kill us in the figure to the right. 

Most of these are easily accepted as contributors to threats to our health. "Time" may take a bit more explanation: it refers to the natural decay of our DNA as "errors" creep in each time it replicates, causing all sorts of problems. In a general sense, this phenomenon is in line with the thermodynamic notion of entropy, where the order in a system declines over time. (When my daughters were children, I often joked that the entropy in them was strong. Mysteriously, they never laughed.)
 

The first five items on the list are colored orange because they are of interest to us in our biosurveillance mission. The others are still in play, but outside our frame of reference. We seek to detect the effects of these agents, which may cause illness and death naturally or be used to cause them intentionally. There are many places to look for them in data, and we might consider a variety of sentinels and contexts. These include the following:
[Figure: Sentinels & Contexts]
(I love that picture of the kids.)

The five agents of illness and death are at play in the biosphere across the sentinels above, which serve as containers of pathogens or toxins, incubators supporting their growth, or vectors allowing them to spread. Each sentinel category can be considered data awaiting collection. Our society generates data associated with each of these every day, all the time. 

The question is how we can gain access to those data and turn them into information. More on that in the next post. 

2012-03-08

What's a Picture Worth?

I must begin with a story. In 2010, when I left to take this position, Dean Barbara Rimer and my other friends at the UNC Gillings School of Global Public Health gave me an original iPad. I would not have bought one for myself. But, since they were so nice to give me one, how could I possibly refuse? I took it home and started playing with it (the best way to learn about any new technology). It was a couple of months later when I overheard my wife talking on the phone with one of her sisters about me and the iPad. As I recall, she said, "He's always got it with him.... No, really. He carries it all over the house reading email, the newspaper, and watching videos on it. I've never seen him so attached to a gadget." So, I'm hooked and thinking about buying the new iPad that was announced this week.

What's this got to do with the title of this entry? Well, it's connected to the marketing buzz around the new iPad. With the improved display, the device will show more than 3 million pixels on the screen. Why is that important? In and of itself, it's not. But consider the question this way: how long does it take a human being to take in and process more than 3 million data points? Answer: a glance. We are superbly adapted to view images and use them to interpret our environment. Most of us can discern 1.2 million colors. We are visual creatures and have used that capacity to protect ourselves. A slight variation in color, pattern, or movement spelled the difference between our ancestors having lunch and being lunch.

The human ability to process loads of data in visual form is an often untapped asset in analytics. So much attention is rightfully placed on getting the data right, complete, and correct, and on making sure the math provides valid answers, that we forget to make those numerical results accessible to others. On our project, we have heard many times how important it is to deliver our information in a form that allows a stressed leader to understand a complex situation quickly. In other words, at a glance.

Presenting images that are statistically meaningful is just the beginning, however. They also need to be easy to understand, which brings us to what David Ebert at Purdue University's DHS Center of Excellence for Visual Analytics calls the sociology of data visualization. We have to consider more than the usual IT person's notion of the distance between the screen and the back of the chair. We have to understand the values, training, perception, and situation of the person encountering our representation of the data. Doing so lets us move beyond merely hoping that our analytic representation of the situation is clear. If we are to be successful, each visualization of the information should tell a story.
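
As a toy illustration of telling a story at a glance, here is a minimal matplotlib sketch (with made-up data) that plots a daily count against its expected range and points an arrow at the one day that breaks out of it -- the viewer sees the anomaly, and when it happened, without reading a single number.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up daily counts with one injected spike to annotate.
rng = np.random.default_rng(0)
days = np.arange(60)
counts = rng.poisson(20, size=60).astype(float)
counts[45] += 25

expected, spread = counts[:40].mean(), counts[:40].std()

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(days, counts, color="steelblue")
ax.axhspan(expected - 2 * spread, expected + 2 * spread,
           color="gray", alpha=0.2, label="expected range")
spike = counts.argmax()
ax.annotate("unusual increase", xy=(spike, counts[spike]),
            xytext=(spike - 20, counts[spike] + 3),
            arrowprops=dict(arrowstyle="->"))
ax.set_xlabel("day")
ax.set_ylabel("daily count")
ax.legend(loc="upper left")
plt.tight_layout()
plt.show()
```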

A year or so ago, I toured the Emergency Operations Center at CDC. In their situational awareness room, they showed us an animated movie of the spread of influenza outward from a province in southeastern China to Hong Kong, then to the countries bordering the Pacific, and finally to the rest of the world. It was a dramatic representation of an epidemiological disaster that helped us understand how that disease spread geographically and how long it took. The downside was that it depicted an outbreak from the 1950s. They did not have a similar animation of the 2009 H1N1 pandemic. The person who constructed the presentation did it in her spare time, as a hobby, but they all liked it because it helped them understand a model of how the disease moved.

We need more such movies. And maps. And graphs. Otherwise we end up in a frightened frenzy.