Sine Qua Non

Sine qua non -- a Latin expression that, literally translated, means "without which not" -- is the theme for this post. It's often defined as "something absolutely indispensable or essential." It's a bit of Latin I learned in high school.

My first year Latin teacher was Father Bernard, a burly man who taught us classical Latin with a hardbound book usually administered to our skulls when we made a mistake. My second year teacher, Father Terence, chain-smoked cigarettes in every class. (Very different times, indeed.) Every Lent, however, he took a vow of tobacco abstinence for 40 days until Easter. The joy of Easter had a special meaning for the class.

Nevertheless, despite the challenges of book-wielding and nicotine-deprived teachers, I did learn a bit of Latin. I learned enough to appreciate the literal meaning of sine qua non with its sudden stop at the end. But it is more than just a negative: it means an absence -- nothingness -- rather than negation. Whatever it was, if you did not have it, you had nothing.

That truth applies to our project, for without data we have nothing. Yet data owners hold on to their data because they consider the data very valuable, or dangerous if seen by outsiders. So, how do we gain access to something so precious to its owners that they do not want anyone else to see it? Companies and agencies view data as vital to their existence. It is often the power that informs and guides their operations. Data relating to health or business issues are even more closely held because of concerns about privacy or competitiveness. The question is, how can we crack the code to gain access? After a few years working in this area, I see three major schools of thought.

First, require it by law. Quite often people think that passing a law can solve a problem. It can be helpful, but experience shows that it's difficult to achieve uniform compliance. I'll avoid the issue of speed limit compliance and focus on an area more germane to our mission: notifiable diseases. These are diseases that healthcare personnel are legally obligated to report to public health authorities, yet compliance is far from perfect. A number of studies have investigated what happens. One classic study (Rosenberg ML, Marr JS, Gangarosa EJ, et al. Shigella surveillance in the United States, 1975. J Infect Dis 1977;136:458–6) looked at reporting compliance during a Shigella outbreak (Shigella is a bacterium that causes severe gastrointestinal distress -- don't ask -- and is often spread by contaminated food). Here's the breakdown of cases and reporting:
  • 100 people infected 
  • 76 became symptomatic
  • 28 consulted a physician
  • 9 submitted stool cultures
  • 7 had positive culture results
  • 6 were reported to the local health department
  • 5 were reported to CDC
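The attrition in the list above can be tallied with a little arithmetic. The sketch below is only an illustration: the counts are copied from the list, while the stage labels and the `funnel` helper are names made up here, not anything from the study.

```python
# Stage counts copied from the list above (per 100 infected people).
# Labels and the funnel() helper are illustrative, not from the study.
stages = [
    ("infected", 100),
    ("symptomatic", 76),
    ("consulted a physician", 28),
    ("submitted stool culture", 9),
    ("positive culture", 7),
    ("reported to local health dept.", 6),
    ("reported to CDC", 5),
]


def funnel(stages):
    """Return (label, count, share_of_first, share_of_previous) per stage."""
    first = stages[0][1]
    prev = first
    rows = []
    for label, count in stages:
        rows.append((label, count, count / first, count / prev))
        prev = count
    return rows


# Columns: stage, count, share of all infected, share of the previous stage.
for label, count, overall, step in funnel(stages):
    print(f"{label:<32}{count:>4}{overall:>8.0%}{step:>8.0%}")
```

Read this way, the steepest proportional losses come before any report could be filed: roughly 37% of symptomatic people consulted a physician, and roughly 32% of those had a stool culture submitted. Of the 7 positive cultures, 5 reached CDC.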
Why the attrition? Other studies have observed many reasons, such as concerns for patient privacy, the difficulty of filing the report, the type of disease involved, and the clinician's lack of understanding of the public health value of reporting. A countervailing factor arises when clinicians see that reporting helps them serve their patients better.

Second, rely on volunteerism. Perhaps appealing to people's better natures will encourage them to provide data. That approach presents its own challenges, because it asks the data provider to do the heavy lifting with no pragmatic incentive. It takes extra effort, and providers lack the motivation of a legal requirement. It relies on the provider sharing the mission of the organization or agency that seeks the contribution. Moreover, voluntary efforts often succeed only in getting the most serious cases reported. Even when the data are considered valuable and important, as with ILINet (the voluntary influenza-like illness reporting network), providers still have to spend up to 30 minutes compiling the report -- a lot of time for a busy clinician. If legal requirements and broad calls for voluntary participation fail to achieve full and timely data, perhaps looking more closely at the data owners is the path forward.

Third, rely on data providers' values. Understanding a potential data provider's values seems to offer a more successful approach to gaining access to data. The value proposition can span tangible to intangible values -- all are usually in play. Tangible values draw upon the pragmatic side of organizations. Yet intangible values, such as appeals to the public good -- even patriotism -- can work too. A value proposition that centers on the data provider builds on volunteerism by offering a range of incentives from the intangible to the pragmatic. Since the motivation is never purely one or the other, in our experience some combination of the following seems to work:
  • Helping a group fulfill its mission and goals with information derived in whole or in part from their data can be a powerful incentive to participate. 
  • All organizations want to optimize their operations. For example, health care providers are under constant pressure to deliver services more efficiently. Improved and earlier situational awareness can directly benefit them in their response. 
  • Many organizations are more concerned with risk management when thinking about health issues. The food industry is a case in point. Their value proposition is partly to manage risks to their bottom line from contaminated foods. Earlier warnings, even non-specific ones, can offer them greater knowledge of the situation and guide their efforts to protect their firms. Food retailers detecting and responding to food safety issues can avoid unwanted attention from regulators, courts, governments and media. Companies have lost millions in single instances of a food contamination outbreak. At the same time, by addressing the issues of tainted foods, they have also helped protect the public’s health. A classic win-win. 
  • All organizations recognize the value of their data, and many will be receptive to the notion that participation in our effort can increase the utility and value of their data. Deriving information across an industry with data drawn from many separate companies can put an individual firm's data in context as well as reveal unexamined or unlocked value in the current data. When multiple sources are combined, individual data providers feel more comfortable sharing their data. 
  • Many organizations are open to the notion that information plus their data helps them develop new tools and capabilities. 
  • Many organizations are open to opportunities where our mission can relate to theirs. They can leverage our use of their data to create new products and services to expand their businesses. 
  • Finally, the obvious one. We can pay for access to data. With all these other incentives in the pipeline, however, this should be our last and least resort.
Our project's approach is to leverage the value that data providers perceive in their data and open them to new possibilities inherent in their data that they usually cannot unlock. Our efforts touch on three value concepts inherent in our operation.

We believe that the first order of business is to understand and serve the value proposition of the data owners on their terms. We call our initial engagement data discovery. We provide an initial data quality assessment to the data provider. This is partly our process of learning about their data (What do all the elements mean? Do we have a complete set? How are the data used in the provider's operation? How often are the data generated or collected? Etc.). We have found that delivering a data quality assessment on many millions of records within days sends a message about how much we value their data. Next, we work with the data provider's subject matter experts to understand their data, its purpose to them, how they use it, and what the values mean. During this step, we often begin applying our analytics tools to understand norms and baselines, so we can work with them to explain and understand the anomalies that surface. Finally, we can take the opportunity to conduct limited-scope investigations to address small questions or issues that the provider has. These have often related to their uses of the data and can prompt their thinking about future capabilities -- even products -- derived from their data.

Of course, our purpose is delivering earlier situational awareness of health threats. We have lots of powerful technology to speed access to and processing of the data and to rapidly produce signals without the usual data noise. Our goal is to achieve real-time response to delivered data. But the value to the data provider is being able to integrate data across similar and dissimilar domains. Because we can deliver earlier signals than other approaches, these organizations can address issues as they appear and avoid depending on the public media for situational awareness.

Finally, our third value is serving as a trusted third party and a data safe haven. Governmental agencies often have regulatory or oversight functions that they must execute. Moreover, with exceptions such as human clinical or student data, many forms of data are potentially discoverable by outside parties under the Freedom of Information Act of 1966 and its extension to electronic records in 1996. While these are right and necessary for governments in our democracy, they can chill the willingness of a company or organization to share its data with governmental agencies. Moreover, the federal and state governments assert the legal doctrine of sovereign immunity, meaning that they usually cannot be sued. As an organization outside that framework, we provide paradoxical protection. Data providers have recourse through the courts if they can prove misuse or wrongful access of data by us.

These are the value-driven concepts we have identified. We suspect that more will emerge as we widen the scope of our activities and add novel types of data to our system. Stay tuned.