Google and health data
Why the outcry?
Hi! This is a Sonder Scheme newsletter, written by me, Helen Edwards. Artificiality is about artificial intelligence in the wild; how AI is being used to make and break our world. If you haven’t signed up yet, you can do that here. If you like this newsletter, please share, especially on LinkedIn and Twitter.
I have written a whitepaper on AI ethics and governance-by-design. Download it here
This week, the WSJ published a story about Google gathering the personal health data of millions of Americans. Everyone jumped on the story, many were “shocked” and now there’s a federal inquiry into whether Google and the hospital system involved, Ascension, are fully compliant with US privacy law as it relates to health data. This is a fully loaded subject, with an article in The Guardian “I'm the Google whistleblower. The medical data of millions of Americans is at risk.”
Now if you’re a Google or an Ascension exec, or if you are an engineer or a doctor or a nurse involved in the project, the outcry may be perplexing. HIPAA, the US law stipulating how health data are shared, is rigorous. It would be insanity to think that these regulations would have been consciously circumvented.
HIPAA makes it clear that data can be shared if it’s to be used for treatment, payment or operational improvement. The partnership between Google and Ascension is formalized under an arrangement called a Business Associate Agreement, which is a well-trodden path in the health industry. The BAA makes it clear for what purposes the data can be used and Google has explicitly stated that it will not use the data for selling advertising. Google even says “patient data cannot and will not be combined with any Google consumer data.” That’s a fairly clear signal upfront that Google won’t be adding this data directly into the AI that powers advertising.
Perhaps even more important is that this was far from secret; Google’s CEO explicitly called it out in its second quarter earnings call. “Google Cloud’s AI and ML solutions are also helping healthcare organizations like Sanofi accelerate drug discovery and Ascension improve the healthcare experience and outcomes.”
So it’s clear there was no secret “hoovering up” of Americans’ health data. So why the outcry?
Because this represents a kind of “context collapse,” a term used to describe what happens when information provided in one context ends up being used in another setting in an unexpected way by an unexpected user. It is severely disruptive to privacy.
Transfer of private health data, which is personally identifiable, from a hospital to a technology company is a significant shift in context. The shift is even greater when you consider who will be using the data. Instead of doctors, nurses and hospital or insurance company administrators, the data will be viewed by software engineers with no background in health care. We intuitively know this means that someone looking at the data isn’t thinking about people as patients, they are thinking about people as data points to be optimized. This makes it feel dehumanizing, even though it’s not technically different from having a bunch of software developers at an EHR (electronic health record) company look at personal data.
In addition, health data are not subject to something that is encoded in law in Europe’s GDPR: the right to be forgotten. There is no option to remove data from the system because there is no option to be treated in the hospital system without agreeing to the terms of data sharing and release.
So it is not surprising that people on Twitter described this news as “shocking.”
Should we be worried?
The honest answer is “maybe.” Here’s why.
Google’s assurances that it will not use this data for anything other than operational system and quality of care improvement aren’t as binding or as clear-cut in the age of AI. Once the predictive models are built and tuned, they can be used in many contexts. A subset of the US population - a statistically significant one - is represented in this data. This data set can be used to create profiles of many, many types of patient - similar to how Cambridge Analytica created behavioral profiles of the US electorate off a modest sample. The richness of the health record will allow for very deep and nuanced inference building. These inferences can then be used anywhere. There is no information on whether the inferences that are built from this data set are protected as people’s personal data. In fact, even in privacy-centric Europe, the right to know the inferences that a technology company has about individuals is not clear. Basically, once personal data (input) becomes an inference (output), it’s not personal data anymore.
This means that this data could reasonably be used for models that, while not directly combined with consumer data, could well be used in Google’s broader AI. We’d never know. New AI techniques such as transfer learning mean that AI is increasingly opaque. The fact is that AI and big data now mean that our information bleeds from one domain to another in unpredictable ways over time. There are no privacy laws that account for this.
If you’re Google, wouldn’t it be oh-so-tempting to use this data to understand how to discern whether an individual’s Google search for “breast cancer” is because they actually have cancer, know someone with cancer or watched a TV show about cancer? Wouldn’t it be tempting to be able to sell ads against this nuanced contextual information based on, what is effectively, a better understanding of someone’s search intent? There would be no need to combine health data with consumer data, only to transfer learning from one model (for health) to another (for search) to the ad system.
Another reason to worry is the potential for reducing obscurity. Better predictions on a person’s health, especially short term status, could be used in discriminatory ways - everything from refusing credit to denying someone a job. It’s not their health status today, it’s what the AI says is likely that matters. According to Dean Sittig, who is a globally recognized expert in health informatics, the EHR is highly predictive of someone’s mortality in the next year. For anyone making any kind of investment decision in an individual, it would be difficult to resist the temptation to act on knowledge of, literally, a life-or-death prediction. Given how little we understand about how data propagates through the credit systems in the US, and the current state of AI governance, adding more prediction behind the scenes is a recipe for proxy discrimination and algorithmic bias. With Google now announcing that it is entering the banking industry, the opportunity for algorithmic pattern recognition to creep into credit is conceptually real.
Should we be pleased?
There are many reasons to celebrate this arrangement. Google has some of the best AI in the world. The company’s strengths in AI could be particularly effective in health care. Unlike Amazon, which is focused on the e-commerce potential of health care, Google has broader objectives. While we can’t know exactly how much of Google’s aim is purely commercial and how much could be classed as “altruistic,” the company does have a strong pedigree in R&D and shares its AI science.
As anyone who has touched the health system in the US can attest, it’s pretty depressing. Form after form to fill out, all asking the same thing, surprise charges, EXORBITANT charges, incomplete information. We all crave change. Perhaps the worst part of the US system is the time-starved doctors, distracted by the screen that often sits between them and their patient.
In his excellent book, Deep Medicine, Eric Topol paints the stark reality of healthcare in metrics between 1975 and now. In 1975, time allotted for a doctor’s office visit was 60 minutes for new patients and 30 minutes for a return visit. Now, these figures are 12 minutes (new) and 7 minutes (return). He quotes Lynda Chin, from MD Anderson: “Imagine if a doctor can get all the information she needs about a patient in 2 minutes and then spend the next 13 minutes of a 15-minute office visit talking with the patient, instead of spending 13 minutes looking for information and 2 minutes talking with the patient.”
Topol’s book is a detailed account of the importance of time with a patient. There’s an important dynamic that has happened with healthcare - the interaction of information systems and people in the diagnostic and care process. In the drive for efficiency (better use of time, fewer errors, better flow of information, transparency of cost), the electronic health record has been placed front and center. Because the EHR was designed primarily for billing, it sucks as a real-time information tool. It has had the effect of turning physicians into people who do a lot of data entry and a lot of clicking to search for information. The promise of AI is to redress this balance and, as Topol says, “give doctors back the gift of time.”
From an AI perspective, Google is perhaps uniquely qualified.
According to Sittig, one of the biggest pain points in using the EHR is the sheer amount of search friction. Say a physician wants to find a patient’s sodium result from a lab test. This requires multiple clicks - patient’s name > DOB > lab test > sodium, say. What if it was instead like a Google search? Type into one single bar “Name, Sodium.” Done. It’s intuitive, natural and just a bit 21st century. A Google-like search would also be a better basis for applying voice technologies, something that is currently difficult to do in a doctor’s office for a variety of reasons. Voice AI works best when the underlying information structure provides the AI with context. A more natural Google-enabled search sets the stage for voice as a more effective tool for computer/physician interaction.
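To make the contrast concrete, here is a minimal sketch in Python of the single-bar idea. Everything in it is an assumption for illustration: the `LabResult` record, the sample patients and the `search` function are invented, and it says nothing about how Google, Ascension or any real EHR actually works.

```python
from dataclasses import dataclass


@dataclass
class LabResult:
    patient: str
    dob: str
    test: str
    value: float
    unit: str


# Hypothetical records, invented purely for illustration.
RECORDS = [
    LabResult("Jane Doe", "1970-01-01", "sodium", 139.0, "mmol/L"),
    LabResult("Jane Doe", "1970-01-01", "potassium", 4.1, "mmol/L"),
    LabResult("John Roe", "1982-05-17", "sodium", 142.0, "mmol/L"),
]


def search(query, records=RECORDS):
    """Return lab results matching a 'Name, Test' query typed into one bar."""
    # Split on the first comma: the left part is the name, the right part the test.
    name, _, test = query.partition(",")
    name, test = name.strip().lower(), test.strip().lower()
    return [
        r
        for r in records
        # Name matches as a case-insensitive substring; if no test is given,
        # every result for the matching patient is returned.
        if name in r.patient.lower() and (not test or test == r.test.lower())
    ]


for result in search("Jane Doe, Sodium"):
    print(result.patient, result.test, result.value, result.unit)
```

One query string replaces the name > DOB > lab test > sodium click path. A real system would also need to tell apart patients who share a name, which is one reason the date of birth sits in that click path today.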
This is a big deal. As Topol says, many consider that introducing computers to the doctor’s office has been an “abject failure.” Companies such as ScribeAmerica, which provide human scribes to preserve the human interaction between doctor and patient, are a patch - 20,000 scribes were employed in 2016 and this is forecast to grow to more than 100,000 by 2020. That’s one scribe for every 7 physicians! Topol describes a workflow where a voice AI captures the conversation - no keyboard in sight - produces the text, sends it to the patient for edit and then to the physician. One big upside is this would allow for a far more nuanced and accurate EHR; 80% of office notes are cut and pasted, with mistakes propagated along the chain. There are also many important details that never get captured and can offer important diagnostic clues. This is an obvious and active area of AI research: Topol points to a digital scribe pilot between Stanford and Google which combines various AI algorithms to synthesize the doctor’s note. Again, not a secret.
A 2017 JAMA article published a word cloud describing doctors. Rude. Hurried. Rushed. Busy. Uncaring. Impatient. This isn’t what doctors want and it’s definitely not what patients want. A lot of AI researchers imply that AI will have a greater role in end-to-end medical treatment than it actually will anytime soon. Some AI researchers have a bad habit of implying, say, that curing cancer is as easy as reading more mammograms, more accurately. Many AI scientists fundamentally misunderstand the human processes of medicine and oversimplify the role of AI. So what they miss is the real opportunity - time, context, cause, care.
I hope we hear more about Google’s involvement and that the AI engineers at Google do not try to tell us they are replacing treatment professionals but that they instead focus on showing us how they apply their AI smarts in a human-centered way. They need to demonstrate that they are not just technocrats trying to “disrupt” healthcare, that they are instead trying to build intelligence that enhances the expertise and human role of medical practitioners.
This case highlights a new risk - that we lose our optimistic view of technology, primarily because we are losing trust in Big Tech. The tech platforms have struggled to respond to even the most obvious drivers of pessimistic perceptions. Executives and leaders in Google, Facebook, Apple and Amazon are stuck in a mindset of denying the reality - that trust, the perception of good-faith dealing and responsible use of AI on personal data, is the issue that will define success in new markets.
In the end, in the absence of regulation that responds to the unique position of digital platforms in our society, there’s no way of knowing whether this data will be used in unethical ways. Some of these risks are theoretical but the stakes are high. Right now, we do just have to trust them.
Other things this week:
The Apple Card saga continued to give all week. We followed it and wrote a bunch on it.
Article: What happened with the AI behind Apple Card?
Article: 15 things to know about Apple Card
Article: Apple, Goldman Sachs and sexist AI
Newsletter: Apple Card and algorithm anger
WSJ put out an article on Google’s search algorithm and how people bias it. It’s an interesting read because it goes into some depth about specific meetings and business processes where humans make the call. There really is no such thing as neutral tech.
Excellent article about AI ethics from the HBR.
Excellent article from VentureBeat on AI ethics and power. Very sound.
For anyone interested in AI, innovation and IP, the USPTO wants to hear your experience.
My talk on AI ethics and governance at the House of Beautiful Business.
I’d like to thank Dean Sittig for his time and insights for this newsletter. We had a super interesting discussion and I’m very appreciative. I’d also highly recommend Topol’s book for its balanced and humanistic view of medicine in the age of AI.
Thanks for reading.