In data science, a pattern of scandals has emerged. Volkswagen’s gaming of emissions data is the latest example.
In July, the CEO of Whole Foods Market issued a mea culpa after the supermarket was found to have manipulated product data by overstating the weight of pre-packaged produce and meats. Over the summer, controversy engulfed Ashley Madison, the social network for married people seeking other partners, after hackers managed to extract a huge amount of private data from the company’s servers. General Motors was also revealed to have hidden information about a faulty ignition switch that has been linked to more than one hundred deaths.
While top managers take the fall for these scandals, none of the dubious activities could have happened without the active participation of technical teams. Besides engineers, software developers, and product managers, the burgeoning community of data scientists is also complicit in developing the concepts, algorithms, and software that enable the deception.
This story keeps coming back because the industry treats it as a technological problem requiring a technological solution. Business managers are missing the real issue: the people who collect, store, manage, and process our data are not being held to any ethical standards. The emerging discipline of data science is expanding so fast that few practitioners are thinking about the ethical implications of their everyday actions.
From a data perspective, the Ashley Madison story is the most instructive. The scandal may seem irrelevant if you disdain the site’s shady business model, but you should be paying attention. Here are five reasons why:
- Customers presumably believed that the site’s owner had a strong incentive to keep their data private. Even so, the website failed to fend off hackers.
- Users who assumed they were anonymous because they used pseudonyms on their profiles learned that data analysts had uncovered their identities through their credit card information, and had even stored that information in the company’s databases.
- Even after users paid the website to delete their data, the data remained on its servers.
- Technologists discovered that the site’s programmers had made coding mistakes that allowed more than 10 million scrambled passwords to be decoded.
- After the hackers released the stolen data to the public, a horde of investigators immediately obtained it, intending to dig up embarrassing personal details. These analysts saw a rare opportunity to get their hands on a massive, real-world dataset of the kind that businesses typically guard tightly.
In these scandals, the cause of the problem is more human than technical. Credit card data, for example, do not show up uninvited in an enterprise data warehouse. After the data are stored, someone writes software to link a user’s pseudonym to his or her name and address. Technical staff are involved in both designing the linkage algorithms and writing the code that implements them, and the development resources are secured on the strength of a technical or business rationale.
At various companies, I have been part of these conversations. Business and technical managers debate topics such as product innovation, user experience, resource requirements, competitive strategy, and return on investment. Except in rare cases, the ethics of these decisions are never broached, usually out of inattention, lack of awareness, or insensitivity. Sometimes ethical concerns are brushed aside with the same broad stroke that many companies use to dismiss their own customers: if they don’t like what we are doing, they don’t have to use our service!
The recent scandals should prompt a serious conversation in the business community about the ethics of data. People can hold different ethical standards, but ignoring the issue altogether is no longer viable.
So what can be done? First, every technical and data team should have onboarding training that covers the ethics of using data. Exposing engineers and data scientists to the legal obligations set out in various terms and conditions is a good starting point, but ethical practice needs to go beyond legal compliance. Companies need to build a culture in which team members feel comfortable raising ethical concerns.
When I reviewed the curricula of data science and analytics degree programs last year for a course I was developing, I found only one school that required an ethics class. That needs to change, or more data scandals will emerge.
Kaiser Fung directs the MS in Applied Analytics at Columbia University and is the author of Junk Charts, a blog devoted to the critical examination of data and graphics in the mass media. His latest book is Number Sense: How to Use Big Data to Your Advantage. He holds an MBA from Harvard Business School and degrees from Princeton and Cambridge Universities, and he was an analytics leader at Vimeo, SiriusXM Radio, and American Express.