News Analysis

AI Transparency: Unpacking the New Data Provenance Standards

From dataset origins to AI applications: How the new Data Provenance Standards are setting a benchmark for responsible AI deployment.

The Gist

  • Cross-industry collaboration. Developed by experts from top organizations, new data provenance standards aim to ensure transparent and ethical AI data use.
  • Comprehensive metadata coverage. The standards encompass key aspects like source, legal rights, and data lineage, promoting operational efficiency and regulatory compliance.
  • Future impact on business and ethics. Adoption of these standards could lead to enhanced customer trust, innovation in marketing and responsible AI development.

What to do with all this artificial intelligence and data? One industry group thinks it has a good solution. The Data & Trust Alliance's Data Provenance Standards initiative, announced Nov. 30, introduces eight proposed standards to bring transparency to dataset origins for data and AI applications. The standards, developed by experts from 19 organizations, aim to help companies verify that data is trustworthy and suitable for its intended use.


And it just so happened to debut one year after the public release of ChatGPT, OpenAI's generative AI chatbot that took AI mainstream and also woke the world up to questions about the ethical use of such technologies.

Who Created These Data Provenance Standards?

The proposed standards were developed by data, AI, ethics, compliance and legal experts from Data & Trust Alliance companies including:

  • AARP
  • American Express
  • Deloitte
  • Howso
  • Humana
  • IBM
  • Kenvue
  • Mastercard
  • Nielsen
  • Nike
  • Pfizer
  • Regions Bank
  • Transcarent
  • UPS
  • Walmart
  • Warby Parker

All are members of the Data & Trust Alliance, a not-for-profit, cross-industry consortium that develops practices for the responsible use of data and AI.

“As businesses scale and accelerate the impact of AI with trusted data, it is necessary to ensure the technology is developed and deployed responsibly,” Rob Thomas, senior vice president, software and chief commercial officer, IBM and chair of the D&TA Data Provenance initiative, said in a statement. “These practical data provenance standards, co-created by senior practitioners across industry, are designed to help ensure AI workflows are not only compliant with ever-changing government regulations and free of bias, but also developed to generate increased business value. While the standards may not address every application of AI, we believe they fill an important, longstanding need.”

What Are the 8 Data Provenance Standards?

The Data Provenance Standards cover metadata on source, legal rights, privacy, generation date, data type, method, intended use, restrictions and lineage, including a unique metadata ID for tracking:

  1. Lineage: Identifiers or pointers of metadata representing the data which comprise the current dataset
  2. Source: Identifies the origin (person, organization, system, device, etc.) of the current dataset
  3. Legal rights: Identifies the legal or regulatory framework applicable to the current dataset, along with the required data attributions, associated copyright or trademark and localization and processing requirements
  4. Privacy and protection: Identifies any types of sensitive data associated with the current dataset and any privacy enhancing techniques applied
  5. Generation date: Timestamp marking the creation of the current dataset
  6. Data type: Identifies the data type contained in the current set, and provides insights into how the data is organized, its potential use cases and the challenges associated with handling and using it
  7. Generation method: Identifies how the data was produced (data mining, machine-generated, IoT sensors, etc.)
  8. Intended use and restrictions: Identifies the intended use of the data and which downstream audiences should not be allowed access to the current dataset
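As a rough illustration of how the eight metadata areas and the unique metadata ID could sit together in one record, here is a minimal Python sketch. The class name, field names and sample values are hypothetical and chosen only for this example; they are not the Alliance's published schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class ProvenanceRecord:
    """Hypothetical record with one field per proposed standard.

    Field names are assumptions for illustration, not the official schema.
    """
    source: str                         # origin: person, organization, system, device
    legal_rights: str                   # applicable legal or regulatory framework
    privacy_and_protection: str         # sensitive data types, privacy techniques applied
    generation_date: datetime           # when the dataset was created
    data_type: str                      # kind of data the dataset contains
    generation_method: str              # how the data was produced
    intended_use_and_restrictions: str  # intended use and who should not get access
    # Lineage: metadata IDs of the datasets this one was built from.
    lineage: list[str] = field(default_factory=list)
    # Unique metadata ID used for tracking (a random UUID in this sketch).
    metadata_id: str = field(default_factory=lambda: str(uuid4()))


# An original dataset has no upstream lineage.
raw = ProvenanceRecord(
    source="Example Retail Co. point-of-sale system",
    legal_rights="Internal data, subject to company privacy policy",
    privacy_and_protection="Contains customer emails; none anonymized",
    generation_date=datetime(2023, 11, 30, tzinfo=timezone.utc),
    data_type="tabular",
    generation_method="machine-generated",
    intended_use_and_restrictions="Sales analytics; not for external sharing",
)

# A derived dataset points back to its parent through the parent's metadata ID.
derived = ProvenanceRecord(
    source="Example Retail Co. analytics team",
    legal_rights="Internal data, subject to company privacy policy",
    privacy_and_protection="Customer emails removed",
    generation_date=datetime(2023, 12, 1, tzinfo=timezone.utc),
    data_type="tabular",
    generation_method="aggregation of an upstream dataset",
    intended_use_and_restrictions="Marketing reports",
    lineage=[raw.metadata_id],
)
```

Following the `lineage` IDs from `derived` back to `raw` is what would let a downstream team see where a dataset came from and which rights and restrictions it inherited.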

What's Behind the Data Provenance Standards?

Here's the crux of the proposed Data Provenance Standards:

  • What's the goal of the proposed standards? The standards are designed to improve operational efficiency, regulatory compliance and value generation, and are currently in the testing phase across various industries.
  • What's the Alliance calling on practitioners to do? The Alliance encourages practitioners to review and contribute to the development of these standards, which are expected to be released in early Q2 2024.
  • Introduction of cross-industry data provenance standards: A new set of standards has been proposed to bring transparency to data's origin, which is crucial for enhancing the trustworthiness of data and AI applications across various industries.
  • Impact on operational efficiency: The standards will allow companies to understand the source, history, and rights of their data, aiming to reduce the time and resources currently spent on data preparation and cleansing.
  • Expert collaboration: The standards are developed by a group of experts from diverse fields and top companies under the Data & Trust Alliance, emphasizing the collective expertise and interdisciplinary approach.
  • Compliance and bias concerns: The standards are expected to aid in compliance with regulations and the reduction of bias, contributing to responsible AI development and increased business value.
  • Metadata utilization: The proposed standards will make critical information about data origin and rights more accessible through metadata, thus enabling better data management and decision-making.
  • Facilitating data exchange and collaboration: By adopting these standards, businesses can improve collaboration with data partners and make more informed decisions about data usage.
  • Testing and input solicitation: The standards are currently being tested in various use cases, and the Alliance is seeking input from practitioners to refine and enhance them.
  • Long-term vision: The Alliance aims to release the first version of these standards in 2024, suggesting a forward-thinking approach to data governance.

How the Data Provenance Standards Impact Marketing and Customer Experience

Marketing leaders such as CMOs, along with customer experience leaders, should care about these developments because of several potential outcomes:

  • Enhanced customer trust: Transparency in data provenance can significantly increase customer trust in AI-driven products and services.
  • Regulatory compliance: As regulations around data and AI evolve, adherence to these standards could ensure companies remain compliant, avoiding potential legal and financial repercussions.
  • Improved decision-making: Accurate and transparent data will lead to better business decisions, which is essential for customer experience leaders who rely on data to understand and improve customer journeys.
  • Competitive edge: Companies that embrace these standards could gain a competitive advantage by demonstrating their commitment to ethical AI and responsible data use.
  • Operational efficiency: Clear standards will streamline data handling processes, reducing redundancy and inefficiency, which is beneficial for marketing operations.
  • Innovation and collaboration: These standards could foster a culture of innovation and collaboration within and across industries, leading to better customer insights and product offerings.

"Understanding where data comes from is critical for marketers and customer experience professionals in the rapidly evolving digital landscape that brings risks and opportunities," Kristina Podnar, senior policy director for the Data & Trust Alliance, told CMSWire. "These standards will help ensure the data driving campaigns and content is accurate, reliable and legally sourced. By prioritizing data provenance, marketers will enhance the credibility and effectiveness of their strategies and foster greater trust with audiences."

Podnar added that it's essential for professionals who work with data to integrate these standards into their workflows. That includes, she said, requiring third-party data providers to supply metadata aligned with the standards as a view into the data's provenance, and continually educating themselves about the origins and implications of the data they use.

"Doing so," Podnar said, "will help safeguard the integrity of their campaigns, protect the brand, and address the growing consumer demand for transparency and ethical data practices."

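One way a team might act on the advice to require provenance metadata from third-party providers is a simple completeness check before accepting a dataset. The sketch below is only an assumption about how that could look: the field names, the function `missing_provenance_fields` and the sample vendor record are hypothetical, not part of the proposed standards.

```python
# Hypothetical field names, one per proposed standard.
REQUIRED_FIELDS = (
    "lineage",
    "source",
    "legal_rights",
    "privacy_and_protection",
    "generation_date",
    "data_type",
    "generation_method",
    "intended_use_and_restrictions",
)


def missing_provenance_fields(metadata: dict) -> list[str]:
    """Return the required fields that are absent or blank in vendor metadata.

    An empty lineage list counts as present, since an original dataset
    may have no upstream datasets.
    """
    missing = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


# Sample metadata from a made-up provider, with gaps left in on purpose.
vendor_metadata = {
    "lineage": [],
    "source": "Example Data Broker Inc.",
    "legal_rights": "Licensed for marketing use in the US",
    "generation_date": "2023-11-30T00:00:00Z",
    "data_type": "tabular",
    "generation_method": "   ",
}

gaps = missing_provenance_fields(vendor_metadata)
if gaps:
    print("Ask the provider for:", ", ".join(gaps))
```

A check like this only confirms that the fields are filled in. Whether the values are accurate still has to be verified with the provider.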
About the Author

Dom Nicastro

Dom Nicastro is managing editor of CMSWire and an award-winning journalist with a passion for technology, customer experience and marketing. With more than 20 years of experience, he has written for various publications, like the Gloucester Daily Times and Boston Magazine. He has a proven track record of delivering high-quality, informative, and engaging content to his readers. Dom works tirelessly to stay up-to-date with the latest trends in the industry to provide readers with accurate, trustworthy information to help them make informed decisions.

Main image: Ungrim