About

Data-centric AI vs Model-centric AI

What is Data-centric AI (DCAI)?

  • DCAI is an emerging field that studies how to engineer data, improving both its quality and its quantity, in order to build better AI systems.
  • DCAI shifts our focus from model to data.
  • "Data-centric" differs fundamentally from "data-driven": the latter only emphasizes using data to guide AI development, which typically still centers on developing models rather than engineering data.
Data in GPT Models

Why DCAI?

  • Many major AI breakthroughs occurred only after the right training data became available.
  • Large, high-quality training data are the driving force behind the recent successes of GPT models, whose architectures have remained largely similar apart from a growing number of model weights.
  • Once a model is sufficiently powerful, we can keep it fixed and engineer only the prompts (inference data) to accomplish our objectives.
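
The idea of holding the model fixed and engineering only the data can be illustrated with a toy sketch. Everything below is an assumption made for illustration and does not come from the tutorial: the 1-D dataset is invented, the "model" is a plain 1-nearest-neighbour rule, and `drop_suspect_labels` is a hypothetical helper that removes training points whose label disagrees with the majority of their neighbours. Real data-centric pipelines are far more involved.

```python
from collections import Counter

# Toy 1-D training set of (feature, label) pairs. Two labels are deliberately
# wrong: 2.0 should be class 0 and 9.0 should be class 1. All values are
# invented for illustration.
noisy_train = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0),
               (7.0, 1), (8.0, 1), (9.0, 0), (10.0, 1)]
test = [(0.4, 0), (1.9, 0), (2.8, 0), (7.2, 1), (9.1, 1), (9.8, 1)]


def predict_1nn(train, x):
    """The fixed model: return the label of the closest training point."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(train, test_set):
    """Fraction of test points the fixed model labels correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test_set)
    return hits / len(test_set)


def drop_suspect_labels(train, k=3):
    """Data engineering step (hypothetical helper): drop points whose label
    disagrees with the majority label of their k nearest neighbours."""
    kept = []
    for i, (x, y) in enumerate(train):
        others = [p for j, p in enumerate(train) if j != i]
        nearest = sorted(others, key=lambda p: abs(p[0] - x))[:k]
        majority = Counter(label for _, label in nearest).most_common(1)[0][0]
        if majority == y:
            kept.append((x, y))
    return kept


# The model code never changes; only the training data does.
cleaned_train = drop_suspect_labels(noisy_train)
noisy_acc = accuracy(noisy_train, test)
clean_acc = accuracy(cleaned_train, test)
print(f"noisy data:   {noisy_acc:.2f}")   # 0.67
print(f"cleaned data: {clean_acc:.2f}")   # 1.00
```

In this sketch the prediction rule is identical in both runs, so the accuracy gain comes entirely from the data-cleaning step.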

Our Talk at KDD 2023

Presenters

Daochen Zha

Machine Learning Engineer
Rice University, Airbnb

Henry Lai

Machine Learning Engineer
Apple

Fan Yang

Assistant Professor
Wake Forest University

Sirui Ding

PhD Student
Texas A&M University


Na Zou

Assistant Professor
Texas A&M University

Huiji Gao

Sr. Manager
Airbnb

Xia Hu

Associate Professor
Rice University

Resources

Surveys & General Resources

Blogs

Training Data Development

Inference Data Development

Data-centric AI in Graphs

Data-centric AI in Finance