Data-centric AI vs Model-centric AI

What is Data-centric AI (DCAI)?

  • DCAI is an emerging field that focuses on engineering data to improve AI systems with enhanced data quality and quantity.
  • DCAI shifts our focus from model to data.
  • It is important to note that "data-centric" differs fundamentally from "data-driven", as the latter only emphasizes the use of data to guide AI development, which typically still centers on developing models rather than engineering data.
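
The contrast between holding the model fixed and engineering the data can be made concrete with a toy sketch. Everything below is an illustrative assumption rather than a method from this page: the 1-D dataset, the nearest-neighbour "model", and the helper names `predict_1nn`, `accuracy`, and `drop_suspect_labels` are all hypothetical. The model code never changes; only the training data is edited, by dropping points whose label disagrees with the majority of their nearest neighbours.

```python
from collections import Counter


def predict_1nn(train, x):
    """The fixed 'model': return the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test points the fixed model labels correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)


def drop_suspect_labels(train, k=3):
    """Data-centric step (hypothetical helper): keep a point only if its
    label matches the majority label of its k nearest other points."""
    kept = []
    for i, (x, y) in enumerate(train):
        others = [p for j, p in enumerate(train) if j != i]
        neighbours = sorted(others, key=lambda p: abs(p[0] - x))[:k]
        majority = Counter(label for _, label in neighbours).most_common(1)[0][0]
        if majority == y:
            kept.append((x, y))
    return kept


# Made-up data: class 0 lives near 0..4, class 1 near 10..14.
# The points at x=2.0 and x=12.0 carry deliberately wrong labels.
noisy_train = [
    (0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0), (4.0, 0),
    (10.0, 1), (11.0, 1), (12.0, 0), (13.0, 1), (14.0, 1),
]
test_set = [(1.8, 0), (2.3, 0), (11.8, 1), (12.4, 1), (0.4, 0), (13.6, 1)]

cleaned_train = drop_suspect_labels(noisy_train)

noisy_acc = accuracy(noisy_train, test_set)
clean_acc = accuracy(cleaned_train, test_set)
print(f"same model, noisy data:   {noisy_acc:.2f}")
print(f"same model, cleaned data: {clean_acc:.2f}")
```

On this made-up data the unchanged model scores higher after the two mislabeled points are removed, which is the data-centric move in miniature. Real label-cleaning pipelines are considerably more careful than this neighbour vote.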

Data in GPT Models

  • Many major AI breakthroughs occur only after we have access to the right training data.
  • Large, high-quality training data are the driving force behind the recent successes of GPT models, while the model architectures remain similar apart from a larger number of weights.
  • When the model becomes sufficiently powerful, we only need to engineer prompts (inference data) to accomplish our objectives, with the model being fixed.
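
As a minimal sketch of what engineering inference data can look like, the snippet below builds two prompts for the same query: one bare, one with a couple of worked examples prepended. The helper `build_prompt`, the "Review:/Sentiment:" template, and the sample reviews are hypothetical and chosen only for illustration; no model is called, since the point is that only the input text changes while the model stays fixed.

```python
def build_prompt(instruction, examples, query):
    """Assemble a prompt; the examples are the engineered inference data."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The final entry is left open for the (fixed) model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


instruction = "Classify the sentiment of each review as positive or negative."
query = "The battery died after a week."

# Same model, same query: only the surrounding prompt differs.
zero_shot = build_prompt(instruction, [], query)
few_shot = build_prompt(
    instruction,
    [
        ("Great sound quality.", "positive"),
        ("Stopped working within a month.", "negative"),
    ],
    query,
)

print(few_shot)
```

Either string could be sent to the same fixed model; which one works better is an empirical question that this sketch does not answer.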

Our Talk at KDD 2023


Daochen Zha

Machine Learning Engineer
Rice University, Airbnb

Henry Lai

Machine Learning Engineer

Fan Yang

Assistant Professor
Wake Forest University

Sirui Ding

PhD Student
Texas A&M University

Na Zou

Assistant Professor
Texas A&M University

Huiji Gao

Sr. Manager

Xia Hu

Associate Professor
Rice University


Surveys & General Resources


Training Data Development

Inference Data Development

Data-centric AI in Graphs

Data-centric AI in Finance