Data-centric AI (DCAI) is an emerging field that focuses on engineering data, improving both its quality and its quantity, to build better AI systems.
DCAI shifts our focus from the model to the data.
It is important to note that "data-centric" differs fundamentally from "data-driven", as the latter only emphasizes the use of data to guide AI development, which typically still centers on developing models rather than engineering data.
Many major AI breakthroughs occur only after we have access to the right training data.
Large, high-quality training datasets are the driving force behind the recent successes of GPT models, whose architectures have remained similar apart from a larger number of model weights.
Once a model is sufficiently powerful, we only need to engineer prompts (inference data) to accomplish our objectives, while the model itself stays fixed.
Our Talk at KDD 2023