Autonomous Data Pipeline Orchestration

Yadwinder Singh Sandhu; Nirdesh Pachoriya; Aditya Rautaray; Bhaskar Reddy Kollu; Raghava Chellu; Sitaram Satapathy

doi:10.70917/ijcisim-2026-2522

Authors

Yadwinder Singh Sandhu, Independent Researcher.
Nirdesh Pachoriya, Fidelity Investments, USA.
Aditya Rautaray, Cloud Solution Architect, AIOps & MLOps Architect, Cybersecurity Specialist, CVS Health, Corporate Headquarters, One CVS Drive, Woonsocket, Rhode Island 02895, United States.
Bhaskar Reddy Kollu, Enterprise Architect & Researcher, Dallas, Texas 75035, USA.
Raghava Chellu, Support Engineer – Specialist, Equifax Inc., USA.
Sitaram Satapathy, GIFT Autonomous, India.

Keywords:

Autonomous orchestration, data pipeline, agentic data engineering, ETL, self-healing, data governance

Abstract

Data engineering is shifting from simple ETL scripting to a supervisory model in which human engineers set strategic objectives, quality requirements, and compliance limits, while automated agents create, test, deploy, and repair data pipelines. This research designs and evaluates an Autonomous Data Pipeline Orchestration framework that transforms high-level pipeline requirements into deployable workflows, embeds governance checks in the pipeline generation process, and performs self-healing actions at runtime when failures occur. The proposed agentic framework was compared with static pipeline templates in a controlled benchmark of 48 pipeline runs across order, transaction, sensor, and clickstream workloads. Two main outcomes were analyzed: deployment-ready pipeline generation and automatic data-quality recovery during deployment. The agentic framework reached a deployment-ready rate of 87.5%, compared with 25.0% for static templates. It also added an average of 11.5 validation controls per run versus 2.0 in the baseline, and unresolved defects dropped from 5,314.9 to 416.7 per 10,000 records. These results suggest that autonomous orchestration can reduce manual debugging, strengthen governance by design, and improve reliability in modern data engineering.
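
To put the reported figures side by side, the relative differences can be derived directly from the numbers in the abstract. The following is a minimal sketch, not part of the authors' framework: the dictionary keys and the `ratio` helper are hypothetical names chosen for illustration, and only the six numeric values come from the abstract.

```python
# Figures quoted in the abstract; the key names are illustrative assumptions.
baseline = {"deploy_rate_pct": 25.0, "controls_per_run": 2.0, "defects_per_10k": 5314.9}
agentic = {"deploy_rate_pct": 87.5, "controls_per_run": 11.5, "defects_per_10k": 416.7}


def ratio(metric: str) -> float:
    """Return the agentic value divided by the baseline value for one metric."""
    return agentic[metric] / baseline[metric]


# Multiplicative gains for the two metrics where higher is better.
deploy_ratio = ratio("deploy_rate_pct")
controls_ratio = ratio("controls_per_run")

# Fractional reduction for the metric where lower is better.
defect_reduction = 1.0 - ratio("defects_per_10k")

print(f"Deployment-ready rate: {deploy_ratio:.2f}x the baseline")
print(f"Validation controls per run: {controls_ratio:.2f}x the baseline")
print(f"Unresolved defects reduced by {defect_reduction:.1%}")
```

On these figures, the deployment-ready rate is 3.5 times the baseline, validation controls per run are 5.75 times the baseline, and unresolved defects fall by roughly 92%.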
