Autonomous Data Pipeline Orchestration
DOI: https://doi.org/10.70917/ijcisim-2026-2522
Keywords: Autonomous orchestration, data pipeline, agentic data engineering, ETL, self-healing, data governance
Abstract
Data engineering is shifting from simple ETL scripting toward a supervisory model. In this model, human engineers set strategic objectives, quality requirements, and compliance limits, while automated agents create, test, deploy, and repair data pipelines. This research designs and evaluates an Autonomous Data Pipeline Orchestration framework with three roles. It transforms high-level pipeline requirements into deployable workflows, embeds governance checks in the pipeline generation process, and performs self-healing actions at runtime when failures occur. The proposed agentic framework was compared with static pipeline templates in a controlled benchmark of 48 pipeline runs across order, transaction, sensor, and clickstream workloads. Two primary outcomes were analyzed: generation of deployment-ready pipelines, and automatic data-quality recovery during deployment. The agentic framework produced deployment-ready pipelines in 87.5% of runs, compared with 25.0% for static templates. It also added an average of 11.5 validation controls per run, versus 2.0 in the baseline. It reduced unresolved defects from 5,314.9 to 416.7 per 10,000 records. These results suggest that, in modern data engineering, autonomous orchestration can reduce manual debugging, strengthen governance by design, and improve reliability.
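The metrics reported in the abstract can be made concrete by showing one way to compute them from per-run benchmark records. This is a minimal sketch under stated assumptions. The `RunResult` schema and the functions `deployment_rate`, `mean_controls`, `unresolved_per_10k`, and `relative_reduction` are hypothetical and are not taken from the framework. Pooling defects across runs before normalizing to 10,000 records is also an assumed aggregation, and the study's actual computation may differ. The sample runs are synthetic, not benchmark data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Outcome of one benchmark pipeline run (hypothetical schema)."""
    deployment_ready: bool    # pipeline passed checks and could be deployed
    validation_controls: int  # validation/governance controls embedded in the pipeline
    records: int              # records processed in the run
    unresolved_defects: int   # defects remaining after any automatic recovery


def deployment_rate(runs: list[RunResult]) -> float:
    """Percentage of runs that produced a deployment-ready pipeline."""
    return 100.0 * sum(r.deployment_ready for r in runs) / len(runs)


def mean_controls(runs: list[RunResult]) -> float:
    """Average number of validation controls added per run."""
    return sum(r.validation_controls for r in runs) / len(runs)


def unresolved_per_10k(runs: list[RunResult]) -> float:
    """Unresolved defects per 10,000 records, pooled across runs (assumed aggregation)."""
    total_records = sum(r.records for r in runs)
    total_defects = sum(r.unresolved_defects for r in runs)
    return 10_000 * total_defects / total_records


def relative_reduction(baseline: float, treated: float) -> float:
    """Percentage decrease from a baseline value to a treated value."""
    return 100.0 * (baseline - treated) / baseline


if __name__ == "__main__":
    # Synthetic runs for illustration only: 7 deployment-ready pipelines, 1 failed one.
    runs = [RunResult(True, 12, 1_000, 40) for _ in range(7)]
    runs.append(RunResult(False, 8, 1_000, 120))

    print(deployment_rate(runs))     # 87.5
    print(mean_controls(runs))       # 11.5
    print(unresolved_per_10k(runs))  # 500.0
    # Reported figures: 5,314.9 -> 416.7 unresolved defects per 10,000 records.
    print(round(relative_reduction(5314.9, 416.7), 1))  # 92.2
```

Applied to the reported figures, the reduction from 5,314.9 to 416.7 unresolved defects per 10,000 records corresponds to a drop of roughly 92.2%. Normalizing per 10,000 records lets runs with different data volumes be compared on one scale. Averaging per-run rates instead of pooling would weight small and large runs equally, and could give a different figure.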