Before I share my perspective, here's some context. I work on the Data Team of a major streaming app. Our team's work is Airflow-heavy: we run 1000+ complex Airflow jobs for our ETL and ML pipelines.

Very recently I started working with a new team that was moving from legacy, self-managed, monolithic Spark 2.1 processes to AWS EMR and a serverless architecture (Lambda, etc.) to improve latency and get rid of the monolithic Spark ETL process (if you are wondering what the monolithic structure looks like, see my repo here). The team was considering AWS Managed Airflow as the ETL orchestration tool. But as we evaluated the suitable options, we brought AWS Step Functions state machines into the mix to see what they would offer over AWS Managed Airflow.

Here is my perspective on both of these great tools. I will not talk about the technical differences between them. Instead, I will focus on my experience using the two products and where each one fits.

It's also worth mentioning that this article focuses on the ETL and ML pipeline capabilities of Step Functions. Step Functions is far more powerful for many web application and business workflow use cases, but we are not covering those capabilities here because Airflow doesn't compete in those areas.

Where They Best Fit In

Both services are best used for ETL and ML pipelines. Airflow comes with a bit of a learning curve (Python, Airflow operator syntax) when it comes to building your pipeline.