Modernizing File Transfer with Amazon MWAA

May 22, 2026
8 min.
data engineering
Author
Alexander Klimovich, Cloud Architect, T1A

The Problem

The data landscape of one of our clients was built around a legacy on-premises Managed File Transfer (MFT) platform that handled data movement between the AWS cloud, on-prem, and external systems. It worked, but it came with a growing list of friction points:

  • High licensing costs
  • Operationally heavy: requires a dedicated team with specialized knowledge to support and manage
  • Not cloud-native: increasingly a mismatch with the client's architectural direction of adopting cloud-native technologies, such as AWS services

The client asked us for ideas on a modern alternative to the MFT platform, and we got to work.

Exploring Solutions

We didn't jump straight to a solution; we started by evaluating the options.

After carefully weighing them, we decided on Airflow, based on a combination of cost, developer experience, and managed operations.

Airflow? Isn't That Old News?

Apache Airflow has been around since 2014, born at Airbnb to tame increasingly complex data workflows. It's a mature, battle-hardened, open-source orchestration platform with over a decade of production use across the industry.

So why does it feel like a fresh choice? Because Amazon MWAA has changed the game.

For years, Airflow's main friction was operational: you had to host it, scale it, patch it, and maintain it. That overhead was a real barrier for teams without dedicated platform engineering resources. MWAA removes that barrier by providing a managed, cloud-native version of Airflow.

What Is Apache Airflow?

The core concept is the DAG (Directed Acyclic Graph) - a Python-defined workflow that specifies:

  • What to run (tasks: extract, transform, load, transfer...)
  • When to run it (cron, event-based, asset-dependency)
  • In what order (task dependencies)
  • How to handle failures and retries

On MWAA, DAG files live in S3; Airflow parses them on startup and re-parses them periodically to pick up changes. The entire pipeline logic is code - versionable, reviewable, testable. No proprietary GUI scripting. No vendor lock-in on the logic layer.
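To make the "in what order" and "failures and retries" ideas concrete, here is a minimal plain-Python sketch of the concept. It deliberately does not use Airflow's API: it relies only on the standard library's `graphlib`, and the task names, the `run_with_retries` helper, and the `run_pipeline` helper are hypothetical names invented for illustration.

```python
from graphlib import TopologicalSorter


def run_with_retries(action, retries=2):
    """Call action(); on failure, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return action()
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: surface the failure


def run_pipeline(tasks, dependencies, retries=2):
    """Run every task after its upstream tasks; return the execution order."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        run_with_retries(tasks[name], retries=retries)
    return order


# Hypothetical three-step pipeline: each key maps to the tasks it depends on.
dependencies = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}

calls = []
attempts = {"transform": 0}


def flaky_transform():
    # Fails on the first attempt, then succeeds, to exercise the retry path.
    attempts["transform"] += 1
    if attempts["transform"] == 1:
        raise RuntimeError("transient failure")
    calls.append("transform")


tasks = {
    "extract": lambda: calls.append("extract"),
    "transform": flaky_transform,
    "load": lambda: calls.append("load"),
}

order = run_pipeline(tasks, dependencies)
```

Airflow adds scheduling, distributed execution, and state tracking on top of this idea, but the mental model is the same: declare the dependencies, and let the engine work out the order and the retries.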

MWAA: Managed & Scalable

MWAA is what takes a great-but-operationally-heavy tool and makes it viable for modern cloud teams. Under the hood, the architecture is fully AWS-managed. The core components are:

  • Web Server: The Airflow UI, load-balanced and managed by AWS, accessible via a private or public endpoint
  • Scheduler: Continuously parses DAGs from S3, triggers task execution based on schedules and dependencies
  • Workers: Auto-scaling compute (via AWS Fargate) that actually executes your tasks
  • Metadata Database: A managed Aurora PostgreSQL instance storing DAG runs, task states, logs metadata, and connections
  • S3 Bucket: Your DAGs, plugins, and requirements live here; MWAA watches it and syncs automatically
  • CloudWatch: Logs (scheduler, workers, web server) are streamed here once logging is enabled for the environment
  • Secrets Manager / SSM: Native integration for managing connections and variables securely, without storing credentials in the UI

As a result, MWAA provides enterprise-grade infrastructure while shifting most of the operational burden to AWS.
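As an illustration of the Secrets Manager integration, here is a minimal sketch, assuming the secrets backend that ships with the Amazon provider package is enabled through Airflow configuration options. The prefixes and the `partner_sftp` connection ID are illustrative assumptions, not values from this project, and `secret_name` is a hypothetical helper that only mirrors how the lookup path is composed (prefix, separator, ID).

```python
import json

# Airflow configuration options as they might be set on an MWAA environment.
# The prefix values are an assumed naming convention; any prefix works as
# long as the secrets are stored under matching names.
airflow_config = {
    "secrets.backend": (
        "airflow.providers.amazon.aws.secrets.secrets_manager.SecretsManagerBackend"
    ),
    "secrets.backend_kwargs": json.dumps(
        {
            "connections_prefix": "airflow/connections",
            "variables_prefix": "airflow/variables",
        }
    ),
}


def secret_name(prefix, key, sep="/"):
    """Hypothetical helper: compose the secret name looked up for a given ID."""
    return f"{prefix}{sep}{key}"


backend_kwargs = json.loads(airflow_config["secrets.backend_kwargs"])

# Name of the secret that would hold the (hypothetical) SFTP connection.
sftp_secret = secret_name(backend_kwargs["connections_prefix"], "partner_sftp")
```

With a setup along these lines, a DAG refers to the connection only by its ID, and the credentials themselves stay in Secrets Manager rather than in the Airflow UI.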

Connectors: The Batteries Are Already Included

One of Airflow's most underappreciated strengths is its Provider ecosystem. Providers are installable packages that ship with pre-built Operators, Hooks, and Sensors - meaning the integration code you'd normally have to write from scratch is already done.

Out of the box (or with a single line in requirements.txt), you get connectors for:

  • AWS: S3, Redshift, Glue, EMR, Lambda, SNS, SQS, Athena, and more
  • Databases: PostgreSQL, MySQL, MSSQL, Snowflake, BigQuery
  • File Transfer: SFTP, FTP, and filesystem operations
  • HTTP / REST APIs: Generic HTTP operator for any web service
  • Notifications: Slack, email, PagerDuty

For example, a typical file transfer task in Airflow looks something like this:

No custom SSH handling, no manual boto3 session management - the heavy lifting is pre-implemented. You configure the connection credentials once in the Airflow UI (or Secrets Manager) and reuse that connection across every DAG.

This was a decisive factor for us: we weren't building integration code, we were assembling proven building blocks.

The Airflow UI

One underrated aspect of Airflow is the UI. For a file-transfer replacement use case, visibility matters. With Airflow's UI, our team can:

  • Monitor DAG schedules and execution history
  • Drill into task-level logs
  • Trigger manual runs or re-run failed tasks with a single click
  • Manage connections and variables centrally

A feature-rich UI wasn't just a win for engineers; it was also a win for operators and support teams who needed visibility without touching code.

The Delivery Gap

Here's the honest part that most tech articles skip: the build is the easy part now. The actual effort looked like this:

Resource Provisioning → Tech Build → Operationalization

"Operationalization" meant: demos, architecture approvals, security reviews, production support sign-offs, responsibility matrices, data privacy impact assessments and many other hurdles.

The engineering effort was a fraction of the total delivery timeline. If you're planning a similar migration, budget time for the organizational effort - it's where projects actually slow down.

Key Takeaways

  1. Don't overlook proven technology: maturity is a feature, not a flaw.
  2. MWAA is the missing piece: it removes the self-hosting burden that made teams hesitate to adopt Airflow.
  3. Python-based DAGs give you flexibility that no GUI-based MFT tool can match.
  4. The non-technical delivery overhead is real: plan for it from day one.