Amadeus IT Group - Data Engineering Research for Interview Prep
Company Overview
Amadeus IT Group is a global technology company providing solutions for the travel industry (airlines, hotels, travel agencies, airports). It is headquartered in Madrid, Spain, with a major R&D center in Bangalore (Amadeus Labs, operating since 2008, 4,000+ professionals) and a primary data center in Erding, Germany.
1. Technologies & Data Engineering Stack
Core Data Platform
- Databricks Data Intelligence Platform on Azure - standardized data management platform
- Delta Lake - storage format for the Amadeus Data Mesh; teams build star schemas from raw business unit data
- Delta Sharing - standard data sharing format across the organization
- Unity Catalog - governance layer; broke down data silos, enabled cross-domain data combination
Programming & Processing
- Spark (Scala) - primary big data processing framework
- C++ - shopping engine (core transaction processing)
- Java - cloud-native applications
- Python - data science and analytics workloads
Data Infrastructure
- Apache Kafka - distributed event streaming
- Apache Airflow - workflow orchestration
- Hadoop HDFS - distributed storage (legacy/transitioning)
- Elasticsearch - distributed search engine
- Couchbase - NoSQL database
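To make "workflow orchestration" concrete for a design discussion: the core job of an orchestrator is to run tasks in dependency order. The sketch below uses only Python's standard library, not the Airflow API, and the task names are hypothetical, not taken from any Amadeus pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "ingest_bookings": set(),
    "ingest_flights": set(),
    "clean_bookings": {"ingest_bookings"},
    "build_star_schema": {"clean_bookings", "ingest_flights"},
    "publish_share": {"build_star_schema"},
}


def execution_order(dag):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


order = execution_order(pipeline)
print(order)
```

Several orders can be valid when tasks are independent (the two ingest tasks here), which is also where an orchestrator can run work in parallel.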
Monitoring & Observability
- Prometheus - monitoring
- Grafana - visualization and alerting
- Splunk - log management
Databricks-Specific Optimizations
- Predictive I/O
- Photon engine
- Deletion Vectors
- Partition Pruning
- Dynamic File Pruning
- Focus on reducing read-and-write amplification in Delta Lake pipelines
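A toy model of why Deletion Vectors reduce write amplification may help when explaining the feature. This is a simplified illustration in plain Python, not how Delta Lake or Databricks implements it: the DataFile class, the function names, and the sample rows are all hypothetical. The idea shown is that a copy-on-write delete rewrites every surviving row of a file, while a deletion vector only records which row positions are gone and leaves the file untouched.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Toy stand-in for one immutable data file plus its deletion vector."""
    rows: list
    deleted: set = field(default_factory=set)  # row positions marked as deleted

    def live_rows(self):
        # Readers skip positions recorded in the deletion vector.
        return [r for i, r in enumerate(self.rows) if i not in self.deleted]


def delete_copy_on_write(f, predicate):
    """Rewrite the file without matching rows; returns (new_file, rows_written)."""
    kept = [r for r in f.live_rows() if not predicate(r)]
    return DataFile(kept), len(kept)


def delete_with_vector(f, predicate):
    """Mark matching positions only; returns (same_file, positions_written)."""
    hits = {i for i, r in enumerate(f.rows) if i not in f.deleted and predicate(r)}
    f.deleted |= hits
    return f, len(hits)


# Hypothetical data: 1,000 bookings, of which every 100th is cancelled.
bookings = [{"id": i, "cancelled": i % 100 == 0} for i in range(1000)]


def is_cancelled(row):
    return row["cancelled"]


_, cow_written = delete_copy_on_write(DataFile(list(bookings)), is_cancelled)
dv_file, dv_written = delete_with_vector(DataFile(list(bookings)), is_cancelled)

print("copy-on-write rows rewritten:", cow_written)
print("deletion-vector entries written:", dv_written)
print("rows visible to readers:", len(dv_file.live_rows()))
```

The trade-off is that reads must filter out marked rows until a later compaction physically removes them.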
2. Cloud Strategy (Azure Focus)
Primary Cloud: Microsoft Azure
- 95% of production workloads now run in Azure
- 40% of total applications activated in public cloud (an application count as of the source's reporting date, so not directly comparable to the 95% workload figure)
- Described as "one of the biggest migrations in the Cloud done on Azure"
- Multiple European Azure regions: Ireland and Netherlands (primary), France, Sweden, Germany (satellite)
Azure Cobalt 100 VMs
- 90% of shopping engine runs on Cobalt 100 VMs (custom ARM-based CPUs by Microsoft)
- Migrated from VMware virtualization + containerized workloads in Erding data center
- C++ apps recompiled for ARM with minimal code changes
- Spark/Databricks jobs ran without code modifications on ARM
Performance Gains from Migration
- 8% reduction in response times
- 20% increase in throughput
Architecture Evolution
- Moved from traditional SOA to cloud-native, event-driven architecture
- Thousands of microservices
- Data Mesh architecture for decentralized data management
- Zero-trust security architecture - each application as independent security perimeter
- Separate encryption key management outside Microsoft's platform
- API-first platform strategy
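The event-driven pattern above can be sketched in a few lines. This is an in-process illustration only: the EventBus class, the topic name, and the event fields are hypothetical, and a real deployment would use a broker such as Kafka (listed in the stack above) rather than direct callbacks. The point is that the publisher does not know which services consume the event.

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for a broker: topics map to subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher only knows the topic, never the consumers.
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
seats_held = []   # state owned by a hypothetical inventory service
emails_sent = []  # state owned by a hypothetical notification service

bus.subscribe("booking.created", lambda e: seats_held.append(e["flight"]))
bus.subscribe("booking.created", lambda e: emails_sent.append(e["customer"]))

bus.publish("booking.created", {"flight": "XX123", "customer": "c-1"})
print(seats_held, emails_sent)
```

Adding a third consumer requires no change to the publishing code, which is the decoupling argument usually made for moving from SOA to event-driven services.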
Secondary Partnership: Google Cloud
- Partnership with Google for enriching cloud operations
- Generative AI innovation initiatives
3. Scale of Data
- 2 billion daily transactions at peak times
- 100,000 transactions per second (comparable to Google Search volume)
- Requests generated per booking increased from ~100 (15 years ago) to thousands today
- Primary traffic is shopping/search (high look-to-book ratio)
- Customers demand several years of historical data with high refresh rates
- Multiple acquisitions led to diverse systems and potential data silos (motivation for Databricks standardization)
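A quick sanity check on the two headline numbers, useful if an interviewer asks about capacity planning. The arithmetic below uses only the figures quoted above. Reading the 100,000 TPS figure as a burst peak rather than a sustained rate is an assumption, not something the sources state.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_transactions = 2_000_000_000  # "2 billion daily transactions" from the notes
quoted_tps = 100_000                # "100,000 transactions per second" from the notes

# Average rate if the daily volume were spread evenly over 24 hours.
average_tps = daily_transactions / SECONDS_PER_DAY

# How far the quoted per-second figure sits above that daily average.
peak_to_average = quoted_tps / average_tps

# Daily volume that a sustained 100K TPS would imply.
implied_daily_at_quoted_rate = quoted_tps * SECONDS_PER_DAY

print(f"average TPS over a day: {average_tps:,.0f}")
print(f"quoted rate / daily average: {peak_to_average:.1f}x")
print(f"daily volume at a sustained quoted rate: {implied_daily_at_quoted_rate:,}")
```

The daily average works out to roughly 23,000 TPS, so the two figures are consistent only if 100,000 TPS describes peak bursts about four times the average.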
4. Interview Process & Common Questions
Process Structure (3 Rounds)
- Recruiter Screen - Company and position overview
- Technical Round - Technical assessment with team members; architecture/system design for senior roles (45-60 min)
- Hiring Manager Round - Situational questions about requirements gathering and handling
Technical Assessment Areas
- SQL (heavily tested): JOINs, GROUP BY, aggregate functions, window functions, complex queries
- Data Structures & Algorithms: String manipulation, sorting algorithms (Python-focused)
- Machine Learning: Classification, ensemble methods, feature selection
- System Design: Architecture discussions and trade-offs (senior roles)
- Probability & Statistics: Hypothesis testing, distributions
Sample SQL Questions
- Calculate average booking price per month per airline using window functions
- Filter customers meeting multiple criteria (date, airline, hotel, spend comparisons)
- Explain foreign keys and referential integrity
- Calculate average departure delay per airline from scheduled vs. actual times
- Explain database indexes (primary, unique, composite, clustered)
- Count customers by email provider using LIKE
- Compare cross joins vs. natural joins
- Calculate daily cancellation rates for airlines in a given month
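Two of the questions above (average departure delay and daily cancellation rate) can be practiced with a small in-memory database. The sketch below uses SQLite through Python's standard library. The flights table, its columns, and the sample rows are hypothetical, and date functions differ between SQL dialects, so the syntax would need adapting for Spark SQL or another engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flights (
    airline TEXT, scheduled_dep TEXT, actual_dep TEXT, cancelled INTEGER
);
INSERT INTO flights VALUES
  ('AA', '2024-03-01 08:00', '2024-03-01 08:30', 0),
  ('AA', '2024-03-01 12:00', '2024-03-01 12:10', 0),
  ('AA', '2024-03-02 09:00', NULL,               1),
  ('BB', '2024-03-01 07:00', '2024-03-01 07:00', 0),
  ('BB', '2024-03-02 07:00', NULL,               1),
  ('BB', '2024-03-02 15:00', '2024-03-02 15:20', 0);
""")

# Average departure delay in minutes per airline, excluding cancelled flights.
# strftime('%s', ...) converts a timestamp to Unix seconds in SQLite.
avg_delay = conn.execute("""
    SELECT airline,
           ROUND(AVG((strftime('%s', actual_dep) - strftime('%s', scheduled_dep)) / 60.0), 1)
    FROM flights
    WHERE cancelled = 0
    GROUP BY airline
    ORDER BY airline
""").fetchall()

# Daily cancellation rate (percent) per airline for March 2024.
cancel_rate = conn.execute("""
    SELECT airline,
           date(scheduled_dep) AS day,
           ROUND(100.0 * SUM(cancelled) / COUNT(*), 1) AS cancel_pct
    FROM flights
    WHERE scheduled_dep >= '2024-03-01' AND scheduled_dep < '2024-04-01'
    GROUP BY airline, day
    ORDER BY airline, day
""").fetchall()

print(avg_delay)
print(cancel_rate)
```

Worth stating aloud in an interview: whether cancelled flights count toward the delay average, and whether the rate's denominator is scheduled or operated flights, are definitional choices to confirm with the interviewer.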
Salary Ranges (Reported, likely US-based)
- Data Engineer: ~$125K average base
- Data Scientist: ~$113K average base
- Software Engineer: ~$98K average base
Preparation Tips
- Practice SQL fundamentals extensively (WHERE, AND/OR/NOT, JOINs, window functions)
- Prepare behavioral responses using the STAR method
- Study system design for senior roles (data mesh, event-driven architecture, ETL pipelines)
- Review travel industry data scenarios (booking data, flight schedules, cancellations)
- Understand Delta Lake, Unity Catalog, and Databricks concepts if targeting data engineering
5. Public Talks, Blogs & Case Studies
Databricks Case Study
- Amadeus | Databricks Customer Story - Standardizing data management on Databricks Data Intelligence Platform on Azure
Microsoft Case Study
- Amadeus uses Cobalt 100 VMs for 2 billion daily transactions - Details on Azure migration and ARM-based VM adoption
Conference Talks
- Data + AI Summit 2024/2025: Amadeus presented on "Drastically Reducing Processing Costs with Delta Lake" covering optimization of ETL pipelines with Databricks features
Press Coverage
- Computing UK: How Amadeus built a more personal travel experience - Interview with CTO about cloud-native transformation
Amadeus Blog
- Distributed Data Exchange Project (DXP) - Open-source, cloud-agnostic data exchange architecture
6. Key Talking Points for Interview
If interviewing for a Senior Data Engineer role at Amadeus, demonstrate knowledge of:
- Data Mesh on Delta Lake - Understanding of decentralized data ownership with centralized governance (Unity Catalog)
- ETL Pipeline Optimization - Photon, Predictive I/O, Deletion Vectors, partition pruning
- Event-Driven Architecture - Kafka-based streaming in cloud-native microservices
- Scale Awareness - 2B daily transactions, 100K TPS, years of historical data at high refresh rates
- Azure Ecosystem - Databricks on Azure, Cobalt VMs, multi-region European deployment
- Travel Domain - Look-to-book ratios, shopping engines, booking systems, flight/hotel data models
- Data Governance - Unity Catalog for breaking silos, regulatory compliance, zero-trust security
- Star Schema Design - Transforming raw multi-source data into analytical star schemas on Delta Lake
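For the star schema talking point, a minimal worked example of splitting flat raw records into dimension and fact tables. SQLite stands in for Delta tables here purely so the sketch runs anywhere. The table names, columns, and sample rows are hypothetical and not drawn from Amadeus's actual models.

```python
import sqlite3

# Hypothetical flat feed, as it might arrive from a business unit.
raw_bookings = [
    # (booking_id, airline_code, airline_name, customer_email, booked_on, price)
    (1, "AA", "Airline A", "ann@example.com", "2024-03-01", 200.0),
    (2, "AA", "Airline A", "bob@example.com", "2024-03-01", 300.0),
    (3, "BB", "Airline B", "ann@example.com", "2024-03-02", 150.0),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_airline  (airline_key INTEGER PRIMARY KEY, code TEXT UNIQUE, name TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE fact_booking (
    booking_id   INTEGER PRIMARY KEY,
    airline_key  INTEGER REFERENCES dim_airline(airline_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    booked_on    TEXT,
    price        REAL
);
""")

for booking_id, code, name, email, booked_on, price in raw_bookings:
    # Dimensions hold each distinct entity once; repeats are ignored.
    conn.execute("INSERT OR IGNORE INTO dim_airline (code, name) VALUES (?, ?)", (code, name))
    conn.execute("INSERT OR IGNORE INTO dim_customer (email) VALUES (?)", (email,))
    # The fact row stores measures plus surrogate keys looked up from the dimensions.
    conn.execute("""
        INSERT INTO fact_booking
        SELECT ?, a.airline_key, c.customer_key, ?, ?
        FROM dim_airline a, dim_customer c
        WHERE a.code = ? AND c.email = ?
    """, (booking_id, booked_on, price, code, email))

# Typical analytical query: join the fact table to a dimension and aggregate.
revenue = conn.execute("""
    SELECT a.name, SUM(f.price)
    FROM fact_booking f JOIN dim_airline a USING (airline_key)
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()
print(revenue)
```

In a Delta Lake setting the same shape would typically be built with batch or streaming upserts rather than row-by-row inserts; the row loop here is only for readability.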