Amadeus IT Group - Data Engineering Research for Interview Prep

Company Overview

Amadeus IT Group is a global technology company providing solutions for the travel industry (airlines, hotels, travel agencies, airports). It is headquartered in Madrid, Spain. It has a major R&D center in Bangalore (Amadeus Labs, operating since 2008, with 4,000+ professionals), and its primary data center has been in Erding, Germany.

1. Technologies & Data Engineering Stack

Core Data Platform

  • Databricks Data Intelligence Platform on Azure - standardized data management platform
  • Delta Lake - storage format for the Amadeus Data Mesh; teams build star schemas from raw business unit data
  • Delta Sharing - open protocol adopted as the standard for sharing data across the organization
  • Unity Catalog - governance layer; used to break down data silos and to combine data across domains

Programming & Processing

  • Spark (Scala) - primary big data processing framework
  • C++ - shopping engine (core transaction processing)
  • Java - cloud-native applications
  • Python - data science and analytics workloads

Data Infrastructure

  • Apache Kafka - distributed event streaming
  • Apache Airflow - workflow orchestration
  • Hadoop HDFS - distributed storage (legacy/transitioning)
  • Elasticsearch - distributed search engine
  • Couchbase - NoSQL database

Monitoring & Observability

  • Prometheus - monitoring
  • Grafana - visualization and alerting
  • Splunk - log management

Databricks-Specific Optimizations

  • Predictive I/O
  • Photon engine
  • Deletion Vectors
  • Partition Pruning
  • Dynamic File Pruning
  • Focus on reducing read-and-write amplification in Delta Lake pipelines
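The toy model below illustrates two ideas from this list: partition pruning cuts read amplification, and deletion vectors cut write amplification. It is plain Python, not Delta Lake's implementation or API. The `DataFile` class, the date partition values, and the use of a Python set in place of a deletion vector's compressed bitmap are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Toy immutable data file holding rows from one partition."""
    partition: str                              # partition column value, e.g. a date
    rows: list                                  # row payloads, never edited in place
    deleted: set = field(default_factory=set)   # stand-in for a deletion vector

    def live_rows(self):
        # Readers skip any row position flagged as deleted
        return [r for i, r in enumerate(self.rows) if i not in self.deleted]


def scan(files, partition_value):
    """Partition pruning: open only files whose partition matches the filter."""
    touched = [f for f in files if f.partition == partition_value]
    return [r for f in touched for r in f.live_rows()], len(touched)


def delete_copy_on_write(files, predicate):
    """Rewrite every file that contains a matching row; count rows rewritten."""
    new_files, rewritten = [], 0
    for f in files:
        live = f.live_rows()
        if any(predicate(r) for r in live):
            kept = [r for r in live if not predicate(r)]
            new_files.append(DataFile(f.partition, kept))
            rewritten += len(kept)  # surviving rows are written out again
        else:
            new_files.append(f)     # untouched files are reused as-is
    return new_files, rewritten


def delete_with_vectors(files, predicate):
    """Flag matching row positions instead of rewriting data (mutates files)."""
    marked = 0
    for f in files:
        for i, r in enumerate(f.rows):
            if i not in f.deleted and predicate(r):
                f.deleted.add(i)
                marked += 1
    return files, marked


def make_table():
    # Hypothetical table: two daily partitions of 1,000 bookings each
    return [
        DataFile("2024-06-01", [f"booking-{i}" for i in range(1000)]),
        DataFile("2024-06-02", [f"booking-{i}" for i in range(1000, 2000)]),
    ]


_, rewritten = delete_copy_on_write(make_table(), lambda r: r == "booking-42")
print(rewritten)         # 999

files, marked = delete_with_vectors(make_table(), lambda r: r == "booking-42")
print(marked)            # 1

rows, files_read = scan(files, "2024-06-02")
print(len(rows), files_read)  # 1000 1
```

In this toy, a copy-on-write delete of one row rewrites the 999 surviving rows of its file. The deletion-vector path only flags one position, and the partition-filtered scan opens one of the two files.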

2. Cloud Strategy (Azure Focus)

Primary Cloud: Microsoft Azure

  • 95% of production workloads now run in Azure
  • 40% of total applications activated in public cloud (as of reporting)
  • Described as "one of the biggest migrations in the Cloud done on Azure"
  • Multiple European Azure regions: Ireland and Netherlands (primary), France, Sweden, Germany (satellite)

Azure Cobalt 100 VMs

  • 90% of the shopping engine runs on Cobalt 100 VMs (built on custom ARM-based CPUs designed by Microsoft)
  • Migrated from VMware-virtualized and containerized workloads in the Erding data center
  • C++ apps recompiled for ARM with minimal code changes
  • Spark/Databricks jobs ran without code modifications on ARM

Performance Gains from Migration

  • 8% reduction in response times
  • 20% increase in throughput

Architecture Evolution

  • Moved from traditional SOA to cloud-native, event-driven architecture
  • Thousands of microservices
  • Data Mesh architecture for decentralized data management
  • Zero-trust security architecture - each application as independent security perimeter
  • Encryption keys managed separately, outside Microsoft's platform
  • API-first platform strategy
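The event-driven architecture above, together with the Kafka streaming listed under Data Infrastructure, rests on an append-only, partitioned log that many consumers read at their own pace. The sketch below is a toy, in-process model of that idea, not Kafka's client API. `ToyLog`, `ToyConsumer`, and the crc32-based partitioning are hypothetical stand-ins; Kafka's default partitioner uses a different hash.

```python
import zlib
from collections import defaultdict


class ToyLog:
    """Toy append-only, partitioned event log (Kafka-like in spirit only)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, event):
        # Hash the key to pick a partition: the same key always lands in the
        # same partition, which preserves per-key ordering.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, event))
        return p


class ToyConsumer:
    """Reads events added since its own last position in each partition."""

    def __init__(self, log):
        self.log = log
        self.offsets = defaultdict(int)  # partition index -> next offset to read

    def poll(self):
        batch = []
        for p, events in enumerate(self.log.partitions):
            batch.extend(events[self.offsets[p]:])
            self.offsets[p] = len(events)  # advance past everything returned
        return batch


log = ToyLog(num_partitions=3)
for i in range(3):
    log.publish("booking-1", f"search-{i}")
log.publish("booking-2", "book")

# Two independent downstream services, each with its own offsets
pricing, analytics = ToyConsumer(log), ToyConsumer(log)
first = pricing.poll()
print(len(first), len(pricing.poll()))  # 4 0
print(len(analytics.poll()))            # 4
print([e for k, e in first if k == "booking-1"])  # ['search-0', 'search-1', 'search-2']
```

Each consumer keeps its own offsets, so adding a new downstream service does not disturb existing ones. Keying by booking keeps that booking's events in order.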

Secondary Partnership: Google Cloud

  • Partnership with Google for enriching cloud operations
  • Generative AI innovation initiatives

3. Scale of Data

  • Up to 2 billion transactions per day at peak times
  • 100,000 transactions per second (comparable to Google Search volume)
  • Requests per booking grew from ~100 (15 years ago) to thousands today
  • Most traffic is shopping/search rather than booking (high look-to-book ratio)
  • Customers demand several years of historical data with high refresh rates
  • Multiple acquisitions led to diverse systems and potential data silos (motivation for Databricks standardization)
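A quick back-of-the-envelope check relates the daily and per-second figures above. For the average, the sketch assumes load is spread evenly over 24 hours. The ratio of 1,000 requests per booking is a hypothetical value taken from the "thousands" range, not a reported number.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_tps(transactions_per_day: int) -> float:
    """Average transactions per second if load were spread evenly over a day."""
    return transactions_per_day / SECONDS_PER_DAY


def implied_bookings(requests: int, look_to_book: int) -> int:
    """Bookings implied by a request volume at a given look-to-book ratio."""
    return requests // look_to_book


daily = 2_000_000_000   # 2 billion transactions/day (figure from the notes)
peak_tps = 100_000      # 100,000 transactions/second (figure from the notes)

avg = average_tps(daily)
print(round(avg))                # 23148
print(round(peak_tps / avg, 2))  # 4.32
# Hypothetical look-to-book ratio: 1,000 shopping requests per booking
print(implied_bookings(daily, 1_000))  # 2000000
```

On these numbers, peak throughput of 100K TPS is about 4.3 times the average rate implied by 2 billion transactions spread over a full day.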

4. Interview Process & Common Questions

Process Structure (3 Rounds)

  1. Recruiter Screen - Company and position overview
  2. Technical Round - Technical assessment with team members; architecture/system design for senior roles (45-60 min)
  3. Hiring Manager Round - Situational questions about gathering and managing requirements

Technical Assessment Areas

  • SQL (heavily tested): JOINs, GROUP BY, aggregate functions, window functions, complex queries
  • Data Structures & Algorithms: String manipulation, sorting algorithms (Python-focused)
  • Machine Learning: Classification, ensemble methods, feature selection
  • System Design: Architecture discussions and trade-offs (senior roles)
  • Probability & Statistics: Hypothesis testing, distributions
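The Python-focused data structures and algorithms items above typically mean warm-ups that pair a hand-written sort with a string problem. The two functions below are generic practice examples, not questions confirmed to be asked at Amadeus. One is a stable merge sort; the other finds the first non-repeating character using `collections.Counter`.

```python
from collections import Counter


def merge_sort(items):
    """Stable O(n log n) merge sort that returns a new list."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def first_unique_char(s):
    """Index of the first character that occurs exactly once, or -1."""
    counts = Counter(s)  # one pass to count, one pass to find
    for i, ch in enumerate(s):
        if counts[ch] == 1:
            return i
    return -1


print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
print(first_unique_char("amadeus"))     # 1 ('m')
```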

Sample SQL Questions

  1. Calculate average booking price per month per airline using window functions
  2. Filter customers meeting multiple criteria (date, airline, hotel, spend comparisons)
  3. Explain foreign keys and referential integrity
  4. Calculate average departure delay per airline from scheduled vs. actual times
  5. Explain database indexes (primary, unique, composite, clustered)
  6. Count customers by email provider using LIKE
  7. Compare cross joins vs. natural joins
  8. Calculate daily cancellation rates for airlines in a given month
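Questions 1, 4, and 8 above can be rehearsed locally with Python's built-in `sqlite3` module. The schema, column names, airline codes, and sample rows below are hypothetical. SQLite syntax such as `substr` and `strftime('%s', ...)` differs from other SQL dialects, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database with hypothetical sample tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bookings (booking_id INTEGER PRIMARY KEY, airline TEXT,
                       booking_date TEXT, price REAL);
CREATE TABLE flights (flight_id INTEGER PRIMARY KEY, airline TEXT,
                      flight_date TEXT, scheduled_dep TEXT,
                      actual_dep TEXT, cancelled INTEGER);
""")
conn.executemany("INSERT INTO bookings VALUES (?, ?, ?, ?)", [
    (1, "AF", "2024-05-03", 200.0),
    (2, "AF", "2024-05-20", 300.0),
    (3, "AF", "2024-06-02", 150.0),
    (4, "LH", "2024-05-10", 400.0),
])
conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?, ?, ?)", [
    (1, "AF", "2024-06-01", "2024-06-01 10:00", "2024-06-01 10:15", 0),
    (2, "AF", "2024-06-01", "2024-06-01 12:00", None, 1),
    (3, "AF", "2024-06-02", "2024-06-02 09:00", "2024-06-02 09:05", 0),
    (4, "LH", "2024-06-01", "2024-06-01 08:00", "2024-06-01 08:30", 0),
])

# Q1: each booking alongside its airline's average price for that month
q1 = conn.execute("""
SELECT booking_id, airline, substr(booking_date, 1, 7) AS month, price,
       AVG(price) OVER (PARTITION BY airline, substr(booking_date, 1, 7))
         AS avg_month_price
FROM bookings
ORDER BY booking_id
""").fetchall()

# Q4: average departure delay in minutes per airline (operated flights only)
q4 = conn.execute("""
SELECT airline,
       AVG((strftime('%s', actual_dep) - strftime('%s', scheduled_dep)) / 60.0)
         AS avg_delay_min
FROM flights
WHERE cancelled = 0
GROUP BY airline
ORDER BY airline
""").fetchall()

# Q8: daily cancellation rate per airline for June 2024
q8 = conn.execute("""
SELECT airline, flight_date,
       1.0 * SUM(cancelled) / COUNT(*) AS cancellation_rate
FROM flights
WHERE flight_date BETWEEN '2024-06-01' AND '2024-06-30'
GROUP BY airline, flight_date
ORDER BY airline, flight_date
""").fetchall()

print(q1[0])  # (1, 'AF', '2024-05', 200.0, 250.0)
print(q4)     # [('AF', 10.0), ('LH', 30.0)]
print(q8)     # [('AF', '2024-06-01', 0.5), ('AF', '2024-06-02', 0.0), ('LH', '2024-06-01', 0.0)]
```

Question 1 uses `AVG(...) OVER (PARTITION BY ...)`, so each booking row is kept and shown next to its monthly average. A `GROUP BY` version would collapse the result to one row per airline and month.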

Reported Salaries (averages, likely US-based)

  • Data Engineer: ~$125K average base
  • Data Scientist: ~$113K average base
  • Software Engineer: ~$98K average base

Preparation Tips

  • Practice SQL fundamentals extensively (WHERE, AND/OR/NOT, JOINs, window functions)
  • Prepare behavioral responses using STAR method
  • Study system design for senior roles (data mesh, event-driven architecture, ETL pipelines)
  • Review travel industry data scenarios (booking data, flight schedules, cancellations)
  • Understand Delta Lake, Unity Catalog, and Databricks concepts if targeting data engineering

5. Public Talks, Blogs & Case Studies

Databricks Case Study

Microsoft Case Study

Conference Talks

  • Data + AI Summit 2024/2025: Amadeus presented on "Drastically Reducing Processing Costs with Delta Lake" covering optimization of ETL pipelines with Databricks features

Press Coverage

Amadeus Blog

6. Key Talking Points for Interview

If interviewing for a Senior Data Engineer role at Amadeus, demonstrate knowledge of:

  1. Data Mesh on Delta Lake - Understanding of decentralized data ownership with centralized governance (Unity Catalog)
  2. ETL Pipeline Optimization - Photon, Predictive I/O, Deletion Vectors, partition pruning
  3. Event-Driven Architecture - Kafka-based streaming in cloud-native microservices
  4. Scale Awareness - 2B daily transactions, 100K TPS, years of historical data at high refresh
  5. Azure Ecosystem - Databricks on Azure, Cobalt VMs, multi-region European deployment
  6. Travel Domain - Look-to-book ratios, shopping engines, booking systems, flight/hotel data models
  7. Data Governance - Unity Catalog for breaking silos, regulatory compliance, zero-trust security
  8. Star Schema Design - Transforming raw multi-source data into analytical star schemas on Delta Lake
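Point 8 can be illustrated with a small pure-Python sketch. It conforms records from two sources with different field names into one fact table plus airline and date dimensions, each keyed by an integer surrogate key. The source field names and values are hypothetical. In practice this transformation would run in Spark and write Delta tables rather than Python dicts.

```python
# Hypothetical raw records from two source systems with different field names
RAW_SOURCE_A = [
    {"carrier": "AF", "date": "2024-06-01", "amount": 200.0},
    {"carrier": "LH", "date": "2024-06-01", "amount": 400.0},
]
RAW_SOURCE_B = [
    {"airline_code": "AF", "booked_on": "2024-06-02", "fare": 150.0},
]


def conform(row):
    """Map each source's field names onto one conformed (code, day, price) shape."""
    if "carrier" in row:
        return row["carrier"], row["date"], row["amount"]
    return row["airline_code"], row["booked_on"], row["fare"]


def build_star(raw_rows):
    """Return (dim_airline, dim_date, fact_booking) using integer surrogate keys."""
    dim_airline, dim_date, fact = {}, {}, []
    for row in raw_rows:
        code, day, price = conform(row)
        # Assign a new surrogate key only the first time a value is seen
        airline_key = dim_airline.setdefault(code, len(dim_airline) + 1)
        date_key = dim_date.setdefault(day, len(dim_date) + 1)
        fact.append({"airline_key": airline_key, "date_key": date_key, "price": price})
    return dim_airline, dim_date, fact


dim_airline, dim_date, fact = build_star(RAW_SOURCE_A + RAW_SOURCE_B)

# Analytical query: join the fact table to the airline dimension, sum revenue
key_to_code = {key: code for code, key in dim_airline.items()}
revenue = {}
for f in fact:
    code = key_to_code[f["airline_key"]]
    revenue[code] = revenue.get(code, 0.0) + f["price"]

print(dim_airline)  # {'AF': 1, 'LH': 2}
print(revenue)      # {'AF': 350.0, 'LH': 400.0}
```

Surrogate keys decouple the fact table from source-specific codes. A third source with another field layout would only need a new branch in `conform`.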