What is Data Transformation
Converting data from one format to another
Data transformation is the process of converting data from its source format or structure into the format or structure a target system requires, whether for analysis, integration, or storage.
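Here is a minimal sketch of a format transformation using only the Python standard library: it turns a JSON array into CSV text. It assumes the input is a list of flat objects (no nested values) and builds the CSV header from the union of keys, in first-seen order. The function name `json_to_csv` and the sample records are illustrative and do not come from any particular tool.

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text.

    Assumption: every element is a flat object (no nested values).
    The header is the union of all keys, in first-seen order.
    """
    records = json.loads(json_text)

    # Collect column names across all records, preserving first-seen order.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Keys missing from a record are written as empty fields (DictWriter's restval default is "").
    writer.writerows(records)
    return buffer.getvalue()


source = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace", "city": "NYC"}]'
print(json_to_csv(source))
```

The first record has no `city`, so its row ends with an empty field. Nested JSON would need a structural step (flattening) before this conversion.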
Types of Transformations
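- Structural: changing the data schema, e.g. through normalization or denormalization
- Format: converting between formats such as JSON, XML, and CSV
- Semantic: mapping values that mean the same thing to unified reference codes (e.g. "U.K." and "United Kingdom" both to the ISO 3166 code GB)
- Aggregation: grouping and summarizing data (sums, counts, averages)
- Cleansing: removing duplicates and fixing errors such as invalid or inconsistent values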
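Several of these transformation types often run in sequence. The following sketch chains a structural step, a cleansing step, a semantic step, and an aggregation step over a few in-memory records, using only the standard library. The records, field names, and three-entry country mapping are hypothetical. The target codes follow ISO 3166-1 alpha-2 as one example of a unified reference code set.

```python
from collections import defaultdict

# Hypothetical raw order records; field names and values are illustrative only.
raw_orders = [
    {"order_id": 1, "customer": {"name": "Ada", "country": "United Kingdom"}, "amount": 120.0},
    {"order_id": 2, "customer": {"name": "Grace", "country": "USA"}, "amount": 80.0},
    {"order_id": 2, "customer": {"name": "Grace", "country": "USA"}, "amount": 80.0},  # duplicate
    {"order_id": 3, "customer": {"name": "Alan", "country": "U.K."}, "amount": 45.5},
]

# Hypothetical mapping from free-text country names to ISO 3166-1 alpha-2 codes.
COUNTRY_CODES = {"United Kingdom": "GB", "U.K.": "GB", "USA": "US"}


def flatten(order):
    """Structural: turn the nested customer object into top-level columns."""
    return {
        "order_id": order["order_id"],
        "customer_name": order["customer"]["name"],
        "country": order["customer"]["country"],
        "amount": order["amount"],
    }


def deduplicate(rows, key="order_id"):
    """Cleansing: keep only the first row seen for each key value."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def standardize_country(row):
    """Semantic: replace the raw country name with its code (None if unmapped)."""
    return {**row, "country": COUNTRY_CODES.get(row["country"])}


def total_by_country(rows):
    """Aggregation: sum order amounts per country code."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


rows = [standardize_country(r) for r in deduplicate([flatten(o) for o in raw_orders])]
print(total_by_country(rows))  # → {'GB': 165.5, 'US': 80.0}
```

Deduplicating on `order_id` assumes that key uniquely identifies an order. Real pipelines often need a more careful rule for deciding which duplicate to keep.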
ETL/ELT Processes
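Transformation is a key step in ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) pipelines that load data into warehouses. In ETL, data is transformed before it is loaded. In ELT, raw data is loaded first and then transformed inside the warehouse.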
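The ETL/ELT distinction comes down to where the transform runs. This sketch uses an in-memory SQLite database as a stand-in for a warehouse; that choice is for illustration only, and production warehouses differ. The ETL path cleans rows in Python before inserting them. The ELT path loads the raw strings unchanged and then transforms them with SQL inside the database. Table names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical source rows: (order_id, amount as text, lowercase country code).
SOURCE = [(1, "120.00", "gb"), (2, "80.00", "us"), (3, "45.50", "gb")]


def run_etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    cleaned = [(oid, float(amount), country.upper()) for oid, amount, country in SOURCE]
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", cleaned)


def run_elt(conn):
    """ELT: load the raw rows as-is, then transform with SQL inside the database."""
    conn.execute("CREATE TABLE orders_raw (order_id INTEGER, amount TEXT, country TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", SOURCE)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT order_id, CAST(amount AS REAL) AS amount, UPPER(country) AS country
        FROM orders_raw
        """
    )


def totals_by_country(conn, table):
    """Sum amounts per country. Table names here are trusted constants, not user input."""
    sql = f"SELECT country, SUM(amount) FROM {table} GROUP BY country ORDER BY country"
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn)
run_elt(conn)
print(totals_by_country(conn, "orders_etl"))  # → [('GB', 165.5), ('US', 80.0)]
print(totals_by_country(conn, "orders_elt"))  # → [('GB', 165.5), ('US', 80.0)]
```

Both paths produce the same cleaned data. What differs is which engine does the work and whether the raw data (`orders_raw`) remains available in the warehouse for later re-transformation.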
Tools
- Apache Spark, Apache Beam
- dbt (data build tool)
- Talend, Informatica
- Python (pandas, PySpark)
Well-designed transformations help keep data consistent across sources and ready for analytics.