What is Data Quality
Ensuring data accuracy and completeness
What is Data Quality
Data Quality is a set of data characteristics that determine its fitness for use in business processes and analytics.
Data Quality Dimensions
| Dimension | Description | |-----------|-------------| | Accuracy | Correspondence to real world | | Completeness | Completeness of population | | Consistency | Consistency across systems | | Timeliness | Currency and freshness | | Validity | Conformance to business rules | | Uniqueness | Absence of duplicates |
Types of Checks
- Schema validation — structure verification
- Range checks — values within allowed limits
- Pattern matching — format conformance
- Referential integrity — relationship integrity
- Business rules — business logic
Tools
| Tool | Type | |------|------| | Great Expectations | Python framework | | dbt tests | SQL-based | | Apache Griffin | Open-source | | Talend DQ | Enterprise | | Soda Core | Modern DQ |
Quality Metrics
- Data Quality Score (DQS)
- Error rate by field
- Completeness percentage
- Freshness (time since last update)
Implementation Practices
- Data profiling at ingestion
- Automated checks in pipeline
- Alerting on quality degradation
- Data stewardship processes
- Data dictionary documentation