Ten Root Conditions of Data Quality Problems
- Multiple data sources. Multiple sources of the same information produce different values for it. These can include values that were accurate at a given point in time.
- Subjective judgment in data production. Information production using subjective judgment can result in the production of biased information.
- Limited computing resources. Lack of sufficient computing resources limits accessibility to relevant information.
- Security/accessibility trade-off. Easy access to information may conflict with requirements for security, privacy, and confidentiality.
- Coded data across disciplines. Coded data from different functions and disciplines is difficult to decipher and understand. Also, codes may conflict.
- Complex data representations. Algorithms are not available for automated content analysis across instances of text and image information. Non-numeric information can be difficult to index in a way that permits location of relevant information.
- Volume of data. Large volumes of stored information make it difficult to access needed information in a reasonable time.
- Input rules too restrictive or bypassed. Input rules that are too restrictive may impose unnecessary controls on data input and lose data that has important meaning. Data entry clerks may skip entering data into a field (missing information) or arbitrarily change a value to conform to the rules and pass an edit check (erroneous information).
- Changing data needs. As information consumers’ tasks and the organizational environment (such as new markets, new legal requirements, and new trends) change, the information that is relevant and useful changes.
- Distributed heterogeneous systems. Distributed heterogeneous systems without proper integration mechanisms lead to inconsistent definitions, formats, rules, and values. The original meaning of data may be lost or distorted as data flows between systems and is retrieved at a different time or place, or by a different data consumer, for the same or different purposes.
Lee, Yang, et al. Journey to Data Quality. Cambridge: The MIT Press, 2006. 80-81.
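
As an illustration of the first root condition in the list above (multiple data sources), the following is a minimal sketch, not taken from the cited book, of how disagreeing values for the same item might be detected. The function name find_conflicts, the record layout (source, key, field, value), and the sample source names "crm" and "billing" are all assumptions made for this example.

```python
from collections import defaultdict


def find_conflicts(records):
    """Return the (key, field) pairs whose value differs between sources.

    `records` is an iterable of (source, key, field, value) tuples.
    This layout is a hypothetical one chosen for illustration.
    """
    seen = defaultdict(dict)  # (key, field) -> {source: value}
    for source, key, field, value in records:
        seen[(key, field)][source] = value

    # Keep only the items for which the sources report more than one value.
    return {
        item: by_source
        for item, by_source in seen.items()
        if len(set(by_source.values())) > 1
    }


# Hypothetical sample data: two systems holding the same customer field.
records = [
    ("crm", "cust-1", "zip", "02139"),
    ("billing", "cust-1", "zip", "02142"),
    ("crm", "cust-2", "zip", "10001"),
    ("billing", "cust-2", "zip", "10001"),
]

conflicts = find_conflicts(records)
for (key, field), by_source in conflicts.items():
    print(key, field, by_source)
```

A check like this only reports that the sources disagree. It cannot say which value is correct, and, as the list notes, each value may have been accurate at the time it was recorded.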