Common data formats that we can work with include:
- Microsoft Access
- FileMaker Pro
- Microsoft Excel
- MySQL
- MYOB
- Tab- or comma-delimited text files
What is data cleansing?
Data cleansing, or data scrubbing, is the act of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty data.
The actual process of data cleansing may involve removing typographical errors or validating and correcting values against a known list of entities. The validation may be strict (such as rejecting any address that does not have a valid postal code) or fuzzy (such as correcting records that partially match existing, known records).
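The difference between strict and fuzzy validation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of known cities, the four-digit postal code rule, and the function names are all hypothetical, and the fuzzy match uses the standard library's difflib rather than any particular cleansing tool.

```python
import difflib
import re

# Hypothetical reference list of known, valid entities (city names here).
KNOWN_CITIES = ["Sydney", "Melbourne", "Brisbane", "Perth", "Adelaide"]

# Assumption for this sketch: a valid postal code is exactly four digits.
POSTCODE_PATTERN = re.compile(r"\d{4}")


def is_valid_postcode(value):
    """Strict validation: accept only values that fully match the pattern."""
    return bool(POSTCODE_PATTERN.fullmatch(value.strip()))


def correct_city(value, cutoff=0.8):
    """Fuzzy validation: return the closest known city, or None if nothing is close."""
    candidate = value.strip().title()
    matches = difflib.get_close_matches(candidate, KNOWN_CITIES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(is_valid_postcode("2000"))   # strict check passes
    print(is_valid_postcode("20O0"))   # letter O instead of zero, rejected
    print(correct_city("Melborne"))    # typo corrected against the known list
    print(correct_city("Xyz"))         # no close match, left for manual review
```

A strict rule rejects anything that does not match exactly, while the fuzzy rule trades some risk of a wrong correction for fewer rejected records; the cutoff value controls that trade-off.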
Administratively, incorrect or inconsistent data can lead to false conclusions and misdirected investments on both public and private scales. Many companies use customer information databases that record data such as contact information, addresses, and preferences. If, for instance, the addresses are inconsistent, the company will bear the cost of resending mail or may even lose customers.
High-quality data needs to conform to quality criteria, including:
- Accuracy
- Completeness
- Validity
- Consistency
- Uniformity
- Uniqueness (removing duplicates)
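Some of these criteria lend themselves to simple automated checks. The sketch below shows one possible way to enforce uniformity, completeness, and uniqueness on a small list of customer records. The field names, the choice of email as the duplicate key, and all function names are assumptions made for illustration, not a description of any specific product or process.

```python
# Hypothetical required fields for a customer record.
REQUIRED_FIELDS = ("name", "email")


def normalise(record):
    """Uniformity: collapse extra whitespace and lower-case the email."""
    return {
        "name": " ".join(record.get("name", "").split()),
        "email": record.get("email", "").strip().lower(),
    }


def is_complete(record):
    """Completeness: every required field must be present and non-empty."""
    return all(record.get(field) for field in REQUIRED_FIELDS)


def deduplicate(records):
    """Uniqueness: keep the first record seen for each email address."""
    seen = set()
    unique = []
    for record in records:
        key = record["email"]
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def clean(records):
    """Normalise first, then drop incomplete records, then remove duplicates."""
    normalised = [normalise(r) for r in records]
    complete = [r for r in normalised if is_complete(r)]
    return deduplicate(complete)


if __name__ == "__main__":
    raw = [
        {"name": " Ann  Lee ", "email": "Ann@Example.com "},
        {"name": "Ann Lee", "email": "ann@example.com"},
        {"name": "Bob", "email": ""},
    ]
    print(clean(raw))
```

Normalising before deduplicating matters here: the first two records only become recognisable as duplicates once whitespace and letter case have been made uniform. Accuracy and validity usually need an external reference, such as a known list of entities, and are not covered by this sketch.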