Cross Chain Control Collaboration (4C)

Survey on methods for data correction, data clustering, and data obfuscation

Large databases are common nowadays. The data stored in them can be of great help for explanatory research and for making decisions that bring the owner or user of the database closer to their ideal situation. Arriving at explanations of behaviour, and at decisions that lead to improvement, requires analysis of the data. Poor data quality can lead to misleading data analysis and incorrect decision making (J. Chen, W. Li, A. Lau, J. Cao, K. Wang, 2010). Therefore, it is important that data are complete and give a correct representation of reality. Unfortunately, this is often not the case (J.R. Carpenter, M.G. Kenward, S. Vansteelandt, 2006); databases contain incorrect data values or are missing certain data. Incorrect values can be entered into a database on purpose (e.g. when the real value of the observation is unknown but the observer is forced to fill in a value) or accidentally (e.g. through typing or measurement errors). Observations can be missing by design or because, for one reason or another, the intended observations were not made (J.R. Carpenter, M.G. Kenward, S. Vansteelandt, 2006).
