Introduction to Data Analysis

By Adesile Ajisafe, PhD CEng MIMechE

Data analysis is the process of inspecting, cleaning, transforming, and modelling data with the objective of discovering useful information, arriving at conclusions, and supporting decision-making. In statistics, data analysis is generally divided into exploratory data analysis (EDA) and confirmatory data analysis (CDA).

What is EDA?
EDA involves analysing datasets with a range of numerical methods and graphical tools, exploring the data for patterns, trends, underlying structure, deviations from the trend, anomalies, and strange structures. EDA helps in discovering the unexpected as well as confirming the expected.

EDA steps
• Generate good research questions
• Data restructuring: You may need to make new variables from the existing ones
• Try to understand the data structure, relationships, anomalies, and unexpected behaviours.
• Try to identify confounding variables, interactions, and multicollinearity, if any.
• Handle missing observations
• Decide whether a transformation is needed (on the response and/or explanatory variables).
• Decide on the hypotheses based on your research questions.
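
As a rough illustration, some of the steps above can be sketched in Python using only the standard library. The dataset, the variable names, and the helper functions (summarize, iqr_outliers, pearson_r) are hypothetical and chosen purely for demonstration. Dropping incomplete rows and using the 1.5 × IQR rule to flag anomalies are assumptions made for this sketch, not the only reasonable choices.

```python
import math
import statistics

# Hypothetical observations as (height_m, weight_kg); None marks a missing value.
rows = [
    (1.60, 55.0),
    (1.65, 60.0),
    (1.68, 62.0),
    (1.70, 68.0),
    (1.72, 70.0),
    (1.74, None),
    (1.75, 72.0),
    (1.78, 75.0),
    (1.80, 80.0),
    (1.82, 150.0),  # deliberately unusual value
]

# Handle missing observations. Dropping incomplete rows is only one option.
complete = [(h, w) for h, w in rows if h is not None and w is not None]
heights = [h for h, _ in complete]
weights = [w for _, w in complete]

# Data restructuring: derive a new variable (body mass index) from existing ones.
bmi = [w / h ** 2 for h, w in complete]


def summarize(values):
    """Return basic numerical summaries used to inspect a variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    return [v for v in values if v < low or v > high]


def pearson_r(xs, ys):
    """Pearson correlation coefficient, a first look at a linear relationship."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


summary = summarize(weights)
outliers = iqr_outliers(weights)
r = pearson_r(heights, weights)

print("weight summary:", summary)
print("possible anomalies:", outliers)
print("height/weight correlation:", round(r, 3))
```

In practice these numerical summaries would be paired with plots such as histograms and scatter plots, and a flagged value would be investigated rather than deleted automatically.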

What is CDA?
Confirmatory data analysis is the part where you evaluate your evidence using traditional statistical tools such as significance testing, inference, and confidence intervals.
It involves things like testing hypotheses, producing estimates with a specified level of precision, regression analysis, and analysis of variance.
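
To make hypothesis testing concrete, here is a minimal sketch of one such test: an exact two-sided permutation test for a difference in means, written with the Python standard library only. The measurements, the group names, the permutation_test helper, and the 0.05 significance level are all hypothetical and assumed for illustration. A t-test or an ANOVA from a statistics package would be an equally valid choice.

```python
from itertools import combinations
from statistics import mean

# Hypothetical measurements for two groups (illustrative values only).
treatment = [12.1, 11.8, 12.6, 12.9, 12.4]
control = [11.2, 11.5, 11.0, 11.9, 11.4]


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Returns the observed difference and the p-value, i.e. the share of all
    possible relabellings whose absolute mean difference is at least as
    large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = mean(group_a) - mean(group_b)

    extreme = 0
    count = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        # Small tolerance guards against floating-point rounding in ties.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
        count += 1
    return observed, extreme / count


alpha = 0.05  # assumed significance level
observed_diff, p_value = permutation_test(treatment, control)

print("observed difference in means:", round(observed_diff, 3))
print("p-value:", round(p_value, 4))
print("reject null hypothesis of no difference:", p_value < alpha)
```

Exhaustive enumeration is only feasible for small samples. For larger ones, a random sample of relabellings is the usual substitute.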

CDA steps
• Defining the individual constructs.
• Developing the overall measurement model theory.
• Designing a study to produce the empirical results.
• Assessing the validity of the measurement model.

Differences between EDA and CDA

In summary, confirmatory data analysis (CDA) and exploratory data analysis (EDA) are related techniques. In EDA the data is simply explored, and in a factor-analysis setting this exploration indicates the number of factors required to represent the data. In exploratory factor analysis, every measured variable is related to every latent variable. In CDA, by contrast, the analyst specifies in advance the number of factors required and which measured variable is related to which latent variable. CDA is then used to confirm or reject the measurement theory. In practice, exploratory and confirmatory data analysis are not performed strictly one after the other, but continually intertwine to help you create the best possible model.