Basics of Factor Analysis for Data Scientists

This tutorial covers Exploratory Factor Analysis.

We will cover the following sub-topics:

  1. What is Factor Analysis (FA)
  2. Difference between Exploratory FA and Confirmatory FA
  3. Application of Factor Analysis
  4. Steps of Factor Analysis
  5. Investigating Correlation

In the second part, we will consider rotations and compare Factor Analysis with Principal Components Analysis. Then we introduce Common Factor Analysis. Finally, we will work through an example of Factor Analysis.


1. What is Factor Analysis

Factor Analysis (FA) is a statistical procedure used to make inferences about unobservable quantities from observed data. It does this by investigating the correlations, or interrelationships, among the observed variables. The unobservable quantities are also known as latent variables, or factors.

The goal of FA is to describe the correlations between the measured features in terms of variation in a few underlying factors. It identifies groups of variables or items that share a common feature.
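To make this concrete, here is a minimal sketch that simulates data from a small factor model, x = Λf + e, where f holds the latent factors, Λ is the loading matrix and e is variable-specific noise. The two factors, the loadings of 0.8 and the noise scale of 0.6 are all made-up assumptions chosen so that each observed variable has variance close to 1. Variables that share a factor come out correlated (about 0.8 × 0.8 = 0.64 in expectation), while variables driven by different factors are nearly uncorrelated, which is exactly the pattern FA tries to explain.

```python
import numpy as np

# Sketch only: two hypothetical latent factors generating six observed variables.
rng = np.random.default_rng(0)
n = 2000
factors = rng.standard_normal((n, 2))          # unobserved factor scores f

# Hypothetical loading matrix (Lambda): x1-x3 load on factor 1, x4-x6 on factor 2.
loadings = np.array([
    [0.8, 0.0], [0.8, 0.0], [0.8, 0.0],
    [0.0, 0.8], [0.0, 0.8], [0.0, 0.8],
])
unique = 0.6 * rng.standard_normal((n, 6))      # unique noise, so each variable has variance ~1
X = factors @ loadings.T + unique               # observed data x = Lambda f + e, shape (n, 6)

R = np.corrcoef(X, rowvar=False)                # 6 x 6 correlation matrix
print(np.round(R, 2))
```

Reading the printed matrix, the two 3 × 3 blocks on the diagonal stand out, and each block corresponds to one latent factor.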


2. EFA vs CFA

Exploratory Factor Analysis (EFA) is used by researchers to find structure among a set of variables, or as a data reduction method. The number of factors/components is not specified beforehand.

Confirmatory Factor Analysis (CFA) is used when the researcher already has a hypothesis, based on theory or existing research, about the number of dimensions that underlie the data and about which variables measure each dimension. In other words, the researcher starts with an expected (hypothesized) structure of the data, and the purpose of CFA is to determine how well the given data fit that structure.


3. Application of Factor Analysis

The main applications of factor analysis are:

  1. To reduce the dimensionality of the data, that is, to reduce the number of variables.
  2. To detect the structure of relationship between the variables.


4. Steps of Exploratory Factor Analysis

The following are typical steps followed in carrying out EFA.

  • Select variables
  • Determine the sample size: at least 50 cases, and preferably 100 or more
  • As a general rule, use at least 5 cases (observations) per variable
  • Compute the correlations between the variables included. This gives a correlation matrix: for each variable, we find its correlation with every other variable
  • Extraction: identify the number of dimensions (factors) in the data, that is, the groups of similar items
  • Rotation: a series of computational procedures that transform the extracted factors so that they are easier to interpret
  • Interpretation
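The first steps in the list above can be sketched in a few lines of NumPy. The helper `efa_preliminaries` below is hypothetical: it checks the sample-size rules of thumb (at least 50 cases and at least 5 cases per variable), builds the correlation matrix, and takes a first look at extraction through the eigenvalues of that matrix. Counting eigenvalues greater than 1 (the Kaiser criterion) is only one common heuristic for the number of factors; scree plots and parallel analysis are alternatives, and a full extraction method such as principal axis factoring or maximum likelihood would then estimate the loadings themselves.

```python
import numpy as np

def efa_preliminaries(X, min_n=50, min_cases_per_var=5):
    """Hypothetical helper: sample-size check, correlation matrix and a first extraction look."""
    n, p = X.shape
    # Rules of thumb: at least min_n cases and min_cases_per_var cases per variable.
    if n < min_n or n < min_cases_per_var * p:
        raise ValueError(f"{n} cases is too few for {p} variables")
    # Correlation of every variable with every other variable.
    R = np.corrcoef(X, rowvar=False)
    # Eigenvalues of the symmetric matrix R, sorted from largest to smallest.
    eigenvalues = np.linalg.eigvalsh(R)[::-1]
    # Kaiser criterion (one common heuristic): retain factors with eigenvalue > 1.
    n_factors = int(np.sum(eigenvalues > 1.0))
    return R, eigenvalues, n_factors

# Usage on simulated data with two hypothetical factors (three variables each).
rng = np.random.default_rng(1)
F = rng.standard_normal((500, 2))
loadings = np.kron(np.eye(2), np.full((3, 1), 0.8))   # 6 x 2 block loading matrix
X = F @ loadings.T + 0.6 * rng.standard_normal((500, 6))

R, eigenvalues, n_factors = efa_preliminaries(X)
print(np.round(eigenvalues, 2), n_factors)
```

With this simulated structure, two eigenvalues sit well above 1 and the remaining four well below it, so the heuristic recovers the two groups of similar items.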

Some assumptions to consider:

  • Multivariate normality
  • Homoscedasticity
  • Linearity
  • Continuous variables


5. Investigating Correlation

Factor Analysis yields good results only if the variables included in the study are strongly correlated. This means that before performing factor analysis, we need to measure the correlation between the variables.

Ways of investigating the correlation between the variables include:

  • Measure of Sampling Adequacy (MSA).
  • Bartlett Test for Sphericity

Measure of Sampling Adequacy (MSA)

This is a measure of how suitable the correlations among the variables are for factor analysis. One such measure is the Kaiser-Meyer-Olkin (KMO) test of sampling adequacy, which compares the observed correlations with the partial correlations between the variables. The test produces an index between 0 and 1, with values close to 1 indicating less error in predicting one variable based on the other variables. So if the value is close to 1, it can be deduced that there is strong correlation among the variables. Values close to 0, however, indicate weak correlation, and therefore factor analysis would not yield good results.

A summary of the interpretation of the possible values of the KMO statistic is given below:

  • 0 to 0.49 – unacceptable.
  • 0.5 to 0.59 – miserable.
  • 0.6 to 0.69 – mediocre.
  • 0.7 to 0.79 – middling.
  • 0.8 to 0.89 – meritorious.
  • 0.9 to 1.0 – marvelous.
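As a rough illustration, the overall KMO index can be computed directly from the correlation matrix R. With r_ij the correlations and p_ij the partial correlations (obtained from the inverse of R), KMO = Σ r_ij² / (Σ r_ij² + Σ p_ij²), with both sums taken over all pairs i ≠ j. The functions `kmo` and `kmo_label` below are hypothetical helpers written for this sketch, and the simulated two-factor data is an assumption used only to exercise them; `kmo_label` maps a value onto the verbal scale above.

```python
import numpy as np

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for data X (n x p)."""
    R = np.corrcoef(X, rowvar=False)
    S = np.linalg.inv(R)
    # Partial correlation of each pair of variables, controlling for all the others.
    d = np.sqrt(np.diag(S))
    P = -S / np.outer(d, d)
    off_diag = ~np.eye(R.shape[0], dtype=bool)   # mask selecting pairs with i != j
    r2 = np.sum(R[off_diag] ** 2)
    p2 = np.sum(P[off_diag] ** 2)
    return r2 / (r2 + p2)

def kmo_label(value):
    """Map a KMO value to the verbal interpretation scale."""
    for upper, label in [(0.5, "unacceptable"), (0.6, "miserable"),
                         (0.7, "mediocre"), (0.8, "middling"), (0.9, "meritorious")]:
        if value < upper:
            return label
    return "marvelous"

# Usage on simulated data with two hypothetical factors.
rng = np.random.default_rng(2)
F = rng.standard_normal((1000, 2))
loadings = np.kron(np.eye(2), np.full((3, 1), 0.8))
X = F @ loadings.T + 0.6 * rng.standard_normal((1000, 6))

value = kmo(X)
print(round(value, 2), kmo_label(value))
```

When the partial correlations are small compared with the raw correlations, the variables share common factors and the index moves towards 1.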


Bartlett Test for Sphericity

This is another test for the presence of correlation among variables. It tests the null hypothesis that the correlation matrix is an identity matrix, that is, that the variables are uncorrelated. A small p-value, say below the 0.05 significance level, indicates that at least some of the variables are significantly correlated, so factor analysis may be suitable for the data.
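A minimal sketch of the test follows, assuming the standard form of Bartlett's statistic: χ² = −[(n − 1) − (2p + 5)/6] · ln|R|, with p(p − 1)/2 degrees of freedom, where n is the number of cases, p the number of variables and |R| the determinant of the correlation matrix. To avoid a SciPy dependency, the hypothetical helper `chi2_sf` computes the chi-square upper-tail probability for integer degrees of freedom; if SciPy is available, `scipy.stats.chi2.sf(statistic, df)` gives the same value. The simulated data is again an assumption for illustration.

```python
import math
import numpy as np

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1, y) = e**-y (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df),
    # then step up with Q(a + 1, y) = Q(a, y) + y**a * e**-y / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def bartlett_sphericity(X):
    """Bartlett's test that the correlation matrix of X (n x p) is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)             # ln|R|, computed stably
    statistic = -((n - 1) - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return statistic, df, chi2_sf(statistic, df)

# Usage on simulated data with two hypothetical factors.
rng = np.random.default_rng(3)
F = rng.standard_normal((300, 2))
loadings = np.kron(np.eye(2), np.full((3, 1), 0.8))
X = F @ loadings.T + 0.6 * rng.standard_normal((300, 6))

statistic, df, p_value = bartlett_sphericity(X)
print(round(statistic, 1), df, p_value)
```

Because strongly correlated variables push |R| towards 0, ln|R| becomes large and negative, the statistic grows, and the p-value falls below 0.05.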



Kindson Munonye is currently completing his doctoral program in Software Engineering at the Budapest University of Technology and Economics.
