Why Bother Explaining a Black-Box Model?

November 3, 2021


In recent years, the availability of large volumes of data and computational resources, combined with advances in optimization techniques, has enabled deep learning models to succeed in numerous representation learning and decision-making problems across various domains. In some cases, these models have even exceeded human-level accuracy, demonstrating that artificial intelligence can perform some tasks at a level comparable to human domain experts. To name a few examples, consider Microsoft’s recent achievement in natural language processing, building the world’s largest and most powerful generative language model to date with 530 billion parameters, or DeepMind’s computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known.

However, there is no free lunch, and the success of deep learning models comes at several costs. Not only do these models require an enormous amount of high-quality input data and extensive computational resources to train, but they are also highly complex, with millions or even billions of trainable parameters. This complexity turns deep learning models into black boxes that are hard to analyze. In other words, the lack of transparency in their mechanics makes it difficult to understand the chain of reasoning that leads to a particular decision, prediction, or action. Consequently, it is hard to trust these models and to apply them reliably in sensitive domains where understanding the problem’s context is necessary, such as healthcare and transportation. For example, it has been shown that deep learning models are fragile under small targeted perturbations of their inputs: the perturbed inputs, called "adversarial examples", can drastically decrease their performance.

These weaknesses of black-box models have given rise to a field in the machine learning community called "explainable artificial intelligence". Its goal is to produce more explainable models while maintaining their high performance.

Why do we need explainability?

The term explainability is often used interchangeably with the term interpretability. Still, there is no mathematically rigorous definition of explainability in the machine learning community, since it depends on the context of the problem and on the audience to which the explanations are provided. However, there are several requirements that machine learning explainability has to address, namely trustworthiness, causality, reliability, fairness, and privacy.

  • From a psychological perspective, explanations are the currency in which humans exchange beliefs. That is to say, explanations are answers to “why” questions, and the answers are accepted only if they make sense. This point of view suggests that trustworthiness is necessary for machine learning models, or equivalently, that it is crucial to know “how often a model is right” and “for which examples it is right”. This aspect of explainability is aligned with the social right to explanation, which refers to an individual’s right to be given a reason for decisions that significantly affect them, particularly legally or financially.
  • Explainability is also important when you want to use a machine learning model to generate hypotheses about causal relationships between variables. In that case, it is desirable for the model to pick up causal relationships rather than mere associations and correlations, and explanations help to check whether it does.
  • Reliability is another requirement for machine learning models. Machine learning systems should be robust to noisy inputs and to shifts in the input data domain, yet the behavior of a black-box model is hard to predict under these circumstances.
  • Moreover, machine learning models must be fair when applied in decision-making settings such as social, medical, or economic environments. To be more precise, the outputs of a machine learning model must not be affected by the biases in the training datasets (such as possible demographic and racial biases).
  • Finally, machine learning models must preserve the privacy of sensitive personal data, and transparent mechanics are essential for verifying how these models use such data.

Who benefits from explainability?

Producing explanations for machine learning models that are deployed in industrial settings depends on the specific audience of those models. Various stakeholders are demanding explanations of machine learning models, and so far the machine learning community seems to be falling short of their expectations. These stakeholders include:

  • End-users of the machine learning model, who consume its output directly and require explanations to trust that output. Helping end-users build trust in the way these models make decisions leads to a better user experience.
  • Executives and decision-makers of an enterprise who use the results of such models to develop the business strategy of the enterprise.
  • Data scientists and machine learning engineers who design and implement these models and must fully understand the mechanics of such models.
  • Domain experts who are often asked to audit the performance of these models.
  • And finally, regulators who may demand that these models satisfy certain criteria prior to applying them in real-world environments.

According to a study of approximately fifty organizations, most enterprises that have deployed explanatory techniques use the explanations to guide data scientists and machine learning engineers in designing models rather than presenting them to end-users. This indicates that much work remains for the machine learning community before explainable artificial intelligence is achieved.

Explainability through the lens of data scientists

From a technical perspective, the explainability of machine learning models supports the process of designing them. Data scientists and machine learning engineers can take advantage of an explainable model in several use cases. One of the major applications of explainability is model debugging. A data scientist needs to understand the behavior of a model, particularly on the specific inputs that result in low performance. An explainable model helps data scientists understand the relationships between features and measure the contribution of each feature to the resulting output. Explainability also guides data scientists through the feature engineering process, which leads to better model performance.

Another benefit of explainability is the ability to monitor the model’s performance after it is deployed. Since the distribution of the model’s input may change over time in real-world environments, it is important to understand how the model responds when the distributions of its input features drift and to anticipate when the system will fail. Furthermore, an explainable model makes it possible for data scientists to present their model’s behavior to other teams in the organization and to collaborate with them to audit the model and improve its performance.
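
To make the monitoring use case more concrete, here is a minimal sketch of one way to quantify drift in a single input feature. It uses the population stability index (PSI), which is our choice for illustration and not a method prescribed above. The function name, the equal-width binning, and the synthetic Gaussian data are all assumptions.

```python
import math
import random


def population_stability_index(reference, current, n_bins=10):
    """PSI between two samples of one numeric feature, with equal-width bins.

    Assumes the reference sample is not constant (otherwise the bin width is 0).
    """
    low, high = min(reference), max(reference)
    width = (high - low) / n_bins

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            index = int((v - low) / width)
            counts[min(max(index, 0), n_bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
training = [rng.gauss(0.0, 1.0) for _ in range(1000)]   # feature at training time
same_dist = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # production, no drift
shifted = [rng.gauss(1.0, 1.0) for _ in range(1000)]    # production, mean has drifted

psi_same = population_stability_index(training, same_dist)
psi_shifted = population_stability_index(training, shifted)
print(f"PSI without drift: {psi_same:.3f}")
print(f"PSI with drift:    {psi_shifted:.3f}")
```

A larger PSI means the production sample looks less like the training sample. In practice one would compute such a score per feature on a schedule and investigate the model's behavior when it rises.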

Overview of methods

So far, we have discussed the explainability of a machine learning model and its motivations. Now, let us take a high-level look at the techniques for achieving it. It is worth mentioning that not all machine learning models have opaque mechanisms. Some basic models are intrinsically transparent, such as logistic/linear regression, k-nearest neighbors, decision trees, Bayesian models, rule-based learners, and generalized additive models. Although these models can explain their own behavior, they might not match the performance of more complex models such as artificial neural networks on many machine learning tasks. Complex models such as neural networks, on the other hand, have a black-box nature and require explanations. In general, there is a tradeoff between the performance of a machine learning model and its explainability. Nevertheless, there are settings where the explainability of the model is as important as its performance, so the choice of model depends on the context of the problem.
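
To illustrate what "intrinsically transparent" means, the sketch below decomposes a logistic regression prediction into per-feature contributions. The weights, bias, feature names, and input values are hypothetical and chosen only for illustration. In a real setting the weights would come from a trained model.

```python
import math

# Hypothetical weights of an already-trained logistic regression model.
# The feature names and values are illustrative, not taken from a real dataset.
WEIGHTS = {"income": 0.8, "debt": -1.2, "age": 0.1}
BIAS = -0.3


def predict_proba(x):
    """Probability of the positive class for one input (a dict of features)."""
    logit = BIAS + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-logit))


def contributions(x):
    """Per-feature contribution to the logit: weight times feature value."""
    return {name: WEIGHTS[name] * x[name] for name in WEIGHTS}


applicant = {"income": 1.5, "debt": 2.0, "age": 0.5}
contrib = contributions(applicant)
probability = predict_proba(applicant)

# Print features from the most to the least influential for this input.
for name, value in sorted(contrib.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>6}: {value:+.2f}")
print(f"probability: {probability:.3f}")
```

Because the logit is a plain sum, each term can be read directly as "how much this feature pushed the prediction up or down". A deep neural network offers no such direct reading, which is why it needs the post-hoc techniques discussed next.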

Complex machine learning models may be explained by post-hoc approaches, which try to extract useful information about the mechanics of a model after it is trained. The machine learning literature offers various taxonomies of explainability methods, depending on the point of view. From the most general perspective, one can classify explainability methods into model-agnostic and model-specific methods. As their names suggest, model-agnostic methods can be applied to any machine learning model, whereas model-specific methods are tailored to specific models. Each of these methods can have local or global variants. Local explainability techniques aim to explain model behavior for a specific input sample, while global explainability techniques attempt to understand the high-level concepts and reasoning used by a model. Local explainability techniques are the most widely used methods in organizations.
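
As a rough illustration of a model-agnostic, global technique, the sketch below computes permutation feature importance: it shuffles one feature column at a time and measures how much the model's error grows. The technique is our choice of example and is not discussed above. The `black_box` function is a stand-in for an opaque trained model, and the data is synthetic.

```python
import random


def black_box(x):
    """Stand-in for an opaque model: only features 0 and 1 affect the output."""
    return 3.0 * x[0] - 2.0 * x[1] + 0.0 * x[2]


def mean_squared_error(model, rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_features, rng):
    """Increase in error when one feature column is shuffled across rows."""
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild every row with feature j replaced by its shuffled value.
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(model, shuffled, targets) - baseline)
    return importances


rng = random.Random(0)  # fixed seed for reproducibility
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
targets = [black_box(r) for r in rows]  # synthetic labels, for illustration only

importances = permutation_importance(black_box, rows, targets, 3, rng)
for j, value in enumerate(importances):
    print(f"feature {j}: importance {value:.3f}")
```

The procedure only ever calls `model(row)`, so it works for any model that can make predictions, which is what makes it model-agnostic. It is global because the score summarizes behavior over the whole dataset rather than for one input.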

The most common local explainability techniques are attribution methods. Attribution methods measure each input feature’s contribution to the model’s output, typically by using the gradient of the output with respect to the inputs or by perturbing the inputs and observing how the output changes. Two well-known attribution methods are LIME (Local Interpretable Model-Agnostic Explanations) and SHAP (SHapley Additive exPlanations). We will elaborate on these techniques in our next blog posts. Other examples of local explainability techniques are counterfactual explanations, which try to find a data point close to the input for which the decision of the classifier changes, and influential samples, which try to find the training data points that most influence the model’s output for a specific test sample. For a thorough review of explanation techniques, you can see this survey.
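
The sketch below illustrates two of these local ideas on a toy classifier: a gradient-times-input attribution, with the gradient estimated by finite differences, and a very simple counterfactual search that changes one feature at a time. It is not LIME or SHAP. The `classifier_score` function, the step sizes, and the input are assumptions made for illustration.

```python
import math


def classifier_score(x):
    """Stand-in black-box score in (0, 1); the decision is positive above 0.5."""
    z = 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[1] - 0.5
    return 1.0 / (1.0 + math.exp(-z))


def gradient_times_input(model, x, eps=1e-5):
    """Attribution per feature: finite-difference gradient times input value."""
    attributions = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad = (model(up) - model(down)) / (2 * eps)
        attributions.append(grad * x[i])
    return attributions


def single_feature_counterfactual(model, x, step=0.05, max_steps=100):
    """Smallest change to one feature (on a grid) that flips the decision."""
    original = model(x) > 0.5
    for k in range(1, max_steps + 1):  # try small changes first
        for i in range(len(x)):
            for sign in (+1, -1):
                candidate = x[:i] + [x[i] + sign * k * step] + x[i + 1:]
                if (model(candidate) > 0.5) != original:
                    return candidate
    return None  # no single-feature change within the search range flips it


sample = [1.0, 0.5]
attributions = gradient_times_input(classifier_score, sample)
counterfactual = single_feature_counterfactual(classifier_score, sample)
print("score:", round(classifier_score(sample), 3))
print("attributions:", [round(a, 3) for a in attributions])
print("counterfactual:", counterfactual)
```

The attribution answers "which features pushed this particular prediction, and in which direction", while the counterfactual answers "what is the smallest change that would have led to a different decision". Both explanations are local: they describe the model's behavior around this one sample only.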


In this blog post, we discussed the explainability of machine learning models, its motivations and requirements, and an overview of its methods. Explainable artificial intelligence is an active area of research in both industry and academia, and a whole spectrum of methods tries to achieve it. In future posts, we will take a more hands-on look at these methods and see their explanations in action.
