
Testing Model Assumptions: Residual Analysis

Assumptions Concerning the Error Term $\varepsilon$ [1]

In this section, we focus on the error term $\varepsilon$. The distribution of the regression coefficients depends on the distribution of $\varepsilon$, which hence determines how reliable our model predictions are. We assume that

$$Y = f(X) + \varepsilon$$

where the function $f(X)$ is assumed to be known. In practice, this is never the case. To focus on the assumptions concerning the error term $\varepsilon$, we will consider rather simple functions $f(x)$.

What is the meaning of $\varepsilon$?

If we measure $Y$ at position $x_i$, then the following equation

$$y_i = f(x_i) + \varepsilon_i$$

holds, where $\varepsilon_i$ represents the error of the function $f$ at position $x_i$, or, more concisely, the deviation of the measured value $y_i$ from the predicted value $f(x_i)$. If we repeat the measurement several times at the same position $x_i$, we will observe a different value of $y_i$ for every measurement due to the error term $\varepsilon_i$.
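
The following minimal sketch illustrates this point. The function $f$ and the noise level are chosen here for illustration only (they match the simulated data used further below) and are not part of the original text.

import numpy as np

rng = np.random.default_rng(0)

# Assumed for illustration: f(x) = -x^2 + 4 and error standard deviation 2
f = lambda x: -x**2 + 4
x_i = 1.0

# Five repeated measurements at the same position x_i:
# the values differ only through the error terms eps_i
eps = rng.normal(0, 2, size=5)
y_i = f(x_i) + eps
print(y_i)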

$R^2$ Statistic [2]

The RSE provides an absolute measure of lack of fit of the linear regression model to the data. But since it is measured in the units of $Y$, it is not always clear what constitutes a good RSE. The $R^2$ statistic provides an alternative measure of fit. It takes the form of a proportion (the proportion of variance explained) and so it always takes on a value between 0 and 1, and is independent of the scale of $Y$.

To calculate $R^2$, we use the formula

$$R^2 = \frac{\text{TSS} - \text{RSS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}$$

where $\text{TSS}=\sum_{i=1}^n (y_i-\bar{y})^2$ is the total sum of squares and $\text{RSS}=\sum_{i=1}^n (y_i-\hat{y}_i)^2$ is the residual sum of squares. TSS measures the total variance in the response $Y$, and can be thought of as the amount of variability inherent in the response before the regression is performed. In contrast, RSS measures the amount of variability that is left unexplained after performing the regression.

If the model fits the data perfectly, then we have $\hat{y}_i = y_i$ for $i = 1, \ldots, n$. In this case, the RSS becomes 0, and hence $R^2 = 1$. As a consequence, an $R^2$ value of approximately 1 means that a large part of the variance in the data is explained by the model. Conversely, an $R^2$ value near 0 indicates that little of the variance in the data is explained by the model. This might occur because the linear model is wrong, or the inherent error $\sigma^2$ is high, or both.

Compute $R^2$ with statsmodels [3]

We find the $R^2$-value for Example 6.1.2.

Source
import numpy as np
import statsmodels.api as sm

# Set random seed
np.random.seed(0)
# Create random data along y = -x^2 + 4
x = np.arange(-4, 2.4, .4)
y = -(x * x) + 4 + np.random.normal(0, 2, len(x))

# Fit Linear Model
x_sm = sm.add_constant(x)
model = sm.OLS(y,x_sm).fit()

# Print Model Summary
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.516
Model:                            OLS   Adj. R-squared:                  0.481
Method:                 Least Squares   F-statistic:                     14.91
Date:                Mon, 03 Nov 2025   Prob (F-statistic):            0.00173
Time:                        16:42:42   Log-Likelihood:                -40.363
No. Observations:                  16   AIC:                             84.73
Df Residuals:                      14   BIC:                             86.27
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.6174      0.917      2.855      0.013       0.651       4.584
x1             1.6876      0.437      3.861      0.002       0.750       2.625
==============================================================================
Omnibus:                        3.331   Durbin-Watson:                   0.690
Prob(Omnibus):                  0.189   Jarque-Bera (JB):                1.592
Skew:                          -0.447   Prob(JB):                        0.451
Kurtosis:                       1.740   Cond. No.                         2.53
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The $R^2$ value is 0.516.
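
As a cross-check (a minimal sketch reusing y and model from the code block above), the same value follows directly from the definitions of RSS and TSS:

# R^2 = 1 - RSS/TSS, computed by hand from the fitted model above
rss = np.sum(model.resid ** 2)        # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)     # total sum of squares
print(1 - rss / tss)                  # approx. 0.516, equal to model.rsquared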

$E[\varepsilon_i] = 0$ [4]

The linear model assumes that there is a straight-line relationship between the predictor and the response. If the true relationship is far from linear, then virtually all of the conclusions that we draw from the fit are suspect. In addition, the prediction accuracy of the model can be significantly reduced.

Residual plots are a useful graphical tool for identifying non-linearity of the regression function $f$. Given a simple linear regression model, we can plot the residuals $r_i = y_i - \hat{y}_i$ versus the fitted or predicted values $\hat{y}_i$. We thus plot the points $(\hat{y}_i, r_i)$ for $i = 1, \ldots, n$. The resulting plot is called a Tukey–Anscombe plot, after its inventors.

Tukey–Anscombe Plots [5]

The left-hand panel of the Figure below displays the scatter plot for the Advertising data set. The right-hand panel displays the corresponding Tukey–Anscombe plot.

Source
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

# Load data
df = pd.read_csv('data/Advertising.csv')
x = df['TV']
y = df['sales']

""" Find Predictions and Residuals """
# Fit Linear Model
x_sm = sm.add_constant(x)
model = sm.OLS(y, x_sm).fit()
# Find the predicted values for the original design.
yfit = model.fittedvalues
# Predict values based on the fit.
ypred = model.predict(x_sm)
# Residuals of the model
res = model.resid

""" Plots """
# Create figure and subplots
fig = plt.figure(figsize=(12, 5))
ax1 = fig.add_subplot(121)
# Plot data and regression
sns.regplot(x=x, y=y, 
            scatter=True, ci=False, lowess=False, 
            scatter_kws={'s': 40, 'alpha': 0.5},
            line_kws={'color': 'red', 'lw': 1, 'alpha': 0.8})
# Set Labels
ax1.set_xlabel('TV')
ax1.set_ylabel('Sales')

# Plot Residuals
ax2 = fig.add_subplot(122)
df = pd.concat([yfit, res], axis=1)
sns.residplot(x=yfit, y=res, data=df, 
              lowess=False, scatter_kws={'alpha': 0.5})
ax2.set_xlabel('Fitted Sales')
ax2.set_ylabel('Residuals')

# Show plots
plt.show()
<Figure size 1200x500 with 2 Axes>

We observe that the regression line on the left-hand panel corresponds to the horizontal axis on the right-hand panel of the Figure. We can now easily read off the values of the residuals for every predicted value $\hat{y}_i$.

The linear model fits the data well if the points in the Tukey–Anscombe plot scatter evenly around the $r = 0$ line. “Evenly distributed” means that within a small neighbourhood of the $r = 0$ line there are approximately as many points above the line as below it, and these points should ideally lie at comparable distances from the $r = 0$ line. In other words, in such a neighbourhood, for the average of the residuals

$$\bar{r} \approx 0$$

should hold. Since we are estimating the error terms by means of the residuals, this relation is ultimately equivalent to the model assumption $E[\varepsilon_i] = 0$.

Smoothing Approach [6]

How can we decide on the basis of the Tukey–Anscombe plot whether the model assumption $E[\varepsilon_i] = 0$ is fulfilled? To visualize the relation between the residuals $r_i$ and the predicted response values $\hat{y}_i$, we will take advantage of the smoothing approach.

In the following, we roughly illustrate the principle of the smoothing approach. A simple yet intuitive smoother is the running mean: we take a window of fixed width on the $\hat{y}$-axis and compute the mean of the $r$-values of all observations that fall into this window. This value is then the estimate of the function value at the window center. The left-hand panel of the Figure below displays two thin stripes: the average of all residuals falling into such a stripe is calculated and plotted as a red point at the center of the corresponding stripe. If the stripe slides along the $\hat{y}$-axis, connecting the red points for every window position results in the red curve displayed in the right-hand panel of the Figure below.

Source
# This example builds on Example 2.4
import numpy as np

""" Plots """
# Create Figure and subplots
fig = plt.figure(figsize=(12, 5))
ax1 = fig.add_subplot(121)

# Visualize Moving average:
# Plot Residuals
sns.residplot(x=yfit, y=res, data=df, 
              lowess=False, scatter_kws={'alpha': 0.5})
# Plot vertical lines and average to visualize moving average
bandwidth = 0.5

for x_avg in [9.5, 17.5]:
    # Find indices where ypred lies within the window
    index_res = np.where((ypred > (x_avg - bandwidth)) &
                         (ypred < (x_avg + bandwidth)))
    # Mean of the residuals inside the window
    res_mean = res[index_res[0]].mean()
    # Plot lines
    plt.plot((x_avg-bandwidth, x_avg - bandwidth ), 
             [-10, 10], color='darkgrey')
    plt.plot((x_avg+bandwidth, x_avg + bandwidth ), 
             [-10, 10], color='darkgrey')
    # Plot average
    plt.plot(x_avg, res_mean, color='red', marker='o', markersize=8)

# Set labels
ax1.set_xlabel('Fitted Sales')
ax1.set_ylabel('Residuals')

# Find moving average over full domain
# create x vector
x_avg = np.linspace(ypred.min() + bandwidth, 
                    ypred.max() - bandwidth, 100)
res_mean = []
for x_ in x_avg:
    # Find indices where ypred lies within the window
    index_res = np.where((ypred > (x_ - bandwidth)) &
                         (ypred < (x_ + bandwidth)))
    # Mean of the residuals inside the window
    res_mean.append(res[index_res[0]].mean())

# Second subplot:
ax2 = fig.add_subplot(122)
# Plot Residuals
sns.residplot(x=yfit, y=res, data=df, 
              lowess=False, scatter_kws={'alpha': 0.5})
# Plot moving average:
plt.plot(x_avg, res_mean, 'r-', linewidth=5)
# Set labels
ax2.set_xlabel('Fitted Sales')
ax2.set_ylabel('Residuals')

# Show plot
plt.show()
<Figure size 1200x500 with 2 Axes>

LOESS smoother [7]

For the Income data, we generate the Tukey–Anscombe plot along with the LOESS smoothing curve (see the Figure below).

Source
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
from matplotlib import pyplot as plt
from TMA_def import plot_residuals, plot_reg

# Read data
df = pd.read_csv('data/Income.csv')
x = df['education']
y = df['income']

# Fit Linear Model
x_sm = sm.add_constant(x)
model = sm.OLS(y, x_sm).fit()
# Find the predicted values for the original design.
yfit = model.fittedvalues
# Find the Residuals
res = model.resid

# Create Figure and subplots
fig = plt.figure(figsize=(12, 5))
ax1 = fig.add_subplot(121)
# Plot data using definition from TMA_def
plot_reg(ax1, x, y, model, x_lab="Years of Education",
         y_lab="Income", title="Linear Regression")

# Second Subplot
ax2 = fig.add_subplot(122)
# Plot Residuals using definition from TMA_def
plot_residuals(ax2, yfit, res, n_samp=0, x_lab="Fitted Income")

# Show plots
plt.show()
<Figure size 1200x500 with 2 Axes>

The red smoothing curve no longer passes close to the dashed $r = 0$ line. We are now confronted with the question whether the observed deviation of the smoothing curve from the $r = 0$ line is plausible under an underlying linear model. In other words, did the observed deviation simply occur due to random variation, or is there a systematic deviation from a linear model?

If we repeated the measurements, we would observe a different distribution of the data points and the smoothing curve would follow a different path. If the new smoothing curve then passed close to the $r = 0$ line, we would have no reason to question the linearity assumption.

But how can we decide whether a smoothing curve deviates systematically from the $r = 0$ line, or whether this is just due to random variation? Generally, this is an expert call based on the magnitude of the deviation and the number of data points involved. An elegant way out of these (sometimes difficult) considerations is given by a resampling approach.

The principal idea of our resampling approach is to simulate data points on the basis of the existing data set. For the simulated data points, we fit a smoothing curve and add it to the Tukey–Anscombe plot.

Source
import random

random.seed(0)
n_samp = 100   # Number of resamples

# Create Figure and subplots
fig = plt.figure(figsize=(6,5))
ax1 = fig.add_subplot(111)

# For every random resampling
for i in range(n_samp):
    # 1. Randomly permute the residuals (sample without replacement)
    samp_res_id = random.sample(list(res), len(res))
    # 2. Fit a LOWESS smoothing curve to the permuted residuals
    sns.regplot(x=yfit, y=samp_res_id, 
                scatter=False, ci=False, lowess=True, 
                line_kws={'color': 'lightgrey', 'lw': 1, 'alpha': 0.8})
    # 3. Repeat again for n_samples 

# Plot original smoothing curve
sns.residplot(x=yfit, y=res, data=df, 
              lowess=True, scatter_kws={'alpha': 0.5},
              line_kws={'color': 'red', 'lw': 1, 'alpha': 0.8})

plt.xlabel("Fitted Income")
plt.ylabel("Residuals")

# Show plots
plt.show()
<Figure size 600x500 with 1 Axes>

These simulated smoothing curves illustrate the magnitude that a random deviation from the $r = 0$ line can take on. They help us assess the smoothing curve constructed on the basis of the original residuals.

If the underlying regression function $f$ is linear, then the (grey) band resulting from the simulated smoothing curves should follow the $r = 0$ line and contain the original (red) smoothing curve fitted on the basis of the original data set.

The red smoothing curve displayed in the Figure above lies within the (grey) band of simulated smoothing curves. We therefore conclude that the wiggly shape of the (red) smoothing curve is not critical. We thus may assume a linear underlying regression function $f$.

$\mathrm{Var}[\varepsilon_i] = \sigma^2$

Scale Location Plot [8]

The Figure below displays the scale location plot for the Advertising data.

Source
import pandas as pd
import numpy as np
import seaborn as sns
import statsmodels.api as sm
from matplotlib import pyplot as plt

# Load data
df = pd.read_csv('data/Advertising.csv')
x = df['TV']
y = df['sales']

# Fit Linear Model
x_sm = sm.add_constant(x)
model = sm.OLS(y, x_sm).fit()
# Find the predicted values for the original design.
yfit = model.fittedvalues
# Residuals of the model
res = model.resid
# Influence of the Residuals
res_inf = model.get_influence()
# Studentized residuals using variance from OLS
res_standard = res_inf.resid_studentized_internal
# Absolute square root Residuals:
res_stand_sqrt = np.sqrt(np.abs(res_standard))

# Create Figure and subplots
fig = plt.figure(figsize=(6, 5))
ax1 = fig.add_subplot(111)

# plot Square rooted Residuals
plt.scatter(yfit, res_stand_sqrt, alpha=0.5)
# plot Regression using Seaborn
sns.regplot(x=yfit, y=res_stand_sqrt, 
            scatter=False, ci=False, lowess=True, 
            line_kws={'color': 'red', 'lw': 1, 'alpha': 0.8})
ax1.set_title('Scale-Location plot')
ax1.set_xlabel('Fitted Sales values')
ax1.set_ylabel(r'$\sqrt{|Standardized\ Residuals|}$')

# Show plot
plt.show()
<Figure size 600x500 with 1 Axes>

We observe that the magnitude of the residuals tends to increase with the fitted values, indicating a non-constant variance of the error terms $\varepsilon_i$. To test whether this deviation is due to random variation or whether it is systematic, we again run simulations by resampling the data.

To see whether a deviating smoothing curve in a scale-location plot may be due to a random effect, we again run simulations by sampling the data with replacement and fitting smoothing curves to the resampled data. The Figure below displays a band of 100 simulated smoothing curves fitted to the resampled data. Since the (red) curve does not follow a path contained within this (grey) band of curves, we conclude that there is a systematic increase of the variance.

Source
from TMA_def import plot_scale_loc

n_samp = 100   # Number of resamples

# Create Figure
fig = plt.figure(figsize=(6, 5))
ax1 = fig.add_subplot(111)

# Plot Standardized Residuals using definition from TMA_def
plot_scale_loc(ax1, yfit, res_stand_sqrt, n_samp=100,
               x_lab="Fitted Sales values")

# Show plot
plt.show()
<Figure size 600x500 with 1 Axes>

$\varepsilon_i \sim N(0, \sigma^2)$

Histogram [9]

The residuals of the Advertising data set are plotted as a histogram in the Figure below. A normal density function with mean and variance estimated on the basis of the data is shown in green.

Source
import pandas as pd
import numpy as np
import statsmodels.api as sm
from matplotlib import pyplot as plt
from scipy.stats import norm

# Load data
df = pd.read_csv('data/Advertising.csv')
x = df['TV']
y = df['sales']

# Fit Linear Model
x_sm = sm.add_constant(x)
model = sm.OLS(y, x_sm).fit()
# Find the predicted values for the original design.
yfit = model.fittedvalues
# Residuals of the model
res = model.resid

# Create Figure and subplots
fig = plt.figure(figsize=(6, 5))
ax1 = fig.add_subplot(111)

# Plot histogram
res_bins = np.arange(-10, 12, 2)
plt.hist(res, bins=res_bins, 
         facecolor="lightgrey", edgecolor="k" )
plt.xlabel('Residuals')
plt.ylabel('Frequency')
plt.xlim(-10, 10)
plt.ylim(bottom=0)

# Plot estimated Normal density function
# Second y-axis (shares the x-axis) for the density scale
ax2 = ax1.twinx()
# Estimate mean and standard deviation
mu = np.mean(res)
sigma = np.std(res)
x_pdf = np.arange(-10, 10.1, 0.1)
# Plot Normal density function
ax2.plot(x_pdf, norm.pdf(x_pdf, mu, sigma), 
         '-', color="darkcyan", alpha=1)

# Show plot
plt.show()
<Figure size 600x500 with 2 Axes>

In general, it is difficult to judge on the basis of a histogram whether the data is normally distributed. In addition, histograms are sensitive to the chosen bin width.

Q-Q plot [10]

A Q-Q plot or normal plot is another graphical tool to verify whether the data is normally distributed. In our context, we plot the quantiles of the empirical residuals versus the theoretical quantiles of a normal distribution. Instead of the raw residuals, however, we use the standardized residuals $\widetilde{r}_i$ in our Q-Q plot.

Source
import seaborn as sns
from statsmodels.graphics.gofplots import ProbPlot

# Influence of the Residuals
res_inf = model.get_influence()
# Studentized residuals using variance from OLS
res_standard = res_inf.resid_studentized_internal

# QQ plot instance
QQ = ProbPlot(res_standard)
# Split the QQ instance into the separate data series
qqx = pd.Series(sorted(QQ.theoretical_quantiles), name="x")
qqy = pd.Series(QQ.sorted_data, name="y")

# Create Figure and subplots
fig = plt.figure(figsize=(6, 5))
ax1 = fig.add_subplot(111)

sns.regplot(x=qqx, y=qqy, scatter=True, lowess=False, ci=False,
            scatter_kws={'s': 40, 'alpha': 0.5},
            line_kws={'color': 'red', 'lw': 1, 'alpha': 0.8})
ax1.plot(qqx, qqx, '--k', alpha=0.5)

# Set labels
plt.title('Normal Q-Q')
plt.xlabel('Theoretical Quantiles')
plt.ylabel('Standardized Residuals')

plt.show()
<Figure size 600x500 with 1 Axes>

If the data actually originates from a normal distribution, then the data points will scatter just slightly around the straight line in the Q-Q plot.

If we observe a deviation from the straight line in the Q-Q plot, the question arises whether such a deviation is systematic or due to random variation. To answer this question, we again run simulations. In particular, we draw 100 random samples of length $n$ (the number of residuals) from a normal distribution that shares mean and standard deviation with the standardized residuals. The Figure below displays the band of simulated (grey) curves that may be observed due to random variation in the case of normally distributed residuals. Since the points for the Advertising data set lie within this band, we consider the residuals to originate from a normal distribution.

Source
# Split the QQ instance into the separate data series
qqx = pd.Series(sorted(QQ.theoretical_quantiles), name="x")
qqy = pd.Series(QQ.sorted_data, name="y")
# Estimate the mean and standard deviation
mu = np.mean(qqy)
sigma = np.std(qqy)

# Create Figure
fig = plt.figure(figsize=(6, 5))
ax1 = fig.add_subplot(111)

n_samp = 100   # Number of resamples
# For every simulated sample
for lp in range(n_samp):
    # Draw a normal random sample with the estimated mean and sd
    samp_res_id = np.random.normal(mu, sigma, len(qqx))
    # Plot
    sns.regplot(x=qqx, y=sorted(samp_res_id),
                scatter=False, ci=False, lowess=True, 
                line_kws={'color': 'lightgrey', 'lw': 1, 'alpha': 0.4})

# Add plots for original data and the line x = y
sns.regplot(x=qqx, y=qqy, scatter=True, lowess=False, ci=False,
            scatter_kws={'s': 40, 'alpha': 0.5},
            line_kws={'color': 'red', 'lw': 1, 'alpha': 0.8})
ax1.plot(qqx, qqx, '--k', alpha=0.5)

# Set limits and labels
plt.title('Normal Q-Q')
plt.xlabel('Theoretical Quantiles')
plt.ylabel('Standardized Residuals')

plt.show()
<Figure size 600x500 with 1 Axes>

Independence of the $\varepsilon_i$ [11]

An important assumption of the linear regression model is that the error terms, $\varepsilon_1, \ldots, \varepsilon_n$, are independent. What does this mean? For instance, if the errors are independent, then the fact that $\varepsilon_i$ is positive provides little or no information about the sign of $\varepsilon_{i+1}$. The standard errors that are computed for the estimated regression coefficients or the fitted values are based on the assumption of independent error terms. If in fact there is correlation among the error terms, then the estimated standard errors will tend to underestimate the true standard errors. As a result, confidence and prediction intervals will be narrower than they should be. For example, a 95% confidence interval may in reality have a much lower probability than 0.95 of containing the true value of the parameter. In addition, p-values associated with the model will be lower than they should be; this could cause us to erroneously conclude that a parameter is statistically significant. In short, if the error terms are correlated, we may have an unwarranted sense of confidence in our model.
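
The OLS summary shown earlier already reports a Durbin–Watson statistic, which measures first-order correlation between consecutive residuals. As a rough numerical check, it can also be computed directly; this is a sketch, assuming the Advertising model fitted in the previous code blocks:

from statsmodels.stats.stattools import durbin_watson

# Values near 2 indicate little first-order autocorrelation;
# values well below 2 point towards positively correlated errors.
print(durbin_watson(model.resid))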

Outliers and High Leverage Points [12]

Outlier

An outlier is a point for which $y_i$ is far from the value $\hat{y}_i$ predicted by the model. Outliers can arise for a variety of reasons, such as incorrect recording of an observation during data collection.

High Leverage Point

We just saw that outliers are observations for which the response $y_i$ is unusual given the predictor $x_i$. In contrast, observations with high leverage have an unusual value for $x_i$. For example, observation 41 in the Figure below has high leverage, in that the predictor value for this observation is large relative to the other observations. Contrary to observation 41, observation 5 has an unusually large $y_5$-value relative to $\hat{y}_5$. Observation 5, however, is an outlier and does not have high leverage, since its predictor value is close to the other predictor values.

Leverage Statistic

In order to quantify an observation’s leverage, we compute the leverage statistic. A large value of this statistic indicates an observation with high leverage. For a simple linear regression, the value of the leverage statistic $h_i$ associated with the $i$th observation is given by

$$h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^n (x_j - \bar{x})^2}$$

It is clear from this equation that $h_i$ increases with the distance of $x_i$ from $\bar{x}$. We interpret a high leverage point as a point having a large distance from the center of gravity $\bar{x}$ and thereby having a high “momentum to turn the regression line around”.
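
In Python, the leverage values can be read off the fitted model or computed from the formula above. This is a minimal sketch, assuming the Advertising model and the variable x from the earlier code blocks:

# Leverage values from statsmodels' influence measures
h = model.get_influence().hat_matrix_diag

# The same values from the simple linear regression formula
h_manual = 1 / len(x) + (x - x.mean())**2 / np.sum((x - x.mean())**2)

print(np.allclose(h, h_manual))      # True
print(np.argsort(h)[-3:])            # observations with the largest leverage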

Cook’s Distance

By means of the leverage statistic $h_i$ we can define another statistic to measure the influence of an observation: Cook’s distance. Cook’s distance measures to what extent the predicted value $\hat{y}_i$ changes if the $i$th observation is removed:

$$d_i = \dfrac{1}{\hat{\sigma}^2} \left( \hat{y}_{(-i)} - \hat{y} \right)^{T} \left( \hat{y}_{(-i)} - \hat{y} \right)$$

where $\hat{y}_{(-i)}$ denotes the vector of predicted values if the $i$th observation is removed. Cook’s distance can be calculated efficiently by means of the leverage statistic $h_i$ and the standardized residual $\tilde{r}_i$:

$$d_i = \tilde{r}_i^2 \, \frac{h_i}{1 - h_i}$$

The larger the value of Cook’s distance $d_i$, the higher the influence of the corresponding observation on the predicted value $\hat{y}_i$. In practice, we consider a value of Cook’s distance larger than 1 as dangerously influential.
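
A minimal sketch for computing these values with statsmodels, again assuming the Advertising model from the earlier blocks; note that statsmodels' own cooks_distance additionally divides by the number of model parameters, so it differs from the definition used here by that constant factor:

infl = model.get_influence()
h = infl.hat_matrix_diag                   # leverage values h_i
r_std = infl.resid_studentized_internal    # standardized residuals

# Cook's distance as defined above
d = r_std**2 * h / (1 - h)
print(d.max())                             # compare against the rule of thumb d_i > 1

# statsmodels' convention (divided by the number of parameters)
print(infl.cooks_distance[0].max())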

Summary

We have seen four ways to analyze residuals graphically:

  1. Tukey–Anscombe-Plots

  2. Scale location plot

  3. Q-Q plot

  4. Cook’s Distance

For the Advertising data set, these residual plots are displayed in the Figure below. [13]

Source
import statsmodels.api as sm
from TMA_def import *

# Read data
df = pd.read_csv('data/Advertising.csv')
x = df['TV']
y = df['sales']

# Reformat Data
dataframe = pd.concat([x, y], axis=1)

""" Find Predictions, Residuals and Influence on the Residuals """
# Fit Linear Model
x_sm = sm.add_constant(x)
model = sm.OLS(np.log(y), x_sm).fit()

tma_plots(model, n_samp = 200)
<Figure size 1400x1800 with 4 Axes>

For the Income data set, the residual plots are displayed in the Figure below. [14]

Source
import statsmodels.api as sm
from TMA_def import *

# Read data
df = pd.read_csv('data/Income.csv')
x = df['education']
y = df['income']

""" Find Predictions, Residuals and Influence on the Residuals """
# Fit Linear Model
x_sm = sm.add_constant(x)
model = sm.OLS(np.sqrt(y), x_sm).fit()

tma_plots(model, n_samp = 200)
<Figure size 1400x1800 with 4 Axes>
Footnotes
  1. Birbaumer et al., 2025, p. 129

  2. Birbaumer et al., 2025, p. 137

  3. Birbaumer et al., 2025, Example 6.2.3

  4. Birbaumer et al., 2025, p. 140

  5. Birbaumer et al., 2025, Example 6.2.4

  6. Birbaumer et al., 2025, Example 6.2.5

  7. Birbaumer et al., 2025, Example 6.2.6

  8. Birbaumer et al., 2025, Example 6.2.9

  9. Birbaumer et al., 2025, Example 6.2.11

  10. Birbaumer et al., 2025, Example 6.2.12

  11. Birbaumer et al., 2025, p. 158

  12. Birbaumer et al., 2025, p. 160

  13. Birbaumer et al., 2025, Example 6.2.14

  14. Birbaumer et al., 2025, Example 6.2.15

References
  1. Birbaumer, M., Frick, K., Büchel, P., & van Hemert, S. (2025). Predictive Modeling - Lecture Notes.