In this article, you will learn how to perform time series analysis using the ARIMA (AutoRegressive Integrated Moving Average) method. The dataset we’ll use for this tutorial is the london-daily-temperature dataset, which you can get for free from here.
We will cover the following:
- Obtain and Prepare Your Dataset
- Perform Seasonal Decomposition
- Create and Fit the Model
- Plot the Model Performance
- Examine the Model Metrics
- Further Topics in Time Series
1. Obtain and Prepare Your Dataset
The dataset for this analysis is the london-daily-temperature dataset, which contains daily temperature records from 1979 to 2023. We read it into a pandas DataFrame:
import pandas as pd

#1. Load the dataset (DATE is read as text so it can be parsed with an explicit format)
data = pd.read_csv('london_daily_temperature.csv', dtype={'DATE': str})

#2. Extract just the DATE and TX columns
data = data[['DATE', 'TX']].copy()

#3. Convert the DATE column (YYYYMMDD) to datetime
data['DATE'] = pd.to_datetime(data['DATE'], format='%Y%m%d')

#4. Rename the columns to meaningful names
data.rename(columns={'DATE': 'Date', 'TX': 'Temperature'}, inplace=True)

#5. Set the index of the dataframe
data.set_index('Date', inplace=True)

#6. The following sections work with `train`. The train/test split is not shown
#   in this article, so here the full series is used as the training data.
train = data
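Note that in #5, we set the index of the dataset to the Date column instead of leaving the default integer index. In a time series, each observation is identified by the date on which it was recorded, and each date is generally expected to appear only once. A DatetimeIndex lets pandas treat the rows as an ordered sequence in time, enables date-based slicing and plotting, and lets statsmodels label its predictions with dates.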
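To try the loading steps without downloading the file, here is a small sketch that runs them on an in-memory CSV. It assumes the same DATE/TX layout, and the temperature values are made up. It checks that the index is unique and sorted, slices by date, and uses asfreq('D') to reveal missing calendar days.

```python
import io

import pandas as pd

# Made-up rows in the DATE/TX layout used above (not real measurements)
csv_text = """DATE,TX
20230101,120
20230102,95
20230103,101
20230105,88
"""

# Same steps as #1-#5, chained
data = pd.read_csv(io.StringIO(csv_text), usecols=['DATE', 'TX'], dtype={'DATE': str})
data['DATE'] = pd.to_datetime(data['DATE'], format='%Y%m%d')
data = data.rename(columns={'DATE': 'Date', 'TX': 'Temperature'}).set_index('Date')

# A usable time index should be unique and in chronological order
print(data.index.is_unique, data.index.is_monotonic_increasing)  # True True

# Date-based slicing works directly on the DatetimeIndex (both ends inclusive)
print(data.loc['2023-01-02':'2023-01-03'])

# asfreq('D') reindexes to one row per calendar day, so gaps show up as NaN
daily = data.asfreq('D')
print(daily['Temperature'].isna().sum())  # 1 (2023-01-04 is missing)
```

Running a check like the last one on the real dataset is a cheap way to find gaps before fitting a model.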
2. Perform Seasonal Decomposition
Seasonal decomposition splits the data into components so that we can examine each one separately. The output contains four series:
- Observed – the original data series you provided
- Trend – the long-term progression of the series. It provides a view of the long-term patterns.
- Seasonality – the repeating short-term cycles. In this example, we use a period of 365, which represents a cycle that repeats every year (365 daily observations).
- Residuals – what remains after removing the trend and seasonality. It represents irregular, random fluctuations in the data. With the additive model, residual = observed - trend - seasonal.
# Decompose the time series data
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Additive decomposition with a yearly cycle (365 daily observations)
decomposition = seasonal_decompose(train['Temperature'], model='additive', period=365)

fig, axes = plt.subplots(4, 1, figsize=(12, 12))  # Create a figure and 4 subplots
decomposition.observed.plot(ax=axes[0], title='Observed')
decomposition.seasonal.plot(ax=axes[1], title='Seasonal Component')
decomposition.trend.plot(ax=axes[2], title='Trend Component')
decomposition.resid.plot(ax=axes[3], title='Residual Component')
plt.tight_layout()
plt.show()

3. Create and Fit the ARIMA Model
Now we will create the ARIMA model. ARIMA stands for AutoRegressive (AR) Integrated (I) Moving Average (MA), and it is made up of three components:
- AutoRegressive (AR): models the current value as a function of its own previous values. The number of lagged values used is the AR order, p.
- Integrated (I): indicates how many times the data is differenced to achieve stationarity. This is the differencing order, d.
- Moving Average (MA): models the relationship between the current value and past forecast errors. The number of lagged errors used is the MA order, q.
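The three components can be written as one equation using the backshift operator B, where B y_t = y_{t-1}. In the equation, φ_i are the AR coefficients, θ_j are the MA coefficients, and ε_t is white noise. This sketch omits a constant term. We assume this matches the statsmodels `ARIMA` default when d > 0. The second line spells out the order=(1, 1, 1) case used below, writing w_t for the once-differenced series.

```latex
\Big(1 - \sum_{i=1}^{p} \phi_i B^i\Big)(1 - B)^d \, y_t = \Big(1 + \sum_{j=1}^{q} \theta_j B^j\Big)\varepsilon_t

% ARIMA(1,1,1): let w_t = (1 - B) y_t = y_t - y_{t-1}; then (1 - \phi_1 B) w_t = (1 + \theta_1 B)\varepsilon_t, i.e.
w_t = \phi_1 w_{t-1} + \varepsilon_t + \theta_1 \varepsilon_{t-1}
```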
# Create and fit an ARIMA model
from statsmodels.tsa.arima.model import ARIMA

# order=(p, d, q): 1 autoregressive lag, 1 round of differencing, 1 moving-average lag
model = ARIMA(train['Temperature'], order=(1, 1, 1))
model_fit = model.fit()
4. Plot the Model Performance
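Once the model is fitted to our dataset, we can access its in-sample predictions through the fittedvalues attribute of the results object (model_fit.fittedvalues). In the code snippet below, we create two plots:
- the original temperature values
- the predicted values from the model’s fittedvalues
From the plot, we can see that the fitted values closely match the observed values, which indicates that the model fits the training data reasonably well. Keep in mind that these are one-step-ahead, in-sample predictions. Each fitted value is based on observations up to the previous day, so a close visual match is expected. It does not by itself show how well the model forecasts further ahead.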
# Create both the original and fitted plots
train['Temperature'].plot(figsize=(14, 6), title='Daily Temperature in London', label='Original')
model_fit.fittedvalues.plot(color='red', label='Fitted')  # fitted plot
plt.legend()
plt.show()

# Plot the performance for a 5-month slice of data
mask = (train.index > '2010-01-01') & (train.index <= '2010-05-28')
train['Temperature'][mask].plot(figsize=(12, 6), label='Original')
model_fit.fittedvalues[mask].plot(label='Fitted')
plt.legend()
plt.show()

5. Examine the Model Metrics
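Next, we examine the model metrics: the MSE (Mean Squared Error), the RMSE (Root Mean Squared Error), the MAE (Mean Absolute Error) and the R-squared (R²) score. The outputs give an R² of about 0.87. This means the fitted values explain roughly 87% of the variance in the training data.
The MSE of about 567 looks large, but it is measured in squared units of the data. It is easier to read through the RMSE (about 23.8) and the MAE (about 18.6), which are in the same units as the temperature column. If TX is recorded in tenths of a degree Celsius, as in ECA&D station data, an RMSE of 23.8 corresponds to roughly 2.4 °C. Outliers in the original dataset may also inflate the MSE, because squaring gives large errors extra weight.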
# Access the model metrics
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

mse = mean_squared_error(train['Temperature'], model_fit.fittedvalues)
rmse = np.sqrt(mse)
mae = mean_absolute_error(train['Temperature'], model_fit.fittedvalues)
r2 = r2_score(train['Temperature'], model_fit.fittedvalues)

print('MSE:', mse)
print('RMSE:', rmse)
print('MAE:', mae)
print('R2:', r2)
The output is given below:
MSE: 566.5997912577614
RMSE: 23.80335672248268
MAE: 18.618313872554147
R2: 0.868029565993606
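To make the metric definitions concrete, the sketch below computes MSE, RMSE, MAE and R² by hand on five made-up observation/prediction pairs. The values are assumed to be in tenths of a degree Celsius, and the results are compared with scikit-learn. The sketch then rescales both arrays to degrees Celsius. This shows that MSE shrinks by a factor of 100 and RMSE by a factor of 10, while R² is unchanged.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values, assumed to be in tenths of a degree Celsius
y_true = np.array([120.0, 95.0, 101.0, 88.0, 130.0])
y_pred = np.array([110.0, 100.0, 97.0, 95.0, 118.0])

err = y_true - y_pred
mse = np.mean(err ** 2)         # average squared error (squared units)
rmse = np.sqrt(mse)             # back in the data's own units
mae = np.mean(np.abs(err))      # average absolute error
r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)  # share of variance explained

# The hand-written formulas agree with scikit-learn
assert np.isclose(mse, mean_squared_error(y_true, y_pred))
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
print(round(mse, 2), round(rmse, 2), round(mae, 2), round(r2, 3))  # 66.8 8.17 7.6 0.73

# Rescale to degrees Celsius: MSE shrinks 100x, RMSE 10x, R2 stays the same
mse_c = mean_squared_error(y_true / 10, y_pred / 10)
rmse_c = np.sqrt(mse_c)
r2_c = r2_score(y_true / 10, y_pred / 10)
print(round(mse_c, 3), round(rmse_c, 3), round(r2_c, 3))  # 0.668 0.817 0.73
```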
6. Further Topics in Time Series
Having covered the basics of time series analysis, we will take a deeper dive in subsequent articles, covering the following topics:
- Stationarity and Differencing
- Test for Stationarity – Augmented Dickey-Fuller test (ADF)
- Autocorrelation and Partial Autocorrelation
- Interpreting Autocorrelation Plots
- Seasonal ARIMA
- Prophet for Business Forecasting