Gas concentration forecasting: ARIMA-LSTM hybrid model

Mine gas concentration series often mix linear trends with nonlinear fluctuations, so ARIMA or LSTM used alone can miss part of the structure. Li et al. (2023, Processes) propose an ARIMA-LSTM hybrid: ARIMA produces the linear forecast and its residuals, an LSTM models the nonlinearity left in those residuals, and the two outputs are summed to give the final prediction. Below is a structured reading note.

1. Problem background

1.1 Limits of single models

| Model | Strength | Weakness on gas series |
|-------|----------|------------------------|
| ARIMA | Linear dynamics, interpretable | Poor on nonlinear patterns |
| LSTM | Nonlinear, long dependencies | May underuse linear structure |

Prior work (SVM, ant colony, random forest, standalone LSTM) improves accuracy but rarely decomposes linear vs. nonlinear components explicitly.

1.2 Hybrid idea

Decompose the series as y = linear + nonlinear:

ARIMA → linear forecast l and residual δ = y − l
LSTM → nonlinear forecast nl from δ
Combined → ŷ = l + nl
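
Written with an explicit time index, the decomposition above might look like the following. The notation is an assumption of this note rather than the paper's: L_t and N_t stand for the linear and nonlinear parts (l and nl above), f is the mapping learned by the LSTM, and p is a hypothetical lookback window whose value is not recorded here.

```latex
\begin{aligned}
y_t            &= L_t + N_t \\
\delta_t       &= y_t - \hat{L}_t \\
\hat{N}_t      &= f(\delta_{t-1}, \delta_{t-2}, \dots, \delta_{t-p}) \\
\hat{y}_t      &= \hat{L}_t + \hat{N}_t
\end{aligned}
```

The second line is the point of the design: the LSTM never sees the raw series, only what ARIMA failed to explain.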

2. Data and preprocessing

Site: corner of 803 working face, one mine (China)
Period: 1–6 March 2021; 30 min sampling → 288 points
Split: train 1–4 Mar (192 points); test 5–6 Mar (96 points)
Cleaning: 3σ (Laida) rule for gross errors; adjacent-mean imputation for outliers
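
A minimal sketch of the cleaning and split steps listed above, in plain Python. The function names are hypothetical, and the handling of edge cases (neighbouring outliers, an outlier at the start or end of the series) is an assumption, since this note does not record how the authors treat them.

```python
from statistics import mean, pstdev


def three_sigma_clean(values, k=3.0):
    """Flag points more than k standard deviations from the mean (3-sigma /
    Laida rule) and replace each with the mean of its nearest unflagged
    neighbours. Assumption: a flagged point with only one usable neighbour
    takes that neighbour's value."""
    mu = mean(values)
    sigma = pstdev(values)
    flagged = [abs(v - mu) > k * sigma for v in values]
    cleaned = list(values)
    for i, bad in enumerate(flagged):
        if not bad:
            continue
        # Search outward for the closest points that were not flagged.
        left = next((values[j] for j in range(i - 1, -1, -1) if not flagged[j]), None)
        right = next((values[j] for j in range(i + 1, len(values)) if not flagged[j]), None)
        neighbours = [v for v in (left, right) if v is not None]
        if neighbours:
            cleaned[i] = mean(neighbours)
    return cleaned


def chronological_split(values, n_train):
    """Split without shuffling: the first n_train points train, the rest test."""
    return values[:n_train], values[n_train:]


# Synthetic readings with one gross error, for illustration only.
readings = [0.50, 0.52] * 10
readings[7] = 5.0
cleaned = three_sigma_clean(readings)

# 6 days x 48 half-hour samples = 288 points; 4 days of training = 192.
train, test = chronological_split(list(range(288)), 192)
```

Note that the 3σ rule uses the mean and standard deviation of the contaminated series, so a very large spike inflates σ and can mask smaller outliers; a second pass or a robust scale estimate may be worth considering on real data.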

3. Method pipeline

Raw series → 3σ clean → ADF + differencing → ARIMA(3,1,0) → residuals → LSTM → sum → forecast

3.1 ARIMA order selection

ADF: original series non-stationary → first-order differencing
ACF/PACF + BIC → optimal ARIMA (3, 1, 0)
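Ljung–Box on residuals: p ≈ 0.55, so no significant linear autocorrelation remains; the Q–Q plot is cited as further support for white-noise residuals (strictly, a Q–Q plot checks the residual distribution, not autocorrelation). Ljung–Box only tests linear dependence, so nonlinear structure can still be present for the LSTM to learn.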
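
To make two of the diagnostics above concrete, here is a small stdlib-only sketch of first-order differencing and the Ljung–Box Q statistic, Q = n(n+2) Σ ρ_k² / (n − k). It is an illustration, not the authors' code. In practice a library such as statsmodels (`adfuller`, `ARIMA`, `acorr_ljungbox`) would handle the ADF test, the model fit and the p-value; the p-value comes from comparing Q against a chi-square distribution, which is omitted here.

```python
def difference(series, order=1):
    """Apply first differencing `order` times (the d in ARIMA(p, d, q))."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag."""
    n = len(series)
    mu = sum(series) / n
    denom = sum((x - mu) ** 2 for x in series)
    num = sum((series[t] - mu) * (series[t - lag] - mu) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag. Larger Q means stronger
    evidence that the residuals are autocorrelated."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


# A trending toy series becomes constant after one difference.
diffed = difference([1, 3, 6, 10])

# A strongly alternating series is far from white noise, so Q is large.
alternating = [1.0, -1.0] * 10
q_stat = ljung_box_q(alternating, max_lag=1)
```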

3.2 LSTM on residuals

Fits ARIMA residuals (nonlinear component)
ReLU activation; batch_size = 1; up to 100 epochs; loss stabilizes by ~epoch 30
Input normalized to typical neural-network ranges
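
Before an LSTM can be trained on the residuals, they have to be scaled and cut into (input window, next value) pairs. The sketch below shows one common way to do that. Min-max scaling to [0, 1] and the lookback length are assumptions, since this note records neither the exact normalization nor the window size, and the LSTM itself (for example in Keras or PyTorch) is left out.

```python
def fit_min_max(train_values):
    """Learn scaling bounds from the training residuals only."""
    return min(train_values), max(train_values)


def scale(values, lo, hi):
    """Map values to [0, 1] using bounds learned elsewhere."""
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / span for v in values]


def unscale(scaled, lo, hi):
    """Invert `scale` so predictions return to residual units."""
    return [s * (hi - lo) + lo for s in scaled]


def make_windows(series, lookback):
    """Turn a series into supervised pairs: `lookback` past values as input,
    the following value as target."""
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(series[end - lookback:end])
        targets.append(series[end])
    return inputs, targets


# Toy residuals, for illustration only.
lo, hi = fit_min_max([-2.0, 0.0, 2.0])
scaled = scale([-2.0, 0.0, 2.0], lo, hi)
X, y = make_windows([1, 2, 3, 4, 5], lookback=2)
```

Fitting the bounds on training residuals and reusing them for the test residuals keeps test-set information out of the preprocessing.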

4. Results (Table 2, test set)

| Model | R² | MAPE | RMSE |
|-------|-----|------|------|
| ARIMA alone | 0.3648 | 1.4135 | 1.5769 |
| LSTM alone | 0.5244 | 0.4253 | 0.7823 |
| ARIMA-LSTM | 0.9825 | 0.0124 | 0.0830 |

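The hybrid beats both single models on every reported metric: against the better baseline (LSTM alone), R² rises from 0.5244 to 0.9825 and RMSE falls from 0.7823 to 0.0830. The authors note that the remaining error may reflect unmodeled environmental and production drivers (ventilation, mining rate, etc.).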
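
For reference, the three metrics in the table can be computed as below. These are the standard textbook definitions, not code from the paper. MAPE is returned here as a fraction; whether the table reports a fraction or a percentage is not recorded in this note.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the units of the series."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (multiply by 100 for %).
    Undefined when any actual value is zero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mu) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy values, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = {
    "r2": r_squared(y_true, y_pred),
    "mape": mape(y_true, y_pred),
    "rmse": rmse(y_true, y_pred),
}
```

R² computed this way can go negative on a test set when the model does worse than predicting the test-set mean, which is worth remembering when comparing weak baselines.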

5. Limits and reproduction notes

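Short window: six days at one corner; validate on longer, multi-face data before deployment.
Univariate: no exogenous factors yet; the authors plan multivariate extensions.
Metric context: RMSE and MAPE depend on the concentration scale and on normalization, so do not treat the reported values as universal thresholds.
High R² risk: an R² of 0.98 on 96 test points warrants checks for leakage (for example, residuals or scaling parameters derived from the full series), overfitting to a small test set, and whether the evaluation is truly out-of-time.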
Safety chain: forecasts support early warning; they do not replace methane sensors, ventilation interlocks, or statutory limits.
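
One way to act on the out-of-time check listed above is rolling-origin (walk-forward) evaluation: refit on an expanding training window and score each block of later points in turn. The sketch below only generates the index ranges. The function name and the one-day horizon are assumptions for illustration, not something the paper reports doing.

```python
def rolling_origin_splits(n_points, initial_train, horizon):
    """Yield (train_indices, test_indices) pairs in which every test block
    lies strictly after its training window. The window expands by one
    horizon after each fold."""
    start = initial_train
    while start + horizon <= n_points:
        yield range(0, start), range(start, start + horizon)
        start += horizon


# 288 half-hour points, 192 for the first fit, then one day (48 points) per fold.
folds = list(rolling_origin_splits(288, 192, 48))
```

With only six days of data this gives just two folds, which is itself a reminder of how little test evidence the short window provides.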

6. Engineering takeaways

For gas concentration series, try a linear model first, with a deep network on its residuals, before reaching for end-to-end black-box models.
Keep ARIMA diagnostics (ADF, ACF/PACF, Ljung–Box) in the pipeline for auditability.
Pair predictions with threshold/alarm workflows at the same sensor location.
Next step: add ventilation, advance rate, and extraction as exogenous inputs, similar to the RFECV-Bi-LSTM emission work in this blog series.

Reference

Li, C.; Fang, X.; Yan, Z.; Huang, Y.; Liang, M. Research on Gas Concentration Prediction Based on the ARIMA-LSTM Combination Model. Processes 2023, 11 (1), 174.