Mine gas concentration series often mix linear trends with nonlinear fluctuations, so an ARIMA or LSTM model used alone may miss part of the structure. Li et al. (2023, Processes) propose ARIMA-LSTM: ARIMA produces the linear forecast and its residuals, an LSTM models the nonlinearity left in those residuals, and the two outputs are summed for the final prediction. Below is a structured reading note.
1. Problem background
1.1 Limits of single models
| Model | Strength | Weakness on gas series |
|-------|----------|------------------------|
| ARIMA | Linear dynamics, interpretable | Poor on nonlinear patterns |
| LSTM | Nonlinear, long dependencies | May underuse linear structure |
Prior work (SVM, ant colony, random forest, standalone LSTM) improves accuracy but rarely decomposes linear vs. nonlinear components explicitly.
1.2 Hybrid idea
Decompose the series as y = linear + nonlinear:
- ARIMA → linear forecast `l` and residual `δ = y − l`
- LSTM → nonlinear forecast `nl` from `δ`
- Combined → `ŷ = l + nl`
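With time indices added, the decomposition can be written as below. This is a sketch in the note's own symbols, not the paper's notation: the lag count $p$ and the function $f$ (the trained LSTM) are introduced here for illustration.

```latex
\begin{aligned}
y_t &= \text{linear}_t + \text{nonlinear}_t \\
\delta_t &= y_t - l_t
  && \text{(ARIMA residual)} \\
\mathit{nl}_t &= f(\delta_{t-1}, \delta_{t-2}, \dots, \delta_{t-p})
  && \text{(LSTM forecast of the residual)} \\
\hat{y}_t &= l_t + \mathit{nl}_t
  && \text{(combined forecast)}
\end{aligned}
```

The final error of the hybrid is therefore $y_t - \hat{y}_t = \delta_t - \mathit{nl}_t$: it depends only on how well the LSTM predicts the ARIMA residual.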
2. Data and preprocessing
- Site: corner of 803 working face, one mine (China)
- Period: 1–6 March 2021; 30 min sampling → 288 points
- Split: train 1–4 Mar (192 points); test 5–6 Mar (96 points)
- Cleaning: 3σ (Laida) rule for gross errors; adjacent-mean imputation for outliers
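The cleaning and split steps above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 3σ rule is applied to the whole series around its global mean, reads "adjacent-mean imputation" as the mean of the nearest unflagged neighbours, and uses a synthetic series in place of the mine data.

```python
import numpy as np

def clean_three_sigma(series):
    """Flag points more than 3 standard deviations from the mean and
    replace each with the mean of its nearest unflagged neighbours."""
    x = np.asarray(series, dtype=float)
    mu, sigma = x.mean(), x.std()
    bad = np.abs(x - mu) > 3 * sigma
    cleaned = x.copy()
    good_idx = np.flatnonzero(~bad)
    for i in np.flatnonzero(bad):
        left = good_idx[good_idx < i]
        right = good_idx[good_idx > i]
        neighbours = []
        if left.size:
            neighbours.append(x[left[-1]])   # nearest good point before i
        if right.size:
            neighbours.append(x[right[0]])   # nearest good point after i
        cleaned[i] = np.mean(neighbours)
    return cleaned, bad

# Synthetic stand-in for the series: 6 days x 48 samples/day = 288 points.
rng = np.random.default_rng(0)
raw = 0.5 + 0.02 * rng.standard_normal(288)
raw[100] = 5.0  # injected gross error (hypothetical value)

cleaned, bad = clean_three_sigma(raw)

# Chronological split: first 4 days for training, last 2 days for testing.
train, test = cleaned[:192], cleaned[192:]
```

One caveat with this rule: a very large outlier inflates σ itself, so smaller errors nearby can go unflagged; running the filter a second time on the cleaned series is a common workaround.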
3. Method pipeline
Raw series → 3σ clean → ADF + differencing → ARIMA(3,1,0) → residuals → LSTM → sum → forecast
3.1 ARIMA order selection
- ADF: original series non-stationary → first-order differencing
- ACF/PACF + BIC → optimal ARIMA (3, 1, 0)
- Ljung–Box on residuals (p ≈ 0.55): the no-autocorrelation null is not rejected, consistent with white-noise residuals; the Q–Q plot checks that they are approximately normal
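To make the selected order concrete: ARIMA(3,1,0) is an AR(3) model on the first differences of the series. The sketch below fits it by conditional least squares with NumPy and computes the Ljung–Box Q statistic on the residuals. It is an illustration under stated assumptions, not the paper's procedure: there is no drift term, the data are simulated, and the helper names (`fit_arima_310`, `forecast_next`, `ljung_box_q`) are hypothetical. In practice a library fit (for example `ARIMA`, `adfuller`, and `acorr_ljungbox` in statsmodels) would replace these helpers and also supply the p-value.

```python
import numpy as np

def fit_arima_310(y):
    """Conditional least-squares fit of ARIMA(3,1,0) without drift,
    i.e. an AR(3) on the first differences of y."""
    d = np.diff(np.asarray(y, dtype=float))
    # Row for target d[t] holds the lags [d[t-1], d[t-2], d[t-3]].
    X = np.column_stack([d[2:-1], d[1:-2], d[:-3]])
    target = d[3:]
    phi, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ phi
    return phi, resid

def forecast_next(y, phi):
    """One-step-ahead level forecast: last value plus predicted difference."""
    y = np.asarray(y, dtype=float)
    d = np.diff(y)
    return y[-1] + phi @ d[-1:-4:-1]  # lags d[-1], d[-2], d[-3]

def ljung_box_q(resid, lags=10):
    """Ljung-Box Q statistic. Under the white-noise null it is roughly
    chi-square with (lags - 3) degrees of freedom after an AR(3) fit."""
    e = resid - resid.mean()
    n = e.size
    rho = np.array([(e[k:] @ e[:-k]) / (e @ e) for k in range(1, lags + 1)])
    return n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, lags + 1)))

# Simulated ARIMA(3,1,0) series with known (hypothetical) coefficients.
rng = np.random.default_rng(1)
phi_true = np.array([0.4, -0.3, 0.2])
d = np.zeros(2000)
eps = rng.standard_normal(2000)
for t in range(3, 2000):
    d[t] = phi_true @ d[t - 3:t][::-1] + eps[t]
y = np.cumsum(d)

phi, resid = fit_arima_310(y)
q_stat = ljung_box_q(resid, lags=10)
next_value = forecast_next(y, phi)
```

The `resid` array is what the second stage consumes: it plays the role of `δ` in the hybrid scheme.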
3.2 LSTM on residuals
- Fits ARIMA residuals (nonlinear component)
- ReLU activation; batch_size = 1; up to 100 epochs; loss stabilizes by ~epoch 30
- Input normalized to typical neural-network ranges
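Before an LSTM can be trained on the residuals, they have to be rescaled and cut into supervised (input window, next value) pairs. The sketch below shows that preparation step only; the network itself is omitted. The [0, 1] min-max range, the window length of 4, and the synthetic residuals are assumptions made for illustration, since this note does not record the paper's exact choices.

```python
import numpy as np

def minmax_fit(train):
    """Scaling bounds taken from the training residuals only."""
    return float(np.min(train)), float(np.max(train))

def minmax_apply(x, lo, hi):
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def minmax_invert(z, lo, hi):
    """Map scaled values back to original units before adding the ARIMA forecast."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo

def make_windows(series, window):
    """Return (X, y): X[i] holds `window` consecutive values, y[i] the next one."""
    s = np.asarray(series, dtype=float)
    X = np.stack([s[i:i + window] for i in range(len(s) - window)])
    y = s[window:]
    return X[..., None], y  # X shape: (samples, window, 1), as recurrent layers expect

# Synthetic stand-ins for ARIMA residuals on the train and test periods.
rng = np.random.default_rng(2)
train_resid = 0.05 * rng.standard_normal(192)
test_resid = 0.05 * rng.standard_normal(96)

lo, hi = minmax_fit(train_resid)
train_scaled = minmax_apply(train_resid, lo, hi)
test_scaled = minmax_apply(test_resid, lo, hi)  # may fall slightly outside [0, 1]

WINDOW = 4  # hypothetical lag length
X_train, y_train = make_windows(train_scaled, WINDOW)
X_test, y_test = make_windows(test_scaled, WINDOW)
```

Fitting the bounds on the training residuals alone keeps test-period information out of the scaler, which matters when the reported test metrics are as high as they are here.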
4. Results (Table 2, test set)
| Model | R² | MAPE | RMSE |
|-------|-----|------|------|
| ARIMA alone | 0.3648 | 1.4135 | 1.5769 |
| LSTM alone | 0.5244 | 0.4253 | 0.7823 |
| ARIMA-LSTM | 0.9825 | 0.0124 | 0.0830 |
The hybrid clearly beats both single models on reported metrics. Authors note remaining error may reflect unmodeled environmental and production drivers (ventilation, mining rate, etc.).
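For anyone reproducing the comparison, the three metrics can be computed as below. These are the standard textbook definitions; whether the paper reports MAPE as a fraction or a percentage is not recorded in this note, so the fraction form used here is an assumption. The sample values are made up.

```python
import numpy as np

def r2_score(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mape(y, yhat):
    """Mean absolute percentage error as a fraction; undefined if any y is 0."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return np.mean(np.abs((y - yhat) / y))

def rmse(y, yhat):
    """Root mean squared error, in the units of y."""
    y, yhat = np.asarray(y, dtype=float), np.asarray(yhat, dtype=float)
    return np.sqrt(np.mean((y - yhat) ** 2))

# Made-up observations and forecasts, for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

scores = {
    "R2": r2_score(y_true, y_pred),
    "MAPE": mape(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
}
```

R² is measured against the variance of the test window, so over a short, smooth test period the same absolute error can give quite different R² values.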
5. Limits and reproduction notes
- Short window: six days at one corner—validate on longer, multi-face data before deployment.
- Univariate: no exogenous factors yet; authors plan multivariate extensions.
- Metric context: RMSE/MAPE depend on concentration scale and normalization—do not treat as universal thresholds.
- High R² risk: check for leakage and for overfitting on the small test set, and confirm that the split is strictly out-of-time.
- Safety chain: forecasts support early warning; they do not replace methane sensors, ventilation interlocks, or statutory limits.
6. Engineering takeaways
- For gas concentration series, try a linear model first with a deep net on its residuals before reaching for an end-to-end black box.
- Keep ARIMA diagnostics (ADF, ACF/PACF, Ljung–Box) in the pipeline for auditability.
- Pair predictions with threshold/alarm workflows at the same sensor location.
- Next step: add ventilation, advance rate, and extraction as exogenous inputs, similar to the RFECV-Bi-LSTM emission work in this blog series.
Reference
Li, C.; Fang, X.; Yan, Z.; Huang, Y.; Liang, M. Research on Gas Concentration Prediction Based on the ARIMA-LSTM Combination Model. Processes 2023, 11 (1), 174.