Forecasts of absolute gas emission feed ventilation sizing, drainage design, and safety management. Field data often mix geological, mining-disturbance, and drainage-operation effects, with strong nonlinearity and temporal correlation. Models built on emission history alone can drift when operating conditions change. Lin et al. (2024, Reliability Engineering & System Safety) frame the task as multifactor time-series forecasting and propose RFECV (recursive feature elimination with cross-validation) followed by a Bi-LSTM. Below is a structured reading note covering problem, data, method, results, limits, and engineering use; it does not substitute for the full paper.

1. Why move beyond single-factor series

1.1 Role of emission forecasts

The authors stress that emission prediction supports ventilation-system reliability and gas-extraction design, not accuracy alone. Underestimating emission can under-size dilution airflow; overestimation can drive excessive drainage or frequent fan adjustments.

1.2 Limits of earlier practice

They highlight:

  • Many coupled drivers, hard to fix one “universal” feature set by hand;
  • Nonlinearity and temporal structure co-exist;
  • Much prior work uses only emission history, ignoring observable exogenous drivers (output, advance, extraction rate, etc.).

Their response: compress informative factors first, then apply a deep sequence model.

2. Factor framework (how the paper is situated)

The article defines a multifactor time series. Related face-emission work from the same research line often splits primary indicators into:

| Class | Typical variables (symbols common in related studies) | Role |
|-------|------------------------------------------------------|------|
| Geology | Seam thickness M, depth H, dip D, gas content GC, floor elevation BLV, interlayer spacing SD, adjacent seam thickness ML, etc. | Storage and permeability context |
| Mining | Daily output DO, daily advance V, pure extraction EP, etc. | Disturbance and de-gassing intensity |

RFECV selects data-driven subsets from such a pool instead of a one-off manual pick. The paper embeds Ridge regression and random forest (RF) inside RFECV, yielding four multifactor combinations (Ridge-RFECV and RF-RFECV paths) before the neural stage.

3. Method pipeline

Multifactor series → RFECV (Ridge / RF embedder) → 4 factor sets → Bi-LSTM → emission forecast

3.1 RFECV

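RFE repeatedly fits an estimator, ranks the features by importance, and drops the weakest; cross-validation scores each candidate subset across folds, so the number of retained features is chosen by estimated generalization rather than by training fit.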

Key settings in the paper:

  • Two embedders: Ridge (linear, interpretable) and RF (nonlinear, robust);
  • Output: four emission-oriented input combinations;
  • Goal: balance dimensionality and interpretability before Bi-LSTM.
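
The selection step above can be sketched with scikit-learn's `RFECV`. This is a minimal illustration, not the authors' code: the factor names, the synthetic data, the `TimeSeriesSplit` folds, and every hyperparameter are assumptions made for the example, and the sketch does not reproduce how the paper arrives at its four combinations.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical factor names, borrowed from symbols used in related studies.
FACTORS = ["M", "H", "GC", "DO", "V", "EP"]


def select_factors(X, y, estimator, n_splits=5):
    """Run RFECV with time-ordered folds and return the retained factor names."""
    selector = RFECV(
        estimator=estimator,
        step=1,                                  # drop one factor per iteration
        cv=TimeSeriesSplit(n_splits=n_splits),   # assumed: folds respect time order
        scoring="neg_mean_squared_error",
        min_features_to_select=1,
    )
    selector.fit(X, y)
    return [name for name, keep in zip(FACTORS, selector.support_) if keep]


# Synthetic stand-in for a real, time-aligned factor table (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(FACTORS)))
y = (2.0 * X[:, 2] + 1.5 * X[:, 5] + 0.5 * X[:, 3] ** 2
     + rng.normal(scale=0.1, size=200))

ridge_set = select_factors(X, y, Ridge(alpha=1.0))
rf_set = select_factors(X, y, RandomForestRegressor(n_estimators=50, random_state=0))
print("Ridge-RFECV:", ridge_set)
print("RF-RFECV:   ", rf_set)
```

Ridge ranks factors by coefficient magnitude and the random forest by impurity-based importance, so the two subsets can differ, especially where a driver acts nonlinearly.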

In production, RFECV output can be frozen as a feature allow-list checked during data QA (missing rates, scaling, lag alignment).
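
One way to picture such an allow-list check is the sketch below. The allow-list contents, the 5% missing-rate threshold, and the row format are all hypothetical; scaling and lag-alignment checks would need site-specific rules and are left out.

```python
import math

# Hypothetical frozen allow-list taken from an earlier RFECV run.
ALLOW_LIST = ("GC", "EP", "DO")
MAX_MISSING_RATE = 0.05  # assumed QA threshold, not from the paper


def is_missing(value):
    """Treat None and NaN as missing readings."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def qa_report(rows, allow_list=ALLOW_LIST, max_missing=MAX_MISSING_RATE):
    """Return {factor: problem} for allow-listed factors that fail QA."""
    problems = {}
    for name in allow_list:
        if not any(name in row for row in rows):
            problems[name] = "column absent"
            continue
        rate = sum(is_missing(row.get(name)) for row in rows) / len(rows)
        if rate > max_missing:
            problems[name] = f"missing rate {rate:.2f} exceeds {max_missing:.2f}"
    return problems


rows = [
    {"GC": 8.1, "EP": 3.2},
    {"GC": float("nan"), "EP": 3.0},
    {"GC": 8.3, "EP": 3.1},
    {"GC": 8.0, "EP": 2.9},
]
report = qa_report(rows)
print(report)
```

An empty report means every allow-listed factor is present and within the missing-rate limit; anything else can block the batch before it reaches the model.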

3.2 Bi-LSTM

On each selected combination, a bidirectional LSTM uses past and (within-window) future context, then regresses target emission. Reported best stack: RF-RFECV-Bi-LSTM.
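
Before a bidirectional LSTM can read the data, the aligned series has to be cut into fixed-length windows. The sketch below shows one common way to do that with NumPy; the window length, the forecast horizon, and the function name `make_windows` are assumptions for illustration, not the paper's preprocessing.

```python
import numpy as np


def make_windows(features, target, window, horizon=1):
    """Slice aligned series into (samples, window, n_factors) inputs and targets.

    Each input covers steps i .. i+window-1; its target is the emission value
    `horizon` steps after the last input step.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    n_samples = len(features) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window and horizon")
    X = np.stack([features[i:i + window] for i in range(n_samples)])
    first = window + horizon - 1          # index of the first target value
    y = target[first:first + n_samples]
    return X, y


# Toy data: 10 time steps, 2 selected factors (hypothetical values).
features = np.arange(20).reshape(10, 2)
emission = np.arange(10) * 10.0
X, y = make_windows(features, emission, window=3, horizon=1)
print(X.shape, y.shape)
```

Because every target lies after the end of its window, the backward pass of the bidirectional layer only ever reads observations inside the window, so "future context" here never includes the value being predicted.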

3.3 Splits and metrics

Training fractions 60%, 70%, and 80% are compared. Reported figures for RF-RFECV-Bi-LSTM on their dataset:

| Metric | Reported | Note |
|--------|----------|------|
| RMSE | 0.2455 | Interpret with units / normalization |
| MAE | 0.1914 | Mean absolute error |
| R² | 0.9897 | Validate out-of-time and out-of-face before deployment |
| Model stability | 0.9431 | Consistency across splits (see original definition) |
| Runtime | ~12.20 s | Hardware-dependent |

Treat these as site-specific; do not use as universal acceptance thresholds.
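
For reproduction on local data, the split fractions and the three standard error metrics can be computed as in the sketch below. The function names are illustrative, the chronological (unshuffled) split is an assumption about good practice rather than a statement of the paper's exact procedure, and the paper's "model stability" score is not reproduced because its definition is only given in the original.

```python
import numpy as np


def chronological_split(n_samples, train_fraction):
    """Return (train_idx, test_idx) with the earliest samples used for training."""
    cut = int(round(n_samples * train_fraction))
    return np.arange(cut), np.arange(cut, n_samples)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


for fraction in (0.6, 0.7, 0.8):
    train_idx, test_idx = chronological_split(10, fraction)
    print(fraction, len(train_idx), len(test_idx))

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

RMSE and MAE carry the units of the target (or of its normalized form), which is why the reported values cannot be compared across sites without knowing the scaling.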

4. How this fits among related approaches

| Track | Idea | This paper |
|-------|------|------------|
| Univariate series | Emission history only | Explicit exogenous factors |
| Feature selection + shallow ML | LASSO/RFE + SVR/RF | RFECV + Bi-LSTM |
| End-to-end deep nets | All sensors at once | RFECV first, then sequence model |

Later work on concentration monitoring often adds decomposition, graphs, or attention; for absolute emission with interpretable factors, this paper is a useful “RFECV + RNN family” baseline.

5. Limits and reproduction notes

  1. Alignment: geology and mining series must be time-aligned; lag choice affects interpretation.
  2. Validation discipline: use rolling time splits and hold-out faces; randomly shuffled splits leak future information into training and can inflate R².
  3. RFECV stability: Ridge vs RF embedders yield different subsets—review physical plausibility, not error alone.
  4. Emission vs concentration: different sensors and safety chains; define the prediction target clearly.
  5. Operations: offline modeling still needs a forecast → ventilation check → alert → human confirm loop.
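
The validation discipline in item 2 can be sketched as two small index generators: an expanding-window (rolling-origin) split over time and a hold-out by working face. The names, window sizes, and face labels below are hypothetical.

```python
def rolling_origin_splits(n_samples, initial_train, test_size, step=None):
    """Yield (train_idx, test_idx) ranges with an expanding training window.

    Every test block lies strictly after its training block, so no future
    observation is used to fit the model that predicts it.
    """
    step = step or test_size
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += step


def hold_out_face(face_ids, held_out):
    """Split sample indices so that one working face is never seen in training."""
    train = [i for i, face in enumerate(face_ids) if face != held_out]
    test = [i for i, face in enumerate(face_ids) if face == held_out]
    return train, test


splits = list(rolling_origin_splits(n_samples=10, initial_train=4, test_size=2))
for train_idx, test_idx in splits:
    print(list(train_idx), list(test_idx))

# Hypothetical face labels for six samples.
faces = ["F1", "F1", "F2", "F2", "F3", "F3"]
train_idx, test_idx = hold_out_face(faces, held_out="F3")
print(train_idx, test_idx)
```

Scores from both schemes are usually lower than a shuffled-split score, and the gap is itself a useful indication of how much the headline figure depends on leakage.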

6. Engineering takeaways

  • Order of work: clean data → RFECV → sequence model.
  • Couple forecasts to required air quantity or network regulation decisions, rather than using them only in reports.
  • When extraction rate EP is a driver, changes in drainage operation feed back into emission, so plan rolling retraining or model updates.
  • Version the four RFECV feature sets for auditability.
  • High model scores do not replace methane monitoring, interlocks, or ventilation redundancy.
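
Versioning the feature sets can be as light as storing each set with a content hash. The sketch below is one possible scheme using only the standard library; the record fields, set names, and factor lists are hypothetical.

```python
import hashlib
import json


def feature_set_version(name, factors, selector):
    """Build an audit record whose hash changes only when the content changes."""
    record = {
        "name": name,
        "selector": selector,
        "factors": sorted(factors),  # sorted so the hash ignores listing order
    }
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    record["sha256"] = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return record


# Hypothetical feature sets for illustration only.
v1 = feature_set_version("set_A", ["GC", "EP", "DO"], selector="RF-RFECV")
v2 = feature_set_version("set_A", ["DO", "GC", "EP"], selector="RF-RFECV")
v3 = feature_set_version("set_A", ["GC", "EP"], selector="RF-RFECV")
print(v1["sha256"] == v2["sha256"], v1["sha256"] == v3["sha256"])
```

Because the factors are sorted before hashing, the record identifies the set rather than the column order; if the model depends on column order, that order should be stored alongside the record.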

Reference

Lin, H.; Li, W.; Li, S.; Wang, L.; Ge, J.; Tian, Y.; Zhou, J. Coal Mine Gas Emission Prediction Based on Multifactor Time Series Method. Reliability Engineering & System Safety 2024, 252, 110443.