A systematic futures trading system built on 14 peer-reviewed research papers spanning market microstructure, time-series momentum, volatility forecasting, and statistical validation.
Four non-negotiable principles that constrain every design decision in the system.
Price is the only ground truth. When any analytical output contradicts observed price behavior, the analysis is wrong. Informed by Bouchaud, Farmer & Lillo (2009): most market information comes from supply and demand dynamics, not external news.
The primary objective is surviving long enough for edge to compound. Every strategy must pass the Deflated Sharpe Ratio test (Bailey & López de Prado, 2014), which corrects for selection bias under multiple testing. Position sizing uses a conservative fraction of the Kelly Criterion—maintaining a safety margin against edge overestimation.
Every component is independently testable and replaceable. The processing interface remains invariant whether consuming historical data or live market feeds. Validated through CPCV (López de Prado, 2018), which demands performance be testable across combinatorial data splits.
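The combinatorial nature of CPCV can be sketched as follows. This is a minimal illustration, not the system's implementation: it only enumerates the train/test group combinations, and omits the purging and embargoing of overlapping samples that López de Prado's full procedure requires.

```python
from itertools import combinations

def cpcv_splits(n_groups: int, n_test: int):
    """Enumerate combinatorial train/test splits over contiguous data groups.

    Each split holds out `n_test` of the `n_groups` blocks for testing and
    trains on the rest, so every block appears in many distinct test sets.
    Purging/embargoing of overlapping samples is omitted in this sketch.
    """
    groups = list(range(n_groups))
    for test in combinations(groups, n_test):
        train = [g for g in groups if g not in test]
        yield train, list(test)

# 6 groups with 2 held out yields C(6,2) = 15 distinct splits
splits = list(cpcv_splits(6, 2))
```

Because each strategy is evaluated on many backtest paths rather than one, a single lucky historical sequence cannot carry a weak strategy through validation.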
A portion of each position takes a fixed exit for base-rate profitability. The remainder trails with a wider stop, capturing the full extent of trending moves. Consistent with Moskowitz, Ooi & Pedersen (2012): time-series momentum in futures persists for 1–12 months. The system’s job is to stay in the trade long enough to capture that persistence.
Every trade passes through a multi-layer pipeline. No single signal triggers execution—multiple independent conditions must converge before capital is deployed. The majority of candidate signals are rejected at the first layer.
Multiple higher timeframes must confirm directional alignment before the execution layer considers any entry. This gate draws from research on time-series momentum persistence in futures markets (Moskowitz, Ooi & Pedersen, 2012) and the empirical finding that order flow exhibits long memory across timescales (Bouchaud, Farmer & Lillo, 2009). The gate is the system’s most powerful filter—the majority of candidate signals never pass it.
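The gate's logic reduces to a unanimity check across timeframes. A minimal sketch, assuming each timeframe's trend has already been reduced to a directional reading (the function name and encoding are illustrative, not the system's API):

```python
def htf_alignment_gate(trend_by_timeframe: dict) -> int:
    """Return +1 or -1 only if every higher timeframe agrees on a non-zero
    direction; return 0 (no trade) otherwise.

    `trend_by_timeframe` maps a timeframe label to a trend reading:
    +1 (up), -1 (down), 0 (no clear trend).
    """
    readings = set(trend_by_timeframe.values())
    if readings == {1}:
        return 1
    if readings == {-1}:
        return -1
    return 0

# Any disagreement, or any flat timeframe, blocks the entry entirely
bias = htf_alignment_gate({"daily": 1, "4h": 1, "1h": 1})
```

A single dissenting timeframe is enough to reject the candidate, which is why this layer filters out the majority of signals.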
Once directional alignment is confirmed, the system evaluates structural price action patterns for entry quality. Each candidate trigger is scored against multiple criteria. Only patterns meeting the quality threshold advance to confluence evaluation.
Multiple independent signals must converge before capital is deployed. Each signal contributes to a weighted confluence assessment; the system requires sufficient agreement before acting. This multi-factor approach reduces the probability of acting on spurious signals (Gu, Kelly & Xiu, 2020).
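A weighted confluence assessment can be sketched as a weighted average of independent signal scores checked against a cutoff. The signal names, weights, and threshold below are illustrative placeholders, not the system's calibrated values:

```python
def confluence_score(signals: dict, weights: dict) -> float:
    """Weighted average of independent signal scores, each in [0, 1].

    Missing signals contribute 0, so absent evidence counts against entry.
    """
    total_w = sum(weights.values())
    return sum(weights[k] * signals.get(k, 0.0) for k in weights) / total_w

CONFLUENCE_THRESHOLD = 0.7  # illustrative cutoff, not the system's value

signals = {"sweep": 1.0, "ofi": 0.8, "structure": 0.5}
weights = {"sweep": 0.40, "ofi": 0.35, "structure": 0.25}
score = confluence_score(signals, weights)   # ~0.805 for these inputs
deploy = score >= CONFLUENCE_THRESHOLD
```

Requiring several partially independent signals to agree shrinks the probability that all of them are simultaneously spurious.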
Stop-loss placement and position sizing are calibrated to current market volatility, not fixed parameters. The system uses ATR-based calculations that adapt dynamically, ensuring consistent risk per trade in dollar terms regardless of market conditions (Corsi, 2009).
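The volatility-adjusted sizing described above can be sketched with a standard ATR calculation. The stop multiple and dollar-risk figures are illustrative assumptions:

```python
def atr(highs, lows, closes, period=14):
    """Average True Range over the trailing `period` bars."""
    trs = []
    for i in range(1, len(closes)):
        tr = max(highs[i] - lows[i],
                 abs(highs[i] - closes[i - 1]),
                 abs(lows[i] - closes[i - 1]))
        trs.append(tr)
    return sum(trs[-period:]) / min(period, len(trs))

def position_size(account_risk_dollars, atr_value, stop_mult=2.0, point_value=1.0):
    """Contracts sized so a stop of `stop_mult` * ATR risks a fixed dollar
    amount: wider (more volatile) stops automatically mean fewer contracts."""
    risk_per_contract = stop_mult * atr_value * point_value
    return int(account_risk_dollars // risk_per_contract)

a = atr([10, 11, 12], [9, 10, 11], [9.5, 10.5, 11.5])
contracts = position_size(500, a, stop_mult=2.0, point_value=50)
```

Because size scales inversely with the volatility-based stop distance, the dollar risk per trade stays roughly constant across regimes.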
The system analyzes price across a hierarchy of timeframes, from macro trend down to execution context. Higher timeframes carry more authority—the execution layer only acts when the broader hierarchy confirms directional bias.
The gate detects the observable footprint of institutional order splitting. Large parent orders from institutional participants are broken into thousands of child orders executed over extended periods. This creates a long-memory property in order flow (Bouchaud, Farmer & Lillo, 2009)—meaning directional bias on higher timeframes has genuine predictive power for the execution timeframe. Not because of pattern recurrence, but because of persistent supply-and-demand imbalance that takes time to fully absorb.
This is consistent with research showing time-series momentum persists for 1–12 months across dozens of futures markets (Moskowitz, Ooi & Pedersen, 2012). The system does not attempt to predict when momentum will end: it participates while the higher-timeframe hierarchy confirms the trend is intact, and exits when that confirmation breaks.
Within the decision pipeline, specific signal types are evaluated at the trigger and confluence layers. Each signal type is grounded in market microstructure research.
Price sweeps above known highs or below known lows trigger clusters of resting stop orders, providing counterparty liquidity for institutional entries. These events create a temporary dislocation between price and underlying order flow. The mechanics are consistent with Kyle’s (1985) model of informed trading and price impact, and with the Almgren-Chriss (2001) framework for temporary versus permanent impact decomposition. When a sweep exhausts available liquidity at a level, the resulting price movement carries information about the true supply-demand balance.
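The observable footprint of such a sweep-and-fail event can be sketched with a simple bar-level check (the function name and pattern definition are a simplified illustration, not the system's full detector):

```python
def is_bearish_sweep(prior_high, bar_high, bar_close):
    """Price trades above a known high (triggering resting buy stops) but
    closes back below it, suggesting the sweep consumed the available
    liquidity rather than starting a genuine breakout."""
    return bar_high > prior_high and bar_close < prior_high

def is_bullish_sweep(prior_low, bar_low, bar_close):
    """Mirror case: price sweeps below a known low, then closes back above."""
    return bar_low < prior_low and bar_close > prior_low
```

In the full system such a detection is only a candidate trigger; it still has to survive the confluence and risk layers before any order is placed.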
The linear relationship between order flow imbalance and contemporaneous price changes is one of the strongest regularities in market microstructure. Cont, Kukanov & Stoikov (2014), published in the Journal of Financial Econometrics, established that order flow imbalance dominates raw volume as a short-horizon price predictor. The system leverages this relationship to assess the directional conviction behind observed price movements.
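At the level of best-quote updates, the event-by-event OFI definition from Cont, Kukanov & Stoikov (2014) can be sketched directly: increases in bid-side depth or price add to the imbalance, increases on the ask side subtract from it. The quote-tuple format below is an assumption for illustration:

```python
def order_flow_imbalance(quotes):
    """Order flow imbalance over a sequence of best-quote updates.

    `quotes` is a list of (bid_price, bid_size, ask_price, ask_size).
    Per-event contribution follows Cont, Kukanov & Stoikov (2014):
    e = 1{Pb up or same} * qb_new - 1{Pb down or same} * qb_old
      - 1{Pa down or same} * qa_new + 1{Pa up or same} * qa_old
    """
    ofi = 0.0
    for prev, cur in zip(quotes, quotes[1:]):
        pb0, qb0, pa0, qa0 = prev
        pb1, qb1, pa1, qa1 = cur
        e = 0.0
        if pb1 >= pb0:   # bid improved or held: new bid depth adds pressure
            e += qb1
        if pb1 <= pb0:   # bid fell or held: old bid depth was consumed/cancelled
            e -= qb0
        if pa1 <= pa0:   # ask improved or held: new ask depth adds sell pressure
            e -= qa1
        if pa1 >= pa0:   # ask rose or held: old ask depth was consumed/cancelled
            e += qa0
        ofi += e
    return ofi
```

A positive OFI over a window indicates net buying pressure at the top of the book, which is precisely the quantity the cited research found to be linearly related to contemporaneous price changes.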
Large parent orders from institutional participants are split into thousands of child orders, creating a distinctive pattern of persistent directional flow. Lillo & Farmer (2004) documented this long-memory property, and Bouchaud, Farmer & Lillo (2009) showed that markets “slowly digest” these supply-and-demand changes over extended periods. The system identifies the observable signatures of this institutional splitting process across multiple timeframes.
Every position is divided into multiple contracts with distinct exit strategies, balancing the fundamental tension between reliability and magnitude.
The fixed-target contract reduces variance by locking in gains at a predefined, volatility-adjusted level. This establishes the system’s base-rate profitability—even if the trailing portion is stopped out at breakeven, the fixed exit has already captured value.
The trailing-runner contract captures the full extent of trending moves. Research on time-series momentum (Moskowitz, Ooi & Pedersen, 2012) shows that trend persistence in futures markets is both statistically significant and economically meaningful across dozens of instruments over decades. The runner’s wider, adaptive stop is designed to stay in the trade long enough to capture this persistence—it converts occasional large winners into the primary driver of portfolio returns.
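The two exit styles can be sketched together. The multipliers below are illustrative assumptions, not the system's calibrated parameters; the key property is that the runner's stop only ever ratchets in the trade's favor:

```python
def plan_exits(entry, atr_value, direction, target_mult=1.5, trail_mult=3.0):
    """Split a position's exits: a fixed, volatility-adjusted target for the
    first contract and a wider initial trailing stop for the runner.
    `direction` is +1 for long, -1 for short."""
    fixed_target = entry + direction * target_mult * atr_value
    initial_stop = entry - direction * trail_mult * atr_value
    return {"fixed_target": fixed_target, "runner_stop": initial_stop}

def update_trailing_stop(current_stop, last_close, atr_value, direction,
                         trail_mult=3.0):
    """Ratchet the runner's stop toward price; it tightens but never loosens."""
    candidate = last_close - direction * trail_mult * atr_value
    if direction == 1:
        return max(current_stop, candidate)
    return min(current_stop, candidate)

exits = plan_exits(100.0, 2.0, direction=1)      # target 103, stop 94
stop = update_trailing_stop(exits["runner_stop"], 110.0, 2.0, 1)  # trails to 104
stop = update_trailing_stop(stop, 100.0, 2.0, 1)  # pullback: stop holds at 104
```

The fixed exit supplies the base rate; the one-way ratchet on the runner is what lets occasional large winners dominate the return stream.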
Risk management is not a feature of this system—it is the system. Every profit-generating mechanism operates within hard constraints that cannot be overridden by signal strength, conviction, or any other factor.
The Kelly Criterion (Kelly, 1956) defines the theoretically optimal bet size for maximizing long-term geometric growth. However, Thorp (2008) demonstrated that even modest overestimation of edge at full Kelly produces catastrophic drawdowns. The system uses a conservative fraction of the Kelly-optimal size, deliberately sacrificing expected growth rate in exchange for materially lower variance and reduced probability of ruin. The Kelly fraction is re-estimated periodically from realized trade statistics.
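For the binary-outcome case, the Kelly formula and its conservative fraction can be sketched as follows. The 0.25 multiplier is an illustrative fractional-Kelly choice, not the system's actual setting:

```python
def kelly_fraction(win_rate, avg_win, avg_loss):
    """Full-Kelly fraction for a binary bet: f* = (p*b - q) / b,
    where p is the win rate, q = 1 - p, and b = avg_win / avg_loss
    is the payoff ratio."""
    b = avg_win / avg_loss
    return (win_rate * b - (1.0 - win_rate)) / b

def sized_fraction(win_rate, avg_win, avg_loss, kelly_mult=0.25):
    """Conservative fractional Kelly, floored at zero: a negative
    full-Kelly estimate means the edge does not justify any position."""
    return max(0.0, kelly_mult * kelly_fraction(win_rate, avg_win, avg_loss))

# 50% win rate with 2:1 payoff: full Kelly = 0.25, quarter Kelly = 0.0625
f = sized_fraction(0.5, 2.0, 1.0)
```

Because the full-Kelly curve is steep and asymmetric near its peak, overestimating the edge at full size is far more costly than underbetting; the fractional multiplier buys a margin of safety against exactly that estimation error.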
Per-trade, daily, and weekly drawdown limits form nested containment layers. Each layer operates independently—a breach at any level triggers automatic protective action regardless of what other layers indicate. When a daily limit is hit, all positions are closed. When a weekly limit is hit, the system enters observation-only mode. There is no override mechanism and no “one more trade” logic.
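The nested-containment logic can be sketched as an independent check at each layer, with the most restrictive outcome winning. Limit values and names here are illustrative:

```python
from enum import Enum

class Action(Enum):
    TRADE = "trade"               # all limits intact
    FLATTEN = "flatten_all"       # daily limit breached: close all positions
    OBSERVE = "observation_only"  # weekly limit breached: no new trades

def risk_gate(daily_pnl, weekly_pnl, daily_limit, weekly_limit):
    """Each containment layer is evaluated independently; the most
    restrictive action wins and there is no override path."""
    if weekly_pnl <= -abs(weekly_limit):
        return Action.OBSERVE
    if daily_pnl <= -abs(daily_limit):
        return Action.FLATTEN
    return Action.TRADE
```

Note that the gate takes no signal-strength input at all: by construction there is no argument through which conviction could override a breached limit.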
Stop-loss distances and position sizes adapt to current market volatility using the HAR-RV model (Corsi, 2009). This model captures volatility persistence across daily, weekly, and monthly horizons, producing more accurate forecasts than single-horizon approaches (Andersen, Bollerslev, Diebold & Labys, 2003). In high-volatility regimes, positions are smaller and stops are wider; in low-volatility regimes, the inverse applies. This ensures consistent risk per trade in dollar terms.
Market behavior is not stationary. The system classifies prevailing conditions along two axes—trend strength and volatility level—and adapts its parameters accordingly.
The HAR-RV model (Corsi, 2009) decomposes realized volatility into daily, weekly, and monthly components, capturing the multi-timescale structure of volatility clustering. This produces superior forecasts compared to single-horizon GARCH-family models, particularly during regime transitions. Research by Andersen, Bollerslev, Diebold & Labys (2003) established that realized volatility computed from high-frequency data provides a more accurate measure of true latent volatility than daily-close estimators.
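The HAR-RV regression itself is simple enough to sketch in a few lines: tomorrow's realized volatility is regressed on today's value and its trailing 5-day and 22-day averages. This is a minimal illustration fitted by ordinary least squares, not the system's estimation code:

```python
import numpy as np

def har_rv_features(rv):
    """Build HAR-RV regressors from a daily realized-volatility series:
    RV[t+1] = b0 + b_d*RV[t] + b_w*mean(RV[t-4:t]) + b_m*mean(RV[t-21:t]) + eps
    """
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for i in range(22, len(rv) - 1):   # need a 22-day lookback and 1-day-ahead target
        daily = rv[i]
        weekly = rv[i - 4:i + 1].mean()
        monthly = rv[i - 21:i + 1].mean()
        rows.append([1.0, daily, weekly, monthly])
        targets.append(rv[i + 1])
    return np.array(rows), np.array(targets)

def fit_har(rv):
    """OLS fit; returns [b0, b_daily, b_weekly, b_monthly]."""
    X, y = har_rv_features(rv)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

The three horizons act as a crude cascade model of volatility: short-term shocks decay quickly while the monthly component anchors the forecast, which is what gives HAR-RV its robustness through regime transitions.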
The system adapts stops, sizing, and entry thresholds based on which regime quadrant is detected. In trending, low-volatility environments, standard parameters apply. In ranging, high-volatility environments, the system stands aside entirely—the expected cost of whipsaw losses exceeds the expected benefit of attempted trades. The jump-diffusion decomposition (Andersen, Bollerslev & Diebold, 2007) further separates the continuous volatility component (which drives predictability) from the jump component (which does not).
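The two-axis classification can be sketched as a quadrant lookup. The thresholds are illustrative placeholders (e.g. an ADX-style trend reading and a volatility forecast relative to its long-run level), and only the two quadrant behaviors named above are stated by the system; the comments on the other two are hypothetical:

```python
def classify_regime(trend_strength, vol_forecast, trend_cut=25.0, vol_cut=1.5):
    """Label the regime quadrant from a trend-strength reading and a
    volatility forecast. Threshold values are illustrative."""
    trending = trend_strength >= trend_cut
    high_vol = vol_forecast >= vol_cut
    if trending and not high_vol:
        return "trend_low_vol"    # standard parameters
    if trending and high_vol:
        return "trend_high_vol"   # (hypothetical: reduced size, wider stops)
    if not trending and not high_vol:
        return "range_low_vol"    # (hypothetical: tighter entry thresholds)
    return "range_high_vol"       # stand aside entirely
```

Keeping the classifier this coarse is deliberate: two well-measured axes generalize better out of sample than a finely partitioned state space.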
A backtest is only as trustworthy as the assumptions embedded in it. Look-ahead bias is the most common source of inflated historical performance—and the hardest to detect. The system’s causal replay engine is designed to eliminate it structurally.
The causal replay engine builds higher-timeframe bars incrementally from raw data—a bar only “closes” when all its constituent data has been processed, exactly as it would in real time. At no point does the engine have access to a completed higher-timeframe bar before the underlying data has arrived. Signals generated at bar N produce entries at bar N+1; there is no same-bar execution.
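The core invariant, that a higher-timeframe bar exists only after its last constituent bar has arrived, can be sketched with a simple generator. The OHLC tuple format and fixed bucket size are simplifying assumptions for illustration:

```python
def aggregate_bars(lower_tf_bars, size):
    """Build higher-timeframe bars causally from lower-timeframe bars.

    `lower_tf_bars` is an iterable of (open, high, low, close) tuples and
    `size` is the number of constituent bars per higher-timeframe bar.
    A bar is yielded only once its final constituent has been processed;
    no partial or future bar is ever visible to downstream consumers.
    """
    bucket = []
    for bar in lower_tf_bars:
        bucket.append(bar)
        if len(bucket) == size:
            o = bucket[0][0]
            h = max(b[1] for b in bucket)
            l = min(b[2] for b in bucket)
            c = bucket[-1][3]
            yield (o, h, l, c)   # the bar "closes" only at this point
            bucket = []

minute_bars = [(1.0, 2.0, 0.0, 1.5), (1.5, 3.0, 1.0, 2.0), (2.0, 2.5, 1.8, 2.2)]
three_min = list(aggregate_bars(minute_bars, 3))   # one completed bar
```

Because signals are evaluated only on bars this generator has already emitted, and fills occur on the following bar, look-ahead is ruled out by construction rather than by convention.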
All fills assume adverse-direction slippage and a realistic commission structure, so backtest results represent a conservative estimate of achievable performance. This produces lower metrics than backtests run on pre-computed bar data, which is the expected and correct outcome once the artificial information advantage is removed.
Statistical significance in backtesting is necessary but not sufficient. The system must demonstrate robustness across multiple independent validation frameworks before live capital is deployed.
The following papers form the empirical foundation of this system. Each citation includes its relevance to the architecture described above.
Every claim on this site is backed by auditable data. These are the headline metrics from the causal backtest engine, validated across 38 months and 4,740+ trades on two timeframes with zero look-ahead bias.