How to Collect Quality Historical Data for Backtesting

Every backtest is only as good as the data behind it. Poor-quality historical data leads to false results, overconfidence, and wasted time. To build reliable strategies, traders need accurate, clean, and consistent data. This article explains where to get historical data, what to check, and how to prepare it for backtesting.


Why Data Quality Matters

  • Missing candles create false signals: a gap can look like a breakout or a crossover.
  • Wrong timestamps shift candle boundaries and distort indicators.
  • Unrealistically tight or fixed spreads inflate results.
  • Incomplete tick data hides slippage risk.

In short, bad data = bad strategy.


Sources of Historical Data

  1. Broker-provided data
    • Usually free, but may have gaps or only a short history.
  2. Specialized providers
    • Providers such as Tickstory, Dukascopy, and Quandl (now Nasdaq Data Link) offer high-quality tick and bar data.
  3. Exchange data
    • Best for stocks, futures, and crypto, since it comes directly from exchanges.
  4. APIs
    • Some brokers and services provide direct APIs for downloading market history.

Types of Data You Need

  • Tick data: Every price change. Best for scalping and HFT strategies.
  • Minute data: A balance between detail and data volume; suits most retail backtests.
  • Daily/weekly data: Useful for swing and position trading systems.
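To see how minute data relates to tick data, here is a minimal Python sketch that aggregates ticks into 1-minute OHLC (open, high, low, close) bars. It assumes ticks arrive as time-sorted `(datetime, price)` pairs with one price per tick; the function name `ticks_to_minute_bars` and the sample prices are hypothetical.

```python
from datetime import datetime, timezone


def ticks_to_minute_bars(ticks):
    """Aggregate time-sorted (datetime, price) ticks into 1-minute OHLC bars.

    Hypothetical helper for illustration: it assumes one price per tick
    (e.g. a bid or mid quote) and timestamps already in a single time zone.
    """
    bars = []
    for ts, price in ticks:
        minute = ts.replace(second=0, microsecond=0)  # start of the tick's minute
        if bars and bars[-1]["time"] == minute:
            bar = bars[-1]  # same minute: update the bar that is still open
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["ticks"] += 1
        else:
            # first tick of a new minute opens a new bar
            bars.append({"time": minute, "open": price, "high": price,
                         "low": price, "close": price, "ticks": 1})
    return bars


def utc(h, m, s):
    return datetime(2024, 1, 2, h, m, s, tzinfo=timezone.utc)


# Hypothetical sample ticks
ticks = [(utc(9, 0, 1), 1.1000), (utc(9, 0, 30), 1.1004),
         (utc(9, 0, 59), 1.0998), (utc(9, 1, 5), 1.1001)]

for bar in ticks_to_minute_bars(ticks):
    print(bar["time"].strftime("%H:%M"), bar["open"], bar["high"],
          bar["low"], bar["close"], bar["ticks"])
```

A bar keeps only four prices, so the order of the high and the low inside the minute is lost. That matters when a stop and a target could both be hit within the same bar. Minutes with no ticks produce no bar at all, which is one way gaps appear in minute data.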

How to Prepare Historical Data

  1. Check completeness – no missing days or hours beyond expected market closures (weekends, holidays).
  2. Normalize timestamps – make sure time zones match your platform.
  3. Include spreads and commissions – for realistic results.
  4. Filter out bad ticks – some feeds have outliers (e.g. price spikes).
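Steps 1, 2, and 4 can be sketched in plain Python as below. This is a minimal illustration, not a production cleaner: the function names, the fixed UTC offset, and the 0.5% spike threshold are all assumptions to adapt to your instrument and feed.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def find_gaps(times, step=timedelta(minutes=1)):
    """Step 1: return (before, after) pairs where at least one bar is missing.

    Weekends and holidays also show up here, so review the list instead of
    treating every gap as an error.
    """
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > step]


def to_utc(naive_times, utc_offset_hours):
    """Step 2: tag naive feed timestamps with a fixed offset, then convert to UTC.

    Assumption: the feed uses a constant offset. A server clock that follows
    daylight saving time needs a real zone (e.g. zoneinfo.ZoneInfo) instead.
    """
    feed_tz = timezone(timedelta(hours=utc_offset_hours))
    return [t.replace(tzinfo=feed_tz).astimezone(timezone.utc) for t in naive_times]


def filter_spikes(prices, half_window=2, max_dev=0.005):
    """Step 4: drop prices more than max_dev (a fraction) away from the median
    of their neighbours. A centred median ignores a one-off spike but keeps a
    level shift that persists for several ticks."""
    kept = []
    for i, p in enumerate(prices):
        ref = median(prices[max(0, i - half_window): i + half_window + 1])
        if abs(p - ref) / ref <= max_dev:
            kept.append(p)
    return kept


# Hypothetical sample data
minutes = [datetime(2024, 1, 2, 9, m) for m in (0, 1, 2, 4, 5)]  # 09:03 missing
print(find_gaps(minutes))
print(to_utc([datetime(2024, 1, 2, 12, 0)], utc_offset_hours=2))
print(filter_spikes([1.1000, 1.1001, 1.1500, 1.1002, 1.1001]))
```

The spike filter uses neighbours on both sides rather than only past prices. A past-only filter can lock onto the old level after a genuine jump and then reject every later tick.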

Example of Bad Data Impact

A moving average strategy on EURUSD looks profitable with free 1-minute data (+25% annual return). But when tested on tick data with real spreads, the result drops to -3% per year. Data quality alone changed the outcome.
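One reason results can shift this much is how costs are modelled. The minimal sketch below prices the same long EURUSD trade twice: once on mid prices, ignoring the spread, and once buying at the ask and selling at the bid. The quotes, the commission, the 0.0001 pip size (a four-decimal pair), and the function names are hypothetical.

```python
from dataclasses import dataclass

PIP = 0.0001  # assumed pip size for a 4-decimal pair such as EURUSD


@dataclass
class Quote:
    bid: float
    ask: float

    @property
    def mid(self) -> float:
        return (self.bid + self.ask) / 2


def long_pips_mid(entry: Quote, exit_quote: Quote) -> float:
    """Long trade valued on mid prices: the spread is ignored."""
    return (exit_quote.mid - entry.mid) / PIP


def long_pips_real(entry: Quote, exit_quote: Quote, commission_pips: float = 0.0) -> float:
    """Long trade with execution costs: buy at the ask, sell at the bid, pay commission."""
    return (exit_quote.bid - entry.ask) / PIP - commission_pips


# Hypothetical quotes: 2-pip spread at entry and at exit
entry = Quote(bid=1.1000, ask=1.1002)
exit_quote = Quote(bid=1.1005, ask=1.1007)
print(f"mid only:   {long_pips_mid(entry, exit_quote):.1f} pips")
print(f"with costs: {long_pips_real(entry, exit_quote, commission_pips=0.7):.1f} pips")
```

A round trip costs roughly one full spread (half at entry, half at exit) plus commission, and a strategy that trades often pays it on every trade. Variable spreads make this worse around news and illiquid hours, which a fixed-spread backtest never sees.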


Best Practices

  • Always validate data before using it.
  • Use at least 5-10 years of history if available.
  • For forex, prefer tick data with variable spreads.
  • For stocks, always check that prices are adjusted for dividends and splits.
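The validation and split checks above can start as simply as the sketch below: an OHLC consistency check and a back-adjustment for a single stock split. Both function names are hypothetical, the split's position and ratio are assumed to be known, and dividends need a separate adjustment that is not shown here.

```python
def inconsistent_bars(bars):
    """Return indices of (open, high, low, close) bars that cannot be real:
    a high below the open/close, a low above them, or a non-positive low."""
    bad = []
    for i, (o, h, lo, c) in enumerate(bars):
        if h < max(o, c) or lo > min(o, c) or lo <= 0:
            bad.append(i)
    return bad


def back_adjust_split(closes, split_index, ratio):
    """Divide every close before the split by the split ratio (2.0 for a
    2-for-1 split) so old and new prices are comparable. split_index is the
    first bar that trades after the split."""
    return [p / ratio if i < split_index else p for i, p in enumerate(closes)]


bars = [(100.0, 101.5, 99.0, 101.0),   # valid bar
        (101.0, 100.5, 99.5, 100.8)]   # high below open: a bad bar
print(inconsistent_bars(bars))

# Hypothetical 2-for-1 split taking effect on the third bar
print(back_adjust_split([100.0, 102.0, 51.0, 52.0], split_index=2, ratio=2.0))
```

Without the adjustment, the series shows a one-day drop from 102 to 51 that a backtest would treat as a real 50% crash. Volume is usually adjusted the opposite way, multiplied by the ratio.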

Conclusion

High-quality historical data is the foundation of reliable backtesting. Collecting, cleaning, and validating your data ensures that your strategies reflect real trading conditions. Without it, even the smartest algorithm is built on sand.