In the dynamic landscape of financial markets, synthetic financial time series data plays a pivotal role for traders and strategy designers alike. It offers a controlled environment in which to test and refine trading strategies, check that models adapt, and gain insight into performance under diverse market scenarios and economic conditions that may be difficult to observe in reality.

The challenge in generating synthetic data lies in the fact that most financial time series of interest exhibit ‘stylized’ features. These include fat tails (or ‘excess kurtosis’), meaning that extreme events or outliers are more likely to occur than under a normal distribution, and volatility clustering, the tendency for large returns to follow large returns (of either sign) and small returns to follow small returns. For example, consider Tesla (TSLA) prices over the period 1 Jan 2021 to 1 Jan 2024. The following figure shows the price series P(t), the corresponding return series r(t) = (P(t) - P(t-1)) / P(t-1), and a histogram showing the distribution of returns. Volatility clustering and fat tails can clearly be seen in the plots of the return series and the return distribution, respectively.
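To make the return formula and the notion of excess kurtosis concrete, here is a minimal sketch in Python with NumPy. The function names are illustrative, and the two samples are placeholder data (a normal sample and a Student-t sample with 5 degrees of freedom), not the TSLA series discussed above.

```python
import numpy as np

def simple_returns(prices):
    """r(t) = (P(t) - P(t-1)) / P(t-1), one value per consecutive price pair."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(prices) / prices[:-1]

def excess_kurtosis(x):
    """Sample kurtosis minus 3; values above 0 indicate fatter tails than a normal."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return float(np.mean(z**4) / np.mean(z**2) ** 2 - 3.0)

# Tiny worked example of the return formula.
prices = [100.0, 102.0, 99.96, 104.958]
print(simple_returns(prices))

# Placeholder samples (assumption: these stand in for real return data).
rng = np.random.default_rng(0)
normal_r = 0.02 * rng.standard_normal(100_000)
fat_r = 0.02 * rng.standard_t(df=5, size=100_000)

print("excess kurtosis, normal sample   :", excess_kurtosis(normal_r))
print("excess kurtosis, Student-t sample:", excess_kurtosis(fat_r))
```

The normal sample should give a value close to zero, while the Student-t sample should give a clearly positive one, which is the signature of fat tails.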

A naïve approach to generating a synthetic price series would be to model the distribution of returns, sample a sequence of returns from the model, and then convert the sequence of returns to a price series. But there are two major problems with this. Firstly, it is not straightforward to model, let alone sample from, fat-tailed (and possibly skewed) distributions. Secondly, and more importantly, even if we could successfully sample a sequence of returns, the result would not replicate recurring patterns and statistical regularities such as volatility clustering or mean reversion (the tendency of prices to return to a ‘normal’ level). This is because returns sampled in this way are independent and identically distributed. In reality, financial returns that display these stylized features are not independent, but are influenced by past returns. To capture temporal features such as volatility clustering and mean reversion, we therefore need to estimate a ‘conditional’ distribution; that is, a distribution that takes into account previous values of returns.
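The second problem can be illustrated with a short sketch. Since no real data is bundled here, the "historical" returns are produced by a GARCH(1,1)-style recursion with hypothetical parameters, used purely as a stand-in for a series with volatility clustering. The naïve generator then resamples those returns independently, which preserves their marginal distribution but discards the dependence on past values.

```python
import numpy as np

def autocorr(x, lag=1):
    """Sample autocorrelation of x at a positive lag."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return float(np.dot(z[:-lag], z[lag:]) / np.dot(z, z))

def garch_like_returns(n, rng, omega=1e-6, alpha=0.1, beta=0.85):
    """Stand-in for real returns (assumption): GARCH(1,1)-style variance recursion."""
    r = np.empty(n)
    var = omega / (1.0 - alpha - beta)  # start at the long-run variance
    for t in range(n):
        r[t] = np.sqrt(var) * rng.standard_normal()
        var = omega + alpha * r[t] ** 2 + beta * var
    return r

def returns_to_prices(returns, p0=100.0):
    """Invert r(t) = (P(t) - P(t-1)) / P(t-1), starting from price p0."""
    return p0 * np.cumprod(1.0 + np.asarray(returns, dtype=float))

rng = np.random.default_rng(0)
clustered = garch_like_returns(5000, rng)

# Naive generator: draw i.i.d. from the empirical return distribution.
iid = rng.choice(clustered, size=clustered.size, replace=True)

ac_clustered = autocorr(np.abs(clustered))
ac_iid = autocorr(np.abs(iid))
print("lag-1 autocorrelation of |r|, clustered series:", ac_clustered)
print("lag-1 autocorrelation of |r|, i.i.d. resample :", ac_iid)

naive_prices = returns_to_prices(iid)
```

Both series share the same histogram of returns, yet only the first should show a clearly positive autocorrelation in absolute returns. That gap is what a conditional model has to close.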

While ARCH/GARCH models are used to model and forecast volatility, they have not typically been used for actually generating time series. Neural network-based approaches such as GANs (generative adversarial networks), VAEs (variational autoencoders) and RNNs (recurrent neural networks) have emerged as popular approaches to synthetic financial time series generation, but suffer from limitations and difficulties including mode collapse and the requirement for large training sets. In contrast to neural network-based approaches, in which the conditional returns distribution is estimated implicitly, the UNCRi financial series generator uses a probabilistic approach in which the conditional distribution of returns is estimated explicitly, allowing it to capture features such as fat tails, volatility clustering and mean reversion that are observed in real financial data.

Daily close prices of Apple Inc (AAPL) from 1 Jan 2021 to 1 Jan 2024

The figure below shows price series information for daily close prices of Apple Inc. shares (AAPL) on the NASDAQ for the 3-year period from 1 Jan 2021 to 1 Jan 2024. The top row corresponds to the historical series, and the second and third rows to synthetic series generated by the UNCRi synthetic financial time series generator. From left to right, the plots show (i) the return distribution; (ii) the return series; (iii) the price series trajectory corresponding to the return series; (iv) the autocorrelation of returns (which indicates whether any serial correlation is present); and (v) the autocorrelation of absolute returns (which reflects the presence and degree of any volatility clustering).

Historical series
Synthetic series
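The two autocorrelation panels described above can be reproduced numerically with a few lines of NumPy. This is a sketch only: the `acf` helper is illustrative, the returns are an i.i.d. normal placeholder rather than AAPL data, and the ±1.96/√n band is the usual approximate 95% bound for a series with no serial correlation.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    denom = np.dot(z, z)
    return np.array([np.dot(z[:-k], z[k:]) / denom for k in range(1, max_lag + 1)])

# Placeholder returns (assumption): about three years of daily observations.
rng = np.random.default_rng(42)
returns = 0.01 * rng.standard_normal(750)

band = 1.96 / np.sqrt(returns.size)   # approximate 95% bound under no correlation
acf_returns = acf(returns, 20)        # panel (iv): serial correlation
acf_abs = acf(np.abs(returns), 20)    # panel (v): volatility clustering

print("band: +/-", round(band, 3))
print("lags of returns outside band  :", int(np.sum(np.abs(acf_returns) > band)))
print("lags of |returns| outside band:", int(np.sum(np.abs(acf_abs) > band)))
```

For a real or well-generated synthetic series, one would expect the first count to stay small and the second to be large, with the autocorrelation of absolute returns decaying slowly across lags.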

Daily close prices of S&P500 (^GSPC) from 1 Jan 2010 to 1 Jan 2020

In this case we consider a longer time period (10 years) and use an index (S&P500) as opposed to a single asset price. As above, we show the original series and two synthetic series. Both synthetic series display fat tails and volatility clustering similar to the original series, and again the evolution of prices is realistic and reflects the broadly exponential pattern of price increase present in the original series.

Historical series
Synthetic series

The fat tails and volatility clustering in the synthetic series closely reflect those present in the historical data, and there is no significant serial correlation present in either the historical or synthetic series. The two synthetic price trajectories are quite different from each other and from the historical price series, but they are both confined to a reasonable range, indicating that mean reversion has been captured.

Multiple Correlated Time Series

There are situations (e.g., portfolio management) in which it is necessary to consider the price series of multiple assets simultaneously. In many cases these asset prices will be correlated to some degree, either positively or negatively, so it is important that the synthetic series derived from them are correlated in the same way. Generating the series independently won’t work. However, the UNCRi framework is able to simultaneously generate multiple correlated price series while maintaining the stylized features of each asset. The figure below shows the original price series and three synthetic price series for the tech stocks Apple (AAPL), Google (GOOG), Facebook (META) and Microsoft (MSFT) over the period from 1 Jun 2021 to 1 Jan 2024. The price changes of these stocks are strongly positively correlated with each other, as can be seen in the original price series. The synthetic series have clearly captured these correlations.

Original series
Synthetic series 1
Synthetic series 2
Synthetic series 3

For comparison, the following figure shows synthetic price trajectories for the same four companies, but WITHOUT correlations.

UNCORRELATED price series for the 4 tech stocks
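The contrast between the correlated and uncorrelated cases can be sketched with a deliberately simple stand-in. The code below is not the UNCRi method: it draws Gaussian returns (so it ignores fat tails and volatility clustering) and imposes a hypothetical correlation of 0.6 between four assets using a Cholesky factor, then compares the result with returns drawn independently.

```python
import numpy as np

def correlated_normal_returns(n, corr, vol, rng):
    """Gaussian returns whose cross-asset correlation matrix is `corr`."""
    corr = np.asarray(corr, dtype=float)
    chol = np.linalg.cholesky(corr)            # corr = chol @ chol.T
    z = rng.standard_normal((n, corr.shape[0]))
    return (z @ chol.T) * vol                  # rows: days, columns: assets

rng = np.random.default_rng(1)
n_days, n_assets, vol = 2000, 4, 0.015

# Hypothetical correlation structure (assumption): 0.6 between every pair.
corr = np.full((n_assets, n_assets), 0.6)
np.fill_diagonal(corr, 1.0)

joint = correlated_normal_returns(n_days, corr, vol, rng)
indep = vol * rng.standard_normal((n_days, n_assets))

joint_corr = np.corrcoef(joint, rowvar=False)
indep_corr = np.corrcoef(indep, rowvar=False)
print("jointly generated:\n", joint_corr.round(2))
print("independently generated:\n", indep_corr.round(2))

# Price paths, each starting at 100.
joint_prices = 100.0 * np.cumprod(1.0 + joint, axis=0)
```

The jointly generated returns should recover off-diagonal correlations near 0.6, while the independent ones should sit near zero. Price paths built from the independent returns therefore wander apart in a way that co-moving stocks typically do not.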

Conclusions

Financial time series often display stylized features such as fat tails, volatility clustering and mean reversion. To capture these features in a synthetic dataset it is not sufficient to model the unconditional distribution of returns; rather, one must be able to model the distribution of returns conditional on past values. The UNCRi framework has been designed specifically to model conditional probability distributions, and when applied to financial data it is able to generate time series that accurately reflect the features of the original series. It is also possible to use the framework to simultaneously generate multiple asset series that capture the correlations between those asset prices while maintaining the stylized features of the individual series.