Generating synthetic financial time series can be a valuable tool in many aspects of financial modeling and analysis. For example, trading strategy designers may want to test strategies under extreme conditions such as market crashes. The problem is that historical data is limited and contains few examples of such extreme events. Synthetic data can be used to fill these gaps.

The challenge in generating synthetic data lies in the fact that most financial time series of interest exhibit ‘stylized’ features. These include fat tails (or ‘excess kurtosis’), meaning that extreme events or outliers are more likely to occur than under a normal distribution; volatility clustering, the tendency for large returns to follow large returns (of either sign) and small returns to follow small returns; and mean reversion, the tendency for prices to correct themselves, returning to more ‘normal’ levels based on fundamentals like price/earnings ratios. For example, consider Tesla (TSLA) prices over the period 1 Jan 2021 to 1 Jan 2024. The following figure shows the price series P(t), the corresponding return series r(t) = (P(t) - P(t-1)) / P(t-1), and a histogram showing the distribution of returns. Mean reversion, volatility clustering, and fat tails, respectively, can clearly be seen in the three plots.
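As a minimal sketch of the quantities just defined, the following Python snippet computes the return series r(t) from a price series and measures fat tails via excess kurtosis. The function names and the toy prices are illustrative assumptions, not part of the original analysis; a positive excess kurtosis indicates tails heavier than those of a normal distribution.

```python
import numpy as np

def simple_returns(prices):
    """r(t) = (P(t) - P(t-1)) / P(t-1) for t = 1, ..., len(prices) - 1."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0

def excess_kurtosis(x):
    """Sample kurtosis minus 3 (3 is the kurtosis of a normal distribution)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0

# Made-up prices, for illustration only (not TSLA data).
prices = np.array([100.0, 102.0, 99.96, 104.958])
returns = simple_returns(prices)  # approximately [0.02, -0.02, 0.05]
```

Applied to a real daily return series, excess_kurtosis would be expected to come out well above zero, which is the fat-tail property described above.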

Synthetic time series are typically generated not from the actual price series, but from the returns series. A naïve approach would be to model the distribution of returns, sample a sequence of returns from the model, and then convert the sequence of returns to a price series. But there are two major problems with this. Firstly, it is not straightforward to model—let alone sample from—fat-tailed (and possibly skewed) distributions. Secondly, and more importantly, even if we could successfully sample a sequence of returns, this will not replicate recurring patterns and statistical regularities such as volatility clustering and mean reversion. This is because returns sampled in this way will be independent and identically distributed, whereas in reality, returns are influenced by past returns. To capture temporal features such as volatility clustering and mean reversion we need therefore to estimate ‘conditional’ distributions; that is, distributions that take into account the previous values of returns.
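The second problem can be illustrated with a small experiment. The sketch below is an assumption-laden toy, not the method used in this case study: it builds a return series with volatility clustering from a simple GARCH(1,1)-style recursion with made-up parameters, then applies the naïve approach by drawing returns independently from the empirical distribution and converting them to prices. The lag-1 correlation of absolute returns is used as a rough measure of clustering.

```python
import numpy as np

def simulate_clustered_returns(n, omega=1e-6, alpha=0.1, beta=0.85, seed=0):
    """Toy GARCH(1,1)-style recursion; parameters are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    returns = np.empty(n)
    var = omega / (1.0 - alpha - beta)  # start at the long-run variance
    for t in range(n):
        returns[t] = np.sqrt(var) * rng.standard_normal()
        var = omega + alpha * returns[t] ** 2 + beta * var
    return returns

def returns_to_prices(returns, initial_price):
    """Invert r(t) = (P(t) - P(t-1)) / P(t-1): P(t) = P(t-1) * (1 + r(t))."""
    growth = np.concatenate(([1.0], 1.0 + np.asarray(returns, dtype=float)))
    return initial_price * np.cumprod(growth)

def lag1_abs_autocorr(returns):
    """Correlation between |r(t)| and |r(t+1)|; near zero for i.i.d. returns."""
    a = np.abs(returns)
    return np.corrcoef(a[:-1], a[1:])[0, 1]

clustered = simulate_clustered_returns(5000)

# Naive approach: sample returns i.i.d. from the empirical distribution.
rng = np.random.default_rng(1)
iid_sample = rng.choice(clustered, size=clustered.size, replace=True)
synthetic_prices = returns_to_prices(iid_sample, initial_price=100.0)

print("clustering, original:", lag1_abs_autocorr(clustered))
print("clustering, i.i.d.  :", lag1_abs_autocorr(iid_sample))
```

The i.i.d. sample has exactly the same marginal distribution of returns as the original series, yet its clustering measure should fall to roughly zero, because the ordering information has been discarded.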

While ARCH/GARCH models are used to model and forecast volatility, they have not typically been used for generating financial time series. Neural network-based approaches such as GANs (generative adversarial networks), VAEs (variational autoencoders) and RNNs (recurrent neural networks) have emerged as popular alternatives. However, their performance in this domain has been inconsistent, particularly in capturing features such as volatility clustering and mean reversion. They also suffer from limitations and difficulties including training instability, the requirement for large training sets, and limited interpretability.

As with all the UNCRi tools, the UNCRi financial time series generator uses a non-parametric approach. Since there is no actual ‘training’ phase, it is more computationally tractable than parametric approaches, and far more modest in the size of the input dataset required. And because the conditional distribution of returns is explicitly estimated, it can be directly visualized, so interpretability is not a problem. Most importantly, it can easily capture long-term dependencies between returns, meaning that it can produce synthetic financial time series that are far more ‘realistic’ than those produced by parametric methods such as GANs and VAEs.

Daily close prices of Apple Inc (AAPL) from 1 Jan 2021 to 1 Jan 2024

The figure below shows price series information for daily close prices of Apple Inc. shares (AAPL) on the NASDAQ for the 3-year period from 1 Jan 2021 to 1 Jan 2024. The top row corresponds to the historical series, and the second and third rows to synthetic series generated by the UNCRi synthetic financial time series generator. From left to right, plots show (i) the return distribution; (ii) the returns series; (iii) the price series trajectory corresponding to the return series; (iv) the autocorrelation of returns (which indicates whether any serial correlation is present); and (v) the autocorrelation of absolute returns (which reflects the presence and degree of any volatility clustering).

Historical series
Synthetic series
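The two autocorrelation panels, (iv) and (v), can be computed along the following lines. This is a generic sketch of the standard sample autocorrelation function, not the plotting code used for the figure; the function names and the 1.96/sqrt(n) band (the usual approximate 95% band under a white-noise assumption) are choices made here for illustration.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.dot(centered, centered)
    return np.array([
        np.dot(centered[:-k], centered[k:]) / denom
        for k in range(1, max_lag + 1)
    ])

def white_noise_band(n):
    """Approximate 95% band for the autocorrelation of white noise."""
    return 1.96 / np.sqrt(n)

# Hypothetical usage, given a 1-D array `returns` of daily returns:
#   acf_returns = autocorrelation(returns, max_lag=20)          # panel (iv)
#   acf_abs     = autocorrelation(np.abs(returns), max_lag=20)  # panel (v)
#   band        = white_noise_band(len(returns))
```

Values of acf_returns inside the band suggest no significant serial correlation, while values of acf_abs that stay above the band over many lags are the signature of volatility clustering.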

Daily close prices of S&P500 (^GSPC) from 1 Jan 2010 to 1 Jan 2020

In this case we consider a longer time period (10 years) and use an index (S&P500) as opposed to a single asset price. As above, we show the original series and two synthetic series. Both synthetic series display fat tails and volatility clustering similar to the original series, and again, the evolution of prices is realistic and reflects the general exponential pattern of price increase present in the original series.

Historical series
Synthetic series

The fat tails and volatility clustering in the synthetic series closely reflect those present in the historical data, and there is no significant serial correlation present in either the historical or synthetic series. The two synthetic price trajectories are quite different from each other and from the historical price series, but they are both confined to a reasonable range, indicating that the mean reversion has been captured.

Multiple Correlated Time Series

There are situations (e.g., portfolio management) in which it is necessary to consider the price series of multiple assets simultaneously. In many cases these asset prices will be correlated to some degree, either positively or negatively, so it is important that the synthetic series derived from them are correlated in the same way. Generating the series independently won’t work. However, the UNCRi framework is able to simultaneously generate multiple correlated price series while maintaining the stylized features of each asset. The figure below shows the original price series and three synthetic price series for the tech stocks Apple (AAPL), Google (GOOG), Facebook (META) and Microsoft (MSFT) over the period from 1 Jun 2021 to 1 Jan 2024. The price changes of these stocks are strongly positively correlated with each other, as can be seen in the original price series. The synthetic series have clearly captured these correlations.

Original series
Synthetic series 1
Synthetic series 3
Synthetic series 2

For comparison, the following figure shows synthetic price trajectories for the same four companies, but generated without correlations.

UNCORRELATED price series for the 4 tech stocks
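The difference between joint and independent generation can be seen even with a very crude resampling scheme. The sketch below is a toy illustration only and is not the UNCRi procedure: it draws made-up correlated returns for four hypothetical assets from a multivariate normal distribution, then compares resampling whole days (rows) against resampling each asset separately. Note that both schemes ignore temporal structure; the point here is solely the cross-asset correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_assets = 2000, 4

# Toy data: daily returns with pairwise correlation 0.7 (an assumed value).
target_corr = np.full((n_assets, n_assets), 0.7)
np.fill_diagonal(target_corr, 1.0)
returns = rng.multivariate_normal(
    mean=np.zeros(n_assets), cov=1e-4 * target_corr, size=n_days
)

def resample_jointly(returns, rng):
    """Draw whole days, so each row keeps its cross-asset relationship."""
    rows = rng.integers(0, len(returns), size=len(returns))
    return returns[rows]

def resample_independently(returns, rng):
    """Draw each asset's returns on its own, breaking the link between assets."""
    n = len(returns)
    return np.column_stack([
        returns[rng.integers(0, n, size=n), j] for j in range(returns.shape[1])
    ])

def mean_offdiag_corr(returns):
    """Average pairwise correlation between different assets."""
    corr = np.corrcoef(returns, rowvar=False)
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return corr[mask].mean()

joint = resample_jointly(returns, rng)
independent = resample_independently(returns, rng)
print("joint      :", mean_offdiag_corr(joint))
print("independent:", mean_offdiag_corr(independent))
```

Under these assumptions the jointly resampled returns should retain an average pairwise correlation close to 0.7, while the independently resampled ones should be close to zero, which mirrors the contrast between the correlated and uncorrelated figures.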

Conclusions

Financial time series often display stylized features such as fat tails, volatility clustering and mean reversion. To capture these features in a synthetic dataset it is not sufficient to model the unconditional distribution of returns; rather, one must be able to model the distribution of returns conditional on past values. The UNCRi framework has been designed specifically to model conditional probability distributions, and when applied to financial data is able to generate realistic time series which accurately reflect the stylized features present in the original series. It is also possible to use the framework to simultaneously generate multiple asset series which capture the correlations between those asset prices while at the same time maintaining the stylized features of the individual series. 

The examples in this case study have been based on using only numerical input data, specifically lagged returns. However, the UNCRi framework is very flexible, and it is straightforward to also include categorical features. Thus, in addition to returns we might also wish to include ‘auxiliary’ features, which can be a mix of numerical and categorical. These might include fundamentals such as price/earnings ratios, economic variables, sentiment, and so on.
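To make "lagged returns" concrete, the following sketch shows one common way to arrange a return series into conditioning inputs and targets. The function name and layout are assumptions made for illustration and do not describe the UNCRi input format; auxiliary numerical or categorical features would simply be appended as further columns aligned with the same rows.

```python
import numpy as np

def lagged_features(returns, n_lags):
    """Build (X, y) where row i of X holds the n_lags returns preceding y[i].

    Column 0 is the most recent lag, r(t-1); column n_lags-1 is r(t-n_lags).
    """
    returns = np.asarray(returns, dtype=float)
    n = len(returns)
    X = np.column_stack(
        [returns[n_lags - k : n - k] for k in range(1, n_lags + 1)]
    )
    y = returns[n_lags:]
    return X, y

# Toy values chosen so the alignment is easy to check by eye.
X, y = lagged_features([1.0, 2.0, 3.0, 4.0, 5.0], n_lags=2)
# X rows: [2, 1], [3, 2], [4, 3]; y: [3, 4, 5]
```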