Welcome back to our Research Insight series on everything Trade Ideas math and probabilities. Today’s discussion was a hotly requested topic from our Option Alpha Community. Traders wanted to know if all probability is created equal, and if not, where should they focus their trading? With the S&P 500 skyrocketing 10% to the upside in a few short weeks after the Fed's FOMC meeting, late October through November 2023 offered a fantastic opportunity to study this very question.
Right now is the best time for us, the impartial researchers, to evaluate how probability metrics are doing in some of the worst market conditions imaginable. Yes, you read that correctly. Outsized moves to the upside or downside are equally brutal for options traders attempting to remain neutral in the market. Long-only investors rejoice!
We will dive into calibration plots galore and discover how our coveted Trade Ideas have performed.
Collective Expected Value Over Time
To begin, let’s take a look at how the entire database of Trade Ideas Expected Value (EV) performance has evolved since we started the collection in mid-July 2023, a few days shy of the year-to-date peak in the middle of Figure 1.
If you’re just joining us, we launched Trade Ideas 2.0 on July 1st, 2023. Shortly after that, I started collecting the end-of-day Trade Ideas published at 3:55 PM ET and stored them in a database. Trade Ideas consists of the best short put/call spread and iron condor trade setups from approximately 155 underlying symbols anywhere from 1 to 45 days to expiration.
The intent is to let each trade setup expire, calculate its profitability at expiration, and then evaluate how well the metrics recorded at inception (probability, expected value, alpha, reward/risk, etc.) predicted that profitability.
One of the best visual tools we need is a calibration plot, in which we group trades into bins or buckets based on the metrics we’re studying and compare them to the “perfect” representation of that value. In other words, in a perfect world, if we expect to make $10 per trade, then on average, over thousands of trades, we should make $10 – that’s the perfect line.
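As a rough illustration of the bucketing behind a calibration plot, here is a minimal Python sketch. The function name `calibration_table`, the $10 bucket width, and the sample trades are all hypothetical and made up for this example; this is not the code behind the figures.

```python
import math
from collections import defaultdict

def calibration_table(trades, bucket_width=10.0):
    """Group (expected_value, realized_pnl) pairs into EV buckets.

    Returns rows of (bucket_low, mean_ev, mean_pnl, count), sorted by bucket.
    On a perfectly calibrated data set, mean_pnl equals mean_ev in every row.
    """
    buckets = defaultdict(list)
    for ev, pnl in trades:
        # Lower edge of the bucket this EV falls into, e.g. 12 -> 10.
        low = math.floor(ev / bucket_width) * bucket_width
        buckets[low].append((ev, pnl))

    rows = []
    for low in sorted(buckets):
        pairs = buckets[low]
        n = len(pairs)
        mean_ev = sum(ev for ev, _ in pairs) / n
        mean_pnl = sum(pnl for _, pnl in pairs) / n
        rows.append((low, mean_ev, mean_pnl, n))
    return rows

# Made-up trades: (EV at entry, realized P/L at expiration), in dollars.
sample = [(12.0, 40.0), (15.0, -20.0), (18.0, 25.0), (3.0, 10.0), (7.0, -5.0)]
table = calibration_table(sample)
for low, mean_ev, mean_pnl, n in table:
    print(f"bucket ${low:.0f}: mean EV {mean_ev:.2f}, mean P/L {mean_pnl:.2f}, n={n}")
```

Plotting `mean_ev` on the x-axis against `mean_pnl` on the y-axis, alongside the line y = x, gives the calibration plot described above. Individual trades can land far from their EV; only the bucket averages are expected to track the perfect line.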
In Figure 2, we’re observing the Expected Value (EV) of the growing Trade Ideas data set over time in approximately 2-week increments.
In the graph, we can see that as trade counts increased, the EV curve collapsed beautifully onto the perfect line up until mid-October. But in the spirit of full transparency, we’re showing you how the November rally killed the near-perfect line Trade Ideas had drawn on October 15th. The good news is that although the Average P/L dropped in November, it has remained consistently in positive territory since July. However, our analysis is only as good as our assumptions.
I have a theory: The nature of Trade Ideas generation is such that short-dated contracts will appear and expire more frequently than long-dated contracts. And if that’s true, then the more abundant and frequently expiring short-dated contracts are responsible for dragging down the average in the November run-up.
Let’s see if there’s any validity to that statement.
EV and Alpha Broken Down by Days to Expiration
To test my hypothesis, we need to break down the big calibration plot into discrete parts. Since Trade Ideas typically finds opportunities from 1 to 45 days to expiration (DTE), we can divide the DTE ranges into nine distinct groups of 5 days. First, we’ll look at the performance of Expected Value.
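One way to form the nine groups is sketched below. The exact bucket edges (1-5, 6-10, ..., 41-45) are an assumption, since the article only says "nine distinct groups of 5 days", and the helper name `dte_bucket` is hypothetical.

```python
def dte_bucket(dte, width=5, min_dte=1, max_dte=45):
    """Return the (low, high) DTE range containing `dte`, e.g. 18 -> (16, 20).

    Assumed edges: 1-5, 6-10, ..., 41-45 (nine groups of five days).
    """
    if not min_dte <= dte <= max_dte:
        raise ValueError(f"DTE {dte} is outside the {min_dte}-{max_dte} range")
    index = (dte - min_dte) // width
    low = min_dte + index * width
    return (low, low + width - 1)

# Every DTE from 1 to 45 maps to exactly one of nine buckets.
all_buckets = sorted({dte_bucket(d) for d in range(1, 46)})
print(len(all_buckets), all_buckets[0], all_buckets[-1])
```

Each bucket can then be passed through the same calibration procedure as the full data set, producing one line per DTE group.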
In Figure 3, we can see a pattern emerge. Two distinct groups or clusters formed: one that continues to track the perfect EV line, and the other trending toward the negative P/L territory. Interestingly, the two groups split at approximately the 20-day mark, meaning poor performance only starts to occur inside of 20 DTE.
In Figure 4, we test a similar methodology against Alpha, which is the normalized Expected Value defined as EV divided by the absolute value of the maximum loss. Alpha is quoted as a percentage and is most aptly described as the anticipated Return on Risk (RoR) over time. We can plot Alpha the same way we do for EV by comparing Alpha vs. Average RoR instead of EV vs. Average P/L.
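The Alpha definition above translates directly into code. In this small sketch, the function names are hypothetical, and the realized RoR formula (realized P/L divided by the absolute maximum loss) is an assumption chosen to mirror the Alpha definition; the dollar figures are made up.

```python
def alpha(expected_value, max_loss):
    """Alpha = EV / |max loss|, returned as a fraction (0.03 means 3%)."""
    if max_loss == 0:
        raise ValueError("max_loss must be non-zero")
    return expected_value / abs(max_loss)

def return_on_risk(realized_pnl, max_loss):
    """Realized counterpart of Alpha (assumed): P/L at expiration / |max loss|."""
    if max_loss == 0:
        raise ValueError("max_loss must be non-zero")
    return realized_pnl / abs(max_loss)

# Made-up spread: $12 expected value against a $400 maximum loss.
a = alpha(12.0, -400.0)
# The same spread expiring at full loss versus a $100 profit.
worst = return_on_risk(-400.0, -400.0)
good = return_on_risk(100.0, -400.0)
print(f"alpha {a:.1%}, worst-case RoR {worst:.0%}, profitable RoR {good:.0%}")
```

Normalizing by the maximum loss is what makes trades of different widths comparable on the same axis.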
The results here are less clear-cut than they were in Figure 3. However, there are definite similarities. The cluster above 20 DTE hugs the perfect RoR line much more closely than below 20 DTE. Very short-dated positions continue to be the least predictable, dipping well into negative territory.
The bright spot of both graphs is the reconfirmation of both +EV and +Alpha we found to be favorable in previous research. Both still exist, achieving positive RoR over time. Data points close to y-zero also have the highest sample density, so their averages are reliable. We posit the dip in higher Alpha may be a result of the November drag; even so, it remains in a very favorable positive RoR range.
Since we’ve discovered the 20 DTE split, let’s dive deeper and investigate how it relates to the probabilities presented in Trade Ideas.
Splitting EV and Alpha Above/Below 20 DTE by POP
Let’s first look at ol’ reliable, our standard calibration plot for all Trade Ideas captured from mid-July through the end of November. We will apply a single filter to disregard any opportunities lower than 20 DTE, effectively splitting the dataset in half.
Figure 5 includes all expired Trade Ideas in the data set through the end of November, split into $10 EV buckets. This graph looks exactly how I’d expect (and hope) a well-fitting EV trend line to look.
We may be on to something here. Let’s break down EV and Alpha by Probability of Profit (POP) ranges and split the results above/below 20 DTE. Recall that POP, as defined by Option Alpha, is the probability of any profit, or at least $1 in profit. In other words, it’s the probability that the underlying price is above or below the breakeven price at expiration.
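To make the breakeven idea concrete, here is a hedged sketch of a POP calculation for a short put spread. This article does not give Option Alpha's exact formula, so everything here is an assumption: the textbook lognormal (Black-Scholes style) probability, a zero interest rate, a 365-day year, and the hypothetical function names and inputs.

```python
import math
from statistics import NormalDist

def prob_above(spot, level, vol, dte, rate=0.0):
    """Lognormal-model probability that the price ends above `level`.

    vol is annualized (0.20 means 20%); dte is calendar days to expiration.
    """
    t = dte / 365.0
    d2 = (math.log(spot / level) + (rate - 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    return NormalDist().cdf(d2)

def pop_short_put_spread(spot, short_strike, credit, vol, dte):
    """POP for a short put spread: price must finish above the breakeven."""
    breakeven = short_strike - credit  # short strike minus credit received
    return prob_above(spot, breakeven, vol, dte)

# Made-up inputs: stock at $100, short 95 put, $1.00 credit, 20% vol.
pop_30 = pop_short_put_spread(100.0, 95.0, 1.00, 0.20, dte=30)
pop_5 = pop_short_put_spread(100.0, 95.0, 1.00, 0.20, dte=5)
print(f"POP at 30 DTE: {pop_30:.1%}, at 5 DTE: {pop_5:.1%}")
```

For a short call spread the same idea applies with the inequality reversed: the price must finish below the short strike plus the credit. Note how sensitive the result is to the `vol` input, which matters for the discussion of volatility windows later in this article.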
Figure 6 is a fascinating image. It clearly shows the nearly identical clustering we saw in Figure 3, confirming the DTE theory.
Trade Ideas opportunities under 20 DTE are underperforming any/all opportunities at or above 20 DTE.
Looking at only the ≥ 20 DTE group, we can now see differences in probability performance. The more closely an individual line tracks the Perfect EV line, the better. The clear winners are in the 50-80% range. The 80-90% range had lackluster performance in comparison. But logically, this makes sense – it’s not possible to predict anything with 90% confidence all of the time.
But if that’s the case, why do Trade Ideas produce 90+% probabilities? Good question. The answer is imperfect modeling. We are trying to fit a non-Gaussian financial market that doesn’t want to be modeled into a Gaussian Black-Scholes (normal distribution) model. We know from the outset that it won’t be perfect, so we have to search for bright spots where it does work most of the time.
I’ve found it very difficult to convey the true nature of estimated probabilities to traders. Just because something is quoted as having a POP of 95% does not necessarily mean the position is going to expire in profit 95% of the time. We are right to be skeptical of that number.
Figure 7, which is identical to Figure 6 except adjusted for Alpha instead of EV, exhibits many of the same characteristics as the EV plot. We notice a similar performance and clustering of the groups above and below 20 DTE.
What’s interesting is that all categories appear somewhat logarithmic in nature, as if there’s an upper bound on actual RoR around 20%. In other words, trading higher than 20% theoretical Alpha did not produce a real average RoR higher than 20-25% in these extremely difficult market conditions.
Another bright spot on the graph is that regardless of which probability range and DTE group traded, positive alpha yielded an average RoR above the zero line in almost all cases. This means that regardless of how you do it, trading +Alpha taken to expiration should at least break even over time.
Conclusions
Our investigation into Trade Ideas' Expected Value (EV) and Alpha across various Days to Expiration (DTE) in the volatile late 2023 market reveals key insights. We observed a significant performance split at the 20 DTE mark, with longer DTE trades aligning more closely with our models, particularly in the 50-80% Probability of Profit (POP) range. Short-dated trades under 20 DTE demonstrated reduced predictability and profitability.
This analysis highlights the importance of considering DTE and moderate POP values in trading strategies, emphasizing a cautious approach to high-probability predictions. Overall, our findings stress the need for a nuanced understanding of probability metrics in trading, particularly in challenging market conditions.
It remains to be seen if short-dated options probabilities can be predicted accurately. But maybe the discoveries made in this article are a self-fulfilling prophecy. The volatility input we use for Black-Scholes is the 30-day standard deviation of returns, or 30-day historical volatility (HV). So it stands to reason that around 30 days to expiration is going to be the most accurate and predictive range. Perhaps it is flawed to think the 30-day HV can represent probabilities of short-dated options. But if that is true, then the same must hold for black-box implied volatilities, which are usually composed of a weighted average of strikes around at-the-money (ATM) in the front and back months.
An area I’d like to explore in future research is whether the predictive EV is more accurate by choosing a “reasonably good” timeframe to represent most options trades, like 30 days. Or perhaps an HV exactly equal to the days to expiration of the contract would be superior.
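The two candidates can be compared with a single helper that takes the lookback window as a parameter. This is a sketch only: the use of daily log returns, the 252-day annualization factor, the interpretation of "30-day" as 30 return observations, and the function name are all assumptions, and the price series is synthetic.

```python
import math
import statistics

def historical_volatility(closes, window, periods_per_year=252):
    """Annualized sample standard deviation of the last `window` daily log returns."""
    if window < 2:
        raise ValueError("window must be at least 2 returns")
    if len(closes) < window + 1:
        raise ValueError("need window + 1 closing prices")
    recent = closes[-(window + 1):]
    log_returns = [math.log(b / a) for a, b in zip(recent, recent[1:])]
    return statistics.stdev(log_returns) * math.sqrt(periods_per_year)

# Synthetic closing prices, for illustration only.
closes = [100.0 + math.sin(i) for i in range(40)]

hv_30 = historical_volatility(closes, window=30)   # fixed "reasonably good" window
hv_dte = historical_volatility(closes, window=7)   # window matched to a 7 DTE contract
print(f"30-day HV: {hv_30:.1%}, 7-day HV: {hv_dte:.1%}")
```

A shorter window reacts faster to recent moves but is estimated from fewer observations, so it is noisier. Whether that trade-off improves EV calibration for short-dated contracts is exactly the open question.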
So far, all of the research we’ve done indicates that the most favorable opportunities, given how Trade Ideas is currently defined, are:
- +Alpha trades
- POP between 50-80%
- ≥ 20 DTE
Both the EV and Alpha results are reconfirmations from previous research we’ve done, so neither is surprising. However, the November 2023 outsized move to the upside has allowed us to focus on DTE and probability performance. For the first time, we’re seeing a real set of comprehensive guidelines emerge.