The Data
Let’s start by discussing the pool of race data we’re using in these new metrics, as it’s a question that pops up often in Discord:
- All 1600m race data excluding donkey tourneys, discovery races, and no-prize, non-tourney frees
- All other distance data since 8/24 excluding donkey tourneys, discovery races, and no-prize, non-tourney frees
The main reason we have to look at 8/24 for non-1600m data is the racing algo change on that date. Pre-8/24, the negative rolls were so frequent/strong that they added artificial variance to horses with stronger distance preference, making both VAR and DP much more convoluted and difficult to isolate. We’ll always have a bit of DP race roll noise in the data, but its effects are now minimal and we’re at least looking at a horse’s “skill” under the current algo, apples to apples. Because 1600m data is so crucial to BA calculation AND immune from DP noise, we’re using all of that data.
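If it helps to see the filter written out, it boils down to something like this (a rough Python sketch; the field names, race-type labels, and the year are my assumptions for illustration, not our actual schema):

```python
from datetime import date

ALGO_CHANGE = date(2021, 8, 24)  # the 8/24 algo change; year assumed for illustration
EXCLUDED_TYPES = {"donkey_tourney", "discovery", "no_prize_free"}  # hypothetical labels

def in_metric_pool(race):
    """True if a race belongs in the pool feeding these metrics."""
    if race.race_type in EXCLUDED_TYPES:
        return False
    # 1600m races are used from all time; every other distance only post-algo-change
    return race.distance == 1600 or race.date >= ALGO_CHANGE
```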
The Challenge
When trying to accurately approximate a horse’s base ability, distance preference, and variance, sample size across all distances is king. Ideally, for base ability we have a good 1600m sample. For DP, a strong sample on either side of 1600m. For Var, we just want as much data as possible. So simply said, the perfect horse has a ton of data everywhere. Well, that list is incredibly short. Like ~800 out of 200k+ short. Our challenge was to build a multitude of BA, VAR, and DP metrics with logic that would pull data from wherever it had the most sample and use the most supported one. Before we get to how that works, let’s look at the “race distribution” archetypes we were trying to solve for (there’s a rough sketch of how we bucket these right after the list):
#1 Data everywhere (limited population but easy)
#2 Solid data at either extreme but little to nothing at 1600 (typically mid/low BA, mid/high DP…horses with enough DP they are running to their strong side to win and weak side to down)
#3 Solid data at one extreme and little to no data at 1600m or the opposite side (high BA, mid/high DP…usually low Z/high BA horses with no ability to down-class so don’t bother with the weak side)
#4 Little to no data anywhere (you’re screwed, we’re good but we’re not Aloha Tim)
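In code, the bucketing is roughly this (the thresholds are made up for illustration; our actual sample cutoffs differ):

```python
def race_archetype(n_1600, n_short, n_long, min_n=10):
    """Bucket a horse by where its race sample lives. min_n is illustrative."""
    has_1600 = n_1600 >= min_n
    has_short = n_short >= min_n  # races well under 1600m
    has_long = n_long >= min_n    # races well over 1600m
    if has_1600 and has_short and has_long:
        return 1  # data everywhere
    if has_short and has_long:
        return 2  # both extremes, little to nothing at 1600m
    if has_short or has_long:
        return 3  # one extreme only
    return 4  # little to no data anywhere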
In order to provide the most useful data we needed to calculate a shit ton of metrics to support each potential race archetype. For BA alone we have zBA (pop average for that Z/B/B), eBA (puts the parents’ zBA into our breeding formula), 16BA (1600 data), dpBA (puts BADP at extremes into a BA formula), and a logic BA that defaults to eBA until an acceptable sample is met in one of the others.
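The fallback behavior of that logic BA is simple in spirit; something like this sketch (metric names match the ones above, but the attribute names and sample thresholds are invented):

```python
def logic_ba(horse, min_1600=8, min_extremes=12):
    """Pick the best-supported BA estimate; thresholds are illustrative."""
    if horse.n_1600 >= min_1600:
        return horse.ba_16   # enough 1600m races: trust 16BA
    if horse.n_extremes >= min_extremes:
        return horse.ba_dp   # enough extreme-distance races: back into BA via dpBA
    return horse.ba_e        # otherwise default to the breeding-based eBA
```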
For DP, we required a u14BADP, o18BADP, shortDP, longDP, DP, and a logic DP that selects the highest-sample option using a laundry list of conditions. I don’t mention any of this to confuse or wow, more so to explain why it’s taken a bit longer than expected. Every metric spider-webbed into a need for a new metric or additional correlation research to tweak a formula here and there.
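The logic DP works the same way at heart; a heavily simplified sketch (the real condition list is much longer, and the attribute names here are placeholders):

```python
def logic_dp(horse, min_n=10):
    """Choose the DP estimate with the most races behind it; a simplification."""
    candidates = [
        (horse.n_u14, horse.dp_u14),      # from u14BADP (under-1400m races)
        (horse.n_o18, horse.dp_o18),      # from o18BADP (over-1800m races)
        (horse.n_short, horse.dp_short),  # shortDP
        (horse.n_long, horse.dp_long),    # longDP
        (horse.n_all, horse.dp),          # overall DP
    ]
    supported = [(n, dp) for n, dp in candidates if n >= min_n]
    if not supported:
        return None  # archetype #4: not enough data anywhere
    return max(supported, key=lambda t: t[0])[1]
```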
Perhaps the biggest challenge we face now is maintaining the competition weighting needed in an ever-changing meta. What used to change monthly turned weekly, and now the levels of competition in various class/fee races are changing every day. ZED implementing a SANE version of ELO was supposed to go a long way in helping us weight every single race by the ELO average of the horses in that race. Unfortunately, ZED’s “ELO” implementation has completely disconnected a horse’s ability from its ELO. At least in the short term, this is going to cause some turbulence. Until those ELOs represent ability, we’re forced to stay on our class/fee weighting system that is ALSO now fairly disconnected from the talent in each class. Addressing this is our top priority in the coming weeks/months.
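For the curious, the difference between the two weighting approaches is easy to show (a sketch only; every number and key here is invented, not our actual table):

```python
from statistics import mean

CLASS_FEE_WEIGHTS = {(1, "high"): 1.3, (3, "mid"): 1.0, (5, "low"): 0.7}  # invented values

def race_weight(race, elo_is_sane=False, baseline_elo=1000.0):
    """Weight a race's results by field strength."""
    if elo_is_sane:
        # what we want: weight by the ELO average of the horses in the race
        return mean(h.elo for h in race.field) / baseline_elo
    # what we're stuck with for now: a static class/fee lookup
    return CLASS_FEE_WEIGHTS.get((race.cls, race.fee_tier), 1.0)
```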
The Results
A badass tool that gives you the best approximation of these racing traits in 3-4 summary metrics. Is it perfect? Definitely not. But having spent months on this stuff, I can tell you there will never be a perfect. That said, it does the best it can with the data available.
Back to the 4 race distribution archetypes:
For archetype #1, we have everything we need and the results and metrics are straightforward.
For #2, we have a good sense of DP so we use that to back into BA via dpBA (a formula that ended up being more complicated than simply averaging the two extremes). This output ended up correlating to eBA about as strongly as 16BA does.
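As a mental model, the naive version just averages the two sides, on the theory that the up-side and down-side DP effects roughly cancel (this baseline is my illustration; the actual dpBA formula adds corrections on top of it):

```python
def dp_ba_naive(badp_short, badp_long):
    """Naive BA from the two extreme-distance BADP scores: average them so the
    strong-side and weak-side DP effects roughly cancel. The real dpBA formula
    is more complicated than this."""
    return (badp_short + badp_long) / 2.0
```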
For #3, we’re just out here doing the best we can. We rely on the horse’s eBA for its BA metric and then use its higher-sample u14 or o18 BADP score to back into a DP metric. With these horses, we know their ability at their stronger extreme, but we don’t know what proportion of that BADP score is BA vs DP. And without any way to approximate BA from its racing sample, the best we can do is use its eBA, based on the parents’ ability, as the “most likely” scenario. We’re working on a few tweaks to improve the logic in choosing a BA metric in these cases. They’re usually strong DP horses (or they’d have 1600m data), and a wonky BA score can skew those scores abnormally high or low.
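Conceptually, that back-out is a subtraction (a sketch of the idea only; treating BADP as BA + DP at the strong extreme is an assumed decomposition, not our exact formula):

```python
def dp_from_extreme(badp_extreme, e_ba):
    """Archetype #3: call DP whatever part of the strong-extreme BADP score the
    breeding-based eBA can't explain. Assumes BADP ~ BA + DP at that extreme."""
    return badp_extreme - e_ba
```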
For #4, ffs please race more.