Data collection and analytics in eventing

punchy · September 1, 2020, 12:13am

aycaramba:

@Janet , thank you, I will reach out to them! Hopefully they can make that data available.
@enjoytheride , that’s a good point about yellow cards, maybe a clear rubric or criteria could help (I’m not sure if they currently have that, but that’s a good idea to look into it). I think it would help to even record warnings that don’t result in yellow cards. And you do have a good point about the jump judges - it’s easy to forget that it’s not always horse people. Maybe we could create greater incentives for riders to jump judge, or create more flexibility for them to jump judge for just one level’s worth of divisions (I know time commitment is the biggest reason I don’t jump judge more often - if I am riding, it’s hard to volunteer even an entire half-day). Or maybe we could collect data in a different way, but jump judges seem like an underutilized resource if we want jump-level data since they are already there at each fence. This kind of brainstorming is exactly the kind of start we need.

Video recordings are so ubiquitous, why don’t officials watch videos of riders after an event and call out any behavior that wasn’t noted or marked on the day? You can give a yellow card or a warning a week later.

RAyers · August 30, 2020, 10:21pm

You all do realize we were hashing this out 15 years ago? IFG even developed a rubric to enable statistical comparison for all incidents.

Record keeping at events is spotty at best and wholly incomplete at worst. There is no consistent reporting, e.g. uniform nomenclature, that allows effective comparison.

I’ll support you all if you can get it done this time. But as one who has a couple of clinical research trials going, I can tell you that the data you need to collect will need to be consistent and clear (IFG is the person who can create that).

Data analytics is only as good as the available data. And that is where this effort will fail without the development of a consistent system and training of TDs to fill out the required forms correctly (welcome to trying to do epidemiology).

Additionally, you need to have upfront the statistical parameters that will be measured (a parameter is not a measurement) in a matrix that enables testing. Think about exactly what your hypotheses will be and what parameters will need to be quantified in order to compare those parameters.

enjoytheride · August 30, 2020, 10:24pm

15 years is a long time. Maybe now is the time for people to listen. We are the ones who change our sport. Maybe those who tried in the past can give us a jump

anon69112984 · September 1, 2020, 12:14am

Seems like US and Europe are the same on this, what a shame

rockyriver7 · August 30, 2020, 10:25pm

Totally in support of the effort! I’d love to see eventing as easily tracked and published as horse racing, and we have the tech to do it. Here are some articles I remembered that might help your quest; you could also contact their author to see where she got the information.
These have a data category of horse falls and rider falls broken down by event:
https://eventingnation.com/eventing-analytics-the-math-of-moving-up-east-coast-edition/
https://eventingnation.com/eventing-analytics-the-math-of-moving-up-west-coast-edition/

I suspect the USEA and FEI have far greater data than they let on, it’s just gaining access to it that will be the challenge.

aycaramba · August 30, 2020, 10:25pm

First off, thank you to everyone who has contributed to this discussion! I have been reading all the responses and seriously thinking about this all week. I’ll respond more directly to people this weekend when I get some more time, but I do want to say that I’m working on a proposal of where I’d like to start with the data. One idea I have, and the one I think would be easiest to start with, is getting some data about every rider-xc jump-show combination and putting it through a machine learning model called a random forest. I think we could use this to give us a variable importance ranking, which would then at least point us in the right direction in figuring out which variables are correlated to falls the most. I’m putting together a list of the variables I’d like to use, how hard they would be to get in the current climate, and what an action item might be if we found it to be highly correlated to falls. This would not be used to advise any single person, but rather to advise the system as a whole on what to focus on. I’ll post the write up this weekend and would love to get everyone’s thoughts on it!
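As a rough illustration of the variable importance idea, here is a minimal sketch in plain Python. It is a toy forest of one-split trees (stumps) run on synthetic data, not a full random forest; a real analysis would more likely use a library implementation such as scikit-learn's RandomForestClassifier. The variable names (prior_starts_at_level, fence_number, start_hour) and the relationship baked into the fake data are assumptions for the demo only, not findings.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = fall)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels, features):
    """Best single threshold split: returns (feature, threshold, impurity decrease)."""
    parent = gini(labels)
    best = (None, None, 0.0)
    n = len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if parent - child > best[2]:
                best = (f, t, parent - child)
    return best

def stump_forest_importance(rows, labels, n_trees=200, seed=0):
    """Rank variables by total impurity decrease over many bootstrapped stumps."""
    rng = random.Random(seed)
    names = list(rows[0].keys())
    k = max(1, int(len(names) ** 0.5))  # variables considered per tree
    totals = {name: 0.0 for name in names}
    for _tree in range(n_trees):
        idx = [rng.randrange(len(rows)) for _row in rows]  # bootstrap sample
        sample = [rows[i] for i in idx]
        sample_y = [labels[i] for i in idx]
        feature, _threshold, gain = best_split(sample, sample_y, rng.sample(names, k))
        if feature is not None:
            totals[feature] += gain
    total = sum(totals.values()) or 1.0
    return sorted(((name, v / total) for name, v in totals.items()),
                  key=lambda kv: kv[1], reverse=True)

def make_synthetic(n=300, seed=1):
    """Fake rider/jump/show rows. The fall rule below is invented for the demo."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _row in range(n):
        row = {
            "prior_starts_at_level": rng.randint(0, 20),
            "fence_number": rng.randint(1, 25),
            "start_hour": rng.randint(8, 16),
        }
        p_fall = 0.6 if row["prior_starts_at_level"] < 3 else 0.05
        rows.append(row)
        labels.append(1 if rng.random() < p_fall else 0)
    return rows, labels

rows, labels = make_synthetic()
ranking = stump_forest_importance(rows, labels)
for name, share in ranking:
    print(f"{name}: {share:.2f}")
```

The ranking only says which variables separate falls from non-falls in this particular data set. It says nothing about cause, and it is only as trustworthy as the recorded variables.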

RAyers · August 30, 2020, 10:25pm

You still have to develop the parameters of the learning set. You don’t just take random data. Thus, you still need to have uniform parameters that may not be reported, e.g. type of fence - a vertical or palisade could be the same or different fences depending on who filled out the report. Hence the need for uniform reporting. Otherwise you will need to go through all written reports and develop your own nomenclature.
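To make the nomenclature point concrete, here is a small sketch of mapping free-text fence descriptions onto one controlled vocabulary. The vocabulary and the synonym groupings (including treating "palisade" as a "vertical") are hypothetical placeholders; deciding the real groupings is exactly the work a uniform reporting standard would have to do. Anything unrecognized is returned as None so it can be reviewed by hand instead of being silently guessed.

```python
import re

# Hypothetical controlled vocabulary: canonical name -> accepted spellings.
CANONICAL_FENCE_TYPES = {
    "vertical": {"vertical", "upright", "palisade"},
    "table": {"table", "picnic table"},
    "log": {"log", "hanging log"},
}

def normalize_fence_type(raw):
    """Return the canonical fence type, or None if the text is not recognized."""
    # Lowercase and collapse anything that is not a letter into single spaces.
    cleaned = re.sub(r"[^a-z]+", " ", raw.lower()).strip()
    for canonical, synonyms in CANONICAL_FENCE_TYPES.items():
        if cleaned in synonyms:
            return canonical
    return None

reports = ["Vertical", "  palisade ", "Picnic-Table", "trakehner"]
normalized = [normalize_fence_type(r) for r in reports]
needs_review = [r for r, n in zip(reports, normalized) if n is None]
print(normalized)
print(needs_review)
```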

Make an array of parameters. I bet you would have somewhere between 25 and 50, with 10 having a significant principal component effect.

Libby2563 · September 1, 2020, 12:15am

Thanks for those links, they were really interesting!

It would be great if the USEA or USEF could use the show results they already have to calculate and publish rates of horse falls, rider falls, refusals, and completion at each venue and level. Perhaps that would encourage some accountability for venues and course designers with abnormally high fall rates.
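A minimal sketch of that calculation, assuming each result row carries a venue, a level, and a single outcome code. The field names, outcome codes, and the venue "Venue A" are made up for illustration; real results files will be laid out differently.

```python
from collections import defaultdict

def outcome_rates(results):
    """Per (venue, level): number of starters and the share of each outcome."""
    starters = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for row in results:
        key = (row["venue"], row["level"])
        starters[key] += 1
        counts[key][row["outcome"]] += 1
    summary = {}
    for key, n in starters.items():
        summary[key] = {
            "starters": n,
            "rates": {outcome: c / n for outcome, c in counts[key].items()},
        }
    return summary

# Invented example rows, not real results.
results = [
    {"venue": "Venue A", "level": "Training", "outcome": "completed"},
    {"venue": "Venue A", "level": "Training", "outcome": "completed"},
    {"venue": "Venue A", "level": "Training", "outcome": "rider_fall"},
    {"venue": "Venue A", "level": "Training", "outcome": "eliminated_refusals"},
    {"venue": "Venue A", "level": "Novice", "outcome": "completed"},
]
rates = outcome_rates(results)
for (venue, level), info in rates.items():
    print(venue, level, info["starters"], info["rates"])
```

Publishing the starter count next to each rate matters: one fall among four starters and 25 falls among 100 starters give the same rate but very different levels of confidence.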

Janet · August 30, 2020, 10:29pm

There is a lot of analysis associated with the ERQI ratings, but it is focused on identifying horses that are likely to fall, rather than fences that are likely to cause falls. You might be able to piggy back on their analysis.

RAyers · September 1, 2020, 12:16am

How do they actually do that? Do they incorporate the accident data into their algorithms? Do they take fence, weather, and venue data? Are the rider qualifications considered?

Janet, how can the system identify specific horses? Has it ever predicted a horse and rider fall before it ever happened? What sort of validation is done? Has this ever been done at low level, small events, e.g. did it ever predict a fall at BN in Arizona?

Analysis without verification is just that. It is economics. It is man-made numbers that GUESS at an outcome.

Marigold · September 1, 2020, 12:16am

ERQI isn’t perfect, and to my knowledge it doesn’t include every single one of the variables you discussed above (although it is proprietary, so there is only so much information we have about the exact calculation). This is what has been disclosed publicly:

The ERQI value takes into account the class at which a horse is competing, the rider who is competing on the horse and the level of performance displayed by all of those who competed in the same class. An ERQI can react to whether a competition was statistically harder or easier than the average for that level of competition.

Source: https://www.eventingireland.com/Portals/0/EasyDNNNewsDocuments/936/ERQIs%20FAQ%20feb16%20v2.pdf

Early data indicates it is more effective than MERs alone as qualifications, explained here:

Eventing Ireland (EI) was the first and only national federation to utilize the ERQI during the 2016 season, targeting all national levels. They saw a 56% reduction at the national two- and three-star levels, with a staggering 66% reduction in horse falls at the national two-star level alone.

Source: https://eventingnation.com/equiratings-quality-index-uses-risk-analysis-for-a-safer-sport/

ERQI does not predict that a horse will fall, simply that a horse is at a higher risk of falling than others. That said, since you asked for an example of a horse and rider fall it predicted before it happened, one I know of was Lauren and Veronica at Rio. Equiratings had alluded to that combination several times, as clearly as they could without outright naming them, prior to that fall. Colleen Rutledge and Escot 6 at Kentucky in 2016 was another they had not-so-subtly flagged up prior to their horse fall there. That’s just off the top of my head.

If you would like to investigate further, additional links are below (depending on how much time you have on your hands):

Detailed explanation: https://useventing.com/news-media/news/equiratings-quality-index-explained
FAQs: https://useventing.com/safety-education/safety/equiratings-quality-index-faq
Podcast: https://usea.podbean.com/e/introducing-erqi-the-equiratings-quality-index/

RAyers · August 30, 2020, 10:34pm

Yeah. Yeah. I’ve seen and read all that. The questions still stand. You use a very high level example. How about prelim at St. John’s in New Mexico? Your example is one where I would ask whether there is a high level bias, which would suggest there is algorithm bias.

At the same time, how does one account for any officials bias, e.g. a dressage judge who is consistently 10% different than another?

ERQI is a good validation of risk management but is not actually a risk management tool.

I have yet to see how Ireland actually did their comparisons. Was it normalized to participant rate? How do you know whether that claim is a real effect or simply an outlier of random chance?
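One way to frame both questions, sketched in plain Python: express falls per 1,000 starts so the seasons are comparable, then ask how often a drop at least this large would show up by chance if the underlying fall rate had not changed (a one-sided Fisher exact test). The counts below are invented placeholders, not Eventing Ireland's figures, and the function name is an assumption for this sketch.

```python
from math import comb

def fisher_one_sided(falls_before, starts_before, falls_after, starts_after):
    """P(falls_after or fewer falls in the later season) if the rate did not change.

    Uses the hypergeometric distribution: the total number of falls is held
    fixed and spread at random over all starts from both seasons.
    """
    total_falls = falls_before + falls_after
    total_starts = starts_before + starts_after
    denom = comb(total_starts, total_falls)
    lowest = max(0, total_falls - starts_before)
    numer = sum(
        comb(starts_after, k) * comb(starts_before, total_falls - k)
        for k in range(lowest, falls_after + 1)
    )
    return numer / denom

# Placeholder counts for illustration only.
falls_before, starts_before = 12, 1000
falls_after, starts_after = 5, 1100

rate_before = 1000 * falls_before / starts_before  # falls per 1,000 starts
rate_after = 1000 * falls_after / starts_after
p_value = fisher_one_sided(falls_before, starts_before, falls_after, starts_after)
print(f"before: {rate_before:.1f}, after: {rate_after:.1f}, p = {p_value:.3f}")
```

With fall counts this small, a large percentage drop can still be compatible with chance, which is why the raw counts and start numbers need to be published alongside any headline reduction.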

This is where the real research must be done if we want data driven rules and standards.

ReadOnRight · August 30, 2020, 10:34pm

I’d love to help out. I have a background in data analysis and would love to help where I can. I’ve been thinking along these same lines for a while.

clivers · September 1, 2020, 12:17am

Here it is in case anyone is having difficulty finding it:
https://inside.fei.org/system/files/Eventing%20Audit%20-%20Charles%20Barnett%20-%20Final%20Report%2026.07.16.pdf

gardenie · August 30, 2020, 10:43pm

https://useventing.com/news-media/eventing-tv/erqi-reports-for-officials-explained

FlaxenChestnut · September 1, 2020, 12:17am

Thanks for this link. It’s interesting - especially the risk analysis on fences. It would be nice to see this report updated every two to three years. I see the FEI has an online reporting tool that is supposed to be extended to Eventing this year.

Gnep · August 30, 2020, 10:44pm

I sat down for several years and collected videos of accidents, probably 200 or so, and did stride-by-stride analyses.
Most of the time I looked at as much video as I could get of every ride, then concentrated on the jump and on how it developed during the ride.
I sent those analyses to the FEI, USEF and so on. Hundreds of hours of work. 2008, 2009 and into 2010.
Never heard a word.
I might have killed the big plastic imitation log, which was heralded as such a big success when it was used at the WEG, by analyzing the video of the South American rider.
The log nearly killed him.

Do not expect any response from the powers.
They are too used to working in their own self-interest, which is big money and the good old boy network.

Gnep · September 1, 2020, 12:18am

A risk analysis of fences is BS. You have to understand the course design and the terrain and how the jump was or is used.
Besides eventing, I built jumps and worked with course designers, a sobering experience.

FlaxenChestnut · September 1, 2020, 12:18am

A methodology was included in this report. The analysis included those factors (to the extent that they had the information) and made recommendations on how to improve the data collected. The report also recommends yearly analysis, as data collection is improved and updated.

Jealoushe · September 1, 2020, 12:18am

Sad but true
