Musings on the meanings of scores

I am not interested in “passing the L-judge with distinction.” I am interested in training horses.

So tell me, how is that a negative if I’m a hard-ass in training to make sure that we nail the movements 1000%?

According to the USDF article, Jack Le Goff once said, “Americans don’t want to learn how to ride but how to compete.”

Whether it is “necessary” or not is a matter of opinion and open for discussion…and everyone is entitled to an opinion.

As for whether it is possible to quantify: it is absolutely possible, and not that difficult. I spent about 10 years doing this professionally and have certifications.

Some people offered to do this for USEF/USDF for free, so it is possible, but I do agree there is no will to act on it.

So it’s possible to quantify objective safety procedures.

It’s likely not possible to quantify who did the safety procedures with the most elegance or the most aesthetic routine, or how much to penalize the person who had a scuffly, slouchy walk en route to the fire extinguisher, etc.

As a famous statistician said, all models are wrong, but some are useful. And statistics is just modeling. It can be useful, but it’s not the be-all and end-all.

That’s George Box’s famous quote.

Someone once told me, “If you are not keeping score, you are just practicing.”

We are dealing with an Olympic sport. I think the quality of the judging merits some scrutiny. As a statistician, aren’t you curious what the Kendall’s Coefficient of Concordance (KCC) would be for a set of judges, to account for the magnitude of the differences among their scores?
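
For anyone curious what the KCC (usually written Kendall’s W) actually computes, here is a minimal Python sketch, assuming each judge’s marks for the same set of movements are stored as one row of a list. W first converts each judge’s marks to ranks (tied marks share an average rank), so it measures whether the judges put the movements in the same order, not how far apart their raw marks are. It also needs several movements at once: it gives one coefficient for a panel across a test rather than one per movement. The function names and the score layout are illustrative assumptions, not part of any USDF or FEI system.

```python
from collections import Counter


def average_ranks(scores):
    """Rank one judge's marks from lowest (1) to highest; tied marks share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Walk forward over a run of identical marks.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def kendalls_w(panel):
    """Tie-corrected Kendall's W for m judges (rows) marking the same n movements (columns).

    Returns 1.0 when every judge ranks the movements identically and values
    close to 0.0 when their rankings are unrelated.
    """
    m, n = len(panel), len(panel[0])
    rank_rows = [average_ranks(row) for row in panel]
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    # Tie correction: for every judge, add t^3 - t for each group of t identical marks.
    ties = sum(t ** 3 - t for row in panel for t in Counter(row).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


# Hypothetical marks: three judges, five movements.
panel = [
    [6.0, 7.0, 5.0, 8.0, 6.5],
    [6.5, 7.0, 5.5, 8.0, 6.0],
    [6.0, 7.5, 5.0, 7.5, 6.5],
]
print(round(kendalls_w(panel), 3))
```

Because W only looks at order, a panel where one judge marks everything two points lower than the others can still score W = 1.0; the size of the gaps has to be measured separately.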

Explain to me, in simple terms, how this would work.
Bigfoot does a flying change worth an 8 in front of me at C in a PSG test. It is clean, through, and at the marker, but perhaps not expressive enough, or the counter canter is not supple.
Now suppose you ask 1000 of my colleagues what their score for the change would be. If they were not sitting at C, their perspective is different from mine. Bigfoot isn’t a machine. We cannot use the other 87 changes he has done at C in other PSG tests as a comparison. The only way to compare is a real-time, live comparison.
So, in a perfect world, 900 of my colleagues sitting at C at that exact moment would also give it an 8. Some might give it a 7 because they felt the counter canter before it lacked balance. Others might give it a 9. (Ignore for a moment that we have half scores, because that makes it even more cumbersome.) There is no absolute, correct answer for the value of the change score. There is only an approximation.
That is the best we can do.

You have a bit of a “git off mah lawn” vibe. Stuff changes, some for better, some for worse. And I get it, it’s not going to make everyone happy. Personally, I have a hard time watching people ride backwards to a fence, no matter how nicely their horse jumps when they actually get TO the fence, so modern-day hunters are not for me!! I think it sucks, but there’s a whole generation of people who are so happy to go slow, so there’s that.

But in the discussion about scores below 5, what gets lost is that sometimes more than one movement is grouped into a single box. Sure, you had a craptacular halt, but what came before and after it was not craptacular. The halt might have been a 2, but the collected trot and working walk were not 2s, leaving the judge to do some quick math that usually lands in the (gasp) 5-7 range. Just a few days ago I had some painful errors in my CAI1* test at Live Oak (driving). In the double deviation (shallow serpentine for you ridden folks), my apex was slightly past the letter on the second deviation. MAJOR error. But I got the coveted 2 straight steps at E, nailed the test-says-10m-but-we-really-want-11m, and managed to get the pony’s nose to the rail before H… So part of that movement was BAD. Possibly even Very Bad. But a lot of things were right, and some were downright good, so it was a 5.5 to 6 from the 5 judges. Not surprisingly, the 5.5s came from the judges who got an eyeful of the error past S.

Also, if you want some solid, good old-fashioned evil dressage scores, just take up driven dressage. I have been to most of the top shows in this country and have never seen an 80%. I think it happens, but it’s kind of a unicorn, more myth than reality. I think there were a handful of scores in the 70s at the last world championships for teams, but it was only 3-4 teams, if that.

Thanks. I was 99% sure it was him but didn’t feel like looking it up and/or getting blasted for saying the wrong name.

And I’m not saying we shouldn’t analyze judges’ scoring. I do agree with you that it would probably be a good idea, especially as it’s an Olympic sport, as you say. I just don’t think you’re doing yourself any favors in the way you’re presenting the argument for it.

Do you show or ride? Do you feel the scoring of your rides is accurate? Have you scribed recently?

A protocol to quantify variation would be done in a controlled environment, not at a show. So let’s take a group of judges (30 is a statistically good number), say during license renewal. Put them in a room to watch a video of a 4-3 test (the highest of the national tests).

Since we have agreed that there is a “judging standard,” we assume that the video has an “official” score, according to “the standard,” for each of the 22 movements.

Each judge enters their score. Their scores are compared with “the standard” for each of the 22 movements. A Kendall’s Coefficient of Concordance is given for each of the 22 movements. The same can be done for the collective marks.

You can then determine whether there is disagreement (lack of concordance) on some movements, see whether there is variability relative to the standard, and compute how much variability there is from judge to judge.

Those variations would then be discussed: why do different judges see things differently? Eventually a consensus opinion would be reached.
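
The other two quantities in this protocol, how far each judge sits from “the standard” and how much the panel spreads on each movement, don’t need rank statistics at all. Below is a minimal sketch using only Python’s `statistics` module. The reference list `standard` stands in for the assumed “official” scores, the function names are hypothetical, and all numbers are made up for illustration.

```python
from statistics import mean, stdev


def movement_spread(panel):
    """Sample standard deviation of the judges' marks on each movement (one value per column)."""
    return [stdev(column) for column in zip(*panel)]


def bias_vs_standard(panel, standard):
    """Average signed gap per judge: positive means the judge marks above the reference scores."""
    return [mean(mark - ref for mark, ref in zip(row, standard)) for row in panel]


def error_vs_standard(panel, standard):
    """Average absolute gap per judge from the reference scores, ignoring direction."""
    return [mean(abs(mark - ref) for mark, ref in zip(row, standard)) for row in panel]


# Hypothetical "official" marks for three movements, and three judges' marks.
standard = [6, 7, 5]
panel = [
    [6, 7, 5],  # matches the reference exactly
    [7, 8, 6],  # one point generous on every movement
    [5, 7, 4],  # harsher on two movements
]
print(movement_spread(panel))
print(bias_vs_standard(panel, standard))
print(error_vs_standard(panel, standard))
```

Both per-judge numbers are reported because a judge can have a small bias but a large error (generous on some movements, stingy on others), and the discussion afterwards would be about different things in each case.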

But your basic assumption that there is an ‘official’ score is incorrect. There is a range of acceptable scores. Who determines what the ‘official’ score is?

“The head should remain in a steady position, as a rule slightly in front of the vertical, with a supple poll as the highest point of the neck.”

Here is the dilemma. The standard says “as a rule,” not “must be.” I ran into this years ago when I judged a little schooling show. A very good trainer rode a lovely TB that had a difficult head/neck connection, which caused the horse to carry its nose slightly behind the vertical. The horse performed a lovely test. I did give a slightly lower score because of it, but now I probably would not do that. If the horse is engaged through the back and the poll is high, I would overlook it being slightly behind the vertical.

Not everyone would agree. Unfortunately, I’ve seen too many “classical riders” who pull the head up to get the poll high and nose in front of the vertical with the result being a dropped back and trailing hocks.

But the question wasn’t about how to train the movement, it was how to judge the movement. And according to the judging standards in place, your judging of the movement in question would be incorrect.

Ah!!! You have hit the nail on the head!

Early in this thread, I was postulating that there is no “standard” in dressage judging and that judges have a myriad of opinions. Then I was repeatedly told by multiple posters that there is “a standard,” so that is what I went with… and now you say there is a lack of agreement even within the judging “Illuminati.”

As to who determines the “official score”: I assume there are dressage decision-makers, such as the people who run L judge training, who would be qualified to say this movement is a “6.”

But regardless, you can still measure the amount of concordance among the rating judges.

Minus the calculation of the Kendall’s Coefficient of Concordance, this is pretty much what happens in the L program and at Judges’ Forums and continuing education programs. Have everyone score it, then ask the group: who gave it a 5? Who gave it a 4? Anyone higher or lower? Then discuss why. Then compare to the official score and comments (i.e. those given by the very senior judges teaching the program). We know whether there is disagreement without any statistics. It wouldn’t hurt to do the calculations, but I’m not sure it would help either, except to formally reject or fail to reject a hypothesis that there was or wasn’t disagreement. In that case you would want as large a sample size as possible (all 900 of Dot’s colleagues?). You would also need videos showing all the possible things a horse might do during a movement, for every possible combination of gaits, training basics, rider faults and errors, weather conditions, footing, time of day, etc. Then do this for every movement of every test 🙂 It would in fact be quite costly and probably not worth the cost to most of us. Or, we could just trust that the lone person who gives it a 2 or a 9, and hears that everyone else gave it a 5 or a 6 (along with the reasons why it was neither higher nor lower), would learn from the discussion that they were an outlier, and why. And hopefully the outlier would move back toward the standard the next time they saw a similar situation. Unless, like you, they think everyone else should shift away from the standard that the experts have agreed upon over time because one person thinks their very low score is more correct.
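
If someone did want the formal hypothesis test mentioned here, one simple option is a permutation test on Kendall’s W: shuffle each judge’s marks across the movements many times and count how often chance alone matches the observed agreement. This is a minimal sketch, assuming rows are judges and columns are movements; the function names, trial count, and marks are hypothetical. The textbook alternative is the chi-square approximation m(n - 1)W with n - 1 degrees of freedom. Note that rejecting this null only shows that the judges agree more than random marking would, which is a low bar and says nothing about whether they agree with the standard.

```python
import random
from collections import Counter


def average_ranks(scores):
    """Rank marks from lowest (1) to highest; tied marks share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def kendalls_w(panel):
    """Tie-corrected Kendall's W; rows are judges, columns are movements."""
    m, n = len(panel), len(panel[0])
    rank_rows = [average_ranks(row) for row in panel]
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    s = sum((r - m * (n + 1) / 2) ** 2 for r in rank_sums)
    ties = sum(t ** 3 - t for row in panel for t in Counter(row).values())
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


def permutation_p_value(panel, trials=2000, seed=0):
    """Estimated p-value for the null hypothesis of no agreement among judges.

    Each trial shuffles every judge's own marks across the movements, which keeps
    each judge's personal scale but destroys any shared ordering.
    """
    rng = random.Random(seed)
    observed = kendalls_w(panel)
    hits = 0
    for _ in range(trials):
        shuffled = [rng.sample(row, len(row)) for row in panel]
        if kendalls_w(shuffled) >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
    # The +1 terms count the observed panel itself, so the estimate is never exactly 0.
    return (hits + 1) / (trials + 1)


# Hypothetical marks: four judges, six movements, broadly similar orderings.
panel = [
    [5.0, 6.0, 6.5, 7.0, 7.5, 8.0],
    [5.5, 6.0, 7.0, 6.5, 7.5, 8.0],
    [5.0, 5.5, 6.5, 7.0, 8.0, 7.5],
    [6.0, 6.0, 6.5, 7.0, 7.0, 8.5],
]
print(kendalls_w(panel), permutation_p_value(panel))
```

The permutation approach was chosen over the chi-square approximation only because it needs nothing beyond the standard library and makes no large-sample assumption; with 30 judges and 22 movements the two should lead to the same conclusion.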

Sorry, I’m dense and like paragraphs, so I skimmed your post. I think the point @pluvinel is making is that there is a way to have a base standard. Then exuberance, or whatever you want to call it, will push the horse that has it over the other horse, all other things being equal.

I think what is being argued is that correct work should be rewarded above all else, and flashy, fancy movement should be the tiebreaker, all else being equal.

There IS a judging standard. You have quoted it several times. You are having difficulty because this is a HORSE in a dynamic situation. There is no “if the horse’s head is BTV for 1.2 seconds, that is a 4; for 5.3 seconds, that is a 3,” etc. MANY different things factor into scoring, as Long Time Lurker mentioned.

Also, you are arguing with an actual judge who has been through the program and judges dozens of shows a year. Referring to judges as the Illuminati doesn’t exactly endear them to your request. What do you even mean by an “official” score?

YESSSSSSSS. I find it amusing that someone who admits to being a “hard ass,” who would score movements markedly lower than other judges because they have their own standards, and who disagrees with the L program, at the SAME TIME wants an “official” score that all judges must be trained to reach.

That’s exactly how it does work now.
