Ablation: Predictions Based Solely on Lower Court Data

September 19, 2025 • jed
Often it will be of interest to predict what the Court would do before oral argument, before amicus or party briefs arrive, or even before a cert petition is filed. A lower court judge might wonder, what would the Court do with this decision? Or a party might wonder, is it even worth filing a cert petition?

For this kind of prediction objective, we have relatively few features to work with: mostly basic information about the lower court and its decision, along with the learned voting behavior of the justices. Notably missing are data from the briefs, amicus positions, and oral arguments, all of which may be predictive of voting patterns.

Earlier, I created a customized model to predict votes using those near-decision features, yielding competitive performance at the vote level and state-of-the-art (SOTA) performance at the case level. Vote-level accuracy was 71 percent and case-level accuracy was 79 percent. Now let’s ablate those near-decision features and see how performance changes.

As expected, performance degrades substantially without the near-decision features. On the held-out OT2024 term, vote-level accuracy drops from 71 to 67 percent, and case-level accuracy drops from 79 to 71 percent. More detailed vote-level and case-level results are below. Here is the vote-level performance.
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Affirm | 0.53 | 0.27 | 0.36 | 173 |
| Reverse | 0.69 | 0.88 | 0.78 | 328 |
| Macro (weighted) | | | 0.63 | |
| Macro (unweighted) | | | 0.56 | |

Vote-level performance
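As a quick sanity check, the 67 percent vote-level accuracy can be recovered from the table itself, because overall accuracy equals the support-weighted average of per-class recall. Here is a minimal sketch in Python. The recall and support values are copied from the vote-level table; the function name `accuracy_from_recall` is only illustrative, and since the recalls are rounded to two decimals the result is approximate.

```python
def accuracy_from_recall(rows):
    """Overall accuracy as the support-weighted mean of per-class recall.

    rows: list of (recall, support) pairs, one per true class.
    """
    total = sum(support for _, support in rows)
    correct = sum(recall * support for recall, support in rows)
    return correct / total


# (recall, support) copied from the vote-level table: Affirm, then Reverse
vote_rows = [(0.27, 173), (0.88, 328)]
vote_accuracy = accuracy_from_recall(vote_rows)
print(f"{vote_accuracy:.2f}")  # prints 0.67
```

The same function applied to the case-level recalls and supports gives roughly 0.71, matching the case-level accuracy reported above.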

And here is the case-level performance.
| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| Affirm | 0.43 | 0.20 | 0.27 | 15 |
| Reverse | 0.76 | 0.90 | 0.82 | 41 |
| Macro (weighted) | | | 0.67 | |
| Macro (unweighted) | | | 0.55 | |

Case-level performance
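The two summary rows differ only in how the per-class F1 scores are averaged: the unweighted version treats Affirm and Reverse equally, while the weighted version weights each class by its support. A minimal sketch, using the F1 and support values from the case-level table (the helper name `average_f1` is my own, not part of the model code):

```python
def average_f1(rows, weighted):
    """Average per-class F1, optionally weighting each class by its support.

    rows: list of (f1, support) pairs, one per class.
    """
    if weighted:
        total = sum(support for _, support in rows)
        return sum(f1 * support for f1, support in rows) / total
    return sum(f1 for f1, _ in rows) / len(rows)


# (F1, support) copied from the case-level table: Affirm, then Reverse
case_rows = [(0.27, 15), (0.82, 41)]
unweighted = average_f1(case_rows, weighted=False)
weighted = average_f1(case_rows, weighted=True)
print(f"unweighted={unweighted:.3f} weighted={weighted:.3f}")
```

This gives about 0.545 unweighted and 0.673 weighted, in line with the 0.55 and 0.67 in the table. Because Reverse is both the larger class and the better-predicted one, the weighted average comes out higher.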

Calibration also worsens without the near-decision data. The vote-level Brier score is 0.21 (up from 0.19 in the fuller model). The vote-level calibration plot is below.
Vote-level calibration
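For reference, the Brier score is the mean squared difference between the predicted probability and the 0/1 outcome, so lower is better, and a forecaster who always says 0.5 scores exactly 0.25. A calibration plot compares, within bins of predicted probability, the average prediction against the observed frequency. The sketch below illustrates both calculations in plain Python. The probabilities and outcomes are made-up toy values, not model outputs, and the function names and bin count are assumptions chosen for illustration.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def calibration_bins(probs, outcomes, n_bins=10):
    """Return (mean predicted prob, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    summary = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(y for _, y in members) / len(members)
            summary.append((mean_p, freq, len(members)))
    return summary


# Toy data (hypothetical): predicted P(reverse) and whether the vote was to reverse
toy_probs = [0.9, 0.7, 0.6, 0.3]
toy_outcomes = [1, 1, 0, 0]

toy_brier = brier_score(toy_probs, toy_outcomes)
baseline_brier = brier_score([0.5] * len(toy_outcomes), toy_outcomes)
toy_bins = calibration_bins(toy_probs, toy_outcomes, n_bins=4)
```

A well-calibrated model has bins where the mean predicted probability is close to the observed frequency, which on a calibration plot means points near the diagonal.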
Okay, two items are up next. First, I want to try two ideas for improving model performance. Then I will use the best model to forecast the term that just started. Until soon.