Leakage and Prediction Objectives

Most published Supreme Court vote predictions include near-decision information, such as oral argument and amicus data. The earlier predictions I posted included that information, too. Using it, the model roughly matched the state of the art at the vote level and exceeded both algorithmic and human performance at the case level. However, using near-decision information is problematic for two reasons.
First, depending on your prediction objective, there is a risk of leakage if the features are not carefully managed. For instance, some predictions use the date of decision as a predictive feature. But the outcome of a case can affect when its decision is issued. The Court typically issues its most contentious decisions in June, so knowing that a decision came out in June may tell us something about the likely outcome. For this reason, I did not use date of decision as a feature in the earlier predictions.
Second, and relatedly, it is not clear that near-decision prediction is the relevant prediction objective. When do we want to be able to make a prediction? It depends. It is possible to imagine being in June and wondering how the blockbuster cases will turn out. If so, using all near-decision information is relevant and correct. We don’t care about the integrity of predictions for decisions issued in May or earlier, because those have already been handed down. We just want to know what happens in the few remaining, known June cases.
But there are other possible prediction objectives. We might, instead, imagine ourselves sitting at the start of the term and wondering how it will unfold. In that case, we will want to ensure the integrity of predictions for all decisions and to guard carefully against leakage. We will also not have oral argument or amicus data. Predictions will be based solely on lower court information, possibly some but not all filings, and the learned behavior of the justices.
Another closely related prediction objective is to imagine someone wondering how a lower court decision would fare before the Court, were it to hear the case. A lower court judge, for instance, might wonder what the Court would do if it reviewed their draft opinion. Or a litigant might wonder whether to appeal a lower court decision. In these situations, there would be only lower court information, with no cert petitions or other briefs. I see this as the most minimal prediction objective of interest. Next, I will post predictions based on this lower-court-only prediction objective.
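
To make the distinction between the three objectives concrete, here is a minimal Python sketch of how one might gate features by prediction objective. It is only an illustration, not the code behind the posted predictions: the `Objective` names, the feature-group names, and the `allowed_features` helper are all hypothetical. Excluding `decision_date` under every objective is an assumption of the sketch that mirrors my choice not to use it earlier.

```python
from enum import IntEnum


class Objective(IntEnum):
    """Hypothetical prediction objectives, ordered by how much is known."""
    LOWER_COURT_ONLY = 1   # only the lower court decision exists
    START_OF_TERM = 2      # some, but not all, filings may exist
    NEAR_DECISION = 3      # oral argument and amicus data are available


# Hypothetical feature groups, tagged with the earliest objective
# under which they would be available.
EARLIEST_AVAILABLE = {
    "lower_court_info": Objective.LOWER_COURT_ONLY,
    "filings": Objective.START_OF_TERM,
    "oral_argument": Objective.NEAR_DECISION,
    "amicus_data": Objective.NEAR_DECISION,
}

# Features whose value may depend on the outcome itself (assumption:
# excluded under every objective in this sketch).
OUTCOME_DEPENDENT = {"decision_date"}


def allowed_features(objective, candidates):
    """Return the candidate features usable under the given objective."""
    allowed = []
    for name in candidates:
        if name in OUTCOME_DEPENDENT:
            continue  # potential leakage: drop regardless of objective
        earliest = EARLIEST_AVAILABLE.get(name)
        # Unknown features are dropped, which is the conservative default.
        if earliest is not None and earliest <= objective:
            allowed.append(name)
    return sorted(allowed)


if __name__ == "__main__":
    candidates = list(EARLIEST_AVAILABLE) + ["decision_date"]
    for objective in Objective:
        print(objective.name, allowed_features(objective, candidates))
```

The point of tagging each feature group with an availability stage is that the same model code can be rerun under a stricter objective simply by changing one argument, rather than by hand-editing the feature list.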