Sample Sizes and Arbitrary End Points

We are getting to the time of the year – if we have not already reached it – when anyone who follows baseball for a living (or as a hobby) rolls out pre-season predictions for the upcoming season. As we saw in 2012, it is best not to put too much weight on these things. Nobody has a crystal ball, and, even more to the point, there is not one individual walking the earth who has the time, expertise, and knowledge to correctly analyze all 30 Major League teams before the first pitch of the season has even been thrown. Nonetheless, these articles are written and we all read them. We gnash our teeth in anger when a writer dismisses the Orioles’ chances, and we sing the praises of a writer who predicts the team is playoff-bound. These predictions aren’t a science, but it can be fun and informative to read the different conclusions that people arrive at. It can also be fun (or infuriating, depending on your outlook) to dissect and debate any opinion that runs in stark contrast to your own. There is little doubt in my mind that we will read many articles before April 1st that have the Orioles near the bottom of the American League East with a record noticeably worse than the 93-69 mark the O’s put up in 2012. Based on some of the articles that have already been written, some of the people who express that opinion will do so largely based on these two factors:

(1)    The Orioles had extraordinary success in one-run and extra-inning games during the 2012 season.  Their Pythagorean record suggests they should have won about 82 games instead of the 93 they actually won.

(2)    The Orioles failed to make any significant upgrades over the winter.

On the surface, it is a logical argument. The Pythagorean record has proven to be a fine estimator of “expected” wins, and large deviations between expected and actual record are ultimately uncontrollable and unsustainable. It is very reasonable to say that the 2012 Orioles were an 82-win team that could have won 10 additional games with some good fortune or lost 10 more with some bad fortune. It is also true that the Orioles did not make many significant additions over the winter. Put two and two together and it seems like a perfectly sound argument to state that the Orioles are likely to win around 82 games in 2013.
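For readers unfamiliar with the calculation: the Pythagorean record comes from Bill James’s formula, which estimates winning percentage from runs scored and runs allowed. The sketch below uses the classic exponent of 2 (some analysts prefer a value closer to 1.83). The function names are hypothetical, and the run totals in the usage line are made up purely to illustrate the arithmetic; they are not the Orioles’ actual 2012 figures.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Estimate winning percentage from runs scored and allowed (Pythagorean formula)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_record(runs_scored: float, runs_allowed: float,
                    games: int = 162, exponent: float = 2.0) -> tuple[int, int]:
    """Convert the Pythagorean winning percentage into a rounded W-L record."""
    wins = round(pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games)
    return wins, games - wins


# Hypothetical run totals: a team that outscores its opponents by just 10 runs.
print(expected_record(710, 700))  # (82, 80)
```

A team with a run differential that small projects as a roughly .500 club, which is why a gap of 10+ wins between actual and expected record is usually read as good fortune rather than skill.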

Only there is one glaring problem that people who use that logic tend to overlook. Eighty-two wins is used as a “base” for the 2013 Orioles because of the assumption that the 2012 Orioles were an 82-win team and that the Orioles who begin the 2013 season are roughly the same team. The problem is that the Orioles who ended the 2012 season were a significantly different team from the one that began it. They differed not just in performance – the Orioles had an expected winning percentage of .455 before the All-Star break and .558 after it – but in personnel.

For example, as a trio, Tommy Hunter, Jake Arrieta, and Brian Matusz made 60% of the team’s starts during the first half of 2012, with very poor results. They made only 10.5% of the team’s starts in the second half, with pitchers like Miguel Gonzalez, Chris Tillman, Zach Britton, and Steve Johnson picking up the majority of those starts to much better results. Hunter and Matusz are not expected to start the season in Baltimore’s rotation and will likely only end up there if the team is ravaged by injuries. Arrieta will only find himself in a significant starting role if he pitches significantly better than he did in 2012. Gonzalez and Tillman are virtual locks for the rotation, while Britton is the favorite in many fans’ eyes for the 5th starter spot come April. The 2012 Orioles were an 82-win team with three very poor starters in the rotation for half of the season and several solid starters in the rotation for only the other half. Why should the negative contributions of Arrieta, Matusz, and Hunter count against the 2013 Orioles when none of the three are expected to be significant contributors to the 2013 rotation (at least not if they perform at 2012 levels)?

That is somewhat of a rhetorical question. The answer is that statistical analysis in baseball is very aware of sample size these days. We know that player performance can fluctuate significantly from year to year and even more dramatically from month to month. As humans, it is extremely difficult – if not impossible – to be completely objective; a lot of the time, we see what we want to see. If a General Manager wants a player on his team to have a good 2013 season, he might ignore that the player has been woefully average throughout his career and concentrate on the fact that he had a really strong September the year before, which could signal a breakout next season. That September success is likely just noise, and the player’s career averages are more in line with what we should realistically expect from him in 2013. The larger the sample size, the less noise there is and the more accurate the picture we get.

In trying to avoid small sample size bias, however, analysts sometimes reach for a larger sample that is not appropriate. I believe this is what is happening with some of the analysis of the 2013 Orioles.

I would like to assume that most baseball writers who are projecting future performance are aware – at least on some level – that the Orioles were a stronger team during the latter months of the 2012 season. I’d imagine that many even realize that the improved performance was largely a function of personnel turnover during the season (i.e., the rotation changes mentioned above, as one example among many). The Orioles that begin the 2013 season are largely the Orioles that ended the 2012 season, along with some additions that are largely viewed as positive (a potentially healthy Nolan Reimold, for example). They are the Orioles that had a .558 expected winning percentage – not the Orioles that had a .455 expected winning percentage in the first half, nor the Orioles that had a combined .507 expected winning percentage over the entire season. Analysts prefer to look at six months of data over three months of data to avoid small sample size bias, but in doing so, they are looking at a sample that is not representative of what they are trying to analyze.
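The full-season number is essentially a blend of the two halves, which is how a weak first half keeps dragging down the season-level figure even after the players who produced it have left the rotation. The sketch below computes a games-weighted blend; the function name is hypothetical, and the equal 81/81 split in the usage line is an assumption for illustration, not the actual pre- and post-break game counts.

```python
def blended_win_pct(splits: list[tuple[float, int]]) -> float:
    """Games-weighted average of (winning_pct, games) pairs."""
    total_games = sum(games for _, games in splits)
    return sum(pct * games for pct, games in splits) / total_games


# Hypothetical equal split of a 162-game season into two halves.
season = blended_win_pct([(0.455, 81), (0.558, 81)])
print(f"{season:.4f}")
```

With equal weights the blend comes out near .507, in line with the full-season figure above. Whatever the exact weights, the blended number describes neither the first-half team nor the second-half team.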

The truth is that if you want a good baseline prediction of how the 2013 Orioles might perform, the only real sample we have is the final two to three months of the 2012 season. That was the only period when the group of players expected to start the 2013 season for the Orioles was largely together. Analysts are uncomfortable relying on such a small sample, which is a perfectly valid concern. The answer, however, is not to use a larger sample that is unrepresentative of the population, which is what they do by using the entire 2012 season in their analysis. The most accurate evaluation of the 2013 Orioles – using 2012 performance as the starting point – is probably something along these lines:

“The 2012 Orioles had a Pythagorean record of 82-80. However, the team experienced significant turnover during the course of the season. With a roster largely representative of the one the team will begin 2013 with, the Orioles had a .558 expected winning percentage after the All-Star break in 2012, which works out to a 90-win pace over a 162-game season. Given that the team that ended the 2012 season was significantly different from the team that began it, it is not necessarily accurate to use performance from the full 2012 season in evaluating the 2013 Orioles. At the same time, a sample of two to three months of strong performance is not nearly large enough to draw any firm conclusions from.”
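The 90-win pace is simple arithmetic: multiply the winning percentage by the 162-game schedule and round to whole wins. A minimal sketch (the function name is hypothetical):

```python
def win_pace(win_pct: float, games: int = 162) -> int:
    """Project a winning percentage over a full schedule, rounded to whole wins."""
    return round(win_pct * games)


print(win_pace(0.558))  # 0.558 * 162 = 90.396, so 90
```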

From that jumping-off point, it is perfectly reasonable for anyone to argue that the Orioles’ actual talent level is worse than, the same as, or better than what they demonstrated over the final two months. The problem is that few of the articles I have read so far make that clarification, and my guess is that few future articles will either. Instead, they start from the premise that the 2013 Orioles are the “expected” 82-win team of 2012, which is not necessarily the case. For example, someone might argue that the Orioles will have to use less talented pitchers due to injuries during the season, so that in the end Tommy Hunter might very well make 20+ starts with an ERA over 5.50 like he did in 2012. That is an argument that can be made – as long as the same evaluation of starting pitching depth is applied to all teams – but at least the discussion would begin from the appropriate baseline.

We use a season’s worth of data because a full 162-game season provides nice, tidy beginning and end points. The term “arbitrary end points” gets thrown around a lot in baseball statistical analysis. I could make a case for Player A being a great player because from May 5th to June 12th, he led the league in OPS. The problem is that I am specifically choosing May 5th as the start date and June 12th as the end date because Player A led the league in OPS during that period and I am trying to make a case that he is a great player. I am leaving out – on purpose – his performance before May 5th and after June 12th, so I am not telling the whole story, only the story I want to tell. To avoid that, people tend to look at a full season of data. Of course, aren’t Opening Day and the last day of the season arbitrary start and end points as well?
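The cherry-picking problem is easy to demonstrate: if you get to choose the start and end dates after seeing the data, the best stretch you can find will always look at least as good as the full sample. The sketch below searches every contiguous window of at least `min_len` games for the highest average. The function name and the per-game values are made up for illustration.

```python
def best_window(values: list[float], min_len: int) -> tuple[int, int, float]:
    """Return (start, end, mean) of the highest-mean contiguous window with at
    least `min_len` entries; `end` is exclusive. Brute force, O(n^2)."""
    if not 1 <= min_len <= len(values):
        raise ValueError("min_len must be between 1 and len(values)")
    # Prefix sums give each window's total in constant time.
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    # Start from the full sample, so the result can never be worse than it.
    best = (0, len(values), prefix[-1] / len(values))
    for start in range(len(values)):
        for end in range(start + min_len, len(values) + 1):
            mean = (prefix[end] - prefix[start]) / (end - start)
            if mean > best[2]:
                best = (start, end, mean)
    return best


# Hypothetical per-game results for one hitter (1 = reached base, 0 = did not).
games = [0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
start, end, mean = best_window(games, min_len=5)
full = sum(games) / len(games)
print(f"full sample: {full:.3f}, cherry-picked games {start}-{end - 1}: {mean:.3f}")
```

Because the search starts from the full sample and only accepts better windows, the hand-picked stretch can only match or beat the full-sample average. That is exactly why a selectively chosen date range proves so little.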

For example, if the Orioles had continued on after the 5th game of the ALDS last season and played 40 more games before wrapping up their season in November, this entire conversation might be moot. Maybe the Orioles bag 25 more victories and reinforce the notion that they are a very good team. On the flip side, maybe the team goes 15-25 in those final 40 games and strengthens the idea that their post-All-Star break run was just a small sample and not indicative of the team’s true talent level going forward. Maybe those 40 additional games don’t give us a clearer answer at all. The point is that we only use the end of the season as an end point because it is convenient. We shouldn’t use it as an excuse to write off performance as a small sample. It has yet to be proven that the post-All-Star break sample was too small to be a true indicator of the team’s actual talent. We simply do not know yet – it is to be determined.

Writers get paid to write these columns and make firm predictions. Nobody is going to write “the Orioles’ situation is far more muddied and confusing than most other teams’, so I have no idea what their true talent level is,” and I don’t expect them to. Nonetheless, everyone would be best served to keep in mind that the Orioles’ unique roster turnover in 2012 makes figuring out their true talent level heading into the 2013 season a difficult – if not impossible – task.