By The Numbers: What We've Done, What Must We Do

Columbus is the only team in the East that has to play 4 Western teams and play us twice. With good results in those games we can catch them. Every game against a team ahead of us in the standings that we can catch (MTL, NE, COL) is a huge game, just like the ORL game was.

We were such a dramatically better team with Pirlo and Mix on the field instead of Ballouchy and Calle. That quality is going to carry through the season. Nate Silver has an NBA model in which having good players healthy and available to play has a huge impact on the likelihood of winning and making the playoffs. Something like that would work well for MLS. The likelihood of winning with Pirlo, Mix, Lampard, Iraola and Angelino on the field is more significant than past performance without those players. Huge game against Montreal this weekend.
Agreed on the most important point: A win over Montreal, following the win over Orlando, will be huge for reasons we've both covered.
As for the Nate Silver reference, I'd like to see someone try to build a model that incorporates changes to team strength on the fly. It's beyond my ambition.
I also think it's harder to predict soccer than other sports even when you have static team composition. I think soccer scores relatively low on 2 scales: (1) how likely the team that is generically better overall is to win any given game, and (2) how likely the team that outperforms in a given game is to win it.
They don't go hand in hand. For example, baseball ranks low on #1 but high on #2. The better teams tend to win fewer than 60% of their games on average, but a team that dominates any given game, usually by pitching, wins a lot.
Soccer is hurt on #1 by the possibility of ties, which cuts down on wins and losses, and on #2 by the somewhat random nature of goals going in. Dominating teams (game by game) win more than they lose or tie, but not as much as in other sports, I believe.*

FWIW, in my view a team that "parks the bus" and wins does not necessarily do so against the run of play. If you employ that strategy and end up flailing about yet win, that's luck. But if you manage to control and set the tempo of the game, then you outplayed your opponent that day.

In any event, I recommend checking out the site Midas Mulligan found above. I'm not sure exactly what info the site owner relies on or whether he incorporates roster changes on the fly, but he is clearly running multiple full season simulations and aggregating the results.
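
For anyone wondering what "running multiple full season simulations and aggregating the results" looks like in the simplest possible form, here is a rough Python sketch. It is not the site owner's method, which I don't know. The per-game win and draw probabilities, the points total, the games remaining and the target are all illustrative inputs I made up for the example, and the function names are hypothetical.

```python
import random


def simulate_final_points(current_points, games_left, p_win, p_draw, rng):
    """Play out the remaining games once and return the final points total."""
    points = current_points
    for _ in range(games_left):
        roll = rng.random()
        if roll < p_win:
            points += 3  # win
        elif roll < p_win + p_draw:
            points += 1  # draw
        # otherwise a loss: no points added
    return points


def chance_of_reaching(target, current_points, games_left, p_win, p_draw,
                       n_sims=10_000, seed=0):
    """Share of simulated seasons that finish on at least `target` points."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    hits = sum(
        simulate_final_points(current_points, games_left, p_win, p_draw, rng) >= target
        for _ in range(n_sims)
    )
    return hits / n_sims


if __name__ == "__main__":
    # Illustrative inputs only: 24 points banked, 13 games left, target of 42.
    print(chance_of_reaching(42, 24, 13, p_win=0.40, p_draw=0.25))
```

A real model would give every fixture its own probabilities (home/away, opponent strength, who is healthy) instead of one flat pair of numbers, which is exactly the part that is hard to get right.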
 
I think we would need access to utilization rates and something like WAR, but it still wouldn't help the NYC calculation. Even with those figures for our changed squad, there wouldn't be a way to normalize our player ratings into our team.

Another thing that makes soccer particularly hard, and this goes to both your #1 and #2 predictability inhibitors, is that a relatively low-scoring game is always going to be subject to more randomness. Additionally, as we all know, tactics on any given day tend to be a big factor. Finally, there's no other game where a single bad call, or even a good call on a random misstep, can effectively grant a team an undeserved win. Again, these are all really underlying causes of why it's unpredictable as hell.

Random is Johannsson missing a header from 3 yards in front of an open net, sending the U.S. supporters into the doldrums. If he buries that, the U.S. likely goes on to play Mexico and likely actually shows up motivated. It's the cruelest of games.

Don't get me started on the idea of "laws" over rules. The idea of referee as fairness arbiter is insane to my mind. It's maybe the only point I'd concede to someone arguing that soccer is "un-American". There is so much left to interpretation. It can be maddening to those of us accustomed to having specifically applicable laws dictate outcomes rather than post facto applications of subjective interpretations.
 
It's two completely different questions. Mgarbowski's work tells us what we have to do to reach the playoffs. The question of predicting whether we will win specific games is totally unrelated to that.

I read recently, but I forget where, that soccer is the least predictable of major sports due to low scores and a high level of random events.
 
Dominating teams (game by game) win more than they lose or tie, but not as much as other sports I believe.*
Saw this part of your post and it got me thinking a bit about other "American sports"
  • Baseball - I'm going to go ahead and ignore this one because baseball is boring
  • Football - I think this can be somewhat prevalent in football, where one team dominates but is ultimately beaten. However, I feel it's much different in football because there are usually reasons that show up statistically that help explain this (red zone efficiency, turnovers, special teams being the big items). I'll never forget my Steelers years ago losing to the Texans 24-3 when outgaining them 400+ yards to 47 yards. That's what 3 defensive touchdowns will do. But again, I don't think this falls as much into your reasoning with soccer above, because there are definitive statistical means that help to explain why one team dominated yet still lost.
  • Basketball - This one is plain and simple: the better team almost always wins. Especially if one team "dominates", there is no chance they are losing; it's just the nature of the game. I'm sure there are cases where one team may play better in a game, miss a couple of key shots, have some calls go against them and lose, but I would argue that their performance would be, if anything, only marginally better than the opposing team's.
  • Hockey - I don't really know a ton about the sport, but I would generally chalk up cases where one team dominates and loses to running into a hot goalkeeper who doesn't let anything past. Goals in hockey can be as random as they come with deflections and whatnot, but again, I think it's the performance of one individual that explains the outcome.
In summary, I definitely agree that results in soccer can be a lot more random and difficult to explain/project with statistical modeling.
 
This is a good discussion, and one I've thought about quite a bit.

I believe soccer provides the most random of outcomes, largely because of the low scoring. You are much more likely to have an upset in soccer than in any other sport.

I think baseball is a close second, largely because the outcomes of its different plays are so affected by chance - baseball is essentially a game of playing odds over a large number of occurrences (left handed batter vs. right handed pitcher, for example).

For the opposite reason, basketball has fewer upsets than other sports because of the large amount of scoring. I think American Football has comparatively few upsets, largely because there is less randomness in the outcome of different elements of the game.
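
The low-scoring point can be put into rough numbers. This is a toy Python sketch, not a model of any real league: it assumes each side's score is an independent Poisson count, and the scoring rates (1.5 vs 1.0 for a soccer-like game, 60 vs 40 for a high-scoring game) are made-up values that keep the better side's edge at the same 3:2 ratio. The function names are hypothetical.

```python
from math import exp, lgamma, log


def poisson_pmf(k, lam):
    """Probability of exactly k scores when the average is lam (lam > 0)."""
    return exp(k * log(lam) - lam - lgamma(k + 1))


def outcome_probs(lam_better, lam_worse):
    """Return (better side wins, draw, worse side wins) for two Poisson scorers."""
    # Sum over score lines far enough out that the ignored tail is negligible.
    max_score = int(4 * max(lam_better, lam_worse) + 30)
    better = [poisson_pmf(k, lam_better) for k in range(max_score + 1)]
    worse = [poisson_pmf(k, lam_worse) for k in range(max_score + 1)]
    win = draw = loss = 0.0
    for i, p_i in enumerate(better):
        for j, p_j in enumerate(worse):
            p = p_i * p_j
            if i > j:
                win += p
            elif i == j:
                draw += p
            else:
                loss += p
    return win, draw, loss


if __name__ == "__main__":
    # Same 3:2 quality edge, very different amounts of scoring (assumed rates).
    print("low scoring :", outcome_probs(1.5, 1.0))
    print("high scoring:", outcome_probs(60.0, 40.0))
```

With the same relative edge, the better side wins far more often in the high-scoring setup, and both the draw and the upset become much less likely. That is the whole argument about low scores in one comparison, under the stated assumptions.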
 
When we think about the rest of the season, and what is a reasonable expectation regarding outcomes, let's not forget the following.

Our record to date includes 3 games without David Villa, and 4 games when he was hurt and didn't go the full 90.

With a healthy David Villa, we are 6-3-5, with 23 points on 14 games, or 1.64 ppg. Without a fully healthy David Villa, we are 0-6-1, with 1 point on 7 games or 0.14 ppg. If we simply perform at the same level we have with Villa playing full speed, we will end up at 45 points. To get to the "magic level" of 42 points, we need to average 1.38.
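
The arithmetic above can be checked with a few lines of Python. The records, the 42-point target and the point totals come from the post; the 34-game season length is my assumption (it is what makes 13 games remaining and a projection of about 45 work out), and the helper names are hypothetical.

```python
SEASON_GAMES = 34   # assumption: regular-season length, not stated in the post
TARGET_POINTS = 42  # the "magic level" from the post


def record_points(wins, losses, ties):
    """Points from a W-L-T record: 3 for a win, 1 for a tie."""
    return 3 * wins + ties


def record_games(wins, losses, ties):
    """Number of games in a W-L-T record."""
    return wins + losses + ties


healthy = (6, 3, 5)  # W-L-T with a healthy Villa
hurt = (0, 6, 1)     # W-L-T without a fully healthy Villa

healthy_ppg = record_points(*healthy) / record_games(*healthy)
hurt_ppg = record_points(*hurt) / record_games(*hurt)

points_so_far = record_points(*healthy) + record_points(*hurt)
games_played = record_games(*healthy) + record_games(*hurt)
games_left = SEASON_GAMES - games_played

# Final total if the rest of the season matches the healthy-Villa pace.
projected_points = points_so_far + healthy_ppg * games_left
# Pace needed over the remaining games to reach the target.
needed_ppg = (TARGET_POINTS - points_so_far) / games_left

if __name__ == "__main__":
    print(f"healthy ppg: {healthy_ppg:.2f}, hurt ppg: {hurt_ppg:.2f}")
    print(f"projected points: {projected_points:.1f}, needed ppg: {needed_ppg:.2f}")
```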

Of course, our closing schedule is tougher, and we may well end up losing Villa or another key player to further injuries. Nevertheless, our results to date were almost entirely without Pirlo, Lampard, Iraola, and Angelino, so there is every reason to be optimistic.
 
Responding to all 4 of Midas Mulligan, Tom in Fairfield CT, SoupInNYC and Gotham Gator. All make strong points.
  • I can't believe I forgot to mention the effect of low scoring games and randomness. This is often discussed in the context of baseball where a team that wins or loses a lot of 1-run games will be expected to revert to the mean the next season. Definitely a soccer factor.
  • Ref effects. Huge in soccer. I read an article once that made the case that home field advantage correlated perfectly with the fact that home teams get more penalty kicks. When you controlled for that, the home advantage nearly disappeared. It was based on a study of English league games going back decades.
  • The various analyses of different sports are spot on and I would add a couple of points. Basketball and football score high on the better team winning in part because effort directly correlates to result. So if a team with more skill works hard they will most often win. OTOH, you can actually say a baseball player is "trying too hard" and make sense. Hockey is similar to soccer in that effort correlates to scoring opportunities but not always to scores, and as Soup noted, goalies in hockey also have a big impact, generally more so than soccer keepers. Bball has a higher pct of better teams winning than gridiron football because it is less susceptible to the big play effect. Every possession and play is equal. In football 5-6 plays in the course of a game can have more impact than the rest of the game combined.
By the way - so happy that folks picked up on this. I've been thinking about this type of stuff -- randomness/effort/predictability in sports -- since I was a kid and have rarely found others who find it interesting.
 
You must not be a baseball fan. You would have had a lot of fun on fan forums over the past 15 years.

I used to be a statistical analyst for fantasy baseball with RotoWire right out of college. So much fun.
 
Baseball IS the definitive sport of statistical analysis. They drill down so far that in some cases the statistics get pretty meaningless. For example:

Mgarbowski is hitting .800 on Tuesday day games against left handed pitchers who are coming off of four days rest and over 28 years old.
 
Well, I'm definitely over 28 years old ;). But I take your point.
 
Any serious statistician would obviously agree with you that that's a meaningless split. Even with just L/R splits for batters, you usually need at least two seasons of samples to get any meaningful insight into the hitter's ability, even at low confidence levels.
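
To put a rough number on why tiny splits are meaningless, here is a small Python sketch using the Wilson score interval for a proportion. It treats every at-bat as an independent coin flip with the same hit probability, which is a simplification, and the at-bat counts (4-for-5, 80-for-300, 160-for-600) are hypothetical figures picked for illustration, not real stats.

```python
from math import sqrt


def wilson_interval(hits, at_bats, z=1.96):
    """Approximate 95% Wilson score interval for a hit rate (z=1.96)."""
    p = hits / at_bats
    denom = 1 + z * z / at_bats
    center = (p + z * z / (2 * at_bats)) / denom
    half_width = z * sqrt(p * (1 - p) / at_bats + z * z / (4 * at_bats ** 2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical samples: a tiny split, about one season, about two seasons.
    for hits, at_bats in [(4, 5), (80, 300), (160, 600)]:
        low, high = wilson_interval(hits, at_bats)
        print(f"{hits}-for-{at_bats}: plausible true rate {low:.3f} to {high:.3f}")
```

The 4-for-5 ".800" split is consistent with almost any underlying hitter, while the interval tightens noticeably as the sample grows toward a couple of seasons of at-bats.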
 
Until the last 5-10 years baseball was my favorite sport, but even though it has always been the most stat-happy, it has not always been a haven for probability analysis. When I was a kid and teenager (1970s) nobody was talking about this. In fact, if you tried to argue that stringing hits together into a one-inning rally was random rather than clutch, people got offended. Some still do, but it's not as universal.
Bill James and Elias started publishing in the mid-80s when I was a young adult and I immediately grabbed on to what they were doing but it took a while to catch on. By the time it was mainstream I was losing interest in the sport for a variety of reasons. Among the reasons was the fact that everything they've done with schedules, divisions, interleague play and playoffs for the last 20 years has multiplied the randomness factors to the point where winning the World Series is oddly disconnected from being the best team.
 
Serious statistician yes, ESPN/YES/SNY commentators nope. That means to them he is going to get a hit right now and they are surprised when he doesn't.
 
mgarbowski You should apply to be the Nate Silver of YES Network or work in-house for NYCFC. Starting Poku over Grabavoy gives us a 74% better chance of winning this game.
 
I just have to ask, who on this forum (and especially on this thread) has some kind of stats background? I really would love for people to respond.

Two years ago I started a social media website for sharing visual data. While I was more focused on politics at the time, I think this is the community I was looking for.
 
4 years as an analyst in various investment related companies, a few more as a consultant. Not stats per se, but modeling my ass off.

If the devil challenged me on Excel as he's been known to do with fiddles, I'd have a PC of gold.
 
Studied some in college. Used to be able to create polls, run regressions and set up cross tabs, but now I just know enough to be full of BS (and not to have a lot of confidence in statistical analysis using small sample sizes).
 
Sounds about right for me. I either know how to do it, but not what it means, or vice versa
 
I don't think anyone knows how to quantify it, but don't we all agree that we are a hell of a lot more likely to win any game in which we have Pirlo on the field?
I'd say Pirlo adds a 47.65% greater chance of winning, but that's just a made up number with purposefully meaningless and deceptive precision thrown in for effect.
My own background and experience is that I'm self-taught, capable of greater complexity than I've shown here, but by no means qualified as an expert and won't even pretend to do regressions. There are fewer than 1-2 dozen practicing lawyers in the country who could match my Excel output at work, and while that's another made up number I do have very high confidence in it.