2017 Stats Thread

There are three lost points so far that really annoy me: 1 against RSL (even with a weakened side still should have gotten a tie, especially scoring 4 minutes into the game) and the two against the Revs (ugh). Having those 3 points would make NYCFC second in the table and in line to have 30 points at the midway point (with 27 points now, need 1 win in next 2 games) - putting them on target for 60 points and a sure top 2 seed.

In any case they will have to improve in the second half for that top 2 seed and stop blowing results.

It will be different without a brutal condensed away schedule and our players back from injury and international duty. I hate making excuses, but it's obvious what happened.
 
Here are the odds for our next 3-game stretch, based on predictions published on the 538 website.

https://projects.fivethirtyeight.com/soccer-predictions/mls/

I am a little surprised that our expected points (4.66) are so low for the next 3 games. It is actually slightly less than it was for our last 3 games (which were 4.80 vs. 4.0 actual). The 538 bot seems very pessimistic about our chances at New Jersey. I think people on this forum would be disappointed with anything fewer than 5 or 6 points, and with good reason.

[EDITED to update with correct chart]

Game Predictions.jpg
 
Can you help me understand the expected results section? Why should the outcome of the first two games impact the odds in the third game? It's three independent calculations, right?
 
It is 3 independent calculations, but to make it fit in a typical Excel sheet, I need to run the first 2 games down one side, so you see all 27 outcomes. Take, for example, the chances we beat Seattle, tie NJ and then beat Minnesota. That one is represented by WD on the left and W across the top. The odds that we win the first and draw the second are 12.8% (51% times 25%). Then you multiply that by the 54% chance we win the last game. Final odds of win-draw-win equal 3.8%.
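
For anyone who wants to check the grid outside of Excel, here is a rough Python sketch that multiplies out all 27 combinations. The 51%, 25% (NJ draw), 48% and 54% figures are the ones quoted in this thread. The other entries are assumptions filled in so each game roughly adds up, and the Minnesota line deliberately sums to 99% to mimic the rounding mentioned later. On these inputs win-draw-win works out to about 6.9% rather than the 3.8% read off the chart above, which fits the later admission that the first chart was mis-pasted.

```python
from itertools import product

# Win/draw/loss odds per game. Only some of these are quoted in the thread;
# the rest are assumed placeholders, not 538's published numbers.
GAMES = {
    "Seattle (H)":   {"W": 0.51, "D": 0.25, "L": 0.24},  # D and L assumed
    "NJ (A)":        {"W": 0.27, "D": 0.25, "L": 0.48},  # W = 1 - D - L
    "Minnesota (H)": {"W": 0.54, "D": 0.24, "L": 0.21},  # D and L assumed, sums to 0.99
}

def outcome_table(games):
    """Return {"WDW": probability, ...} for every sequence of results."""
    names = list(games)
    table = {}
    for combo in product("WDL", repeat=len(names)):
        prob = 1.0
        for name, result in zip(names, combo):
            prob *= games[name][result]  # independent games: just multiply
        table["".join(combo)] = prob
    return table

table = outcome_table(GAMES)
for key in ("WDW", "WLW", "LWW"):
    print(key, f"{table[key]:.1%}")
print("most likely:", max(table, key=table.get))
print("total:", f"{sum(table.values()):.1%}")
```

The third game's odds never change from row to row. Each cell is just the row's probability (first two results) times the column's probability (third result).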
 
I think there's a mistake. Just eyeballing it I noticed that you have the W-L-W chance at 3.6%. But W-L-W has to be the result with the highest chance, because we have a 51% chance of a win, followed by 48% chance of loss, followed by 54% chance of a win. No other outcome can be as high. Yet others are much higher, including L-W-W at 13.2%, which should be much less likely. Yet everything adds up to 100%, or close enough given rounding quirks, so maybe some got switched?
 
But I'm not certain the math is correct for the last calculation. For example, the chances of winning our first two are at 13.8%, so you would expect all of the outcomes across that row to add up to 13.8%, but they add up to 15.2%.
 
It's a strange way to look at it since it seems to imply that our odds in the third game are dependent on the results of the first two.
 
I think his methods are correct, but the format does give that impression. More importantly, I think some of the lines are mixed up (or there are some other types of errors), which would explain the anomalies that SoupInNYC and I noted. So the result of the third game is not actually dependent on the result of the first two; each number in the table should be the likelihood of a W, L or D in the third game multiplied by the likelihood of each possible result of the first two games.

Turning back to Gator's sense that the likely result is low, I think it's just an effect of this type of analysis. Once you start multiplying percentages of multiple likely events, the combined likelihoods just go down. We tend to think differently: The most likely result -- and better than 50% -- of each of the 2 home games is a win, so we bank 6 points. Then the combined likelihood of either a win or draw in the RB game also is >50%, so that should be a point, and we figure 7 points is a likely result. Which it is. It is a very likely result compared to all the others, but there are 27 possible outcomes, and even the single most likely outcome will be nowhere near 50% overall.

It is just really hard to get a high expected point value given the multiplicative effect. I did some calculations, and estimate we would need to have a 70%+ chance of winning each home game to get an expected point total in the 5.5 range. That's just never going to happen. So even though the method is correct, it tends to give low results, and given the small set of games, I think it diverges greatly from actual possibilities. I calculate the expected point value of Game 1 by itself at about 1.78, Game 2 at 1.06 and Game 3 at 1.86. That adds up to 4.7, which is close to Gator's result.

But in fact we can only get 0, 1 or 3 in each game, so the expected values of Games 1 and 3 are roughly a full point away from an actual possibility in either direction. I think 3 games is just too small of a sample to soften the edges separating the combined average point value from the actual real world possibilities.
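
To put rough numbers on this, here is a small Python sketch. The win/draw/loss splits are assumptions chosen to reproduce the 1.78, 1.06 and 1.86 per-game values above (the Minnesota loss figure is set to 22% so everything sums to 100%), and the 70%/15%/15% "what if" split for the home games is purely hypothetical. It shows that the expected total is simply the sum of the per-game values, and how the probability is spread across the point totals we can actually finish on.

```python
from collections import defaultdict
from itertools import product

POINTS = {"W": 3, "D": 1, "L": 0}

# Assumed splits, back-fitted to the per-game expected values in the post above.
GAMES = [
    {"W": 0.51, "D": 0.25, "L": 0.24},  # Seattle (home)
    {"W": 0.27, "D": 0.25, "L": 0.48},  # NJ (away)
    {"W": 0.54, "D": 0.24, "L": 0.22},  # Minnesota (home)
]

def expected_points(game):
    """Expected points from one game: 3*P(win) + 1*P(draw)."""
    return sum(POINTS[result] * prob for result, prob in game.items())

def points_distribution(games):
    """Probability of each possible points total over the whole stretch."""
    dist = defaultdict(float)
    for combo in product("WDL", repeat=len(games)):
        prob = 1.0
        for game, result in zip(games, combo):
            prob *= game[result]
        dist[sum(POINTS[r] for r in combo)] += prob
    return dict(dist)

per_game = [expected_points(g) for g in GAMES]
total_ev = sum(per_game)
dist = points_distribution(GAMES)
p_at_least_7 = sum(p for pts, p in dist.items() if pts >= 7)

# Hypothetical: 70% win chance in both home games, NJ game unchanged.
home_70 = {"W": 0.70, "D": 0.15, "L": 0.15}
what_if_total = sum(expected_points(g) for g in (home_70, GAMES[1], home_70))

print([round(x, 2) for x in per_game], round(total_ev, 2))
print(f"P(7+ points) = {p_at_least_7:.1%}")
print(f"expected points with 70% home wins: {what_if_total:.2f}")
```

On these assumed inputs the chance of 7 or more points comes out at roughly 21%, and the 70% scenario lands at about 5.56 expected points, in line with the estimate above.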
 
Shit. You guys are totally right. I copied the tables for the last set of 3 games and pasted them to a new area, and it screwed up the calculations. Here is the correct table.

You will see the total probabilities sum to only 99%, which is due to rounding by 538 of the odds against Minnesota.

Game Predictions.png
 
I think there is a lot of truth to this - if you run the chances on 3 straight games, the mean outcome is going to be closer to 4-4.5 points than would seem to be the case. Still, I think 538's odds are too low for us. They still rate our defense poorly for some reason. I would think our real chances in each game are a little better than what they show.
 
They rate our defense poorly because the data are trailing for one or two seasons. So we have the dumpster fire of year two and maybe year one to contend with for our defensive weighting. That's going to give us a bad defensive rating, but I can see why they do it; more data points in the spreadsheet.
 
The other rather huge factor is that our consensus is that our best 11 is top 1-3 in the league, and many would limit that to the top 1. So let's assume we're right. 538 has no data of us playing our best 11. Herrera has played 2 full games out of 15. Since we're close to halfway through the season, we could get to the last month before the data catches up to current status.
 
Interesting analysis, shows how "unlucky" NYCFC has been. NYCFC and CHI only 2 teams to have positive xG at home and on road.

Different xG models may produce slightly different results, but my tally is that we would be 12-3 based on xG. That would be tied for the best in the league with SKC. East as a whole would stack up like this:

NYCFC - 12-3
Orlando - 10-5
Chicago - 9-5
Toronto - 9-6
New England - 9-6
NYRB - 7-8
Atlanta - 6-7
Columbus - 7-9
Philadelphia - 6-8
Montreal - 5-7
DC United - 3-11


The other thing I had done is try to normalize for home/away imbalance by projecting points for every team based on remaining home and away matches and using their home and away PPG to date. It fails to account for strength of schedule and some other relevant data, but it eliminates the home/away imbalance, which is so relevant in MLS given the extreme skews. That produces the following projected point totals:

Toronto - 66.8
Chicago - 60.7
NYCFC - 55.3
Atlanta - 49.7
Orlando - 49.1
New England - 47.7
Montreal - 45.3
NYRB - 43.4
Columbus - 40.4
Philadelphia - 38.9
DC United - 36.8
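
For reference, a minimal sketch of that kind of projection in Python. The function name and the example record are made up for illustration (they are not any team's real numbers), and the 17 home / 17 away split is an assumption about the 34-game schedule.

```python
def projected_points(home_pts, home_played, away_pts, away_played,
                     home_total=17, away_total=17):
    """Project a season total from home and away points per game to date.

    Points already earned are kept as-is; the remaining home and away
    matches are filled in at the current home and away PPG.
    """
    home_ppg = home_pts / home_played
    away_ppg = away_pts / away_played
    remaining_home = home_total - home_played
    remaining_away = away_total - away_played
    return (home_pts + away_pts
            + home_ppg * remaining_home
            + away_ppg * remaining_away)

# Hypothetical record: 14 points from 6 home games, 13 points from 9 away games.
example = projected_points(home_pts=14, home_played=6, away_pts=13, away_played=9)
print(round(example, 1))
```

A team that has played mostly away games gets credit here for the home games still to come, which is the whole point of the adjustment. Strength of schedule is still ignored.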
 
I don't suppose you projected future point totals based on xG, with or without a H/A correction?
 
Looking at the xG table here:
http://www.americansocceranalysis.com/team-xg-2017/

It makes us feel good because NYC has the highest xG differential in the league at +9.4, and NYC's actual GD of 8 is less than the xG differential by 1.4, which suggests that we've been unlucky, a bit.

Then I look at some of the really big discrepancies between actual GD and xGD.
Seattle unlucky by 10
Dallas lucky by 7.5
TFC lucky by 8.5
Atlanta lucky by 13.6

This makes me wonder. Atlanta especially. Atlanta's "luck" is almost all on the goals scored side of things. They scored 27, with xG of 15. But xG is based solely on shot location, and ignores the defensive positioning. Atlanta scores a lot of fast break, counter goals where the defense is scrambling and out of position. Those are very high probability shots in a way that xG ignores.
NYC does not score or generate shots nearly as much that way. NYC takes a lot of shots inside the box while the other team has defenders in place getting in the way. This happened repeatedly in both games last week.
xG doesn't capture this difference, and that seems like a major flaw.