## Replace ELO with Win/Loss ration

### Replace ELO with Win/Loss ration

The "How to preserve LELO's" thread has spawned a separate conversation: the weakness of ELO.

Problem: ELO is cumulative. If you win 51% of your games, your ELO will keep climbing, ultimately to any level given enough games. I'm not going to fully explain it; there is a significant amount of information on ELO out there.

Solution: Track winning percentage and use it, along with a comparison to an opponent's win/loss record, to handicap games the way we currently do with ELO.

Corollary proposals: Display Land won and total games played. Use strength of schedule.

I'll work this out more later. The guys I've been talking to can add to this.

### Re: Replace ELO with Win/Loss ration

> Nastyogre wrote: Problem: ELO is cumulative. If you win 51% of your games, you will have an ELO that climbs. Ultimately to any level given enough games.

This is not true. If you win 51% of your games, and we assume for simplicity that you fight people at an average of 1600 Elo, then your Elo will naturally stop at 1607. At that point, instead of gaining 2 Elo per win and losing 2 per loss, you start gaining 1.96 per win and losing 2.04 per loss, for an expected value of 1.96*51% + (-2.04)*49% = +0.9996 - 0.9996 = 0. Your Elo will not change further over time.
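The arithmetic above can be checked with a short Python sketch. It assumes the standard logistic Elo expectation with a 400-point scale and k=4 (matching the 2-points-per-even-game figure in the post); the server's actual formula is not shown in this thread, so the function names and defaults here are illustrative only.

```python
import math

def expected_score(rating, opponent, scale=400.0):
    """Standard Elo expected score (assumed formula, not confirmed for this server)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))

def natural_rating(win_rate, opponent=1600.0, scale=400.0):
    """Rating at which the expected score equals the player's true win rate."""
    return opponent + scale * math.log10(win_rate / (1.0 - win_rate))

def expected_change(win_rate, rating, opponent=1600.0, k=4.0):
    """Average rating change per game for a player with the given true win rate."""
    e = expected_score(rating, opponent)
    gain = k * (1.0 - e)   # points earned for a win
    loss = k * e           # points lost for a loss
    return win_rate * gain - (1.0 - win_rate) * loss

if __name__ == "__main__":
    r = natural_rating(0.51)
    print(round(r))                        # settles near 1607
    print(expected_change(0.51, 1600.0))   # still climbing slightly at 1600
    print(expected_change(0.51, r))        # essentially zero at the natural rating
```

At 1600 the 51% player gains about 0.04 points per game on average; at the natural rating the average change vanishes, which is the point being made.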

Now, Elo does have weaknesses in comparing people in different time periods - chess has had an Elo inflation problem for a while, and we can even see it on our server to some extent (a 1650 Elo a week after cycle reset is far different than a 1650 Elo six months in) - but at any given time, Elo is a legitimate, statistically valid measure of player quality.

> Nastyogre wrote: Solution: Track and use Winning percentage use it, and a comparison to an opponents win/loss to handicap games the way we do currently with ELO.
>
> Corollary proposals: Display Land won and total games played. Use strength of schedule.

You're basically trying to remake the Elo system here, only without as much detail or statistical power. There's no shame in that - Arpad Elo was a statistics professor and a chess master, he would of course have a much better understanding of this stuff than you or I would. But this is not a good wheel to re-invent.

The problem you're trying to solve is players wanting to lose games to drop their Elo, not the statistical measure itself. If losing is sometimes good and sometimes bad, then no statistical measure on the planet will measure player quality well. Specifically, the concern is players losing meaningless games to drop their Elo, and winning meaningful ones to reap the benefits of being a low-Elo player.

If you want a simpler solution than the ones we discussed on the other thread, just scale the k-value by the meaningfulness of the game. Instead of the flat k=4 that the server uses now, make it k = land swing / 10 or something (where "land swing" = land the attacker would gain for winning + land the defender would gain for winning, to compensate for the 0% planet problem). That way, if you're rocking out with a heavy company battle on a capital planet, and the swing is +50 CP for the winning player, it'll have way more of an impact on your Elo than some dinky light aero battle on a 0% world where the winner will only get 4 CP at best. So yeah, you can intentionally lose 25 of those aero battles to make up for the one capital fight if you really want to, but neither you nor your faction will gain anything from the process with regards to Elo modifiers, net CP, or anything else (besides the usual rewards of activity, of course, but nobody objects to active players earning RP and C-bills).
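A minimal Python sketch of that variable-k idea follows. The divisor of 10 comes from the suggestion above; the function names, the land figures in the example, and the standard 400-point Elo expectation are assumptions for illustration, not the server's code.

```python
def expected_score(rating, opponent, scale=400.0):
    """Standard Elo expected score (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))

def variable_k(attacker_land_gain, defender_land_gain, divisor=10.0):
    """k scaled by 'land swing': what each side would gain for winning, summed."""
    return (attacker_land_gain + defender_land_gain) / divisor

def update(winner, loser, k):
    """Return the new (winner, loser) ratings after one game."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta

if __name__ == "__main__":
    # Hypothetical games between two 1600-rated players.
    capital_k = variable_k(50, 0)   # big capital-planet fight -> k = 5.0
    aero_k = variable_k(4, 0)       # small aero fight on a 0% world -> k = 0.4
    print(update(1600.0, 1600.0, capital_k))
    print(update(1600.0, 1600.0, aero_k))
```

With these made-up numbers, the capital fight moves each rating by 2.5 points while the aero fight moves them by 0.2, so throwing small games buys very little rating movement.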

Member, Council of Six

### Re: Replace ELO with Win/Loss ration

You are right, I'm oversimplifying. But Elo inflation is a problem. Elo is very accurate when players have similar numbers of games. It's not when one player has far more games than another. Thus Baruk has a higher Elo than Chaser despite Chaser being more successful. Correct me if I'm wrong, but that is due to Elo inflation.

ELO is good because it is a way to compare strength of players. I don't think it properly handles strength of opponents. Your ELO will still rise if you beat weak opponents. It rises less as your ELO rises, but it will still rise. An average player can then have the appearance of a Master by beating 1000 lesser players without ever playing a Master.

A new calculation that took strength of opponents into account would be more accurate.

I don't like the idea of using land in the calc. I think we run the risk of pushing strong players right out, as their success limits their future success and is accounted for multiple times. Using strength of opponents would account for our seal hunters and cherry pickers, while not penalizing people for attacking smartly and defending tenaciously but being willing to cut losses when attacking a fresh target.

### Re: Replace ELO with Win/Loss ration

The reason why Chaser and Baruk have similar Elos despite very different player skill levels is that the k-value is too low. At k=4, which is what we have, an 80% player's rating will move by less than a point per game on average after only a few dozen games, despite being 200+ points below their proper level. It'd take 350+ games to get within 50 points of where they should be. That's way too long.

I made some simplifying assumptions (all games vs 1600-Elo opponent, averaging scores out so all games earn the mean score instead of varying wins and losses) and ran up an Excel sheet.

| Win rate | "Natural Elo" | k | Elo after 50 games | Improvement (points/game) |
|---|---|---|---|---|
| 60% | 1672 | 4 | 1617 | 0.3 |
| 60% | 1672 | 8 | 1631 | 0.45 |
| 60% | 1672 | 16 | 1648 | 0.52 |
| 60% | 1672 | 32 | 1663 | 0.33 |
| 80% | 1841 | 4 | 1652 | 0.91 |
| 80% | 1841 | 8 | 1692 | 1.38 |
| 80% | 1841 | 16 | 1749 | 1.66 |
| 80% | 1841 | 32 | 1800 | 1.34 |

(The movement per game drops at k=32 because their rating is so much closer to natural)

IMO, k=16 would be about right. There'd be almost a hundred-point difference between Baruk and Chaser if we had that, instead of them looking equally strong, but a few games won't swing anyone too much. (If the variable-k suggestion I made above was used, then this would be a suggested average, of course).
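For anyone who wants to rerun this kind of estimate, here is a rough Python sketch under the same simplifying assumptions (every game against a 1600-rated opponent, each game scoring the player's mean result). It uses the standard 400-point Elo expectation, which is an assumption, and it may not match the spreadsheet figures to the point; `simulate` and its parameters are hypothetical names.

```python
def expected_score(rating, opponent=1600.0, scale=400.0):
    """Standard Elo expected score (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))

def simulate(win_rate, k, games=50, start=1600.0, opponent=1600.0):
    """Apply the mean-score update for a number of games.

    Returns the rating after `games` games and the average movement
    the next game would produce at that rating.
    """
    rating = start
    for _ in range(games):
        rating += k * (win_rate - expected_score(rating, opponent))
    next_step = k * (win_rate - expected_score(rating, opponent))
    return rating, next_step

if __name__ == "__main__":
    for win_rate in (0.6, 0.8):
        for k in (4, 8, 16, 32):
            rating, step = simulate(win_rate, k)
            print(f"{win_rate:.0%} player, k={k}: "
                  f"Elo {rating:.0f}, moving {step:.2f} points/game")
```

A higher k closes the gap to the natural rating faster, at the cost of larger swings from any single game.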

### Re: Replace ELO with Win/Loss ration

So why wouldn't we want something closer to natural? Is that because it's more subject to Elo inflation?

Is this really valuing beating good opponents enough? One of the things I like about using strength of schedule/opponents is we could see that value and determine if the success of a player was due in large part because they played only weak opponents. Fine for a weak player to do that, it's like skill. Crappy for an ace to cherry pick. A system that mitigated seal clubbing would be optimal.

I'm not sure Elo does that. Sure, it moves more or less according to opponent Elo, but it doesn't reflect opponent quality.

If we did move to 16 or 32 as a K value, we would have to change the Elo mod. Ace players would end up mitigated right out of relevance within a couple dozen games unless they played another ace.

### Re: Replace ELO with Win/Loss ration

Inflation is mostly a function of new players who are worse than the established vets being a natural Elo source - if the average newb is really a 1550, but they come in at 1600, every newb donates 50 points to the vet pool. Later in cycle = more newbs have joined = more Elo has flowed upwards. Like I said, it's a problem when comparing ratings over time, but at any given time Elo is perfectly good.

Remember that there's no pair of players where one has a 100% chance against the other. AC/20s to the head and triple engine TACs happen to the best of us. So yeah, a strong vet can play the seal-clubber, sure, but the occasional RNG-blessed newb will take a ton of points off them when they do win, and the losses will mean next to nothing as the Elo spreads widen. When you're getting +1 for a win and -15 for a loss, that means it doesn't take many unlucky games to keep your rating in check.
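To put rough numbers on the +1/-15 example, the Python sketch below computes the win rate a seal-clubber would need just to hold their rating steady. It assumes k=16 and the standard 400-point Elo expectation, neither of which is confirmed as the server's setting; the function names are illustrative.

```python
import math

def expected_score(rating, opponent, scale=400.0):
    """Standard Elo expected score (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))

def win_loss_deltas(rating, opponent, k):
    """Points gained for a win and points lost for a loss (both positive)."""
    e = expected_score(rating, opponent)
    return k * (1.0 - e), k * e

def break_even_win_rate(gain, loss):
    """Win rate at which average gains exactly cancel average losses."""
    return loss / (gain + loss)

if __name__ == "__main__":
    K = 16.0                          # assumed k-value for this example
    gap = 400.0 * math.log10(15.0)    # rating gap that gives 15:1 odds
    gain, loss = win_loss_deltas(1600.0 + gap, 1600.0, K)
    print(round(gap), round(gain, 2), round(loss, 2))
    print(break_even_win_rate(gain, loss))
```

Under these assumptions the stronger player sits roughly 470 points above the opponent and has to win about 94% of such games merely to avoid losing rating.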

FWIW, I'm not averse to tracking stats in other ways. If nothing else, it'll be interesting. I do think Elo is the best single number, but there's no reason we need to limit ourselves to single numbers. And you're totally right about changing Elo mods if we do change the k-value, because they're currently tuned for an artificially narrow Elo spread due to the extremely low k-value.

### Re: Replace ELO with Win/Loss ration

> Alsadius wrote: The reason why Chaser and Baruk have similar Elos despite very different player skill levels is that the k-value is too low. At k=4, which is what we have, an 80% player's rating will move by less than a point per game on average after only a few dozen games, despite being 200+ points below their proper level. It'd take 350+ games to get within 50 points of where they should be. That's way too long.

Our K=8, not 4.

Never had much, grew up with nothing

But the music, well it was something

Been down and out, I've been on top of the world,

World that keeps on spinning on a turntable.

### Re: Replace ELO with Win/Loss ration

Is ELO creep just accountable to "donated" ELO when a new player isn't properly ranked?

### Re: Replace ELO with Win/Loss ration

> Spork wrote: Our K=8, not 4.

Ah, sorry. Good to know. I thought it was 4 from some previous threads I dug up on the topic (specifically, #9 in this thread - perhaps I mistook what was meant there).

> Nastyogre wrote: Is ELO creep just accountable to "donated" ELO when a new player isn't properly ranked?

That was my understanding, yes. A plain vanilla Elo system will never add or remove points from the system as a whole due to games played, it'll only redistribute them. The only source of new points is new players, and the way for existing players to grow their rankings over time is to take points from those new players. The average of the player pool (if you include the players who have left) will always be 1600, it's just how those points are divided that changes.
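The redistribution point can be illustrated with a small Python sketch. It assumes a plain Elo update with k=8 and the standard 400-point expectation; the player names, starting ratings, and the `demo` helper are made up for the example.

```python
def expected_score(rating, opponent, scale=400.0):
    """Standard Elo expected score (assumed formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / scale))

def play(ratings, winner, loser, k=8.0):
    """Plain Elo update: whatever the winner gains, the loser gives up."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta

def demo(games=20):
    # Hypothetical pool: an established vet and a new player entering at 1600.
    ratings = {"vet": 1700.0, "newb": 1600.0}
    for _ in range(games):
        play(ratings, "vet", "newb")
    return ratings

if __name__ == "__main__":
    result = demo()
    print(result)
    print(sum(result.values()))  # pool total stays at 3300
```

The vet's rating rises only by the amount the new player's rating falls, so any upward drift among veterans has to come from points that newcomers brought into the pool.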

### Re: Replace ELO with Win/Loss ration

> Alsadius wrote: Ah, sorry. Good to know. I thought it was 4 from some previous threads I dug up on the topic (specifically, #9 in this thread - perhaps I mistook what was meant there).

The 4 there is the ELO exponent, which controls the slanting of outcomes. A lower exponent there means a more "pure" payout of land and C-bills.