Proportionality-regret tradeoff in multiwinner elections

A recent entry on the Election-methods list about measuring multiwinner performance raised the question of how to exactly do so. The post suggested a weighting of happiness for coalitions of different size, but then commented that it was "sounding a bit like an election method definition". This page will try to get closer to an answer in a manner that's hopefully not very method-specific.

When electing a council of many winners, the outcome may have two qualities:

It can be proportional, which means that the opinions or composition of the assembly is similar to that of the people, to the extent possible, if voters prefer opinions like their own or candidates like themselves.
It can also be preferred, which means that the people, as a whole, likes the outcome.

It is not obvious how these can be defined in terms of each other. Rather, they oppose each other: a more proportional system will leave less freedom to pick candidates that the people as a whole likes, whereas insisting on a majoritarian standard for candidates may lead to the election of a group of centrists, an outcome that while perhaps fair, leaves little opposition. For that matter, it is not clear how much of the first to compromise to achieve an increment of the second.

Thus we can't define the qualities in terms of each other, nor find a common quality to combine the two. What option do we have? If the only qualities of interest are the two above, then we can say that: a method that does better on both is preferrable to one that does worse. In such doing, it is possible to construct a Pareto front of methods, as well as of the social optimum, since a simulation would know what this optimum would be, given the right means of measuring proportionality and majoritarian preference. That is what I've tried to do here.

To make this even possible, there has to be some way of measuring majoritarian preference, as well as proportionality. For the former, I have used normalized Bayesian Regret. There may be other means of measuring majoritarian preference - for instance, distance from the outcome of a good single-winner method applied in a majoritarian fashion - but to make it simple, I have used this. "Normalized" merely means that the regret is scaled so that the absolute worst outcome is set to one and the absolute best is set to zero. This is done for all rounds as a whole: if one round has candidates that are more similar to each other, picking a worse council affects the total score less than if the candidates are less similar.
For the latter - proportionality - I have used a similarly normalized version of the Sainte-Laguë index. This is a point of potential weakness: the Sainte-Laguë index has an enormous range, up to the millions, and one has to set a cutoff for groups with zero support, lest it blow up to infinity. Thus, "weak councils" may be too weak. The ideal would be to have something that behaves alike the Sainte-Laguë index, yet does not have those "features"; but for now, I am using the SLI. It seems to provide roughly expected results - I may try other indices later.

There also has to be an issue/utility generator that serves to construct a model of ideal proportionality, so we can check the methods' outcomes against it. For this purpose, I use a simple binary issue generator. The number of issues is set ahead of the round - call it k - and then the issue generator produces k probability values, each of which gives the proportion of voters that agrees with the issue in question. Then it generates the appropriate binary k-vectors for all the voters, randomly deciding whether each issue is yes or no according to probability given. Finally, it updates the probability values, in case random chance or rounding effects turned them inaccurate.
The ballot generation is then easy: every voter votes for every candidate, rating them according to the Hamming proximity - the number of issues minus the Hamming distance. Thus, if a voter agrees with a candidate on more issues than with another, the former is rated higher than the latter. These ratings also constitute the utility as by Bayesian Regret calculations.

The only thing remaining is to set the parameters. For this first attempt, I have set 256 voters, randomly between 1 and 3 (inclusive) binary issues (determined each round), 4 candidates, 2 seats (winners), and 30000 rounds. The results are, very briefly put:
Proportionality-regret tradeoff plot
Detailed data: election methods scores(Pareto front), D'Hondt without lists scores, QPQ scores, and the Social optimum Pareto front.

The election methods tested were:
QPQ - Woodall's Quota-preferential by Quotient, adjusted so as to permit the quotient type (D'Hondt, Sainte-Laguë, etc) as a parameter.
MW-Kemeny - My (extremely small-party-favoring) multiwinner method based on Kemeny, shown here.
SL-Kemeny - A hack upon MW-Kemeny where the winners are apportioned according to Sainte-Laguë, treating each cluster's ordering as a party list provided by that cluster. More information here.
FC-Kemeny - "Forced clustering Kemeny", an attempt to make MW-Kemeny less unfair, detailed here.
CFC-Kemeny - "Continuous forced clustering Kemeny", the continuous version of FC-Kemeny. Since there's no reason why the clustering assignment should be limited to integers, this should be more fair still.
M-Set Webster - "Monotone Set Webster", my (properly) unnamed monotone multiwinner method described in this post.
Set Webster - Like M-Set Webster, but with a DAC/DSC based tiebreaker instead of the margins phase. Not monotone.
R.Web/* - A class of "methods" that work like M-Set Webster, then breaks ties according to the secondary method.
Setwise Highest - Setwise Highest Average, a house-monotone combination of DAC/DSC and highest-average party list PR.
Quota Bucklin - The elect-and-punish scheme combined with Bucklin, this method keeps adding later preferences until someone passes the quota, then reweights and restarts with the candidate eliminated ("restart") or just ignores the candidate ("static").
Set PR Bucklin - An attempt at making a summable proportional method based on a combination of Bucklin and set constraints. It's not Droop-proportional, because shadowing can cause false positives.
PSC-CLE - Proportionality of Solid Coalition by Condorcet Loser Elimination. The rounding up has been changed to rounding off, which helps it somewhat.
Maj[*] - A meta-method that runs the single-winner method in question and fills the council with winners from the social ordering, starting at the top, until full. Named for its (usual) majoritarian quality.
DHwL(x, y) - A meta-method that applies the D'Hondt without lists method to base (Condorcet) method x, attenuating preferences with elected candidates by the factor y.
STV - The single transferable vote.
STV-ME - The multiwinner version of BTR-IRV, STV-ME holds a runoff between the numseats+1 lowest ranked when deciding whom to eliminate. See this post.
Warren STV, Meek STV - Variants upon STV.
Schulze STV - Schulze's vote-management resistant version of STV.
Range-Auction - Warren D. Smith's Range Auction method. Ordinary is as by the page, Cumul renormalizes all ballots to have the same sum power instead of same max power.

These single-winner methods were also used as part of some of the methods above:
Eliminate-* - A meta-method that works by repeatedly eliminating the loser according to the original method, then electing whoever is left. Eliminate-Plurality is IRV, for instance.
Plurality - FPTP.
Cardinal-20 - Cardinal ratings / Range, with a range from -10 to 10.
Minmax(wv) - The minmax Condorcet method, WV.
Baseball MVP, Antiplurality, Nauru Borda, Borda, Eurovision, Heisman Trophy, NREM-Opt - different weighted positional methods. NREM-Opt is Warren's "optimal ranked ballot method" (within the constraints of the normal random elections model).
CDTT, Landau, Schwartz, Smith - selects randomly among the candidates in the set in question.
Copeland/x/y - Copeland's method: each win counts as x points, each tie as y points. The same as first order Copeland. Greatest score wins.
2. order Copeland/x/y - first do Copeland's, then each candidate gets a number of points equal to x times the number of points awarded to each candidate he beats, and y times the number of points awarded to each candidate he ties. As with Copeland, greatest score wins. If y is 1, x < 2 is nonmonotonic.
L-R defense - A Condorcet method where each candidate is scored equal according to the sum of victories by those who beat it. A lesser score is better. WV or margins, probably bad as PO.
L-R offense - A Condorcet method where each candidate is scored equal according to the sum of victories made against other candidates. A greater score is better. WV or margins, probably bad as PO.
Minmax - the Minmax method, where each candidate's opposition score is equal to the maximum victory of the opponents that defeated this candidate. Least opposition score wins.
Maxmin - a pairwise (but not Condorcet) method that scores candidates according to the weakest victory - strongest wins. DHwL does badly with this one.
Schulze - The Schulze method.
Reward worst, Worst Borda - deliberately bad weighted positional systems that give one point to the lowest ranked candidate or give the output as if Borda was applied to the election with all ballots reversed, respectively. Their only point is to produce deliberately bad outcomes, but when linked to R.Web, they also produce slightly more proportional results for some peculiar reason.