Motivation
Some feedback has come through that people sometimes do chores poorly. This isn't surprising, but creates an interesting challenge for a pass/fail voting system: evaluators don't want to feel like "the bad guy," and so give passing grades which they do not feel are deserved, or they abstain entirely. The result is a feeling that people are shirking responsibilities without consequence, and ultimately a loss of faith in the system.
There is related feedback that people are also under-incentivized to do a great job on chores, as there is no benefit to doing so.
One possible response would be to introduce a more fine-grained scale for feedback. Rather than a binary pass/fail, we could introduce a more nuanced scale, either semantic (emoji) or numeric (scale from 1-10). More fine-grained scales introduce their own risks of bias and error, and so this proposal will be for a relatively modest increase in complexity: a semantic 4-grade scale.
The choice of 4 grades instead of 3 reflects the fact that the phenomenon of interest in this case is not solely over-performance, but also underperformance. A 3-grade scale of fail/pass/excellent would fail to capture this nuance, while a 4-grade scale of fail/mediocre/pass/excellent does.
The goal of this change is twofold: first, to give evaluators a meaningful way to express mild dissatisfaction with a chore, something in between unearned approval and unnecessary rejection, and second, to encourage residents to perform chores at a level which they believe will earn full points, knowing that evaluators will be more comfortable penalizing a mediocre performance.
Resident Experience
For residents, the difference will be a move away from the current 2-grade scale of:
👍🏽 👎🏽
to a 4-grade scale of
😦 🫤 😄 🥰
Whereas a result of 👍🏽 earns 100% of the chore's points and 👎🏽 earns 0%, the 4-grade scale allows for more nuance:
😦 : 0% of the points
🫤 : 50% of the points
😄 : 100% of the points
🥰 : 100% of the points + karma
When calculating the final result, the app will first add up the votes assigned to 😦 and 🫤 , and those assigned to 😄 and 🥰 . The highest-voted grade of the highest-voted pair becomes the final grade, with ties breaking towards the lesser option. This two-step process eliminates the risk of vote-splitting among either the higher or lower pairs of grades.
We can show this with a table. In the first example, a simple result of the top-voted grade would have resulted in 🫤 , instead of the more accurate 😄 .
| 😦 | 🫤 | 😄 | 🥰 | result |
|---|---|---|---|---|
| 0 | 3 | 2 | 2 | 😄 |
| 1 | 3 | 3 | 0 | 🫤 |
| 1 | 1 | 1 | 0 | 😦 |
| 0 | 2 | 1 | 2 | 🥰 |
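The two-step tally can be sketched in TypeScript as follows. This is a minimal illustration rather than the app's actual code: the `Grade` and `Tally` types and the `evaluateSem4` name are hypothetical, and the sketch assumes that a tie between the two pairs also breaks towards the lower pair, which the description implies but does not state outright. It also ignores any quorum or minimum-vote rule the app may apply.

```typescript
// Hypothetical types for illustration; not taken from the codebase.
// Grades are ordered from worst to best: 0 = 😦, 1 = 🫤, 2 = 😄, 3 = 🥰.
type Grade = 0 | 1 | 2 | 3;

// Vote counts indexed by grade: [😦, 🫤, 😄, 🥰].
type Tally = [number, number, number, number];

function evaluateSem4(tally: Tally): Grade {
  const [fail, mediocre, pass, excellent] = tally;
  const low = fail + mediocre;
  const high = pass + excellent;

  // Step 1: choose the winning pair.
  // Assumption: a tie between the pairs goes to the lower pair.
  if (high > low) {
    // Step 2: highest-voted grade within the pair; ties go to the lesser grade.
    return excellent > pass ? 3 : 2;
  }
  return mediocre > fail ? 1 : 0;
}

// The four example rows from the table.
const examples: Tally[] = [
  [0, 3, 2, 2], // 😄
  [1, 3, 3, 0], // 🫤
  [1, 1, 1, 0], // 😦
  [0, 2, 1, 2], // 🥰
];
const emoji = ["😦", "🫤", "😄", "🥰"];
for (const tally of examples) {
  console.log(tally.join(" "), "->", emoji[evaluateSem4(tally)]);
}
```

Summing each pair first is what keeps the first row from resolving to 🫤: the three 🫤 votes are the single largest group, but the four votes in the upper pair outweigh the three in the lower pair.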
Considerations
What behaviors will this change encourage? The goal is to introduce sufficient nuance to discourage mediocre chore performance. Will this do that, or will it result in participants receiving half-points for what they honestly felt was a fair effort? Are we simply re-creating our current problems "with extra steps," or worse, introducing new ones?
There may be no single answer. Every voting system represents some tradeoff between expressiveness and legibility. The pass/fail vote is powerful because it is simple: it is perfectly clear what the options represent. But it lacks the expressiveness to capture the nuance and subtlety often present in real life. A 4-grade scale is more expressive, but requires more skill and experience to use.
Moving to a 4-grade scale will mean more in-person conversations where participants get on the same page as to what different options mean, and when to assign which grade. There will still be cases where people abstain from voting, or feel as though others continue to shirk responsibilities. But on balance, and hypotheticals aside, the expansion from a 2-grade to a 4-grade scale is a natural next step in response to actual shortcomings in the current design.
Implementation Details
The PollVote.vote field will need to be updated to store integer values, instead of the current boolean values. This will allow for the development of polls of multiple semantic and numeric types. The poll type will be stored in the metadata on creation, and each type will correspond to a particular aggregation logic. We can convert PollVote using this query:
ALTER TABLE "PollVote" ALTER COLUMN "vote" TYPE INTEGER USING (CASE WHEN "vote" THEN 1 ELSE 0 END);
We would then need to add a set of functions defining a poll type. For instance, createPollSem4 and evaluatePollSem4 for a semantic 4-grade poll. We could imagine creating other types, such as Num10 for a 10-point scale, Sem3 for a semantic 3-grade scale, or Num100 for a percentage input. Each type would have its own aggregation logic, which would be defined on a per-poll basis. One could also imagine a Num100Median for a poll which returns the median value, or a Num100Avg which returns the average value. Opportunities abound, but are beyond the scope of this issue.
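One way to picture the per-type aggregation is a lookup table keyed by the poll type stored in the metadata. The sketch below is an assumption about how that might look, not the project's design: `PollType`, `Aggregator`, `aggregators`, and `evaluatePoll` are hypothetical names, only the two numeric types mentioned above are filled in, and the even-count median rule (mean of the two middle votes) is an arbitrary choice.

```typescript
// Hypothetical poll types; a Sem4 entry would be registered the same way.
type PollType = "Num100Median" | "Num100Avg";

// Each aggregator reduces the integer votes to one result,
// or null when no votes were cast.
type Aggregator = (votes: number[]) => number | null;

const aggregators: Record<PollType, Aggregator> = {
  Num100Median: (votes) => {
    if (votes.length === 0) return null;
    const sorted = [...votes].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    // Assumption: with an even number of votes, use the mean of the two middle values.
    return sorted.length % 2 === 1
      ? sorted[mid]
      : (sorted[mid - 1] + sorted[mid]) / 2;
  },
  Num100Avg: (votes) => {
    if (votes.length === 0) return null;
    return votes.reduce((sum, v) => sum + v, 0) / votes.length;
  },
};

// The poll type would be read from the poll's metadata at evaluation time.
function evaluatePoll(type: PollType, votes: number[]): number | null {
  return aggregators[type](votes);
}
```

Because every type reads the same integer vote column, adding a type would only mean adding an entry to the table.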
In addition, we will need some way of representing "half points" for a chore. Currently, chore claims are either valid or invalid; there is no notion of partial validity. This is a fairly core design element: chore values are determined by summing incremental point updates from the point of the last claim. Some thought should be put into how to achieve this in a non-hacky way. The first thought would be to count a half-chore as valid, and add a "pseudo-ChoreValue" representing half of the points being returned to the pool. Behaviorally, this would not be meaningfully different from the current experience, in which a failed chore claim results in an immediately larger points value; the key difference is that instead of a larger window of updates, we add an update representing points returning to the pool.
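A rough sketch of the pseudo-ChoreValue idea follows, under several assumptions: the `ChoreValue` and `ClaimResult` shapes, the `currentValue` and `resolveClaim` helpers, and the use of plain numeric timestamps are all hypothetical and only illustrate the bookkeeping. A 🫤 claim stays valid at half its value, and the other half re-enters the pool as one extra update dated at resolution time.

```typescript
// Hypothetical shapes; the real tables may differ.
interface ChoreValue { choreId: number; valuedAt: number; value: number; }
interface ClaimResult {
  valid: boolean;
  awarded: number;
  karma: boolean;
  refund: ChoreValue | null;
}

// Fraction of the claimed points earned per grade:
// 😦 0%, 🫤 50%, 😄 100%, 🥰 100% (plus karma).
const FRACTION = [0, 0.5, 1, 1];

// A chore's current value: the sum of updates since the last valid claim.
function currentValue(updates: ChoreValue[], lastClaimAt: number): number {
  return updates
    .filter((u) => u.valuedAt > lastClaimAt)
    .reduce((sum, u) => sum + u.value, 0);
}

function resolveClaim(
  choreId: number,
  claimed: number,
  grade: 0 | 1 | 2 | 3,
  resolvedAt: number,
): ClaimResult {
  // A failed claim stays invalid, as today, so its window of updates remains open.
  if (grade === 0) {
    return { valid: false, awarded: 0, karma: false, refund: null };
  }
  const awarded = claimed * FRACTION[grade];
  const unearned = claimed - awarded;
  // For a half-credit claim, return the unearned points to the pool
  // as a pseudo-ChoreValue dated at resolution time.
  const refund =
    unearned > 0 ? { choreId, valuedAt: resolvedAt, value: unearned } : null;
  return { valid: true, awarded, karma: grade === 3, refund };
}
```

For example, a chore claimed at time 3 for 30 points and graded 🫤 at time 5 would award 15 points and produce a 15-point refund update. Since that update falls after the claim, `currentValue` includes it in the chore's next valuation alongside any regular updates, provided the resolution timestamp is later than the claim timestamp.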