Estimating contribution value
From Miiu.org
Potentially superseding perspective on this matter, which would lead to a solution based on the anthropology and economics of tribe-like groups: Aggregation, not algorithms, key to building trust
Problems
- Multiple parties contribute content to the page for a single thing or place, and pages are the units for which ads are sold. The profit from selling ads must be divided among all the parties that contributed to the page in some way. See PageShare for a complete statement of this position.
- Contributions must be vetted for quality. Lower-quality contributions should earn a smaller share of profit; very low-quality or spammy ones should be excluded.
- Information from a variety of sources should be integrated to judge quality: reviews from superusers of the site, reviews from people who earn an editorial role through quality contributions, agreement with other content or reviews, the user's success rate on benchmark tasks, etc.
- Separate notions of importance and quality will probably be needed to drive division of revenue, and maybe other kinds of ratings.
Algorithm: Classifier-based approach
- Ratings are classes into which we seek to place content
- The editor role provides the supervision for the classification algorithm
- Non-editor ratings provide features for classification based on the standing and attributes of the person providing the rating
- Content itself also provides features for classification apart from any rating
- This permits generalizing from explicitly rated content to other content
- Topology of an arbitrary map from ratings to a belief manifold is not a concern
- Continuous ratings are synthesized by discretizing the continuous space, somewhat as is done for the belief manifold below, but the relationship need not be fundamentally arbitrary
- Various well-established classifiers can be used interchangeably or together: naive Bayes, SVM, …
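As a rough illustration of the classifier-based approach, here is a minimal multinomial naive Bayes sketch in plain Python. The class name `NaiveBayesRater`, the feature strings (`word:...`, `vote:trusted:up`, and so on), and the toy training data are all hypothetical; the only things taken from the list above are that editor ratings act as labels and that both content and non-editor ratings (tagged by the rater's standing) act as features.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesRater:
    """Multinomial naive Bayes over a bag of features, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # smoothing pseudo-count
        self.class_counts = Counter()               # label -> number of examples
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()

    def fit(self, examples):
        # examples: iterable of (features, label); labels are editor ratings.
        for features, label in examples:
            self.class_counts[label] += 1
            for f in features:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior of the class
            denom = (sum(self.feature_counts[label].values())
                     + self.alpha * len(self.vocab))
            for f in features:
                if f not in self.vocab:
                    continue  # ignore features never seen in training
                count = self.feature_counts[label][f]
                score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: content words plus votes tagged by rater standing.
training = [
    (["word:hours", "word:parking", "vote:trusted:up"], "good"),
    (["word:menu", "word:hours", "vote:trusted:up"], "good"),
    (["word:cheap", "word:pills", "vote:trusted:down"], "spam"),
    (["word:pills", "word:click", "vote:new:up"], "spam"),
]
model = NaiveBayesRater().fit(training)
unrated_label = model.predict(["word:parking", "vote:trusted:up"])
```

Because the content words are features in their own right, the model can label content that no editor has rated, which is the generalization step the list describes. Any other standard classifier could be swapped in behind the same `fit`/`predict` shape.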
Algorithm: Bayesian estimation of correct ratings / reviews of various types
- A rating type is a set of possible ratings
- Define for each type of rating
- a bijection between possible ratings and the unit interval, for convenience (the topology of the rating ↔ unit interval map determines how uncertainty spreads belief across ratings, so it needs attention: for example, 1-5 ratings should map to consecutive partitions of the interval; other target spaces are possible but would increase complexity unless a nice library is available)
- an appropriate prior distribution (uniform to be non-committal with respect to the rating ↔ interval map, biased if needed)
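The two definitions above can be sketched for an ordered, discrete rating type such as 1-5 stars. This is only one possible reading: each rating is assigned an equal-width, consecutive slice of the unit interval, and the prior is uniform over ratings. The function names and the equal-width choice are assumptions made for the example.

```python
def rating_to_interval(rating, levels):
    """Map the k-th of n ordered ratings to the k-th equal slice of [0, 1]."""
    k = levels.index(rating)
    n = len(levels)
    return (k / n, (k + 1) / n)


def interval_to_rating(x, levels):
    """Inverse direction: find the rating whose slice contains x in [0, 1]."""
    n = len(levels)
    k = min(int(x * n), n - 1)  # x == 1.0 falls into the last slice
    return levels[k]


def uniform_prior(levels):
    """Non-committal prior: every rating is equally likely."""
    return {r: 1.0 / len(levels) for r in levels}


stars = [1, 2, 3, 4, 5]
low, high = rating_to_interval(3, stars)
prior = uniform_prior(stars)
```

Keeping neighbouring ratings in neighbouring slices means that uncertainty around, say, 4 stars leaks mostly into 3 and 5 rather than into 1, which is the topological concern raised above.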
- A user can rate content and has a reliability score
- For convenience, the reliability score is also a random variable on the unit interval, like the target space of ratings, but it is interpreted directly
- Reliability score could possibly use a simpler agreement-based update rule or other more heuristic approaches
- A higher reliability score decreases the entropy of the evidence that the user's ratings provide
- Administrative users have reliability set so high that the evidence they provide is effectively decisive
- Users can also be members of groups where group aggregate reliability establishes a biased prior for user reliabilities in hierarchical Bayesian fashion
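One simple way to picture a reliability score with a group-based prior is a Beta distribution on the unit interval, updated by counting agreements and disagreements. This is a simplified stand-in for full hierarchical inference, not a method taken from the notes above: the group's aggregate reliability is treated as a fixed point estimate, and the names `Reliability`, `prior_from_group`, and the `strength` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reliability:
    """Beta(alpha, beta) belief about a user's reliability (an assumption)."""
    alpha: float  # pseudo-count of ratings that agreed with consensus
    beta: float   # pseudo-count of ratings that disagreed

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def observe(self, agreed: bool) -> "Reliability":
        # Agreement-based update: add one count to the matching side.
        if agreed:
            return Reliability(self.alpha + 1, self.beta)
        return Reliability(self.alpha, self.beta + 1)


def prior_from_group(group_mean: float, strength: float) -> Reliability:
    """Biased prior for a new member, centred on the group's reliability.

    `strength` is how many pseudo-observations the group prior is worth.
    """
    return Reliability(group_mean * strength, (1 - group_mean) * strength)


newcomer = prior_from_group(group_mean=0.8, strength=10)
after_two_misses = newcomer.observe(False).observe(False)

# An administrative user: reliability pinned so high it is near-decisive.
admin = Reliability(alpha=1e6, beta=1.0)
```

A new member of a well-regarded group starts near the group's score and drifts away from it only as their own record accumulates; a larger `strength` makes that drift slower.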
- Updates
- When a user rates content, the belief about the content's rating is updated using Bayes' rule, and the user's reliability score is updated
- Perhaps a user's reliability score itself is susceptible to ratings from others with certain roles, to permit meta-ratings to fit directly into this system
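The content-rating half of the update can be sketched with a discrete belief over ratings. The likelihood model here is an assumption chosen for simplicity: a user with reliability r reports the true rating with probability r and otherwise picks one of the other ratings uniformly at random. It does not spread error to neighbouring ratings, and the reliability update itself is left out.

```python
import math


def likelihood(observed, true, reliability, levels):
    """P(user reports `observed` | true rating is `true`), assumed model."""
    if observed == true:
        return reliability
    return (1.0 - reliability) / (len(levels) - 1)


def update_belief(belief, observed, reliability, levels):
    """Apply Bayes' rule: posterior is proportional to prior times likelihood."""
    # Clamp so that even an administrative rating cannot zero out the total.
    r = min(max(reliability, 1e-9), 1.0 - 1e-9)
    unnorm = {t: belief[t] * likelihood(observed, t, r, levels) for t in levels}
    total = sum(unnorm.values())
    return {t: p / total for t, p in unnorm.items()}


def entropy(belief):
    """Shannon entropy of a belief, in bits."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


stars = [1, 2, 3, 4, 5]
prior = {s: 1.0 / len(stars) for s in stars}

from_reliable = update_belief(prior, observed=4, reliability=0.9, levels=stars)
from_unreliable = update_belief(prior, observed=4, reliability=0.3, levels=stars)
from_admin = update_belief(prior, observed=4, reliability=1.0, levels=stars)
```

Under this model a rating from a more reliable user leaves a lower-entropy posterior, a rating from a user whose reliability equals chance (1/5 here) leaves the belief unchanged, and an administrative rating is effectively decisive.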
- Revenue
- When a sale is made on a page, the set of contributors to the page is paid
- Payment is partitioned according to the ratings of the content on the page and the owner of each piece of content (perhaps roughly importance × quality)
- Perhaps payment is also partitioned according to group membership of owners of content
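A minimal sketch of the revenue split, assuming each contribution carries an importance and a quality score in the unit interval and that a contribution's weight is their product. The dictionary keys, the `min_quality` cut-off value, and the sample contributors are hypothetical; the exclusion of very low-quality content follows the vetting requirement stated under Problems. Group-based partitioning is not modelled.

```python
def split_revenue(amount, contributions, min_quality=0.2):
    """Divide `amount` among content owners by importance * quality.

    Contributions below `min_quality` are excluded (assumed threshold).
    Returns a dict mapping owner -> payment; empty if nothing qualifies.
    """
    weights = {}
    for c in contributions:
        if c["quality"] < min_quality:
            continue  # spammy or very low-quality content earns nothing
        w = c["importance"] * c["quality"]
        weights[c["owner"]] = weights.get(c["owner"], 0.0) + w
    total = sum(weights.values())
    if total == 0:
        return {}
    return {owner: amount * w / total for owner, w in weights.items()}


page_contributions = [
    {"owner": "alice", "importance": 0.5, "quality": 0.8},
    {"owner": "bob", "importance": 0.5, "quality": 0.4},
    {"owner": "spammer", "importance": 0.9, "quality": 0.05},
]
payout = split_revenue(100.0, page_contributions)
```

With these made-up numbers, alice's weight is twice bob's, so she receives two thirds of the sale and he receives one third, while the excluded contribution receives nothing.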