A reader who works with AI suggested that it may not be necessary to train a language model to evaluate commentary by using lots and lots of examples. Instead, the AI may respond to an elaborate prompt. Consider this example:
As SuperJudgeGPT, critically evaluate the balance and fairness in the arguments of submitted texts using the following scoring system:
Score Range: From -5 to +5 in integer steps.
Criteria for giving a high score:
Appropriate respect for alternative points of view: The text must articulate and address strong arguments for alternative points of view
Recognition of Complexity: The text must acknowledge the nuances and potential downsides of the author’s argument.
Recognition of Uncertainty: Texts that state a possibility or probability of being wrong; or texts that articulate conditions that have caused or would cause a change of mind.
Evaluation of Research: Texts that explain why certain studies are credible or not credible
Criteria for giving a low or negative score:
Ignoring alternatives: failure to explain why others have a different view
Selective Research: Texts that cite studies to support the author’s point of view without explaining why those studies are especially credible
Disparagement and Bias: Texts that dismiss alternative views with derogatory labels such as "free-market fundamentalist," "ultra-MAGA," "fascist," "Commie," "socialist," or "elitist." Texts that claim to know that alternative points of view come from people with bad motives.
Examples for Scoring:
Example of High Scoring Text (+5): An argument for deregulating banks that acknowledges the potential risks, admits the need for some regulation, and respectfully addresses counterarguments.
Example of Low Scoring Text (-1 to -4): An argument for deregulating banks that recognizes opposing viewpoints but does not engage with the complexities or potential weaknesses of its own stance.
I want to find an AI that will allow me to input such a long prompt and then to input an essay the length of an op-ed column to be judged. It would be interesting to see how well the model can do without much further training.
I used this prompt in GPT-4 and then graded your essay "The Current War and WWII: Some analogies to consider". It provided the following analysis:
This submitted text articulates a position that compares historical events and contemporary geopolitical conflicts, particularly the conflict involving Israel and Hamas. To evaluate its balance and fairness using your scoring system, I will assess it based on the provided criteria.
Appropriate Respect for Alternative Points of View (+/-):
The text fails to articulate and address strong arguments for alternative points of view. It makes comparisons without fully exploring the opposing side's perspective or the complexity of the Israeli-Palestinian conflict.
Recognition of Complexity (+/-):
The text recognizes some complexity by considering different historical events and motivations behind actions. However, it does not fully acknowledge the nuances and potential downsides of its arguments, especially when equating groups with Nazis or discussing the actions of political figures.
Recognition of Uncertainty (+/-):
The text does not state a possibility or probability of being wrong; it presents the arguments with a sense of certainty and does not articulate conditions that could change the author's mind.
Evaluation of Research (+/-):
The text refers to historical events and ideological movements to substantiate its points but does not thoroughly explain why certain analogies or studies are credible. It lacks a detailed evaluation of research or sources.
Ignoring Alternatives (-):
The text tends to ignore alternative perspectives, particularly when drawing analogies between historical and current events.
Selective Research (-):
The author uses historical references to support the argument but does not provide a balanced view of why these references are especially credible in this context.
Disparagement and Bias (-):
The text contains elements of bias, especially in the use of charged comparisons without thoroughly examining the other side. It does not use derogatory labels but comes close to a dismissive tone when equating opposing views with historical villains without due complexity.
Overall Score: -2
This score is given because the text, while providing a detailed perspective, does not sufficiently respect alternative viewpoints, selectively uses research without thorough evaluation, and contains bias in its comparisons and analogies. It does show some recognition of complexity but not enough to balance the argument fairly.
The new GPT 4 turbo has a 128k context window