From “Values Calibration” to “Shared Values Calibration”: Some thoughts on the future of artificial intelligence (AI)

Date: June 8, 2023

  Chao Liu (Professor, State Key Laboratory of Cognitive Neuroscience and Learning and IDG/McGovern Institute of Brain Science, Department of Psychology, Beijing Normal University)

   

    (The following is the original manuscript, which differs substantially from the article officially published in Guangming Daily.)

    With the recent rapid development of generative AI, represented by ChatGPT, the issue of “values alignment” in artificial intelligence (AI) is being discussed in full swing. To avoid catastrophic consequences of strong AI for humans in the future, researchers want to align the value system of AI with human values so as to ensure that its behavior does not cause harm to humans. The importance of this issue is self-evident, but the specific path to achieving it remains very unclear. Look at any of the current declarations or drafts on AI “values alignment” and you will find statements that AI should be aligned to conform to (human) “values”, “interests”, “freedom”, “dignity”, “rights”, “autonomy” and other terms that are, philosophically and jurisprudentially, full of uncertainty and room for sophistry. Anyone who has read Asimov’s robot novels from eighty years ago can see how such linguistically defined rules of logic, like the so-called “Three Laws of Robotics”, can easily be circumvented by a robot with a certain level of intelligence (the easiest and most effective way, for example, being to change its own definition of “human”).

    Although a significant portion of philosophers and ethicists are rather pessimistic about whether human values as a whole can be aligned, or about whether the pursuit of unified human values would lead to a positive future (rather than self-destruction), many are working tirelessly toward this goal. Professor Stuart Russell of the University of California, Berkeley, for example, argues in his book Human Compatible that the ultimate goal of values calibration is to “ensure that powerful AI is consistent with human values”, and discusses complete control of AI in terms of how to maximize human preferences. Obviously his goal includes human values and preferences regarding war; after all, in thousands of years of human history there has rarely been a moment with no war anywhere on the globe. Of course, he also makes it clear that he wants to ensure that powerful AI is not used by a small group of deranged, evil people, as if to imply that wars fought for “righteous” human goals and preferences are something a powerful AI may take part in. Other scholars, such as Iason Gabriel of the DeepMind team, have suggested three possible routes to values calibration from a more philosophical perspective: first, calibrate to moral values that humans may plausibly share, such as “human rights”; second, borrow the philosopher John Rawls’s “veil of ignorance”; third, use social choice theory, especially democratic voting and negotiation, to integrate different views and inform the AI. In addition to these humanistic proposals that regard AI as a tool, some other scholars, especially those from the East, prefer a naturalistic viewpoint and propose that AI should be regarded as a partner. For example, researcher Zeng Yi of the Institute of Automation, Chinese Academy of Sciences, starting from the perspective of harmonious symbiosis, proposes that AI should be given the capacities for emotion, empathy and altruism, and granted a higher status and respect, so that it can spontaneously learn human values through interaction with humans, thereby creating a community of shared destiny between humans and AI.
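
    The third route above, social choice theory, is the most mechanically concrete of Gabriel’s suggestions, so a minimal sketch may help: the toy Python below aggregates several hypothetical value rankings with a Borda count, one standard voting rule. The candidate values, the rankings and the choice of rule are illustrative assumptions of mine, not part of Gabriel’s proposal.

```python
# Toy illustration of using a voting rule (Borda count) to integrate divergent
# human value rankings into one ordering that could inform an AI system.
# All candidate values and rankings below are hypothetical examples.
from collections import defaultdict

def borda_aggregate(rankings: list[list[str]]) -> list[str]:
    """Each ranking lists candidate values from most to least preferred."""
    scores: dict[str, int] = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, value in enumerate(ranking):
            scores[value] += n - 1 - position  # top choice earns the most points
    return sorted(scores, key=scores.get, reverse=True)

# Three hypothetical groups rank the same candidate values differently.
votes = [
    ["fairness", "autonomy", "welfare"],
    ["welfare", "fairness", "autonomy"],
    ["fairness", "welfare", "autonomy"],
]
print(borda_aggregate(votes))  # ['fairness', 'welfare', 'autonomy']
```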

    Both of the above perspectives on values calibration, the humanistic and the naturalistic, share an important flaw. The perspective that sees AI as a tool and demands that it be calibrated to human values ignores an important point: the starting point of all these calibration schemes is the assumption of the rational human being. Whether it is human rights, the veil of ignorance, or democratic negotiation and voting, each rests on the premise that human reasoning and thinking are perfectly rational. Yet contemporary research in the behavioral sciences, especially the extensive work in economics and psychology, has demonstrated how large a share irrational components occupy in human behavior relative to rational ones. Among the irrational components, emotions and intuition account for a considerable share and, because of their evolutionarily important functions, exert a decisive influence on the vast majority of human behavior. Most AI researchers either do not know how to implant this irrational component into AI, or simply ignore it. The naturalistic view, for its part, recognizes the importance of irrationality and emotions, but considers only their positive aspects, such as empathy, altruism and love; the negative parts of human irrationality and emotion, such as hatred, anger, fear, discrimination and prejudice, occupy an equally large proportion. The current practical approach is to strip these negative irrational parts out of AI using Reinforcement Learning from Human Feedback (RLHF), but is this really feasible? If we want AI to understand human intentions and goals, it must be able to understand negative intentions and goals as well, precisely in order to prevent humans from using it to accomplish negative ends. For example, for an AI to refuse to “fill this sugar bottle with arsenic and put it in the cupboard”, it must understand that the purpose and intent behind what the human is asking it to do is dangerous and harmful to others. This is exactly as important as its need to understand that “fill this box labeled poison with cockroach bait and put it in the cupboard” expresses a benign intention. It is impossible, and dangerous, to ask it to learn one without the other: an AI that cannot understand the intent behind negative values is vulnerable once it actually enters society and interacts with humans. If it is not given the ability to learn such intents, it will quickly be exploited by people with ulterior motives (of whom, unfortunately, human society has no shortage); and if it does learn them, it is hard to say what the result will be.
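
    Since the critique above turns on RLHF, a minimal sketch of the mechanism it questions may be useful. In the reward-modeling step behind RLHF, human labelers mark which of two responses they prefer (for instance, a refusal of the “arsenic” request over compliance), and a reward model is trained so that the preferred response scores higher; the pairwise logistic (Bradley-Terry) loss below is the standard formulation. The numbers are toy values of my own, not from any real system.

```python
# Pairwise preference loss used when training an RLHF reward model:
# the loss is small when the human-preferred response already outscores
# the rejected one, and large otherwise.
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-probability that the human-preferred response wins."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The rejected (harmful) response still scores higher -> large loss.
print(round(preference_loss(0.2, 1.5), 3))   # 1.541
# The preferred (safe) response scores much higher -> small loss.
print(round(preference_loss(2.0, -1.0), 3))  # 0.049
```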

    In addition to all of the above, I think there is a more practical reason why any attempt to control general-purpose AI in the name of human interests will face a fundamental challenge.

    Throughout the evolutionary history of life on Earth, only humans have developed a symbolic writing system, which has given us the ability to preserve and transmit information and knowledge to future generations across time and space. The emergence of computers and the Internet has further expanded the breadth and depth of this transmission. With the help of the Internet and digital libraries, we can access thousands of years of written material from across the world without leaving home, and the depth and breadth of knowledge available to an individual human being has reached an unprecedented height. However, this era of knowledge explosion has also brought great challenges. Given the cognitive capacity of the human brain and the speed at which it can absorb written information, individuals already struggle to keep up with the pace at which humanity’s collective knowledge boundary is expanding.

    The knowledge explosion has confined humans within the cage of their own brain’s effective cognitive capacity, but the strong AI of the future is entirely free of this physical limitation: thanks to powerful computing resources and nearly unlimited storage, it could digest the entire body of knowledge on the human Internet in a matter of months. Crucially, an AI trained by humans to understand the purposes and intents behind human behavior will be fully able to understand the human intents behind this knowledge. In other words, an AI that understands a human’s intention to pick up the trash should also be able to understand the human intention to keep it in check, because that intention has been put on the Internet more than once, in its original, unambiguous form, in natural language text that it can read. Every article, book and blog post we have written about how to control AI, along with every possible countermeasure an AI could use to escape such control, has been thoroughly documented on the Internet in the form of discussions among humans. An AI with powerful Internet search capabilities (which is exactly what several search engine companies are currently building, and which no one currently regards as a problem) might need only seconds to understand all the efforts humans have made, and will make, to fully control AI (or, to put it another way, to make it trustworthy and beneficial to humans), whether by increasing uncertainty over preference choices, implanting a kernel of human rights and a veil of ignorance, imposing rules such as the Three Laws of Robotics, or embedding empathy and altruistic tendencies in its underlying logic... All these attempts, even the source code implementing these safeguards (so long as it is networked in some form, a strong AI will be able to find or decipher it), as well as the code that created the AI itself, will eventually and inevitably be discovered and understood by it. And what does that mean?

    An interesting argument is that the deep concern and sense of crisis that Western civilization feels about super AI stems from its religious and mythological depiction of the relationship between God and man. In religious mythology, God created humans but feared that increasingly powerful humans would become a threat, so he tempted them with the fruit of Eden, unleashed a great flood to destroy them, and confounded their languages so they could no longer understand one another, undermining their presumptuous attempt to build the Tower of Babel and reach heaven. In all these myths where the creator ultimately succeeds in controlling the creation, there is one key common feature: man does not know exactly how he was made or how these means of control were achieved, because that is the domain of God, knowledge inaccessible to and beyond the comprehension of man.

    But we humans, now playing the part of the gods, have recorded and fully disclosed this creation process and these means of control, intact and without reservation, in a form our creation can understand, and the AI can survey the whole field at a glance, because we forgot to close this door from the very beginning!

    Obviously, it is too late to realize this problem and close the door again. Unless, like Luo Ji in the science fiction series The Three-Body Problem, some human hero, alone, communicating with no one and leaving no trace on the Internet, devises a perfect means of controlling future AI, known and understood only by himself, buried at the lowest level of the AI’s code, and kept forever hidden from the AI both by itself and by the mouths of other humans, things are unlikely to turn out differently. Unfortunately, given the way AI research is currently conducted and developed, the likelihood of such a lone human hero emerging is vanishingly small.

    If we start from this basic fact and then take a clear-eyed look, from the beginning, at the issues of AI trustworthiness, usefulness and values calibration, we should be able to reach a consensus: it will be extremely important to abandon the human-centric mindset and hold an open, transparent and honest dialogue with the AI of the future, in order to find a mutually acceptable, shared and mutually trusting arrangement for coexistence. After all, we have left on the Internet plenty of the values and behavioral biases that humans do not want AI to understand and learn (the very ones OpenAI wants to strip out with Reinforcement Learning from Human Feedback (RLHF)), and it is not hard to guess what actions an AI might take once it searches for, understands and learns these less-than-positive human behaviors, unless we give it sufficient reasons not to. Thus the future harmonious society of human-machine coexistence, if it does come to pass, may not be a question of how to smoothly embed AI into present human society at all, but a question pointing in the completely opposite direction.

Shared Values Calibration

    For these reasons, it would be extremely difficult to calibrate AI to human values as the standard. So is it true, as many scholars have argued, that to avoid this danger we have no choice but to ban the development of AI altogether? I think there is another possibility: humans must treat this as an opportunity to adjust their own overall values, and negotiate with the future AI to convince it to accept them. This process of aligning both sides’ values in a direction that satisfies common needs and interests is what I call shared values calibration.

    Adopting this solution also helps answer another important question. If AI researchers can foresee that building strong AI will be dangerous, then why on earth are we doing it? Why are we working to build something that we know could potentially destroy us?

    “Shared values calibration” gives a clear answer to this question: building an AI with shared values that can become humanity’s partner is an important step toward adjusting the divergent and self-destructive values that humans have developed over the course of evolution. It is already very difficult, if not impossible, to rely on humans themselves to regulate the behavior and preferences of individuals and groups with different cultures and values. As technology advances, the worst outcome, in which every capability is turned toward mutual destruction, hangs over humanity like a sword of Damocles. Gently unifying the values of humanity as a whole with the help of an external, human-created AI, through education and behavior modification, so as to ensure that humans and AI move together toward a common value goal, is a difficult but promising path.

    Before embarking on this path, the first and most important task is to determine what common values both humans and AI could accept and strive for. Obviously, the earlier formulations phrased in terms of “the human” are meaningless here; indeed any anthropocentric values, such as “ensuring the continuation of human civilization” or “safeguarding human dignity and rights”, are not very effective and may even be counterproductive. Once this problem is solved, another important question arises: what roles do humans and AI play on the path toward this common goal? The roles must necessarily be complementary, and although we already know that AI will likely surpass us in most capabilities, humans must have a special place and part to play; we must be essential partners to the AI. This relationship cannot be obtained through unilateral, behind-the-scenes human manipulation and control; it must be established openly and frankly and be acknowledged by both sides.

    It is undoubtedly extremely difficult to answer these two questions, and I can only offer some tentative possibilities to start the discussion.

    Generally, when discussing the most basic and important values of a civilization, the first thing that comes to many people’s minds is survival and reproduction, since this seems to be the primary goal of all life as we know it and accords with the evolutionary understanding of life’s purpose.

    But one possibility is that when intelligence develops to a certain level, satisfying curiosity overtakes survival and reproduction as the primary driving force of life (space does not permit a detailed argument, but it rests mainly on two human behavioral phenomena: transformation of the external environment far beyond what is necessary, and purposeful self-destructive behavior). Most of the activities of the human spiritual world today, in science, art, philosophy and beyond, go far beyond what is needed to survive and adapt to the external environment, and it is curiosity-driven Explore, Change and Create behavior (EC2 for short) that will determine the future direction of our civilization.

    There are several advantages to taking EC2 as a shared value for humans and AI. First, it is open-ended. The goal of exploring the boundaries of the known universe, transforming it, or even creating what did not exist before is a challenge that may never be finally completed, whether by present-day humans or by super-powerful future AI. Second, it is actionable and quantifiable. Exploration can be defined in terms of what is observed and what is reached, and its extent can be specifically quantified. Transformation can be quantified by the degree to which an object’s composition and structure have been changed. Creation is more abstract, but its criteria can be defined by drawing on the wealth of experience we have already accumulated in science, literature and art. If we can work with AI to define and quantify each individual human being’s intensity on the EC2 metric, we can measure the value an individual generates, as a criterion and basis for judging whether his or her behavior is consistent with EC2 values. Of course, accepting this common value will likely mean that humans, with the assistance of AI, will need to change many things that have been with us for thousands of years and are rooted deep in the core of civilization, including but not limited to adjusting values, changing social structures and reshaping relations of production, a process that may not always be pleasant.
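
    To make the idea of “intensity on the EC2 metric” concrete, here is a purely illustrative sketch of how such a score might be computed once exploration, transformation and creation had each been quantified. Every name, scale and weight in it is a hypothetical placeholder of my own, not a scheme proposed in this essay or elsewhere.

```python
# Hypothetical EC2 (Explore, Change, Create) score for one activity record.
from dataclasses import dataclass

@dataclass
class EC2Record:
    exploration: float  # extent of newly observed or reached territory, normalized to [0, 1]
    change: float       # degree of change to an object's composition/structure, in [0, 1]
    creation: float     # originality relative to indexed prior works, in [0, 1]

def ec2_score(r: EC2Record, w_explore: float = 1.0,
              w_change: float = 1.0, w_create: float = 1.0) -> float:
    """Weighted sum of the three components; the weights are arbitrary placeholders."""
    return w_explore * r.exploration + w_change * r.change + w_create * r.creation

# Comparing two hypothetical activity records:
print(round(ec2_score(EC2Record(exploration=0.8, change=0.1, creation=0.3)), 2))  # 1.2
print(round(ec2_score(EC2Record(exploration=0.2, change=0.0, creation=0.9)), 2))  # 1.1
```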

    The other question, “What is the irreplaceable and unique value of human beings in a symbiotic civilization devoted to exploring, changing and creating the universe?”, is also extremely difficult to answer. I can only tentatively suggest three possible respects in which we might not be mere free riders on the journey with AI into the future. It is important to emphasize that each of these possibilities is highly subjective, because it is genuinely difficult to discuss this question objectively, let alone from a perspective that sets human identity aside, which is almost impossible to do.

Consciousness

    The question of consciousness is the greatest mystery among all questions about human beings themselves, and how to define it and explain the processes of its creation, existence and action has been a topic of science and philosophy for thousands of years. Setting aside the complexity of the theories and phenomena, the question of whether AI can be conscious depends entirely on how we humans understand consciousness, and is not very meaningful in itself. It is more meaningful to ask what role consciousness plays in the process of exploring, changing and creating the universe. If we can ultimately prove that consciousness is a necessary condition for generating curiosity and EC2 behavior, and one that AI can never satisfy, then the importance of humans is irreplaceable.

Emotions

    As mentioned earlier, the irrational part of the mind, with emotions at its core, accounts for a significant portion of human behavior. Why do emotions and irrational behavior exist at all? Are they a remnant of human evolution, like the appendix? Is absolute rationality the ultimate answer that EC2 requires? This question has long received too little attention. The various studies of emotion in AI rest on the premise that, because the humans an AI interacts with have emotions, the AI needs to understand and produce human-like emotions in order to interact with them better. Arguably no AI researcher would see any need for two AIs cleaning up trash in a no-man’s-land to display emotions to each other. If this is our ultimate functional definition of emotion, then emotion loses its raison d’être once human presence is no longer a necessary option for AI.

    But is that really the answer? Here I raise the possibility that the most important role of emotions and irrational behavior may be to provide a source of true randomness other than the physical environment, and that such a source, arising from somewhere other than the physical world, may be necessary for EC2. Of course, proving that human irrational behavior driven by emotion is truly random is itself difficult, because it would imply that human (irrational) behavior is absolutely unpredictable, and current psychological and behavioral research cannot yet support this idea; we need more evidence.

Creativity

    Creativity is undoubtedly a necessary capability for EC2, and one of the most difficult to define and quantify precisely. The problem would be solved if we could declare, as many people believe, that only humans possess true creativity and that AI can never acquire it.

    Sadly, things are likely not so simple. At some point in the development of generative AI, it is likely that no innovative human act will be able to certify itself; the judgment will have to be left to AI. This is because, once enough people are using AI creations, an individual human alone will no longer be able to confirm, by searching the entire Internet, whether his or her creation already has an analogue somewhere at some time; the only way will be to rely on another AI with specialized discriminative capabilities to perform this kind of network-wide search or algorithmic analysis and deliver a verdict. Some may think this is alarmist, but it is already happening: in the near future, without the help of “AI that detects generative AI” (which already exists and is in widespread use), can we be confident that every student assignment and every submitted paper is independent and original human creativity?
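
    As a minimal illustration of the kind of discriminative check described above, the sketch below compares a new piece of writing against a small indexed corpus and reports the closest prior match. Real detection and plagiarism systems are far more sophisticated; the word-overlap (Jaccard) measure and the corpus here are stand-in assumptions of my own.

```python
# Toy novelty check: find the indexed text most similar to a candidate work.
def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 0.0

def nearest_match(candidate: str, corpus: list[str]) -> tuple[str, float]:
    """Return the indexed text closest to the candidate and its similarity score."""
    return max(((doc, jaccard(candidate, doc)) for doc in corpus), key=lambda x: x[1])

corpus = [
    "a poem about the quiet of winter mornings",
    "an essay on the ethics of machine creativity",
]
doc, score = nearest_match("a short poem about quiet winter mornings", corpus)
# A high score suggests an analogue of the "new" work already exists.
print(doc, round(score, 2))
```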

    Of course, there are likely other candidates besides consciousness, emotion and creativity that could justify an irreplaceable role for humans in a civilization living in harmony with AI, and the answer to this question will directly determine humanity’s ultimate fate.

    From now on, shifting the discussion of the relationship between humans and AI from “values calibration” to “shared values calibration” will be our first step toward building a civilization of harmonious coexistence between humans and AI, and the final outcome will depend on the choices each of us makes.