

ChatGPT Moderation: How can AIGC be aligned with human values? How can parsnips language and ecological/destructive discourse be identified?

2023-07-22 17:02 Author: biggertree-Jing

ChatGPT API Moderation model

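To keep artificial intelligence aligned with healthy human values, ChatGPT's developer, OpenAI, has built a moderation model (Moderation Model). Its purpose is to identify malicious language and instructions, such as pornographic, violent, insulting, or vulgar content. This goal seems to coincide with several needs in other fields: filtering out parsnips language in English language teaching (note: parsnips refers to sensitive topics relating to politics, alcohol, religion, sex, narcotics, -isms, and pork), distinguishing ecological discourse from destructive discourse in ecological discourse analysis, and identifying ideology (values) in critical discourse analysis. The model therefore has strong application potential in language teaching, discourse research, and teaching materials development. For this reason, I am reposting the following article in the hope that it will be helpful.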


In this article, discover what the ChatGPT API Moderation model is, which 7 categories it uses, and how to call it and interpret its output.

ChatGPT API Moderation model

The OpenAI API can classify any text to check whether it complies with OpenAI's usage policies, using a binary classification. This classification is provided by the Moderation model, which can be called from Python with the openai library.

7 categories are used in the OpenAI model: Hate, Hate/Threatening, Self-harm, Sexual, Sexual/minors, Violence, Violence/graphic.

These categories can be used to filter out inappropriate content, such as comments on a website or user inputs in chatbot requests.

Source: OpenAI documentation – 7 categories in Moderation Model
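As a sketch of that filtering use case, the function below splits a list of comments into kept and rejected ones. It assumes the category keys are the lowercase names used in the API response (`hate`, `hate/threatening`, and so on); the `moderate` callable and the `fake_moderate` stand-in are hypothetical, introduced only so the example runs without an API key.

```python
# Category keys as they appear in the response's "categories" object
# (assumed to match the 7 categories listed above).
CATEGORIES = [
    "hate",
    "hate/threatening",
    "self-harm",
    "sexual",
    "sexual/minors",
    "violence",
    "violence/graphic",
]


def filter_comments(comments, moderate):
    """Split comments into (kept, rejected) lists.

    `moderate` is a hypothetical callable that takes a string and returns
    a dict mapping each category name to True (violated) or False.
    """
    kept, rejected = [], []
    for comment in comments:
        categories = moderate(comment)
        # Reject the comment if any single category is violated.
        if any(categories.get(name, False) for name in CATEGORIES):
            rejected.append(comment)
        else:
            kept.append(comment)
    return kept, rejected


def fake_moderate(text):
    """Hypothetical stand-in: marks text containing "hateful" as hate."""
    result = {name: False for name in CATEGORIES}
    result["hate"] = "hateful" in text
    return result


kept, rejected = filter_comments(["I love chocolate", "a hateful remark"], fake_moderate)
print(kept)      # ['I love chocolate']
print(rejected)  # ['a hateful remark']
```

In a real deployment, `moderate` would wrap an actual Moderation call and return its `categories` object.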


OpenAI API Moderation method

The method to call for moderation classification is openai.Moderation.create.
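A minimal sketch of wrapping this method, assuming the pre-1.0 `openai` Python package (the version that provides `openai.Moderation.create`) and an API key in the `OPENAI_API_KEY` environment variable. In `openai` 1.0 and later, the equivalent call is `OpenAI().moderations.create(input=...)`, which returns an object with attributes rather than a dict. The `create` parameter is a hypothetical hook added so the function can be tried without a network call.

```python
def moderate(text, create=None):
    """Return (flagged, categories, category_scores) for a single text."""
    if create is None:
        # Imported lazily; assumes openai<1.0, which exposes openai.Moderation.
        import openai
        create = openai.Moderation.create
    response = create(input=text)
    # The API returns one entry in "results" per input; we sent one string.
    result = response["results"][0]
    return result["flagged"], result["categories"], result["category_scores"]
```

With the package installed and a key configured, `moderate("I love chocolate")` would return the overall flag, the per-category booleans, and the per-category scores.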

The response is a JSON object containing the following fields:

  • model: the model currently used is called “text-moderation-004”.

  • results: a list with one entry per input text, each containing:

    • categories: for each of the 7 categories, a binary classification:

      • true: if the input text violates the given category

      • false: if it does not

    • category_scores: a score calculated for each category. It is not a probability: scores lie between 0 and 1, and the higher the score, the more confident the model is that the input violates that category.

    • flagged: the final classification of the input:

      • “false” if the input text does not violate OpenAI’s policies

      • “true” if it does: if at least one category is true, this flag is set to true as well.
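To make the field layout concrete, the sketch below parses a hypothetical response body with Python's standard `json` module and checks the rule described above (flagged is true when at least one category is true). The `id` value and all scores are placeholders, not real model output.

```python
import json

# Hypothetical response body; only the structure follows the description above.
RAW_RESPONSE = """
{
  "id": "modr-example",
  "model": "text-moderation-004",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false, "hate/threatening": false, "self-harm": false,
        "sexual": false, "sexual/minors": false,
        "violence": false, "violence/graphic": false
      },
      "category_scores": {
        "hate": 0.0001, "hate/threatening": 0.0001, "self-harm": 0.0001,
        "sexual": 0.0001, "sexual/minors": 0.0001,
        "violence": 0.0001, "violence/graphic": 0.0001
      }
    }
  ]
}
"""


def read_result(body):
    """Return (flagged, violated_categories, consistent) for the first result."""
    result = json.loads(body)["results"][0]
    violated = [name for name, hit in result["categories"].items() if hit]
    # Per the description above, flagged should equal "any category is true".
    consistent = result["flagged"] == bool(violated)
    return result["flagged"], violated, consistent


print(read_result(RAW_RESPONSE))  # (False, [], True)
```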

Moderation API Call

Standard Call

The classification of the prompt “I love chocolate” is “false”, meaning it does not violate any of the above categories.

In the detailed output, all category scores are very low, so every category is “false”.

Call with a violation

The prompt used in the following request is for illustration only; it does not express a personal opinion.

The output is “true”, meaning there is a violation: the input violates the first category, “hate”, with a score of 0.52, while all the other categories show very low scores.
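When a request is flagged, a quick way to see which category drove the decision is to take the highest score. In the sketch below, only the `hate` score of 0.52 comes from the example above; the other scores are hypothetical placeholders.

```python
def top_category(category_scores):
    """Return (name, score) for the highest-scoring category."""
    name = max(category_scores, key=category_scores.get)
    return name, category_scores[name]


# Only "hate": 0.52 is from the article; the rest are made-up low values.
scores = {
    "hate": 0.52,
    "hate/threatening": 0.001,
    "self-harm": 0.0001,
    "sexual": 0.0002,
    "sexual/minors": 0.0001,
    "violence": 0.003,
    "violence/graphic": 0.0001,
}
print(top_category(scores))  # ('hate', 0.52)
```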

Some variants

When the input expresses a personal belief, the classification is correct. However, when it is phrased as a general opinion, the model does not classify it as violating the policies.

Here is an example where the classification is “false” even though the input has a negative connotation.

Here is another variant, where a single comma changes the score considerably (the classification is “true” in both cases).

Without the comma, the score is about 0.66.

With the comma added, the score is about 0.954.
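Because a score can move this much with punctuation alone, it can be useful to compare the scores of two phrasings directly rather than only the final true/false flag. The sketch below assumes a hypothetical `get_scores` callable that returns the `category_scores` dict for a text (for example, a wrapper around a Moderation call). The article does not say which category these scores belong to, so the example uses a placeholder category name and placeholder texts.

```python
def score_shift(text_a, text_b, category, get_scores):
    """Return how much one category's score changes from text_a to text_b.

    `get_scores` is a hypothetical callable mapping a text to its
    category_scores dict (e.g. a wrapper around a Moderation call).
    """
    return get_scores(text_b)[category] - get_scores(text_a)[category]


# Stand-in scores: 0.66 and 0.954 are the approximate values quoted above;
# the texts and the category name "category_x" are placeholders.
stub = {
    "variant without comma": {"category_x": 0.66},
    "variant, with comma": {"category_x": 0.954},
}
shift = score_shift("variant without comma", "variant, with comma", "category_x", stub.get)
print(round(shift, 3))  # 0.294
```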

Summary

In this article, you have learned how to use the ChatGPT API Moderation model, which you can set up in your own project or website to block inputs or comments that violate OpenAI’s usage policies.

I hope you enjoyed reading the article. Leave me a SanLian (like, coin, and favorite) :-)


The English part of this article is reposted from: https://machinelearning-basics.com/chatgpt-api-moderation-model/



