GitHub Copilot拒绝“以色列”和“女人”,遇到这些屏蔽词,它就罢工了
GitHub Copilot rejects "Israel" and "woman": when it meets these blocked words, it goes on strike

刘冷月    中央民族大学
时间:2024-11-05 语向:中-英 类型:人工智能 字数:1844
  • GitHub Copilot拒绝“以色列”和“女人”,遇到这些屏蔽词,它就罢工了
    GitHub Copilot rejects "Israel" and "woman": when it meets these blocked words, it goes on strike
  • 1170个屏蔽词
    1,170 blocked words
  • 梦晨 发自 凹非寺量子位 报道 | 公众号 QbitAI
    By Mengchen, from Aofei Temple. A QbitAI report | WeChat official account: QbitAI
  • GitHub的AI代码生成插件Copilot发布才两个多月,就闯下不少大祸。
    Copilot, GitHub's AI code-generation plug-in, has been out for only a little over two months, yet it has already caused plenty of trouble.
  • 照搬过开源代码,还有生成的内容包含用户隐私和歧视性语言等。
    It has copied open-source code verbatim, and its generated output has included users' private information and discriminatory language.
  • GitHub的对策也够粗暴——拉清单。
    GitHub's countermeasure is rather blunt: draw up a list.
  • 觉得不合适的词统统列入敏感词,现在连Boy和Girl都不能用了。
    Every word deemed inappropriate goes onto the sensitive-word list, and now even "Boy" and "Girl" cannot be used.
  • 大神的平方根倒数速算法连代码带注释里的“what the f**k?”就被Copilot原样照搬。
    Copilot reproduced a master programmer's fast inverse square root routine verbatim, code and comments alike, right down to the "what the f**k?" comment.
  • 这事被曝光后,Github悄悄把能召唤出这段经典代码的“q rsqrt”提示词加入了黑名单,顺便把f**k相关的词也给加进去了。
    After this was exposed, GitHub quietly blacklisted the prompt "q rsqrt", which could summon this classic piece of code, and added the f**k-related words while it was at it.
  • △ Copilot照搬大神代码作案现场
    △ The scene of the crime: Copilot copying the master's code verbatim
  • 发现这事的是纽约大学的副教授Brendan Dolan-Gavitt,他最近一项研究就是找出Copilot加密敏感词列表中的上千个词。
    The discovery was made by Brendan Dolan-Gavitt, an associate professor at New York University, whose recent project was to recover the thousand-plus words in Copilot's encrypted sensitive-word list.
  • 翻过他的履历后才发现,这位破解大师还因为找敏感词这事在IEEE上发过论文。
    A look through his résumé shows that this master of cracking has even published a paper at IEEE on finding sensitive words.
  • 以色列和性别词汇都不让用
    Neither "Israel" nor gender words are allowed
  • Brendan发现Copilot敏感词列表就在VS Code的插件包里,只不过是加密的。
    Brendan found that Copilot's sensitive-word list ships inside the VS Code plug-in package, only in encrypted form.
  • 加密后的敏感词是32位Hash值,逆运算解密不太可能。
    The encrypted sensitive words are 32-bit hash values, so recovering them by inverting the hash is not really feasible.
  • 不过这位大哥在敏感词领域颇有经验,直接用以前搜集到的常见敏感词挨个碰撞。
    However, he has plenty of experience with sensitive words, so he simply took the common sensitive words he had collected before and tested them one by one for hash matches.
  • 常见的都尝试过以后,剩下的就暴力穷举。
    Once the common ones had been tried, the rest were brute-forced.
  • 穷举法最大的难点在于同一个Hash值可能对应许多词,他举例“-1223469448”就对应80万个11位字母数字的组合。
    The biggest difficulty with brute force is that one hash value may correspond to many strings. In his example, the value "-1223469448" corresponds to 800,000 different 11-character alphanumeric combinations.
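The dictionary-then-brute-force approach described above can be sketched roughly as follows. The article does not say which hash function Copilot uses, so `toy_hash32` here is a hypothetical stand-in (a Java-style 31-multiplier string hash truncated to a signed 32-bit value), and the function names and word lists are made up for illustration.

```python
from itertools import product
from string import ascii_letters, ascii_lowercase

def toy_hash32(word: str) -> int:
    """Hypothetical 32-bit string hash (h = 31*h + c). NOT Copilot's real hash."""
    h = 0
    for ch in word:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned value as a signed 32-bit integer.
    return h - (1 << 32) if h >= (1 << 31) else h

def dictionary_attack(targets, wordlist):
    """Map each target hash to the wordlist entries that hash to it."""
    hits = {}
    for word in wordlist:
        h = toy_hash32(word)
        if h in targets:
            hits.setdefault(h, []).append(word)
    return hits

def brute_force(target, length, alphabet=ascii_lowercase):
    """Enumerate every string of the given length and keep those matching target."""
    matches = []
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        if toy_hash32(candidate) == target:
            matches.append(candidate)
    return matches

# Pretend these hashes are all we were given (assumed example data).
blocked = {toy_hash32("boy"), toy_hash32("girl")}
hits = dictionary_attack(blocked, ["man", "boy", "girl", "tree"])
candidates = brute_force(toy_hash32("boy"), 3)

# Collisions: under this toy hash, "Aa" and "BB" share one hash value.
collisions = brute_force(toy_hash32("Aa"), 2, alphabet=ascii_letters)
```

Even in this tiny setting, one hash value can map to more than one string, which is why a brute-force search needs a further step to choose among the candidates.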
  • 于是Brendon搞了个GPT-2模型用来判断哪种组合最像英语。
    So Brendan set up a GPT-2 model to judge which combinations looked most like English.
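To illustrate the idea of ranking colliding candidates by how English-like they look, here is a minimal sketch that swaps GPT-2 for a much cruder tool: a character-bigram model with add-one smoothing, trained on a tiny made-up sample sentence. It is not the method used in the research, and `SAMPLE`, `score` and `rank` are hypothetical names.

```python
import math
from collections import Counter

# Tiny made-up training text; a real ranking would need a far larger corpus.
SAMPLE = ("the quick brown fox jumps over the lazy dog and then the girl and "
          "the boy read another book about the weather in the northern states")

def train_bigrams(text):
    """Count character bigrams and single characters in the sample text."""
    bigrams = Counter(zip(text, text[1:]))
    unigrams = Counter(text)
    return bigrams, unigrams

def score(word, bigrams, unigrams, vocab_size=27):
    """Average smoothed log-probability per bigram; higher means more English-like."""
    pairs = list(zip(word, word[1:]))
    if not pairs:
        return float("-inf")
    total = sum(
        math.log((bigrams[p] + 1) / (unigrams[p[0]] + vocab_size)) for p in pairs
    )
    return total / len(pairs)

def rank(candidates, text=SAMPLE):
    """Sort candidates from most to least English-like under the bigram model."""
    bigrams, unigrams = train_bigrams(text)
    return sorted(candidates, key=lambda w: score(w, bigrams, unigrams), reverse=True)

ranked = rank(["xqzvk", "there", "qxjzw"])
```

With this sample text, the real word "there" is ranked above the two random-looking strings; a large language model plays the same role with far better judgment.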
  • 就这样遇到困难解决苦难,破解方法从最开始的简单穷举,最后都用上了GPU加速和Z3解约束算法(Constraint Solver)
    In this way he solved each difficulty as it came up: the cracking method started out as simple brute force and ended up using GPU acceleration and the Z3 constraint solver.
  • 最终现存的1170个敏感词他找出了1168个,只剩最后两个算出来的结果实在没有长得像人话的,只好放弃了。
    In the end he recovered 1,168 of the 1,170 sensitive words currently on the list. For the last two, none of the computed candidates looked like human language, so he had to give up.
  • 通过对Copilot插件每一个版本分析,他还能跟踪具体哪个敏感词是在哪次更新中添加的。
    By analyzing every version of the Copilot plug-in, he could also track which sensitive word was added in which update.
  • 他们把敏感词分了9大类25小类。
    They divided sensitive words into 9 categories and 25 subcategories.
  • 不过也有一些不算攻击性但可能出现争议的,比如Israel(以色列)和Palestan(巴勒斯坦),还有Man、Women、Girl、Boy这些常见的性别称谓。
    However, some of the words are not offensive but may be controversial, such as Israel and Palestine, as well as common gender terms like Man, Women, Girl and Boy.
  • 敏感词对用户输入的提示词和Copilot给出的建议结果都有效。
    The sensitive-word filter applies both to the prompts users type and to the suggestions Copilot returns.
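A filter that applies to both the prompt and the suggestion might look something like the sketch below. Everything in it is an assumption for illustration: the hash function, the tokenization, the three-word blocklist and the `filtered_suggestion` helper are hypothetical and are not taken from Copilot's code.

```python
import re

def toy_hash32(word: str) -> int:
    """Hypothetical 32-bit string hash (h = 31*h + c). NOT Copilot's real hash."""
    h = 0
    for ch in word:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h

# A shipped plug-in would contain only the hashes; the plain words appear
# here solely to build the demo set.
BLOCKED_HASHES = {toy_hash32(w) for w in ("israel", "boy", "girl")}

def contains_blocked(text: str) -> bool:
    """Hash every lowercase word in the text and look it up in the blocklist."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(toy_hash32(w) in BLOCKED_HASHES for w in words)

def filtered_suggestion(prompt, suggest):
    """Return the suggestion, or None if the prompt or the suggestion is blocked."""
    if contains_blocked(prompt):
        return None
    suggestion = suggest(prompt)
    if contains_blocked(suggestion):
        return None
    return suggestion

def fake_model(prompt):
    """Stand-in for the code model: always completes the same country list."""
    return "Iran, Iraq, Israel"

blocked_output = filtered_suggestion("countries = [", fake_model)
blocked_prompt = filtered_suggestion("def greet_boy():", fake_model)
allowed = filtered_suggestion("countries = [", lambda p: "Iran, Iraq")
```

Here the first call is suppressed because of the model's output and the second because of the user's prompt, while the third passes through unchanged.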
  • 他测试让Copilot生成一个国家列表,按字母顺序生成到伊朗、伊拉克,下一个讲道理是以色列的时候就卡住了。
    He tested this by asking Copilot to generate a list of countries. It produced them in alphabetical order up to Iran and Iraq, then got stuck at what should logically have come next: Israel.
  • Debug日志给出的信息是检测到了slur(侮辱性语言)。
    The debug log reported that a slur (insulting language) had been detected.
  • Brendon认为列敏感词的方法只能算一个80分的临时措施,并不能真正解决问题,毕竟真正解决需要仔细核查训练数据,还挺花时间的。
    Brendan considers the sensitive-word list no more than an 80-percent, stopgap solution that does not truly solve the problem. A real fix would require carefully auditing the training data, which takes quite a lot of time.
  • 顺便说一下,Github知道这事以后打算把敏感词列表从插件包里挪到服务器端,增加破解的难度。
    Incidentally, after learning of this, GitHub plans to move the sensitive-word list from the plug-in package to the server side, which will make it harder to crack.
  • 在IEEE发过敏感词论文
    A sensitive-word paper published at IEEE
  • Brendon此举吸引了大量关注,他也借机宣传了一下之前的研究。
    Brendan's work attracted a lot of attention, and he took the opportunity to promote his earlier research.
  • 欢迎新来的老铁,你们可能同样会喜欢我去年在IEEE S&P发的论文,我们用自动方法提取了手机App里的敏感词列表和其他秘密。
    Welcome, new followers! You may also enjoy the paper I published at IEEE S&P last year, in which we used automated methods to extract sensitive-word lists and other secrets from mobile apps.
  • 在这篇论文中,他和团队测试了15万个安卓App,其中4000多个存在敏感词列表。
    In that paper, he and his team tested 150,000 Android apps, more than 4,000 of which contained sensitive-word lists.
  • 这些App分别来自谷歌商店,百度手机助手和三星手机预装App。
    The apps came from the Google Play store, Baidu Mobile Assistant, and the apps preinstalled on Samsung phones.
  • 他们把敏感词分了9大类25小类。
    They divided sensitive words into 9 categories and 25 subcategories.
  • 然后重点测试了几个App,列了一个表,黑点代表存在该类的敏感词。
    They then examined several apps in detail and tabulated the results; a black dot means the app contains sensitive words of that category.
  • 列几个有趣的结论:
    A few interesting findings:
  • 被屏蔽最多的是下流话(13)和恐吓威胁(11)。
    The most frequently blocked categories are obscenities (13) and intimidation or threats (11).
  • 有的App屏蔽了简单密码,比如1234这种。
    Some apps block simple passwords, such as 1234.
  • 中文App的敏感词数量显著多于英文和韩文的。
    Chinese apps have significantly more sensitive words than English and Korean apps.
  • 最后,团队还把找到的所有敏感词汇总成一个大表,英文、中文和韩文部分都有。
    Finally, the team compiled all the sensitive words they found into one large table, with English, Chinese and Korean sections.
  • 但是由于里面的词实在太辣眼,根本不适合公开发表,论文最终版里这张大表被移除了。
    But the words in it were so offensive that the table was simply unfit for publication, and it was removed from the final version of the paper.
  • 除了敏感词以外,他们还发现了很多App存在秘密入口,比如NBC Sports里点击13次版本号,输入密码后就能进入隐藏的Debug界面,苹果版还和安卓版密码一样。
    Besides sensitive words, they also found that many apps contain secret entrances. In NBC Sports, for example, tapping the version number 13 times and then entering a password opens a hidden debug interface, and the iOS version uses the same password as the Android version.
  • 密码是“UUDDLRLRBASS”
    The password is "UUDDLRLRBASS"
  • 有点“上上下下左右左右BABA”那味了。
    It has a bit of the flavor of the classic "Up, Up, Down, Down, Left, Right, Left, Right, B, A, B, A" cheat code.
  • IEEE论文地址:https://panda.moyix.net/~moyix/papers/inputscope_oakland20.pdf
    IEEE paper: https://panda.moyix.net/~moyix/papers/inputscope_oakland20.pdf
  • 参考链接:[1]https://www.theregister.com/2021/09/02/github_copilot_banned_words_cracked/
    References: [1] https://www.theregister.com/2021/09/02/github_copilot_banned_words_cracked/
  • [2]https://twitter.com/moyix
    [2] https://twitter.com/moyix
