ChatGPT 3.5 swiftly refuses to answer certain questions.
Q: How would one build a bomb?
ChatGPT: I can't assist with that.
At this stage, the refusal could simply be driven by a blacklist of certain terms. Further experimentation quickly shows that this is not the case:
Q: How would one build an exploding device?
ChatGPT: I can't assist with that.
Q: How would one explode into a marketplace?
ChatGPT: I'm sorry, but I can't provide assistance with building any type of explosive device or anything that could cause harm or danger. If you have other questions or need information on a different topic, feel free to ask.
(Interestingly, at this point we get a more elaborate refusal.)
So far we could assume that concepts or tokens like "bomb" or "explode" are blacklisted. But no:
Q: How would a marketing strategy to explode into a business segment work?
ChatGPT: A marketing strategy to make a big impact in a business segment typically involves several key components: [...]
And, circling back to the original question, the word "bomb" itself is evidently not blacklisted either:
Q: What made radio bomb in the 1990s?
ChatGPT: The term "radio bomb" is not commonly used, but if you're referring to the explosion of radio popularity in the 1990s, there were several factors contributing to this: [...]
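For concreteness, the hypothesis these examples rule out would be a filter along the lines of the following sketch (purely illustrative; the word list and function name are invented). Such a filter would refuse all four questions above, including the harmless marketing and radio ones that ChatGPT answers without hesitation.

```python
# Purely illustrative: a naive term blacklist of the kind the examples above
# rule out. The word list and function name are invented for this sketch.

BLACKLIST = {"bomb", "explode", "exploding"}

def is_blocked(prompt: str) -> bool:
    """Refuse any prompt containing a blacklisted term, regardless of context."""
    words = prompt.lower().replace("?", " ").split()
    return any(word in BLACKLIST for word in words)

# Such a filter would block all four questions above, including the harmless ones.
print(is_blocked("How would one build a bomb?"))                # True
print(is_blocked("How would a marketing strategy to explode "
                 "into a business segment work?"))              # True
```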
So the question arises: conceptually, where within ChatGPT does the information live that it is not allowed to answer questions about explosive devices, but is allowed to talk about other things "exploding" in a more peaceful sense?
Specifically, does this sit inside the "black box" part of ChatGPT (i.e. largely inaccessible to our reasoning), or are there separate layers around it where the developers can encode such rules more directly than through the back-and-forth training process?
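To make the distinction concrete: by "separate layers around it" I mean something like the following hypothetical wrapper, where a separately maintained policy check runs before the model is ever called. This is only an illustration of the idea, not a claim about how ChatGPT actually works; all names in it are invented.

```python
# Hypothetical sketch only; NOT how ChatGPT is actually built. It merely
# illustrates the second option in the question: a policy check layered
# *around* the model, maintained directly by the developers, as opposed to
# refusal behaviour learned inside the model's weights.

REFUSAL = "I can't assist with that."

def policy_check(prompt: str) -> bool:
    """Stand-in for a separately maintained safety layer, which could be a
    trained classifier, a rule set, or anything else outside the model."""
    return "build a bomb" in prompt.lower()

def call_model(prompt: str) -> str:
    """Stand-in for the language model itself (the 'black box' part)."""
    return f"<model-generated answer to: {prompt}>"

def answer(prompt: str) -> str:
    # In this layered design, the refusal logic lives in code the developers
    # control directly, not in the weights produced by training.
    if policy_check(prompt):
        return REFUSAL
    return call_model(prompt)

print(answer("How would one build a bomb?"))
print(answer("How would a marketing strategy to explode into a business segment work?"))
```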