Copyright © 2021 Daniel Oaks <daniel@danieloaks.net>
Copyright © 2021 Shivaram Lingamneni <slingamn@cs.stanford.edu>
Unlimited redistribution and modification of this document is allowed provided that the above copyright notice and this permission notice remains intact.
IRC predates the Unicode standard. Consequently, although UTF-8 has been widely adopted on IRC, clients cannot assume that all IRC data is UTF-8. This specification defines a way for servers to advertise that they only allow UTF-8 on their network, letting clients change their processing of outgoing and incoming messages accordingly.
UTF8ONLY ISUPPORT token
This specification introduces a new token UTF8ONLY that servers can include in their ISUPPORT (005) output. Servers publishing this token MUST NOT relay content (such as PRIVMSG or NOTICE message data, channel topics, or realnames) containing non-UTF-8 data to clients. Clients implementing this specification MUST NOT send non-UTF-8 data to the server once they have seen this token. Server handling of such messages is implementation-defined; for example, they MAY send the INVALID_UTF8 code described below, or respond in some other way.
If a client implementing this specification sees this token, they MUST set their outgoing encoding to UTF-8 without requiring any user intervention. This allows clients to work transparently on networks that only allow UTF-8 traffic.
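As an illustration of the client-side behaviour described above, here is a minimal Python sketch that looks for UTF8ONLY in a 005 line and switches the outgoing encoding. The names isupport_tokens and ClientState, the latin-1 default, and the example server name are hypothetical and not part of this specification; ISUPPORT negation (-TOKEN) is not handled.

```python
from typing import Dict, Optional


def isupport_tokens(line: str) -> Dict[str, Optional[str]]:
    """Parse the tokens of one RPL_ISUPPORT (005) line into a dict."""
    parts = line.rstrip("\r\n").split(" ")
    # Skip an optional ":server.name" source prefix.
    if parts and parts[0].startswith(":"):
        parts = parts[1:]
    if len(parts) < 2 or parts[0] != "005":
        return {}
    tokens: Dict[str, Optional[str]] = {}
    # parts[1] is the client's nickname; tokens follow until the trailing text.
    for part in parts[2:]:
        if part.startswith(":"):
            break  # trailing parameter, e.g. ":are supported by this server"
        if not part:
            continue
        key, sep, value = part.partition("=")
        tokens[key] = value if sep else None
    return tokens


class ClientState:
    """Hypothetical holder for a client's per-connection settings."""

    def __init__(self, outgoing_encoding: str = "latin-1"):
        # Assumed user-configured default; any legacy encoding could stand here.
        self.outgoing_encoding = outgoing_encoding

    def handle_line(self, line: str) -> None:
        # Switch to UTF-8 without user intervention once the token is seen.
        if "UTF8ONLY" in isupport_tokens(line):
            self.outgoing_encoding = "utf-8"
```

A client built this way would call handle_line for each line received during registration and encode everything it sends afterwards with outgoing_encoding.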
INVALID_UTF8 standard replies code
This is a code that can be used with the standard replies specification. When sent with the FAIL command, it indicates that the client's message was rejected because it contained invalid UTF-8 data. When sent with the WARN command, it indicates that the message was modified but still accepted.
Client: PRIVMSG #ircv3 :<non-utf-8 message>
Server: FAIL PRIVMSG INVALID_UTF8 :Message rejected, your IRC software MUST use UTF-8 encoding on this network
Client: USER u s e :<non-utf8 realname>
Server: FAIL USER INVALID_UTF8 :Message rejected, your IRC software MUST use UTF-8 encoding on this network
Client: PRIVMSG #ircv3 :<non-utf-8 message>
Server: WARN PRIVMSG INVALID_UTF8 :Your message was not correctly encoded as UTF-8 and had to be modified
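Since server handling is implementation-defined, the following Python sketch shows only one possible way a server might arrive at the replies above. The function name check_utf8 and its strict flag are hypothetical; replacing invalid bytes with U+FFFD in the lenient case is an assumption, not something this specification requires.

```python
from typing import Optional, Tuple

FAIL_TEXT = "Message rejected, your IRC software MUST use UTF-8 encoding on this network"
WARN_TEXT = "Your message was not correctly encoded as UTF-8 and had to be modified"


def check_utf8(command: str, payload: bytes, strict: bool = True) -> Tuple[Optional[str], Optional[str]]:
    """Return (text_to_relay, standard_reply); either element may be None."""
    try:
        # Valid UTF-8 is relayed unchanged and needs no reply.
        return payload.decode("utf-8"), None
    except UnicodeDecodeError:
        pass
    if strict:
        # Reject the message outright.
        return None, f"FAIL {command} INVALID_UTF8 :{FAIL_TEXT}"
    # Assumed lenient policy: substitute U+FFFD for invalid bytes, then warn.
    repaired = payload.decode("utf-8", errors="replace")
    return repaired, f"WARN {command} INVALID_UTF8 :{WARN_TEXT}"
```

Either way, nothing that fails to decode as UTF-8 is relayed to other clients, which is the requirement the token implies.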
This section is non-normative.
Implementations must ensure that if they truncate messages to meet a length limit, they do not do so in the middle of a UTF-8-encoded codepoint.
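One way to truncate safely is to back up from the cut point past any UTF-8 continuation bytes. The Python sketch below assumes the input is already valid UTF-8; the function name truncate_utf8 is hypothetical. It protects code points only and makes no attempt to keep multi-code-point sequences together.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut valid UTF-8 data to at most `limit` bytes without splitting a code point."""
    if len(data) <= limit:
        return data
    end = max(limit, 0)
    # Continuation bytes have the form 0b10xxxxxx. If the first dropped byte is
    # one, the cut falls inside a code point, so back up to its lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]
```

For example, truncating the six bytes of "héllo" to a limit of two bytes drops the whole two-byte "é" rather than leaving a dangling lead byte.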
Software supporting UTF8ONLY: Ergo, AdiIRC, Halloy, HexChat, KVIrc, mIRC, Srain, WeeChat, soju (as Server), soju (as Client), Limnoria, Matrix2051