String encoding and unicode issues

This section mostly concerns Python 2.

When the documentation says that a str is accepted, a unicode object is also always accepted on Python 2. Furthermore, when the documentation says that a str is returned or passed to a callback, on Python 2 it is actually a unicode object. You mostly don’t need to care about that, though, because most string operations on Python 2 allow mixing str and unicode.

When you pass a bytes object (or, equivalently, a str object on Python 2) to an SDK function that is documented to accept a str, the bytes are interpreted as UTF-8. Beware: if the bytes are not valid UTF-8 (for example, Latin-1/ISO-8859-1 text, as may occur in HTTP headers), the function they were passed to may fail partially or fully. Such failures are guaranteed neither to raise exceptions nor to violate any invariants of the involved objects, but some or all of the information passed in that call may be lost.

For example, a single invalid HTTP header passed to oneagent.sdk.SDK.trace_incoming_web_request() may cause a null tracer to be returned, but the function is equally allowed to truncate that header and discard all headers that follow it. The exact failure mode is undefined, so take care not to pass invalid strings. The diagnostic callback (oneagent.sdk.SDK.set_diagnostic_callback()) may also be invoked, but this is not guaranteed.
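One way to avoid handing invalid UTF-8 to the SDK is to normalize byte strings to text yourself before the call. The sketch below uses a hypothetical helper, to_sdk_text(), which is not part of the SDK. It assumes that bytes which fail to decode as UTF-8 are Latin-1, which is a heuristic suited to HTTP headers and not a general rule.

```python
def to_sdk_text(value, fallback_encoding="latin-1"):
    """Hypothetical helper (not part of the SDK): return ``value`` as text.

    Bytes (native str on Python 2) are decoded as UTF-8 first. If that
    fails, they are decoded with ``fallback_encoding`` (assumed Latin-1),
    which can decode any byte sequence, so no invalid UTF-8 reaches the SDK.
    Values that are already text are returned unchanged.
    """
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            # Assumption: non-UTF-8 bytes are Latin-1, as is common for HTTP headers.
            return value.decode(fallback_encoding)
    return value
```

For example, to_sdk_text(b"caf\xc3\xa9") and to_sdk_text(b"caf\xe9") both yield u"café". Latin-1 bytes that happen to form valid UTF-8 would be misread by this heuristic, so prefer decoding with the known encoding when you have it.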

HTTP Request and Response Header Encoding

The strings passed to the add_request_header(), add_request_headers(), add_parameter(), add_parameters(), add_response_header() and add_response_headers() methods of the oneagent.sdk.tracers.IncomingWebRequestTracer and oneagent.sdk.tracers.OutgoingWebRequestTracer classes follow the usual SDK encoding conventions; that is, they must be either unicode strings or UTF-8 encoded bytes objects.

Warning

However, HTTP and Python’s WSGI use the Latin-1 encoding, so on Python 2 the native strings they provide are Latin-1 encoded bytes, not UTF-8. Before passing such Python 2 native strings to these methods, convert them to unicode with s.decode('Latin-1') (see bytes.decode()); the resulting unicode strings are handled correctly.
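As an illustration, the sketch below collects request headers from a WSGI environ and decodes Python 2 native strings as Latin-1 before they would be handed to the header methods listed above. The function name wsgi_request_headers() is hypothetical, and the code deliberately does not call any SDK method. It relies on PEP 3333, under which environ values are native strings: Latin-1 bytes on Python 2, and str values already decoded as Latin-1 on Python 3.

```python
def wsgi_request_headers(environ):
    """Hypothetical helper: return (name, value) pairs of HTTP request
    headers from a WSGI environ, with every value as unicode text.

    Only CGI-style HTTP_* keys are collected; CONTENT_TYPE and
    CONTENT_LENGTH carry no HTTP_ prefix in WSGI and are skipped here.
    """
    headers = []
    for key, value in environ.items():
        if not key.startswith("HTTP_"):
            continue
        # HTTP_ACCEPT_LANGUAGE -> Accept-Language (header names are ASCII).
        name = key[5:].replace("_", "-").title()
        if isinstance(value, bytes):
            # Python 2 native str: Latin-1 bytes, so decode them explicitly.
            value = value.decode("Latin-1")
        headers.append((name, value))
    return headers
```

On Python 3 the values are already text, so the decode branch only runs for bytes, which is what a Python 2 WSGI server provides.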