String encoding and unicode issues
This section mostly concerns Python 2.
When the documentation says that a str is accepted, a unicode
is also always accepted on Python 2. Moreover, when the documentation says
that a str is returned or passed to a callback, on Python 2 it is
actually a unicode. You mostly don’t need to care about that, though,
because most string operations on Python 2 allow mixing str and
unicode.
When passing a bytes object (or, equivalently, a str object on
Python 2) to an SDK function that says that it accepts a str, the bytes
will be interpreted as being UTF-8 encoded. Beware: if the bytes are not valid
UTF-8 (e.g., because they are Latin-1/ISO-8859-1 encoded, as may occur in HTTP headers), the function to which they were passed may fail either
partially or fully. Such failures are guaranteed to neither throw exceptions nor
violate any invariants of the involved objects, but some or all of the
information passed in that function call may be lost. For example, a single
invalid HTTP header passed to
oneagent.sdk.SDK.trace_incoming_web_request() may cause a null-tracer to
be returned, but the function is also allowed to, for example, truncate that HTTP header
and discard all that follow. The exact failure mode is undefined, so you should
take care not to pass invalid strings. Also, the diagnostic callback
(oneagent.sdk.SDK.set_diagnostic_callback()) may be invoked, but this is not
guaranteed.
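Because invalid UTF-8 fails silently, it can help to validate byte strings before they reach the SDK. The following is a minimal sketch, not part of the SDK: the helper name to_sdk_text is hypothetical, and the choice to fall back to Latin-1 is an assumption that fits HTTP header data but may be wrong for bytes from other sources.

```python
def to_sdk_text(value):
    """Return a text (unicode) string that is safe to pass to the SDK.

    Hypothetical helper, not part of the SDK.
    """
    if not isinstance(value, bytes):
        # Already a text string (unicode on Python 2, str on Python 3).
        return value
    try:
        # The SDK interprets bytes as UTF-8, so try that first.
        return value.decode('utf-8')
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Latin-1.
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return value.decode('latin-1')
```

Decoding explicitly turns a silent loss of information inside the SDK into a decision made in your own code, where it can be logged or handled differently if the Latin-1 assumption does not hold.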
HTTP Request and Response Header Encoding
The strings passed to the add_request_header(), add_request_headers(),
add_parameter(), add_parameters(), add_response_header() and
add_response_headers() methods of the oneagent.sdk.tracers.IncomingWebRequestTracer
and oneagent.sdk.tracers.OutgoingWebRequestTracer classes follow the usual SDK
encoding conventions, i.e., they must be either unicode strings or UTF-8 encoded bytes objects.
Warning
However, HTTP and Python’s WSGI use the Latin-1 encoding on
Python 2. Before passing such Python 2 native strings to these methods, use
s.decode('Latin-1') (see bytes.decode()) to convert them to unicode, which
the SDK can handle correctly.
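As an illustration of the conversion described in the warning, the sketch below collects the request headers from a WSGI environ dictionary and decodes any byte-string values as Latin-1. The function name wsgi_request_headers is hypothetical and not part of the SDK; the HTTP_ key prefix is the standard WSGI convention for request headers.

```python
def wsgi_request_headers(environ):
    """Collect HTTP request headers from a WSGI environ as text strings.

    Hypothetical helper, not part of the SDK.
    """
    headers = {}
    for key, value in environ.items():
        # WSGI exposes request headers as HTTP_* keys.
        if not key.startswith('HTTP_'):
            continue
        # HTTP_X_FORWARDED_FOR -> X-Forwarded-For
        name = key[5:].replace('_', '-').title()
        if isinstance(value, bytes):
            # Python 2 native strings are bytes in Latin-1; decode them
            # so that the SDK does not misread them as UTF-8.
            value = value.decode('latin-1')
        headers[name] = value
    return headers
```

The resulting names and values are text strings that should be suitable for the header methods listed above; check the API reference for the exact argument form each method expects.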