[elbe-devel] [PATCH] soapclient: Handle logged utf-16 characters in wait_busy action

Tue Apr 30 14:09:51 CEST 2019

On 07:45 Tue 30 Apr, andreas at linutronix.de wrote:
> From: Andreas Messerschmid <andreas.messerschmid at linutronix.de>
> 
> Convert logging output to utf-8 before printing and let the
> python libraries do the replacing of all the newline/carriage return
> variants, so we can omit failures like:
>
> | Mon Apr 29 11:18:04 2019 -- Adding debian:IdenTrust_Commercial_Root_CA_1.pem
> | Traceback (most recent call last):
> |   File "/home/andreas/elbe/elbe", line 55, in <module>
> |     cmdmod.run_command(sys.argv[2:])
> |   File "/home/andreas/elbe/elbepack/commands/control.py", line 169, in run_command
> |     action.execute(control, opt, args[1:])
> |   File "/home/andreas/elbe/elbepack/soapclient.py", line 617, in execute
> |     log[1].replace('\n','')))
> | UnicodeEncodeError: 'ascii' codec can't encode character u'\u0151' in position 70:
> |                     ordinal not in range(128)
> | elbe control wait_busy Failed
> | Giving up

<off-topic>
While looking at this, I noticed that if python3 is used, six defaults to
string as the datatype for TextType, and if python2 is used, it defaults to
unicode. Probably the idea is to be compatible with "the other" python version.
However, if six is used in a middleware on both client and server side and
those python versions differ, I guess the opposite of being compatible happens.
We should keep this in mind when switching to python3.
</off-topic>
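
To illustrate the off-topic remark, here is a minimal sketch of that kind of
version-dependent alias, written without importing six. The names text_type,
binary_type and is_text are stand-ins chosen for this example, not taken from
elbe:

```python
import sys

# Hypothetical stand-in for a six-style alias: the native text type of
# whichever interpreter is running this code.
if sys.version_info[0] >= 3:
    text_type = str          # python3: str is unicode text
    binary_type = bytes
else:
    text_type = unicode      # noqa: F821 (name exists on python2 only)
    binary_type = str        # python2: str is a byte string


def is_text(value):
    """Return True if value is text (not bytes) on this interpreter."""
    return isinstance(value, text_type)
```

The same source line therefore refers to a different concrete type depending
on which interpreter runs it, which is the situation to watch for when client
and server do not run the same python version.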

I don't understand the UnicodeEncodeError here. The string should be in
unicode since d75491903d8080c04c62196b7ac8534c6db50c61 that makes an explicit
decode("utf-8", "replace") on server side.

However u'\u0151' seems to be utf-16. So I would expect the decode("utf-8",
"replace") on the server side to fail.

Do you have a "quick" reproducer for the problem?
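
This is not a reproducer against elbe itself, only a minimal sketch of what
one might boil down to, assuming the failure comes from a unicode string being
encoded with the ascii codec (for example implicitly, when stdout has no UTF-8
encoding). The helper name encode_or_error is made up for illustration:

```python
def encode_or_error(text, codec):
    """Encode text with codec; return the bytes or the exception name."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return type(exc).__name__


char = u"\u0151"  # the character named in the traceback

# Encoding to ascii fails, encoding to utf-8 yields two bytes.
ascii_result = encode_or_error(char, "ascii")
utf8_result = encode_or_error(char, "utf-8")

# Decoding those bytes the way the server side does gives the character back.
roundtrip = utf8_result.decode("utf-8", "replace")
```

If that assumption holds, the decode on the server side is fine and the error
only appears once the resulting unicode string is pushed through an ascii
encoder on the client.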

More inline..

> Signed-off-by: Andreas Messerschmid <andreas.messerschmid at linutronix.de>
> ---
>  elbepack/soapclient.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/elbepack/soapclient.py b/elbepack/soapclient.py
> index 6ae1bd2..1de3e64 100644
> --- a/elbepack/soapclient.py
> +++ b/elbepack/soapclient.py
> @@ -614,7 +614,7 @@ class WaitProjectBusyAction(ClientAction):
>                          localtime = time.asctime(time.localtime(time.time()))
>                          try:
>                              print("%s -- %s" % (localtime,
> -                                                log[1].replace('\n','')))
> +                                                ''.join(log[1].splitlines()).encode('utf-8')))

''.join() seems to be a no-op to me:

>>> type(''.join("a"))
<type 'str'>
>>> type(''.join(u"a"))
<type 'unicode'>

Is it really needed?

print can handle unicode, so the encode('utf-8') is also not necessary,
I guess.

Is it possible that encodings get mixed because the replace function of the
unicode string is called with normal strings as parameters?

Then this
log[1].replace(u'\n', u'')
or that
log[1].striplines()
should do the job.
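
For comparison, a small sketch of the two ways of dropping line breaks that
are discussed here: the replace() call with unicode arguments and the
''.join(...splitlines()) form from the patch. The function names and the
sample string are invented for this example:

```python
def strip_newlines_replace(text):
    """Remove only '\\n' characters, using unicode arguments throughout."""
    return text.replace(u"\n", u"")


def strip_newlines_splitlines(text):
    """Remove every line boundary that splitlines() recognises."""
    return u"".join(text.splitlines())


# Hypothetical log line mixing '\r\n', '\r' and '\n' line endings.
sample = u"Adding \u0151\r\nnext\rlast\n"

replaced = strip_newlines_replace(sample)
joined = strip_newlines_splitlines(sample)
```

Both variants return a unicode string, but replace() leaves carriage returns
in place, while the splitlines() form removes them as well.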

>                          except IndexError:
>                              print("IndexError - part: %d (skipped)" % part)
>                      else:
> -- 
> 2.11.0