Sending Unicode through Windows Pipes

Passing Unicode data through Windows pipes sometimes requires special handling to keep the pipes from scrambling the data. We can see the same symptom by opening the cmd prompt and navigating to a path that contains Unicode characters. The Unicode characters cannot be typed or rendered on screen (they show as question marks instead), though we can still use tab completion to enter the path.

If we pass data from one program to another through a pipe set up by a CreateProcess call, we can read the data off the pipe with ReadFile calls. However, valid Unicode written to the pipe can come out the other side as the same question marks we saw in the cmd prompt.

The printf family of functions formats printed text according to the code page of the stream, which for stdout and stderr is set by the locale. Changing the locale to one whose code page can represent the characters allows the Unicode text to pass through the pipe, though it will arrive multi-byte encoded in that code page rather than as wide characters.

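/* Default "C" locale: the characters cannot be converted, so they are lost. */
fwprintf(stderr, L"%ls\n", L"(暖风器系统图)");
> (??????)
/* Code page 936 is Simplified Chinese (GBK). */
setlocale(LC_ALL, ".936");
fwprintf(stderr, L"%ls\n", L"(暖风器系统图)");
> (暖风器系统图)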
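
The locale dependence can also be observed directly, without a pipe, by converting a wide string to bytes with the standard wcstombs function, which uses the current locale's encoding. The following is only a rough sketch and not necessarily what the C runtime does internally; wide_to_locale_bytes is a hypothetical helper name that does not come from any library.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper (not part of any library): converts a wide string to
 * multi-byte text using the current locale's encoding.
 * Returns the number of bytes written (excluding the terminator), or
 * (size_t)-1 if a character has no representation in that encoding. */
size_t wide_to_locale_bytes(const wchar_t *text, char *out, size_t out_size)
{
    if (out_size == 0) {
        return (size_t)-1;
    }
    /* Reserve one byte so the result can always be terminated. */
    size_t written = wcstombs(out, text, out_size - 1);
    if (written == (size_t)-1) {
        out[0] = '\0';       /* conversion failed: leave an empty string */
        return (size_t)-1;
    }
    out[written] = '\0';     /* wcstombs does not terminate a full buffer */
    return written;
}
```

In the default "C" locale this helper should fail for the Chinese text above, while after a successful setlocale(LC_ALL, ".936") on Windows it should produce the multi-byte bytes that would travel through the pipe.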

Windows doesn't always support POSIX-style locale names, so you can check which locale was actually set by reading the return value of setlocale when you set the locale, or by passing NULL as the locale name to query the current locale.

fprintf(stderr, "Locale: %s\n", setlocale(LC_ALL, ".936"));
fprintf(stderr, "Locale: %s\n", setlocale(LC_ALL, NULL));
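
Since setlocale returns NULL when it rejects a locale name, it may be safer to test the result before printing it. A minimal sketch, assuming a hypothetical helper named try_set_locale (not a library function):

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper (not part of any library): tries to switch the whole
 * program to the named locale and reports the outcome on stderr.
 * Returns 1 on success and 0 if the runtime rejected the name. */
int try_set_locale(const char *name)
{
    const char *applied = setlocale(LC_ALL, name);
    if (applied == NULL) {
        /* On failure setlocale returns NULL and the locale is unchanged. */
        fprintf(stderr, "Locale \"%s\" not available; still using %s\n",
                name, setlocale(LC_ALL, NULL));
        return 0;
    }
    fprintf(stderr, "Locale: %s\n", applied);
    return 1;
}
```

Calling try_set_locale(".936") before writing to the pipe would then show whether the code page was accepted, and the program can fall back to another approach if it was not.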
