I recently started working on a managed wrapper for the Terminal Services API, and as my C++/CLI is a bit rusty I ran into some issues that I'm sure are common when handling the impedance mismatch between the managed and unmanaged worlds.
I'm going to take a look at one of those issues here: using System::String with native functions. The Win32 API is one such body of native functions, and its functions fairly consistently take LPTSTR (or LPCTSTR) parameters for strings. LPTSTR is a typedef for TCHAR*, and TCHAR in turn is a typedef for wchar_t in Unicode builds and char otherwise.
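To make that typedef chain concrete, here is a minimal portable sketch of how a TCHAR-style typedef picks its character type at compile time. The names demo_tchar, demo_lptstr, demo_lpctstr, DEMO_TEXT, DEMO_UNICODE and demo_length are hypothetical stand-ins I made up for illustration; the real definitions live in the Windows headers and are driven by the UNICODE and _UNICODE macros.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for TCHAR and the TEXT() macro. The real ones come
// from the Windows headers and are selected by UNICODE/_UNICODE.
#ifdef DEMO_UNICODE
typedef wchar_t demo_tchar;
#define DEMO_TEXT(s) L##s
#else
typedef char demo_tchar;
#define DEMO_TEXT(s) s
#endif

typedef demo_tchar*       demo_lptstr;   // plays the role of LPTSTR
typedef const demo_tchar* demo_lpctstr;  // plays the role of LPCTSTR

// Compiles unchanged in either build because it only ever names demo_tchar.
inline std::size_t demo_length(demo_lpctstr s) {
    assert(s != 0);
    return std::char_traits<demo_tchar>::length(s);
}
```

Calling demo_length(DEMO_TEXT("Hello")) compiles in both configurations; only the character width of the literal and the parameter changes.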
Toll-free bridge with CString
One easy and automatic way to do this conversion is to bridge it through CString, a type that is part of Microsoft's ATL. I believe it was first part of MFC, but it has since been divorced from depending on the rest of MFC, and even from ATL proper and its server classes; include atlstr.h instead of cstringt.h.
The character type of CString is internally based on TCHAR as well, so depending on whether you are doing a Unicode build, the internal representation of the CString will be either wchar_t or char. The CLI has no such type differentiation; String is always a wide character string. This means that Unicode builds need no conversion but non-Unicode builds do, and CString takes care of this for us through its helpful conversion constructor that accepts a System::String^.
    template <class SystemString>
    CStringT( SystemString^ pString ) :
        CThisSimpleString( StringTraits::GetDefaultManager() )
    {
        cli::pin_ptr<const System::Char> pChar = PtrToStringChars( pString );
        const wchar_t *psz = pChar;
        *this = psz;
    }
PtrToStringChars retrieves a pointer to the String's internal memory buffer; no copy is made here. The pointer returned is then pinned so that the garbage collector will not move the buffer while we use it. The constructor then uses an implicit conversion to go from pin_ptr<const System::Char> to const wchar_t*. Finally it uses the copy assignment operator of CString (which delegates to a base class operator) to copy the contents of the String buffer into itself. In a Unicode build this copy assignment operator decays to a basic memory copy; otherwise a different copy assignment operator is invoked that uses WideCharToMultiByte to convert to the CString's internal char type.
    // This gets called when the right-hand-side pszSrc is the same char type as the internal storage
    // It simply delegates to its base class to copy the source buffer into itself
    CStringT& operator=( _In_opt_z_ PCXSTR pszSrc )
    {
        CThisSimpleString::operator=( pszSrc );
        return( *this );
    }

    // This gets called when the right-hand-side pszSrc has a different char type than the internal storage
    // It converts to the internal storage char type directly into its internal buffer
    CStringT& operator=( _In_opt_z_ PCYSTR pszSrc )
    {
        // nDestLength is in XCHARs
        int nDestLength = (pszSrc != NULL) ? StringTraits::GetBaseTypeLength( pszSrc ) : 0;
        if( nDestLength > 0 )
        {
            PXSTR pszBuffer = GetBuffer( nDestLength );
            StringTraits::ConvertToBaseType( pszBuffer, nDestLength, pszSrc);
            ReleaseBufferSetLength( nDestLength );
        }
        else
        {
            Empty();
        }
        return( *this );
    }
    static void ConvertToBaseType(_Out_cap_(nDestLength) _CharType* pszDest, _In_ int nDestLength,
        _In_count_(nSrcLength) const wchar_t* pszSrc, _In_ int nSrcLength = -1) throw()
    {
        // nLen is in XCHARs
        ::WideCharToMultiByte(_AtlGetConversionACP(), 0, pszSrc, nSrcLength, pszDest, nDestLength, NULL, NULL);
    }
Now we can use the CString in place of an LPCTSTR parameter because it has a user-defined conversion operator that simply returns the guts of the CString. This automatically does the right thing when calling functions in both Unicode and non-Unicode builds, all with no #ifdefs.
    operator PCXSTR() const throw()
    {
        return( m_pszData );
    }
PCXSTR is a chain of typedefs that eventually, through TCHAR, finds its way to a const wchar_t* in a Unicode build. PCXSTR means "pointer to a const null-terminated string of the same char type I am"; CString also has PXSTR and XCHAR typedefs. In addition to the X typedefs it has a set of Y typedefs (PCYSTR, PYSTR, YCHAR) that map to the opposite of the X typedefs: if X is wchar_t then Y is char. It uses the Y typedefs to create a number of copy assignment operators and conversion constructors that convert from either character type to the internal type.
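The X/Y scheme is easier to see in a toy version. The following is only a sketch of the idea, not ATL's actual code: other_char and demo_string are hypothetical names of my own, and the per-character cast stands in for a real conversion routine (it is only correct for ASCII).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical trait mapping each character type to "the other one".
template<class Ch> struct other_char;
template<> struct other_char<char>    { typedef wchar_t type; };
template<> struct other_char<wchar_t> { typedef char    type; };

// Hypothetical toy string, loosely modeled on the typedef scheme described above.
template<class Ch>
class demo_string {
public:
    typedef Ch                            XCHAR;   // the storage type
    typedef typename other_char<Ch>::type YCHAR;   // the opposite type
    typedef const XCHAR*                  PCXSTR;
    typedef const YCHAR*                  PCYSTR;

    demo_string() : converted_(false) {}

    // Same character type as the storage: a plain copy.
    demo_string& operator=(PCXSTR src) {
        data_.clear();
        if (src) data_ = src;
        converted_ = false;
        return *this;
    }

    // Opposite character type: convert on the way in. The cast below is a
    // placeholder that is only correct for ASCII input.
    demo_string& operator=(PCYSTR src) {
        data_.clear();
        for (; src && *src; ++src) data_.push_back(static_cast<XCHAR>(*src));
        converted_ = true;
        return *this;
    }

    // Hands the internal buffer to code expecting a native string pointer.
    operator PCXSTR() const { return data_.c_str(); }

    bool converted() const { return converted_; }
    std::size_t length() const { return data_.size(); }

private:
    std::basic_string<XCHAR> data_;
    bool converted_;
};
```

Assigning "abc" to a demo_string<char> picks the PCXSTR overload, while assigning L"abc" picks the PCYSTR overload; overload resolution does the dispatch, so the caller never writes an #ifdef.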
Conversion to Multibyte UTF8
There is one problem that this doesn't cover, however: how to convert your Unicode String into a const char* for use as a parameter to functions that don't have Unicode counterparts. Stan Lippman has a blog post from back in 2004 where he presents a couple of functions to handle this conversion to char* and to std::string. We'll see some similarities with the CString constructors.
    bool To_CharStar( String^ source, char*& target )
    {
        pin_ptr<const wchar_t> wch = PtrToStringChars( source );
        int len = (( source->Length+1) * 2);
        target = new char[ len ];
        return wcstombs( target, wch, len ) != -1;
    }

    bool To_string( String^ source, string &target )
    {
        pin_ptr<const wchar_t> wch = PtrToStringChars( source );
        int len = (( source->Length+1) * 2);
        char *ch = new char[ len ];
        bool result = wcstombs( ch, wch, len ) != -1;
        target = ch;
        delete ch;
        return result;
    }
There are a couple of problems with these functions. First, they are not reentrant and hence not thread safe, because wcstombs keeps a global internal state during the conversion of a string. Second, the buffer size assumes two bytes per character; that matches the UTF-16 that String always uses, but it over-allocates for simple single-byte character sets like ASCII. Third, they create an unnecessary temporary buffer in the case of the std::string converter. And in the case of the char* converter it puts the onus of freeing the buffer on the caller, which is doubly dangerous here: because the function uses new[] to allocate the buffer, the caller needs to know to call delete[]. (To_string itself gets this wrong by calling delete rather than delete[] on its temporary.)
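As a rough illustration of how the first two points can be addressed with nothing but the standard library, here is a minimal sketch of a reentrant conversion that sizes its buffer exactly. It uses the standard wcsrtombs with a caller-owned mbstate_t instead of wcstombs. The helper name wide_to_multibyte is my own invention for this example, and it converts to whatever multibyte encoding the current C locale selects, so treat the target encoding as an assumption to verify.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper (not from Lippman's post): converts a null-terminated
// wide string into the current C locale's multibyte encoding.
// Returns false if some character cannot be represented in that encoding.
inline bool wide_to_multibyte(const wchar_t* source, std::string& target) {
    assert(source != 0);
    const std::size_t failed = static_cast<std::size_t>(-1);

    // Pass 1: a null destination makes wcsrtombs only count the bytes needed
    // (excluding the terminator). The conversion state is a local object,
    // not hidden global state.
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* cursor = source;
    const std::size_t needed = std::wcsrtombs(0, &cursor, 0, &state);
    if (needed == failed) return false;

    // Pass 2: convert straight into a string of exactly the right size.
    std::string converted(needed, '\0');
    if (needed > 0) {
        state = std::mbstate_t();
        cursor = source;
        if (std::wcsrtombs(&converted[0], &cursor, needed, &state) == failed)
            return false;
    }
    target.swap(converted);  // target is only touched on success
    return true;
}
```

The two-pass shape trades a second walk over the input for an exact allocation, and the std::string owns the memory, so the caller has nothing to free.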
Let's take a whack at implementing these functions in the way of the C++ standard library while solving these deficiencies (thanks to Kniht on freenode ##C++ for helping me distill this).
    #include <limits.h>
    #include <wchar.h>
    #include <algorithm>
    #include <iterator>
    #include <stdexcept>
    #include <string>

    struct ConversionError : std::runtime_error {
        ConversionError()
            : std::runtime_error("ConversionError")
        {}
        explicit
        ConversionError(std::string const& what)
            : std::runtime_error("ConversionError: " + what)
        {}
    protected:
        struct NoPrefix {};
        ConversionError(NoPrefix, std::string const& what)
            : std::runtime_error(what)
        {}
    };

    // Output iterator adaptor: each wchar_t assigned through it is converted
    // to multibyte and forwarded to the wrapped output iterator.
    template<class OutIter>
    struct mboutput_t : std::iterator<std::output_iterator_tag,void,void,void,void> {
        mboutput_t(OutIter out) : _mbstate(), _out(out) {}
        mboutput_t& operator++() { return *this; }
        mboutput_t& operator++(int) { return *this; }
        mboutput_t& operator* () { return *this; }
        void operator=(wchar_t wc) {
            char buf[MB_LEN_MAX];
            // wcrtomb returns a size_t; (size_t)-1 signals an unconvertible character
            size_t len = ::wcrtomb(buf, wc, &_mbstate);
            if (len == static_cast<size_t>(-1)) {
                throw ConversionError("wcrtomb");
            }
            _out = std::copy(buf, buf + len, _out);
        }
        mbstate_t _mbstate;  // per-iterator conversion state, so no globals are involved
        OutIter _out;
    };

    template<class OutIter>
    mboutput_t<OutIter> mboutput(OutIter out) {
        return mboutput_t<OutIter>(out);
    }

    template<class Cont>
    mboutput_t<std::back_insert_iterator<Cont> > mb_back_inserter(Cont& c) {
        return mboutput(std::back_inserter(c));
    }

    // Appends the multibyte form of s to c; the terminating null is not copied.
    template<class Cont>
    void wcs_to_mb(Cont& c, wchar_t const* s) {
        if (s) {
            std::copy(s, s + wcslen(s), mb_back_inserter(c));
        }
    }
This code uses only the C and C++ standard libraries, so it should work on any standards-compliant compiler and OS. In our examples, however, our source is a System::String and there are two targets we're interested in, std::string and char*. We can reach each of these targets easily using std::copy with our custom mb_back_inserter.
    #include <vcclr.h>    // PtrToStringChars
    #include <iostream>
    #include <string>
    #include <vector>

    template<class Cont>
    void String_to_mb(Cont& c, System::String^ source) {
        pin_ptr<const wchar_t> wch = ::PtrToStringChars( source );
        wcs_to_mb(c, wch);
    }

    void ctest(const char* str) {
        std::cout << str << std::endl;
    }

    void stest(const std::string& str) {
        std::cout << str << std::endl;
    }

    void convtest() {
        String^ sstr = L"Hello!";

        std::vector<char> charStar;
        String_to_mb(charStar, sstr);
        // wcs_to_mb does not copy the terminating null, so add one before
        // treating the vector's contents as a C string
        charStar.push_back('\0');
        // vector<char> can be used as a char* by passing &charStar[0] to a function taking char*
        ctest(&charStar[0]);

        std::string s;
        String_to_mb(s, sstr);
        // the std::string can be used as-is or it can also produce a char* by calling string::c_str()
        stest(s);
        ctest(s.c_str());
        // the vector and string are automatically deallocated leaving this scope
    }
By making effective use of containers, iterators, and wcrtomb(...), we've created a solution that is reentrant and doesn't require the caller to deallocate anything.