Tuesday, October 24, 2006

Value-type casts of the new age

Easy and sensible type conversions of streamable object values / states


Last weekend I was trying to write a small shim class to convert between the bool and std::string data types. I wanted the conversions to also handle English words like "true", "yes" or "NO" intelligently and convert them to boolean values.


I ended up writing a small set of classes which handle conversions between any two streamable types (in other words, any type which can be inserted into an ostream object like cout and extracted from an istream object like cin). This is not purely original work: it is influenced by the semantics of boost::lexical_cast. It also uses a case-insensitive std::basic_string construct described by Herb Sutter in one of his GotW articles.
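To make "streamable" concrete, here is a minimal sketch of the round trip through a stringstream. The Point type and the via_stream helper are hypothetical names made up for this illustration; they are not part of conversion.hpp, and the real code below is more careful about strings, bools and pointers.

```cpp
#include <iostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical user-defined type, used only for illustration.
struct Point {
    int x;
    int y;
};

// Insertion into an ostream: writes "x y".
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << p.x << ' ' << p.y;
}

// Extraction from an istream: reads two whitespace-separated ints.
std::istream& operator>>(std::istream& is, Point& p) {
    return is >> p.x >> p.y;
}

// Hypothetical helper: writes the source value into a stringstream
// and reads the target type back out of it.
template <typename To, typename From>
To via_stream(const From& from) {
    std::stringstream strm;
    strm << from;
    To to = To();
    if (!(strm >> to)) {
        throw std::runtime_error("conversion failed");
    }
    return to;
}
```

Any type with both operators can sit on either side of the conversion, so via_stream<Point>(std::string("3 4")) and via_stream<std::string>(42) both go through the same few lines.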


Sometimes I just want a lexical_cast facility without pulling in a whole new library for it (I love boost, but if lexical_cast is the only thing I need from its offerings, I'd rather roll my own; call it egotism if you will).


I have disallowed conversions to pointer types. I reckon the only conversion to a pointer type you will ever really need is to char*, and you can manage that by converting to std::string and then calling the c_str() accessor member function. If you want to convert from long to some pointer type, use the casts provided by the language (static_cast / reinterpret_cast).
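The std::string route to a char pointer looks roughly like the sketch below. Here to_text is a hypothetical stand-in for lexical_cast<std::string>, and text_length is a made-up function that hands the pointer to a C API (strlen); neither name comes from conversion.hpp.

```cpp
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical stand-in for lexical_cast<std::string>.
template <typename T>
std::string to_text(const T& val) {
    std::ostringstream strm;
    strm << val;
    return strm.str();
}

// Hypothetical example of passing the converted value to a C function
// that expects a const char*.
std::size_t text_length(long value) {
    // Keep the std::string in a named variable: the pointer returned by
    // c_str() is valid only while the string exists and is not modified.
    const std::string text = to_text(value);
    const char* raw = text.c_str();
    return std::strlen(raw);
}
```

The point of going through a named std::string is lifetime: the string owns the buffer, so nothing ever has to decide who frees a char* returned from a conversion.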


I haven't looked at boost::lexical_cast's code and I am certain it will be tighter than mine. That notwithstanding, my code is listed below.





/* ---- this file is called conversion.hpp ---- */
#ifndef __CONVERSION_HPP__
#define __CONVERSION_HPP__

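#include <cctype>    // toupper
#include <cstddef>   // size_t
#include <exception> // std::exception
#include <sstream>
#include <string>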

namespace MYUTILS {

// ---- Case-insensitive character string support ----
// *** WARNING - works only in en locales - ***

struct char_ci_traits : public std::char_traits<char>
{
static bool eq ( char c1, char c2 )
{
return ( toupper(c1) == toupper(c2) );
}


static bool ne ( char c1, char c2 )
{
return ( toupper(c1) != toupper(c2) );
}


static bool lt ( char c1, char c2 )
{
return ( toupper(c1) < toupper(c2) );
}


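static int compare ( const char * s1,
const char * s2,
size_t n )
{
// Compare at most n characters, ignoring case; never read past n
for ( ; n > 0; --n, ++s1, ++s2 ) {
if ( toupper(*s1) != toupper(*s2) )
return ( toupper(*s1) < toupper(*s2) ) ? -1 : 1;
}
return 0;
}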


static const char * find ( const char *s1, size_t n, char a )
{
while ( n-- > 0 ) {
if ( toupper(*s1) == toupper(a) )
return s1;
s1++;
}
return 0; // not found within the first n characters
}
};


typedef std::basic_string<char, char_ci_traits> ci_string;




class bad_lexical_cast : public std::exception
{
public:
bad_lexical_cast( const std::string& error ) : std::exception(), msg_(error)
{
std::ostringstream strm;
strm << "Could not convert [ "
<< error.c_str()
<< " ] using lexical_cast";


msg_ = strm.str();
}


virtual ~bad_lexical_cast() throw()
{}


const char* what() const throw()
{
return msg_.c_str();
}


protected:
std::string msg_;
};


// ---- functor lexical cast ----
// converts between any two streamable types
// 1. For conversion to non-strings

template <typename U>
struct __inr_lexical_cast
{
template <typename T>
U operator () ( T val ) throw(bad_lexical_cast)
{
std::stringstream strm;
strm << val;
U retval;
strm >> retval;


if ( strm.bad() || strm.fail() ) {
throw bad_lexical_cast( strm.str() );
}


return retval;
}


// Optimization - type to same-type conversion
// No conversion is required - just return what was passed in

U operator () ( U val )
{
return val;
}
};


// 2. Second overload is for conversion to strings
// No conversion to "const char*" is provided specifically for this purpose

template <>
struct __inr_lexical_cast<std::string>
{
template <typename T>
std::string operator () ( T val )
{
std::stringstream strm;
strm << val;


return strm.str();
}


std::string operator () ( std::string val )
{
return val;
}
};



// 3. Some specialized string-to-bool conversions
// true, yes, y become 1 and false, no, n become 0
// - case insensitive so true=TRUE=TrUe, etc.

template <>
struct __inr_lexical_cast<bool>
{
template <typename T>
bool operator () ( T val ) throw(bad_lexical_cast)
{
std::stringstream strm;
strm << val;


return operator()( strm.str() );
}


bool operator () ( const std::string& str ) throw(bad_lexical_cast)
{
try{
int val = __inr_lexical_cast<int>()( str );
return (val)?true:false;
}
catch(...){}


ci_string istr = str.c_str();


if ( istr == "yes" || istr == "y" || istr == "true" )
return true;
else
if ( istr == "no" || istr == "n" || istr == "false" )
return false;
else
throw bad_lexical_cast(str);
}
};




// 4. Conversions to pointer-types are not supported
// To get char*, use lexical_cast<std::string> and use .c_str()
// of string

template <typename U>
struct __inr_lexical_cast <U*>
{
// No operator () provided
// Something like lexical_cast<char*>(my_obj) won't compile
};




// ---- Finally the template function front-end to the above functors ----
// - use this as:
// U Uval = lexical_cast<U>(Tval);
// - this syntax is more natural but slightly less-efficient

template <typename U, typename T>
U lexical_cast ( T val ) throw(bad_lexical_cast)
{
U retval = __inr_lexical_cast<U>()(val);

return retval;
}




} /*MYUTILS*/
#endif



The above code compiles on gcc 2.95 and above. It does not compile on MSVC 6.0, and I have yet to test it on a later compiler (I have MSVC 7 and I guess it will compile on it).

Here is some stub code to test the above utilities:

// ---- this file is called lexical_cast_test.cpp -----

// lexical_cast_stub :
//


#include <iostream>
#include "conversion.hpp"

int main(int argc, char* argv[])
{
try {
std::string val = "52";
int num = MYUTILS::lexical_cast<int>(val); // expect num to contain 52

val = "TruE";
bool b = MYUTILS::lexical_cast<bool>(val); // expect b to contain true

val = "I am not a number";
num = MYUTILS::lexical_cast<int>(val); // this should throw bad_lexical_cast
}
catch ( std::exception& e )
{
std::cout << e.what ( ) << std::endl;
}

return 0;
}



This form of usage obviates the need for an elaborate set of conversion functions like atoi, which cannot report a failed conversion (and can crash if handed a null pointer), or the [v]s*printf family of functions, whose use is clumsy.
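The difference in error reporting can be sketched without the header above. The parse_int helper below is a hypothetical name for this illustration only; it uses the same stream extraction that lexical_cast relies on, but reports failure through its return value instead of an exception.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: returns true and stores the result only if the
// text begins with a valid integer; leaves out untouched on failure.
bool parse_int(const std::string& text, int& out) {
    std::istringstream strm(text);
    int value = 0;
    if (!(strm >> value)) {
        return false; // the stream's fail state tells us nothing was parsed
    }
    out = value;
    return true;
}

// For contrast: atoi returns 0 when no conversion can be performed, so
// the caller cannot tell "0" apart from "I am not a number".
int parse_with_atoi(const std::string& text) {
    return std::atoi(text.c_str());
}
```

Note that, like the stream extraction in the listing above, this sketch accepts input that merely starts with a number (for example "52abc"); rejecting trailing characters would need an extra check.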

3 Comments:

Anonymous Anonymous said...

>#ifndef __CONVERSION_HPP__
>#define __CONVERSION_HPP__

Names starting with _ and a capital letter are reserved for the implementation. So, better not to use them, for portability.

4:50 PM  
Blogger Arindam Mukherjee said...

This comment has been removed by the author.

5:40 PM  
Blogger Arindam Mukherjee said...

Not entirely true. Upper case *is* the recommended way for third-party libraries. However, it's true that you shouldn't start with an underscore.

9:32 PM  
