[Pdl-porters] 64bit index support fix and PDL3 types

Discussion:

Chris Marshall

2013-12-10 19:42:57 UTC

This is a quick summary of my planned fix for the
implicit double<->longlong conversion which is causing
problems with the new PDL_Indx type when it is a
64bit integer. I believe this fix would work as the
basis for general type support planned for PDL3.

NOTE: The new type support could probably be
shoehorned into our PDL-2.x development sequence
but to simplify development, a clean break with the
legacy kitchen-sink of code needs to be made.

As it is, there are large sections of the PDL-2.x
source that are not well known by any current PDL
developers. Cleaning that out should enable a more
robust core and stable base for the new functionality.

ASSUMPTIONS for PDL type support:
- based on C to avoid C++ compiler/link problems
- as simple as possible
- keep types modular and efficient
- data should be able to be processed at the C level

The current PDL-2.x types are based on single,
atomic types which are a subset of the standard C
data types *and* which can all be cast into a double
without loss of precision. As a result, the generic
PDL type is implicitly 'double', which is why there are
problems with 64bit integer index support: an IEEE
double has only a 52bit stored mantissa (53 bits of
precision), so it cannot hold every 64bit integer exactly.
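
To make the precision problem concrete, here is a minimal C sketch.
The function name survives_double_round_trip is made up for this
illustration and is not part of PDL. It casts a 64bit integer to a
double and back and reports whether the value came through unchanged.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Hypothetical helper, not a PDL function.
 * Returns 1 if v is unchanged after int64 -> double -> int64, else 0. */
int survives_double_round_trip(int64_t v)
{
    volatile double d;   /* volatile: force storage as a real 64bit double */

    assert(DBL_MANT_DIG == 53);   /* assumes IEEE 754 binary64 doubles */
    d = (double)v;

    /* Converting a double outside the int64_t range back to an integer
     * is undefined behaviour, so treat that case as a failed round trip. */
    if (d >= 9223372036854775808.0 || d < -9223372036854775808.0)
        return 0;
    return (int64_t)d == v;
}
```

Under these assumptions, 2^53 survives the round trip while 2^53 + 1
does not, which is the kind of silent change a 64bit PDL_Indx can hit
whenever it passes through the implicit double.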

THE PROPOSED FIX:

Replace the hidden, container type of 'double' by a
union type large enough to hold the set of desired
atomic types *and* a pointer. The idea is that the
atomic types could be passed by copy but the
pointer type would support arbitrarily complex types
at the expense of an additional indirection. Conveniently,
perl SVs are pointers (which give us perl references,
objects, code, ...).

This idea seems to work directly for PDL3 where the
atomic types would be all C integer types of 64bits or
less, all floating point types of 64bits or less, and the
corresponding complex float types with components of
64bits or less.

In addition to the union, we would need to pass around
information on what the type is. I'm not sure where the
best place to stash it is. It seems likely that keeping the
value per-piddle would be efficient and similar to what
we have now. If a type were needed per-element, then
that would involve using one of the pointer types, at
which point full generality is available.
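
One way to picture a per-piddle type tag is sketched below. The names
elem_tag, tagged_pdl and sum_as_int64 are hypothetical and chosen only
for this example; the real pdl struct is considerably larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; one tag describes every element of a piddle. */
typedef enum { TAG_INT32, TAG_INT64, TAG_DOUBLE } elem_tag;

typedef struct {
    elem_tag tag;     /* stored once per piddle, not per element */
    size_t   nvals;   /* number of elements in data */
    void    *data;    /* packed array of the tagged C type */
} tagged_pdl;

/* Sum the elements as int64_t. The tag is inspected once and each
 * branch loops over the native type, so 64bit integers never pass
 * through a double. Overflow of the total is not checked. */
int64_t sum_as_int64(const tagged_pdl *p)
{
    int64_t total = 0;
    size_t i;

    assert(p != NULL && (p->data != NULL || p->nvals == 0));
    switch (p->tag) {
    case TAG_INT32:
        for (i = 0; i < p->nvals; i++)
            total += ((const int32_t *)p->data)[i];
        break;
    case TAG_INT64:
        for (i = 0; i < p->nvals; i++)
            total += ((const int64_t *)p->data)[i];
        break;
    case TAG_DOUBLE:
        for (i = 0; i < p->nvals; i++)
            total += (int64_t)((const double *)p->data)[i];  /* truncates */
        break;
    }
    return total;
}
```

Because the tag lives in the piddle, the type check costs one switch
per operation rather than one per element, and the data stays packed.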

Thoughts and/or comments?
Chris

David Mertens

2013-12-10 20:52:02 UTC

On the surface, this seems like it is moving in the right direction. Do
you have any references in the code where I could look more closely into
the implementation mechanism? In particular, I wonder if it might be better
to represent a type by a vtable, which would be a pointer to a struct with
methods specifically meant to handle this kind of data. Different data
types could add support for themselves at runtime by simply building a new
vtable struct. (That said, a general type lookup table wouldn't be a bad
idea, and a general cleanup mechanism would also be a good idea.)
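
As a rough illustration of the vtable idea, a type could be described
by a struct of function pointers and registered in a small lookup
table at runtime. Everything named here (type_vtable, register_type,
lookup_type, int64_vtable, MAX_TYPES) is hypothetical and only meant to
make the suggestion concrete; it does not reflect existing PDL code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-type vtable: how to handle one element of a type. */
typedef struct type_vtable {
    const char *name;
    size_t      elem_size;
    void (*add)(void *dst, const void *a, const void *b); /* *dst = *a + *b */
    void (*destroy)(void *elem);  /* per-element cleanup, NULL if none */
} type_vtable;

/* int64 addition done natively, with no detour through double. */
static void int64_add(void *dst, const void *a, const void *b)
{
    int64_t x, y, r;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    r = x + y;
    memcpy(dst, &r, sizeof r);
}

const type_vtable int64_vtable = { "int64", sizeof(int64_t), int64_add, NULL };

/* A very small type lookup table. */
#define MAX_TYPES 16
static const type_vtable *registry[MAX_TYPES];
static int n_types = 0;

/* Register a type at runtime; returns its id, or -1 if the table is full. */
int register_type(const type_vtable *vt)
{
    assert(vt != NULL && vt->add != NULL);
    if (n_types >= MAX_TYPES)
        return -1;
    registry[n_types] = vt;
    return n_types++;
}

/* Look a type up by id; returns NULL for an unknown id. */
const type_vtable *lookup_type(int id)
{
    return (id >= 0 && id < n_types) ? registry[id] : NULL;
}
```

A new data type would add support for itself by filling in its own
type_vtable and calling register_type; generic code would then go
through lookup_type(id)->add instead of switching on a fixed list.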

As you can see, I'm speaking in very vague terms, so a more precise context
in the PDL codebase would be helpful. :-)

David

_______________________________________________
PDL-porters mailing list
http://mailman.jach.hawaii.edu/mailman/listinfo/pdl-porters

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan

Chris Marshall

2013-12-10 23:19:08 UTC

I was thinking the vtable part would be part of the piddle
structure in an effort to keep the implementation simple.
Clearly, it would be possible to put it in the type instead,
and then changes to the piddle would occur by linking
into, before, after, or around the existing vtable.

--Chris

Chris Marshall

2013-12-15 16:36:07 UTC

The idea of a union type would be ok for a truly generic
datatype for piddles, but at the expense of memory use
and some complexity.

It seems that the most efficient approach would be to
have the data separate from the type information which
would be used to determine how and what processing
occurs.

The union type could be used for efficient but truly
generic type support at the element level (can you say
arbitrary element piddles?). However, most processing
would use the type information (vtable or however
specified) to determine the processing on the data.

The union type would be present in the pdl struct so
that there is a uniform way to access data if needed.
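
A minimal sketch of how packed data, separate type information, and a
union for uniform access might fit together. The names kind, anyval,
sketch_pdl and sketch_get are hypothetical. As a simplification, this
sketch exposes the union through an accessor function rather than
storing it as a member of the struct, which is only one possible
reading of the paragraph above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type information, kept apart from the data itself. */
typedef enum { K_INT64, K_DOUBLE, K_PTR } kind;

/* Union used only when a uniform view of one element is needed. */
typedef union {
    int64_t q;
    double  d;
    void   *p;
} anyval;

typedef struct {
    kind   type;    /* decides how data is interpreted and processed */
    size_t nvals;   /* number of elements */
    void  *data;    /* packed elements, no per-element union overhead */
} sketch_pdl;

/* Uniform accessor: copy element i into the union. The caller checks
 * pdl->type to know which union member is valid. */
anyval sketch_get(const sketch_pdl *pdl, size_t i)
{
    anyval v;

    assert(pdl != NULL && i < pdl->nvals);
    switch (pdl->type) {
    case K_INT64:
        v.q = ((const int64_t *)pdl->data)[i];
        break;
    case K_DOUBLE:
        v.d = ((const double *)pdl->data)[i];
        break;
    default: /* K_PTR */
        v.p = ((void * const *)pdl->data)[i];
        break;
    }
    return v;
}
```

Bulk processing would still loop over the packed data according to the
type information; the union only appears at the edges, so its memory
cost is paid per access rather than per stored element.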

--Chris
