[Pdl-porters] PDL3 bitfield support

Discussion:

Chris Marshall

2013-12-13 19:08:26 UTC

To follow up on mraptor's desire for bitfield support as part
of the new PDL data type support. E.g.:

Usage: $bits = bit [ 1, 0, 0, 1, 1 ]
$bits = zeros bit, 1000, 1000;

Here are a few thoughts on the subject:

* Bit piddles can be more than an order of magnitude more
compact than corresponding integer types (8X to 64X)

* I suggest we go a bit more general :-) and allow arbitrary
length bitfield types: bit, quadbit, octbit, hexbit, ... which
ties in to some of the recent discussion on categorical
PDLs on the #pdl.

* These types of data are commonly referred to as SIMD
Within A Registar computation or SWAR:

http://en.wikipedia.org/wiki/SWAR

* The availability of bitfield data types would allow for very
compact storage and access of PDL logical mask
generation and computations. E.g., a bit2 data type
could support an enhanced type of bad value support:
good, bad, missing, approximate.

* We would have to construct the engines to manipulate
and use the bitfield data types but I expect it to be quite
general. For example, GPU instructions often have the
ability to process a vector of data under the condition of
a mask. This maps very well into the proposed new type.

* Part of making this efficient would be to allow for bitfield
data to be laid out in a computationally efficient format.
E.g., a 2D bit array might have a 64bit section of the
bitfield actually represent an 8x8 block of the data. That
would allow for efficient local computation with reduced
memory traffic. Algorithms to transpose/permute bitfield
data would be essential.

Could be some very cool stuff!

Thoughts?
Chris

Chris Marshall

2013-12-13 21:10:55 UTC

Permalink

Will things like sum, sumover work on them ? i.e. functions which do
something but return "non a-bit-field results"

Take a look at PDL::PP for how one can code PDL
operations according to user specified signatures.
The hope is that PDL3 implementation be upwardly
compatible with PDL-2.x functionality---maybe with
different underlying engine.

One other idea just come up to me... you could probably allow some
sort of mapping.. for example look at the Erlang/Elixir bitstring
syntax for a way to map structured data over bitfield's.
http://learnyousomeerlang.com/starting-out-for-real
scroll down to "BitSyntax!"
Let say you store TCP/IP headers in a bitfield ... you can impose a
slice-syntax definition, so you refer to them by name... forget my
x = pdl bit, 1000, 10000 ;#1000 cols
#imaginary TCP struct
struct = << flag1:1, flag2:3, flag:5, ip:32 >>
x->where(struct.flag == 0b101 ) #get me back all ip's which have flag2 set to 5
you get the idea...the bitfield is amorphos-container and you define
structure on top of it which is nothing more but glorified
slice-definitions..

Yes, you can already do this with PDL-2.x and I often use
this technique for PDL IO for binary data files. See PDL::Slices
for PDL-2.007 for the new slice() method syntax and features.

--Chris

-------| http://ifni.co

Post by Chris Marshall
To follow up on mraptor's desire for bitfield support as part
Usage: $bits = bit [ 1, 0, 0, 1, 1 ]
$bits = zeros bit, 1000, 1000;
* Bit piddles can be more than an order of magnitude more
compact than corresponding integer types (8X to 64X)
* I suggest we go a bit more general :-) and allow arbitrary
length bitfield types: bit, quadbit, octbit, hexbit, ... which
ties in to some of the recent discussion on categorical
PDLs on the #pdl.
* These types of data are commonly referred to as SIMD
http://en.wikipedia.org/wiki/SWAR
* The availability of bitfield data types would allow for very
compact storage and access of PDL logical mask
generation and computations. E.g., a bit2 data type
good, bad, missing, approximate.
* We would have to construct the engines to manipulate
and use the bitfield data types but I expect it to be quite
general. For example, GPU instructions often have the
ability to process a vector of data under the condition of
a mask. This maps very well into the proposed new type.
* Part of making this efficient would be to allow for bitfield
data to be laid out in a computationally efficient format.
E.g., a 2D bit array might have a 64bit section of the
bitfield actually represent an 8x8 block of the data. That
would allow for efficient local computation with reduced
memory traffic. Algorithms to transpose/permute bitfield
data would be essential.
Could be some very cool stuff!
Thoughts?
Chris

Craig DeForest

2013-12-17 18:58:45 UTC

Permalink

Bitfield support should be manageable (if slightly slow) within PP, since all access is supposed to be through the access macros anyway, so it's a SMOP to add mask-and-shift steps to those macros. The bigger issue is the places that work around the PP macros (e.g. range(), and unknown third-party apps). A little thought is going to be required for how to handle those cases.

Post by Chris Marshall

Will things like sum, sumover work on them ? i.e. functions which do
something but return "non a-bit-field results"

Yes, you can already do this with PDL-2.x and I often use
this technique for PDL IO for binary data files. See PDL::Slices
for PDL-2.007 for the new slice() method syntax and features.
--Chris

-------| http://ifni.co

_______________________________________________
PDL-porters mailing list
http://mailman.jach.hawaii.edu/mailman/listinfo/pdl-porters

Chris Marshall

2013-12-17 22:28:22 UTC

Permalink

On Tue, Dec 17, 2013 at 1:58 PM, Craig DeForest

Post by Craig DeForest
Bitfield support should be manageable (if slightly slow) within PP,
since all access is supposed to be through the access macros anyway,
so it's a SMOP to add mask-and-shift steps to those macros. The bigger
issue is the places that work around the PP macros (e.g. range(), and
unknown third-party apps). A little thought is going to be required for
how to handle those cases.

Is this the SMOP to which you refer? :-)
http://en.wikipedia.org/wiki/Small_matter_of_programming

At any rate, a brute force implementation could be done as
suggested by more extended processing to handle the
indexing, data retrieval and assignment, with possibly
expanding to an enclosing integer data type as the first
step.

However, my thought was that here was a place that the
PDL3 part comes in---the current implementation of PP
uses statically compiled routines. If we use Moo[se] as the
basis for PDL objects, it would be possible to override various
features and functionality on a per class or ever per object
basis. This could give a place to tie in hooks for JIT code
generation or perhaps work based out of the tcc
investigations of David.

In fact, that might be the best way for general type support
to be implemented so that we don't have a huge, monolithic
blob of every possible type loop when we only use a few.

--Chris

Post by Craig DeForest

Post by Chris Marshall

Will things like sum, sumover work on them ? i.e. functions which do
something but return "non a-bit-field results"

Yes, you can already do this with PDL-2.x and I often use
this technique for PDL IO for binary data files. See PDL::Slices
for PDL-2.007 for the new slice() method syntax and features.
--Chris

-------| http://ifni.co

_______________________________________________
PDL-porters mailing list
http://mailman.jach.hawaii.edu/mailman/listinfo/pdl-porters

mraptor

2013-12-13 20:16:27 UTC

Permalink

Kewl... :)
Will things like sum, sumover work on them ? i.e. functions which do
something but return "non a-bit-field results"

One other idea just come up to me... you could probably allow some
sort of mapping.. for example look at the Erlang/Elixir bitstring
syntax for a way to map structured data over bitfield's.

http://learnyousomeerlang.com/starting-out-for-real
scroll down to "BitSyntax!"

Let say you store TCP/IP headers in a bitfield ... you can impose a
slice-syntax definition, so you refer to them by name... forget my
pseudo code, but here is the idea :

x = pdl bit, 1000, 10000 ;#1000 cols
#imaginary TCP struct
struct = << flag1:1, flag2:3, flag:5, ip:32 >>

x->where(struct.flag == 0b101 ) #get me back all ip's which have flag2 set to 5

you get the idea...the bitfield is amorphos-container and you define
structure on top of it which is nothing more but glorified
slice-definitions..
-------| http://ifni.co