Richard Balint
2014-08-20 03:57:16 UTC
Hello! I'm Richard, and back in 2008, I did a lot of updates for
PDL::IO::HDF5 when I was with Ball Aerospace. Well, I've finally gotten
around to trying to get them properly upstream. :)
I used to be in contact with John Cerney back then, and he did move some of
the patches into the tree, but there are lots more:
* Multi-dimension attributes
* Attributes of varying datatypes
* Compound data types
* HDF5 constants are dynamically computed at compile time
* Split hdf5.pd into an XS (hdf5.pd) and a perl/POD module (HDF5.template.pm)
for less hair pulling
* Additional test cases
* Added ->close() methods
* Group.pm/Dataset.pm use inherited class routines when possible for code
consolidation (mostly attribute stuff)
* getGroupsByAttr() now uses regex
* Added pthread compilation option to Makefile.PL
At least that's what my old release notes tell me. ;)
What I wanted to do was raise awareness that I would like to submit a merge
request for some or all of my code upstream, if possible. But first I have a
few questions on interfaces, style, etc.
* Can we use Perl v5.10? :) The consensus on this list seems to be that PDL
is going to v5.10 in the next few iterations. Can PDL::IO::HDF5 move this
way as well?
* The current attribute interface mostly assumes that attributes are text,
and tries to send back Perl text if at all possible. Wouldn't it be more
standard to always return a PDL, rather than a Perl variable? I know that
John mentioned that this may break some older code, so as a mitigation,
maybe we could have some flags on HDF5 open to force conversion of text PDLs
to Perl text scalars? Thoughts on this? This is probably my biggest concern.
Having users pull attributes and then test whether each one is a PDL or a
scalar seems less friendly than always returning PDL text values. :/
* When was the groupIndex cache added? Is it heavily used? If so, I
would need to rework some code to put it back in, due to some slight
behind-the-scenes changes. (The fileObj requirement for groups and datasets
just seems very ugly to me... there's got to be a better way to implement
the caching... but I have no idea how to do it yet.)
* Tests are in a random order. Would it make sense to have them run in a
more logical order, like file, group, dataset, attributes, etc., so that if
there's a problem in, say, group, testing fails sooner and doesn't go on to
test the stuff that depends on it?
* Does anyone know if users are still using HDF5 1.4.x? [2007] The code
would be a lot cleaner if we were able to just support HDF5 1.6.x and
1.8.x, which are very similar.
If anyone is interested in perusing my changes so far (I only have a few
test files left to get working again; everything up through attributes works
now), my fork is at:
https://sourceforge.net/u/hackswell/pdl/ci/reorganization/tree/
(which is following the official master.)
Responses, thought-out criticism, ideas, peer review, etc. are all welcome!
-Ricardo!