Discussion:
[Pdl-porters] Patch for PDL::FFTW3 plans to support alignment
Chris Marshall
2014-01-19 19:35:36 UTC
Permalink
Hi Dima-

I couldn't figure out the workflow to issuing a pull request at
github.com so here is a patch generated to add support for arbitrary
alignment for the input to PDL::FFTW3 routines. The new
implementation appears to work but the tests for whether or not a new
plan was made now fail due to the modified logic for plan generation.
I've added a routine to PDL::FFTW3 called get_data_alignment() that
could be used to generalize the tests.

After working through a couple of iterations on this code, I can see
why you weren't a fan of handling the alignment stuff directly. With
this fix, we should be able to support existing PDL versions until we
eventually get to the PDL3 work. I'm still trying to wrap up the
final fixes for the 64bit index support for PDL-2.008. We definitely
need to add better support for alignment issues in PDL3.

Regards,
Chris
Chris Marshall
2014-01-19 23:35:31 UTC
Permalink
Hi Dima-

I think I got the pull-request stuff to work. In addition to the
tests needing to be updated, I couldn't figure out how to get
the PDL::FFTW3::get_data_alignment to be exported to
PDL as a method. For PDL3, this sort of functionality should
be in the core support. Having it here is to enable PDL::FFTW3
to work with older PDL versions.

--Chris
Post by Chris Marshall
Hi Dima-
I couldn't figure out the workflow to issuing a pull request at
github.com so here is a patch generated to add support for arbitrary
alignment for the input to PDL::FFTW3 routines. The new
implementation appears to work but the tests for whether or not a new
plan was made now fail due to the modified logic for plan generation.
I've added a routine to PDL::FFTW3 called get_data_alignment() that
could be used to generalize the tests.
After working through a couple of iterations on this code, I can see
why you weren't a fan of handling the alignment stuff directly. With
this fix, we should be able to support existing PDL versions until we
eventually get to the PDL3 work. I'm still trying to wrap up the
final fixes for the 64bit index support for PDL-2.008. We definitely
need to add better support for alignment issues in PDL3.
Regards,
Chris
Dima Kogan
2014-01-19 23:37:47 UTC
Permalink
Hi.

I see the patch, and I'll look at it tonight.

thanks
Dima Kogan
2014-01-20 10:34:55 UTC
Permalink
Post by Chris Marshall
After working through a couple of iterations on this code, I can see
why you weren't a fan of handling the alignment stuff directly. With
this fix, we should be able to support existing PDL versions until we
eventually get to the PDL3 work. I'm still trying to wrap up the
final fixes for the 64bit index support for PDL-2.008. We definitely
need to add better support for alignment issues in PDL3.
Hi.

OK. Wow, that was a pain. Thank you very much for the patch, Chris. It
was exactly what I was planning on doing, once I'd get around to it. It
turns out that more was needed, however. Without more stuff, lots of
tests still fail on x86. I resolved most of the issues, and things look
much better now, although still not perfect. Some questionable
"features" of the current system:

- I have a mostly-finished-but-not-yet-complete set of tests to exercise
various alignments. These aren't done, and appear in a disabled state
at the end of the .t file. They are done enough to discover that for
particularly poorly-aligned data the FFTW3 planner crashes. On amd64,
with double-precisiou data this happens with alignment 2 or worse.
Because of this I'm wary of turning on these tests, since I don't want
the test suite to crash on some other platform. I'm relatively
comfortable that this isn't something you'll ever see when actually
using the module, since you get memory from malloc() ultimately, and
it's never going to give you anything that badly aligned. So I'm
tempted to just leave the tests disabled; don't know.

- The test suite was checking for plan creations, since only the input
size/type was dictating when new plans would be made. With variable
alignment, there are other factors, so those checks are now wrong much
of the time. I disabled them.

- Current architecture supports running multiple FFTs in parallel with
PDL threading. I however create a single plan for ALL the threads. It
thus should be possible to have a situation where some part of the
thread is aligned and some other part is not, thus making the computer
plan incorrect for some of the threads. I constructed a test case to
tickle this issue, but for some reason I do not see a failure. So the
current code tests for a known problem, but the known problem does not
appear to be a problem. This really needs more investigation before a
"release" can happen, whatever that means.

I'm done looking at this for today, but the code is pushed up and feel
free to have a look.

dima
Chris Marshall
2014-01-20 14:23:55 UTC
Permalink
Post by Dima Kogan
Post by Chris Marshall
After working through a couple of iterations on this code,
I can see why you weren't a fan of handling the alignment
stuff directly. With this fix, we should be able to support
existing PDL versions until we eventually get to the PDL3
work. I'm still trying to wrap up the final fixes for the
64bit index support for PDL-2.008. We definitely need to add
better support for alignment issues in PDL3.
Hi.
OK. Wow, that was a pain. Thank you very much for the patch,
Chris. It was exactly what I was planning on doing, once I'd
get around to it. It turns out that more was needed, however.
Without more stuff, lots of tests still fail on x86. I resolved
most of the issues, and things look much better now, although
still not perfect. Some questionable "features" of the current
- I have a mostly-finished-but-not-yet-complete set of tests to
exercise various alignments. These aren't done, and appear
in a disabled state at the end of the .t file. They are done
enough to discover that for particularly poorly-aligned data
the FFTW3 planner crashes. On amd64, with double-precisiou
data this happens with alignment 2 or worse. Because of
this I'm wary of turning on these tests, since I don't
want the test suite to crash on some other platform.
I'm relatively comfortable that this isn't something you'll
ever see when actually using the module, since you get memory
from malloc() ultimately, and it's never going to give you
anything that badly aligned. So I'm tempted to just leave the
tests disabled; don't know.
Leave them disabled. Since most malloc operations will
align to the basic type (at least), the extra testing
is effort that would be best spent on better type and
alignment in the PDL3 work. If testing shows a problem,
it should be possible to implement a re-alignment warning
or error to cover those cases.
Post by Dima Kogan
- The test suite was checking for plan creations, since only
the input size/type was dictating when new plans would be
made. With variable alignment, there are other factors, so
those checks are now wrong much of the time. I disabled them.
Concur.
Post by Dima Kogan
- Current architecture supports running multiple FFTs in
parallel with PDL threading. I however create a single plan
for ALL the threads. It thus should be possible to have a
situation where some part of the thread is aligned and some
other part is not, thus making the computer plan incorrect
for some of the threads. I constructed a test case to tickle
this issue, but for some reason I do not see a failure. So
the current code tests for a known problem, but the known
problem does not appear to be a problem. This really needs
more investigation before a "release" can happen, whatever
that means.
If the entire data block of a pdl is aligned properly
(according to what the plan expects), then as long as
the size of the FFT dimension is a multiple of 2 for
complex float data, then all loop computations should
be aligned. For the case of a real float or complex
FFT the requirement would be a multiple of 4 or a
multiple of 2, respectively.

Documenting this would be sufficient and we could check
alignment for each FFT if needed but I think it would
be simpler to enforce the dimension size constraints
and say, for fully general operation, use double complex
FFTs.

Otherwise, I think we should push a CPAN developers
release of PDL::FFTW3 with Alien::FFTW3 listed as
an optional requirement for testing purposes. My
guess is that we may be good for a general release
assuming nothing weird pops up from the testing.

The Alien::FFTW3 stuff needs to be addressed but I
would like to get the 64bit index support fix in
first and I've spend more time than I had planned
on FFTW3. :-)

Cheers,
Chris
Post by Dima Kogan
I'm done looking at this for today, but the code is pushed up
and feel free to have a look.
dima
Dima Kogan
2014-01-20 20:34:45 UTC
Permalink
Post by Chris Marshall
Post by Dima Kogan
- Current architecture supports running multiple FFTs in
parallel with PDL threading. I however create a single plan
for ALL the threads. It thus should be possible to have a
situation where some part of the thread is aligned and some
other part is not, thus making the computer plan incorrect
for some of the threads. I constructed a test case to tickle
this issue, but for some reason I do not see a failure. So
the current code tests for a known problem, but the known
problem does not appear to be a problem. This really needs
more investigation before a "release" can happen, whatever
that means.
If the entire data block of a pdl is aligned properly
(according to what the plan expects), then as long as
the size of the FFT dimension is a multiple of 2 for
complex float data, then all loop computations should
be aligned. For the case of a real float or complex
FFT the requirement would be a multiple of 4 or a
multiple of 2, respectively.
The test case in the suite is a single-precision, odd-size real
transform (length 5, I think). So, if the block starts at 0, the second
transform starts at 0x14. This SHOULD fail, since the plan will assume
very aligned data. I'll look at it again in a bit.

Loading...