Discussion:
[Pdl-porters] PDL::Tiny --- what should be in it?
Chris Marshall
2014-12-14 16:31:30 UTC
Permalink
To support POGL2 development (updating Perl OpenGL bindings to APIs 3.x,
4.x, and the ES variants) and as a start at the PDL3 core implementation,
I'm preparing a PDL::Tiny module and am looking for your input on what you
think should or should not be in it. Here are my general thoughts so far:

- The basic PDL::Tiny object starts with Moo (see the sketch after this list)
  - This allows full meta-object programming via Moose
  - Interoperable with state-of-the-art Perl OO programming
  - KISS principle is satisfied
- Additional capabilities would be added via Roles
  - Data allocation
  - Data types
  - Computation support
  - Threading/vectorization
- PDL::Tiny should interoperate with PDL-2.x
  - Using PDL::Objects support
  - Allows PDL-2.x and PDL3 options
  - Perhaps a pure-Perl implementation
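For instance, a first cut might look something like this (a sketch only; the
package and attribute names are illustrative, not final):

```perl
package PDL::Tiny::Role::Dims;
use Moo::Role;

# Shape bookkeeping as a composable role.
has dims => ( is => 'ro', default => sub { [] } );
sub ndims { scalar @{ $_[0]->dims } }

package PDL::Tiny;
use Moo;

# Flat data storage; types, computation, and threading would be further roles.
has data => ( is => 'rw', default => sub { [] } );
with 'PDL::Tiny::Role::Dims';

package main;
my $p = PDL::Tiny->new( data => [ 1 .. 6 ], dims => [ 2, 3 ] );
print $p->ndims, "\n";    # prints 2
```

Being Moo-based, such an object could be subclassed or introspected from Moose
code without PDL::Tiny itself paying the Moose startup cost.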


This will give a concrete platform with which to develop PDL3 concepts. In
addition, I plan to set up a github project for this effort, both to come up
to speed with that platform and to encourage rapid development.

I welcome your thoughts and suggestions.

Regards,
Chris (with my PDL3 and POGL2 hats on)
Ed .
2014-12-14 20:19:54 UTC
Permalink
I believe you’ll want:
* a nice simple build system (possibly based on the upcoming EUMM XSMULTI change; see the sketch below)
* to release early and often
* to master the art of git rebasing branches rather than merging
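For example, assuming the XSMULTI flag lands in ExtUtils::MakeMaker as
proposed, the entire build script could be roughly:

```perl
use ExtUtils::MakeMaker;

WriteMakefile(
    NAME         => 'PDL::Tiny',
    VERSION_FROM => 'lib/PDL/Tiny.pm',
    XSMULTI      => 1,    # build each .xs file under lib/ into its own loadable
);
```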

From: Chris Marshall
Sent: Sunday, December 14, 2014 4:31 PM
To: pdl-porters ; ***@jach.hawaii.edu ; Chris Marshall
Subject: [Pdl-porters] PDL::Tiny --- what should be in it?

...snip...
Chris Marshall
2014-12-15 16:35:15 UTC
Permalink
Good ideas. Starting small should help with the build system
design. I'm glad you'll be helping out there. I'm going to github
specifically for rapid turnaround and implementation. I'll read
up on the merge/rebase discussion, which looks interesting;
I can tell I have things to learn there.

--Chris
Post by Ed .
* a nice simple build system (possibly based on the upcoming EUMM XSMULTI change)
* to release early and often
* to master the art of git rebasing branches rather than merging
David Mertens
2014-12-14 21:22:07 UTC
Permalink
Hey Chris,

What exactly is the aim of this project? Is this a 90% reimplementation of
PDL? If so, I would really like to have a well thought-out C API, so that I
can easily create new PDLs from my C or C-like code. I would also really
like to be able to call PDL functions from C.

Whether those belong in a Tiny module I cannot say. It depends on what
you're trying to make tiny. :-)

David
Post by Chris Marshall
...snip...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan
Chris Marshall
2014-12-15 15:54:57 UTC
Permalink
PDL::Tiny would essentially provide the core types, computation, and
framework for the next-gen PDL3 implementation. Since it is focused on the
core of what and how PDL would or should work, it gives us a laboratory to
quickly implement and refine these ideas. Here are my thoughts on the
initial development:

- Use github.com for faster code development rather than stability
- Start with a perl-level Moo OO structure for PDL
  - Roles and architecture are critical for the new PDL3
  - We could even begin with perl-only implementations
    - Types, Arrays, Units, ...
    - Can work on JIT-PP code generation (see the sketch below)
- What are the key dimensions of PDL computation?
  - Indexing, threadloops, ...
- Should be upgradable to full PDL capabilities
  - PDL-2.x via has-a support?
  - PDL3 as it is implemented
  - Other options for KISS and lightweight compute
- What about testing and backward compatibility?
  - Test against the PDL-2.x t/ suite, possibly?
  - Options for test-driven development
  - Performance evaluation/metrics
- It should be possible to evaluate options for C-OO and C-PDL
  - The Enlightenment object model looks promising
  - Want PDL from Perl and PDL from C to be equivalent
  - Would enable multi-threaded and parallel/GPU compute

Exactly how tiny is possible or desired could be a result of the
development.
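For a flavor of what JIT-PP could mean, here is a toy pure-Perl stand-in that
generates and compiles an elementwise op at runtime; real codegen would emit
PP/C, but the shape is the same:

```perl
use strict;
use warnings;

# Toy JIT: build a specialized elementwise sub as a string and eval it.
sub jit_elementwise {
    my ($op) = @_;
    my $src = qq{
        sub {
            my (\$x, \$y) = \@_;
            [ map { \$x->[\$_] $op \$y->[\$_] } 0 .. \$#{ \$x } ];
        }
    };
    return eval $src || die $@;
}

my $add = jit_elementwise('+');
print "@{ $add->( [1, 2, 3], [10, 20, 30] ) }\n";    # 11 22 33
```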

--Chris
Post by David Mertens
...snip...
Zakariyya Mughal
2014-12-15 04:56:28 UTC
Permalink
Hi everyone,

The following are some suggestions. I'd love to help work on them. Sorry for
the length.

- Zaki Mughal

---

I announced adding data frames for PDL several months back
<http://comments.gmane.org/gmane.comp.lang.perl.pdl.general/8335> and my
intention to embed R in Perl. Embedding R in Perl is actually complete now and
just about ready for CPAN <https://github.com/zmughal/embedding-r-in-perl-experiment>
thanks to the help of the gang on #inline <http://inline.ouistreet.com/node/zfp7.html>.

In order to build the data frames and match R types, I created several
subclasses of PDL that handle a subset of PDL functions, but I haven't figured
out a way to wrap all of PDL's functionality systematically. I have several
thoughts on this.

## Levels of measurement

When using R, one of the nice things it does is warn or give
an error when you try to do an operation that would be invalid on a certain
type of data. One such type of data is categorical data, which R calls
factors and for which I made a subclass of PDL called PDL::Factor. Some of
this behaviour is inspired by the statistical methodology of levels of
measurement <https://en.wikipedia.org/wiki/Level_of_measurement>. I believe
SAS even explicitly allows assigning levels of measurement to variables.

For example, if I try to apply the mean() function on all the columns of the
Iris data set, I get this warning:

```r
lapply( iris, mean )
#> $Sepal.Length
#> [1] 5.843333
#>
#> $Sepal.Width
#> [1] 3.057333
#>
#> $Petal.Length
#> [1] 3.758
#>
#> $Petal.Width
#> [1] 1.199333
#>
#> $Species
#> [1] NA
#>
#> Warning message:
#> In mean.default(X[[5L]], ...) :
#> argument is not numeric or logical: returning NA
```

`NA` is R's equivalent of `BAD` values. For `mean()` this makes sense for
categorical data. For logical vectors, it does something else:

```r
which_setosa <- iris$Species == 'setosa' # this is a logical
mean( which_setosa )
#> [1] 0.3333333
```

This means that 1/3 of the logical values were true, which may be a useful
thing for `mean()` to return in that case.
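PDL can already do the fraction-of-true trick with a mask; a quick
illustration (mine, not part of the R example above):

```perl
use PDL;
my $x    = sequence(9);       # 0 .. 8
my $mask = ($x % 3) == 0;     # byte mask, true for a third of the entries
print $mask->avg, "\n";       # prints 0.333...
```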

Thinking in terms of levels of measurement can help with another experiment
I'm doing, which is based around tracking the units of measure used for numerical
things in Perl. Code is here <https://github.com/zmughal/units-experiment/blob/master/overload_override.pl>.

What I do there is use Moo roles to add a unit attribute to numerical types
(Perl scalars, Number::Fraction, PDL, etc.) and whenever they go through an
operation by either operator overloading or calling a function such as
`sum()`, the unit will be carried along with it and be manipulated
appropriately (you can take the mean of Kelvin, but not degrees Celsius). I
know that units of measure are messy to implement, but being able to support
auxiliary operations like this will go a long way to making PDL flexible.

[Has anyone used udunits2? I made an Alien package for it. It's on CPAN.]
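A stripped-down sketch of the role idea (the linked experiment is the real
thing; this only shows the shape, with made-up package names and an `add`
method standing in for operator overloading):

```perl
package My::Role::HasUnit;
use Moo::Role;
use Carp qw(croak);

requires 'value';
has unit => ( is => 'ro', required => 1 );

# Addition is only defined when the units agree.
sub add {
    my ($self, $other) = @_;
    croak "unit mismatch: " . $self->unit . " vs " . $other->unit
        unless $self->unit eq $other->unit;
    return ref($self)->new(
        value => $self->value + $other->value,
        unit  => $self->unit,
    );
}

package My::Quantity;
use Moo;
has value => ( is => 'ro', required => 1 );
with 'My::Role::HasUnit';

package main;
my $t1 = My::Quantity->new( value => 273.15, unit => 'K' );
my $t2 = My::Quantity->new( value => 10,     unit => 'K' );
my $t3 = $t1->add($t2);
print $t3->value, " ", $t3->unit, "\n";    # 283.15 K
```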

## DataShape and Blaze

I think it would be beneficial to look at the work being done by the Blaze
project <http://blaze.pydata.org/> with its DataShape specification
<http://datashape.pydata.org/>. The idea behind it is to be able to use the
various array-like APIs without having to worry what is going on in the
backend — be it with a CPU-based, GPU-based, SciDB, or even a SQL server.

## Julia

Julia has been doing some amazing things with how they've grown out their
language. I was looking to see if they have anything similar to the dataflow
in PDL and I came across ArrayViews <https://github.com/JuliaLang/ArrayViews.jl>.
It may be enlightening to see how they compose this feature onto already
existing n-d arrays as opposed to how PDL does it.

I do not know what tradeoffs that brings, but it is a starting point to think
about. I think similar approaches can be made to support sparse arrays.

In fact, one of Julia's strengths is how they use multimethods to handle new
types with ease. See "The Design Impact of Multiple Dispatch" <http://nbviewer.ipython.org/gist/StefanKarpinski/b8fe9dbb36c1427b9f22>
for examples. [Perl 6 has built-in multimethods]

## MATLAB subclassing

I use MATLAB daily. I came across this area of the documentation that talks
about how to subclass. <http://www.mathworks.com/help/matlab/matlab_oop/subclassing-matlab-built-in-classes.html>

Some of the information in there is good for knowing how *not* to implement
things, but there is also some discussion on what is necessary for the
storage types that might be worth looking at.

[By the way, I have downloaded all of MATLAB File Central's code and I could do
some analysis on the functions used there if that would be helpful.]

## GPU and threading

I think it would be best to offload GPU support to other libraries, so it
would be good to extract what is common between libraries like

- MAGMA <http://icl.cs.utk.edu/magma/>,
- ViennaCL <http://viennacl.sourceforge.net/>,
- Blaze-lib <https://code.google.com/p/blaze-lib/>,
- VXL <http://vxl.sourceforge.net/>,
- Spark <http://spark.apache.org/>,
- Torch <http://torch.ch/>,
- Theano <http://www.deeplearning.net/software/theano/>,
- Eigen <http://eigen.tuxfamily.org/>, and
- Armadillo <http://arma.sourceforge.net/>.

Eigen is interesting in particular because it has support for storing in both
row-major and column-major data <http://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html>.

Another source of inspiration would be the VSIPL spec <http://www.omgwiki.org/hpec/vsipl>.
It's a standard made for signal processing routines in the embedded DSP world
and comes with "Core" and "Core Lite" profiles which might help decide what
should be included in a smaller subset of PDL.

Also in my wishlist is interoperability with libraries like ITK <http://www.itk.org/>,
VTK <http://www.vtk.org/>, and yt <http://yt-project.org/>. They have
interesting architectures especially for computation. Unfortunately, the
first two are C++ based and I don't have experience with combining C++ and XS.

## Better testing

PDL should make more guarantees about how types flow through the system. This
might be accomplished by adding assertions in the style of Design-by-Contract
which can act as both a testable spec and documentation. I'm working on the
test suite right now on a branch and I hope to create a proof-of-concept of
this soon.

I hope that this can help make PDL more consistent and easily testable. There
are still small inconsistencies that shouldn't be there which can be weeded out
with testing. For example, what type is expected for this code?

```perl
use PDL;
print stretcher( sequence(float, 3) )->type;
```

I would expect 'float', but it is actually 'double' under PDL v2.007_04.
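A contract-style assertion for this is easy to write down; a sketch (the
second test currently fails, which is exactly the point):

```perl
use strict;
use warnings;
use PDL;
use Test::More;

# Contract: type-preserving operations must hand the input type through.
is( "" . sequence(float, 3)->type, 'float',
    'sequence honors the requested type' );
is( "" . stretcher( sequence(float, 3) )->type, 'float',
    'stretcher preserves the input type' );    # fails: gives 'double'
done_testing();
```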

## Incremental computation

I find that the way I grow my code is to slowly add modules that work
together in a pipeline. Running and rerunning this code through all the
modules is slow. To avoid that, I create multiple small programs that read
and write files to pass from one script to the next. I was looking for a
solution and came across IncPy <http://www.pgbovine.net/incpy.html>. It
modifies the Python interpreter to support automatic persistent memoization.
I don't think the idea has caught on, but I think it should, and perhaps Perl
and PDL are flexible enough to herald it as a CPAN module.
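A rough sketch of what a CPAN-module version could look like, built on core
Storable and Digest::MD5 (the helper name is made up):

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use Digest::MD5 qw(md5_hex);

# Wrap a function so its results persist on disk across runs.
sub memoize_to_disk {
    my ($name, $code) = @_;
    return sub {
        my $key   = md5_hex( join "\0", $name, @_ );
        my $cache = ".memo-$key.storable";
        return ${ retrieve($cache) } if -e $cache;
        my $result = $code->(@_);
        store( \$result, $cache );
        return $result;
    };
}

my $slow_square = memoize_to_disk( square => sub { sleep 2; $_[0] ** 2 } );
print $slow_square->(21), "\n";    # slow on the first run, instant afterwards
```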
Chris Marshall
2014-12-15 16:32:06 UTC
Permalink
...snip...
## Levels of measurement
When using R, one of the nice things it does is warn or give
an error when you try to do an operation that would be invalid on a certain
type of data. One such type of data is categorical data, which R calls
factors and for which I made a subclass of PDL called PDL::Factor. Some of
this behaviour is inspired by the statistical methodology of levels of
measurement <https://en.wikipedia.org/wiki/Level_of_measurement>. I believe
SAS even explicitly allows assigning levels of measurement to variables.
+1, it would be nice if new PDL types supported varying
levels of computation, including by levels of measurement.
...snip...
`NA` is R's equivalent of `BAD` values. For `mean()` this makes sense for
I would like to see more generalized support for bad-value computations,
since in some cases BAD is used for missing, in others BAD is used
for invalid, ...
...snip...
Thinking in terms of levels of measurement can help with another experiment
I'm doing, which is based around tracking the units of measure used for
numerical things in Perl. Code is here
<https://github.com/zmughal/units-experiment/blob/master/overload_override.pl>.
What I do there is use Moo roles to add a unit attribute to numerical types
(Perl scalars, Number::Fraction, PDL, etc.) and whenever they go through an
operation by either operator overloading or calling a function such as
`sum()`, the unit will be carried along with it and be manipulated
appropriately (you can take the mean of Kelvin, but not degrees Celsius). I
know that units of measure are messy to implement, but being able to support
auxiliary operations like this will go a long way to making PDL flexible.
Yes! Method modifiers offer some powerful development
tools for implementing various high-level features. I'm hoping they
can be used to augment core functionality to support many of
the more powerful or flexible features such as JIT compiling, GPU
computation, distributed computation, ...
[Has anyone used udunits2? I made an Alien package for it. It's on CPAN.]
## DataShape and Blaze
This looks a lot like what the PDL::Tiny core is shaping up to be.
Another goal of PDL::Tiny is flexibility so that PDL can use and
be used by/from other languages.
I think it would be beneficial to look at the work being done by the Blaze
project <http://blaze.pydata.org/> with its DataShape specification
<http://datashape.pydata.org/>. The idea behind it is to be able to use the
various array-like APIs without having to worry what is going on in the
backend — be it with a CPU-based, GPU-based, SciDB, or even a SQL server.
## Julia
Julia has been doing some amazing things with how they've grown out their
language. I was looking to see if they have anything similar to the dataflow
in PDL and I came across ArrayViews <https://github.com/JuliaLang/ArrayViews.jl>.
It may be enlightening to see how they compose this feature onto already
existing n-d arrays as opposed to how PDL does it.
I do not know what tradeoffs that brings, but it is a starting point to think
about. I think similar approaches can be made to support sparse arrays.
Julia views look a lot like what we call slices.
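For example, a slice is a live, dataflowed view:

```perl
use PDL;
my $x = sequence(10);
my $s = $x->slice('2:5');   # a view into $x, not a copy
$s .= -1;                   # assignment flows back to the parent
print $x, "\n";             # [0 1 -1 -1 -1 -1 6 7 8 9]
```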
In fact, one of Julia's strengths is how they use multimethods to handle new
types with ease. See "The Design Impact of Multiple Dispatch"
<http://nbviewer.ipython.org/gist/StefanKarpinski/b8fe9dbb36c1427b9f22>
for examples. [Perl 6 has built-in multimethods]
Multi-methods may be a good way to support some of the new PDL
capabilities in a way that can be expanded by plugins, at runtime,
...
## MATLAB subclassing
...snip...
## GPU and threading
I think it would be best to offload GPU support to other libraries, so it
would be good to extract what is common between libraries like
- MAGMA <http://icl.cs.utk.edu/magma/>,
- ViennaCL <http://viennacl.sourceforge.net/>,
- Blaze-lib <https://code.google.com/p/blaze-lib/>,
- VXL <http://vxl.sourceforge.net/>,
- Spark <http://spark.apache.org/>,
- Torch <http://torch.ch/>,
- Theano <http://www.deeplearning.net/software/theano/>,
- Eigen <http://eigen.tuxfamily.org/>, and
- Armadillo <http://arma.sourceforge.net/>.
Eigen is interesting in particular because it has support for storing in both
row-major and column-major data <http://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html>.

We would benefit by supporting the commonalities needed to work
with other GPU computation libraries. I'm not sure that all
PDL computations can be run efficiently if processed at the
library-call level. We may want our own JIT for performance.
Another source of inspiration would be the VSIPL spec <http://www.omgwiki.org/hpec/vsipl>.
It's a standard made for signal processing routines in the embedded DSP world
and comes with "Core" and "Core Lite" profiles which might help decide what
should be included in a smaller subset of PDL.
Also in my wishlist is interoperability with libraries like ITK <http://www.itk.org/>,
VTK <http://www.vtk.org/>, and yt <http://yt-project.org/>. They have
interesting architectures especially for computation. Unfortunately, the
first two are C++ based and I don't have experience with combining C++ and XS.
Thanks for all the references and ideas!
## Better testing
PDL should make more guarantees about how types flow through the system. This
might be accomplished by adding assertions in the style of Design-by-Contract
which can act as both a testable spec and documentation. I'm working on the
test suite right now on a branch and I hope to create a proof-of-concept of
this soon.
I think starting with the PDL::Tiny core and building out we could
clarify some of these issues.
I hope that this can help make PDL more consistent and easily testable. There
are still small inconsistencies that shouldn't be there which can be weeded out
with testing. For example, what type is expected for this code?
```perl
use PDL;
print stretcher( sequence(float, 3) )->type;
```
I would expect 'float', but it is actually 'double' under PDL v2.007_04.
This is a bug. One thing that would be nice to have is
a way to trace the dataflow characteristics through the
PDL processing chains...
## Incremental computation
I find that the way I grow my code is to slowly add modules that work
together in a pipeline. Running and rerunning this code through all the
modules is slow. To avoid that, I create multiple small programs that read
and write files to pass from one script to the next. I was looking for a
solution and came across IncPy <http://www.pgbovine.net/incpy.html>. It
modifies the Python interpreter to support automatic persistent memoization.
I don't think the idea has caught on, but I think it should and perhaps Perl
and PDL is flexible enough to herald it as a CPAN module.
Nice idea for improvement and ease of use. If PDL methods are
implemented compatibly with Moo[se], then method modifiers could
be used for this.

Thanks for the thoughts!
Chris
David Mertens
2014-12-15 17:45:46 UTC
Permalink
FWIW, it looks like Julia views are like affine slices in PDL. As I have
said before, almost nothing out there has an equivalent of the non-contiguous,
non-strided support we get with which, where, and their ilk. GSL vectors do
not, either. Matlab supports it only as a temporary object, and eliminates it
after the line has executed. Not sure about Numpy here.
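For example, where() hands back a dataflowed view of arbitrarily scattered
elements:

```perl
use PDL;
my $x   = sequence(10);
my $odd = $x->where( $x % 2 );   # non-contiguous view of the odd elements
$odd .= 0;                       # dataflow: writes back into $x
print $x, "\n";                  # [0 0 2 0 4 0 6 0 8 0]
```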

David
Post by Chris Marshall
...snip...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan
David Mertens
2014-12-15 19:00:24 UTC
Permalink
Something that I think will be critical, especially if we start
JIT-compiling stuff or allowing for subclassing, is the customized code
could lead to a performance hit if it leads to code cache misses. I
recently came across a great explanation here:
http://igoro.com/archive/gallery-of-processor-cache-effects/

One of the files in the Perl interpreter's core code is called pp_hot.c.
According to comments at the top of the file, these functions are
consolidated into a single c (and later object) file to "encourage CPU
cache hits on hot code." If we create more and more code paths that get
executed, we increase the time spent loading the machine code into the L1
cache, and we also increase the likelihood of evicting parts of pp_hot and
other important execution paths.

David
Post by David Mertens
...snip...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan
Chris Marshall
2014-12-15 19:45:30 UTC
Permalink
Agreed.

The need to avoid cache-busting code and poor performance is one motivation
for JIT compiling: to avoid memory sloshing from function
pointer->pointer->pointer chains. Implementing benchmarks and performance
metrics alongside the new development will be essential to avoiding
unnecessary performance bottlenecks and to determining the right level to
compute at...
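Even a minimal harness based on the core Benchmark module would do for a
start; something like:

```perl
use strict;
use warnings;
use PDL;
use Benchmark qw(cmpthese);

# Compare two ways of computing a sum of squares over a large piddle.
my $x = random(1_000_000);
cmpthese( -2, {
    inner   => sub { my $r = inner( $x, $x ) },
    mul_sum => sub { my $r = ( $x * $x )->sum },
} );
```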

--Chris
Post by David Mertens
...snip...
David Mertens
2015-01-09 01:41:25 UTC
Permalink
Hey Chris, porters,

I was thinking again about this project. One thing that occurs to me is
that p5mop-redux, Stevan Little's attempt at creating something like Moose
that could be pushed into core Perl, has been stalled for many months. I am
not sure if p5mop-redux has a very good C API; indeed, I am not sure if it
has a C object API at all. I wonder if we might consider stepping in and
lending a hand to help build a C object API, which would serve as the
foundation for the mop.

If we had a solid C object system with the high potential of getting pushed
into the core, we would be in excellent shape to create the next generation
of PDL.

Thoughts?
David
Post by Chris Marshall
Agreed.
The need to avoid cache-busting code and poor performance is one
motivation for JIT compiling to avoid memory sloshing fromfunction
pointer->pointer->pointers. Implementing benchmarks andperformance metrics
alongside the new development will beessential to avoiding unnecessary
performance bottlenecks and to determine the right level to compute at...
--Chris
Post by David Mertens
Something that I think will be critical, especially if we start
JIT-compiling stuff or allowing for subclassing, is the customized code
could lead to a performance hit if it leads to code cache misses. I
http://igoro.com/archive/gallery-of-processor-cache-effects/
One of the files in the Perl interpreter's core code is called pp_hot.c.
According to comments at the top of the file, these functions are
consolidated into a single c (and later object) file to "encourage CPU
cache hits on hot code." If we create more and more code paths that get
executed, we increase the time spent loading the machine code into the L1
cache, and we also increase the likelihood of evicting parts of pp_hot and
other important execution paths.
David
Post by David Mertens
FWIW, it looks like Julia views are like affine slices in PDL. As I have
said before, almost nothing out there has the equivalent of non-contiguous,
non-strided support like we get with which, where, and their ilk. GSL
vectors do not, either. Matlab only supports it as a temporary object, and
eliminates it after the line has executed. Not sure about Numpy here.
David
Post by Zakariyya Mughal
On Sun, Dec 14, 2014 at 11:56 PM, Zakariyya Mughal <
...snip...
## Levels of measurement
When using R, one of the nice things it does is warn or give
an error when you try to do an operation that would be invalid on a
certain
type of data. One such type of data is categorical data, which R
calls
factors and for which I made a subclass of PDL called PDL::Factor.
Some of
this behvaviour is inspired by the statistical methodology of
levels of
measurement <https://en.wikipedia.org/wiki/Level_of_measurement>.
I believe
SAS even explicitly allows assigning levels of measurment to
variables.
+1, it would be nice if new PDL types supported varying
levels of computation including by levels of measurement
...snip...
`NA` is R's equivalent of `BAD` values. For `mean()` this makes
sense for
I would like to see more generalized support for bad value computions
since in some cases BAD is used for missing, in others BAD is used
for invalid,...
...snip...
Thinking in terms of levels of measurement can help with another
experiment
I'm doing which based around tracking the units of measure used for
numerical
things in Perl. Code is here <
https://github.com/zmughal/units-experiment/blob/master/overload_override.pl
.
What I do there is use Moo roles to add a unit attribute to
numerical types
(Perl scalars, Number::Fraction, PDL, etc.) and whenever they go
through an
operation by either operator overloading or calling a function such
as
`sum()`, the unit will be carried along with it and be manipulated
appropriately (you can take the mean of Kelvin, but not degrees
Celsius). I
know that units of measure are messy to implement, but being able
to support
auxiliary operations like this will go a long way to making PDL
flexible.
Yes! The use of method modifiers offer some powerful development
tools to implement various high level features. I'm hoping that
it can be used to augment core functionality to support many of
the more powerful or flexible features such as JIT compiling, GPU
computation, distributed computation,...
[Has anyone used udunits2? I made an Alien package for it. It's on
CPAN.]
## DataShape and Blaze
This looks a lot like what the PDL::Tiny core is shaping up to be.
Another goal of PDL::Tiny is flexibility so that PDL can use and
be used by/from other languages.
I think it would be beneficial to look at the work being done by
the Blaze
project <http://blaze.pydata.org/> with its DataShape specification
<http://datashape.pydata.org/>. The idea behind it is to be able
to use the
various array-like APIs without having to worry what is going on in
the
backend be it with a CPU-based, GPU-based, SciDB, or even a SQL
server.
## Julia
Julia has been doing some amazing things with how they've grown out
their
language. I was looking to see if they have anything similar to the
dataflow
in PDL and I came across ArrayViews <
https://github.com/JuliaLang/ArrayViews.jl>.
It may be enlightening to see how they compose this feature onto
already
existing n-d arrays as opposed to how PDL does it.
I do not know what tradeoffs that brings, but it is a starting
point to think
about. I think similar approaches can be made to support sparse
arrays.
Julia views look a lot like what we call slices.
In fact, one of Julia's strengths is how they use multimethods to
handle new
types with ease. See "The Design Impact of Multiple Dispatch"
<
http://nbviewer.ipython.org/gist/StefanKarpinski/b8fe9dbb36c1427b9f22>
for examples. [Perl 6 has built-in multimethods]
Multi-methods may be a good way to support some of the new PDL
capabilities in a way that can be expanded by plugins, at runtime,
...
## MATLAB subclassing
...snip...
## GPU and threading
I think it would be best to offload GPU support to other libraries,
so it
would be good to extract what is common between libraries like
- MAGMA <http://icl.cs.utk.edu/magma/>,
- ViennaCL <http://viennacl.sourceforge.net/>,
- Blaze-lib <https://code.google.com/p/blaze-lib/>,
- VXL <http://vxl.sourceforge.net/>,
- Spark <http://spark.apache.org/>,
- Torch <http://torch.ch/>,
- Theano <http://www.deeplearning.net/software/theano/>,
- Eigen <http://eigen.tuxfamily.org/>, and
- Armadillo <http://arma.sourceforge.net/>.
Eigen is interesting in particular because it has support for
storing in both
row-major and column-major data <
http://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html>.
We would benefit by supporting the commonalities needed to work
with other GPU computation libraries. I'm not sure that all
PDL computations can be run efficiently if processed at the
library call level. We may want our own JIT for performnce.
Another source of inspiration would be the VSIPL spec <
http://www.omgwiki.org/hpec/vsipl>.
It's a standard made for signal processing routines in the embedded
DSP world
and comes with "Core" and "Core Lite" profiles which might help
decide what
should be included in a smaller subset of PDL.
Also in my wishlist is interoperability with libraries like ITK <
http://www.itk.org/>,
VTK <http://www.vtk.org/>, and yt <http://yt-project.org/>. They
have
interesting architectures especially for computation.
Unfortunately, the
first two are C++ based and I don't have experience with combining
C++ and XS.
Thanks for all the references and ideas!
## Better testing
PDL should make more guarantees about how types flow through the
system. This
might be accomplished by adding assertions in the style of
Design-by-Contract
which can act as both a testable spec and documentation. I'm
working on the
test suite right now on a branch and I hope to create a
proof-of-concept of
this soon.
I think starting with the PDL::Tiny core and building out we could
clarify some of these issues.
I hope that this can help make PDL more consistent and easily
testable. There
are still small inconsistencies that shouldn't be there which can
be weeded out
```perl
use PDL;
print stretcher( sequence(float, 3) )->type;
```
I would expect 'float', but it is actually 'double' under PDL
v2.007_04.
This is a bug. One thing that would be nice to have is
a way to trace the dataflow characteristics through the
PDL processing chains...
## Incremental computation
I find that the way I grow my code is to slowly add modules that
work
together in a pipeline. Running and rerunning this code through all
the
modules is slow. To avoid that, I create multiple small programs
that read
and write files to pass from one script to the next. I was looking
for a
solution and came across IncPy <http://www.pgbovine.net/incpy.html>.
It
modifies the Python interpreter to support automatic persistent
memoization.
I don't think the idea has caught on, but I think it should and
perhaps Perl
and PDL is flexible enough to herald it as a CPAN module.
Nice idea for improving ease of use. If PDL methods are
implemented in a Moo[se]-compatible way, then method modifiers
could be used for this, as in the sketch below.
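A minimal sketch of that idea, assuming a Moo-based class (all names here
are hypothetical, and a real persistent version would also need a robust
cache key for ndarray arguments):
```perl
package My::Pipeline;    # hypothetical stage in an analysis pipeline
use Moo;

sub expensive_op {       # stand-in for a slow, pure computation
    my ($self, $x) = @_;
    sleep 1;             # pretend this takes a long time
    return $x ** 2;
}

# IncPy-style memoization via a Moo method modifier; a persistent
# variant could tie %cache to disk (e.g. with DBM::Deep) so results
# survive between runs of the pipeline scripts.
my %cache;
around expensive_op => sub {
    my ($orig, $self, @args) = @_;
    my $key = join "\0", @args;  # naive key; ndarray args need real hashing
    $cache{$key} //= $self->$orig(@args);
    return $cache{$key};
};

package main;
my $p = My::Pipeline->new;
print $p->expensive_op(7), "\n";  # slow the first time
print $p->expensive_op(7), "\n";  # instant: served from the cache
```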
Thanks for the thoughts!
Chris
_______________________________________________
Perldl mailing list
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan
David Mertens
2015-01-09 01:43:41 UTC
Permalink
Link: https://github.com/stevan/p5-mop-redux
Post by David Mertens
Hey Chris, porters,
I was thinking again about this project. One thing that occurs to me is
that p5mop-redux, Stevan Little's attempt at creating something like Moose
that could be pushed into core Perl, has been stalled for many months. I am
not sure if p5mop-redux has a very good C API; indeed, I am not sure if it
has a C object API at all. I wonder if we might consider stepping in
and lending a hand to help build a C object API, which would serve as the
foundation for the mop.
If we had a solid C object system with the high potential of getting
pushed into the core, we would be in excellent shape to create the next
generation of PDL.
Thoughts?
David
Post by Chris Marshall
Agreed.
The need to avoid cache-busting code and poor performance is one
motivation for JIT compiling, which avoids memory sloshing from function
pointer->pointer->pointer chains. Implementing benchmarks and performance
metrics alongside the new development will be essential to avoiding
unnecessary performance bottlenecks and to determining the right level to
compute at...
--Chris
Post by David Mertens
Something that I think will be critical, especially if we start
JIT-compiling stuff or allowing for subclassing, is that the customized
code could lead to a performance hit if it causes code cache misses; see
http://igoro.com/archive/gallery-of-processor-cache-effects/
One of the files in the Perl interpreter's core code is called pp_hot.c.
According to comments at the top of the file, these functions are
consolidated into a single C (and later object) file to "encourage CPU
cache hits on hot code." If we create more and more code paths that get
executed, we increase the time spent loading the machine code into the L1
cache, and we also increase the likelihood of evicting parts of pp_hot and
other important execution paths.
David
On Mon, Dec 15, 2014 at 12:45 PM, David Mertens <
Post by David Mertens
FWIW, it looks like Julia views are like affine slices in PDL. As I
have said before, almost nothing out there has the equivalent of
non-contiguous, non-strided support like we get with which, where, and
their ilk. GSL vectors do not, either. Matlab only supports it as a
temporary object, and eliminates it after the line has executed. Not sure
about Numpy here.
David
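For readers following along, a small example of the which/where selection
being described (standard PDL, with dataflow back to the parent ndarray):
```perl
use PDL;

my $x   = sequence(10);           # [0 1 2 3 4 5 6 7 8 9]
my $idx = which($x % 3 == 0);     # indices 0, 3, 6, 9 -- not a regular stride
my $sel = $x->where($x % 3 == 0); # the selected values themselves
$sel .= -1;                       # flows back: $x is now [-1 1 2 -1 4 5 -1 7 8 -1]
print $x, "\n";
```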
On Mon, Dec 15, 2014 at 11:32 AM, Chris Marshall <
Post by Zakariyya Mughal
On Sun, Dec 14, 2014 at 11:56 PM, Zakariyya Mughal <
...snip...
## Levels of measurement
When using R, one of the nice things it does is warn or give
an error when you try to do an operation that would be invalid on
a certain
type of data. One such type of data is categorical data, which R
calls
factors and for which I made a subclass of PDL called PDL::Factor.
Some of
this behaviour is inspired by the statistical methodology of
levels of
measurement <https://en.wikipedia.org/wiki/Level_of_measurement>.
I believe
SAS even explicitly allows assigning levels of measurement to
variables.
+1, it would be nice if new PDL types supported varying
levels of computation including by levels of measurement
...snip...
`NA` is R's equivalent of `BAD` values. For `mean()` this makes
sense for
I would like to see more generalized support for bad-value computations
since in some cases BAD is used for missing, in others BAD is used
for invalid,...
...snip...
Thinking in terms of levels of measurement can help with another
experiment I'm doing, which is based around tracking the units of
measure used for numerical things in Perl. Code is here <
https://github.com/zmughal/units-experiment/blob/master/overload_override.pl
.
What I do there is use Moo roles to add a unit attribute to
numerical types
(Perl scalars, Number::Fraction, PDL, etc.) and whenever they go
through an
operation by either operator overloading or calling a function
such as
`sum()`, the unit will be carried along with it and be manipulated
appropriately (you can take the mean of Kelvin, but not degrees
Celsius). I
know that units of measure are messy to implement, but being able
to support
auxiliary operations like this will go a long way to making PDL
flexible.
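A stripped-down, self-contained sketch of that unit-carrying idea (the
linked code uses Moo roles across several numeric types; the Quantity
class here is hypothetical):
```perl
package Quantity;     # hypothetical class for illustration
use Moo;
use overload
    '+'  => \&_add,
    '""' => sub { my $s = shift; $s->value . ' ' . $s->unit };

has value => (is => 'ro', required => 1);
has unit  => (is => 'ro', required => 1);

# Units must agree for addition; the unit rides along with the result.
sub _add {
    my ($a, $b) = @_;
    die "unit mismatch: ", $a->unit, " vs ", $b->unit
        unless ref($b) && $a->unit eq $b->unit;
    return Quantity->new(value => $a->value + $b->value, unit => $a->unit);
}

package main;
my $t1 = Quantity->new(value => 273, unit => 'K');
my $t2 = Quantity->new(value => 10,  unit => 'K');
print $t1 + $t2, "\n";    # "283 K"
```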
Yes! Method modifiers offer some powerful development tools for
implementing various high-level features. I'm hoping they can be
used to augment core functionality to support many of the more
powerful or flexible features such as JIT compiling, GPU
computation, distributed computation,...
[Has anyone used udunits2? I made an Alien package for it. It's on
CPAN.]
## DataShape and Blaze
This looks a lot like what the PDL::Tiny core is shaping up to be.
Another goal of PDL::Tiny is flexibility so that PDL can use and
be used by/from other languages.
I think it would be beneficial to look at the work being done by
the Blaze
project <http://blaze.pydata.org/> with its DataShape
specification
<http://datashape.pydata.org/>. The idea behind it is to be able
to use the
various array-like APIs without having to worry what is going on
in the
backend be it with a CPU-based, GPU-based, SciDB, or even a SQL
server.
## Julia
Julia has been doing some amazing things with how they've grown
out their
language. I was looking to see if they have anything similar to
the dataflow
in PDL and I came across ArrayViews <
https://github.com/JuliaLang/ArrayViews.jl>.
It may be enlightening to see how they compose this feature onto
already
existing n-d arrays as opposed to how PDL does it.
I do not know what tradeoffs that brings, but it is a starting
point to think
about. I think similar approaches can be made to support sparse
arrays.
Julia views look a lot like what we call slices.
In fact, one of Julia's strengths is how they use multimethods to
handle new types with ease.
...snip...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan
David Mertens
2015-01-09 01:53:40 UTC
Permalink
Wrong! There's another! https://github.com/stevan/p5-mop-again-seriously-wtf,
which appears to be the most recent (and follows the older
https://github.com/stevan/p5-mop-XS).

And in case you wondered why Stevan Little's work on p5-mop hit a weird
stall:
http://blogs.perl.org/users/stevan_little/2014/05/on-prototyping-in-public-part-duex.html
Post by David Mertens
Link: https://github.com/stevan/p5-mop-redux
...snip...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan
Chris Marshall
2015-01-11 20:36:15 UTC
Permalink
Hi David-

I think if we start trying to get something in the perl5 core we'll
rediscover the pain that Stevan Little found. Right now my thought was
to use the existing perl5 MOP (i.e., Mo[o[se]]) to generate the PDL::Tiny
classes and using that information to generate the C object
binding/implementation. I'm looking at the Enlightenment Object model
as a starting point for the C object model to avoid re-inventing the
wheel. One nice thing there is that the EO library can be called from
either C or *real* C++ code so you can have the best of both worlds
without the problem of forcing the use of a specific C++ compiler
everywhere....

--Chris
Post by David Mertens
...snip...
David Mertens
2015-01-12 01:41:20 UTC
Permalink
Hey Chris,

I think if we try to extract and (minimally) generalize the Prima object
system, we'll give Stevan a highly performant C-based object system upon
which to build p5-mop. If ever there was a time to introduce a minimal C
object system into the Perl core, p5-mop would be it.

As an added bonus, if we get involved in this sort of effort, we can lend
more manpower to the effort, which has usually been a two-man show. This
will increase the likelihood that p5-mop gets fully implemented, and bring
some more awareness to PDL.

But then again, I've spoken about Prima's object system and not
successfully extracted it (yet). Eo is already written: a known and
tested quantity.

It just strikes me as a profound coincidence that p5-mop still hasn't been
finalized, and we're bandying about the notion of a new C-based object
system for PDL. I could be wrong, but it seems to me that this is a moment
to be seized. Why not merge forces?

David
Post by Chris Marshall
...snip...
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." -- Brian Kernighan