Arel is a very popular ruby gem that is one of the major components of the most popular ruby frameworks, Rails. It is an ORM-helper of sorts that exposes a beautiful and intuitive syntax for creating SQL strings by chaining Ruby methods.
In daru, we have successfully adopted this syntax and the result is a very intuitive and readable syntax for obtaining any sort of data from a DataFrame or Vector.
Lets see how this syntax works with some examples:
require 'daru'
To use this syntax we call any of the comparator methods defined on Daru::Vector
and pass the results from these to the #where
method available for both DataFrame
and Vector
.
To demonstrate with a quick example:
vector = Daru::Vector.new([2,4,5,51,5,16,2,5,3,2,1,5,2,5,2,1,56,234,6,21])
vector.where((vector.eq(5) | vector.eq(1)) & vector.mt(2))
Note that we use union OR (|
) and union AND (&
) and not logical OR (||
) or logical AND (&&
)
Thus the results returned by the Vector#eq
method are evaluated by Vector#where
and the generated Vector is returned. The index is also preserved.
The where
clause can also be used with DataFrame, with similar results.
df = Daru::DataFrame.new({
a: [1,2,3,4,5,6]*100,
b: ['a','b','c','d','e','f']*100,
c: [11,22,33,44,55,66]*100
}, index: (1..600).to_a.shuffle)
df.where(df[:a].eq(2) | df[:c].eq(55))
The comparator methods on Vector return an object of type Daru::Core::Query::BoolArray
which lets us perform OR and AND operations on it. See this blog post for more information on BoolArray and other comparator methods.
As a convenience for readability, you can also you the #and
or #or
methods instead of #&
and #|
.
vector.where(vector.eq(2).or(vector.eq(5)).and(vector.mt(2)))
One of the major advantages of using the where
clause over other more robust methods like DataFrame#filter
or Vector#keep_if
is that it is much faster (though not destructive). These benchmarks prove my point.
Daru::DataFrame offers the #join
method for performing SQL style joins between two DataFrames. Currently #join supports inner, left outer, right outer and full outer joins between DataFrames.
To demonstrate:
left = Daru::DataFrame.new({
:id => [1,2,3,4],
:name => ['Pirate', 'Monkey', 'Ninja', 'Spaghetti']
})
right = Daru::DataFrame.new({
:id => [1,2,3,4],
:name => ['Rutabaga', 'Pirate', 'Darth Vader', 'Ninja']
})
nil
To perform an inner join on the :name
column:
left.join(right, on: [:name], how: :inner)
An outer left join can be done with:
left.join(right, on: [:name], how: :left)
An outer right join can be done like so:
left.join(right, on: [:name], how: :right)
And finally, a full outer join:
left.join(right, on: [:name], how: :outer)