This notebook walks through Daru's support for categorical data.
require 'daru'
Initialize a vector whose data is categorical by specifying type: :category
dv = Daru::Vector.new [:a, 1, :a, 1, :c], type: :category
dv.frequencies
You can initialize it with predefined categories, even ones that do not occur in the data, using the categories option.
dv = Daru::Vector.new [:a, 1, :a, 1, :c], type: :category, categories: [:a, :b, :c, 1]
The categories option registers new categories and also specifies the order in which they occur. The frequency table now follows the order you specified.
dv.frequencies
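To make the ordering concrete, here is a minimal plain-Ruby sketch of a frequency table that follows a declared category order. It is an illustration of the idea only, not Daru's implementation; the variable names are made up for this example, and giving unused categories a count of zero is an assumption of the sketch.

```ruby
# Plain-Ruby illustration; this is not how Daru computes frequencies.
data       = [:a, 1, :a, 1, :c]
categories = [:a, :b, :c, 1]   # declared order, including the unused :b

# Walk the declared categories in order and count occurrences of each one.
frequencies = categories.map { |cat| [cat, data.count(cat)] }.to_h
```

Because the hash is built by iterating over the declared categories, its key order is the declared order rather than the order of first appearance in the data.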
Since categorical data can be ordered or unordered, you can specify which one the vector is by passing ordered: true or ordered: false during initialization.
dv = Daru::Vector.new [:a, 1, :a, 1, :c], categories: [:a, :b, :c, 1], ordered: false, type: :category
dv.min
Comparisons such as min are not available while the vector is unordered. Let's make it ordered.
dv = Daru::Vector.new [:a, 1, :a, 1, :c], ordered: true, categories: [:a, :b, :c, 1], type: :category
dv.min
dv.sort!
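The role of the category order in comparisons can be sketched in plain Ruby. The snippet below is an assumption-laden illustration, not Daru's code: it ranks each value by its position in the category list, which is what makes a mix of symbols and integers comparable at all.

```ruby
# Plain-Ruby illustration; the names here are hypothetical.
data       = [:a, 1, :a, 1, :c]
categories = [:a, :b, :c, 1]

# Map each category to its position in the declared order.
rank = categories.each_with_index.to_h

smallest = data.min_by  { |value| rank[value] }
largest  = data.max_by  { |value| rank[value] }
sorted   = data.sort_by { |value| rank[value] }
```

Calling data.min directly would fail here, because Ruby has no natural ordering between a Symbol and an Integer; the rank lookup supplies one.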
Besides setting them during initialization, you can also set the categories after the vector has been created.
dv = Daru::Vector.new [:a, 1, :c, 1, :c], type: :category
dv.categories = [:a, :b, :c, 1]
You can also check all the categories associated with the vector.
dv.categories
You can also specify whether the vector should be treated as ordered after it has been initialized.
Note: By default the vector is unordered.
dv = Daru::Vector.new [:a, 1, :c, 1, :c], type: :category
dv.ordered?
dv.ordered = true
dv.ordered?
Here are a few measures that summarize a categorical vector.
dv = Daru::Vector.new [:a, :a, :a, :b, :b, :c], type: :category
dv.summary
frequencies gives the frequency of each category, listed in the order of the categories.
dv = Daru::Vector.new ['third']*3 + ['second']*2 + ['first'], type: :category, categories: ['first', 'second', 'third']
dv.frequencies
Note: min, max and sort! only apply if the vector is ordered.
dv
dv.min
dv.ordered = true
dv.min
dv.max
dv.sort!
add_category associates a new category with the vector.
Note: To insert a value that is not yet a category, first call #add_category so that the category is registered in the vector. In the example below, the assignment is attempted both before and after registering the category:
dv
dv[0] = 'fourth'
dv.add_category 'fourth'
dv[0] = 'fourth'
dv
dv.categories
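The register-before-assign rule can be pictured with a tiny stand-in class. TinyCategorical below is entirely hypothetical and exists only for this illustration; it is not part of Daru, and the error type and message are assumptions of the sketch.

```ruby
# Hypothetical toy class illustrating the rule; not Daru's implementation.
class TinyCategorical
  attr_reader :values, :categories

  def initialize(values)
    @values     = values.dup
    @categories = values.uniq   # categories in order of first appearance
  end

  # Register a new category so that it may be assigned later.
  def add_category(category)
    @categories << category unless @categories.include?(category)
  end

  # Reject any value that has not been registered as a category.
  def []=(position, value)
    raise ArgumentError, "Invalid category #{value}" unless @categories.include?(value)
    @values[position] = value
  end
end

toy = TinyCategorical.new(%w[third third second first])

rejected = begin
  toy[0] = 'fourth'
  false
rescue ArgumentError
  true
end

toy.add_category 'fourth'
toy[0] = 'fourth'   # accepted now that the category is registered
```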
You can rename a subset of the existing categories by passing a hash that maps old names to new ones.
dv = Daru::Vector.new [1, 2, 'third', 2, 1], type: :category
dv.rename_categories 1 => 'first', 2 => 'second'
dv
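As a rough mental model, renaming with a partial hash behaves like the plain-Ruby mapping below: values that appear as keys are replaced and everything else is left alone. This is a sketch of the idea with made-up variable names, not Daru's code.

```ruby
# Plain-Ruby illustration of renaming only some categories.
data    = [1, 2, 'third', 2, 1]
mapping = { 1 => 'first', 2 => 'second' }

# fetch falls back to the original value when there is no mapping for it.
renamed = data.map { |value| mapping.fetch(value, value) }
```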
Indexing works the same way as for an ordinary vector, so you can expect these methods to behave as they do there. Here are a few examples:
dv = Daru::Vector.new [1, 1, 2, 2, 3, 1], index: :a..:f, type: :category
dv[0..2]
dv.at(-1)
dv.set_at [0, 1], 3
dv
Daru uses Arel-like syntax for querying data.
dv = Daru::Vector.new ['I', 'II', 'I', 'III', 'IV', 'I', 'II'], type: :category, categories: ['I', 'II', 'III', 'IV']
dv.ordered = true
dv.frequencies
dv.where(dv.eq('I'))
dv.where(dv.gt('II'))
df = Daru::DataFrame.new({
  a: (1..7).to_a,
  b: ('a'..'g').to_a,
  c: ['I', 'II', 'I', 'III', 'IV', 'I', 'II']
})
df.c = df.c.to_category
df
df.where(df.c.gt('I') & df.c.lt('IV'))
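The combined query above can be thought of as two boolean masks joined element by element. The plain-Ruby sketch below shows that idea for column c; it assumes comparisons follow the position of each value in the category list, and none of the names belong to Daru.

```ruby
# Plain-Ruby illustration of mask-based filtering on an ordered category.
categories = ['I', 'II', 'III', 'IV']
rank       = categories.each_with_index.to_h
c          = ['I', 'II', 'I', 'III', 'IV', 'I', 'II']

gt_mask = c.map { |value| rank[value] > rank['I'] }    # like gt('I')
lt_mask = c.map { |value| rank[value] < rank['IV'] }   # like lt('IV')

# Element-wise AND of the two masks.
mask = gt_mask.zip(lt_mask).map { |gt, lt| gt && lt }

# Positions of the rows that satisfy both conditions.
selected_rows   = mask.each_index.select { |i| mask[i] }
selected_values = selected_rows.map { |i| c[i] }
```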
Categorical data supports four contrast coding schemes: dummy, deviation, helmert, and simple.
dv = Daru::Vector.new ['I', 'II', 'I', 'III', 'IV', 'I', 'II'], type: :category, categories: ['I', 'II', 'III', 'IV']
dv.name = 'Rank'
dv.contrast_code
You can set the base category using #base_category=
dv.base_category = 'IV'
dv.contrast_code
To use a different coding scheme, set it with #coding_scheme=.
dv.coding_scheme = :deviation
dv.contrast_code
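For readers unfamiliar with contrast coding, here is a minimal plain-Ruby sketch of the textbook definitions of dummy and deviation coding. The contrast_code function below is a hypothetical helper written for this illustration, not Daru's method of the same name, and Daru's own output format (column names, container type) may differ.

```ruby
# Hypothetical helper following the textbook definitions; not Daru's code.
# Each row gets one column per non-base category.
def contrast_code(data, categories, base:, scheme: :dummy)
  columns = categories - [base]
  data.map do |value|
    columns.map do |category|
      if value == category
        1                                   # the row belongs to this category
      elsif scheme == :deviation && value == base
        -1                                  # deviation coding marks base rows with -1
      else
        0
      end
    end
  end
end

ranks  = ['I', 'II', 'I', 'III', 'IV', 'I', 'II']
levels = ['I', 'II', 'III', 'IV']

dummy     = contrast_code(ranks, levels, base: 'I')
deviation = contrast_code(ranks, levels, base: 'IV', scheme: :deviation)
```

In dummy coding the base category is the row of all zeros; in deviation coding it is the row of all -1 values, so changing the base category changes which columns exist and which rows carry the special pattern.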