In [1]:
require '~/workspace/daru/lib/daru.rb'
Out[1]:
true

Categorical Vector Visualization

In [2]:
dv = Daru::Vector.new ['III']*10 + ['II']*5 + ['I']*5, type: :category, categories: ['I', 'II', 'III']
dv.type
Out[2]:
:category

Bar graph

1. Frequency (count)

In [3]:
dv.plot(type: :bar) do |p, d|
  p.x_label 'Categories'
  p.y_label 'Frequency'
end

2. Percentage

In [4]:
dv.plot(type: :bar, method: :percentage) do |p, d|
  p.x_label 'Categories'
  p.y_label 'Percentage (%)'
end

3. Fraction

In [5]:
dv.plot(type: :bar, method: :fraction) do |p, d|
  p.x_label 'Categories'
  p.y_label 'Fraction'
end
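
The three bar-graph methods above differ only in how each bar's height is scaled. As a rough illustration, the same numbers can be worked out in plain Ruby, without daru. The variable names below are hypothetical, and this is a sketch of the arithmetic rather than daru's own implementation:

```ruby
# Plain-Ruby sketch of the three bar heights; does not use daru.
values     = ['III'] * 10 + ['II'] * 5 + ['I'] * 5
categories = ['I', 'II', 'III']

# Frequency: number of occurrences, listed in the declared category order.
counts = categories.map { |cat| [cat, values.count(cat)] }.to_h

# Fraction: frequency divided by the total number of observations.
total     = values.size.to_f
fractions = counts.transform_values { |n| n / total }

# Percentage: fraction scaled to 0..100.
percentages = fractions.transform_values { |f| f * 100 }

categories.each do |cat|
  puts "#{cat}: count=#{counts[cat]} fraction=#{fractions[cat]} percentage=#{percentages[cat]}"
end
```

For this vector the frequencies are 5, 5 and 10, so the fractions sum to 1 and the percentages sum to 100.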

Categorical data visualization in DataFrame

Bar graph

In [6]:
df = Daru::DataFrame.new({
  a: [1, 2, 4, -2, 5, 23, 0],
  b: [3, 1, 3, -6, 2, 1, 0],
  c: ['I', 'II', 'I', 'III', 'I', 'III', 'II']
  })
df.to_category :c
df[:c].type
Out[6]:
:category
In [7]:
df.plot(type: :bar, x: :c)

Scatter plot categorized by categorical variable

Scatter plots can be categorized by:

  • Color
  • Size
  • Shape
In [8]:
df = Daru::DataFrame.new({
  a: [1, 2, 4, -2, 5, 23, 0],
  b: [3, 1, 3, -6, 2, 1, 0],
  c: ['I', 'II', 'I', 'III', 'I', 'III', 'II']
  })
df.to_category :c
df[:c].type
Out[8]:
:category

Below are a few examples:

In [9]:
df.plot(type: :scatter, x: :a, y: :b, categorized: {by: :c, method: :color}) do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end
In [10]:
df.plot(type: :scatter, x: :a, y: :b, categorized: {by: :c, method: :shape}) do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end

Custom colors, sizes, and shapes can also be specified. For example:

In [11]:
df.plot(type: :scatter, x: :a, y: :b, categorized: {by: :c, method: :color, color: [:red, :blue, :green]}) do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end
In [12]:
df.plot(type: :scatter, x: :a, y: :b, categorized: {by: :c, method: :size, size: [300, 600, 900]}) do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end
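
Conceptually, a categorized plot splits the rows into one series per category and gives each series its own style. The plain-Ruby sketch below mimics that grouping for the data above. It does not call daru, and pairing the i-th custom color with the i-th category is an assumption made for illustration, not a documented guarantee:

```ruby
# Plain-Ruby sketch of per-category grouping; does not use daru or a plotting backend.
a = [1, 2, 4, -2, 5, 23, 0]
b = [3, 1, 3, -6, 2, 1, 0]
c = ['I', 'II', 'I', 'III', 'I', 'III', 'II']

categories = c.uniq             # categories in order of first appearance
colors     = [:red, :blue, :green]

# Assumption: the i-th style is paired with the i-th category.
color_for = categories.zip(colors).to_h

# Build one list of [x, y] points per category.
series = a.zip(b, c)
          .group_by { |_, _, cat| cat }
          .transform_values { |rows| rows.map { |x, y, _| [x, y] } }

series.each do |cat, points|
  puts "#{cat} (#{color_for[cat]}): #{points.inspect}"
end
```

Note that the point (23, 1) in category 'III' lies outside the xrange of [-10, 10] used in the cells above.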

Line plot categorized by categorical variable

Line plots work like the scatter plots above and accept the same options, with one exception: line plots cannot be categorized by size and are categorized by stroke_width instead.

In [13]:
df = Daru::DataFrame.new({
  a: [1, 2, 3, 4, 5, 6, 7, 8, 9],
  b: [2, 4, 6, 1, 3, 5, 6, 4, 3],
  c: ['I']*3 + ['II']*3 + ['III']*3
  })
df.to_category :c
df[:c].type
Out[13]:
:category
In [14]:
df.plot type: :line, x: :a, y: :b, categorized: {by: :c, method: :color} do end
In [15]:
df.plot type: :line, x: :a, y: :b, categorized: {by: :c, method: :stroke_width} do |p, d|
  p.xrange [-10, 10]
  p.yrange [-10, 10]
end