Pantun-AI

“Teaching machines to read between the lines of Malay verse.”

Services: AI Solutions

Tech: Python, PyTorch, MalayBERT, Transformers, scikit-learn, NLP

Pantun-AI
Theme Classifier
Pantun · Input
Pisang emas dibawa belayar,
Masak sebiji di atas peti;
Hutang emas boleh dibayar,
Hutang budi dibawa mati.
Majority vote: Pantun Budi & Adab (3/3, unanimous)
Per-model confidence: SVM 97.3%, TextCNN 78.8%, MalayBERT 74.8%

Documentation

Pantun-AI interface — pantun input, A-B-A-B anatomy breakdown and three-model consensus

Majority-vote consensus panel — SVM, TextCNN and MalayBERT agreeing 3/3 on Pantun Budi & Adab

MalayBERT transformer analysis with predicted theme, confidence and candidate breakdown

Related pantun recommendations grouped by theme

Ensemble models: 3

MalayBERT macro F1: ~60%

Pantun theme classes

Consensus on clear cases: 3/3

Pantun-AI is a Malay pantun theme classifier that runs three very different models side by side: a fine-tuned MalayBERT transformer, a classic TF-IDF + SVM pipeline, and a TextCNN. It then resolves their predictions through a majority vote to label the theme of any four-line pantun.

The Challenge

Malay pantun rarely states its meaning outright. The literal "pembayang" (foreshadow) and the figurative "maksud" (intent) often pull in different directions, so keyword matching collapses: a pantun mentioning "kasih" is not necessarily about love. Compounding this, the labelled dataset is small and badly imbalanced — some theme classes have only a few dozen samples — which punishes data-hungry models.
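The pembayang/maksud split described above can be illustrated with a small standard-library sketch. It is not the project's code: the `PantunAnatomy` class, the `split_pantun` function, and the crude "last two letters" rhyme key are all hypothetical, chosen only to show how a four-line pantun divides into its two couplets and how an A-B-A-B pattern might be detected.

```python
from dataclasses import dataclass


@dataclass
class PantunAnatomy:
    pembayang: list[str]  # lines 1-2: the foreshadowing imagery
    maksud: list[str]     # lines 3-4: the intended meaning
    rhyme_scheme: str     # e.g. "ABAB"


def _rhyme_key(line: str, n: int = 2) -> str:
    """Crude rhyme key (assumption): last n letters of the final word."""
    words = line.strip().rstrip(",;.!?").split()
    last_word = "".join(ch for ch in words[-1].lower() if ch.isalpha())
    return last_word[-n:]


def split_pantun(text: str) -> PantunAnatomy:
    """Split a four-line pantun into couplets and label its end rhymes."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != 4:
        raise ValueError("expected a four-line pantun")
    labels: dict[str, str] = {}
    scheme = ""
    for ln in lines:
        key = _rhyme_key(ln)
        if key not in labels:
            # Assign letters A, B, C... in order of first appearance.
            labels[key] = chr(ord("A") + len(labels))
        scheme += labels[key]
    return PantunAnatomy(pembayang=lines[:2], maksud=lines[2:], rhyme_scheme=scheme)


SAMPLE = """
Pisang emas dibawa belayar,
Masak sebiji di atas peti;
Hutang emas boleh dibayar,
Hutang budi dibawa mati.
"""

anatomy = split_pantun(SAMPLE)
print(anatomy.rhyme_scheme)
print(anatomy.maksud)
```

A spelling-based rhyme key is only a rough proxy for sound, and it says nothing about meaning: the maksud couplet still has to be interpreted by the classifiers.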

Our Solution

We built an ensemble that plays each model to its strength. MalayBERT (mesolitica/bert-base-standard-bahasa-cased) reads whole-context meaning and figurative intent; the TF-IDF + SVM nails explicit keyword signals like "Tuhan" or "Budi"; TextCNN captures local n-gram patterns. A majority-vote layer surfaces a single consensus theme with per-model confidence, plus a pantun anatomy breakdown (A-B-A-B rhyme, pembayang vs. maksud) and related pantun suggestions.
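A minimal sketch of how such a majority-vote layer could work is shown below. The `Prediction` and `Consensus` types and the `majority_vote` function are hypothetical names, and the tie-break rule (fall back to the single most confident model when no theme wins outright) is an assumption; the source does not say how split votes are resolved. The confidences are the ones shown in the example panel.

```python
from collections import Counter
from typing import NamedTuple


class Prediction(NamedTuple):
    model: str
    theme: str
    confidence: float  # the model's probability for its predicted theme


class Consensus(NamedTuple):
    theme: str
    votes: int
    total: int
    unanimous: bool


def majority_vote(preds: list[Prediction]) -> Consensus:
    """Pick the theme most models agree on."""
    if not preds:
        raise ValueError("need at least one prediction")
    counts = Counter(p.theme for p in preds)
    top = max(counts.values())
    tied = {theme for theme, c in counts.items() if c == top}
    # Tie-break (assumption): among tied themes, trust the most confident model.
    best = max((p for p in preds if p.theme in tied), key=lambda p: p.confidence)
    return Consensus(best.theme, top, len(preds), top == len(preds))


preds = [
    Prediction("SVM", "Pantun Budi & Adab", 0.973),
    Prediction("TextCNN", "Pantun Budi & Adab", 0.788),
    Prediction("MalayBERT", "Pantun Budi & Adab", 0.748),
]
result = majority_vote(preds)
print(f"{result.theme} ({result.votes}/{result.total})")
```

Keeping the per-model predictions alongside the consensus is what lets the interface show which models agreed and how strongly.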

The Outcome

MalayBERT led on nuance at ~60% macro F1, with SVM close behind (~55%) and TextCNN trailing (~47%), a ranking consistent with the small, imbalanced dataset punishing data-hungry models. The transparent three-model view turns a black-box label into an explainable, teachable read of each verse, useful for students of classical Malay literature.
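Macro F1 averages the per-class F1 scores with equal weight, so a rare theme counts as much as a common one. The sketch below computes it from scratch on invented toy labels (not the project's data) to show how a model that ignores a minority class can look fine on accuracy while scoring poorly on macro F1.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy, invented labels: 8 samples of a common theme, 2 of a rare one.
y_true = ["Budi & Adab"] * 8 + ["Agama"] * 2
# A lazy model that always predicts the common theme.
y_pred = ["Budi & Adab"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.2f} macro_f1={score:.2f}")
```

Here the common class scores an F1 of 16/18 and the rare class scores 0, so the macro average is 4/9 even though accuracy is 0.8.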

Full Tech Stack

Python, PyTorch, Transformers, MalayBERT, scikit-learn, TF-IDF, TextCNN, pandas
