
Data Science


Visualisation

How to visualise your data

Visualisation for Algorithms

Mallet

Importing data

One instance per file:
bin/mallet import-dir --input sample-data/web/* --output web.mallet
One file, one instance per line:
bin/mallet import-file --input /data/web/data.txt --output web.mallet
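
The two input layouts can be sketched as follows. This is a minimal sketch, assuming Mallet's default conventions: import-dir uses each directory name as the label and each file as one instance, while import-file reads "name label text" from each line. The demo directory, file names and sample sentences are hypothetical, and the Mallet commands only run if bin/mallet exists in the current directory.

```shell
# Hypothetical scratch directory for the demo data.
mkdir -p demo/web/en demo/web/de

# Layout for import-dir: one directory per label, one file per instance.
printf 'the quick brown fox\n' > demo/web/en/doc1.txt
printf 'der schnelle braune fuchs\n' > demo/web/de/doc1.txt

# Layout for import-file: one instance per line, "name label text".
cat > demo/data.txt <<'EOF'
doc1 en the quick brown fox
doc2 de der schnelle braune fuchs
doc3 en jumps over the lazy dog
EOF

# Run the imports only if Mallet is available in the current directory.
if [ -x bin/mallet ]; then
  bin/mallet import-dir --input demo/web/* --output demo/web-dir.mallet
  bin/mallet import-file --input demo/data.txt --output demo/web-file.mallet
fi
```

If a data file uses a different column order, the line will not match the default pattern and the import may fail, so it is worth checking a few lines by eye first.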

Build the classifier

bin/mallet train-classifier --input acl/acl.mallet --output-classifier acl/acl.maxent.classifier --trainer MaxEnt

Test how it works with unseen data

bin/mallet classify-dir --input datadir --output - --classifier classifier

Evaluation of a Classification Algorithm


Random split, 90% for training and 10% for testing:
./bin/mallet train-classifier --input web.mallet --training-portion 0.9 --trainer MaxEnt

10-fold cross-validation:
./bin/mallet train-classifier --input web.mallet --cross-validation 10 --trainer MaxEnt
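
To make the difference between the two evaluation modes concrete, here is a rough sketch of how a 10-fold partition divides the data. It only illustrates the idea: Mallet does its own partitioning internally, and the file names and the round-robin assignment used here are assumptions for the demo.

```shell
# Hypothetical input: 20 instances, one per line.
seq 1 20 | sed 's/^/doc/' > folds-demo.txt
k=10

# Assign line i to fold (i mod k); a simple round-robin, no shuffling.
awk -v k="$k" '{ print > ("fold-" (NR % k) ".txt") }' folds-demo.txt

# In each round one fold is the test set and the other k-1 folds are training data.
test_size=$(wc -l < fold-0.txt | tr -d ' ')
train_size=$(cat fold-[1-9].txt | wc -l | tr -d ' ')
echo "test=$test_size train=$train_size"
```

Every instance is tested exactly once across the ten rounds, whereas a single 90/10 split tests only one tenth of the data.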


Mallet Sequence Tagging


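Run SimpleTagger (cc.mallet.fst.SimpleTagger) with these parameters to train on a single file and evaluate per-label accuracy on a held-out portion. This is a random train/test split sized by --training-proportion rather than a true n-fold cross-validation; rerunning with different --random-seed values approximates one:
--train true --test lab --threads 2 --iterations 50 crf-input-data.txt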
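
For reference, SimpleTagger expects one token per line, written as whitespace-separated features followed by the label, with a blank line between sequences. The sketch below builds a tiny input file in that shape. The sample tokens are hypothetical, and the classpath assumes a compiled Mallet checkout in the current directory, so adjust it to your installation.

```shell
# Hypothetical training data: features first, label last, blank line between sequences.
cat > crf-input-data.txt <<'EOF'
Bill CAPITALIZED noun
slept non-noun
here LOWERCASE STOP non-noun

Mary CAPITALIZED noun
ran non-noun
EOF

# Count blank-line-separated sequences as a sanity check on the format.
sequences=$(awk 'BEGIN { RS = "" } END { print NR }' crf-input-data.txt)
echo "sequences=$sequences"

# Assumed classpath for a compiled Mallet checkout; skipped if not present.
if [ -d class ] && [ -f lib/mallet-deps.jar ]; then
  java -cp "class:lib/mallet-deps.jar" cc.mallet.fst.SimpleTagger \
    --train true --test lab --threads 2 --iterations 50 crf-input-data.txt
fi
```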


Generalised Expectation


./bin/mallet import-file --input train.file.tsv --output train.file.mallet


./bin/mallet import-file --input test.file.tsv --use-pipe-from train.file.mallet --output test.file.mallet


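Hide the labels to produce unlabeled training vectors (Vectors2Vectors):

java cc.mallet.classify.tui.Vectors2Vectors --input hockey-train.mallet --output hockey.unlabeled.vectors --hide-targets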


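Build the constraints file from labeled features (Vectors2FeatureConstraints):

java cc.mallet.classify.tui.Vectors2FeatureConstraints --input lang-train.mallet --output lang.constraints --features-file labeled-features-lang.tsv --targets heuristic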


Train on the unlabeled vectors with the constraints and report test accuracy:

./bin/mallet train-classifier --training-file ham.train.unlabeled.vectors --testing-file ham.test.mallet --trainer "MaxEntGETrainer,gaussianPriorVariance=0.1,constraintsFile=\"ham.constraints\"" --report test:accuracy



Human generated features:


java cc.mallet.classify.tui.Vectors2FeatureConstraints \
--input baseball-hockey.train.vectors \
--output baseball-hockey.constraints \
--features-file baseball-hockey.labeled_features \
--targets heuristic


Machine generated features:


The target expectations can also be computed exactly from the labeled data instead of being estimated heuristically. To do this, set the targets option to oracle.

java cc.mallet.classify.tui.Vectors2FeatureConstraints \
--input baseball-hockey.train.vectors \
--output baseball-hockey.constraints \
--features-file baseball-hockey.features \
--targets oracle


---


Mallet error "Line # does not match regex" when importing files

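This usually means the input contains characters or lines that the import line regex cannot match. Strip everything except letters, digits, spaces, commas, periods, square brackets and newlines, then import the cleaned file:

tr -dc '[:alnum:][ ,.]\n' < ./inputfile.txt > ./inputfilefixed.txt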
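
To see what the tr filter does to a single line, here is a small sketch, with the character set quoted so the shell passes it to tr unchanged. The sample line is hypothetical, and LC_ALL=C is an added assumption so that tr treats the input as plain bytes on every platform.

```shell
# Hypothetical input line with an accented character, quotes, a tab and "!".
dirty=$(printf 'doc1 en Caf\303\251 "quoted"\ttext!\n')

# Keep only letters, digits, spaces, commas, periods, square brackets and newlines.
# LC_ALL=C is an added assumption so multi-byte characters are dropped byte by byte.
cleaned=$(printf '%s\n' "$dirty" | LC_ALL=C tr -dc '[:alnum:][ ,.]\n')
echo "$cleaned"
# prints: doc1 en Caf quotedtext
```

Note that the tab is deleted too, which joins "quoted" and "text". For tab-separated input, add \t to the set so the column separators survive.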


