MaltParser 1.0.2

org.maltparser.ml.libsvm.malt04
Class LibsvmMalt04

java.lang.Object
  extended by org.maltparser.ml.libsvm.malt04.LibsvmMalt04
All Implemented Interfaces:
LearningMethod

public class LibsvmMalt04
extends Object
implements LearningMethod

Implements an interface to the LIBSVM learner (LIBSVM 2.80 is used). More information about LIBSVM can be found at LIBSVM -- A Library for Support Vector Machines. This class tries to reproduce the behavior of MaltParser 0.4. Unfortunately, to be able to reproduce its results we have to replicate some strange behaviors and bugs:

  1. RightArc{CLASSITEM_SEPARATOR}{ROOT_LABEL} is mapped to the Reduce transition for the Nivre Arc-eager and Nivre Arc-standard algorithms, where {ROOT_LABEL} is specified by the --graph-root_label option and {CLASSITEM_SEPARATOR} by the --guide-classitem_separator option (bug in MaltParser 0.4).
  2. LeftArc{CLASSITEM_SEPARATOR}{ROOT_LABEL} is mapped to the Right Arc transition with the last dependency type in the DEPREL tagset, where {ROOT_LABEL} is specified by the --graph-root_label option and {CLASSITEM_SEPARATOR} by the --guide-classitem_separator option (bug in MaltParser 0.4).
  3. The mapping of RightArc{CLASSITEM_SEPARATOR}{ROOT_LABEL} to Reduce results in an illegal transition, and therefore the default transition (Shift) is used during parsing (indirect bug in MaltParser 0.4).
  4. Null values of the LEMMA, FORM and FEATS columns in the CoNLL shared task format are not written to the instance file (in the new MaltParser this can be controlled by the --libsvm-libsvm_exclude_null and --libsvm-libsvm_exclude_columns options).
  5. If a feature is an output feature other than "OutputColumn(DEPREL, Stack[0])" and it points at a node that has the root as its head, it will not extract the dependency type given by the informative root label; instead it will extract the root label specified by the --graph-root_label option (bug in MaltParser 0.4).
  6. If feature = "Split(InputColumn(FEATS, X), \|)", where X is an arbitrary node, the set of syntactic and/or morphological features will not be ordered correctly according to the LIBSVM format (bug in MaltParser 0.4).
  7. If feature = "Split(InputColumn(FEATS, X), \|)", where X is an arbitrary node, the syntactic and/or morphological features will not be treated as a set. Some treebanks do not follow the CoNLL data format and list an individual syntactic and/or morphological feature twice in the FEATS column (bug in MaltParser 0.4).
  8. Unfortunately there is a minor difference between LIBSVM 2.80 (used by MaltParser 0.4) and the latest version of LIBSVM. Therefore we have to use LIBSVM 2.80 to be able to reproduce the results.
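
Items 6 and 7 above concern how the values of a FEATS column are split on the | character. As a rough illustration only (this is not the MaltParser code), the sketch below shows what treating the split values as a set and emitting their feature indices in ascending order, as the LIBSVM sparse format expects, might look like. The class FeatsSetSketch, the method toSortedIndices and the symbol-to-index map are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of MaltParser.
final class FeatsSetSketch {
    private FeatsSetSketch() {}

    /**
     * Splits a FEATS value on '|' and returns the distinct feature
     * indices in ascending order. Symbols that are missing from the
     * (assumed) symbol table are skipped.
     */
    static List<Integer> toSortedIndices(String feats, Map<String, Integer> symbolTable) {
        // A TreeSet gives both set semantics (no duplicates) and ascending order.
        TreeSet<Integer> indices = new TreeSet<>();
        for (String symbol : feats.split("\\|")) {
            Integer index = symbolTable.get(symbol);
            if (index != null) {
                indices.add(index);
            }
        }
        return new ArrayList<>(indices);
    }
}
```

With this sketch, a FEATS value that repeats a feature contributes that feature only once. LibsvmMalt04 deliberately does not behave this way, because it has to reproduce the MaltParser 0.4 output.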

Since:
1.0
Author:
Johan Hall

Field Summary
protected  Integer learnerMode
          The learner/classifier mode
static String LIBSVM_VERSION
           
protected  String name
          The name of the learner
protected  int numberOfInstances
          Number of processed instances
protected  InstanceModel owner
          The parent instance model
protected  String pathExternalSVMTrain
           
 
Fields inherited from interface org.maltparser.ml.LearningMethod
CLASSIFY, TRAIN
 
Constructor Summary
LibsvmMalt04(InstanceModel owner, Integer learnerMode)
          Constructs a LIBSVM learner.
 
Method Summary
 void addInstance(ClassTable classCodeTable, FeatureVector features)
           
static double atof(String s)
          Returns the double (floating-point) value of the string s
static int atoi(String s)
          Returns the integer value of the string s
protected  void closeInstanceWriter()
          Close the instance writer
 void decreaseNumberOfInstances()
           
 void finalize()
           
 void finalizeSentence(Sentence sentence, DependencyGraph dependencyGraph)
           
 Configuration getConfiguration()
          Returns the current configuration
protected  File getFile(String suffix)
          Returns a file object.
protected  InputStreamReader getInstanceInputStreamReader(String suffix)
          Returns the instance input reader.
protected  OutputStreamWriter getInstanceOutputStreamWriter(String suffix)
          Returns the instance output writer.
 BufferedWriter getInstanceWriter()
           
 Integer getLearnerMode()
          Returns the learner mode
 String getLearningMethodName()
          Returns the name of the learning method
 int getNumberOfInstances()
          Returns the number of processed instances
 InstanceModel getOwner()
          Returns the parent instance model
 String getParamString()
          Returns the parameter string used to configure LIBSVM
 String[] getSVMParamStringArray(libsvm28.svm_parameter param)
           
 void increaseNumberOfInstances()
           
 void initParameters(libsvm28.svm_parameter param)
          Assign a default value to all svm parameters
protected  void initSpecialParameters()
          Initialize the LIBSVM with a coding and a behavior strategy.
protected  void initSvmParam(String paramString)
          Initialize the LIBSVM according to the parameter string
static void maltSVMFormat2OriginalSVMFormat(InputStreamReader isr, OutputStreamWriter osw, ArrayList<Integer> cardinality)
          Converts the instance file (Malt's own SVM format) into the LIBSVM (SVMLight) format.
 void moveAllInstances(LearningMethod method, Feature divideFeature, ArrayList<Integer> divideFeatureIndexVector)
           
 void noMoreInstances()
           
 void parseParameters(String paramstring, libsvm28.svm_parameter param)
          Parses the parameter string.
 boolean predict(FeatureVector features, KBestList kBestList)
           
 void readProblemMaltSVMFormat(InputStreamReader isr, libsvm28.svm_problem prob, ArrayList<Integer> cardinality, libsvm28.svm_parameter param)
          Reads an instance file into a svm_problem object according to the Malt-SVM format, which is a fixed-column, tab-separated format.
static void readProblemOriginalSVMFormat(InputStreamReader isr, libsvm28.svm_problem prob, libsvm28.svm_parameter param)
          Reads an instance file into a svm_problem object according to the LIBSVM (SVMLight) format.
 void setLearnerMode(Integer learnerMode)
          Sets the learner mode
protected  void setLearningMethodName(String name)
          Sets the learning method name
protected  void setNumberOfInstances(int numberOfInstances)
          Sets the number of instances
protected  void setOwner(InstanceModel owner)
          Sets the parent instance model
 String toString()
           
 String toStringParameters(libsvm28.svm_parameter param)
          Returns a string containing all svm-parameters of interest
 void train(FeatureVector features)
           
 
Methods inherited from class java.lang.Object
clone, equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

LIBSVM_VERSION

public static final String LIBSVM_VERSION
See Also:
Constant Field Values

owner

protected InstanceModel owner
The parent instance model


learnerMode

protected Integer learnerMode
The learner/classifier mode


name

protected String name
The name of the learner


numberOfInstances

protected int numberOfInstances
Number of processed instances


pathExternalSVMTrain

protected String pathExternalSVMTrain

Constructor Detail

LibsvmMalt04

public LibsvmMalt04(InstanceModel owner,
                    Integer learnerMode)
             throws MaltChainedException
Constructs a LIBSVM learner.

Parameters:
owner - the guide model owner
learnerMode - the mode of the learner, TRAIN or CLASSIFY
Throws:
MaltChainedException

Method Detail

addInstance

public void addInstance(ClassTable classCodeTable,
                        FeatureVector features)
                 throws MaltChainedException
Specified by:
addInstance in interface LearningMethod
Throws:
MaltChainedException

finalizeSentence

public void finalizeSentence(Sentence sentence,
                             DependencyGraph dependencyGraph)
                      throws MaltChainedException
Specified by:
finalizeSentence in interface LearningMethod
Throws:
MaltChainedException

noMoreInstances

public void noMoreInstances()
                     throws MaltChainedException
Specified by:
noMoreInstances in interface LearningMethod
Throws:
MaltChainedException

train

public void train(FeatureVector features)
           throws MaltChainedException
Specified by:
train in interface LearningMethod
Throws:
MaltChainedException

moveAllInstances

public void moveAllInstances(LearningMethod method,
                             Feature divideFeature,
                             ArrayList<Integer> divideFeatureIndexVector)
                      throws MaltChainedException
Specified by:
moveAllInstances in interface LearningMethod
Throws:
MaltChainedException

predict

public boolean predict(FeatureVector features,
                       KBestList kBestList)
                throws MaltChainedException
Specified by:
predict in interface LearningMethod
Throws:
MaltChainedException

finalize

public void finalize()
              throws MaltChainedException
Specified by:
finalize in interface LearningMethod
Overrides:
finalize in class Object
Throws:
MaltChainedException

getInstanceWriter

public BufferedWriter getInstanceWriter()
Specified by:
getInstanceWriter in interface LearningMethod

closeInstanceWriter

protected void closeInstanceWriter()
                            throws MaltChainedException
Close the instance writer

Throws:
MaltChainedException

initSvmParam

protected void initSvmParam(String paramString)
                     throws MaltChainedException
Initialize the LIBSVM according to the parameter string

Parameters:
paramString - the parameter string to configure the LIBSVM learner.
Throws:
MaltChainedException

initSpecialParameters

protected void initSpecialParameters()
                              throws MaltChainedException
Initialize the LIBSVM with a coding and a behavior strategy. This strategy parameter is used to reproduce the behavior of MaltParser 0.4 (C-impl).

Throws:
MaltChainedException

getParamString

public String getParamString()
Returns the parameter string used to configure LIBSVM

Returns:
the parameter string used to configure LIBSVM

getOwner

public InstanceModel getOwner()
Returns the parent instance model

Returns:
the parent instance model

setOwner

protected void setOwner(InstanceModel owner)
Sets the parent instance model

Parameters:
owner - an instance model

getLearnerMode

public Integer getLearnerMode()
Returns the learner mode

Returns:
the learner mode

setLearnerMode

public void setLearnerMode(Integer learnerMode)
Sets the learner mode

Parameters:
learnerMode - the learner mode

getLearningMethodName

public String getLearningMethodName()
Returns the name of the learning method

Returns:
the name of the learning method

getConfiguration

public Configuration getConfiguration()
                               throws MaltChainedException
Returns the current configuration

Returns:
the current configuration
Throws:
MaltChainedException

getNumberOfInstances

public int getNumberOfInstances()
Returns the number of processed instances

Returns:
the number of processed instances

increaseNumberOfInstances

public void increaseNumberOfInstances()
Specified by:
increaseNumberOfInstances in interface LearningMethod

decreaseNumberOfInstances

public void decreaseNumberOfInstances()
Specified by:
decreaseNumberOfInstances in interface LearningMethod

setNumberOfInstances

protected void setNumberOfInstances(int numberOfInstances)
Sets the number of instances

Parameters:
numberOfInstances - the number of instances

setLearningMethodName

protected void setLearningMethodName(String name)
Sets the learning method name

Parameters:
name - the learning method name

getInstanceOutputStreamWriter

protected OutputStreamWriter getInstanceOutputStreamWriter(String suffix)
                                                    throws MaltChainedException
Returns the instance output writer. The naming of the file is standardized according to the learning method name, but file suffix can vary.

Parameters:
suffix - the file suffix of the file name
Returns:
the instance output writer
Throws:
MaltChainedException

getInstanceInputStreamReader

protected InputStreamReader getInstanceInputStreamReader(String suffix)
                                                  throws MaltChainedException
Returns the instance input reader. The naming of the file is standardized according to the learning method name, but file suffix can vary.

Parameters:
suffix - the file suffix of the file name
Returns:
the instance input reader
Throws:
MaltChainedException

getFile

protected File getFile(String suffix)
                throws MaltChainedException
Returns a file object. The naming of the file is standardized according to the learning method name, but file suffix can vary.

Parameters:
suffix - the file suffix of the file name
Returns:
a file object
Throws:
MaltChainedException

readProblemMaltSVMFormat

public void readProblemMaltSVMFormat(InputStreamReader isr,
                                     libsvm28.svm_problem prob,
                                     ArrayList<Integer> cardinality,
                                     libsvm28.svm_parameter param)
                              throws LibsvmException
Reads an instance file into a svm_problem object according to the Malt-SVM format, which is a fixed-column, tab-separated format.

Parameters:
isr - the instance stream reader for the instance file
prob - a svm_problem object
cardinality - a vector containing the number of distinct values for a particular column.
param - a svm_parameter object
Throws:
LibsvmException

initParameters

public void initParameters(libsvm28.svm_parameter param)
                    throws LibsvmException
Assign a default value to all svm parameters

Parameters:
param - a svm_parameter object
Throws:
LibsvmException

toStringParameters

public String toStringParameters(libsvm28.svm_parameter param)
Returns a string containing all svm-parameters of interest

Parameters:
param - a svm_parameter object
Returns:
a string containing all svm-parameters of interest

getSVMParamStringArray

public String[] getSVMParamStringArray(libsvm28.svm_parameter param)

parseParameters

public void parseParameters(String paramstring,
                            libsvm28.svm_parameter param)
                     throws LibsvmException
Parses the parameter string. The parameter string must contain parameter-value pairs, which are separated by a blank or an underscore. Each parameter begins with the character '-' followed by a one-character flag, and the value must comply with the parameter's data type. Some examples:
-s 0 -t 1 -d 2 -g 0.4 -e 0.1
-s_0_-t_1_-d_2_-g_0.4_-e_0.1

Parameters:
paramstring - the parameter string
param - a svm_parameter object
Throws:
LibsvmException
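
The parameter string syntax described for parseParameters can be sketched as follows. This is a minimal illustration, not the actual implementation: the class ParamStringSketch and its parse method are hypothetical, values are kept as strings rather than written into a svm_parameter object, and malformed input raises IllegalArgumentException instead of LibsvmException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of MaltParser.
final class ParamStringSketch {
    private ParamStringSketch() {}

    /**
     * Splits a parameter string such as "-s 0 -t 1" or "-s_0_-t_1"
     * into flag/value pairs, keeping the order in which they appear.
     */
    static Map<Character, String> parse(String paramString) {
        Map<Character, String> pairs = new LinkedHashMap<>();
        String trimmed = paramString.trim();
        if (trimmed.isEmpty()) {
            return pairs;
        }
        // Pairs may be separated by blanks or by underscores.
        String[] tokens = trimmed.split("[ _]+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Expected flag/value pairs: " + paramString);
        }
        for (int i = 0; i < tokens.length; i += 2) {
            String flag = tokens[i];
            // A flag is '-' followed by exactly one character.
            if (flag.length() != 2 || flag.charAt(0) != '-') {
                throw new IllegalArgumentException("Illegal flag: " + flag);
            }
            pairs.put(flag.charAt(1), tokens[i + 1]);
        }
        return pairs;
    }
}
```

Under these assumptions both example strings given in the description yield the same five pairs. Converting each value to the data type of its parameter is left out of the sketch.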

maltSVMFormat2OriginalSVMFormat

public static void maltSVMFormat2OriginalSVMFormat(InputStreamReader isr,
                                                   OutputStreamWriter osw,
                                                   ArrayList<Integer> cardinality)
                                            throws LibsvmException
Converts the instance file (Malt's own SVM format) into the LIBSVM (SVMLight) format. The input instance file is replaced by the instance file in the LIBSVM (SVMLight) format. If a column contains -1, the value is omitted from the destination file.

Parameters:
isr - the input stream reader for the source instance file
osw - the output stream writer for the destination instance file
cardinality - a vector containing the number of distinct values for a particular column
Throws:
LibsvmException
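
The conversion performed by maltSVMFormat2OriginalSVMFormat can be pictured with the sketch below, which handles a single line. It rests on assumptions that this page does not confirm: that the class code is the first tab-separated column, that every other column holds an integer code starting at 1, and that the LIBSVM feature index is that code plus the summed cardinalities of the preceding columns. The class MaltToLibsvmSketch and the method convertLine are hypothetical.

```java
import java.util.List;

// Hypothetical helper for illustration; the column layout is an assumption.
final class MaltToLibsvmSketch {
    private MaltToLibsvmSketch() {}

    /**
     * Converts one tab-separated line of integer codes into a LIBSVM
     * sparse line "class index:1 index:1 ...". The cardinality list must
     * hold one entry per feature column. Columns containing -1 are omitted.
     */
    static String convertLine(String line, List<Integer> cardinality) {
        String[] columns = line.split("\t");
        // Assumed layout: the class code comes first.
        StringBuilder sb = new StringBuilder(columns[0].trim());
        int offset = 0;
        for (int j = 1; j < columns.length; j++) {
            int code = Integer.parseInt(columns[j].trim());
            if (code != -1) {
                sb.append(' ').append(offset + code).append(":1");
            }
            // Each column gets its own index range of size cardinality.
            offset += cardinality.get(j - 1);
        }
        return sb.toString();
    }
}
```

Because the offset only grows from column to column, the emitted indices are in ascending order as long as each code stays within its column's cardinality.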

atof

public static double atof(String s)
                   throws LibsvmException
Returns the double (floating-point) value of the string s

Parameters:
s - string value that should be converted into a double.
Returns:
the double (floating-point) value of the string s
Throws:
LibsvmException

atoi

public static int atoi(String s)
                throws LibsvmException
Returns the integer value of the string s

Parameters:
s - string value that should be converted into an integer
Returns:
the integer value of the string s
Throws:
LibsvmException
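
The atof and atoi methods above convert strings to numbers and report failures through an exception. A minimal sketch of that pattern, assuming the conversion is delegated to the standard Java parsers, is shown below. The exception type SketchException stands in for LibsvmException, whose constructors are not shown on this page, and the class name NumberParsingSketch is hypothetical.

```java
// Hypothetical helper for illustration; not the MaltParser implementation.
final class NumberParsingSketch {
    private NumberParsingSketch() {}

    /** Stand-in for LibsvmException (an assumption made for this sketch). */
    static final class SketchException extends Exception {
        SketchException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Returns the double value of s, or throws if s is not a number. */
    static double atof(String s) throws SketchException {
        if (s == null) {
            throw new SketchException("Cannot convert null to a double", null);
        }
        try {
            return Double.parseDouble(s.trim());
        } catch (NumberFormatException e) {
            throw new SketchException("Not a floating-point number: " + s, e);
        }
    }

    /** Returns the int value of s, or throws if s is not an integer. */
    static int atoi(String s) throws SketchException {
        if (s == null) {
            throw new SketchException("Cannot convert null to an integer", null);
        }
        try {
            return Integer.parseInt(s.trim());
        } catch (NumberFormatException e) {
            throw new SketchException("Not an integer: " + s, e);
        }
    }
}
```

Wrapping NumberFormatException in a checked exception forces callers to handle malformed parameter values or instance files explicitly.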

readProblemOriginalSVMFormat

public static void readProblemOriginalSVMFormat(InputStreamReader isr,
                                                libsvm28.svm_problem prob,
                                                libsvm28.svm_parameter param)
                                         throws LibsvmException
Reads an instance file into a svm_problem object according to the LIBSVM (SVMLight) format.

Parameters:
isr - the input stream reader for the source instance file
prob - a svm_problem object
param - a svm_parameter object
Throws:
LibsvmException
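
In the LIBSVM (SVMLight) format each line holds a label followed by index:value pairs separated by whitespace. The sketch below parses one such line into plain arrays to show the structure that readProblemOriginalSVMFormat has to recover. It does not use the libsvm28 classes, and the class LibsvmLineSketch with its fields is hypothetical.

```java
// Hypothetical helper for illustration; independent of the libsvm28 package.
final class LibsvmLineSketch {
    final double label;
    final int[] indices;
    final double[] values;

    private LibsvmLineSketch(double label, int[] indices, double[] values) {
        this.label = label;
        this.indices = indices;
        this.values = values;
    }

    /** Parses a line of the form "label index:value index:value ...". */
    static LibsvmLineSketch parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        double label = Double.parseDouble(tokens[0]);
        int n = tokens.length - 1;
        int[] indices = new int[n];
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            // Each feature token is an index and a value joined by ':'.
            String[] pair = tokens[i + 1].split(":", 2);
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected index:value but got " + tokens[i + 1]);
            }
            indices[i] = Integer.parseInt(pair[0]);
            values[i] = Double.parseDouble(pair[1]);
        }
        return new LibsvmLineSketch(label, indices, values);
    }
}
```

A full reader would collect one such record per line and copy the labels and feature vectors into the svm_problem object.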

toString

public String toString()
Overrides:
toString in class Object

Copyright 2007 Johan Hall, Jens Nilsson and Joakim Nivre.