public class SimpleKMeans extends RandomizableClusterer implements NumberOfClustersRequestable, WeightedInstancesHandler, TechnicalInformationHandler
@inproceedings{Arthur2007,
   author = {D. Arthur and S. Vassilvitskii},
   booktitle = {Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms},
   pages = {1027-1035},
   title = {k-means++: the advantages of careful seeding},
   year = {2007}
}
Valid options are:
-N <num> number of clusters. (default 2).
-P Initialize using the k-means++ method.
-V Display std. deviations for centroids.
-M Don't replace missing values with mean/mode.
-A <classname and options> Distance function to use. (default: weka.core.EuclideanDistance)
-I <num> Maximum number of iterations.
-O Preserve order of instances.
-fast Enables faster distance calculations, using cut-off values. Disables the calculation/output of squared errors/distances.
-S <num> Random number seed. (default 10)
See Also: RandomizableClusterer, Serialized Form
Constructor and Description |
---|
SimpleKMeans(): The default constructor. |
Modifier and Type | Method | Description |
---|---|---|
void | buildClusterer(Instances data) | Generates a clusterer. |
int | clusterInstance(Instance instance) | Assigns a given instance to a cluster. |
String | displayStdDevsTipText() | Returns the tip text for this property. |
String | distanceFunctionTipText() | Returns the tip text for this property. |
String | dontReplaceMissingValuesTipText() | Returns the tip text for this property. |
String | fastDistanceCalcTipText() | Returns the tip text for this property. |
int[] | getAssignments() | Gets the assignments for each instance. |
Capabilities | getCapabilities() | Returns default capabilities of the clusterer. |
Instances | getClusterCentroids() | Gets the cluster centroids. |
int[][][] | getClusterNominalCounts() | Returns, for each cluster, the frequency counts for the values of each nominal attribute. |
int[] | getClusterSizes() | Gets the number of instances in each cluster. |
Instances | getClusterStandardDevs() | Gets the standard deviations of the numeric attributes in each cluster. |
boolean | getDisplayStdDevs() | Gets whether standard deviations and nominal counts are displayed. |
DistanceFunction | getDistanceFunction() | Returns the distance function currently in use. |
boolean | getDontReplaceMissingValues() | Gets whether replacement of missing values is disabled. |
boolean | getFastDistanceCalc() | Gets whether to use faster distance calculation. |
boolean | getInitializeUsingKMeansPlusPlusMethod() | Gets whether to initialize using the probabilistic, farthest-first-like method of the k-means++ algorithm (rather than the standard random selection of initial cluster centers). |
int | getMaxIterations() | Gets the maximum number of iterations to be executed. |
int | getNumClusters() | Gets the number of clusters to generate. |
String[] | getOptions() | Gets the current settings of SimpleKMeans. |
boolean | getPreserveInstancesOrder() | Gets whether the order of instances must be preserved. |
String | getRevision() | Returns the revision string. |
double | getSquaredError() | Gets the squared error for all clusters. |
TechnicalInformation | getTechnicalInformation() | Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on. |
String | globalInfo() | Returns a string describing this clusterer. |
String | initializeUsingKMeansPlusPlusMethodTipText() | Returns the tip text for this property. |
Enumeration | listOptions() | Returns an enumeration describing the available options. |
static void | main(String[] args) | Main method for executing this class. |
String | maxIterationsTipText() | Returns the tip text for this property. |
int | numberOfClusters() | Returns the number of clusters. |
String | numClustersTipText() | Returns the tip text for this property. |
String | preserveInstancesOrderTipText() | Returns the tip text for this property. |
void | setDisplayStdDevs(boolean stdD) | Sets whether standard deviations and nominal counts are displayed. |
void | setDistanceFunction(DistanceFunction df) | Sets the distance function to use for instance comparison. |
void | setDontReplaceMissingValues(boolean r) | Sets whether replacement of missing values is disabled. |
void | setFastDistanceCalc(boolean value) | Sets whether to use faster distance calculation. |
void | setInitializeUsingKMeansPlusPlusMethod(boolean k) | Sets whether to initialize using the probabilistic, farthest-first-like method of the k-means++ algorithm (rather than the standard random selection of initial cluster centers). |
void | setMaxIterations(int n) | Sets the maximum number of iterations to be executed. |
void | setNumClusters(int n) | Sets the number of clusters to generate. |
void | setOptions(String[] options) | Parses a given list of options. |
void | setPreserveInstancesOrder(boolean r) | Sets whether the order of instances must be preserved. |
String | toString() | Returns a string describing this clusterer. |
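Methods inherited from class RandomizableClusterer: getSeed, seedTipText, setSeed
Methods inherited from class AbstractClusterer: distributionForInstance, forName, makeCopies, makeCopy, runClusterer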
public TechnicalInformation getTechnicalInformation()
Specified by: getTechnicalInformation in interface TechnicalInformationHandler
public String globalInfo()
public Capabilities getCapabilities()
Specified by: getCapabilities in interface Clusterer
Specified by: getCapabilities in interface CapabilitiesHandler
Overrides: getCapabilities in class AbstractClusterer
See Also: Capabilities
public void buildClusterer(Instances data) throws Exception
Specified by: buildClusterer in interface Clusterer
Specified by: buildClusterer in class AbstractClusterer
Parameters: data - set of instances serving as training data
Throws: Exception - if the clusterer has not been generated successfully
public int clusterInstance(Instance instance) throws Exception
Specified by: clusterInstance in interface Clusterer
Overrides: clusterInstance in class AbstractClusterer
Parameters: instance - the instance to be assigned to a cluster
Throws: Exception - if the instance could not be classified successfully
public int numberOfClusters() throws Exception
Specified by: numberOfClusters in interface Clusterer
Specified by: numberOfClusters in class AbstractClusterer
Throws: Exception - if the number of clusters could not be returned successfully
public Enumeration listOptions()
Specified by: listOptions in interface OptionHandler
Overrides: listOptions in class RandomizableClusterer
public String numClustersTipText()
public void setNumClusters(int n) throws Exception
Specified by: setNumClusters in interface NumberOfClustersRequestable
Parameters: n - the number of clusters to generate
Throws: Exception - if the number of clusters is negative
public int getNumClusters()
public String initializeUsingKMeansPlusPlusMethodTipText()
public void setInitializeUsingKMeansPlusPlusMethod(boolean k)
Parameters: k - true if the k-means++ method is to be used to select initial cluster centers
public boolean getInitializeUsingKMeansPlusPlusMethod()
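When the k-means++ option (-P) is enabled, initial centers are chosen by the probabilistic, farthest-first-like seeding of Arthur and Vassilvitskii (2007) instead of uniformly at random. The following is a minimal sketch of that D² sampling over plain double[] points. The class and method names are hypothetical, the sketch uses unnormalized squared Euclidean distance, and it is not Weka's own code.

```java
import java.util.Random;

/** Illustrative k-means++ (D^2) seeding; hypothetical names, not Weka's implementation. */
final class KMeansPlusPlusSeeder {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Chooses k initial centers: the first uniformly at random, each further one with
     * probability proportional to its squared distance to the nearest center chosen so far.
     */
    static double[][] seed(double[][] points, int k, Random rnd) {
        int n = points.length;
        if (k < 1 || k > n) throw new IllegalArgumentException("need 1 <= k <= points.length");
        double[][] centers = new double[k][];
        centers[0] = points[rnd.nextInt(n)].clone();
        // minDist[i]: squared distance from point i to its nearest chosen center
        double[] minDist = new double[n];
        for (int i = 0; i < n; i++) minDist[i] = squaredDistance(points[i], centers[0]);
        for (int c = 1; c < k; c++) {
            double total = 0.0;
            int lastPositive = -1;
            for (int i = 0; i < n; i++) {
                total += minDist[i];
                if (minDist[i] > 0.0) lastPositive = i;
            }
            int chosen;
            if (lastPositive < 0) {
                chosen = rnd.nextInt(n); // every point coincides with a chosen center
            } else {
                // Sample index i with probability minDist[i] / total.
                double r = rnd.nextDouble() * total;
                double cumulative = 0.0;
                chosen = lastPositive; // fallback against floating-point round-off
                for (int i = 0; i < n; i++) {
                    cumulative += minDist[i];
                    if (cumulative > r) { chosen = i; break; }
                }
            }
            centers[c] = points[chosen].clone();
            // Update each point's distance to its nearest center.
            for (int i = 0; i < n; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points[i], centers[c]));
            }
        }
        return centers;
    }
}
```

Points that already coincide with a chosen center get zero weight, so distinct points are never picked twice. Passing a Random with a fixed seed makes the choice reproducible, in the same spirit as the -S option.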
public String maxIterationsTipText()
public void setMaxIterations(int n) throws Exception
Parameters: n - the maximum number of iterations
Throws: Exception - if the maximum number of iterations is smaller than 1
public int getMaxIterations()
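The maximum-iterations setting caps the usual assign-and-update loop of k-means. The sketch below shows that loop (Lloyd's algorithm) on raw double[] points. Each point moves to its nearest center, the kind of assignment clusterInstance performs, and then each center becomes the mean of its members. The loop stops early once no assignment changes. Names are hypothetical, and empty clusters simply keep their previous center here, which is an assumption and not necessarily what SimpleKMeans does.

```java
/** Illustrative Lloyd iteration capped at maxIterations; hypothetical names, not Weka's code. */
final class LloydKMeans {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center nearest to x. */
    static int nearest(double[] x, double[][] centers) {
        int best = 0;
        double bestDist = squaredDistance(x, centers[0]);
        for (int c = 1; c < centers.length; c++) {
            double d = squaredDistance(x, centers[c]);
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    /** Refines centers in place and returns the final cluster index of each point. */
    static int[] run(double[][] points, double[][] centers, int maxIterations) {
        if (maxIterations < 1) throw new IllegalArgumentException("maxIterations must be >= 1");
        int n = points.length, k = centers.length, dim = points[0].length;
        int[] assignment = new int[n];
        java.util.Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: move each point to its nearest center.
            for (int i = 0; i < n; i++) {
                int c = nearest(points[i], centers);
                if (c != assignment[i]) { assignment[i] = c; changed = true; }
            }
            if (!changed) break; // converged: no point switched clusters
            // Update step: each center becomes the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += points[i][j];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // empty cluster keeps its old center
                for (int j = 0; j < dim; j++) centers[c][j] = sums[c][j] / counts[c];
            }
        }
        return assignment;
    }
}
```

Starting from centers (0, 0) and (0, 1), the points (0, 0), (0, 1), (10, 10), (10, 11) settle into clusters {0, 0, 1, 1} with centers (0, 0.5) and (10, 10.5) after three passes, well below a typical iteration cap.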
public String displayStdDevsTipText()
public void setDisplayStdDevs(boolean stdD)
Parameters: stdD - true if std. devs and counts should be displayed
public boolean getDisplayStdDevs()
public String dontReplaceMissingValuesTipText()
public void setDontReplaceMissingValues(boolean r)
Parameters: r - true if missing values are not to be replaced (i.e. mean/mode replacement is disabled)
public boolean getDontReplaceMissingValues()
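Mean/mode replacement fills a missing numeric value with the attribute's mean and a missing nominal value with its most frequent value. The following is a minimal sketch that assumes missing numeric values are encoded as NaN and missing nominal values as the hypothetical code -1. The class and method names are illustrative, and this is not Weka's implementation.

```java
/** Illustrative mean/mode imputation; hypothetical names and encodings, not Weka's code. */
final class MissingValueImputer {

    /** Replaces NaN entries of a numeric column with the mean of the observed values. */
    static void replaceWithMean(double[] column) {
        double sum = 0.0;
        int observed = 0;
        for (double v : column) {
            if (!Double.isNaN(v)) { sum += v; observed++; }
        }
        if (observed == 0) return; // nothing to average; leave the column unchanged
        double mean = sum / observed;
        for (int i = 0; i < column.length; i++) {
            if (Double.isNaN(column[i])) column[i] = mean;
        }
    }

    /** Replaces -1 (the assumed "missing" code) in a nominal column with its most frequent value. */
    static void replaceWithMode(int[] column, int numValues) {
        int[] counts = new int[numValues];
        for (int v : column) if (v >= 0) counts[v]++;
        int mode = 0;
        for (int v = 1; v < numValues; v++) if (counts[v] > counts[mode]) mode = v;
        if (counts[mode] == 0) return; // every entry is missing
        for (int i = 0; i < column.length; i++) if (column[i] < 0) column[i] = mode;
    }
}
```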
public String distanceFunctionTipText()
public DistanceFunction getDistanceFunction()
public void setDistanceFunction(DistanceFunction df) throws Exception
Parameters: df - the new distance function to use
Throws: Exception - if instances cannot be processed
public String preserveInstancesOrderTipText()
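The distance function set with setDistanceFunction (option -A, default weka.core.EuclideanDistance) decides which center an instance is closest to. To illustrate why attribute scaling matters, the sketch below computes Euclidean and Manhattan distances after rescaling each attribute into [0, 1] by its observed range. Weka's EuclideanDistance normalizes attributes by default, but this standalone class and its names are hypothetical, and it does not reproduce Weka's handling of nominal or missing values.

```java
/** Illustrative min-max normalized distances; hypothetical names, not Weka's classes. */
final class NormalizedDistances {

    /** Per-attribute {min, max} over the data, used to rescale values into [0, 1]. */
    static double[][] ranges(double[][] data) {
        int dim = data[0].length;
        double[][] r = new double[dim][2];
        for (int j = 0; j < dim; j++) {
            r[j][0] = Double.POSITIVE_INFINITY;
            r[j][1] = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                r[j][0] = Math.min(r[j][0], row[j]);
                r[j][1] = Math.max(r[j][1], row[j]);
            }
        }
        return r;
    }

    /** Rescaled difference of one attribute; a constant attribute contributes 0. */
    static double diff(double a, double b, double[] range) {
        double width = range[1] - range[0];
        return width == 0.0 ? 0.0 : (a - b) / width;
    }

    static double euclidean(double[] a, double[] b, double[][] ranges) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = diff(a[j], b[j], ranges[j]);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static double manhattan(double[] a, double[] b, double[][] ranges) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) sum += Math.abs(diff(a[j], b[j], ranges[j]));
        return sum;
    }
}
```

For rows (0, 0) and (10, 100), both attributes span their full range, so the normalized Euclidean distance is sqrt(2) and the Manhattan distance is 2. Without rescaling, the second attribute would dominate both.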
public void setPreserveInstancesOrder(boolean r)
Parameters: r - true if the order of instances is to be preserved
public boolean getPreserveInstancesOrder()
public String fastDistanceCalcTipText()
public void setFastDistanceCalc(boolean value)
Parameters: value - true if faster calculation is to be used
public boolean getFastDistanceCalc()
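The -fast option speeds up distance calculations with cut-off values. One standard way to do this is early abandoning, sketched below under that assumption. While searching for the nearest center, the best distance found so far serves as the cut-off, and any partial sum that exceeds it is abandoned. Abandoned candidates never get an exact distance, which is consistent with -fast disabling the output of squared errors. The class and method names are hypothetical.

```java
/** Illustrative nearest-center search with an early-abandoning distance; not Weka's code. */
final class CutoffDistance {

    /**
     * Squared Euclidean distance that stops as soon as the partial sum exceeds cutoff,
     * returning +Infinity in that case because the exact value is no longer needed.
     */
    static double squaredDistance(double[] a, double[] b, double cutoff) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
            if (sum > cutoff) return Double.POSITIVE_INFINITY;
        }
        return sum;
    }

    /** Nearest center, using the best distance found so far as the cut-off. */
    static int nearest(double[] x, double[][] centers) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double d = squaredDistance(x, centers[c], bestDist);
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }
}
```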
public void setOptions(String[] options) throws Exception
Parses a given list of options. Valid options are:
-N <num> number of clusters. (default 2).
-P Initialize using the k-means++ method.
-V Display std. deviations for centroids.
-M Don't replace missing values with mean/mode.
-A <classname and options> Distance function to use. (default: weka.core.EuclideanDistance)
-I <num> Maximum number of iterations.
-O Preserve order of instances.
-fast Enables faster distance calculations, using cut-off values. Disables the calculation/output of squared errors/distances.
-S <num> Random number seed. (default 10)
Specified by: setOptions in interface OptionHandler
Overrides: setOptions in class RandomizableClusterer
Parameters: options - the list of options as an array of strings
Throws: Exception - if an option is not supported
public String[] getOptions()
Specified by: getOptions in interface OptionHandler
Overrides: getOptions in class RandomizableClusterer
public String toString()
public Instances getClusterCentroids()
public Instances getClusterStandardDevs()
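getClusterStandardDevs reports, for each cluster, the spread of each numeric attribute around the cluster centroid. The following is a minimal sketch over raw double[] points and a known assignment array. The n - 1 denominator and the NaN result for clusters with fewer than two members are assumptions of this sketch, not documented behaviour of SimpleKMeans, and the names are hypothetical.

```java
/** Illustrative per-cluster, per-attribute standard deviation; hypothetical names, not Weka's code. */
final class ClusterStdDevs {

    /**
     * Returns sd[c][j], the sample standard deviation (n - 1 denominator, an assumption)
     * of attribute j among the points assigned to cluster c; NaN if fewer than two points.
     */
    static double[][] compute(double[][] points, int[] assignment, int k) {
        int dim = points[0].length;
        // First pass: per-cluster means.
        double[][] mean = new double[k][dim];
        int[] count = new int[k];
        for (int i = 0; i < points.length; i++) {
            count[assignment[i]]++;
            for (int j = 0; j < dim; j++) mean[assignment[i]][j] += points[i][j];
        }
        for (int c = 0; c < k; c++)
            for (int j = 0; j < dim; j++) mean[c][j] /= Math.max(count[c], 1);
        // Second pass: accumulate squared deviations from the cluster mean.
        double[][] sd = new double[k][dim];
        for (int i = 0; i < points.length; i++) {
            int c = assignment[i];
            for (int j = 0; j < dim; j++) {
                double d = points[i][j] - mean[c][j];
                sd[c][j] += d * d;
            }
        }
        for (int c = 0; c < k; c++)
            for (int j = 0; j < dim; j++)
                sd[c][j] = count[c] < 2 ? Double.NaN : Math.sqrt(sd[c][j] / (count[c] - 1));
        return sd;
    }
}
```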
public int[][][] getClusterNominalCounts()
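The int[][][] returned by getClusterNominalCounts reads naturally as counts[cluster][attribute][value], although that index order is an assumption here. The sketch builds such a table from nominal value indices and an assignment array. It also derives each cluster's mode, the nominal counterpart of the mean used for numeric attributes. All names are hypothetical, and the sketch is not Weka's implementation.

```java
/** Illustrative counts[cluster][attribute][value] table and per-cluster mode; not Weka's code. */
final class NominalCounts {

    /**
     * nominal[i][a] is the value index of nominal attribute a for instance i;
     * numValues[a] is the number of values attribute a can take.
     */
    static int[][][] count(int[][] nominal, int[] assignment, int k, int[] numValues) {
        int[][][] counts = new int[k][numValues.length][];
        for (int c = 0; c < k; c++)
            for (int a = 0; a < numValues.length; a++) counts[c][a] = new int[numValues[a]];
        for (int i = 0; i < nominal.length; i++)
            for (int a = 0; a < numValues.length; a++) counts[assignment[i]][a][nominal[i][a]]++;
        return counts;
    }

    /** Most frequent value of attribute a in cluster c (ties go to the lower value index). */
    static int mode(int[][][] counts, int c, int a) {
        int[] row = counts[c][a];
        int best = 0;
        for (int v = 1; v < row.length; v++) if (row[v] > row[best]) best = v;
        return best;
    }
}
```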
public double getSquaredError()
See Also: m_FastDistanceCalc
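getSquaredError reports the within-cluster sum of squared errors: the squared distance from each instance to the centroid of its cluster, summed over all instances. The sketch computes it with plain, unnormalized Euclidean distance, so its numbers need not match Weka's. Weka's value depends on the configured distance function and is not computed when fast distance calculation is on. The class name is hypothetical.

```java
/** Illustrative within-cluster sum of squared errors; hypothetical names, not Weka's code. */
final class SquaredError {

    /** Sum over all points of the squared Euclidean distance to the center of the assigned cluster. */
    static double compute(double[][] points, double[][] centers, int[] assignment) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            double[] c = centers[assignment[i]];
            for (int j = 0; j < c.length; j++) {
                double d = points[i][j] - c[j];
                total += d * d;
            }
        }
        return total;
    }
}
```

Consider points (0, 0), (0, 1), (10, 10) and (10, 11) assigned to centers (0, 0.5) and (10, 10.5). Each point contributes 0.25, for a total of 1.0.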
public int[] getClusterSizes()
public int[] getAssignments() throws Exception
Throws: Exception - if the order of instances wasn't preserved or no assignments were made
public String getRevision()
Specified by: getRevision in interface RevisionHandler
Overrides: getRevision in class AbstractClusterer
public static void main(String[] args)
Parameters: args - use -h to list all parameters
Copyright © 2012 University of Waikato, Hamilton, NZ. All Rights Reserved.