The following algorithms are available as operators within our Pipeline tool.
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Linear
PYSPARK NAME:
SVMWithSGD
INPUT:
LabeledPoint, weights, intercept, iterations
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Linear
PYSPARK NAME:
LogisticRegressionWithLBFGS / LogisticRegressionWithSGD
INPUT:
LabeledPoint, weights, intercept, iterations
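Both classifiers listed above are linear models: once trained, each holds a weight vector and an intercept and scores a feature vector by their dot product. The plain-Python sketch below illustrates only that scoring step. It does not use Spark, and the function names and the 0.0 and 0.5 thresholds are assumptions chosen for illustration, not the Pipeline operator's interface.

```python
import math


def margin(features, weights, intercept):
    """Raw linear score: dot(weights, features) + intercept."""
    return sum(w * x for w, x in zip(weights, features)) + intercept


def svm_predict(features, weights, intercept, threshold=0.0):
    """Linear SVM style decision: class 1 if the margin exceeds the threshold."""
    return 1 if margin(features, weights, intercept) > threshold else 0


def logistic_predict(features, weights, intercept, threshold=0.5):
    """Logistic regression style decision: squash the margin, then threshold."""
    probability = 1.0 / (1.0 + math.exp(-margin(features, weights, intercept)))
    return 1 if probability > threshold else 0


# Hypothetical model parameters, for illustration only.
example_weights = [1.0, -2.0]
example_intercept = 0.5
```

With these example parameters, the feature vector `[3.0, 1.0]` has margin 1.5 and lands in class 1 under both rules, while `[0.0, 1.0]` has margin -1.5 and lands in class 0.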
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Linear
PYSPARK NAME:
LinearRegressionWithSGD
INPUT:
LabeledPoint, initialWeights, regParam, regType
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Linear
PYSPARK NAME:
RidgeRegressionWithSGD
INPUT:
LabeledPoint, initialWeights, regParam, regType
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Linear
PYSPARK NAME:
LassoWithSGD
INPUT:
LabeledPoint, initialWeights, regParam, regType
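The three regression operators above share the `regParam` and `regType` inputs, which control how strongly, and in what form, large weights are penalized. The sketch below shows one way such a penalized least-squares objective can be written in plain Python. The function names, the `(label, features)` tuple layout, and the exact scaling constants (the factors of 1/2) are assumptions for illustration and may differ from what the operator computes internally.

```python
def penalty(weights, reg_param, reg_type=None):
    """Regularization term: none, L1 (lasso style) or L2 (ridge style)."""
    if reg_type is None:
        return 0.0
    if reg_type == "l1":
        return reg_param * sum(abs(w) for w in weights)
    if reg_type == "l2":
        return 0.5 * reg_param * sum(w * w for w in weights)
    raise ValueError("reg_type must be None, 'l1' or 'l2'")


def objective(points, weights, reg_param=0.0, reg_type=None):
    """Half mean squared error over (label, features) pairs, plus the penalty."""
    squared_error = 0.0
    for label, features in points:
        prediction = sum(w * x for w, x in zip(weights, features))
        squared_error += (prediction - label) ** 2
    return squared_error / (2 * len(points)) + penalty(weights, reg_param, reg_type)


# Hypothetical toy data: two points, two features each.
toy_points = [(1.0, [1.0, 0.0]), (2.0, [0.0, 1.0])]
toy_weights = [1.0, 1.0]
```

Calling `objective(toy_points, toy_weights, 0.1, "l1")` and then the same call with `"l2"` shows how the choice of `regType` changes the value being minimized while the data term stays the same.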
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Linear
PYSPARK NAME:
IsotonicRegressionModel
INPUT:
boundaries(LabeledPoint), predictions, isotonic
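An isotonic regression model is described by two parallel lists, `boundaries` and `predictions`, and predicts by piecewise-linear interpolation between them. The following is a minimal plain-Python sketch of that prediction rule, assuming `boundaries` is sorted in ascending order. The function name `isotonic_predict` and the sample values are hypothetical, and the `isotonic` flag (increasing versus decreasing fit) affects training only, so it does not appear here.

```python
from bisect import bisect_left


def isotonic_predict(x, boundaries, predictions):
    """Piecewise-linear prediction from sorted boundaries and their predictions."""
    # Outside the fitted range, clamp to the first or last prediction.
    if x <= boundaries[0]:
        return predictions[0]
    if x >= boundaries[-1]:
        return predictions[-1]
    i = bisect_left(boundaries, x)
    # An exact match on a boundary returns its stored prediction.
    if boundaries[i] == x:
        return predictions[i]
    # Otherwise interpolate linearly between the two neighbouring boundaries.
    x0, x1 = boundaries[i - 1], boundaries[i]
    y0, y1 = predictions[i - 1], predictions[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical fitted model, for illustration only.
example_boundaries = [1.0, 2.0, 4.0]
example_predictions = [1.0, 3.0, 3.0]
```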
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Linear
PYSPARK NAME:
StreamingLinearRegressionWithSGD
INPUT:
LabeledPoint, stepSize, numIterations, miniBatchFraction, convergenceTol*
BASE:
MLlib: RDD-Based
CATEGORY:
Collaborative filtering
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
ALS
INPUT:
Ratings, rank, nonnegative
BASE:
MLlib: RDD-Based
CATEGORY:
Clustering
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
KMeans
INPUT:
RDD, k, maxIterations, epsilon
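To show how `k`, `maxIterations` and `epsilon` interact, here is a small single-machine sketch of Lloyd's k-means iteration in plain Python. It is not the distributed MLlib implementation: it assumes the input is an in-memory list with at least `k` points, and it seeds the centers with the first `k` points, which is a simplification chosen to keep the example deterministic.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100, epsilon=1e-4):
    """Lloyd's algorithm; stops early once no center moves more than epsilon."""
    # Assumption: seed with the first k points instead of a randomized scheme.
    centers = [list(p) for p in points[:k]]
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centers)
        ]
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= epsilon ** 2:
            break
    return centers


# Hypothetical toy data: two well-separated groups.
toy_points = [[0, 0], [10, 10], [0, 1], [10, 11]]
toy_centers = kmeans(toy_points, k=2)
```

A larger `epsilon` ends the loop sooner at the cost of less precise centers, while `maxIterations` caps the work when the centers keep moving.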
BASE:
MLlib: RDD-Based
CATEGORY:
Clustering
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
GaussianMixture
INPUT:
RDD, k, convergenceTol
BASE:
MLlib: RDD-Based
CATEGORY:
Clustering
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
PowerIterationClustering
INPUT:
RDD, k
BASE:
MLlib: RDD-Based
CATEGORY:
Clustering
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
LDA
INPUT:
RDD, k, docConcentration, topicConcentration, checkpointInterval, optimizer
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
NaiveBayesModel
INPUT:
LabeledPoint, pi, theta
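In a naive Bayes model, `pi` holds the log prior of each class and `theta` holds the per-class log conditional probabilities of each feature. A minimal plain-Python sketch of multinomial-style prediction from those two inputs follows. The `labels` argument is not listed in the inputs above and is added here for illustration, as are the function name and the sample probabilities.

```python
import math


def naive_bayes_predict(features, labels, pi, theta):
    """Return the label whose log prior plus log likelihood is largest."""
    scores = [
        pi[c] + sum(t * x for t, x in zip(theta[c], features))
        for c in range(len(labels))
    ]
    best = max(range(len(labels)), key=lambda c: scores[c])
    return labels[best]


# Hypothetical two-class model over two count features.
example_labels = [0.0, 1.0]
example_pi = [math.log(0.5), math.log(0.5)]
example_theta = [
    [math.log(0.9), math.log(0.1)],  # class 0.0 favours the first feature
    [math.log(0.2), math.log(0.8)],  # class 1.0 favours the second feature
]
```

Because the scores are sums of logs, a feature vector such as `[3, 0]` is pulled toward the class whose `theta` row gives the first feature a high probability.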
BASE:
MLlib: RDD-Based
CATEGORY:
Hierarchical clustering
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
BisectingKMeans
INPUT:
RDD, k, minDivisibleClusterSize
BASE:
MLlib: RDD-Based
CATEGORY:
Dimensionality reduction
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
RowMatrix.computeSVD
INPUT:
k, computeU, rCond
BASE:
MLlib: RDD-Based
CATEGORY:
Clustering
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
StreamingKMeans
INPUT:
LabeledPoint, k, decayFactor, timeUnit
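The `decayFactor` input controls how quickly a streaming k-means model forgets older data. The sketch below illustrates the commonly documented center update for a single cluster in plain Python. The function name and arguments are hypothetical, it assumes the batch contributes at least one point to the cluster, and it leaves out how the running weight itself is carried forward between batches.

```python
def update_center(center, weight, batch_mean, batch_count, decay_factor):
    """Blend the old center with the mean of the points from the new batch.

    center       current cluster center
    weight       number of points the center has absorbed so far
    batch_mean   mean of the new batch's points assigned to this cluster
    batch_count  how many new points were assigned (assumed > 0)
    decay_factor 1.0 keeps all history, 0.0 uses only the newest batch
    """
    discounted = weight * decay_factor
    total = discounted + batch_count
    return [
        (c * discounted + x * batch_count) / total
        for c, x in zip(center, batch_mean)
    ]
```

With `decay_factor=1.0` the update is a plain running average over all data seen so far; with `decay_factor=0.0` the previous center is discarded and the new batch mean is used as-is.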
BASE:
MLlib: RDD-Based
CATEGORY:
Dimensionality reduction
SUBCATEGORY:
Nonlinear
PYSPARK NAME:
RowMatrix.computePrincipalComponents
INPUT:
k
BASE:
MLlib: RDD-Based
CATEGORY:
Frequent Pattern Mining
SUBCATEGORY:
Data mining
PYSPARK NAME:
FPGrowth
INPUT:
RDD, minSupport, numPartitions
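The `minSupport` input is the minimum fraction of transactions in which an itemset must appear to be reported as frequent. The plain-Python sketch below illustrates that threshold using brute-force counting of every subset, not the FP-tree structure that FP-growth uses, so it is only suitable for tiny inputs. The function name and the sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Count every itemset by brute force and keep those meeting min_support."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        # Enumerate all non-empty subsets of the transaction (exponential cost).
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    threshold = min_support * len(transactions)
    return {itemset: n for itemset, n in counts.items() if n >= threshold}


# Hypothetical toy transactions, for illustration only.
toy_transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
toy_frequent = frequent_itemsets(toy_transactions, min_support=0.5)
```

Raising `min_support` shrinks the result: at 0.75 only the single items `a` and `b`, each present in three of the four transactions, remain.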
BASE:
MLlib: RDD-Based
CATEGORY:
Frequent Pattern Mining
SUBCATEGORY:
Data mining
PYSPARK NAME:
PrefixSpan
INPUT:
RDD, minSupport, maxPatternLength, maxLocalProjDBSize
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Tree
PYSPARK NAME:
DecisionTreeModel
INPUT:
LabeledPoint, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins
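The `impurity` input selects the measure a decision tree uses to judge how mixed the labels in a node are when it evaluates candidate splits. Two common choices for classification, Gini impurity and entropy, can be sketched in plain Python as follows. The function names are illustrative, and the base-2 logarithm for entropy is an assumption about the convention in use.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class proportions, in bits (log base 2)."""
    if not labels:
        return 0.0
    n = len(labels)
    return -sum(
        (count / n) * math.log2(count / n) for count in Counter(labels).values()
    )
```

Both measures are zero for a node whose labels are all the same and largest when the classes are evenly mixed, which is why a split that lowers impurity is preferred.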
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Tree
PYSPARK NAME:
RandomForestModel
INPUT:
LabeledPoint, numClasses, categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins
BASE:
MLlib: RDD-Based
CATEGORY:
Classification/Regression
SUBCATEGORY:
Tree
PYSPARK NAME:
GradientBoostedTreesModel
INPUT:
LabeledPoint, categoricalFeaturesInfo, loss, numIterations, learningRate, maxDepth, maxBins