SVM Talking algos: Using Interior Point Methods for SVM Training
Training Support Vector Classifier from scratch using optimisation methods
Alright, folks! We have already discussed the preliminaries of constrained optimisation and formulated the Support Vector Machine classification problem as a quadratic program (QP). Today we learn how to train a Support Vector Classifier numerically.
This article can be split into two parts: theory and practice! If you have come across Interior Point methods before, you can just skip the theoretical part and scroll down to our practical example!
First, we try to provide some intuition for Interior Point Methods, starting from Barrier Methods. Second, we formulate the Primal-dual method that comes directly from the modified KKT conditions. This might not make sense right now, but it will as you read! Finally, we code our classifier from scratch; that's where all the building blocks learnt before come together.
Theory! Support Vector Classifier as QP
We start off by recalling that the Support Vector Classifier can be expressed as a quadratic program through its dual.
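In its standard soft-margin form (following [1]; the box parameter C is an assumption about the article's exact setup), the dual reads

$$\min_{\alpha}\ \tfrac{1}{2}\,\alpha^\top D\,\alpha \;-\; \mathbf{1}^\top \alpha \qquad \text{s.t.}\quad y^\top \alpha = 0,\quad 0 \le \alpha_i \le C,$$

where $D_{ij} = y_i\, y_j\, x_i^\top x_j$ is built from the training points $x_i$ and labels $y_i \in \{-1, +1\}$.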
This is our optimisation problem! Let's now learn how to tackle it numerically.
Build your intuition: Interior Point Methods
The important idea behind interior-point methods is that they try to solve the problem (written above) or the KKT conditions (formulated for the problem above) by applying Newton's method to a sequence of equality constrained problems, or to a sequence of modified versions of the KKT conditions. Two natural questions arise: where do the inequality constraints disappear to, and what are the modified versions of the KKT conditions?
The former can be easily seen through Barrier methods, which help build intuition for the latter. Let's jump straight into it!
Where do inequality constraints disappear?
We attempt to approximately formulate the inequality constrained problem (written above) as an equality constrained problem to which Newton's method can be applied. Our first step is to rewrite the problem (let's take the general form discussed here) by making the inequality constraints implicit in the objective. This is done by introducing the indicator function. Hence we have
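$$\min_x\ f_0(x) + \sum_{i=1}^{m} I_-\big(f_i(x)\big) \qquad \text{s.t.}\quad Ax = b,$$

where $I_-(u) = 0$ for $u \le 0$ and $I_-(u) = \infty$ for $u > 0$ (the general form from [1]).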
So far so good. The only problem here is that this objective is not differentiable, which means that our assumptions are violated and we cannot apply Newton's method to it. So what can we do?
We approximate it one more time! Now we want to approximate the indicator function with something that is differentiable. Luckily, we have one! We approximate the indicator function by another function
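$$\hat{I}_-(u) = -\frac{1}{t}\,\log(-u), \qquad \operatorname{dom} \hat{I}_- = -\mathbb{R}_{++},$$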
where t>0 is a parameter that sets the accuracy of the approximation.
The graph shows the accuracy of the approximation for t = 0.5, 2, 4, 5. The blue line (t = 5) demonstrates that as the parameter t grows, we get a better approximation.
This gives us a new optimisation problem. By multiplying the objective by t, we get an equivalent minimisation problem that has the same minimisers:
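$$\min_x\ t\,f_0(x) + \phi(x) \qquad \text{s.t.}\quad Ax = b, \qquad \text{where}\quad \phi(x) = -\sum_{i=1}^{m} \log\big(-f_i(x)\big).$$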
The new function φ is called the logarithmic barrier; it accounts for the inequality constraints.
What are the modified KKT conditions?
We now consider the problem written above. For every t>0 we define x*(t) as its solution. As we apply the Newton method, these solutions x*(t) are our central points, and together they form the central path. The central path is strictly feasible, which means that the equality constraints are satisfied at the central points as well. We recall that minimisers are found where the gradient is zero. We write the Lagrangian
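$$L(x, \nu) = t\,f_0(x) + \phi(x) + \nu^\top (Ax - b).$$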
Wondering where nu comes from? As before, we write our minimisation problem all in one expression in the form of a Lagrangian (which means augmenting the objective function with a Lagrange multiplier for the equality constraints). Then we take the derivative and set it equal to zero; we recall that this is where the minimisers are. We state that there exists an optimal nu such that the expression written below holds.
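$$t\,\nabla f_0\big(x^*(t)\big) + \sum_{i=1}^{m} \frac{1}{-f_i\big(x^*(t)\big)}\,\nabla f_i\big(x^*(t)\big) + A^\top \hat{\nu} = 0.$$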
From the expression above we can derive an important property of the central path: every central point x*(t) yields a dual feasible point and hence a lower bound on the optimal solution p*. Recall reading about it here?
If we define lambda as follows
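$$\lambda_i^*(t) = -\frac{1}{t\,f_i\big(x^*(t)\big)}, \quad i = 1, \dots, m, \qquad \nu^*(t) = \hat{\nu} / t,$$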
this gives us a dual function, and by plugging our lambda in, we have
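$$g\big(\lambda^*(t), \nu^*(t)\big) = f_0\big(x^*(t)\big) - \frac{m}{t}.$$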
Whereas before we said that the duality gap is practically zero, here we see that it's not quite like that: on the central path the duality gap equals m/t.
This changes the form of our stated KKT conditions. The new KKT conditions are simply called the modified KKT conditions: the complementary slackness condition is replaced by one that takes into account the duality gap we just discussed!
Thus our new modified KKT conditions, a.k.a. centrality conditions, are
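$$\nabla f_0(x) + \sum_{i=1}^{m} \lambda_i\,\nabla f_i(x) + A^\top \nu = 0, \qquad -\lambda_i\,f_i(x) = \frac{1}{t},\quad i = 1, \dots, m, \qquad Ax = b,$$

together with $f_i(x) < 0$ and $\lambda_i > 0$ (cf. [1]).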
So far so good! We provided some intuition for where the modified KKT conditions come from, which hopefully helped you grasp the idea behind interior point algorithms better. From this moment onwards, all you need to remember is the modified KKT conditions and that we are trying to solve this system using the Newton method! :)
Solving numerically: Primal-dual method
Essentially, we express the modified conditions as
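$$r_t(x, \lambda, \nu) = 0,$$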
where we define
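$$r_t(x, \lambda, \nu) = \begin{bmatrix} \nabla f_0(x) + Df(x)^\top \lambda + A^\top \nu \\ -\operatorname{diag}(\lambda)\, f(x) - \tfrac{1}{t}\,\mathbf{1} \\ Ax - b \end{bmatrix},$$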
and f(x) is the vector of inequality constraints and Df is its derivative (Jacobian) matrix.
Once we find x, lambda, nu that satisfy the first equation, it means we have found our minimisers! Do you see where this system of equations comes from? It is indeed our modified KKT conditions with all the terms on the left-hand side!
Now what should we do to solve it? We consider the Newton step for solving the nonlinear equations r_t = 0 for fixed t. We will denote the current point and Newton step as
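$$y = (x, \lambda, \nu), \qquad \Delta y = (\Delta x, \Delta \lambda, \Delta \nu),$$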
respectively. The Newton step is characterized by linearising the nonlinear equations:
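$$r_t(y + \Delta y) \approx r_t(y) + Dr_t(y)\,\Delta y = 0, \qquad \text{i.e.}\quad \Delta y = -\,Dr_t(y)^{-1}\, r_t(y).$$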
So in terms of x, lambda and nu we have
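$$\begin{bmatrix} \nabla^2 f_0(x) + \sum_{i=1}^{m} \lambda_i \nabla^2 f_i(x) & Df(x)^\top & A^\top \\ -\operatorname{diag}(\lambda)\, Df(x) & -\operatorname{diag}\big(f(x)\big) & 0 \\ A & 0 & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta \lambda \\ \Delta \nu \end{bmatrix} = -\,r_t(x, \lambda, \nu)$$

(this is the system from [1]; for our QP, $\nabla^2 f_0(x) = D$ and the $\nabla^2 f_i$ terms vanish since all our constraints are affine).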
Practice! Translating into the code
Finally, here we are! We have our final nonlinear equations that we want to solve numerically using the Newton method. How do we code it up? In this tutorial we use MATLAB.
Note about our data
We use a synthetically generated data set that contains two classes and cannot be easily separated by a straight line. This data set was generated using Python code and can be accessed here.
Step 1. Read the data from the CSV file
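A minimal sketch of this step; the file name data.csv and the column layout [x1, x2, label] are assumptions, not the original listing:

```matlab
% Sketch: read the synthetic data set (file name and column layout assumed).
data = readmatrix('data.csv');   % columns: x1, x2, label
X = data(:, 1:2);                % feature matrix, one sample per row
y = data(:, 3);                  % class labels in {-1, +1}
n = size(X, 1);                  % number of training samples
```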
Step 2. Translate the problem into the code
Remember how at the beginning of this article we discussed our QP for the Support Vector Machine? Translating it into code is very straightforward, as we 1) know the objective and 2) know our inequality and equality constraints. Let's put it together.
We first need to create our D matrix. Recall, it contains the products of x and y, hence we write a for loop to build it. Once D is found, we write the first-order and second-order derivatives of the objective. Then we put together the inequality and equality constraints and their derivatives.
The last part of the setup deals with the starting point; recall we are trying to optimise x, lambda and nu, so we should start from somewhere! How do we pick a starting point? You can really start with whatever you like, with one requirement: the choice should be feasible, meaning that at this point the constraints of the problem are not violated! A sketch of the whole setup follows below.
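The sketch below is not the original listing: the regularisation parameter C = 10 and the variable names are assumptions, and the starting point satisfies the box constraints (and the equality constraint y'*x0 = 0 whenever the two classes are balanced):

```matlab
% Sketch of the QP setup; C and all variable names are assumptions.
C = 10;
D = zeros(n, n);
for i = 1:n                      % D contains the products of x and y:
    for j = 1:n                  % D(i,j) = y_i * y_j * <x_i, x_j>
        D(i, j) = y(i) * y(j) * (X(i, :) * X(j, :)');
    end
end
gradf = @(x) D * x - ones(n, 1); % first-order derivative of the objective
hessf = @(x) D;                  % second-order derivative (constant here)
% Inequality constraints f(x) <= 0, stacking  -x_i <= 0  and  x_i - C <= 0:
f  = @(x) [-x; x - C * ones(n, 1)];
Df = [-eye(n); eye(n)];          % derivative (Jacobian) of f
% Equality constraint y' * x = 0:
A = y';  b = 0;
% A feasible starting point: strictly inside the box, with positive lambda.
x0      = (C / 2) * ones(n, 1);  % satisfies y'*x0 = 0 for balanced classes
lambda0 = ones(2 * n, 1);
nu0     = 0;
```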
Step 3. Picking default parameters
So far we have only discussed the parameter t; however, there are other parameters that we should choose by default. The primal-dual algorithm assumes that t gets bigger at each iteration, so we introduce a constant m that we multiply t by at every iteration (we will see that further in the code). tol and maxIter are two parameters that serve as stopping criteria.
tol controls the surrogate duality gap, so we only stop when the surrogate duality gap is very, very small. The surrogate duality gap is introduced because the Primal-dual method does not necessarily produce feasible points at every iteration; this means that we cannot always compute the duality gap and stop when it is very small. Hence, we introduce the surrogate duality gap.
maxIter just ensures that we go on for a certain number of iterations, so we won't be solving forever.
Backtracking options allow us to determine the length of the step. We will see how they are used inside the algorithm; a sketch of all the defaults follows below.
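The exact default values are not reproduced from the original listing, so treat the numbers below as assumptions (the constant is named m_t here to avoid clashing with the number of constraints m):

```matlab
% Sketch of the default parameters (values are assumptions).
m_t     = 10;     % constant by which t is multiplied at every iteration
tol     = 1e-12;  % stop when the surrogate duality gap drops below tol
maxIter = 100;    % hard cap on the number of iterations
optsBT.alpha   = 0.01;  % backtracking: acceptable decrease of residual norm
optsBT.beta    = 0.5;   % backtracking: factor by which s is reduced
optsBT.maxIter = 50;    % backtracking: cap on the number of reductions
```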
Step 4. Code the algorithm up
Firstly, I am deeply thankful to Dr. Marta Betcke, who introduced me to Numerical Optimisation while I was studying at UCL. The algorithm presented in this section was inspired by Dr. Marta Betcke, who is a lecturer in Numerical Optimisation at University College London.
We create a separate function that takes all the parameters defined above as input.
We find our deltaY by inverting the Jacobian of r_t, i.e. solving the Newton system for deltaY [1].
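A condensed sketch of such a function is below. It is not the original listing: the struct-based interface, the initial value of t, and the helper lineSearchPD (defined in Step 5) are assumptions. It assembles r_t and its Jacobian from the modified KKT conditions and solves for deltaY:

```matlab
function [x, lambda, nu, infoPD] = PrimalDual(x, lambda, nu, funcs, opts)
% Primal-dual interior point sketch (structure assumed, not the original).
% funcs: gradf, hessf, f, Df, A, b.  opts: m_t, tol, maxIter, optsBT.
m = numel(lambda);                         % number of inequality constraints
p = size(funcs.A, 1);                      % number of equality constraints
t = 10;                                    % initial t (assumed)
for k = 1:opts.maxIter
    fx  = funcs.f(x);
    eta = -fx' * lambda;                   % surrogate duality gap
    infoPD.gap(k) = eta;
    if eta < opts.tol, break; end
    % Residual r_t of the modified KKT conditions:
    r = [funcs.gradf(x) + funcs.Df' * lambda + funcs.A' * nu;  % r_dual
         -diag(fx) * lambda - (1 / t) * ones(m, 1);            % r_cent
         funcs.A * x - funcs.b];                               % r_pri
    % Jacobian Dr_t of the residual:
    KKT = [funcs.hessf(x),           funcs.Df',   funcs.A';
           -diag(lambda) * funcs.Df, -diag(fx),   zeros(m, p);
           funcs.A,                  zeros(p, m), zeros(p, p)];
    deltaY = -KKT \ r;                     % Newton step for (x, lambda, nu)
    dx = deltaY(1:numel(x));
    dl = deltaY(numel(x)+1 : numel(x)+m);
    dn = deltaY(end-p+1 : end);
    % Step 5: line search on s, then update the iterate (see below).
    [x, lambda, nu] = lineSearchPD(x, lambda, nu, dx, dl, dn, r, t, ...
                                   funcs, opts.optsBT);
    t = opts.m_t * t;                      % t grows at every iteration
end
end
```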
Step 5. Determine step length
We have found the direction we need to go in, but we have not taken a step in this direction yet. How should we determine the length of our step? We use line search, but adapt it to work for the Primal-dual algorithm.
The basic line search should be modified by accounting for the norm of the residual and by ensuring that two constraints hold. The first constraint is
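$$\lambda + s\,\Delta\lambda \ge 0, \qquad \text{which caps the step at}\quad s_{\max} = \min\Big\{1,\ \min\big\{-\lambda_i/\Delta\lambda_i \;:\; \Delta\lambda_i < 0\big\}\Big\}$$

(cf. [1]).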
For the sake of simplicity, here we assume that this first constraint is never violated. Thus we set s_max = 1.
The second constraint ensures that
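$$f\big(x + s\,\Delta x\big) < 0,$$

i.e. the new point stays strictly feasible in the inequality constraints; accounting for the norm of the residual then means we accept the step only once $\|r_t(y + s\,\Delta y)\|_2 \le (1 - \alpha s)\,\|r_t(y)\|_2$ (cf. [1]).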
To ensure the second constraint, we need to start backtracking, reducing s until the constraints are met. That's where the optsBT options we introduced before kick in! As we don't want to backtrack forever if the constraints are not met, we make sure that we stop after a certain number of iterations.
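A sketch of the adapted line search, written as the helper lineSearchPD used in the Step 4 sketch (the helper name and its structure are assumptions):

```matlab
function [x, lambda, nu] = lineSearchPD(x, lambda, nu, dx, dl, dn, ...
                                        r, t, funcs, optsBT)
% Backtracking line search adapted for the primal-dual method (sketch).
% Re-evaluates the norm of the residual r_t at a trial point:
resnorm = @(xs, ls, ns) norm([funcs.gradf(xs) + funcs.Df' * ls + funcs.A' * ns;
                              -diag(funcs.f(xs)) * ls - (1/t) * ones(numel(ls), 1);
                              funcs.A * xs - funcs.b]);
s = 1;                                   % s_max = 1, per our simplification
for it = 1:optsBT.maxIter
    xs = x + s * dx;  ls = lambda + s * dl;  ns = nu + s * dn;
    % Accept once the iterate is strictly feasible in the inequalities
    % and the residual norm has decreased sufficiently:
    if all(funcs.f(xs) < 0) && ...
            resnorm(xs, ls, ns) <= (1 - optsBT.alpha * s) * norm(r)
        break;
    end
    s = optsBT.beta * s;                 % otherwise shrink the step
end
x = xs;  lambda = ls;  nu = ns;
end
```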
We keep doing this exercise until either our surrogate duality gap is smaller than the tolerance level or the maximum number of iterations is reached! The final trio x_k, lambda_k and nu_k will be our solution!
Step 6. Call to action
We put the function we created, named PrimalDual, to work!
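A hypothetical call, bundling everything from Steps 1 to 3 (the struct field names follow the assumptions made above):

```matlab
% Bundle the problem data and options, then run the solver (sketch).
funcs = struct('gradf', gradf, 'hessf', hessf, 'f', f, 'Df', Df, 'A', A, 'b', b);
opts  = struct('m_t', m_t, 'tol', tol, 'maxIter', maxIter, 'optsBT', optsBT);
[x_k, lambda_k, nu_k, infoPD] = PrimalDual(x0, lambda0, nu0, funcs, opts);
```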
Step 7. Plot the decision boundary
Here we are! As we have been recording the iterations, we can now find the support vectors: the points whose multipliers are bigger than some threshold (we use 0.001)!
Recall that our infoPD struct also contains information on the surrogate duality gap at every iteration. By plotting it as a function of the iteration number, we can see that it decreases rapidly. It does not go exactly to zero, but it ends up smaller than the tol = 1e-12 we set at the very beginning.
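A sketch of this final step; the threshold, the simple bias estimate and the infoPD.gap field follow the assumptions made above:

```matlab
% Support vectors: dual variables above the threshold (sketch).
sv = x_k > 1e-3;
w  = X' * (x_k .* y);                    % w = sum_i alpha_i * y_i * x_i
b0 = mean(y(sv) - X(sv, :) * w);         % simple bias averaged over SVs
figure; hold on;
scatter(X(y ==  1, 1), X(y ==  1, 2), 'b');
scatter(X(y == -1, 1), X(y == -1, 2), 'r');
plot(X(sv, 1), X(sv, 2), 'ko', 'MarkerSize', 10);   % mark support vectors
x1 = linspace(min(X(:, 1)), max(X(:, 1)), 100);
plot(x1, -(w(1) * x1 + b0) / w(2), 'k-');           % boundary w'*x + b = 0
% Surrogate duality gap per iteration, recorded in infoPD.gap:
figure; semilogy(infoPD.gap, '-o');
xlabel('iteration'); ylabel('surrogate duality gap');
```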
References:
[1] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
