LQR-Trees: Feedback Motion Planning via Sums-of-Squares Verification

Team: Hilgad Montelo and Gray Cortright Thomas

Final Presentation (time-limited slide show)

https://docs.google.com/presentation/d/1JesOfOTPrU2TUCUC41i8DZpSN6zi4ur23JJtOIyYZV8/edit?usp=sharing

Source Code:

https://bitbucket.org/hmontelo/homework_2

Introduction

Reference papers

  1. R. Tedrake, I. R. Manchester, M. Tobenkin, and J. W. Roberts. "LQR-Trees: Feedback motion planning via sums of squares verification". International Journal of Robotics Research, 2010.
  2. P. Reist, P. V. Preiswerk, and R. Tedrake. "Feedback-motion-planning with simulation-based LQR-trees". The International Journal of Robotics Research, 2016.
     

Problem Setup

We consider a control-affine nonlinear system

x_dot = f(x, v) + g(x, v) u,

where x is the position of the pendulum, v its velocity, and u the control input. The LQR-Trees algorithm aims to cover the controllable region of state space with a tree of locally valid linear feedback controllers, each with a region of attraction certified by sum-of-squares (SOS) programming.
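For concreteness, the pendulum dynamics behind these f and g terms can be sketched as follows; the parameter values (unit mass and length, small viscous damping) are illustrative assumptions, not taken from the report.

```python
import numpy as np

# Illustrative pendulum parameters -- the report does not specify them.
# m: mass, l: length, b: viscous damping, grav: gravitational acceleration.
m, l, b, grav = 1.0, 1.0, 0.1, 9.81

def f(x, v):
    """Drift term of x_dot = f(x, v) + g(x, v) u for the damped pendulum.

    x is the pendulum angle, v its angular velocity; returns [x_dot, v_dot]
    with the input u set to zero.
    """
    return np.array([v, (-b * v - m * grav * l * np.sin(x)) / (m * l**2)])

def g(x, v):
    """Input term: a torque u enters only the velocity equation."""
    return np.array([0.0, 1.0 / (m * l**2)])
```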

Implementation of SOS certification in python

Starting with the Python language as a constraint, implementing the LQR-Trees algorithm decomposed largely into two subproblems: SOS verification and trajectory optimization. Python's cvxpy library offered simple tools for disciplined convex programming, but the SOS verification problem is several steps more abstract than what cvxpy provides directly. We set the problem up symbolically: we used sympy to represent the functions f(x, v) and g(x, v) from the nonlinear equation of motion x_dot = f(x, v) + g(x, v) u, and exploited that library's symbolic differentiation to find the low-order Taylor approximations that could then be verified as sums of squares. As it turned out (and we expect other groups reached a similar conclusion), the order to which the sine function is approximated drastically changes the results of the convex optimization, but we did not know this at the time.
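A minimal sketch of that Taylor-approximation step, assuming a damped pendulum with illustrative parameters (the helper name `taylor_approx` is ours, not from our actual code):

```python
import sympy as sp

x, v, u = sp.symbols('x v u')

# Drift term f(x, v) for a damped pendulum; coefficients are illustrative.
f_expr = sp.Matrix([v, -sp.sin(x) - sp.Rational(1, 10) * v])

def taylor_approx(expr, var, order):
    """Low-order Taylor polynomial of expr in var about 0.

    Drops the O(...) remainder so the result is a plain polynomial that
    can be handed to the SOS machinery.
    """
    return sp.expand(expr.series(var, 0, order + 1).removeO())

# Third-order approximation: sin(x) becomes x - x**3/6 inside the dynamics.
approx = taylor_approx(f_expr[1], x, 3)
```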

Using our own function to compute the Taylor coefficient for a given monomial order, we first populated an array of tuples to represent a polynomial. Each tuple contained two integers giving the monomial order, i.e. the exponents on the indeterminates x and v (x being the position of the pendulum and v its velocity), plus a third element giving the term's coefficient within the polynomial. However, we soon realized that the computational overhead of a more convenient dictionary format was insignificant, and wrote a conversion function from the list of tuples. The new format maps exponent pairs (two-tuples) to values: as you may have expected, the two-tuple gives the monomial order and the value is the coefficient. We then fleshed out the algebra of polynomials, subclassing dict and overloading addition, subtraction, and multiplication as one would expect for any polynomial class. The dictionary format made this quite convenient, and thanks to Python's duck typing we were even able to store cvxpy's mathematical expression type as a coefficient.
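The dict-subclass polynomial described above might look roughly like this; `Poly` is a hypothetical name and the code is a sketch of the idea, not our exact implementation:

```python
class Poly(dict):
    """Polynomial in (x, v): maps exponent tuples (i, j) -> coefficient,
    so {(i, j): c} represents the term c * x**i * v**j.

    Coefficients may be plain numbers or, thanks to duck typing, cvxpy
    expression objects."""

    def __add__(self, other):
        out = Poly(self)
        for mono, c in other.items():
            out[mono] = out.get(mono, 0) + c
        return out

    def __sub__(self, other):
        return self + Poly({mono: -c for mono, c in other.items()})

    def __mul__(self, other):
        out = Poly()
        # Multiplying monomials adds their exponent tuples elementwise.
        for (i1, j1), c1 in self.items():
            for (i2, j2), c2 in other.items():
                mono = (i1 + i2, j1 + j2)
                out[mono] = out.get(mono, 0) + c1 * c2
        return out
```

For example, `(x + 2v) * 3x` comes out as `3x**2 + 6xv`.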

This proved extremely useful, since the h polynomial's coefficients are naturally optimization variables, and these variables appear linearly in the coefficients of the sum-of-squares certificate polynomial. The certificate constraint is

-V_dot(x) + h(x) (V(x) - rho) is SOS, with h(x) itself SOS:

the negative derivative of the Lyapunov function, plus an SOS multiplier polynomial h times the difference of the Lyapunov function and rho, must be a sum of squares. (Wherever V <= rho, the multiplier term is non-positive, so the constraint forces V_dot <= 0 throughout the sublevel set.) Yet even in this form we were not ready to perform a convex optimization, because the polynomial coefficients must be mapped to positive semi-definite matrices by an affine constraint depending on a pre-defined monomial vector.

This monomial vector is important: its outer product must include all monomials appearing in the certificate polynomial. The algorithm we used to determine these constraints takes the monomial vector as a list of pairs, each pair an exponent order. We then built a dictionary mapping each monomial order present in the polynomial to a list of index pairs, where each index pair specifies an element of a matrix which, when placed in quadratic form with the monomial vector, produces that monomial order in the resulting polynomial. Using this lookup dictionary, we constructed one equality constraint per monomial order achievable with the monomial vector: most of these equate a sum of matrix elements to a coefficient of the polynomial, and the rest equate a sum of matrix elements to zero (the case when the polynomial simply lacks that term). Naturally, we tried to minimize the number of this second type of constraint by designing the monomial vector carefully.
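The lookup-dictionary construction can be sketched as follows, assuming the monomial vector is given as a list of exponent pairs (the function name is ours):

```python
from collections import defaultdict

def gram_index_map(monomial_vector):
    """Map each monomial order reachable by the outer product of the
    monomial vector to the Gram-matrix index pairs (r, c) that produce it.

    monomial_vector: list of (i, j) exponent pairs, e.g.
    [(0, 0), (1, 0), (0, 1)] for the vector [1, x, v].
    """
    lookup = defaultdict(list)
    n = len(monomial_vector)
    for r in range(n):
        for c in range(n):
            # Quadratic form entry Q[r, c] multiplies the product of the
            # r-th and c-th monomials, whose exponents add.
            i = monomial_vector[r][0] + monomial_vector[c][0]
            j = monomial_vector[r][1] + monomial_vector[c][1]
            lookup[(i, j)].append((r, c))
    return dict(lookup)
```

Each entry then yields one equality constraint: the sum of the listed Gram-matrix entries equals the polynomial's coefficient for that order, or zero when the polynomial lacks the term.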

To solve the sum-of-squares problem we had set up, we used the MOSEK solver, which improves on the splitting conic solver packaged with cvxpy in both speed and accuracy. A key observation is that the optimizer has a fuzzy scale of success, and this shaped our version of bisection. No one really trusts an optimizer to separate "optimal" from "optimal_inaccurate" at a precisely determined line in the sand, so instead of bisecting to a boundary we bisected towards a region, which limits the accuracy we can claim. Discretizing rho between a high value and 1, we solve for every value of rho, analyze the returned status (one of {'optimal', 'optimal_inaccurate', 'infeasible', 'error'}), and search for the first transition from optimal to optimal_inaccurate and the last transition from optimal_inaccurate to either infeasible or error. These transitions form the bounds for the next line search, and we converge fairly quickly to a single region in which the solution is inaccurate. Taking the first inaccurate value as a proxy for infeasibility seems to be a good strategy based on some example plots, though they all look alike and are not all worth displaying here.
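Schematically, one pass of the region search over solver statuses looks like this; `solve_for_rho` is a stand-in for the cvxpy/MOSEK SOS feasibility solve, and the function name is ours:

```python
def status_region(rhos, solve_for_rho):
    """One pass of the status-region line search described in the text.

    rhos: candidate rho values, ordered from feasible (small) to
    infeasible (large).  solve_for_rho: callable returning a status in
    {'optimal', 'optimal_inaccurate', 'infeasible', 'error'}.

    Returns (lo, hi): the first rho whose solve is merely
    'optimal_inaccurate', and the last such rho before the solves fail.
    These bound the fuzzy feasibility boundary for the next, finer search.
    """
    statuses = [solve_for_rho(r) for r in rhos]
    lo = hi = None
    for k in range(1, len(rhos)):
        # First transition: optimal -> optimal_inaccurate.
        if (statuses[k - 1] == 'optimal'
                and statuses[k] == 'optimal_inaccurate' and lo is None):
            lo = rhos[k]
        # Last transition: optimal_inaccurate -> infeasible/error.
        if (statuses[k - 1] == 'optimal_inaccurate'
                and statuses[k] in ('infeasible', 'error')):
            hi = rhos[k - 1]
    return lo, hi
```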

Fig. Illustration of the quality of this strategy. The red region is a region of positive values, and the green ellipse is representative of the rhos our algorithm chooses. Fourth order model below, perfect model above.

Trajectory Optimization

As for the shooting method described in the paper, we recognized that this trajectory optimization problem is fundamentally nonlinear, and felt the freedom many feel to approach such an intractable problem with whatever tools we fancied at the time. Tedrake's group favored shooting, probably using artificially low-curvature inputs, while we chose sequential convex optimization. Since Tedrake's group has used sequential quadratic programming, our sequential convex programming can be said to generalize their technique, though in practice it was quite different given the way we structured the problem.

We split the input into discrete steps, beginning with random noise, and chose a random point in the phase space to stand in for the random point generated in the LQR-tree algorithm. From this point we simulated forward the behavior of following our scripted input sequence, saving the linearization matrices at every point along the trajectory. Of course we had to convert from a continuous-time linearization to a discrete-time representation, which we did using a simple approximation of the matrix exponential. Our simulator used a simple Runge-Kutta solver to estimate the linearizations as well as the nominal state trajectory. We cut the simulation when it got close to the goal, or at 40 samples, and re-sampled starting points until we found a trajectory that naturally got close enough to be shorter than forty samples. Using this trajectory, its linear approximation, and its input sequence as a base, we solved a discretized optimal control problem with distance from the goal as the primary cost and input use as a secondary cost. We then re-simulated (possibly shortening the trajectory if it got close to the goal early) and repeated the convex optimization.

We re-simulated and re-optimized about fifteen times, and ended up generating several trajectories that tended to be input-heavy; the tradeoff between input use and duration was indirect, depending on how close a state came to the goal before the simulator truncated the trajectory. Optimizing for efficiency is actually a very tractable problem for sequential convex optimization, but including a time constraint makes it harder: time costs are not convex, since the time cost of the sum of two trajectories is the larger of the two time costs.
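The continuous-to-discrete conversion via a truncated matrix exponential mentioned above can be sketched as follows (a simple illustration under a fixed truncation order, not our exact code):

```python
import numpy as np

def c2d(A, B, dt, order=4):
    """Discretize a continuous-time linearization (A, B) with step dt
    using a truncated Taylor series for the matrix exponential.

    Returns (Ad, Bd) with x[k+1] = Ad @ x[k] + Bd @ u[k].
    """
    n = A.shape[0]
    # Ad ~= sum_{k=0..order} (A*dt)**k / k!
    Ad = np.eye(n)
    term = np.eye(n)
    for k in range(1, order + 1):
        term = term @ (A * dt) / k
        Ad = Ad + term
    # Bd ~= (integral_0^dt expm(A*s) ds) @ B,
    # i.e. dt * sum_{k=0..order} (A*dt)**k / (k+1)!  applied to B.
    S = np.eye(n) * dt
    term = np.eye(n) * dt
    for k in range(1, order + 1):
        term = term @ (A * dt) / (k + 1)
        S = S + term
    return Ad, S @ B
```

For a nilpotent A (e.g. a double integrator) the truncation is exact.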

Figs. Two random starting points; evolution of the trajectory from yellow to black. Top: the input; middle: the position; bottom: the phase space.

 

 

We also toyed with the idea of a more precise solver: one which explicitly solves first for reaching the goal, second for reaching it efficiently, and third for the best tradeoff of duration against cost. However, this would require far more computational effort, since the line search would be slow. In this regard we were limited by Dr. Sentis's request that we spend no more than 7 hours a week on the problem.

Fig. state machine solver, which first gets to the goal, then optimizes for minimum effort, before moving on to the next shortest path. This results in a mapping from time to energy cost, and the map should only go up in cost if time goes down. With a pre-defined cost function this could be used to direct the movement of the solver—either towards longer times or shorter times until the derivative of the cost function passes zero and indicates an optimum.

 

Naturally we also considered the problem of sum-of-squares verification in conjunction with trajectory tracking, and the time-varying Lyapunov function and gain matrices this entails. We even have some slight improvements over the state of the art as presented in class, though we ran out of time on this aspect of the implementation as well; considering our work on implementing the fundamental SOS synthesis machinery in Python, this is very much to be expected! Carrying on, though, we have noticed a misunderstanding that potentially leaves our classmates with woefully constrained stability-envelope results. When ensuring that the Lyapunov function decreases in a region about a trajectory, with the region defined by the Lyapunov function's isosurface, the decrease at the edge of the tube only needs to hold relative to the moving boundary: we can add a quadratic form in S, i.e. the scaling (rho_dot/rho) V of the Lyapunov function, to the condition. By definition the value of V/rho is one at this boundary, so the added term there equals rho_dot. Thus the true bound on the rho value of the previous time step can be found using the following constraint:

(rho_dot(t)/rho(t)) V(x, t) - V_dot(x, t) + h(x) (V(x, t) - rho(t)) is SOS, with h(x) SOS,

which should allow the funnels to grow substantially backwards in time, rather than limiting them to shrink. For instance, imagine a verified trajectory that simply sits at the top of the swing: if this trajectory terminates at a tiny rho, for whatever reason, the constraint above allows rho to grow backwards in time until it asymptotically approaches the rho we found for the steady-state problem.

Discussion: What is the place for SOS verification and LQR-Trees in the grand scheme of robot control systems?

One enormous takeaway from this project was the utter futility of this kind of state-filling algorithm for systems with more than two degrees of freedom. Not only would more degrees of freedom drastically increase the search space of the LQR-Trees trajectories, they would also cause the central convex problem to grow sharply in dimension. However, despite its shortcomings, the LQR-Trees algorithm provides a glimpse of what may be possible with verification working beforehand to certify wide paths through state space. My personal favorite aspect (Gray writing) is the idea of the tube around the trajectory. This simple expansion of the idea of a trajectory seems so appropriate for robots, which have persistent noise and modelling error: they do not follow trajectories, they merely live within tubes surrounding trajectories. If we could shrink the tubes of our robots' trajectories, we could rely more on their behavior. The polynomial approximation of even a perfectly modelled non-linearity is only one thing that can void the guarantee of the Lyapunov function; another is model uncertainty as defined in robust control. I expect we will find a more tractable way to model the error than polynomial approximations, but that eventually robot planning will be done with a trajectory-tube in mind.