Tag Archives: Integral Windup Problem

PID Integral Windup Problem in MoveToDesired(Front|Back|Left|Right)Distance()

Posted 14 November 2023

After getting the charging station (re)integrated with the rest of the system, I have been working on complete travel-charge-travel charge cycles, where WallE3 travels around the house, finds and connects to the charger, disconnects, travels around the house some more, and then finds its way back to the charger – lather, rinse, repeat.

However, in the process I have run into a problem with the MoveToDesired(Front|Back|Left|Right)Distance() function. On several occasions the robot has blown right by the desired distance and run headlong (or backlong?) into a wall. Investigating has led me to realize that the cause of this problem is the infamous ‘integral windup’ characteristic inherent in insufficiently sophisticated PID algorithms. Here’s the relevant telemetry from a recent MoveToDesiredFrontDistance() run, and an Excel plot showing the ‘integral windup’ issue.

MoveToDesiredFrontDistance run showing integral windup problem

In the above Excel plot, the initial error is -221, which causes an output of ~300. The motor speed is clamped to -75, so the distance and error start slowly heading toward the target of 30cm and an error of zero. However, the integral (I) term continues to increase from near zero to well over -1000, and the output term was completely dominated by the integral value, keeping the motor speed clamped at -75 even as the measured distance approached and passed the target. In this particular case, I got lucky as the actual distance and the target distance were within the +/- 1cm termination window and the loop terminated. In other cases where the actual distance went through the termination window too rapidly, the robot would basically continued forever – or at least as long as necessary to ‘unwind’ the integral term.

So, what to do? Reading up on ‘integral windup’, I found this article from around 1990 (judging by dates on the references), and I decided to try the method described there as the ‘back-calculation and tracking’ method. The idea is that when the output gets clamped to some maximum value (+/- 75 in this application), the integral value is recomputed to a value that would produce the output that would naturally produce the clamp value. For instance for an error value of -221, the Ival is -22.1 which results in an output of -309, which gets clamped to -75. For an output of -75 with an input error of -221 we have -221 + Ival*(-221) + (-221*1.5) = -75 –> Ival = (-75 + 331.5)/-221 = -1.16. Checking, -1.16*(-221) = 256.36, 1.5*(-221) = 331.5, 256.36 – 331.5 = -75.14.

After thinking about this some more, I wondered if instead of back-calculating a new I value, maybe I could simply zero out the retained ‘last_Ival’ parameter whenever the PID output value is high enough so that it would get clamped. Assuming the same clamping value of 75 and a ‘P’ value of 1.5, this would happen any time the absolute value of the error value is greater or equal to 75/1.5 = 50.

Using Excel to apply this algorithm to the telemetry above telemetry data, I get the following plot.

MoveToDesiredFrontDistance run showing integral windup problem, with modified Integral term algorithm

In the above plot, the integral term (yellow line) was modified to be exactly zero whenever the output would have been > 75 or < -75 (the motor speed clamping values), even with a zero I value. As can be seen above, it stays zero until point 75. After that it smoothly decreases to about -55.6 at point 99, and then smoothly increases to about 30.98 at point 135, at which point is gets clamped to zero again. The motor speed value (green line) stays at -75 to point 75, at which point it smoothly (and linearly it appears) increases to +75, where it is clamped again.

I believe this might just do the trick. It will certainly prevent infinite runaway if the error term goes through zero too quickly to cause the loop to exit, as the robot will stop and then back up to the target distance.

I *think* I can modify my ‘PIDCalcs()’ function, and then everything that uses it will get the benefit of the ‘non-winding integral term’ algorithm.

17 November 2023 Update:

One day to go until SpaceX makes its second attempt at getting the world’s largest rocket into space – yeah!

To work the ‘integral term windup’ issue, I ported the FrontBackMotionTest code from an earlier program into the WallE3_Quicksort_V4 project so I could iterate easier, and got that running. As a baseline, here is the telemetry and a short video from the first run:

As can be seen in the above telemetry, the ‘Ival’ column shows that the integral term increases monotonically from -19.7 to -998.4, forcing the speed to its maximum value (75) for the entire run. This causes the robot to badly overshoot the target distance, which is why I decided to have the robot perform the ‘second time through’ action shown in the telemetry and video.

I added the following code to MoveToDesiredFrontDistCm() to zero out the ‘lastIval’ value if the error term * Kp > MOTOR_SPEED_QTR

Here is the telemetry and the video from the run with the above change

As can be seen from the above, this change did not affect the robot’s behavior significantly – it still badly overshot the target distance and the ‘second try’ was still required to bring the robot back to nearer the target. However, from inspection of the ‘Ival’ values it is clear that the added code is doing its job of zeroing out the ‘lastIval’ term when [error_term]*Kp > Max speed. Here’s an Excel plot showing the ‘Ival’ and ‘output’ terms from both the above runs.

comparing Ival, output, speed vals before and after Ival clamp mod

In the above plot, the ‘after Ival term stays at a very low (negative) value for almost the entire run, due to the new clamping code. Consequently the ‘after’ output term decreases linearly with the decreasing error term until the resultant ‘after’ speed command comes off the -75 ‘stop’ as shown by the gray line in the above plot.

Unfortunately, the speed reduction from ‘lastIval’ clamping isn’t enough to prevent the robot from overshooting almost as much as it did before. This indicates (at least to me) that at least the Kp value is way too high. Just as a thought, the Kp value should be just high enough so that if the distance error is, say, 100cm, then the output would just clamp at MOTOR_SPEED_LOW –> 75. So, 100*Kp = 75 –> Kp = 0.75, or about half its current value.

It turned out that I not only needed to cut the Kp value in half, but the Ki value as well, so now PID = (0.75, 0.05, 0.2). Here’s the telemetry and video from a run using these values:

I wasn’t able to eliminate the ‘second try’ requirement, although I was able to reduce the overshoot to about half its previous value.

One possible fly in the ointment is the rapid drop of the front variance value during the second run through MoveToDesiredFrontDistCm(); this *should* have triggered an exit from the function with anomaly code = ANOMALY_STUCK_AHEAD, but didn’t – and I don’t know why.

Too tired to go down this rabbit hole tonight – try again tomorrow while waiting for Space X to make history with its second StarShip test flight!

18 November 2023 Update:

Wow! Wow! Wow! I had the pleasure of watching SpaceX’s literally historic second Starship test launch this morning, and I’m still psyched! I have now had the pleasure of watching both a Saturn V launch as part of the Apollo moon landing program and now the Starship and booster launch as part of (I hope) the Mars landing program.

OK, back to robots. Last night while drifting off to sleep, it occurred to me that the ANOMALY_STUCK_AHEAD alert is conditioned on the current motor configuration as well as the front variance value. Here is the actual code that determines this:

So in this case, the motors were running in reverse, so neither of the motor conditions were TRUE, and therefore IsStuckAhead() returned FALSE. Mystery solved!

At this point I think I’ve done as much as I can with the FrontBackMotionTest program. I will modify the other ‘MoveTo’ functions to clamp the Ival term as described above. However, it is also clear that the ‘second try’ part of the ‘MoveTo’ functions cannot be eliminated without having to accept significant target distance overshoot, and that the original PID values (1.5, 0.1, 0.2) actually work better than (0.75, 0.05, 0.2), as the lower PI values cause the second try operation to take significantly longer. Here’s an Excel plot comparing just the ‘second try’ results for both:

‘Second try’ results for (0.75,0.1,0.2) and (0.75, 0.05, 0.2)

In the above plot, the ‘2’ results are for (0.75, 0.05, 0.2). As can be seen, the ‘2’ configuration takes over twice as long to complete as the ‘normal’ (1.5, 0.1, 0.2) run, and the issue with the front variance value going to zero doesn’t exist for the ‘normal’ run.

This makes it clear that while clamping the Ival does help significantly, the change in PID values does not.

18 November 2023 8:43PM EST Update:

While cleaning up the code, I noticed I had missed one spot where the global OffsetDistKp value should have been replaced by ‘Kp’, the local value, and unfortunately it was in the section that determined whether or not the Ival value would be clamped, as shown below:

This had the effect of skewing the point at which the Ival stopped being clamped, so the previous results are a bit suspect. I ran a few more tests, and discovered the function actually performed better with a higher Kp value than the original 1.5. The reason for this is that a higher Kp value moves the point at which Ival can start accumulating later in the run (closer to the target distance thus less error), and so more quickly reduces motor speed. In addition, the Ival clamping action has no effect on the second pass through the function as the motor speed values never approach the max speed, so this part operates as designed as well. Here’s a run showing the result of using a PID of (2.0, 0.1, 0.2):

As the data shows, the overshoot is only 6cm (as opposed to the 18cm overshoot without the Ival clamp), and the recovery pass completes in about 2 sec instead of over 5. Here are Excel plots showing the primary and secondary passes.

Second pass correction with PID (2.0,0.1,0.2). Time scale is 20 units/sec, so total time ~ 1.5sec

So, it looks like I want to change the global OffsetKp value from 1.5 to 2.0, leaving Ki & Kd unchanged.

19 November 2023 Update:

Looking back at the results from the above tests, I realized that the second call to MoveToDesiredFrontDistCm(Offset, Kp, Ki, Kd) (the ‘second time though’ step) isn’t actually in the MoveToDesiredFrontDistCm() function – it is part of the test harness. So, I think I need to incorporate the second pass directly into the function so it will get performed everywhere MoveToDesiredFrontDistCm() is called. I wonder if I can do this recursively, by calling MoveToDesiredFrontDistCm() from inside MoveToDesiredFrontDistCm()?

So, I moved the ‘Second Time Through’ code from the test harness into MoveToDesiredFrontDistCm(), as shown below:

Here’s the telemetry from the run with recursive MoveToDesiredFrontDistCm() calls:

The first run through MoveToDesiredFrontDistCm() occurs normally, with the exit at the 30cm target distance as expected, and then the expected overshoot to 25cm. Then the second call to MoveToDesiredFrontDistCm() starts at 24cm and gets to 29cm in approximately 1.5sec, followed by two more 10-item distance displays. I guess the first distance display is from the one inside the MoveToDesiredFrontDistCm() call, and the second one is the one from the test harness. I need the delay caused by the 10-distance display, but I can probably replace it with just a 1-sec delay.

Here’s another run with the loop replaced by a 1-sec delay:

From the above, it looks like the function is doing what it should. It keeps iterating until the distance measured after a 1-sec delay fits within the +/- 1cm window.

MoveToDesiredRearDistCm():

Now that MoveToDesiredFrontDistCm() is working, I started on MoveToDesiredRearDistCm(). I started by copying MoveToDesiredFrontDistCm() and changing all relevant occurrences of ‘Front’ to ‘Rear’. This was about 99% of the required effort. However, there were two issues that surfaced during the port. The first was that the test program needed the ability to pass the PID values into the function, but all the mainline code uses just a single parameter (the desired offset). As a temporary fix I created a four-parameter version of the function with all the code, and had the single parameter version pass the global PID values to the four-parameter on. However a much better solution was to remove the single parameter version of MoveToDesiredFrontDistCm() entirely, and instead declare the four-parameter version with three default values right after the definition, as shown here:

The second issue was the sign of the speed variable in the RunBothMotorsBidirectional() function. For MoveToDesiredFrontDistCm() the parameters to this function had to both be negative, as in RunBothMotorsBidirectional(-speed, -speed), but in the MoveToDesiredRearDistCm() they have to be positive as in RunBothMotorsBidirectional(speed, speed). Once this was accomplished, the MoveToDesiredRearDistCm() operation was successful.

MoveToDesiredLeft/RightDistCm():

I started on these two functions, but soon realized that these two functions aren’t called anymore. They were originally used as part of wall offset capture operations, but were abandoned in favor of an algorithm that uses a perpendicular approach. Soooo, I will just remove these two functions entirely and call it good!

Stay tuned,

Frank