
Using Out-of-Range Steerval for Anomaly Detection

Posted 29 July 2023

WallE3 has to be able to handle anomalous conditions as it wanders around our house. An anomalous condition might be running into an obstacle and getting stuck, or sensing an upcoming wall (in front or in back). It generally does a pretty good job with these simple situations, but I have been struggling lately with what I refer to as ‘the open door’ problem. The open door problem is the challenge of bypassing an open doorway on the tracked side when there is a trackable wall on the non-tracked side. The idea is to simplify WallE3’s life by not having it dive into every side door it finds and then have to find its way back out again. Of course, I could just close the doors, but what’s the fun in that?

My current criterion for detecting the ‘open door’ condition is to look for situations where the tracking-side distance increases rapidly from the nominal tracking offset distance to a distance larger than some set ‘max tracking distance’ threshold, with the additional requirement that the non-tracking side distance is less than that same threshold. When these criteria are met, WallE3 switches to tracking the ‘other’ side, and life is good.
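Expressed as code, this criterion is just a two-sided threshold test. A minimal sketch (the function name is hypothetical, not WallE3’s actual code; the 100cm value matches the ‘max tracking distance’ reported later in the telemetry):

```cpp
// Hypothetical sketch of the 'open door' test described above -- not WallE3's
// actual code. The tracked-side distance must have jumped past the max-tracking
// threshold while the non-tracked side still shows a trackable wall.
const float MAX_TRACK_DIST_CM = 100.0f; // 'max tracking distance' threshold

bool IsOpenDoorway(float trackSideCm, float otherSideCm)
{
    return trackSideCm > MAX_TRACK_DIST_CM && otherSideCm < MAX_TRACK_DIST_CM;
}
```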

However, it turns out that in real life this criterion doesn’t work very well, as many times WallE3’s tracking feedback loop sees the start of the open doorway just like any other wall angle change, and happily dives right into the room as shown at the very end of the following short video – oops!

In the above video, the robot easily navigates a 45º break at about 7-8 sec. At about 12 sec, an open doorway appears on the left (tracked) side, and the back of a kitchen counter appears on the right (non-tracked) side. What should happen is the robot will ‘see’ the left-side distance increase past the ‘max track’ threshold, while the right-side distance decreases below it, causing the robot to shift from left-side to right-side tracking. What actually happens is the left-side distance doesn’t increase fast enough, and the robot happily navigates around the corner and into the room.

So, what to do? I started thinking that the steering value (5th column from left) might be a reliable indicator that an anomaly has occurred that may (or may not) need attention. In the telemetry file below, the steering value goes out of range (-1 to +1) in two places – at the 45º break (7.9 – 8.1sec), and again at the ‘open doorway’ event at the end (13.3 sec). This (out-of-range steering value) condition is easy to detect, so maybe I could have the robot stop any time this happens, and then decide what to do based on relevant environment values. In the case of the 45º break, it would be apparent that the robot should continue to track the left-side wall, but in the case of the open doorway, the robot could be switched to right-side tracking.
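The out-of-range check itself is trivial. A sketch, assuming the current steering value is handed in as a float (the real IsExcessiveSteerVal(trkdir) mentioned later presumably pulls it from the tracking state), using the 0.99 threshold adopted below:

```cpp
#include <cmath>

// Hypothetical sketch -- the actual check lives in IsExcessiveSteerVal(trkdir).
// Steering values nominally range over [-1, +1], so anything with a magnitude
// above the threshold is treated as 'pegged' and therefore anomalous.
const float MAX_STEERING_VAL = 0.99f;

bool IsExcessiveSteerVal(float steerVal)
{
    return std::fabs(steerVal) > MAX_STEERING_VAL;
}
```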

The function that checks for anomalous conditions is UpdateAllEnvironmentParameters(WallTrackingCases trkdir), shown below:

Hmm, I see that I tried this trick before (last May) but commented it out, as shown below:

Looking back in time, I see a note where MAX_STEERING_VAL was added in September of 2022 for use by ‘RunToDaylight()’, which calls another function called ‘RotateToMaxDistance(AnomalyCode errcode)’. However, neither of these are active in the current code.

OK, back to reality. I plan to change the MAX_STEERING_VAL constant from 0.9 to 0.99, so steering values of 1 will definitely be larger, and most other values will not be.

Then I plan to uncomment ‘IsExcessiveSteerVal(trkdir)’ and the call to it in ‘UpdateAllEnvironmentParameters(trkdir)’. This should cause TrackLeft/Right to exit when the steering value goes to 1, as it does at the 45º break and the open doorway. Then, of course, the question is – what to do? I plan to have the robot stop (mostly for visible detection purposes), then move forward slightly so all three distance sensors are looking at the same environment, then turn parallel to the nearest wall, then start tracking again.
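The planned recovery is a fixed sequence of steps, which can be sketched with the motor and orientation calls abstracted behind a hypothetical RobotOps bundle (the actual code calls its motor and orientation functions directly); logging the steps makes the ordering easy to check:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for WallE3's motor/orientation calls -- the real code
// invokes functions like RunBothMotors() and RotateToParallelOrientation().
struct RobotOps {
    std::vector<std::string> log;
    void stop()              { log.push_back("stop"); }
    void moveFwdMsec(int ms) { log.push_back("fwd " + std::to_string(ms)); }
    void rotateParallel()    { log.push_back("parallel"); }
    void resumeTracking()    { log.push_back("track"); }
};

// Recovery sequence described above: stop, move forward slightly so all three
// side distance sensors see the same environment, turn parallel to the nearest
// wall, then resume tracking.
void RecoverFromExcessSteerVal(RobotOps& ops)
{
    ops.stop();
    ops.moveFwdMsec(500); // the post later uses a 0.5 sec forward 'skosh'
    ops.rotateParallel();
    ops.resumeTracking();
}
```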

After the usual number of mistakes and bugfixes, I think I now have a working ‘out-of-range steerval’ recovery algorithm implemented in WallE3_Complete_V4. Here’s a short video showing the action:

In the above video the robot goes past the end of the left wall at about 6sec, and actually starts to turn left before the ‘out of range’ condition is detected and the robot stops at 7sec. Then the robot moves ahead slowly for 0.5sec to ensure the distance sensors get a ‘clean look’ at the new environment. Then the robot spins very slightly clockwise due to a ‘RotateToParallelOrientation(TRACKING_RIGHT)’ call, and stops again (this is mostly for visual recognition purposes). Starting at 14sec, the robot starts tracking the right-hand wall. Below is the complete telemetry readout from the run:

31 July 2023 Update:

After cleaning up the code a bit, I decided to see how the new ‘excessive steerval’ algorithm handles a 45º break situation set up in my office. The following short video shows the action, followed by the recorded telemetry:

In the video, WallE3 negotiates the wall break without stopping – a result that was unexpected. Looking at the telemetry, I noticed that the steering value never exceeded the 0.99 threshold value for ‘excessive steerval’ detection. The short excerpt from the telemetry (immediately above) shows the time segment from 4.7 to 5.1sec, where the steering value can be seen to range from +0.1 to -0.8 and then back to -0.1 as WallE3 goes around the break.

I think what happened here is that the break angle simply wasn’t acute enough to drive the steering value into the ‘excessive’ range.

I made another run with the break angle increased to over 50º, and this did trigger the ‘excessive steerval’ condition. Here’s the video:

In the above video, the break occurs at about 4sec. The telemetry excerpt below shows how the ‘excessive steerval’ algorithm works through the situation, and then continues tracking the left side.

02 August 2023 Update:

The ‘excess steering value’ algorithm works, but is not an unalloyed success. Here’s a run where WallE3 appears to negotiate the 50º break OK, but later dives nose-first into the wall – oops:

Here is the telemetry from this run:

Looking through the above telemetry, the ‘ANOMALY_EXCESS_STEER_VAL’ case was detected at 3.9sec (~5sec in video). WallE3 then stopped, performed a 53º CCW turn to parallel the new wall, moved ahead 1/2sec to make sure all left-side distance sensors were ‘seeing’ the new wall, and then started tracking the new wall. However, because WallE3 started from 59cm away, it caused another EXCESS_STEER_VAL anomaly at 10.7sec (~11sec in video). WallE3 again stopped, rotated about 51º CW to (re)parallel the wall, and then continued tracking, starting right at the correct offset of 30cm. At the very end of the run WallE3 ran off the end of the test wall, thus triggering an ‘ANOMALY_OPEN_CORNER’ anomaly.

So, I’m beginning to think that the ‘EXCESS_STEER_VAL’ algorithm might actually be working even better than I thought. I thought I might have to re-implement the ‘offset capture’ phase I had put in earlier and then took out, but this last run indicates that I might not have to.

I made another run, this time starting with WallE3 well outside the offset distance. The video and the telemetry are shown below:

As shown in the above video and telemetry, WallE3 does a good job of approaching and then capturing the desired offset. During the capture phase, the steering value rises in magnitude from -0.17 to -0.99 (at 0.6sec, almost triggering an ‘EXCESS_STEER_VAL’ anomaly detection), decreases to zero at 5.3sec (~3sec in video) and then goes positive with an offset distance of 36.6cm, as shown in the following excerpt:

The above shows that a separate ‘offset capture’ algorithm probably isn’t needed; either the robot will capture the offset without triggering an ‘excess steerval’ anomaly, or it won’t. If the anomaly is triggered, it will cause the robot to stop, turn to a parallel heading, and then restart tracking – which is pretty much exactly what the previous ‘offset capture’ algorithm did.

05 August 2023 Update:

I may have been a bit premature in saying that WallE3 didn’t need an ‘offset capture’ phase, as I have seen a couple of cases where the robot nose-dived into the opposite wall after trying to respond to an ‘Open Doorway’ condition. It worked before because the procedure was to track the ‘other’ wall at whatever distance the robot was at when the anomaly detection occurred. This obviated the need for an approach maneuver, and thus eliminated that particular opportunity to screw up. However, when I tried to add the constraint of tracking the ‘other’ wall at the desired 30cm offset, bad things happened – oops!

06 August 2023 Update:

I’ve been working on the ‘open corner’ problem, and although I think I have it solved, it isn’t very pretty at the moment. There are some ‘gotchas’ in how and when WallE3 actually updates its distance sensor values, so I think my current solution needs a bit more work. Here’s the video, telemetry and relevant code from a recent ‘open corner’ run in my office.

The video shows the robot stopping after detecting the ‘excessive steering value’ condition. Then it checks for left or right wall availability, and finding none, defaults to the ‘open corner?’ section. This section first commands a 90º turn in the direction of the last-tracked wall (left in this case), then moves ahead for 1sec to ensure that all three left-side distance sensors are ‘seeing’ the same wall. Then it calls RotateToParallelOrientation() to take out any initial off-angle orientation, and then calls TrackLeftWallOffset(). The first few times I tried this trick WallE3 just wasn’t cooperating, and when I added some more diagnostic telemetry, I saw that the side distances weren’t updating as I thought they should. I kind of brute-forced the problem by adding a 3-iteration ‘for’ loop with a 200mSec delay to see if there was some sort of latency issue, and this fixed the problem. Here’s the code section and the resulting telemetry:
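The wall-availability decision described above can be sketched as a pure function (names hypothetical, not the actual WallE3 code; the 100cm threshold is the ‘max tracking distance’ from the telemetry):

```cpp
// Hypothetical sketch of the post-stop decision branch. If neither side shows
// a wall within tracking range, fall through to the 'open corner?' handling
// (90-deg turn toward the last-tracked wall, 1 sec forward move, etc.).
enum class SteerRecovery { TrackLeft, TrackRight, OpenCorner };

const float MAX_TRACK_DIST_CM = 100.0f;

SteerRecovery ClassifyAfterStop(float leftCm, float rightCm)
{
    if (leftCm < MAX_TRACK_DIST_CM)  return SteerRecovery::TrackLeft;
    if (rightCm < MAX_TRACK_DIST_CM) return SteerRecovery::TrackRight;
    return SteerRecovery::OpenCorner;
}
```

With the left/right values of 271.3/155.4cm seen in the telemetry, this lands in the open-corner branch.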

As can be seen from the above results, the reported distances change dramatically from the first to the second iteration. Don’t quite know why at the moment, but it is definitely something I’ll have to figure out.

07 August 2023 Update:

I modified the code to show the elapsed time in milliseconds rather than decimal seconds to highlight the time differences between distance sensor reads. Here’s the same run with the new times shown:

As can be seen from the above telemetry, the first timed distance printout occurs at 4961mSec, and shows the ‘open corner’ situation (both left and right distances greater than the max tracking distance of 100cm). This first readout occurs after the EXCESS_STEERVAL detection, stop, and subsequent 500mSec ‘skosh’. The second printout comes 1mSec later from inside the ‘open corner?’ else clause and shows the same distances. Then the robot does the 90º CCW turn and 1sec translation.

Then the next distance readouts both occur at 8046mSec (about 3sec later) and are from inside the ‘for’ loop, after a call to ‘UpdateAllDistances()’. This set of 2 readouts still shows the ‘open corner’ condition with left/right of 271.3/155.4cm, and left front, center and rear measurements of 28.4, 271.3 and 325.0cm respectively. I’m not at all sure why, after the approximately 3sec required for the 90º CCW turn and 1sec translation, the subsequent UpdateAllDistances() call didn’t return updated measurements.

However, 200mSec later at 8258mSec, the same set of printouts does show updated measurements – left/right = 27.4/148.4, left front, center, and rear = 25.1, 27.4, 29.0 cm.

It’s a mystery!

08 August 2023 Update:

In order to troubleshoot the problem described above where distances didn’t seem to update in a timely manner, I modified the ‘distances only’ feature of the program to print out left/right distances while moving in a straight line between two walls in a ‘V’ shape. The idea is to see whether or not the distances are updating each time the program cycles through the (currently 50mSec) update loop. Any hiccups should show up as ‘flat spots’ in the plot of distances vs time. The run is shown in the short video below, along with an Excel chart of the results:

Then I used Excel’s conditional formatting feature to highlight any cells containing duplicate distance values – the ‘flat spots’ I referred to earlier.

As can be seen from the above, there were two sets of duplicates in the left distance column, and three sets in the right distance column. These duplicates could be an artifact of robot speed – i.e. the robot may not be moving fast enough to actually ‘see’ a different distance in the 50mSec between measurements.
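The duplicate-highlighting that Excel does can also be done directly in code. A small sketch that counts runs of consecutive identical readings (‘flat spots’) in a logged distance column:

```cpp
#include <vector>

// Count runs of consecutive duplicate values in a distance log -- each run is
// one 'flat spot'. Equality is exact, matching Excel's duplicate highlighting.
int CountFlatSpots(const std::vector<float>& d)
{
    int runs = 0;
    for (int i = 1; i < (int)d.size(); ++i) {
        if (d[i] == d[i - 1]) {
            ++runs;
            while (i + 1 < (int)d.size() && d[i + 1] == d[i]) ++i; // skip rest of run
        }
    }
    return runs;
}
```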

I ran the experiment again after bumping the robot’s base speed from MOTOR_SPEED_QTR to MOTOR_SPD_HALF. This time, I got the following:

Hmm, I’m a bit worried about the four duplicate right-side distances at the start, but I think they are a ‘startup artifact’ (at least I hope so). The only other set of dupes is rows 16/17 in the left distance column. All in all, I think everything is working OK.

The difference between the code that produced the above results and the actual code that produced the problem is that the actual code performs a ‘Spin Turn’ of 90º followed by a motor run of 1sec between the first and second distance measurements. These two maneuvers take about 4sec, so the distance sensors should have updated at least 80 times, including at least 20 times during the 1sec straight motor run after the 90º turn.

The only other possible difference is that the actual code calls ‘UpdateAllDistances()’, while the ‘V Run’ test code uses ‘UpdateAllEnvironmentParameters()’ (which calls ‘UpdateAllDistances()’ as part of its update process).

09 August 2023 Update:

I changed the line

to

and re-ran the ‘V’ distance test. The results were basically identical, as shown below:

So, I think it is safe to say that there is no difference in behavior between UpdateAllEnvironmentParameters() and UpdateAllDistances(), which should be a no-brainer as UpdateAllEnvironmentParameters() uses UpdateAllDistances() to update the distances.

Now I’m left with the situation described by Sherlock Holmes – “When you have eliminated the impossible, whatever remains, however improbable, must be the truth.”

So, maybe those ‘incidental’ duplicate distance values highlighted in the above runs are actually real? In the first run there were two sets of two dupes on the left side right at the start, and the other two runs there was one set of four dupes (one on the left, one on the right) at the start. If they are real, then that means it took about 100mSec to clear the dupe on the first run, and about 200mSec on the 2nd and 3rd runs. This could be consistent with the behavior shown in the actual ‘open corner’ experiment, where the first measurement after the 1Sec movement was the same as the last measurement before the movement started.

Now I’m beginning to suspect that there is some sort of buffering going on in the VL53L1X distance sensors, maybe due to the recent change to a 50mSec update rate. If the returned measurement was always one (or two?) measurement(s) behind, then that would explain why the measurement(s) received after the 90º turn and subsequent 1Sec motor run was the same as the one(s) before.

I changed the update interval from 50mSec to 75mSec to see if that eliminated the dupes in the ‘V’ run, and got the following:

Inconclusive; there wasn’t a block of four dupes at the start, but this could be just due to physics, as the robot would have moved 50% farther in 75 vs 50mSec during each measurement interval.

I changed my wall configuration back to the ‘open corner’ setup, and made another run using the code that showed the problem before (and with the 3-iteration loop that ‘solved’ it), just to make sure I still had a good baseline.

The baseline run showed the same behavior as before (whew!) so now that I have a solid baseline I can begin to troubleshoot. Again, the code that attempts to solve the problem by looping through ‘UpdateAllDistances()’ is shown below:

The first thing I tried was reducing the delay from 200 to 50mSec. This gave me the following output:

Note that the ‘At top of loop()’ measurement readout of left/right = 197.7/160.9 at 7413mSec and the ‘After RunBothMotors()’ readout of left/right = 197.7/160.9 at 10487mSec come from before and after, respectively, the SpinTurn() call (approx 2sec) and the RunBothMotors() call (approx 1sec).

Hmm, this time the distance report after the very first UpdateAllDistances() call shows reasonable (not great, but reasonable) numbers for left-side front/center/rear distances – 30.6/38.2/49.5. 50mSec later, the report after the 2nd call to UpdateAllDistances() shows 30.3/29.4/35.8. Note that the center and rear distance readings changed quite a bit (about 9 and 14cm respectively) even though the robot wasn’t moving. Clearly something is ‘catching up’ here.

Next I reduced the loop delay() call from 50 to 1mSec and re-ran the experiment, as shown below:

Well, that’s odd! The left-side rear distance is shown as 338.4cm for all three loop iterations! That can’t be right, as the starting front/rear/steerval averages for the orientation turn were 30.04/32.11/-0.21, so now I have no idea what’s going on. I made another run, and this time printed out the 10-point average of front/rear distance values performed by RotateToParallelOrientation(). This is what I got:

The run itself was a ‘failure’ in that when the robot performed the ‘parallel orientation’ maneuver, it made a huge turn in the wrong direction. Looking at the data, it is clear that the culprit here is the first term (215.40cm) in the ‘rear’ 10-point average above. This value comes from the ‘rear’ distance value reported all three times in the 3-iteration loop, using a 1mSec loop delay.

I think what I am seeing here is due to the minimum cycle time of the VL53L1X distance sensors. From the setup code for the VL53L1X sensors I see:

Which I believe means that the minimum time required for a new value to appear in the VL53L1X buffer is 50mSec. For larger distances, the time required might be longer. So that explains why, if the distance reading at the start of my little 3-iteration loop is ‘X’ and the loop time is 1mSec, it is highly likely that all three iterations will report ‘X’. This still doesn’t tell me why the initially reported value is ‘X’ when the robot has moved so that it is at a completely different physical distance from the wall when ‘UpdateAllDistances()’ is called.
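This stale-reading behavior is easy to model: if reads arrive faster than the sensor’s measurement cycle, they just return the last completed measurement. A toy timing model (not the VL53L1X API, just an illustration of the argument; the 50mSec period matches the setup code):

```cpp
// Toy model of a ranging sensor with a fixed measurement period. Reads issued
// before the period has elapsed return the buffered (stale) value.
struct RangeSensorModel {
    int periodMs;                 // e.g. 50 mSec, per the VL53L1X setup code
    int lastUpdateMs = -1000000;  // far in the past, so the first read measures
    float lastValueCm = 0.0f;

    float read(int nowMs, float trueDistanceCm)
    {
        if (nowMs - lastUpdateMs >= periodMs) {
            lastValueCm = trueDistanceCm; // a new measurement has completed
            lastUpdateMs = nowMs;
        }
        return lastValueCm; // otherwise: same value as last time
    }
};
```

Three reads 1mSec apart all return the same value, which is exactly the behavior seen in the 3-iteration loop with a 1mSec delay.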

Next, I tried breaking the 1Sec ‘move’ into three separate moves, with a call to ‘UpdateAllDistances()’ at the breaks. This shouldn’t make any physical difference, as the moves will all run together. The difference is that the distances reported at the end of the travel should be much more accurate.

Well, that trick didn’t work at all! Even with the motor run broken up into four 250mSec pieces, the errant measurement still showed up as the first element of the 10-element averaging array – bummer!

I made another run, but this time I moved the 200mSec delay in the 10-element averaging loop in front of the measurement, so there would be an initial 200mSec delay before the first element is written.

Aha! This run worked fine, as shown in the telemetry below:

Note that this run was also performed with the 3-element ‘catchup’ loop commented out. Also, it is clear from the above telemetry that the ‘chunked’ motor run wasn’t effective at all, as even the last reported left-side rear distance still shows over 300cm.

It is interesting that the progression from ‘no wall in sight’ to ‘a wall in sight’ can be seen as the robot progresses; the left-side front distance starts at 59.2cm and decreases to the correct value of 27.4cm, while the left-side center distance starts at 286cm, stays the same for the next two measurements, and then decreases dramatically from 296.6 to 46.4cm (still wrong, but better), and the left-side rear distance stays above 300 for all measurements. It is clear from this data that what is being reported is somewhat behind the actual robot position. This may be in part a result of the VL53L1X measurement physics – as the sensor uses a ‘cone’ of light to illuminate the environment and then (I think) computes the histogram of the results. If more of the rear sensor is still ‘seeing’ past the open corner, it will report a ‘no wall’ result.

So, I think we are looking at a two-part problem. The first part is due to sensor physics; the sensor has to ‘see’ the wall throughout its FoV (Field of View) cone to produce an accurate measurement. The datasheet shows the default FoV to be 27º, so this is a reasonable conjecture IMHO. The second part is an apparent time lag from the time the FoV changes (in this case – from ‘no wall’ to ‘wall’) to the time the new distance value shows up at the output in response to a measurement request in the code. As I found out from the above experiments, this second issue seems to be completely solved by moving the 200mSec loop delay in the part of the RotateToParallelOrientation() routine that computes a 10-element average to the front of the loop, so it takes place before the first measurement request. I now believe this 200mSec delay gives the sensors time to ‘catch up’ to the actual environment.
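The fix amounts to moving the settling delay ahead of the measurement request inside the averaging loop. A hardware-free sketch with delay() and the distance read injected as callbacks (names hypothetical; the real loop lives in RotateToParallelOrientation()):

```cpp
#include <functional>

// 10-element averaging loop with the delay moved to the TOP of the loop, so the
// sensor gets its settling time before the first measurement request. delayFn
// and readFn stand in for delay() and a distance read via UpdateAllDistances().
float AveragedDistanceCm(int n, int delayMsec,
                         const std::function<void(int)>& delayFn,
                         const std::function<float()>& readFn)
{
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        delayFn(delayMsec); // settle FIRST -- this is the change that fixed it
        sum += readFn();
    }
    return sum / (float)n;
}
```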

I made a ‘confirmation run’ with the following setup:

  • WALL_TRACK_UPDATE_INTERVAL_MSEC = 50
  • ‘Chunked’ motor run removed – now one 1000mSec run
  • 3-element ‘measurement catchup’ loop replaced by single 200mSec delay followed by a single call to UpdateAllDistances();
  • 200mSec loop delay in RotateToParallelOrientation() 10-element averaging loop moved to top so that it executes before the first measurement request.

Yay! It all worked! Here’s a short video and the relevant telemetry from the run:

Everything looks good with the above results – I think I can now put the ‘open corner’ issue to bed. As a side-benefit, I think I have also improved the function of ‘RotateToParallelOrientation()’, so other calling functions will benefit as well. A quick search of the code shows 7 or 8 places where the function is called.