WallE3_Complete_V5 Code Cleanup

Posted 25 August 2023

After (I hope) getting WallE3’s wall-switching feature figured out, I am trying to clean up the code in WallE3_Complete_V5:

  • Enums: Removed NavCases, TrackingState enums. Removed ‘OPEN_CORNER’, ‘OPEN_DOORWAY’ and ‘TRACKING_WRONG_WALL’ anomaly codes from AnomalyCode enum list, and corresponding strings from AnomalyStrArray. Removed TRACKING_LEFT/RIGHT_CAPTURE entries from WallTrackingCases enum list, and corresponding strings from WallTrackStrArray. Now there are only three enums in enums.h.
  • Cleaned out unused/commented-out PID values from WALL_FOLLOW_SUPPORT #pragma region
  • Deleted previously commented-out global boolean variables (see this post for details)
  • Removed dead code from loop() #pragma region WALL_TRACKING section. Now this section contains only one line – ‘HandleAnomalousConditions(gl_CurTrackingCase);’
  • Removed dead code from UpdateAllEnvironmentParameters()
  • Removed IsOpenCorner() (was already commented out)
  • Removed IsTrackingWrongWall() (was already commented out)
  • Removed CheckForAnomalies() (was already commented out)
  • Added braces to ANOMALY_NONE: case in HandleAnomalousConditions() to eliminate any possible scoping issues.
  • Removed dead code from ExecuteStuckRecoveryManeuver()
  • Removed dead code from ExecuteRearObstacleRecoveryManeuver()

When done, the code cleanup resulted in about 700 fewer lines of code (~6400 vs ~7100).

Troubleshooting Wall Switch Maneuver

Posted 18 August 2023

I’m running some ‘open doorway’ tests in my office test setup, and I keep getting an uncontrolled spin maneuver at the end of the run. The following short video illustrates the problem:


And here is the telemetry for the above run:

The very first time through the program, gl_LastAnomalyCode is ANOMALY_NONE, so the CASE statement gets skipped entirely (this is why there are two “Just after UpdateAllEnvironmentParameters() at top of loop()…” printouts here, but only one thereafter).

At 2.7Sec an anomaly (ANOMALY_EXCESS_STEER_VAL) is detected when the robot hits the gap in the right-hand wall. This causes the program to start over at the top of loop(), and this time gl_LastAnomalyCode = ANOMALY_EXCESS_STEER_VAL, which causes HandleExcessSteervalCase() to be called. HandleExcessSteervalCase() in turn calls TrackLeftWallOffset(), which in turn calls CaptureWallOffset(TRACKING_LEFT,78.4).

The capture operation moves the robot to the ‘other’ wall and turns the robot parallel, and then continues TrackLeftWallOffset().

From 19.0 to 21.2Sec the robot tracks the left wall, where it again detects an ANOMALY_EXCESS_STEER_VAL anomaly. Unfortunately, due to the proximity of a bookshelf on the left, the reported left/right distances at this point are 35.9/63.7cm, so the program calls TrackLeftWallOffset() instead of TrackRightWallOffset(). Because the starting distance (35.9cm) is less than 1.5*[desired offset of 30cm], i.e. 45cm, the capture phase is skipped entirely and TrackLeftWallOffset() is continued. The column header line is printed, but the program immediately detects another ANOMALY_EXCESS_STEER_VAL anomaly. From the video it looks like the robot should have tracked the left wall (the bookcase) for a second or so.
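The capture-or-skip decision described above can be sketched as follows. This is only an illustration, not the actual WallE3 code: the helper name NeedsOffsetCapture() and both constant names are made up for this example, and only the 30cm offset and the 1.5 multiplier come from the text.

```cpp
// Hypothetical helper (not the actual WallE3 function): decides whether the
// 'offset capture' phase runs before wall tracking starts.
const float WALL_OFFSET_TGT_CM = 30.0f;  // desired offset, from the post (name assumed)
const float CAPTURE_FACTOR = 1.5f;       // 1.5x multiplier, from the post (name assumed)

bool NeedsOffsetCapture(float startDistCm)
{
    // Capture is skipped when the robot already starts closer than 1.5 x offset (45cm)
    return startDistCm >= CAPTURE_FACTOR * WALL_OFFSET_TGT_CM;
}
```

With these numbers, the 78.4cm start of the first transition triggers a capture, while the 35.9cm bookshelf reading does not, which is consistent with the telemetry described above.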

The bottom line (I think) is that I had a flawed test wall configuration; I wanted the robot to transition back from the left wall to the right one, but the end of the left wall was too close to the actual left wall of my office – bummer.

23 August 2023 Update:

I think I may have finally gotten the multiple-wall-switch scenario working properly! Here’s a short video and the telemetry for the run:

The robot starts out tracking the right wall. At about 3sec, it hits the first wall gap and generates an EXCESSIVE_STEER_VAL anomaly. This caused the program to exit wall tracking and restart the loop() function from the top.

It took another 2.7sec to figure out it needed to transition to the left wall, calling “TrackLeftWallOffset: Start tracking offset of 30cm” at 5.7sec. This involved an ‘offset capture’ step, so the actual left wall tracking phase didn’t begin until about 13sec.

At 15.8sec the robot generated another EXCESSIVE_STEER_VAL anomaly, again causing the program to exit wall tracking and restart the loop() function from the top. This time the anomaly was caused by running out of wall on the left side, so the robot has to transition back to the right wall – that was what was supposed to happen. What actually happened is the robot still thought there was a left wall available at 36.4cm and a right wall at 52.4cm, so it started tracking the (nonexistent) left wall again. Of course this caused an immediate EXCESSIVE_STEER_VAL anomaly, and this time the left/right distance was 127.9/50.6, so the robot correctly transitioned back to the right wall, starting at 19.2sec. The transition involved an ‘offset capture’ step, so the actual right-wall tracking operation started at 29.1sec.

All in all, I thought this was a very successful run, with the minor nit about momentarily trying to track a non-existent wall. I think this may be an instance where distance reporting is lagging slightly behind reality.

24 August 2023 Update:

I added a second 200mSec delay and a second call to ‘UpdateAllEnvironmentParameters()’ to troubleshoot the above ‘lagging distance problem’, but it didn’t solve the problem – still get the same problem with a ‘phantom left wall’ measurement. After looking at the code a bit, I now see there is a definite coding problem – oops!

HandleAnomalousCondition vs Switch(gl_LastAnomalyCode)

Posted 14 August 2023

In previous versions of the WallE3 operating system I used three functions to deal with anomalous conditions. At each update interval (currently set to 50mSec) I called UpdateAllEnvironmentParameters(), which updated all sensor measurements; then a second function, CheckForAnomalousConditions(), to update anomaly condition flags (like ‘gl_DeadBattery’, ‘gl_bStuckAhead’ and the like); and a third function, HandleAnomalousConditions(), to actually respond to any anomalous condition identified in CheckForAnomalousConditions().

However, I wasn’t really happy with the program flow with this arrangement, so in the current version (WallE3_Complete_V4) I eliminated CheckForAnomalousConditions() entirely and pulled the anomaly flag update code into UpdateAllEnvironmentParameters(). I also replaced HandleAnomalousConditions() with a ‘switch(gl_LastAnomalyCode)’ block at the top of the loop() function, just before the code that decides which wall (left or right) is to be tracked. So now the program flow looks like this:

So, very simple and very direct. The ‘case’ block looks like this (only the ‘ExcessSteerVal’ case has been populated at the moment):

I think this is going to be a robust flow; any anomaly causes the current tracking operation to break and return control to the top of loop(). Then the case statement handles the anomaly as required, and drops the program right back into left/right wall tracking determination. If no anomalies are encountered (unlikely, but could happen) then the tracking block runs forever.
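A stripped-down sketch of that flow is shown here, as I understand it from the description. Everything except gl_LastAnomalyCode, HandleExcessSteervalCase() and the anomaly names is a stand-in invented for illustration: LoopPass() plays the role of one trip through loop(), TrackWallStub() replaces TrackLeft/RightWallOffset(), and the counters exist only so the behaviour can be checked off-robot.

```cpp
enum AnomalyCode { ANOMALY_NONE, ANOMALY_EXCESS_STEER_VAL, ANOMALY_STUCK_AHEAD };

AnomalyCode gl_LastAnomalyCode = ANOMALY_NONE;
int handledExcessSteer = 0;  // illustration-only counter
int trackingRuns = 0;        // illustration-only counter

void HandleExcessSteervalCase() { ++handledExcessSteer; }

// Stand-in for TrackLeft/RightWallOffset(): 'tracks' until the given anomaly
// breaks the tracking loop, then returns with the anomaly code recorded.
void TrackWallStub(AnomalyCode anomalyThatEndsTracking)
{
    ++trackingRuns;
    gl_LastAnomalyCode = anomalyThatEndsTracking;
}

// One pass through loop(): handle the last anomaly first, then resume tracking.
void LoopPass(AnomalyCode nextAnomaly)
{
    switch (gl_LastAnomalyCode)
    {
    case ANOMALY_NONE:
        break;                        // first pass: nothing to handle
    case ANOMALY_EXCESS_STEER_VAL:
        HandleExcessSteervalCase();
        break;
    case ANOMALY_STUCK_AHEAD:
        break;                        // not populated yet
    }
    TrackWallStub(nextAnomaly);       // left/right wall selection omitted here
}
```

On the first pass the switch does nothing and tracking starts; on the next pass the anomaly that ended tracking is handled before tracking resumes.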

15 August 2023 Update:

I’m working on populating the ‘ANOMALY_STUCK_AHEAD’ case in the ‘switch (gl_LastAnomalyCode)’ block. The function ExecuteStuckRecoveryManeuver(WallTrackingCases trkdir) already exists, but it requires a ‘WallTrackingCases trkdir’ parameter to specify which wall is currently being tracked. I have a global ‘WallTrackingCases’ variable called ‘gl_CurTrackingCase’, but it isn’t clear to me how it is initialized and updated in the code.

Searching for ‘gl_CurTrackingCase’ produces the following hits:

  • It is initialized to ‘WallTrackingCases::TRACKING_NEITHER’ in the pre-setup initialization block
  • It is set to ‘TRACKING_LEFT’ at the top of TrackLeftWallOffset()
  • It is set to ‘TRACKING_RIGHT’ at the top of TrackRightWallOffset()
  • In ‘HandleExcessSteervalCase()’ it is updated to either TRACKING_LEFT or TRACKING_RIGHT, depending on current left/right distances, and then used as the input to several ‘RotateToParallelOrientation()’ and ‘SpinTurn()’ calls.

Looking at the above list, it appears that the value stored in gl_CurTrackingCase by the TrackLeftWallOffset() & TrackRightWallOffset() functions is never used; in HandleExcessSteervalCase() (the only function that references gl_CurTrackingCase), the value of the variable is updated locally by checking current left/right wall distances. IOW, the initialization lines in TrackLeftWallOffset() & TrackRightWallOffset() could be removed and the references to gl_CurTrackingCase in HandleExcessSteervalCase() could be converted to local variables and nothing would change.

So it appears that there are two ways to skin this cat.

  • Keep gl_CurTrackingCase as a global variable that is updated in TrackLeftWallOffset() & TrackRightWallOffset() to reflect the current tracking case, and then reference it in HandleExcessSteervalCase() without regenerating the value from left/right wall distances. Keep the current definition of ExecuteStuckRecoveryManeuver(WallTrackingCases trkdir) and call it from the ‘ANOMALY_STUCK_AHEAD’ case block, with ‘trkdir’ replaced by gl_CurTrackingCase.
  • Remove the ‘gl_CurTrackingCase’ global variable entirely, and use current left/right distances to determine the tracking case anywhere it is needed.

Of these two options, I prefer the first one. At the point where either TrackLeftWallOffset() or TrackRightWallOffset() is called, the tracking case is known from current left/right distance comparisons, and inside either tracking function, the tracking case doesn’t change. When the active tracking function exits to the top of loop() due to an anomaly (the only way it can exit), the extant tracking case is still what it was at function exit. The only way it should change is from another left/right distance comparison after the current anomaly has been resolved.

16 August 2023 Update:

Oops! I ‘simplified’ HandleExcessSteervalCase() and now it doesn’t work anymore – ugh! The problem is I didn’t have a good understanding of how this function decides which wall to track. It’s usually (but not necessarily) the other wall from the one identified in gl_CurTrackingCase, so I actually have to check left/right distances as was done in the ‘unsimplified’ version. So most of the time, HandleExcessSteervalCase() will change gl_CurTrackingCase to the ‘other’ wall.

19 August 2023 Update:

Well, things are a bit more complicated than I thought. While chasing other bugs, I realized that there are other places in the code that use anomaly detection – oops. Now I have a CASE block at the top of loop() that I thought replaced the code in HandleAnomalousConditions(), but now I see that it only replaced one usage – there are several more places in the program that still use the function:

  • 2 places in ExecuteStuckRecoveryManeuver()
  • MoveToDesiredRightDistCm()
  • MoveToDesiredLeftDistCm() – it should be in there but isn’t – strange!
  • MoveToDesiredFrontDistCm() – it should be in there but isn’t – strange!
  • MoveToDesiredRearDistCm() – it should be in there but isn’t – strange!

So this is a real problem – I have a CASE block doing anomaly handling at the top of loop(), and HandleAnomalousConditions() doing the same thing (but different code!) in several other places in the program – yikes!

Here is the code for HandleAnomalousConditions:

So HandleAnomalousConditions() is just a bunch of if and ‘else if’ blocks looking at a bunch of global boolean variables denoting different anomaly conditions. The boolean variables are updated in UpdateAllEnvironmentParameters(), and the associated anomaly code is loaded into gl_LastAnomalyCode.

So, the current program flow associated with anomalies goes something like this:

  • UpdateAllEnvironmentParameters() is called each time through the timing loop for any function that has a timing loop. It updates all the anomaly-related global boolean variables.
  • HandleAnomalousConditions() is also called each time through, and it both updates gl_LastAnomalyCode with the current anomaly condition, AND actually takes action to address the anomaly condition (either directly or by calling a specific handler function)
  • The new CASE block at the top of loop() switches on the value of gl_LastAnomalyCode (as updated by HandleAnomalousConditions()) and ALSO attempts to address the current anomaly condition before dropping back into either TrackLeftWallOffset() or TrackRightWallOffset().

So it appears that the CASE block and HandleAnomalousConditions() are doing the same thing, but at different program scope levels; the CASE block only runs at the top of loop(), but HandleAnomalousConditions() is ‘local’ to most functions that have their own ‘local’ operating loops. Moreover, there are two duplicative ways to describe anomalies – the various global boolean variables like ‘gl_bStuckAhead’ and the global enumerated values for gl_LastAnomalyCode like ‘ANOMALY_STUCK_AHEAD’.

The idea behind the enumerated anomaly codes was to consolidate anomaly detection in ‘while’ loops to just something like ‘while (gl_LastAnomalyCode == ANOMALY_NONE)’ rather than having to list all applicable error conditions with something like ‘while(!gl_bStuckAhead && !gl_bStuckBehind && !gl_ObstacleAhead …)’. If an anomaly was detected, then the idea was that the function would exit in a way that caused the main loop() function to run again from the top, and the CASE block would actually respond to the anomaly. In this scheme, error handling is removed from the context in which the error occurred – probably OK for TrackLeft/RightWallOffset(), but not so much for MoveToDesiredFront/Back/Left/RightDistance() as the potential anomalies are few and the response to those errors is heavily context dependent (I think).

So now I’m beginning to think that this entire ANOMALY_XXX thing (with accompanying enum.h) is a bust, and unneeded. Part of the reason for doing it was to (as noted above) avoid having long strings of “!gl_bStuckAhead && !gl_bStuckBehind && !gl_ObstacleAhead …” in the ‘while’ statements. It just occurred to me that maybe I could consolidate these into a function call, like ‘IsAnomaly()’ that would return TRUE if any anomaly was found; so now the ‘while’ statement would look like:

And ‘IsAnomaly()’ would just be a big OR string.
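A minimal sketch of the idea, assuming only four of the global flags (the real list is longer), could look like the following. RunTrackingLoop() is a made-up stand-in for a tracking function’s ‘while’ loop, with the anomaly raised artificially after a given number of passes so the sketch can be exercised off-robot.

```cpp
// A subset of the anomaly flags named in this post; the real list is longer.
bool gl_bStuckAhead = false;
bool gl_bStuckBehind = false;
bool gl_bObstacleAhead = false;
bool gl_bExcessiveSteerVal = false;

// The 'big OR string': true if any anomaly flag is currently set.
bool IsAnomaly()
{
    return gl_bStuckAhead || gl_bStuckBehind
        || gl_bObstacleAhead || gl_bExcessiveSteerVal;
}

// Hypothetical stand-in for a tracking loop. Runs until an anomaly appears
// and returns the number of passes made. raiseAfter must be >= 1.
int RunTrackingLoop(int raiseAfter)
{
    int passes = 0;
    while (!IsAnomaly())
    {
        ++passes;
        if (passes == raiseAfter)
        {
            gl_bExcessiveSteerVal = true;  // simulated anomaly detection
        }
    }
    return passes;
}
```

The ‘while’ condition is short, but the loop only learns that some anomaly occurred, not which one.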

So I can replace the big CASE statement at the top of loop() with just

if (gl_bExcessiveSteerVal) {....}

OK, so I created another ‘clean’ version – WallE3_Complete_V5 to see whether or not I can remove the ANOMALY_XXX stuff without completely screwing up the code. First I made sure that _V5 would compile clean – check. Then I commented out the ANOMALY_XXX lines from enum.h. This blew a bunch (as in 28) of errors, so I started going through them from top to bottom:

Not going to work; I can (and did) create the IsAnomaly() function, but using it instead of ‘&& gl_LastAnomalyCode == ANOMALY_NONE’ eliminates the ability to print out the name of the anomaly that caused the loop to break – and I want to keep this feature.
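For reference, the name-printing feature depends on pairing the enum with a parallel string array. A reduced sketch with only four codes is shown here. AnomalyStrArray is the array name used in enums.h, but the enum ordering, the NUM_ANOMALY_CODES sentinel, and the AnomalyName() and PrintLoopExit() helpers are assumptions made for this example.

```cpp
#include <cstdio>

// Reduced enum for illustration; ordering and the sentinel are assumed.
enum AnomalyCode
{
    ANOMALY_NONE = 0,
    ANOMALY_STUCK_AHEAD,
    ANOMALY_EXCESS_STEER_VAL,
    ANOMALY_DEAD_BATTERY,
    NUM_ANOMALY_CODES  // sentinel: number of real codes above
};

// Strings must stay in the same order as the enum entries.
const char* const AnomalyStrArray[NUM_ANOMALY_CODES] =
{
    "ANOMALY_NONE",
    "ANOMALY_STUCK_AHEAD",
    "ANOMALY_EXCESS_STEER_VAL",
    "ANOMALY_DEAD_BATTERY"
};

// Hypothetical helper: human-readable name for an anomaly code.
const char* AnomalyName(AnomalyCode code)
{
    return AnomalyStrArray[code];
}

// Hypothetical helper: report which anomaly broke a tracking loop.
void PrintLoopExit(AnomalyCode code)
{
    std::printf("Tracking loop exited on %s\n", AnomalyName(code));
}
```

A plain boolean from IsAnomaly() carries none of this information, which is why the enumerated code has to stay.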

I guess I could simply replace the code in ‘HandleAnomalousConditions()’ with the CASE block that switches on gl_LastAnomalyCode, and that would at least get me to the point of having only ONE function that deals with anomaly handling.

So, first order of business is to try and eliminate the global gl_bXXX anomaly variables: In UpdateAllEnvironmentParameters() we have:

So a typical line like


would be replaced with:

This would allow human-readable printout of the exact anomaly type, and the use of a CASE block with

  • Replaced all gl_bxxx lines in UpdateAllEnvironmentParameters() with ‘if(xx)’ statements & recompiled – OK
  • Replaced the code in ‘HandleAnomalousConditions()’ with a CASE block vs ‘if’ and ‘elseif’ statements – Recompiled OK.
  • Replaced the ‘switch (gl_LastAnomalyCode)’ statement at the top of loop() with a call to HandleAnomalousConditions(gl_CurTrackingCase). Recompiled OK
  • Did a search for each gl_bXXX to make sure it was no longer used:
    • gl_bIRBeamAvail: moved the ‘if(gl_IRBeamAvail)’ code inside #pragma region IR_HOMING into the ANOMALY_IR_BEAM_AVAILABLE case inside HandleAnomalousConditions. The call to UpdateIRHomingValues() here isn’t needed – it is called by IsIRBeamAvail() in UpdateAllEnvironmentParameters(). Also removed it from ‘while (gl_LastAnomalyCode == ANOMALY_NONE && !gl_bIRBeamAvail)’ statement in TrackLeft/RightWallOffset(). Now compiles OK w/o gl_bIRBeamAvail defined.
    • gl_bChgConnect: Can’t remove because it is state memory for charge connect/disconnect state.
    • gl_bObstacleAhead: Compiles OK w/o gl_bObstacleAhead
    • gl_bWallOffsetDistAhead: Compiles OK w/o gl_bWallOffsetDistAhead
    • gl_bObstacleBehind: In ExecuteStuckRecoveryManeuver() replaced ‘gl_bObstacleBehind’ with equivalent ANOMALY_CODE treatments
    • gl_bStuckAhead/Behind: Basically the same as for gl_bObstacleBehind
    • gl_DeadBattery: No changes required; the only place it was used had already been replaced by a call to IsDeadBattery() and assignment of ANOMALY_DEAD_BATTERY to gl_LastAnomalyCode.
    • gl_bTrackingWrongWall: No changes required; It never found a home in the current program – and there is no existing comparable ANOMALY_CODE enumeration.
    • gl_bOpenCorner/gl_bOpenDoorway: No changes required; these anomalies are both handled as sub-cases of ANOMALY_EXCESSIVE_STEERVAL
    • gl_bIsSpinning: No changes required; It never found a home in the current program – and there is no existing comparable ANOMALY_CODE enumeration. This may come back at some time – but if it does, it will be via an added ANOMALY_CODE enumeration – not a global bool.
    • gl_bExcessiveSteerVal: No changes required; Replaced by a call to IsExcessiveSteerVal() and assignment of ANOMALY_EXCESS_STEER_VAL to gl_LastAnomalyCode.
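The flag-to-code conversion listed above can be sketched roughly like this. The IsXxx() function names and anomaly codes are the ones mentioned in this post, but their bodies here are stubs driven by a made-up SimState struct, the parameterless signatures are a simplification, and the order of the checks (later ones override earlier ones) is an assumption.

```cpp
enum AnomalyCode
{
    ANOMALY_NONE,
    ANOMALY_STUCK_AHEAD,
    ANOMALY_DEAD_BATTERY,
    ANOMALY_EXCESS_STEER_VAL
};

AnomalyCode gl_LastAnomalyCode = ANOMALY_NONE;

// Simulated robot state so the sketch can run off-robot (illustration only).
struct SimState { bool stuckAhead; bool deadBattery; float steerVal; };
SimState sim = { false, false, 0.0f };

// Stubs standing in for the real detection functions.
bool IsStuckAhead()        { return sim.stuckAhead; }
bool IsDeadBattery()       { return sim.deadBattery; }
bool IsExcessiveSteerVal() { return sim.steerVal > 0.99f || sim.steerVal < -0.99f; }

void UpdateAllEnvironmentParameters()
{
    // Old style: gl_bStuckAhead = IsStuckAhead();  (one global bool per anomaly)
    // New style: record the anomaly directly as an enumerated code.
    if (IsStuckAhead())        { gl_LastAnomalyCode = ANOMALY_STUCK_AHEAD; }
    if (IsDeadBattery())       { gl_LastAnomalyCode = ANOMALY_DEAD_BATTERY; }
    if (IsExcessiveSteerVal()) { gl_LastAnomalyCode = ANOMALY_EXCESS_STEER_VAL; }
    // Note: this sketch never resets the code to ANOMALY_NONE; that is left
    // to whatever handles the anomaly.
}
```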

OK – now I have eliminated all global booleans associated with anomalies – everything now is handled in the switch(gl_LastAnomalyCode) CASE block. At this point I’m going to commit to my local Git repository – with all the old code still in the codebase but commented out until I can run some basic tests to see how badly I have screwed everything up.

22 August 2023 Update:

This morning I uploaded _V5 to the robot and set it loose on my ‘two wall changes’ test configuration, and I’m happy to say that it performed identically to what it was doing prior to the changes described above. It even failed in almost exactly the same way at the end with the “aggressive spin move” behavior (which I’m almost sure is due to calling SpinTurn() with a very large rotation value).

24 August 2023 Update:

I think I now have the ‘right-left-right’ wall switching algorithm working fairly well, and the actual code is starting to look a lot like the flow diagram posted above. The latest change was to move left/right wall selection code out of loop() and into ‘HandleAnomalousConditions()’, bringing the code more into line with the flow diagram. Now the robot goes through both transitions (right-to-left and left-to-right) with no problem and no duplicative steps – yay!

Stay tuned,

Frank

Using Out-of-Range Steerval for Anomaly Detection

Posted 29 July 2023

WallE3 has to be able to handle anomalous conditions as it wanders around our house. An anomalous condition might be running into an obstacle and getting stuck, or sensing an upcoming wall (in front or in back). It generally does a pretty good job with these simple situations, but I have been struggling lately with what I refer to as ‘the open door’ problem. The open door problem is the challenge of bypassing an open doorway on the tracked side when there is a trackable wall on the non-tracked side. The idea is to simplify WallE3’s life by not having it dive into every side door it finds and then have to find its way back out again. Of course, I could just close the doors, but what’s the fun in that?

My current criterion for detecting the ‘open door’ condition is to look for situations where the tracking-side distance increases rapidly from the nominal tracking offset distance to a distance larger than some set ‘max tracking distance’ threshold, with the additional criterion that the non-tracking side distance is less than that same threshold. When both criteria are met, WallE3 will switch to tracking the ‘other’ side, and life is good.
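Reduced to its two distance comparisons, the test looks something like the sketch below. The function and constant names are invented for illustration, the 100cm value is the max tracking distance quoted in the 07 August update, and the ‘increases rapidly’ part of the condition is deliberately not modeled.

```cpp
// Assumed name; 100cm is the max tracking distance quoted later in this post.
const float MAX_TRACKING_DISTANCE_CM = 100.0f;

// Hypothetical check: tracked side has opened up AND the other side offers
// a wall within tracking range. Rate of change is not considered here.
bool IsOpenDoorway(float trackedSideDistCm, float otherSideDistCm)
{
    return trackedSideDistCm > MAX_TRACKING_DISTANCE_CM
        && otherSideDistCm < MAX_TRACKING_DISTANCE_CM;
}
```

When both sides exceed the threshold the check fails, which corresponds to the ‘open corner’ situation rather than an open doorway.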

However, it turns out that in real life these criteria don’t work very well, as many times WallE3’s tracking feedback loop sees the start of the open doorway just like any other wall angle change, and happily dives right into the room as shown at the very end of the following short video – oops!

In the above video, the robot easily navigates a 45º break at about 7-8 sec. At about 12 sec, an open doorway appears on the left (tracked) side, and the back of a kitchen counter appears on the right (non-tracked) side. What should happen is the robot will ‘see’ the left-side distance increase past the ‘max track’ threshold, while the right-side distance decreases below it, causing the robot to shift from left-side to right-side tracking. What actually happens is the left-side distance doesn’t increase fast enough, and the robot happily navigates around the corner and into the room.

So, what to do? I started thinking that the steering value (5th column from left) might be a reliable indicator that an anomaly has occurred that may (or may not) need attention. In the telemetry file below, the steering value goes out of range (-1 to +1) in two places – at the 45º break (7.9 – 8.1sec), and again at the ‘open doorway’ event at the end (13.3 sec). This (out-of-range steering value) condition is easy to detect, so maybe I could have the robot stop any time this happens, and then decide what to do based on relevant environment values. In the case of the 45º break, it would be apparent that the robot should continue to track the left-side wall, but in the case of the open doorway, the robot could be switched to right-side tracking.

The function that checks for anomalous conditions is UpdateAllEnvironmentParameters(WallTrackingCases trkdir), shown below:

Hmm, I see that I tried this trick before (last May) but commented it out, as shown below:

Looking back in time, I see a note where MAX_STEERING_VAL was added in September of 2022 for use by ‘RunToDaylight()’, which calls another function called ‘RotateToMaxDistance(AnomalyCode errcode)’. However, neither of these is active in the current code.

OK, back to reality. I plan to change the MAX_STEERING_VAL constant from 0.9 to 0.99, so steering values of 1 will definitely be larger, and most other values will not be.

Then I plan to uncomment ‘IsExcessiveSteerVal(trkdir)’, and the call to it in ‘UpdateAllEnvironmentParameters(trkdir)’. This should cause TrackLeft/Right to exit when the steering value goes to 1 as it does at the 45º break and the open doorway. Then, of course, the question is – what to do? I plan to have the robot stop (mostly for visible detection purposes), then move forward slightly so all three distance sensors are looking at the same environment, then turn parallel to the nearest wall, then start tracking again.
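The threshold test itself is tiny. The sketch below is my guess at its general shape, not the actual function: the real IsExcessiveSteerVal() takes a tracking direction rather than the steering value, and whether it compares the magnitude or a signed value is an assumption here.

```cpp
#include <cmath>

const float MAX_STEERING_VAL = 0.99f;  // raised from 0.9, as described above

// Simplified stand-in: takes the steering value directly and compares its
// magnitude against the threshold (the real function takes 'trkdir').
bool IsExcessiveSteerVal(float steerVal)
{
    return std::fabs(steerVal) > MAX_STEERING_VAL;
}
```

With a strict ‘greater than’ comparison, a saturated steering value of +1 or -1 trips the check, while values such as -0.8 do not.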

After the usual number of mistakes and bugfixes, I think I now have a working ‘out-of-range steerval’ recovery algorithm implemented in WallE3_Complete_V4. Here’s a short video showing the action:

In the above video the robot goes past the end of the left wall at about 6sec, and actually starts to turn left before the ‘out of range’ condition is detected and the robot stops at 7sec. Then the robot moves ahead slowly for 0.5sec to ensure the distance sensors get a ‘clean look’ at the new environment. Then the robot spins very slightly clockwise due to a ‘RotateToParallelOrientation(TRACKING_RIGHT)’ call, and stops again (this is mostly for visual recognition purposes). Starting at 14sec, the robot starts tracking the right-hand wall. Below is the complete telemetry readout from the run:

31 July 2023 Update:

After cleaning up the code a bit, I decided to see how the new ‘excessive steerval’ algorithm handles a 45º break situation set up in my office. The following short video shows the action, followed by the recorded telemetry:

In the video, WallE3 negotiates the wall break without stopping – a result that was unexpected. Looking at the telemetry, I noticed that the steering value never exceeded the 0.99 threshold value for ‘excessive steerval’ detection. The short excerpt from the telemetry (immediately above) shows the time segment from 4.7 to 5.1sec, where the steering value can be seen to range from +0.1 to -0.8 and then back to -0.1 as WallE3 goes around the break.

I think what happened here is the break angle wasn’t sharp enough to drive the steering value into the ‘excessive’ range.

I made another run with the break angle increased to over 50º, and this did trigger the ‘excessive steerval’ condition. Here’s the video:

In the above video, the break occurs at about 4sec. The telemetry excerpt below shows how the ‘excessive steerval’ algorithm works through the situation, and then continues tracking the left side:

02 August 2023 Update:

The ‘excess steering value’ algorithm works, but is not an unalloyed success. Here’s a run where WallE3 appears to negotiate the 50º break OK, but later dives nose-first into the wall – oops:

Here is the telemetry from this run:

Looking through the above telemetry, the ‘ANOMALY_EXCESS_STEER_VAL’ case was detected at 3.9sec (~5sec in video). WallE3 then stopped, performed a 53º CCW turn to parallel the new wall, moved ahead 1/2sec to make sure all left-side distance sensors were ‘seeing’ the new wall, and then started tracking the new wall. However, because WallE3 started from 59cm away, it caused another EXCESS_STEER_VAL anomaly at 10.7sec (~11sec in video). WallE3 again stopped, rotated about 51º CW to (re)parallel the wall, and then continued tracking, starting right at the correct offset of 30cm. At the very end of the run WallE3 ran off the end of the test wall, thus triggering an ‘ANOMALY_OPEN_CORNER’ anomaly.

So, I’m beginning to think that the ‘EXCESS_STEER_VAL’ algorithm might actually be working even better than I thought. I thought I might have to re-implement the ‘offset capture’ phase I had put in earlier and then took out, but this last run indicates that I might not have to.

I made another run, this time starting with WallE3 well outside the offset distance. The video and the telemetry are shown below:

As shown in the above video and telemetry, WallE3 does a good job of approaching and then capturing the desired offset. During the capture phase, the steering value rises from -0.17 to -0.99 (at 0.6sec, almost causing an ‘EXCESS_STEER_VAL’ anomaly detection), decreases to zero at 5.3sec (~3sec in video) and then goes positive with an offset distance of 36.6cm, as shown in the following excerpt:

The above shows that a separate ‘offset capture’ algorithm probably isn’t needed; either the robot will capture the offset without triggering an ‘excess steerval’ anomaly, or it will trigger one. If the anomaly is triggered, it will cause the robot to stop, turn to a parallel heading, and then restart tracking – which is pretty much exactly what the previous ‘offset capture’ algorithm did.

05 August 2023 Update:

I may have been a bit premature in saying that WallE3 didn’t need an ‘offset capture’ phase, as I have seen a couple of cases where the robot nose-dived into the opposite wall after trying to respond to an ‘Open Doorway’ condition. It worked before because the procedure was to track the ‘other’ wall at whatever distance the robot was at when the anomaly detection occurred. This obviated the need for an approach maneuver, and thus eliminated that particular opportunity to screw up. However, when I tried to add the constraint of tracking the ‘other’ wall at the desired 30cm offset, bad things happened – oops!

06 August 2023 Update:

I’ve been working on the ‘open corner’ problem, and although I think I have it solved, it isn’t very pretty at the moment. There are some ‘gotchas’ in how and when WallE3 actually updates its distance sensor values, so I think my current solution needs a bit more work. Here’s the video, telemetry and relevant code from a recent ‘open corner’ run in my office.

The video shows the robot stopping after detecting the ‘excessive steering value’ condition. Then it checks for left or right wall availability, and finding none, defaults to the ‘open corner?’ section. This section first commands a 90º turn in the direction of the last-tracked wall (left in this case), then moves ahead for 1sec to ensure that all three left-side distance sensors are ‘seeing’ the same wall. Then it calls RotateToParallelOrientation() to take out any initial off-angle orientation, and then calls TrackLeftWallOffset(). The first few times I tried this trick WallE3 just wasn’t cooperating, and when I added some more diagnostics telemetry, I saw that the side distances weren’t updating as I thought they should. I kind of brute-forced the problem by adding a 3-iteration ‘for’ loop with a 200mSec delay to see if there was some sort of latency issue, and this fixed the problem. Here’s the code section and the resulting telemetry:
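The general shape of that brute-force loop is sketched below. This is a desktop illustration, not the original code section: UpdateAllDistances() is stubbed to return one stale reading before a fresh one, delay() is stubbed out so the sketch builds off-robot, and the variable and function names other than UpdateAllDistances() are made up. The 271.3cm and 27.4cm values are the left-side distances quoted in the 07 August update.

```cpp
float gl_LeftCenterCm = 271.3f;  // assumed name for the left center distance
int updateCalls = 0;             // illustration-only counter

// Stub: on the robot this would be the Arduino delay(); here it does nothing.
void delay(unsigned long /*ms*/) {}

// Stub sensor update: the first read after the maneuver is stale, later
// reads are fresh. This mimics the behavior seen in the telemetry.
void UpdateAllDistances()
{
    ++updateCalls;
    gl_LeftCenterCm = (updateCalls < 2) ? 271.3f : 27.4f;
}

// Hypothetical wrapper: re-read the sensors several times with a pause
// between reads, and return the last left-side distance.
float ReadLeftDistanceWithRetries(int iterations, unsigned long delayMs)
{
    for (int i = 0; i < iterations; ++i)
    {
        UpdateAllDistances();
        delay(delayMs);
    }
    return gl_LeftCenterCm;
}
```

A single read returns the stale ‘open corner’ value, while three reads 200mSec apart return the updated one.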

As can be seen from the above results, the reported distances change dramatically from the first to the second iteration. Don’t quite know why at the moment, but it is definitely something I’ll have to figure out.

07 August 2023 Update:

I modified the code to show the elapsed time in milliseconds rather than decimal seconds to highlight the time differences between distance sensor reads. Here’s the same run with the new times shown:

As can be seen from the above telemetry, the first timed distance printout occurs at 4961mSec, and shows the ‘open corner’ situation (both left and right distances greater than the max tracking distance of 100cm). This first readout occurs after the EXCESS_STEERVAL detection, stop, and subsequent 500mSec ‘skosh’. The second printout is 1mSec later from inside the ‘else // open corner?’ branch of the ‘if’ statement and shows the same distances. Then the robot does the 90º CCW turn and 1sec translation.

Then the next distance readouts both occur at 8046 (about 3 sec later) and are from inside the ‘for’ loop, after a call to ‘UpdateAllDistances()’. This set of 2 readouts still show the ‘open corner’ condition with left/right of 271.3/155.4 cm, and left front, center and rear measurements of 28.4, 271.3 and 325.0 cm respectively. I’m not at all sure why, after the approximately 3sec required for the 90º CCW turn and 1sec translation, that the subsequent UpdateAllDistances() call didn’t return updated measurements.

However, 200mSec later at 8258mSec, the same set of printout does show updated measurements – left/right = 27.4/148.4, left front, center, and rear = 25.1, 27.4, 29.0 cm

It’s a mystery!

08 August 2023 Update:

In order to troubleshoot the problem described above where distances didn’t seem to update in a timely manner, I modified the ‘distances only’ feature of the program to print out left/right distances while moving in a straight line between two walls in a ‘V’ shape. The idea is to see whether or not the distances are updating each time the program cycles through the (currently 50mSec) update loop. Any hiccups should show up as ‘flat spots’ in the plot of distances vs time. The run is shown in the short video below, along with an Excel chart of the results:

Then I used Excel’s conditional formatting feature to highlight any cells containing duplicate distance values – the ‘flat spots’ I referred to earlier:

As can be seen from the above, there were two sets of duplicates in the left distance column, and three sets in the right distance column. These duplicates could be an artifact of robot speed – i.e. the robot may not be moving fast enough to actually ‘see’ a different distance in the 50mSec between measurements.
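The same ‘flat spot’ check could also be done in code rather than Excel. The sketch below is a hypothetical helper, not something in the WallE3 code, and it is stricter than Excel’s duplicate highlighting: it only flags a sample that exactly repeats the sample immediately before it.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of samples that exactly repeat the previous sample.
// Exact comparison is intentional: a sensor that has not updated reports
// the identical value again.
std::vector<std::size_t> FindFlatSpots(const std::vector<float>& samplesCm)
{
    std::vector<std::size_t> dupes;
    for (std::size_t i = 1; i < samplesCm.size(); ++i)
    {
        if (samplesCm[i] == samplesCm[i - 1])
        {
            dupes.push_back(i);
        }
    }
    return dupes;
}
```

Run over a column of logged distances, a non-empty result marks the update cycles where the reported distance did not change.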

I ran the experiment again after bumping the robot’s base speed from MOTOR_SPEED_QTR to MOTOR_SPD_HALF. This time, I got the following:

Hmm, I’m a bit worried about the four duplicate right-side distances at the start, but I think they are a ‘startup artifact’ (at least I hope so). The only other set of dupes is rows 16/17 in the left distance column. All in all, I think everything is working OK.

The difference between the code that produced the above results and the actual code that produced the problem is that the actual code performs a ‘Spin Turn’ of 90º followed by a motor run of 1sec between the first and second distance measurements. These two maneuvers take about 4 Sec, so at a 50mSec update interval the distance sensors should have updated at least 80 times overall, and at least 20 times during the 1Sec straight motor run after the 90º turn.

The only other possible difference is that the actual code calls ‘UpdateAllDistances()’, while the ‘V Run’ test code uses ‘UpdateAllEnvironmentParameters()’ (which calls ‘UpdateAllDistances()’ as part of its update process).

09 August 2023 Update:

I changed the line

To

And re-ran the ‘V’ distance test. The results were basically identical as shown below

So, I think it is safe to say that there is no difference in behavior from the use of UpdateAllEnvironmentParameters() and UpdateAllDistances(), which should be a no-brainer as UpdateAllEnvironmentParameters() uses UpdateAllDistances() to update the distances.

Now I’m left with the situation described by Sherlock Holmes – “When you have eliminated all which is impossible then whatever remains, however improbable, must be the truth.”

So, maybe those ‘incidental’ duplicate distance values highlighted in the above runs are actually real? In the first run there were two sets of two dupes on the left side right at the start, and the other two runs there was one set of four dupes (one on the left, one on the right) at the start. If they are real, then that means it took about 100mSec to clear the dupe on the first run, and about 200mSec on the 2nd and 3rd runs. This could be consistent with the behavior shown in the actual ‘open corner’ experiment, where the first measurement after the 1Sec movement was the same as the last measurement before the movement started.

Now I’m beginning to suspect that there is some sort of buffering going on in the VL53L1X distance sensors, maybe due to the recent change to a 50mSec update rate. If the returned measurement was always one (or two?) measurement(s) behind, then that would explain why the measurement(s) received after the 90º turn and subsequent 1Sec motor run was the same as the one(s) before.

I changed the update interval from 50mSec to 75mSec to see if that eliminated the dupes in the ‘V’ run, and got the following:

Inconclusive; there wasn’t a block of four dupes at the start, but this could be just due to physics, as the robot would have moved 50% farther in 75 vs 50mSec during each measurement interval.

I changed my wall configuration back to the ‘open corner’ setup, and made another run using the code that showed the problem before (and with the 3-iteration loop that ‘solved’ it), just to make sure I still had a good baseline.

The baseline run showed the same behavior as before (whew!) so now that I have a solid baseline I can begin to troubleshoot. Again, the code that attempts to solve the problem by looping through ‘UpdateAllDistances()’ is shown below:

The first thing I tried was reducing the delay from 200 to 50mSec. This gave me the following output:

Note that the ‘At top of loop()’ readout of left/right = 197.7/160.9 at 7413mSec and the ‘After RunBothMotors()’ readout of left/right = 197.7/160.9 at 10487mSec were taken before and after, respectively, the SpinTurn() call (approx 2Sec) and the RunBothMotors() call (approx 1Sec).

Hmm, this time the distance report after the very first UpdateAllDistances() call shows reasonable (not great, but reasonable) numbers for left-side front/center/rear distances – 30.6/38.2/49.5. 50mSec later, the report after the 2nd call to UpdateAllDistances() shows 30.3/29.4/35.8. Note that the center and rear distance readings changed quite a bit (about 9 and 14cm respectively) even though the robot wasn’t moving. Clearly something is ‘catching up’ here.

Next I reduced the loop delay() call from 50 to 1mSec and re-ran the experiment, as shown below:

Well, that’s odd! The left-side rear distance is shown as 338.4cm for all three loop iterations! That can’t be right, as the starting front/rear/steerval averages for the orientation turn were 30.04/32.11/-0.21, so now I have no idea what’s going on. I made another run and this time printed out the 10-point average of front/rear distance values performed by RotateToParallelOrientation(). This is what I got:

The run itself was a ‘failure’ in that when the robot performed the ‘parallel orientation maneuver’, it made a huge turn in the wrong direction. Looking at the data, it is clear that the culprit here is the first term (215.40cm) in the ‘rear’ 10-point average above. This value comes from the ‘rear’ distance value reported all three times in the 3-iteration loop, utilizing a 1mSec loop delay.

I think what I am seeing here is due to the minimum cycle time of the VL53L1X distance sensors. From the setup code for the VL53L1X sensors I see:

Which I believe means that the minimum time required for a new value to appear in the VL53L1X buffer is 50mSec. For larger distances, the time required might be longer. So that explains why, if the distance reading at the start of my little 3-iteration loop is ‘X’ and the loop time is 1mSec, it is highly likely that all three iterations will report ‘X’. This still doesn’t tell me why the initially reported value is ‘X’ when the robot has moved so that it is at a completely different physical distance from the wall when ‘UpdateAllDistances()’ is called.
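To make the ‘all three iterations report X’ reasoning concrete, here is a toy C++ model. It is an assumption-laden sketch, not the VL53L1X driver: `SimulatedRangeSensor` is a hypothetical class in which a new result only becomes available once per measurement period, and any read in between returns the last completed result.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical model of a continuously ranging sensor. A new result becomes
// available only at multiples of periodMs; reads in between return the result
// captured at the most recent period boundary.
class SimulatedRangeSensor {
public:
    SimulatedRangeSensor(unsigned long periodMs,
                         std::function<double(unsigned long)> trueDistanceAt)
        : periodMs_(periodMs), trueDistanceAt_(std::move(trueDistanceAt)) {
        assert(periodMs_ > 0);
    }

    // Value that a read at time nowMs would return.
    double read(unsigned long nowMs) const {
        const unsigned long lastCompleted = (nowMs / periodMs_) * periodMs_;
        return trueDistanceAt_(lastCompleted);
    }

private:
    unsigned long periodMs_;
    std::function<double(unsigned long)> trueDistanceAt_;
};
```

For example, with a 50mSec period and a wall that comes into view at t = 1010mSec, reads at 1011, 1012 and 1013mSec all return the old ‘no wall’ value, while a read at 1061mSec returns the new one. Note that this only models the repeated value inside a fast loop; it says nothing about why the first value after a multi-second maneuver would be stale.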

Next, I tried breaking the 1Sec ‘move’ into three separate moves, with a call to ‘UpdateAllDistances()’ at the breaks. This shouldn’t make any physical difference, as the moves will all run together. The difference is that the distances reported at the end of the travel should be much more accurate.

Well, that trick didn’t work at all! Even with the motor run broken up into four 250mSec pieces, the errant measurement still showed up as the first element of the 10-element averaging array – bummer!

I made another run, but this time I moved the 200mSec delay in the 10-element averaging loop in front of the measurement, so there would be an initial 200mSec delay before the first element is written.

Aha! This run worked fine, as shown in the telemetry below:

Note that this run was also performed with the 3-element ‘catchup’ loop commented out. Also, it is clear from the above telemetry that the ‘chunked’ motor run wasn’t effective at all, as even the last reported left-side rear distance still shows over 300cm.

It is interesting that the progression from ‘no wall in sight’ to ‘a wall in sight’ can be seen as the robot progresses; the left-side front distance starts at 59.2cm and decreases to the correct value of 27.4cm, while the left-side center distance starts at 286cm, stays the same for the next two measurements, and then decreases dramatically from 296.6 to 46.4cm (still wrong, but better), and the left-side rear distance stays above 300 for all measurements. It is clear from this data that what is being reported is somewhat behind the actual robot position. This may be in part a result of the VL53L1X measurement physics – as the sensor uses a ‘cone’ of light to illuminate the environment and then (I think) computes the histogram of the results. If more of the rear sensor is still ‘seeing’ past the open corner, it will report a ‘no wall’ result.

So, I think we are looking at a two-part problem. The first part is due to sensor physics; the sensor has to ‘see’ the wall throughout its FoV (Field of View) cone to produce an accurate measurement. The datasheet shows the default FoV to be 27º, so this is a reasonable conjecture IMHO. The second part is an apparent time lag from the time the FoV changes (in this case – from ‘no wall’ to ‘wall’) to the time the new distance value shows up at the output in response to a measurement request in the code. As I found out from the above experiments, this second issue seems to be completely solved by moving the 200mSec loop delay in the part of the RotateToParallelOrientation() routine that computes a 10-element average to the front of the averaging loop, so it takes place before the first measurement request. I now believe this 200mSec delay gives the sensors time to ‘catch up’ to the actual environment.
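On the ‘sensor physics’ half of the problem, a quick geometry check shows how wide the illuminated patch is. This is a rough sketch that treats the FoV as an ideal cone aimed squarely at a flat wall; `fovFootprintWidth` is a hypothetical helper, and only the 27º figure comes from the discussion above.

```cpp
#include <cassert>
#include <cmath>

// Width (in the same units as 'distance') of the patch illuminated on a flat,
// perpendicular wall by an ideal cone with the given full FoV angle.
double fovFootprintWidth(double distance, double fovDegrees) {
    assert(distance >= 0.0 && fovDegrees > 0.0 && fovDegrees < 180.0);
    const double kPi = 3.14159265358979323846;
    const double halfAngleRad = (fovDegrees / 2.0) * kPi / 180.0;
    return 2.0 * distance * std::tan(halfAngleRad);
}
```

At a 30cm wall offset and a 27º FoV this works out to a patch roughly 14cm wide, so a sensor near an open corner can easily have part of its cone on the wall and part looking past it.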

I made a ‘confirmation run’ with the following setup:

  • WALL_TRACK_UPDATE_INTERVAL_MSEC = 50
  • ‘Chunked’ motor run removed – now one 1000mSec run
  • 3-element ‘measurement catchup’ loop replaced by single 200mSec delay followed by a single call to UpdateAllDistances();
  • 200mSec loop delay in RotateToParallelOrientation() 10-element averaging loop moved to top so that it executes before the first measurement request.
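The ‘delay first, then measure’ change in the last bullet can be sketched as follows. This is not the actual RotateToParallelOrientation() code; `averageWithLeadingDelay` is a hypothetical helper, and the sensor read and delay are passed in as callbacks so the logic can be exercised off the robot.

```cpp
#include <cassert>
#include <functional>

// Average 'count' readings, waiting 'delayMs' BEFORE each reading (including
// the first), so the sensor has time to produce a fresh value after a maneuver.
// On a robot, 'readDistance' and 'wait' would wrap the real sensor read and
// delay() calls; here they are injected so the loop can be tested on a PC.
double averageWithLeadingDelay(int count, unsigned long delayMs,
                               const std::function<double()>& readDistance,
                               const std::function<void(unsigned long)>& wait) {
    assert(count > 0);
    double sum = 0.0;
    for (int i = 0; i < count; ++i) {
        wait(delayMs);          // delay first...
        sum += readDistance();  // ...then take the measurement
    }
    return sum / count;
}
```

With the delay at the bottom of the loop instead, the first sample is taken immediately, so a single stale value (like the 215.40cm term seen earlier) can skew a 10-point average by tens of centimeters.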

Yay! It all worked! Here’s a short video and the relevant telemetry from the run:

Everything looks good with the above results – I think I can now put the ‘open corner’ issue to bed. As a side-benefit, I think I have also improved the function of ‘RotateToParallelOrientation()’, so other calling functions will benefit as well. A quick search of the code shows 7 or 8 places where the function is called.

Help! I’m Spinning out of Control!!

I have been running ‘field’ (i.e. in my house) tests with WallE3, my autonomous wall-following robot. Unfortunately, WallE3 has demonstrated a tendency to lose its mind and start spinning out of control – “around and around the robot goes, and when he stops, nobody knows!”. After a number of trials where this happened, I realized I’m going to have to figure out how to detect this condition so I can get WallE3 to recover properly.

Fortunately, WallE3 knows its relative heading, thanks to its onboard GY-521 MPU-6050 3 Axis Accelerometer/Gyroscope. So, I thought I should be able to detect the ‘spinning’ condition by monitoring the relative heading numbers; if the relative heading values traversed a full 360⁰ within a reasonably short period of time like 3-5 sec, then the robot should be stopped and a recovery algorithm of some sort implemented.

As usual, a seemingly simple algorithm turns out to not be quite as simple as it seems at first glance. The first thing I tried to do was to use my new robot telemetry visualization tool to go back through my recorded telemetry files to find some runs where spinning occurred. Unfortunately, I couldn’t find any – bummer! Not to worry, I decided I could use Excel to ‘invent’ a spinning event by generating a series of monotonically increasing heading values. Then I used Excel and VBA to work out an algorithm for ‘help, I’m spinning’ detection. Shown below is a screenshot of the Excel spreadsheet, and a screenshot of the VBA code that does the detection.

Now to see if this idea actually works ‘in the wild’ (or at least ‘in the robot’)

13 June 2023 Update:

I wanted to capture data from a real ‘spinning’ event to further test the above algorithm, so naturally WallE3 has refused to cooperate, even after several trial runs. So, being the sneaky person I am, I decided to add ‘#define HEADINGS_ONLY’ and associated code section to WallE3’s code base, so I can capture heading data while manually spinning the robot. This worked well, and because it is in a #define block, it gets compiled out for normal operations. After getting that working, I captured a bunch of heading data and dropped it into my Excel setup to see how my VBA code worked with ‘real’ data. As it turned out, this exposed a bug in the algorithm – I had forgotten to handle the case where the cumulative heading is negative, but with a magnitude greater than 360. The fix was to compare the absolute value to 360, as shown in the revised code below:
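For anyone who wants to try the idea outside Excel, here is a minimal C++ sketch of windowed spin detection. It is not the VBA code referred to above: the class name `SpinDetector`, the sliding time window, and the heading-wrap handling are all assumptions made for illustration. It does include the fix described above, comparing the absolute value of the cumulative heading change to 360 so that spins in either direction are caught.

```cpp
#include <cassert>
#include <cmath>
#include <deque>

// Wrap an angle in degrees into the range (-180, 180].
double wrapDeg(double angle) {
    while (angle > 180.0) angle -= 360.0;
    while (angle <= -180.0) angle += 360.0;
    return angle;
}

// Accumulates heading changes over a sliding time window and reports a spin
// when the net rotation inside the window reaches a full turn either way.
// Assumes the heading changes by less than 180 degrees between samples.
class SpinDetector {
public:
    explicit SpinDetector(double windowSec) : windowSec_(windowSec) {
        assert(windowSec > 0.0);
    }

    // Feed one heading sample; returns true if a spin is detected.
    bool update(double timeSec, double headingDeg) {
        if (hasLast_) {
            const double delta = wrapDeg(headingDeg - lastHeadingDeg_);
            steps_.push_back({timeSec, delta});
            cumulativeDeg_ += delta;
        }
        lastHeadingDeg_ = headingDeg;
        hasLast_ = true;

        // Discard heading changes that are older than the window.
        while (!steps_.empty() && timeSec - steps_.front().timeSec > windowSec_) {
            cumulativeDeg_ -= steps_.front().deltaDeg;
            steps_.pop_front();
        }
        // Absolute value, so a negative cumulative heading is handled too.
        return std::fabs(cumulativeDeg_) >= 360.0;
    }

private:
    struct Step { double timeSec; double deltaDeg; };
    double windowSec_;
    std::deque<Step> steps_;
    double cumulativeDeg_ = 0.0;
    double lastHeadingDeg_ = 0.0;
    bool hasLast_ = false;
};
```

A 5 sec window matches the ‘3-5 sec’ figure mentioned earlier; a slow turn that takes longer than the window to complete 360º never accumulates enough rotation to trip the detector.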

Stay tuned,

Frank

VL53L1X Distance Measurement Compensation

Posted 28 April 2023

After replacing all the VL53L0X Time-of-Flight distance sensors on WallE3 with VL53L1X versions with twice the effective range (400cm vs 200cm), I followed the procedure described in this post to derive distance correction expressions for each sensor.

First I set up a distance test range as shown in the following photo:

Distance Calibration ‘Range’

I captured the distance readings from all seven sensors at 10cm intervals from 10 to 60cm, using ‘WallE3_Complete_V2.ino’ in ‘Distances Only’ mode. For each reading I recorded a 20-point average as shown in the following table and plot:

Distance calibration Excel table and plot

This data looks exceptionally good except for the recorded values for the rear sensor. It looks like I may have made a mistake on the 40cm measurement. To investigate this, I re-ran just the rear sensor measurements, resulting in the following table and plot:

Revised plot showing corrected rear distance readings

Looking at the data I suspect I simply forgot to move the robot to the new distance for the 40cm ‘rear’ measurement. I dropped the faulty dataset and then used Excel’s ‘trendline’ feature to show a linear expression fit to the data for each sensor, as shown below:

Distance plots with linear trendline expressions

It turned out to be pretty difficult to select individual sensor plot lines to add trendlines, because (with the exception of the left front sensor) they were all almost identical. In order to add trendlines, I had to use Excel’s ‘Select Data’ feature to show only one line at a time. Then when I added the associated trend line, I was able to edit the displayed expression to tag it to the associated sensor, as shown above.
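Excel’s linear trendline is an ordinary least-squares fit, so the same slope and intercept can be computed directly from the calibration table. Here is a minimal C++ sketch; `LinearFit` and `fitLine` are hypothetical names, with x as the actual distance and y as the sensor’s averaged reading.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of fitting y = slope * x + intercept.
struct LinearFit {
    double slope;
    double intercept;
};

// Ordinary least-squares straight-line fit.
// x: actual distances, y: corresponding measured distances.
LinearFit fitLine(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sumX = 0.0, sumY = 0.0, sumXX = 0.0, sumXY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sumX += x[i];
        sumY += y[i];
        sumXX += x[i] * x[i];
        sumXY += x[i] * y[i];
    }
    const double denom = n * sumXX - sumX * sumX;
    assert(denom != 0.0);  // needs at least two distinct x values
    const double slope = (n * sumXY - sumX * sumY) / denom;
    const double intercept = (sumY - slope * sumX) / n;
    return {slope, intercept};
}
```

Running this once per sensor column avoids the ‘show only one line at a time’ dance in the chart, at the cost of not seeing the fit drawn over the data.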

As I did in my previous work with the VL53L0X sensors, I used the trendline expressions to come up with compensation expressions for each sensor, as follows:

  • LF: LFCorr = (LF-3.44)/0.8746
  • LC: LCCorr = (LC-0.7533)/0.9751
  • LR: LRCorr = (LR+0.34)/0.9961
  • RF: RFCorr = (RF+0.4133)/1.0001
  • RC: RCCorr = (RC+0.1867)/0.9877
  • RR: RRCorr = (RR+0.8)/0.9971
  • Rear: RearCorr = (Rear-0.7267)/0.9969
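Each expression above is just the trendline (measured = slope x actual + intercept) solved for the actual distance. A minimal C++ sketch of that inversion follows; the coefficients are copied from the list above, but `SensorFit` and `correctedDistanceCm` are hypothetical names and not the actual ‘lidar_XX_Correction()’ functions.

```cpp
#include <cassert>

// Trendline of the form: measured = slope * actual + intercept (cm).
struct SensorFit {
    double slope;
    double intercept;
};

// Invert the trendline to estimate the actual distance from a raw reading.
double correctedDistanceCm(double rawCm, const SensorFit& fit) {
    assert(fit.slope != 0.0);
    return (rawCm - fit.intercept) / fit.slope;
}

// Coefficients taken from the compensation expressions listed above.
// A '+' in an expression shows up here as a negative intercept.
const SensorFit kLeftFront   {0.8746,  3.44};
const SensorFit kLeftCenter  {0.9751,  0.7533};
const SensorFit kLeftRear    {0.9961, -0.34};
const SensorFit kRightFront  {1.0001, -0.4133};
const SensorFit kRightCenter {0.9877, -0.1867};
const SensorFit kRightRear   {0.9971, -0.8};
const SensorFit kRear        {0.9969,  0.7267};
```

Keeping the slope/intercept pairs in one table makes it easy to re-run the calibration later and update only the numbers.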

When I plotted the corrected values, I got the following plot:

Compensated VL53L1X sensor data

As can be seen in the above plot, the compensation is almost perfect (for that matter, most of the sensors, with the exception of the LF (Left-Front) sensor, were pretty much spot on without compensation).

With the above information, I modified all seven ‘lidar_XX_Correction()’ functions in ‘Teensy_7VL53L1X_I2C_Slave_V1.ino’ to use the new correction factors, and then made another distance measurement run, with the results shown below:

After adding compensation to WallE3 programs

At this point I believe it is time to declare victory and move on with WallE3 ‘field testing’.

Stay tuned,

Frank

Replacing VL53L0X Time-of-Flight Distance Sensors on WallE3 with VL53L1X

Posted 21 April 2023

This material was an addition to my earlier ‘More ‘WallE3_Complete_V2’ Testing‘ post, but I decided it deserved its own post.

As one result of my recent ‘field’ tests with ‘WallE3_Complete_V2’, I discovered that the maximum distance capability (approximately 120cm) of my VL53L0X sensors was marginal for some of the tracking cases. In particular, when the robot passes an open doorway on the tracking side, it attempts to switch to the ‘other’ wall, if it can find one. However, if the robot is tracking the near wall at a 40cm offset, and the other wall is more than 120cm further away (160cm total), then the robot may or may not ‘see’ the other wall during an ‘open doorway’ event. I could probably address this issue by setting the tracking offset to 50cm vs 40cm, but even that might still be marginal. This problem is exacerbated by any robot orientation changes while tracking, as even a few degrees of ‘off-perpendicular’ orientation could cause the distance to the other wall to fall outside the sensor range – bummer.
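The ‘off-perpendicular’ effect is simple trigonometry: if the robot is rotated by some angle away from perpendicular, the sensor’s line of sight to a flat wall gets longer by a factor of 1/cos(angle). The sketch below is a rough illustration that assumes a flat wall and a single measurement ray (no FoV cone); both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Line-of-sight distance to a flat wall when the sensor axis is rotated
// offAngleDeg away from perpendicular to that wall.
double slantRangeCm(double perpendicularCm, double offAngleDeg) {
    assert(std::fabs(offAngleDeg) < 90.0);
    const double kPi = 3.14159265358979323846;
    return perpendicularCm / std::cos(offAngleDeg * kPi / 180.0);
}

// True if the wall is still within the sensor's maximum range.
bool wallInRange(double perpendicularCm, double offAngleDeg, double maxRangeCm) {
    return slantRangeCm(perpendicularCm, offAngleDeg) <= maxRangeCm;
}
```

For example, a wall 118cm away is inside a 120cm limit when viewed head-on, but at 15º off-perpendicular the line-of-sight distance grows to about 122cm and the wall drops out of range.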

I had the thought that ST Micro’s latest brainchild, the VL53L5CX sensor (investigated in this post) might be the answer, and would radically simplify the ‘parallel find’ problem as well. Unfortunately, the reality turned out to be somewhat less than spectacular. See the ’20 March 2023′ update to the above post for details.

Noodling around the STMicro site, I ran across the VL53L1X, which appears to offer about twice the maximum range of the VL53L0X; could this be the answer to my range issue? A quick check in my ‘sensors’ drawer didn’t turn up any, so I’m now on the prowl for a source for VL53L1X units. I found a couple of VL53L1X units on eBay that I can get in a couple of days – yay!

Then I searched for and found a source for VL53L1X units that have roughly the same form factor and pinout as my current VL53L0X units, which should allow me to use a modified form of the 3-element array (one for each side) PCB I created earlier (see this post). Then I opened up the PCB design in DipTrace, modified it as required to get the pinouts correct, and then sent the design off to JLCPCB for manufacture. With any luck, the PCB and the VL53L1X sensors should get here at about the same time!

After the usual number of errors, I was able to get a 3-element array of VL53L1X units working with a Teensy 3.5 on a plugboard, and then I moved the sensors to my newly-arrived V2 PCBs, as shown in the following photo:

3-element array of VL53L1X sensors on new PCB

As shown in the photo, the array was pointed diagonally up toward the ceiling about 2.4-2.5m away. While the test was running, I waved my hand rapidly back and forth through the ‘beam’, to see how quickly the sensors could react. As shown in the following Excel plot, the answer is “pretty darned quickly!”.

My wife and I spent last week in Gatlinburg, Tenn. For those of you who have never heard of Gatlinburg, it is a small town nestled in the Great Smoky Mountains National Park. For most people, it is known for its beautiful scenery, great shopping, and its colorful history. However, for us bridge buffs, it is famous as the host of a regional bridge tournament, one of the largest in the nation.

There is a fair amount of down time between games, so I brought my VL53L1X test setup along to play with. This morning I was able to test my two new 3-element VL53L1X arrays connected to a Teensy 3.5 on a plugboard as shown in the following photo:

two 3-element VL53L1X arrays, 5 of which worked fine

I used my ‘VL53L1X_Pololu_V1.ino’ (shown in its entirety below) to test the arrays.

This program instantiates an array of VL53L1X objects named ‘sensors’, and the user initializes this array with the pin numbers attached to the XSHUT input of each device. In its original ‘out of the box’ configuration, the program expects three sensors to be attached to the default I2C port (Wire0), with XSHUT lines connected to controller pins 4, 5, & 6. As I described earlier in my 15 April update, I first modified the program to use Teensy 3.5 pins 32, 31, & 30 and verified that all three sensors were recognized and produced good data. Then I added a second 3-element sensor array on the same I2C bus, with XSHUT pins tied to Teensy 3.5 pins 4, 5, & 6. Unfortunately, the program refused to recognize any of the sensors on the second array. So, I used the ‘sensorCount’ value and the contents of the xshutPins[sensorCount] array to selectively disable individual sensors on the second array, and I was able to determine that one of the VL53L1X sensors on the second array wasn’t responding for some reason, but the other two worked fine. So, now I know that the Wire1 I2C bus on the Teensy 3.5 can handle at least 5 sensors, with no external pullup resistors – yay!

The next step was to move one of the arrays to Wire2 to more closely emulate the current situation on the robot. Here is the completed code, with only two of the three elements in the second array being utilized:

Here is a short section of the output from the above program:

From the above telemetry I have picked out the following lines:

Note that in the above there are two sensors set for 0X2A and two for 0X2B. This works, because the first three sensors (0, 1, & 2) are on the Wire1 I2C bus, and the remaining two (also sensors 0 & 1) are on the Wire2 bus. This setup essentially duplicates the dual 3-element sensor arrays on WallE3.
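The reason duplicate addresses are harmless here is that an I2C address only has to be unique on its own bus. The following PC-side C++ sketch models that rule; it does not talk to any hardware, and the ‘base address plus index on the bus’ scheme is an assumption chosen to reproduce the 0x2A/0x2B pattern in the telemetry. `SensorSlot`, `planAddresses` and `addressesUniquePerBus` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// One sensor's place in the address plan.
struct SensorSlot {
    int bus;               // e.g. 1 for Wire1, 2 for Wire2
    int indexOnBus;        // 0-based position of the sensor on that bus
    std::uint8_t address;  // I2C address assigned to the sensor
};

// Build an address plan from a list of {bus number, sensor count} pairs.
// Every bus starts again at baseAddress (assumed scheme: base + index).
std::vector<SensorSlot> planAddresses(
        const std::vector<std::pair<int, int>>& buses,
        std::uint8_t baseAddress = 0x2A) {
    std::vector<SensorSlot> slots;
    for (const auto& bus : buses) {
        for (int i = 0; i < bus.second; ++i) {
            slots.push_back({bus.first, i,
                             static_cast<std::uint8_t>(baseAddress + i)});
        }
    }
    return slots;
}

// True if no two sensors on the SAME bus share an address.
bool addressesUniquePerBus(const std::vector<SensorSlot>& slots) {
    std::set<std::pair<int, std::uint8_t>> seen;
    for (const auto& s : slots) {
        if (!seen.insert(std::make_pair(s.bus, s.address)).second) {
            return false;
        }
    }
    return true;
}
```

Calling `planAddresses({{1, 3}, {2, 2}})` gives 0x2A, 0x2B, 0x2C on bus 1 and 0x2A, 0x2B on bus 2, which passes the per-bus uniqueness check even though two addresses appear twice overall.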

After getting the above program working with Wire1 & Wire2, I used it to modify my previous Teensy_VL53L0X_I2C_Slave_V4.ino program to use the new VL53L1X sensors. The new program, ‘Teensy_VL53L1X_I2C_Slave_V4.ino’, is shown in its entirety below:

22 April 2023 Update:

We got back home from Gatlinburg last night, and so this morning I decided to verify my theory that one of my six VL53L1X distance sensors was indeed defective. I have a pretty healthy skepticism about blaming hardware failures in a hardware/software system; in fact my motto is “Hardware never fails” (it does occasionally fail, but much much less often than a software issue causing the hardware to LOOK like it fails).

So, I loaded my one-sensor VL53L1X_Demo.ino example onto the Teensy 3.5 and used Wire0 (pins 18/19) to drive just this one sensor. Naturally it worked fine, as shown below:

Well, as I suspected, the hardware seems OK so now I have to figure out why it didn’t respond properly in my six-element setup.

I replaced the ‘XSHUT’ wire from Teensy pin 5 to the sensor, but this did not solve the problem. I also tried driving the XSHUT line HIGH instead of letting it float, but no joy. Next I tried switching the suspect sensor with the one right next to it, to see if the problem follows the sensor. After several iterations, it now appears that the problem stays with whichever sensor has its XSHUT pin connected to Teensy 3.5 pin 5, or possibly with the 3-element array PCB itself.

I moved the XSHUT wire on T3.5 pin 5 to T3.5 pin 9, and re-ran the program. Same problem. Replaced the jumper wire from T3.5 pin 9 to the sensor; no change.

So now it is looking more likely that there is a problem on the PCB associated with the sensor socket closest to the T3.5-to-PCB cable. A glance at the back of the sensor socket revealed the problem – some idiot (whose name is being withheld to protect the author) had failed to solder three of the four socket pins to the PCB – oops!

How to waste a week of work – forget to solder three out of four socket pins to the PCB!

After fixing my solder (or lack thereof) screwup, everything started working – yay! Just as an aside, I claim credit for starting this troubleshooting effort with the statement “Hardware Never Fails”, which turned out to indeed be the case. Small comfort, but I’ll take it!

After confirming that both 3-element arrays were working properly, I added the rear sensor VL53L1X as a fourth sensor on Wire2, with XSHUT connected to pin 8. This mimics the hardware arrangement implemented on WallE3. Here’s a photo showing the plugboard setup:

In the above image, the ‘rear’ sensor is shown standing upright on the right side of the plugboard.

And some typical output:

At this point I have the above ‘VL53L1X_Pololu_V1.ino’ program doing exactly what I want – handling seven different VL53L1X sensors on two different I2C busses. Now I need to port the necessary changes into my ‘Teensy_7VL53L1X_I2C_Slave_V1.ino’ program, which itself is a clone of ‘Teensy_7VL53L0X_I2C_Slave_V4.ino’, the program currently running on WallE3.

After a few minor missteps, I believe I now have ‘Teensy_7VL53L1X_I2C_Slave_V1.ino’ working with all seven VL53L1X sensors. Here’s the complete program:

And a sample of the output:

At this point the only remaining step is to physically swap out the sensors currently on WallE3 with the new ones, and reprogram the 2nd deck Teensy 3.5 with the new ‘Teensy_7VL53L1X_I2C_Slave_V1.ino’. With any luck at all, WallE3 won’t even notice anything has changed, except he will now be getting valid side/rear distance values from much farther away. We’ll see!

26 April 2023 Update:

After carefully installing all seven sensors (two 3-element arrays plus a rear-facing one) on WallE3’s second deck, and running the same Pololu example program on the second-deck Teensy 3.5, I discovered that one of the sensors was initializing properly, but was reporting ‘0 TIMEOUT’ for the distance – major bummer! After removing the left-hand sensor array from WallE3 and re-attaching it to my free-standing test Teensy 3.5, I eventually found that the problem was a broken connection INSIDE one of the 4-pin female headers on the PCB – yikes!

Anyway, got that fixed, re-attached the left-hand array to WallE3, and ran the Pololu program to verify proper operation, and now all seven elements report believable distances, as shown below. In the output below, I used my hand to block the right, left, and rear sensors to verify proper performance.

Next, I loaded my new ‘Teensy_7VL53L1X_I2C_Slave_V1.ino’ program on WallE3’s second-deck Teensy 3.5 and verified that all seven sensors were operating properly, as shown in the output below:

The photo below shows both 3-element arrays mounted on WallE3. The rear sensor is hidden behind the red support tower:

The next step was to load ‘WallE3_Complete_V2.ino’ onto the Teensy 3.5 main controller and verify that it could indeed get distance information from the second-deck Teensy 3.5 VL53L1X array controller. Here’s some output from WallE3_Complete_V2.ino in ‘DISTANCES_ONLY’ mode. Again I used my hand to block the left, right, and rear sensors to verify proper operation.

The next step is to re-implement the distance compensation algorithms for each sensor. For the VL53L0X sensors, this was done using the procedure described in this post. The procedure involves taking sensor readings at several known distances, and using the data to develop a correction expression for each sensor.

To make this happen, I had to first disable the current compensation scheme for all the sensors. Then I ran ‘WallE3_Complete_V2.ino’ again in Distances Only mode to get the data needed to develop the compensation expressions.

Stay tuned,

Frank

C# Drawing Text In Window with Y-up Transform

Posted 02 April 2023,

In a previous post I described my effort to use ‘Processing’ to graphically depict the wall-following behavior of WallE3, my autonomous wall-following robot. This worked ‘ok’ (lower case ‘OK’), but with some significant issues that prompted me to try again using C#.Net. I have done quite a bit of work in C#, so I was pretty sure I could make something useful. However, I almost immediately ran into a problem that turned out to be non-trivial (at least to me) to solve.

The problem was that I wanted to use a traditional engineering/scientific coordinate system, with the origin at the lower left-hand corner of the viewing area, with x increasing to the right and y increasing upwards. Unfortunately, the default system in Windows has the origin at the top left-hand corner with x increasing to the right and y increasing downwards. Should be a piece of cake, right?

Well, it is, and it isn’t. Flipping the y-increase direction and moving the origin to bottom-left wasn’t that bad, but then I discovered that if you wish to draw some text (like ‘x’ and ‘y’ at the ends of coordinate axis marker lines), the ‘y’ shows up flipped vertically (the ‘x’ is also vertically flipped, but a vertically flipped ‘x’ is….. ‘x’ 😉).

So, I bumbled around in Google-land for a while and ran across a post where someone else (Andrew Norton, I think) was having (and had ‘solved’) the same issue. Here is his solution:

So I fired up my VS2022 Community edition IDE and played with this for a while, and it worked – sort of. However, it seemed the text sizing and placement was ‘off’, and I couldn’t figure out why. After lots of playing around, I finally worked out what was happening, and was able to boil it down to what I thought was the simplest possible example. I put all the code into the ‘Paint’ event handler for a Windows Form project, as shown below:

When run, this produces the following output:

Original and flipped/aligned “Sample Text” drawings

In the above figure, the vertically flipped “Sample Text” was drawn after applying the transforms that flipped the y direction and moved the origin to 100,100 with respect to the bottom left-hand corner. The second correctly placed and oriented rendition of “Sample Text” was obtained after implementing steps 4-6 in the above code.

This was pretty cool, but I also wanted to be able to pull in robot telemetry data in cm and display it in a way that makes sense. I found the ‘Graphics.PageUnit’ property, and I found a small example to show a rectangle drawn with the default ‘pixels’ setting and also with the ‘Point’ setting. I modified this to add the line ‘e.Graphics.PageUnit = GraphicsUnit.Millimeter;’ and got the following:

50w x 100h unit rectangle with top left corner at (20,20) units in pix, points, and mm

According to my trusty digital calipers, the orange ‘mm’ rectangle was very close to 50 x 100 mm (at least on my screen).

So, I *should* be able to combine these two effects and get what I’m after – a screen with the origin at the bottom, left-hand corner and calibrated in mm. My data is actually in cm, but the inherent 10:1 scale factor should work out pretty well, given that I’m working with distances from a few cm to as much as 10m.
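The arithmetic behind combining the two effects can be written out explicitly. The following C++ sketch shows only the coordinate math, not the GDI+ calls; `worldCmToPixel` and its parameters are hypothetical, and it assumes 1cm of telemetry is drawn as 1mm on screen (the 10:1 scale mentioned above) with 25.4mm per inch.

```cpp
#include <cassert>
#include <cmath>

struct PixelPoint {
    double x;
    double y;
};

// Map a telemetry point in cm (x-right, y-up, origin at the bottom-left of
// the client area) to pixel coordinates (x-right, y-down, origin at the
// top-left). screenMmPerWorldCm = 1.0 draws 1 cm of data as 1 mm on screen.
PixelPoint worldCmToPixel(double xCm, double yCm, double clientHeightPx,
                          double dpi, double screenMmPerWorldCm = 1.0) {
    assert(dpi > 0.0 && screenMmPerWorldCm > 0.0);
    const double pixelsPerMm = dpi / 25.4;
    const double pixelsPerCm = pixelsPerMm * screenMmPerWorldCm;
    return {xCm * pixelsPerCm,                    // x is only scaled
            clientHeightPx - yCm * pixelsPerCm};  // y is scaled and flipped
}
```

Transforming only the anchor point this way, and then drawing the text with the untransformed pixel coordinate system, is one way to avoid upside-down labels, since the glyphs themselves are never passed through the y-flip.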

03 April 2023 Update:

After a lot of fits and starts, I think I have finally arrived at a drawing algorithm that allows me to use an x-right, y-up coordinate system in which I can draw text correctly (i.e. it doesn’t display upside-down). I posted this in the Stack Overflow thread from a few years ago that gave me my first big clue about how to solve this problem, so hopefully it will help some other poor soul, and I’m also including it below. To use this example, create a Windows .NET form application and enable the ‘Paint’ and ‘ResizeEnd’ handlers (the ‘ResizeEnd’ handler isn’t strictly required, but it allows the user to re-run the example by just resizing the screen slightly). Then replace the contents of the Paint and ResizeEnd handlers with the example code, and also paste in the two helper functions in their entirety.

Here are a couple of screenshots of my form after running the example code. The first image shows the default Windows form size, with the top portion (and the ‘Y’ label) cut off. The second image shows the situation after resizing the form down a bit, allowing the ‘ResizeEnd’ handler to force the program to re-run and re-draw.

Default Form1 size doesn’t show top part of ‘Y’ coordinate line
Screenshot after dragging the bottom of the form down

Using ‘Processing’ to Display Robot Wall-following Trials

Posted 27 March 2023,

While waiting for my new VL53L1X time-of-flight distance sensors to arrive, I’m in a bit of a lull. So, I decided to make a run at using the ‘Processing’ language & IDE to try out some ideas on displaying the telemetry from recent autonomous wall following trials.

I currently use Excel to produce 2D plots of various telemetry parameters, and this does a pretty good job of giving me good insight into the behavior of WallE3, my autonomous wall-following robot. However in this last batch of trials after giving WallE3 the ability to ignore open doorways if another wall is available for tracking, it became more difficult to tell what was going on using just the 2D plots. I also video the runs, but I have discovered it is very difficult to synchronize the video with the 2D Excel plots; when I look at the video I can see specific events, but I have real trouble seeing those same events in the 2D plot, and vice versa. In one of my going-to-sleep brainstorming sessions I recalled some web research I had done regarding the ‘Processing’ language & IDE – and thought this might be an opportunity to try using Processing to display robot telemetry visually. Maybe ‘Processing’ would allow me to create the equivalent of a video directly from the telemetry, which hopefully would give me greater insight into the details of robot behavior, especially when dealing with ‘open doorway’ and ‘wrong wall’ events.

So, I downloaded and installed Processing. I really hate the name; it’s too vague and doesn’t work very well in the English language; it always sounds like the sentence is missing a noun or a verb. You wind up with sentences like “using Processing, I processed robot telemetry to create a visual representation of a wall-following run.” Oh, well – maybe all the other names were taken?

In any case, after fumbling around with some tutorials and trying (with mixed success) to get my head around Processing’s architecture, I was able to make a crude representation of some recent wall-following telemetry.

first try at displaying robot wall-following using ‘Processing’

For such a crude result, the above plot is actually pretty informative. I could easily see where the robot started off by capturing the desired offset, and then tracked the offset very accurately up to the point where the wall changed direction. The robot did OK, but the above display doesn’t accurately display the wall direction change.

Here’s the entire Processing ‘RobotTelemetry.pde’ sketch:

First impressions of using ‘Processing’:

Pros:

  • Easy to get started – fast download, quick install, lots of tutorials
  • Java programming syntax close enough to C++ to be usable
  • Very easy to get a working program to display in viewing window
  • Lots of examples
  • Lots of extensions

Cons:

  • Not easy to expand beyond simple fixed-window-size displays
  • Not obvious how to handle user interactions with screens
  • Not obvious how to handle non-fixed window sizes, viewports with different size than display window, scrollbars, or other user controls
  • Can’t change default IDE window size – very frustrating.
  • Programming IDE kind of clunky; no ‘Intellisense’ capability, have to keep jumping to Processing ‘Reference’ website for function syntax, etc.

After going through the ‘process’ of using ‘Processing’ (I hate that name!) to ‘process’ robot telemetry to produce a graphical view of the robot’s behavior, I realized that although it was relatively easy to get to ‘first display’, in my particular case I would probably have been better off building this in C#/Visual Studio due to the wealth of support for GUI systems.

09 April 2023 Update:

In a little more than a week I was able to put together a pretty decent Windows Forms App using my normal Visual Studio 2022 Community Edition IDE and C#/.Net. Here’s a screenshot displaying approximately the same section of wall as is displayed above using ‘Processing’ (I still hate that title!). For convenience, I also copied the image from above:

first try at displaying robot wall-following using ‘Processing’
C#/Windows Forms App showing same section of wall as above

The C#/.Net environment was so much easier to use – there is almost no comparison (although I may be a bit biased by the fact that I have been programming Windows graphical user interfaces for over 30 years).

Pros:

  • It is actually very easy to get started in the modern Windows Forms App genre. The Visual Studio Community Edition is free, and there are more tutorials than you can shake a stick at
  • Great community support. If you have a problem, the probability that someone else has already experienced the same problem (and solved it!) is essentially 1. For instance, on this project I wanted to use CTRL-SCROLLWHEEL input to zoom in or out of the view, but the ‘pictureBox’ control I was using as a drawing surface didn’t support this feature ‘out of the box’. After a few Google searches I easily found multiple posts about the issue, and several different solutions, and now I have the feature enabled in my app – neat!
  • Great language support from Microsoft. The Visual Studio IDE allows the programmer to press F1 on any language element, and this links to the relevant reference page – also neat.
  • You get the benefit of a huge number of graphical entities and a very successful IDE. Even a novice can generate a complete, working Windows app in just a few minutes. The app won’t do much, but it will have a visible window that can be resized, moved around, minimized, maximized, and everything else you would expect all Windows apps to do. From there that same novice can easily add graphical elements like scrollbars, labels, data entry boxes and more.

Cons:

  • The Visual Studio IDE can be a bit daunting at first, as it is meant to be everything to everyone. However, an installation with a basic set of features for Windows Forms apps can be easily downloaded and installed for free from Microsoft, and the installation process is straightforward
  • If you haven’t done any graphical programming at all, then there will be a learning curve to get used to concepts like the drawing surface and drawing operations (but this would be true of ‘Processing’ as well).
  • In order to share a Windows Forms app, either the recipient has to have the same programming environment so the app can be run in DEBUG mode, or the app must be compiled into an executable that can then be run on the recipient’s machine in a stand-alone fashion. With ‘Processing’, it is a bit easier (I think) to send someone a ‘sketch’, which they can then run in their own ‘Processing’ environment (there may also be a way to compile a ‘Processing’ sketch into an executable, but I didn’t investigate that).

In any case, as the man says, “Your mileage may vary”

Stay Tuned,

Frank

More ‘WallE3_Complete_V2’ Testing

Posted 13 March 2023

Now that I have the new Garmin LIDAR installed, it’s time to return to ‘real world’ testing. Here’s a short video of a run starting in our entry hallway and proceeding past two open doorways into our dining/living area:

As can be seen in the video, WallE3 handles the oblique turn to the left and the two open doorways perfectly, but then loses its way when (I think) the right-hand wall disappears entirely. Not sure what happened there. Here’s the telemetry from the entire run:

Here’s the Excel plot of the run up to the point where the first open doorway is detected. Note this also covers the oblique turn. From the video, the oblique turn occurs at about six seconds from the start, which should put it somewhere in the 20,000 mSec area. However, comparing the video and the plot, it looks like this turn actually occurs at about 21,500 – 22,000 mSec, where the distance to the right-hand wall falls from the max 200 to about 60-70 cm. The turn itself goes very smoothly, and then the first open doorway condition occurs about four seconds later, at 26,200 mSec.

Segment from start to the first ‘open doorway’ detection

This next plot shows the period from the time of the first ‘open doorway’ detection to the end of the run, including the time where the robot passes the second open doorway (apparently without detecting it).

Segment from the first ‘open doorway’ detection to the end of the run

The left-hand wall distance starts out at 200 (max distance) due to the first open doorway. During this period the robot is tracking the right-hand wall (kitchen counter). When the left-hand distance returns to normal at about 27,500 mSec, the robot continues to track the right-hand wall. Apparently the very short section of wall between the first and second doorways wasn’t enough to trigger the ‘open doorway’ condition. However, when the left-hand wall distance comes back down again after the second open doorway (at approximately 30,400 mSec), the robot should have reverted to left-wall tracking, but it obviously didn’t. Soon thereafter both the left and right-hand distances started increasing to max values and the poor little robot lost its way – so sad!

After looking through the data and the code, I began to see that although the code could detect the ‘open doorway’ condition, it wasn’t smart enough to detect the end of the ‘open doorway’, so the robot continued to track the right-hand wall – “to infinity and beyond!”. After making some changes to fix the problem, I made a run in my test range (aka my office) to test the changes. Here’s a photo of the setup:

‘Tracking Wrong Wall’ bugfix test setup

The test walls are set up so the ‘wrong side’ wall starts before the open doorway and ends after it, and the distance between the two walls was set such that the measured distance to either side would be less than MAX_TRACKING_DISTANCE_CM (100 cm).

Here’s the telemetry from the run, with an additional column added to show the current ANOMALY CODE:

The robot starts the run tracking the left wall at the default offset (40cm), with an anomaly code of ANOMALY_NONE. This continues until the 19 sec mark (19,039 mSec), where the robot detects the ‘open doorway’ condition. This causes the code to re-assess the tracking condition, and it decides to track the right wall, starting at about 19,181 mSec.

During this segment, the AnomalyCode value is OPEN_DOORWAY. This continues until about 20,890 mSec, where the ‘Tracking Wrong Wall’ condition is detected. Note that the actual/physical ‘open doorway’ condition ended at about 20,389 mSec when the left-side distance changed from 200 (max) to 67 cm, but it took another 0.5 sec (20,890 – 20,389 = 501 mSec) for the algorithm to catch the change.

The ‘Tracking Wrong Wall’ detection caused the robot to once again re-assess the tracking configuration, whereupon it changed back to left-wall tracking at 21,027 mSec with the new anomaly code of ‘ANOMALY_TRACKING_WRONG_WALL’. Left side wall tracking continues until the run is terminated at 23,235 mSec. Note that the right-hand wall stops at about 22,235 mSec and the right side distance measurement goes to 200 (max) cm and stays there for the rest of the run.
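
The sequence above (open doorway on the tracked side, a switch to the other wall, then a switch back once the original wall reappears) can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the actual WallE3 code. Only MAX_TRACKING_DISTANCE_CM (100 cm) and the anomaly names come from the posts; Side, TrackingStatus and CheckForAnomaly are hypothetical, and the sketch ignores the roughly 0.5 sec detection lag seen in the telemetry.

```cpp
#include <cassert>

// Value from the post; the enum members are stand-ins for the real codes.
const int MAX_TRACKING_DISTANCE_CM = 100;

enum AnomalyCode {
    ANOMALY_NONE,
    ANOMALY_OPEN_DOORWAY,
    ANOMALY_TRACKING_WRONG_WALL
};

// Hypothetical types for this illustration.
enum class Side { LEFT, RIGHT };

struct TrackingStatus {
    Side tracked;    // wall currently being tracked
    Side preferred;  // wall the run started on
};

// Return the anomaly (if any) newly detected from one pair of side
// distance measurements, both in cm.
AnomalyCode CheckForAnomaly(const TrackingStatus& status, int leftCm, int rightCm)
{
    assert(leftCm >= 0 && rightCm >= 0);
    const int trackedCm = (status.tracked == Side::LEFT) ? leftCm : rightCm;
    const int preferredCm = (status.preferred == Side::LEFT) ? leftCm : rightCm;

    // The tracked wall has disappeared: treat it as an open doorway.
    if (trackedCm >= MAX_TRACKING_DISTANCE_CM) {
        return ANOMALY_OPEN_DOORWAY;
    }
    // Tracking the other wall, but the preferred wall is back in range.
    if (status.tracked != status.preferred && preferredCm < MAX_TRACKING_DISTANCE_CM) {
        return ANOMALY_TRACKING_WRONG_WALL;
    }
    return ANOMALY_NONE;
}
```

With the left wall as the preferred side, a left reading of 200 cm while tracking left yields ANOMALY_OPEN_DOORWAY, and a left reading of 67 cm while tracking right yields ANOMALY_TRACKING_WRONG_WALL. Once the robot is back on the left wall, a right reading of 200 cm produces no anomaly, which matches the end of the run above.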

Looking at the above, I believe the fixes I implemented were effective in addressing the ‘wandering robot syndrome’ I observed on the previous run. Next I will remove the debugging printout code, clean things up a bit, and then repeat the last ‘real-world’ run from before.

10 May 2023 Update:

I was definitely having problems with the ‘Open Doorway’ condition, so I wound up back in my ‘indoor range’ (AKA my office) to see if I could work through the issues. It turned out I was not detecting the onset or end of the ‘Open Doorway’ condition properly. I made some changes to the code and to the telemetry output to more thoroughly describe the action, and then ran the test again. The short movie and the telemetry output show the results:

230510 Open Doorway Run

Salient points in the video and telemetry printout:

  • WallE3 captures and then tracks the desired 30cm offset with pretty decent accuracy up until 3.8sec where it encounters the ‘open doorway’ on the left. This results in an ‘OPEN_DOORWAY’ anomaly report, which in turn causes the TrackLeftWallOffset() function to exit, which in turn causes the program to start over at the top of loop().
  • The ‘top of loop’ code reevaluates the tracking condition, and because the left distance is well over 100cm and the right distance is about 50cm, it decides to track the right wall instead.
  • The right wall is tracked from 4.0 to 6.2 sec (where the right wall ends), at which point WallE3 again detects an ‘Open Doorway’ condition, which forces the loop() function to restart. This time the right distance is about 143cm and the left distance is about 42cm, so the code chooses the left wall for tracking.
  • Left wall tracking continues from 6.4 to 8.7sec – the end of the run.
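
The ‘top of loop’ decision described in the points above might look something like the sketch below. It is only an illustration: WallChoice and ChooseTrackingWall are hypothetical names, the 100 cm threshold is MAX_TRACKING_DISTANCE_CM from the earlier test, and picking the nearer wall when both are in range is my own assumption rather than the actual WallE3 rule.

```cpp
#include <cassert>

// Value from the post; everything else here is hypothetical.
const int MAX_TRACKING_DISTANCE_CM = 100;

enum class WallChoice { LEFT, RIGHT, NONE };

// Choose which wall to track from the two side distances (cm).
// A wall is considered trackable when it is closer than
// MAX_TRACKING_DISTANCE_CM.
WallChoice ChooseTrackingWall(int leftCm, int rightCm)
{
    assert(leftCm >= 0 && rightCm >= 0);
    const bool leftOk = leftCm < MAX_TRACKING_DISTANCE_CM;
    const bool rightOk = rightCm < MAX_TRACKING_DISTANCE_CM;

    if (leftOk && rightOk) {
        // Assumption: prefer the nearer wall when both are in range.
        return (leftCm <= rightCm) ? WallChoice::LEFT : WallChoice::RIGHT;
    }
    if (leftOk) {
        return WallChoice::LEFT;
    }
    if (rightOk) {
        return WallChoice::RIGHT;
    }
    return WallChoice::NONE;  // no wall in range on either side
}
```

Plugging in the numbers from this run, a left distance over 100cm with a right distance of about 50cm selects the right wall, and a right distance of 143cm with a left distance of 42cm selects the left wall.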

Stay tuned,

Frank