MPU6050 FIFO Buffer Management Study

Posted 13 October 2019

I have been attempting to use the Invensense MPU6050 6-axis IMU for some time now in both my two- and four-wheel robots for improved wall-following ability. By measuring the relative heading change during turns, I could get the robot to accurately acquire and maintain a specified distance from the currently tracked wall.  I say ‘attempting’ because I have had mixed success in getting reliable data from the IMU.  A large part of the problem, as I described in this post, wasn’t the IMU at all, but rather the sensitivity of the Arduino I2C bus to RFI/EMI caused by the motor drivers.  However, even after solving this problem, my programs would still occasionally ‘lose synch’ with the IMU’s FIFO buffer and start returning garbage for heading values – not good!

In addition to the I2C issue, there are several factors that make the MPU6050 harder to work with:

  • Invensense, the company that makes the MPU6050 chip, appears to have been purchased by TDK, and their technical support forums don’t appear to be supported any longer.
  • Apparently there are significant pieces of the MPU6050 firmware that aren’t freely available as human-readable code, so much of the MPU6050 magic is just that – magic.  In particular, the details of how the MPU6050 handles its internal FIFO are (at least to me) completely unknown, except by reverse-engineering.
  • While there is a lot of MPU6050-related information and code floating around out there in the i-verse, that is as much a trial as a blessing; it has been very difficult for me to wade through everything and to try to sort out the wheat from the chaff.
  • Jeff Rowberg’s wonderful I2CDevLib contains support for the MPU6050, with examples.  While it is fairly easy to get started using Jeff’s examples, it was difficult for me to understand how the examples work so I would know how to modify them for my application without running off into the bushes.
  • Almost all the example code out there assumes an interrupt-driven IMU management scheme.  For my wall-following robot application, the interrupt scheme was overkill and then some, so I wanted to use a polling arrangement, which is very poorly documented. Eventually I developed a working polling algorithm (described here), which I now use in my robots.
  • Invensense (now TDK) has released several updates to the MPU6050 firmware, and it is difficult (at least for me) to figure out what the differences are and whether or not those differences are worthwhile for my application.  There is some information in the header files provided by Jeff Rowberg, but if there is any sort of formal change history, I haven’t found it.

Despite all this, the MPU6050 is a wonderful device, and it’s EVERYWHERE – you can get MPU6050 modules from Adafruit, SparkFun, DFRobot, and GY-521 Chinese knockoffs from eBay.  The GY-521 modules have some reputed issues with quality control, but at about $1-2 per module, it’s hard to go wrong.

In my continued attempts to understand how the MPU6050 works, and the details of some of the latest example code provided on Jeff Rowberg’s Github site, I posted an issue related to a particular part of the example code that defied my ability to understand, as shown below:

This code obviously works, but I get a headache every time I try to figure out how it makes sense.  One of the replies I got from this post was from Homer Creutz, who I knew to be one of the very best experts on all things MPU6050.  The gist of Homer’s reply was “yeah, it looks kinda clumsy, but it does work”. But then Homer went on to suggest an alternative using the modulus (%) operator that piqued my interest, as I had used this technique in my four-wheel robot code.  In subsequent email conversations Homer went WAY out of his way to thoroughly answer my stupid questions about the MPU6050, especially about the issue with FIFO overflow.  He sent me a link to this video explaining how circular buffers work, and the following graphic illustration (slightly edited for clarity) of the problem of MPU6050 overflow and multiple-byte packets:

The combination of the video and the above simple graphic finally allowed me to understand why properly managing FIFO overflow is so critical for successful MPU6050 implementations. Ironically, FIFO overflow management is more of an issue in my low-speed, high mass, low-dynamics environment than at the other end of the scale.  In high-dynamics applications, FIFO overflow is almost never an issue because the application sucks data out of the FIFO as fast as it is put in, in order to provide the best possible stabilization and control.  However, in low-dynamics applications like my wall-following robots, there is no need for IMU information more than a few times per second, meaning that the FIFO will almost certainly overflow if it isn’t proactively drained even if the information isn’t really needed.

Homer also sent me a couple of untested alternatives for FIFO management to replace the ‘while() within an if()’ algorithm, so I decided to test them and report the results back to Homer. After all, Homer was going WAY out of his way to answer my ignorant questions, so the least I could do was to be his lab tech.  So, I started with Jeff Rowberg’s MPU6050_DMP6_using_DMP_V6.12 example (the one with the ‘while() within an if()’ snippet) and modified it to deliberately cause FIFO overflows. After a bit of debugging and some very slight changes to Homer’s code, I got the following output from a run using a 100 mSec delay at the start of loop():

As can be seen from the above, about 20 interrupt cycles are skipped in each loop() iteration, causing the 1024-byte FIFO contents to expand by either 280 or 252 bytes each time, until it overflowed and was reset.  The example code handled FIFO overflows properly, resetting the FIFO each time so that the retrieved yaw values continued to make sense.

The next step was to replace the example code with Homer’s proposed setup using the modulus operator for FIFO management. The first section below shows the example code loop() function before Homer’s modifications:

And the following shows the same loop() function after implementing Homer’s suggested code:

This resulted in the following output:

This shows that FIFO overflow was indeed handled properly.  The FIFO overflowed after the 3rd time through the loop (the returned count was capped at the 1024-byte physical length of the FIFO), and Homer’s code correctly removed the 16 corrupted bytes at the beginning of the FIFO, plus one more 28-byte packet.  The subsequent mpu.getFIFOBytes() call retrieved an entire valid packet, which was then processed to produce a valid yaw value.  Of course, since only one extra packet was removed, the FIFO overflowed again when the next 336 bytes were loaded during the 100 mSec delay at the start of the next loop() iteration.
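The arithmetic behind that cleanup can be sketched in a few lines. This is only an illustration of the modulus idea described above, not Homer’s actual code; the helper name `bytesToDiscard` is a hypothetical one, and the behavior (drop the partial packet plus one whole packet) is taken from the description in the preceding paragraph.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of I2CDevLib): how many bytes to throw away
// so that the next read starts on a packet boundary.
// - If the count is already a whole number of packets, discard nothing.
// - Otherwise discard the partial packet plus one whole packet, which is
//   what the text describes (16 + 28 bytes for a full 1024-byte FIFO).
uint16_t bytesToDiscard(uint16_t fifoCount, uint16_t packetSize) {
    assert(packetSize > 0);
    const uint16_t partial = fifoCount % packetSize;
    return partial == 0 ? 0 : static_cast<uint16_t>(partial + packetSize);
}
```

For a full FIFO with 28-byte packets, 1024 % 28 = 16, so the sketch discards 16 + 28 = 44 bytes; a count that is already a multiple of 28 is left alone.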

When the code snippet to retrieve all available packets

was uncommented from the above program, I got an almost perfect output as shown below:

There were 4 bad values starting at 15250 mSec, as shown below

I’m not real sure what happened here.  A FIFO count of 308 is nowhere near the overflow condition, and it is an even multiple of 28 (the packet size), so everything should have gone swimmingly.  However the displayed value of 2.04 degrees at time 8859 mSec is clearly incorrect, as I was manually (and slowly) rotating the MPU at the time.

Another issue with all of this is the FIFO count associated with the number of interrupts shown in the output.  22 interrupts should produce 22*28 = 616 bytes, but mpu.getFIFOCount() only returns 308 – exactly half the expected value.  So, either the packet size is actually 14 bytes (not very likely, as mpu.GetPacketSize() returns 28) or the IMU is only loading half a packet on each interrupt.  I added digitalWrite() statements to the ISR so I could directly monitor interrupt occurrences with my O’scope, and the interrupts happened exactly as expected, at approximately 4.58 mSec intervals (about 220Hz).  The 100 mSec delay at the top of loop() should produce approximately 100/4.58 = ~22 interrupts each iteration, and that is what is reported in the output.  So, why the x2 error in the reported FIFO count?

I ran another test, and this one responded perfectly for as long as I let it run (about 22 seconds). During the run I manually rotated the robot (and its attached IMU) back and forth, as shown in the following Excel plot

14 October 2019 Update:

There is clearly something not-quite-right with the way the MPU6050 reports the current length of the FIFO contents.  Here are the results from a recent run:

Aside from the fact that 22 x 28 = 616 not the reported 308, there is also the problem that after one more interrupt (23 vs 22), the reported FIFO content length didn’t go up by 14 (half the expected 28, but…) but instead by 12 bytes – what the heck?  This clearly implies that MPU interrupts aren’t entirely synchronous with actually filling the FIFO – like some sort of race condition?  In other words, the number that is reported by mpu.getFIFOCount() isn’t always an integer multiple of the packet size!  This is contrary to Homer’s assumption that the only way for mpu.getFIFOCount() to retrieve a non-integral multiple was for the FIFO to overflow. This clearly is not happening here, but I’m still getting non-integer multiple results.  Here’s another snippet from the same run:

In the above snippet, it can be seen that a 22 interrupt interval can sometimes result in 316 bytes being reported rather than the expected 308 (ignoring for the moment the issue of a 2x error), and the ‘success’ of removing 36 bytes still resulted in a ‘bad’ yaw value computation (-179.17).  In the very next loop iteration, 22 interrupts resulted in 328 bytes being reported. This time removing the excess allowed a valid yaw computation (-30.46).  So, a 22 interrupt loop interval can result in 308 (the ‘normal’ result), 316, or 328.

15 October 2019 Update:

I changed the loop() delay time from 100 mSec to 200 mSec, and (with no other changes) re-ran the test, with the following output:

The above results showed the expected 42-43 interrupt count between loop() iterations, and the expected (ignoring for the moment the 2x error factor) FIFO contents byte count of 588-616.  However, there were a couple of anomalous occurrences on two consecutive loop() iterations at 19094 and 19312 mSec.  The first one reported a FIFO contents count of 600 instead of 616 and (even after error correction) a clearly erroneous yaw value of -179.20, and the second one reported 604 bytes and (after error correction) an apparently valid yaw value of 61.54.

After an email exchange with Homer, I tried replacing this line

with this one:

In the following section of Homer’s algorithm

After this change, I re-ran the test at the 100 mSec loop() iteration delay setting with and without the above code change.  In both cases, I still got errors trapped, as shown below (the test conditions for each run are noted at the top of the text file):

So then, also at Homer’s suggestion, I instrumented the code to get the FIFO count rapidly several times after an error detection to see if the first mpu.getFIFOCount() occurred while data was actually being loaded into the FIFO and therefore got an erroneous count. So, I changed Homer’s code correction section to the following:

and re-ran the 100 mSec loop() iteration delay test.  What I got was this:

Wow!  The FIFO count changed!  The first mpu.getFIFOCount() at the top of the detection section got 334, and the next 3 calls all got 336!  So the first mpu.getFIFOCount() call must have hit the MPU 26/28 of the way through the packet load!

So, the MPU6050 packet load scheme isn’t atomic and there is, in fact, some sort of a race condition.  I think we have all been assuming that the MPU6050 loads the FIFO with a complete packet and then triggers an interrupt, while it appears that it is actually happening the other way around; the interrupt is triggered and then the packet is loaded into the FIFO. Most of the time this doesn’t cause a problem, but if you ‘get lucky’, the register associated with the mpu.getFIFOCount() call is read before the entire packet is loaded.

16 October 2019 Update:

Homer asked me to change the code to determine exactly how long it takes to “clear the error”, which I take to mean “how long would a program have to wait for the MPU6050 to finish loading the rest of the packet into the FIFO”.  Homer sent me some sample code, which I modified slightly to produce the output Homer was looking for, as shown below:

When I ran this code, I got the following output:

21 October 2019 Update:

After several more email exchanges with Homer, he believes that he has now come up with a pretty ‘bullet-proof’ way of handling MPU6050 errors, with the following subroutine:

The idea behind this subroutine is to ensure that any overflow condition is detected and managed properly. The routine is completely independent of interrupts, so it can be used in a program using interrupts or polling.

Homer also sent me some test results using the program, with a variable loop delay time designed to be just below and then just above the delay required to overflow the buffer. This demonstrated that his subroutine properly handles FIFO overflow conditions, and returns valid packet data whenever possible.

In the above output, the first column is the loop delay in mSec, then the ‘Flag’ value returned by the subroutine, and then the ypr (yaw,pitch, roll) values extracted from the buffer filled by the subroutine.  As is shown, loop delay values above about 177 mSec start returning ‘2’ Flag values, indicating the routine detected (and recovered from) an overflow condition.

I replicated this experiment on my end, but discovered that for my installation, I had to use a loop delay almost exactly twice the value used by Homer (360-370 vs 177-178). The implication is that either my MPU6050 is loading the FIFO at one-half the rate of Homer’s unit, or my IMU has a buffer size twice the one being used by Homer.  Curiouser and curiouser ;-).

Here’s my code and results:

Summary of results to date:

  • Homer has clearly created an effective algorithm for detecting and recovering from FIFO overflow events, and the subroutine that implements his algorithm can be used in an interrupt-driven or polling configuration.  I personally like the polling arrangement because it requires one less connecting wire, and removes the need for an ISR.
  • Both Homer and I have demonstrated that the algorithm works as designed, but the loop delay required to just trigger FIFO overflows in my configuration is almost exactly twice the delay needed for Homer’s. This needs to be explained.
  • There is still the problem of a factor of 2 error between the expected return from mpu.getFIFOCount() and the number calculated by multiplying the number of interrupts times the expected packet length. In my configuration using an interrupt-driven arrangement, a 22-interrupt loop delay resulted in a FIFO count of 308.  But, 22 x the expected packet size of 28 yields 616, not 308!  This also needs to be explained.

23 October 2019 Update:

To further investigate the ‘factor of 2 error’ problem, I went back and re-ran the initial experiment that produced the problem, just to verify that it was still there.  Here’s the entire program to recreate the results, and a partial printout of the results:

Significant points to note from the above output:

  • The time (in mSec) between adjacent output lines is about 111 mSec on average (110.5762 mSec according to Excel), and the reported number of interrupts is either 20 or 21 in almost every case (average is 20.0833333 according to Excel).  O’Scope observations confirm this is the case, as the output from the interrupt monitoring pin shows almost exactly 5 mSec between interrupts and an almost exactly 200 Hz interrupt frequency.  So, an interrupt count of 20 or 21 is reasonable, and cannot be the reason for the ‘factor of 2’ error in the FIFO buffer count.  However, there is an apparent ‘off by 2’ problem with the interrupt count, as the reported FIFO counts are consistent with interrupt counts of 22 & 23 rather than 20 or 21 as shown.
  • Every so often a 22 interrupt span produces a FIFO count of 336 instead of 308 – a 28 byte difference.  In fact, over a run time of about 6.5 minutes (388,671 mSec), this phenomenon occurred 264 times, or about 7.5% of the roughly 3,500 loop() iterations in the run.

The inference I draw from the above two points is that the MPU6050 chip isn’t actually loading an entire 28-byte packet during each interrupt cycle, but is in fact loading only 14 bytes each time.  With an artificially imposed 100 mSec (about 22 interrupt cycles) loop delay, the MPU loads 22 * 14 = 308 bytes.

An alternative explanation is that the MPU6050 loads complete packets into the FIFO every other interrupt. Under this scenario, it takes 11 cycles (22 interrupts) to load 11*28 = 308 bytes, and 12 cycles (24 interrupts) to load 12 * 28 = 336 bytes.

Another (and maybe even more reasonable) possibility is that the MPU6050 loads packet data into the FIFO one byte at a time, at an average rate of 14 bytes per 5 mSec, or 2.8 KB/sec. To the outside world, this might not be noticeable unless the application was trying to retrieve packets at the full 200 Hz.  At rates of 100Hz or less, the MPU would still have loaded at least one complete packet every time one was requested.

29 October 2019 Update:

As someone once said as both a benediction and a curse – “May you live in interesting times”.  In our ongoing investigation into the depths of MPU6050 behavior, we have now solved one mystery, only to encounter another one.

As I have noted several times above, there appears to be a factor of 2 mismatch between the number of interrupts counted from ISR activations and the number of bytes reported by mpu.getFIFOCount().  Either the interrupt count is off, or the FIFO count is off, and there doesn’t seem to be any other explanation.  Well, now I have determined that there is apparently a third option – that the MPU6050 only loads a packet into the FIFO on every other interrupt! This doesn’t make a whole lot of sense to me, but I now have what I believe to be irrefutable proof that this is exactly what is happening.

I set up a very simple program to get the FIFO byte count as rapidly as possible.  In order to avoid slowing the system down, I stored the results in a 1000-entry array during the retrieval process, and then printed out all 1000 entries at the end. Then I plotted the results in Excel as shown in the following figure:

The stairsteps in the above plot are (almost) uniformly the expected packet size of 28 bytes;  the MPU6050 is clearly loading entire 28-byte packets into the FIFO each time, contrary to one of the possibilities I had considered to explain the factor of 2 inconsistency between the interrupt count and the FIFO count.
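The same stairstep analysis can also be done in code rather than in Excel. The sketch below assumes the samples have already been captured into two parallel arrays (timestamps and FIFO counts); the `FifoStep` struct and `findFifoSteps` function are hypothetical names for illustration and are not taken from the original test program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One detected change in the reported FIFO count.
struct FifoStep {
    uint32_t timeMs;     // timestamp of the sample where the count changed
    int32_t deltaBytes;  // change in count relative to the previous sample
};

// Scan parallel arrays of timestamps and FIFO counts and report every
// sample at which the count differs from the one before it.
std::vector<FifoStep> findFifoSteps(const std::vector<uint32_t>& timesMs,
                                    const std::vector<uint16_t>& counts) {
    assert(timesMs.size() == counts.size());
    std::vector<FifoStep> steps;
    for (std::size_t i = 1; i < counts.size(); ++i) {
        if (counts[i] != counts[i - 1]) {
            steps.push_back({timesMs[i],
                             static_cast<int32_t>(counts[i]) -
                                 static_cast<int32_t>(counts[i - 1])});
        }
    }
    return steps;
}
```

Each entry’s `deltaBytes` gives the step height, and the difference between consecutive `timeMs` values gives the interval between FIFO loads.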

However, when I started looking at the time interval between FIFO loads, I got the following plot:

As the above plot clearly shows, the MPU6050 loads a new packet into the FIFO every 10 mSec or so (I think it is exactly 10 mSec, with the differences explained by the lack of resolution in the time axis).  But wait – the MPU6050 produces an interrupt on the Arduino interrupt pin every 5 mSec – not every 10 mSec!  So, the mystery of the ‘factor of 2’ error is now solved.  The MPU6050 loads a new packet into the FIFO every other interrupt – not every interrupt as expected. So, the interrupt count is too high by a factor of 2 when compared to the actual FIFO count.

Unfortunately (or fortunately depending on one’s point of view), that simply raises the question – why is the MPU6050 producing interrupts on a 5 mSec schedule when it only changes the FIFO count on a 10 mSec schedule?  Who knows?
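Setting that question aside, the byte counts seen earlier can be checked against the 10 mSec load interval with some simple arithmetic. The helper names below are hypothetical, and the sketch assumes one whole 28-byte packet is loaded per interval.

```cpp
#include <cassert>
#include <cstdint>

// Bytes expected in the FIFO after elapsedMs, assuming one whole packet is
// loaded every packetIntervalMs. Integer division is deliberate: a partial
// interval contributes nothing.
uint32_t expectedFifoBytes(uint32_t elapsedMs, uint32_t packetIntervalMs,
                           uint32_t packetSize) {
    assert(packetIntervalMs > 0);
    return (elapsedMs / packetIntervalMs) * packetSize;
}

// Number of whole packets that fit in a FIFO of the given capacity.
uint32_t wholePacketsThatFit(uint32_t capacityBytes, uint32_t packetSize) {
    assert(packetSize > 0);
    return capacityBytes / packetSize;
}
```

With the roughly 110 mSec loop time measured in the 23 October update, a 10 mSec interval gives 11 x 28 = 308 bytes, which is the value actually reported, while a 5 mSec interval would give 616. A 1024-byte FIFO holds 36 whole packets, so at 10 mSec per packet it would be expected to fill after about 360 mSec, which is at least in the same range as the 360-370 mSec overflow threshold noted in the 21 October update.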

Changing the subject slightly, I got another ‘overflow proof’ version of GetCurrentFIFOPacket() from Homer Creutz to try.  I set up a small program to test it.   Since the idea was that GetCurrentFIFOPacket() would return a valid packet no matter how long it had been since the last time it was called, I set up a program to iterate through a sequence of delay times from 100 mSec to 500 mSec. For each loop delay setting I called GetCurrentFIFOPacket() multiple times and printed out the extracted yaw value and the return status from the function.

As the plot below shows, everything works swimmingly up to a loop delay of 350 mSec. Unfortunately, the wheels came off with a 400 mSec loop delay, and the packet values were all invalid after that – bummer!

06 November 2019 Update:

Holy cow!  Homer and I started this marathon project back in mid October, and we don’t seem to be any closer to nirvana than we were before.  What I can say though, is that I have learned a lot more about the MPU6050 and Jeff Rowberg’s driver software ;-).

One of the major issues we encountered with the polling method (vs interrupt-driven) is that, without the synchronization with MPU6050’s internal processes provided by the interrupt model, we can’t count on (no pun intended) the FIFO count returned by mpu.getFIFOCount() being accurate.  Depending on the timing of the call, the return value could be wildly inaccurate.  However, we discovered that two back-to-back calls to mpu.getFIFOCount() always resulted in an accurate count, although there was still a very small probability that a 3rd call would be required.  So, I created a small routine called ‘getPollingFIFOCount()’ to wrap this construct, as follows:

This function simply loops until two adjacent calls to mpu.getFIFOCount() return the same value. As always, however, there is a backup counter to force the function to exit if it gets hung up for any reason.  In the calling program I have a line like:

to set the backup loop counter value.
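A minimal sketch of the ‘read until two adjacent counts agree’ idea is shown below. It is not the original routine: the signature is an assumption, and the count source is passed in as a callable so the logic can be exercised without hardware. On a real board the callable would simply wrap mpu.getFIFOCount().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Keep reading the FIFO count until two adjacent reads return the same
// value, or until maxTries comparisons have been made (the backup exit).
// readCount is a stand-in for mpu.getFIFOCount(); the parameter list is an
// assumption made for this sketch.
uint16_t getPollingFIFOCount(const std::function<uint16_t()>& readCount,
                             int maxTries) {
    assert(maxTries > 0);
    uint16_t previous = readCount();
    for (int i = 0; i < maxTries; ++i) {
        const uint16_t current = readCount();
        if (current == previous) {
            return current;  // two adjacent reads agree
        }
        previous = current;  // count changed mid-load; try again
    }
    return previous;  // backup exit: return the most recent reading
}
```

With the 334, 336, 336 sequence seen in the 15 October update, this sketch would return 336 after three reads.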

Another major issue with the MPU6050 is that it can overflow its packet FIFO buffer, and there doesn’t appear to be any way to prevent this, other than removing packets from the FIFO at the same rate or higher than they are loaded.  This may not be a problem for ‘high dynamics’ applications like quadcopters or balancing robots that need continuous attitude information, but for ‘slow dynamics’ applications like my wall-following robot where yaw information is only needed a few times per second, overflow becomes a practical certainty.  In an interrupt-driven environment, it might be reasonable to simply retrieve and then discard DMP packets as fast as they arrive, and then only process the latest packet when the application needs an update.  However, for a polling strategy, doing this may or may not work depending on what else is going on. So, for polling we need a way of ensuring we can retrieve a valid packet from the DMP FIFO, whether or not the FIFO has overflowed. Doing so would be trivial if the FIFO length was an integral multiple of packetSize, but it isn’t – yuk!!  So, now when the FIFO overflows, the first packet in the FIFO is guaranteed to be corrupted. The good news is, the last complete packet (i.e. the most recent information) in the FIFO is always valid, but getting to that last good packet is non-trivial.  To summarize:

  • FIFO overflow is almost certain to happen in a ‘low dynamics’ polling-based program
  • When FIFO overflow occurs, the first packet is always corrupted, but the last one is still valid
  • The last (valid) packet isn’t the last [packetSize] bytes, due to the non-modular size of the FIFO
  • The MPU6050 DMP may start loading another packet at any time, but there will always be 10 mSec or so between packet loads
  • Any last-valid-packet retrieval algorithm must work for any loop delay time.

So, the ‘last-valid-packet-retrieval’ algorithm is something like this:

  1. Get the current packet count, using the ‘getPollingFIFOCount()’ routine above
  2. Extract [packetSize] chunks until there are fewer than 2* [packetSize] bytes left. This ensures there is exactly one valid packet remaining in the FIFO.
  3. Extract one more [packetSize] chunk and return it as the result.

Note that implementation of this algorithm will require several ‘while’ loops, so there must also be provision for forcibly terminating all such loops in case some edge condition causes the normal exit condition to never be reached.  Homer and I have been creating and testing versions of this function for the last couple of weeks, without quite getting there yet.  Either they don’t handle all the edge conditions, or they are too slow as the loop delays get larger.
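The steps above can be sketched against a simulated FIFO as follows. This is not one of the versions Homer and I tested. `SimFifo` and `getLatestPacket` are hypothetical names, the leading-remainder discard is borrowed from the modulus approach described earlier in this post, and the sketch ignores the possibility of a new packet arriving while the FIFO is being drained.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Stand-in for the MPU6050 FIFO so the logic can run without hardware.
struct SimFifo {
    std::deque<uint8_t> bytes;
    uint16_t count() const { return static_cast<uint16_t>(bytes.size()); }
    void read(uint8_t* dst, uint8_t len) {
        for (uint8_t i = 0; i < len && !bytes.empty(); ++i) {
            dst[i] = bytes.front();
            bytes.pop_front();
        }
    }
};

// Leave only the newest whole packet in the FIFO, then read it into out.
// Returns false if no whole packet is available or the loop guard trips.
bool getLatestPacket(SimFifo& fifo, uint8_t packetSize, uint8_t* out) {
    assert(out != nullptr);
    uint16_t count = fifo.count();
    if (packetSize == 0 || count < packetSize) return false;

    std::vector<uint8_t> scratch(packetSize);

    // Drop any partial packet at the front (e.g. 16 bytes after overflow).
    const uint16_t partial = count % packetSize;
    if (partial != 0) {
        fifo.read(scratch.data(), static_cast<uint8_t>(partial));
        count -= partial;
    }

    // Discard whole packets until exactly one remains; guard bounds the loop.
    int guard = count / packetSize;
    while (count >= 2 * packetSize && guard-- > 0) {
        fifo.read(scratch.data(), packetSize);
        count -= packetSize;
    }
    if (count != packetSize) return false;  // forced termination

    fifo.read(out, packetSize);
    return true;
}
```

For an overflowed 1024-byte FIFO with 28-byte packets, the sketch drops 16 bytes, then 35 whole packets, and returns the 36th.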

While I was testing my most recent concoction, I ran into a third major problem with the MPU6050 API.  I wanted to be able to remove up to 35 [packetSize] chunks (980 bytes) in one go, and I expected the mpu.getFIFOBytes() API call to manage the required chunking for me.  When I tried this trick, the getFIFOBytes() function crashed repeatedly.  Eventually I figured out that the reason it was crashing was that its ‘length’ parameter is declared as a ‘uint8_t’ object and of course it couldn’t handle a value greater than 255 without choking.  I thought that was a little odd, but that maybe changing the declaration from ‘uint8_t’ to ‘uint16_t’ in MPU6050.h/cpp would solve the problem.  Nope – it turns out that, due to the way that the Arduino I2C bus operates, there is a limitation on how many bytes can be transferred across the bus in one operation, and this limit is currently set at 32 bytes.  As a result of this fundamental limitation, all the I2CDev functions that use the I2C bus also have the same underlying limitation and all of them have their length parameters declared as ‘uint8_t’. This reminds me of the old pre-scientific myth about the earth resting on the back of a turtle.  When the myth was challenged, the defender says “it’s no use – it’s turtles all the way down”.  In our case “it’s no use – it’s uint8_t all the way down!”.
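To make the ‘uint8_t’ problem concrete, here is a small sketch of what happens to a 980-byte request when it is squeezed into an 8-bit length parameter, and of how many 32-byte bus transfers such a request would need. Both helper names are hypothetical, and whether the narrowing by itself accounts for the crash is not established here.

```cpp
#include <cassert>
#include <cstdint>

// What a uint8_t length parameter actually receives when the caller passes
// a larger 16-bit value: only the low 8 bits survive.
uint8_t narrowedLength(uint16_t requested) {
    return static_cast<uint8_t>(requested);
}

// Number of bus transfers needed when each one carries at most
// maxPerTransfer bytes (32 in the text). Rounds up.
uint16_t transfersNeeded(uint16_t totalBytes, uint16_t maxPerTransfer) {
    assert(maxPerTransfer > 0);
    return static_cast<uint16_t>((totalBytes + maxPerTransfer - 1) /
                                 maxPerTransfer);
}
```

A request for 980 bytes arrives as 980 mod 256 = 212, and moving 980 bytes 32 at a time would take 31 transfers.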

So, I had to figure out a way around this problem, so I decided to create yet another wrapper function, this one cleverly called ‘getManyFIFOBytes()’. The idea for this would be to pull one [packetSize] chunk off the FIFO at a time using the normal ‘mpu.getFIFOBytes()’ call and place the result in a destination buffer large enough to hold the entire result.  Since it has been a (long, long) while since I last played with pointer gymnastics, I decided to write a short test program to figure out a reasonable technique.  Here is the program, and some results:

As the output shows, the ‘getManyFIFOBytes(uint16_t* buffer, uint16_t len)’ function can take an arbitrary ‘uint16_t’ length parameter and fill the destination buffer with [packetSize] (28 bytes in this case) chunks, followed by the non-modular remainder.  Although this test was done with a simulated receive buffer and simulated packet contents, I believe it will work using the actual contents of the MPU6050 FIFO and repeated calls to ‘mpu.getFIFOBytes()’ to retrieve the ‘chunks’ and any non-modular remainder bytes.
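The chunking idea can be sketched as below. This is not the original test program: the parameter list differs from the signature quoted above, and the per-chunk reader is passed in as a callable (a stand-in for mpu.getFIFOBytes()) so the logic can be checked with simulated data.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Stand-in for mpu.getFIFOBytes(dst, len); len is limited to uint8_t.
using ChunkReader = std::function<void(uint8_t* dst, uint8_t len)>;

// Fill buffer with len bytes by reading packetSize-byte chunks, followed by
// one shorter read for any remainder. The parameter list is an assumption
// made for this sketch.
void getManyFIFOBytes(const ChunkReader& readChunk, uint8_t* buffer,
                      uint16_t len, uint8_t packetSize) {
    assert(buffer != nullptr && packetSize > 0);
    uint16_t done = 0;
    while (len - done >= packetSize) {
        readChunk(buffer + done, packetSize);  // one whole chunk
        done += packetSize;
    }
    if (done < len) {
        // Non-modular remainder, always smaller than packetSize.
        readChunk(buffer + done, static_cast<uint8_t>(len - done));
    }
}
```

A 100-byte request with 28-byte chunks, for example, becomes three 28-byte reads followed by one 16-byte read.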

Having convinced myself that my two helper functions actually did what I wanted, I revised my latest MPU6050 test program (MPU6050_Overflow8.ino) to test the whole thing out. The test program steps through a series of loop delays from 50 to 550 mSec, and takes 25 yaw measurements at each loop delay setting.  The Excel plot and a snippet of the output log from the program are shown below:

  • There were no invalid packets in the entire run, so the attempt to avoid invalid packet retrieval was a success.
  • The actual loop delay per measurement pass varied quite a bit from the desired loop delay setting.  For instance the average measured loop delay for the 25 yaw measurement passes at the 200 mSec loop delay setting was actually 274.08 mSec, almost 50% higher than desired.  At the 150 mSec loop delay setting, the average loop delay was 223.88 mSec, and at the 100 mSec setting it was 133.24 mSec.  So, if the application needs 5 measurements/sec, the allowable loop delay between passes needs to be between 100 and 150 mSec.

12 November 2019 Update:

Based on some comments and data from Homer’s experiments, it appears that a FIFO reset can be done in less than 10 mSec.  This means that a packet retrieval algorithm based on a mpu.resetFIFO() call will miss at most one 10 mSec FIFO load interval, which is insignificant compared to the typical polling interval (200 mSec in my case).  So I decided to try a ‘brute force’ approach to ‘GetCurrentFIFOPacket()’ as follows:

When I first tested this algorithm, I ran into an occasional glitch where the FIFO would somehow fail to reset, resulting in a FIFO count > 28. So I added an outer loop to allow multiple shots at getting a clean FIFO reset.
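A rough sketch of the reset-and-wait idea, including the outer retry loop, is shown below. It is not the original routine. The function is a template so that it compiles against anything offering resetFIFO(), getFIFOCount() and getFIFOBytes(), which are the I2CDevLib MPU6050 calls used elsewhere in this post; `FakeMpu` and the parameter list are assumptions made so the logic can be run without hardware.

```cpp
#include <cassert>
#include <cstdint>

// Reset the FIFO, wait for exactly one packet to arrive, and read it.
// If the count ends up larger than one packet (the reset did not take) or
// the wait times out, try again, up to maxResets times.
template <typename Mpu>
bool getCurrentFIFOPacket(Mpu& mpu, uint8_t* out, uint8_t packetSize,
                          int maxResets, long maxPolls, int* resetsUsed) {
    assert(out != nullptr && packetSize > 0);
    for (int attempt = 1; attempt <= maxResets; ++attempt) {
        mpu.resetFIFO();
        uint16_t count = mpu.getFIFOCount();
        long polls = 0;
        while (count < packetSize && polls++ < maxPolls) {
            count = mpu.getFIFOCount();  // wait for the next packet load
        }
        if (count == packetSize) {
            mpu.getFIFOBytes(out, packetSize);
            if (resetsUsed != nullptr) *resetsUsed = attempt;
            return true;
        }
    }
    return false;  // every attempt failed
}

// Simulated device: the first badResets resets leave stale data behind,
// and a fresh packet appears a few polls after a successful reset.
struct FakeMpu {
    int badResets = 1;
    uint16_t count = 1024;
    int pollsUntilPacket = 0;
    void resetFIFO() {
        if (badResets > 0) { --badResets; count = 56; }
        else { count = 0; pollsUntilPacket = 3; }
    }
    uint16_t getFIFOCount() {
        if (count == 0 && --pollsUntilPacket <= 0) count = 28;
        return count;
    }
    void getFIFOBytes(uint8_t* data, uint8_t length) {
        for (uint8_t i = 0; i < length; ++i) data[i] = 0xAB;
        count = static_cast<uint16_t>(count - length);
    }
};
```

The `resetsUsed` output plays the role of an ‘outer loop’ count: a value greater than 1 means the first reset did not leave a clean FIFO.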

Shown below are some Excel plots from some long runs

The first plot above shows a long run (over 17 minutes) of valid yaw data (the perturbations in the yaw plot are due to occasional manual rotations of the sensor to verify that the sensor was still responding).  The interesting thing about this plot is the yellow curve, showing the ‘outer loop’ count. The only way this value can be greater than 1 is if the first mpu.resetFIFO() call fails to actually clear the FIFO, which appears to happen an average of about once per minute, or about once every 6,000 10 mSec MPU6050 FIFO cycles.

The second plot is a closeup of the first 171.42 seconds of the overall plot, showing the detail of the FIFO clear failures, occurring about once per minute.

So, it is clear that the MPU6050 has some significant behavioral quirks, especially when used in a non-interrupt-driven environment.  That being said, I believe the ‘brute force’ algorithm shown above is a reliable way of interfacing with the MPU6050 in a polling environment, and obviates the need for a separate interrupt line, and the associated ISR software.

This will probably be the last update on this subject, as I now think Homer and I have pretty much beat the MPU6050 FIFO issue to death.

Stay tuned!

Frank

Heading-based Turns Using MPU6050 and Polling vs Interrupts

Posted 06 October 2019

In previous posts I have described my efforts to integrate heading-based wall tracking into my two-wheel and four-wheel robots.  I installed a MPU6050 module into Wall-E2, my primary four-wheel robot, some time ago but was never able to make heading-based turns work for one reason or another.  In conjunction with some other experiments, I installed an MPU6050 module on my two-wheel robot so that I could investigate the issues with heading-based turns and heading-based wall tracking with a simpler hardware configuration.

With the two-wheel robot I was able to demonstrate successful heading-based wall tracking, but I was unable to port the capability to my four-wheel configuration.  Not only that, but for some reason I started having problems getting reliable yaw/heading values from my two-wheel robot configuration.  This post describes the steps I took to troubleshoot the problem, ultimately arriving at a stable polling-only (no interrupt line required) yaw/heading value retrieval algorithm suitable for both the two-wheel and four-wheel robot configurations.

Back to Basics:

As I always do when faced with a complex problem with conflicting results, I decided to simplify the problem as much as possible.  In this case that meant reducing the hardware configuration to just a MPU6050 module and an Arduino Mega controller, as shown below:

Arduino Mega and MPU6050

On the software side, I started with the simplest possible Arduino sketch – Jeff Rowberg’s ‘MPU6050_DMP6.ino’ example, included in his latest I2CDevLib library and described in this post.

After getting everything running properly in this very basic configuration using an interrupt-driven algorithm, I moved on to working with the polling-driven arrangement, to confirm that polling was a viable strategy.  To do this I modified the hardware to disconnect the interrupt line from the MPU6050 to the controller board, and modified the software as described in this post to use a polling arrangement vs interrupts.

After confirming that this simple example worked properly and seemed stable, it was time to work my way back into the two-wheel robot hardware and software configuration (again!).  To do this I started with the basic controller/MPU6050 only hardware configuration, but running my two-wheel robot software program, modified to eliminate everything but the ‘RollingTurn()’ function that uses heading information from the MPU6050 to initiate and terminate turns.  After some false starts and blind alleys, I finally arrived at a stable software configuration demonstrating consistent heading-based turn performance using polling only – no interrupts!  The code is shown below:

In the above code, the only relevant functions are the ‘GetIMUHeadingDeg()’ and ‘RollingTurn()’ functions, as shown below:

When the ‘RollingTurn()’ function is called, it waits for mpu.dmpPacketAvailable() to return TRUE, and then it calls GetIMUHeadingDeg(), which updates a global variable (subtly named ‘global_yawval’).  This value is then used to determine turn completion.
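One detail that matters when a yaw value is used to decide turn completion is the wraparound at +/-180 degrees. Below is a minimal sketch in plain C++, assuming the yaw is reported in degrees between -180 and +180. The helper names headingErrorDeg() and turnComplete() are hypothetical and are not taken from the robot program.

```cpp
#include <cmath>

// Smallest signed difference (target - current), wrapped into [-180, 180).
// Hypothetical helper, not part of I2CDevLib or the robot program.
float headingErrorDeg(float targetDeg, float currentDeg) {
    float err = std::fmod(targetDeg - currentDeg + 180.0f, 360.0f);
    if (err < 0.0f) err += 360.0f;   // fmod keeps the sign of the dividend
    return err - 180.0f;
}

// A turn is treated as complete once the remaining error is within a tolerance.
bool turnComplete(float targetDeg, float currentDeg, float toleranceDeg) {
    return std::fabs(headingErrorDeg(targetDeg, currentDeg)) <= toleranceDeg;
}
```

With this wrapping, a turn from a heading of 170 to a target of -170 is seen as a 20 degree change rather than a 340 degree one.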

GetIMUHeadingDeg() reads bytes from the FIFO and computes a yaw value using the retrieved quaternion data.
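The library's own DMP helper functions normally handle the quaternion-to-angle conversion. The sketch below only illustrates the underlying math, assuming a unit quaternion and the common Z-Y-X (yaw-pitch-roll) Euler convention. That convention is an assumption, and the sign or axis order may differ from what the library helpers return. The Quat struct and yawDegFromQuat() are hypothetical names.

```cpp
#include <cmath>

// Unit quaternion (w, x, y, z). Hypothetical struct for illustration only.
struct Quat { double w, x, y, z; };

// Yaw (rotation about Z) in degrees, assuming the Z-Y-X Euler convention.
double yawDegFromQuat(const Quat& q) {
    const double kRadToDeg = 180.0 / std::acos(-1.0);
    const double sinYawCosPitch = 2.0 * (q.w * q.z + q.x * q.y);
    const double cosYawCosPitch = 1.0 - 2.0 * (q.y * q.y + q.z * q.z);
    return std::atan2(sinYawCosPitch, cosYawCosPitch) * kRadToDeg;
}
```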

After getting everything going to my satisfaction, I added code to setup() for a 30-degree turn to the right followed by a 30-degree turn to the left, followed by an infinite loop of yaw value readouts.  The output from one test run is shown below.

Shown below are the yaw values plotted against time in Excel:

The next step will be to port the updated software back into my two-wheel robot to confirm that heading-based turns can be accomplished automatically (this is something that I had going before, but…).

Stay tuned!

Frank

Polling vs Interrupt with MPU6050 (GY-521) and Arduino

Posted 04 October 2019

In my last post I described my Arduino Mega test program to interface with the popular Invensense MPU6050 IMU and its GY-521 clone.  In this post I describe the interface configuration for using a polling strategy rather than relying on the MPU6050’s interrupt signal.  A polling strategy can be very useful as it is much simpler, and saves a pin connection from the MPU6050 to the controller; all that is required is +V, GND, SDA & SCL, as shown below:

With this wiring setup, the control program is shown below:

In the above program, the interrupt service routine (ISR) and the accompanying ‘attachInterrupt()’ setup function have been removed as they are no longer needed for the polling arrangement.  Instead, the program calls ‘mpu.dmpPacketAvailable()’ every time through the loop, and if a packet is available, GetIMUHeadingDeg() is called to read the packet and return a yaw value.  The rest of the code in the loop() function is the placeholder for the ‘other stuff’ that the program does when it isn’t paying attention to the IMU.

In this test program, I have set this section up to execute every 100 Msec, but in my robot programs I usually set it up for a 200 Msec interval; 5 cycles/sec is plenty fast enough for a wheeled robot that uses only the IMU yaw information for turn management.
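A common way to run a block of code at a fixed interval inside a free-running loop is to compare an unsigned millisecond counter against the time of the last run. The sketch below is plain C++ and assumes the counter comes from something like Arduino's millis(). The IntervalTimer name is hypothetical and is not from the test program.

```cpp
#include <cstdint>

// Runs a task at a fixed period using an unsigned millisecond counter
// (for example the value returned by Arduino's millis()).
// Hypothetical helper; the name is not from the original program.
struct IntervalTimer {
    uint32_t periodMs;
    uint32_t lastRunMs;

    // Returns true when at least periodMs has elapsed since the last run.
    // Unsigned subtraction keeps this correct across counter rollover.
    bool due(uint32_t nowMs) {
        if (static_cast<uint32_t>(nowMs - lastRunMs) >= periodMs) {
            lastRunMs = nowMs;
            return true;
        }
        return false;
    }
};
```

In a loop() function this might be used as `if (timer.due(millis())) { /* other stuff */ }`, with the IMU packet check running on every pass regardless.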

So far, this arrangement seems very stable; I have been running it now for several hours without a hiccup.

Stay tuned,

Frank

Basic Arduino/MPU6050 (GY-521) test

Posted 29 September 2019

In my quest to figure out WTF happened to my ability to acquire real-time relative heading information on both my 2-wheel and 4-wheel robots, I have been trying to start from scratch with very simple controller/IMU hardware configurations.  After succeeding with a basic functionality demonstration using a Teensy 3.2 and a Sparkfun MPU6250 IMU breakout board, I decided the next step would be to do the same thing with an Arduino Mega controller and a GY-521 (MPU6050 clone) to more closely replicate the hardware configuration on my 2-wheel and 4-wheel robots.

As usual I started this project with a web search for basic MPU6050/Arduino examples, and I found this YouTube video showing just what I was after.  After going through the video several times to make sure I understood what was going on, I decided to try and duplicate it so I could compare my (hopefully) working demo code with my (currently non-working) robot code.

In my past efforts with the MPU6050, I had struggled with the complexities of using Jeff Rowberg’s wonderful (but quite massive and convoluted) I2CDevLib GitHub repository. There was always something that didn’t quite fit the situation, and making it fit invariably required a trip down the rabbit hole into Alice’s wonderland.  Getting the right combination of files in the right places seemed to be more a matter of luck than skill.  However, this particular video does a nice job of explicitly demonstrating what has to go where.  Essentially the magic steps are:

  • Download Jeff Rowberg’s I2CDevLib repository from GitHub into a ZIP file.
  • Unzip the repository files into a temporary folder.
  • Copy the Arduino/I2CDev and Arduino/MPU6050 folders into the Arduino/Libraries folder. This makes them available to the Arduino IDE (and the VS2017/Visual Micro setup I use).
  • Open a new sketch in the Arduino IDE (or a new project in the VS/VisMicro environment) and then:
    • In the Arduino IDE select ‘File->Examples’, and scroll down to the ‘Examples from Custom Libraries’ section. Then select ‘MPU6050->MPU6050_DMP6’.  This will load the example code into the sketch.
    • In the VS/VM environment, select the Visual Micro Explorer (under the vMicro tab). Then click on the Examples tab, expand the ‘MPU6050’ section and then select the MPU6050_DMP6 example. This will load the code into the edit window.

Assuming you have the wiring setup correct, the example should run ‘out of the box’ with no required modifications.  However, after verifying that everything was working, I made the following changes:

  • The unmodified MPU6050_6Axis_MotionApps20.h file configures the MPU6050 DMP to send data packets to the controller at a fairly high rate – like 100Hz.  This is way too high for my robot application, so I changed the configuration to send packets at a 20Hz rate, by changing the MPU6050_DMP_FIFO_RATE_DIVISOR constant from 0x01 to 0x09 (lines 271-274) as shown below.
  • The Arduino I2C library (Wire.h) has a well-known and documented flaw that causes the I2C bus to hang up on an intermittent basis, so I modified I2CDev.h lines 50-57 to use the SBWIRE library that contains timeouts to prevent this problem from happening.
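A quick way to sanity-check a divisor value is sketched below. It assumes the relationship rate = 200Hz / (1 + divisor), which is how the comments in the MotionApps header describe the constant. Treat the formula as an assumption to verify against your copy of the header. The function name dmpOutputRateHz() is hypothetical.

```cpp
#include <cstdint>

// DMP output rate in Hz for a given FIFO rate divisor, assuming the
// 200 Hz / (1 + divisor) relationship noted in the library header comments.
double dmpOutputRateHz(uint8_t divisor) {
    return 200.0 / (1.0 + divisor);
}
```

Under that assumption the default divisor of 0x01 corresponds to the 100Hz rate mentioned above.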

And the last change I made was to disable the interrupt service routine (ISR) and use a polling technique.  Instead of waiting for an interrupt, I simply poll the DMP register with

‘mpuIntStatus = mpu.getIntStatus();’

every time through the loop.  If the return value indicates that a data packet is ready, it is read; otherwise it does nothing.  This appears to be entirely equivalent to the interrupt technique as long as the loop is fast enough to service the DMP’s FIFO.

30 September Update:

Well, something’s not equivalent, as the yaw values are fine for a few minutes, but then start showing up as ‘179.000’.  From my previous work I know this means that the line

mpu.getFIFOBytes(fifoBuffer, packetSize);

is getting out of sync with the DMP and isn’t reading a complete packet.  When I then changed the code back to the original interrupt-driven model, the yaw values stayed valid indefinitely.
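One possible way to guard against this kind of desynchronization is to look at the FIFO byte count before reading and only read when it holds a whole number of packets. The sketch below is a hypothetical policy in plain C++, not the logic used in the library or in the program above. The packet size and FIFO capacity are passed in as parameters (for example 42 bytes and 1024 bytes, both assumed values to check against your setup).

```cpp
#include <cstdint>

enum class FifoAction { Wait, ReadPacket, Reset };

// Decide what to do with the FIFO given its current byte count.
// Hypothetical policy for illustration; packetSize and fifoCapacity
// are parameters rather than values taken from the library.
FifoAction classifyFifo(uint16_t fifoCount, uint16_t packetSize,
                        uint16_t fifoCapacity) {
    if (packetSize == 0 || fifoCount >= fifoCapacity) {
        return FifoAction::Reset;        // bad config or overflow
    }
    if (fifoCount < packetSize) {
        return FifoAction::Wait;         // no complete packet yet
    }
    if (fifoCount % packetSize != 0) {
        return FifoAction::Reset;        // partial packet: out of sync
    }
    return FifoAction::ReadPacket;       // one or more whole packets
}
```

On a Reset result the caller would clear the FIFO and wait for the next packet rather than reading misaligned bytes.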

03 October Update:

I modified the code to break the ‘put other programming stuff here’ block out of the ‘if()’ within a ‘while()’ within a ‘loop()’ structure for two reasons:

  • It gave me a headache every time I tried to figure out how it worked
  • I wanted to do ‘the programming stuff’ only once every K Msec where K was something like 100 or 200.  With the above nested structure, that would never work.

After removing extraneous comments and unused code, the resulting program is shown below:

Notes about the above program:

  • I used the SBWIRE library vs the normal Arduino WIRE library to avoid the well-known and well-documented infinite blocking problems in the WIRE code.  This was accomplished by editing the I2C interface implementation section in I2Cdev.h.
  • I lowered the MPU6050 interrupt rate to 20Hz (I don’t need anything faster for my wall-following robot) by modifying MPU6050_6Axis_MotionApps20.h.
  • The loop() function has just three blocks
    • if (!dmpReady) return; this bypasses everything else if the MPU6050 didn’t init correctly

    • All this section does is call GetIMUHeadingDeg() whenever an interrupt has been processed in the ISR

    • This section is the ‘everything else’ block. In my robot programs, this section runs the robot, using the yaw value output from the MPU6050 as appropriate.
  • I discovered that the local variable ‘fifoCount’ can become desynchronized from the actual FIFO count resulting in a situation where the line:

if (mpuInterrupt && fifoCount < packetSize)

in the loop() function fails with fifoCount == packetSize.  The fix for this was to remove the fifoCount comparison from the if() statement, making it just ‘if (mpuInterrupt)’.  This means the if() block will execute every time the interrupt occurs, whether or not there is data in the FIFO.

With the above modifications, the program has run for many hours with no problems, so I’m convinced I have most, if not all, the problems licked.  I’m still using the interrupt-driven version rather than the polling version I would prefer, but that’s a small price to pay for the demonstrated stability of the interrupt-driven version.

Future Work:

Next I plan to try the new MotionDriver 6.12 version of the MPU6050 DMP firmware, which is reputed to be faster, better, and more stable than the present 2.0 version.

04 October Update.

As it happens, the only thing that was required to change from MotionApps V2 to MotionApps V6.12 was to change #include “MPU6050_6Axis_MotionApps20.h” to #include “MPU6050_6Axis_MotionApps_V6_12.h” in my little test program.  This compiled and ran fine, and the only difference I could see is that V6.12 has a fixed interrupt rate of about 200Hz, whereas V2.0 could be adjusted down to about 20Hz.  According to some Invensense documentation, the newer version has better/faster calibration capabilities and (maybe?) lower drift rates??

Stay Tuned

Frank

Accessing the Internet with an ESP32 Dev Board

Posted 27 August 2019

During my recent investigation of problems associated with the MPU6050 IMU on my 2-motor robot (which I eventually narrowed down to I2C bus susceptibility to motor driver noise), one poster suggested that the Espressif ESP32 wifi & bluetooth enabled microcontroller might be a good alternative to Arduino boards because the ESP32 chip is ‘shielded’ (not sure what that means, but…).  In any case, I was intrigued by the possibility that I might be able to replace my current HC-05 bluetooth module (on the 2-motor robot) and/or the Wixel shield (on the 4-motor robot) with an integrated wifi link that would be able to send back telemetry from anywhere in my house via the existing wifi network.  So, I decided to buy a couple (I got the Espressif ESP32 Dev Board from Adafruit) and see if I could get the wifi capability to work.

As usual, this turned out to be a much bigger deal than I thought.  My first clue was the fact that Adafruit went to significant pains on their website to note that the ESP Dev Board was ‘for developers only’, as shown below:

Please note: The ESP32 is still targeted to developers. Not all of the peripherals are fully documented with example code, and there are some bugs still being found and fixed. We got many sensors and displays working under Arduino IDE, so you can expect things like I2C and SPI and analog reads to work. But other elements are still under dev

Undaunted, I got two boards, and set about connecting my ESP32 dev board to the internet.  I found several examples on the internet, but none of them worked (or were even understandable, at least to me).  That’s when I realized that I was basically clueless about the entire IoT world in general, and the ESP32’s place in that world in particular – bummer!

So, after lots of screaming, yelling, and hair-pulling (well, not the last because I don’t have much left), I finally got my ESP32 to talk to the internet and actually retrieve part of a web page without crashing.  In order to consolidate my new-found knowledge (and maybe help other ESP32 newbies), I decided to create this post as a ‘how to’ for ESP32 internet connections.

General Strategy

Here’s the general strategy I followed in getting my ESP Dev Board connected to the internet and capable of downloading data from a website.

  1. Install ESP32 libraries and tools into either the Arduino IDE or the Visual Micro extension to Microsoft Visual Studio (I have the VS 2019 Community Edition).
  2. Install and run a localhost server.  This was a great troubleshooting tool, as with it I could monitor website requests to the server.
  3. Install ‘curl’, the wonderful open-source tool for internet protocol data transfers.  This was absolutely essential for verifying the proper http request syntax needed to elicit the proper response from the server.
  4. Use curl to figure out the proper HTTP ‘GET’ string syntax.
  5. Modify the WiFiClientBasic example program to successfully retrieve a document from my localhost server.

Install ESP32 libraries and tools

This step by itself was not entirely straightforward;  I wound up installing the libraries & tools using the Arduino IDE rather than in the VS2019/Visual Micro environment.  I’m sure it can be done either way, but it seemed much easier in the Arduino IDE.  Once this is done, then the ESP32 Dev Board can be selected (in either the Arduino IDE or the VS/VM environment) as a compile target.

Install and run a localhost server

This step is probably not absolutely necessary, as there are a number of ‘mock’ sites on the internet that purport to help with IoT app development.  However, I found having a ‘localhost’ web server on my laptop very convenient, as this gave me a self-contained test suite for working through the myriad problems I encountered.  I used the Node.js setup for Win10, as described in this post.  The cool thing about this approach is the console window used to start the server also shows all the request activity directed to the server, allowing me to directly monitor what the ESP32 is actually sending to the site. Here are two screenshots showing some recent activity.

The first log fragment above shows the server starting up, and the first set of http requests.  The first half dozen or so requests are from another PC; I did this to initially confirm I could actually reach my localhost server.  This first test failed miserably until I figured out I had to disable my PC’s firewall – oops!  The next set of lines are from my curl app showing what is actually received by the server when I send a ‘GET’ request from curl.

The screenshot above shows some more curl-generated requests, and then a bunch of requests from ‘undefined’.  These requests were generated by my ‘WiFiClientBasic’ program running on the ESP32 – success!!

Install ‘curl’

Curl is a wonderful command-line program to generate http (and any other web protocol you can imagine) requests.  You can get the executable from this site, and unzip and run it from a command window – no installation required.  Using curl, I was able to determine the exact syntax for an http ‘GET’ request to a website, as shown in the screenshot below.

The screenshot above shows curl being used from the command line.  The first line C:\Users\Frank>curl -v http://192.168.1.90:1337/index.html generates a ‘GET’ request for the file ‘index.html’ to the site ‘192.168.1.90’ (my localhost server address on the local network), and the -v (verbose) option displays what is actually sent to the server, i.e.

> GET /index.html HTTP/1.1
> Host: 192.168.1.90:1337
> User-Agent: curl/7.55.1
> Accept: */*

This was actually a huge revelation to me, as I had no idea that a simple ‘GET’ request was a multi-line deal – wow! Up to this point, I had been trying to use the ‘client.send()’ command in the WiFiClientBasic example program to just send the ‘GET /index.html HTTP/1.1’ string, with a commensurate lack of success – oops!
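To make the multi-line structure explicit, here is a small sketch in plain C++ that assembles the same request text curl displayed. Each line ends with a carriage return and line feed, and an empty line terminates the headers. The function name buildGetRequest() is hypothetical, and the host and user agent values are just the ones from this example.

```cpp
#include <string>

// Build a minimal HTTP/1.1 GET request. Each header line ends with CRLF
// and an empty line terminates the request.
// Hypothetical helper for illustration; not part of the WiFiClientBasic example.
std::string buildGetRequest(const std::string& host, const std::string& path,
                            const std::string& userAgent) {
    std::string req;
    req += "GET " + path + " HTTP/1.1\r\n";
    req += "Host: " + host + "\r\n";
    req += "User-Agent: " + userAgent + "\r\n";
    req += "Accept: */*\r\n";
    req += "\r\n";  // blank line marks the end of the headers
    return req;
}
```

On the ESP32 the resulting text could then be handed to the client's print function, for example `client.print(req.c_str())`.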

Modify the WifiClientBasic example program

Armed with the knowledge of the exact syntax required, I was now able to modify the ‘WifiClientBasic’ example program to emit the proper ‘GET’ syntax so that the localhost server would respond appropriately.  The final program (minus my network login credentials) is shown below.

This produced the following output:

Conclusion:

After all was said and done, most of the problems I had getting the ESP32 to connect to the internet and successfully retrieve some contents from a website were due to my almost complete ignorance of HTTP protocol syntax.  However, some of the blame must be laid at the foot of the WiFiClientBasic example program, as the lack of any error checking caused multiple ‘Guru Meditation Errors’ (which I believe is Espressif-speak for ‘segmentation fault’) when I was trying to get everything to work.  In particular, the original example code assumes the website response will be available immediately after the request and tries to read an invalid buffer, crashing the ESP32.  My modified code waits in a 1 Msec delay loop for client.available() to return a non-zero result. As shown in the above output, this usually happens after 5-7 Msec.
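The wait-for-response idea can be sketched generically as a polling loop with a timeout, so that a server that never answers does not hang the program. This is a plain C++ illustration, not the code from my modified example. The callables available, nowMs and sleep1ms are hypothetical stand-ins for things like client.available(), millis() and delay(1).

```cpp
#include <cstdint>

// Poll available() until it reports data or timeoutMs elapses.
// available, nowMs and sleep1ms are caller-supplied callables
// (hypothetical names). Returns the elapsed time in ms, or -1 on timeout.
template <typename AvailFn, typename ClockFn, typename SleepFn>
long waitForData(AvailFn available, ClockFn nowMs, SleepFn sleep1ms,
                 uint32_t timeoutMs) {
    const uint32_t start = nowMs();
    while (available() <= 0) {
        if (static_cast<uint32_t>(nowMs() - start) >= timeoutMs) {
            return -1;  // gave up: nothing arrived in time
        }
        sleep1ms();
    }
    return static_cast<long>(static_cast<uint32_t>(nowMs() - start));
}
```

Only after this returns a non-negative value would the caller go on to read the response.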

In addition, I found that either the full syntax:

GET /index.html HTTP/1.1
Host: 192.168.1.90:1337
User-Agent: ESP32
Accept: */* {newline}

or just

GET /index.html HTTP/1.1{newline}

worked fine to retrieve the contents of ‘index.html’ on the localhost server, because the ‘host’ information is already present in the connection, and the defaults for the  remaining two lines are reasonable.  However, I believe the trailing {newline} is still required for both cases.

So, now that I can successfully use the ESP32 to connect to my local wireless network and perform internet functions, my plan is to try and use some of the IoT support facilities available on the internet (like Adafruit’s io.adafruit.com) to see if I can get the ESP32 to upload simulated robot telemetry data to a cloud-based data store. If I can pull that off, then I’ll be one step closer to replacing my current HC-05 bluetooth setup (on the 2-motor robot) and/or my Wixel setup (on the 4-motor robot).

Stay tuned!

Frank

Back to the future with Wall-E2. Wall-following Part VI

Posted 13 August 2019

In my last post on this subject, I discussed the idea of using orientation information to compensate raw wall offset distance values to account for the errors associated with robot orientation.  The idea was that if I could do that, then Wall-E2 would know how far he was away from the wall regardless of orientation, and would be able to make appropriate corrections to get to and stay at a predetermined offset from the wall.

Well, it didn’t really work out that way.  After getting through the geometry analysis and the math, it turned out that in order to use the compensation algorithm, I have to know the initial robot orientation with respect to the wall, and I don’t :-(.  Without knowing this, it is basically impossible to apply the correct compensation.  For example, if the robot is originally oriented 30º away from the wall, then a ‘toward-wall’ rotation will cause the measured distance to go down, and an upward compensation is required.  However, if the robot is initially oriented toward the wall, then that same ‘toward-wall’ rotation will cause the measured distance to go up and a downward compensation is required – bummer!

However, all is not lost;  the ability to perform relatively precise angular rotations means that I can use incremental rotations for acquiring and then tracking a predetermined offset distance.  In the acquisition phase, the robot orientation is changed in 10º increments in the appropriate direction, and an N-point slope calculation is performed to determine whether or not the current ‘cut angle’ will allow the robot to eventually reach the predetermined offset distance.   As the robot approaches the offset line, the cut angle is reduced until it is zero, in theory resulting in the robot travelling parallel to the wall at the offset distance.  At this point the robot transitions from ‘capture’ to ‘track’ mode, and the response to distance deviations becomes more robust.

This strategy was implemented using my 2-motor robot, and seems to work well once the normal crop of bugs was eradicated.  The following Excel plots show the results of two short runs where the robot first captured and then tracked a 30cm offset setting.

Capture and track a 30cm wall offset starting from the outside

Capture and track a 30cm wall offset starting from the inside

So far I have only implemented this completely for the right side, but as the left side is identical, I anticipate no problems in this regard.

Future Work:

So far I have demonstrated the ability to capture and then track a predetermined wall offset distance, starting from either inside or outside the desired offset distance. This represents a quantum leap in performance, as Wall-E2 currently can only track whatever distance it first measures – it has no capability to capture a desired offset distance.  However, there are still some ‘edge’ cases that need to be dealt with one way or the other.  For instance, if the robot orientation is too far away from parallel, the current algorithm won’t be able to rotate it enough to capture the desired offset or the measured distance will exceed the max range gate of the ping sensors (currently set at 200cm).  These conditions may not be all that deleterious, as eventually Wall-E2 will get close enough to something to trigger an avoidance response, thereby resetting the entire orientation picture (hopefully to something a little more parallel).

In addition to the wall tracking problem, the new capability to make reasonably precise angular rotations should significantly improve Wall-E2’s performance in handling ‘open-corner’ and ‘closed-corner’ situations; currently these cases are handled with timed turns, which are only correct for one floor covering type (hard vs soft) and battery state.  With the heading measurement capability, a 90º corner turn will always be (approximately) 90º whether it is on carpet or hard flooring.  In addition, now I can program in obstacle avoidance step-turns for approaching obstacles instead of relying entirely on the ‘backup-and-turn’ approach.

Stay tuned!

Frank

Back to the future with Wall-E2. Wall-following Part V

Posted 08 August 2019

In my last post on this subject, I described some ideas for improving Wall-E2’s wall following performance by compensating for distance-to-wall errors caused by Wall-E2 not being oriented perfectly parallel to the wall.  The situation is shown in the diagram below:

When the robot is parallel to the wall, as shown in light purple, the ping sensor measures distance d1 to the wall.  However, when it rotates to make a wall-following adjustment, the ping sensor now measures distance d2, even though the robot’s center of rotation (CR) hasn’t moved at all.  If the wall-following algorithm is based strictly on ping distance, the robot tends to wander back and forth, chasing ping measurements that don’t reflect (no pun intended) reality.  I need a way of relating the measured distance to the distance from the robot’s CR to the wall, so that wall-following adjustments can be made referenced to the CR, not to the ping sensor position on the robot.

Given the above geometry, an expression can be developed to relate the perpendicular distance d1 and the measured distance d2, as shown below:

Expression relating perpendicular distance to measured distance for any rotation angle

I set up an experiment where the robot was placed on a platform about 16cm away from an obstacle.  I measured the ‘ping’ distance to the obstacle as the robot was manually rotated +/- 20 deg.  Then I plotted  the data in Excel as shown below:

In the above plot, the heading values (blue line) have been normalized to the initial heading and any linear drift removed.  After correction, the robot changes heading almost exactly +/- 20 deg.  Similarly, the measured distances (orange line) values were normalized to the nominal distance of 16cm.  As can be seen, the measured distance varied about +4 to -2 cm, even though the robot center of rotation (CR) remained fixed.  Then the distance compensation expression shown above was applied, resulting in the gray line.  This shows that the compensation expression is effective in reducing angle-induced distance changes.

Next, I set up a ‘live’ experiment with the 2-motor robot to more closely emulate the normal operating environment.  I set up a section of ‘wall’ and had the robot make a single 60 deg turn, starting with the robot angled about 30 deg toward the wall, and ending with the robot angled about 30 deg away from the wall.  Distance measurements were taken as rapidly as possible during the turn, but not before or after it.

Here’s a short video of the 2-motor robot approaching a ‘wall’ at an angle of about 30º and making a single turn of about 60º.  The entire sequence is about 3 seconds long.  The robot runs straight for about 1 sec, then turns for about 1 sec, then goes straight again for about 1 sec.

The measured ‘ping’ distances for the 1-second turn portion of the run are shown in the Excel plot below:

The above plot starts when the robot starts turning at about 1.2 sec into the video (the approach to the wall is not shown).  When the turn starts, the measured distance to the wall is  about 20 cm.  The measured distance decreases rapidly to about 16 cm at about 0.4 sec into the turn (about 1.6 sec into the video), and stays there for about 0.4 sec and then starts climbing rapidly to about 23 cm when the turn finishes.  However, the distance from the center of rotation (CR) of the robot to the wall changes hardly at all.  The blue painter’s tape in the background of the video has black markings each 5 cm, and it is possible to estimate the distance from the CR to the wall throughout the turn.  My estimate is that the robot’s CR starts at about 25 cm, decreases to about 22 cm at the apex of the turn, and then goes back to about 25 cm at the end of the turn.  The measured distance decreases 4 cm and then increases 8 cm while the robot’s CR decreases 3 cm and increases 3  – quite a difference, due entirely to the angle change between the robot and the wall during the turn.  After normalizing the heading values so that they reflect the angle off parallel and applying the distance compensation expression above, I got the following plot:

In the above plot, the gray line shows the corrected distance from the robot CR to the wall.  As estimated from the video earlier, the CR varies only about 1cm during the turn.  This is pretty strong evidence that the proposed distance correction scheme is correct and effective in removing distance measurement errors due to robot heading changes.

With the technique demonstrated above, I am optimistic that I can now not only improve wall tracking, but also implement wall-following at a specific distance, say 25 cm.  The difficulty with trying to displace laterally to acquire and then lock to a specific distance is that the large changes in measured distance due to the angle change needed to move toward or away from the wall made it impossible to determine where the robot’s CR actually was relative to the desired offset distance.  By (mostly) removing this orientation-induced error term, it should be feasible to determine the actions needed to approach and then track the desired offset distance.

Stay tuned!

Frank

08 February 2020 Update:

As I continued my campaign to integrate heading information into my wall-following robot algorithm, my efforts to compensate ‘ping’ distances for off-parallel robot orientations with respect to the nearest wall kept failing, and I didn’t know why.  I had gone through the math several times and was convinced it was valid, and as the plot above showed, it should work.

So, I made another run at it, completely redoing the math from the ground up – and running some more tests in my ‘local range’ (aka my office).  Still no joy – no matter what I did, the math seemed to be overcompensating, as shown in the plot below:

Ping Distance vs Calc Distance for two heading changes

This plot (and others like it) convinced me that I was still missing something fundamental.  As I often do, I was thinking about this in bed while drifting off to sleep, and I realized that I might be able to determine the culprit by cheating; I would place the robot at a set distance from the wall, and carefully rotate it manually over a compass rose.  At each heading I would manually measure the distance from the ping sensor to the wall, perpendicular to the plane of the sensor (i.e. I would physically measure the distance I would expect the ping sensor to report), and also record the ‘ping’ distance reported by the sensor.  With just a few measurements the problem became obvious; the ‘ping’ distance for slant angles to the wall does not even remotely resemble the actual physical distance – it is much less, as shown below.

As can be seen, the compensation algorithm actually works quite well when dealing with the physically measured slant range.  However, because the ‘ping’ distance loses accuracy very rapidly at off-parallel angles beyond about 20 degrees, the compensation algorithm is ineffective.  A classic case of ‘GIGO’.

After performing the above experiment, I was still left with the mystery of why the compensation algorithm appeared to work so well before – WTF?  So, I went back and very carefully examined the previous plot and the underlying data, and discovered I’d made another classic experimental error – The ‘Calculated Distance’ data was plotted on the wrong scale.  When plotted on the correct scale, the plot changes to the one shown below:

Previous plot with ‘Calc Distance’ plotted on the correct scale

Now it is clear that the calculated compensation using ‘ping’ distances is not at all useful.

So, the bottom line on all of this is that the effort to apply a heading-based ping distance compensation was doomed to failure from the start, because the distance reported by the ping sensor is wildly inaccurate for off-perpendicular geometries.  The good news is that now at least I know why the compensation effort was doomed to fail!

In the meantime, I independently developed a technique for determining the heading required for orienting the robot parallel to the wall as the heading associated with the minimum ping distance achieved by swinging the robot back and forth. This technique utilizes the ping sensors in the realm where they are most accurate, and does away entirely with the need for compensation.
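The core of the minimum-distance idea can be sketched as a simple search over the samples collected during a sweep. This is plain C++ for illustration under the assumption that each sample pairs a heading with a ping distance. The Sample struct and parallelHeadingDeg() are hypothetical names, not the robot code.

```cpp
#include <cstddef>
#include <vector>

// One measurement taken during the sweep. Hypothetical struct.
struct Sample { double headingDeg; double pingCm; };

// Returns the heading of the sample with the smallest ping distance,
// or fallbackDeg if no samples were collected.
double parallelHeadingDeg(const std::vector<Sample>& sweep, double fallbackDeg) {
    if (sweep.empty()) return fallbackDeg;
    std::size_t best = 0;
    for (std::size_t i = 1; i < sweep.size(); ++i) {
        if (sweep[i].pingCm < sweep[best].pingCm) best = i;
    }
    return sweep[best].headingDeg;
}
```

In practice noisy readings might call for some smoothing before taking the minimum, which this sketch leaves out.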

Stay tuned!

Frank

Back to the future with Wall-E2. Wall-following Part IV

Posted 30 April 2019

In two previous posts (here & here) I described my efforts to upgrade Wall-E2’s wall following performance using a PID control algorithm.  The results of my efforts to date in this area have not been very spectacular – a better description might actually be ‘dreadful’ :-(.

After some additional analysis, I came to believe that the reason the PID approach doesn’t work very well is a fundamental feature of the way Wall-E2 measures distance to the nearest wall.  Wall-E2 has two acoustic sonar units fixed to its upper deck, and they measure the distance perpendicular to the robot’s longitudinal axis.  What this means, however, is that when the robot is angled with respect to the nearest wall, the distance measured isn’t the perpendicular distance, but rather the hypotenuse of the right triangle with the right angle at the wall.  So, when Wall-E2 turns toward or away from the wall, the measured distance increases even though the robot hasn’t actually moved toward or away.  Conversely, if the robot is angled in toward the wall and then turns to be parallel, the measured distance decreases even if the robot hasn’t moved at all relative to that wall. The situation is shown in the sketch below:

Using Excel, I ran a simulation of the ping distance versus the actual distance for a range of angle offsets from 0 to 30 degrees, as shown below:

As shown above, the ping distance for a constant 25 cm offset ranges from 25 (robot longitudinal axis parallel to the wall) to almost 29 cm for a 30 degree off-axis heading. These values translate to a percentage error of zero to approximately 15%, independent of the initial parallel distance.
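The numbers above follow from simple right-triangle geometry: the ping distance is the hypotenuse, so it equals the perpendicular distance divided by the cosine of the off-parallel angle. Here is a minimal sketch of that calculation in C++. The function names are mine, and it assumes an idealized sensor whose echo returns exactly along the direction it points.

```cpp
#include <cmath>
#include <cstdio>

// Ping (hypotenuse) distance for a robot rotated offsetDeg away from parallel,
// at a true perpendicular distance perpCm. Idealized geometry only.
double pingDistanceCm(double perpCm, double offsetDeg) {
    const double kPi = 3.14159265358979323846;
    return perpCm / std::cos(offsetDeg * kPi / 180.0);
}

// Percentage by which the ping distance overstates the perpendicular distance.
// The result does not depend on the distance itself, only on the angle.
double pingErrorPercent(double offsetDeg) {
    return (pingDistanceCm(1.0, offsetDeg) - 1.0) * 100.0;
}

// Print a small table like the spreadsheet simulation, 0 to 30 degrees.
void printPingTable(double perpCm) {
    for (int deg = 0; deg <= 30; deg += 5) {
        std::printf("%2d deg: ping = %.2f cm, error = %.1f%%\n",
                    deg, pingDistanceCm(perpCm, deg), pingErrorPercent(deg));
    }
}
```

Calling `printPingTable(25.0)` walks the same 0 to 30 degree range used in the spreadsheet.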

So, it becomes obvious why a standard PID algorithm has trouble; if the ping distance goes up slightly, the PID algorithm attempts to compensate by turning toward the wall.  However, this causes the ping distance to increase rather than decrease, causing the algorithm to command an even greater angle toward the wall, which in turn causes a further increase in ping distance – entirely backward.  The reverse happens for an initial decrease in the ping distance starting from a parallel orientation.  The algorithm commands a turn away from the wall, which causes the ping distance to increase immediately, even though the actual distance hasn’t changed.  This causes the algorithm to seriously overcorrect in one case, and seriously undercorrect in the other.   Not good.

What I need is a way to compensate for the changes in ping distance caused by Wall-E2’s angular orientation with respect to the wall being tracked. If Wall-E2 is oriented parallel to the wall, then no correction is needed; if not, then a correction is required.  Fortunately for the good guys, Wall-E2 now has a way of providing the needed heading information, with the integration of the MPU6050-based 6DOF IMU module described in this post from last September.

To investigate this idea, I modified an old test program to have Wall-E2 perform a series of mild S-turns in my test hallway while capturing heading and ping distance data.  The S-turns were tweaked so that Wall-E2 stayed a fairly constant 50 cm from the right-hand wall, as shown in the following movie clip.

 

Start of test area showing tape measure for offset distance measurement

Using Excel, I plotted the reported ping distance, the commanded heading, and the actual heading versus time, as shown below:

In the above plot, the initial CCW turn (away from the wall) was a 10° change, and all the rest were approximately 20° to maintain a more-or-less straight line.  At the end of the second (the first CW turn) and subsequent heading changes, there is an approximately 0.5 sec straight period, during which no data was captured.  As can be seen, the ping distance (gray curve) goes up slightly as the first CCW turn starts, then levels off during the changeover from CCW to CW turns, and then precipitously declines as the CW turn sweeps the ping sensor toward the perpendicular point.  Part of this decline is actual distance change caused by the 0.5 sec straight period that moves the robot toward the wall.  After the next (CCW) heading change is commanded, the robot starts to turn away from the wall causing the ping distance to increase, but this is partially cancelled by the fact that the robot continues to travel toward the wall during the S-turn. As soon as the robot gets parallel to the wall, then the ping distance goes up quickly as the heading continues to change in a CCW direction.  This behavior repeats for each S-turn until the end of the run.

As an exercise, I added another column to the spreadsheet – “perpendicular distance”, and set up a formula to compute the adjusted distance from the robot to the wall, using the recorded angular offset.  This computation presumes that the robot started off parallel to the wall (confirmed via the video clip).  The result is shown on the yellow line in the plot below:

Ping distance and heading vs time, with calculated perpendicular distance added
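The spreadsheet formula itself isn't shown here, but a compensation of this kind presumably multiplies the ping distance by the cosine of the angle between the current heading and the parallel heading. The sketch below assumes exactly that. The function names are hypothetical, and the heading difference is wrapped so that readings on either side of the 0/360 boundary still produce a small offset.

```cpp
#include <cmath>

// Signed difference a - b in degrees, wrapped into [-180, 180].
double headingDiffDeg(double a, double b) {
    double d = std::fmod(a - b, 360.0);
    if (d > 180.0) d -= 360.0;
    if (d < -180.0) d += 360.0;
    return d;
}

// Estimated perpendicular distance from a ping reading, given the current
// relative heading and the heading at which the robot is parallel to the wall.
double perpendicularDistanceCm(double pingCm, double headingDeg,
                               double parallelHeadingDeg) {
    const double kPi = 3.14159265358979323846;
    const double offsetRad =
        headingDiffDeg(headingDeg, parallelHeadingDeg) * kPi / 180.0;
    return pingCm * std::cos(offsetRad);
}
```

When the current heading equals the parallel heading the offset is zero and the function returns the ping distance unchanged, which is the property used in the next section.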

 

As can be seen from the above plot and video, the compensated distance looks like it might be a good match with the perpendicular distance estimated from the video. For instance, at 17 sec into the video, the robot has just finished the first clockwise turn and straight run, and is just starting the second counter-clockwise turn.  At this point the robot is oriented parallel to the wall, and the ping distance and the perpendicular distance should match. The video shows that distance should be about 33-35 cm, and the recorded ping distance at this point is 36 cm.  However, the calculated distance went directly from 45 cm at point 11 to 34 cm at point 12 and basically stayed at that value before changing rapidly from 34 to 45 over points 19 & 20.  Again at 19 seconds into the video, the robot is approximately 42-44 cm from the wall and parallel to it; both the actual ping distance and the calculated perpendicular distance agree at this point at 45 cm – a close match to the estimate from the video.

So now the question is – can I use the calculated perpendicular wall distance to assist wall-following operations?  A significant issue may be knowing when the robot is actually parallel to the wall, to establish a heading baseline for compensation calcs.

When is the robot parallel to the wall?

A unique feature of the point or points where the robot is parallel to the wall is that the ping distance and the calculated distance are equal.  However, that’s a bit of ‘chicken and the egg’ as one has to know the robot is parallel in order to use an offset angle of 0 degrees for the compensation calc to work out.  Since the heading information available from the MPU6050 IMU is only relative, the heading value for the parallel condition can be anything, and can vary arbitrarily from run to run.  So, what to do?  One thought would be to have the robot make a short S-turn at the start of any tracking run to establish the heading for which the ping distance goes through a minimum or maximum – the heading for the max/min point would be the parallel heading. From there on, that heading should be reliably constant until the next time the robot’s power is cycled.  Of course, a new parallel heading value would be required each and every time Wall-E2’s tracking situation changes (obstacle recovery, step-turns and reversals at the end of a hallway, changing from the left wall to the right one, etc).  Maybe a hybrid mode would be feasible, whereby the robot uses uncompensated heading-based S-turns instead of the current ‘bang-bang’ system for initial wall tracking, shifting to a compensation algorithm after a suitable parallel heading is determined.

Looking at the above plots, it may not be all that useful to look for maxima and/or minima, as there are multiple headings for which the ping distance is constant, so which one is the parallel heading?  Thinking about ways to rapidly find the parallel heading, it occurred to me that my previous work on quickly finding the mathematical variance of a set of values might be useful here.  I plugged the above ping distance numbers into the Excel spreadsheet I used before, and got the following plot of ping distance and 3-element running variance vs time.
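For readers who want to try this outside Excel, here is a minimal sketch of a sliding-window variance. It is a plain two-pass calculation per window rather than the faster incremental method from my earlier work, and it assumes population variance (divide by N); a sample-variance version would divide by N-1 instead.

```cpp
#include <cstddef>
#include <vector>

// Population variance over a sliding window. Returns one value per full
// window, so the output has samples.size() - window + 1 entries.
std::vector<double> runningVariance(const std::vector<double>& samples,
                                    std::size_t window = 3) {
    std::vector<double> out;
    if (window == 0 || samples.size() < window) return out;
    for (std::size_t i = 0; i + window <= samples.size(); ++i) {
        double mean = 0.0;
        for (std::size_t j = 0; j < window; ++j) mean += samples[i + j];
        mean /= static_cast<double>(window);
        double sumSq = 0.0;
        for (std::size_t j = 0; j < window; ++j) {
            const double d = samples[i + j] - mean;
            sumSq += d * d;
        }
        out.push_back(sumSq / static_cast<double>(window));
    }
    return out;
}
```

A run of identical ping readings gives a variance of exactly zero, and the value climbs quickly as soon as the readings start to change.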

So, looking at the above plot, it is encouraging that a 3-point running variance calculation shows near-zero values when the robot is most probably parallel or nearly parallel to the wall.  Adding the heading information to the spreadsheet gives the plot shown below

and now it is clear that the large variance values are associated with the changes from one heading to another, and the low variance values are associated with the middle section of each linear heading change (S-turn) segment.  If I further enhance the plot by putting the variance plot on a secondary scale and zooming in on the low variance sections, we get the plot shown below:

 

 

 

Variance scale modified to show 0-5 range only

In the above plot, the variance line is zoomed in to the 0-5 range only, emphasizing the 0-0.5 unit variance range.  In this plot it can be seen that the variance actually has a distinct minimum very near zero at time points 7, 16, 22, 28-30, and 35-38.  These time values correspond to robot heading values of 64, 61, 63, 61-67, and 70-65.  Discarding the last set as bad data (this is where the robot literally ‘hit the wall’ at the end of the run), we can compute an approximate parallel heading value as the average of all these values, or the average of 64, 61, 63, 64 (average of 61-67) = 63 degrees.  From the video we can see that the robot started out parallel to the wall, and the first heading reading was 62 degrees – a very good match to the calculated parallel heading value.
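The averaging step can be sketched in code as follows. This is only an illustration of the idea, not code from the robot: the function name and the 0.5 threshold are assumptions, the variance and heading arrays are assumed to be aligned sample-for-sample, and headings are averaged naively, so it would need extra care near the 0/360 wrap point.

```cpp
#include <cstddef>
#include <vector>

// Average the heading over each run of consecutive low-variance samples,
// then average those per-run values. Returns false if no run was found.
bool estimateParallelHeading(const std::vector<double>& variance,
                             const std::vector<double>& headingDeg,
                             double maxVariance, double& parallelDeg) {
    const std::size_t n = variance.size() < headingDeg.size()
                              ? variance.size() : headingDeg.size();
    double sumOfRunMeans = 0.0, runSum = 0.0;
    int runCount = 0, runLen = 0;
    for (std::size_t i = 0; i <= n; ++i) {  // one extra pass closes a final run
        const bool low = (i < n) && (variance[i] <= maxVariance);
        if (low) {
            runSum += headingDeg[i];
            ++runLen;
        } else if (runLen > 0) {
            sumOfRunMeans += runSum / runLen;
            ++runCount;
            runSum = 0.0;
            runLen = 0;
        }
    }
    if (runCount == 0) return false;
    parallelDeg = sumOfRunMeans / runCount;
    return true;
}
```

Fed four low-variance runs with headings 64, 61, 63 and a three-sample run spanning 61 to 67 (taking a made-up middle value of 64), it reproduces the 63 degree estimate worked out by hand above.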

The next step, I think, is to run some more field tests against a wall, with wall-following and heading assist integrated into the code.

Frank

 

 

 

 

MPU6050 IMU Motor Noise Troubleshooting

Posted 24 July 2019

For a while now I’ve been investigating ways of improving the wall following performance of my autonomous wall-following robot Wall-E2.  At the heart of the plan is the use of an MPU6050 IMU to sense relative angle changes of the robot so that changes in the distance to the nearest wall due only to the angle change itself can be compensated out, leaving only the actual offset distance to be used for tracking.

As the test vehicle for this project, I am using my old 2-motor robot, fitted with new Pololu 125:1 metal-geared DC motors and Adafruit DRV8871 motor drivers, as shown in the photo below.

2-motor test vehicle on left, Wall-E2 on right

The DFRobots MPU6050 IMU module is mounted on the green perfboard assembly near the right wheel of the 2-motor test robot, along with an Adafruit INA169 high-side current sensor and an HC-05 Bluetooth module used for remote programming and telemetry.

This worked great at first, but then I started experiencing anomalous behavior where the robot would lose track of the relative heading and start turning in circles.  After some additional testing, I determined that this problem only occurred when the motors were running.  It would work fine as long as the motors weren’t running, but since the robot had to move to do its job, not having the ability to run the motors was a real ‘buzz-kill’.  I ran some experiments on the bench to demonstrate the problem, as shown in the Excel plots below:

Troubleshooting:

There were a number of possibilities for the observed behavior:

  1. The extra computing load required to run the motors was causing heading sensor readings to get missed (not likely, but…)
  2. Motor noise of some sort was feeding back into the power & ground lines
  3. RFI created by the motors was getting into the MPU6050 interrupt line to the Arduino Mega and causing interrupt processing to overwhelm the Mega
  4. RFI created by the motors was interfering with I2C communications between the Mega and the MPU6050
  5. Something else

Extra Computing Load:

This one was pretty easy to eliminate.  The main loop does nothing most of the time, and only updates system parameters every 200 mSec.  If the extra computing load was the problem, I would expect to see no ‘dead time’ between adjacent adjustment function blocks.  I had some debug printing code in the program that displayed the result of the ‘millis()’ function at various points in the program, and it was clear that there was still plenty of ‘dead time’ between each 200 mSec adjustment interval.

Motor noise feeding back into power/ground:

I poked around on the power lines with my O’scope with the motors running and not running, but didn’t find anything spectacular; there was definitely some noise, but IMHO not enough to cause the problems I was seeing.  So, in an effort to completely eliminate this possibility, I removed the perfboard sub-module from the robot entirely, and connected it to a separate Mega microcontroller. Since this setup used completely different power circuits (the onboard battery for the robot, PC USB cable for the second Mega), power line feedback could not possibly be a factor.  With this setup I was able to demonstrate that the MPU6050 output was accurate and reasonable until I placed the perfboard sub-module in close proximity to the robot; then it started acting up just as it did when mounted on the robot.

So it was clear that the interference was being radiated (RFI), not conducted through the power or ground wiring.

RFI created by the motors was getting into the MPU6050 interrupt line to the Arduino Mega and causing interrupt processing to overwhelm the Mega

This one seemed very possible.  The MPU6050 generates interrupts at a 20Hz rate, but I only use measurements at a 5Hz (200mSec) rate.  Each interrupt causes the Interrupt Service Routine (ISR) to fire, but the actual heading measurement only occurs every 200 mSec. I reasoned that if motor-generated RFI was causing the issue, I should see many more activations of the ISR than could be explained by the 20Hz MPU6050 interrupt generation rate.  To test this theory, I placed code in the ISR that pulsed a digital output pin, and then monitored this pin with my O’scope.  When I did this, I saw many extra ISR activations, and was convinced I had found the problem.  In the following short video clip, the top trace is the normal interrupt line pulse frequency, and the bottom trace is the ISR-generated pulse train.  In normal operation, these two traces would be identical, but as can be seen, many extra ISR activations are occurring when the motors are running.

So now I had to figure out what to do with this information.  After Googling around for a while, I ran across some posts that described using the MPU6050/DMP setup without using the interrupt output line from the module; instead, the MPU6050 was polled whenever a new reading was required.  As long as this polling takes place at a rate greater than the normal DMP measurement frequency, the DMP’s internal FIFO shouldn’t overflow.  If the polling rate is less than the normal rate, then FIFO management is required.  After thinking about this for a while, I realized I could easily poll the MPU/DMP at a higher rate than the configured 20Hz rate by simply polling it each time through the main loop – not waiting for the 200mSec/5Hz motor speed adjustment interval.  I would simply poll the MPU/DMP as fast as possible, and whenever new data was ready I would pull it off the FIFO and put it into a global variable.  The next time the motor adjustment function ran, it would use the latest relative heading value and everyone would be happy.
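The structure of that polling scheme can be sketched without any hardware at all. In the sketch below, `FakeImu` is a made-up stand-in that produces a new yaw value every 50 mSec (20Hz), and `PollingLoop::runOnce()` plays the role of one pass through the main loop. All names are hypothetical; on the robot the read would go through the MPU6050 library instead.

```cpp
#include <cstdint>

// Hypothetical stand-in for the IMU: new yaw value every 50 ms (20 Hz).
struct FakeImu {
    std::uint32_t lastSampleMs = 0;
    bool readYawIfReady(std::uint32_t nowMs, float& yawDeg) {
        if (nowMs - lastSampleMs < 50) return false;  // nothing new yet
        lastSampleMs = nowMs;
        yawDeg = 0.01f * static_cast<float>(nowMs);   // made-up heading
        return true;
    }
};

struct PollingLoop {
    FakeImu imu;
    float latestYawDeg = 0.0f;        // plays the role of the global variable
    float yawUsedByLastAdjust = 0.0f;
    std::uint32_t lastAdjustMs = 0;
    int samplesRead = 0;
    int adjustments = 0;

    // One pass through the main loop.
    void runOnce(std::uint32_t nowMs) {
        float yaw = 0.0f;
        if (imu.readYawIfReady(nowMs, yaw)) {  // poll on every pass
            latestYawDeg = yaw;
            ++samplesRead;
        }
        if (nowMs - lastAdjustMs >= 200) {     // 5 Hz motor adjustment
            lastAdjustMs = nowMs;
            yawUsedByLastAdjust = latestYawDeg;
            ++adjustments;
        }
    }
};
```

Because the poll runs on every pass and the adjustment only every 200 mSec, each adjustment always sees the most recent sample, and the FIFO never has time to accumulate more than one packet.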

So, I implemented this change and tested it off the robot, and everything worked OK, as shown in the following Excel plot.

And then I put it on the robot and ran the motors

Crap!  I was back to the same problem!  So, although I had found evidence that the motor RFI was causing additional ISR activations, that clearly wasn’t the entire problem, as the polling method completely eliminates the ISR.

RFI created by the motors was interfering with I2C communications between the Mega and the MPU6050

I knew that the I2C control channel could experience corruption due to noise, especially with ‘weak’ pullup resistor values and long wire runs.  However, I was using short (15cm) runs and 2.2K pullups on the MPU6050 end of the run, so I didn’t think that was an issue.  However, since I now knew that the problem wasn’t related to wiring issues or ISR overload, this was the next item on the list.  So, I shortened the I2C runs from 15cm to about 3cm, and found that this did indeed suppress (but not eliminate) the interference.  However, even with this modification and with the MPU6050 module located as far away from the motors as possible, the interference was still present.

Something else

So, now I was down to the ‘something else’ item on my list, having run out of ideas for suppressing the interference.  After letting this sit for a few days, I realized that I didn’t have this problem (or at least didn’t notice it) on my 4-motor Wall-E2 robot, so I started wondering about the differences between the two robot configurations.

  1. Wall-E2 uses plastic-geared 120:1 ‘red cap’ motors, while the 2-motor robot uses Pololu 125:1 metal-geared motors
  2. Wall-E2 uses L298N linear drivers while the 2-motor version uses the Adafruit DRV8871 switching drivers.

So, I decided to see if I could isolate these two factors and see if it was the motors, or the drivers (or both/neither?) responsible for the interference. To do this, I used my new DPS5005 power supply to generate a 6V DC source, and connected the power supply directly to the motors, bypassing the drivers entirely.  When I did this, all the interference went away!  The motors aren’t causing the interference – it’s the drivers!

In the first plot above, I used a short (3cm) I2C wire pair and the module was located near, but not on, the robot. As can be seen, no interference occurred when the motors were run.  In the second plot I used a long (15cm) I2C wire pair and mounted the module directly on the robot in its original position.  Again, no interference when the motors were run.

So, at this point it was pretty definite that the main culprit in the MPU6050 interference issue is the Adafruit DRV8871 switch-mode driver.  Switch-mode drivers are much more efficient than L298N linear-mode drivers, but the cost is high switching transients and debilitating interference to any I2C peripherals.

As an experiment, I tried reducing the cable length from the drivers to the motors, reasoning that the cables must be acting like antennae, and reducing their length should reduce the strength of the RFI.  I re-positioned the drivers from the top surface of the robot to the bottom right next to the motors, thereby reducing the drive cable length from about 15cm to about 3cm (a 5:1 reduction).  Unfortunately, this did not significantly reduce the interference.

So, at this point I’m running out of ideas for eliminating the MPU6050 interference due to switch-mode driver use.

  • I read at least one post where the poster had eliminated motor interference by eliminating the I2C wiring entirely – he used an MPU6050 ‘shield’ where the I2C pins on the MPU6050 were connected directly to the I2C pins on the Arduino microcontroller.  The poster didn’t mention what type of motor driver (L298N linear-mode style or DRV8871 switch-mode style), but apparently a (near) zero I2C cable length worked for him.  Unfortunately this solution won’t work for me as Wall-E2 uses three different I2C-based sensors, all located well away from the microcontroller.
  • It’s also possible that the motors and drivers could be isolated from the rest of the robot by placing them in some sort of metal box that would shield the rest of the robot from the switching transients caused by the drivers.  That seems a bit impractical, as it would require metal fabricating unavailable to me.  OTOH, I might be able to print a plastic enclosure, and then cover it with metal foil of some sort.  If I go this route, I might want to consider the use of optical isolators on the motor control lines, in order to break any conduction path back to the microcontroller, and capacitive feed-throughs for the power lines.

27 July 19 Update:

I received a new batch of GY-521 MPU6050 breakout boards, so I decided to try a few more experiments.  With one of the GY-521 modules, I soldered the SCL/SDA header pins to the ‘bottom’ (non-label side) and the PWR/GND pins to the ‘top’.  With this setup I was able to plug the module directly into the Mega’s SCL/SDA pins, thereby reducing the I2C cable length to zero.  The idea was that if the I2C cable length was contributing significantly to RFI susceptibility, then a zero length cable should reduce this to the minimum  possible, as shown below:

MPU6050 directly on Mega pins, normal length power wiring

In the photo above, the Mega with the MPU6050 connected is sitting atop the Mega that is running the motors. The GND and +5V leads are normal 15cm jumper wires.  As shown in the plots below, this configuration did reduce the RFI susceptibility some, but not enough to allow normal operation when lying atop the robot’s Mega.

GY-521 MPU6050 module mounted directly onto Mega, normal length power leads

I was at least a little encouraged by this plot, as it showed that the MPU6050 (and/or the Mega) was recovering from the RFI ‘flooding’ more readily than before.  In previous experiments, once the MPU6050/Mega lost sync, it never recovered.

Next I tried looping the power wiring around an ‘RF choke’ magnetic core to see if raising the effective impedance of the power wiring to high-frequency transients had any effect, as shown in the following photo.

GND & +5V leads looped through an RF Choke.

Unfortunately, as far as I could tell this had very little positive effect on RFI susceptibility.

Next I tried shortening the GND & +5V leads as much as possible.  After looking at the Mega pinout diagram, I realized there was GND & +5V very close to the SCL/SDA pins, so I fabricated the shortest possible twisted-pair cable and installed it, as shown in the following photo.

MPU6050 directly on Mega pins, shortest possible length power wiring

With this configuration, I was actually able to get consistent readings from the MPU6050, whether or not the motors were running – yay!!

In the plot above, the vertical scale is only from -17 deg to -17.8 deg, so all the variation is due to the MPU6050, and there are no apparent deleterious effects due to motor RFI – yay!

So, at this point it’s pretty clear that a significant culprit in the MPU6050’s RFI susceptibility is the GND/+5V and I2C cabling acting as antennae and  conducting the RFI into the MPU6050 module.  Reducing the effective length of the antennas was effective in reducing the amount of RFI present on the module.

With the above in mind, I also tried adding a 0.01uF ‘chip’ capacitor directly at the power input leads, thinking this might be just as effective (if not more so) than shortening the power cabling.  Unfortunately, this experiment was inconclusive. The normal length power cabling with the capacitor seemed to cause just as much trouble as the setup without the cap, as shown in the following plot.

Having determined that the best configuration so far was the zero-length I2C cable and the shortest possible GND/+5V cable, I decided to try moving the MPU6050 module from the separate test Mega to the robot’s Mega. This required moving the motor drive lines to different pins, but this was easily accomplished.  Unfortunately, when I got everything together, it was apparent that the steps taken so far were not yet effective enough to prevent RFI problems due to the switch-mode motor drivers.

The good news, such as it is, is that the MPU6050/Mega seems to recover fairly quickly after each ‘bad data’ excursion, so maybe we are most of the way there!

As a next step, I plan to replace the current DRV8871 switch-mode motor drivers with a single L298N dual-motor linear driver, to test my theory that the RFI problem is mostly due to the high-frequency transients generated by the drivers and not the motors themselves.  If my theory holds water, replacing the drivers should eliminate (or at least significantly suppress) the RFI problems.

28 July 2019 Update:

So today I got the L298N driver version of the robot running, and I was happy (but not too surprised) to see that the MPU6050 can operate properly with the motors ON or OFF when mounted on the robot’s Mega controller, as shown in the following photo and Excel plots.

2-motor robot with L298N motor driver installed.

However, there does still seem to be one ‘fly in the ointment’ left to consider.  When I re-installed the wireless link to allow me to reprogram the 2-motor robot remotely and to receive wireless telemetry, I found that the MPU6050 exhibited an abnormally high yaw drift rate unless I allowed it to stabilize for about 10 sec after applying power and before the motors started running, as shown in the following plots.

2-motor robot with HC-05 wireless link re-installed.

I have no idea what is causing this behavior.

31 July 2019 Update

So, I found a couple of posts that refer to some sort of auto-calibration process that takes on the order of 10 seconds or so, and that sounds like what is happening with my project.  I constructed the following routine that waited for the IMU yaw output values to settle

This was very effective in determining when the MPU6050 output had settled, but it turned out to be unneeded for my application.  I’m using the IMU output for relative yaw values only, and over a very short time frame (5-10 sec), so even high yaw drift rates aren’t deleterious.  In addition, this condition only lasts for 10-15 sec from startup, so not a big deal in any case.
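For anyone who does need a settle check, one simple way to do it is to declare the yaw output settled once the spread (max minus min) over the last few samples drops below a small threshold. The sketch below works on a recorded list of yaw values so it can be tried off the robot. It is an illustration under those assumptions, not the routine I used, and the function name, window size and threshold are made up.

```cpp
#include <cstddef>
#include <vector>

// Index of the first sample at which the last `window` yaw values all lie
// within maxSpreadDeg of each other, or -1 if that never happens.
int firstSettledIndex(const std::vector<float>& yawDeg, std::size_t window,
                      float maxSpreadDeg) {
    if (window == 0 || yawDeg.size() < window) return -1;
    for (std::size_t end = window; end <= yawDeg.size(); ++end) {
        float lo = yawDeg[end - window];
        float hi = lo;
        for (std::size_t i = end - window; i < end; ++i) {
            if (yawDeg[i] < lo) lo = yawDeg[i];
            if (yawDeg[i] > hi) hi = yawDeg[i];
        }
        if (hi - lo <= maxSpreadDeg) return static_cast<int>(end - 1);
    }
    return -1;
}
```

On the robot the same test would run on a small rolling buffer filled at the IMU's 20Hz output rate, with a timeout so that startup can't hang forever if the output never settles.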

At this point, the MPU6050 IMU on my little two-motor robot seems to be stable and robust, with the following adjustments (in no particular order of significance):

  • Changed out the motor drivers from 2ea switched-mode DRV8871 motor drivers to a single dual-channel L298N linear mode motor driver.  This is probably the most significant change, without which none of the other changes would have been effective.  This is a shame, as the voltage drop across the L298N is significantly higher than with the switch-mode types.
  • Shortened the I2C cable to zero length by plugging the GY-521 breakout board directly into the I2C pins on the Mega.  This isn’t an issue on my 2-motor test bed, but will be on the bigger 4-motor robot.
  • Shortened the IMU power cable from 12-15cm to about 3cm, and installed a 10V 1uF capacitor right at the PWR & GND pins on the IMU breakout board.  Again, this was practical on my test robot, but might not be on my 4-motor robot.
  • Changed from an interrupt driven architecture to a polling architecture.  This allowed me to remove the wire from the module to the Mega’s interrupt pin, thereby eliminating that possible RF path.  In addition, I revised the code to be much stricter about using only valid packets from the IMU.  Now the code first clears the FIFO, and then waits for a data ready signal from the IMU (available every 50 mSec at the rate I have it configured for).  Once this signal is received, the code immediately reads a packet from the FIFO if and only if it contains exactly one packet (42 bytes in this configuration).  The code shown below is the function that does all of this.
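A hardware-independent sketch of that clear, wait, check, read sequence (not the original function) might look like the following. `FakeImu` and its methods are hypothetical stand-ins so the logic can run on a PC. With the I2CDevLib MPU6050 class the corresponding calls would be along the lines of `resetFIFO()`, `getIntStatus()`, `getFIFOCount()` and `getFIFOBytes()`, but treat that mapping as an assumption to check against the library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPacketSize = 42;  // packet size in this configuration

// Hypothetical stand-in for the IMU so the logic can run off-target.
struct FakeImu {
    std::vector<std::uint8_t> fifo;
    int packetsPerWait = 1;  // packets that arrive while waiting for data-ready

    void resetFifo() { fifo.clear(); }
    bool waitForDataReady() {  // false models a timeout with no data
        fifo.insert(fifo.end(),
                    static_cast<std::size_t>(packetsPerWait) * kPacketSize,
                    static_cast<std::uint8_t>(0xAB));
        return packetsPerWait > 0;
    }
    std::size_t fifoCount() const { return fifo.size(); }
    void readFifo(std::uint8_t* dst, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) dst[i] = fifo[i];
        fifo.erase(fifo.begin(), fifo.begin() + static_cast<std::ptrdiff_t>(n));
    }
};

// Clear the FIFO, wait for data-ready, then read only if the FIFO holds
// exactly one packet. Anything else is treated as suspect and rejected.
bool readFreshPacket(FakeImu& imu, std::uint8_t* packet) {
    imu.resetFifo();
    if (!imu.waitForDataReady()) return false;
    if (imu.fifoCount() != kPacketSize) return false;
    imu.readFifo(packet, kPacketSize);
    return true;
}
```

Rejecting any FIFO count other than exactly one packet costs an occasional missed reading, but it means a corrupted or misaligned FIFO can never be decoded as a heading.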

Here’s a short video of the robot making some planned turns using the MPU6050 for turn management.  In the video, the robot executes the following set of maneuvers:

  1. Straight for 2 seconds
  2. CW for 20 deg, starting an offset maneuver to the right
  3. CCW for 20 deg, finishing the maneuver
  4. CCW for 20 deg, starting an offset maneuver to the left
  5. CW for 20 deg, finishing the maneuver
  6. 180 deg turn CW
  7. Straight for 3 sec
  8. 20 deg turn CCW, finishing at the original start point

So, I think it’s pretty safe to say at this point that although both the DFRobots and GY-521 MPU6050 modules have some serious RFI/EMI problems, they can be made to be reasonably robust and reliable, at least with the L298N linear mode motor drivers.  Maybe now that I have killed off this particular ‘alligator’, I can go back to ‘draining the swamp’ – i.e. using relative heading information to make better decisions during wall-following operations.

Stay tuned!

Frank

 

Arduino Remote Programming Using A HC-05 Bluetooth Module

Posted 10 June 2019

As part of my recent Wall-E2 Motor Controller Study, I reincarnated my old 2-motor robot as a test platform for Pololu’s ’20D’ metal gear motors.  When I got the robot put together and started testing the motors, I realized I needed a way to remotely program the Arduino controller and remotely receive telemetry, just as I currently do with my 4-wheel Wall-E2 robot.

On my Wall-E2 robot, remote programming/telemetry is accomplished using the very nice Pololu Wixel Shield.  However, I have been playing around with the cheap and small HC-05 Bluetooth module,  and decided to see if there was maybe a way to use this module as a replacement for the Wixel.

As I usually do, I started with LOTS of web research.  I found some posts claiming to have succeeded in remotely programming an Arduino using a HC-05 module, but the information was sketchy and incomplete, so I decided I would try and pull all the various sources together into a (hopefully) more complete tutorial for folks like me who want to use a HC-05 module for this purpose.

Overall Approach:

In order to remotely program an Arduino using a HC-05, the following basic parts are required:

  • A wireless link (obviously) between the PC and the HC-05.
  • A serial link between the PC and the Arduino and between the Arduino and the HC-05. This part is also well established, and the Arduino-to-HC-05 link can be done with either a hardware port (as with the Mega 2560) or a SoftwareSerial port using the SoftwareSerial library.  My tutorial uses the Mega 2560, so I use Tx/Rx1 (pins 18/19) for the Arduino-to-HC-05 link
  • A way of resetting the Arduino to put it back into programming mode, so the new firmware can be uploaded.
  • A serial connection between the HC-05 and Tx/Rx0 on the microcontroller – more about this later.

The Wireless Link

The HC-05 is a generic Bluetooth device, and as such is compatible with just about everybody’s Bluetooth setup – phones and PCs.  I plan to use this with my Dell XPS15 9570 laptop, and I can pair with the HC-05 no problem.  Here’s a link to a tutorial on pairing with the HC-05, and here’s another.  As another poster mentioned, the pairing mechanism creates multiple ‘outgoing’ and ‘incoming’ COM ports, and it’s hard for me to figure out which to use.  In this last iteration, I found that I could remove the two ‘incoming’ COM ports and use just the ‘outgoing’ one. Don’t know if that is the right thing, but….

A serial link between the PC, the Arduino and the HC-05

This part is discussed and demoed in many tutorials, but the piece that is almost always missing is why you need to have this link in the first place. The reason is that several AT commands must be used in order to configure the HC-05 correctly for wireless Arduino program upload, and (as I understand it anyway), AT commands can only be communicated to the HC-05 via its hardware serial lines, and only when the HC-05 is in ‘Command’ or ‘AT’ mode.  The configuration step is a one-time deal; once the HC-05 is configured, it does not need to be done again unless the application requirements change.

A way of resetting the Arduino to accept firmware uploads

This is the tricky part.  As ‘gabinix’ said in this post:

Hi Paul… To be honest I couldn’t find any tutorials to explain how to program/upload sketches with the HC-05. In fact, the conclusion you came up with is in-line with all the information out there. But it’s actually an extremely simple solution.

The only thing that keeps the HC-05 from uploading a program to arduino is that it doesn’t have a DTR (Data Terminal Ready) pin which tells the arduino to reset and accept a new sketch.

The solution is to re-purpose the “state” pin (PI09)  on the breakout board. It’s purpose is to attach to an LED and indicate the connection status. It’s default setting is to send the pin HIGH when a connection is made, but you can simply enter into command mode of the HC-05 and use an AT COMMAND to tell it to send the pin LOW when a connection is made.

Voila! In about 1 minute of time you have successfully re-purposed the LED pin to a DTR pin which will reset your arduino to accept a new sketch when you hit the upload button.

A couple things to note… This will work for a pro-mini without additional hardware by connecting to the DTR pin. If you’re using an UNO or similar, you will need a capacitor in between our custom “state” pin and the reset pin on the uno. The reason is that the HC-05 will drive our custom pin LOW for the entire connection which would essentially be the same as holding the reset button the entire time. Having the cap in between solves that problem.

It a quick easy fix, takes about a minute to do. It’s just a lot harder to explain the steps to do it in a couple sentences.

Here’s a link to the AT COMMAND set —> http://robopoly.epfl.ch/files/content/sites/robopoly/files/Tutoriels/bluetooth/hc-05-at_command_set.pdf

and here’s a link to a tutorial, video, and sketch on how to enter the AT COMMANDS. —> http://www.techbitar.com/modify-the-hc-05-bluetooth-module-defaults-using-at-commands.html  <<< no longer available 🙁

So, the trick is to re-purpose the STATE output (PI09, AKA Pin 32, AKA LED2, see this link) via the AT+POLAR=X,0 command (AT+POLAR=1,0 in my case) to go LOW when the connection to upload the program is first started.  This signal is then connected to the Arduino’s RESET pin via the capacitor noted above (to make this signal momentary).  The ‘Instructables’ tutorial on this subject at this link actually gets most of this right, except it doesn’t explain why the AT commands are being entered or what they do – so I found it a bit mysterious.  In addition, it recommends soldering a wire directly to pin 32 rather than re-purposing the STATE output pin (re-purposing the STATE pin allows a no-solder setup). Eventually I ran across this link which contains a very good explanation of the AT commands used by the HC-05.  The required AT commands are:

My module is the variety with a small pushbutton already installed on the ‘EN’ pin, so entering ‘Command’ mode is accomplished by holding the pushbutton depressed while cycling the power, and then releasing the button once power has been applied.

When this is done, the LED will change from fast-blink to a very slow (like 2 sec ON, 2 sec OFF) blink mode, as shown in the following short video:

This indicates the HC-05 is in ‘Command’ mode and will accept AT commands.  If you have the style without the pushbutton, you’ll have to figure out a way to short across the pads where the pushbutton should be, while cycling the power.

The screenshot below shows the result of executing these commands using the wired USB connection to the Arduino and the hard-wired serial connection between the Mega’s Tx1/Rx1 port and the HC-05 running in ‘Command’ mode.

HC-05 configuration using the wired serial port connection to the HC-05

NOTE:  The various posts and tutorials on the HC-05 describe separate AT ‘mini’ and ‘full’ command modes; the ‘mini’ mode only recognizes a small subset of all AT commands, while ‘full’ recognizes them all.  ‘Mini’ mode is entered by momentarily applying VCC to pin 34, and ‘full’ mode is entered by holding pin 34 at VCC for the entire session.  One poster described this as a flaw in the HC-05 version 2 firmware which might be corrected in later versions.  It appears this may have been the case, as the HC-05 module I used responded with VERSION:3.0-20170601 and recognized all the commands I gave it (not a comprehensive test, but enough to make me think this problem has gone away).
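As a rough illustration of the command syntax, here is a small helper that builds the command text. Only AT+POLAR=1,0 and the resulting 115200 baud rate are confirmed in this post; the AT+UART=<baud>,<stop>,<parity> form and the carriage-return/line-feed terminator are taken from the HC-05 AT command set, and the function names are hypothetical. Treat it as a sketch, not as the exact commands I typed.

```cpp
#include <cassert>
#include <string>

// Wraps a command body in the "AT+...<CR><LF>" framing the HC-05 expects
// in 'Command' mode (framing per the HC-05 AT command set).
std::string atLine(const std::string& body) {
    return "AT+" + body + "\r\n";
}

// AT+POLAR=<pio8>,<pio9>: the second parameter controls PIO9 (the STATE output).
// 1 = active HIGH (default), 0 = active LOW.
std::string polarCommand(int pio8ActiveHigh, int pio9ActiveHigh) {
    assert(pio8ActiveHigh == 0 || pio8ActiveHigh == 1);
    assert(pio9ActiveHigh == 0 || pio9ActiveHigh == 1);
    return atLine("POLAR=" + std::to_string(pio8ActiveHigh) + "," +
                  std::to_string(pio9ActiveHigh));
}

// AT+UART=<baud>,<stop>,<parity>: stop 0 = one stop bit, parity 0 = none.
// Assumed here to be the command behind the 115200 data-mode rate.
std::string uartCommand(long baud, int stopBits, int parity) {
    assert(baud > 0);
    return atLine("UART=" + std::to_string(baud) + "," +
                  std::to_string(stopBits) + "," + std::to_string(parity));
}
```

Calling polarCommand(1, 0) produces the AT+POLAR=1,0 line that flips the STATE output to active LOW, and uartCommand(115200, 0, 0) produces a line that would set the data-mode rate to 115200 baud, 1 stop bit, no parity.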

Wiring Layout for HC-05 Configuration via AT commands

I decided that this post was my chance to learn how to make ‘pictorial’ wiring diagrams using the Fritzing app.  I had seen other posts with this kind of layout, and initially thought it was kinda childish.  However, when I started working with Fritzing (in English, ‘Fritzing’ sounds like an adverb, not a proper noun – so a bit strange to my ears…), I realized it has a LOT of power, so now I’m a convert ;-).

HC-05 wired for initial configuration using AT commands

In the diagram above, I’m using the Rx1/Tx1 (pins 19/18) hardware serial port available on the Mega.  If you are using a Uno, you’ll need to use SoftwareSerial to configure a second port for connection to the HC-05.  A 2.2K/1.0K voltage divider is used to drop Arduino Tx output voltages to HC-05 Rx input levels, but no conversion is required in the other direction. The HC-05 can be powered directly from Arduino +5V, as the HC-05 has an onboard regulator.
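The divider arithmetic can be checked with a few lines of code. This is a minimal sketch that assumes the 1.0K resistor is in series with the Arduino Tx line and the 2.2K resistor goes from the HC-05 Rx pin to ground (the usual arrangement, but the ordering is my assumption here), and that the HC-05 Rx input draws negligible current.

```cpp
#include <cassert>

// Output of an unloaded two-resistor divider:
//   Vout = Vin * Rbottom / (Rtop + Rbottom)
// rTopOhms is the series resistor, rBottomOhms the resistor to ground.
double dividerOutputVolts(double vinVolts, double rTopOhms, double rBottomOhms) {
    assert(rTopOhms > 0.0 && rBottomOhms > 0.0);
    return vinVolts * rBottomOhms / (rTopOhms + rBottomOhms);
}
```

With a 5 V Arduino output, dividerOutputVolts(5.0, 1000.0, 2200.0) works out to 5 × 2200 / 3200 ≈ 3.44 V, which is close to the 3.3 V logic level the HC-05 expects. Swapping the two resistors would give only about 1.56 V, so the ordering matters.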

Initial AT Configuration Arduino Sketch

All the code above does is transfer keystrokes from the Arduino to the HC-05, and vice versa. This is all that is required to configure the HC-05 using AT commands.
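The pass-through idea can be sketched as follows. This is not necessarily the sketch I ran; it is a minimal version written as a template so it works with any object that offers Arduino-style available(), read() and write() calls (Serial and Serial1 on a Mega do). The function name relayBytes and the PC-side baud rate in the usage comment are assumptions; 38400 is the ‘Command’ mode rate.

```cpp
// Copies every byte currently waiting on `from` to `to` and returns
// how many bytes were moved. Nothing is buffered or interpreted.
template <typename Source, typename Sink>
int relayBytes(Source& from, Sink& to) {
    int copied = 0;
    while (from.available() > 0) {
        to.write(static_cast<unsigned char>(from.read()));
        ++copied;
    }
    return copied;
}

// On a Mega, the sketch body would then look roughly like this
// (PC-side baud rate is an assumption):
//
//   void setup() {
//       Serial.begin(9600);     // USB link to the PC
//       Serial1.begin(38400);   // HC-05 in 'Command' mode on Tx1/Rx1
//   }
//
//   void loop() {
//       relayBytes(Serial, Serial1);   // keystrokes -> HC-05
//       relayBytes(Serial1, Serial);   // HC-05 replies -> PC
//   }
```

Keeping the relay in one small function means the same code serves both directions, and it can be exercised on a desktop with a fake stream before going near the hardware.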

Serial Connection between the HC-05 and Tx/Rx0 for Program Uploads

Most Arduino microcontrollers are shipped with a small program called a ‘bootloader’ already installed.  This small program is only active for a few seconds after a board reset occurs, and its job is to detect when a new program is being uploaded.  If the bootloader sees activity on whatever serial port it is watching, it writes the incoming data into program memory and then transfers control to the user program.  The stock Arduino bootloader only monitors Tx/Rx0 for this; activity on other ports (specifically Rx1 in my case) will be ignored and program uploads will fail.  After the HC-05 has been initially configured via AT commands over the PC-to-Arduino-to-HC-05 serial links, the connection from the HC-05 to the Arduino must be changed so that PC-to-HC-05 data transferred over the Bluetooth link arrives at the Arduino’s Rx0 port so the stock bootloader will see it and write it to the Arduino’s program memory.  This minor point wasn’t at all clear (at least not to me) in the various tutorials, so I wasted a LOT of time trying to figure out why I couldn’t get the last part of the puzzle to fit – ugh!

Shown below is my Fritzing diagram for the final configuration of my test setup, showing the Tx/Rx lines changed from Tx/Rx1 (pins 18/19) to Tx/Rx0 (pins 1/0). The HC-05 STATE output is connected to Arduino reset via a 0.22uF capacitor, with resistors to form a simple one-shot circuit.  The STATE line goes LOW (after reconfiguration via the AT+POLAR=1,0 command) which causes a momentary LOW on the Arduino reset line.  This is the magic required to upload programs to the Arduino wirelessly. When the Bluetooth connection is terminated, the STATE line goes HIGH again and the Arduino end of the now-charged capacitor jumps to well above 5V. The diode shown on the diagram clamps this signal to within a volt or so above +5V to avoid damage to the Arduino Reset line when this happens.  This diode isn’t shown on any of the other tutorials I found, so it is possible the Arduino Reset line is clamped internally (good).  It’s also possible it isn’t protected, in which case not having this diode will eventually kill the Arduino (bad).

HC-05 wired for remote program upload. Note that the Tx & Rx lines have been moved from Tx/Rx1 to Tx/Rx0
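To put rough numbers on the one-shot and the overshoot described above, here is a back-of-the-envelope calculation. Only the 0.22uF capacitor value comes from my circuit; the 10K resistance, the 3.3 V swing on the STATE line, and the 0.7 V diode drop are assumptions chosen for illustration, and the function names are hypothetical.

```cpp
#include <cassert>

// RC time constant of the one-shot, in milliseconds.
double timeConstantMs(double resistanceOhms, double capacitanceFarads) {
    assert(resistanceOhms > 0.0 && capacitanceFarads > 0.0);
    return resistanceOhms * capacitanceFarads * 1000.0;
}

// Voltage on the Arduino side of the capacitor at the instant STATE returns
// HIGH, assuming that side had settled back to Vcc while STATE was LOW.
// The capacitor voltage cannot change instantly, so the step on STATE
// is added on top of Vcc.
double unclampedPeakVolts(double vccVolts, double stateSwingVolts) {
    return vccVolts + stateSwingVolts;
}

// Same peak with a clamp diode to Vcc: limited to one diode drop above Vcc.
double clampedPeakVolts(double vccVolts, double stateSwingVolts,
                        double diodeDropVolts) {
    double unclamped = unclampedPeakVolts(vccVolts, stateSwingVolts);
    double limit = vccVolts + diodeDropVolts;
    return unclamped < limit ? unclamped : limit;
}
```

Under these assumptions, timeConstantMs(10000.0, 0.22e-6) is about 2.2 ms, the unclamped peak is about 5 + 3.3 = 8.3 V, and the diode holds it to about 5.7 V, which matches the ‘within a volt or so above +5V’ behavior described above.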

Testing

The first thing I did after configuring the HC-05 (using the above AT commands) was to see if I could still connect to and communicate with it over Bluetooth from my laptop.  I used RealTerm, although any terminal program (including the Arduino IDE serial monitor) should do.  The very first thing that happened is I had to re-pair the laptop with the HC-05, and the name given by the HC-05 was markedly different, as shown in the captured pairing dialog.

Pairing dialog on my Dell XPS15 9570 laptop

The next thing was to see if I could get characters from my BT serial connection through to my Arduino serial port.  After fiddling around with the baud rates for a while, I realized that now I had to change the BT serial terminal baud rate from 9600 to 115200, and the Arduino-to-HC-05 baud rate from 38400 (the default ‘Command’ mode rate) to 115200.  Once I did this, I could transmit characters back and forth between RealTerm (connected to the HC-05 via Bluetooth) and my Visual Studio/Visual Micro setup (connected to Arduino via the wired USB cable) – yay!

For the next step in the testing, I needed to remove the hard-wired USB connection and power the Arduino from an external power source.  When I did this by first removing the USB connector (thereby removing power from the HC-05) and then plugging in external power, I noticed that the HC-05 was no longer connected to my laptop (the HC-05 status LED was showing the ‘fast blink’ status, and my connection indicator LED was OFF).  I checked in my BT settings panel, and the HC-05 (now announcing itself as ‘H-C-2010-06-01’) was still paired with my laptop, but just transmitting some characters from my RealTerm BT serial monitor did not re-establish the connection.  However, when I changed the port number away from and then back to the BT COM port, this did re-establish the connection; the HC-05 status LED changed to the 2-blinks-pause-2-blinks cycle, and my connection LED illuminated.

So, now I connected the output of my STATE line one-shot circuit to the Arduino reset line and changed my VS2017/VM programming port from the wired USB port to the BT port (interestingly it was still shown as ‘HC-05’ in Visual Studio/Visual Micro).  After some initial problems, I got the ‘Connected’ status light, but the upload failed with the error message “avrdude: stk500v2_getsync(): timeout communicating with programmer” and the communication status changed back to ‘not connected’.

At this point I realized I was missing something critical, and yelled (more like ‘pleaded’) for help on the Arduino forum.  On the forum I got a lot of detailed feedback from very knowledgeable users, most notably ‘dmjlambert’.  Unfortunately dmjlambert was ultimately unsuccessful in solving the problem, but he was able to validate that the steps I had taken so far were correct as far as they went, and ‘it should just work’.  To paraphrase the Edison approach to innovation, “we didn’t know what worked, but we eliminated most potential failure modes”.  See this forum post for the details.

After this conversation (over several days), I decided to put the problem down for a few days and do other things, hoping that a fresh look at things with a clear head might provide some insight.  A few days later when I came back to the project, I ran some tests suggested by dmjlambert to verify that the connection to the Arduino RESET pin via the 0.22uF capacitor did indeed reset the Arduino when the STATE line transitioned from HIGH to LOW.  To do this I created a modified ‘Blink’ program that blinked 10 times rapidly and then transitioned to a steady slow blink.  Using this program I could see that the Arduino did indeed reset each time a Bluetooth connection to the HC-05 was established.
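A reset-detecting ‘Blink’ of this kind can be sketched as below. This is a minimal version, not the exact program I used: the 50 ms and 1000 ms timings and the function name are assumptions, and the timing decision is pulled out into a plain function so it can be checked without a board.

```cpp
// Returns the LED on/off time in milliseconds for a given number of
// completed blinks since reset: fast for the first 10, slow afterwards.
// A burst of fast blinks therefore means the board has just been reset.
unsigned long blinkHalfPeriodMs(unsigned int blinksSinceReset) {
    const unsigned int kFastBlinks = 10;       // rapid blinks after reset
    const unsigned long kFastMs = 50UL;        // assumed fast timing
    const unsigned long kSlowMs = 1000UL;      // assumed slow timing
    return blinksSinceReset < kFastBlinks ? kFastMs : kSlowMs;
}

// Outline of the Arduino side (assumes the LED is on LED_BUILTIN):
//
//   unsigned int blinks = 0;
//
//   void setup() { pinMode(LED_BUILTIN, OUTPUT); }
//
//   void loop() {
//       unsigned long halfPeriod = blinkHalfPeriodMs(blinks);
//       digitalWrite(LED_BUILTIN, HIGH); delay(halfPeriod);
//       digitalWrite(LED_BUILTIN, LOW);  delay(halfPeriod);
//       if (blinks < 10) ++blinks;   // stop counting once in slow mode
//   }
```

Because the counter lives in RAM and starts at zero on every reset, seeing the rapid burst again right after a Bluetooth connection is made is a simple visual confirmation that the STATE pulse reached the RESET pin.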

So, the problem had to be elsewhere, and about this time I realized I was assuming (aka ‘making an ass out of you and me’) that the program upload data being received over the Bluetooth link was somehow magically making it to the bootloader program.  This had been nagging at me the whole time, but I ‘assumed’ (there’s that word again) that since this problem had never been mentioned in any of the tutorials or even in the responses to my forum posts, it must not be a problem – oops!

Anyway, to make a long story short, I moved the HC-05 – to – Arduino connection from Rx/Tx1 to Rx/Tx0 and program uploads started working immediately – YAY!!

I went back through the tutorials I had been following to see if I had missed this magic step, and didn’t find any references to moving the serial connection at all.  So, if you are doing this with a UNO, you’ll need to move the serial connection from whatever pins you were using (via SoftwareSerial) to Rx/Tx0 as the last step.  If you are using an Arduino Mega or other Arduino-compatible controller that supports additional hardware serial ports as I did, you’ll have to move the connection from Rx/Tx-whatever to Rx/Tx0 as the last step.

This tutorial was put together in the hope that I could maybe help others who are interested in using the HC-05 Bluetooth module for remote program uploads to an Arduino-compatible microcontroller, and maybe save them from some of the frustration I experienced.  Please feel free to comment on this post, especially if you see something that I got wrong or missed.

13 Aug 2019 Update:

Here’s a short video showcasing the ability to program an Arduino Mega 2560 wirelessly from my Windows 10 PC using the HC-05 Bluetooth module

At the start of the video, the HC-05 status light is blinking rapidly, signalling the ‘No Connection’ state.  Then, at about 2 seconds, the light changes to the slow double-blink ‘Connected’ state, the yellow LED on the Mega blinks OFF & then ON again, signalling that the Mega has been reset and is now awaiting program upload, followed immediately by rapid blinking as the new program is uploaded to the Mega’s program memory.  During the upload, the HC-05 status LED continues to show the slow double-blink ‘Connected’ status.  Then, at about 18 seconds, the program upload terminates and the HC-05 returns to the ‘No Connection’ state.

The small white part on the green perf-board is the 220 nF capacitor.  The other two modules on the perf-board are an MPU6050 IMU and a high-side current sensor.

Stay tuned!

Frank

 

25 October 2021 Update:

I came back to this post to refresh my memory when trying to initialize and use a new HC-05 module for my new Wall-E3 project, and failing badly. I finally got something to work, but only after screwing around a lot. I realized I didn’t have a good handle on what mode the HC-05 was in – even though the onboard LED changes behavior to indicate the mode. So, here is a short video showing the LED behavior for the ‘disconnected’ and ‘connected’ modes.

HC-05 LED indications for ‘connected’ and ‘disconnected’ modes

In the above video, the HC-05 starts out in the normal power-on ‘disconnected’ state (rapidly flashing LED). Then after a few seconds a BT connection is established, and the LED behavior changes to ‘connected’ (two short blinks and a long pause). Then after a few more seconds the connection is dropped and the LED behavior changes back to ‘disconnected’ (rapidly flashing).