Regular expressions are fun! (repeat 5x, or until you believe it)
I'm working on a webbot, and right now I need it to drop all the HTML and just leave me with the text. So I wrote this regex:
<.*>
(explanation: start the match at "<", look for any character "." any number of times "*", and stop when you come to ">". But really, it goes all the way to the very last ">" it finds and stops there.)
Of course, it then matched everything from the first < all the way to the last >, dropping all the text sitting between the tags along with the tags themselves.
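To see that greedy behavior for yourself, here is a minimal sketch. The sample line is made up for illustration and is not from the actual bot; the pattern is the "<", ".*", ">" one described above.

```php
<?php
// Made-up sample line (an assumption, not real bot input).
$line = 'Say <b>hello</b> to the bot';

// ".*" is greedy: it grabs as much as it can, so the match runs
// from the first "<" all the way to the last ">" on the line.
$greedy = preg_replace('/<.*>/', '', $line);

echo $greedy, "\n"; // Say  to the bot
```

The word "hello" disappears together with the tags, which is exactly the problem.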
So, next I wrote this:
<[a-zA-Z\t "=0-9_\-\\/]*>
(explanation: start at "<", find any character I could think of except for ">", which is "[a-zA-Z\t "=0-9_\-\\/]", any number of times "*", and then stop when you come to ">". This one stops at the first ">".)
Wow... that's... insanity... I probably even missed something. It did, however, only drop the HTML tags themselves. Still, it's nasty looking.
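The "probably missed something" worry is easy to confirm. A rough sketch, using a made-up link tag: the "." in the href is not in the hand-written class, so the opening tag survives. The "~" delimiter is my own choice here so the "/" inside the class needs no extra escaping.

```php
<?php
// Made-up sample: the href contains ".", which the class does not list.
$line = '<a href="page.html">link</a>';

// Same character class as above, wrapped in "~" delimiters.
// In a single-quoted PHP string, "\\\\" becomes "\\" in the regex,
// which matches one literal backslash.
$handWritten = '~<[a-zA-Z\t "=0-9_\-\\\\/]*>~';

$result = preg_replace($handWritten, '', $line);

echo $result, "\n"; // <a href="page.html">link
```

Only the closing tag gets removed, because every character inside it happens to be on the list.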
I then realized I could just write this:
<[^>]*>
(explanation: start at "<", find any character except ">", which is "[^>]", any number of times "*", and stop as soon as you come to ">")
Yeah, it looks like some sort of ASCII art of "The Cheat" or something, but it very elegantly finds the beginning and ending of a tag. See, regex is fun!
Here is the final code btw:
$htmlSearch = '/<[^>]*>/';
$cleanLine = preg_replace($htmlSearch, "", $line);
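And here is a quick way to try it out. This is just a sketch: the function name dropTags and the sample lines are made up for illustration, while the body is the same two lines as above.

```php
<?php
// Hypothetical helper name; the body is the final code from the post.
function dropTags(string $line): string
{
    $htmlSearch = '/<[^>]*>/';
    return preg_replace($htmlSearch, "", $line);
}

// Made-up sample lines standing in for a fetched page.
$page = [
    '<h1>Regex is fun</h1>',
    '<p>Visit <a href="page.html?id=1">this page</a> today.</p>',
];

// Run every line through the regex.
$clean = array_map('dropTags', $page);

print_r($clean);
```

Each line comes back with the tags gone and the text between them left alone.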