Chomper Stomping
jQuery/JavaScript/CSS 3/HTML 5, Java/PHP/Python/ActionScript, Git, Chrome/Firefox Extensions, Wordpress/Game/iPhone App Development and other random techie tidbits I've collected




July 23, 2008

preg_*

Written by: Christopher McCulloh

Regular expressions are fun! (repeat 5x, or until you believe it)

I’m working on a webbot, and right now I need it to drop all the HTML and just leave me with the text. So I wrote this regex:

/<.*>/

(explanation: start the match at “<”, look for any character “.” any number of times “*”, and stop when you come to “>”. Except “*” is greedy, so it grabs as much as it can and really stops at the very last “>” it finds on the line.)

Of course, that meant it matched everything from the first < all the way to the last >, dropping all the text that was properly encapsulated by HTML tags.
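
Here is a quick sketch of that greedy behavior. The sample line is made up for illustration; any line with more than one tag should behave the same way.

```php
<?php
// Hypothetical sample line, not taken from the webbot's real input.
$line = 'Intro <p>Hello <b>world</b></p> outro';

// Greedy ".*" runs to the end of the line, then backs up to the LAST ">".
$greedy = preg_replace('/<.*>/', '', $line);

// Everything between the first "<" and the last ">" is gone,
// including the text we wanted to keep.
var_dump($greedy); // string(12) "Intro  outro"
```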

So, next I wrote this:

/<[a-zA-Z\t "=0-9_\-\\/]*>/

(explanation: start at “<” find any character I could think of except for > “[a-zA-Z\t "=0-9_\-\\/]” any number of times “*” and then stop when you come to “>” (stops at the first >))

Wow… that’s… insanity… and I probably even missed something. It did only drop the HTML tags themselves, but it’s nasty looking.
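
To see how easy it is to miss something with a hand-written list, here is a small sketch. It assumes the pattern is written as a single-quoted PHP string (so the “\\/” reaches the regex engine as an escaped slash), and the sample link is made up. The opening tag contains “:” and “.”, neither of which is in the list, so that tag survives.

```php
<?php
// Hypothetical sample line with a URL inside the tag.
$line = '<a href="http://example.com/page.html">link</a>';

// The hand-written character class from above, as a single-quoted string.
$pattern = '/<[a-zA-Z\t "=0-9_\-\\/]*>/';

$result = preg_replace($pattern, '', $line);

// Only the closing </a> is removed; the opening tag is left behind
// because ":" and "." are not in the character class.
echo $result, PHP_EOL; // <a href="http://example.com/page.html">link
```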

I then realized I could just write this:

/<[^>]*>/

(explanation: start at “<” find any character except > “[^>]” any number of times “*” and stop as soon as you come to “>”)
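
A short sketch of the negated class in action, next to the lazy quantifier “.*?”, which is another common way to get the same “stop at the first >” behavior. The sample line is made up, and the lazy version is my own comparison, not something the webbot uses.

```php
<?php
// Hypothetical sample line with nested tags and a URL in an attribute.
$line = 'Intro <p>Hello <a href="http://example.com/page.html">world</a></p> outro';

// Negated class: anything except ">" between "<" and the first ">".
$negated = preg_replace('/<[^>]*>/', '', $line);

// Lazy quantifier: as few characters as possible before the next ">".
$lazy = preg_replace('/<.*?>/', '', $line);

echo $negated, PHP_EOL; // Intro Hello world outro
echo $lazy, PHP_EOL;    // Intro Hello world outro
```

One difference worth knowing: without the s modifier, “.” does not match a newline, so the lazy version will not match a tag that is split across lines, while “[^>]” will.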

Yeah, it looks like some sort of ASCII art of “The Cheat” or something, but it very elegantly finds the beginning and ending of a tag. See, regex is fun!

Here is the final code btw:

$htmlSearch = '/<[^>]*>/';
$cleanLine = preg_replace($htmlSearch, "", $line);
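
For anyone who wants to try it, here is a minimal sketch that wraps those two lines in a function and runs it over a couple of lines. The function name and the sample lines are made up for illustration; the webbot's actual loop is not shown in this post.

```php
<?php
// Hypothetical helper; the name is an assumption, not part of the webbot.
function stripTagsFromLine(string $line): string
{
    $htmlSearch = '/<[^>]*>/';
    $cleanLine = preg_replace($htmlSearch, '', $line);

    // preg_replace() returns null on error; fall back to the input then.
    return $cleanLine ?? $line;
}

// Made-up sample input.
$lines = [
    '<h1>Title</h1>',
    '<p>Some <em>text</em> here.</p>',
];

foreach ($lines as $line) {
    echo stripTagsFromLine($line), PHP_EOL;
}
// Prints:
// Title
// Some text here.
```

PHP also ships a built-in strip_tags() function that gives the same result on simple lines like these, so it may be worth a look before reaching for a regex.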



About the Author

Christopher McCulloh
E-Commerce developer at Finish Line. Co-author of HTML, XHTML and CSS All-in-one Desk Reference for Dummies. Graduated from IU with a Bachelors of Media Arts and Science and a Certificate in Applied Computer Science. Tech editor for Building Facebook Applications for Dummies and Building Websites All-in-one for Dummies 2nd Edition. Creator and maintainer of the Status-bar Calculator Firefox Extension. Three years professional experience in Java E-Commerce development and four years professional experience with PHP, for a combined total of seven years professional JavaScript/HTML/CSS experience.




 
 

 

 




One Comment


  1. Val

    Thank you very much for the post on phpmanual.net regarding enabling curl when using xampp.


