python - Finding more than one occurrence using a regular expression

- August 15, 2011

Is it possible to capture the values of the href attributes using a single regular expression?

For example:

<div id="w1">
    <ul id="u1">
        <li><a id='1' href='book'>book<sup>1</sup></a></li>
        <li><a id='2' href='book-2'>book<sup>2</sup></a></li>
        <li><a id='3' href='book-3'>book<sup>3</sup></a></li>
    </ul>
</div>

I want book, book-2, and book-3.

Short and simple:

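import re

# Triple quotes let the string hold both the single and double quotes used in the markup.
html = """<div id="w1"><ul id="u1"><li><a id='1' href='book'>book<sup>1</sup></a></li><li><a id='2' href='book-2'>book<sup>2</sup></a></li><li><a id='3' href='book-3'>book<sup>3</sup></a></li></ul></div>"""

# findall returns the contents of the capture group for every match.
result = re.findall(r"href='(.*?)'", html)
print(result)  # ['book', 'book-2', 'book-3']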

Explanation:

Match the characters “href='” literally (case sensitive) «href='»
Match the regex below and capture its match into backreference number 1 «(.*?)»
   Match any single character that is not a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “'” literally «'»
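
To see why the lazy quantifier matters here, the sketch below runs the pattern against a shortened, single-line sample modeled on the markup in the question, then repeats it with a greedy quantifier and with a negated character class. The sample string and the variable names are made up for illustration.

```python
import re

# Shortened single-line sample modeled on the question's markup (illustrative only).
html = "<li><a id='1' href='book'>book</a></li><li><a id='2' href='book-2'>book</a></li>"

# Lazy: (.*?) stops at the first closing quote after each href='.
lazy = re.findall(r"href='(.*?)'", html)

# Greedy: (.*) runs to the last single quote on the line, so everything
# between the first href=' and that final quote lands in one match.
greedy = re.findall(r"href='(.*)'", html)

# Negated class: [^']* cannot cross a quote, so it gives the same result
# as the lazy version without relying on backtracking.
negated = re.findall(r"href='([^']*)'", html)

print(lazy)
print(greedy)
print(negated)
```

On this sample the lazy and negated-class patterns each return the two href values, while the greedy pattern returns a single long string. Both working patterns assume the attribute is written with single quotes, as in the question; markup that uses double quotes would need a different pattern or an HTML parser.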
