python - Finding more than one occurrence using a regular expression


Is it possible to capture the values of the href attributes using a single regular expression?

For example:

    <div id="w1">
        <ul id="u1">
            <li><a id='1' href='book'>book<sup>1</sup></a></li>
            <li><a id='2' href='book-2'>book<sup>2</sup></a></li>
            <li><a id='3' href='book-3'>book<sup>3</sup></a></li>
        </ul>
    </div>

I want book, book-2, and book-3.

Short and simple:

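    import re

    # Triple quotes let the markup keep both its single and double quotes.
    html = """<div id="w1"><ul id="u1"><li><a id='1' href='book'>book<sup>1</sup></a></li><li><a id='2' href='book-2'>book<sup>2</sup></a></li><li><a id='3' href='book-3'>book<sup>3</sup></a></li></ul></div>"""

    # findall returns the text captured by group 1 for every match.
    result = re.findall(r"href='(.*?)'", html)
    print(result)  # ['book', 'book-2', 'book-3']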

Explanation:

Match the character string “href='” literally (case sensitive) «href='»
Match the regex below and capture its match into backreference number 1 «(.*?)»
   Match any single character that is not a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “'” literally «'»
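
To see why the lazy quantifier matters here, the following sketch compares it with the greedy form on a shortened copy of the markup. The variable names (lazy, greedy, negated) are only illustrative, and the negated character class [^']* is shown as one possible alternative, not as part of the original answer.

```python
import re

# Shortened version of the markup from the question (two links only).
html = "<li><a id='1' href='book'>book</a></li><li><a id='2' href='book-2'>book</a></li>"

# Lazy: .*? stops at the first closing quote after each href='
lazy = re.findall(r"href='(.*?)'", html)

# Greedy: .* runs on to the last single quote in the string,
# so both links are swallowed into one long match.
greedy = re.findall(r"href='(.*)'", html)

# Alternative: "anything except a quote" cannot run past the closing quote.
negated = re.findall(r"href='([^']*)'", html)

print(lazy)     # ['book', 'book-2']
print(greedy)   # a single element spanning both links
print(negated)  # ['book', 'book-2']
```

Both the lazy and the negated-class patterns assume the attribute values are wrapped in single quotes, as in the example markup; double-quoted attributes would need a different pattern.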
