python - Finding more than one occurrence using a regular expression
Is it possible to capture the value of every href using a single regular expression?
For example:
<div id="w1"> <ul id="u1"> <li><a id='1' href='book'>book<sup>1</sup></a></li> <li><a id='2' href='book-2'>book<sup>2</sup></a></li> <li><a id='3' href='book-3'>book<sup>3</sup></a></li> </ul> </div>
I want book, book-2, and book-3.
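Short and simple:
import re
# Triple quotes are needed because the markup contains both ' and " characters.
html = '''<div id="w1"><ul id="u1"><li><a id='1' href='book'>book<sup>1</sup></a></li><li><a id='2' href='book-2'>book<sup>2</sup></a></li><li><a id='3' href='book-3'>book<sup>3</sup></a></li></ul></div>'''
# findall returns the contents of capture group 1 for every match.
result = re.findall("href='(.*?)'", html)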
Explanation:
Match the character string “href='” literally (case sensitive) «href='»
Match the regex below and capture its match into backreference number 1 «(.*?)»
Match any single character that is not a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “'” literally «'»
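To see why the lazy quantifier matters here, the sketch below runs the same pattern in lazy and greedy form on a shortened copy of the markup from the question. The variable names (lazy, greedy, negated) are illustrative only, and the negated character class is an alternative spelling added for comparison, not part of the original answer.

```python
import re

# Shortened copy of the list items from the question (single-quoted attributes).
html = (
    "<li><a id='1' href='book'>book<sup>1</sup></a></li>"
    "<li><a id='2' href='book-2'>book<sup>2</sup></a></li>"
    "<li><a id='3' href='book-3'>book<sup>3</sup></a></li>"
)

# Lazy: each match stops at the first closing quote after href='.
lazy = re.findall(r"href='(.*?)'", html)

# Greedy: the match runs on to the last quote in the line,
# so everything collapses into one oversized capture.
greedy = re.findall(r"href='(.*)'", html)

# Negated character class: another way to stop at the first closing quote.
negated = re.findall(r"href='([^']*)'", html)

print(lazy)         # ['book', 'book-2', 'book-3']
print(len(greedy))  # 1
print(negated)      # ['book', 'book-2', 'book-3']
```

Note that all three patterns assume single-quoted href values, as in the question. Markup that uses double quotes would need an adjusted pattern.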