python - Unicode object error in parsing XML using BeautifulSoup

- June 15, 2014

Parsing the contents of the 'name' tag in XML output using BeautifulSoup gives me the following error:

AttributeError: 'unicode' object has no attribute 'get_text'

XML output:

<show>
  <stud>
    <__readonly__>
      <table_stud>
        <row_stud>
          <name>rice</name>
          <dept>chem</dept>
          .
          .
          .
        </row_stud>
      </table_stud>
    </__readonly__>
  </stud>
</show>

However, if I access the contents of other tags, such as 'dept', it seems to work fine.

stud_info = output_xml.find_all('row_stud')
for eachstud in range(len(stud_info)):
    print stud_info[eachstud].dept.get_text()   # gives 'chem'
    print stud_info[eachstud].name.get_text()   # raises the AttributeError above

Can Python/BeautifulSoup experts help me resolve this? (I know BeautifulSoup is not ideal for parsing XML. Let's just say I'm compelled to use it.)

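tag.name is an attribute containing the tag's own name; its value here is row_stud, a plain unicode string, which is why calling .get_text() on it fails.

Attribute access to contained tags is a shortcut for .find(attributename), but it only works if there isn't already an attribute in the API with the same name. Use .find() instead: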

print stud_info[eachstud].find('name').get_text()
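Why the shortcut breaks for 'name' can be illustrated with a toy class. This is only a sketch of the general mechanism, assuming the attribute shortcut is implemented through Python's `__getattr__` hook; `MiniTag` is a hypothetical stand-in written for this example and is not BeautifulSoup's real class. It is written for Python 3.

```python
class MiniTag:
    """Toy stand-in for a parsed tag (hypothetical, not bs4's real class)."""

    def __init__(self, name, text="", children=()):
        self.name = name                # real attribute: the tag's own name
        self.text = text
        self.children = list(children)

    def find(self, child_name):
        # Return the first direct child with the given tag name, or None.
        for child in self.children:
            if child.name == child_name:
                return child
        return None

    def __getattr__(self, attr):
        # Python calls __getattr__ only when normal attribute lookup fails,
        # so this shortcut never runs for names the object already has.
        if attr.startswith("__"):
            raise AttributeError(attr)
        return self.find(attr)


row = MiniTag("row_stud", children=[MiniTag("name", "rice"),
                                    MiniTag("dept", "chem")])

print(row.dept.text)          # no 'dept' attribute, so find("dept") is used
print(row.name)               # existing attribute wins: the tag's own name
print(row.find("name").text)  # explicit lookup reaches the <name> child
```

Because `name` already exists on the object, the fallback never gets a chance to look for a child called 'name'; `dept` has no such conflict.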

You can loop over the stud_info result list directly; there is no need to use range() here:

stud_info = output_xml.find_all('row_stud')
for eachstud in stud_info:
    print eachstud.dept.get_text()
    print eachstud.find('name').get_text()
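For anyone who wants to try this end to end, here is a minimal self-contained sketch in Python 3 syntax. It assumes the bs4 package is installed, uses a trimmed copy of the question's XML (the outer wrapper tags are left out), and picks the built-in 'html.parser' only so that nothing beyond bs4 is needed; the `results` list is just a helper for this example.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the XML from the question (wrapper tags omitted).
source = (
    "<table_stud><row_stud>"
    "<name>rice</name><dept>chem</dept>"
    "</row_stud></table_stud>"
)
output_xml = BeautifulSoup(source, "html.parser")

results = []
for eachstud in output_xml.find_all("row_stud"):
    own_name = eachstud.name                    # the tag's own name, a string
    student = eachstud.find("name").get_text()  # text of the <name> child
    dept = eachstud.dept.get_text()             # shortcut is fine for <dept>
    results.append((own_name, student, dept))
    print(own_name, student, dept)
```

On Python 3 the tag name is a `str` rather than `unicode`, so the original mistake would surface as an error mentioning 'str' instead.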

I notice you are searching for row_stud in lower-case. If you are parsing XML with BeautifulSoup, make sure you have lxml installed and tell BeautifulSoup it is processing XML, so it won't HTML-ize your tags (lowercase them):

soup = BeautifulSoup(source, 'xml')
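The difference between the two modes can be seen by listing the tag names each parser produces. This is a sketch assuming both bs4 and lxml are installed, in Python 3 syntax; the mixed-case tag names are made up for illustration, since the XML shown in the question is all lower-case.

```python
from bs4 import BeautifulSoup

# Hypothetical mixed-case tag names, chosen to make the effect visible.
source = "<TABLE_stud><ROW_stud><name>rice</name></ROW_stud></TABLE_stud>"

as_html = BeautifulSoup(source, "html.parser")  # HTML rules: names lowercased
as_xml = BeautifulSoup(source, "xml")           # XML rules via lxml: case kept

html_names = [tag.name for tag in as_html.find_all(True)]
xml_names = [tag.name for tag in as_xml.find_all(True)]

print(html_names)
print(xml_names)
```

With the XML parser, searches have to use the original case, for example `as_xml.find("ROW_stud")`.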
