Regex in Python for HTML


I want to write a regular expression that matches:

<td class="prodspecatribute" rowspan="2">[words]</td> 

or

<td class="prodspecatribute">[words]</td> 

For the second case I have:

find2 = re.compile('<td class="prodspecatribute">(.*)</td>') 

But how can I write a single regex that matches either of the two forms?

Don't use regular expressions for this; use an HTML parser such as BeautifulSoup. Example:

>>> from bs4 import BeautifulSoup
>>> soup1 = BeautifulSoup('<td class="prodspecatribute" rowspan="2">[words]</td>')
>>> soup1.find('td', class_='prodspecatribute').contents[0]
u'[words]'
>>> soup2 = BeautifulSoup('<td class="prodspecatribute">[words]</td>')
>>> soup2.find('td', class_='prodspecatribute').contents[0]
u'[words]'

Or, to find all matches:

soup = BeautifulSoup(page)
for td in soup.find_all('td', class_='prodspecatribute'):
    print(td.contents[0])
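
If installing BeautifulSoup is not an option, roughly the same loop can be sketched with the standard library's html.parser module. This is only a minimal illustration, assuming Python 3 and cells that are not nested inside each other; the CellCollector class and the sample markup are made up for this example and are not part of the original answer.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose class list contains css_class.

    Hypothetical helper for illustration; it does not handle nested <td> cells.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.in_cell = False   # True while inside a matching <td>
        self.cells = []        # one text entry per matching cell

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and self.css_class in classes:
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell
        if self.in_cell:
            self.cells[-1] += data


collector = CellCollector("prodspecatribute")
collector.feed(
    '<table><tr><td class="prodspecatribute" rowspan="2">[words]</td>'
    '<td class="other">skip</td>'
    '<td class="prodspecatribute">[more words]</td></tr></table>'
)
collector.close()
print(collector.cells)  # ['[words]', '[more words]']
```

Because the parser looks at the class attribute rather than the exact tag text, the extra rowspan attribute makes no difference here.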

With BeautifulSoup 3:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(page)
for td in soup.findAll('td', {'class': 'prodspecatribute'}):
    print(td.contents[0])
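
For a quick one-off on markup you control, the single pattern asked for in the question can be sketched as follows. It is fragile by design: it assumes the class attribute comes first, is written exactly as in the two snippets, and that cells are not nested. The CELL_RE name and the sample page string are made up for this illustration, and a parser remains the more robust choice.

```python
import re

# Assumption: class="prodspecatribute" is the first attribute; [^>]* then
# allows any further attributes (such as rowspan="2") before the closing ">".
CELL_RE = re.compile(r'<td class="prodspecatribute"[^>]*>(.*?)</td>', re.DOTALL)

# Hypothetical sample input covering both forms from the question
page = (
    '<tr><td class="prodspecatribute" rowspan="2">Weight</td>'
    '<td class="prodspecatribute">Colour</td></tr>'
)

# findall returns the captured group for each match
cells = CELL_RE.findall(page)
print(cells)  # ['Weight', 'Colour']
```

The non-greedy (.*?) matters: the greedy (.*) from the question would run to the last </td> on the line and merge several cells into one match.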
