Regex in Python for HTML
I want to write a regular expression for:
<td class="prodspecatribute" rowspan="2">[words]</td>
or
<td class="prodspecatribute">[words]</td>
For the second case I have:
find2 = re.compile('<td class="prodspecatribute">(.*)</td>')
But how can I create a regex that matches either of the two expressions?
Don't use regular expressions for this; use an HTML parser such as BeautifulSoup. Example:
>>> from bs4 import BeautifulSoup
>>> soup1 = BeautifulSoup('<td class="prodspecatribute" rowspan="2">[words]</td>', 'html.parser')
>>> soup1.find('td', class_='prodspecatribute').contents[0]
'[words]'
>>> soup2 = BeautifulSoup('<td class="prodspecatribute">[words]</td>', 'html.parser')
>>> soup2.find('td', class_='prodspecatribute').contents[0]
'[words]'
Or, to find all matches:
soup = BeautifulSoup(page, 'html.parser')
for td in soup.find_all('td', class_='prodspecatribute'):
    print(td.contents[0])
With BeautifulSoup 3:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(page)
for td in soup.findAll('td', {'class': 'prodspecatribute'}):
    print(td.contents[0])
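If a regular expression is still wanted for a quick one-off, the literal question can be answered by making the rowspan attribute an optional non-capturing group. This is only a minimal sketch: it assumes the markup is always as regular as the two samples in the question (same attribute order, same quoting, no nested tags), and the `page` string and the `TD_PATTERN` name are made up for illustration. It also accepts any numeric rowspan rather than just "2", which is an assumption beyond the question. A parser remains the more robust choice.

```python
import re

# The optional non-capturing group (?: ...)? makes the rowspan attribute
# optional, so one pattern covers both forms of the <td> tag.
# The lazy (.*?) stops at the first closing </td> instead of the last one.
TD_PATTERN = re.compile(
    r'<td class="prodspecatribute"(?: rowspan="\d+")?>(.*?)</td>'
)

# Hypothetical sample input containing both forms from the question.
page = (
    '<tr><td class="prodspecatribute" rowspan="2">Weight</td>'
    '<td class="prodspecatribute">Colour</td></tr>'
)

matches = TD_PATTERN.findall(page)
print(matches)  # ['Weight', 'Colour']
```

The greedy `(.*)` from the question would swallow everything up to the last `</td>` on a line, which is why the sketch uses `(.*?)`.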