[Question] Parsing pagination info from an HTML file
Posted: 2008-08-27 12:19
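I need to parse an HTML document and extract the URLs of its pagination links.
The messy HTML fragment that egrep leaves me with is shown further down.
The pagination URLs I want to end up with are listed after it.
I'm not sure where to start, because this HTML is so messy. grep matches line by line, right? So how do I pull just the string I want out of a single line?
cut?
I'm a bit lost. Thanks for your help, everyone.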
The messy HTML fragment after egrep:
<div class="pages"><em> 150 </em><strong>1</strong><a href="forumdisplay.php?fid=51&page=2&sid=7JjykH">2</a><a href="forumdisplay.php?fid=51&page=3&sid=7JjykH">3</a><a href="forumdisplay.php?fid=51&page=4&sid=7JjykH">4</a><a href="forumdisplay.php?fid=51&page=2&sid=7JjykH" class="next">››</a></div> <span class="postbtn" id="newspecial" onmouseover="$('newspecial').id = 'newspecialtmp';this.id = 'newspecial';showMenu(this.id)"><a href="post.php?action=newthread&fid=51&extra=page%3D1&sid=7JjykH" title="发新话题"><img src="images/dst/newtopic.gif" alt="发新话题" /></a></span>
<div class="pages"><em> 150 </em><strong>1</strong><a href="forumdisplay.php?fid=51&page=2&sid=7JjykH">2</a><a href="forumdisplay.php?fid=51&page=3&sid=7JjykH">3</a><a href="forumdisplay.php?fid=51&page=4&sid=7JjykH">4</a><a href="forumdisplay.php?fid=51&page=2&sid=7JjykH" class="next">››</a></div> <span class="postbtn" id="newspecialtmp" onmouseover="$('newspecial').id = 'newspecialtmp';this.id = 'newspecial';showMenu(this.id)"><a href="post.php?action=newthread&fid=51&extra=page%3D1&sid=7JjykH" title="发新话题"><img src="images/dst/newtopic.gif" alt="发新话题" /></a></span>
<span class="headactions"><a href="forumdisplay.php?fid=51&page=1&showoldetails=yes&sid=7JjykH#online" class="nobdr"><img src="images/dst/collapsed_yes.gif" alt="" /></a></span>
The pagination URLs I want:
forumdisplay.php?fid=51&page=2&sid=7JjykH
forumdisplay.php?fid=51&page=3&sid=7JjykH
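A possible starting point, as a rough sketch only: rather than cutting fields out of each line, match the URL itself with a regular expression and keep every match. The Python below assumes every pagination link has the form forumdisplay.php?fid=...&page=...&sid=... as in the fragment above. The names PAGE_URL, extract_page_urls and sample are made up for illustration, and sample is a shortened copy of the egrep output. (GNU grep's -o option, which prints only the matched part of a line, follows the same idea.)

```python
import re

# Assumption: a pagination URL always looks like
# forumdisplay.php?fid=<digits>&page=<digits>&sid=<letters/digits>,
# as in the sample fragment.
PAGE_URL = re.compile(r'forumdisplay\.php\?fid=\d+&page=\d+&sid=\w+')


def extract_page_urls(html):
    """Return the distinct pagination URLs, in order of first appearance."""
    urls = []
    for url in PAGE_URL.findall(html):
        if url not in urls:  # the same link appears several times per page
            urls.append(url)
    return urls


# Shortened, hypothetical copy of the egrep output, for illustration only.
sample = (
    '<div class="pages"><em> 150 </em><strong>1</strong>'
    '<a href="forumdisplay.php?fid=51&page=2&sid=7JjykH">2</a>'
    '<a href="forumdisplay.php?fid=51&page=3&sid=7JjykH">3</a>'
    '<a href="forumdisplay.php?fid=51&page=4&sid=7JjykH">4</a>'
    '<a href="forumdisplay.php?fid=51&page=2&sid=7JjykH" class="next"></a></div>'
    '<span class="headactions">'
    '<a href="forumdisplay.php?fid=51&page=1&showoldetails=yes&sid=7JjykH#online"'
    ' class="nobdr"></a></span>'
)

for url in extract_page_urls(sample):
    print(url)
```

On this sample it prints the page=2, page=3 and page=4 URLs once each. The page=1&showoldetails=yes link is skipped because sid does not directly follow page there.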