html - Python lxml screen scraping? -


I need to parsing some HTML with python. After some research, lxml seems like my best, but I There is a problem finding examples which I am trying to do. This is the reason why I am hearing that I have to scrap a page for all my viewable text. Ignore all the tags and javascript .. I need what I want to see the text. It seems very easy .. I did it with HTMLParser, but its javascript does not handle well

class HTML2Text (HTMLParser.HTMLParser): def __init __ (self): HTMLParser.HTMLParser .__ init __ (auto) itself .output = cStringIO.StringIO () def get_text (auto): return self.output.getvalue () def hand_data (self, data): self.output.write (data) def pars HTML ( Source: P = HTML2Text () P.feed (source) text = p.get_text () return text

Any ideas to do this LxML or a better way of doing this is HTMLParser . HTMLParser would be best because no additional lbbs are needed

Scott F.

No screen-scraping library I know "does it well with javascript" - this It is very difficult to estimate all methods in which the JS HTML DOM is dynamically, conditionally & amp; C.


Comments

Popular posts from this blog

python - Overriding the save method in Django ModelForm -

html - CSS autoheight, but fit content to height of div -

qt - How to prevent QAudioInput from automatically boosting the master volume to 100%? -