java - Dealing with pagination in web pages while using jsoup


I am using jsoup to crawl through the web pages of a particular website. Specifically, I am trying to collect all the pages that have a PDF link. I am able to get all the links on a particular page, but the website uses the JavaScript __doPostBack() function to navigate to the other pages, and I have not been able to get those pages with jsoup.
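On ASP.NET WebForms pages, a pager link usually has an href like javascript:__doPostBack('target','argument'). Clicking it makes the browser copy the two string arguments into the hidden __EVENTTARGET and __EVENTARGUMENT form fields and submit the form. So the first thing a crawler needs is those two values. A minimal sketch, assuming both arguments are single-quoted literals (the class name PostBackLink and the sample target ctl00$Main$Grid are hypothetical):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PostBackLink {

    // Matches e.g. javascript:__doPostBack('ctl00$Main$Grid','Page$2')
    // Assumption: both arguments are single-quoted string literals.
    private static final Pattern POSTBACK = Pattern.compile(
            "__doPostBack\\(\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*\\)");

    /** The two values the browser copies into __EVENTTARGET and __EVENTARGUMENT. */
    public static final class Target {
        public final String eventTarget;
        public final String eventArgument;

        Target(String eventTarget, String eventArgument) {
            this.eventTarget = eventTarget;
            this.eventArgument = eventArgument;
        }
    }

    /** Returns the postback target/argument, or empty if the href is not a __doPostBack call. */
    public static Optional<Target> parse(String href) {
        if (href == null) {
            return Optional.empty();
        }
        Matcher m = POSTBACK.matcher(href);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Target(m.group(1), m.group(2)));
    }

    public static void main(String[] args) {
        parse("javascript:__doPostBack('ctl00$Main$Grid','Page$2')")
                .ifPresent(t -> System.out.println(t.eventTarget + " " + t.eventArgument));
    }
}
```

The input is meant to be the value jsoup returns from element.attr("href"); jsoup decodes HTML entities such as &#39; in attribute values, so the quotes arrive as plain apostrophes.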

This is how I'm trying it now:

  Document document = Jsoup.connect("a website name")
          .data("__EVENTARGUMENT", __EVENTARGUMENT)
          .data("__EVENTTARGET", __EVENTTARGET)
          .data("__EVENTVALIDATION", __EVENTVALIDATION)
          .data("__VIEWSTATEGENERATOR", __VIEWSTATEGENERATOR)
          .cookie("ASP.NET_SessionId", sessionId)
          .followRedirects(true)
          .timeout(0)
          .userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
          .post();

But I'm getting the wrong URL in the output. I have defined all the variables before sending the request.
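One common cause of landing on the wrong page is a POST that leaves out some of the form's hidden fields, especially __VIEWSTATE, which a WebForms page typically relies on to restore its control state (for example, the grid's current page) on a postback. A minimal sketch, assuming the hidden fields of the page just fetched have already been collected into a name-to-value map (the PostBackForm class, the buildPostBackData method and the placeholder values are hypothetical), that keeps all of them and then overrides the two event fields:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PostBackForm {

    /**
     * Builds the form data for one simulated __doPostBack call.
     * hiddenFields: name -> value of every hidden input on the page just fetched
     * (e.g. __VIEWSTATE, __VIEWSTATEGENERATOR, __EVENTVALIDATION).
     * The input map is copied, not modified.
     */
    public static Map<String, String> buildPostBackData(Map<String, String> hiddenFields,
                                                        String eventTarget,
                                                        String eventArgument) {
        // Keep every hidden field; the server uses them (view state, event validation)
        // to process the postback.
        Map<String, String> data = new LinkedHashMap<>(hiddenFields);
        // Mirror what __doPostBack does in the browser before it submits the form.
        data.put("__EVENTTARGET", eventTarget);
        data.put("__EVENTARGUMENT", eventArgument);
        return data;
    }

    public static void main(String[] args) {
        Map<String, String> hidden = new LinkedHashMap<>();
        hidden.put("__VIEWSTATE", "placeholder-viewstate");
        hidden.put("__EVENTTARGET", "");
        hidden.put("__EVENTARGUMENT", "");
        hidden.put("__EVENTVALIDATION", "placeholder-validation");
        System.out.println(buildPostBackData(hidden, "ctl00$Main$Grid", "Page$2"));
    }
}
```

With jsoup, the map can be filled by iterating over doc.select("input[type=hidden]") and reading each element's attr("name") and attr("value"); the result can then be passed to Connection.data(Map) together with the cookies of the previous response.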

When I face an issue like this, these are the steps I follow to solve it:

  • Load the pages in a browser
  • Spy on the HTTP messages exchanged between the browser and the server while going through the pages (Fiddler, Firebug, Developer Console/Toolbar, ...)
  • Identify every single byte the browser and the server exchange (headers, cookies, etc.)
  • Once all those bytes are identified, try to go through the pages with hurl.it, sending the same headers, cookies, user-agent, etc.
  • Once you are able to go through the pages with hurl.it, instruct jsoup to do the same
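The steps above end with making jsoup replay what the browser does. A minimal sketch of that last step, assuming a classic ASP.NET WebForms page whose pager links are javascript:__doPostBack('target','Page$N') calls and whose form posts back to the page's own URL; the URL, the PdfCrawler class, its helper methods and the user-agent string are hypothetical. It loads the first page, then repeatedly posts every hidden field of the current page with __EVENTTARGET and __EVENTARGUMENT set and the session cookies carried over, collecting links that end in .pdf along the way:

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PdfCrawler {

    // Assumption: pager hrefs look like javascript:__doPostBack('target','Page$2')
    private static final Pattern POSTBACK = Pattern.compile(
            "__doPostBack\\(\\s*'([^']*)'\\s*,\\s*'([^']*)'\\s*\\)");
    private static final String UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    /** name -> value of every named hidden input (__VIEWSTATE, __EVENTVALIDATION, ...). */
    static Map<String, String> hiddenFields(Document doc) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Element input : doc.select("input[type=hidden][name]")) {
            fields.put(input.attr("name"), input.attr("value"));
        }
        return fields;
    }

    /** Absolute URLs of the links whose href ends with ".pdf", ignoring case. */
    static Set<String> pdfLinks(Document doc) {
        Set<String> links = new LinkedHashSet<>();
        for (Element a : doc.select("a[href]")) {
            String abs = a.absUrl("href");
            if (abs.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
                links.add(abs);
            }
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        String url = "https://example.com/list.aspx"; // hypothetical page with the pager

        // Load the first page like a browser would and keep its session cookies.
        Connection.Response response = Jsoup.connect(url).userAgent(UA)
                .method(Connection.Method.GET).execute();
        Map<String, String> cookies = new LinkedHashMap<>(response.cookies());
        Document page = response.parse();
        Set<String> pdfs = new LinkedHashSet<>(pdfLinks(page));

        Set<String> seenArguments = new LinkedHashSet<>();
        boolean movedToNewPage = true;
        while (movedToNewPage) {
            movedToNewPage = false;
            for (Element a : page.select("a[href]")) {
                Matcher m = POSTBACK.matcher(a.attr("href"));
                // Assumption: pager arguments start with "Page$"; skip anything already posted.
                if (!m.find() || !m.group(2).startsWith("Page$") || !seenArguments.add(m.group(2))) {
                    continue;
                }
                // Replay __doPostBack: all hidden fields of the CURRENT page plus the
                // event target/argument, posted back with the same cookies.
                Map<String, String> data = hiddenFields(page);
                data.put("__EVENTTARGET", m.group(1));
                data.put("__EVENTARGUMENT", m.group(2));
                response = Jsoup.connect(url).userAgent(UA).cookies(cookies).data(data)
                        .method(Connection.Method.POST).execute();
                cookies.putAll(response.cookies());
                page = response.parse();
                pdfs.addAll(pdfLinks(page));
                movedToNewPage = true;
                break; // rescan the new page, whose hidden fields are now the valid ones
            }
        }
        pdfs.forEach(System.out::println);
    }
}
```

The hidden fields are re-read from each response instead of being captured once, because values such as __VIEWSTATE and __EVENTVALIDATION can change with every response. If the real site posts to a different action URL or needs extra headers, the values observed in Fiddler or the developer console are the ones to copy.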
