java - Dealing with pagination in web pages while using jsoup
I am using jsoup to crawl through the web pages of a particular website. Specifically, I am trying to extract all the links that point to a PDF. I am able to get all the links on a particular page, but the site uses the JavaScript __doPostBack() function to navigate to the other pages, and I have not been able to fetch those pages with jsoup.
This is how I'm trying it now:
Document document = Jsoup.connect("a website name")
    .data("__EVENTARGUMENT", __EVENTARGUMENT)
    .data("__EVENTTARGET", __EVENTTARGET)
    .data("__EVENTVALIDATION", __EVENTVALIDATION)
    .data("__VIEWSTATEGENERATOR", __VIEWSTATEGENERATOR)
    .cookie("ASP.NET_SessionId", sessionId)
    .followRedirects(true)
    .timeout(0)
    .userAgent("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6")
    .post();
But I am getting the wrong URL as output. I have defined all the variables before sending them.
When I face an issue like this, here is how I solve it:
- Load the pages in a browser
- Spy on the HTTP messages exchanged between the pages and the browser while going through the pages (Fiddler, Firebug, Developer Console / Toolbar...)
- Identify every single byte the browser and the server exchange (headers, cookies, etc.)
- Once every single byte is identified, try to go through the pages with hurl.it (headers, cookies, user-agent, etc.)
- Once you are able to go through the pages with hurl.it, instruct jsoup to do the same
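The last step can be sketched roughly as below. This is only a minimal sketch under assumptions, not a tested solution for the site in the question. It assumes that the pager is a standard ASP.NET postback, that every hidden input on the current page (including __VIEWSTATE) has to be sent back unchanged, and that the cookies from the first response are all the session state the server needs. The class name PostBackPager and the helper postBackFields are made up for illustration.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class: replays an ASP.NET postback with jsoup.
class PostBackPager {

    // Copies every named hidden input from the current page, then sets the two
    // fields that the browser-side __doPostBack(target, argument) would fill in.
    static Map<String, String> postBackFields(Document page, String eventTarget, String eventArgument) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Element input : page.select("input[type=hidden]")) {
            String name = input.attr("name");
            if (!name.isEmpty()) {
                fields.put(name, input.val());
            }
        }
        fields.put("__EVENTTARGET", eventTarget);
        fields.put("__EVENTARGUMENT", eventArgument);
        return fields;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: PostBackPager <url> <eventTarget> [eventArgument]");
            return;
        }
        String url = args[0];
        String eventTarget = args[1];
        String eventArgument = args.length > 2 ? args[2] : "";
        // Placeholder value: replace it with the user agent your browser sent.
        String userAgent = "Mozilla/5.0";

        // 1. GET the first page so that the hidden fields and cookies are fresh.
        Connection.Response first = Jsoup.connect(url)
                .userAgent(userAgent)
                .method(Connection.Method.GET)
                .execute();
        Document firstPage = first.parse();

        // 2. POST the same fields the browser would send when the pager link is clicked.
        Document nextPage = Jsoup.connect(url)
                .userAgent(userAgent)
                .cookies(first.cookies())
                .data(postBackFields(firstPage, eventTarget, eventArgument))
                .post();

        // 3. Print the absolute URL of every link on the next page that ends in .pdf.
        for (Element link : nextPage.select("a[href$=.pdf]")) {
            System.out.println(link.absUrl("href"));
        }
    }
}
```

The values for eventTarget and eventArgument are the two arguments of the __doPostBack('...','...') call in the pager link's href. If the captured browser request carries fields or headers that this sketch does not send, add them to the POST as well.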