web services - Retrieve a list of the most popular GET param variations for a given URL?


I am working on intelligence gathering around link propagation. Because I have to deal with many URL-shortening services, where a reverse lookup requires the exact URL, I need to be able to resolve several equivalent versions of the same URL.

An example of such a URL would be one on www.example.com with a few GET parameters appended.

Admittedly, in some circumstances changing the GET parameters can refer to a completely different page, especially if a GET parameter is a profile ID or a content ID.

But a quick parse of the pages will quickly determine how similar they are to each other. Using a small amount of machine learning, it could quickly become clear which GET parameters do not affect the content of the pages for a given site.
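As a rough illustration of that idea, here is a minimal sketch that drops one GET parameter at a time and checks whether the page text stays nearly the same. It uses a plain word-level similarity threshold rather than machine learning, and the names `without_param`, `irrelevant_params`, `fake_fetch` and the 0.9 threshold are all assumptions made for this example. `fake_fetch` stands in for a real HTTP fetch plus text extraction.

```python
from difflib import SequenceMatcher
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def without_param(url, name):
    """Return url with every occurrence of the query parameter `name` removed."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k != name]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def irrelevant_params(url, fetch, threshold=0.9):
    """List parameters whose removal leaves the page text nearly unchanged.

    `fetch` is any callable mapping a URL to the page's text.
    """
    baseline = fetch(url).split()
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    names = list(dict.fromkeys(k for k, _ in pairs))  # unique, in order
    result = []
    for name in names:
        variant = fetch(without_param(url, name)).split()
        ratio = SequenceMatcher(None, baseline, variant, autojunk=False).ratio()
        if ratio >= threshold:
            result.append(name)
    return result


# Hypothetical stand-in for a real site: `id` selects the content,
# every other parameter is ignored.
PAGES = {
    "42": "an article about link spread on short url services",
    "7": "a totally different profile page with other words",
}


def fake_fetch(url):
    query = dict(parse_qsl(urlsplit(url).query))
    return PAGES.get(query.get("id"), "page not found")


url = "http://www.example.com/page?id=42&utm_source=twitter"
print(irrelevant_params(url, fake_fetch))  # -> ['utm_source']
```

On a real site the threshold would need tuning, since ads, timestamps and other dynamic content make even identical pages differ slightly between fetches.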

I assume that a service where you send in a URL and get back a list of similar URLs could only be offered by the likes of Google or Yahoo (or Twitter), but they offer no such facility, and I have not found any other service that does this.

If you know of any service that clusters groups of nearly identical URLs in the manner described above, please let me know.

A bounty awaits.

Each URL is like an "address" for the location of data on the Internet. The "host" portion of the URL (in your example, "www.example.com") is a web server, or a group of web servers, somewhere in the world. If we think of a URL as an "address", then the host would be the "country".

The country itself can keep track of every piece of mail that enters it. Some do, some don't. I'm talking about web servers, of course! Real countries do not keep a record of every piece of mail! :-)

But even if the "country" does keep track of every piece of mail, I doubt that it has a mechanism for sending that list to you.

As for organizations that could harvest this information themselves, I think the best bet would be Google. But even there the situation is difficult: because Google does not own every web server (every "country") in the world, they cannot know about every URL that is used to access a given web server.

But they can approximate it. Since they crawl every page, they can get a good idea of every URL that appears in public HTML pages on the Web. Of course, that will not include URLs that people send in chat, SMS or e-mail. But still, they can get a good idea of which URLs exist.
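To make that concrete, here is a minimal sketch of how someone holding a set of crawled pages could count the GET parameter variations used in links to one target page. The names `LinkCollector`, `query_variations` and the sample `crawled` data are hypothetical, and a real crawler would also need to handle redirects, relative bases and malformed markup.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the absolute href targets of <a> tags in one HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def query_variations(pages, target):
    """Count the query strings seen in links to the target's host and path.

    `pages` is an iterable of (page_url, html) pairs from a crawl.
    Returns (query_string, count) pairs, most frequent first.
    """
    wanted = urlsplit(target)
    counts = Counter()
    for base_url, html in pages:
        collector = LinkCollector(base_url)
        collector.feed(html)
        for link in collector.links:
            parts = urlsplit(link)
            same_host = parts.netloc.lower() == wanted.netloc.lower()
            if same_host and parts.path == wanted.path:
                counts[parts.query] += 1
    return counts.most_common()


# Hypothetical crawl results, made up for this example.
crawled = [
    ("http://blog.example.org/",
     '<a href="http://www.example.com/page?id=42&amp;ref=blog">x</a> '
     '<a href="http://www.example.com/page?id=42">y</a>'),
    ("http://news.example.net/",
     '<a href="http://www.example.com/page?id=42">z</a> '
     '<a href="/about">about</a>'),
]

print(query_variations(crawled, "http://www.example.com/page"))
# -> [('id=42', 2), ('id=42&ref=blog', 1)]
```

This only sees links that appear in the pages you have crawled, which is exactly the limitation described above.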

I guess what I am trying to say is that what you want does not exist. The only way to know all of the URLs actually used to reach a single website is to be the owner of that website.

Sorry, friend.

