How might one go about implementing a forward index in PHP?
I'm looking to implement a simple forward indexer in PHP. Yes, I realize PHP probably isn't the best tool for the job, but I want to do it anyway. The rationale behind this is simple: I want one, and I want it in PHP.
Let us make some basic assumptions:
- The entire Interwebs consists of approximately five thousand HTML and/or plain-text documents.
- Each document resides within a particular domain (UID).
The result of our awesome PHP-based forward indexing algorithm should look something like this:
UID1 -> index.html -> helen, with, that, champion, freqs
UID1 -> foo.html -> chicken, uhad, go, home, eat, sheep
UID2 -> blah.html -> next, week, current, badgearwawa
UID2 -> gah.txt -> one, one, and, one, is, no, numberwise

Ideally, I would love to see solutions that take into account, even at the most elementary level, the concepts of tokenization, word-boundary disambiguation and part-of-speech tagging. Of course, I realize this is wishful thinking, and will therefore humbly appreciate any worthy attempt at parsing the fictional documents as follows:
- Extract the actual textual content of each document as a list of words, in the order in which they appear.
- While doing so, ignore any garbage such as <html> and other tags, and compute a list of UIDs (which could be, for example, a domain), each followed by the name of the document (the resource within the domain) and finally the list of words for that document. I realize that HTML tags play an important role in the meaning of the text within a document, but at this stage I don't care.
- Bear in mind that a solution which builds the list of words as it reads through the document is cooler than one which needs to read the whole document first.
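As a starting point, here is a minimal sketch of the steps above in plain PHP, with no third-party libraries. It assumes the documents are already in memory as an array keyed by UID and then by document name; `$corpus` and its contents are made-up placeholders, and `extract_text()` and `tokenize()` are hypothetical helper names. It drops `<script>`/`<style>` blocks, removes the remaining tags with `strip_tags()`, and uses a simple regex as the tokenizer. That is crude word splitting, not real word-boundary disambiguation or part-of-speech tagging.

```php
<?php
// Hypothetical placeholder corpus: UID (e.g. a domain) => document name => raw content.
$corpus = [
    'UID1' => [
        'index.html' => '<html><head><script>var x = 1;</script></head>'
                      . '<body><p>Helen was a champion</p></body></html>',
    ],
    'UID2' => [
        'gah.txt' => 'One, one and one is no numberwise.',
    ],
];

/**
 * Hypothetical helper: extract visible text from a document.
 * Drops <script>/<style> blocks (tags and contents), then strips remaining markup.
 */
function extract_text(string $raw): string
{
    $noScripts = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $raw);
    // Put a space before every tag so words in adjacent elements don't merge.
    $spaced = preg_replace('/</', ' <', $noScripts);
    return html_entity_decode(strip_tags($spaced), ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

/**
 * Hypothetical naive tokenizer: runs of Unicode letters/digits (apostrophes allowed),
 * kept in document order, duplicates included. strtolower() only lowercases ASCII;
 * use mb_strtolower() if you need full Unicode case folding.
 */
function tokenize(string $text): array
{
    preg_match_all("/[\p{L}\p{N}']+/u", strtolower($text), $m);
    return $m[0];
}

// Build the forward index: UID => document name => ordered list of words.
$forwardIndex = [];
foreach ($corpus as $uid => $documents) {
    foreach ($documents as $name => $raw) {
        $forwardIndex[$uid][$name] = tokenize(extract_text($raw));
    }
}

// Print one "UID -> document -> words" line per document.
foreach ($forwardIndex as $uid => $documents) {
    foreach ($documents as $name => $words) {
        echo $uid, ' -> ', $name, ' -> ', implode(',', $words), PHP_EOL;
    }
}
```

With the placeholder corpus this prints `UID1 -> index.html -> helen,was,a,champion` and `UID2 -> gah.txt -> one,one,and,one,is,no,numberwise`. Keeping each document's words in order, with duplicates, is what makes this a forward index (document to words) rather than an inverted one (word to documents).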
At this stage, I'm not concerned about how the results are stored. Even a rudimentary set of 'print' statements would suffice.
Thanks in advance; I hope this was clear enough.
$p->load_file("http://www.page.com");
echo $p->find("body", 0)->plaintext;
That will give you all of the text. If you just want to iterate over the links:
foreach ($p->find("a") as $link) { echo $link->innertext; }
Take a look at it; it's very useful and powerful.
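If you'd rather not depend on a third-party HTML parser, PHP's bundled DOM extension can do roughly the same two things: pull the text out of `<body>` and iterate over the links. This is a minimal sketch, assuming the page has already been fetched into a string; the `$html` markup below is a made-up placeholder. Note that `textContent` would include the code inside any `<script>` elements, so the sketch removes those first.

```php
<?php
// Placeholder markup; in practice this might come from file_get_contents($url).
$html = '<html><body><script>var junk = 1;</script>'
      . '<p>Next week on <a href="/bw">badgerwatch</a> and <a href="/gah.txt">gah</a></p>'
      . '</body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them (real-world HTML is often malformed).
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Remove <script> elements so their code doesn't end up in the extracted text.
// iterator_to_array() copies the live node list before the tree is modified.
foreach (iterator_to_array($doc->getElementsByTagName('script')) as $script) {
    $script->parentNode->removeChild($script);
}

// All text inside <body>, similar to the body/plaintext call above.
$body = $doc->getElementsByTagName('body')->item(0);
$bodyText = $body !== null ? $body->textContent : '';
echo $bodyText, PHP_EOL;

// Like iterating over find("a"): print the text of each link.
$linkTexts = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $linkTexts[] = $link->textContent;
    echo $link->textContent, PHP_EOL;
}
```

The trade-off is that the DOM extension ships with PHP, while the third-party parser offers the shorter CSS-selector-style `find()` calls.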