How to use regex to extract groups of question-answer data from HTML files?
How can I use a regular expression to extract HTML groups that would be formatted like this:
... relevant article html ...
<B>Question 6</B><BR>Too many lessons<P>Too many lessons<P><BR><B>Answer 6</B><BR>Too many lessons<P>Too many lessons<P>Too many lessons<P>More lessons<P><HR><IMG SRC="/images/image.jpg" alt="alt text" width=480 height=360 hspace=2 vspace=2><P><I>Caption text</I>
...
The number of question-answer pairs varies, and the image code can appear anywhere (either between the question and the answer, or after the answer).
The only information I want to extract is the question number, the paragraph text without the HTML code, the image src and alt, and the caption.
I think you should look at some of the options in this question: ""
Looks like a good fit.
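If you do want to stay with regular expressions, here is a minimal sketch in Python, assuming the markup really looks like the sample in the question: bold "Question N" / "Answer N" headings, paragraphs separated by <P>, <BR> or <HR>, a quoted SRC that comes before a quoted alt, and the caption in <I> right after the image. The names extract_pairs, paragraphs and the dictionary keys are made up for illustration, and a real HTML parser will usually cope better with messy pages.

```python
import re
from html import unescape

# Headings such as <B>Question 6</B> or <B>Answer 6</B> (case-insensitive).
HEADING = re.compile(r"<b>\s*(Question|Answer)\s+(\d+)\s*</b>", re.I)

# An <IMG> tag with quoted src and alt (src first, as in the sample),
# optionally followed by a <P><I>caption</I> block.
IMAGE = re.compile(
    r'<img\b[^>]*?\bsrc\s*=\s*"([^"]*)"[^>]*?\balt\s*=\s*"([^"]*)"[^>]*>'
    r"(?:\s*<p>\s*<i>(.*?)</i>)?",
    re.I | re.S,
)

# Tags that separate paragraphs in the sample, and any other tag.
BREAK = re.compile(r"<(?:p|br|hr)\b[^>]*>", re.I)
TAG = re.compile(r"<[^>]+>")


def paragraphs(fragment):
    """Split on <P>/<BR>/<HR>, strip leftover tags, drop empty pieces."""
    pieces = (unescape(TAG.sub("", p)).strip() for p in BREAK.split(fragment))
    return [p for p in pieces if p]


def extract_pairs(page):
    """Return one dict per question number, sorted by number."""
    heads = list(HEADING.finditer(page))
    records = {}
    for i, head in enumerate(heads):
        # Each section runs from its heading to the next heading (or the end).
        end = heads[i + 1].start() if i + 1 < len(heads) else len(page)
        body = page[head.end():end]
        kind, number = head.group(1).lower(), int(head.group(2))
        rec = records.setdefault(
            number, {"number": number, "question": [], "answer": [], "images": []}
        )
        # Images may sit in either section; attach them to the same number.
        for img in IMAGE.finditer(body):
            caption = img.group(3)
            rec["images"].append({
                "src": img.group(1),
                "alt": img.group(2),
                "caption": caption.strip() if caption else None,
            })
        # Remove image and caption markup before collecting paragraph text.
        rec[kind].extend(paragraphs(IMAGE.sub("", body)))
    return [records[n] for n in sorted(records)]
```

Call extract_pairs on the page text read from your file. One limitation of this sketch: the last answer runs to the end of the input, so any article markup after the final pair should be trimmed off first.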