How to use regex to extract groups of question-answer data from HTML files?


How can I use a regular expression to extract HTML groups that would be formatted like this:

Relevant article html ... <B>Question 6</B><BR>Too many lessons<P>Too many lessons<P><BR><B>Answer 6</B><BR>Too many lessons<P>Too many lessons<P>Too many lessons<P>More lessons<P><HR><IMG SRC="/images/image.jpg" alt="alt text" width=480 height=360 hspace=2 vspace=2><P><I>Caption text</I>

The number of question-answer pairs varies, and the image code can appear anywhere (either between a question and its answer, or after the answer).

The only information I want to extract is the question number, the paragraph text without the HTML code, the image src and alt attributes, and the captions.

I think you should look at some of the options suggested in this question: ""

Looks like a good fit.
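For what the regex route asked about in the question might look like, here is a minimal Python sketch. It assumes the markup is as regular as the fragment in the question: bold "Question N" / "Answer N" headers, unclosed <P> tags between paragraphs, and an <IMG> tag with quoted SRC and alt attributes that may be followed by an italic caption. The sample string, the function name extract_qa and the shape of the result are hypothetical, chosen only for illustration. Regexes are brittle against real-world HTML, so treat this as a starting point rather than a robust solution.

```python
import re

# Hypothetical sample modeled on the fragment in the question; the placeholder
# paragraph text is an assumption, not real data.
SAMPLE = (
    '<B>Question 6</B><BR>First question paragraph<P>Second question paragraph<P><BR>'
    '<B>Answer 6</B><BR>First answer paragraph<P>Second answer paragraph<P><HR>'
    '<IMG SRC="/images/image.jpg" alt="alt text" width=480 height=360 hspace=2 vspace=2>'
    '<P><I>Caption text</I>'
)

# A "Question N" or "Answer N" header, then everything up to the next header
# (or the end of the input).
BLOCK_RE = re.compile(
    r'<B>\s*(Question|Answer)\s+(\d+)\s*</B>(.*?)'
    r'(?=<B>\s*(?:Question|Answer)\s+\d+\s*</B>|\Z)',
    re.IGNORECASE | re.DOTALL,
)

# An image tag with quoted SRC and alt, optionally followed by <P><I>caption</I>.
IMG_RE = re.compile(
    r'<IMG\s+SRC\s*=\s*"([^"]*)"\s+alt\s*=\s*"([^"]*)"[^>]*>'
    r'(?:\s*<P>\s*<I>(.*?)</I>)?',
    re.IGNORECASE | re.DOTALL,
)

# Tags that separate paragraphs, and any other leftover tag.
SPLIT_RE = re.compile(r'<(?:P|BR|HR)\s*/?>', re.IGNORECASE)
TAG_RE = re.compile(r'<[^>]+>')


def extract_qa(html):
    """Return {number: {"question": [...], "answer": [...], "images": [...]}}."""
    result = {}
    for kind, number, body in BLOCK_RE.findall(html):
        entry = result.setdefault(
            int(number), {"question": [], "answer": [], "images": []}
        )
        # Record each image with its caption, then cut it out of the body so
        # the caption is not also counted as paragraph text.
        for src, alt, caption in IMG_RE.findall(body):
            entry["images"].append(
                {"src": src, "alt": alt, "caption": caption.strip()}
            )
        body = IMG_RE.sub("", body)
        # Split into paragraphs and strip any remaining tags.
        for chunk in SPLIT_RE.split(body):
            text = TAG_RE.sub("", chunk).strip()
            if text:
                entry[kind.lower()].append(text)
    return result


if __name__ == "__main__":
    for number, entry in extract_qa(SAMPLE).items():
        print(number, entry)
```

An image is attached to the same question number whether it sits between the question and the answer or after the answer, because each block runs up to the next header. Pages that deviate from the assumed markup (unquoted attributes, a different attribute order, nested tags) would need a looser pattern or a real HTML parser.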

