python - How to separate content from a file that is a container for binary and other forms of content


I am trying to parse some .txt files. These files serve as containers for a variable number of 'children' files, which are set off or identified within the container by SGML tags. With Python I can easily separate the children files. However, I'm having trouble writing the binary material back out as a binary file (say, a GIF or JPG). In the simplest case the container holds an embedded HTML file followed by a graphic that the HTML references. I assume my problem is that I am reading the original .txt file with open(filename, 'r'), but that seems to be the only way to find the SGML tags needed to split the file.

I would appreciate any help, including pointers to related reading material.

I appreciate the suggestions, but I am still struggling with the most basic questions. For example, when I open the file with WordPad and scroll down to the section tagged as a GIF, I see:

   

I can easily find this section, but where does the GIF file start? Does it start at the header with 644, at the word begin, or at the line starting with M1TE?
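Assuming the graphic is uuencoded (as the answer further down points out), the section follows the standard uuencode layout: a header line of the form begin <mode> <filename>, where 644 is an octal Unix permission mode, then the encoded data lines, then a line holding a single backtick or space, and finally a line reading end. The image bytes therefore start on the line after begin. The minimal Python 3 sketch below parses such a header and shows why the first data line of a GIF begins with M1TE; parse_begin_line is a hypothetical helper name, not part of any library.

```python
import binascii


def parse_begin_line(line: bytes):
    """Split a uuencode header such as b'begin 644 picture.gif' into
    (mode, filename). The mode is an octal Unix permission value."""
    parts = line.strip().split(None, 2)
    if len(parts) != 3 or parts[0] != b"begin":
        raise ValueError("not a uuencode 'begin' line: %r" % line)
    return int(parts[1], 8), parts[2].decode("ascii")


mode, name = parse_begin_line(b"begin 644 h65803h6580301.gif")
print(oct(mode), name)  # 0o644 h65803h6580301.gif

# Every GIF starts with b"GIF89a" (or b"GIF87a"). A full uuencoded line
# carries 45 bytes, announced by the length character 'M' (chr(32 + 45)),
# so the first data line of an encoded GIF begins with b"M1TE&".
first_line = binascii.b2a_uu(b"GIF89a" + bytes(39))
print(first_line[:9])  # b'M1TE&.#EA'
```

In other words, neither 644 nor the word begin is part of the image; they only name the file that the following data lines decode to.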

Next, when the file is read into Python, does something happen to the binary data that has to be undone when it is written back out?
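On the question of whether reading alters the data: in Python 3, opening a file with 'r' decodes the bytes into str and, by default, translates \r\n line endings into \n, while opening with 'rb' returns the bytes exactly as stored. The sketch below demonstrates this with a throwaway temporary file whose contents are invented for illustration. Since uuencoded text is plain printable ASCII, reading it is not where the damage happens; the decoded image bytes, however, should always be written with 'wb'.

```python
import os
import tempfile

# Write a tiny file with Windows line endings, then read it back two ways.
raw = b"begin 644 x.gif\r\nM1TE&\r\n"
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

with open(path, "r", encoding="ascii") as f:
    text = f.read()  # str; '\r\n' translated to '\n' (universal newlines)
with open(path, "rb") as f:
    data = f.read()  # bytes, exactly as stored on disk
os.remove(path)

print(repr(text))   # 'begin 644 x.gif\nM1TE&\n'
print(data == raw)  # True
```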

I can find the lines where the graphics start:

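    import re

    # Read the whole container in binary mode so no bytes are changed.
    with open('myfile.txt', 'rb') as filerefbin:
        wholeFile = filerefbin.read()

    # wholeFile is bytes, so the pattern must be bytes too (rb'...').
    graphicReg = re.compile(rb'<DESCRIPTION>GRAPHIC', re.IGNORECASE)
    locationGraphics = graphicReg.finditer(wholeFile)
    graphicsTags = []
    for match in locationGraphics:
        graphicsTags.append(match.span())  # (start, end) offsets of each tag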

I can easily get to the word begin and, using the same process, to the end of the filename on the begin line, and I can also find the end of the embedded GIF file. But I cannot seem to write out the right combination of things so that, when I double-click h65803h6580301.gif after isolating and saving it, I get to see the graphic.
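One way to isolate and decode an embedded graphic entirely in memory is to locate the begin line, decode every data line with binascii.a2b_uu until the end line, and write the resulting bytes in binary mode. This is a sketch that assumes the standard uuencode layout and a filename without spaces; extract_uu and BEGIN_RE are hypothetical names, and the start argument could be, for example, the end offset of one of the spans collected in graphicsTags.

```python
import binascii
import re

# Matches a uuencode header line such as b"begin 644 picture.gif".
# Assumption: the filename contains no whitespace.
BEGIN_RE = re.compile(rb"^begin [0-7]+ (\S+)[ \t]*\r?$", re.MULTILINE)


def extract_uu(container: bytes, start: int = 0):
    """Return (filename, decoded_bytes) for the first uuencoded block
    found at or after offset `start` in `container`."""
    m = BEGIN_RE.search(container, start)
    if m is None:
        raise ValueError("no 'begin' line found")
    name = m.group(1).decode("ascii")
    out = bytearray()
    # splitlines() drops both \r\n and \n, so Windows line endings are harmless.
    for line in container[m.end():].splitlines():
        stripped = line.strip()
        if stripped == b"end":
            return name, bytes(out)
        if stripped:                      # skip blank lines
            out += binascii.a2b_uu(line)  # first char encodes the byte count
    raise ValueError("no 'end' line found")


# Usage (file names are illustrative): decode the first graphic and save it
# in binary mode so the bytes are written unchanged.
# with open("myfile.txt", "rb") as f:
#     whole = f.read()
# name, gif_bytes = extract_uu(whole)
# with open(name, "wb") as out:
#     out.write(gif_bytes)
```

The terminating backtick line decodes to zero bytes, so it needs no special case.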

Interestingly, when I open the file in 'rb' mode, the end-of-line characters still appear to be present, though they have no visible effect in Notepad.

I love this site.

It was very easy once I read Bendin's post. I just had to split out the section that starts with the word begin, save it to a .txt file, and then run the following command:

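    import uu

    # Decode the uuencoded section saved in test2.txt into a GIF file.
    # (The uu module was removed from the standard library in Python 3.13.)
    uu.decode(r'c:\test2.txt', r'c:\test.gif')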

I have to work on some other things for the rest of the day, but I will post more once I have looked at this more closely. The first thing I need to figure out is how to work with something other than a file: when I read the entire .txt file into memory and stripped out that section, I could not decode the stripped section directly and instead had to write it out to test2.txt first. I'm sure this can be done; I just need to find out how.

What you are seeing is not "binary", it is uuencoded. Python's standard library includes the uu module for handling uuencoded data.

The uu module reads from and writes to files, so decoding an in-memory section with it usually means writing a temporary file first. You can avoid temporary files by using Python's codecs module instead:

    import codecs

    # The 'uu' codec converts bytes to uuencoded bytes and back, in memory.
    data = b"Let's just pretend that it has binary data, okay?"
    uuencode = codecs.getencoder("uu")
    data_uu, n = uuencode(data)     # n = number of input bytes consumed
    uudecode = codecs.getdecoder("uu")
    decoded, m = uudecode(data_uu)  # m = number of encoded bytes consumed

    print("* Initial input:\n%s" % data.decode())
    print("* Encoding these %d bytes gives:\n%s" % (n, data_uu.decode()))
    print("* Decoding these %d bytes gives back the original data:\n%s"
          % (m, decoded.decode()))
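Applied to a container like the one in the question, Python 3's standard 'uu' codec can decode a section read with 'rb' without any temporary file. The sketch below assumes one uuencoded block per section and normalises Windows line endings first, as a precaution, because the codec appears to recognise the terminating line only as b'end' followed by a bare LF. decode_uu_section is a hypothetical helper name.

```python
import codecs


def decode_uu_section(section: bytes) -> bytes:
    """Decode the first uuencoded block in `section` entirely in memory.

    The codec skips any lines before the 'begin' line. CRLF endings are
    converted to LF first so the final 'end' line is recognised.
    """
    return codecs.decode(section.replace(b"\r\n", b"\n"), "uu")


# Usage (start, stop and file names are hypothetical):
# with open("myfile.txt", "rb") as f:
#     whole = f.read()
# with open("h65803h6580301.gif", "wb") as out:
#     out.write(decode_uu_section(whole[start:stop]))
```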
