c - Concurrently read from a single file


I have a large number of files that, depending on user input, I have to load as fast as possible and then process. More precisely, each data packet is split into about 100 files, and there are approximately 1k data packets. Most of the files are small, though.

Right now, I am using a thread pool: for every file access, the next free thread opens the file, reads it, and hands the data back ready for processing. Since the number of files will keep growing in the future, I am not very happy with this approach, especially as it is likely to end up with about 100k files or more (which will definitely be fun to deploy ;)).

So my idea is to pack all the small files of a data packet into one larger file and read from that instead. I can guarantee that it is read-only, but I do not know how many threads will read from a given file concurrently (I do know the maximum). This would give me around 1000 reasonably sized files, and I can easily add new data packets. Question: how can I efficiently allow 1..N threads to read from a single file? I could use asynchronous I/O on Windows, but it is reportedly performed synchronously for reads smaller than 64 KB. Memory-mapping the file is not an option because the expected size is > 1.6 GB and I still need to be able to run on x86 (unless I can efficiently map small portions, read them, and unmap them again; when I measured memory mapping, it was a lot slower than a single read).
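A minimal sketch of the packing idea, assuming a hypothetical layout: a small index of (offset, size) pairs followed by the concatenated file contents, in native byte order. The names `pack_write` and `pack_read_entry` are made up for illustration and the format is not any standard one. Note that reading moves the `FILE`'s single shared position, which is what makes concurrent access to one handle awkward.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical pack layout (native byte order, not a standard format):
 *   uint32_t count
 *   count x { uint64_t offset; uint64_t size; }   -- the index
 *   concatenated file contents                     -- the data
 */

/* Writes `count` blobs into `out` as one packed file. Returns 0 on success. */
static int pack_write(FILE *out, const char *const *blobs,
                      const size_t *sizes, uint32_t count)
{
    assert(count == 0 || (blobs != NULL && sizes != NULL));
    /* Data starts right after the header and the index. */
    uint64_t offset = sizeof(uint32_t) + (uint64_t)count * 2 * sizeof(uint64_t);
    if (fwrite(&count, sizeof count, 1, out) != 1) return -1;
    for (uint32_t i = 0; i < count; i++) {
        uint64_t size = sizes[i];
        if (fwrite(&offset, sizeof offset, 1, out) != 1 ||
            fwrite(&size, sizeof size, 1, out) != 1) return -1;
        offset += size;
    }
    for (uint32_t i = 0; i < count; i++)
        if (sizes[i] > 0 && fwrite(blobs[i], 1, sizes[i], out) != sizes[i])
            return -1;
    return fflush(out) == 0 ? 0 : -1;
}

/* Reads entry `index` into `buf`. Returns the entry size, or -1 on error.
 * This moves the FILE's shared position, so one FILE* must not be used by
 * several threads at once without a lock. fseek() takes a long, so files
 * above 2 GB would need fseeko() (POSIX) or _fseeki64() (MSVC). */
static long pack_read_entry(FILE *in, uint32_t index, char *buf, size_t bufsize)
{
    uint32_t count;
    uint64_t offset, size;
    if (fseek(in, 0, SEEK_SET) != 0 || fread(&count, sizeof count, 1, in) != 1)
        return -1;
    if (index >= count) return -1;
    long entry_pos = (long)(sizeof(uint32_t) + (uint64_t)index * 2 * sizeof(uint64_t));
    if (fseek(in, entry_pos, SEEK_SET) != 0 ||
        fread(&offset, sizeof offset, 1, in) != 1 ||
        fread(&size, sizeof size, 1, in) != 1) return -1;
    if (size > bufsize) return -1;
    if (fseek(in, (long)offset, SEEK_SET) != 0) return -1;
    if (size > 0 && fread(buf, 1, size, in) != size) return -1;
    return (long)size;
}
```

Packing each data packet once like this turns ~100 opens into one open plus an index lookup per read.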

I thought about opening each data packet file N times and giving each thread a handle in round-robin fashion, but the problem is that this can end up with (number of data files) × (maximum number of threads) open handles (which can easily be 8-16k), and every access to a data packet would have to be synchronized, or use some lock-free magic to get the next free file handle.
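For illustration, the handle-pool idea could look roughly like this, assuming POSIX threads and C11 atomics. `struct handle_pool`, `pool_read_at` and the `POOL_MAX` cap are hypothetical. Each handle gets its own mutex because a `FILE*` has a single shared position, which is exactly the per-access synchronization described above; an atomic counter does the round-robin selection.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define POOL_MAX 16  /* hypothetical cap on handles per data file */

struct handle_pool {
    FILE *handles[POOL_MAX];
    pthread_mutex_t locks[POOL_MAX];  /* one lock per handle */
    size_t count;
    atomic_size_t next;               /* round-robin cursor */
};

/* Opens `count` independent handles to `path`; call pool_close() even on failure. */
static int pool_open(struct handle_pool *p, const char *path, size_t count)
{
    assert(count > 0 && count <= POOL_MAX);
    int rc = 0;
    p->count = count;
    atomic_init(&p->next, 0);
    for (size_t i = 0; i < count; i++) {
        pthread_mutex_init(&p->locks[i], NULL);
        p->handles[i] = fopen(path, "rb");
        if (p->handles[i] == NULL) rc = -1;
    }
    return rc;
}

static void pool_close(struct handle_pool *p)
{
    for (size_t i = 0; i < p->count; i++) {
        if (p->handles[i] != NULL) fclose(p->handles[i]);
        pthread_mutex_destroy(&p->locks[i]);
    }
}

/* Reads up to `len` bytes at `offset` through the next handle in round-robin
 * order. If more threads than handles are active, callers wait on that
 * handle's mutex. fseek() takes a long, so offsets above 2 GB would need
 * fseeko() (POSIX) or _fseeki64() (MSVC) on some platforms. */
static long pool_read_at(struct handle_pool *p, long offset, void *buf, size_t len)
{
    size_t i = atomic_fetch_add(&p->next, 1) % p->count;
    long n = -1;
    pthread_mutex_lock(&p->locks[i]);
    if (fseek(p->handles[i], offset, SEEK_SET) == 0)
        n = (long)fread(buf, 1, len, p->handles[i]);
    pthread_mutex_unlock(&p->locks[i]);
    return n;
}
```

The handle count per file is the knob here: fewer handles means fewer open descriptors but more contention on each mutex.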

Since this does not seem like an unusual problem (I think it is similar to any database engine, where you can have M tables (data packets) with N rows (files, in my case), and you want to allow as many threads as possible to read rows concurrently), what is the recommended practice here? BTW, it should run on Windows and Linux, so portable approaches are welcome (or at least approaches that work on both platforms, even if they use different APIs; as long as they can be wrapped, I am happy).
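One commonly used way around the shared file position (not something the answer below proposes) is a positional read: `pread()` on POSIX, or `ReadFile()` with the offset passed in an `OVERLAPPED` structure on Windows. A thin wrapper, using the hypothetical name `read_at`, might look like this. It lets several threads share one handle without a lock, although on Windows the system may still serialize calls on a handle opened for synchronous I/O.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L  /* expose pread() in strict C modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
typedef HANDLE pfile;            /* hypothetical portable handle type */
#else
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
typedef int pfile;
#endif

/* Reads up to `len` bytes at absolute `offset` without using a shared file
 * position. Returns the number of bytes read (0 at end of file), or -1.
 * On 32-bit Linux, compile with -D_FILE_OFFSET_BITS=64 for files > 2 GB. */
static long long read_at(pfile f, uint64_t offset, void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
#ifdef _WIN32
    /* On a synchronous handle, ReadFile starts at the OVERLAPPED offset. */
    OVERLAPPED ov = {0};
    ov.Offset = (DWORD)(offset & 0xFFFFFFFFu);
    ov.OffsetHigh = (DWORD)(offset >> 32);
    DWORD got = 0;
    if (!ReadFile(f, buf, (DWORD)len, &got, &ov))
        return GetLastError() == ERROR_HANDLE_EOF ? 0 : -1;
    return (long long)got;
#else
    ssize_t got = pread(f, buf, len, (off_t)offset);
    return got < 0 ? -1 : (long long)got;
#endif
}
```

With this, one handle per data file is enough, which avoids the (files × threads) handle explosion.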

[Edit] This is not about speed, it is about hiding latency: when I read hundreds of small files, I only get about 1 MB/s. My main concern is seek time (since my access pattern is not predictable), and I want to hide it, for example by continuing to display the old data to the user while the new data is being read. The question is how to let several threads issue I/O requests against many files, possibly with more than one thread accessing the same file.

It is not really a problem if one of the calls takes 70 ms or even longer to complete, as long as the read calls do not block.
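A sketch of the "issue the request, don't block" pattern, assuming POSIX threads and C11 atomics: a background thread performs the (possibly seek-bound) read and the caller only polls a flag, e.g. once per frame, so it can keep displaying the old data meanwhile. `struct read_request` and the `request_*` functions are hypothetical; a real version would reuse a thread pool instead of starting a thread per request.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct read_request {
    const char *path;     /* hypothetical request fields */
    long offset;
    size_t len;
    char *buf;
    long result;          /* bytes read, or -1 on error */
    atomic_bool done;     /* set last, after result and buf are written */
    pthread_t thread;
};

static void *read_worker(void *arg)
{
    struct read_request *r = arg;
    long n = -1;
    FILE *f = fopen(r->path, "rb");
    if (f != NULL) {
        if (fseek(f, r->offset, SEEK_SET) == 0)
            n = (long)fread(r->buf, 1, r->len, f);  /* may block on a seek */
        fclose(f);
    }
    r->result = n;
    atomic_store(&r->done, true);  /* seq_cst store publishes result and buf */
    return NULL;
}

/* Starts the read in the background; returns 0 if the thread started. */
static int request_start(struct read_request *r)
{
    atomic_init(&r->done, false);
    r->result = -1;
    return pthread_create(&r->thread, NULL, read_worker, r) == 0 ? 0 : -1;
}

/* Never blocks: call it e.g. once per UI frame. */
static bool request_poll(struct read_request *r)
{
    return atomic_load(&r->done);
}

/* Reclaims the worker thread; only call after request_poll() returned true. */
static void request_finish(struct read_request *r)
{
    assert(atomic_load(&r->done));
    pthread_join(r->thread, NULL);
}
```

Usage: start a request, keep redrawing the previous data while `request_poll()` returns false, then call `request_finish()` and switch to the new buffer.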

I don't think multi-threading will help you much with disk reads. Assuming the file is on a single disk, you only have one set of read heads, so you are serialized there anyway.

In this situation, I think I would have one thread that reads the disk sequentially into buffers (this is likely to give the best read performance, because the read head does not have to jump around between data files) and several processing threads that take filled buffers, process them, and mark them as free again when they have finished.
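The layout suggested above (one sequential reader, several processing threads, and buffers that are marked free again after processing) might be sketched like this, assuming POSIX threads. The buffer count and size, `run_pipeline`, and the byte sum standing in for "processing" are hypothetical placeholders.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_SLOTS 4       /* hypothetical number of buffers */
#define SLOT_SIZE 4096    /* hypothetical buffer size in bytes */
#define MAX_WORKERS 64

enum slot_state { SLOT_FREE, SLOT_FILLING, SLOT_FULL, SLOT_PROCESSING };

struct pipeline {
    FILE *in;
    char data[NUM_SLOTS][SLOT_SIZE];
    size_t len[NUM_SLOTS];
    enum slot_state state[NUM_SLOTS];
    int eof;                      /* set once the reader has finished */
    unsigned long long checksum;  /* placeholder for real processing output */
    pthread_mutex_t mu;
    pthread_cond_t changed;       /* broadcast on every slot state change */
};

static int find_slot(const struct pipeline *p, enum slot_state want)
{
    for (int i = 0; i < NUM_SLOTS; i++)
        if (p->state[i] == want) return i;
    return -1;
}

/* The single reader thread: fills free buffers strictly sequentially. */
static void *reader_main(void *arg)
{
    struct pipeline *p = arg;
    for (;;) {
        int i;
        pthread_mutex_lock(&p->mu);
        while ((i = find_slot(p, SLOT_FREE)) < 0)
            pthread_cond_wait(&p->changed, &p->mu);
        p->state[i] = SLOT_FILLING;
        pthread_mutex_unlock(&p->mu);

        size_t n = fread(p->data[i], 1, SLOT_SIZE, p->in);  /* no lock held */

        pthread_mutex_lock(&p->mu);
        p->len[i] = n;
        p->state[i] = n > 0 ? SLOT_FULL : SLOT_FREE;
        if (n < SLOT_SIZE) p->eof = 1;  /* short read: end of file or error */
        int stop = p->eof;
        pthread_cond_broadcast(&p->changed);
        pthread_mutex_unlock(&p->mu);
        if (stop) return NULL;
    }
}

/* A processing thread: takes a full buffer, processes it, marks it free. */
static void *worker_main(void *arg)
{
    struct pipeline *p = arg;
    for (;;) {
        int i;
        pthread_mutex_lock(&p->mu);
        while ((i = find_slot(p, SLOT_FULL)) < 0 && !p->eof)
            pthread_cond_wait(&p->changed, &p->mu);
        if (i < 0) {                  /* reader done and nothing left */
            pthread_mutex_unlock(&p->mu);
            return NULL;
        }
        p->state[i] = SLOT_PROCESSING;
        pthread_mutex_unlock(&p->mu);

        unsigned long long sum = 0;   /* placeholder work: sum the bytes */
        for (size_t k = 0; k < p->len[i]; k++)
            sum += (unsigned char)p->data[i][k];

        pthread_mutex_lock(&p->mu);
        p->checksum += sum;
        p->state[i] = SLOT_FREE;
        pthread_cond_broadcast(&p->changed);
        pthread_mutex_unlock(&p->mu);
    }
}

/* Runs one reader plus `nworkers` processing threads over `in`. */
static unsigned long long run_pipeline(FILE *in, int nworkers)
{
    assert(nworkers >= 1 && nworkers <= MAX_WORKERS);
    struct pipeline p = { .in = in };  /* zeroed: every slot starts SLOT_FREE */
    pthread_t reader, workers[MAX_WORKERS];
    pthread_mutex_init(&p.mu, NULL);
    pthread_cond_init(&p.changed, NULL);
    pthread_create(&reader, NULL, reader_main, &p);
    for (int w = 0; w < nworkers; w++)
        pthread_create(&workers[w], NULL, worker_main, &p);
    pthread_join(reader, NULL);
    for (int w = 0; w < nworkers; w++)
        pthread_join(workers[w], NULL);
    pthread_cond_destroy(&p.changed);
    pthread_mutex_destroy(&p.mu);
    return p.checksum;
}
```

Each buffer carries its own state, so a worker can process a buffer outside the lock while the reader keeps filling the others; buffers may finish out of order without being overwritten.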

However you choose to proceed, may I suggest that you structure your code so that the number of threads of each type is easily configurable, ideally from the command line. In situations like this you will want to experiment with different thread configurations to find the optimal numbers for your specific case.
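A minimal way to make the thread counts configurable from the command line. The `--readers=N` / `--workers=N` option names and the defaults are assumptions; plain `strtol` is used instead of `getopt` because `getopt` is not provided by the MSVC runtime on Windows.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct thread_config {
    int readers;
    int workers;
};

/* Parses a strictly positive decimal int; returns 0 on success, -1 otherwise. */
static int parse_positive(const char *s, int *out)
{
    char *end;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v < 1 || v > INT_MAX)
        return -1;
    *out = (int)v;
    return 0;
}

/* Accepts hypothetical "--readers=N" and "--workers=N"; anything else fails. */
static int parse_thread_config(int argc, char **argv, struct thread_config *cfg)
{
    assert(cfg != NULL);
    cfg->readers = 1;   /* default: one sequential reader */
    cfg->workers = 4;   /* default: a handful of processing threads */
    for (int i = 1; i < argc; i++) {
        const char *a = argv[i];
        if (strncmp(a, "--readers=", 10) == 0) {
            if (parse_positive(a + 10, &cfg->readers) != 0) return -1;
        } else if (strncmp(a, "--workers=", 10) == 0) {
            if (parse_positive(a + 10, &cfg->workers) != 0) return -1;
        } else {
            return -1;
        }
    }
    return 0;
}
```

Something like `prog --readers=1 --workers=8` then makes it cheap to benchmark several configurations without recompiling.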

