Streaming data at 3G bytes per second to a file. Possible?

Posted: Sun Apr 25, 2021 3:39 am
by rdos
So, is this even possible to do with an ordinary x86 motherboard and standard disc hardware?

Part of the challenge, of course, is to write a filesystem that can handle this using a standard CPU. I doubt that Linux or Windows can achieve that kind of throughput, but if somebody can prove me wrong, please do. I'm developing a zero-copy filesystem, and I think an implementation that needs to copy the data will fail for that reason alone.

In the system I have in mind, data comes in at 3G bytes per second from an 8-lane PCIe board. The device is handed an array of 2M physical blocks, and you can read completion status from the PCI BAR.

I think a possible solution needs to work with memory-based schedules and of course must be able to read data at 3G bytes per second at a minimum. At least in principle, AHCI and SATA should be able to handle it, while I think USB 3 is too slow even if it uses memory schedules.

The final question is whether there are flash devices that can be written at this speed. Usually, they have very high read speed but slower write speed.

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Sun Apr 25, 2021 9:01 am
by Korona
Linux has a p2pmem infrastructure where one device (such as an NVMe device) can directly DMA to another device (e.g., a NIC). That allows truly zero-copy transfers; the data never even hits RAM. You might want to look into that.

EDIT: Obviously, you need another x8 PCIe link for the storage device (assuming PCIe 3, PCIe 4 would only need an x4), otherwise, this cannot work.

AHCI (= SATA) and USB 3 are too slow (capped at 6 Gbit/s and 10 Gbit/s, respectively); you need NVMe.
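
As a rough sanity check on those caps, here is a small sketch that converts nominal line rates into payload bandwidth. The encodings are the published nominal ones (8b/10b for SATA, 128b/132b for 10 Gbit/s USB, 128b/130b for PCIe 3.0 and 4.0); treating "USB 3" as USB 3.1 Gen 2 is an assumption, and packet/protocol overhead is ignored, so real numbers will be lower.

```python
def payload_gbytes_per_s(gbit_per_lane, lanes, data_bits, coded_bits):
    """Nominal line rate minus line-encoding overhead, in GB/s (1 GB = 1e9 bytes)."""
    return gbit_per_lane * lanes * data_bits / coded_bits / 8

REQUIRED_GBYTES_PER_S = 3.0  # stream rate from the original post

links = {
    "SATA III": payload_gbytes_per_s(6, 1, 8, 10),
    "USB 3.1 Gen 2": payload_gbytes_per_s(10, 1, 128, 132),
    "PCIe 3.0 x4": payload_gbytes_per_s(8, 4, 128, 130),
    "PCIe 3.0 x8": payload_gbytes_per_s(8, 8, 128, 130),
    "PCIe 4.0 x4": payload_gbytes_per_s(16, 4, 128, 130),
}

for name, rate in links.items():
    verdict = "enough" if rate >= REQUIRED_GBYTES_PER_S else "too slow"
    print(f"{name:14s} {rate:5.2f} GB/s  {verdict}")
```

Note that PCIe 3.0 x4 clears 3 GB/s only on paper; once packet overhead is included the margin is thin.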

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Sun Apr 25, 2021 9:27 am
by rdos
Korona wrote:Linux has a p2pmem infrastructure where one device (such as an NVMe device) can directly DMA to another device (e.g., a NIC). That allows truly zero-copy transfers; the data never even hits RAM. You might want to look into that.

EDIT: Obviously, you need another x8 PCIe link for the storage device (assuming PCIe 3, PCIe 4 would only need an x4), otherwise, this cannot work.

AHCI (= SATA) and USB 3 are too slow (capped at 6 Gbit/s and 10 Gbit/s, respectively); you need NVMe.
Yes, it seems like there are a few NVMe devices that use PCIe 4.0 x4, and those have transfer rates over 3 GB/s, and some discs support this for writes too.

I don't think just transferring device-to-device would work unless data is streamed to fixed sectors. Writing it as file data requires cooperation with the filesystem, which must convert file offsets to sectors and create long requests for the device. I would also need a filesystem that can handle file sizes larger than 4G, so FAT32 would not work. Perhaps ext4 or NTFS.
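
To illustrate what "convert file offsets to sectors and create long requests" could look like, here is a minimal sketch using an extent list. The `Extent` layout, the 512-byte sector size, and the `map_range` helper are all hypothetical and not taken from any particular filesystem; offsets are assumed to be sector-aligned.

```python
from typing import List, NamedTuple, Tuple

SECTOR = 512  # bytes per sector (assumed)

class Extent(NamedTuple):
    file_offset: int   # byte offset in the file where the extent starts
    start_sector: int  # first disk sector backing the extent
    length: int        # extent length in bytes

def map_range(extents: List[Extent], offset: int, size: int) -> List[Tuple[int, int]]:
    """Translate a sector-aligned file byte range into (start_sector, sector_count)
    requests, merging pieces that are physically adjacent on disk."""
    requests: List[Tuple[int, int]] = []
    end = offset + size
    for ext in extents:  # extents are assumed sorted by file_offset
        lo = max(offset, ext.file_offset)
        hi = min(end, ext.file_offset + ext.length)
        if lo >= hi:
            continue  # this extent does not overlap the requested range
        sector = ext.start_sector + (lo - ext.file_offset) // SECTOR
        count = (hi - lo) // SECTOR
        if requests and requests[-1][0] + requests[-1][1] == sector:
            # Physically contiguous with the previous request: extend it.
            requests[-1] = (requests[-1][0], requests[-1][1] + count)
        else:
            requests.append((sector, count))
    return requests

MIB = 1024 * 1024
extents = [
    Extent(0 * MIB, 1000, MIB),  # sectors 1000..3047
    Extent(1 * MIB, 3048, MIB),  # directly follows the first extent on disk
    Extent(2 * MIB, 9000, MIB),  # somewhere else on disk
]
requests = map_range(extents, 0, 3 * MIB)
print(requests)
```

The fewer and longer the extents, the fewer requests the device sees, which is the reason to allocate the file in large contiguous runs.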

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Sun Apr 25, 2021 11:17 am
by Korona
Do you really need an FS in such a situation? I would consider writing directly to a partition. You can then have some kind of index file that is stored on a proper FS on another partition.

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Sun Apr 25, 2021 2:02 pm
by rdos
It seems to depend on what kind of motherboard you have. I have this motherboard: https://www.msi.com/Motherboard/X399-SL ... cification. It is fitted with 128GB RAM (but only 100GB is usable) and a 12-core AMD Ryzen processor. The motherboard has three PCIe 3.0 x4 slots for M.2. The max transfer rate is 3.9GB/s for PCIe 3.0 x4, but the motherboard for some reason specifies 3.2GB/s.

The FPGA that generates the data has a PCIe 2.0 x8 connection. It can transfer data at 4GB/s, but since there is overhead on PCIe, I cannot run it at that speed and so use 3GB/s instead. The FPGA is connected to a 1G sample 2-channel 14-bit ADC. I run the ADC at 750M samples or 600M samples per second.

Using only RAM buffering I can run the ADC for about 35 seconds at 750MHz, which is a bit too little. I would want to be able to run it for a few minutes. It would generate 180GB per minute, and 900GB for five minutes at 750MHz.
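
The figures above can be reproduced with a few lines of arithmetic. This is only a sketch: storing each 14-bit sample in 16 bits is an assumption (it is what makes the numbers come out at 3GB/s), and "100GB usable" is taken as 100e9 bytes.

```python
SAMPLE_RATE = 750e6     # samples per second per channel
CHANNELS = 2
BYTES_PER_SAMPLE = 2    # assumption: 14-bit samples padded to 16 bits
USABLE_RAM = 100e9      # bytes, assuming decimal gigabytes

stream_rate = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE  # bytes per second
gb_per_minute = stream_rate * 60 / 1e9
gb_five_minutes = stream_rate * 300 / 1e9
ram_seconds = USABLE_RAM / stream_rate

print(f"stream rate:  {stream_rate / 1e9:.1f} GB/s")
print(f"one minute:   {gb_per_minute:.0f} GB")
print(f"five minutes: {gb_five_minutes:.0f} GB")
print(f"RAM lasts:    {ram_seconds:.1f} s")
```

With decimal gigabytes the RAM buffer lasts about 33 seconds; if the 100GB is really 100 GiB, it comes out closer to 36 seconds, which matches the "about 35 seconds" above.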

I think this M.2 device should work: https://www.samsung.com/us/computing/me ... 7s2t0b-am/. It has 3.3GB/s write speed and 2TB capacity.
Korona wrote:Do you really need an FS in such a situation? I would consider writing directly to a partition. You can then have some kind of index file that is stored on a proper FS on another partition.
It's kind of awkward to have the data on a raw partition. If I have it in a file I can easily copy it to a USB drive or save it elsewhere. Also, using a file I can more easily decide how much data to save, while with a partition, I would need to repartition the drive or use larger sizes than necessary.

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Sun Apr 25, 2021 4:43 pm
by thewrongchristian
rdos wrote: I think this M.2 device should work: https://www.samsung.com/us/computing/me ... 7s2t0b-am/. It has 3.3GB/s write speed and 2TB capacity.
3.3GB/s will be peak speed, and probably not sustainable over the long term. Modern SSDs use some of the FLASH in SLC mode for fast writes, but it's limited in size, perhaps only a few percent of the total FLASH, and used for smaller bursts of writes over a few seconds. After that, it'll degrade to TLC FLASH, with a corresponding reduction in performance.

This cache effect can be seen in synthetic benchmarks such as the Anandtech benchmark suite. And synthetic benchmarks like this represent the best case.

To sustain the performance you need, you might have to consider striping across multiple drives.
rdos wrote:
Korona wrote:Do you really need an FS in such a situation? I would consider writing directly to a partition. You can then have some kind of index file that is stored on a proper FS on another partition.
It's kind of awkward to have the data on a raw partition. If I have it in a file I can easily copy it to a USB drive or save it elsewhere. Also, using a file I can more easily decide how much data to save, while with a partition, I would need to repartition the drive or use larger sizes than necessary.
This whole project sounds quite awkward. Writing to a partition with simple indexing, and wasting a bit of space, is a perfectly valid trade off. To write at full speed, you'd want to preallocate space to the files anyway, as you don't want filesystem meta-data being written in the middle of your data dump.
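
On a POSIX-style host, preallocation could look something like the sketch below. It assumes Python's `os.posix_fallocate` is available (Unix only); the `preallocate` helper and the file name are made up for illustration, and the fallback merely sets the file size, which may leave a sparse file rather than truly allocated blocks.

```python
import os
import tempfile

def preallocate(path, size):
    """Reserve `size` bytes for `path` before streaming starts."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            # Ask the filesystem to allocate the blocks up front, so that no
            # allocation metadata has to be written during the data dump.
            os.posix_fallocate(fd, 0, size)
        except (AttributeError, OSError):
            # Fallback: only sets the file size; blocks may still be
            # allocated lazily (sparse file) on some filesystems.
            os.ftruncate(fd, size)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, "capture.bin")  # hypothetical file name
    preallocate(target, 16 * 1024 * 1024)
    allocated_size = os.path.getsize(target)

print(allocated_size)
```

A custom filesystem would do the equivalent internally: reserve the extents first, then stream into them.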

Just curious, what are you sampling?

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Sun Apr 25, 2021 6:01 pm
by Octocontrabass
rdos wrote:I think this M.2 device should work: [...] It has 3.3GB/s write speed and 2TB capacity.
The drive can only reach 3.3GB/s using its write cache. Once the cache fills up, the write speed goes down to 1.75GB/s. Samsung's own documentation doesn't seem to list the write cache size anywhere, but this site says it's up to 78GB when there's enough unused flash available.

You'll have to spend a lot more money if you want a single SSD that can sustain 3GB/s across its entire capacity.

And don't forget, SSDs have limited endurance as well. The Samsung one you linked is good for 1200TB, which would be about four days of recording time if it were fast enough to keep up.

Striping the data across multiple SSDs would be a much more cost-effective way to keep up, but I have to wonder if you need to capture that much data in the first place. Does your antenna really have that much bandwidth?
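
For reference, the endurance and drive-count numbers work out roughly as follows. This is a back-of-the-envelope sketch using the figures quoted above; the even split of wear across striped drives is an assumption.

```python
import math

STREAM_RATE = 3.0e9       # bytes per second to be recorded
SUSTAINED_WRITE = 1.75e9  # bytes per second per drive once the cache is full
ENDURANCE = 1200e12       # rated bytes written per drive

# Number of drives needed so that their combined sustained rate keeps up.
drives_needed = math.ceil(STREAM_RATE / SUSTAINED_WRITE)

# Recording time until one drive reaches its rated endurance, if it alone
# could absorb the full stream.
single_drive_days = ENDURANCE / STREAM_RATE / 86400

# With striping, each drive only sees its share of the stream (assumed even).
striped_days = single_drive_days * drives_needed

print(drives_needed, round(single_drive_days, 1), round(striped_days, 1))
```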

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Mon Apr 26, 2021 5:27 am
by rdos
Octocontrabass wrote:
rdos wrote:I think this M.2 device should work: [...] It has 3.3GB/s write speed and 2TB capacity.
The drive can only reach 3.3GB/s using its write cache. Once the cache fills up, the write speed goes down to 1.75GB/s. Samsung's own documentation doesn't seem to list the write cache size anywhere, but this site says it's up to 78GB when there's enough unused flash available.

You'll have to spend a lot more money if you want a single SSD that can sustain 3GB/s across its entire capacity.
I suspected something like that. So, with only 1.75GB/s of sustainable write speed, writing the data to two different drives would be required to attain a sustainable speed of 3GB/s. I suppose that is a decent solution. When that is the case, it does make more sense to write the data to raw partitions (fixed sectors) instead. After the data collection stage is done, the data can either be read directly from the two partitions, or it can be merged into a file that is then used for further analysis.
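
A minimal sketch of that two-drive scheme, with in-memory buffers standing in for the raw partitions. The round-robin layout, the stripe size, and the helper names are assumptions for illustration, not a description of an existing driver.

```python
import io

def stripe_write(data, drives, stripe):
    """Distribute consecutive stripe-sized chunks round-robin across the drives."""
    for start in range(0, len(data), stripe):
        drive = drives[(start // stripe) % len(drives)]
        drive.write(data[start:start + stripe])

def merge_read(drives, total, stripe):
    """Reassemble the original byte stream from the striped drives."""
    for drive in drives:
        drive.seek(0)
    out = bytearray()
    index = 0
    while len(out) < total:
        chunk = drives[index % len(drives)].read(min(stripe, total - len(out)))
        if not chunk:
            raise ValueError("striped data is shorter than expected")
        out += chunk
        index += 1
    return bytes(out)

STRIPE = 1024                     # small stripe, just for the demonstration
data = bytes(range(256)) * 41     # 10496 bytes, not a multiple of STRIPE
drives = [io.BytesIO(), io.BytesIO()]

stripe_write(data, drives, STRIPE)
merged = merge_read(drives, len(data), STRIPE)
print(merged == data)
```

The only index information needed to merge later is the total length and the stripe size, which fits the "simple indexing" suggested earlier in the thread.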
Octocontrabass wrote: And don't forget, SSDs have limited endurance as well. The Samsung one you linked is good for 1200TB, which would be about four days of recording time if it were fast enough to keep up.
Sure, but I find that to be a minor problem. My current problem is that I need to do the data collection every time before I can do the analysis and that I can only collect a bit over 30 seconds due to RAM size limitations. I don't anticipate running the data collection for days.
Octocontrabass wrote: Striping the data across multiple SSDs would be a much more cost-effective way to keep up, but I have to wonder if you need to capture that much data in the first place. Does your antenna really have that much bandwidth?
I wish it had larger bandwidth, but in the current setup, I added 300 MHz low-pass filters to avoid even higher frequency radio spectrum data from coming into the ADC in an aliased form. I think sampling 300 MHz data at 750 MHz is a minimum.

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Mon Apr 26, 2021 5:39 am
by rdos
thewrongchristian wrote: This cache effect can be seen in synthetic benchmarks such as the Anandtech benchmark suite. And synthetic benchmarks like this represent the best case.
Interesting site. Still, their benchmark seems to indicate that a two-drive solution should work.
thewrongchristian wrote: This whole project sounds quite awkward. Writing to a partition with simple indexing, and wasting a bit of space, is a perfectly valid trade off. To write at full speed, you'd want to preallocate space to the files anyway, as you don't want filesystem meta-data being written in the middle of your data dump.
Well, if the spec was correct and the drive could sustain 3.3GB/s write speed, there would be a 10% over-capacity, which probably would be enough to handle meta-data as well.

Another issue is that the actual capacity needed depends on how long I record. If I record less than 30 seconds, then there is no requirement at all since everything can be buffered. With a one-minute recording time, about 1.5GB/s would be needed before buffer overrun occurs. As the recording time increases, the requirement gets close to 3GB/s.
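
That relationship can be written down as a small formula. The sketch assumes the RAM buffer holds 30 seconds of data (the round figure used above) and that whatever is left in the buffer is flushed to disk after the recording stops.

```python
STREAM_RATE = 3.0      # GB/s coming from the FPGA
BUFFER_SECONDS = 30.0  # seconds of data that fit in RAM (assumed round figure)

def required_write_rate(record_seconds):
    """Minimum sustained disk rate (GB/s) that avoids a buffer overrun
    during a recording of the given length."""
    if record_seconds <= BUFFER_SECONDS:
        return 0.0  # everything fits in RAM
    # Data produced minus what the buffer can hold, spread over the recording.
    return STREAM_RATE * (1 - BUFFER_SECONDS / record_seconds)

for seconds in (30, 60, 120, 300):
    print(f"{seconds:4d} s -> {required_write_rate(seconds):.2f} GB/s")
```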
thewrongchristian wrote: Just curious, what are you sampling?
Radio spectrum data. I have two broadband antennas that can pick up radio data to a few 100 MHz. The reason I have two is that it allows me to determine direction.

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Mon Apr 26, 2021 6:25 am
by Octocontrabass
rdos wrote:I think sampling 300 MHz data at 750 MHz is a minimum.
This depends on the width of your lowpass filter's transition band. In the audio realm, it's common to use a cheap analog lowpass with a very wide transition band and oversample, then perform a second lowpass with a much narrower transition band on the digital samples to come up with the final sample rate. I'm not sure if this strategy makes sense when you're capturing data 15,000 times as fast as audio, though!
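
To put numbers on the transition band: an input at frequency f between fs/2 and fs folds back to fs - f, so the band of interest stays clean only if the analog filter is already in its stopband at fs minus the passband edge. A small sketch of that arithmetic (helper names are made up; the 48 kHz audio rate is an assumption used only for the speed comparison):

```python
def stopband_edge(sample_rate, passband):
    """Lowest input frequency that aliases back into the passband."""
    return sample_rate - passband

def min_sample_rate(passband, filter_stopband):
    """Lowest sample rate that keeps aliases out of the passband for a
    lowpass filter that is fully attenuating from `filter_stopband` upwards."""
    return passband + filter_stopband

FS = 750e6        # ADC sample rate
PASSBAND = 300e6  # band of interest / lowpass corner

edge = stopband_edge(FS, PASSBAND)
transition_width = edge - PASSBAND
audio_ratio = FS / 48e3  # assumed 48 kHz audio sample rate

print(f"filter must be in its stopband by {edge / 1e6:.0f} MHz")
print(f"allowed transition band: {transition_width / 1e6:.0f} MHz")
print(f"speed relative to 48 kHz audio: {audio_ratio:.0f}x")
```

So at 750 MHz the 300 MHz filters get a 150 MHz transition band; a steeper filter would allow a lower sample rate.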

Re: Streaming data at 3G bytes per second to a file. Possible?

Posted: Mon Apr 26, 2021 6:56 am
by rdos
Octocontrabass wrote:
rdos wrote:I think sampling 300 MHz data at 750 MHz is a minimum.
This depends on the width of your lowpass filter's transition band. In the audio realm, it's common to use a cheap analog lowpass with a very wide transition band and oversample, then perform a second lowpass with a much narrower transition band on the digital samples to come up with the final sample rate. I'm not sure if this strategy makes sense when you're capturing data 15,000 times as fast as audio, though!
It's a bit of a research project, and so I don't know exactly what I'm looking for, only a probable frequency range. If I knew what I was looking for, and it was a single frequency, I could decimate the data early in the sampling process (probably in the FPGA), but as it is now, I want raw data for analysis. The original program in the FPGA is a digital oscilloscope, but it can only be used to get the frequency spectrum at random points in the data and doesn't allow continuous analysis. Access is through a network interface, which is not fast enough to stream raw data. So, I created my own Verilog program that can stream data over the PCIe interface of the FPGA and wrote my own OS-side driver.