Support reading records via stdin #22
Comments
Sorry Johannes, I'm a little confused about what you'd like to do.
I would like to feed groups of records to the commands via standard input, not via command line parameters.
Here's a simple example. But there's a length limit when passing them to stdin this way.
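The code block that originally accompanied this comment is missing from this archived copy; judging from the reply below, it showed echo forwarding the substituted records to the consumer's stdin, roughly along these lines (records.txt and downloader are hypothetical names, and rush's -n flag is assumed for grouping records):

```sh
# Group records, substitute them into {}, and let echo forward them to the
# consumer's stdin -- note they are still echo's command-line arguments first.
cat records.txt | rush -n 10000 'echo {} | downloader'
```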
The difference in your example is that echo does not read via standard input. A specific example, yeah :) ... Consider downloading 100 million gene sequences by accession: you want to spawn, say, 6 downloaders and give each of them blocks of 10k accessions to download and spit out on standard output. Here, one command gets 10k records; trying to provide that as a command line parameter will likely not work (if it does, add zeroes until it doesn't), and smaller blocks will hammer the server.
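For comparison, a sketch of this scenario expressed with GNU parallel's --pipe, which is the behaviour being asked for (download_by_accession is a hypothetical program that reads accessions on stdin and writes sequences to stdout):

```sh
# --pipe splits stdin into blocks and feeds each block to a job's stdin;
# -j 6 keeps six downloaders running, --block controls the (approximate) block size.
cat accessions.txt \
  | parallel -j 6 --pipe --block 1M download_by_accession \
  > sequences.fa
```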
I see. rush can't do that. Anyway, you can split the records into multiple blocks and feed them to the commands yourself, as you said.
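A sketch of that manual workaround, assuming GNU coreutils split and hypothetical file and command names:

```sh
# Split the records into chunks of 10k lines, then run one command per chunk,
# with cat feeding each chunk to the command's standard input.
split -l 10000 accessions.txt chunk_
ls chunk_* | rush 'cat {} | downloader > {}.out'
```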
Workarounds are possible, so I guess this is a convenience feature request. It is just very convenient (and very useful with large data) to feed information via a pipe rather than via command line options. There are many examples of using '--pipe' in GNU parallel.
Oh, I have this problem too.


Hi, great tool! I like all of your *kit programs.
One thing I use a lot in GNU parallel is the --pipe option, where the records are divided into blocks and provided to the commands via stdin. This is very useful when single commands work on a large number of records and stdin is better than command line arguments with their size restrictions. rush can use an explicit number of records, which I sometimes prefer and which GNU parallel cannot do, because there the block size is defined by (approximate) data size for performance reasons.
Is there any chance this feature makes it into rush (I couldn't find it)?
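A side-by-side sketch of the two behaviours described above ('process' is a hypothetical record-consuming program):

```sh
# GNU parallel --pipe: blocks of records arrive on each command's stdin,
# block size chosen by (approximate) data size:
cat records.txt | parallel --pipe --block 10M process

# rush: an explicit number of records per command (assuming the -n flag),
# but substituted into the command line via {} rather than fed to stdin:
cat records.txt | rush -n 1000 'process {}'
```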
I'm aware that this kind of circumvents the whole custom field and parameter assignment part, but maybe you can fit it in smoothly by using a Bash-like named pipe syntax that turns records and fields into virtual files via fifos. For instance, something that could provide the second field of records.txt as a file. The syntax should, of course, not clash with common shell syntax; this example is just for illustration purposes.
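The underlying bash mechanism already exists as process substitution, which exposes a command's output as a virtual file (a fifo or /dev/fd entry); a sketch of the idea, with some_tool as a hypothetical program that expects a file name:

```sh
# <(...) hands some_tool a virtual file containing only the second
# (tab-separated) field of records.txt.
some_tool --ids <(cut -f 2 records.txt)
```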
Best,
Johannes