The Wayback Machine - https://web.archive.org/web/20201212201453/https://github.com/shenwei356/rush/issues/22

Support reading records via stdin #22

Open
fungs opened this issue Nov 7, 2018 · 7 comments

Comments

@fungs commented Nov 7, 2018

Hi, great tool! I like all of your *kit programs.

One thing I use a lot in GNU parallel is the --pipe option, where the records are divided into blocks and provided to the commands via stdin. This is very useful when a single command works on a large number of records and stdin is better than command-line arguments, which have size restrictions. rush can use an explicit number of records, which I sometimes prefer and which GNU parallel cannot do, because there the block size is defined by (approximate) data size for performance reasons.

Is there any chance this feature makes it into rush (I couldn't find it)?

I'm aware that this somewhat circumvents the whole custom field and parameter assignment part, but maybe you could fit it in smoothly with a Bash-like named-pipe syntax that turns records and fields into virtual files using FIFOs. For instance,

rush -n 1000 'command < <{2}' < records.txt

could provide the second field of records.txt as a file. The syntax should, of course, not clash with common shell syntax; this example is just for illustration purposes.
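As an illustration of the mechanism only (none of this is rush syntax): the virtual-file idea can be emulated today with a plain named pipe and coreutils, extracting field 2 of a tab-separated records file and exposing it as a file:

```shell
# Emulate the proposed "field as virtual file" idea with a named pipe
# (FIFO). This is NOT rush syntax, just a sketch of the mechanism a
# hypothetical <{2} placeholder could set up behind the scenes.
printf 'a\t1\nb\t2\nc\t3\n' > records.txt
mkfifo field2.fifo
cut -f2 records.txt > field2.fifo &   # writer: stream field 2 into the FIFO
cat field2.fifo > field2.out          # reader: consume the FIFO like a file
wait
rm field2.fifo
```

A rush-side implementation could presumably create one such FIFO per placeholder and per command invocation, then clean it up afterwards.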

Best,
Johannes

@shenwei356 (Owner) commented Nov 7, 2018

Sorry Johannes, I'm a little confused about what you'd like to do 😿. Could you please give a more specific example?

@fungs (Author) commented Nov 7, 2018

I would like to feed groups of records to the commands via standard input, not via command-line parameters.

@shenwei356 (Owner) commented Nov 7, 2018

Here's a simple example. But there's a length limit when passing records to stdin via echo.

$ seq 5 | rush -n 2 -k 'echo "{}" | cat ; echo'
1
2

3
4

5

@fungs (Author) commented Nov 7, 2018

The difference in your example is that echo does not read via standard input; the records are still passed to it as command-line arguments.

A specific example, yes :) ...

Consider downloading 100 million gene sequences by accession: you want to spawn, say, 6 downloaders, give each one blocks of 10k accessions to download, and have them write the sequences to standard output. Here one command gets 10k records, and trying to provide those as command-line parameters will likely not work (if it does, add zeroes until it doesn't). Smaller blocks would hammer the server.

@shenwei356 (Owner) commented Nov 7, 2018

I see. rush can't do that.

But using echo {} to feed one record to the command via stdin each time seems OK to me; the only drawback is that you have to spawn n commands in total. This may reduce performance if the command is costly to start up.

Anyway, you can split the records into multiple blocks and feed them to the commands as you said.
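That workaround can be sketched with split plus stdin redirection inside the command string. Here `fetch_seqs` is a hypothetical downloader that reads one accession per line on stdin, and `seq` stands in for a real accession list:

```shell
# Split the records into fixed-size blocks on disk, then let rush run
# one command per block file, redirecting each block into the command's
# stdin. `fetch_seqs` is a hypothetical downloader reading accessions
# from stdin; `seq` stands in for a real accession list.
seq 100000 > accessions.txt
split -l 10000 accessions.txt block_     # 10 files: block_aa ... block_aj
ls block_* | rush -j 6 'fetch_seqs < {}' > sequences.fa
```

The downside compared to a built-in --pipe mode is the temporary block files on disk, which you have to clean up afterwards.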

@fungs (Author) commented Nov 7, 2018

Workarounds are possible; I guess this is a convenience feature request. It is just very convenient (and very useful with large data) to feed information via a pipe rather than via command-line options. There are many examples of using --pipe in GNU parallel.

@kenneth-Q commented Mar 17, 2019

Oh, I have this problem too.
You may try cat random.img | parallel --pipe --recend '' -k bzip2 --best > randomcompressed.img.bz2, but I cannot find a --pipe-like function in rush. This would be useful for me.
How about you?
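For what it's worth, this particular pipeline can be approximated without --pipe, because bzip2 streams can be concatenated and still decompress as one file. A sketch with sequential compression (the per-chunk step is what a --pipe-style rush feature could parallelize):

```shell
# Approximate `parallel --pipe --recend '' -k bzip2 --best` by splitting
# the input into byte chunks, compressing each chunk independently, and
# concatenating the resulting bzip2 streams (multi-stream .bz2 files
# decompress as a single file).
head -c 1048576 /dev/urandom > random.img        # 1 MiB of test data
split -b 262144 random.img chunk_                # 4 chunks of 256 KiB
for c in chunk_*; do bzip2 --best < "$c"; done > randomcompressed.img.bz2
bzip2 -dc randomcompressed.img.bz2 > roundtrip.img
cmp random.img roundtrip.img                     # identical after round trip
```

Note that independent chunks compress slightly worse than one big stream, which is the same trade-off parallel --pipe makes.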
