unix - Mass grep, but order the results by the first input file
I have a file with one column of identifiers (call it fileA), including duplicates. It looks like this:

GO:0005515
GO:0005737
GO:0005875
GO:0005884
GO:0005200
GO:0005524
GO:0005737
...
I have a second file (call it fileB) with two columns: the identifier in the first column and the associated description in the second. It looks like this:

GO:0000001 mitochondrion inheritance
GO:0000002 mitochondrial genome maintenance
GO:0000003 reproduction
GO:0000006 high-affinity zinc uptake transmembrane transporter activity
GO:0000007 low-affinity zinc ion transmembrane transporter activity
GO:0000009 alpha-1,6-mannosyltransferase activity
GO:0000010 trans-hexaprenyltranstransferase activity
GO:0000011 vacuole inheritance
...
For each identifier in fileA, I want to grep fileB for the matching line (identifier plus description) and write it to a third file, fileC, keeping the order of fileA. fileA contains duplicates; fileB does not.
I've tried a few different things:
My first attempts did not work: fileC came out in the order of fileB, not fileA. Then I tried this loop:
for name in `cat fileA`; do grep "$name" fileB >> fileC; done
I expected this to work, but the output is:
GO:0005515 protein binding
GO:0005737 cytoplasm
GO:0005737 cytoplasm
GO:0005737 cytoplasm
GO:0005737 cytoplasm
GO:0005737 cytoplasm
GO:0016301 kinase activity
GO:0005525 GTP binding
GO:0005737 cytoplasm
GO:0016021 integral to membrane
...
The lines are not in the order of fileA (except for the first two).
Any thoughts?
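If you'd rather keep the loop from the question, here is a minimal sketch (an editorial variant, not the asker's code) that reads fileA line by line and prints only the fileB lines whose first column equals the identifier, so a match elsewhere in a line cannot pull in extra lines. The output follows fileA's order by construction. It assumes the identifiers contain no regex metacharacters (true for GO:NNNNNNN IDs) and recreates the small fa/fb test data from the answer in a temporary directory under the question's file names. Because it rereads fileB once per identifier, it will be much slower than the awk answer on large files.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory; the sample data is the answer's test set.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' GO:0005515 GO:0005737 GO:0005875 GO:0005884 GO:0005200 GO:0005524 GO:0005737 > fileA
printf '%s\n' 'GO:0005875 #3' 'GO:0005515 #1' 'GO:0005737 #2' 'GO:0005884 #4' > fileB

: > fileC   # start with an empty output file
while IFS= read -r id; do
    [ -n "$id" ] || continue   # skip blank lines in fileA
    # Anchor the ID at the start of the line and require whitespace after it,
    # so only the first column can match. grep exits 1 on no match, which
    # would stop the script under set -e, hence "|| true".
    grep -E "^${id}[[:space:]]" fileB >> fileC || true
done < fileA

cat fileC
```

Identifiers missing from fileB (GO:0005200 and GO:0005524 here) simply produce no output, and the duplicate GO:0005737 is printed twice.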
Give this awk one-liner a try; its output follows the order of fileA:
awk 'NR==FNR{b[$1]=$0;next}$1 in b{print b[$1]}' fileB fileA

If the two columns in fileB are separated by a tab, add -F'\t' to awk:
awk -F'\t' 'NR==FNR...'

A quick test:
kent$ head fa fb
==> fa <==
GO:0005515
GO:0005737
GO:0005875
GO:0005884
GO:0005200
GO:0005524
GO:0005737

==> fb <==
GO:0005875 #3
GO:0005515 #1
GO:0005737 #2
GO:0005884 #4

kent$ awk 'NR==FNR{b[$1]=$0;next}$1 in b{print b[$1]}' fb fa
GO:0005515 #1
GO:0005737 #2
GO:0005875 #3
GO:0005884 #4
GO:0005737 #2

As you can see, the output keeps the duplicates, and the identifiers follow the order in fileA (fa).
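Here is the same one-liner spread over several lines with comments, as a self-contained sketch. It recreates the fa/fb test files in a temporary directory (the directory and the fileC output name are just for illustration) and writes the result to fileC.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' GO:0005515 GO:0005737 GO:0005875 GO:0005884 GO:0005200 GO:0005524 GO:0005737 > fa
printf '%s\n' 'GO:0005875 #3' 'GO:0005515 #1' 'GO:0005737 #2' 'GO:0005884 #4' > fb

awk '
    # NR (records read so far, all files) equals FNR (records read from the
    # current file) only while reading the first file argument, i.e. fb
    # (assuming fb is not empty).
    NR == FNR {
        b[$1] = $0   # remember the whole fb line, keyed by its identifier
        next         # skip the remaining rule for fb lines
    }
    # From here on awk is reading fa, in the order of its lines.
    # Print the stored fb line when the identifier is known; duplicates in fa
    # are printed again, identifiers missing from fb print nothing.
    $1 in b { print b[$1] }
' fb fa > fileC

cat fileC
```

With awk's default field splitting, $1 is still the identifier even when fileB is tab-separated, because the GO IDs contain no spaces; -F'\t' matters only if a first-column value could contain spaces.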