Abstract:De Bruijn graph is a vastly used technique for developing genome assembly software nowadays. The scale of this kind of graph can reach billions of vertices and edges, posing great challenges to the genome assembly task. It is of great importance to study scalable genome assembly algorithms in order to cope with this situation. Despite some recent works and therefore begin to address the scalability problem with parallel assembly algorithms, massive De Bruijn graph processing is still very time consuming which needs optimized operations. This paper aims to significantly improve the efficiency of massive De Bruijn graph processing. Specifically, focus is placed on the time consuming and memory intensive processing in the construction phase and the simplification phase of De Bruijn graph. The study observes observe that the existing list ranking approach repeatedly performs parallel global sorting over all De Bruijin graph vertices, which results in a huge amount of communications between computing nodes. It therefore proposes to use depth-first traversal over the underlying De Bruijn graph once to achieve the same objective as the existing list ranking approach. The new method is fast, effective and can be executed in parallel. It has a computing complexity of O(g/p) and communication complexity of O(g), where g is the length of genome reference, p is the number of processors, which is smaller than the existing list ranking approach, Experimental results using error-free data show that, when the number of CPUs scales from 8 to 512, our algorithm has a speedup of 13 and 10 times on processing the data sets of E.coli and Yeast respectively; and when the number of CPUs scales from 32 to 512, the algorithm has a speedup of 7 and 10 times on processing the data sets of C.elegans and chr1 respectively.