BFS

# BFS

## Generate Edge List Using an R-MAT Generator

``` ./rmat_edge_generator/generate_edge_list

-o [out edge list file name (required)]

-s [seed for random number generator; default is 123]

-v [SCALE; The logarithm base two of the number of vertices; default is 17]

-e [#of edges; default is 2^{17} x 16]

-a [initiator parameter A; default is 0.57]

-b [initiator parameter B; default is 0.19]

-c [initiator parameter C; default is 0.19]

-r [if true, scrambles edge IDs; default is true]

-u [if true, generates edges for both directions; default is true] ```

### Generate Graph 500 Inputs

`bash ./rmat_edge_generator/generate_edge_list -o /mnt/ssd/edge_list -v 20 -e $((2**20*16)) ``

  • This command generates a edge list file (/mnt/ssd/edge_list) which contains the edges of a SCALE 20 graph.

  • In Graph 500, the number of edges of a graph is #vertices x 16 (16 is called ‘edge factor’) as an undirected graph.

  • Note that #edges generated by this generator is #vertices x 16 x 2 if -u option (generates edges for both directions) is true, which is default.

  • This is a multi-threads (OpenMP) program. You can control the number of threads using the environment variable OMP_NUM_THREADS.

## Ingest Edge List (construct CSR graph)

`bash ./ingest_edge_list -g /mnt/ssd/csr_graph_file /mnt/ssd/edge_list1 /mnt/ssd/edge_list2 `

  • Load edge data from files /mnt/ssd/edge_list1 and /mnt/ssd/edge_list2 (you can specify an arbitrary number of files). A CSR graph is constructed in /mnt/ssd/csr_graph_file.

  • Each line of input files must be a pair of source and destination vertex IDs (unsigned 64bit number).

  • This program treats inputs as a directed graph, that is, it does not ingest edges for both directions.

  • This is a multi-threads (OpenMP) program. You can control the number of threads using the environment variable OMP_NUM_THREADS.

  • As for real-world datasets, [SNAP Datasets](http://snap.stanford.edu/data/index.html) is popular in the graph processing community. Please note that some datasets in SNAP are a little different. For example, the first line is a comment; you have to delete the line before running this program.

## Run BFS

`bash ./run_bfs -n [#of vertices] -m [#of edges] -g [/path/to/graph_file] -s `

  • You can get #of vertices and #of edges by running ingest_edge_list.

  • If ‘-s’ is specified, the program uses system mmap instead of umap.

  • The interface to the umap runtime library configuration is controlled by environment variables, see [Umap Runtime Environment Variables](https://llnl-umap.readthedocs.io/en/develop/environment_variables.html).

  • This is a multi-threads (OpenMP) program. You can control the number of threads using the environment variable OMP_NUM_THREADS. It uses the static schedule.

## Tips for Running Benchmark (on large-scale) * The size of generated edge lists could be larger than the constructed CSR graph by a few times. As the rmat_edge_generator writes edges to files sequentially, you should be able to directly generate edge lists to a parallel file systems without an unreasonable execution time. * On the other hand, ingest_edge_list constructs a CSR graph causing a lot of random writes to a file mapped region (the location of the file is specified by -g option). We highly recommend that you should construct a graph on DRAM space, e.g., tmpfs, although you can still read input edge list files from a parallel file system.

### Example Run BFS on a SCALE 20 R-MAT graph using 4 threads and system mmap. `bash env OMP_NUM_THREADS=4 ./rmat_edge_generator/generate_edge_list -o /mnt/ssd/edge_list -v 20 -e $((2**20*16)) env OMP_NUM_THREADS=4 ./ingest_edge_list -g /dev/shm/csr_graph_file /mnt/ssd/edge_list* mv /dev/shm/csr_graph_file /mnt/ssd/csr_graph_file sudo sync sudo echo 3 > /proc/sys/vm/drop_caches # drop page cache env OMP_NUM_THREADS=4 ./run_bfs -n $((2**20)) -m $((2**20*16*2)) -g /mnt/ssd/csr_graph_file -s `