I have a dataset representing a directed graph. The first column is the source node, the second column is the target node, and the third column (essentially a weight) can be ignored. Example:
```
0 1 3
0 13 1
0 37 1
0 51 1
0 438481 1
1 0 3
1 4 354
1 10 2602
1 11 2689
1 12 1
1 18 345
1 19 311
1 23 1
1 24 366
...
```
What I want is to append the out-degree of each node as a new column. For example, if I appended the out-degree of node 0, I would have:

```
0 1 3 5
0 13 1 5
0 37 1 5
0 51 1 5
0 438481 1 5
1 0 3 ...
```
I have code that does this, but it is extremely slow because it uses a for loop:

```python
import numpy as np

def save_degrees(x):
    new_col = np.zeros(x.shape[0], dtype=np.int64)
    x = np.column_stack((x, new_col))
    node_ids, degrees = np.unique(x[:, 0], return_counts=True)
    # Slow part: one boolean scan over the whole array per unique node.
    for node_id, deg in zip(node_ids, degrees):
        indices = x[:, 0] == node_id
        x[:, -1][indices] = deg
    return x

train_x = np.load('data/train_x.npy')
train_x = save_degrees(train_x)
np.save('data/train_x_degrees.npy', train_x)
```
Is there a more efficient way to build this data structure?
You can use numpy.unique. Suppose the input data is in the array data:
```python
In [245]: data
Out[245]:
array([[     0,      1,      3],
       [     0,     13,      1],
       [     0,     37,      1],
       [     0,     51,      1],
       [     0, 438481,      1],
       [     1,      0,      3],
       [     1,      4,    354],
       [     1,     10,   2602],
       [     1,     11,   2689],
       [     1,     12,      1],
       [     1,     18,    345],
       [     1,     19,    311],
       [     1,     23,      1],
       [     1,     24,    366],
       [     2,     10,      1],
       [     2,     13,      3],
       [     2,     99,      5],
       [     3,     25,     13],
       [     3,     99,     15]])
```
Find the unique values in the first column, along with the "inverse" array and the count of occurrences of each unique value:
```python
In [246]: nodes, inv, counts = np.unique(data[:, 0], return_inverse=True, return_counts=True)
```
Your new column of out-degrees is then counts[inv]:
```python
In [247]: out_degrees = counts[inv]

In [248]: out_degrees
Out[248]: array([5, 5, 5, 5, 5, 9, 9, 9, 9, 9, 9, 9, 9, 9, 3, 3, 3, 2, 2])
```
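Putting the two steps together, here is a sketch of a vectorized drop-in replacement for the question's save_degrees (the small data array below is just an illustration, not the question's full dataset):

```python
import numpy as np

def save_degrees(x):
    # Count occurrences of each source node, then broadcast the counts
    # back onto the rows via the inverse index -- no Python-level loop.
    _, inv, counts = np.unique(x[:, 0], return_inverse=True, return_counts=True)
    return np.column_stack((x, counts[inv]))

data = np.array([[0,  1,   3],
                 [0, 13,   1],
                 [1,  0,   3],
                 [1,  4, 354],
                 [1, 12,   1]])
print(save_degrees(data))  # last column is [2, 2, 3, 3, 3]
```

This does one sort-based pass over the first column instead of one boolean scan per unique node, so it scales much better when there are many distinct source nodes.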
This assumes that a given (source_node, target_node) pair does not occur more than once in the data array.