Community Detection  Active Semi-supervised Community Detection based on Pair-wise Constraints 程 建 军 Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 1 of 9 兰 州 大 学 信息科学与工程学院 Go Back Full Screen Close Quit chengjianjun@lzu.edu.cn  Contents  Contents Concepts about Community Detection Concepts about . . . Semi-supervised learning  must-link constraints . . . Semi-supervised learning Active learning Proposed algorithms  Must-link and Cannot-link  Active Learning  Proposed algorithms Home Page Title Page JJ J II I Page 2 of 9 Go Back Full Screen Close Quit  Concepts about community detection  Complex network  A complex network is an undirected and unweighted graph G = (V ; E ), where V represents the node set, and E represents the edges set. Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 3 of 9 Go Back Full Screen Close Quit  Concepts about community detection  Complex network   A complex network is an undirected and unweighted graph G = (V ; E ), where V represents the node set, and E represents the edges set. The node can be a people in the social network, a web page in the internet web graph, a paper in the citation network, etc., and each edge in the network can represent the interaction or relationship between two people, can be a hyperlink between the web pages, a reference relationship between the papers, etc. Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 3 of 9 Go Back Full Screen Close Quit  Concepts about community detection  Complex network    A complex network is an undirected and unweighted graph G = (V ; E ), where V represents the node set, and E represents the edges set. The node can be a people in the social network, a web page in the internet web graph, a paper in the citation network, etc., and each edge in the network can represent the interaction or relationship between two people, can be a hyperlink between the web pages, a reference relationship between the papers, etc. A common property or characteristic of the complex network is its “Community Structure”. Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 3 of 9 Go Back Full Screen Close Quit  Concepts about community detection  Complex network     A complex network is an undirected and unweighted graph G = (V ; E ), where V represents the node set, and E represents the edges set. The node can be a people in the social network, a web page in the internet web graph, a paper in the citation network, etc., and each edge in the network can represent the interaction or relationship between two people, can be a hyperlink between the web pages, a reference relationship between the papers, etc. A common property or characteristic of the complex network is its “Community Structure”. A Community is a group of densely connected nodes in the network, such that the nodes within a group are much more connected to each other than to the rest of the network. Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 3 of 9 Go Back Full Screen Close Quit  Concepts about community detection  Complex network      A complex network is an undirected and unweighted graph G = (V ; E ), where V represents the node set, and E represents the edges set. The node can be a people in the social network, a web page in the internet web graph, a paper in the citation network, etc., and each edge in the network can represent the interaction or relationship between two people, can be a hyperlink between the web pages, a reference relationship between the papers, etc. A common property or characteristic of the complex network is its “Community Structure”. A Community is a group of densely connected nodes in the network, such that the nodes within a group are much more connected to each other than to the rest of the network. Communities are of interest because they often correspond to functional units of the networks, and community detection may shed light on the structural and functional characteristics of the network. Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 3 of 9 Go Back Full Screen Close Quit  Semi-supervised learning  Classification: supervised learning Contents Concepts about . . . Semi-supervised learning  must-link constraints . . . Clustering: unsupervised learning Active learning Proposed algorithms  Semi-supervised learning: supervised clustering semi-supervised classification, semiHome Page Title Page  Two kinds of semi-supervised components: labeled data and must-link constraints, cannot-link constraints JJ J II I Page 4 of 9 Go Back Full Screen Close Quit  must-link constraints and cannot-link constraints   8 2 2 Must-link constraints CML : vi ; vj V , (vi ; vj ) CML indicates the two nodes vi and vj must belong to the same community. 8 2 2 Cannot-link constraints CCL : vi ; vj V , (vi ; vj ) CCL indicates the two nodes vi and vj cannot be classified into the same community, and they must be allocated into different communities. Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 5 of 9 Go Back Full Screen Close Quit  Active learning  A strategy to select those data examples actively to annotate, such that the learner could achieve higher performance as compared with random selection Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms  its goal is to maximize the learner’s ability at a minimum cost  pool-based and stream-based Home Page Title Page  the most informative data or the data with least certainty are selected to annotate by an oracle JJ J II I Page 6 of 9 Go Back Full Screen Close Quit  Proposed algorithms  Semi-supervised community detection algorithm based on the mustlink constraints and cannot-link constraints  Augment the must-link constraint set and Cannot-link constraint set using the transitive property of must-link constraints Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms  Construct the skeleton of the initial community structure from cannot-link constraints and must-link constraints Home Page Title Page  Expand communities by greedily select the (community,unclassified node) pair from all the pairs, and insert the selected unclassified node into the corresponding community iteratively JJ J II I Page 7 of 9 Go Back Full Screen Close Quit  Random walk based algorithm to compute the similarity matrix  A walker is placed in the network, at each time step, the walker is on a vertex and randomly jumps from the vertex to one of its neighbors following the edge between them Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 8 of 9 Go Back Full Screen Close Quit  Random walk based algorithm to compute the similarity matrix   A walker is placed in the network, at each time step, the walker is on a vertex and randomly jumps from the vertex to one of its neighbors following the edge between them The walker tends to be trapped in a group of densely connected vertices corresponding to a community rather than walk across communities boundaries within limited number of steps Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 8 of 9 Go Back Full Screen Close Quit  Random walk based algorithm to compute the similarity matrix    A walker is placed in the network, at each time step, the walker is on a vertex and randomly jumps from the vertex to one of its neighbors following the edge between them The walker tends to be trapped in a group of densely connected vertices corresponding to a community rather than walk across communities boundaries within limited number of steps Can be used to compute the similarity matrix: in each walk, we keep track of the visited nodes in a se, the possibility that these nodes in the set belong in the same community is high, so the similarity values between each pair of the nodes in the set should be increased Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 8 of 9 Go Back Full Screen Close Quit  Random walk based algorithm to compute the similarity matrix     A walker is placed in the network, at each time step, the walker is on a vertex and randomly jumps from the vertex to one of its neighbors following the edge between them The walker tends to be trapped in a group of densely connected vertices corresponding to a community rather than walk across communities boundaries within limited number of steps Can be used to compute the similarity matrix: in each walk, we keep track of the visited nodes in a se, the possibility that these nodes in the set belong in the same community is high, so the similarity values between each pair of the nodes in the set should be increased The algorithm in literatures increased the similarity values between all pairs of the nodes in the set with an unique increment, i.e., it treated all the nodes in the set equally, without considering the order of nodes been visited during the walk Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 8 of 9 Go Back Full Screen Close Quit  Random walk based algorithm to compute the similarity matrix      A walker is placed in the network, at each time step, the walker is on a vertex and randomly jumps from the vertex to one of its neighbors following the edge between them The walker tends to be trapped in a group of densely connected vertices corresponding to a community rather than walk across communities boundaries within limited number of steps Can be used to compute the similarity matrix: in each walk, we keep track of the visited nodes in a se, the possibility that these nodes in the set belong in the same community is high, so the similarity values between each pair of the nodes in the set should be increased The algorithm in literatures increased the similarity values between all pairs of the nodes in the set with an unique increment, i.e., it treated all the nodes in the set equally, without considering the order of nodes been visited during the walk The node sequence have been visited in each walk forms a path from the start node to the end node, intuitively, on that path, the start node is more similar with the nodes been reached earlier than with the nodes been visited later, i.e., the similarities between nodes have some relationships with the order of nodes been visited Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 8 of 9 Go Back Full Screen Close Quit  Random walk based algorithm to compute the similarity matrix       A walker is placed in the network, at each time step, the walker is on a vertex and randomly jumps from the vertex to one of its neighbors following the edge between them The walker tends to be trapped in a group of densely connected vertices corresponding to a community rather than walk across communities boundaries within limited number of steps Can be used to compute the similarity matrix: in each walk, we keep track of the visited nodes in a se, the possibility that these nodes in the set belong in the same community is high, so the similarity values between each pair of the nodes in the set should be increased The algorithm in literatures increased the similarity values between all pairs of the nodes in the set with an unique increment, i.e., it treated all the nodes in the set equally, without considering the order of nodes been visited during the walk The node sequence have been visited in each walk forms a path from the start node to the end node, intuitively, on that path, the start node is more similar with the nodes been reached earlier than with the nodes been visited later, i.e., the similarities between nodes have some relationships with the order of nodes been visited We alter the similarity computation strategy of this method, add the consideration of nodes’ order been visited during the random walk Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 8 of 9 Go Back Full Screen Close Quit  Active learning strategy to generate the must-link constraints and cannot-link constraints  Every node compete with all its neighbors to be the candidate of the selection according to their degrees: v V, u arg maxn2N (v) (degree(n)); if degree(u) > degree(v ), then the node v votes for the node u, otherwise, the node v vote for itself 8 2 Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 9 of 9 Go Back Full Screen Close Quit  Active learning strategy to generate the must-link constraints and cannot-link constraints  Every node compete with all its neighbors to be the candidate of the selection according to their degrees: v V, u arg maxn2N (v) (degree(n)); if degree(u) > degree(v ), then the node v votes for the node u, otherwise, the node v vote for itself all the nodes with larger degree in local are extracted into the candiate set SC of the selection from the network 8 2  Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 9 of 9 Go Back Full Screen Close Quit  Active learning strategy to generate the must-link constraints and cannot-link constraints  Every node compete with all its neighbors to be the candidate of the selection according to their degrees: v V, u arg maxn2N (v) (degree(n)); if degree(u) > degree(v ), then the node v votes for the node u, otherwise, the node v vote for itself all the nodes with larger degree in local are extracted into the candiate set SC of the selection from the network Partition the candidate set SC into some clusters: every node in SC is taken as a cluster initially, and for every two nodes u and v in SC , if the number of their shared neighbors is larger than half of the degree of either of them, we merge the two clusters in which u and v belongs respectively into one; this merge operation is repeated until some termination criteria are met 8 2   Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 9 of 9 Go Back Full Screen Close Quit  Active learning strategy to generate the must-link constraints and cannot-link constraints  Every node compete with all its neighbors to be the candidate of the selection according to their degrees: v V, u arg maxn2N (v) (degree(n)); if degree(u) > degree(v ), then the node v votes for the node u, otherwise, the node v vote for itself all the nodes with larger degree in local are extracted into the candiate set SC of the selection from the network Partition the candidate set SC into some clusters: every node in SC is taken as a cluster initially, and for every two nodes u and v in SC , if the number of their shared neighbors is larger than half of the degree of either of them, we merge the two clusters in which u and v belongs respectively into one; this merge operation is repeated until some termination criteria are met From every cluster, the node with the largest degree and the node(s) with edge(s) connected with node(s) in other cluster(s) are selected into a set S 8 2    Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 9 of 9 Go Back Full Screen Close Quit  Active learning strategy to generate the must-link constraints and cannot-link constraints  Every node compete with all its neighbors to be the candidate of the selection according to their degrees: v V, u arg maxn2N (v) (degree(n)); if degree(u) > degree(v ), then the node v votes for the node u, otherwise, the node v vote for itself all the nodes with larger degree in local are extracted into the candiate set SC of the selection from the network Partition the candidate set SC into some clusters: every node in SC is taken as a cluster initially, and for every two nodes u and v in SC , if the number of their shared neighbors is larger than half of the degree of either of them, we merge the two clusters in which u and v belongs respectively into one; this merge operation is repeated until some termination criteria are met From every cluster, the node with the largest degree and the node(s) with edge(s) connected with node(s) in other cluster(s) are selected into a set S For each pair of nodes in S , we access a noiseless oracle to query the relationship between them 8 2     Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 9 of 9 Go Back Full Screen Close Quit  Active learning strategy to generate the must-link constraints and cannot-link constraints  Every node compete with all its neighbors to be the candidate of the selection according to their degrees: v V, u arg maxn2N (v) (degree(n)); if degree(u) > degree(v ), then the node v votes for the node u, otherwise, the node v vote for itself all the nodes with larger degree in local are extracted into the candiate set SC of the selection from the network Partition the candidate set SC into some clusters: every node in SC is taken as a cluster initially, and for every two nodes u and v in SC , if the number of their shared neighbors is larger than half of the degree of either of them, we merge the two clusters in which u and v belongs respectively into one; this merge operation is repeated until some termination criteria are met From every cluster, the node with the largest degree and the node(s) with edge(s) connected with node(s) in other cluster(s) are selected into a set S For each pair of nodes in S , we access a noiseless oracle to query the relationship between them 8 2     Contents Concepts about . . . Semi-supervised learning must-link constraints . . . Active learning Proposed algorithms Home Page Title Page JJ J II I Page 9 of 9 Go Back Full Screen Close Quit The most informative nodes and the nodes with least uncertainty are selected to access a noiseless oracle to query the relationship between them