Computational Network Science
An Algorithmic Approach

Henry Hexmoor

AMSTERDAM • BOSTON • HEIDELBERG
LONDON • NEW YORK • OXFORD
PARIS • SAN DIEGO • SAN FRANCISCO
SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an Imprint of Elsevier


Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA
Copyright © 2015 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher. Details on how to seek
permission, further information about the Publisher’s permissions policies and our arrangements
with organizations such as the Copyright Clearance Center and the Copyright Licensing
Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the
Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience
broaden our understanding, changes in research methods, professional practices, or medical treatment
may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in evaluating
and using any information, methods, compounds, or experiments described herein. In using such
information or methods they should be mindful of their own safety and the safety of others, including
parties for whom they have a professional responsibility.


To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume
any liability for any injury and/or damage to persons or property as a matter of products liability,
negligence or otherwise, or from any use or operation of any methods, products, instructions, or
ideas contained in the material herein.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress.
ISBN: 978-0-12-800891-1
For information on all MK publications
visit our website at


PREFACE
The days of the need for gurus and extensive libraries are behind us. The
Internet provides ready and rapid access to knowledge for all. This book
offers necessary and sufficient descriptions of salient knowledge that
have been tested in traditional classrooms. The book weaves foundations together from disparate disciplines including mathematical sociology, economics, game theory, political science, and biological networks.
Network science is a new discipline that explores phenomena common to connected populations across the natural and man-made world.
From animals to commodity trades, networks provide relationships
among individuals and groups. Analysis and leveraging connections
provide insights and tools for persuasion. Studies in this area have largely focused on opinion attributes. The impetus for this book is a need to
examine computational processes for automating tedious analyses and
usage of network information for online migration. Once online, network awareness will contribute to improved public safety and superior
services for all.
A collection of foundational notions for economic and social networks is available in Jackson (2008). A mathematical treatment of
generic networks is present in Easley and Kleinberg (2010). A complementary gap filled by this book is an algorithmic approach. I provide a
fast-paced introduction to the state of the art in network science. References are offered to seminal and contemporary developments. The book
uses mathematical cogency and contemporary computational insights.
It also issues a call to arms for further research on open problems.

The reader will find a broad treatment of network science and review
of key recent phenomena. Senior undergraduates and professional people in computational disciplines will find sufficient methodologies and
processes for implementation and experimentation. This book can also
be used as a teaching material for courses on social media and network
analysis, computational social networks, and network theory and applications. Our coverage of social network analysis is limited and details
are available in Golbeck (2013) and Borgatti et al. (2013).



Whereas a teacher is a tour guide to the subject matter, this book is
a reference manual. Chapters in each part are related and they progress
in maturity. Chapters are semi-independent and a course instructor may
choose any order that meets the course objectives. Exercises at the end
of each chapter are students’ hands-on projects that are designed for
covering learning activities during a semester. Some code is provided
in appendices for prototyping and learning purposes only. We do not
provide a how-to guide to mainstream social media or codebook for
application development that is available elsewhere.
Henry Hexmoor
Carbondale, IL
2014

REFERENCES
Borgatti, S., Everett, M., Johnson, J., 2013. Analyzing Social Networks. SAGE Publications.
Easley, D., Kleinberg, J., 2010. Networks, Crowds, and Markets. Cambridge University Press.
Golbeck, J., 2013. Analyzing the Social Web. Morgan Kaufmann Publications.
Jackson, M., 2008. Social and Economic Networks. Princeton University Press.



CHAPTER 1

Ubiquity of Networks
1.1 INTRODUCTION
Broadly speaking, a network is a collection of individuals (i.e., nodes)
where there are implicit or explicit relationships among individuals in
a group. The relationships may be strictly physical as in some sort of
physical formation (e.g., pixels of a digital image or cars on the road),
or they may be conceptual such as friendship or some similarity among
pairs or within a pair. In an implicit network, individuals are unaware
of their relationships, whereas in an explicit network, individuals are
familiar with at least their local neighbors. In certain implicit networks
called affinity networks, there is a potential for explicit connections to arise from
relationships that account for projected connection, such as homophily
(i.e., similarity) (McPherson et al., 2001). Biological networks capture
relationships among biological organisms. For instance, the human brain
neurons form a large network called a connectome (Seung, 2012). An ant
society is an example of a large biological network (Moffett, 2010). There
are many examples of small-scale animal networks, including predators and their prey, plant diseases, and bird migration. Human crowds
and network organizations (e.g., government or state agencies, honey
grids in bee colonies) are other examples of natural networks. Modern
anonymous human networks have a capacity for crowd-based problem solving
(Nielsen, 2012), where a group of independently minded individuals possesses a collective wisdom that is unavailable to singletons (Reingold, 2000).
Social and political networks model human relationships, where social
and political relations are paramount. Economic networks are models
of parties related to economic relationships such as those among buyers
(and consumers), sellers (and producers), and intermediaries (i.e., traders and brokers) (Jackson, 2003). Beyond natural networks, there are
myriads of synthetic networks. The grid of a photograph is an example
of synthetic networks. Nanonetworks are attempts to network nanomachines for emerging nanoscale applications (Jornet and Pierobon, 2011).



A large class of networks comprises complex engineered networks (CENs). A CEN is a
man-made network whose topology is neither completely regular nor completely
random. A CEN supports evolving functionalities. Examples of CENs
are the Internet, wireless networks, power grids with smart homes and
cars, remote monitoring networks with satellites, global networks of telescopes, and networks of instruments and sensors from battlefields to hospitals. Time requirements in CENs range from seconds in cyber-attacks
to years in greenhouse gas emissions. Data and control flow in CENs
must be managed over connections that could span thousands of miles.
A few synthetic network categories, including CENs, are created intentionally. Here, we list six types:
1. Social networks through networking sites and services
2. Political networks as in parliamentary cabinets and political
committees
3. Computer networks that include computers as nodes and how they
communicate over local, wide area, and wireless links (e.g., sensor
networks)
4. Telecommunication networks as in switches for nodes and respective
routing paths
5. Power grids
6. Cellular networks as in cellular base stations and transmission
frequencies
There are also many synthetic but unintended network categories.
For example, colocated brick-and-mortar businesses may share clientele,
sometimes unintentionally. As such, those businesses form a location affinity network. Relationships in affinity networks are only implied
and hold only within the affinity context (e.g., colocation). Consumers
visiting popular e-commerce sites (e.g., amazon.com) form their own
product preference affinity networks. Although pairs of individual consumers may never meet in-person, the e-commerce services use affinity
networks for data mining and marketing. Individuals sharing like votes
(or retweets) are part of an affinity network (or a hashtag) in the context
of what they liked (or tweeted).
Figure 1.1 depicts a taxonomy of network types. Exchange networks
are those in which a quantifiable entity is exchanged among the nodes,
whether the entity is tangible (e.g., natural gas) or intangible
(e.g., trust). Relational networks are inert and merely reflect juxtaposition of nodes. All CENs are exchange networks.

Fig. 1.1. A network taxonomy.

Once a network emerges, we can explore interactions within the network. Strategic interactions involve reasoning and deciding over selection of strategies. They can be modeled with game theory that will be
our main focus in Chapter 3.
Network theory is a set of algorithms that codifies relationships
among network topology and outcomes, which are meaningful to network inhabitants. There is a movement afoot that codifies network phenomena under the term network science. These phenomena and salient
algorithms will be discussed throughout this book.
An online social networking service (OSNS) creates synthetic networks among people. The salient incentive for using an OSNS is to gain
social authority (i.e., legitimacy), which is a form of social power and
not generally a measure of vanity. Social authority in social networks
is with respect to a group and with respect to specific topics. Therefore,
social authority is a relative measure and not an absolute quantity. In
Section 1.2, we review a few popular OSNSs from a rapidly growing list
(Khare, 2012). Since they provide platforms to create, share, and/or
exchange information and ideas in virtual communities, OSNSs are
considered to be a medium for social media. There are quantification
schemes over social media, such as Klout, which offers user scores (i.e.,
a number between 1 and 100). Klout calls this score influence, a measure
of a user’s ability to reach others through an OSNS. This measure is
valuable for marketing products online.
In Section 1.3, we review a few popular online bibliographic services
(OBS) that house published articles. We return to generic models of networks in Section 1.4. This is followed by a review of popular models
of synthetic network generation in Section 1.5. A fully implemented
NetLogo model (i.e., code and accompanying descriptions for use) of
network generation models and analysis is available in the Appendix.

1.2 ONLINE SOCIAL NETWORKING SERVICES
Facebook is an OSNS that connects people, organizations, friends, and
others who work or live near one another. Nodes in a Facebook network
can be individuals or organizations. Some of these may be entirely synthetic, with no real-world human behind them. The main Facebook tool for connections is friendship. Facebook is used largely for personal and recreational
functions. As such, it has filled the social gaps created by physical and psychological dispersion among traditional families and friends. It also serves
as a medium that creates relationships that would not otherwise exist.
One Facebook feature, known as sharing, lets users adjust the
spread of information (i.e., select an audience). Sharing is used to
limit who can view posts and photos. It is a three-step process: (1) indicate who you are (i.e., tagging), (2) tell where you live (i.e., adding a
location to a post), and (3) manage privacy at the point where you post
(i.e., the inline audience selector). Sharing gives users control over their
information diffusion, which in turn can yield a measure of social authority. Another Facebook feature, the like, provides a directional relationship (i.e., tie, connection, or link) that lends credibility to the liked item
in proportion to the credibility (i.e., authority) of the endorser.

Twitter is an OSNS that facilitates broadcasts of messages (i.e.,
tweets). The main Twitter tool for connections is the explicit alignment
of ideas among people (i.e., following). Twitter can be used by small or
large groups for crowdsourcing. For example, in a small network,
a family can reconcile disparate opinions while organizing a travel itinerary. In a large network, a large social project, such as a
protest, can be planned. Twitter can be used to work semi-anonymously
with others. Twitter’s hashtag (i.e., #) is a feature for labeling a topic. Anyone may introduce or reuse a hashtag to attract attention. For example,
#flight1549 added to a tweet labels the tweet as being about “flight1549.”
This hashtag labeling facilitates searches related to specific topics. Individuals who use specific hashtags form an implicit network in the context
of their hashtags. This feature has been used for commercial marketing
and anonymous coordination of social actions. The range of potential uses for hashtags is enormous, and they have been adopted by other
OSNSs such as Facebook. On the one hand, Twitter can be used for the social organization of crime or dissent. On the other hand, it can be used
by law enforcement to predict and mitigate violations of the law. Since Twitter provides democratization of opinion sharing and equal access for dissemination, it is seen as a social equalizer and as such it might be feared by
repressive systems (e.g., government regimes). Twitter’s social authority
is composed of three components: (1) the retweet rate of a user’s last few
hundred tweets, (2) the recentness of those tweets, and (3) a retweet-based model trained on users’ profile data. Tagging someone exposes
their Twitter ID to more people, whereas unsolicited direct messaging merely
puts spam in the recipient’s inbox, which is generally undesirable. Websites such
as Klout.com gauge your influence by monitoring, for
example, how active you are and how often you have been tagged on
Twitter. Twitter’s lists are a way to organize others into groups. When
you click on a list, you retrieve a stream of tweets from all the users
included in that group. As a rule of thumb, if you want to develop relationships on Twitter, you should read other users’ tweets, retweet good content, tweet good content, and stay on top of the keywords and interests
that you follow. The same advice applies if you want to get retweeted.

LinkedIn is an OSNS that provides an online forum for professional
identity management. The main tool for LinkedIn’s connections is to
link people who would like to support one another (i.e., connections).
LinkedIn allows people to give a weak form of endorsement with regard to specific skills. This creates directional links from endorsers to
endorsees. LinkedIn allows a stronger directional endorsement through
recommendations. Endorsed individuals’ profiles gain social authority
via LinkedIn’s endorsements and recommendations. Of course, the
gained authority is proportional to the authority of those endorsing and
recommending.
Pinterest is an OSNS that allows users to create and manage theme-based image collections. Repinning in Pinterest is the feature that creates
social authority.
Started in 2011, Whisper.sh is a privately owned mobile OSNS that
allows anonymous posts including photographs. It allows others to like
posts, which creates a network of posts as nodes and directional links.
Since users are anonymous, the resulting network is implicit.

1.3 ONLINE BIBLIOGRAPHIC SERVICES
DBLP is a Computer Science Bibliography database website hosted at
Universität Trier in Germany. It houses a large collection of published
articles and offers capabilities for browsing and searching. The resulting
database is a network of “author” nodes connected via coauthorship.
Papers form a separate network, in which papers are the nodes and
citations are the links.
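
The two networks described above can be sketched in a few lines of Python. This is only an illustration under assumed input formats: the dictionary of papers, the list of (citing, cited) pairs, and the function names `coauthor_network` and `citation_network` are hypothetical and are not part of DBLP or any other service's interface.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_network(papers):
    """Build an undirected coauthorship network.

    papers: hypothetical mapping {paper_id: [author, ...]}.
    Returns {author: set of coauthors}.
    """
    graph = defaultdict(set)
    for authors in papers.values():
        unique = sorted(set(authors))
        for a in unique:
            graph[a]  # touch the key so single-author papers still yield a node
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def citation_network(citations):
    """Build a directed citation network.

    citations: hypothetical iterable of (citing_paper, cited_paper) pairs.
    Returns {paper: set of papers it cites}.
    """
    graph = defaultdict(set)
    for citing, cited in citations:
        graph[citing].add(cited)
        graph[cited]  # papers that are only cited are nodes too
    return dict(graph)
```

Note that the coauthorship links are undirected, whereas the citation links are directed from the citing paper to the cited paper.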
Google Scholar is another bibliography database website released in
2004 by google.com. It creates networks of authors and papers similar
to DBLP.
Microsoft Academic Research is an OBS (with a corresponding Windows app), supported by Microsoft.com, that offers a service similar to DBLP.
ResearchGate is an independent, privately owned online site founded
in 2008 for scientists and researchers to share papers, to ask questions,
to answer questions, and to find collaborators. On the one hand, it is an
OBS, even though it is far smaller than its rivals. On the other hand, it is
an OSNS for professionals.

1.4 GENERIC NETWORK MODELS
In this section, we review four of the most popular generic network
models. In contrast to descriptive models in this section, Section 1.5 will
offer algorithms for artificially generating networks.





1.4.1 Random Networks
G(n, p) is a random graph model with n nodes where the probability of a
pair of nodes in it being linked is denoted by p (Erdős and Rényi, 1959).
When p is well below 1/n, the network is sparse and fragmented into small components. Once p exceeds
roughly 1/n, a giant component emerges, that is, a single connected component that contains a large fraction of the nodes. Once p exceeds roughly ln(n)/n, the network is almost surely connected, and as p approaches 1.0 the graph approaches the complete graph. The spread of node degrees for a random graph model (i.e.,
the degree distribution) is binomial in shape. A closely related model
is the random geometric graph G(n, r), where n nodes are placed at random in a metric space and a pair of nodes is linked when the
distance between them is less than or equal to r
(Penrose, 2003). In contrast to these mathematical models, many real-world networks
exhibit a highly uneven degree distribution. In the power-law distribution, the probability that a node has degree k
(i.e., the number of connected neighbors) is given by $P(k) \sim k^{-\gamma}$, where
the parameter $\gamma$ is typically constrained between 2 and 3, that is, $2 \le \gamma \le 3$.
Uneven distribution stems from preferential attachment, where the probability that a new node will attach to a node i is $\mathrm{degree}(i) / \sum_{j} \mathrm{degree}(j)$.
A node degree refers to the node’s number of neighbors. Preferential
attachment is commonly found in nature as well as in man-made networks
such as economic networks (Gabaix, 2009). Random networks are
mathematically the most well-studied and well-understood models.
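
As a minimal sketch of the G(n, p) model and of the preferential attachment probability, the following Python code may help. The function names, the adjacency-set representation, and the `seed` parameter are assumptions made for illustration and do not come from the text.

```python
import random


def gnp_random_graph(n, p, seed=None):
    """Generate G(n, p): link each of the n*(n-1)/2 node pairs with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def degree_histogram(adj):
    """Return {degree: number of nodes with that degree}."""
    hist = {}
    for neighbors in adj.values():
        k = len(neighbors)
        hist[k] = hist.get(k, 0) + 1
    return hist


def attachment_probabilities(adj):
    """Return degree(i) / sum_j degree(j) for every node i."""
    total = sum(len(neighbors) for neighbors in adj.values())
    if total == 0:
        raise ValueError("the graph has no links, so the probabilities are undefined")
    return {v: len(neighbors) / total for v, neighbors in adj.items()}
```

Calling `degree_histogram(gnp_random_graph(1000, 0.01))` gives a histogram concentrated around the expected degree (n − 1)p, which is the binomial shape mentioned above.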

1.4.2 Scale-Free Networks
There is a model based on preferential attachment described by Barabási and Albert (1999). In this model, a new node is created at each
time step and connected to existing nodes according to the “preferential attachment” principle. At a given time step, the probability p
of creating an edge between an existing node u and the new node is
$p = (\mathrm{degree}(u) + 1) / (|E| + |V|)$, where V is the set of nodes and E is the
set of edges between nodes. The algorithm starts with some parameters
such as the number of steps that the algorithm will iterate, the number of nodes that the graph should start with, and the number of edges
that should be attached from the new node to preexisting nodes at each
time step. The Barabási model of network formation produces a scale-free network, a network where the node degree distribution follows a
power-law principle. Scale-free networks exhibit a small number of components, a small diameter, a heavy-tailed degree distribution, and low clustering.



Many types of data studied in the physical and social sciences can be
approximated with a Zipf distribution (Li, 1992), which is one of the
families of discrete power-law probability distributions. Applied to word frequencies, an implication
of Zipf’s law is that the most frequent word will occur approximately
twice as often as the second most frequent word, which in turn occurs twice as
often as the fourth most frequent word, etc.
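
The ratios quoted above follow from the frequency of the item of rank r being proportional to 1/r. A small Python sketch makes this concrete; the function name and the exponent parameter `s` (set to 1 for the classic law) are illustrative assumptions.

```python
def zipf_frequencies(n_ranks, s=1.0):
    """Return normalized frequencies proportional to 1 / rank**s for ranks 1..n_ranks."""
    weights = [1.0 / (rank ** s) for rank in range(1, n_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


freq = zipf_frequencies(10)
ratio_first_to_second = freq[0] / freq[1]   # rank 1 versus rank 2
ratio_second_to_fourth = freq[1] / freq[3]  # rank 2 versus rank 4
```

With `s = 1`, both ratios equal 2 up to floating-point rounding, regardless of how many ranks are included.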

Unlike the growth model of Barabási, Eppstein and Wang’s (2002)
steady-state model uses a rewiring scheme that results in a power-law
distribution. This model evolves an initial graph according to a Markov
process, while maintaining constant size and density.
Eppstein and Wang’s algorithm has two major steps: (1) initialize a sparse graph and (2) edit edges through a Markov process. To generate the sparse
graph G, they randomly add an edge between each pair of vertices with probability
$2m / [n(n - 1)]$, where m is the target number of edges and n is the
number of vertices. If the number of edges in G is still less than m, they
continue adding edges with a probability of 0.5 until the graph G has m
edges. The second step is to repeat the algorithm in Figure 1.2 r times
on G, where r is a parametric value.

1.4.3 Trade-Off Model
A trade-off-based model of network formation is the highly optimized
tolerance (HOT) class of models. In a simple model, nodes are allowed
to reason about their connections to other nodes. A node i’s connection
cost to a node j is denoted by $c_{ij}$. The node i will consider the centrality of
potential nodes j for attachment, denoted by $\mathrm{cen}_j$. The node i will prefer nodes that minimize the value of $a \times c_{ij} + \mathrm{cen}_j$, where a is a positive
weighting factor dependent on the network size n.
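
The selection rule above amounts to an arg-min over candidate nodes. The sketch below assumes that connection costs and centrality values are already available as dictionaries and that a lower centrality value is better (for example, average hop distance to the rest of the network); both the data layout and the name `choose_attachment` are hypothetical.

```python
def choose_attachment(i, candidates, cost, centrality, a):
    """Pick the candidate j that minimizes a * c_ij + cen_j.

    cost: hypothetical mapping {(i, j): connection cost c_ij}.
    centrality: hypothetical mapping {j: cen_j}, where lower means more central.
    a: positive weighting factor.
    """
    return min(candidates, key=lambda j: a * cost[(i, j)] + centrality[j])


# Node "x" is cheap to reach but peripheral; node "y" is costly but central.
cost = {("new", "x"): 1.0, ("new", "y"): 5.0}
centrality = {"x": 4.0, "y": 1.0}
central_choice = choose_attachment("new", ["x", "y"], cost, centrality, a=0.1)
nearby_choice = choose_attachment("new", ["x", "y"], cost, centrality, a=10.0)
```

A small `a` makes centrality dominate, so the new node attaches to the central node "y"; a large `a` makes connection cost dominate, so it attaches to the nearby node "x".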

Fig. 1.2. Eppstein and Wang’s (2002) algorithm.





1.4.4 Game Theoretic Models
The game theoretic model of network formation focuses on reasoning over
each node’s connection with others. A strategy set of an agent i is a set
of strategies to connect to each node in the network, that is, $S_i = \{s_{i1}, s_{i2}, \ldots, s_{in}\}$, where $s_{ij}$ is a strategy to connect a node i to a node j. An agent
incurs a cost in a connection that is a combination of a fixed cost plus a
sum of distances between the node and all other nodes in the network.
For example, $\mathrm{cost}(s_{ij}) = c + \sum_{k} d(i, k)$, where c is a fixed cost and d(i, k)
is the distance between nodes i and k in number of links. The cost is
shared if both parties choose the link. Otherwise, it is incurred by one
agent. Synergistic strategy selection will provide utility for agents that
are linked.
Each strategy will have a payoff that is utility minus the link cost. A
Nash equilibrium (Carmona, 2012) is achieved with a strategy profile
(i.e., a set of links) in which no agent can lower its cost by unilaterally changing its own strategy, so no agent has an
incentive to deviate from it.
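
The cost expression, a fixed cost plus the sum of hop distances from node i to every other node, can be sketched as follows. The adjacency-set representation, the function names, and the choice to return infinity when some node is unreachable are assumptions for illustration only.

```python
from collections import deque


def hop_distances(adj, source):
    """Breadth-first search: number of links from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def link_cost(adj, i, fixed_cost):
    """Return fixed_cost + sum of distances from i to all other nodes.

    Unreachable nodes make the cost infinite (an assumption of this sketch).
    """
    dist = hop_distances(adj, i)
    if len(dist) < len(adj):
        return float("inf")
    return fixed_cost + sum(dist.values())


# A three-node path 0 - 1 - 2: the middle node is closest to everyone.
path = {0: {1}, 1: {0, 2}, 2: {1}}
end_cost = link_cost(path, 0, fixed_cost=1.0)     # 1 + (1 + 2)
middle_cost = link_cost(path, 1, fixed_cost=1.0)  # 1 + (1 + 1)
```

Comparing such costs before and after a candidate link is added is one way an agent could evaluate whether deviating from its current strategy pays off.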

1.5 NETWORK MODEL GENERATORS
In this section, we review three of the most popular models for generating artificial networks.

1.5.1 Kleinberg’s Small-World Model
A social network is called a small-world network if, roughly speaking,
any two people in the network can reach each other through a short
sequence of acquaintances (Kleinberg, 2001). Milgram’s basic small-world experiment is the most famous experiment that analyzed the
small-world problem (Milgram, 1967). The purpose of the experiment
was to determine whether most pairs of people in society were linked by
short chains of acquaintances. To that end, individuals were asked to forward a
letter to a “target” through people whom they knew on a first-name basis.
Watts and Strogatz (1998) proposed a small-world network model
that incorporated the features of Milgram’s experiment. Kleinberg
(2001) proposed a variant of Watts and Strogatz’s basic model that can
be described as follows. One starts with a p-dimensional lattice, in which
nodes are joined only to their nearest neighbors. One then adds k directed
long-range links out of each node v, for a constant k; the end point of
each link is chosen uniformly at random (Kleinberg, 2001). Kleinberg
studied the model from an algorithmic perspective and showed that,
with a high probability, there will be short paths connecting all pairs
of nodes while the network retains its lattice-like structure. The Kleinberg
model does not yield a heavy-tailed degree distribution.
Kleinberg (2000) showed a simple greedy algorithm that can find
paths between any source and destination using only $O(\log^2 n)$ expected
steps. Kleinberg’s algorithm that will be used in this study is based on
two parameters: the lattice size and the clustering exponent. Each node
u has four local connections, one to each of its neighbors, and in addition
one long-range connection to some node v, where v is chosen randomly
with probability proportional to $d^{-a}$, where d is the lattice
distance between u and v and a is the clustering exponent.
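
A minimal sketch of the generator and of greedy routing on a two-dimensional grid is shown below. It assumes a non-wrapping `size` by `size` lattice (so border nodes have fewer than four local neighbors), Manhattan distance as the lattice distance, and hypothetical function names; it is intended for small grids only, because it weighs every other node when drawing each long-range link.

```python
import random


def lattice_distance(u, v):
    """Manhattan distance between two grid coordinates."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def kleinberg_grid(size, a, seed=None):
    """Return {node: [local neighbors..., one long-range contact]} on a size x size grid.

    The long-range contact of u is drawn with probability proportional to d(u, v)**(-a).
    Requires size >= 2.
    """
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(size) for y in range(size)]
    links = {}
    for u in nodes:
        x, y = u
        local = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < size and 0 <= y + dy < size]
        others = [v for v in nodes if v != u]
        weights = [lattice_distance(u, v) ** (-a) for v in others]
        long_range = rng.choices(others, weights=weights, k=1)[0]
        links[u] = local + [long_range]
    return links


def greedy_route(links, source, target):
    """Forward to whichever contact is closest to the target in lattice distance."""
    path = [source]
    current = source
    while current != target:
        current = min(links[current], key=lambda v: lattice_distance(v, target))
        path.append(current)
    return path
```

Greedy routing always terminates here because every node other than the target has a local neighbor that is strictly closer to the target, so the path is never longer than the lattice distance between source and target.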

1.5.2 Barabási and Albert’s Scale-Free Network Generator
Barabási and Albert (1999) discussed the features of scale-free networks in detail and compared them with the features of other types of
networks, for example, small-world networks. Scale-free networks expand continuously by the addition of new vertices, and new vertices attach preferentially to vertices that are already well connected. Most
real networks are scale-free networks, such as the WWW and citation
patterns of scientific publications, both of which follow a power-law
distribution (Barabási and Albert, 1999).
Albert and Barabási (2002) showed a comparison between their model
and other previously proposed models. They state that other network
models start with a fixed number of vertices that are then randomly
connected or reconnected without modifying the number of vertices.
However, the WWW, for example, grows exponentially in time through the
addition of new web pages. Also, other network models assume that
new edges are placed randomly, that is, the probability of connecting
two vertices is independent of the vertices’ degree. However, most
real networks do not behave like that. They exhibit preferential
attachment, that is, the probability of connecting two vertices depends on the vertices’
degree (Albert and Barabási, 2002).
According to Albert and Barabási (2002), a new node is created
at each time step and connected to existing nodes according to the
“preferential attachment” principle. At a given time step, the probability
p of creating an edge between an existing node u and the new node is
$p = (\mathrm{degree}(u) + 1) / (|E| + |V|)$. The algorithm starts with some parameters, such as the number of steps that the algorithm should iterate, the
number of nodes that the graph should start with, and the number of
edges that should attach the new node to preexisting nodes at
each time step.
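
The generator can be sketched with the three parameters named above. In this sketch, targets for a new node are drawn with weights degree(u) + 1, which is proportional to the stated probability; drawing distinct targets by repeated weighted sampling, the adjacency-set representation, and the function name are assumptions of this illustration rather than details given in the text.

```python
import random


def preferential_attachment_graph(steps, initial_nodes, edges_per_step, seed=None):
    """Grow a graph by adding one node per step with degree-biased links.

    steps: number of time steps (one new node per step).
    initial_nodes: number of isolated nodes the graph starts with (at least 1).
    edges_per_step: number of links from each new node to existing nodes.
    """
    rng = random.Random(seed)
    adj = {v: set() for v in range(initial_nodes)}
    for _ in range(steps):
        new_node = len(adj)
        candidates = list(adj)
        # Weight degree(u) + 1 is proportional to (degree(u) + 1) / (|E| + |V|).
        weights = [len(adj[u]) + 1 for u in candidates]
        wanted = min(edges_per_step, len(candidates))
        targets = set()
        while len(targets) < wanted:
            targets.add(rng.choices(candidates, weights=weights, k=1)[0])
        adj[new_node] = set()
        for u in targets:
            adj[new_node].add(u)
            adj[u].add(new_node)
    return adj
```

The "+ 1" in the weight gives isolated starting nodes a nonzero chance of being chosen, which is why the procedure can begin from nodes that have no links at all.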
The hierarchical network model (HNM) is part of the scale-free model
family and shares its main property of yielding proportionally more
hubs among the nodes than by random network generation. HNMs are
heavy-tailed, have small diameter, and have high clustering.

1.5.3 Eppstein and Wang’s Power-Law Network Generator
Eppstein and Wang (2002) have proposed a graph model called the steady-state model that yields a power-law degree distribution by evolving a graph according to a
Markov process while maintaining constant size and density. The main
difference between their model and Barabási and Albert’s model is that
their model does not require incremental growth, whereas Barabási and
Albert’s model does. Eppstein and Wang’s algorithm can be viewed in two
steps: (1) initialize a sparse graph and (2) edit edges through a Markov process. To generate the sparse graph G, the algorithm randomly adds an edge between each pair of
vertices with the probability $2m / [n(n - 1)]$, where m is the target number of
edges and n is the number of vertices. If the number of edges in G
is still less than m, the algorithm continues adding edges with a probability
of 0.5 until the graph G has m edges. Then, we repeat the algorithmic
steps, shown in Figure 1.2, r times on G, where r is a model parameter
(Eppstein and Wang, 2002).
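
The initialization step can be sketched directly from the description. Because Figure 1.2 is not reproduced in this text, the rewiring step below is only an assumed stand-in: it removes one random edge and adds one new edge whose second end point is chosen with probability proportional to degree, which keeps the number of vertices and edges constant. All function names are hypothetical.

```python
import random


def sparse_initial_graph(n, m, seed=None):
    """Step 1: add each pair with probability 2m / [n(n - 1)], then top up to m edges."""
    if not 1 <= m <= n * (n - 1) // 2:
        raise ValueError("m must be between 1 and n(n - 1)/2")
    rng = random.Random(seed)
    p = 2 * m / (n * (n - 1))
    edges = set()
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                edges.add((u, v))
    while len(edges) < m:
        u, v = sorted(rng.sample(range(n), 2))
        if rng.random() < 0.5:
            edges.add((u, v))
    return edges


def rewire_once(edges, n, rng):
    """Assumed stand-in for one Markov step (NOT the content of Figure 1.2)."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    old_edge = rng.choice(sorted(edges))
    x = rng.randrange(n)                                # uniform end point
    y = rng.choices(range(n), weights=degree, k=1)[0]   # degree-biased end point
    new_edge = (min(x, y), max(x, y))
    if x != y and new_edge not in edges:
        edges.remove(old_edge)
        edges.add(new_edge)


def evolve(edges, n, r, seed=None):
    """Step 2: apply the rewiring step r times to a copy of the edge set."""
    rng = random.Random(seed)
    result = set(edges)
    for _ in range(r):
        rewire_once(result, n, rng)
    return result
```

The first pass may already produce more than m edges; the description does not say how that case is handled, so this sketch leaves such a graph unchanged.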

1.6 A REAL-WORLD NETWORK
In this section, we sketch essential components of a generic, commonplace exchange network applicable to package delivery and durable
products. We are keeping this model simple in order to avoid complexities of supply chain management and economic networks. Let C be a set
of consumers of a commodity (e.g., received packages or appliances)
and P be a set of producers of the same commodity (e.g., package
senders). A node can be both a producer and a consumer at different
times (drawn from a set T) and locations (drawn from a set L of nodes). The set C is strictly larger than P,
and it may subsume it entirely. The production rate of a producer is a
function of time and location denoted by P(t, l). Similarly, the consumption rate of a consumer is a function of time and location denoted by
C(t, l). After production but before consumption, commodities are in
transit at the rate P − C, that is, $\mathrm{Transit}(t, l_1, l_2) = P(t, l_1) - C(t, l_2)$. Locations of production and consumption must be distinct, that is, $l_1 \neq l_2$. If
these locations are the same, transit is null, that is:
$$\forall t \in T,\; l_1, l_2 \in L:\quad P(t, l_1) \ge 0 \,\wedge\, C(t, l_2) \ge 0 \,\wedge\, l_1 = l_2 \;\rightarrow\; \mathrm{Transit}(t, l_1, l_2) = \emptyset$$

The Transit(·) function specifies the flow rate among nodes of the network. If we could specify the maximum flow permitted between each pair of nodes
in the network, we could discover the network capacity for transit using a
standard graph theoretic flow network algorithm, for example, the Ford-Fulkerson algorithm (Kleinberg and Tardos, 2005). Transit/flow rates
incur a cost, corresponding to the amount of flow, that needs to be paid
by the pair of a sender and a receiver. It may be beneficial to share the
transmission cost with neighbors, who form game theoretic coalitions
that will be discussed in Chapter 3.
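
As an illustration of how network capacity for transit might be computed, the sketch below implements the Edmonds-Karp variant of the Ford-Fulkerson method, which finds augmenting paths by breadth-first search. The nested-dictionary capacity format, the function name `max_flow`, and the example node names are assumptions made for this sketch.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    capacity: hypothetical mapping {u: {v: maximum flow allowed from u to v}}.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    # Residual capacities, with a zero-capacity reverse entry for every link.
    residual = {source: {}, sink: {}}
    for u, neighbors in capacity.items():
        residual.setdefault(u, {})
        for v, c in neighbors.items():
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical delivery network: producer "s", handlers "a" and "b", consumer "t".
capacity = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
transit_capacity = max_flow(capacity, "s", "t")
```

In this example all five units leaving "s" can reach "t": two go directly through "a", one is rerouted from "a" to "b", and two go through "b".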
In many scenarios, there is a need for intermediaries to facilitate
transfer of commodities from producers to consumers. For simplicity, we assume intermediaries to be uniform handlers, who are neither
producers nor consumers of the commodities they handle. In economic
networks, handlers are traders (discussed in Chapter 9). In production line networks, intermediaries are dealers. In mail carrier networks, intermediaries are delivery personnel. In electric networks, intermediaries are switches. In computer networks, intermediaries are
routers.
The handling capacity of agent i is a function of time and number of
items. Let $\mathrm{handler}_i(t, I)$ return the delay time in i’s ability to handle I items
at time t. A delay time of zero denotes on-time handling. Typically, there are
more handlers than items in transit. A property of interest is the
optimal number of handlers for the volume of items to be handled with
no delay.





1.7 CONCLUSIONS
Networks are abundant all around us. They are man-made or naturally
occurring. They are implicit, hidden, explicit, or articulated. They might be
tangible and objectively quantified, or they might be subjective and difficult to quantify. They all tend to change in time, which is the subject of
our future chapters on network dynamics.


REFERENCES
Albert, R., Barabási, A.-L., 2002. Statistical mechanics of complex networks. Rev. Mod. Phys. 74,
47–97.
Barabási, A.L., Albert, R., 1999. Emergence of scaling in random networks. Science 286, 509–512.
Carmona, G., 2012. Existence and Stability of Nash Equilibrium. World Scientific Publishing
Company.
Eppstein, D., Wang, J., 2002. A steady state model for graph power laws. In: Proceedings of 2nd
International Workshop on Web Dynamics. World Scientific Publishing Company.
Erdős, P., Rényi, A., 1959. On random graphs. Publicationes Mathematicae 6, 290–297.
Gabaix, X., 2009. Power laws in economics and finance. Annu. Rev. Econ. 1, 255–294.
Jackson, M., 2003. A survey of models of network formation: stability and efficiency. In: Demange,
G., Wooders, M. (Eds.), Group Formation in Economics: Networks, Clubs, and Coalitions.
Cambridge University Press.
Jornet, J.M., Pierobon, M., 2011. Nanonetworks: a new frontier in communications. In: Communications
of the ACM. Vol. 54, No. 11. ACM, pp. 84–89.
Khare, P., 2012. Social Media Marketing eLearning Kit For Dummies. Wiley.
Kleinberg, J., 2000. The small-world phenomenon: an algorithmic perspective. In: Proceedings of
32nd ACM Symposium on Theory of Computing. ACM, pp. 163–170.
Kleinberg, J., 2001. Small-world phenomena and the dynamics of information. In: Proceedings of
the Advances in Neural Information Processing Systems (NIPS), Vol. 14. NIPS.
Kleinberg, J., Tardos, E., 2005. Algorithm Design. Addison-Wesley.
Li, W., 1992. Random texts exhibit Zipf’s-law-like word frequency distribution. IEEE Trans. Inf.
Theory 38 (6), 1842–1845.
McPherson, M., Smith-Lovin, L., Cook, J., 2001. Birds of a feather: homophily in social networks.
Annu. Rev. Sociol. 27, 415–444.
Milgram, S., 1967. The small world problem. Psychol. Today 1 (1), 61–67.
Moffett, M., 2010. Adventures Among Ants. University of California Press.
Nielsen, M., 2012. Reinventing Discovery: A New Era of Networked Science. Princeton University
Press.
Penrose, M., 2003. Random Geometric Graphs. Oxford University Press.

Rheingold, H., 2000. The Virtual Community. MIT Press.
Seung, S., 2012. Connectome: How the Brain’s Wiring Makes Us Who We Are. Mariner Books.
Watts, D., Strogatz, S., 1998. Collective dynamics of ‘small-world’ networks. Nature 393 (6684), 440–442.



EXERCISES
1. Using examples, describe how animal swarms are networked.
2. What are the salient characteristics of biological networks (e.g.,
brain cells and protein chains) that differentiate them from other
types of networks?
3. What will be the role of network organizations in the year 2025?
Give examples.
4. How can social media be used to track cultural changes in a society?


CHAPTER 2

Network Analysis
There has been a long tradition of measuring qualities of network
locations from both egocentric and global perspectives. This is largely
addressed with quantification attempts in mathematical sociology under
the theme of social network analysis (SNA) (Wasserman and Faust, 1994;
Knoke and Yang, 2007; Golbeck, 2013; Borgatti et al., 2013). There
are also several popular software toolkits that perform analysis and
visualization of social networks (i.e., sociograms), including UCINET
and NodeXL. Tom Snijders’ SIENA is a program for the statistical
analysis of network data. An NSF-sponsored visualization project is
Traces (Suthers, 2011), which traces out the movements, confluences,
and transformations of people and ideas in online social networks.
The aim of this chapter is to review a selective subset of SNA
measures that complement the algorithmic descriptions explained in the
remainder of this book. For a glossary of SNA terms, readers are
advised to consult Golbeck (2013).
We will start with egocentric (i.e., node view) measures. A degree-1
network of a node is the node and its immediate neighbor nodes. A
degree-1.5 network of a node is the node’s degree-1 network plus the links
among its immediate neighbors (Golbeck, 2013). A degree-2 network of
a node is the node’s degree-1 network and all its immediate neighbors’
connections (Golbeck, 2013). A degree-n network of a node is the
degree-1 network of the node plus all the nodes and the corresponding
links that are no more than n links away from the starting node.
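
The egocentric networks defined above can be sketched as follows. This is one possible reading of the definitions, assuming an undirected network stored as a dictionary of neighbor sets; the function names and the example network are hypothetical.

```python
from collections import deque

def hop_distances(adj, ego, limit):
    """Breadth-first distances from ego, exploring at most `limit` links away."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        u = queue.popleft()
        if dist[u] == limit:
            continue  # do not expand beyond the requested radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def ego_network(adj, ego, degree=1):
    """Return (nodes, links) of the degree-1, degree-1.5, or degree-n network."""
    half = degree == 1.5
    n = 1 if half else int(degree)
    dist = hop_distances(adj, ego, n)
    nodes = set(dist)
    links = set()
    for u in nodes:
        for v in adj[u]:
            # Degree-1.5 keeps every link among the ego and its neighbors;
            # degree-n keeps links that are reached within n links of the ego.
            if v in nodes and (half or min(dist[u], dist[v]) < n):
                links.add(frozenset((u, v)))
    return nodes, links

# Hypothetical undirected network.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}
nodes_1, links_1 = ego_network(adj, "a", 1)
nodes_15, links_15 = ego_network(adj, "a", 1.5)
nodes_2, links_2 = ego_network(adj, "a", 2)
```

For node a, the degree-1 network holds the two links to b and c, the degree-1.5 network adds the link between b and c, and the degree-2 network also brings in node d through b.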
A path is a chain (i.e., succession) of nodes connected by links between pairs of nodes. Two nodes are connected if and only if (i.e., iff)
there is a path between them. A connected component is a set of nodes
with paths among all pairs of nodes in the set. A bridge is a
link that connects two otherwise disconnected components. A hub is a node
with many connections. Reachability is whether two nodes are connected
or not by way of either a direct or an indirect path of any length.
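
A minimal sketch of these notions follows, assuming an undirected network stored as a dictionary of neighbor sets. The bridge test uses one interpretation of the definition, namely that removing the link increases the number of connected components. All names and the example network are hypothetical.

```python
def connected_components(adj):
    """Return a list of node sets, one per connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(adj[u] - component)
        seen |= component
        components.append(component)
    return components

def reachable(adj, i, j):
    """Two nodes are reachable iff they lie in the same component."""
    return any(i in c and j in c for c in connected_components(adj))

def is_bridge(adj, u, v):
    """A link is a bridge if removing it splits a component in two."""
    without = {x: set(nbrs) for x, nbrs in adj.items()}
    without[u].discard(v)
    without[v].discard(u)
    return len(connected_components(without)) > len(connected_components(adj))

# Hypothetical network with two components.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": {"f"},
    "f": {"e"},
}
components = connected_components(adj)
```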



Geodesic distance, denoted by distance_ij, is the number of links in the
shortest possible path from node i to node j. The diameter of a network is
the largest geodesic distance in the connected network. Reverse distance,
denoted by RD_ij, is (1 + Diameter) − distance_ij, so that nearer nodes
receive larger values. Metrics in Equations 2.1 and 2.2 are adapted from
Valente and Foreman (1998), where n is the number of nodes in the network:

\[ \mathrm{Integration}(k) = \frac{\sum_{j \neq k} RD_{jk}}{n - 1} \tag{2.1} \]

\[ \mathrm{Radiality}(k) = \frac{\sum_{j \neq k} RD_{kj}}{n - 1} \tag{2.2} \]
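
Equations 2.1 and 2.2 can be sketched as below. The sketch assumes a directed network in which every node can reach every other node, and it takes the reverse distance as (1 + Diameter) − distance, the convention of Valente and Foreman (1998), so that closer nodes score higher. The function names and the example network are hypothetical.

```python
from collections import deque

def geodesic_distances(adj, source):
    """Breadth-first geodesic distances from source along directed links."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def integration_and_radiality(adj):
    dist = {u: geodesic_distances(adj, u) for u in adj}
    n = len(adj)
    diameter = max(max(d.values()) for d in dist.values())

    def rd(i, j):
        # Reverse distance: larger when j is closer to i.
        return (1 + diameter) - dist[i][j]

    # Integration uses distances into k; radiality uses distances out of k.
    integration = {k: sum(rd(j, k) for j in adj if j != k) / (n - 1) for k in adj}
    radiality = {k: sum(rd(k, j) for j in adj if j != k) / (n - 1) for k in adj}
    return diameter, integration, radiality

# Hypothetical directed network: a -> b, a -> c, b -> c, c -> a.
adj = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
diameter, integration, radiality = integration_and_radiality(adj)
```

In this example node c is the easiest to reach, so it has the highest integration, while node a reaches the others most directly and has the highest radiality. In an undirected network the two measures coincide.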

Structural centrality measures of a node are a host of measures reflecting the structural properties of the links surrounding a focal node.
For example, degree centrality of a node is the number of edges incident
on the node. Closeness centrality of a node is the average of the shortest path lengths from the node to all other nodes in the network. It is a
rather small number in small-world networks (Watts and Strogatz, 1998).
Betweenness centrality of a node is a measure of the node’s importance
(and possibly influence as discussed in Chapter 7) and is computed using
the algorithm shown in Figure 2.1.
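
The three centrality measures can be sketched as follows. Because the figure is not reproduced here, betweenness is computed with Brandes' algorithm, a standard technique that may differ from the algorithm in Figure 2.1. The sketch assumes an unweighted, undirected, connected network; the function names and the example are hypothetical.

```python
from collections import deque

def degree_centrality(adj):
    return {v: len(adj[v]) for v in adj}

def closeness_centrality(adj):
    """Average shortest-path length from each node to all other nodes."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        result[s] = sum(dist.values()) / (len(adj) - 1)
    return result

def betweenness_centrality(adj):
    """Brandes' algorithm: count shortest paths passing through each node."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order = []                       # nodes by non-decreasing distance
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        sigma = dict.fromkeys(adj, 0)    # number of shortest s-v paths
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate pair dependencies from the farthest nodes inward.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical path network a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
degree = degree_centrality(path)
closeness = closeness_centrality(path)
betweenness = betweenness_centrality(path)
```

On the path network the two inner nodes each lie on two shortest paths between other nodes, while the end nodes lie on none.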
Fig. 2.1.  Betweenness value computation.

Eigenvector centrality measures the centrality of neighbor nodes and
has been used as a measure of influence and power, which are discussed
later in this book (Bonacich and Lu, 2012). Bonacich developed a beta
centrality measure C_BC with a parameter α used for adjusting the importance of a node’s degree and a parameter β for adjusting the importance of the neighbors’ centrality. This is shown in Equation 2.3, where N(i) is
the set of neighbors of node i and deg(i) = |N(i)| is its degree:

\[ C_{BC}(i) = \sum_{j \in N(i)} [\alpha + \beta \times C_{BC}(j)] = \alpha \deg(i) + \beta \times \sum_{j \in N(i)} C_{BC}(j) \tag{2.3} \]
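
Since Equation 2.3 defines each node's beta centrality in terms of its neighbors' values, one simple way to evaluate it is repeated substitution. The sketch below assumes that β is smaller than the reciprocal of the largest eigenvalue of the adjacency matrix, which is what makes the iteration settle; the function name, default values, and example are hypothetical.

```python
def beta_centrality(adj, alpha=1.0, beta=0.1, iterations=200):
    """Iterate C(i) = sum over neighbors j of (alpha + beta * C(j))."""
    c = dict.fromkeys(adj, 0.0)
    for _ in range(iterations):
        c = {i: sum(alpha + beta * c[j] for j in adj[i]) for i in adj}
    return c

# Hypothetical triangle: every node has two neighbors.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
scores = beta_centrality(triangle, alpha=1.0, beta=0.25)
```

In the triangle all nodes are equivalent, so the fixed point satisfies C = 2α + 2βC, which gives C = 4 for α = 1 and β = 0.25.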






Eigenvector centrality of a node at time t is computed with Equation 2.4, where C(t) is the vector of node centralities at time t, C(0) is
the initial centrality vector, A is the adjacency
matrix, and A^t is the result of t iterated multiplications of A:

\[ C(t) = A^{t} \, C(0) \tag{2.4} \]

As time approaches ∞, the dominant eigenvalue γ will determine the
centrality vector, which grows in proportion to γ^t × V_1, where V_1 is the eigenvector corresponding to the dominant eigenvalue γ (Chiang, 2012).
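
The limiting behavior can be sketched with power iteration: multiply the centrality vector by the adjacency matrix repeatedly and rescale after each step so that only the direction of the dominant eigenvector remains. The sketch assumes a connected, undirected, non-bipartite network, for which the iteration converges; the function name and the example are hypothetical.

```python
def eigenvector_centrality(adj, iterations=100):
    """Power iteration on the adjacency structure, normalized to unit length."""
    c = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        # One multiplication by A: each node sums its neighbors' scores.
        nxt = {i: sum(c[j] for j in adj[i]) for i in adj}
        norm = sum(x * x for x in nxt.values()) ** 0.5
        c = {i: x / norm for i, x in nxt.items()}
    return c

# Hypothetical network: triangle a-b-c with a tail node d attached to c.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
scores = eigenvector_centrality(graph)
```

Node c, which is linked to well-connected neighbors, ends up with the highest score, and the tail node d with the lowest.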
Let us consider a degree-1.5 network of a node and measure the ratio
of the actual number of links in that network over the total number of
possible links that could exist, which yields a measure called the local
clustering coefficient (Golbeck, 2013).
Density of a network is the ratio of the actual number of links in that
network over the total number of possible links that could exist. Cohesion is the minimum number of edges that have to be removed before the
network is disconnected.
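
Density and the local clustering coefficient can be sketched as below for an undirected network stored as a dictionary of neighbor sets. The clustering function follows the common convention of counting only the links among a node's neighbors, with the node itself excluded; this convention is an assumption here. The function names and the example are hypothetical.

```python
def density(adj):
    """Actual links divided by the n(n - 1)/2 possible links."""
    n = len(adj)
    links = sum(len(nbrs) for nbrs in adj.values()) / 2
    return links / (n * (n - 1) / 2)

def local_clustering(adj, node):
    """Share of possible links among the node's neighbors that exist."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no links are possible
    # Each link among neighbors is seen from both of its ends.
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) / 2
    return links / (k * (k - 1) / 2)

# Hypothetical network: triangle a-b-c with a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
network_density = density(adj)
clustering_a = local_clustering(adj, "a")
clustering_b = local_clustering(adj, "b")
```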
Let us consider a cluster, that is, a subset of nodes s. Each node in s may
compute a ratio r: the number of its neighbors that are in s divided by the
total number of its neighbors. The minimum r value over the nodes in s,
r_min, is called the density of the cluster (used in Chapter 7).
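
As a small sketch of this definition, assuming an undirected network stored as a dictionary of neighbor sets in which every cluster member has at least one neighbor (the function name and example are hypothetical):

```python
def cluster_density(adj, cluster):
    """Minimum, over cluster members, of the share of neighbors inside the cluster."""
    cluster = set(cluster)
    return min(len(adj[v] & cluster) / len(adj[v]) for v in cluster)

# Hypothetical network: triangle a-b-c with a pendant node d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
r_min = cluster_density(adj, {"a", "b", "c"})
```

Here nodes b and c have all of their neighbors inside the cluster, while node a has two of its three neighbors inside it, so the density of the cluster is 2/3.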
Whereas centrality is a microlevel measure, centralization is a macrolevel measure, which measures variance in the distribution of centrality in a network. We show the most generic form of centralization in
Figure 2.2.

Fig. 2.2.  Centralization algorithm.

Leadership (L) is a measure of network domination, computed using Equation 2.5, where d_max is the degree of the node with the highest
degree, d_i is the degree of node i, and n is the number of nodes (Freeman, 1978; Macindoe and
Richards, 2011):

\[ L = \sum_{i=1}^{n} \frac{d_{max} - d_i}{(n - 2) \times (n - 1)} \tag{2.5} \]

Bonding (B) measures triadic closure in a graph (Macindoe and
Richards, 2011) using Equation 2.6:

\[ B = \frac{6 \times \text{number of triangles}}{\text{number of length-two paths}} \tag{2.6} \]
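
Equations 2.5 and 2.6 can be sketched as follows for an undirected network with more than two nodes. The sketch assumes that length-two paths are counted as ordered paths, so that a single triangle contains six of them and B equals 1 for a fully closed graph; that counting convention, the function names, and the examples are assumptions.

```python
from itertools import combinations

def leadership(adj):
    """Equation 2.5: summed degree gaps to the best-connected node, normalized."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    d_max = max(degrees)
    return sum(d_max - d for d in degrees) / ((n - 2) * (n - 1))

def bonding(adj):
    """Equation 2.6 with ordered length-two paths in the denominator."""
    triangles = sum(
        1
        for a, b, c in combinations(adj, 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    # A node of degree d is the middle of d * (d - 1) ordered two-link paths.
    two_paths = sum(len(nbrs) * (len(nbrs) - 1) for nbrs in adj.values())
    return 6 * triangles / two_paths if two_paths else 0.0

# Hypothetical examples: a star dominated by its hub, and a closed triangle.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
```

The star is fully dominated by its hub and has no closed triads, whereas the triangle has no dominant node and is fully closed.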


Diversity is a measure of the number of edges in a graph that are disjoint. End vertices of such edges are not adjacent (i.e., disjoint dipoles).
Diversity is shown in Equation 2.7:

\[ D = \frac{\text{number of disjoint dipoles}}{[(n/4) \times ((n/2) - 1)]^{2}} \tag{2.7} \]

Burt’s structural holes measure gaps among connected components
and as such are another measure of diversity (Burt, 1995).

2.1 CONCLUSIONS AND FUTURE WORK
Network analysis focuses on quantification (and statistical analyses) of
qualities of nodes’ relative locations as well as entire network properties.
SNA has long been a staple tool for mathematical sociology (Borgatti
et al., 2013). An active direction of interest has been intelligence
analysis of human networks to understand, predict, and mitigate threats
for law enforcement as well as to understand geopolitical landscapes. The recent
debate over surveillance and monitoring of electronic communication
metadata by the National Security Agency (NSA) is indicative of this
fervent interest.
A second direction of interest is marketing and branding on social
media. The interest is to understand human propensity for influence
from network connections. Marketers use these propensities to craft viral
dissemination of consumption patterns and manipulation of economic






activities. The documentary filmmaker Morgan Spurlock has publicly
explored branding on social media. His mission is to raise public awareness and to inform us about the changing landscape of cultural values in
society (e.g., the Super Size Me app).

REFERENCES
Bonacich, P., Lu, P., 2012. Introduction to Mathematical Sociology. Princeton University Press.
Borgatti, S., Everett, M., Johnson, J., 2013. Analyzing Social Networks. SAGE Publications Ltd.
Burt, R., 1995. Structural Holes: The Social Structure of Competition. Harvard University Press.
Chiang, M., 2012. Networked Life: 20 Questions and Answers. Cambridge University Press.
Freeman, L., 1978. Centrality in social networks: conceptual clarification. Soc. Netw. 1, 215–239.
Golbeck, J., 2013. Analyzing the Social Web. Morgan Kaufmann.
Knoke, D., Yang, S., 2007. Social Network Analysis. Sage Publications.
Macindoe, O., Richards, W., 2011. Comparing networks using their fine structure. Int. J. Soc.
Comput. Cyber-Phys. Syst. 1 (1), 79–97, Inderscience Publishers.
Suthers, D., 2011. Interaction, mediation, and ties: an analytic hierarchy for socio-technical
systems. In: Proceedings of the Hawaii International Conference on the System Sciences
(HICSS-44). January 4–7, 2011, Kauai, Hawai‘i.
Valente, T., Foreman, R., 1998. Integration and radiality: measuring the extent of an individual’s
connectedness and reachability in a network. Soc. Netw. 20 (1), 89–105.
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge
University Press.
Watts, D., Strogatz, S., 1998. Collective dynamics of ‘small-world’ networks. Nature 393 (6684),
440–442.  

EXERCISES

1. How can we track search queries (e.g., in Google or YouTube)
from social network profiles of corresponding users?
2. Describe a network that appears to be but is not a scale-free
network.
3. How can the beta centrality measure be used to find weakly connected
nodes in a network?
4. Modify the density metric for weighted networks, that is, networks with
links that have weights as strengths of connections.


CHAPTER 3

Network Games
Decision making requires reasoning. Whereas decision theory is about
an individual’s reasoning process when pertinent decision
attributes can be independently ascertained, game theory (GT) is about
the process of reasoning when pertinent decision attributes include
decisions of other individuals (Fudenberg, 1991). The latter is the scenario
in networks where all the decisions are interdependent. GT is
a branch of mathematics (Barron, 2008) and has long been used to
explain economic decision making in the theories of microeconomics
(Mas-Colell et al., 1995). We will briefly introduce GT in Section 3.1 before
the discussion of network-relevant applications in Sections 3.2–3.6.

3.1 GAME THEORY INTRODUCTION
A game is a simple tuple 〈I, S, U〉. Here, I is a set of individuals (i.e.,
players or agents in GT nomenclature). Whereas S_i is a nonempty set of
actions (i.e., strategies in GT) for agent i, S is the set of all combinations
of the agents’ strategies, that is:

\[ S = \prod_{i \in I} S_i \]

U is the set of utility functions, where μ_i: S → R is the utility function (i.e., payoff) for agent i. For a combination of simultaneous decisions, agent i receives a real-valued reward.
For convenience, S_−i is used to denote the strategies of agents other than
agent i.
Matching Pennies is a famous zero-sum game of pure conflict with
two actions {head, tail}, with a payoff bimatrix shown in Figure  3.1.
The minimax theorem guarantees that all zero-sum matrix games are
solvable, which means we can determine strategies that maximize player
payoffs (Osborne and Rubinstein, 1994).
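
As a minimal sketch of what solvability means here, the code below searches for the row player's maximin mixed strategy in Matching Pennies. The payoff values are the usual ones for this game (the row player wins 1 when the pennies match and loses 1 otherwise) and are assumed rather than taken from Figure 3.1; the function names and the probability grid are hypothetical.

```python
from fractions import Fraction

ACTIONS = ("head", "tail")
# Row player's payoffs; the column player receives the negative (zero-sum).
ROW_PAYOFF = {
    ("head", "head"): 1,
    ("head", "tail"): -1,
    ("tail", "head"): -1,
    ("tail", "tail"): 1,
}

def expected_row_payoff(p_row_head, p_col_head):
    """Expected payoff to the row player under two mixed strategies."""
    p = {"head": p_row_head, "tail": 1 - p_row_head}
    q = {"head": p_col_head, "tail": 1 - p_col_head}
    return sum(p[r] * q[c] * ROW_PAYOFF[(r, c)] for r in ACTIONS for c in ACTIONS)

def guaranteed_payoff(p_row_head):
    """Worst case for the row player; pure replies suffice because the
    expected payoff is linear in the column player's mixing probability."""
    return min(expected_row_payoff(p_row_head, q) for q in (0, 1))

# Search a grid of mixing probabilities for the maximin strategy.
grid = [Fraction(k, 10) for k in range(11)]
maximin = max(grid, key=guaranteed_payoff)
value = guaranteed_payoff(maximin)
```

Under these payoffs, playing head with probability 1/2 guarantees an expected payoff of 0, and any other mixing probability can be exploited by the opponent.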
Fig. 3.1.  Matching Pennies game payoff bimatrix.

Many real-world games have action sets with infinite cardinality, such
as the economic competition among firms deciding on production
quantities described in Cournot games. In the case of two identical firms and a single product type, S_i = [0, ∞), and s_i ∈ S_i is the amount of
goods that firm i will produce. Payoffs are given by Equation 3.1, where
p is the price of goods as a function of the total amount of goods produced and
c_i is the unit cost of the product for firm i:

\[ \mu_i = s_i \times p(s_1 + s_2) - c_i \times s_i \tag{3.1} \]
For Cournot games, it is typical to plot the best responses of the two
players (i.e., strategies with optimal payoffs for players) with s_1 versus
s_2. In such plots, the point of intersection of the two best response lines
denotes the equilibrium point (i.e., stability point), often denoted as s*
(i.e., the amount of goods either firm should produce), where neither
player will have an incentive to unilaterally abandon the strategy prescribed by the equilibrium.
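
The intersection of the best response lines can be sketched numerically. The sketch assumes a linear price function p(Q) = a − b × Q, which is not specified in the text; with that assumption, maximizing Equation 3.1 over a firm's own quantity gives the best response (a − c_i − b × s_j) / (2b). The parameter values and function names are hypothetical.

```python
def best_response(s_other, a, b, c):
    """Quantity maximizing s * (a - b * (s + s_other)) - c * s, floored at zero."""
    return max(0.0, (a - c - b * s_other) / (2 * b))

def cournot_equilibrium(a=100.0, b=1.0, c1=10.0, c2=10.0, rounds=100):
    """Let both firms repeatedly best-respond until quantities settle."""
    s1 = s2 = 0.0
    for _ in range(rounds):
        s1, s2 = best_response(s2, a, b, c1), best_response(s1, a, b, c2)
    return s1, s2

s1_star, s2_star = cournot_equilibrium()
```

For two identical firms the iteration settles at (a − c) / (3b) for each firm, which is 30 units with the values used here. At that point each quantity is a best response to the other, so neither firm gains by deviating alone.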
A famous two-player competitive game is Prisoner’s Dilemma (PD),
with prototypical payoffs shown in Figure 3.2 and two strategies of
cooperation (C) and defection (D). In PD games, D is the dominant
strategy (i.e., the strategy that yields a higher payoff for the player)
regardless of the other player’s choice. Often, there are strategies that might be
dominated (e.g., suicide would always produce a loss–loss strategy combination in PD), and game analysis often suggests elimination of such
strategies (Myerson, 1997). Nash equilibrium (NE) for a game 〈I, S, U〉 is
a strategy profile S* ∈ S (i.e., an ideal strategy combination for players)
such that for all i ∈ I and for all s_i ∈ S_i, Equation 3.2 holds. NE is a
form of equilibrium (i.e., stability). In many competitive games, there
are multiple equilibria, among which we must select the most desirable
one based on contextual biases (Nisan et al., 2007). A measure of equilibrium efficiency is the Price of Anarchy, which is the ratio between the
worst equilibrium and the socially optimal outcome (Roughgarden, 2005).

Fig. 3.2.  Prisoner’s Dilemma game payoff bimatrix.
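
A brute-force check for pure-strategy Nash equilibria can be sketched as below. The payoff numbers are commonly used illustrative values for PD and are assumed here, since Figure 3.2 is not reproduced; the function names are hypothetical. A profile is an equilibrium when no player can gain by changing only their own strategy.

```python
from itertools import product

STRATEGIES = ("C", "D")
# Assumed payoffs as (row player, column player); not taken from Figure 3.2.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def is_nash(profile):
    """True if no player gains from a unilateral deviation."""
    for player in (0, 1):
        for alternative in STRATEGIES:
            deviation = list(profile)
            deviation[player] = alternative
            if PAYOFFS[tuple(deviation)][player] > PAYOFFS[profile][player]:
                return False
    return True

def dominates(player, strong, weak):
    """True if `strong` pays more than `weak` against every opponent choice."""
    for other in STRATEGIES:
        a = (strong, other) if player == 0 else (other, strong)
        b = (weak, other) if player == 0 else (other, weak)
        if PAYOFFS[a][player] <= PAYOFFS[b][player]:
            return False
    return True

pure_equilibria = [p for p in product(STRATEGIES, repeat=2) if is_nash(p)]
```

With these payoffs, defection dominates cooperation for both players and mutual defection is the only pure equilibrium, even though mutual cooperation would pay both players more.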


×