Software Metrics Data Collection
Lecture # 42
• Software measurement is only as good as
the data that are collected and analyzed
• In other words, we cannot make good
decisions with bad data
Moroney, 1950
Data should be collected with a clear
purpose in mind. Not only a clear
purpose but a clear idea as to the precise
way in which they will be analyzed so as
to yield the desired information. It is
astonishing that men who in other
respects are clear-sighted, will collect
absolute hotchpotches of data in the
blithe and uncritical belief that analysis
can get something out of it
• We need to ask questions such as:
– What constitutes good data?
What is Good Data?
• Measures should be well defined and valid:
they should reflect the attributes we claim
they do
• But even when we have a well-defined
measure that maps a real-world attribute to
a formal relational system in appropriate
ways, we need to ask several questions
about the data
Are they Correct?
• Correctness means that the data were collected
according to the exact rules of definition of the
metric
• For example, if a lines of code count is
supposed to include everything but comments,
then a check for correctness assures us that no
comments were counted
• A measure of process duration is correct if
time is measured from the beginning of one
specified activity to the completion of
another specified activity
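
A correctness check of this kind can be sketched in a few lines. The snippet below assumes one possible counting rule: blank lines and full-line `#` comments are excluded, while lines with trailing comments still count as code. The rule, the function name `count_loc`, and the sample source are illustrative assumptions, not a definition from the lecture.

```python
def count_loc(source: str) -> int:
    """Count lines that are neither blank nor full-line '#' comments.

    Assumed rule (hypothetical): a line is a comment only if its first
    non-blank character is '#'. Lines with trailing comments are code.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# Hypothetical sample: one comment line, one blank line, three code lines.
sample = """# compute a total
total = 0

for x in [1, 2, 3]:
    total += x  # running sum
"""

loc = count_loc(sample)
print(loc)
```

Whatever rule an organization adopts, the point is that the counting tool and the metric definition must agree exactly; a different rule (for example, excluding blank lines only) would need a different check.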
Are they Accurate?
• Accuracy refers to the difference between
the data and the actual value
• For example, time measured using an
analog clock may be less accurate than time
measured using a digital clock
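
As a small illustration of accuracy as a difference from the actual value, the sketch below computes the absolute error of two readings. The readings and the "actual" duration are invented for the example, and the helper name `absolute_error` is hypothetical.

```python
def absolute_error(measured: float, actual: float) -> float:
    """Return how far a measured value lies from the actual value."""
    return abs(measured - actual)


# Hypothetical values, in minutes, chosen only for illustration.
actual_duration = 90.0
analog_reading = 92.0
digital_reading = 90.5

analog_error = absolute_error(analog_reading, actual_duration)
digital_error = absolute_error(digital_reading, actual_duration)
print(analog_error, digital_error)
```

In practice the actual value is rarely known exactly, so accuracy is usually estimated against a more trusted reference measurement.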
Are they Appropriately Precise?
• Precision deals with the number of decimal
places needed to express the data
• For instance, activity duration need not be
reported in hours, minutes, and seconds; hours
or days are usually sufficient
• Likewise, it is not necessary to calculate the
mean cyclomatic number to several decimal
places; since the cyclomatic number expresses one
plus the number of decisions in a module,
fractions of a decision to several decimal
places are meaningless
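
The cyclomatic-number point can be shown with a short sketch. The per-module values below are invented, and reporting to one decimal place is an assumed convention rather than a rule from the lecture.

```python
from statistics import mean

# Hypothetical cyclomatic numbers for seven modules.
cyclomatic_numbers = [3, 7, 4, 12, 5, 6, 9]

raw_mean = mean(cyclomatic_numbers)   # carries many decimal places
reported_mean = round(raw_mean, 1)    # assumed reporting precision

print(raw_mean)
print(reported_mean)
```

The extra digits in the raw mean add no information, because each underlying value is a whole number of decisions plus one.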
Are they Consistent?
• Data should be consistent from one
measuring device or person to another,
without large differences in value. Thus,
two evaluators should calculate the same or
similar function point values from the same
requirements document
• Similarly, when the same data value is
computed repeatedly over time, the data
should be captured in the same way
• For example, we often measure the growth
in size of a product over time, especially
during maintenance
• We want to be sure that the size measure is
calculated the same way each time, so that
the resulting measures are comparable
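
One way to check consistency between two evaluators is to compare their values against a tolerance. In the sketch below, the function point counts are invented, and the 10% tolerance and the helper names `relative_difference` and `is_consistent` are assumptions made for illustration.

```python
def relative_difference(a: float, b: float) -> float:
    """Difference between two values as a fraction of their mean."""
    midpoint = (a + b) / 2
    if midpoint == 0:
        return 0.0  # both values are zero, so they agree
    return abs(a - b) / midpoint


def is_consistent(a: float, b: float, tolerance: float = 0.10) -> bool:
    """True if the two values differ by no more than the tolerance."""
    return relative_difference(a, b) <= tolerance


# Hypothetical function point counts for the same requirements document.
evaluator_a = 210
evaluator_b = 198

print(is_consistent(evaluator_a, evaluator_b))  # about 6% apart
print(is_consistent(210, 150))                  # about 33% apart
```

The same check applies to a single measure computed repeatedly over time: a sudden jump larger than the tolerance may signal a change in how the value was calculated rather than a change in the product.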
Are they Associated with a Particular
Activity or Time Period?
• If so, then the data should be timestamped,
so that we know exactly when they were
collected
• This association of values allows us to track
trends and compare activities
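
A minimal sketch of time-stamped collection follows. The record layout (`Measurement`), the helper `record`, the metric name, and the sample values are all hypothetical; the only point is that each value carries its activity and its collection time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """One collected value, tied to an activity and a collection time."""
    metric: str
    value: float
    activity: str
    collected_at: datetime


def record(metric: str, value: float, activity: str,
           when: Optional[datetime] = None) -> Measurement:
    """Stamp the value with the given time, or with the current UTC time."""
    stamp = when if when is not None else datetime.now(timezone.utc)
    return Measurement(metric, value, activity, stamp)


# Hypothetical size measurements, entered out of chronological order.
history = [
    record("size_loc", 12400, "maintenance", datetime(2024, 3, 1, tzinfo=timezone.utc)),
    record("size_loc", 12050, "maintenance", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    record("size_loc", 12900, "maintenance", datetime(2024, 5, 1, tzinfo=timezone.utc)),
]

# Sorting by timestamp recovers the trend regardless of entry order.
trend = [m.value for m in sorted(history, key=lambda m: m.collected_at)]
print(trend)
```

Without the timestamp, the three values could not be ordered and the growth trend would be lost.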
Can they be Replicated?
• Data are often collected to support surveys,
case studies and experiments
• These investigations are frequently repeated
under different circumstances, and the
results compared
• At the very least, project histories and study
results are stored in a historical database, so
that baseline measures can be established
and organizational goals set
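
Establishing a baseline from stored project histories might look like the sketch below. The in-memory list stands in for a historical database; the project names, the metric `defects_per_kloc`, its values, and the choice of the median as the baseline are all assumptions for illustration.

```python
from statistics import median

# Hypothetical project histories, standing in for a historical database.
project_history = [
    {"project": "A", "defects_per_kloc": 4.2},
    {"project": "B", "defects_per_kloc": 3.1},
    {"project": "C", "defects_per_kloc": 5.0},
]


def baseline(records, metric):
    """Return the median of a metric across past projects."""
    return median(r[metric] for r in records)


current_baseline = baseline(project_history, "defects_per_kloc")
print(current_baseline)
```

An organizational goal can then be stated relative to this baseline, and a replicated study can be compared against it.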