INSTITUTE OF POLICY AND STRATEGY FOR AGRICULTURE AND RURAL DEVELOPMENT
CENTER FOR AGRICULTURAL POLICY
CARD Project 030/06 VIE: Developing a strategy for enhancing the
competitiveness of rural small and medium enterprises in the agro-
food chain: the case of animal feed
Training Manual for CARD project 030/06 VIE
Donna Brennan and Sally Marsh
School of Agricultural and Resource Economics, University of Western Australia
July 2010
Table of Contents
1 Purpose of the training manual
2 Problem identification
2.1 Identification of key issues
2.1.1 Methodology/activities
2.1.2 Use of reviews and secondary data
2.2 Formulating researchable questions or hypotheses
2.3 Focusing data collection
2.4 References in this section
3 Survey design and sampling techniques
3.1 Introduction
3.2 Why is it so difficult to conduct a good survey?
3.2.1 Issues with translation
3.3 Steps in the process of doing a survey
3.3.1 Is a survey really needed?
3.3.2 Statement of information goals and uses
3.3.3 Collect background information
3.3.4 Focus groups
3.3.5 Select survey method (personal interview, phone, letter, web-based)
3.3.6 Determine sampling method and select sample
3.3.7 Draft questions
3.3.8 Pilot test the questionnaire
3.3.9 Redraft the survey
3.3.10 Train interviewers/enumerators
3.3.11 Collect the data
3.4 Sampling
3.4.1 Accuracy, bias and precision
3.4.2 Types of sample design
3.4.3 Sampling strategies
3.4.4 Proportional stratification by size
3.5 Question design
3.5.1 Designing good survey questions
3.5.2 Should you use open or closed questions?
3.5.3 If closed questions, which type of closed question format?
3.5.4 Using Likert Scales
3.6 References in this chapter
4 Data entry
4.1 Principles of database design
4.2 Designing tables from survey questionnaires
4.2.1 Example of table design in IFPRI feedmill database
4.3 Practicing using queries
4.3.1 Types of queries
4.4 Designing a database for the CARD Livestock questionnaire
5 Data cleaning and analysis – techniques using Stata
5.1 Data cleaning
5.2 Creating Output Templates
5.3 Stata dofiles for feed use as an example
5.3.1 Objectives
5.3.2 Exercises
6 Analysis of Survey Data
6.1 Treatment of variables in survey analysis
6.1.1 The number of variables
6.1.2 Levels of measurement
6.1.3 Method of analysis
6.1.4 Descriptive and inferential statistics
6.2 A Quick Overview of Descriptive Statistics
6.2.1 Measures of location
6.2.2 Measures of spread
6.2.3 Measures of shape
6.2.4 Techniques for displaying and examining distributions
6.3 Data management in Excel
6.3.1 Notation for basic functions in Excel
6.3.2 Using more complex functions in Excel - SUMIF
6.3.3 Using more complex functions in Excel - COUNTIF
6.3.4 Using more complex functions in Excel - TRANSPOSE
6.3.5 Pivot Tables in Excel
6.3.6 Using MACROs in Excel
6.4 References in this section
7 Assessing competitiveness – principles and exercises
7.1 Types of market structure
7.1.1 Perfect competition
7.1.2 Monopoly
7.1.3 Monopolistic competition
7.1.4 Oligopoly
7.2 Analyzing competitiveness
7.3 Product differentiation in the feedmill industry
7.4 Competitiveness in the livestock feed production sector
7.4.1 Evidence of returns to scale
7.4.2 Supply chain differences
7.4.3 Competitive strategies
7.5 Production economics for feed operations – least-cost feed rations
7.5.1 Some basic animal nutrition
7.5.2 The pig diet used in this training course
7.5.3 Linear programming
7.5.4 Mathematical specification of the linear programming problem
7.5.5 Least cost feed analysis using linear programming
7.6 References for this chapter
8 Reporting and communication
8.1 Writing the research report
8.1.1 Working in Outline
8.1.2 Labelling and cross referencing tables and figures
8.1.3 Tables and figures in a Research Report
8.1.4 Other conventions for Report writing in English
8.2 Some common errors in English writing
8.2.1 Language used in reports
8.2.2 Correct use of some English words in Reports
8.3 Writing policy briefs
8.3.1 Preparation of a Policy Brief
1 Purpose of the training manual
The purpose of this manual is to document theoretical issues, methodology and
analytical techniques that were used in the process of conducting CARD Project
030/06 VIE “Developing a strategy for enhancing the competitiveness of rural small
and medium enterprises in the agro-food chain: the case of animal feed”. Work for
this project was conducted from mid-2007 to early 2010. It is hoped that the
experiences gained from the project work and documented in this training manual will
be useful for future work undertaken by IPSARD/CAP.
The chapters include:
• 2. Problem identification. In this chapter, techniques to identify key issues,
formulate researchable hypotheses, and focus data collection are discussed
using examples from the project.
• 3. Survey design and sampling techniques. This chapter focuses on aspects
of socio-economic surveying, including: reasons why surveys can be difficult
to conduct; steps in conducting a survey; sampling techniques used in surveys;
and question design.
• 4. Data entry. This chapter contains the material from a course on database
design presented by Donna Brennan in July 2008. It should be read in
conjunction with electronic course materials in the zip file “Course database
and access forms.zip”. The chapter includes sections on principles of database
design, designing tables from survey questionnaires and using queries in
Microsoft Access.
• 5. Data cleaning and analysis – techniques using Stata. This chapter
contains tips and techniques for data cleaning, building data output templates,
and data analysis. Training notes recorded by members of the CAP team
(Pham Thi Lien Phuong and Nguyen Thi Thinh), in the form of annotated
Stata do files, are provided in this section. Data needed for these analyses will
be in the CARD project database kept at CAP.
• 6. Analysis of survey data. This chapter includes a discussion of treatment of
variables in analysis of survey data and an overview of descriptive statistics.
Additionally, it includes material from a training course in data management
in Excel provided to team members when they visited Perth in August 2009.
The course covered special functions for managing and querying large data
tables, including conditional sums, transposing data, and extracting subsets
using pivot tables. The course also covered the basics of building macros.
Training notes recorded by members of the CAP team (Pham Thi Lien Phuong
and Nguyen Thi Thinh) are provided in this chapter.
• 7. Assessing competitiveness - principles and exercises. This chapter briefly
outlines types of market structure. Issues to consider when analyzing
competitiveness, and in particular, issues when assessing competitiveness of
firms producing a heterogeneous product are discussed. Aspects of
competitiveness investigated in the project are outlined, and material from a
training course on Least-Cost feed rations is included.
• 8. Reporting and communication. This final chapter focuses on providing
tips for producing a well-structured and well-written Research Report,
including techniques for handling large documents in Microsoft Word and a
discussion of common errors made in English writing. Finally we outline the
preparation of a Policy Brief.
The report was mainly written by Dr Donna Brennan and Sally Marsh, but also
contains contributions from Vietnamese CARD project team members, Pham Thi
Lien Phuong and Nguyen Thi Thinh in Chapters 5 and 6.
A number of electronic files are provided as part of and to be used in conjunction with
this report:
For Chapter 4: Course database and access forms.zip
For Chapter 6: macro_practice.xls
For Chapter 7: Cong Nhan May Mac.xls
Least cost feed ration exercise.xls
2 Problem identification
2.1 Identification of key issues
A key task at the beginning of a research project is to scope key issues and existing
information and data relevant to the planned research. There are a number of standard
ways in which this can be done, including:
• Literature reviews;
• Collection of secondary data;
• Identification of and engagement with key stakeholders e.g. interviews, field
visits, workshops designed to seek stakeholder/expert ideas and opinions;
• Consultations with known experts;
• Overseas study tours; and
• Participatory appraisals, a technique used for consultation with local people
often used in rural development projects.
In this project, methods used to identify key issues involved consultations with
stakeholders and known experts, a study tour to Thailand, collection of secondary data
and a literature review.
2.1.1 Methodology/activities
Early engagement with stakeholders and experts
Early in the project, time was spent identifying key stakeholders and experts (e.g.
feedmills, staff of MARD, Vietnam Animal Feed Association) and discussing the
planned project with them. For example: a meeting in 2007 with Mr Le Ba Lich,
Chairman of the Vietnam Animal Feed Association (VAFA), elicited the following
information, issues and opinions (questions asked are in italics, with a summary of the
reply in normal text).
• What is the benefit for a feedmill to join VAFA? They get technical support,
recipes (Lich and other scientists involved in formulation) for all feeds for pigs
and chickens, training. Some companies come when prices change to get
advice on how to change feeds (ability to change recipes depends on storage,
inventory, knowledge of market prices).
• What are the characteristics of small feedmill enterprises? Generally
producing <3,000 T/yr (there are 145 businesses with < 5000 T/yr – 10% of
total production), often don’t have an office or own equipment (rent), sell
animal feed concentrates (premix), sell directly to farmers, located in rural
areas.
• Are small mills inefficient? Small mills still have their market share – sell to
very small land holders (who are interested in low prices), smallholder animal
production is 90% of production – small mill production is 10% of total.
• Why are small mills going bankrupt? They don’t have sufficient capital to
sustain/invest in their business, material costs are increasing.
• Why does the GoV want to encourage them to continue? GoV has
slogans/policy to support SMEs, but in his opinion the GoV should only
support medium enterprises. Support might include land, capital, interest rates.
• What is the cutoff between medium and large enterprises? Discussion about
this, Mr Lich considered >10-20,000T/yr to be large.
• What is a low quality feed? Protein content too low, inaccurate labeling, high
mycotoxins, feed stored in areas with high contamination risk.
• How many feedmills employ a nutritionist? Large mills yes, some medium
mills, others get recipes from others.
• Does VAFA provide specific or generic recipes? Specific – depending on what
raw materials are available.
• Regulation: This is a difficult area.
o No laboratory in the livestock dept and no experience. If they sample
and send to another laboratory (in the north or south) it is costly and
the Dept of Livestock doesn’t have budget for this.
o MARD has funds but they are insufficient.
o Corruption is an issue.
o MARD not authorised to take food in the market as this is linked to the
Ministry of Trade (Dept of Marketing and Management).
• Can the VAFA guarantee feed quality for small mills? No.
• What is a small holder farm? Uses traditional methods and has <100 chickens
and <5 pigs.
• Do smallholder farms have a seasonal demand for feed? After summer they
buy a pig and raise for Tet. Small and medium mills have a cycle of increased
production after August and up until Tet (main pig raising season in the north).
• Do any medium mills have breeding operations? Yes – Dong Nai, CP, Dabaco
• Are there any independent breeding companies in Vietnam? One poultry
research centre, some others I think.
• Do the large feed companies have a monopoly over animal breeding in
Vietnam? No.
An initial workshop involving stakeholders and experts was held to provide an
overview of the planned project and discuss issues in the sector and capture feedback.
Field visits
Another early activity to scope issues was visits to a number of feedmills and
producers. Examples of the data that were recorded from these scoping visits are
shown in Table 2.1. It is good research practice to record and summarise issues and
opinions from field visits for later discussion by the research team.
Literature review
A review of the literature is usually essential, to see what is already known about the
subject area, and what field research has already been done. Often, work done in
Vietnam will be found in technical reports for MARD and donor projects, but other sources may be theses (both local and international), web-based publications, and scientific journals.
Table 2.1 Examples of issues and opinions obtained from CARD project field visits
Field visit 1: DABACO feed processing company (large domestic feedmill). Activities and issues identified:
• Diversity of operations – multiple feed products, associated
livestock operations (contract farming)
• Investment in technology and human development
• Management structure – SOE to equitised company
• Storage capacity
• Buying and importing strategies
• Quality control capability – use of laboratory
• Batch size and mill operations generally: throughput capacity (tonnes/hr, tonnes/day – the smaller the batch size the higher the energy cost per unit), repairs and maintenance scheduling (cleaning of equipment, safety), some feeds harder to produce (chicken feed and feed for small pigs, which need a smaller die)
• Do price and quality equate? Yes, but not perfectly as price
can include services.
• Pricing arrangements within and outside contracts
Field visit 2: Small domestic feedmill in Gia Lam. Activities and issues identified:
• Established 2002, 25-27 employees, produce 100T
concentrate/mth, rent land for the mill (not really a mill – just a
mixing facility)
• Biggest issues – capital, land, increasing production costs, cost
of credit from VBARD (1.03% mth) – mortgages private
assets
• Customers – agents at provincial level, markets to the
mountainous areas as a priority as this is a good market for
concentrates
• How does he compete? He has difficulties – especially in
import procurement, also bigger companies give agents a
bigger bonus. Only competition from large companies is an
issue – other SMEs not a problem.
• Marketing policy? His strategy is to have good quality by
buying good raw materials, and to focus on mountain areas.
• Quality control? Done in two stages: checks maize quality
when he buys, expert from provincial dept level checks the
product. Every 3 months he sends his product for testing to
National Husbandry Institute. Fishmeal and soybean he tests
more often (110,000 VND for one protein test). No laboratory
– 100% of small mills don’t have a laboratory. Dept of Ag at
provincial level comes in once per year to check the output –
he has to pay for testing (100,000 to 200,000 VND/yr). Fined
once when content didn’t match label (then changed his
components).
• Recipes? He has one nutrition expert – also the German
company he buys the premix from helps with recipe
formulation, also VAFA.
• Avian flu reduced sales by 30-40%.
In the case of the CARD project, we were interested in the lessons
from international experience, and one of the components of the project was an
international literature review which was conducted by Dr Johanna Pluske (Pluske,
2007). This review provided a desktop overview of the feed industry from a global perspective, with specific focus on three countries: Vietnam, China and
Thailand. These countries were selected for review to identify similarities and lessons
that may be useful in understanding the feed sector in Vietnam.
Collection of secondary data
Basic information about the nature of the industry, including recent trends in
production, and differences in characteristics of production in different parts of the
country, should be assembled. Aside from statistics reported by others in the technical
reports mentioned above, there is a lot of detailed information at the regional and
province level available from the GSO.
2.1.2 Use of reviews and secondary data
Information from collection of secondary data forms the basis of the background
chapter presented in the livestock feedmill survey report (Phuong et al. 2010). The
secondary data demonstrated the rapid rate of growth in livestock feed production
since 2000, and highlighted the role of domestic and imported ingredients in feedmill
production.
Specific input into the planned research from the secondary data collection included:
• An examination of the spatial pattern of production showed that the Red River
Delta and South East region (and to a lesser extent the Mekong Delta) were the
most important livestock feed production areas, and that is why we chose to
conduct the survey in those regions.
• The evidence on price trends for feed inputs and feedmill outputs highlighted the
problem of rapidly rising feed input prices which have been encountered by the
feedmill industry in recent years, and helped us to form some basic survey
questions about the setting/revision of feedmill output prices.
Information from the literature review, workshop, interviews and field visits was used to help develop possible research questions through team discussions and meetings. A team meeting at CAP in 2007 identified a range of research questions that could be asked and further secondary data that would be needed to help answer these questions. How these possible research questions were then narrowed down is discussed in Section 2.2.
2.2 Formulating researchable questions or hypotheses
It is unlikely that all relevant research questions can be answered by any individual
research project. Any project is limited by resources and time available to conduct the
research. Some information that might be needed to answer a question may be
unavailable or particularly difficult to obtain. It is important to carefully consider
possible research questions arising from initial observations/data to see if it is possible
to answer the question with the planned research.
The formulation of research questions/hypotheses is an application of scientific
method, i.e.
• Collection of facts by observation or experimentation,
• Formulation of a research question or hypothesis to explain facts in terms of
cause and effect relationships,
• Deductions from a question or hypothesis that can be tested, and
• Verification of deductions by new observation or experimentation.
The scientific method attempts to systematize the process of generating scientific
knowledge. However, it is a general approach or a general way of thinking, not a
specific recipe for any given research project. The key to success in research is in
being able to ask an important question in such a way that the question can be
answered. There are an infinite number of important questions to ask, and for many of
them there are no practical methods of providing answers. Likewise, there are an
infinite number of questions with reasonable methods of providing the answers, but
the questions themselves are unimportant. Useful research questions must aim to have answers which are important, and have hypotheses that can be tested and confirmed or refuted.
The project team discussed a wide range of possible research questions arising from
the scoping studies and then focused these into a much smaller number of research
questions that were considered to be important, and could be investigated by the
planned research. These were:
• Are economies of scale evident in the livestock feed sector in Vietnam?
• How different is production and trading between large feed mills and SMEs in
terms of material input use, storage, product types, quality control, types of
customers and services offered to customers?
• Are the raw material procurement and output distribution channels used by
SMEs and larger feed mills different?
• How do domestic SMEs compete in the sector against larger foreign-owned
mills?
• Is there any evidence of prices for raw material imports being higher than
domestic prices for raw material inputs?
• Is there an opportunity for Vietnamese SMEs to compete in niche markets?
(e.g. smaller mills targeting more remote areas)?
• What are the constraints facing SMEs operating in the livestock feed sector in
Vietnam?
2.3 Focusing data collection
Agricultural economists often use information from agricultural scientists when
seeking to understand production issues, and in focusing data collection in agricultural
surveys. In the case of the feed industry, we can use information from scientists about
animal nutrition, and the quality and composition of different feed inputs, to focus our
questions. We can also use the measures adopted by scientists and by the industry to
assess the technical efficiency of production. The indicator of technical efficiency most commonly used by scientists is the Feed Conversion Ratio (FCR). This is a measure of the quantity (kg) of feed fed per kg of liveweight produced. A higher feed
conversion ratio means that more feed is required to produce a unit of output, thus
indicating a less efficient system. In our analysis of animal producers, we collected
data on feed input use and liveweight production, so that we could calculate and
compare the feed conversion ratio achieved on different farms.
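As a minimal sketch of this calculation (the file and column names below are hypothetical, not those of the CARD database), the feed conversion ratio for each surveyed farm is simply total feed fed divided by total liveweight produced:

import csv

def feed_conversion_ratio(feed_kg, liveweight_kg):
    # FCR = kg of feed fed per kg of liveweight produced (lower is more efficient)
    return feed_kg / liveweight_kg

# Hypothetical survey extract: one row per farm with annual totals.
with open("farm_feed_use.csv", newline="") as f:   # columns: farm_id, feed_kg, liveweight_kg
    for row in csv.DictReader(f):
        fcr = feed_conversion_ratio(float(row["feed_kg"]), float(row["liveweight_kg"]))
        print(row["farm_id"], round(fcr, 2))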
There are two main types of ingredients used in producing animal feed: energy and
protein. Energy rich ingredients include maize, rice and cassava. Protein rich
ingredients include soybean cake and fishmeal. Animal feeds normally must meet
certain criteria such as energy content (calories per kg) and protein content (%). With
information on the energy and protein content of feed ingredients, and the nutritional
requirements of certain animal feeds, we can examine the effect of feed input prices
on cost of production. Basically we ask the question: What is the least cost
combination of feed ingredients that can be used in making animal feed, given that we
know what nutrient composition the feed must have?
By setting up a model to assess this question (see chapter 7), we can also examine the
impact of policies affecting feed ingredient prices (such as import taxes, price
seasonality and access to stored maize). We can also forecast the likely demand for
feed ingredients as feedmill demand grows, and as relative prices of feed ingredients
change.
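The kind of model referred to here can be sketched as a small linear programme. The prices and nutrient values below are illustrative placeholders only, not figures from the project; Chapter 7 sets out the actual diet specification and exercise used in the training course.

from scipy.optimize import linprog

# Ingredients: maize, cassava, soybean cake, fishmeal (illustrative data only).
price   = [5.0, 4.0, 9.0, 14.0]      # cost per kg of ingredient (placeholder units)
energy  = [3300, 3100, 2500, 2900]   # energy per kg of ingredient (e.g. kcal/kg)
protein = [8.5, 2.5, 44.0, 60.0]     # % crude protein

# Requirements per kg of finished feed (placeholders).
min_energy, min_protein = 3000, 16.0

# Minimise cost subject to nutrient minima; ingredient shares must sum to 1.
res = linprog(
    c=price,
    A_ub=[[-e for e in energy], [-p for p in protein]],  # energy.x >= min written as -energy.x <= -min
    b_ub=[-min_energy, -min_protein],
    A_eq=[[1, 1, 1, 1]],
    b_eq=[1],
    bounds=[(0, 1)] * 4,
    method="highs",
)
print("least cost per kg of feed:", round(res.fun, 2))
print("ingredient shares:", [round(x, 3) for x in res.x])

Varying the ingredient prices (for example, to mimic an import tax on maize) and re-solving shows how the least-cost ration and its cost change.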
2.4 References in this section
Pluske, J. 2007. A Desktop Review of the Animal Feed Sector at a Global Scale.
Report for CARD Project 030/06 VIE, Center for Agricultural Policy, Hanoi.
Phuong, P.T.L., Thinh, N.T., Brennan, D., Marsh, S. and Nguyen, B.H. 2010. Small-
Medium Enterprises in the Livestock Feed Sector in Vietnam Vol I: Livestock
feed production, Report for CARD Project 030/06 VIE, Center for
Agricultural Policy, Hanoi.
3 Survey design and sampling techniques
3.1 Introduction
Surveys are used to ask a consistent set of questions to a sample of people, so that
responses can be recorded and analysed. They are the standard tool for professionals
who are interested in people’s and firms’ activities, attitudes, beliefs, intentions and
preferences. As a tool, surveys can be very difficult to design and implement. Surveys
are conducted for two main reasons:
• to get otherwise unavailable information, and
• to allow researchers to generalise about a large population by studying only a
small proportion of the population.
Good policy analysis is critically dependent on good quality data. There are many
factors that influence the quality of survey data. It will be dependent on the quality
of:
• the survey design;
• the implementation of the survey, and
• the treatment of the raw data.
The main problems associated with surveys are people-related, not statistical, and they
include issues such as the ambiguity of communication by language, the attitudes of
respondents to their participation in the survey, and the limits to human memory. In
this chapter common problems encountered in the design and conduct of surveys
leading to poor validity of results are outlined, following Pannell and Pannell (1999).
3.2 Why is it so difficult to conduct a good survey?
For a variety of reasons getting accurate information from surveys can be very
difficult. Foddy (1993) outlines a set of reasons why this is so.
• Even simple factual questions are often answered incorrectly. This is
especially the case if people are being asked about activities that happened in
the past.
• The relationship between what people say they do and what they actually do is
sometimes poor.
• People’s attitudes, beliefs, opinions, habits and interests often seem to be very
unstable. The instability may be due to actual instability of attitudes, but it
may reflect other things, such as the way the question is asked.
• Small changes in wording can sometimes produce changes in responses.
• Respondents frequently misinterpret questions. This can easily be seen if
respondents are asked to repeat questions in their own words. Often this will
show that people have misunderstood what is being asked.
• Answers to earlier questions can affect answers to later questions.
• Changing the order in which response options are presented sometimes affects
respondents’ answers. If people are asked to read the options for themselves,
they tend to go for the first option. This is called a “primacy” effect. If the
options are presented verbally, they tend to go for the last one: a “recency”
effect.
• Answers are sometimes affected by the question format. This is most easily
seen when comparing answers from “open” and “closed” format questions.
For example, if people are asked an open question about their information
sources, they are less likely to nominate sources than if a closed question is
asked with a list of possible information sources that can be ticked if used.
• Cultural or ethnic differences can affect not only the interpretation of a
question, but also people’s willingness to give accurate answers. For example,
in a culture where governments and/or businesses are perceived as being
corrupt or exploitive, responses to questions from outsiders are likely to be
affected by the risk that responses may be obtained and abused by government
officials or others.
The first three dot points above are unavoidable to some extent. The other factors all
have implications for the design and conduct of a survey. For all of these reasons it is
essential to invest a lot of care and effort into developing, testing, improving and re-
testing your survey before you actually conduct it.
3.2.1 Issues with translation
There is an additional problem when developing surveys within multi-lingual teams.
If the survey is developed in English and then translated into Vietnamese (or vice
versa) extra care must be taken to ensure that the translation is correct. It is very easy
for small translation errors to make a big difference to the data collected. For
example, despite a great deal of care and checking, at least one small translation error
occurred in the CARD project feedmill survey. The English version of the survey
asked about storage and one of the options was “silo”. The Vietnamese translation in
the survey for “silo” was a word that meant “underground bunker” – an old meaning
of the word “silo” within a military context. The more usual English dictionary
definition of silo is “a tower-like structure for storing grain”.
This error was not noticed until after the survey had been completed, when it became
apparent that very few firms said that they had “silos” for storage, despite the obvious
fact that silos were often clearly visible. Care should be taken when using
Vietnamese-English dictionaries and translation software which are often not correct
for modern English use. If the Vietnamese collaborators are uncertain about the
meaning of English words it is better to ask the English-speaking collaborators. The
translation of the survey needs to pass tests of “common-sense”. If a question seems
silly or not relevant, then it could be that the translation is inaccurate.
3.3 Steps in the process of doing a survey
3.3.1 Is a survey really needed?
It is important to ask if a survey to collect the information is really needed. It is
possible that the information may be already available from other sources such as:
• a previous survey (much information is collected routinely and regularly but
not used);
• published data; and
• reliable interpersonal feedback from contact with farmers and growers.
3.3.2 Statement of information goals and uses
If the information is not available from other sources, the next step is to write a
statement of information goals and uses. That is, what information do you want to
know and what will you do with this information when you have collected it? The
goal for the feedmill survey was articulated as wanting to answer a series of research
questions, as below:
• Are economies of scale evident in the livestock feed sector in Vietnam?
• How different is production and trading between large feed mills and SMEs in
terms of material input use, storage, product types, quality control, types of
customers and services offered to customers?
• Are the raw material procurement and output distribution channels used by
SMEs and larger feed mills different?
• How do domestic SMEs compete in the sector against larger foreign-owned
mills?
• Is there any evidence of prices for raw material imports being higher than
domestic prices for raw material inputs?
• Is there an opportunity for Vietnamese SMEs to compete in niche markets?
(e.g. smaller mills targeting more remote areas)?
• What are the constraints facing SMEs operating in the livestock feed sector in
Vietnam?
3.3.3 Collect background information
The next step is to collect background information to familiarise yourself with the
issues you have decided to conduct the survey on so that you have an understanding
grounded in reality and a “feel” for the issues. This can involve reading, talking to
relevant people and experts, and running focus groups. This step is often called
“scoping” the issues, and the procedures carried out for the CARD project are
outlined in Chapter 2. More information on running focus groups is given in the next
section.
3.3.4 Focus groups
A focus group is a small group of people (say six to eight) drawn from the population
you will survey. You ask these people open-ended questions about the issue you are
interested in and record their responses. Focus groups are good for: i) helping ensure
that you ask about aspects of the issue which are most important to the relevant
population, ii) helping to word survey questions using language which is appropriate
to the likely survey respondents, and iii) alerting you to issues and problems which
you weren’t aware of.
The procedure for focus groups is to:
• Select a sub-sample of your population (ideally a minimum of three different,
but similar, groups of eight to ten people). You should think about the
important characteristics of your target group and make sure they are
represented in this small sample. Sometimes, due to a lack of time or
resources, researchers use a convenient sample for the focus group; for
instance, a group of farmers that they already have links with, but this reduces
the representativeness of the data collected. Sometimes it is advisable to hold
separate focus groups for different types of people that you might want to
include in the same survey, especially if their responses to questions are likely
to be affected by the presence of the other type of people.
• Create a prompt list of questions. You will have already formed a range of
ideas which you think need to be included in the survey from your review of
the literature and talking to stakeholders and experts. Write down the key
issues to use as discussion starters with the focus group. Use the prompt list as
a check list to make sure that these issues are covered in the discussion. The
order of the issues in the discussion is not important. Be prepared to give
attention to new issues and ideas which are not part of your prompt list.
• Facilitate the group during the discussion. Your role is to get a discussion
going in the general topic area and then observe and record the discussion.
Occasionally you will prompt the discussion by asking an open-ended
question to address the issues on your list. Some points to remember when
facilitating are:
o Give a brief introduction about the purpose of the discussion and then
invite people to speak about the topic.
o Use prompt questions to keep the discussion on the topic – but allow some
digressions.
o Use probing questions to encourage detail and ask for elaboration and
clarification: they should be offered in a conversational style, e.g. “So what do you think of …?”; “Can you tell me more about that?”
o Listen to the language being used to describe the issues and adapt your
own to it.
o Remain neutral: be interested but do not show surprise, anger,
embarrassment at any of the comments.
o Raise issues in an open-ended manner - e.g. “How do you feel
about……?” rather than “Are you satisfied with …?”
o Beware of body language: avoid sounding like the answers you are getting
are correct, instead look interested and encourage people to keep talking.
o Allow silences as signals that you’d like them to keep talking.
• Tape observations and/or write them down.
• Analyse the tape/written record to identify major issues for questions and suitable wording.
3.3.5 Select survey method (personal interview, phone, letter, web-based)
Survey data can be collected in a number of ways: by post, web-based, phone or in a
face-to-face interview. There are a number of factors to consider when determining
which is the most suitable method. These include:
• Cost. Phone, web-based or letter are cheaper; face-to-face is the most
expensive and time consuming. There are a number of websites which offer
free (or relatively inexpensive) use of web-based survey tools. One such site
is: www.surveymonkey.com
(“an online survey tool that enables people of all
experience levels to create their own on-line survey quickly and easily”).
• Size and location of your sample. If your sample is large it takes significant
resources to conduct face-to-face surveys. Location of respondents in remote
areas also creates difficulties for face-to-face surveys.
• Response rate. It is common to have a response rate of 30% or less in postal
and web-based surveys. Response rates for phone or personal interviews are
higher: around 70%. Research indicates that response rates in post and web-
based surveys are similar, but of course web-based surveys assume that
respondents have access to a computer and are computer literate. Response
rates to mail and web-based surveys can be improved by following a
standardised procedure developed by Dillman (2000). For mail surveys this
“tailored design” approach includes four contacts: a preliminary postcard, a
hard copy
survey with cover letter explaining the purpose of the study,
a
follow-up/reminder postcard, and a replacement hard copy survey
with cover
letter to non-respondents. Response rates for mail and web-based surveys can
be increased by offering incentives to respondents, e.g. to go into a draw for a
prize if they respond.
• Complexity of information being collected. If the survey requires complex
information or large amounts of information, a personal interview may be the
only feasible method. In all surveys the length should be as short as possible,
but in postal and web-based surveys brevity is especially important so as not to
reduce the response rate.
• Time available. Telephone and web-based surveys are favored for their speed
compared to mail surveys and face-to-face interviews.
• Literacy levels. Low levels of literacy and low literacy competency can be
serious issues for mail and web-based surveys.
• Validity, or the risk of introducing bias into the survey results. In face-to-face
and telephone interviews, the interviewer is a threat to validity. Inappropriate
non-verbal behaviour, failure to clarify vague replies, failure to use the
question wording, failure to accurately record the respondent’s reply are all
common problems. Biased samples through low response rates are the most
worrisome aspect of postal and web-based surveys. Even if a relatively high
rate of say 60% was obtained there is still the question of what the distribution
of replies would have looked like if everyone had responded. Those who do
not respond may well be self-selecting on the basis of a particular
characteristic, e.g. education level. If education is also likely to be associated
with the issue you are investigating in the survey then you’ve got a biased
sample.
3.3.6 Determine sampling method and select sample
The aim is to get a sample which is as representative and unbiased as possible. This is
addressed in more detail in Section 3.4.
3.3.7 Draft questions
There are also many issues associated with designing survey questions and these are
addressed in detail in Section 3.5.
3.3.8 Pilot test the questionnaire
One important type of pilot testing is to trial the draft survey with a small number of
people/firms from the target population. This is useful for uncovering aspects of
questions that will cause interviewers and respondents to have difficulty. Two
interviewers should conduct each pilot interview. One should conduct the interview,
the other should record impressions. When pilot-testing consider the following
questions:
• Were any of the questions difficult for the respondent to answer?
• Did any of the questions seem to make the respondent uncomfortable?
• Did you have to repeat any of the questions?
• Did the respondent misinterpret any of the questions?
Mail, phone and web-based surveys should also be pilot-tested. In this case the test
respondent should complete the survey, and then be asked about what they thought
about the structure and content of the survey, along the lines of the questions above.
Another type of pilot-testing which can be valuable is to attempt to analyse a set of
fictional results to your survey. Often people don’t adequately consider which
statistical method or what type of summaries they are going to use until after the data
have been collected, by which time it is too late to realise that you didn’t ask for the right information to do the planned analysis. Attempting to do an analysis with artificial data (e.g. made up out of your head) will reduce this problem substantially. In
the CARD project the exercise on least cost feed rations (see Chapter 7) was
conducted to help clarify what data we would need from the survey.
3.3.9 Redraft the survey
Reconstruct the questions based on your experience with the pilot interviews and pilot
analysis.
3.3.10 Train interviewers/enumerators
It is important that the interviewers are familiar with the survey topic and the
questionnaire. In the training, emphasise the things they should not do because it
would reduce the validity of the results. These things include using inconsistent
wording to ask the questions, and providing guidance or reinforcement for particular
types of responses.
3.3.11 Collect the data
If care has been taken with the survey design this should now be relatively straightforward. In Chapters 4, 5 and 6 of this Manual we discuss aspects of data entry, data
cleaning and analysis.
3.4 Sampling
Samples are taken from a population mainly to reduce the cost of collecting data. Generally there are two purposes: obtaining an estimate of a population parameter, such as a mean value, or testing a statistical hypothesis, such as “large-scale feedmills have lower costs of production than small-scale feedmills”. Estimation of population parameters is the most common purpose.
Note that a sample can only give results in terms of probability statements. Suppose
your population of 3,000 workers in a factory was surveyed and you found that 1,500 smoked. This would be a population parameter. Suppose now you sample 2,998 workers and find that 1,499 smoke; the proportion is still 50 per cent, but now you have an estimate or statistic rather than a parameter.
Also, note that drawing a sample implies that a random method has been applied to
choose the sample such that each member of the population has a known chance of
being selected. Sampling theory does not apply if this is not the case.
3.4.1 Accuracy, bias and precision
The accuracy of a sample estimate refers to its closeness to the population value.
Consider a small population of 4 numbers, 15, 17, 18, 22 with a mean of 18. We
could select a sample of 2, say 15 and 18, to give a mean of 16.5. With a larger set of
numbers we could select many samples so that we have many mean values. These
mean values will have a distribution of which we can take a mean which is referred to
as an estimator rather than an estimate. If the expected value of the estimator is equal
to the population parameter then it is an unbiased estimator. Otherwise it is a biased
estimator. The bias is the difference between the expected value of the estimator and
the population value. Note that bias depends on the method of sampling as well as the
method of estimation.
It is important also, to recognise that any one sample may give an inaccurate estimate
even if the estimator is unbiased. An estimator which is biased can also produce an
accurate estimate for an individual sample.
Next, it is important to know what the sampling fluctuations might be on average.
This can be obtained through a measure of the spread of the sampling distribution.
The standard deviation or variance is used to measure this property. The
standard deviation of the means obtained from numerous samples is the standard error
of the mean. This provides an estimate of the probable accuracy or precision of any
one estimate.
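To make the distinction between the estimator, individual estimates and the standard error concrete, the short simulation below (which uses a made-up population, purely for illustration) draws many samples and looks at the spread of their means:

import random
import statistics

random.seed(1)
population = [random.gauss(50, 10) for _ in range(5000)]   # made-up population

sample_means = []
for _ in range(1000):                        # draw many samples of size 30
    sample = random.sample(population, 30)   # simple random sample without replacement
    sample_means.append(statistics.mean(sample))

# The spread of the sample means approximates the standard error of the mean.
print("population mean:", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(sample_means), 2))
print("standard error (simulated):", round(statistics.stdev(sample_means), 2))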
3.4.2 Types of sample design
The aim of sampling for a survey is to get a sample which is as representative and
unbiased as possible. Firstly, the sampling frame should be determined – that is, what
is the “working” population from which the sample will be drawn? The sample size
will be dependent on a lot of factors (particularly the resources available to conduct
the survey), but generally the larger the sample size the better. However, a large
sample size is not a guarantee of the accuracy of the results since it will not eliminate
bias in the selection of a sample. Thus, size of sample alone is not enough. Seek the
advice of a statistician to help determine the sample size (in relation to the “working
population”) that will give an acceptable standard error.
The sample selection procedures/strategies should then be determined. Sampling can
be random or “purposeful”.
• Random sampling avoids systematic bias in the sample and can be simple,
stratified, or cluster random sampling.
• “Purposeful” sampling increases the utility of information from small samples,
by deliberately selecting a sample from a specific group of respondents.
Simple random sampling is used when we believe that the population is homogenous,
and a random set of individuals selected from it will represent the population
responses.
Stratified random sampling is used when there are distinct subgroups of the
population that we are interested in and believe that these subgroups may respond
differently to our survey. In the survey for the CARD project we wanted to make sure
that we selected a set of respondents that represented the range of size categories that
we were interested in. We wanted to analyse the impact of size class on many of our
responses in the survey.
There are two ways of going about the selection of the population for a stratified
sample, once the strata have been decided upon. One is called proportionate
allocation, and organizes the sample so that the share of surveys in each stratum is in the
same proportion as the share of these groups in the overall population. For example, if
the total population had 20% large firms we would have 20% large firms in our
sample.
The other method of sampling is to emphasize the collection of data from a group
where we think the group might have a higher variance. For example, if we thought
that the large firms might have greater variance in their responses we would select
more than 20% of the survey from this group. However, in order to work out how
many to select we would need to have a reasonable estimate of the standard deviation
of each population sub-group. If we have no idea as to whether variance would be
different for different sub-groups (as in our case) we use the proportionate method
(see Section 3.4.4 for the detail of the proportionate method used in the CARD
project).
A formal classification and description of sampling strategies is given in the following
section.
3.4.3 Sampling strategies
Random sampling.
Here each of the N units has a calculable (non-zero) probability of being selected.
Unrestricted random sampling means that each possible sample of n units from the
population of N has an equal chance of being selected. Also unrestricted random
sampling is usually conducted “with replacement”, i.e. a unit drawn is returned to the
population with the possibility of being drawn again. If there is no replacement it is
usually referred to as simple random sampling. For example, simple random sampling can be done by numbering the N members of the population and then using uniform random numbers in the range 1 to N to choose the n units.
Randomness is vital if the parameter estimates are to be unbiased. Random does not mean haphazard selection: the selection needs to be independent of human judgment. Use a random
number generator (e.g. RAND( ) function in Excel) or lottery method (e.g. drawn
from a hat).
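A minimal sketch of simple random sampling without replacement, using a hypothetical list of firms as the sampling frame:

import random

firms = ["firm_%03d" % i for i in range(1, 108)]   # hypothetical sampling frame of 107 firms

random.seed(42)                     # fixing the seed makes the selection reproducible
sample = random.sample(firms, 70)   # simple random sample of n = 70, without replacement
print(sample[:5])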
Systematic Sampling.
Divide the population size by the sample size (say 5000/100 = 50) to obtain the sampling interval. Randomly select a number between 1 and 50 and then take every 50th unit, e.g. 10, 60, 110, etc. In this case only 50 distinct samples are possible, not an infinite number. Note that the list should have a random arrangement to get the precision of a simple random sample.
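A sketch of that procedure, assuming a hypothetical frame of 5,000 units and a sample of 100:

import random

N, n = 5000, 100
interval = N // n                                # sampling interval, here 50
start = random.randint(1, interval)              # random start between 1 and 50
selected = list(range(start, N + 1, interval))   # every 50th unit thereafter
print(len(selected), selected[:5])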
Stratification.
Other than increasing the size of a sample, its precision can be increased by
stratification. Before the sample is selected information is used to divide the
population into a number of strata - then a random sample is selected from each
stratum. If the sampling fraction is the same for each stratum then there will be
greater precision than a simple random sample because the different strata will be
properly represented (e.g. sexes, age, regions, town, etc.): i.e. the Standard Error will
be reduced. It is not necessary that the sampling fraction be constant across the strata
– there can be a proportionate stratified sample (uniform sampling fraction) or a
disproportionate stratified sample (variable sampling fraction).
Proportionate Random Sampling. In this case, information on the population is
known. For example, stratify a student population by type of degree and then sample
according to the proportion in each degree so the sample has the same proportions.
The reason it works is that the variation between the strata does not enter into the
Standard Deviation because it is reflected exactly in the sample. There is no sampling
of the strata, only sampling within the strata. The greater the variation accounted for
by the strata, the greater will be the gain from stratification. Thus the strata should as much as possible be distinct from each other, and the units within each stratum should be homogeneous. Select stratification factors related to the subject of the survey. The aim
should be to stratify using a classification related to the key variables or attributes in
the survey. However you must know the population distribution of the classifying
variables for every member of the population.
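Proportionate allocation itself is only arithmetic: each stratum receives the same share of the sample as it has of the population. A sketch with hypothetical strata:

# Hypothetical population counts by stratum and a total sample size of 60.
population = {"small": 120, "medium": 50, "large": 30}
n_total = 60

N = sum(population.values())
allocation = {stratum: round(n_total * count / N) for stratum, count in population.items()}
print(allocation)   # {'small': 36, 'medium': 15, 'large': 9}
# With other numbers the rounded allocations may not sum exactly to n_total;
# the difference is usually absorbed by the largest stratum.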
Disproportionate Stratified Sampling.
Disproportionate stratified sampling allows for the possibility of using variable
sampling fractions. This is useful where the populations in some strata are more
variable than others. Where a stratum is more variable it is better to have a larger
proportion representing it to gain greater precision. It can be shown in this case that
the optimum precision is obtained for a given cost if the sampling fractions in the
different strata are made proportional to the standard deviation in those strata and
inversely proportional to the square root of the costs per unit in the strata.
Normally the standard deviation and costs/unit are not known. However a pilot
survey, previous surveys or expert judgment might be used, and some judgment may
be necessary in choosing the sampling fractions.
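That allocation rule can be written as n_h proportional to N_h × S_h / √c_h, where N_h is the stratum population, S_h its standard deviation and c_h the cost per unit. The sketch below uses guessed standard deviations and costs of the kind a pilot survey or expert judgment might supply:

import math

# Hypothetical strata: population size, guessed standard deviation, cost per interview.
strata = {
    "small":  {"N": 120, "sd": 5.0,  "cost": 1.0},
    "medium": {"N": 50,  "sd": 12.0, "cost": 1.5},
    "large":  {"N": 30,  "sd": 20.0, "cost": 4.0},
}
n_total = 60

# n_h proportional to N_h * sd_h / sqrt(cost_h)
weights = {h: s["N"] * s["sd"] / math.sqrt(s["cost"]) for h, s in strata.items()}
total_w = sum(weights.values())
allocation = {h: round(n_total * w / total_w) for h, w in weights.items()}
print(allocation)   # more variable and cheaper strata receive larger sampling fractions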
Cluster and Multi-stage Sampling.
A population can be thought of as made up of a hierarchy of sampling units of
different sizes and types. It is possible to randomly select a sample of say student
classes then one may include all the students in that particular set of classes or
randomly choose individuals from the class. To work, each student should only be a
member of one class.
For instance, a set of classes may be selected at random to form the clusters, and the students then taken from those clusters. When sampling is done so that all of a cluster is used it is known as cluster sampling. When units within the selected clusters are themselves randomly sampled, it is multi-stage sampling. For
example: Select randomly some suburban areas each with 50 houses and interview all
of the groups of 50 houses. Or, select randomly some suburbs then select the houses
from the suburbs randomly. The advantage of this method is the lower cost of travel
and collection of the information.
Sampling with Varying Probabilities.
Previously we have assumed the clusters were of near equal size. If they are not, then
complications arise, since the size of the primary sampling unit chosen at the first stage changes the probability that a particular individual is selected. Compare a cluster of 2,000 units with one of 20: a second-stage sampling fraction of 1/10 yields 200 people in the first case and only 2 in the second. This can be dealt with by stratifying the primary sampling units (the clusters) by size and selecting a sample of them in each size group,
probably with a varying sample fraction. Another approach is to select the primary
sampling unit with a probability proportional to size. This gives greater precision
than would a simple random sample of primary sampling units.
Area Sampling.
This approach is most useful where there are inadequate population lists. Basically
the area to be covered is divided into a number of smaller areas and a random sample
of these is drawn. Within the area either there is complete sampling or a further sub-
sample is taken. The approach is basically multi-stage sampling in which maps are
used as the sampling frame. When naturally defined areas on a map are of different
size, then sampling with probability proportional to size might be used. However,
an appropriate measure of size is needed.
Multi-phase Sampling.
In this case some limited information is collected from the whole sample and
additional information is collected from sub-samples. With only one sub-sample it is known as two-phase sampling.
This is an efficient way of getting information, some of which may be time-consuming and expensive to obtain. There may also be areas of questioning where less precision is required. In addition, the data from the first phase can be used to select the sample by stratification in the second phase. Two-phase sampling is only effective if the cost of data collection in the first phase is much lower than in the second phase, by about a factor of 10.
Replicated Sampling.
With complex sample designs such as multi-stage sampling the calculation of
standard errors is difficult. The paired selection design yields a simple formula. This
involves the selection of two units per stratum in single-stage sampling or two
primary sampling units per stratum in multi-stage sampling.
Another flexible approach is through replicated or interpenetrating sampling. In this
case a number of sub-samples, rather than one full sample, are selected from the population. Each sub-sample must be drawn independently, have exactly the same design, and be a self-contained and adequate sample of the population.
The sample estimates can be calculated for each of the sub-samples, and the variation
between these estimates provides a means of assessing the precision of the overall
estimate. The advantages of replicated sampling are:
• Easy generation of preliminary results using one of the sub-samples.
• Can obtain an estimate of some of the non-sampling errors such as variation
between interviewers.
The number of replications must be chosen. Some have used between 4 and 10.
However, the larger the number the more limited the possible stratification of each
sub-sample.
Quota Sampling.
Quota sampling is different from probability sampling. Quota sampling is a method
of stratified sampling in which the selection within strata is non-random. It is the
non-random error that constitutes its greatest weakness causing great debate about the
value of quota sampling. Statisticians think it is theoretically weak, while market
researchers defend its cheapness.
Various stratification schemes might be used, e.g. rural/urban, sex, age, etc. In quota
sampling the interviewers are given the numbers to select from rural or urban areas,
the number of males and females, the numbers in age-groups, etc. The strata chosen
should be important in determining the variation in the variables of interest.
Arguments against:
• Not possible to calculate appropriate standard errors. It is sometimes argued,
however, that these are small problems compared to other biases.
• Interviewers may select in a biased way - the easy people or firms to interview.
• Control of interviewers in placing respondents into the right groups is difficult.
Arguments for:
• Less costly.
• Administratively easy.
• Independent of the existence of sampling frames and may be the only method if
there are no suitable sampling frames.
Panel and Longitudinal Studies.
This is collecting data from the same sample on more than one occasion. There are
special problems in maintaining the representativeness of the sample. Samples such as this allow trends to be studied. Also, it is possible to study the nature of the change
and the people who have changed and possibly why they changed, as well as the
causes of the change.
A panel nearly always has greater precision than a set of random samples through
time. Also, they can be used to measure the impact of experiments such as
advertising. The design is known as the before-after design without control group.
The problems with panel studies are the recruitment of willing respondents, sample mortality (loss of panel members) and conditioning of responses. Systems for replacing panel members have been developed.
Master Samples.
If repeated samples are to be taken of the same area or population then the preparation
of a master sample, from which sub-samples can be taken, is often efficient. This simplifies and speeds up the selection process. Often the primary sampling units, such as regions or provinces, are selected once and sub-samples are then drawn from these. A master sample
needs to be reasonably stable.
3.4.4 Proportional stratification by size
In the CARD project we were interested in the impact of size on competitiveness, and
therefore it was important to draw a sample that represented the range of feedmill
scales in operation. We classified large mills as producing more than 80,000 tonnes per year, and medium mills as 20,000 to 80,000 tonnes. Because of the range of sizes within the small
category, and the dominance of very small firms in our total population, we selected
three different size categories for small-scale firms to ensure that we selected the
representative range. Our sampling frame was a list provided by the Department of
Livestock Production which contained the name, address, and production capacity of
241 feedmills operating in Vietnam in 2006. The total population of feedmills in the
six provinces in which we were working (Ha Noi, Ha Tay, Binh Duong, Dong Nai,
Long An, Tien Giang) was 107 firms. Around half these firms were of a scale less
than 5000 tonnes per year. Using the proportionate sampling approach, around half
the firms in the sample should be from this smallest category. Similarly, around 10%
of firms in the population were in the next size group, so our sample was selected so
that 10% of the sample was from this size group, and so on (Table 3.1).
Table 3.1 Sampling strategy to represent scale of operation
Scale (tonnes/yr)     Size class    Population    Our sample
< 5,000               Small         53            37
5,000 to 10,000       Small         10            8
10,000 to 20,000      Small         5             3
20,000 to 80,000      Medium        16            12
> 80,000              Large         23            10
Total                               107           70
Once we determined the desired number of firms in each province, using the same
proportionate sampling approach, we used a spreadsheet function to randomly choose
which firms were selected. This was done by assigning a random number to each firm (using the rand() function in Excel), then copying and pasting the values into a new column (because rand() recalculates and produces new random numbers whenever the sheet changes). We then sorted the firms according to
the random number assigned to them, from highest to lowest. We then used this sorted
set of firms as our priority list for sampling, going down the list until we had enough
firms to meet our required sample size. A reserve list of firms was drawn up in the
same way in case we needed to replace firms in the sample that did not wish to
participate, or that were no longer operating in the sector.
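The same select-and-reserve procedure can be sketched outside Excel as follows; the firm names are placeholders, not those on the Department of Livestock Production list:

import random

frame = ["province_firm_%02d" % i for i in range(1, 21)]   # hypothetical frame for one stratum
n_required = 6

random.seed(7)
keyed = [(random.random(), firm) for firm in frame]   # assign a fixed random number to each firm
keyed.sort(reverse=True)                              # sort from highest to lowest

selected = [firm for _, firm in keyed[:n_required]]   # priority list for sampling
reserve = [firm for _, firm in keyed[n_required:]]    # replacements for refusals or closed firms
print(selected)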
3.5 Question design
The aim is to design valid and reliable questions. Reliability means that a person
would give consistent answers to your survey questions in different times, places and
contexts. Validity means that the survey questions actually measure what they set out
to measure. Reliability and validity can be affected by language, context and question
type.
3.5.1 Designing good survey questions
The aim is to have questions which are as clear and straightforward as possible. Some
tips for this include:
• Use simple language.
• Keep concepts simple, or be prepared to explain a concept in simple language.
• Make sure the task (i.e. answering the questionnaire) is manageable.
• Questions should relate to issues which are common knowledge within the
target population.
• Be aware of wording effects - even small changes in wording can shift
answers (e.g. not allow vs forbid).
• Question order - the general rule is to move from more general to more
specific questions.
Foddy (1993) and Pannell and Pannell (1999) give a more comprehensive discussion
about designing good survey questions.
3.5.2 Should you use open or closed questions?
An example of an “open-ended” (qualitative) question is:
“What assistance should the government of Vietnam provide for small-medium
domestic livestock feed enterprises to help their competitiveness in the animal feed
sector?”
A “closed question” that explores a part of this same issue might ask:
“The Government of Vietnam should provide subsidised credit to small-medium
domestic livestock feed enterprises. Do you:
strongly agree 1 2 3 4 5 strongly disagree”
There has always been controversy over the value of collection of qualitative
information from open ended questions. Researchers disagree over whether this kind
of information is useful. It has been argued that the open type of question fails to
control what the respondent is supposed to be answering; that respondents wander
from the topic and that answers from different respondents cannot be meaningfully
compared.
On the other hand closed questions are said to impose a framework on respondents
which may not be relevant to the respondent, and that fixed response options force the
respondent to adopt the researcher’s frame of reference even when it is not
meaningful to them.
In general:
• The open-ended approach is good for exploratory work where you are trying
to discover the range of ideas, feelings and reactions. You might then use this
data to create a structured, quantifiable set of questions.
• Sometimes, you only have time to collect qualitative data, (e.g. through a
group discussion) and it might be all that is needed for the decision making
you have to do on that issue.
• The two types of information can be used as a validity check on each other.
• It is possible to roughly classify open-ended questions into themes where the sample is not too large, and then do a rough quantification of them.
• You can use specific quotes from open questions to complement your closed
data in reports.
3.5.3 If closed questions, which type of closed question format?
There are various types of scales and formats that can be used in closed questions and
advantages and disadvantages associated with each type. The main types are:
• Agree/disagree or yes/no. This is the simplest kind of question. It tends to be
lower in validity than those with a scale of responses because it forces an
extreme or cut-and-dried response when in fact most of us are not clearly
polarised on many issues. You should normally avoid agree/disagree or yes/no
questions. If used, a “not sure/don’t know” option should be provided.
• Standard scales. In this approach, an example is given for each level in the
scale. This can be useful for frequency of behaviours where each point is
numerically defined. For example:
Normally, I eat pork (please tick one choice only):
every day
several times a week
about once a week
once or twice a month
several times a year
never
However, it is often difficult to develop scales with behavioural or attitudinal statements which accurately represent something.