This sub-process identifies and specifies the population of interest, defines a sampling frame (and, where necessary, the register from which it is derived), and determines the most appropriate sampling criteria and methodology, which could include complete enumeration. Common sources are administrative and statistical registers, censuses and sample surveys. This sub-process describes how these sources can be combined if needed. An analysis of whether the frame covers the target population should be performed.

A sampling plan should be made; the actual sample is created in sub-process 4. Surveys collect data on a sample of households with the intention of drawing inferences about the total population from the observation of the sample. The resulting estimates will differ from the true population values, partly because we are observing only some of the households in the country rather than all of them, and partly for other reasons. Reducing both kinds of error, called sampling and non-sampling errors respectively, is thus an important concern of survey designers, since it is critically linked to the accuracy of the product.

Sampling theory has earned a reputation for being difficult to understand and best left to experts, because it requires a substantial background in mathematics and probability theory. We provide here some highlights, and links to key resources, some simple, others more advanced.


## Highlights

The sample of households to be visited by the survey is often selected in two stages: first, a certain number of area units (sample points) are chosen; then a group of households (a cluster) is chosen in each sample point. Both stages are random selections. Random sampling makes it possible to establish sampling errors and confidence intervals around the survey estimates.
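The two-stage selection just described can be sketched as follows. This is a minimal illustration, not a prescribed procedure: it assumes equal-probability simple random selection at both stages, a fixed number of households per point, and a hypothetical `frame` dictionary (with made-up `EA…` identifiers) that maps each sample point to its household list. In a real survey the household list exists only for the selected points, because it is compiled in the field after the first stage.

```python
import random

def two_stage_sample(frame, n_points, households_per_point, seed=None):
    """Two-stage selection: random sample points, then a random cluster of
    households within each selected point.

    frame: dict mapping each sample-point ID to the list of household IDs
        in that point (the second-stage list). Hypothetical structure.
    """
    rng = random.Random(seed)
    # Stage 1: simple random selection of area units (sample points)
    points = rng.sample(sorted(frame), n_points)
    sample = {}
    for point in points:
        households = frame[point]
        # Stage 2: simple random selection of a cluster of households
        k = min(households_per_point, len(households))
        sample[point] = rng.sample(households, k)
    return sample


# Hypothetical frame: 100 sample points with 50 listed households each
frame = {f"EA{i}": [f"EA{i}-HH{j}" for j in range(50)] for i in range(100)}
sample = two_stage_sample(frame, n_points=10, households_per_point=20, seed=1)
print(len(sample), sum(len(h) for h in sample.values()))  # 10 200
```

Under this particular design a household in point *i* is selected with probability (n/N)·(k/M_i), where n of N points are drawn and k of the M_i households in the point are taken. Households in larger points therefore have lower selection probabilities, which is one reason such samples are analyzed with weights.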


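Only random sampling allows sampling errors and confidence intervals to be established in this way. Sampling errors depend very much on the size of the sample, and very little on the size of the population (as long as the sample is a small fraction of the population). As the sample size increases, sampling errors are reduced, but non-sampling errors tend to grow, because a larger field operation is harder to supervise and control.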
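To make the point about sample size versus population size concrete, here is a minimal sketch that assumes simple random sampling without replacement and a population proportion p = 0.5 (both are illustrative assumptions; the clustered designs discussed here have larger errors). It evaluates the standard error of a sample proportion, including the finite population correction (N - n)/(N - 1).

```python
import math

def se_proportion(p, n, N):
    """Standard error of a sample proportion under simple random sampling
    without replacement, for a population of N units with true proportion p."""
    fpc = (N - n) / (N - 1)  # finite population correction
    return math.sqrt(fpc * p * (1 - p) / n)


for N in (10_000, 1_000_000, 100_000_000):
    for n in (500, 2_000):
        print(f"N={N:>11,}  n={n:>5,}  SE={se_proportion(0.5, n, N):.4f}")
```

Quadrupling the sample roughly halves the standard error, while a ten-thousand-fold change in population size barely moves it; the correction only matters when n is a sizeable share of N, as with n = 2,000 out of N = 10,000.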


The sample is generally stratified by region or by other criteria in order to adequately represent subgroups of the population. The first-stage sample frame is developed from the most recent census; if the census is old, some updating may be needed. The second-stage sample frame is the list of all households in each selected sample point. Compiling this list is a field operation that needs to be done before the survey, but ideally not long before.
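One simple way to spread a total sample over strata is proportional allocation, sketched below under stated assumptions: the stratum sizes and region names are hypothetical, and largest-remainder rounding is just one reasonable way to make the integer allocations add up to the total. Proportional allocation gives every unit the same selection probability; a survey that wants reliable figures for small regions may instead oversample them, and must then use weights.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample of n units across strata in proportion to
    stratum size, using largest-remainder rounding so the parts sum to n."""
    total = sum(stratum_sizes.values())
    exact = {h: n * size / total for h, size in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round everything down first
    shortfall = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:shortfall]:
        alloc[h] += 1
    return alloc


# Hypothetical household counts by region, e.g. taken from the census
regions = {"North": 120_000, "Centre": 450_000, "South": 230_000}
print(proportional_allocation(regions, 1000))
```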


## Design frame and sample methodology

The sampling errors of two-stage samples are affected by clustering: the tendency of neighboring households to provide similar answers to the questions asked.
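A common way to quantify this effect is Kish's approximate design effect for clustering, deff = 1 + (m - 1)ρ, where m is the number of households per cluster and ρ is the intra-cluster correlation of the variable being measured. The sketch below assumes equal cluster sizes, and the value ρ = 0.05 is purely illustrative. Dividing the sample size by deff gives the "effective" sample size, meaning the size of a simple random sample with the same precision.

```python
def cluster_design_effect(cluster_size, icc):
    """Kish's approximation of the design effect due to clustering:
    deff = 1 + (m - 1) * rho, assuming equal cluster sizes m."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n, cluster_size, icc):
    """Size of a simple random sample giving the same precision."""
    return n / cluster_design_effect(cluster_size, icc)


# Same total of 2,000 households, different cluster sizes, illustrative rho = 0.05
for m in (10, 20, 40):
    deff = cluster_design_effect(m, 0.05)
    n_eff = effective_sample_size(2000, m, 0.05)
    print(f"m={m:>2}  deff={deff:.2f}  n_eff={n_eff:.0f}")
```

Even a modest intra-cluster correlation erodes precision quickly as clusters grow, which is why designers prefer many small clusters to a few large ones.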


To reduce the effect of clustering, clusters should be kept small. Samples are generally selected with unequal probabilities and thus need to be analyzed with weights. The design effect is the combined result of stratification, clustering and weighting on sampling errors; it is usually expressed as the ratio of the sampling variance under the actual design to the variance that a simple random sample of the same size would give.

Stratification is sometimes introduced after the sampling phase, in a process called "poststratification". Although the method is susceptible to the pitfalls of post hoc approaches, it can provide several benefits in the right situation. It is usually applied after a simple random sample has been drawn. In addition to allowing for stratification on an ancillary variable, poststratification can be used to implement weighting, which can improve the precision of a sample's estimates.
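Poststratification weighting can be sketched as follows, assuming the population count N_h of each post-stratum is known (for example from a census). Each respondent in post-stratum h receives the weight N_h / n_h, so the weighted sample reproduces the known population distribution. The respondents, categories and counts below are hypothetical.

```python
def poststratification_weights(strata, population_counts):
    """Post-stratification weights w = N_h / n_h.

    strata: the post-stratum of each respondent (e.g. sex or age group),
        observed after the sample was drawn.
    population_counts: known population size N_h of each post-stratum.
    """
    n_h = {}
    for h in strata:
        n_h[h] = n_h.get(h, 0) + 1
    missing = set(population_counts) - set(n_h)
    if missing:
        # Weights cannot be formed for a post-stratum with no respondents
        raise ValueError(f"no respondents in post-strata: {sorted(missing)}")
    return [population_counts[h] / n_h[h] for h in strata]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical simple random sample of 10 people in which women are over-represented
strata = ["F"] * 7 + ["M"] * 3
y = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1]  # e.g. 1 = uses a given service
w = poststratification_weights(strata, {"F": 5000, "M": 5000})
print(f"{sum(y) / len(y):.3f} vs {weighted_mean(y, w):.3f}")  # 0.600 vs 0.524
```

The weighted mean gives men and women their population shares (50% each) rather than their sample shares (30% and 70%).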

Choice-based sampling is a form of stratified sampling. In choice-based sampling,[7] the data are stratified on the target variable, and a sample is taken from each stratum so that the rare target class is better represented in the sample. The model is then built on this biased sample. The effects of the input variables on the target are often estimated more precisely from the choice-based sample than from a random sample, even when the overall sample size is smaller.
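A minimal sketch of choice-based sampling, assuming a binary target and an equal number of draws from each class (both assumptions for illustration; the population and the 2% case rate are hypothetical). Each unit also receives an inverse-probability weight, which is one simple way to undo the oversampling when estimating population quantities such as the prevalence of the rare class.

```python
import random

def choice_based_sample(population, is_case, n_per_class, seed=None):
    """Stratify on the target and draw n_per_class units from each class,
    so a rare target class is over-represented in the sample.

    Returns the sample and an inverse-probability weight for each unit.
    """
    rng = random.Random(seed)
    cases = [u for u in population if is_case(u)]
    controls = [u for u in population if not is_case(u)]
    sample, weights = [], []
    for group in (cases, controls):
        sample.extend(rng.sample(group, n_per_class))
        # Within a class every unit has selection probability n_per_class / len(group)
        weights.extend([len(group) / n_per_class] * n_per_class)
    return sample, weights


def is_case(u):
    # Hypothetical target: 2% of the population are "cases"
    return u % 50 == 0


sample, weights = choice_based_sample(range(100_000), is_case, n_per_class=500, seed=3)
flags = [is_case(u) for u in sample]
naive = sum(flags) / len(flags)
weighted = sum(w for w, f in zip(weights, flags) if f) / sum(weights)
print(f"sample share of cases {naive:.3f}, weighted estimate {weighted:.3f}")  # 0.500, 0.020
```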

The results usually must be adjusted to correct for the oversampling.

In some cases the sample designer has access to an "auxiliary variable" or "size measure", believed to be correlated with the variable of interest, for each element in the population. These data can be used to improve accuracy in sample design. One option is to use the auxiliary variable as a basis for stratification, as discussed above. Another option is probability-proportional-to-size (PPS) sampling, in which the selection probability for each element is set to be proportional to its size measure, up to a maximum of 1. In a simple PPS design, these selection probabilities can then be used as the basis for Poisson sampling, in which each element is included independently with its own selection probability.

However, this has the drawback of variable sample size, and different portions of the population may still be over- or under-represented due to chance variation in selections.
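The PPS-with-Poisson-sampling approach and its variable sample size can be sketched as follows. The sizes are hypothetical, and capping probabilities at 1 is done only once here (a simplification; a fuller treatment would redistribute the excess from capped units and repeat until no probability exceeds 1).

```python
import random

def pps_probabilities(sizes, n):
    """Selection probabilities proportional to size, pi_i = n * x_i / sum(x),
    capped once at 1 (simplified)."""
    total = sum(sizes)
    return [min(1.0, n * x / total) for x in sizes]


def poisson_sample(probabilities, seed=None):
    """Poisson sampling: each unit enters the sample independently
    with its own probability."""
    rng = random.Random(seed)
    return [i for i, p in enumerate(probabilities) if rng.random() < p]


sizes = [10, 20, 30, 40] * 25          # hypothetical size measures for 100 units
pi = pps_probabilities(sizes, n=10)    # expected sample size is sum(pi) = 10
print([len(poisson_sample(pi, seed=s)) for s in range(5)])  # realized sizes vary by draw
```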


Systematic sampling theory can be used to create a probability-proportional-to-size sample. This is done by treating each unit of count within the size variable as a single sampling unit. The sample is then identified by selecting counts at even intervals, from a random start, along the cumulated size variable. This method is sometimes called PPS-sequential, or monetary unit sampling in the case of audits or forensic sampling. The PPS approach can improve accuracy for a given sample size by concentrating the sample on large elements that have the greatest impact on population estimates.
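The systematic PPS procedure can be sketched as follows. This is one straightforward implementation under stated assumptions: the size measures are laid end to end on a line of length sum(sizes), the interval is k = total / n, and the start is drawn uniformly in [0, k). The room counts are hypothetical. A unit whose size exceeds k can be hit more than once, which designs usually handle by taking such units with certainty.

```python
import random

def systematic_pps(sizes, n, seed=None):
    """Systematic PPS: select the unit whose cumulative size range contains
    each of the points start, start + k, start + 2k, ..., with k = total / n."""
    rng = random.Random(seed)
    total = sum(sizes)
    k = total / n
    start = rng.random() * k  # random start in [0, k)
    selected = []
    cumulative, i = 0, 0
    for j in range(n):
        point = start + j * k
        # Advance until unit i's range [cumulative, cumulative + size) holds the point;
        # the bound on i guards against floating-point overshoot at the very end.
        while i < len(sizes) - 1 and cumulative + sizes[i] <= point:
            cumulative += sizes[i]
            i += 1
        selected.append(i)
    return selected


# Hypothetical size measures, e.g. number of rooms per hotel
rooms = [12, 80, 35, 150, 8, 60, 25, 230]
print(systematic_pps(rooms, n=4, seed=11))  # indices of the selected hotels
```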

PPS sampling is commonly used for surveys of businesses, where element size varies greatly and auxiliary information is often available—for instance, a survey attempting to measure the number of guest-nights spent in hotels might use each hotel's number of rooms as an auxiliary variable.

In some cases, an older measurement of the variable of interest can be used as an auxiliary variable when attempting to produce more current estimates.

Sometimes it is more cost-effective to select respondents in groups ("clusters"). Sampling is often clustered by geography or by time period. Nearly all samples are in some sense "clustered" in time, although this is rarely taken into account in the analysis.

For instance, if surveying households within a city, we might choose to select city blocks and then interview every household within the selected blocks.
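The city-block example can be sketched as one-stage cluster sampling. Blocks are drawn at random from a block-level list, and households are listed only for the selected blocks. The `list_households` function is a hypothetical stand-in for the field listing, and the estimator shown, (N / n) times the sum of the selected block totals, is the standard expansion estimator when blocks are drawn by simple random sampling.

```python
import random

def cluster_sample(block_ids, n_blocks, list_households, seed=None):
    """One-stage cluster sampling: draw blocks at random from a block-level
    frame, then list and keep every household in the selected blocks only."""
    rng = random.Random(seed)
    chosen = rng.sample(list(block_ids), n_blocks)
    return {block: list_households(block) for block in chosen}


def estimate_total(sample, value, n_blocks_in_frame):
    """Expansion estimator of a population total when blocks are drawn by
    simple random sampling: (N / n) times the sum of the selected block totals."""
    block_totals = [sum(value(hh) for hh in households) for households in sample.values()]
    return n_blocks_in_frame / len(sample) * sum(block_totals)


# Hypothetical city of 200 blocks; block b holds (b % 5) + 1 households
def list_households(block):
    return [(block, j) for j in range(block % 5 + 1)]


sample = cluster_sample(range(200), n_blocks=20, list_households=list_households, seed=7)
print(estimate_total(sample, value=lambda hh: 1, n_blocks_in_frame=200))  # estimated household count
```

Only 20 blocks ever need a household listing; the rest of the city is represented by the block list alone.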


Clustering can reduce travel and administrative costs. In the example above, an interviewer can make a single trip to visit several households in one block, rather than having to drive to a different block for each household. It also means that one does not need a sampling frame listing all elements in the target population.

Instead, clusters can be chosen from a cluster-level frame, with an element-level frame created only for the selected clusters. In the example above, the sample only requires a block-level city map for initial selections, and then a household-level map of the selected blocks, rather than a household-level map of the whole city. Cluster sampling (also known as clustered sampling) generally increases the variability of sample estimates above that of simple random sampling, depending on how the clusters differ from one another compared with the variation within clusters.

For this reason, cluster sampling requires a larger sample than simple random sampling to achieve the same level of accuracy, but the cost savings from clustering might still make it the cheaper option. Cluster sampling is commonly implemented as multistage sampling. This is a complex form of cluster sampling in which two or more levels of units are embedded one in the other. The first stage consists of constructing the clusters that will be used to sample from.

In the second stage, a sample of primary units is randomly selected from each cluster, rather than using all units contained in all selected clusters. In following stages, additional samples of units are selected within each of those selected clusters, and so on. All ultimate units (individuals, for instance) selected at the last step of this procedure are then surveyed. This technique is thus essentially the process of taking random subsamples of preceding random samples. Multistage sampling can substantially reduce sampling costs in situations where a complete population list would otherwise need to be constructed before other sampling methods could be applied.