Data finishing


In the mid-to-late 1990s, many businesses began to explore whether traditional statistics and artificial intelligence analysis techniques could feasibly be applied to large databases in order to reveal properties, trends, and patterns hidden in the data. These explorations eventually produced formal data finishing tools based on statistical analysis techniques.

Data finishing mainly refers to processing raw data so that it becomes systematic and organized enough to meet the needs of statistical analysis, and to presenting the data in charts so that it is simpler and easier to understand and analyze.


(1) Inductive methods: histograms, grouping, stratification, and statistical analysis.

(2) Deductive methods: cause analysis, scatter diagrams, and correlation and regression analysis.

(3) Preventive methods: generally the control chart methods, including the Pn control chart, the P control chart, the C control chart, the U control chart, and the X-Rs control chart.
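As a minimal sketch of the control chart idea, the Python below computes the usual three-sigma limits for a P control chart (fraction defective) and flags samples that fall outside them. The sample counts are hypothetical illustration data, the function names `p_chart_limits` and `out_of_control` are assumptions made for this example, and every sample is assumed to have the same size.

```python
import math

def p_chart_limits(defectives, sample_size):
    """Return (center line, lower limit, upper limit) for a P control chart.

    Assumes every sample has the same size (an assumption of this sketch).
    """
    total_inspected = sample_size * len(defectives)
    p_bar = sum(defectives) / total_inspected          # overall fraction defective
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    lcl = max(0.0, p_bar - 3 * sigma)                  # a fraction cannot be negative
    ucl = min(1.0, p_bar + 3 * sigma)
    return p_bar, lcl, ucl

def out_of_control(defectives, sample_size):
    """Return the indexes of samples whose fraction defective is outside the limits."""
    _, lcl, ucl = p_chart_limits(defectives, sample_size)
    return [i for i, d in enumerate(defectives)
            if not lcl <= d / sample_size <= ucl]

# Hypothetical data: defective units found in ten samples of 100 units each.
defectives = [4, 6, 5, 3, 7, 5, 4, 20, 6, 5]
center, lower, upper = p_chart_limits(defectives, 100)
flagged = out_of_control(defectives, 100)
print(f"center={center:.3f} limits=({lower:.3f}, {upper:.3f}) flagged={flagged}")
```

With these numbers only the eighth sample (index 7) lies above the upper limit, which is the kind of abnormality a control chart is meant to expose.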


1. Design the finishing scheme according to the purpose of the research.

The finishing scheme mainly covers two aspects: first, how the population is to be processed, which chiefly means deciding how to carry out the statistical grouping; second, which indicators are to be used to reflect the characteristics of the population.

2. Review and check the statistical data.

Before the data is organized, it must be reviewed to verify the completeness, accuracy, and timeliness of the original data. Any problems found should be resolved promptly.

3. Group and summarize the data, and calculate the indicators.

Group the raw data according to defined criteria, count the number of units in each group, and calculate indicators such as the mean and the variance.

4. Present the final results in statistical tables or charts.

On the basis of the statistical grouping, calculate the frequency of each group, compile a frequency distribution table, and draw a frequency distribution chart.

5. Accumulate, store, and publish the statistical data.

Dynamic analysis is often performed in statistical research, and it requires statistical data accumulated over a long period.
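Steps 2 to 4 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the plausible value range used for the accuracy check, and the function names `audit`, `summarize`, and `frequency_table` are all hypothetical, and the timeliness check is left out because it depends on dates the example does not model.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical raw records: (unit id, group key, measured value).
raw = [
    ("u1", "A", 10.0),
    ("u2", "A", 12.0),
    ("u3", "B", 20.0),
    ("u4", "B", None),   # incomplete record
    ("u5", "B", 24.0),
    ("u6", "A", -5.0),   # outside the assumed plausible range
]

def audit(records, low=0.0, high=100.0):
    """Step 2: separate valid records from rejected ones.

    A record is rejected when its value is missing (completeness) or
    falls outside the assumed range [low, high] (accuracy).
    """
    valid, rejected = [], []
    for rec in records:
        value = rec[2]
        if value is None or not low <= value <= high:
            rejected.append(rec)
        else:
            valid.append(rec)
    return valid, rejected

def summarize(records):
    """Step 3: group by key, count the units, compute mean and variance."""
    groups = defaultdict(list)
    for _, key, value in records:
        groups[key].append(value)
    return {
        key: {"count": len(vals), "mean": mean(vals), "variance": pvariance(vals)}
        for key, vals in groups.items()
    }

def frequency_table(summary):
    """Step 4: rows of (group, count, relative frequency), sorted by group."""
    total = sum(s["count"] for s in summary.values())
    return [(key, s["count"], s["count"] / total)
            for key, s in sorted(summary.items())]

valid, rejected = audit(raw)
summary = summarize(valid)
table = frequency_table(summary)
for key, count, share in table:
    print(f"{key}\t{count}\t{share:.2f}")
```

The sketch uses the population variance (`pvariance`); if the records are a sample rather than the whole population, `statistics.variance` would be the more appropriate choice.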


The statistical survey phase yields a large amount of data, but this data consists mainly of original material describing individual units of the population. Such material is scattered, fragmentary, and unsystematic. It can only show the specific circumstances of each surveyed unit and reflects only the surface of things; it cannot show the full picture of the subject under study, explain its essential characteristics, or reveal the laws of its development. The survey data must therefore be processed and organized so that it reflects the overall characteristics of the phenomenon.

Data finishing is the process of scientifically processing the large volume of original material collected in a statistical survey, by grouping and summarizing it according to the tasks and requirements of statistical research, so that it becomes orderly and systematic and yields comprehensive data that reflects the overall characteristics of the population. Reprocessing data that has already been organized (including historical data) also counts as data finishing. Typically, a large amount of collected data cannot be used directly, because it still appears as a disordered set of raw values; only after it has been organized can the regularities of the phenomenon be found.

Data finishing is the intermediate link in statistical work. It is carried out on the basis of the statistical survey, so it is both the continuation of the survey and the prerequisite for statistical analysis, linking what comes before with what follows. It therefore holds a very important place in statistical work. The quality of the finishing results, and whether they are scientific and truly reflect objective reality, directly affects the accuracy of the statistical analysis and the quality of the entire statistical effort. If this work is done badly, even rich and complete survey data loses its value, and the purpose and tasks of the statistical work cannot be achieved.

In addition, data finishing is a necessary means of accumulating historical data. Dynamic analysis is often used in statistical research, and it requires historical data accumulated over a long period. Selecting, regrouping, classifying, and summarizing existing data according to the needs of the research all has to be done through data finishing.


1. Data collected on site, whether daily or weekly, should be sorted by the responsible departments, including product management, so that the result is truthful and representative data.

2. When finishing data, the conditions before and after an improvement must be kept consistent; only then are the finishing and comparison of the data meaningful.

3. When an abnormality occurs, the measures taken should be based on the finished data.

4. When using secondary data published by others, note the following:

(1) What was the original purpose of collecting the data, and what is its source?

(2) Are the units used in the original data consistent with those used by the researcher? If not, they should be converted to a uniform unit.

(3) Who originally collected the data, and how reliable is it? If it is confirmed to be reliable, it can be accepted; if it is not, the reasons should be investigated and resolved.

(4) What was the original collection method? Are there any duplicates or omissions?

(5) When data comes from two or more different original sources, it should be compared before use. Where significant differences are found, the cause of the error should be identified and dealt with.
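Points (2) and (5) above lend themselves to a small automated check. The following Python is a sketch only: the conversion table `TO_KG`, the 5% relative tolerance, the sample figures, and the function names are all hypothetical choices made for illustration.

```python
# Hypothetical conversion factors to a common unit (kilograms).
TO_KG = {"kg": 1.0, "g": 0.001, "t": 1000.0}

def to_kg(value, unit):
    """Convert a value to kilograms; unknown units raise an error."""
    if unit not in TO_KG:
        raise ValueError(f"unknown unit: {unit}")
    return value * TO_KG[unit]

def compare_sources(source_a, source_b, tolerance=0.05):
    """Return items whose values differ by more than `tolerance` (relative).

    Each source maps an item name to a (value, unit) pair.
    Only items present in both sources are compared.
    """
    discrepancies = []
    for item in sorted(source_a.keys() & source_b.keys()):
        a = to_kg(*source_a[item])
        b = to_kg(*source_b[item])
        if abs(a - b) > tolerance * max(abs(a), abs(b)):
            discrepancies.append((item, a, b))
    return discrepancies

# Hypothetical figures for the same items reported by two sources.
source_a = {"steel": (2.0, "t"), "copper": (500.0, "kg")}
source_b = {"steel": (2000.0, "kg"), "copper": (650000.0, "g")}
issues = compare_sources(source_a, source_b)
for item, a, b in issues:
    print(f"{item}: {a:.1f} kg vs {b:.1f} kg")
```

Here the steel figures agree once both are expressed in kilograms, while the copper figures differ by far more than the tolerance and would need to be investigated before use.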

Data finishing technology

From a business point of view, discovering previously unknown statistical patterns or trends gives companies very valuable insight. Data finishing technology also offers a degree of predictive power about future developments. It can be divided into three categories: clustering, classification, and prediction.

Clustering gathers objects into groups when no ordering or grouping is known in advance. One example is the analysis of a group of business customers whose characteristics are unknown: feeding the relevant information into the analysis makes it possible to define the characteristics of the customers.
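As an illustration, the sketch below groups customers by a single figure using k-means, one common clustering algorithm chosen here for simplicity; the text itself does not name a specific algorithm. The spending figures, the starting centers, and the function name `k_means_1d` are hypothetical.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster numbers around the given starting centers (plain k-means)."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to the nearest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical annual spending of customers with no known segments.
spending = [120, 150, 130, 900, 950, 1000]
centers, clusters = k_means_1d(spending, centers=[0, 500])
print(centers, clusters)
```

The two groups that emerge (low spenders and high spenders) were not defined beforehand, which is what distinguishes clustering from classification.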

Classification assigns a given object to a predetermined collection. The collections are usually formed by the clustering technique described above. An example is dividing customers into specific sales groups according to their income level.
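The income example can be sketched as a simple bracket lookup. The thresholds, group names, customer data, and the function name `sales_group` are hypothetical values chosen for illustration.

```python
from bisect import bisect_right

# Hypothetical income thresholds and the sales group for each bracket.
THRESHOLDS = [30000, 80000]
GROUPS = ["budget", "standard", "premium"]

def sales_group(income):
    """Assign a customer to a sales group by income bracket.

    Incomes below 30000 are "budget", from 30000 up to (but not
    including) 80000 are "standard", and 80000 or more are "premium".
    """
    return GROUPS[bisect_right(THRESHOLDS, income)]

customers = {"Ann": 25000, "Bo": 30000, "Cy": 120000}
assigned = {name: sales_group(income) for name, income in customers.items()}
print(assigned)
```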

Prediction takes values that are known for certain objects in one collection and applies them to another, similar collection to determine the expected value or outcome. For example, if a group of people wearing helmets and shoulder pads is a football team, we would conclude that another group of people wearing helmets and shoulder pads is also a football team.
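The football example can be mimicked with a nearest-match rule: a new group receives the label of the known group whose features it shares most. This is a sketch of one possible approach, not a method stated in the text; the Jaccard similarity measure, the feature sets, and the function names are assumptions made for illustration.

```python
def jaccard(a, b):
    """Similarity of two feature sets: shared features / all features."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_label(known, features):
    """Return the label of the known group most similar to `features`."""
    best = max(known, key=lambda item: jaccard(item[0], features))
    return best[1]

# Hypothetical known groups: (observed features, label).
known = [
    ({"helmet", "shoulder pads"}, "football team"),
    ({"racket", "white shorts"}, "tennis club"),
]
prediction = predict_label(known, {"helmet", "shoulder pads", "cleats"})
print(prediction)
```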
