Quality analysis for data in route optimization
In the first quarter of 2009, Itella Oyj, the company in charge of postal services within Finland, initiated a project to optimize its delivery routes in both Early Morning Delivery and Daily Mail Delivery. The main goal of this project was to make the delivery processes more efficient in response to changing trends in demand for conventional methods of information and mail communication. Adapting to these trends meant not only maintaining financial growth but also conforming to higher standards for achieving a greener environment. Previously, route measurement had been done by regional planners using conventional means: physically travelling to the various regions and measuring and calculating distances according to set parameters. The project intended to streamline this process by adopting new systems and integrating them with existing ones, which we shall look at later.
1.2 Purpose of the thesis
However, as with many new company projects in their initial stages, challenges are inevitable and all sorts of problems are likely to be encountered. One of the major issues faced in the project is ensuring that the data used in route optimization is of high quality. Data quality problems have created obstacles that nearly derailed the optimization process. These problems have delayed schedules and thus increased costs for the company. There have also been high unbudgeted costs due to the correction of faults.
The purpose of this thesis is to examine and analyze the current quality of the data used in route optimization, and possibly to formulate quality standards from the analysis. The research questions are therefore broadly divided into four, as listed below:
- What is data quality?
- What is the importance of certain quality levels for the company?
- What are the methods used to describe and analyze data quality and the data quality process?
- What is the current level of quality, and have the resources invested been worthwhile?
The above questions give a guideline of the issues we shall look more deeply into in this paper.
2 Literature Review
This chapter covers the definition of quality, and more specifically data quality, its importance to a company's processes, the benefits of good quality, and the issues that require paramount concern in maintaining good quality. The line between data quality and process quality is, as we shall see, thin, and these two elements are therefore referred to many times in this document.
2.1.2 Defining Quality
To understand data quality, quality itself has to be defined. Quality has, for the past few decades, been considered the cornerstone of excellence and competitive edge for the majority of companies that have gained a stronghold in their areas of operation. Just like beauty, quality is in the eye of the beholder, and in a business environment the beholder is always the client or end-user. In other words, quality is whatever the customer says it is. Many scholars have come up with different definitions of quality, and sometimes one has to narrow the definition down to the nature, degree and rationale being considered.
According to Ivancevich et al. (2003), quality is the function of policy, information, engineering and design, materials, equipment, people, and field support. Quality means getting it right the first time, rather than merely laying down an acceptable level of quality (Crosby, 1995). Quality is the degree to which something conforms to requirements. These requirements first need to be defined in terms of parameters or characteristics, which vary between processes. For example, for a mechanical or electronic product these parameters would be performance, reliability, safety and appearance. Quality is being creative, innovative, fluid and forthright (Drucker, 1985, Innovation and Entrepreneurship, Harper & Row).
2.1.3 What then is data quality?
Data quality (DQ) is a multidimensional process entity. It is multidimensional because it is complex and difficult to define in a single way. Leo Pipino and colleagues, in an article on data quality assessment, defined data quality as having more than 10 dimensions. However, other experts have analyzed these dimensions and narrowed them down to the following: accuracy, consistency, completeness, timeliness and auditability. Andrew Greenyer, vice president, international marketing for Pitney's group, also notes that in addition to these aspects, an organization should make sure that everyone has a common understanding of what the data represents.
(Andrew Greenyer on November 26, 2007) http://www.customerthink.com/article/importance_quality_control_how_good_data
Below is an excerpt from Execution MIH that aims at defining these 5 dimensions.
- Accuracy of data is the degree to which data correctly reflects the real-world object or event being described. For example, the address of a customer in a customer database is accurate if it is the customer's real address.
- Completeness of data is the extent to which the expected attributes of data are provided. For example, customer data is considered complete if all customer addresses, contact details and other information are available.
- Consistency of data means that data across the enterprise should be in sync. For example, data is inconsistent if a customer changes their address but remains linked to both the old and the new address.
- Timeliness of data is reflected in how deadlines and schedules are met within a process. It also covers the availability of data when it is needed.
- Auditability of data is its ability to be examined and analyzed to determine its level of accuracy and any discrepancies or inconsistencies.
With this in mind we can therefore conclusively state that data quality is a continuously adaptive state of being in a “no-error” zone as a result of continuous engagement in functions that aim at achieving efficiency and accuracy in results as well as processes.
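Two of these dimensions lend themselves to simple measurement. The following is a minimal sketch, not the method used in this research: the record layout, the field names in REQUIRED_FIELDS and the sample records are all hypothetical, chosen only to mirror the customer-address examples above.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical required attributes for a customer record.
REQUIRED_FIELDS = ("name", "address", "phone")


def completeness(records: List[Record]) -> float:
    """Share of required attribute values that are actually filled in."""
    total = len(records) * len(REQUIRED_FIELDS)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for record in records
        for field in REQUIRED_FIELDS
        if record.get(field)  # None and "" both count as missing
    )
    return filled / total


def inconsistent_customers(records: List[Record]) -> List[str]:
    """Names of customers that are linked to more than one address."""
    addresses: Dict[str, set] = {}
    for record in records:
        name, address = record.get("name"), record.get("address")
        if name and address:
            addresses.setdefault(name, set()).add(address)
    return sorted(name for name, found in addresses.items() if len(found) > 1)


# Hypothetical sample data, for illustration only.
sample = [
    {"name": "A. Virtanen", "address": "Katu 1", "phone": "0401"},
    {"name": "A. Virtanen", "address": "Tie 5", "phone": None},
    {"name": "B. Korhonen", "address": "Polku 3", "phone": "0402"},
]
print(completeness(sample))
print(inconsistent_customers(sample))
```

Accuracy and auditability are harder to automate in this way, since they require comparison against the real world or against a documented history of the data.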
2.2 Types of quality
Jean, from the International Food Safety and Quality Network, defines quality in two ways: it can be either subjective or objective. He states that objective quality is the degree to which a process or the outcome of a process stays within a predetermined set of criteria, which are presumed essential to the ultimate value it provides. Subjective quality, on the other hand, is the level of perceived value reported by the end-user who benefits from a process or its outcome, for example the pain relief provided by a medication. In both cases he links quality to the end-product of a process.
It is difficult to separate these two, especially when thinking about route optimization, since both go hand in hand. In an article on data quality assessment, Leo L. Pipino and colleagues discuss three important steps a company should take to improve process and data quality as a whole. These are:
- Performing subjective and objective data quality assessments;
- Comparing the results of the assessments, identifying discrepancies, and determining root causes of discrepancies; and
- Determining and taking necessary actions for improvement
The FIGURE below gives a clearer picture of the issue discussed above
2.3 Quality Management
When a company incorporates quality in its processes, its overall objective is to satisfy the parties involved at low cost while maintaining process efficiency. Quality is an ever-evolving perception determined by the value provided by the end result of a process. In other words, quality is an adaptive process in its own right, receptive to changes within a process as it matures and as other alternatives emerge as a basis for comparison. Ultimately, how well a company's process incorporates quality is assessed by the end result in terms of cost savings, resources used and increased value to the company from the process.
Quality in a process is not what the investor puts in but what the end-user gets out and what the customer is willing to pay for in the end result. Quality means best for the following conditions: (a) the actual use and (b) the selling price (Feigenbaum, 1983). Therefore, if the company's processes ignore quality, the eventual result is low customer satisfaction, which leads customers to reduce their investments in or spending with the company. Consequently, this leads to reduced income and, as a result, a diminished markup. Since quality within a company's processes affects its financial value through both costs and income, it is the backbone of being a niche company in its area of operation.
Quality Management means that the organization's culture is defined by and supports the constant attainment of customer satisfaction through an integrated system of tools, techniques, and training (Sashkin & Kiser, 1993). This definition further emphasizes the need for the organization's culture to fully support quality at all times by making it an integral part of the company, the central element that holds the company's activities together. Most importantly, and especially with reference to this research, this entails continuous improvements to the processes, functions and systems.
At a bare minimum no shoddy work should be part of the company, from the top management to the bottom. It should be engraved in the company's culture and code of conduct that quality is part and parcel of the company's operations.
It is almost impossible to separate the processes and functions from human factors. Management as well as employees derive satisfaction from good results. When quality has been properly integrated into a company's culture, the results generate emotions and feelings within the parties who have been involved in the process. A result that brings smiles to management, employees and, most importantly, the client defines having achieved good quality. You'll know it, they'll know it, and the company will prosper from it. This is testament to the fact that employees exude a lot of satisfaction when they discover that not only management but also the customer is proud of their work.
2.4 Benefits of good quality
Since we now have a basic understanding of what quality is, why is it so important for a company to maintain high quality levels for both data and processes? Certain benefits associated with good data quality accrue to the company as well as to the end user, who happens to be the consumer.
Good quality is the result of reducing process and data defects. Total Quality Management achieves this by promoting quality awareness and the participation of all members of the organization. It means quality at the source, which translates to reduced wastage in the company's processes and thus to cost savings.
Good quality data leads to ease of problem solving. Through processes such as failure analysis and measurement standards developed during quality analysis procedures, defects and failures (even potential failures) can be identified with ease, which means that a problem is solved quickly translating to saved man hours. These man hours can then be released to venture into other tasks. For example if a problem is encountered within a company's process, it would be easily solved due to the parameters in place that would help identify the cause of failure and have it addressed.
Good quality also makes it easy to give direction for the continuous improvement of processes. It also aids in improving systems and increasing employee efficiency. This is achieved by ensuring that employees are continuously trained on the importance of embracing quality in their work and of always offering quality services to the customer (end-user). As for the systems, because they are subjected to change and must conform to potentially demanding processes, it becomes easier to identify key areas needing adjustment or improvement.
Good data quality leads to quality results, which in turn translate to customer satisfaction. Customer satisfaction is a key foundation block in not only maintaining profitability but also increasing market share. In addition, a company with satisfied clients is always at an advantage of maintaining its competitive edge in its area of operation.
Finally, by reducing data defects and improving system and personnel efficiency, good quality leads to cost savings and improved profitability, which is the bottom line for every company. With a reduced cost of running processes, it is anticipated that the revenue of the company will be bolstered. Consequently, the company will be able to invest much of its profit in increasing market share by conducting research and development into better ways of improving process and data quality.
2.5 Analyzing quality
Thus far we have looked at what quality is and why it is important to a company. The next important question is how to determine the level of quality of a subject or object in a company. We saw earlier that quality can be analyzed through subjective or objective assessments. According to Neville Turbit, quality within company projects can be analyzed from either a business perspective or a technical perspective, depending on the type of project at hand. Some scholars also discuss two additional ways to analyze quality in a project, the choice of which depends largely on the analyst and how much attention he or she wishes to give to each: one may analyze the quality of the end result or the quality of the project process. In route optimization, the two perspectives go hand in hand and, as we shall see later, technical factors have a significant effect on Itella's business. In addition, process quality should not be separated from end-result quality, since the deliverables are proportionally linked to each other.
Turbit goes on to list some questions that may arise as we seek to analyze quality within a project. These include:
- Was the project completed on time?
- Was the project completed within budget?
- Did the system meet my needs when it was delivered?
- Does the system comply with corporate standards for such things as user interface, documentation, naming standards etc.?
- Is the technology and system stable and maintainable?
- Is the system well engineered so that it is robust and maintainable?
An analysis of data quality in route optimization had not been done before this research. It therefore called for careful thought about the methods I was going to use to analyze the various data in order to give viable results. The nature and format of the data to be analyzed was more or less standard: there was not much variation in data formats and fields, even though multiple information systems were in use. Forming associations and picking out discrepancies within the data was done through a process called data mining.
3 Research Strategy
3.1 Data mining
Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets, which may be in quantitative, textual or multimedia form (Jeffrey W. Seifert, Data Mining: An Overview). Jiawei Han and Micheline Kamber give a more accessible definition: simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data (Data Mining: Concepts and Techniques, p. 5).
Data miners have over the years used a wide array of parameters to study data. These include:
- Association: patterns where one field in data is connected to another field
- Sequence or path analysis: patterns where one event leads to another event
- Classification: identification of new patterns, such as relationships between different fields in the same data
- Clustering: finding and visually documenting groups of previously unknown facts, such as geographic location and brand preferences
- Forecasting: discovering patterns from which one can make reasonable predictions regarding future activities
(Ref: Seifert, Data Mining: An Overview)
However, in addition to getting results from data mining, it was vital to have an analysis tool that would give a clear picture of the quality standard of the analyzed data. To ensure that the analysis of data quality in this research is not only effective but also efficient, data mining has to go hand in hand with a tool that sets quality standards. Over the years many companies have used total quality management tools, which have been further developed into a more rigorous analysis tool known as six sigma. Companies such as Motorola and General Motors have proven track records of six sigma's success, having saved billions of dollars over a few years. Itella is a company striving to achieve efficiency and reduce process wastage while maintaining profitability in a market facing aggressive competition from technological advancements, especially in the telecommunications industry. With this in mind, I deemed six sigma an appropriate quality tool for this project. So what then is six sigma?
3.2 Six sigma
A McGraw-Hill book on the subject defines six sigma as a highly technical method, used by engineers and statisticians, to fine-tune products and processes with the aim of positioning a company for greater customer satisfaction, profitability, and competitiveness. From previous training on six sigma methods, I would say that six sigma is not a single entity but rather a collection of various process and quality analysis tools guided by the six sigma methods. Quality analysis tools include flowcharts, check sheets, Pareto diagrams, cause and effect diagrams, histograms, scatter diagrams, and control charts.
Thomas Pyzdek, the author of The Six Sigma Handbook, states that for six sigma to make sense, the term quality has to be viewed from two perspectives: potential quality and actual quality. Potential quality is the maximum result achievable from a process, while actual quality is the current result achieved from the same process. The gap between the two is what we term bad quality, failures or defects. Essentially, the main goal of six sigma is to reduce variation within processes as much as possible.
The table below shows the perceived levels of sigma in relation to the number of faults in those levels in a sample of one million instances or opportunities.
The fewer the faults within a process, the more efficient the process is. Many companies have been misled to believe that since the variation between the 4th and 6th sigma levels seems very small (within 1%), it is acceptable for their processes to be at 4th sigma. However, as we shall see later in this paper, the variation in costs between processes at different sigma levels is quite significant. The above table omits the intermediate sigma levels; sigma is calculated to the hundredth at each level, which makes it more accurate when running larger analyses.
Expenses incurred in correcting and eliminating errors in order to achieve a certain sigma level are known as costs of poor or bad quality. Six sigma should be well understood and adopted more or less as a company lifestyle for it to achieve its purpose. With reference to previous research, Pyzdek goes on to describe how various levels of sigma affect a company. Companies without six sigma processes incur very high costs due to bad quality. Companies operating at three or four sigma spend between 25 and 40 percent of their revenues fixing problems, while those operating at six sigma spend less than 5 percent of their revenues fixing problems within processes. The cost of poor quality as compared to six sigma is illustrated in FIGURE 4 below.
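The relationship between a sigma level and the number of faults per million opportunities can be sketched as follows. This is an illustration rather than the calculation used in this research, and it assumes the conventional 1.5 sigma long-term shift found in most published six sigma conversion tables; the function names are hypothetical.

```python
import math

# Assumption: the conventional 1.5-sigma long-term shift used in most
# published six sigma conversion tables.
SIGMA_SHIFT = 1.5


def dpmo(sigma_level: float, shift: float = SIGMA_SHIFT) -> float:
    """Defects per million opportunities for a given sigma level."""
    z = sigma_level - shift
    # Upper tail of the standard normal distribution: P(Z > z).
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return 1_000_000 * tail


def process_yield(sigma_level: float) -> float:
    """Percentage of opportunities that are free of defects."""
    return 100 - dpmo(sigma_level) / 10_000


for level in range(1, 7):
    print(f"{level} sigma: {dpmo(level):>10.1f} faults per million, "
          f"yield {process_yield(level):.5f} %")
```

Under this assumption the yields at 4th and 6th sigma differ by less than one percentage point, while the number of faults per million differs by a factor of more than a thousand, which is why the apparently small gap between the two levels matters.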
4 Understanding Route Optimization
4.1 Early Morning Delivery (EMD) Systems
In EMD, there are different information systems that carry out various functions relating to processes within the department. The main ones are Jakti and Lehtinet. Earlier, when routes were measured manually, the two above were the main and only information systems in use. Additional systems were taken into use with the inception of the Route optimization project. These are Webmap and Routesmart.
We shall take a brief look at these systems and their functions as we try to get an understanding of the basics of this research.
JakTi is an SQL database that holds workspaces with address and route information. There are 2 versions of JakTi, namely “A” and “B”. JakTi B is the download manager that transfers workspaces from the main database to a workstation, which is a desktop computer or laptop. JakTi A is the editor tool for a workspace already downloaded onto the workstation. In JakTi A one can create new routes, delete existing ones, add and delete addresses or move them between routes, and add additional data to the routes.
The additional data added in JakTi A are mostly route parameters used in calculating route delivery times. These include apartment buildings' floor and elevator information, exceptional yard distances and delivery mode. Exceptional yard distances are distances to delivery points located within private yards, measured from the point where a deliverer parks the car to the point of actual delivery. There are 3 main delivery modes used by Itella in EMD: delivery by company car (right-hand-drive cars), delivery by private car (left-hand-drive cars) and delivery by bike.
Lehtinet is an information system that imports address and route information from JakTi and matches it to newspaper subscriptions from newspaper publishers. However, the matching is not always 100% accurate and some errors occur. The 3 common errors in Lehtinet are described below.
Route number errors
These are mostly in areas with new addresses or in areas which have had no EMD before. Since matching data is imported from Jakti, if there is some missing data in Jakti, then there will not be a match in Lehtinet. However if the publishers have correct route information, the subscription will be allocated to the correct route.
Address errors
Newspaper publishers may have different address databases from Itella's, which results in conflicts when matching the addresses. These errors are mostly misplaced characters within the address or misspelt addresses. Just as in the above case, subscriptions will be allocated to the correct routes, but the matching information will be wrong.
Wikipedia defines Webmap as a standard protocol for serving georeferenced map images over the Internet that are generated by a map server using data from a GIS database. It goes on to define a GIS as a system that captures, stores, analyzes, manages and presents data with reference to geographic location data.
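Address errors of this kind, where a few characters are misplaced or misspelt, can often be detected by approximate string matching. The following is a minimal sketch using Python's standard difflib module. It is not how Lehtinet performs its matching; the function names, the sample addresses and the 0.85 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def normalize(address: str) -> str:
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(address.lower().split())


def best_match(
    publisher_address: str,
    itella_addresses: Iterable[str],
    threshold: float = 0.85,  # hypothetical cut-off, to be tuned on real data
) -> Tuple[Optional[str], float]:
    """Return the most similar known address and its similarity ratio.

    The address is None when no candidate reaches the threshold.
    """
    target = normalize(publisher_address)
    best, best_ratio = None, 0.0
    for candidate in itella_addresses:
        ratio = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if ratio > best_ratio:
            best, best_ratio = candidate, ratio
    if best_ratio < threshold:
        return None, best_ratio
    return best, best_ratio


# Hypothetical addresses: the publisher's copy has two letters transposed.
known = ["Mannerheimintie 12 A", "Kauppakatu 3"]
print(best_match("Mannerhiemintie 12 A", known))
```

A low threshold risks matching a subscription to the wrong address, while a high one leaves more records for manual correction, so the cut-off would need to be validated against known good matches.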
Simply put, Webmap is a tool used to edit visualized address data by a process called geocoding.
With reference to our case, Webmap imports address data from Jakti and presents it as visual data on a map interface as shown by the blue dots in FIGURE 5.
A user places the visualized addresses, guided by features on the map interface. This process is known as geocoding. Webmap uses already existing workspaces from the JakTi database. Once a point on Webmap has been geocoded, it receives co-ordinates under the KKJ co-ordinate system, which is used in Finland. After all the addresses have been geocoded, the workspace is returned to the main database, where this co-ordinate information is then stored in JakTi.
RouteSmart is a tool that puts together data from JakTi, Webmap and Lehtinet, and uses variable parameters and functions to calculate routes as defined by a user. There is one more set of crucial data needed to calculate routes that has not been mentioned under the information systems listed above: the distances between delivery points, which are represented by a detailed set of street networks. RouteSmart is also the tool used to visualize and edit street segments and networks.
Webmap is just a map interface with non-editable layers that have outlines of street networks which guide a user in geocoding. However, these are just the main streets as would be seen on any map interface. Along with these streets, Webmap also shows vector lines that are supposed to give a more accurate geographical position of the streets. Due to the parameters agreed upon within Itella and the variation in modes of delivery, the street networks actually used are much more detailed and classified. They are categorized as listed below:
- CAR: visualized as red lines and these are streets usable by cars
- WALK-CAR (WC): visualized as blue lines and are connected to car street segments. These are not usable by cars but on foot from the point to where the.
- WALK-WALK (WW): visualized as green lines
- STAIRS: visualized as orange lines
4.1.5 Integration of EMD systems
The relationship between JakTi, Lehtinet, Webmap and RouteSmart is illustrated in the picture below. Information that is relayed between these systems is labelled with letters, which are listed below.
These systems discussed above hold what would be the foundation data for eventual route planning. As a summary, the data mentioned above is listed:
- Address and route information
- Apartment building information
- Elevator information
- Yard exception distances
- Newspaper subscriptions
- Co-ordinate information
- Street networks
All the above data except for street networks are directly linked to delivery points.
An address is attached to what we call a delivery point, which is a term also used to refer to visualized addresses on either Webmap or RouteSmart. Addresses in the same delivery location for example an apartment building with several apartments, are put under the same group.
In reference to delivery points, I shall use two terms to differentiate between grouped addresses and single addresses, service location and service point. A service location is the individual delivery point in a group while a service point is the whole group.
However, data in these systems is not compatible and for RouteSmart to be able to handle it, it needs to be put together in a uniform format understandable by the system. This is done by means of an ETL download.
4.2 ETL download
ETL stands for extract, transform, and load. By running an ETL download, relevant data that we have looked at earlier is extracted from JakTi, lehtinet and webmap, transformed into a format recognizable by Routesmart then downloaded onto Routesmart for manipulation. Information on an ETL file relates mainly to delivery points. There are several column fields on an ETL download which represent various parameters drawn from relevant systems.
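The extract, transform and load steps can be sketched as follows. This is a simplified illustration, not the actual ETL implementation: the three source dictionaries, their field names and the sample values are hypothetical stand-ins for data held in JakTi, Webmap and Lehtinet.

```python
from typing import Dict, List

# Hypothetical extracts, each keyed by a delivery point identifier.
jakti_rows = {
    101: {"address": "Katu 1 A 1", "route": "12"},
    102: {"address": "Katu 1 A 2", "route": "12"},
}
webmap_rows = {101: {"x": 3435000.0, "y": 6900000.0}}
lehtinet_rows = {101: {"subscriptions": 2}}


def run_etl(jakti: dict, webmap: dict, lehtinet: dict) -> List[Dict[str, object]]:
    """Combine the three sources into flat rows with one uniform layout."""
    rows = []
    for point_id in sorted(jakti):            # extract: JakTi drives the row set
        source = jakti[point_id]
        coords = webmap.get(point_id, {})
        subs = lehtinet.get(point_id, {})
        rows.append({                          # transform: one flat, uniform row
            "jakti_id": point_id,
            "address": source["address"],
            "route_number": source["route"],
            "x": coords.get("x"),              # None when not yet geocoded
            "y": coords.get("y"),
            "subscriptions": subs.get("subscriptions", 0),
        })
    return rows                                # load: hand the rows to the target tool


for row in run_etl(jakti_rows, webmap_rows, lehtinet_rows):
    print(row)
```

Keeping missing co-ordinates as explicit empty values, rather than dropping the row, makes gaps in the source data visible in the downloaded file, which is useful for the kind of quality analysis described in this research.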
4.2.1 ETL Download Fields
Part of the analysis done in this research was derived from analysis of the ETL file. It is therefore important that the major fields or columns that have relevance to the analysis be explained in a little more detail.
Jakti ID
This column, column A below, has a number that identifies a delivery point in JakTi. This number should be unique to every delivery point. It is automatically generated and follows the common “upper-bit”/“lower-bit” binary numbers.
In the early days when EMD started using JakTi, the system was already being used by daily mail, which uses the same database of addresses. To avoid large amounts of work, many of the addresses were copied from daily mail to EMD workspaces. This meant that many of the delivery points shared the same attributes. It is therefore possible that one address, although in different workspaces, could be sharing the same Jakti ID.
Jakti Object Id and Jakti element ID
A Jakti object is a subgroup in JakTi, while an element is a larger group that holds the smaller subgroups. The IDs in this case refer to the unique numbers each object and element has. A subgroup would be, for example, an apartment building with many delivery points. These delivery points are grouped together under a Jakti object.
Jakti internal sequence
The simplest way to define the Jakti internal sequence is to give an example. Every delivery object begins with Jakti internal sequence number 1. The sequence runs sequentially until the next object. In other words, this is a numbering system for delivery points in the same group.
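A check for identifiers shared between workspaces could look like the sketch below. It assumes each row can be reduced to a hypothetical (workspace, Jakti ID) pair; the workspace names and ID values are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def shared_ids(rows: Iterable[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Map each ID used in more than one workspace to those workspaces.

    Each row is a (workspace, jakti_id) pair.
    """
    seen: Dict[int, Set[str]] = defaultdict(set)
    for workspace, jakti_id in rows:
        seen[jakti_id].add(workspace)
    return {i: sorted(ws) for i, ws in seen.items() if len(ws) > 1}


# Hypothetical rows: ID 1 was copied from a daily mail workspace to EMD.
rows = [("EMD", 1), ("DAILY", 1), ("EMD", 2)]
print(shared_ids(rows))
```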
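The numbering rule described here is easy to verify mechanically. The sketch below assumes rows reduced to hypothetical (object ID, internal sequence) pairs in file order, and reports every object whose sequence does not run 1, 2, 3 and so on.

```python
from typing import Dict, Iterable, List, Tuple


def broken_sequences(rows: Iterable[Tuple[int, int]]) -> List[int]:
    """Return object IDs whose internal sequence is not 1, 2, 3, ... in order.

    Each row is an (object_id, internal_sequence) pair, in file order.
    """
    sequences: Dict[int, List[int]] = {}
    for object_id, sequence in rows:
        sequences.setdefault(object_id, []).append(sequence)
    return [
        object_id
        for object_id, found in sequences.items()
        if found != list(range(1, len(found) + 1))
    ]


# Hypothetical rows: object 8 skips sequence number 2.
print(broken_sequences([(7, 1), (7, 2), (8, 1), (8, 3)]))
```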
Extra Distance
Certain parameters are used within Itella to calculate various elements in route planning. Extra distance is one of the results of those parameters. Extra distance is any additional distance within an apartment building that is covered by walking during delivery.
Route number and route Sequence
Route number is basically a number that is unique to planned areas of delivery. Route sequence in an ETL file is a sequential numbering of the delivery points according to their delivery order.
X and Y co-ordinates
X and Y co-ordinate information comes from Webmap. Once a delivery point has been created in JakTi and transferred to Webmap, the point is geocoded, after which it acquires the co-ordinates of the location on which it has been placed in Webmap's map interface. This information is stored in JakTi's main database and, after an ETL download, is shown in the columns below.
The co-ordinate system used here is KKJ, which has been used across Finland since the late 1960s. KKJ is a 2-dimensional co-ordinate system, so the height of different areas is not depicted in an ETL file.
Floor and elevator
The information found in these two columns relates to apartment buildings, which normally have more than one storey. In the floors column, the number of floors in a multi-storey apartment building is given numerically. Only the first delivery point in a group of addresses in an apartment building is marked with the number of floors in that building. The elevator column contains only “1”s and “0”s: 1 represents a building that has an elevator, while 0 represents a building without one.
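The rules for these two columns can be expressed as a simple validation. The sketch below is illustrative only: the EtlRow layout is hypothetical, and it assumes that the floors value is left empty on every delivery point other than the first in a group, which the ETL file may represent differently.

```python
from typing import List, NamedTuple, Optional


class EtlRow(NamedTuple):
    object_id: int
    internal_sequence: int
    floors: Optional[int]  # assumed empty (None) on all but the first point
    elevator: int


def floor_elevator_issues(rows: List[EtlRow]) -> List[str]:
    """Describe every row that breaks the floor and elevator rules."""
    issues = []
    for row in rows:
        label = f"object {row.object_id}, sequence {row.internal_sequence}"
        if row.elevator not in (0, 1):
            issues.append(f"{label}: elevator must be 0 or 1")
        if row.internal_sequence > 1 and row.floors is not None:
            issues.append(f"{label}: floors set on a non-first delivery point")
        if row.floors is not None and row.floors < 1:
            issues.append(f"{label}: floor count must be positive")
    return issues


# Hypothetical rows: object 2 has an invalid elevator flag and a repeated floor count.
rows = [
    EtlRow(1, 1, 5, 1),
    EtlRow(1, 2, None, 1),
    EtlRow(2, 1, 3, 2),
    EtlRow(2, 2, 3, 0),
]
for issue in floor_elevator_issues(rows):
    print(issue)
```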
5 Analysis Elements
At the beginning of the research, it was agreed that I would focus on 4 main areas that had already undergone at least one optimization phase. These areas or regions were Jyväskylä, Vaasa, Nurmijärvi and Salo. The fact that these regions also had a diverse scope of delivery types and data supported the decision to focus on them. Having already undergone one optimization process meant it would be easier to compare the data initially used with the current data. Between these phases of optimization there had been large resource investments to correct quality issues as well as to prepare data for the next phase of optimization. Therefore, through the analysis we would be able to state conclusively whether the resources used had returned value through improved data quality.
The amount of data to be handled in these four regions was quite large. I therefore decided to use a standard sample from all regions that would reflect a near-realistic picture of the situation, especially for analysis procedures done using RouteSmart. I used urban boundaries that were already prebuilt within the system. These are the same urban boundaries used by geographers and statisticians in Finland. Statistics Finland defines an urban settlement as an area with at least 200 inhabitants in which residential buildings are at most 200 meters apart. Anything falling outside this is considered a rural area. In selecting data in urban areas, I extended the boundary by 10 meters to accommodate service locations whose buildings might be physically within the boundary but whose delivery point is a few meters outside it.
All in all, there were three data and process analysis elements that this research dwelt on. In addition, a financial analysis was done from the results derived, which was more or less the icing on the cake.
- Delivery point / service location data in relation to street networks
  - Proximity of service locations to streets in relation to prescribed guidelines
  - Service locations connected to the wrong streets
- Additional information
- Geocoded / Webmap data vs. Jakti data
5.1 Service locations vs. street networks
Guidelines have been given for handling optimization data. These were a good starting point for analyzing quality, since it was possible to document how the data varied from the prescribed guidelines. The route optimization guidelines state that in urban areas, all service locations must be within 10 meters of a connectable street network for optimum results during route calculation. As we saw earlier, there are four categories of street networks. However, service locations can only be connected to two of these: CAR and WALK-CAR.
For all four regions, the proximity of service points (delivery groups) to the streets was calculated at intervals of 5 meters, ranging from 10 meters to 60 meters. This analysis was done in RouteSmart using selection tools. Since all the regions had undergone at least one optimization, the analysis was done for both the data used in the first optimization and the current data. If street networks are properly digitized, the ideal result should show most service points falling within the required proximity.
Below is a sample of the table used to collect the results:
The idea was to obtain, for each proximity range, the number of service points falling within it as a percentage of the total number of service points in that region's urban area.
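As an illustration only, the percentages in such a table could be computed along the following lines in Python. The sketch assumes that the distance from each service point to its nearest connectable street is already available as a list of numbers (in the actual analysis the selection was done with RouteSmart tools, not with code), and that the percentages are cumulative, that is, each threshold counts all points within that distance. The function name `proximity_table` and the sample distances are hypothetical.

```python
from bisect import bisect_right

# Thresholds described in the text: 10 m to 60 m in steps of 5 m.
THRESHOLDS_M = list(range(10, 61, 5))


def proximity_table(distances_m):
    """Return {threshold in meters: % of service points within that distance}.

    Assumption: percentages are cumulative (all points at or below the limit).
    """
    ordered = sorted(distances_m)
    total = len(ordered)
    if total == 0:
        raise ValueError("no service points supplied")
    table = {}
    for limit in THRESHOLDS_M:
        # bisect_right gives the count of distances <= limit in a sorted list
        within = bisect_right(ordered, limit)
        table[limit] = 100.0 * within / total
    return table


# Hypothetical distances (meters) for eight service points.
sample = [3, 8, 9.5, 10, 12, 14, 25, 61]
for limit, share in proximity_table(sample).items():
    print(f"<= {limit} m: {share:.1f} %")
```

Running the same function on the old and the current data of a region would give the two series that are compared in the analysis.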
The results were then translated onto a graph which also showed the different sigma levels.
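The thesis does not state how the sigma levels on the graph were derived. One common convention, used here purely as an assumption, converts the share of defect-free items into a sigma level through the standard normal distribution with a 1.5 sigma long-term shift. Under this convention, a service point farther than 10 meters from a connectable street would count as one defect. The function name `sigma_level` is hypothetical.

```python
from statistics import NormalDist

# Assumption: the conventional 1.5 sigma long-term shift used in many
# six sigma conversion tables. The source does not confirm this choice.
SHIFT = 1.5


def sigma_level(defects, opportunities):
    """Convert a defect count into an approximate sigma level."""
    if opportunities <= 0 or not 0 <= defects <= opportunities:
        raise ValueError("need 0 <= defects <= opportunities and opportunities > 0")
    if defects == 0:
        return float("inf")   # no defects observed
    if defects == opportunities:
        return float("-inf")  # every item defective
    process_yield = 1.0 - defects / opportunities
    return NormalDist().inv_cdf(process_yield) + SHIFT


# About 66,807 defects per million corresponds to roughly the 3rd sigma level.
print(round(sigma_level(66_807, 1_000_000), 2))
```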
5.1.1 Street Networks analysis results
All regions generally fell at the 3rd sigma level when considering the required proximity of 10 meters. There were, however, significant improvements in Vaasa, where more service points in the new data fell within the requirement than in the old data. There was little change in Salo, while there was a slight decrease in the analyzed level of quality in Jyväskylä. Nurmijärvi unfortunately showed the greatest negative variation between old and new data. This should not obscure the fact that, of the four regions, Nurmijärvi had the highest sigma level. It is important to note that the R-squared values for this analysis were relatively high for the sampled data, so the results can reasonably be used to make predictions in other, similar analyses.
5.2 Additional information
Additional information in this case refers only to elevator and floor information. The analysis for this element was quite straightforward. The task was to compare data from the regions with the data already entered into Jakti. Having all additional information correct is important because missing data can affect the results of the optimization process. Separate parameters are used to calculate distances and travel times within apartment buildings, and these depend on the availability of floor and elevator information.
Comparing the data was the first step. If any discrepancies were found, I tabulated them as follows:
The floor and elevator information varies a lot between regions, so reporting it as, for example, “in Jyväskylä, 5 buildings with 4 floors without elevators” did not allow a meaningful comparison across all the regions. I needed to standardize the discrepancies using the parameters used to calculate apartment building distances and travel times. This allowed me to represent the distance discrepancies as percentages of the total expected distances in that region. The table below shows the calculated total distances that should be in the information systems. These totals are calculated from the data that comes from the Vaasa region.
After this, we calculate the total distance represented by the discrepancies found between the ETL file and the regional data, as shown below:
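Independently of the tables, the standardization described in this section can be sketched in Python. The actual distance parameters used in the optimization are not given in the text, so the two per-floor constants below are hypothetical placeholders, as are the function names and the sample buildings. The sketch also assumes that a discrepancy contributes the full expected distance of the affected building.

```python
# HYPOTHETICAL placeholder parameters: meters walked per floor.
# The real optimization parameters are not stated in the source.
METERS_PER_FLOOR_NO_ELEVATOR = 6.0
METERS_PER_FLOOR_WITH_ELEVATOR = 2.0


def building_distance(floors, has_elevator):
    """Expected in-building distance for one apartment building."""
    if has_elevator:
        return floors * METERS_PER_FLOOR_WITH_ELEVATOR
    return floors * METERS_PER_FLOOR_NO_ELEVATOR


def discrepancy_percent(regional, etl):
    """Distance tied to mismatching buildings, as % of total expected distance.

    Both arguments map a building id to (floors, has_elevator).
    `regional` is treated as the reference data from the region.
    """
    total_expected = sum(building_distance(*info) for info in regional.values())
    if total_expected == 0:
        raise ValueError("regional data contains no expected distance")
    discrepant = sum(
        building_distance(*info)
        for building_id, info in regional.items()
        if etl.get(building_id) != info  # missing or different record
    )
    return 100.0 * discrepant / total_expected


# Made-up example: building C has lost its floor count in the ETL file.
regional_data = {"A": (4, False), "B": (3, True), "C": (5, False)}
etl_data = {"A": (4, False), "B": (3, True), "C": (0, False)}
print(discrepancy_percent(regional_data, etl_data))
```

Expressing the result as a percentage of the expected total is what makes regions of different sizes comparable.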
5.2.1 Additional information analysis results
The above steps were carried out for all the regions, and the compiled results of the analysis are shown below.
Among the regions in the table, Vaasa had the least missing data in its databases, followed by Salo. Nurmijärvi had the worst result. Jyväskylä is not in the table above because no discrepancies were found in its data.
5.3 Webmap vs. Jakti analysis
As discussed earlier in this paper, one of the main methods I used to develop valid analysis results was data mining. I found new relationships and associations between the various fields in an ETL file. There is a direct relationship between Jakti data and geocoded data. Ideally, the Jakti internal sequence should start at “1” for every set of identical coordinates and continue sequentially until the next set of identical coordinates. However, there were discrepancies in the Jakti internal sequence column where the sequence broke off at some point. An example would be a group whose sequence runs from 1 to 10 but in which, somewhere in between, a number appears that does not fall within the sequence. This case is illustrated by the highlighted rows in the diagram below.
There are basically two reasons for this occurrence:
- An error in Webmap, where the service group has been split into two different groups
- An error in Jakti, where service locations in different service groups have been joined in Webmap
Whichever the case, the cause was not easy to identify directly from the ETL file, so these discrepancies were treated collectively in this analysis as one element representing one dimension of quality.
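The sequence rule described above lends itself to an automated check. The following Python sketch is not the method used in the project (which relied on Excel macros); it assumes each ETL row can be reduced to a tuple of x coordinate, y coordinate and Jakti internal sequence, in file order. The field layout and the function name `find_sequence_breaks` are hypothetical.

```python
from itertools import groupby


def find_sequence_breaks(rows):
    """Return indices of rows whose internal sequence breaks the 1, 2, 3, ... run.

    rows: list of (x, y, internal_sequence) tuples in ETL file order.
    Consecutive rows with identical coordinates are treated as one group.
    """
    flagged = []
    indexed_rows = enumerate(rows)
    # groupby only merges consecutive rows that share the same coordinates
    for _, group in groupby(indexed_rows, key=lambda item: (item[1][0], item[1][1])):
        expected = 1
        for index, (_x, _y, sequence) in group:
            if sequence == expected:
                expected += 1
            else:
                flagged.append(index)  # number does not continue the run
    return flagged


# Made-up rows: the third row carries a number outside the expected run.
rows = [
    (1.0, 2.0, 1),
    (1.0, 2.0, 2),
    (1.0, 2.0, 7),
    (1.0, 2.0, 3),
    (5.0, 6.0, 1),
    (5.0, 6.0, 2),
]
print(find_sequence_breaks(rows))
```

A stray number inside an otherwise intact run is flagged on its own. If a number is simply missing from a run, every later row in that group is flagged, so such cases would still need manual review.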
Picking out these errors required Excel macros. Three separate macros were made, which did the following, in this order:
1. Highlight the discrepancies
2. Copy the discrepancies onto a separate worksheet
3. Calculate these discrepancies as percentages of two entities:
   (a) As percentages of the total number of service points or groups
   (b) As percentages of the total number of service locations
In 3 (a), the groups containing erroneous service locations were calculated as a percentage of the total number of groups in a region. In 3 (b), the individual erroneous service locations were calculated in relation to the total number of service locations in the region.
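The two percentages can be illustrated with a short Python sketch. It is not a translation of the Excel macros, whose code is not shown in this paper. It assumes that each service location carries the identifier of its service group and that the erroneous locations are already known by row index. All names and sample values are hypothetical.

```python
def discrepancy_percentages(location_groups, flagged_indices):
    """Return (% of groups affected, % of service locations affected).

    location_groups: list with the service group id of each service location.
    flagged_indices: indices of the service locations found to be erroneous.
    """
    if not location_groups:
        raise ValueError("no service locations supplied")
    flagged = set(flagged_indices)
    all_groups = set(location_groups)
    # A group counts as affected if at least one of its locations is flagged.
    affected_groups = {location_groups[i] for i in flagged}
    group_percent = 100.0 * len(affected_groups) / len(all_groups)
    location_percent = 100.0 * len(flagged) / len(location_groups)
    return group_percent, location_percent


# Made-up data: ten locations in four groups, one erroneous location.
groups = ["g1", "g1", "g1", "g1", "g2", "g2", "g3", "g3", "g4", "g4"]
print(discrepancy_percentages(groups, [2]))
```

Because a single erroneous location marks its whole group as affected, the group percentage is never lower than the location percentage when every group has at least one location, and the gap depends on how many locations each group contains.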
5.3.1 Webmap vs. Jakti analysis results
The tables above show results for both service groups and service locations. The percentages for service groups were larger in Jyväskylä and Vaasa than in Nurmijärvi and Salo because the types of groups differ between the regions. In essence, some groups have more service locations than others. The above data was then plotted on a graph and compared to six sigma levels.
This analysis aimed mainly at showing the level of quality of data that should be in sync between Jakti and Webmap. Jyväskylä, Nurmijärvi and Salo had their data falling between the 3rd and 4th sigma levels, while Vaasa's quality was well below the 3rd sigma level.
6. Cost of Quality overview
The analysis methods discussed above gave a good picture of the estimated quality errors, mainly in street and delivery point data. The question that remained was whether, regardless of the current quality issues, the resources invested so far and those being invested now have been worthwhile in improving the process. In addition, how do these perceived costs of quality relate to the costs at various six sigma levels? Efficiency in a process and high quality data mean more financial savings within the company. Therefore, in my opinion, the ultimate measure of the level of quality is the cost of quality.
First and foremost, I needed to gather estimates of the resources used in data checks between the first optimization phase and the current situation. This data was available only for work done on delivery points. Using unit costs for the correction of a single element, that is, a delivery point or street segment, I calculated the costs that would currently be incurred if all the errors were to be eliminated.
This means my calculation was as follows:
((Number of elements with quality errors * time taken to correct one element, in minutes) / 60) man-hours * cost of one man-hour.
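Expressed as a small Python function, the calculation above looks as follows. That the correction time is measured in minutes is inferred from the division by 60. The parameter names and the figures in the usage line are illustrative and are not project data.

```python
def correction_cost(error_count, minutes_per_element, cost_per_man_hour):
    """Cost of correcting all erroneous elements.

    error_count: number of elements (delivery points or street segments) in error
    minutes_per_element: time needed to correct one element, in minutes
    cost_per_man_hour: cost of one man-hour, in any currency
    """
    man_hours = error_count * minutes_per_element / 60
    return man_hours * cost_per_man_hour


# Illustrative figures only: 120 errors, 5 minutes each, 30.0 per man-hour.
print(correction_cost(120, 5, 30.0))
```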
For the street data, I gathered the total number of connectable street segments in the urban area of one region. Due to the high R-squared value obtained from the street networks analysis, it was relatively reliable to carry proportions from that analysis over directly to streets. That is, I used the percentages obtained in the earlier analysis of service locations within 10 meters proximity to estimate approximately how many street segments would have errors in one workspace. I selected only CAR and WALK-CAR streets, which are the connectable street categories, in urban areas, with an allowance of 10 meters from the boundary of the urban area. I then calculated the costs of having a particular number of errors at different sigma levels. This analysis gave an insight into the money already spent, the perceived expenses, and the projected savings from moving quality to higher sigma levels.
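A minimal sketch of how the cost at a given sigma level might be estimated is given below. It assumes the conventional 1.5 sigma shift for converting a sigma level into a defect rate, since the thesis does not state which sigma table was used, and it reuses the man-hour cost formula from the overview. The function names and all figures are hypothetical.

```python
from statistics import NormalDist


def defect_rate(sigma, shift=1.5):
    """Share of defective elements at a sigma level (assumed 1.5 sigma shift)."""
    return 1.0 - NormalDist().cdf(sigma - shift)


def cost_at_sigma(total_elements, sigma, minutes_per_element, cost_per_man_hour):
    """Estimated correction cost if quality were at the given sigma level."""
    expected_errors = total_elements * defect_rate(sigma)
    man_hours = expected_errors * minutes_per_element / 60
    return man_hours * cost_per_man_hour


# Illustrative figures only: 10,000 street segments, 6 minutes per
# correction, 30.0 per man-hour.
for level in (3, 4, 5, 6):
    print(level, round(cost_at_sigma(10_000, level, 6, 30.0), 2))
```

The steep fall in cost from one level to the next follows directly from the defect rate, which under this convention drops from roughly 6.7 % at the 3rd level to roughly 0.6 % at the 4th.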
6.1 Cost of quality: streets
In FIGURE 33, the column >10m shows, as per the analysis done above, the potential costs and work hours needed to check through the streets that fall above the 10 meter limit in each region. The first column shows what the initial budget would be to check all the streets in the regions. The subsequent columns in the FIGURE show what the costs would be if perceived quality were at the different levels of sigma.
FIGURE 34 shows the perceived savings, as a percentage of the total budget, that would be made in each of the regions if street quality were at the different levels of sigma shown. This amount is the difference between the current perceived costs of quality and the costs at the next level of sigma.
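The savings figure described here reduces to a single expression. The Python sketch below assumes that the total budget is the cost of checking all streets in a region. The function name and the figures are hypothetical and do not come from the project.

```python
def saving_percent(current_cost, next_level_cost, total_budget):
    """Saving from moving to the next sigma level, as % of the total budget."""
    if total_budget <= 0:
        raise ValueError("total budget must be positive")
    return 100.0 * (current_cost - next_level_cost) / total_budget


# Illustrative figures only.
print(saving_percent(current_cost=500.0, next_level_cost=200.0, total_budget=3000.0))
```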
From the two FIGUREs above, we can say that the perceived costs of quality are roughly proportional to the size of the region. However, in Salo's case, perceived costs are substantially large considering the size of the region. More importantly, the importance of keeping street quality high is shown by the savings that would be made from a shift to just the next level of sigma.
6.2 Cost of quality: service locations
In this analysis, approximations of the actual resources used in service location checking between the last optimization phase and now were available. In FIGURE 35, the column “current quality issues” shows the investment required to correct current quality issues in the regions. Just as in the previous cost of quality analysis, the subsequent columns in the FIGURE show perceived costs at different levels of sigma. The final column shows the resources invested so far. The financial perspective of the service location analysis gave some interesting results. Vaasa, the region with the highest amount of invested resources, also needed the highest additional investment to check errors in service locations. Jyväskylä, which is also a relatively large region, likewise had a substantial investment in error checking. This is in comparison to the other two areas, Salo and Nurmijärvi. The current costs to recheck the service locations in these 3 regions are almost the same. However, considering the size of the regions, these costs are quite high in Salo and Nurmijärvi.
It is encouraging to notice that service location quality is at a higher level than street data quality. This does not, however, obscure the fact that additional savings could be made by shifting quality from one sigma level to the next. These perceived savings are shown in FIGURE 36.
The current level of quality for data in route optimization cannot be summarized collectively. The two subjects analyzed in this paper, namely service locations and street networks, are at different levels of quality. Service locations are at a much better level of quality than street networks.
When it comes to resources invested, the analyses show that although there are slight improvements in quality, these are not necessarily in proportion to the investment made in all regions. Through review of various documents and interviews with various persons, I found that there are several possible reasons why results have not necessarily been positive. These include:
- The continuous, repetitive routines for data checks lead to loss of interest during the checking process. This might result in reduced concentration and negligence of some sort on the part of the worker handling the task.
- Current data check methods are time consuming and prone to error because many steps are manual. Much of the data in route optimization is numerical. This makes it possible to automate at least part of the checking using associations between these numerical fields.
- The lack of defined standards for following up these checks makes it hard not only to track progress but also to define acceptable levels. Setting acceptable levels of quality reduces the chances of wasted investments through unnecessary processes.
- Inconsistency in information flow regarding adjustments to how different data checking processes are handled within the project.
There have, however, been significant adjustments made during the course of this research that aim at improving the quality of data. These are mostly system related improvements.
8. Limitations to the research
I believe that this research gives a good insight into quality levels and standards within the route optimization project. However, the analysis methods and elements used cover only small aspects of the data as a whole, and may therefore leave a few gaps and minor inconsistencies. There are many other aspects of route optimization data that could, and should, be looked into before making conclusive quality statements for the project. In addition, in some of the analyses, actual data would help develop this research in the future. On the other hand, this work provides solid groundwork and a direction that Itella should take to ensure better efficiency in the project's processes.
In my opinion, the easiest way to move forward from the results derived in this research would be to create a quality plan. David Loshin states that high quality data requires constant vigilance. By introducing data quality monitoring and reporting policies and protocols, the decisions to acquire and integrate data quality technology become much simpler. Assessment and improvement of data quality should be adopted as a continuous process with both short-term and long-term goals. Analyses of quality should be done from both subjective and objective perspectives to ensure optimum results. Considering the current level of quality for data in route optimization, the short-term goal would be to get to the next level of sigma. Six sigma uses a simply defined 5-step method to achieve continuous improvement. This method is abbreviated as DMAIC, which stands for Define, Measure, Analyze, Improve and Control. I believe that this process, by virtue of its proven track record, would be highly beneficial to Itella.
Basically, the first step is to define the problems and the subjects of analysis within the project. This means identifying the variables and deliverables within the project and deciding how to analyze their quality. The second step is to continuously measure the performance of the project, which includes ensuring access to the correct data when needed. The third step is to analyze the data using predefined analysis methods. After this, steps are taken to improve the problematic areas, and finally the progress of the process is followed up.
High costs of quality not only reduce profitability, they are also passed on directly to the customer. Company processes should not in any way adversely affect clients. In route optimization, poor data quality has at times caused delays in project implementation and resulted in some unhappy clients.
It is also vital that all employees understand the company's direction on quality. This plays a big part in understanding why the customer is the key focus for the company. With the adoption of six sigma methods and further research into analyzing quality for data in route optimization, I believe that high quality will eventually be realized. It takes work and effort to achieve this, but a wise man once said, “only by getting stung, does one get to the honey”.