Join us for this free 1-hour advanced webinar where we answer the question, “Why do we need hypothesis tests in process improvement?” and then stay with us as we walk you through a real, live hypothesis test direct from the Bahama Bistro!
- What are hypothesis tests?
- Why do we need them in process improvement?
- When should we run them?
- How do we set up and run them?
- Any words of advice?
Elisabeth Swan, Managing Partner & Executive Advisor
Elisabeth is a Managing Partner, Executive Advisor and Master Black Belt of GoLeanSixSigma.com. Elisabeth has over 25 years of success helping leading organizations like Amazon, Charles Schwab, and Starwood Hotels & Resorts build problem solving muscles and use Lean Six Sigma to achieve their goals.
Tracy: Hi everyone! Welcome to GoLeanSixSigma.com’s webinar. Thanks for spending some quality time with us. We’ve got over 550 people registered for this webinar and I’m very excited about it because I think that you’re going to really enjoy this webinar. And we are really excited that you’re joining us today.
So Lean and Six Sigma are the go-to improvement methods used by leading organizations all over the world to minimize cost and maximize profits and develop better teams all while creating happier customers. So every month, we craft webinars just for you. And this particular topic was high on the list of requests, so we’re really excited to bring this to you. And we really are looking to simplify concepts and tools of Lean Six Sigma so that you can understand and apply them more easily and be more successful.
Our Expert: Elisabeth
Elisabeth: So today’s webinar is titled, as you can see, How to Set Up and Run Hypothesis Tests.
Tracy: And I’m Tracy O’Rourke. I’m managing partner at GoLeanSixSigma.com. And today’s presenter is also a managing partner and executive adviser at GoLeanSixSigma.com, Elisabeth Swan. How are you, Elisabeth?
Elisabeth: I’m good, Tracy. How are you?
Tracy: I’m good. So Elisabeth has a lot of experience. She has been doing this for over 25 years. She is a Master Black Belt, a consultant, a coach, a trainer. She has helped many organizations like Amazon, Charles Schwab, Target, Volvo, Alberta Health Services, Starwood. Man, you must be tired, Elisabeth because you’re helping a lot of people.
And guess what? They love her. They always want to work with her. And what’s really nice about Elisabeth that you might not know is well, she does live on Cape Cod and that’s a really nice place to live with her husband and her cat. But she also knows how to ride a unicycle since she was 10 years old. Tell us a little bit about that, Elisabeth.
Elisabeth: I learned it from a neighborhood friend and we all rode unicycles as little kids. I didn’t think about it until my husband decided he needed to learn how to ride a unicycle to be better at mountain biking and I thought, I wonder if I could still do it. And I can.
Elisabeth: I don’t use it on the job generally but I can do it.
How to Interact
Tracy: That’s great. OK. A few housekeeping notes before we begin. So first of all, everyone will be in listen-only mode. We’ve got as we said a lot of people on this webinar so we want to make sure that people can hear very well. At the end of the presentation, we’re going to have a question and answer session but please feel free to ask any questions at any time during the webinar and we can refer to them at the end of the webinar as well.
So we’ll also ask you to participate in some polls. And if we don’t answer all of your questions during the webinar, we’ll be sure to post answers as well as share the recording of this webinar on our website at GoLeanSixSigma.com.
So I will say one more time because we get this a lot. You will be getting the slides for this presentation. They will be available on our website after. We probably get 5 or 6 comments, “Can I have the slides? Can I have the slides?” So yes, you will get the slides.
So, I’d love it if you could join us in our first activity today and that is to tell us where you’re from. So, in the Ask a Question session, please type in where you’re joining us from today.
Elisabeth: What do you see?
Tracy: Alright. I got Hobbs, New Mexico. Eugene, Oregon. Cheshire, Connecticut. Columbia. Auburn, California. Clearwater, Florida. Houston, Texas. Covina, California. South Carolina. St. Louis, Missouri. Kingston, Jamaica. Scotland.
Tracy: We have some folks from Norway. Vancouver, British Columbia. Hawaii. Orange County, California. Italy. Philippines. Rapid City, South Dakota. I almost said San Diego because it says SD. Dominican Republic. Lots of different people. Montreal, Quebec.
So you’ve got a very global audience today, Elisabeth. It sounds to me like many people across the globe are interested to hear you talk about setting up and running hypothesis tests.
Elisabeth: Awesome. Thanks for the warm introduction, Tracy. I’m sorry I can’t see everybody but thanks for those of you tuning in today and thanks for those who are clearly staying up late. You guys are dialing from all over the place.
Who Is GoLeanSixSigma.com?
So as Tracy said earlier, we’ve both been with GoLeanSixSigma.com since its inception. Our mission is to make it easy for you to build your problem-solving muscles. So that means that we simplify complex concepts. We make our training extremely practical and I think it’s really enjoyable.
We provide a running case study at the Bahama Bistro, where our restaurant team applies all the tools. Aside from this webinar series, we put up blogs and podcasts and book reviews and lots of information to help you get where you need to go. And we’ve used and we’ve taught Lean Six Sigma for decades because it has got the best toolkit for problem-solving.
We’ve Helped People From…
And thankfully, there’s a growing list of companies who agree with us. So here are some organizations we’ve helped. You can see we’ve got bricks and mortar, online, there are diverse industries. You got healthcare, financial services, manufacturing, government. And the reason these people are all interested is because Lean Six Sigma is about problem-solving. And once you have an organization, you’ve got problems. So like all of you guys, these companies want to be the best at problem-solving so you’re in good company.
Some more on benefits later but first, let’s look at our agenda for today.
First up, what are hypothesis tests and why do we need them in process improvement? Then, when should we use them, and how do we set up and run them? And at the end, we’ll give you some words of advice based on what we’ve learned over time.
And I want to preface this whole webinar by saying that this tool, these tools come to us from a scientific community so there’s a lot more to it than a 45-minute webinar. There’s a lot of depth here. We’re going to keep it at a digestible level because our purpose is to get you comfortable with hypothesis tests and encourage you to use them when they’re appropriate.
There are a lot of different ways to prove things and hypothesis tests are one way that you can use to determine whether your theory is correct or whether your improvement made a difference. And you want to always show the evidence you have to prove that you made a difference. You could remove 10 steps from a process that has 20 steps and you can say, “I changed it. Look, it was 20 steps. Now, it’s 10.” But some things aren’t that obvious. And that’s why we’re going to delve into hypothesis testing today.
What Are Hypothesis Tests?
But what are they? A hypothesis is a guess but it’s an educated guess. We generally start a process improvement project with some ideas about what’s happening in the process and then as we conduct the process walks, map the processes, and collect the data, our ideas turn into hypotheses because we’re educated about the process now. We have – it’s what we call profound knowledge or a building map. And hypothesis tests are a formal way to test those educated guesses.
What Can Be Tested?
So let’s go to some samples. So what can you test? So there is a drug out there called Prevagen and their makers claimed that it supports healthy brain function, gives you sharper mind, clearer thinking. They say it’s clinically proven to work. It’s a big seller with the elderly. So that’s one thing we could test.
Another one is Swiffer. At one point they claimed they were three times more effective than a broom. Poor brooms.
The other one we’ve got here is Skechers. I don’t know if you remember this. They claimed that their sneakers help you lose weight and also improve your muscle tone. And they had a lot of celebrity endorsements. That was a big one.
So those are things we could test.
Is There a Difference?
And what we’re asking, the general question we’re asking is, is there a difference? So for Prevagen, is there a difference from placebos? That’s what you want to test. Is there any difference?
And the lawsuit that eventually came up against these guys says that the marketers were lying. The study the marketers were relying on showed no difference in memory function between the group that used Prevagen and the group that was given placebos. And that lawsuit came from the Federal Trade Commission and the New York State Attorney General, and they’re seeking refunds and a halt in advertising. So that test resulted in a change there.
Next one was Swiffers. Are they really different from brooms? Libman Broom Company, you might not have heard of them. They complained that the tests did not support Procter & Gamble’s claims. And the National Advertising Division of the Better Business Bureau agreed. So they had to stop saying that.
And then we’ve got our Skechers. Are they really different from other sneakers? Do they really make you lose weight and give you better muscle tone? Well, the claims were based on an independent clinical study conducted by a single chiropractor. And the federal judge said the study did not show a difference and they settled a $40 million claim against the company. Everybody got compensated for their sneakers. And it turns out that chiropractor was married to a Skechers ad executive and they kind of failed to disclose that. So that didn’t end well.
So that’s one of our questions here is, is there a difference? Another thing was – say it again?
Tracy: That is scandalous.
Are These Things Related?
Elisabeth: It’s funny once you start researching these things. I had no idea. All right. There’s also a study that says the more wine you drink, the higher your good cholesterol goes.
Another study showed that as a person’s education increased, the more education you got, the higher your life expectancy was. This was a joint study that showed that in actuality, just getting a high school diploma increased life expectancy, so these both turned out to be true.
Two Types of Testing
So there are two types of tests. One, is there a difference? We just looked at a few of those. But back to our process improvement world, if you have, in the Bahama Bistro, two cooks that were cooking at different speeds, you could test, is there really a difference between the new cook’s speed and the experienced prep cook’s?
And the other one you can do is correlation. There was the more you drink wine, the better your cholesterol was. Of course, if you drink too much then it starts to go down again.
Then the other one was the more education you got, the more life expectancy you had. So those were correlations, as X changes, Y changes too.
So your ears perk up when you hear “studies show.” They’re running tests. And I love the studies that show that wine makes you healthy and coffee makes you smarter. Those are good.
Tracy: I’ll drink to that, both of them.
Applying to Process Improvement
Elisabeth: So this example is coming to you from the Bahama Bistro. And we’re measuring customer satisfaction. What drives customer satisfaction? And there are lots of factors like the speed of service, the time it takes staff to serve you, menu item availability, food freshness, order accuracy, ambiance. And what these are, is different potential Xs. We’re going to start getting into this pseudo math world where the customer satisfaction, the measurement of how satisfied the customer is, is the Y, that’s our output measure. And then the Xs are all the variables that could drive customer satisfaction up or down.
In this case, if we wanted to improve customer satisfaction, we’d determine which of these is critical. We might test whether customer satisfaction is different with overhead lighting versus candle light, or whether the faster the service, the happier the guests. These are examples of why we run tests in process improvement. We want to understand the critical X.
Applying to Process Improvement
Now, this – I don’t know if you remember this equation. It’s usually part of process improvement training. It’s not really math. It kind of looks like Algebra. But it’s important. And it’s saying just what we looked at. It’s saying, any Y, customer satisfaction, cycle time, defect rate, any output is a function of all these critical Xs, X1, X2, X3. And our job is to figure out what’s the critical X? What are the critical Xs? There’s often more than one. But what makes the biggest difference in terms of moving that Y up or down, depending on what the process improvement project is trying to do?
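Written out, the equation described here usually takes the following form. The subscript n is an assumption added for illustration, standing for however many candidate Xs the team has listed.

```latex
Y = f(X_1, X_2, X_3, \ldots, X_n)
```

For the Bahama Bistro, Y would be customer satisfaction, and the Xs would be factors such as speed of service, menu item availability, food freshness, order accuracy, and ambiance.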
When Should We Run Them?
So there are two main areas in DMAIC where we run the tests, the analyze phase and the improve phase. In the analyze phase, we’re trying to determine the root cause and that’s what we’re going to focus on today. We’re going to look at testing to understand root cause.
So when you’re running improvement projects, you’re deciding where to spend your time, where to focus data collection, where to put your valuable resources. If you fix the wrong thing, it’s expensive in time, dollars, and also project momentum. People get discouraged if they go after the wrong thing and it doesn’t really make any difference.
And then in the improve phase, it’s key. You want to show people that you made a difference. You did all this work. Did it change the Y? Did it really drive customer satisfaction up? You have to show a difference from the baseline. When you started this project, what was the level of the project Y, whether it’s customer satisfaction or defects? And then you have to show that the difference is statistically significant. It’s not just random common variation but you really made a difference.
And those are the two areas where you test. We’re going to focus on the analyze phase version, which is determining root cause.
How have you used hypothesis tests?
So now, we’re going to go to a poll. So this is our first one, just to get a feeling for how our audience has used hypothesis tests, or whether you’ve used them at all. So let’s launch a poll now and get a sense.
Tracy: I have a quick story for you too while people are polling.
Elisabeth: Awesome. So this is how have you used hypothesis test. All right. So I’m going to launch that. So go ahead. And Tracy, what’s your story?
Tracy: So I came in and I was helping a project group on finding the root cause to something. And I kept asking them, “What do you guys think the root cause is? What do you guys think?” We were brainstorming root causes, right? And it got really quiet. And then someone said, “Well, we think it’s Chris Mobley.” Yeah. Everybody starts nodding, “Yeah, it’s Chris Mobley.” And I’m like, “Whoa! OK.” And I say, “OK. Well, let’s put it down as a hypothesis.” And so, they’re like, “Yeah, he’s the one. He’s always the problem. Blah, blah, blah.” And they were totally like blaming him.
We come to find out when we actually tried to prove and I say, “OK. Let’s try to prove that.” We collected data from this person’s area. It turned out to be not him or his area. And I said, “I’m sorry guys, the data doesn’t show that he’s the root cause or his group.” And they were mad.
You know what’s even funnier, is three weeks later, I got a call from Chris Mobley and he’s like, “I heard you got my back.” I go like, “I don’t have your back. The data just didn’t show that you were the problem or you were the root cause.”
Elisabeth: That’s hilarious!
Tracy: Isn’t that funny?
Elisabeth: I think we’ve got a good 80% on this one.
Tracy: OK. So this one says, I haven’t tried to use one yet, 47%. So it sounds to me like a lot of people that are on this webinar have never used one before. So that’s interesting. And then that’s shortly followed by 27% say that they’ve used it to verify that the solution made a difference followed by they verify a root cause. And last but not the least, to test the guarantee made by a vendor.
Elisabeth: Fascinating. I was wondering if anyone did that because that’s really the least common. But it’s kind of a split between checking if your solution made a difference and verifying root cause. That’s great. OK.
And then glad to have those of you that never tried it before. Hopefully, this encourages you to give it a shot.
Elisabeth: Thank you, Tracy.
The Testing Process
So the testing process starts with a practical problem. And from there, we turn it into a statistical problem, right? So we go from English into Math. This comes from a scientific community. There are rules and regulations to follow. And I know that for those of you that just saw the word “statistics” or heard me say it, I might have lost you. But hang in there because we’re going to take it into statistics then we’re going to come back out and make sure that it’s grounded in the real impacts.
So first, practical then statistical then we’re going to do our analysis. We come back to our statistical conclusion which leads us into a practical conclusion. We’re going to come – go in and then come back out.
So now, we have our general hypothesis. So we want to know what’s the problem, what type of data are we looking at, and then a statistic refers to the measure. So that could be average or the median or standard deviation. So there are a few measurements we can look at. But we start with stating what we’re trying to figure out.
Bahama Bistro Example
So let’s go back to the Bahama Bistro. And the team at the Bahama Bistro is making an educated guess. They’re not sure if what seems like a real difference is just common fluctuation in sales. They said the problem is, “We can’t keep up with hot sauce demand, 16% of the time, customers cannot get the hot sauce they want.” So the general hypothesis is that the average hot sauce sales are higher at the downtown location and that’s causing hot sauce stock outs. So that’s their general hypothesis.
And now, we have to look at our data and that’s money and that’s continuous data, also called variable data. Just a quick reminder, continuous data is cycle time, weight, volume, things like that, as opposed to discrete data like proportion defective. So we’re looking at continuous data. And we want to know, is it normally distributed?
If Continuous – Is It Normal?
So let’s take a look at what that’s about. If it’s continuous, is it normal? So this comes back to the Bell Curve. Most of you have been exposed to this normal distribution, which is called a Bell Curve. And our data could be normal or it could be slanted or skewed to one side. It could have two Bell Curves, which is also called bimodal. But we just need to run a quick test to determine if the data is normally distributed. So the test is called Anderson-Darling. It’s the Anderson-Darling Test, which sounds more like dialogue in a TV drama. But it does the trick.
This is an easy test. It means highlighting each column of continuous data that you’re planning to use and you select the Anderson-Darling Test for normality. It’s not bad if it’s not normal. It just impacts the type of test you get to use.
Instructions and practice datasets for this test, the normality test and all the others are on the GoLeanSixSigma.com’s site for both Minitab and SigmaXL. So if any of this is foreign and you’re thinking, “Well, I don’t know how to do that.” So that’s going to cause a problem. All of this is online and you can get more in depth if you’d like.
So the next thing we’re going to discuss is this idea of a P-Value. It says it’s not normal if the P-Value is less than .05. So we’ll touch on that again.
If Continuous – Test For Normality
But let’s go look at our datasets right now. So there’s one of our datasets. That’s the downtown sales. So I’ve run the test and it looks at 200 different days of sales and there’s all this information. If you look at the very bottom, it says there’s the Anderson-Darling and there’s the P-Value.
So that one is not under .05. Now, let’s go to the next one and look at the beachfront sales. The same thing. We look down there and we see once again the P-Value is not under .05. So both of these are normal datasets.
Tracy: So Elisabeth, can I just recap really quick?
Elisabeth: Yeah, go ahead.
Tracy: So you’re saying if anyone out there has continuous data, you want to check to see if the data is normal, follows a normal distribution, because that helps determine which test you would be running. Is that right?
Elisabeth: That’s right.
Tracy: OK. Thank you.
Elisabeth: Absolutely. So now we know that. We know these – and we can even see it, right? The Bell Curve is right there. These looked normal. They tested normal. So they’re normal and that means that’s going to come into play when we choose our test.
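For readers without Minitab or SigmaXL, the normality check can be sketched in plain Python. This is a minimal illustration, not the packages' implementation: the function name `anderson_darling_normal` is hypothetical, the piecewise P-Value formula is one commonly cited approximation (attributed to D'Agostino and Stephens) and is assumed here, so results may differ slightly from a statistics package, and the two samples are randomly generated stand-ins rather than the Bahama Bistro sales data.

```python
import math
import random
from statistics import mean, stdev


def _log_phi(z):
    """Log of the standard normal CDF, using erfc so the tails stay accurate."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2)))


def anderson_darling_normal(data):
    """Return (adjusted A-squared, approximate P-Value) for H0: data are normal."""
    x = sorted(data)
    n = len(x)
    mu, sigma = mean(x), stdev(x)
    z = [(v - mu) / sigma for v in x]  # standardized, sorted observations
    # ln(1 - Phi(z)) equals ln(Phi(-z)), so one helper covers both terms
    total = sum(
        (2 * i + 1) * (_log_phi(z[i]) + _log_phi(-z[n - 1 - i]))
        for i in range(n)
    )
    a2 = -n - total / n
    # Adjustment for estimating the mean and standard deviation from the data
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    # Assumed piecewise approximation of the P-Value (see the note above)
    if a2_adj < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    elif a2_adj < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    elif a2_adj < 0.6:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj <= 13:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    else:
        p = 0.0
    return a2_adj, p


# Made-up samples for illustration only (NOT the webinar's sales data)
rng = random.Random(1)
bell_shaped = [rng.gauss(100, 15) for _ in range(200)]
skewed = [rng.expovariate(1 / 100) for _ in range(200)]

for name, sample in [("bell-shaped", bell_shaped), ("skewed", skewed)]:
    a2, p = anderson_darling_normal(sample)
    verdict = "not normal" if p < 0.05 else "no evidence against normal"
    print(f"{name}: A2* = {a2:.3f}, P = {p:.4f} -> {verdict}")
```

The reading rule matches the one in the webinar: a P-Value under .05 means the data is treated as not normal, which changes which test the tree leads you to.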
Formal Hypothesis Statements
So next up, we’re going to move away from English into Math. Formal hypothesis tests require us to come up with formal hypothesis statements in order to run the test. So the question in this case is, what is the Null Hypothesis and what is the Alternative Hypothesis? These are the two statements we have to come up with to run the test.
Why Null and Alternative?
Now, why? You’re thinking, “I have a hypothesis. What’s up with this Null and Alternative? Why do I have to have these two statements?” Why do you need them both? So let’s take a step back and say, “What if your theory is that all swans are white?” How do we prove it? Because I want to prove that swans are white. I cannot get all the swans together in one place. But if we find one black swan then we can prove our theory is false. So we use falsification. We set up the Null and we try to disprove it. And that’s the position we’re always in.
We take our theory, that’s the Alternative, and then we pair it with a Null, saying there’s no difference. Alternative, there’s a difference. Null, there’s no difference. And then we try to prove that Null is false. So that’s where we’re starting from is this idea of falsification.
Null and Alternative Theories
So here, the Null and the Alternative appear as HO and HA. The Null is H zero, which also reads as H with a letter O, and then there’s HA.
So the way to remember this is HO, Ho Hum, no difference or HA, A Ha! there’s a difference. So the Null, nothing is going on. Nothing to see here. Keep moving. The Alternative, there is something going on. I get to pursue my theory. You try to reject the Null if you can then you can pursue the alternative. And that’s the way to remember them.
We’ve got a few little memory games here that are always helpful when you’re dealing with realm of statistics.
Formal Hypothesis Statements
OK. So now, let’s talk about our formal hypothesis statements for the Bahama Bistro. So they’ve built the Null. They said the Null is there’s no difference between the average hot sauce sales at the downtown location and the beachfront location. And then you’ve got these little Greek letters. We pronounce µ as “mu.” It just means average, and it says µ1 = µ2. Downtown sales equals beachfront sales. No difference. Ho Hum.
Alternative, the average hot sauce sales at the downtown location are not equal to the average sales at the beachfront location. µ1 ≠ µ2. There’s something going on. They’re different. A Ha!
So we’re looking – our job is to falsify and disprove that Null so we can pursue this Alternative.
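Side by side, the two formal statements for the Bahama Bistro can be written as follows, with subscript 1 standing for the downtown location and subscript 2 for the beachfront location, as in the statements above.

```latex
H_0:\ \mu_1 = \mu_2 \qquad \text{(no difference in average hot sauce sales)}
H_A:\ \mu_1 \neq \mu_2 \qquad \text{(the averages differ)}
```

Because the Alternative says "not equal" rather than "greater than," this is a two-sided test: a difference in either direction counts as evidence against the Null.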
Elisabeth: Any questions, Tracy?
Tracy: I’m just going to say the one I always think about is the one that we use in our judicial system, which is there are two choices. They’re either guilty or not guilty. And that’s a Null and that’s an Alternative hypothesis, right? The Alternative I guess is guilty and then the Null is not guilty. So I think it’s really interesting. It’s very similar to what you were talking about with the swans, right? If you can’t prove guilt, it doesn’t mean they’re innocent. It just means they’re not guilty.
Elisabeth: Innocent until proven guilty, right? They’re Null. They’re just like everybody else until you disprove them.
Elisabeth: Cool! I like that.
Tracy: Thank you.
Elisabeth: Alright. So analysis, now we got to select a test, get the data, perform the test. Now for that, we’ve got this handy Hypothesis Tree, Hypothesis Testing Tree. And let’s follow the lines here.
Select the Appropriate Test
So what type of data? The first thing we ask, and we said it was money. Looking at sales, and that is continuous. I can continually divide money. It still makes sense. That’s continuous. Now, are you testing for correlation? As sales go up, something else goes up? No. We are not testing for correlation.
Well then, is the data normal? Yes. It passed the Anderson-Darling Normality Test. And now, we go to the type of statistic and we said we’re looking at average sales downtown versus average sales at the beachfront. So that takes us down here. And we have two sets of data. We’ve got beachfront and downtown. So Two-Sample T.
And that’s what you do. You’ve got a test you can look at. You’ve got this tree to follow the logic to find the test that’s for you. And you can see on the left, there are fewer discrete tests, and the continuous ones are divided up. We’ve got correlation, then you’ve got normal, not normal. But we’re going to focus on this one test today that’s a very common test to use and let’s get some more information from you guys.
What are you testing?
So for those of you that are testing, and some of you I’m sure are going to say you don’t know yet, these are the questions for you. Are you comparing two or more defect rates? That’s discrete data, proportion defective. Or are you comparing two or more cycle times? That’s a very common thing to look at in process improvement. Are you checking if two things correlate, as X increases, Y increases? Are you doing something else, because there are other things to test? Or are you not sure yet?
So let me launch the poll and then let you guys – and that is, what are you testing? There we go.
Tracy: So you know, I’ll probably tell a story here too.
Elisabeth: Go, Tracy. I love your stories.
Tracy: So while everybody is taking the poll, why not? So my favorite kinds of these project stories is when they think it’s – they are almost positive that the root cause is something and then they go and collect the data and it actually shows that it’s not the root cause. And I just love these stories because it just validates that the process works. And I’ve seen companies save hundreds of thousands of dollars because they were going to implement something for what they thought was the root cause and then they discover in this process that it wasn’t.
Tracy: So – and I’ll just say really quickly that this one company I was working with was going to buy credit card machines for all their trucks, like 70 trucks, because they thought that the main root cause of incomplete deliveries was that people weren’t paying. And when we collected the data, that was one of the lowest reasons why there were incomplete deliveries. So they would have spent all this money on credit card machines for truck drivers when it wasn’t going to solve the problem.
Elisabeth: Yeah. So you needed to bust that assumption, which is great.
Tracy: Yup, they busted that assumption.
Elisabeth: So let’s close this and share.
Tracy: OK. So, 33% say they’re not sure yet how they would be using this in terms of what they’re going to be testing, 27% said comparing two or more cycle times, which is obviously very, very common, 21% were checking to see if two things correlate, 16% comparing two or more defect rates, and 3% are other.
Elisabeth: Very cool. So a lot of you, 27% of you, might even be using the Two-Sample T if you’ve got some normal data. But then some of you are going to use regression, which is always fun. It’s not a scary word but it’s a good test. And that’s great and fair enough. You guys are trying to get a feel for what test you would use.
Tracy: Yes. And I would just say too for those that are not sure yet. I mean and I know some people say, “Well, I’ve gotten this far in my life without any hypothesis testing.” So it could be scary but again, when the stakes are high and people are dying because the process isn’t working. I mean those are – or you really need to be statistically sure that there is an improvement like these companies, that’s when this is really important to do the testing. So you might have not – it doesn’t apply for everything and it’s not necessarily needed for everything. But it is important to know when you need it.
Elisabeth: It’s really true. And Tracy, because they’re on the webinar today, every single one of them is going to run a hypothesis test afterwards. I can feel it. OK. Thank you, guys, for taking the poll.
Gather the Hot Sauce Sales Data
So next, you’re going to gather the data. And all we need are columns B and C. There’s a total. There’s a date. And this dataset is online. You can play with this yourself. We just need the downtown sales and the beachfront sales.
Perform the Test
And the next thing we’re going to do, we’re going to run the test. So we provide instructions for whether you’re going to use Minitab which works on a PC or whether you’re going to use SigmaXL which works on a PC and a Mac.
Now, I have a Mac so I’m going to use SigmaXL for this one. But we have instructions for both. And I am going to bring us over to our dataset. So here’s our dataset. And as I mentioned, I’m going to capture downtown and beachfront. And there’s a little shortcut. If I hit command, let’s see. Let’s do this. I’ll do shift over here, command down arrow. Let’s see. Then it captures all, I think it’s 201 rows of data. Let me just make sure I did that right. OK. There we go, 201.
So then I’m going to go up to SigmaXL. And I’m coming down to Statistical Tools. Those of you that have this in different software packages, different ways to get there, but basically, you’re looking for where the stat tools are. And then you’re coming over here and you see, if I look up Descriptive Statistics, I’m going to come all the way down to a Two-Sample T. So that’s what we need to do, a Two-Sample T.
And I’m going to hit next. My data is unstacked. So I’m going to hit both variables. OK. And let’s come back so we can see in a larger format. OK. So there is our data.
Tracy: Very nice.
Elisabeth: So let’s review these results, Tracy. We have the two hypotheses to the front. There’s the HO, the Null, and the mean difference is zero. And then HA, mean difference is not equal to zero. All right. They’re different.
So we look at the count. We know that there were 200 days of sales from each location. So the mean displays the average sales for the downtown versus beachfront. But we’re most interested in this P-Value that is highlighted in red for us.
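To show what the software is computing behind that output, here is a rough standard-library Python sketch of a two-sample t-test. Several things are assumptions for illustration: the function names are hypothetical, the Welch (unequal variances) form is used because the webinar does not say which variance option was selected, the P-Value is obtained by numerically integrating the t density rather than by a statistics package, and the sales figures are randomly generated placeholders, not the Bahama Bistro dataset.

```python
import math
import random
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided P-Value: 1 minus twice the area under the density on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # Simpson's rule needs an even number of steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def two_sample_t(a, b):
    """Welch two-sample t-test of H0: mean(a) == mean(b). Returns (t, df, p)."""
    se2_a, se2_b = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df, two_sided_p(t, df)


# Hypothetical daily sales figures, 200 days per location (NOT the real data)
rng = random.Random(7)
downtown = [rng.gauss(520, 60) for _ in range(200)]
beachfront = [rng.gauss(480, 60) for _ in range(200)]

t_stat, df, p_value = two_sample_t(downtown, beachfront)
print(f"mean downtown = {mean(downtown):.1f}, mean beachfront = {mean(beachfront):.1f}")
print(f"t = {t_stat:.2f}, df = {df:.1f}, P = {p_value:.4g}")
```

In day-to-day work a statistics package would do this in one step; the long form is only meant to show that the P-Value comes from the difference in averages scaled by the variation and sample size in each group.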
Accept or Reject the Null
So let’s figure out what that means. And that is this phase of the process, which is the Statistical Conclusion. This is where we interpret the P-Value and we accept or reject the Null.
Alright. Let’s come back to this P-Value. Now, we’ve seen it with the Anderson-Darling. It’s going to appear on all these process – these hypothesis tests. So what is it? And P stands for Probability. It can be between 0 and 1. And there’s a lot of nuance to this but I’m going to make it simple for you. It boils down to the probability of the Null Hypothesis being true. So high P-Values, if there is something as high as .84. Remember it’s between 0 and 1. If it’s .84 then there is very little reason to doubt the Null. There is really no difference.
If there are low P-Values like if P equals .015, something as low as that, it could be all the way to zero then there’s very little probability the Null Hypothesis is true. So you have to reject it.
And let me give a fun little thing. Tracy and I use it all the time. If the P is low the Null must go. But if the P is high, the Null can fly. So if the P is low and low is less than .05 then you can reject it. But if the P is high then you cannot reject it. The Null gets to stay in place.
Tracy: So is that why the Null is sad in this picture because it has been rejected?
Elisabeth: In this case, Tracy, yes. It’s very obvious to go away.
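The rhyme can be written down as a tiny decision rule. This is only a sketch: the function name `decide` and the wording of the returned strings are made up for illustration, and the 0.05 cutoff is the "low" threshold Elisabeth describes.

```python
def decide(p_value, alpha=0.05):
    """Return the statistical conclusion for a p-value at a given cutoff."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must be between 0 and 1")
    # "If the P is low, the Null must go": low means below the cutoff.
    if p_value < alpha:
        return "reject the Null"
    # "If the P is high, the Null can fly": the Null stays in place.
    return "fail to reject the Null"


# The two example P-Values mentioned in the talk.
for p in (0.84, 0.015):
    print(p, "->", decide(p))
```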
Poll #3 Mini-Quiz
In which case would you reject the Null Hypothesis?
OK. So now, we got another mini quiz for you. Were you listening when I told you how the P-Value worked? So I’m going to launch this and I’m going to test whether you guys know when to reject the Null. This is your first practice. Go ahead and choose which of these would you, in which case would you reject the Null.
Oh, these people are good.
Tracy: Yeah, I could tell.
Elisabeth: If the P is low, the Null must go. It’s between 1 and 0 and anything between – anything lower than .05 is officially low. All right? Let’s close this and see what we got.
All right. So this takes a little getting used to but it looks like 88% were thinking .045 which is under .05, less than, then that would be a case for you to reject the Null and you’re right. So we’ll get to more practice with this. But that’s really how it works. That’s what you’re looking for.
Tracy: Looks like people are paying attention, Elisabeth.
Elisabeth: I think they were, 88%.
Tracy: 88% got it right.
Confidence & Risk
Elisabeth: OK. Let’s talk about something else you’re going to see. And that’s something called a confidence level. And the flipside of that is risk. There’s confidence and risk. And you saw it in the test. You may have noticed it said, “95% confidence.” And that says, literally the translation is, “I am 95% confident that the inferences I’m making based on testing these samples are correct. I’m 95% confident.” And that means there’s a 5% risk I am wrong. So I may see a difference. I may say, “Hey, downtown is different from the beachfront and really, it’s not because there is a 5% chance I’m wrong.”
So if you’re wrong, you might reject the Null when there really is no difference. It’s possible because we’re working with samples. But the great news about this, this is standard. Unless you’re working with medical devices or on a NASA space shuttle, for process improvement projects, the understood standard is 95% confidence. It’s going to be a default in all the tests you use and you can leave it. So trust me on that and leave that and focus on the P-Value.
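One way to see the 5% risk is to simulate it. The sketch below draws two samples from the same made-up population many times, so the Null is true by construction, and counts how often a test at 95% confidence rejects it anyway. Everything here is an assumption for illustration: the population (mean 100, standard deviation 15), the number of trials, the helper names, and the normal approximation used for the P-Value.

```python
import math
import random
import statistics
from statistics import NormalDist


def p_value_for_difference(a, b):
    """Two-sided large-sample p-value for a difference in means."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.fmean(a) - statistics.fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_alarm_rate(trials=1000, n=200, alpha=0.05, seed=1):
    """Share of tests that reject the Null when there is truly no difference."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(trials):
        # Both "locations" come from the same population: no real difference.
        a = [rng.gauss(100, 15) for _ in range(n)]
        b = [rng.gauss(100, 15) for _ in range(n)]
        if p_value_for_difference(a, b) < alpha:
            false_alarms += 1
    return false_alarms / trials


rate = false_alarm_rate()
print(f"share of tests that wrongly rejected the Null: {rate:.3f}")
```

The printed share should land near 0.05, which is the risk that goes with 95% confidence.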
So, the P is low. The Null must go. We can reject the Null, right? It’s less than .05. It’s zero. It didn’t even register. It’s too small to show up in the space allotted here. So that means downtown has statistically significant different sales from the beachfront. And that means we got to now take this back into a practical conclusion.
So what is the practical conclusion? Since we statistically rejected the Null, that means location is a Significant X. We’re trying to understand what’s driving these, all the times that customers don’t get the hot sauce they want and we said, “You know what? Location is the Significant X.” And that X is downtown having a higher rate of sales. And so now, they’re going to have to take that into consideration when they do production and they do stocking. They were ignoring it. They didn’t recognize the difference. And now, they’re going to recognize the difference and basically build their process and forecast accordingly. So that’s our practical conclusion.
Update the Hypothesis Testing Plan
And once you’ve done that then you’d go to your hypothesis testing plan. So you’ve always got a plan. You’re always listing what the possible X is. In this case, it’s location. Your hypothesis is there. Your general hypothesis, the average hot sauce sales is higher at the downtown location than beachfront.
What’s the test we used? Two-sample T. What’s the result? It’s true. They’re different.
So now that we’ve updated that and we always update that then we come back and let’s just think about verifying results in general. So teams often neglect to conduct hypothesis tests which can lead to, as Tracy was describing, wasted time, misplaced resources, people get frustrated. And people fix things where it wasn’t a problem as she gave you an example.
So have you made improvements that didn’t help? Here’s an example of not testing assumptions. So there was a project I was involved with. It was reducing cost at a hotel chain restaurant. Or just in general, there’s a lot of – we’re working with a hotel chain so it was all their on-site restaurants.
And they were basing their staffing model on the occupancy rate so they could see how many people had booked at the hotel. And they had part-time and on-call staff. And they staffed the restaurant based on how many people had registered to stay at the hotel, which made perfect sense.
They assumed there was a correlation, right? As the number of guests increased, the number of meals consumed would go up. But they often found they were over- or understaffed, mainly overstaffed, which was causing the high cost of the restaurant. And when they did the actual correlation, they didn’t find any. There was no relationship between the check-in rate and the number of meals consumed.
But if they took out both dinner and lunch, they did find a correlation. There was a correlation of breakfast meals consumed that was impacted by the occupancy rate of the hotel, which makes sense. You’re at the hotel, you might likely have breakfast there. But then you’re off. You might be at conference. You might be at meetings. You might be off-site. Lots of places to choose from. That really changed their staffing model.
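The hotel story is a correlation check, which can be sketched as follows. The numbers are invented to mimic the pattern described (breakfast tracks occupancy, dinner does not) and are not the hotel chain's data. The helper `pearson_r` is a plain implementation of the Pearson correlation coefficient.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 90 days of data, generated only to illustrate the idea.
rng = random.Random(7)
occupancy = [rng.randint(50, 300) for _ in range(90)]
# Assumed pattern: breakfast counts follow occupancy, plus some noise.
breakfast = [0.6 * guests + rng.gauss(0, 10) for guests in occupancy]
# Assumed pattern: dinner counts do not depend on occupancy at all.
dinner = [rng.gauss(80, 20) for _ in occupancy]

print(f"occupancy vs breakfast: r = {pearson_r(occupancy, breakfast):.2f}")
print(f"occupancy vs dinner:    r = {pearson_r(occupancy, dinner):.2f}")
```

An r close to 1 or -1 points to a strong relationship, and an r near 0 points to none. Splitting the meals apart is what exposes the breakfast relationship that the combined numbers hid.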
Another example. Recently, I was working with a Black Belt who was sure that her improvements had increased revenue. She showed me. She said, “Look, I did the improvements and see, the revenue has gone up.” And I was like, “Well, this is – there are some seasonal cycles to this, right?” And she said, “Oh yeah, absolutely.” I said, “Well, let’s look at last year.” And actually, it looked exactly the same as last year. It really hadn’t made a difference. So she had to go back and implement more improvements. There was no verification there.
And Tracy, you had a great example, I think, from work you were doing. It was financial services.
Tracy: Yeah. And I think the example that I would like to share is kind of the opposite. It’s people actually didn’t think they had made a significant impact when they actually did. So I think a lot of people are very comfortable using percentages, right? Like, “Oh, we have X percent improvement over last year or over this improvement cycle time, whatever it is.” And so, we’re very comfortable understanding percentages.
And in this case, they were working to reduce what we call undeliverable mail. So mail that never gets to the recipients and they cannot find these people, and we’re talking financial business. We’re talking money. We’re talking about trying to find these people. And then if you eventually cannot find these people, all that money gets turned over to the state.
So they were trying to reduce undeliverable mail. And they did the improvement project. And when they measured it initially, there was only a 1% improvement. And the team got completely discouraged. They had done so much work on this project and they thought, “We only have a 1%. That’s not even significant. It can’t be.” And I said, “Well, I think you should run the test anyway just to see.”
And they ran the test for statistical significance which tells us it’s not just due to random chance, that there was a statistically significant change. And it came out as reject the Null, meaning this is a statistically significant difference in what has happened.
And when you think about it, 1% was hundreds of thousands of pieces of mail now being delivered. But when you looked at the percentage, there was only 1% so it didn’t sound like they made a huge impact, but they really did.
So the team went from discouraged to very happy because they really did want to make an improvement. And when you just look at improvement percentages, it wasn’t registering.
Elisabeth: Yeah, that’s a clear example. I love that on the flipside that’s doubting yourself and finding out …
Tracy: And they actually won an award. They won an award from the postal service because the postal service was like, “Thank you!”
Elisabeth: Oh great! That’s great. Thank you, Tracy. That’s an awesome – that’s a great one.
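Tracy's undeliverable mail story can be illustrated with a two-proportion test. The volumes and rates below are invented, since the webinar does not give the real figures, and the "1%" is assumed to mean one percentage point. The function name `two_proportion_test` is made up, and the P-Value uses the standard normal approximation for comparing two proportions.

```python
import math
from statistics import NormalDist


def two_proportion_test(bad_before, n_before, bad_after, n_after):
    """Two-sided z-test for a difference between two proportions."""
    p_before = bad_before / n_before
    p_after = bad_after / n_after
    # Pooled proportion under the Null that both rates are the same.
    pooled = (bad_before + bad_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))
    z = (p_before - p_after) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical high volume: 6% undeliverable before, 5% after.
z_large, p_large = two_proportion_test(1_200_000, 20_000_000, 1_000_000, 20_000_000)
# The same rates on a hypothetical small mailing of 1,000 pieces each.
z_small, p_small = two_proportion_test(60, 1_000, 50, 1_000)

print(f"high volume:  z = {z_large:.1f}, p = {p_large:.4f}")
print(f"small volume: z = {z_small:.2f}, p = {p_small:.4f}")
```

The same one-point change is statistically significant at high volume and not at low volume, which is why the percentage alone was a poor guide for the team.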
Who here has ever:
So let’s do a little check on you, guys. I would like to find out if you have ever and then go ahead and let us know if you ever – if you have personal experience of what Tracy and I are describing; addressed a root cause that was not a critical X, fixed process that saw no positive results, looked on as others based fixes on assumptions, or experienced some combination, or none of that is familiar.
So any of this and any combination of this because those of you who have been in process improvement for a while, you’re probably thinking, “I’ve seen it all.” So we just want to hear what you guys came up with.
Tracy: It looks like we’re getting some good votes there.
Elisabeth: We are. OK. So I’m going to close and launch and let’s see what we got.
Tracy: So 50% say that there’s a combination of some of the above. So let me cover those first. 15% said looked on as others based fixes on assumption, so making assumptions when they shouldn’t have.
Fixed process that saw no results, 10%. I saw that a lot. That was the one that got my vote is seeing improvements implemented that had no effect on the problem. Like OK, that was a waste of time.
And then the last one, 8% addressing a root cause that was not a critical X. So those were the pieces that were a combination of something.
And then 17% said they’ve experienced none of the above. But maybe you just weren’t thinking about an example. But a lot of times when I ask people, “I mean how many of you have seen solutions implemented that didn’t solve the problem?” Like the whole class raises their hand.
Elisabeth: Yeah, yeah, a common experience. Well, thank you guys. Thank you for chiming in on that. That’s good to hear and good to see your experiences. And these are common problems which beg the question was there an opportunity for a hypothesis test? Could that have spared some of the wasted time or effort?
Why Use Tests?
OK. So why use tests? You’ve got root cause. You’re looking at, are defects higher in one place than another? I can investigate.
Or is a higher proportion of late shipments coming from a particular supplier? I can investigate. Or switch suppliers.
Are the portion weights the same across all meals? It could lead to mistake-proofing solutions like standard scoops, things like that.
Verifying standards, we asked this earlier. If somebody was promising you a certain delivery time and advertising it as part of your contract, you can test. Are they really doing it? And if they’re not, you could renegotiate.
Or verifying solutions, do the changes reduce defects, cycle time? Confirm and spread the news. Build trust. It builds confidence and it builds momentum. These are really helpful.
Any Words of Advice?
Some words of advice. Clarify the practical problem you’re addressing before turning it into a formal hypothesis statement.
Use the Decision Tree. Very easy. You can think about what kind of data you have, how many strata you’re looking at, datasets.
Spend time setting up your Null and Alternative. Check it with other people, making sure it’s no difference versus difference. Remember, you’re looking for a difference and/or correlation. Difference between different people or different places, different units, different times, before and after, and things like that. Correlation, as X changes, the Y changes. Either up or down.
P-Values are a guide. If the P is low the Null must go.
And hypothesis tests are helpful when determining a potential root cause. They can also prove your solutions made a difference.
Today We Covered
So, we talked about what are they, we talked about why you need them, when you should use them, how to set up and run one. We just did one. But it’s pretty much a similar process for all of them. And those are – we just gave you a little sort of primer on how to go through this and what works and what to watch out for.
And that brings us to Q&A. We’re going to have a short time for Q&A but we promise you, whatever questions you type in and please start typing in your questions now, we’re going to come to them after we give you some other info of upcoming events, if we don’t answer them today which a lot of them will not be answered, then we’re going to answer them online and then we’re going to post them for you. So you will see an answer to every question you give us. All right?
And the other thing I want to let you know about is you can learn more about hypothesis testing by signing up for Black Belt Training and Certification. You can get lots more training. This webinar is just a start of your journey. You’ve got lots of other topics you could look at. So learn more about Six Sigma tools and concepts with more training.
Tracy, there’s another webinar coming up. Do you want to tell them what that’s about?
Tracy: Absolutely. We’re going to be talking about Introduction to Lean, so very basic introductory, very quick webinar about what Lean is, why people use it, what it’s for. It’s great if – there are lots of people actually still in this world that don’t know what Lean and Six Sigma are and it could be helpful to understand that.
So it’s a basic webinar. But I find that it’s great for – especially even leaders who don’t really understand what it is and are not necessarily behind implementing it at their organization. So it can help if you end up having a leader or even an employee or a co-worker that wants to learn more about Lean. Forward them this and see if they want to attend.
Elisabeth: Yeah. That’s a good primer. Thank you, Tracy. There’s also – we’ve got a podcast. Do you want to tell us a little bit about Richard Baron?
Tracy: Yeah. So Richard Baron actually works for Coconino County in Arizona. He is a government employee. And it was really interesting talking with him about what he does for Coconino County. And if you didn’t know, Coconino County has the Grand Canyon in it so from a land mass perspective, one of the largest counties in the United States.
And the nice thing is, he wrote a book. Richard Baron wrote a book called Streamline: Your Path to Government Efficiency Starts Here. And he talks a little bit about his book. It’s a fable. And really interesting book. And so, we get to hear him talk about what he likes about his book and what he likes about process improvement.
Elisabeth: And this just, for those of you that have never tuned in to the Just-In-Time Café podcast, we also talk about success stories we’ve seen. We also give you the latest apps that help out and latest books out on Lean and Six Sigma. In this case, it’s Richard Baron’s book. And then we do an interview with somebody, a notable in the Lean Six Sigma world.
And that brings us back to the Q&A. So Tracy, we only have a short time available but let’s get a few questions in.
Tracy: OK. Good. I have – there are a few questions. So one of them is, “There are other tools in analyze like the 5 Whys and the Ishikawa diagram. Are these methods also for verifying root cause?”
Elisabeth: That’s a great question. And they are to my mind some of the best tools you could possibly use in a process improvement project. But they are basically structured brainstorming. So you have a root cause analysis is saying, what could be causing the problem? If the problem is that we are basically running out of hot sauce, what could be the problem? And someone may have said, “You know what I think the problem is I think downtown is selling more than we think they are and we are having stock outs because we’re not accommodating that issue.”
Then you’d have to go collect data and test it. The same with the 5 Whys which is a great companion tool for the fishbone where you take something like that, take a symptom that might come up in the brainstorming on the root cause diagram and then ask 5 whys to get down to root cause.
Again, that’s brainstorming. It’s what the group thinks is going on. You still have to go get either get data and test it or go verify the process by looking at it. There’s lots of different ways to do verification. But those are a first step. The testing would be the verification step.
Tracy: Great. Thank you. And so, another question. Average sales comparison seems simple. Setting up a hypothesis test seems an unnecessarily complex approach to arrive at the answer. No?
Elisabeth: It depends. It may be really obvious. It may be that you got something, the sales at one place are 50% higher than the sales at the other. I don’t need a test. I can see. It’s really obvious. These are for when it’s tough to tell and you need to be sure. So we tried to keep it a simple example and in that case, would we have needed it if it was in your process? Maybe not.
We’re trying to keep it simple so that we could give you a good example. But there are other cases where as Tracy described, it’s not immediately obvious, 1% doesn’t sound great. It doesn’t sound big. But it turns out to have been a statistically significant difference.
So it depends. It’s when you need that extra proof, when it would cost you a lot of money in time and resources and momentum if you made the wrong choice.
Tracy: Thank you. This question is, “Is the Two-Sample T used because you have two samples and (different locations)?”
Elisabeth: Yeah, that’s a good point. If you – if we had three different locations, it would go to the next test on the tree and that would have been ANOVA or analysis of variance. We could have two or more strata, in this case as you point out, it’s location. But since we only had two, we could use the Two-Sample T.
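That branch of the decision tree can be sketched as a small helper. It covers only the two cases discussed in this answer, comparing averages across locations. The function name `pick_mean_test` is made up, and the full Decision Tree from the webinar has more branches than are shown here.

```python
def pick_mean_test(number_of_groups):
    """Pick a test for comparing averages, based on how many groups there are."""
    if number_of_groups < 2:
        # Comparing locations needs at least two of them; other cases
        # are outside what this sketch covers.
        raise ValueError("need at least two groups to compare")
    if number_of_groups == 2:
        return "Two-Sample T"
    # Three or more groups (strata) move to the next test on the tree.
    return "ANOVA"


print(pick_mean_test(2))  # downtown vs beachfront
print(pick_mean_test(3))  # a hypothetical third location
```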
Tracy: OK. We do have a few questions about statistical programs that are available. Which ones are better to use? Minitab, QI Macros, SigmaXL, SPC XL. They’re asking for advice I think.
Elisabeth: Yeah. And I appreciate that. There is a lot out there as you just listed some of the top vote-getters. And honestly, I used to use Minitab all the time but that’s when I had a PC. And it’s also when somebody else was paying the bill because Minitab costs a lot more than SigmaXL. I have a Mac now so I have reduced options. I could use QI Macros and I do, SigmaXL. So those are the ones I’m most familiar with and they’re all good. They all do the job.
I think you mentioned another one that I’ve also used, SPC XL. Anyway, I’d never had a bad experience with the stat package. It really comes down to what you get used to. And it’s also a matter of price. So they cost different amounts.
Elisabeth: And that also will drive. And so, we try to give kind of a Cadillac and then one that was more affordable in terms of options for you guys.
Tracy: Wonderful. And the last question is, “I’m new to process improvement but I’m very interested in becoming experienced with this. What is a practical way to get better at this?”
Elisabeth: New to process improvement.
Tracy: Besides of course going for a training.
Elisabeth: Yeah. Well, there’s – actually, I would say go to the training or take – go through one of the webinars that we’ve done that gives you the high level view. You’ve got intro to Lean Six Sigma. You’ve got White Belt. That’s a one hour training. You’ve got Yellow Belt which is 8 hours and free. It’s still free. I think it’s awesome. And that’s all self-paced.
When I say take a look at what – there are also some blogs we’ve done giving you. Go to the website and do a search on Lean Six Sigma or introductions if you switch camp with the officers. So, very many resources. But that’s a really good question. There’s a lot out there.
Tracy: Alright. So I think that’s tying up for us today. As I said, any questions that did not get answered, we will answer those online and post them in the next day or so. And then thank you for joining us, everybody. It’s a pleasure having you. Hope to see you at the next webinar. Bye-bye.
View our upcoming webinars and join live so you can ask questions and let us know what you’d like us to cover next. We’re busy building new webinars all the time. And we’re happy to know you’re busy too – building your problem-solving muscles – keep it up!