Adobe Commerce launch, post launch and ongoing monitoring and troubleshooting

Launch, monitor, and troubleshoot Adobe Commerce Cloud

Last update: Wed May 08 2024 00:00:00 GMT+0000 (Coordinated Universal Time)

CREATED FOR:

Beginner
Intermediate
Developer
Leader

This webinar presents best practices and tips for preparing and launching a website. It emphasizes selecting the appropriate staging environment and providing accurate details in support tickets so that the support team can respond effectively. Using the same monitoring tool as the support team, such as New Relic, is recommended for better collaboration and issue resolution. Thorough testing of peripheral systems, such as payment processors, is highlighted to ensure they function properly during the launch. The webinar also stresses the need to anticipate and plan for potential production issues, such as performance bottlenecks and cache warming. Understanding the shared responsibility model is another key aspect: certain tasks, such as database restoration and application code security, are the responsibility of the website owner. Following these practices streamlines the launch and lets potential issues be addressed proactively.

Audience

Development teams, managers, lead developers, technical architects
Teams implementing Adobe Commerce as an upgrade, migration or new commerce offering

Video content

Choosing the appropriate staging environment and providing relevant details in the ticket.
Using the same monitoring tool (New Relic) as the support team for better communication and issue resolution.
Providing screenshots or New Relic links when describing issues to help support understand and troubleshoot more effectively.
Understanding the shared responsibility model, particularly in terms of database restoration and security.
Testing peripheral systems, such as payment processors, and ensuring proper handling of order IDs during cutover.
Predicting and planning for potential issues during production, such as performance bottlenecks and the need for cache warming.

Transcript

Hello and thank you. Welcome to Part 3 of our four-part series, which we kicked off several weeks ago to take you through a series of events highlighting the planning, building, and maintaining of a modern commerce implementation. You may have heard a lot about value realization over the last year or two, and that's our effort to ensure that both Adobe and you, our partners, are creating exceptional experiences for our customers. When we think about value realization for commerce and look at the topic for today: once we've launched a commerce implementation, what are the best practices that we can review to ensure that partners do realize that value with the commerce implementation that you may have led? So if this is your first event in the four-part series, we have a slightly different format than what you might be used to with our partner webinars. I'm pleased to be joined by Russell Albin on our technical architect team, who will be moderating today's discussion. We have a host of Adobe Commerce experts that Russell will guide through an interactive dialogue. We'll have a number of poll questions soliciting feedback from you on your commerce implementations and the best practices you've seen, and we look forward to the interactive dialogue today. With that, I'll quickly turn over the floor to Russell to tee things off. Russell.
All right, thank you. So yeah, don't forget, if something we talk about is interesting or you want us to expand on it, use the Q&A. That helps us get an idea of where you're at, some of the issues you might be having, or any topics that you want us to dive a little deeper into. We're going to go over some broad topics and we'll go a little bit into each one.
We don't have enough time to get into each one too terribly deep, but when we're done, anything that we reference, especially the slides that Lexi has and the links to DevDocs and Experience League, we'll have all of that available. I'm also going to have an Excel spreadsheet that I'll be showing you here in a few minutes. So all that will be available. But first off, we're going to start this conversation, and I forgot to mention the topics. We're going to be covering the prep and launch best practices, and then, after you've launched, the monitoring, troubleshooting, and things of that nature. So this is that last mile that we're talking about as we're getting ready for a launch. So, Lexi, you and I have probably launched a thousand sites, right? Kidding, but at least a couple hundred between the two of us. One of the things that I learned many, many years ago was that I just had a mental checklist of things that I went through. This is back in the M1 days, and it was simple things like: who's got access to the DNS records to make that final switch? Did all the QA pass? I always had a little checkbox: did the customer completely sign off on all of the features? But I realized that it wasn't good enough to have that in my head. I actually needed something more verbose and written. So what have you done with ACS, and personally, to prepare yourself for a go-live event? Do you still use a checklist? Is it just a Word doc? Can you walk us through that a little bit?
Yeah, sure. As a technical architect from the professional services commerce department, I have this checklist, and actually we work on this checklist together with our architects and we try to expand it. There are some things we can use in general for each and every project, but some things are project specific.
Usually when you are launching a project, as you mentioned, you need to mentally go through all the things you have on the project. It consists of standard things, and especially if you're using Adobe cloud, lots of things are already described. For example, there are documents such as the cloud runbook: when you work with the cloud teams, they have this runbook, you have a technical account manager who guides you, and you work with them through this document. You fill in all the information and try to predict how you will handle different situations. Basically you're trying to do everything that allows you to sleep well if something goes wrong. In addition to that, we have our own checklist where we cover communications and where we describe infrastructure. If it's cloud, it's easier, but if it's not cloud then it's harder: you need to have a schema of your infrastructure so you understand how all these things fit together. But before that, as you mentioned, there are some phases. Usually I see such phases as, first, what we do before we try to practice cutover. Oh sure, yeah. That is when we're really ready. As you mentioned, you need to be sure that all your ideas finish up and everything is set up correctly. After that, we have the practice phase, where we do a few rounds of practice and try to do everything the way we will do it during the actual cutover. When we're done with that, we basically go live, but before we go live we also need to predict and practice our rollback process: if something goes wrong, how we can go back and what we do in that case. We also need to predict a little what we will do after we launch the project, because you cannot just launch and say, okay, we launched.
Yeah. Even during preparation we need to do a set of tests and checks, but we also need to understand how we monitor things, how we detect anomalies if we have them, how we tune our project when it's live, and so on. So we also have a checklist of things which we should do after the launch. Right. And last but not least, I'm trying to build, and to help build, a set of concrete tests, which includes a load test and maybe some automated tests. Plus, if you have a big project, you try to split the manual tests among the team and among the area experts, if the customer has a set of area experts, because some people know a lot about content and some people know how to do other things. On the customer side, that means that we place an order and we have, for example, customer support, so we need to practice this whole flow. Oh wow, okay. And integrations, of course. We are not launching projects in a vacuum; we launch projects which usually have lots of integrations, sometimes dozens, so we need to predict how all these things work together too. We need to include those teams as well, and we can probably have a small checklist for each team which covers basically the same things: what we need to switch, what we need to do during rollback, and what we have after we launch.
Yeah, that's a very deep answer. All right, I'm going to share my screen real quick.
This is an example of a go-live checklist that I started probably eight years ago, and just so everybody knows, this is as verbose as I've ever gotten. I got most of this idea when I was working at Blue Acorn prior to Adobe. We actually started way out, 30 days prior to launch, and then as we marched forward, like Alexi was talking about, we have different responsibilities. At 30 days pre-launch, it's all the integrations, testing, and database management. As we get closer, it's more like checking the payment processing and taxes to make sure they're right. At seven days pre-launch, we're doing final UAT and making sure small things like the favicon are set and that your caching is enabled. This is where, like you were talking about, you're doing your testing: are the pages still loading as fast as we expected them to? So I'll stop sharing, but this is just an example of how we launch. On launch day, we actually break it down into almost a second round of validation. We make sure that smoke tests are happening, the first couple of orders are being placed, and all those nuances with your integrations are working. Those aren't developers anymore; this is when the BA and the QA team are doing some of these. So like you were saying, Alexi, it's interesting that everyone's involved once you get to the launch phase and especially post-launch. And once again, with that unfortunate realization that rollbacks sometimes have to happen, knowing what to do is pretty cool. All right, let's move on to our next topic, which is similar, but it's the management of the different areas like DNS. For ACS, our rule of thumb is that we don't actually go into the customer's account and manage their DNS for them.

Do you often find that your customers are questioning what to do, or do you often see any issues with that type of configuration?
Oh, from my experience, not that much. It of course depends. You also have some settings which allow you to quickly switch DNS during the actual launch day, so it will not take that much time. Basically you need to prepare all the things you need and tell them clearly. And the trick here, which I used just recently, is that we can even emulate this DNS switch. In the simplest way: we have a load balancer for the cloud, but we usually also have specific nodes. We have Geeta here, and I worked with her to understand how it works. We can create a fake domain locally and point it to a specific node. But what is more interesting, if we work with the customer's IT, we can point traffic for the whole organization, when someone types the address in the browser, for example, to our cloud environment without having a real DNS entry. So we can have a local, inside-the-network DNS which allows us to emulate everything as it is. You can also use some temporary domains, but for me that's the old way; the new way is when you emulate everything as it is.
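A note on the technique described above: pointing a production hostname at one specific node without changing public DNS can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not something shown in the webinar. The hostname `shop.example.com`, the address `203.0.113.10`, and the helper names `hosts_entry` and `fetch_status_via_node` are all hypothetical, and the request helper speaks plain HTTP only; HTTPS would additionally need the hostname supplied for SNI and certificate checks.

```python
import http.client


def hosts_entry(ip: str, *hostnames: str) -> str:
    """Build one hosts-file line mapping the given hostnames to ip."""
    if not hostnames:
        raise ValueError("at least one hostname is required")
    return f"{ip}\t{' '.join(hostnames)}"


def fetch_status_via_node(ip: str, hostname: str, path: str = "/",
                          timeout: float = 10.0) -> int:
    """Send a plain-HTTP GET straight to ip while presenting hostname
    in the Host header, and return the response status code."""
    conn = http.client.HTTPConnection(ip, 80, timeout=timeout)
    try:
        # Supplying Host ourselves stops http.client from deriving it from ip.
        conn.request("GET", path, headers={"Host": hostname})
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder values: a documentation-range IP and an example domain.
    print(hosts_entry("203.0.113.10", "shop.example.com"))
```

The hosts-file line is the "fake domain locally" variant; the organization-wide variant mentioned above would put the same mapping into an internal DNS server instead of each workstation's hosts file.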
No, I actually love that. And I saw that you popped on, because I was going to lean on you next. One thing that I learned, and it was by accident: we tried to cut over, I asked the guy to change the DNS, and he did, and about eight, ten, fifteen minutes later nothing had happened. I said, can you do me a favor, can you tell me what the TTLs were before you made the change? He said, I don't know, something like 60,000. I'm like, oh god, no, are you kidding me? So, Ryan, have you ever forgotten to make the TTLs something like five minutes beforehand? Because I did, and man, I was embarrassed. We actually did launch, but it took four hours instead of one. Do you have any experiences like that?
I did, to be honest with you, and thank you for bringing up this topic. In the go-live cutover plan, like the one you shared, my experience was that we essentially learned that from mistakes. So there's an item that we added at the very top: the TTL needs to be set.
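The TTL lesson above comes down to simple arithmetic: resolvers may keep serving a cached record until its TTL expires, so a lower TTL has to be published at least one full old-TTL period before the cutover. A minimal Python sketch, assuming the "something like 60,000" quoted above is a TTL in seconds; the function names and the example cutover date are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_propagation(old_ttl_seconds: int) -> timedelta:
    """Longest time a resolver may keep serving the old record."""
    return timedelta(seconds=old_ttl_seconds)


def latest_time_to_lower_ttl(cutover: datetime, old_ttl_seconds: int) -> datetime:
    """Latest moment to publish the reduced TTL so that every cache
    has expired the long-TTL record before the cutover starts."""
    return cutover - worst_case_propagation(old_ttl_seconds)


if __name__ == "__main__":
    old_ttl = 60_000  # seconds, assumed unit
    cutover = datetime(2024, 5, 8, 9, 0)  # hypothetical cutover time
    print(worst_case_propagation(old_ttl))             # 16:40:00
    print(latest_time_to_lower_ttl(cutover, old_ttl))  # 2024-05-07 16:20:00
```

Under that assumption, a 60,000-second TTL means caches can lag by up to 16 hours 40 minutes, which is why the checklist item belongs days before the go/no-go decision rather than on launch day.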
Day one, exactly. So at the seven-day, the five-day, or the three-day mark, as you get closer to the go/no-go day, you essentially make sure that the TTL gets set. On one occasion it happened to us as well, outside of our Adobe setting and the partner setting, that the TTL wasn't set prior to that, so that was a little bit of stress.
A little bit of stress. All right, this is the last one, Alexia, and then I'll get you off the hot seat. As you prepare for launch, you are calling them practice runs or dry runs. I know there's not a fixed number, but at what point do you trust the practice runs enough that you can document the expected duration for turning off the old site, allowing the deltas to transfer over, and then being ready to go live? Do you do this three or four times, or are you talking a dozen times?
Actually, I have a plan, and I'm sharing my screen now. As I mentioned, we usually start from communication and infrastructure. I do not want to concentrate on these, but then we have the preparation activities. Before we practice any cutovers, we need to be sure that we already have everything ready. We need to be sure that our production is actually in production mode, because during development you can even use the production instance for development. You also have domains and Fastly. You mentioned having the proper TTL, but you also need to be sure that Fastly is enabled in the correct mode and everything works well. You know that you practiced your data migration, you have New Relic configuration in place, and you have already prepared and executed a load test. So you have lots of these things. Then you basically freeze your code and you're starting to practice
real cutover. You do all the validations and all the imports you see on my screen. You need to generate all the fields, check search engine optimization, set up crons, and clean the data, because during load testing, for example, you can have some fake data, so you need to switch to a clean database which will be ready for the fresh data migration. You actually covered what you mentioned: basically you need to complete UAT and sign off UAT. It's good practice to have this on both the UAT environment and the prod environment. Right. So you not only have things ready on your production server, but you have the whole chain, so you know that if something goes wrong, you can do a quick fix, validate it on the QA environment, move it to UAT, prepare a release, maybe with something else, and move everything there. So you need to be sure that you have everything ready in terms of your application, your hardware configuration, and the flow for the team. At this point I think you're ready to practice cutover. I see that you shared lots of points, I shared lots of points, and honestly this topic is not that fun. It's really a boring topic, and practice is the most boring thing, especially if you have lots of actors. Some things look obvious, but you need to practice a lot. I think you are ready whenever you see that everyone is so bored that they say, okay, let's just do it. Yeah. Everything is just working and they know all the actions. You practiced, you know what goes after what, you know who is doing things after whom. This is the point when you're really ready and you
can start the cutover part. Yeah, and then the actual cutover.
Yeah. So the one thing that I stress, especially when I was leading projects: personally, I always had my architects write a rollback plan. We all know that you don't always need them; in fact, you probably never need them. But if you go through that mental process and the documentation process, then if something does happen, at least you're prepared, right? And I'm glad that you actually have a slide dedicated to this, because I think that's something a lot of people will just skip over because they're confident. We're all very confident in what we're doing, but every once in a while something happens. I always require documentation, but if you don't take the time to at least think about it and practice it, then when you get stuck you're going to make mistakes. So is there anything about the rollback plan that you find very interesting or that you at least want to call out?
Yeah, maybe we can also share our experience together here. From my point of view, in contrast to the practice and preparation, I like to have rollback as simple as possible. That doesn't mean that we should not predict things, but we need to minimize the number of actions we need to do. Right. Basically the best plan would be if you can just switch DNS back and everything starts working. In reality it's not that simple. You need to be sure about all the integrations, and you need to be sure, for example, that you cannot create an order which changes the order of your IDs and leaves the old systems with duplications. There were some questions about this, so you definitely need to work through all of it. During preparation you have all the cycles, and from my experience it's good to have at least a few of them. You have,
for example, a month before the launch, a first cutover practice, and then we can have them more frequently. Around five would be nice for a medium-sized project. Maybe that sounds like a lot to someone, but it's actually good to have this number, and you can practice rollback in each of them. In the ideal situation, you just switch DNS back and do a set of checks to validate that everything went well, and you have plans for all the things which were created in the new system. If we were already able to place some orders, how can we finish these orders? How can we communicate with customers? If some customers were created, how do we sync them? And what will happen if we failed our launch but we have already done something in the new system, and we want to switch back for a few days? Maybe it failed because something was not working perfectly but it was almost ready, so the team will be in a rush, finish something quickly, and in a few days we need to relaunch. We need to understand how all the data moves between the systems. Oh sure, sure. Ideally, what would be good to have, and we did this on our last project, is a plan and the ability to communicate with all the services for both the new and old systems. We had middleware in this case, it was Dell Boomi, and this middleware was a smart one, which means that it allowed us to have two systems and communicate with both. For example, if you have an order management system, it can work with the old system and the new system, and if we need to do some minimal changes to the data format, it allows us to do this. So it supported data transformation. And because of this, we had not only an
order management system; we had 20 integrations in total, and for each of these integrations we created a plan for how we work with all the data. So basically we were able to seamlessly switch from one system to another. I think it's good practice to have something like this; in that case it will be the easiest part, but you need to build a plan for all these things. Oh, right. At least you need to keep in mind that you have them. Okay. It's your work as an architect.
Yeah, our job sounds easy, but it's actually very stressful. All right, so let's move on to the next phase. We've gotten through the launch, we decided not to roll back, and now that we're live we need to figure out how the site's performing. One thing: for Adobe Commerce Cloud, the support team created a custom dashboard, and let me make sure I get the name right, it's called Full Stack Observability. What's interesting is that I used to make my own dashboards, and I was emulating what they were doing, and finally somebody said, dude, you know we have all this done already, you can just use the dashboard. I'm like, really? So I'm going to quickly share my screen and give you a quick synopsis of what it looks like, and then I can lean over to Estrar, because he's one of my SMEs for this particular topic. Unfortunately this is going to my dev environment, so I don't have a lot of transactions, because I'm not buying a lot of stuff on my dev environment. But you can see all of the different tools that are built in, and all these different tabs. Infrastructure is probably the one that I go to the most, where you can check the CPU usage and the memory usage. You can also do your log aggregation, synthetic monitoring, and mobile. Actually, we should
probably kick off one of our polls to give people some time to answer. But yeah, if you haven't seen this dashboard, I would definitely recommend that people start looking for it. This is actually a good time to hand it over to Estrar. Can you talk to us a little bit about monitoring versus observability? Those can be easily confused, and I know that this is a passion for you, so go ahead and let us know what your opinion is.
Sure, absolutely, thank you. With that, I'm going to share my deck, just one second. A quick bit of technical housekeeping: can you see my deck okay? Perfect. So, monitoring and observability. We tend to use these two terms interchangeably on a day-to-day basis; however, there are significant differences between them. I'll start with monitoring. As you rightly showed, Russ, the dashboard you shared essentially gives full stack observability, and a subset of that is monitoring. Monitoring is essentially based on metrics, on data at a point in time, so you cannot actually observe what came before. Whatever is happening at a point in time, you can see the metrics for different services or the telemetry data that's been captured. Some of the monitors, which in New Relic terminology we call widgets, are based on prior experience, like the CPU load you rightly shared. These are absolutely good for being more reactive, and they are also more speculative: if you have a disparate set of monitors in different dashboards you can connect the dots, but it's going to be much more time consuming. Some of the assumptions that individual monitors or widgets
take are that the application services and infrastructure are more monolithic, not microservices; that there are single or stateful data stores, like the ones we currently have, though SaaS services are different; and that there is a predefined number of nodes, servers, RAM, and containers. These are the assumptions that we make when we talk about monitoring: a specific dashboard focusing on a specific area of the application or system that you're viewing. With that, let's move to the next slide: what we can monitor in Commerce out of the box using New Relic. As you can see in my deck, we can cover MariaDB, which is our database, a MySQL fork, the Redis service, RabbitMQ, OpenSearch, and what have you, through to the disk I/O of GlusterFS and log management. Log management covers the logs of the Commerce application and also the Fastly logs: anything that comes through the Fastly layer gets ingested into the New Relic interface for the specific Commerce Cloud project. The point I'm bringing to the table is that sometimes we launch the site, and even in the pre-go-live phase, we understand that we have New Relic but we might not be utilizing it. However, if you spend some time prior to go-live, a week or so, with the technical architect taking the lead and bringing in your BA and a senior developer who understands or wants to explore that area, then you can create a nice dashboard to better observe what is happening during the load test phases, so you can connect those dots together. The other aspect of monitoring is essentially the application experience. With New Relic we can monitor or capture the telemetry data of transactions for REST, GraphQL, or any endpoints coming into the Commerce application. That comes from
the backend services, and it also comes from the front-end experience. If you configure your browser agent in New Relic, you should be able to capture cumulative layout shift and all the Core Web Vitals data in the New Relic dashboard. You can also see database query transaction monitoring, as in this small screenshot I'm sharing, which I shamelessly captured from another widget. You can also have the application monitoring view, so it's service monitoring and how the services fit together, which I'll be sharing in a couple more minutes. You can also see the Apdex score. Apdex is a score, kind of like a cheat code in New Relic, between zero and one, with zero being the worst and one being the best performance that your application service is having. That is one focused lens that you can use to understand what is happening on your system. We also have out-of-the-box managed alerts for Adobe Commerce, which we give to you when we provision your Commerce Cloud project. There are certain alerts created for you automatically, with certain thresholds. I'll be sharing that screen, Russ, as we continue with this conversational approach. We can also track deployments in New Relic; you can configure that. And last but not least, you can take advantage of the New Relic custom events API, which can ingest data into New Relic as you want it to be. That can come from anywhere: from your logs or your custom code, you can ingest data from a specific functional area of your application. What do you think?
That's great. Let's talk a little bit, and Geeta, if you don't mind coming off of mute. From what you know about the pre-built alerts, and Estrar, you can
chime in too, there are often times when you'll know other metrics that you want to track and alert on. So Estrar, think of a custom alert that you've done. We'll start with Geeta: can you cover the managed alerts that we build for them already and talk about them at a high level? Then we'll have Estrar talk about something he's done custom.
Sure. Managed alerts are the basic templates that we give to customers when we provision the Commerce projects. Because of the way the Adobe Commerce project is built, we want to provide you with templates where you can see all the data that you want to monitor: for example CPU, memory, disk space, anything that is particular to the overall infrastructure, as well as every service like OpenSearch and MySQL, all those intersection points which are critical for your website to be running. This template is just to give you an idea of the alerts that are visible to you, and based on your needs, how your project is set up, and how your website is working, you can tweak them. These are not necessarily going to alert you only when there is a critical problem. This is going to be noise in the initial stages, and we want that, because you have to learn. When you launch, it has to be noise first, and then you slowly filter based on your tuning of the application. Then there is a point where you know: this is my breaking point, and if this happens there is definitely going to be a problem in my application. It's going to differ depending on the way each project is set up, how the code is written, or the integration with any third party. So these are very broad thresholds for now. You'll see there is 90, and there is 70 as a warning, but at 70 probably nothing would happen. If the CPU is at 70,
probably we are good, right? The processes might be running just fine. So these are not actually related to a website being down at this point in time. It is up to you: once you keep tuning, you will end up with one alert where you say, this is my primary alert, and that is what has to go to your PagerDuty to tell you whether the site is up and running. No sacrificing sleep; nobody needs to keep eyes on the monitor, and nobody wants to do that. The tweaking is what is going to make your life easier, and we have provided the templates already to make sure that you are capturing from here.
Sure. So Estrar, when you do custom alerts, you use the New Relic Query Language, right? That's one of the methods to build your own. Do you want to talk a little bit about what a custom alert might look like, and maybe demo or just show off what the New Relic Query Language looks like?
Sure, absolutely. As you can see on my screen, I'm sharing the observability dashboard for this account. Again, this account doesn't take much traffic, so apologies if the data is not populating properly. That's a nice disclaimer, by the way. So from this dashboard, there are three little dots, and I can tap on them and then I can see View query, and from the query I can just copy it. It's kind of copy-paste work. I already created my dashboard for this demo, but you can create a new dashboard from that interface. Then you tap on this plus icon, tap on Edit chart, and literally paste the query which I copied before. Here you go: you just run that and it's going to give you that chart.
You're cheating, dude, you're cheating. No, that's fantastic. I guess
I've messed with this, but it's been a couple of years. I didn't realize how easy they made it. That is fantastic, thank you.

No worries. There is one thing I would like to highlight, Russ, that might be useful for the broader audience. I'm going to bring up the CPU metric, just one second.

Ah, that's a good one.

I'm going to copy that query across, and in the same interface I used before to add the widget, I'm going to paste it and run the query as normal. That gives me the CPU usage across all the nodes for this account. Then, as Geetha mentioned, 70 is not high, right? So we can do some nifty stuff here: we can go to Thresholds, and you can say, if my value is more than a given level, flag it. I could go from 0.20 and from 0.70, but actually, let me do something different: let me do 0.85, and 0.95 will be my critical value.

Yep.

So there you go. You can set this up, and then the data will be reflected on this widget, just the way it's showing here. Let me show you what I mean: it'll come up red or green.

Oh, that's a really quick way to visually call attention to something that's not happy right now.

Exactly.

That's great. That actually leads nicely to the next topic. So you're looking in New Relic because it's three o'clock in the morning and you can't sleep, because you're not listening to Geetha's advice, and you're staring at this dashboard, and all of a sudden something pops up. When I was outside of Adobe and working for an SI, the advice from management at that time
was: as soon as you see a problem, create a support ticket. I fought that argument. The reason I don't agree with that approach is that if you start a support issue without giving the support team enough information, one, you're going to have a lot of investigation going on on both sides, so you're duplicating effort, and two, you're delaying the actual intervention that solves the problem. So, Geetha, what I got Blue Acorn to do was have our dev team do an initial triage and give at least three points of data. Number one, make sure that you did your data consent. Number two, make sure that you're talking about the right project, just double-checking, because I've done that, where you're in New Relic but you're in the wrong project. And three, give them enough information that whoever picks up the ticket doesn't have to ask you a single question: they know that your CPU is high, and that you looked and couldn't figure out why. That was my advice, and I think it's starting to spread, because Blue Acorn, at least at one time, had a really good rapport with support. When a support ticket was created, the L1 on the support team could literally pick it up and either escalate it right away, because they knew that first level of support was already done, or at least validate what was happening and then either proceed or give us a link or whatever. Can you walk us through tickets and doing that first round of triage? How important is it from your side of the fence?

Sure. Not an interesting topic. I wouldn't want any of you to raise a ticket; this should run with no problems at all. That's my expectation: nobody should raise a ticket.

That would be great.

Yeah, that's what my dream world would be. But yes, the basic things that I would expect to see on a ticket, so
me sitting as a support engineer, when I get a ticket, what would I want to see? First thing: since most SI partners work with multiple customers, I would suggest that you choose the correct org ID. That's very, very important. Let me pick Blue Acorn as an example: Blue Acorn is working with five customers, you have access to all five, and if you are using your Blue Acorn email ID, not the customer's, you're going to get all five org IDs that you're working with. If you choose the wrong one, you'll probably end up confusing support; we may not see the issue where it was supposed to be. So choosing the org ID as an SI partner is very, very important, and you have to be very careful, with the security issues and all those things. Whether you use your Blue Acorn ID or stick with the customer's org ID is a decision that you need to make. I would suggest that you have a different ID for your organization so that you don't confuse things.

The second thing is the time zone. We work across continents. For a US customer, for example, the site is in the US, somebody is monitoring from another continent, and you raise a ticket and tell me that at 8:30 PM you saw this alert. What is 8:30 PM? Is it CST, IST, BST? We don't know. That causes a lot of back and forth: we'll be asking, what is the time zone? So it is always better to use UTC, because even if you are in daylight saving time, it doesn't matter.

Yes, it's very confusing.

It doesn't matter; use UTC. Giving the time in UTC in the ticket description is very helpful and makes the ticket move faster: you've got the time, you know what the issue is, you go into your monitoring and you see what the problem is. So those are the first easy things to do, like
org ID and UTC. The third important thing I want to talk about is the environment. For a project you have production, staging one, staging two, staging three, staging four, whatever it is, and you say that your staging is broken. The support person goes and sees four staging environments, so which one do they check? There is a drop-down that we provide while you create the ticket: you need to choose whether it is staging two or staging three. It's painful to do, but it is going to save you a lot of time in the end. While filling in these details you might think, oh my god, I have to fill in all these things when things are on fire. Totally agree, but these are the important things that let you and the support people talk the same language, and when we talk the same language, things move faster. So org ID, time zone, and your environment: that's what I would need when you fill in the ticket.

Once you've chosen all of these, you describe the issue: my site is down, my disk space is full, my CPU is maxed out, and so on. You have to be very, very crisp. I'm not going to ask you to fill in a story; we don't have time for that. That is where New Relic is very helpful, because it's the same tool our support is using. You are no different: you have the same monitoring tool that the support team uses. That's the benefit of Adobe Commerce Cloud. If people are using two different monitoring tools, and you say you're seeing one thing and we have something else, it doesn't align and it just wastes time unnecessarily. But Adobe Commerce support uses the same tool that you are using, so a link or a screenshot makes a whole lot of difference. Hey, you know what, this is the
New Relic link, I'm seeing this error. Because I can go to that link, I can see what the query is, what your alert threshold is, when you got alerted, and so on. That is very helpful. So when you're describing an issue, wherever possible (I'm not saying you need to do this every time), if your initial triage captured something that you want to talk to us about, explain it, and a New Relic link or screenshot will help things move faster, because we're going to see the same thing in New Relic.

Speaking of that, if you use New Relic, there's a link right on the report, and that will also take them to the right point in time, right?

Right, that's called the permanent link. You can go to any of your monitoring or any of your alerts; you don't even need to create an alert. When you go to the APM service and you see a spike, just copy the link and paste it. That makes a whole lot of difference, because support will understand: this is what they're seeing, okay, I'm seeing it too. And even if you mess up the environment or the other details, this will help: we'll notice that the link is about one thing and the ticket was created for something else, that there is a difference, and we will come back to you. So it makes a whole lot of difference if you are able to give us a screenshot or a New Relic link based on your initial triage.

Or, worst case, the New Relic Query Language query.

Yes, the query or anything; you can copy that. It's not a whole lot of work, and it makes everybody's life simpler.

That makes sense. We've got about 10 minutes. We should probably kick off the second poll. Did we even do the
first one? We probably didn't even do the first one. That's fine. So, Mando, there you go. We'll let everybody answer that and keep moving on.

The next topic is definitely still in Geetha's wheelhouse. I just wanted to bring it up, and we'll only talk about it for three or four minutes: the idea of shared responsibility. I know most of our partners understand this, but it is important to bring it up every once in a while, because we might have a new SI that has joined. Can you give us the 10,000-foot view of what that actually means? I will have links to Experience League and so on where we break it down, but if you don't mind, cover it real quick, two or three minutes.

The shared responsibility model is the toughest job ever in my life. The main confusion, where the gray line between responsibilities disappears, comes down to two things. One is database restoration. We need to understand the difference between what we can do and what you can do. We can only give you a snapshot for you to restore; we cannot touch your data to restore it to your database, because we don't know what your customer data is, what has to be removed, what needs to be sanitized, and so on. At no point in time will Adobe customer support touch a customer's database. We will give you the snapshots, we will give you the backups, and we can assist with all of those things, but we will not touch your data. So restoring a database dump from the backup is your responsibility.

The second gray area is security: you are hosting the application, you are a PaaS service, so what do you do about security? Yes, Adobe Commerce does provide security using the Fastly WAF, and we also have the infrastructure WAF, and we have other security measures
in place. But you need to differentiate between a security issue based on your application code and a security issue based on the overall infrastructure and the core application. Adobe Commerce releases patches and security bulletins whenever a severe vulnerability is found in our core code. For example, if our application code had a bug that resulted in a CVE, Adobe Commerce owns that: we will publish it in the security bulletin, we will give you the patches, and that's on us. But if there is code written by your team that is causing a vulnerability, like exploitation through SQL or anything like that, that is on you. So there is a gray area there that we need to keep sorting out. Those are the two areas where you will mostly get confused with the shared responsibility model. Otherwise, Russell should be able to provide you the docs, which are straightforward.

Yeah, it's fairly clear, but unfortunately you still have to spell it out. Okay, last topic, for the last five-ish minutes, and this is for the whole group. We're also trying to offer best practices, so when you're preparing and thinking about the overall project, are there any best practices you want to pass on, or maybe just some pro tips for getting ready to launch a site? I'll start with my own: make sure that you do a fair amount of testing of all of your peripheral systems, for example your payment processor. On one project, we were doing our dry run and our cutover, and when we did the practice run of the payment processor, four or five orders processed successfully, but then every single order after that failed. It was really weird, because it worked and then all of a sudden it completely failed, and so
we did five or six more in a row, and it was fail, fail, fail, with a whole bunch of errors that said duplicate ID. I didn't understand, and then I got to thinking: oh, we never reset the order IDs. When the orders went to PayPal, the first couple worked, because the cloud increments the counter by three, but somehow we got back in sync and caught up to IDs from a previous test. That was a huge lesson learned: your dry runs have to account for changing your order IDs, so that an external service doesn't at some point throw a weird error that you weren't prepared for. So, Asrar, can you give us a pro tip or something you've learned that you'd like to pass on to our partners?

I think the one you brought up is quite a common case; it happened to us on Stripe. What we generally do is have an item in the go-live cutover plan to reset that ID. The other thing you have to make sure of is that you connect the right environment to the payment processor. There are different sandboxes, and the API key changes, so you have to make sure that each environment connects to the corresponding environment on the payment processor side. This is the stuff you learn from experience, and as you go through it, you capture it in your go-live plan. I think that's the only thing I can add at this point.

All right. Alexi, is there anything that you've learned, or any other approach? It doesn't have to be about payment processors; anything else you'd like to hand off.

Yeah, sure. You can go two ways here. You can predict how your system will work with new IDs and how you switch back, because we need to predict
rollback as well. You could already have created some IDs in the future by the time you switch back to the old system, and those are no longer in Magento but in an external system, like your payment processor. So what I usually do is try to predict this switch. Sometimes we see whether we can stay on the same environment with the payment processor, or have a new environment on the payment processor side; it can be as simple as that. Or, as I did on a previous project with some integrations where that was not possible, I built a buffer of IDs, for example a two-year buffer: the new system writes IDs in the future, like two years ahead, and the current system writes in the regular way, so we have this two-year window available to us and it's possible to tell the two apart. That's one thing: you need to see how you work with the system.

Another good practice for me is to build something like a circuit breaker pattern, where you predict what happens if something goes wrong. If your payment processor doesn't work at all, do we have alternatives? For example, if your tax system is not working, you have the out-of-the-box, standard Magento taxation system.

It's like a fallback mechanism.

Yeah, a fallback, and the same goes for other external source systems and so on. It needs to be built at the architecture stage, of course, but it's really important to have this in the architecture and also to practice it during cutover, so if something goes wrong, or something is overloaded, you know what to do.

That's a really good tip, I love that. I guess I forgot about having a backup: if your API is down for whatever reason, you can flag it, stop using it, kick
off an alert, and then use your backup mechanism until you can triage it. I've done that in the past, but I totally forgot. That's a brilliant pro tip, I love that one.

Geetha, I'm going to put you on the spot. I know you're not a developer and you don't lead a lot of projects, but is there anything you've seen, recently or in the past, that isn't necessarily common but that you've seen enough to give an SI a pro tip on what to avoid for a better launch or post-launch?

Yeah. I'm not going to talk architect language; I'm going to talk operations language.

Perfect.

The major bottleneck you would see: you will have run all the load tests and performance tests on your lower environment, and then you bring it to production and the performance is not so good, or something breaks. So what happened between that performance test and this? You have to leave room for error on production, because in a load test you don't have the back office and you don't have the import jobs. You just do straightforward testing: a script comes to your home page, selects something, and places an order. That's not how things work in the real world of production, where there are going to be a lot of back-end jobs. And when you launch the production site, it is still doing cache warming; it's not like staging, where you have done multiple tests and the cache is already in place. So you have to leave some room. For example, if you are able to hit 20k orders in 15 minutes on staging without any blip, I would say plan for 15k or 12k on production, because other operations on production are going to take their share. You have to be very careful when you plan that. If your site needs 20k per 15 minutes, you have to test it
for 30k in staging. That's the operational bottleneck you will see when you move from staging to production. People will complain: hey, what happened? I'm not seeing that in staging; the staging load test was all good; you wanted us to do the test, we did it, it was all good, and production is breaking; you need to do something. That is where you try to explain that there are multiple operations in production that are different from staging, so you have to leave room of something like 15k or 5k orders, per minute or per hour, in your load test.

Yeah, that makes sense, because production and staging are very similar, if not identical, but those integrations, the crons, the back office, like you said, are just unaccounted for because they're hidden behind the scenes. Totally agree, that's a really great one. I think we're good; I don't want to open up any more topics because we're at time. So, Joe, did you want to come off mute and give us a closing?

Yeah, thanks, Russell and team. That was an amazing dialogue today. I really enjoyed the best practices you all shared from your expertise, and I know many of our partners have probably encountered a lot of these scenarios and situations. For you, our partners, hopefully this was another informative session. Our team has posted a survey in the chat, and I would encourage you to take a few moments to give us feedback. As we said when we opened the session, this is a newer format. Thank you to those who've already given us feedback on sessions one and two; we're always looking to hear from you, our partners, on how we can improve these virtual events. In addition, remember this is part three of four. We have the fourth and final event next week, where we have another partner to discuss best practices and some of their experience with Edge Delivery Services when it comes to commerce. It's some of the newest Edge
Delivery Services capabilities when you think about the commerce storefront. So we're really excited to close out the four-part partner implementation series. But again, thank you to Russell and all of our guests today, and most importantly, thank you to all our partners for spending your time with us today. Have a great rest of the day.
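
The ticket details recommended in the support discussion (the correct org ID, a UTC timestamp, the exact environment, and a New Relic link or screenshot) can be gathered into a short description before filing. The following is a minimal Python sketch, not an Adobe support API: the `TicketDetails` class, its field names, and the sample values are hypothetical, the evidence field is left as a placeholder, and the UTC offset is passed in by hand to keep the example dependency-free.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def to_utc(local_time: str, utc_offset_hours: float) -> str:
    """Convert a local wall-clock time ('YYYY-MM-DD HH:MM') to a UTC string.

    utc_offset_hours is the observer's offset from UTC, for example 5.5
    for IST or -5 for US Central during daylight saving time.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(local_time, "%Y-%m-%d %H:%M").replace(tzinfo=local_zone)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")


@dataclass
class TicketDetails:
    """Hypothetical container for the details support asks for up front."""

    org_id: str
    environment: str
    observed_at_utc: str
    summary: str
    evidence: str

    def describe(self) -> str:
        """Render the details as a short, crisp ticket description."""
        return "\n".join([
            f"Org ID: {self.org_id}",
            f"Environment: {self.environment}",
            f"Observed at: {self.observed_at_utc}",
            f"Issue: {self.summary}",
            f"Evidence: {self.evidence}",
        ])


# Illustrative values only; none of these come from a real project.
ticket = TicketDetails(
    org_id="example-org-id",
    environment="staging2",
    observed_at_utc=to_utc("2024-05-08 20:30", 5.5),
    summary="CPU above 95% on all nodes; initial triage found no obvious cause.",
    evidence="<New Relic permanent link or screenshot>",
)
print(ticket.describe())
```

Stating the offset explicitly is a simplification; a real script would more likely derive it from a named time zone so that daylight saving changes are handled automatically.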

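The fallback idea from the pro tips (flag a failing external service, stop calling it, and use a backup mechanism until it is triaged) is commonly implemented as a circuit breaker. The following is a minimal Python sketch under stated assumptions, not Adobe Commerce code: `CircuitBreaker`, `external_tax`, and `builtin_tax` are hypothetical names, the 8% rate is made up, and the failure count and cool-down are arbitrary defaults.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Route calls to a fallback after repeated failures of the primary."""

    def __init__(
        self,
        primary: Callable[[float], float],
        fallback: Callable[[float], float],
        max_failures: int = 3,
        reset_after: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.primary = primary
        self.fallback = fallback
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        """Return True while the primary should be skipped."""
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: allow a trial call to the primary.
            # A single new failure will trip the breaker again.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def call(self, amount: float) -> float:
        """Use the primary when healthy, otherwise the fallback."""
        if self.is_open():
            return self.fallback(amount)
        try:
            result = self.primary(amount)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            return self.fallback(amount)
        self.failures = 0  # a success clears the failure count
        return result


def external_tax(amount: float) -> float:
    """Stand-in for an external tax service that is currently down."""
    raise ConnectionError("tax service unavailable")


def builtin_tax(amount: float) -> float:
    """Stand-in for a built-in calculation; the 8% rate is made up."""
    return round(amount * 0.08, 2)


breaker = CircuitBreaker(external_tax, builtin_tax, max_failures=2)
print(breaker.call(100.0))
```

The point where the breaker trips is also a natural place to kick off an alert, so that the team knows the backup mechanism is in use.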