
Team:

Zahraa Sabra, Ali Alawieh, Fadi A Zaraket, AbdulRahman Bizri

“Superbugs” are a major worldwide concern owing to the increasing rates of bacterial antimicrobial resistance. Most interventions to date have done little to check this trend, and international agencies are acting to prevent an “Antimicrobial Armageddon”. Lately, computational tools for handling large biological datasets have gained considerable attention, yet they have found few applications in bacteriology. Our work presents and validates a novel hybrid method for understanding and predicting the progression of bacterial resistance at the population level using structural and probabilistic computational models. The method builds on advances in computational modeling and data visualization to study antimicrobial resistance. It allows investigators to understand patterns of increasing resistance, predict near-term future progressions, and orient infection control and antimicrobial stewardship programs. This paves the way for a new area of epidemiological research in microbiology based on computational modeling.

This work includes several files that require pre-configuration and a software installation:

- TableData.m generates the preprocessed data. It takes 3 Excel files as input:
  - the original Excel file containing the data, in the format of the file 'InitialData.xlsx';
  - the antimicrobial Excel file, in the format of the file 'antimicrobialList.xlsx';
  - the bacteria Excel file, in the format of the file 'bacteriaList.xlsx'.
- DistInfo.m and DistInfoNonUrine.m generate the distribution similarity among different bacteria-antimicrobial combinations for different site conditions. They only require TableData.m to have been executed first in the same directory as these two files.
- GUI for the structural model:
  - Download graphviz-2.30.1. For the dotty application, to get rid of the circle on the edges, open the file dotty.lefty in the folder Graphviz2.30\lib\lefty\ and change the line 'edgehandles' = 1; to 'edgehandles' = 0; (it is around line 110).
  - Make sure that TableData.m has already been executed in the same directory as the ABResistance.m file.
  - In the ABResistance.m file, change the paths:
    - C:\Users\Hassan\Downloads\..\bin: replace it with the path of the bin folder of your Graphviz installation.
    - C:\Users\Hassan\Documents\Matlab: replace it with the path where the output file should be created. This file is generated automatically to draw the graph, so you need not worry about its content; you only have to indicate where the GUI should create it.
- GUI for HMM validation and prediction: make sure that TableData.m has already been executed in the same directory as the ABResistanceHMM.m file.

The application was written in Matlab, and its open-source code is available below.

Data collected from the papers are recorded in Excel sheets in the following format, which captures the features relevant to studying the bacteria-antimicrobial relationship.

The Excel file containing the collected data is named “InitialData.xlsx”.

The required fields in each entry of the Excel file are:

- Reference: the paper number (each paper has a unique number). It is recorded so that the source of the information is known and can be consulted if further information is needed.
- Location: the hospital, center, lab, etc. from which the studied samples were taken.
- Start month, start year, end month, and end year: since antimicrobial resistance does not change from day to day, and a difference of up to 6 months in the dates does not affect the results, the dates are approximated so that the start and end months are either 1 or 7 (January or July).
- Site: the site from which the samples were taken, such as urine, blood, CSF, or other clinical samples.
- Studied bacteria: the name of the studied bacteria.
- Number of studied isolates: the number of studied samples for the given bacteria and antimicrobial and for the features listed above.
- Studied antimicrobial: the name of the antimicrobial tested on the samples.
- Number of resistant isolates: the number of isolates that show resistance to the mentioned antimicrobial.
- Percentage of resistant isolates: the number of resistant isolates divided by the number of studied isolates, expressed as a percentage.
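To make the entry format concrete, the record below is a minimal Python sketch. The class name AMREntry and its field names are assumptions, not the actual column headers of “InitialData.xlsx”, and the example values are made up. The percentage is derived from the two counts rather than stored separately, and the month approximation is enforced as a check.

```python
from dataclasses import dataclass


@dataclass
class AMREntry:
    """One row of the collected data (hypothetical field names)."""
    reference: int        # paper number, as in the reference list
    location: str
    start_month: int      # approximated to 1 (January) or 7 (July)
    start_year: int
    end_month: int        # approximated to 1 (January) or 7 (July)
    end_year: int
    site: str
    bacteria: str
    isolates: float       # number of studied isolates
    antimicrobial: str
    resistant: float      # number of resistant isolates

    def __post_init__(self):
        # Enforce the date approximation described above.
        if self.start_month not in (1, 7) or self.end_month not in (1, 7):
            raise ValueError("start and end months must be 1 or 7")

    @property
    def percent_resistant(self) -> float:
        """Resistant isolates over studied isolates, as a percentage."""
        return 100.0 * self.resistant / self.isolates


# Illustrative values only, not taken from any of the cited papers.
entry = AMREntry(0, "Hospital A", 1, 2000, 1, 2001, "urine",
                 "Bacteria X", 200, "Antimicrobial Y", 130)
print(entry.percent_resistant)  # 65.0
```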

To optimize the execution of the code in MATLAB, each column containing string values is mapped to numerical codes: identical strings receive the same numerical value, and different strings receive different values. This speeds up comparisons, since string comparison is slower than numerical comparison.
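A minimal Python sketch of this string-to-number mapping follows; the function name encode_column and the first-appearance ordering of the codes are assumptions. In MATLAB itself, the third output of unique (for example `[~, ~, codes] = unique(values, 'stable')`) provides the same kind of mapping.

```python
def encode_column(values):
    """Map each distinct string to an integer code, in order of first appearance.

    Identical strings get the same code and different strings get different
    codes, so later filters can compare integers instead of strings.
    """
    lookup = {}
    encoded = []
    for value in values:
        if value not in lookup:
            lookup[value] = len(lookup) + 1  # 1-based, like MATLAB indices
        encoded.append(lookup[value])
    return encoded, lookup


sites = ["urine", "blood", "urine", "CSF", "blood"]
encoded, lookup = encode_column(sites)
print(encoded)  # [1, 2, 1, 3, 2]
print(lookup)   # {'urine': 1, 'blood': 2, 'CSF': 3}
```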

In addition, we built two matrices representing the antimicrobial and bacteria features, respectively, all encoded as numerical values.

The bacteria features are the bacteria name, the genus, the species, the group, and the preset category.

The antimicrobial features are the antimicrobial name, the group, the subgroup, the variant, and the preset category.

The user can choose to study the AMR data based on a set of antimicrobial and bacteria features. The selected features are applied as a filter on these matrices to extract the matching records.
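The feature filter can be sketched in Python as follows, assuming each row of the bacteria matrix holds the encoded name, genus, species, group, and preset category in that column order. The column order, the function name matching_names, and the codes are hypothetical; the same function would apply to the antimicrobial matrix.

```python
# Hypothetical column order of the bacteria feature matrix.
NAME, GENUS, SPECIES, GROUP, CATEGORY = range(5)


def matching_names(feature_matrix, selected):
    """Return the name codes of the rows that match every selected feature.

    selected maps a column index to the set of accepted codes; columns
    missing from selected are left unconstrained.
    """
    return [row[NAME] for row in feature_matrix
            if all(row[col] in accepted for col, accepted in selected.items())]


# Rows: [name, genus, species, group, category], all encoded as integers (made up).
bacteria = [
    [1, 10, 100, 7, 2],
    [2, 10, 101, 7, 2],
    [3, 11, 102, 8, 1],
]
print(matching_names(bacteria, {GENUS: {10}}))                   # [1, 2]
print(matching_names(bacteria, {GROUP: {7, 8}, CATEGORY: {1}}))  # [3]
```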

The MATLAB file “TableData.m” generates the data, bacteria, and antimicrobial matrices from the Excel files “InitialData.xlsx”, “bacteriaList.xlsx”, and “antimicrobialList.xlsx”. The generated matrices are saved in the file “DatabaseTable.mat”.

We implemented two models: the structural model and the behavioral model. The output of the preprocessing step is the input to both models, described below.

1-Structural model

When a user selects antimicrobial and bacteria features in the structural GUI, the antimicrobial and bacteria name fields are populated with all the names that match the selected features. This is done by collecting, from the antimicrobial and bacteria matrices, the names whose features match the selection.

The user then selects the sites to study. From the matrix representing the whole database, we collect the entries that match the selected antimicrobial names, bacteria names, and sites. This is done by the Matlab file “SelectFromR.m”, which is part of the structural GUI file “ABresistance.fig”.
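Selecting the matching entries, the role the text assigns to “SelectFromR.m”, amounts to a membership test on three encoded fields. A minimal Python sketch, assuming records are dicts with hypothetical keys and made-up values (this is not the code of “SelectFromR.m”):

```python
def select_records(records, antimicrobials, bacteria, sites):
    """Keep the records whose antimicrobial, bacteria and site are all selected.

    Each record is a dict of numerically encoded fields (hypothetical keys).
    """
    return [r for r in records
            if r["antimicrobial"] in antimicrobials
            and r["bacteria"] in bacteria
            and r["site"] in sites]


records = [
    {"antimicrobial": 4, "bacteria": 1, "site": 1, "isolates": 120, "resistant": 30},
    {"antimicrobial": 4, "bacteria": 3, "site": 2, "isolates": 80, "resistant": 8},
    {"antimicrobial": 5, "bacteria": 1, "site": 1, "isolates": 60, "resistant": 12},
]
selected = select_records(records, antimicrobials={4}, bacteria={1, 2}, sites={1, 2})
print(len(selected))  # 1
```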

The resulting matrix, which satisfies the selected antimicrobial, bacteria, and site conditions, is then processed as follows:

1. Order the entries by study date, from the oldest to the newest.
2. Since we study the AMR trend over the years, we use a time unit of one year: entries starting in July are moved to January of the same year, and entries ending in July are moved to January of the next year.
3. An entry whose study interval spans more than one year is split into as many entries as it spans years. All fields of the original entry are kept unchanged, except the total number of isolates and the number of resistant isolates, which are divided by the number of years. Steps 1 through 3 are done in the Matlab file “SlotTable.m”, which is part of the file “LumpMatrix.m”.
4. If more than one entry from the same reference has the same dates but different sites, these entries are merged into one: the isolates and the resistant isolates are summed, and the percentage resistance is recomputed for the new entry.
5. If the user chooses in the GUI to lump the references, entries with the same start and end dates but different references are merged in the same way as in step 4.

Steps 1 through 5 are executed in the Matlab file “LumpMatrix.m” as part of the “ABResistance.fig” GUI file.
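A minimal Python sketch of steps 1 through 5, assuming entries are dicts with hypothetical keys (ref, site, sm/sy for the start month and year, em/ey for the end, isolates, resistant). Two readings are assumptions: every entry gets at least one one-year slot, and the split entries are assigned to consecutive years starting from the aligned start year. This is not the code of “SlotTable.m” or “LumpMatrix.m”.

```python
def slot(entries):
    """Steps 1-3: align dates to January and split entries into one-year slots."""
    slotted = []
    for e in entries:
        start = e["sy"]                                 # July start -> January, same year
        end = e["ey"] + 1 if e["em"] == 7 else e["ey"]  # July end -> January, next year
        years = max(1, end - start)                     # assumption: at least one slot
        for k in range(years):
            slotted.append({**e, "sm": 1, "sy": start + k, "em": 1, "ey": start + k + 1,
                            "isolates": e["isolates"] / years,
                            "resistant": e["resistant"] / years})
    # Step 1: order from the oldest to the newest (stable sort keeps ties in input order).
    slotted.sort(key=lambda e: e["sy"])
    return slotted


def lump(entries, lump_references=False):
    """Steps 4-5: merge entries with the same dates (and reference, unless lumped)."""
    merged = {}
    for e in entries:
        key = (e["sy"], e["ey"]) if lump_references else (e["ref"], e["sy"], e["ey"])
        if key not in merged:
            # The merged entry keeps the first entry's other fields (e.g. its site label).
            merged[key] = {**e, "isolates": 0.0, "resistant": 0.0}
        merged[key]["isolates"] += e["isolates"]
        merged[key]["resistant"] += e["resistant"]
    for m in merged.values():
        m["percent"] = 100.0 * m["resistant"] / m["isolates"]
    return sorted(merged.values(), key=lambda m: m["sy"])


# Made-up entries: July 2000 - July 2002 (urine) and January 2001 - January 2002 (blood).
entries = [
    {"ref": 1, "site": "urine", "sm": 7, "sy": 2000, "em": 7, "ey": 2002,
     "isolates": 300, "resistant": 150},
    {"ref": 1, "site": "blood", "sm": 1, "sy": 2001, "em": 1, "ey": 2002,
     "isolates": 50, "resistant": 10},
]
for row in lump(slot(entries)):
    print(row["sy"], row["isolates"], row["percent"])
# 2000 100.0 50.0
# 2001 150.0 40.0
# 2002 100.0 50.0
```

Note that, under the alignment rule of step 2, a July-to-July entry is widened to whole calendar years, which is why the first entry yields three slots.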

After preparing the data, we can compute the resistance difference between two entries that are either consecutive in time or up to five available dates apart, where the second entry comes after the studied one.

The GUI generates a graph that reflects the structure of AMR over time. The graph consists of nodes and edges connecting them. A node represents the number of isolates studied over a year, together with the resistance value for the interval between its two dates. The dates are shown as start month and year and end month and year, where the month is always January, since the time unit is one year. An edge label indicates the difference in months between the two nodes it connects, along with the difference in resistance between them. This difference is negative if the resistance decreased from one node to the next. The difference is not a straightforward subtraction of the two node values; the edge label is defined next.

For a given node i, let t(i) be the midpoint, in months, between the node's start date and end date. The graph can visualize up to five orders of difference. The nth time difference for the edge pointing to node i subtracts from t(i) the average of the n preceding nodes, so the first difference is simply the difference between consecutive nodes:

$$\Delta^{(n)} t(i) = t(i) - \frac{t(i-1) + t(i-2) + \dots + t(i-n)}{n}$$

Likewise, with R(i) the resistance of node i, the nth resistance difference for the edge pointing to node i is:

$$\Delta^{(n)} R(i) = R(i) - \frac{R(i-1) + R(i-2) + \dots + R(i-n)}{n}$$
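Both difference formulas share one computation, which a short Python sketch makes concrete. It assumes the reading in which the nth difference subtracts the mean of the n preceding nodes; the function name nth_difference and the sample values are illustrative.

```python
def nth_difference(values, n):
    """Return {i: values[i] - mean(values[i-n:i])} for every node with n predecessors."""
    if not 1 <= n <= 5:
        raise ValueError("the graph supports first to fifth differences")
    return {i: values[i] - sum(values[i - n:i]) / n
            for i in range(n, len(values))}


t = [6, 18, 30, 42, 54]     # node midpoints in months (one-year slots)
R = [20, 25, 35, 30, 45]    # resistance percentages (made up)
print(nth_difference(t, 1))  # {1: 12.0, 2: 12.0, 3: 12.0, 4: 12.0}
print(nth_difference(R, 2))  # {2: 12.5, 3: 0.0, 4: 12.5}
```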

The user can choose to visualize the first, second, third, fourth, and/or fifth AMR difference. These visualizations show how the resistance difference evolves over the years.

Users may choose to visualize the AMR for a single site or to lump the results over multiple sites. This shows whether the AMR of a given antimicrobial-bacteria combination differs depending on the selected site.

Graphviz is used to visualize the graph of the AMR trend over the years. Once the node contents, the connected node pairs, and the edge labels are determined, the graph description in Graphviz's DOT language is written to an output text file (in our case named “output.txt”). Matlab then runs the dotty application on this output file to display the graph. The structural graph is displayed when the user presses the corresponding button in the structural GUI.

The nodes, edges, labels, and other Graphviz details (such as the title and the edge colors) are generated in the Matlab file “GenGraphvizLumpNew1.m”, part of the GUI file “ABResistance.fig”. The graph starts with a Start node and ends with an End node, so that it is connected and contained between these two nodes.
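The file handed to dotty is plain Graphviz DOT text. A hypothetical Python sketch of how such text could be assembled, with Start and End nodes framing the data nodes; the function name to_dot, the node identifiers, and the label formats are assumptions, not those produced by “GenGraphvizLumpNew1.m”:

```python
def to_dot(nodes, edges, title="AMR trend"):
    """Return Graphviz DOT text for the AMR graph, framed by Start and End nodes.

    nodes: (node_id, label) pairs in chronological order.
    edges: (source_id, target_id, label) triples between data nodes.
    """
    lines = ["digraph AMR {",
             f'  label="{title}";',
             "  Start [shape=box];",
             "  End [shape=box];"]
    for node_id, label in nodes:
        lines.append(f'  n{node_id} [label="{label}"];')
    lines.append(f"  Start -> n{nodes[0][0]};")
    for source, target, label in edges:
        lines.append(f'  n{source} -> n{target} [label="{label}"];')
    lines.append(f"  n{nodes[-1][0]} -> End;")
    lines.append("}")
    return "\n".join(lines)


# Illustrative node and edge labels (made-up values).
nodes = [(1, "N=120, R=25%, 1/2000-1/2001"),
         (2, "N=90, R=35%, 1/2001-1/2002")]
edges = [(1, 2, "12 months, +10%")]
print(to_dot(nodes, edges))
```

Saving the printed text as “output.txt” and opening it with dotty displays the graph interactively; `dot -Tpng output.txt -o trend.png` would instead render it to an image.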

In the graph, for the selected differences, if the resistance difference on an edge is larger than 90% of the variance of the resistance differences over similar edge differences, the node the edge points to is colored red. This indicates a jump: the resistance rose unexpectedly faster than before and/or after.

Conversely, if the resistance difference is less than -90% of the variance of the resistance differences over similar edge differences, the node the edge points to is colored green. This indicates a large reduction: the resistance diminished unexpectedly faster than before and/or after.
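The red/green rule can be sketched in Python by following the text literally: the threshold is 90% of the variance of the differences of the same order. The function name node_colors and the use of the sample variance (which MATLAB's var computes by default) are assumptions.

```python
from statistics import variance


def node_colors(diffs, factor=0.9):
    """Map target nodes to "red" or "green" when their resistance difference is extreme.

    diffs: target node -> resistance difference, for one difference order.
    The threshold is factor times the sample variance of all differences
    of that order (the sample variance is an assumption).
    """
    threshold = factor * variance(list(diffs.values()))
    colors = {}
    for node, d in diffs.items():
        if d > threshold:
            colors[node] = "red"     # resistance rose unexpectedly fast
        elif d < -threshold:
            colors[node] = "green"   # resistance fell unexpectedly fast
    return colors


print(node_colors({1: 1.0, 2: -1.0, 3: 0.0, 4: 0.0}))  # {1: 'red', 2: 'green'}
```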

In both cases, the alert prompts scientists to examine the period of abrupt change and analyze what could have dramatically influenced the resistance difference. This can lead to hypotheses about historical, medical, environmental, economic, human, or other factors that may have caused the major changes in AMR.

Beyond tracking abrupt changes in AMR, the graph can show relationships not only between one specific antimicrobial and one specific bacteria from the table: the data can be concatenated and aggregated over a set of antimicrobial and bacteria features, giving a broader, more holistic view of the results.

Moreover, one can choose which of the five differences to display: a trend may be clear when AMR is viewed over, for example, third differences, yet remain ambiguous in a first-difference graph.

Independently of the GUI, but using the file “DatabaseTable.mat”, we can track relations among antimicrobial-bacteria combinations based on how their AMR behaves over the years. Recognizing such patterns in how related bacteria behave against antimicrobials could point to a similar genetic background. To visualize the antimicrobial-bacteria combinations and their relations, we did the following:

1. For each possible combination of one bacteria and one antimicrobial, the edge values are recorded in a vector, so that each vector represents the edges of one combination.
2. A dendrogram is then plotted for the vectors of the same length. The minimum length for the comparison is four, so combinations with shorter vectors are discarded.
3. Since AMR behaves differently depending on whether the samples come from urine or from any other site (called nonurine), separate dendrograms are presented for the two cases.

Steps 1 through 3 are implemented in the Matlab files “DistInfo.m” and “DistInfoNonUrine.m”.
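In MATLAB, such a dendrogram would typically come from linkage and dendrogram, and linkage uses single linkage with Euclidean distance by default; the text does not say which settings were used, so that choice is an assumption here. The self-contained Python sketch below groups the edge vectors by length, drops the short ones, and records the merge history that a dendrogram draws. The combination labels and edge values are hypothetical.

```python
import math
from collections import defaultdict


def group_by_length(vectors, min_length=4):
    """Group edge vectors by length, dropping vectors shorter than min_length."""
    groups = defaultdict(dict)
    for combo, vec in vectors.items():
        if len(vec) >= min_length:
            groups[len(vec)][combo] = vec
    return dict(groups)


def single_linkage(vectors):
    """Naive single-linkage clustering with Euclidean distance.

    Returns the merge history as (members_a, members_b, distance) tuples,
    which is what a dendrogram draws.
    """
    clusters = {frozenset([name]): [vec] for name, vec in vectors.items()}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                d = min(math.dist(a, b)
                        for a in clusters[keys[i]] for b in clusters[keys[j]])
                if best is None or d < best[2]:
                    best = (keys[i], keys[j], d)
        a, b, d = best
        clusters[a | b] = clusters.pop(a) + clusters.pop(b)
        merges.append((sorted(a), sorted(b), d))
    return merges


# Hypothetical edge vectors keyed by "bacteria/antimicrobial" labels.
vectors = {"B1/A1": [5, 0, 5, 5], "B2/A1": [5, 0, 5, 10],
           "B3/A2": [-5, 10, 0, 0], "B4/A1": [0, 5]}
groups = group_by_length(vectors)  # B4/A1 is dropped (length 2 < 4)
for merge in single_linkage(groups[4]):
    print(merge)
```

For the urine/nonurine split, the same functions would simply be run once on each set of vectors.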

2-Behavioral GUI

The Hidden Markov Model (HMM) was selected to predict the evolution of AMR one year ahead, based on the recorded history.

The file “DatabaseTable.mat” is used because it contains the preprocessed historical data needed to train the HMM, predict the next-year resistance, and validate the model. Since the HMM is explained in the paper, we only cover the technical steps for generating the HMM scores from which the predicted next-year resistance is derived.

In the behavioral GUI, the user first selects the antimicrobial, bacteria, and site features to study, then chooses a threshold above which the expected resistance is considered medically unacceptable. If the predicted next-year resistance is above this threshold, the GUI colors it red to tell the physician that the selected set of antimicrobials is not recommended against the selected set of bacteria.

The user can choose the statistical mode used for the predicted scores: permissive, moderate, or restrictive. The user can also choose whether to run the model in validation mode, comparing its output with the last recorded resistance, or in prediction mode, estimating the next-year resistance, i.e., the resistance of the year following the last year entered in the Excel sheet. The differences among these modes are explained below.

When the user presses the “Generate HMM Score” button, the following steps are performed:

1. A matrix containing the entries that satisfy the selected antimicrobial features, bacteria features, and sites is generated using the Matlab file “SelectFromR.m”, part of the behavioral GUI file “ABresistanceHMM.fig”.
2. The same preprocessing as in the structural model (sorting the entries by date, splitting entries that span more than one year into one-year entries, and lumping sites and references) is applied here, using the same files: “SlotTable.m” (part of “LumpMatrix.m”) and “LumpMatrix.m” (part of the “ABResistanceHMM.fig” GUI file).
3. In the preprocessed matrix, the resistance values are quantized to the nearest multiple of five. Decimal resistance values would complicate HMM training and prediction because they greatly increase the number of possible emissions. For example, since resistance differences lie between -100 and 100, a step of 5 between consecutive values gives 41 possible emissions, whereas a step of 0.1 gives 2001.
4. The data is now ready for training the HMM:
   - Generate all possible observations of month difference and resistance difference for the given data using the Matlab file “GenerateAllSets.m”, part of the behavioral GUI file “ABResistanceHMM.fig”.
   - Pass the observations, along with the set of possible emissions, to Matlab's built-in hmmtrain function. This is done in the Matlab file “TrainHMMNew.m”, part of the behavioral GUI file “ABResistanceHMM.fig”.
5. After training, predict the next-year score of each candidate resistance (0%, 5%, 10%, ..., 100%); the predicted resistance is then calculated from these scores as explained in the paper. To get the HMM score of a resistance, we compute the probability the HMM assigns to the new observation sequence, P(B1, B2, ..., Bn), and to the sequence of the first n-1 observations, P(B1, B2, ..., Bn-1). Their ratio is the conditional probability of the new observation given the previous ones:

$$P(B_n \mid B_1, \dots, B_{n-1}) = \frac{P(B_1, B_2, \dots, B_n)}{P(B_1, B_2, \dots, B_{n-1})}$$

6. Depending on the mode (permissive, moderate, or restrictive), α is set to 0, 0.01, or 0.1, respectively. The final HMM score for a given resistance is:

$$\text{Score}_{\text{HMM}} = P(B_n \mid B_1, \dots, B_{n-1}) - \alpha^{\,|\text{threshold} - \text{expected resistance}|/5 + 1}$$

The Matlab code for the 5th and 6th steps is present in the file “ABResistanceHMM.fig”.
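Steps 3, 5, and 6 can be sketched in Python for a discrete HMM whose start, transition, and emission probabilities are taken as already trained (hmmtrain's Baum-Welch estimation is not reproduced). For brevity, each observation here is only the quantized resistance difference, an assumption since the text pairs it with a month difference; all function names are hypothetical. Note that MATLAB's HMM functions assume the chain starts in state 1 before the first emission, whereas this sketch uses an explicit initial distribution; in MATLAB, the two sequence probabilities could come from the second output of hmmdecode, which returns the log probability of a sequence.

```python
import math

# Resistance differences from -100 to 100 in steps of 5 give 41 emission symbols.
EMISSIONS = list(range(-100, 101, 5))
ALPHA = {"permissive": 0.0, "moderate": 0.01, "restrictive": 0.1}


def quantize(value, step=5):
    """Round to the nearest multiple of step (ties rounded away from zero)."""
    sign = -1 if value < 0 else 1
    return sign * step * math.floor(abs(value) / step + 0.5)


def symbol(diff):
    """Index of a quantized resistance difference in EMISSIONS."""
    return EMISSIONS.index(quantize(diff))


def sequence_probability(obs, start, trans, emis):
    """Forward algorithm: probability of an observation sequence under a discrete HMM."""
    states = range(len(start))
    alpha = [start[s] * emis[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in states) * emis[s][o]
                 for s in states]
    return sum(alpha)


def next_probability(history, candidate, start, trans, emis):
    """P(B_n | B_1..B_{n-1}) as the ratio of two sequence probabilities."""
    return (sequence_probability(history + [candidate], start, trans, emis)
            / sequence_probability(history, start, trans, emis))


def hmm_score(p_next, threshold, expected, mode):
    """Score_HMM = P(B_n | ...) - alpha ** (|threshold - expected| / 5 + 1)."""
    return p_next - ALPHA[mode] ** (abs(threshold - expected) / 5 + 1)


# Toy 2-state model over 3 symbols (made-up probabilities, not trained values).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emis = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
history = [0, 1]
probs = [next_probability(history, c, start, trans, emis) for c in range(3)]
print(round(sum(probs), 12))  # 1.0: the candidates cover every symbol
print(hmm_score(probs[2], 50, 40, "permissive") == probs[2])  # True
```

For long histories, working with log probabilities would avoid numerical underflow in the forward pass.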

For validation, given n entries, we train the HMM on the first n-1 entries, predict the HMM score for the following year, and compare the predicted resistance with the actual value recorded in the Excel sheet.

Note also that the HMM needs data spanning more than five years to work correctly, so make sure that the selected set of antimicrobials, bacteria, and sites covers at least this number of years.
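The hold-out validation and the minimum-span check can be sketched together in Python. The function name validate and its arguments are hypothetical, and predict stands in for the HMM-based prediction described in the paper; the usage below plugs in a naive placeholder predictor and made-up yearly values only to show the mechanics.

```python
def validate(series, predict, min_years=6):
    """Hold out the last year: predict it from the earlier entries and compare.

    series: (year, resistance) pairs, one per year, sorted by year.
    predict: hypothetical callable mapping the training history to a
        predicted next-year resistance (in the GUI this comes from the HMM scores).
    min_years: the HMM needs data spanning more than five years.
    """
    if len({year for year, _ in series}) < min_years:
        raise ValueError("the selection must span more than five years")
    history, (last_year, actual) = series[:-1], series[-1]
    predicted = predict(history)
    return {"year": last_year, "predicted": predicted,
            "actual": actual, "abs_error": abs(predicted - actual)}


# Made-up yearly resistances and a naive placeholder predictor (repeat last value).
series = [(2000, 20), (2001, 25), (2002, 30), (2003, 30), (2004, 35), (2005, 40)]
print(validate(series, lambda history: history[-1][1]))
# {'year': 2005, 'predicted': 35, 'actual': 40, 'abs_error': 5}
```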

The data used in the Matlab applications were taken from the following papers:

[1] P. Santanam, G. Morenzoni, and F. Kayser, “Prevalence of antimicrobial resistance in *Haemophilus influenzae* in Greece, Israel, Lebanon and Morocco,” *European Journal of Clinical Microbiology and Infectious Diseases*, vol. 9, pp. 818-820, 1990.

[2] G. F. Araj, M. M. Uwaydah, and S. Y. Alami, “Antimicrobial susceptibility patterns of bacterial isolates at the American University Medical Center in Lebanon,” *Diagnostic microbiology and infectious disease*, vol. 20, pp. 151-158, 1994.

[3] M. Uwaydah, M. Jradeh, and Z. Shihab, “Antimicrobial resistance of clinical isolates of Streptococcus pneumoniae in Lebanon,” *Journal of Antimicrobial Chemotherapy*, vol. 38, pp. 283-286, 1996.

[4] M. Hamze and D. Sarkis, “Etude bicentrique de la sensibilité des sérotypes de *Pseudomonas aeruginosa* aux antibiotiques au liban,” *Médecine et maladies infectieuses*, vol. 28, pp. 668-672, 1998.

[5] G. Araj, H. Bey, L. Itani, and S. Kanj, “Drug-resistant *Streptococcus pneumoniae* in the Lebanon: implications for presumptive therapy,” *International journal of antimicrobial agents*, vol. 12, pp. 349-354, 1999.

[6] M. Hamze and D. Izard, “Sensibilité des entérobactéries aux antibiotiques. Situation en 1997 au Nord du Liban,” *Médecine et maladies infectieuses*, vol. 29, pp. 527-531, 1999.

[7] W. Kalaajieh, “Epidemiology of human brucellosis in Lebanon in 1997,” *Médecine et maladies infectieuses*, vol. 30, pp. 43-46, 2000.

[8] T. Shaar and R. Al-Hajjar, “Antimicrobial susceptibility patterns of bacteria at the Makassed General Hospital in Lebanon,” *International journal of antimicrobial agents*, vol. 14, pp. 161-164, 2000.

[9] M. Zouain and G. Araj, “Antimicrobial resistance of enterococci in Lebanon,” *International journal of antimicrobial agents*, vol. 17, pp. 209-213, 2001.

[10] A. I. Sharara, M. Chedid, G. F. Araj, K. A. Barada, and F. H. Mourad, “Prevalence of *Helicobacter pylori* resistance to metronidazole, clarithromycin, amoxycillin and tetracycline in Lebanon,” *International journal of antimicrobial agents*, vol. 19, pp. 155-158, 2002.

[11] Z. Daoud and N. Hakime, “Prevalence and susceptibility patterns of extended-spectrum betalactamase-producing Escherichia coli and Klebsiella pneumoniae in a general university hospital in Beirut, Lebanon,” *Rev Esp Quimioter*, vol. 16, pp. 233-8, 2003.

[12] M. Hamze, F. Dabboussi, W. Daher, and D. Izard, “[Antibiotic resistance of Staphylococcus aureus at north Lebanon: place of the methicillin resistance and comparison of detection methods],” *Pathologie-biologie*, vol. 51, p. 21, 2003.

[13] M. Hamze, F. Dabboussi, and D. Izard, “[A 4-year study of Pseudomonas aeruginosa susceptibility to antibiotics (1998-2001) in northern Lebanon],” *Médecine et maladies infectieuses*, vol. 34, pp. 321-324, 2004.

[14] J. N. Samaha-Kfoury, S. S. Kanj, and G. F. Araj, “In vitro activity of antimicrobial agents against extended-spectrum β-lactamase-producing *Escherichia coli* and *Klebsiella pneumoniae* at a tertiary care center in Lebanon,” *American journal of infection control*, vol. 33, pp. 134-136, 2005.

[15] S. D. Karam, A. Hajj, and A. Adaimé, “[Evolution of the antibiotic resistance of Streptococcus pneumoniae from 1997 to 2004 at Hôtel-Dieu de France, a university hospital in Lebanon],” *Pathologie-biologie*, vol. 54, p. 591, 2006.

[16] M. Uwaydah, J. E. Mokhbat, D. Karam-Sarkis, R. Baroud-Nassif, and T. Rohban, “Penicillin-resistant *Streptococcus pneumoniae* in Lebanon: the first nationwide study,” *International journal of antimicrobial agents*, vol. 27, pp. 242-246, 2006.

[17] S. S. Kanj, O. El-Dbouni, Z. A. Kanafani, and G. F. Araj, “Antimicrobial susceptibility of respiratory pathogens at the American University of Beirut Medical Center,” *International Journal of Infectious Diseases*, vol. 11, pp. 554-556, 2007.

[18] M. Borg, V. De Sande‐Bruinsma, E. Scicluna, M. De Kraker, E. Tiemersma, J. Monen, and H. Grundmann, “Antimicrobial resistance in invasive strains of Escherichia coli from southern and eastern Mediterranean laboratories,” *Clinical Microbiology and Infection*, vol. 14, pp. 789-796, 2008.

[19] I. Saleh, O. Zouhairi, N. Alwan, A. Hawi, E. Barbour, and S. Harakeh, “Antimicrobial resistance and pathogenicity of Escherichia coli isolated from common dairy products in the Lebanon,” *Annals of Tropical Medicine and Parasitology*, vol. 103, pp. 39-52, 2009.

[20] G. Sawma-Aouad, F. Hashwa, and S. Tokajian, “Antimicrobial resistance in relation to virulence determinants and phylogenetic background among uropathogenic Escherichia coli in Lebanon,” *Journal of Chemotherapy*, vol. 21, pp. 153-158, 2009.

[21] N. El‐Najjar, M. Farah, F. Hashwa, and S. Tokajian, “Antibiotic resistance patterns and sequencing of class I integron from uropathogenic Escherichia coli in Lebanon,” *Letters in applied microbiology*, vol. 51, pp. 456-461, 2010.

[22] A. Hannoun, M. Shehab, M.-T. Khairallah, A. Sabra, R. Abi-Rached, T. Bazi, K. A. Yunis, G. F. Araj, and G. M. Matar, “Correlation between Group B Streptococcal genotypes, their antimicrobial resistance profiles, and virulence genes among pregnant women in Lebanon,” *International journal of microbiology*, vol. 2009, 2010.

[23] P. F. Abou Khalil, “Occurence of γ-hemolysin and panton valentine leukocidin genes and antimicrobial susceptibility patterns of staphylococcus aureus isolated from clinical samples in Lebanon.(c2007),” 2011.

[24] W. Bahnan, F. Hashwa, G. Araj, and S. Tokajian, “emm typing, antibiotic resistance and PFGE analysis of Streptococcus pyogenes in Lebanon,” *Journal of medical microbiology*, vol. 60, pp. 98-101, 2011.

[25] Z. Daoud and C. Afif, “Escherichia coli isolated from urinary tract infections of lebanese patients between 2000 and 2009: Epidemiology and profiles of resistance,” *Chemotherapy research and practice*, vol. 2011, 2011.

[26] M. J. Farah, “Antibiotic resistance patterns and characterization of class I integron in uropathogenic escherichia coli in Lebanon.(c2008),” 2011.

[27] D. M. Haddad, “Antibiotic susceptibility patterns and detection of genes for enterotoxins and toxic Shock Syndrome Toxin-1 in Staphyloccoccus aureus involved in human infections in Lebanon.(c2007),” 2011.

[28] N. G. Issa, “Antimicrobial susceptibility testing and identification of exoS and exoU toxin genes of Pseudomonas aeruginosa isolated from clinical samples in lebanon.(c2008),” 2011.

[29] T. Sima, H. Dominik, A. Rana, H. Fuad, and A. George, “Toxins and Antibiotic Resistance in Staphylococcus aureus Isolated from a Major Hospital in Lebanon,” *ISRN microbiology*, vol. 2011, 2011.

[30] R. Hanna-Wakim, H. Chehab, I. Mahfouz, F. Nassar, M. Baroud, M. Shehab, G. Pimentel, M. Wasfy, B. House, G. Araj, G. Matar, and G. Dbaibo, “Epidemiologic characteristics, serotypes, and antimicrobial susceptibilities of invasive Streptococcus pneumoniae isolates in a nationwide surveillance study in Lebanon,” *Vaccine*, vol. 30 Suppl 6, pp. G11-7, Dec 31 2012.

[31] K. Imad, H. Monzer, and D. Fouad, “Molecular Characterization and resistance of H. influenzae isolated from Nasopharynx of Students in North Lebanon,” *The International Arabic Journal of Antimicrobial Agents*, vol. 2, 2012.

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported