Research Article - (2023) Volume 14, Issue 6
Cigarette chemicals are harmful to Deoxyribonucleic Acid (DNA). Cells have a hard time repairing DNA damage caused by cigarette toxins. These toxins also damage the DNA regions that guard against cancer. Cancer is caused by the accumulation of DNA damage in one cell over time. Around sixteen cancers pose a risk to humans due to smoking, including cancer of the lung; cancers of the mouth (Squamous cell carcinomas), throat, nose, and sinuses; cancers of the esophagus; cancers of the bladder and ureter (Urothelial carcinoma/transitional cell carcinoma); cancers of the kidney (Renal cell carcinoma); cancer of the pancreas (Pancreatic adenocarcinoma); cancer of the stomach (Adenocarcinomas); cancer of the liver (Cholangiocarcinoma); and cancers of the cervix and ovary (Ovarian cancers). However, smokers often die from other smoking-related conditions, including heart disease, stroke, or emphysema. About 10% to 15% of smokers develop lung cancer. People who never smoked or who quit smoking years ago have also been reported to die from lung cancer. In this research, the Decision tree and AdaBoost algorithms were used to separate people with cancer from healthy people on the basis of a specific gene and smoking history.
DNA, Smoking history, Cancer, Decision tree, AdaBoost
Colorectal Cancer (CRC) is the third most commonly diagnosed cancer globally and the fourth most prevalent cause of cancer-related mortality, with more than one million new cases and about seven hundred thousand fatalities per year (Ferlay J, et al., 2015). Despite recent advances in early diagnosis and treatment, almost half of patients still die within five years of their diagnosis (de Angelis R, et al., 2014), necessitating more prognosis-improving initiatives. Smoking, a known risk factor for colorectal adenomas (Botteri E, et al., 2008; Hoffmeister M, et al., 2010) and CRC (Botteri E, et al., 2008; Gong J, et al., 2012; Hurley S, et al., 2013; Parajuli R, et al., 2013; Rasool S, et al., 2013), has recently been linked to higher overall (Baer HJ, et al., 2011; Lantz PM, et al., 2010) and CRC-specific mortality (Botteri E, et al., 2008; Hou L, et al., 2014; Liang PS, et al., 2009) in people who were previously cancer-free. Additionally, current smoking was found to be significantly associated with a 26% higher total mortality in incident CRC patients compared to never smoking (Zhu Y, et al., 2014). The most recent meta-analysis summarised current evidence on the relationship between pre-diagnostic smoking and post-diagnostic CRC prognosis (Aarts MJ, et al., 2013; Ali RA, et al., 2011; Boyle T, et al., 2013; Cavalli-Björkman N, et al., 2012; Daniell HW, 1986; Diamantis N, et al., 2013). Dose-response relationships between smoking intensity and prognosis were also noted. Of all malignancies, lung cancer has the highest overall fatality rate (Jadallah F, et al., 1999; McCleary NJ, et al., 2010; Munro AJ, et al., 2006; Nickelsen TN, et al., 2005; Park SM, et al., 2006; Phipps AI, et al., 2011; Phipps AI, et al., 2013; Richards CH, et al., 2010; Sharma A, et al., 2013; Warren GW, et al., 2013; Walter V, et al., 2014).
Numerous studies have linked Socioeconomic Status (SES) to lung cancer, with those from lower socioeconomic origins having the highest incidence rates (IARC, 2012; Ekberg-Aronsson M, et al., 2006; Hart CL, et al., 2001; Mao Y, et al., 2001; Clegg LX, et al., 2009; van der Heyden JH, et al., 2009; Hrubá F, et al., 2009; Sharpe KH, et al., 2012; Braveman P, et al., 2011; Adler NE and Ostrove JM, 1999). The interconnected variables of education, employment, and income are often used to measure SES, which represents one’s place in social hierarchies. SES is connected to health and disease through several related pathways, including material and social resources, physical and psychosocial stresses, and health-related behaviors. Smoking, the most significant risk factor in the etiology of lung cancer, is substantially correlated with SES (Schaap MM, et al., 2008). However, much research on lung cancer and SES fails to account for smoking behavior effectively (Sidorchuk A, et al., 2009), and results on how much of the SES association may be attributed to smoking vary (Menvielle G, et al., 2009; Nkosi TM, et al., 2012).
Research algorithms
Research algorithms can be used to classify and analyze data. Of the several algorithms available, this study used the Decision tree and AdaBoost algorithms. Decision tree: One of the well-known techniques for data classification is the Decision Tree Classifier (DTC). The most important characteristic of DTC is its capacity to break a complex decision-making problem into a sequence of simple decisions, resulting in a solution that is clearer and easier to interpret.
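As a rough illustration of how a decision tree reduces a complex decision to simple steps, the sketch below finds the single best threshold split on one numeric feature. It is a minimal sketch, not the study's implementation: the Gini impurity criterion, the `best_split` helper, and the toy `pack_years` data are all assumptions introduced here, since the article does not state which splitting criterion or features were used.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' rule.

    Returns (None, impurity_of_all_labels) if no split improves on the unsplit data.
    """
    best = (None, gini(labels))
    # Skip the largest value: splitting there would leave the right branch empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data (not from the study): one feature and a 0/1 cancer label.
pack_years = [0, 2, 5, 20, 30, 40]
has_cancer = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(pack_years, has_cancer)
print(threshold, impurity)
```

A full tree repeats this search recursively on each branch until the branches are pure or a depth limit is reached.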
AdaBoost: The AdaBoost algorithm, short for Adaptive Boosting, is a boosting approach used as an ensemble method in machine learning. After each round the weights are redistributed across the instances, with larger weights given to misclassified cases, hence the name “adaptive boosting.”
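The re-weighting step described above can be sketched for a single boosting round. This assumes the classic binary AdaBoost formulation with labels in {-1, +1} and weights that sum to 1; the article does not say which AdaBoost variant was used, and the `adaboost_round` helper and the toy predictions are hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost re-weighting step for labels in {-1, +1}.

    Returns (alpha, new_weights): alpha is the weak learner's vote in the
    ensemble, and new_weights are normalized so that they sum to 1.
    """
    # Weighted error: total weight of the misclassified instances.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)
    # t * p is +1 for a correct prediction (weight shrinks) and -1 for a wrong one (weight grows).
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


# Hypothetical example: five instances, the weak learner misses the last one.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]
weights = [0.2] * 5

alpha, new_weights = adaboost_round(weights, y_true, y_pred)
print(alpha, new_weights)
```

After this round the misclassified instance carries half of the total weight, so the next weak learner is pushed to get it right.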
Data preparation
The data are organized by gene and smoking history. These classes show the order of the genes in people with two different types of tumors (Kaggle, 2023).
Evaluation process
In this study, 10-fold cross-validation is used to train and evaluate the models. The 10-fold method divides the input data into ten subsets. In each round, nine subsets are used as training data, while the remaining subset is used as test data (Malakouti SM, et al., 2022; Malakouti SM and Ghiasi AR, 2022; Malakouti SM, et al., 2022). Figure 1 shows the design of a typical k-fold scheme. The input data are split into 10 subsets (k=10), and the method in this study is trained for ten rounds. The assessed data include 1022 sick and healthy individuals, of whom 253 were chosen to test the algorithms after they had been trained on 769 people.
Figure 2 shows the learning curve for cancer categorization for an AdaBoost classifier. After training on 769 samples, the accuracy of the training curve reached 100%, which shows that the AdaBoost classifier performed very well. The accuracy of the validation curve was 98.9%.
Figure 3 shows the learning curve for cancer categorization for a Decision tree classifier. After training on 769 samples, the accuracy of the training curve reached 100%, which shows that the Decision tree classifier also performed well. The accuracy of the validation curve was 98.7%.
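The 10-fold split described above can be sketched as index bookkeeping. This is a minimal sketch under stated assumptions: the `k_fold_indices` helper is hypothetical, it does not shuffle or stratify (real pipelines usually do both), and the article does not say how the 10-fold procedure was combined with the 769/253 split, so the example simply partitions all 1022 samples.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # Spread the remainder over the first `extra` folds.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# 1022 is the sample count reported in the text.
folds = list(k_fold_indices(1022, k=10))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Each round trains on nine of the subsets and tests on the one held out, and averaging the ten test scores gives the cross-validated estimate.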
Figure 1: 10-fold cross-validation. Note: the legend distinguishes training data from test data.
Figure 2: Learning curve for cancer categorization for an AdaBoost classifier. Note: the legend distinguishes the training score from the validation score.
Figure 3: Learning curve for cancer categorization for a Decision tree classifier. Note: the legend distinguishes the training score from the validation score.
Figure 4 shows the confusion matrix for cancer categorization for an AdaBoost classifier. A confusion matrix is used to measure the performance of a classification model. Here, 62 people with cancer and 91 healthy people were classified correctly. One person with cancer was wrongly classified as healthy. Figure 5 shows the confusion matrix for cancer categorization for a Decision tree classifier.
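To make the confusion-matrix counts concrete, the sketch below tallies the four cells from paired true and predicted labels. The `confusion_counts` helper is hypothetical, and the label lists are rebuilt from the AdaBoost counts given above; the text does not state how many healthy people were classified as having cancer, so zero is assumed here.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="cancer"):
    """Return TP, FN, FP and TN counts, treating `positive` as the positive class."""
    pairs = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in pairs.items() if t == positive and p == positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    tn = sum(n for (t, p), n in pairs.items() if t != positive and p != positive)
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# 62 cancer cases classified correctly, 1 cancer case classified as healthy,
# 91 healthy people classified correctly; zero false positives is an assumption.
y_true = ["cancer"] * 63 + ["healthy"] * 91
y_pred = ["cancer"] * 62 + ["healthy"] * 1 + ["healthy"] * 91

counts = confusion_counts(y_true, y_pred)
print(counts)
```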
Figure 4: Confusion matrix for cancer categorization for an AdaBoost classifier
Figure 5: Confusion matrix for cancer categorization for a Decision tree classifier
Calculation of precision, F1, and recall
The formulae for calculating precision, recall, and F1 are given below, where TP is the number of patients with cancer correctly identified as ill, FP is the number of patients without cancer incorrectly identified as sick, and FN is the number of patients with cancer incorrectly identified as healthy:
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
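The formulae can be checked against the AdaBoost confusion-matrix counts given earlier (62 cancer cases and 91 healthy people classified correctly, one cancer case classified as healthy). The sketch below is illustrative only: `precision_recall_f1` is a hypothetical helper, `tp`, `fp`, and `fn` stand for true positives, false positives, and false negatives, and the count of healthy people classified as having cancer is assumed to be zero because the text does not give it.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for one class; a ratio with a zero denominator is returned as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Cancer as the positive class: 62 correct, 0 healthy people flagged as cancer (assumed), 1 missed.
cancer = precision_recall_f1(tp=62, fp=0, fn=1)
# Healthy as the positive class: 91 correct, 1 cancer case labelled healthy, 0 missed (assumed).
healthy = precision_recall_f1(tp=91, fp=1, fn=0)

for name, (p, r, f1) in (("cancer", cancer), ("healthy", healthy)):
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
# cancer: precision=1.000 recall=0.984 f1=0.992
# healthy: precision=0.989 recall=1.000 f1=0.995
```

Under that assumption, the rounded values agree with the percentages given for the AdaBoost classification report.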
The classification report for cancer categorization for an AdaBoost classifier is shown in Figure 6, which lists the precision, recall, and F1 score. Precision was 100% for people with cancer and 98.9% for healthy people. Recall was 100% for healthy people and 98.4% for sick people. Finally, the F1 score was 99.5% for healthy people and 99.2% for sick people.
Figure 6: Classification report for cancer categorization for an AdaBoost classifier
Figure 7 shows the classification report for cancer categorization for a Decision tree classifier, which lists the precision, recall, and F1 score. Precision was 100% for people with cancer and 98.9% for healthy people. Recall was 100% for healthy people and 98.4% for sick people. Finally, the F1 score was 99.5% for healthy people and 99.2% for sick people.
The class prediction error for cancer categorization for an AdaBoost classifier is shown in Figure 8. Prediction error measures the discrepancy between prediction and reality. Green represents healthy people and blue represents people with cancer. The blue segment in the healthy class shows that the model did not classify this class entirely correctly. The absence of green in the cancer class indicates that people with cancer were correctly diagnosed.
Figure 9 shows the class prediction error for cancer categorization for a Decision tree classifier. Green represents healthy people and blue represents people with cancer. The blue segment in the healthy class shows that the model did not classify this class entirely correctly. The absence of green in the cancer class indicates that people with cancer were correctly diagnosed.
Moreover, family history and smoking habits also play an important role in cancer. Ultimately, this study showed that computational models such as the AdaBoost and Decision tree classifiers can predict and diagnose particular changes that are likely to be associated with cancer.
Figure 7: Classification report for cancer categorization for a Decision tree classifier
Figure 8: Class prediction error for cancer categorization for an AdaBoost classifier. Note: the legend distinguishes the cancer class from the normal class.
Figure 9: Class prediction error for cancer categorization for a Decision tree classifier. Note: the legend distinguishes the cancer class from the normal class.
A person’s specific gene, family history, and smoking history may be associated with the risk of acquiring cancer. This research investigated 1022 healthy people and cancer patients. With the help of the Decision tree and AdaBoost algorithms, the precision, recall, and F1 score were obtained for healthy people and people suffering from cancer. Comparing the evaluations of the AdaBoost classifier and the Decision tree classifier, the study found that the two performed similarly in predicting cancer, with a precision of 100% for people with cancer. It is crucial to develop such computational methods further, as they would help health care professionals detect fatal illness. Further studies are required to evaluate the degree of accuracy, trueness, and precision of such computational methods.
Citation: Malakouti SM: Cancer Risk Assessment Based on Family History and Smoking Habits
Received: 12-May-2023 Accepted: 26-May-2023 Published: 02-Jun-2023, DOI: 10.31858/0975-8453.14.6.396-400
Copyright: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.