SEGMENTATION OF LUNG CANCER IMAGE BASED ON CYTOLOGIC EXAMINATION USING THRESHOLDING METHOD

Lung cancer is among the most dangerous cancers; it mostly affects men, and smoking is its leading cause. It is the second leading cause of death after heart attack, and the number of cases increases significantly every year in various countries. Several methods have been established to detect lung cancer, including Computed Tomography of the thorax, sputum examination and cytology examination. The most decisive examination is cytologic examination of the pleural fluid. However, biopsies performed by doctors do not always yield many specimens, which makes it difficult to determine the presence of cancer cells in the lungs. Cytological examination of the pleural fluid also faces difficulty in detecting cells in images. A pleural fluid image with a high density of cells has low detail, while an image with a low density of cells has high detail. Image segmentation is an important part of determining the cellular anatomy of pleural fluid in order to characterize images as cancer or normal. We propose a method that separates objects from one another and highlights the important parts by applying image segmentation to pleural fluid images of patients suspected of having lung cancer. The thresholding methods compared are adaptive thresholding, binary thresholding and Otsu thresholding. The classification results of the three methods show the highest accuracy of 99% with binary thresholding, followed by 97% with Otsu thresholding and the lowest accuracy of 96% with adaptive thresholding; accuracy is considered to increase with the number of epochs.


INTRODUCTION
Lung cancer is among the most dangerous cancers; it mostly affects men, and smoking is its leading cause. It is the second leading cause of death after heart attack. The number of lung cancer cases increases significantly each year in various countries [1]. Several methods have been established to detect lung cancer, including Computed Tomography of the thorax/chest area, sputum examination and cytology examination. Various tests are carried out to confirm cancer in the lungs; the one that provides the highest accuracy is cytological examination of the pleural fluid.
However, biopsies performed by doctors do not always yield many specimens, so it is difficult to determine whether there are indications of cancer cells in the lungs [2].
Cytological examination of the pleural fluid is complex in terms of image detection. Pleural fluid images with a high density of cells have low detail, while images with a low density of cells have high detail [3]. This influences the accuracy of identifying lung cancer cells. Image segmentation is an important part of determining the cellular anatomy of the pleural fluid in order to characterize images as cancer or normal. This study proposes an approach that separates objects from one another by emphasizing the important parts. Image segmentation of the pleural fluid was performed using thresholding. The methods implemented for comparison are binary thresholding, adaptive thresholding and Otsu thresholding. Before thresholding is carried out, the image is processed to enhance the objects, so that they are more easily recognized by the system and the segmentation of pleural fluid objects works better. The results of the thresholding process are classified using a CNN to find the segmentation method with the highest accuracy.
Several studies have discussed image segmentation using thresholding. Rosyani et al. [4] state that object recognition in digital image segmentation is very important for separating objects from the background of an image, so that the system obtains the features it needs. Flower images, the object of that research, are highly complex, so the study proposes the Otsu thresholding method to separate objects from the background. The segmentation process yields shape features in the form of area, eccentricity and perimeter. These features are classified using the Naive Bayes algorithm, which reaches an accuracy of 99.17% with a relative absolute error of 8.093%. The study concludes that segmentation using Otsu thresholding cleans noise better and increases the accuracy of classification with Naive Bayes.
Badriyah et al. [3] applied thresholding to object segmentation on CT scans of the brain in order to determine the characteristics of the stroke type, by finding the anatomy, contour and location of the stroke. The research used an object segmentation approach on CT scans with thresholding and a binarization process, including binary thresholding, adaptive thresholding, Otsu thresholding and binary Otsu thresholding. The best segmentation results in that study were obtained with Otsu thresholding.
Siddique et al. [5] used Otsu's thresholding method for image segmentation. Otsu thresholding works by finding the threshold that maximizes the inter-class variance over the entire image. Thresholding is treated as a statistical decision-making problem in which the average error is reduced. The study found that three-level thresholding produces more information about the image than two-level thresholding, and concluded that image detail increases with the number of thresholding levels.
Heryanto et al. [6] used thresholding for image segmentation, aiming to determine the percentage of the image area occupied by a given color. The results show that thresholding produces an accuracy of more than 80% for green, brown and red, while dark or black objects produce a low accuracy of less than 60%. Accuracy can be increased by changing the lower limit and the threshold values.

MATERIAL AND METHODS
The system design in this study consists of four (4) main parts, namely data preparation, data preprocessing, segmentation using thresholding and performance analysis method as shown in Figure 1.

Data Collection
The first step of the system design is data collection. The data were obtained from RSUD dr. Soetomo Surabaya and consist of pleural fluid images from cytology examinations for detecting lung cancer. The dataset has been labeled Cancer and Normal, and both labels have been validated by doctors. There are 500 images with the Cancer label and 400 with the Normal label, so the total available data is 900. Figure 2 shows the shape of cancer and normal cells. Because the total amount of data is small, augmentation is needed to generate new data without losing image quality. Augmentation modifies an image without losing its essence, so that the computer treats the result as a different image while humans still recognize it as the same image. Augmentation in this study was carried out in the form of flip, brightness and contrast changes, chosen so that the process does not remove much of the essence of the original image. The amount of data was doubled by augmentation, giving 1000 images with the Cancer label and 800 images with the Normal label.
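The three augmentations named above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function name `augment`, the brightness shift of 20 and the contrast factor of 1.2 are hypothetical values, not the settings used in this study.

```python
import numpy as np

def augment(image, brightness=20.0, contrast=1.2):
    """Return flipped, brightened and contrast-adjusted copies of an 8-bit image.

    brightness and contrast are illustrative defaults (assumptions).
    """
    img = image.astype(np.float32)
    flipped = image[:, ::-1].copy()                # horizontal flip
    brighter = np.clip(img + brightness, 0, 255)   # additive brightness shift
    mean = img.mean()
    # Stretch intensities around the image mean to raise contrast
    contrasted = np.clip((img - mean) * contrast + mean, 0, 255)
    return flipped, brighter.astype(np.uint8), contrasted.astype(np.uint8)

# Tiny synthetic image standing in for a pleural fluid image
sample = np.array([[10, 200], [120, 250]], dtype=np.uint8)
flipped, brighter, contrasted = augment(sample)
```

Each output keeps the shape and data type of the input, so augmented images can pass through the same pre-processing steps as the originals.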

Data Pre-Processing
Data pre-processing is the preparation stage for data processing and has an important role in improving the quality of the image data. The pre-processing stage in this study consists of resizing, scaling and grayscale conversion.

Resizing
The first process after the raw data are collected is resizing, whose purpose is to bring the image to the size required by the system [7]. In general, a larger image size gives greater accuracy but a longer training time. The image data available in this study have a size of 4896 x 3672. To reduce the training time, the data need to be resized to a smaller size. In this study the images are resized to 456 x 456 (height and width).
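As a rough illustration of resizing, the sketch below uses nearest-neighbour sampling in NumPy. The interpolation method used in this study is not stated, so nearest-neighbour and the function name `resize_nearest` are assumptions; image libraries usually offer smoother interpolation.

```python
import numpy as np

def resize_nearest(image, out_h, out_w):
    """Resize an image (H x W or H x W x C) by nearest-neighbour sampling."""
    in_h, in_w = image.shape[:2]
    # Map every output row/column back to a source row/column
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows[:, None], cols]

# Small synthetic example: shrink a 4 x 4 image to 2 x 2
small = resize_nearest(np.arange(16).reshape(4, 4), 2, 2)
```

Resizing a 4896 x 3672 image to a square 456 x 456 image does not preserve the aspect ratio, so cells appear slightly stretched along one axis.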

Scaling
Scaling changes the range of the pixel values of an image [7]. In this study, the pixel values of the dataset were rescaled from the range [0, 255] to the range [0, 1]. Scaling also helps to reduce processing time, because a neural network works best with small input values [8].
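The rescaling from [0, 255] to [0, 1] amounts to a single division. A minimal sketch follows; the function name `scale_to_unit` is hypothetical.

```python
import numpy as np

def scale_to_unit(image):
    """Map 8-bit pixel values from [0, 255] to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0

scaled = scale_to_unit(np.array([0, 51, 255], dtype=np.uint8))
```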

Grayscale
Grayscale conversion changes a color image into shades of gray. A gray color in a grayscale image is a color whose R (Red), G (Green) and B (Blue) components have the same intensity. A grayscale image therefore requires only a single intensity value per pixel, whereas a color image requires three [9]. In this study, all images were converted to grayscale to facilitate the retrieval of information from the image.
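One common way to collapse the three color channels into a single intensity is a weighted sum. The sketch below assumes the widely used luminance weights 0.299, 0.587 and 0.114; the weights used in this study are not stated, and the function name `to_grayscale` is hypothetical.

```python
import numpy as np

# Assumed luminance weights for R, G, B (a common convention, not from the study)
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB image to an H x W 8-bit grayscale image."""
    gray = rgb.astype(np.float32) @ LUMA_WEIGHTS
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# A mid-gray pixel keeps its value; a pure red pixel becomes dark gray
pixels = np.array([[[100, 100, 100], [255, 0, 0]]], dtype=np.uint8)
gray = to_grayscale(pixels)
```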

Segmentation using Threshold
At this stage, experimental scenarios are carried out for several thresholding methods in order to find the method that gives the highest accuracy. Thresholding is a process in which a color or grayscale image is divided into regions and converted into a binary image, which consists only of the values 0 and 1.

Fig 3. Thresholding flowchart
Thresholding is used to detect areas based on pixels whose intensities are considered the same [2]. The flowchart of the thresholding process is shown in Figure 3. Thresholding assumes that the pixels of an object differ from those of the background, which allows the threshold value to change with the position in the image. The type of thresholding used affects the output image [8]. This study compares the performance of 3 thresholding methods:

Adaptive Thresholding
Adaptive thresholding applies a local threshold to different image areas; it is also known as local or dynamic thresholding. Applying a threshold to an image makes it possible to distinguish the pixel values of an object from those of the background. Adaptive thresholding is applicable here because the cell image data have RGB values with intensities varying from 0 to 255. The threshold is therefore expected to separate objects from the background and to separate the boundary of each cell [10].
The adaptive thresholding technique works by calculating the local average along the lines of the image using a recursive filter. The local threshold value in adaptive thresholding is computed over the processed block W. The result of the adaptive thresholding process is shown in Figure 4.
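As a minimal sketch of the local-threshold idea, the Python code below marks a pixel as foreground when it exceeds the mean of its surrounding block W. It is an illustration under assumptions: it uses a square window mean computed with a summed-area table rather than the recursive line filter described above, and the function name, the block size and the offset `c` are hypothetical choices, not values from this study.

```python
import numpy as np

def adaptive_mean_threshold(gray, block=3, c=0.0):
    """Return 1 where a pixel exceeds the mean of its block x block window minus c."""
    if block % 2 == 0:
        raise ValueError("block must be odd")
    pad = block // 2
    # Replicate border pixels so every pixel has a full window
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    # Summed-area table with a leading row and column of zeros
    integral = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    integral[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = gray.shape
    # Sum of each window from four corner lookups in the table
    window_sum = (integral[block:block + h, block:block + w]
                  - integral[:h, block:block + w]
                  - integral[block:block + h, :w]
                  + integral[:h, :w])
    local_mean = window_sum / (block * block)
    return (gray > local_mean - c).astype(np.uint8)

# One bright pixel on a dark background
spot = np.zeros((3, 3), dtype=np.uint8)
spot[1, 1] = 90
mask = adaptive_mean_threshold(spot)
```

Because the threshold follows the local mean, a uniformly lit region produces no foreground unless the offset `c` is positive.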

Binary Thresholding
Binary thresholding is used to separate objects of interest from overlapping backgrounds. It works by finding the boundary between the object and the background, each of which has its own pixels; the boundary is determined by the gray value of the pixels where the image changes sharply [11]. Binary thresholding works optimally for character segmentation on a homogeneously illuminated background. The result of the binary thresholding process is shown in Figure 5.
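In its simplest form, binary thresholding compares every pixel with one global value. The sketch below assumes a threshold of 127, which is an illustrative default and not the value used in this study; the function name `binary_threshold` is hypothetical.

```python
import numpy as np

def binary_threshold(gray, threshold=127):
    """Return 1 for pixels brighter than the global threshold, else 0."""
    return (gray > threshold).astype(np.uint8)

row = np.array([[0, 127, 128, 255]], dtype=np.uint8)
binary = binary_threshold(row)
```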

Otsu Thresholding
Otsu thresholding is a simple thresholding method used in segmentation; it divides the image into homogeneous areas based on similarity criteria in order to identify an object. Binary image grouping assumes that the histogram of the image contains two classes [12]. The method is well suited to finding the threshold value of grayscale images and thus produces good image segmentation. The area of the segmented object is obtained quite accurately using the grayscale histogram. The threshold is found by computing the between-class variance for each candidate value; the candidate that gives the highest between-class variance is used as the threshold value (k). The result of the Otsu thresholding process is shown in Figure 6.
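The search for the threshold can be sketched as follows, using the standard textbook form of the between-class variance, sigma_B^2(k) = (mu_T * omega(k) - mu(k))^2 / (omega(k) * (1 - omega(k))), where omega(k) is the fraction of pixels at or below level k, mu(k) is the cumulative mean up to k and mu_T is the overall mean. The notation and the function name `otsu_threshold` are assumptions and may differ from those used by the authors.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level k that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)            # probability of the class with levels <= k
    mu = np.cumsum(prob * levels)      # cumulative mean up to level k
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    sigma_b2 = np.zeros(256)
    valid = denom > 0                  # skip thresholds that leave one class empty
    sigma_b2[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b2))

# Synthetic bimodal image: dark background (50) and bright objects (200)
bimodal = np.array([[50, 50, 200, 200], [50, 200, 50, 200]], dtype=np.uint8)
k = otsu_threshold(bimodal)
segmented = (bimodal > k).astype(np.uint8)
```

For a clearly bimodal histogram, any threshold between the two modes gives the same maximum variance; `np.argmax` returns the first such level.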

Analysis Performance Method
Classification is used to assess whether the results of image segmentation give a high accuracy value and which thresholding method is suitable for the images. After the images have gone through data collection and processing, the next step is to classify them through training and testing in order to determine the class that matches the characteristics of the data. The use of a deep learning method is expected to have a significant effect on the data tested by the system.
The CNN is expected to perform well with the EfficientNet architecture. EfficientNet is considered to be better than other CNN architectures; the model in this study uses it with 1 Flatten layer and 1 output layer, so that it can provide higher accuracy and better efficiency than other CNN architectures.

RESULT AND DISCUSSION
This section shows the results of the performance analysis of each thresholding method. The experiment was conducted with 900 data consisting of cancer and normal labels. The images are segmented using 3 different thresholding methods and then classified using the CNN EfficientNet model with a batch size of 16, a learning rate of 0.005 and a varying number of epochs, in order to determine which method has the highest accuracy. The test results are compared and conclusions are drawn regarding the effectiveness of each segmentation method.
The results of the first experiment, using adaptive thresholding, can be seen in Table 1. Adaptive thresholding reaches 96% accuracy at both 30 and 50 epochs; the number of epochs makes no difference. The accuracy and loss plots of the model are shown in Figure 7. The figure shows that the accuracy on the training data is higher than on the validation data, almost reaching 100%, and that the loss on the training data is close to 0. Therefore, there is no indication of overfitting.
The results of the second experiment, using binary thresholding, can be seen in Table 2. Binary thresholding reaches 98% accuracy at 30 epochs and increases by 1% at 50 epochs, to 99%. The accuracy and loss plots of the model are shown in Figure 8. The figure shows that the accuracy on the training data is higher than on the validation data, almost reaching 100%, and that the loss on the training data is close to 0. Therefore, there is no indication of overfitting.
The results of the third experiment, using Otsu thresholding, can be seen in Table 3. Otsu thresholding reaches 96% accuracy at 30 epochs and increases by 1% at 50 epochs, to 97%. The accuracy and loss plots of the model are shown in Figure 9. The figure shows that the accuracy on the training data is higher than on the validation data, almost reaching 100%, and that the loss on the training data is close to 0. Therefore, there is no indication of overfitting.
The accuracy of the three methods is compared in Table 4. The best results are obtained with 50 epochs. For binary and Otsu thresholding, accuracy increases as epochs are added. This is in line with the research of Rezoana et al., who state that a higher number of epochs lowers the loss and increases the accuracy of a CNN, so that the resulting image detection can be effective and accurate [13].
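For reference, accuracy is commonly computed from the four cells of a two-class confusion matrix as the share of correctly classified images. The sketch below assumes this usual definition; the function name is hypothetical and the counts are invented for illustration only, they are not results from this study.

```python
def accuracy_from_confusion(tp, tn, fp, fn):
    """Accuracy = correctly classified samples / all samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total

# Invented counts, NOT taken from Tables 1 to 3
example_accuracy = accuracy_from_confusion(tp=95, tn=85, fp=5, fn=15)
```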
The following is a comparison chart of the accuracy results from trial I.

Fig 10. Comparison of accuracy value
Based on the graph in Figure 10, the experimental scenario using CNN classification shows good performance. This can be seen from the accuracy obtained with each method, as well as from the absence of indications of overfitting at the training stage. The CNN algorithm provides fairly high accuracy. This is because a CNN performs feature extraction internally, so it can give good results even without a separate feature extraction step. Suartika et al. proposed the CNN classification method for images because it is considered reliable enough for classifying image data. A CNN performs convolution operations in its main underlying layers. Convolution on image data is used for feature extraction and produces a linear transformation of the input data [14]. Using more epochs is considered to increase accuracy, although not significantly. The highest accuracy is obtained with the binary thresholding method at both 30 and 50 epochs. This is because binary thresholding separates objects from overlapping backgrounds and sets the boundary where the image changes sharply, so the resulting objects look prominent and specific and are more easily recognized by the system. The second highest accuracy is obtained with Otsu thresholding, and the lowest with adaptive thresholding. This is because adaptive thresholding calculates the local average along the lines of the image using a recursive filter, so the objects in the resulting image are less distinct and less well recognized by the system.

CONCLUSION
This research proposes an approach that separates objects from one another by emphasizing the important parts. Image segmentation of pleural fluid is carried out with three thresholding methods: binary thresholding, adaptive thresholding and Otsu thresholding. The experimental results and analysis show that all three thresholding methods perform well when the segmented images are classified with a CNN. The number of epochs and the choice of method affect the level of accuracy, and accuracy tends to increase with the number of epochs. In this experiment, the highest accuracy was obtained at 50 epochs with binary thresholding, at 99%, followed by Otsu thresholding at 97%, and the lowest was adaptive thresholding at 96%.

Fig 2. The cells of (a) cancer (b) normal

Fig 7. Accuracy and loss with adaptive thresholding

Fig 8. Accuracy and loss with binary thresholding

Fig 9. Accuracy and loss with Otsu thresholding

Table 1. Matrix comparison using adaptive thresholding

Table 2. Matrix comparison using binary thresholding

Table 3. Matrix comparison using Otsu thresholding

Table 4. Comparison of accuracy value