Iranian Society of Machine Vision and Image Processing
Journal of Machine Vision and Image Processing, ISSN 2383-1197, Vol. 10, No. 1, 2023-03-21

Unsupervised Domain Adaptation in Person Reidentification by Learning the Features of Both Source and Target Domains
Pages 1-15, Article 159221, Journal Article, 2022-10-20
- Saba Sadat Faghih Imani, MS in Information Technology Engineering, Deep Learning Research Lab, Department of Computer Engineering, College of Farabi, University of Tehran
- Kazim Fouladi, Deep Learning Research Lab, Department of Computer Engineering, College of Farabi, University of Tehran
- Hossein Aghababa, Department of Computer Engineering, College of Farabi, University of Tehran

Abstract: The person reidentification problem aims to retrieve images of one person from images captured by non-overlapping cameras. Despite the successful performance of deep person reidentification models, performance usually drops when a model is tested on a different, unlabeled dataset. In this paper, a well-generalized model for unsupervised domain adaptation in person reidentification is proposed. The model uses both a labeled source dataset and an unlabeled target dataset during training, and the goal is to generalize well on the unlabeled target domain. To this end, the model is optimized with three loss functions: one for supervised learning of the source domain's features, another for unsupervised learning of the target domain's features, and a triplet loss for learning the features of both source and target domains. With strategy 2 for selecting neighbors, the proposed model achieves 84.5% rank-1 accuracy and 63% mAP on the Duke -> Market setting, and 70.1% rank-1 accuracy and 49.1% mAP on the Market -> Duke setting.
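As an illustration of the triplet term described above, here is a minimal sketch of the standard triplet margin loss in NumPy; the margin value and toy embeddings are hypothetical examples, not taken from the paper:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss on Euclidean distances.

    Pulls the anchor embedding toward a same-identity (positive) sample
    and pushes it away from a different-identity (negative) sample.
    """
    d_ap = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_an = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_ap - d_an + margin, 0.0)

# Toy 2-D embeddings: the positive lies near the anchor, the negative far away.
a, p, n = np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-1.0, 0.0])
print(triplet_loss(a, p, n))  # 0.0: the margin constraint is already satisfied
```

Applied to mini-batches drawn from both domains, a loss of this shape encourages a single embedding space to serve source and target images alike, which is consistent with the paper's stated goal.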
PDF: https://jmvip.sinaweb.net/article_159221_224c9145fa1ac9a9e100c6b97f416df2.pdf

A two-stream action recognition method based on complementary traditional and deep features
Pages 17-31, Article 154176, Journal Article, 2022-07-29
- Atefe Moradyani, MSc student of computer engineering, Faculty of Engineering, University of Kurdistan, Sanandaj, Iran
- Mohsen Ramezani, Department of Computer Engineering, Faculty of Engineering, University of Kurdistan, Sanandaj, Iran
- Fardin Akhlaghian Tab, Department of Computer Engineering, Faculty of Engineering, University of Kurdistan, Sanandaj, Iran
- Rahmatollah Mirzaei, Department of Electrical Engineering, Faculty of Engineering, University of Kurdistan, Sanandaj, Iran

Abstract: Today, human action recognition is an important research field used in many applications, and much computer-vision research has focused on improving its recognition accuracy. In this paper, a two-stream method is introduced that incorporates a new structure combining two complementary spatial features, so that each covers the other's defects. In the first stream, wavelet coefficients of key-frames are extracted at a suitable multi-resolution; deep features of the same key-frames are extracted for the other stream. The features in each stream are gathered into a spatial feature map. The temporal changes in both streams are learned with a new deep network, and the classification information of the two streams is combined to produce an accurate action label. The method is evaluated on three challenging real-video datasets, UCFYT, UCF-sport, and JHMDB, achieving accuracies of 98.7, 99.83, and 92.86 percent, respectively.
On average, the proposed method performs about 4.6 percent better than the best previously introduced method.
PDF: https://jmvip.sinaweb.net/article_154176_729db704d5038e1645d36ea9f0e5ebe9.pdf

Covid-19 Detection based on Multi-Source Adversarial Transfer Learning and Center Loss Function
Pages 33-48, Article 154817, Journal Article, 2022-08-14
- Hadi Alhares, PhD student of computer engineering, Faculty of Electrical and Computer Engineering, University of Tabriz, Iran
- Jafar Tanha, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran
- Mohammad Ali Balafar, Faculty of Electrical and Computer Engineering, University of Tabriz, Tabriz, Iran

Abstract: In recent years, deep learning techniques have been widely used to diagnose diseases. In the diagnosis of Covid-19, however, insufficient data prevents the model from being trained properly, which reduces its generalizability. To address this, data from several different sources can be combined using transfer learning. In this paper, to improve the transfer learning technique and achieve better generalization across multiple data sources, we propose a multi-source adversarial transfer learning model. In this method, the network, while trying to classify the data correctly, tries to make the representations of the source and target datasets as similar as possible, achieving quantitatively and qualitatively better results on both datasets. We also use the center loss function to train the model, which helps to better distinguish classes from each other. We show that accuracy can be improved using the proposed framework, surpassing the results of current successful transfer learning approaches. The proposed method achieves 2, 15, 15, and 8% improvements over the best results of the other compared methods in accuracy, precision, recall, and F1, respectively. The implementation code is available at https://github.com/HadiAlhares/Covid19
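The center loss mentioned in the abstract has a standard formulation: half the mean squared distance of each feature to its class center. The sketch below is illustrative; the toy features and centers are hypothetical, not the paper's configuration:

```python
import numpy as np

def center_loss(features, labels, centers):
    """Center loss: half the mean squared distance of each feature
    vector to the center of its own class.

    Minimizing it pulls same-class features together, which helps
    separate classes when combined with a classification loss.
    """
    diffs = features - centers[labels]              # (N, D) per-sample offsets
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

feats = np.array([[1.0, 0.0], [0.0, 1.0]])    # two feature vectors
labels = np.array([0, 1])                     # their class ids
centers = np.array([[1.0, 0.0], [0.0, 0.0]])  # learned class centers
print(center_loss(feats, labels, centers))    # 0.25
```

In practice the centers are updated during training along with the network weights; that update rule is omitted here.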
PDF: https://jmvip.sinaweb.net/article_154817_16094d1bc7e1f32b26ccf85f8c0a3fe2.pdf

Mass Detection in Automated Three Dimensional Breast Ultrasound using Improved Inception 3D U-Net
Pages 49-59, Article 154823, Journal Article, 2022-08-14
- Sepideh Barekatrezaei, PhD student of computer engineering, Iran University of Science and Technology, Tehran, Iran
- Amin Malekmohammadi, MSc student of computer engineering, Iran University of Science and Technology, Tehran, Iran
- Ehsan Kozegar, Dept. of Engineering, University of Guilan, Guilan, Iran
- Masoumeh Salamati
- Mohsen Soryani, Dept. of Computer Engineering, Iran University of Science and Technology, Tehran, Iran (ORCID 0000-0002-8555-9617)

Abstract: Breast cancer is the leading cause of cancer death among women in most countries, and early detection has a significant effect on reducing mortality. Automated three-dimensional breast ultrasound (3D ABUS) is a type of imaging recently used alongside mammography for the early detection of breast cancer. A 3D volume includes many slices, and a radiologist must examine all of them to find a mass, which is time-consuming and prone to mistakes. Many computer-aided detection (CAD) systems have therefore been proposed to assist radiologists in mass detection. In this paper, the 3D U-Net architecture is improved by placing two types of modified Inception modules in the encoder and is used to detect masses in 3D ABUS images. The first Inception module, located in the first layer of the encoder, generates various three-dimensional features with two different fields of view. The second module, placed in the subsequent layers of the encoder, extracts line-wise and plane-wise features. The dataset contains 60 3D ABUS volumes from 43 patients and includes 55 masses. The proposed network achieves a sensitivity of 92.9% with 22.75 false positives per patient.
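The line-wise and plane-wise kernels described above can be viewed as a factorization of a full 3-D kernel. A small sketch of the parameter-count arithmetic follows; the channel width of 32 is a hypothetical example, not the paper's configuration:

```python
def conv3d_params(c_in, c_out, kernel):
    """Parameter count of a 3-D convolution layer with bias."""
    kd, kh, kw = kernel
    return c_out * (c_in * kd * kh * kw + 1)

c = 32  # example channel width (assumption, not from the paper)
full = conv3d_params(c, c, (3, 3, 3))              # one full 3x3x3 kernel
line_plus_plane = (conv3d_params(c, c, (3, 1, 1))  # line-wise (across slices)
                   + conv3d_params(c, c, (1, 3, 3)))  # plane-wise (within a slice)
print(full, line_plus_plane)  # 27680 12352: the factorized pair is much cheaper
```

This cost gap is one common motivation for splitting 3-D kernels into line and plane components in volumetric encoders.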
PDF: https://jmvip.sinaweb.net/article_154823_7b63ce84567f6607adb5e76ab6daac18.pdf

The effect of image normalization and iteration number of the linear despeckle filtering on the structural similarity criteria of the consecutive ultrasound images of the common carotid artery
Pages 61-74, Article 156310, Journal Article, 2022-09-08
- Effat Soleimani, Department of Radiology Technology, Shahid Beheshti University of Medical Sciences (ORCID 0000-0003-3376-7323)
- Hazhir Saberi, Imaging Center of Imam Khomeini Hospital, Tehran University of Medical Sciences

Abstract: The aim of the present study is to evaluate the effect of image normalization and of the iteration number of linear despeckle filtering on the quality of consecutive ultrasound images of the carotid artery, and to select the optimum number of despeckling iterations.
A total of 750 consecutive ultrasound images over three cardiac cycles of the common carotid artery of three healthy male volunteers (32±9 yr), and 250 consecutive ultrasound images over three cardiac cycles of the common carotid artery of one male volunteer (65 yr) with atherosclerotic stenosis, were recorded. Using a custom-written program in MATLAB, the images were first normalized based on the gray-scale levels of the blood and the adventitia. A linear despeckle filter was then applied to the normalized images for up to 10 iterations. The quality of the images processed with different iteration counts was evaluated with metrics including mean, variance, signal-to-noise ratio, relative contrast, speckle noise index, contrast-to-speckle ratio, and structural similarity. The results show that, among all evaluated metrics, structural similarity is the only one that is not monotonic in the iteration number: as iterations increase, it first rises and then falls. The optimum number of despeckling iterations is the one that maximizes structural similarity. According to the results, 2 to 5 iterations of linear filtering with a 5×5 kernel are required to reach the maximum structural similarity; further increasing the iteration count causes loss of image texture as well as higher computational cost.
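The iterative filtering loop described above can be sketched as follows. The 5×5 box filter and the single-window SSIM are simplified stand-ins: standard SSIM averages over local windows, the study's exact filter is not reproduced here, and the random array is a placeholder for an ultrasound frame. Note that this toy setup scores each iteration against the unfiltered frame, whereas the study compared consecutive frames, so the rise-then-fall behavior it reports should not be expected here:

```python
import numpy as np

def mean_filter(img, size=5):
    """One pass of a size x size linear (box) despeckle filter, reflect-padded."""
    p = size // 2
    padded = np.pad(img, p, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = padded[i:i + size, j:j + size].mean()
    return out

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """Single-window (global) SSIM between two images."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    return num / den

img = np.random.default_rng(0).random((32, 32))  # stand-in for an ultrasound frame
scores, frame = [], img
for it in range(1, 11):          # the study evaluated up to 10 iterations
    frame = mean_filter(frame)
    scores.append(ssim_global(img, frame))
```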
PDF: https://jmvip.sinaweb.net/article_156310_2ae84f54aa3be1b320a1d2a7918f33b3.pdf

Automatic image captioning using capsule neural network and ELMo embedding technique
Pages 75-91, Article 160256, Journal Article, 2022-11-09
- Shima Javanmardi, PhD student of computer science, Yazd University (ORCID 0000-0002-3027-5895)
- Ali Mohammad Latif, Department of Computer Engineering, Yazd University, Yazd, Iran
- Mohammad Taghi Sadeghi, Department of Electrical Engineering, Yazd University, Yazd, Iran

Abstract: Automatic image captioning is a challenging task in computer vision that aims to generate computer-understandable descriptions of images. Convolutional neural networks (CNNs) play a key role in image caption generation.
However, when generating descriptions for an image, CNNs face two major limitations: they do not consider the relationships and spatial hierarchical structures between the objects in the image, and they lack robustness to rotational changes of the images. To address these challenges, this paper presents an improved capsule network that describes image content in natural language by considering the relations between objects. A capsule contains a set of neurons that capture the state parameters of objects in the image, such as size, direction, scale, and the relationships of objects to each other. These capsules focus on extracting meaningful features for use in generating relevant descriptions for a given set of images. Tests on the MS-COCO dataset using the capsule network and the ELMo embedding technique yield a 2-5% improvement in the evaluated metrics compared to existing image captioning models.
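The capsule behavior described above rests on a vector nonlinearity. Below is a minimal sketch of the standard squashing function from capsule networks (Sabour et al.); it illustrates the general mechanism only, not this paper's architecture:

```python
import numpy as np

def squash(s, eps=1e-9):
    """Capsule squashing nonlinearity.

    Preserves the vector's direction and maps its length into [0, 1),
    so the length can be read as the probability that the entity the
    capsule represents is present.
    """
    norm_sq = np.sum(s ** 2)
    scale = norm_sq / (1.0 + norm_sq)          # shrinks short vectors, saturates long ones
    return scale * s / np.sqrt(norm_sq + eps)  # eps guards against division by zero

v = squash(np.array([3.0, 4.0]))  # input length 5
print(np.linalg.norm(v))          # ~0.9615 (= 25/26), just below 1
```

Because direction is preserved, the capsule's pose information (size, orientation, scale) survives the nonlinearity, which is what lets capsules encode object state rather than mere presence.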
PDF: https://jmvip.sinaweb.net/article_160256_b219b6b2dde59970511ca2d4fbe433be.pdf