Recently I started preparing glaucoma images for AI/CV applications. The Computer Vision Annotation Tool (CVAT) was deployed locally, and fundus images were uploaded into an organizational project within CVAT. Each image is annotated as follows:
- Grading [Tag]
  - Glaucoma Status [SELECT RADIO BUTTON]
    - —– [DEFAULT]
    - Normal
    - Suspect
    - Glaucoma
    - Not Gradable
    - Other Retinal
  - Grader Remarks [TEXT]
  - Consultant Remarks [TEXT]
Constructor Code
```json
[
  {
    "name": "Grading",
    "id": 60,
    "color": "#97af27",
    "type": "tag",
    "attributes": [
      {
        "id": 215,
        "name": "Glaucoma Status",
        "input_type": "radio",
        "mutable": false,
        "values": [
          "—-",
          "Normal",
          "Suspect",
          "Glaucoma",
          "Not Gradable",
          "Other Retinal"
        ],
        "default_value": "—-"
      },
      {
        "id": 216,
        "name": "Grader Remarks",
        "input_type": "text",
        "mutable": false,
        "values": [
          ""
        ],
        "default_value": ""
      },
      {
        "id": 217,
        "name": "Consultant Remarks",
        "input_type": "text",
        "mutable": false,
        "values": [
          ""
        ],
        "default_value": ""
      }
    ]
  }
]
```
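The CSV columns used later are simply these attribute names with spaces replaced by underscores. Below is a minimal sketch, assuming you paste (a copy of) the constructor JSON into Python, that derives those column names and the allowed radio choices from it. The trimmed copy drops the ids, the colour and the default placeholder value, and the helper names `attribute_columns` and `radio_choices` are hypothetical.

```python
import json

# Trimmed, hypothetical copy of the label constructor JSON above:
# ids, colour and the default placeholder value are left out.
CONSTRUCTOR = """
[
  {"name": "Grading", "type": "tag",
   "attributes": [
     {"name": "Glaucoma Status", "input_type": "radio",
      "values": ["Normal", "Suspect", "Glaucoma", "Not Gradable", "Other Retinal"]},
     {"name": "Grader Remarks", "input_type": "text", "values": [""]},
     {"name": "Consultant Remarks", "input_type": "text", "values": [""]}
   ]}
]
"""


def attribute_columns(constructor_json):
    """Map each attribute name to a CSV-friendly column name (spaces -> underscores)."""
    labels = json.loads(constructor_json)
    return {attr["name"]: attr["name"].replace(" ", "_")
            for label in labels for attr in label["attributes"]}


def radio_choices(constructor_json):
    """Return the allowed values of every radio attribute, keyed by attribute name."""
    labels = json.loads(constructor_json)
    return {attr["name"]: set(attr["values"])
            for label in labels for attr in label["attributes"]
            if attr["input_type"] == "radio"}


columns = attribute_columns(CONSTRUCTOR)
choices = radio_choices(CONSTRUCTOR)
print(columns)
print(sorted(choices["Glaucoma Status"]))
```

The derived names match the `Glaucoma_Status`, `Grader_Remarks` and `Consultant_Remarks` columns used in the export script, and the choice set can be used to spot unexpected values in the exported data.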
Exporting the attribute annotations
I chose the CVAT for Images format because it supports all attributes. Export can be done at the project level (for all tasks in that project) or at the task level (for only that task and its jobs).
The annotations were exported and saved as a ZIP file. The ZIP file contains an annotations.xml file that holds both the metadata and the annotation data.
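To make the element paths easier to follow, here is a hypothetical, heavily trimmed annotations.xml containing only the parts the conversion relies on (`meta/project/name`, `meta/dumped`, and `image > tag > attribute`). Real exports carry many more elements, and the sample values are invented. It also shows one detail worth knowing: an attribute left empty comes back from lxml as `None`, not as an empty string.

```python
from lxml import etree

# Hypothetical, trimmed sample: only the elements the CSV conversion reads are kept.
SAMPLE_XML = b"""<annotations>
  <meta>
    <project>
      <name>Glaucoma grading</name>
    </project>
    <dumped>placeholder timestamp</dumped>
  </meta>
  <image id="0" name="fundus_001.jpg">
    <tag label="Grading">
      <attribute name="Glaucoma Status">Suspect</attribute>
      <attribute name="Grader Remarks">Disc margin unclear</attribute>
      <attribute name="Consultant Remarks"></attribute>
    </tag>
  </image>
</annotations>"""

root = etree.fromstring(SAMPLE_XML)
project_name = root.find('meta/project/name').text
image = root.findall('.//image')[0]
image_name = image.attrib.get('name')
status = image.find('.//tag/attribute[@name="Glaucoma Status"]').text
consultant = image.find('.//tag/attribute[@name="Consultant Remarks"]').text

print(project_name, image_name, status)
# An empty attribute element has no text, so lxml returns None here
print(consultant is None)
```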
How can the annotations be converted into a simple CSV file that can be easily shared and analysed in Stata or similar tools?
Python to the rescue
It took half a day to figure out the following code:
```python
import csv
from zipfile import ZipFile

from lxml import etree

startFile = 'PROJECT_EXPORT.zip'

# Open the exported ZIP file, list its contents and extract them
# (named zf so that Python's built-in zip() is not shadowed)
with ZipFile(startFile, 'r') as zf:
    zf.printdir()
    zf.extractall()

# Parse the extracted XML and keep the root element
tree = etree.parse('annotations.xml')
root = tree.getroot()

# METADATA (these paths match a project-level export)
set_name = root.find('meta/project/name').text
set_created = root.find('meta/project/created').text
set_updated = root.find('meta/project/updated').text
set_dumped = root.find('meta/dumped').text

fieldnames = ['set_name', 'image_name', 'Glaucoma_Status', 'Grader_Remarks',
              'Consultant_Remarks', 'set_created', 'set_updated', 'set_dumped']

# Open the CSV file in 'w' mode; the with block closes it when done
with open(startFile + '.csv', 'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    # Loop over images and get the relevant tag attributes
    for image in root.findall('.//image'):
        name = image.attrib.get('name')
        Glaucoma_Status = image.find('.//tag/attribute[@name="Glaucoma Status"]').text
        Grader_Remarks = image.find('.//tag/attribute[@name="Grader Remarks"]').text
        Consultant_Remarks = image.find('.//tag/attribute[@name="Consultant Remarks"]').text
        data = {
            'image_name': name,
            'Glaucoma_Status': Glaucoma_Status,
            'Grader_Remarks': Grader_Remarks,
            'Consultant_Remarks': Consultant_Remarks,
            'set_name': set_name,
            'set_created': set_created,
            'set_updated': set_updated,
            'set_dumped': set_dumped,
        }
        print(data['Glaucoma_Status'])  # progress check on the console
        writer.writerow(data)
```
Let us break it down:
- Open the exported ZIP file and extract its contents
- Parse the XML using lxml's ElementTree API and save the root element in a variable
- Extract the metadata from the relevant XML nodes (change these as you see fit)
- Open a new CSV file named after the start file
- Write the CSV header using a dictionary writer (csv.DictWriter)
- Use the findall method on the root element to identify all ‘image’ elements
- Loop over the image elements and extract the three relevant attributes along with the image name
- Create a ‘data’ Python dictionary mapping the image-level fields as well as the metadata fields
- Write the ‘data’ dictionary to the CSV file
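Two variations on the first and third steps, sketched as hypothetical helpers (`load_root` and `export_meta` are my own names): parsing annotations.xml directly from the ZIP so nothing is extracted to disk, and falling back to `meta/task` when the export was made at task level rather than project level. The `meta/task` path is an assumption about task-level exports; check it against your own annotations.xml.

```python
import io
from zipfile import ZipFile

from lxml import etree


def load_root(zip_source):
    """Parse annotations.xml straight from the export ZIP (path or file object)."""
    with ZipFile(zip_source) as zf:
        with zf.open('annotations.xml') as f:
            return etree.parse(f).getroot()


def export_meta(root):
    """Return name/created/updated/dumped for a project- or task-level export."""
    source = root.find('meta/project')
    if source is None:
        # Assumption: task-level exports keep these fields under meta/task
        source = root.find('meta/task')
    return {
        'set_name': source.findtext('name'),
        'set_created': source.findtext('created'),
        'set_updated': source.findtext('updated'),
        'set_dumped': root.findtext('meta/dumped'),
    }


# Demo with a hypothetical in-memory task-level export
buf = io.BytesIO()
with ZipFile(buf, 'w') as zf:
    zf.writestr('annotations.xml',
                '<annotations><meta><task><name>Task A</name>'
                '<created>t1</created><updated>t2</updated></task>'
                '<dumped>t3</dumped></meta></annotations>')

meta = export_meta(load_root(buf))
print(meta)
```

Parsing from the archive in memory also avoids a stale annotations.xml lingering in the working directory when several exports are processed one after another.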
Done!
TBD: error handling and stripping line breaks from the text fields.
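Both TBD items could be handled with one small helper, sketched here under my own assumptions (the name `attribute_text` is hypothetical). It returns an empty string when an image has no Grading tag (where `find()` returns `None` and the original `.text` call would raise an AttributeError) or when the attribute is empty, and it collapses line breaks so each remark stays on a single line. In the main loop, each `image.find(...).text` call would become `attribute_text(image, 'Glaucoma Status')` and so on.

```python
from lxml import etree


def attribute_text(image, attr_name):
    """Return an attribute's text with line breaks collapsed, or '' if absent.

    Handles images with no Grading tag (find() returns None) and attributes
    left empty (text is None). attr_name must not contain double quotes,
    because it is placed inside the path expression.
    """
    node = image.find(f'.//tag/attribute[@name="{attr_name}"]')
    if node is None or node.text is None:
        return ''
    # split() with no argument splits on any whitespace, including \r and \n,
    # so this also collapses repeated spaces into one
    return ' '.join(node.text.split())


# Hypothetical images: one with a multi-line remark, one never tagged
graded = etree.fromstring(
    b'<image name="a.jpg"><tag label="Grading">'
    b'<attribute name="Grader Remarks">Cup enlarged\nplease recheck</attribute>'
    b'</tag></image>'
)
untagged = etree.fromstring(b'<image name="b.jpg"/>')

remark = attribute_text(graded, 'Grader Remarks')
missing = attribute_text(untagged, 'Glaucoma Status')
print(repr(remark), repr(missing))
```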