Contact

Address

Department of Computer Science

The University of Sheffield

Regent Court, 211 Portobello Street

Sheffield S1 4DP

United Kingdom

Email

j.k.wang [(åt)] sheffield ({dÖt}) ac {[dôt]} uk

About

I am a Postdoctoral Researcher, currently working on the MultiMT project (led by Prof. Lucia Specia) with the Natural Language Processing Research Group at the Department of Computer Science, The University of Sheffield.

I previously worked on the ERA-NET CHIST-ERA D2K 2011 Visual Sense (ViSen) project, a joint Computer Vision and Natural Language Processing consortium led by Prof. Robert Gaizauskas in Sheffield and coordinated by Dr. Krystian Mikolajczyk from Imperial College London.

I completed my Ph.D. at the School of Computing, University of Leeds, where I worked under the supervision of Dr. Katja Markert (from the Natural Language Processing Group) and the late Dr. Mark Everingham (from the Vision Group). Prior to that, I worked on my M.Sc. dissertation under the supervision of Prof. David Hogg.

Research

My research interests lie at the intersection of two applied areas of Artificial Intelligence: Natural Language Processing and Computer Vision. I am interested in exploring the link between textual and visual data, and in how these two distinct but related modalities can be integrated and used to complement each other for better text and image understanding. I am also interested in the link between natural language models and vision models, and in how they relate to human perception and language acquisition. For example, my Ph.D. work involved learning visual recognition of fine-grained categories using 'free-form' textual descriptions.

Combining Geometric, Textual and Visual Features for Predicting Prepositions in Image Descriptions

[In EMNLP 2015]

Predict the preposition that best expresses the relation between two visual entities in an image.

See project webpage

Learning Models for Object Recognition from Textual Descriptions

Learning Models for Object Recognition from Natural Language Descriptions

[In BMVC 2009]

Can a system automatically learn an object category model solely from a textual description?

See project webpage

Dataset available for download

Publications

  • Don't Mention the Shoe! A Learning to Rank Approach to Content Selection for Image Description Generation (NEW!)
    Josiah Wang, Robert Gaizauskas
    International Natural Language Generation Conference (INLG), 2016
    Download: [ Paper | Poster | BibTeX ]
    @InProceedings{Wang-Gaizauskas:2016:INLG,
        author    = {Josiah Wang and Robert Gaizauskas},
        title     = {Don't Mention the Shoe! A Learning to Rank Approach to Content Selection for Image Description Generation},
        booktitle = {Proceedings of the Ninth International Natural Language Generation Conference (INLG 2016)},
        month     = {September},
        year      = {2016},
        address   = {Edinburgh, UK},
        publisher = {Association for Computational Linguistics}
    }
    					
  • Overview of the ImageCLEF 2016 Scalable Concept Image Annotation Task (NEW!)
    Andrew Gilbert, Luca Piras, Josiah Wang, Fei Yan, Arnau Ramisa, Emmanuel Dellandrea, Robert Gaizauskas, Mauricio Villegas, Krystian Mikolajczyk
    CLEF2016 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, 2016
    Download: [ Paper | Challenge Webpage | BibTeX ]
    @InProceedings{Gilbert-EtAl:2016:ImageCLEF, 
        title = {{Overview of the ImageCLEF 2016 Scalable Concept Image Annotation Task}},
        author = {Andrew Gilbert and Luca Piras and Josiah Wang and Fei Yan and Arnau Ramisa and Emmanuel Dellandrea and Robert Gaizauskas and Mauricio Villegas and Krystian Mikolajczyk},
        booktitle = {CLEF2016 Working Notes},
        series = {{CEUR} Workshop Proceedings}, 
        year = {2016}, 
        volume = {}, 
        publisher = {CEUR-WS.org}, 
        pages = {}, 
        month = {September 5-8}, 
        address = {Évora, Portugal}
    }
    									
  • General Overview of ImageCLEF at the CLEF 2016 Labs (NEW!)
    Mauricio Villegas, Henning Müller, Alba García Seco de Herrera, Roger Schaer, Stefano Bromuri, Andrew Gilbert, Luca Piras, Josiah Wang, Fei Yan, Arnau Ramisa, Emmanuel Dellandrea, Robert Gaizauskas, Krystian Mikolajczyk, Joan Puigcerver, Alejandro H. Toselli, Joan-Andreu Sánchez, Enrique Vidal
    Experimental IR Meets Multilinguality, Multimodality, and Interaction
    Lecture Notes in Computer Science (vol 9822), 2016
    Download: [ Paper (Springer) | ImageCLEF 2016 Webpage | BibTeX ]
    @InCollection{Villegas-EtAl:2016:ImageCLEF,
      title     = {{General Overview of ImageCLEF at the CLEF 2016 Labs}},
      author    = {Villegas, Mauricio and M{\"u}ller, Henning and Garc{\'i}a Seco de Herrera, Alba and Schaer, Roger and Bromuri, Stefano and Gilbert, Andrew and Piras, Luca and Wang, Josiah and Yan, Fei and Ramisa, Arnau and Dellandrea, Emmanuel and Gaizauskas, Robert and Mikolajczyk, Krystian and Puigcerver, Joan and Toselli, Alejandro H. and S{\'a}nchez, Joan-Andreu and Vidal, Enrique},
      booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction},
      series    = {Lecture Notes in Computer Science},
      volume    = {9822},
      year      = {2016},
      editor    = {Fuhr, Norbert and Quaresma, Paulo and  Gon\c{c}alves, Teresa and Larsen, Birger and Balog, Krisztian and Macdonald, Craig and Cappellato, Linda and Ferro, Nicola},  
      publisher = {Springer International Publishing},
      isbn      = {978-3-319-44563-2},
      issn      = {0302-9743}
    }
    					
  • SHEF-Multimodal: Grounding Machine Translation on Images (NEW!)
    Kashif Shah, Josiah Wang, Lucia Specia
    First Conference on Machine Translation (WMT), 2016
    Download: [ Paper | Poster | BibTeX ]
    @InProceedings{Shah-EtAl:2016:WMT,
      author    = {Shah, Kashif and Wang, Josiah and Specia, Lucia},
      title     = {SHEF-Multimodal: Grounding Machine Translation on Images},
      booktitle = {First Conference on Machine Translation, Volume 2: Shared Task Papers},
      series = {WMT},
      year      = {2016},
      address   = {Berlin, Germany},
      pages     = {657--662},
      url       = {http://www.aclweb.org/anthology/W/W16/W16-2363}
    }
    					
  • Large Scale Semi-supervised Object Detection using Visual and Semantic Knowledge Transfer (NEW!)
    Yuxing Tang, Josiah Wang, Boyang Gao, Emmanuel Dellandrea, Robert Gaizauskas, Liming Chen
    Computer Vision & Pattern Recognition (CVPR), 2016
    Download: [ Paper | BibTeX ]
    @InProceedings{Tang-EtAl:2016:CVPR,
      author    = {Yuxing Tang and Josiah Wang and Boyang Gao and Emmanuel Dellandrea and Robert Gaizauskas and Liming Chen},
      title     = {Large Scale Semi-supervised Object Detection using Visual and Semantic Knowledge Transfer},
      booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016)},
      pages     = {2119--2128},
      month     = {June},
      year      = {2016}
    }
    					
  • Cross-validating Image Description Datasets and Evaluation Metrics
    Josiah Wang, Robert Gaizauskas
    Language Resources and Evaluation Conference (LREC), 2016
    @InProceedings{Wang-EtAl:2016:LREC,
      author    = {Josiah Wang and Robert Gaizauskas},
      title     = {Cross-validating Image Description Datasets and Evaluation Metrics},
      booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
      year      = {2016},
      month     = {May},
      date      = {23-28},
      location  = {Portorož, Slovenia},
      editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
      publisher = {European Language Resources Association (ELRA)},
      pages     = {3059--3066},
      address   = {Paris, France},
      isbn      = {978-2-9517408-9-1},
      language  = {english}
    }
    					
  • Harvesting Training Images for Fine-Grained Object Categories using Visual Descriptions
    Josiah Wang, Katja Markert, Mark Everingham
    European Conference on Information Retrieval (ECIR), 2016
    @InProceedings{Wang-EtAl:2016:ECIR,
      author    = {Wang, Josiah and Markert, Katja and Everingham, Mark},
      title     = {Harvesting Training Images for Fine-Grained Object Categories using Visual Descriptions},
      booktitle = {Advances in Information Retrieval - 38th European Conference on {IR}
                   Research, {ECIR} 2016, Padua, Italy, March 20-23, 2016. Proceedings},
      pages     = {549--560},
      month     = {March},
      year      = {2016},
      crossref  = {DBLP:conf/ecir/2016},
      url       = {http://dx.doi.org/10.1007/978-3-319-30671-1_40},
      doi       = {10.1007/978-3-319-30671-1_40},
      timestamp = {Fri, 11 Mar 2016 14:07:43 +0100},
      biburl    = {http://dblp.uni-trier.de/rec/bib/conf/ecir/WangME16},
      bibsource = {dblp computer science bibliography, http://dblp.org}
    }
    					
  • Combining Geometric, Textual and Visual Features for Predicting Prepositions in Image Descriptions
    Arnau Ramisa*, Josiah Wang*, Ying Lu, Emmanuel Dellandrea, Francesc Moreno-Noguer, Robert Gaizauskas (* = equal contribution)
    Empirical Methods in Natural Language Processing (EMNLP), 2015
    @InProceedings{Ramisa-EtAl:2015:EMNLP,
      author    = {Ramisa, Arnau and Wang, Josiah and Lu, Ying and Dellandrea, Emmanuel and Moreno-Noguer, Francesc and Gaizauskas, Robert},
      title     = {Combining Geometric, Textual and Visual Features for Predicting Prepositions in Image Descriptions},
      booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing},
      month     = {September},
      year      = {2015},
      address   = {Lisbon, Portugal},
      publisher = {Association for Computational Linguistics},
      pages     = {214--220},
      url       = {http://aclweb.org/anthology/D15-1022}
    }
    					
  • Defining Visually Descriptive Language
    Robert Gaizauskas, Josiah Wang, Arnau Ramisa
    Workshop on Vision and Language (VL'15) @ EMNLP, 2015
    Download: [ Paper | Poster | Project Webpage | BibTeX ]
    @InProceedings{Gaizauskas-EtAl:2015:VL,
      author    = {Gaizauskas, Robert  and  Wang, Josiah  and  Ramisa, Arnau},
      title     = {Defining Visually Descriptive Language},
      booktitle = {Proceedings of the Fourth Workshop on Vision and Language},
      month     = {September},
      year      = {2015},
      address   = {Lisbon, Portugal},
      publisher = {Association for Computational Linguistics},
      pages     = {10--17},
      url       = {http://aclweb.org/anthology/W15-2805}
    }
    									
  • Generating Image Descriptions with Gold Standard Visual Inputs: Motivation, Evaluation and Baselines
    Josiah Wang, Robert Gaizauskas
    European Workshop on Natural Language Generation (ENLG), 2015

    * There was a bug in our original implementation of the visual prior based on bounding box position. Please refer to the errata for more details.

    @InProceedings{Wang-Gaizauskas:2015:ENLG,
        author    = {Josiah Wang and Robert Gaizauskas},
        title     = {Generating Image Descriptions with Gold Standard Visual Inputs: Motivation, Evaluation and Baselines},
        booktitle = {Proceedings of the 15th European Workshop on Natural Language Generation (ENLG)},
        month     = {September},
        year      = {2015},
        address   = {Brighton, UK},
        publisher = {Association for Computational Linguistics},
        pages     = {117--126},
        url       = {http://www.aclweb.org/anthology/W15-4722}
    }
    					
  • Overview of the ImageCLEF 2015 Scalable Image Annotation, Localization and Sentence Generation Task
    Andrew Gilbert, Luca Piras, Josiah Wang, Fei Yan, Emmanuel Dellandrea, Robert Gaizauskas, Mauricio Villegas, Krystian Mikolajczyk
    CLEF2015 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org, 2015
    Download: [ Paper | Challenge Webpage | BibTeX ]
    @InProceedings{Gilbert-EtAl:2015:ImageCLEF, 
        title = {{Overview of the ImageCLEF 2015 Scalable Image Annotation, Localization and Sentence Generation task}},
        author = {Andrew Gilbert and Luca Piras and Josiah Wang and Fei Yan and Emmanuel Dellandrea and Robert Gaizauskas and Mauricio Villegas and Krystian Mikolajczyk},
        booktitle = {CLEF2015 Working Notes},
        series = {{CEUR} Workshop Proceedings}, 
        year = {2015}, 
        volume = {}, 
        publisher = {CEUR-WS.org}, 
        issn = {1613-0073}, 
        pages = {}, 
        month = {September 8-11}, 
        address = {Toulouse, France}
    }
    									
  • General Overview of ImageCLEF at the CLEF 2015 Labs
    Mauricio Villegas, Henning Müller, Andrew Gilbert, Luca Piras, Josiah Wang, Krystian Mikolajczyk, Alba G. Seco de Herrera, Stefano Bromuri, M. Ashraful Amin, Mahmood Kazi Mohammed, Burak Acar, Suzan Uskudarli, Neda B. Marvasti, José F. Aldana, María del Mar Roldán García
    Experimental IR Meets Multilinguality, Multimodality, and Interaction
    Lecture Notes in Computer Science (vol 9283, pp 444-461), 2015
    @InCollection{Villegas-EtAl:2015:ImageCLEF, 
        title = {General Overview of {ImageCLEF} at the {CLEF} 2015 Labs},
        author={Villegas, Mauricio and M{\"u}ller, Henning and Gilbert, Andrew and Piras, Luca and Wang, Josiah and Mikolajczyk, Krystian and de Herrera, Alba Garc{\'i}a Seco and Bromuri, Stefano and Amin, M. Ashraful and Mohammed, Mahmood Kazi and Acar, Burak and Uskudarli, Suzan and Marvasti, Neda B. and Aldana, Jos{\'e} F. and del Mar Rold{\'a}n Garc{\'i}a, Mar{\'i}a},
        booktitle = {Experimental IR Meets Multilinguality, Multimodality, and Interaction}, 
        year = {2015}, 
        publisher = {Springer International Publishing}, 
        series = {Lecture Notes in Computer Science}, 
        editor={Mothe, Josiane and Savoy, Jacques and Kamps, Jaap and Pinel-Sauvagnat, Karen and Jones, Gareth J. F. and San Juan, Eric and Cappellato, Linda and Ferro, Nicola},
        volume = {9283}, 
        isbn = {978-3-319-24026-8}, 
        issn = {0302-9743}, 
        doi = {10.1007/978-3-319-24027-5_45}, 
        pages = {444--461},
        url = {http://dx.doi.org/10.1007/978-3-319-24027-5_45}
    } 
    									
  • A Poodle or a Dog? Evaluating Automatic Image Annotation Using Human Descriptions at Different Levels of Granularity
    Josiah K. Wang, Fei Yan, Ahmet Aker, Robert Gaizauskas
    Workshop on Vision and Language (VL'14) @ COLING, 2014
    Download: [ Paper | Presentation Slides | BibTeX ]
    @InProceedings{Wang-EtAl:2014:VL,
        author    = {Josiah Wang and Fei Yan and Ahmet Aker and Robert Gaizauskas},
        title     = {A Poodle or a Dog? Evaluating Automatic Image Annotation Using Human Descriptions at Different Levels of Granularity},
        booktitle = {Proceedings of the Third Workshop on Vision and Language},
        month     = {August},
        year      = {2014},
        address   = {Dublin, Ireland},
        publisher = {Dublin City University and the Association for Computational Linguistics},
        pages     = {38--45},
        url       = {http://www.aclweb.org/anthology/W14-5406}
    }
    
    									
  • Learning Models for Object Recognition from Natural Language Descriptions
    Josiah Wang, Katja Markert, Mark Everingham
    British Machine Vision Conference (BMVC), 2009
    @InProceedings{Wang-EtAl:2009:BMVC,
       title = "Learning models for object recognition from natural language descriptions",
       author = "Josiah Wang and Katja Markert and Mark Everingham",
       booktitle = "Proceedings of the British Machine Vision Conference",
       year = "2009"
    }
    									
  • Which English dominates the World Wide Web, British or American?
    Eric Atwell, Junaid Arshad, Chien-Ming Lai, Lan Nim, Noushin Rezapour Asheghi, Josiah Wang, Justin Washtell
    Corpus Linguistics, 2007
    Download: [ Paper | BibTeX ]
    @InProceedings{Atwell-EtAl:2007:CL,
       title = "Which {English} dominates the {W}orld {W}ide {W}eb, {B}ritish or {A}merican?",
       author = "Eric Atwell and Junaid Arshad and Chien-Ming Lai and Lan Nim and Noushin Rezapour Asheghi and Josiah Wang and Justin Washtell",
       booktitle = "Proceedings of Corpus Linguistics 2007",
       year = "2007"
    }
    									

Theses

  • Learning Visual Recognition of Fine-grained Object Categories from Textual Descriptions
    Ph.D. Thesis, 2013
    Download: [ PDF upon request | BibTeX ]
    @PhdThesis{Wang:2013:PHDTHESIS,
        author      = "Josiah Wang",
        title       = "Learning visual recognition of fine-grained object categories from textual descriptions",
        school      = "School of Computing, University of Leeds",
        address     = "England",    
        year        = "2013"
    }
    									
  • Representation and Recognition of Compound Spatio-temporal Entities
    M.Sc. Thesis, 2007
    Download: [ PDF | BibTeX ]
    @MastersThesis{Wang:2007:MSCSTHESIS,
        author      = "Josiah Wang",
        title       = "Representation and recognition of compound spatio-temporal entities",
        school      = "School of Computing, University of Leeds",
        address     = "England",
        year        = "2007"
    }