An overview of Pedestrian Attribute Recognition (PAR): introduction, datasets, and algorithms
Intro
Foreword
- (reference https://arxiv.org/pdf/1901.07474.pdf https://github.com/wangxiao5791509/Pedestrian-Attribute-Recognition-Paper-List)
What is Pedestrian Attribute Recognition
- Pedestrian attributes are human-searchable semantic descriptions that can serve as soft biometrics in visual surveillance, with applications in person re-identification, face verification, and human identification. Pedestrian attribute recognition (PAR) aims at mining the attributes of a target person given a person image, as shown below.
Traditional practice and more
- Traditional pedestrian attribute recognition methods usually focus on developing robust feature representations from the perspectives of hand-crafted features, powerful classifiers, or attribute relations. Milestones include HOG [1], SIFT [2], SVM [3], and the CRF model [4]. However, reports on large-scale benchmark evaluations suggest that the performance of these traditional algorithms is far from the requirements of realistic applications. Over the past several years, deep learning has achieved impressive performance thanks to automatic feature extraction via multi-layer nonlinear transformations, especially in computer vision, speech recognition, and natural language processing. Several deep-learning-based attribute recognition algorithms have been proposed on top of these breakthroughs.
- In short: the traditional recipe is hand-crafted features plus strong classifiers or attribute-relation models (HOG, SIFT, SVM, CRF), but these fall short on large-scale benchmarks. In recent years deep learning, with its powerful automatic feature extraction, has delivered impressive results, and pedestrian attribute recognition is no exception.
PROBLEM FORMULATION AND CHALLENGES
- Multi-views: the appearance of the same subject differs across viewpoints
- Occlusion
- Unbalanced Data Distribution: imbalanced data and label distributions
- Low Resolution
- Illumination: lighting varies drastically over the course of a day
- Blur
Evaluation Criteria
REGULAR PIPELINE FOR PAR
- Multi-task Learning
- Deep-learning-based multi-task learning comes in two flavors: hard and soft parameter sharing. Hard parameter sharing usually takes the shallow layers as shared layers to learn common feature representations across tasks, and treats the high-level layers as task-specific layers that learn more discriminative patterns. This mode is the most popular framework in the deep learning community (Figure 4, left sub-figure). In soft parameter sharing (Figure 4, right sub-figure), each task is trained independently, but the parameters of different tasks are kept similar via regularization constraints such as L2 distance [53] and trace norm [54].
- In deep learning, multi-task learning is typically split into hard and soft parameter sharing
- Hard sharing: one shared backbone followed by task-specific heads (see the sketch after this list)
- Soft sharing: tasks are trained almost independently, and cross-task links come from making the parameters of different tasks similar (e.g. via L2 distance or trace norm)
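- A minimal hard-parameter-sharing sketch in PyTorch, with illustrative layer sizes (not from any particular paper): a shared trunk learns common features, and each task keeps its own head.

```python
import torch.nn as nn

class HardSharingNet(nn.Module):
    # Shared shallow layers learn common features; each task has its own head.
    def __init__(self, num_tasks, feat_dim=512, classes_per_task=2):
        super().__init__()
        self.shared = nn.Sequential(                       # shared trunk
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU())
        self.heads = nn.ModuleList(                        # task-specific layers
            [nn.Linear(feat_dim, classes_per_task) for _ in range(num_tasks)])

    def forward(self, x):
        f = self.shared(x)
        return [head(f) for head in self.heads]            # one output per task
```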
- Multi-label Learning
- problem transformation
- binary relevance algorithm
- transforms the problem into multiple independent binary classifications
- simple and intuitive
- ignores correlations between labels
- classifier chain algorithm
- transforms the problem into a chain of binary classifications, each conditioned on the previous predictions (a minimal sketch follows this list)
- calibrated label ranking algorithm
- transforms the problem into a label ranking problem
- random k-Labelsets algorithm
- transforms the problem into multi-class classification over groups of labels
- algorithm adaptation
- multi-label k-nearest neighbour
- multi-label decision tree
- ranking support vector machine Rank-SVM
- collective multilabel classifier
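- A minimal classifier-chain sketch (illustrative, not tied to any specific paper): the k-th binary classifier sees the image feature plus the k-1 previous predictions, so label dependencies propagate along the chain.

```python
import torch
import torch.nn as nn

class ClassifierChain(nn.Module):
    # One linear binary classifier per label; classifier k also consumes the
    # k previous sigmoid outputs, encoding label dependencies.
    def __init__(self, feat_dim, num_labels):
        super().__init__()
        self.clfs = nn.ModuleList(
            [nn.Linear(feat_dim + k, 1) for k in range(num_labels)])

    def forward(self, feat):
        preds = []
        for clf in self.clfs:
            inp = torch.cat([feat] + preds, dim=1)         # feature + earlier labels
            preds.append(torch.sigmoid(clf(inp)))
        return torch.cat(preds, dim=1)                     # (B, num_labels)
```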
APPLICATIONS
- Visual attributes can be seen as a kind of mid-level feature representation which may provide important information for high-level human-related tasks, such as person re-identification [128], [129], [130], [131], [132], pedestrian detection [133], person tracking [134], person retrieval [135], [136], human action recognition [137], and scene understanding [138]. Due to the limited space of this paper, we only review some works in the rest of this subsection.
- Beyond these tasks, pedestrian attributes are also used as tags for profiling individuals and crowds, which sees wide use in commercial scenarios.
Algorithm
Global Image-based Method
ACN
- paper Person Attribute Recognition with a Jointly-trained Holistic CNN Model
- They adopt a pre-trained AlexNet as the basic feature-extraction sub-network and replace the last fully connected layer with one loss per attribute using the KL-loss.
- In addition, they propose a new dataset named PARSE-27k to support their evaluation. This dataset contains 27,000 pedestrians annotated with 10 attributes. Unlike regular person attribute datasets, they introduce a new annotation category, not decidable (N/A), because in many input images some attributes cannot be decided due to occlusion, image boundaries, or other reasons.
- One of the earliest papers; the idea is simple and direct, and it introduced the new PARSE-27k dataset for evaluation
DeepSAR/DeepMAR
- paper Multi-attribute Learning for Pedestrian Attribute Recognition in Surveillance Scenarios
- git https://github.com/dangweili/pedestrian-attribute-recognition-pytorch
- Single Attribute Recognition (SAR)
- The DeepSAR model treats each attribute as an independent component and recognizes attributes one by one.
- Multi-attribute Recognition (MAR)
- To better utilize the relationship among attributes, the unified multi-attribute jointly learning model (DeepMAR) is proposed to learn all the attributes at the same time.
- Different from DeepSAR, the input of the DeepMAR is an image with its attribute label vector and the loss function considers all the attributes jointly.
- Experiment
- Modeling relations among attributes helps mean accuracy, but on individual attributes some improve while others degrade (a hedged sketch of a DeepMAR-style weighted loss follows this section)
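- A minimal sketch of a DeepMAR-style weighted sigmoid cross-entropy. The exact weighting form below (positives weighted by exp((1 - p_l)/sigma^2), negatives by exp(p_l/sigma^2), with p_l the positive ratio of attribute l) is an assumption; check the paper for the precise formula.

```python
import torch
import torch.nn.functional as F

def weighted_bce(logits, targets, pos_ratio, sigma=1.0):
    # pos_ratio[l]: fraction of positive samples for attribute l, shape (A,).
    # Rare positive attributes get a larger weight, countering label imbalance.
    w_pos = torch.exp((1.0 - pos_ratio) / sigma ** 2)
    w_neg = torch.exp(pos_ratio / sigma ** 2)
    weights = targets * w_pos + (1 - targets) * w_neg
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)
```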
MTCNN
- paper Multi-task CNN Model for Attribute Prediction
- Because our model requires more than one CNN model, we remove the last fully connected layers, as we substitute these layers with our own joint MTL objective loss, depending on the weight parameter matrix learned within.
- In short: the original networks' final FC layers are removed and replaced with the authors' joint MTL objective loss for joint training
- Feature Sharing and Competition in MTL
Attention-based Method
HydraPlus-Net
- paper HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis
- git https://github.com/xh-liu/HydraPlus-Net
- Introduced the PA-100K dataset
- In pedestrian analysis, different targets call for features at different semantic levels and scales
- Semantic level: attributes that distinguish people yet look quite similar at a glance, e.g. long vs. short hair, long vs. short sleeves
- Low level: attributes such as clothing stripes are computed better from low-level features than from high-level ones
- Scale: some attributes hinge on the hands while others need the whole body, so the scale varies widely
- Built from a Main Net (M-Net) and an Attentive Feature Net (AF-Net): M-Net produces features and AF-Net produces attention masks; the inception blocks behind the F functions are copies of the three inception blocks in M-Net (M-Net is pre-trained first; its body is the then-popular inception-v2)
- An MDA example: for each inception block in M-Net, attention interacts with all three inception levels to yield new outputs
- As the figure above shows, when the original output is combined with neighboring features: attending with low-level features gives finer but noisier maps, while attending with high-level features gives more holistic, more concentrated maps with less detail; different tasks can pick what they need
VeSPA
- paper Deep view-sensitive pedestrian attribute inference in an end-to-end model
- We adapt a deep neural network for joint pose and multi-label attribute classification. The overall design of our approach is shown in Figure 1. The main network is based on the GoogleNet inception architecture [20]. As shown, the network contains a view classification branch and three view-specific attribute predictor units. The view classifier and attribute predictors are both trained with separate loss functions. Prediction scores from weighted view-specific predictors are aggregated to generate the final multi-class attribute predictions. The whole network is a unified framework and is trained in an end-to-end manner.
- Mainly uses the GoogleNet inception architecture
- Splits into three branches (front, back, side) to predict attributes
- An extra branch classifies the view as front/back/side, and its scores weight the three branches; the weighted predictions are aggregated into the final result (a minimal sketch follows this section)
- This is where front/back/side comes in; the figure below illustrates it
- The results are very strong, even beating the much fancier HP-Net
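- A minimal sketch of VeSPA-style view-weighted aggregation, with assumed names and a plain linear head per view (the paper's predictor units are deeper):

```python
import torch
import torch.nn as nn

class ViewWeightedHead(nn.Module):
    # A view classifier (front/back/side) gates three view-specific
    # attribute predictors; the gated sum is the final prediction.
    def __init__(self, feat_dim, num_attrs, num_views=3):
        super().__init__()
        self.view_cls = nn.Linear(feat_dim, num_views)
        self.attr_heads = nn.ModuleList(
            [nn.Linear(feat_dim, num_attrs) for _ in range(num_views)])

    def forward(self, feat):
        view_w = torch.softmax(self.view_cls(feat), dim=1)              # (B, V)
        preds = torch.stack([h(feat) for h in self.attr_heads], dim=1)  # (B, V, A)
        return (view_w.unsqueeze(-1) * preds).sum(dim=1)                # (B, A)
```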
DIAA
- paper Deep Imbalanced Attribute Classification using Visual Attention Aggregation
- Simple and direct; let's look at what is actually different
- A special attention mechanism: one path is a sigmoid-gated weighting layer, the other a regular convolution plus spatial regularization
- A spatial softmax provides per-map normalization; the results (panel iv) show the attention does become more concentrated
- A focal-loss variant in which Wc is a per-class weight tied to the attribute distribution, different for each class; the authors call this Deep Imbalanced Classification (a hedged sketch follows this section)
- The two non-primary branches use a regular classification loss, and the standard deviation of previously recorded values serves as a coefficient that scales the loss up
- Experimental results on PETA are quite good
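- A minimal sketch of a per-class weighted focal loss in the spirit of DIAA's Deep Imbalanced Classification; deriving class_weight from the attribute distribution is an assumption here, and the paper's exact form may differ.

```python
import torch
import torch.nn.functional as F

def weighted_focal_loss(logits, targets, class_weight, gamma=2.0):
    # class_weight[c]: per-attribute weight, e.g. derived from the positive
    # ratio of attribute c in the training set (assumed).
    p = torch.sigmoid(logits)
    pt = targets * p + (1 - targets) * (1 - p)   # probability of the true label
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    return (class_weight * (1 - pt) ** gamma * ce).mean()
```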
CAM
- paper Human attribute recognition by refining attention heat map
- git https://github.com/hguosc/Human-attribute-recognition-by-refining-attention-heat-map
- A demonstration of the effect
- The backbone is shared across attributes and branches split at the FC layers; a weighted average layer produces the attention map ("The customization of the weighted average layer is used to embed the attention heat map into the network training and the modification of the fully-connected layer is used to output both features and weights."), after which the usual pipeline yields the result
- So the exponential loss mainly pushes the maximum response higher, making the attention more concentrated
- Only WIDER results are reported, improving over the baseline
Curriculum Learning Method
MTCT
- paper Multi-Task Curriculum Transfer Deep Learning of Clothing Attributes
- Training has two stages: stage one is regular multi-task attribute learning (MTN) with ordinary classification losses; stage two rebuilds the trained MTN into a triplet network trained with the t-distribution Stochastic Triplet Embedding (t-STE) loss
- The paper targets clothing attributes rather than pedestrians, so we do not expand on it
CILICIA
- paper Curriculum learning for multi-task classification of visual attributes
- The main idea: first learn the correlations between tasks, then define the next stage's task group, looping until only two tasks remain
- Something like the figure: a multi-stage scheme, no longer the mainstream approach
Graph based Method
DCSA
- paper Describing clothing by semantic attributes
- The overall idea holds up even today for a 2012 paper: first run pose estimation to get rough regions, extract features per region, classify each attribute, then build a multi-attribute graph and run CRF inference for the final result. The specific tools (SIFT, SVM, etc.) no longer apply today, so we do not expand
A-AOG
- paper Attribute and-or grammar for joint parsing of human pose, parts and attributes
- Parses the human body, builds a hierarchy over a predefined tree, and predicts each part's own attributes; the showcased results look impressive
- 14 low-level parts, 2 mid-level parts, and one root part are defined. Each part carries 9 associated attributes, linked across parts by attribute relation chains
- Ideally, a single attribute ends up prominently represented by a few specific parts
- The paper dismisses the loss with a single mention of CE, and the results are not on the common benchmarks; it mainly offers ideas for reference
PAR w/ GCN
- paper Multi-Label Image Recognition with Graph Convolutional Networks
- git https://github.com/2014gaokao/pedestrian-attribute-recognition-with-GCN
- Figure (b) is the framework from the original paper and (c) is from the git project; the principle is the same. The upper branch extracts features with a backbone; the lower GCN branch models label relations; a dot product of the two gives the result. So what are the GCN's inputs and outputs, and what exactly does it accomplish?
- For an introduction to GCNs, see https://www.zhihu.com/question/54504471/answer/332657604
- Below: the GCN code and the GloVe embeddings
```python
import math
import torch
import torch.nn as nn
from torch.nn import Parameter

# The GCN layer: changes the output dimension via matrix multiplication
class GraphConvolution(nn.Module):
    def __init__(self, in_features, out_features, bias=False):
        super(GraphConvolution, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(in_features, out_features))
        if bias:
            self.bias = Parameter(torch.Tensor(1, 1, out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input, adj):
        support = torch.matmul(input, self.weight)  # project node features
        output = torch.matmul(adj, support)         # aggregate over graph neighbors
        if self.bias is not None:
            return output + self.bias
        else:
            return output

    def __repr__(self):
        return self.__class__.__name__ + ' (' \
            + str(self.in_features) + ' -> ' \
            + str(self.out_features) + ')'

# "GloVe" here actually means word2vec-style embeddings of the raw label
# names; the paper calls them GloVe. select_name / select come from the
# repo's data preparation.
word_to_ix = {j: i for i, j in enumerate(select_name)}
embeds = nn.Embedding(60, 300)
word2vec = torch.tensor([])
for i in range(len(select)):
    lookup_tensor = torch.tensor([word_to_ix[select_name[i]]], dtype=torch.long)
    embed = embeds(lookup_tensor)
    word2vec = torch.cat((word2vec, embed), 0)
```
- This figure shows what the GCN is after: essentially these conditional probabilities between labels. Below is how adj is initialized
```python
import pickle
import numpy as np
import torch

# Build label co-occurrence counts over the training set
# (partition / dataset / concur come from the repo's data preparation)
for idx in partition['train'][0]:
    t = np.array(dataset['att'][idx])[dataset['selected_attribute']]
    for i in range(len(t)):
        if t[i] == 1:
            for j in range(len(t)):
                if t[j] == 1 and j != i:
                    concur[i][j] += 1

# Initialize A: normalize counts into conditional probabilities, binarize
# at threshold t, then add the identity
def gen_A(num_classes, t, adj_file):
    result = pickle.load(open(adj_file, 'rb'))
    _adj = result['adj']
    _nums = result['nums']
    _nums = _nums[:, np.newaxis]
    _adj = _adj / _nums          # co-occurrence counts -> conditional probabilities
    _adj[_adj < t] = 0
    _adj[_adj >= t] = 1
    # _adj = _adj * 0.9 / (_adj.sum(0) + 1e-6)
    _adj = _adj + np.identity(num_classes, int)
    return _adj

# At run time the corresponding matrix operation is the symmetric
# normalization D^(-1/2) A D^(-1/2)
def gen_adj(A):
    D = torch.pow(A.sum(1).float(), -0.5)
    D = torch.diag(D)
    adj = torch.matmul(torch.matmul(A, D).t(), D)
    return adj
```
- adj is the adjacency matrix describing the label graph; it is preprocessed before use (a usage sketch follows)
- Results on RAP (the original paper did not target this task, so it reports no such numbers). Judging by the results it holds up very well, essentially reaching SOTA
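- A minimal usage sketch in the spirit of ML-GCN, with assumed dimensions (300-d label embeddings, a 2048-d backbone feature) and assumed threshold/file names: a two-layer GCN turns label embeddings into per-attribute classifiers, which are dot-multiplied with image features.

```python
# Hypothetical glue code; ML-GCN uses LeakyReLU between GCN layers, ReLU is
# used here for brevity. image_features is assumed to come from the backbone.
gc1 = GraphConvolution(300, 1024)
gc2 = GraphConvolution(1024, 2048)
A = torch.FloatTensor(gen_A(num_classes=60, t=0.4, adj_file='adj.pkl'))
adj = gen_adj(A)

x = torch.relu(gc1(word2vec, adj))          # (60, 1024) label-node features
classifiers = gc2(x, adj)                   # (60, 2048): one classifier per label
logits = image_features @ classifiers.t()   # (B, 2048) x (2048, 60) -> (B, 60)
```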
VSGR
- paper Visual-semantic graph reasoning for pedestrian attribute recognition
- A visual-to-semantic sub-network and a semantic-to-visual sub-network are fused to produce the result
- No open-source code so far; the paper's pipeline has many intricate steps and is hard to understand with full clarity
- Experiment
- Worth revisiting once the code is open-sourced
Loss Function based Method
WPAL
- paper Weakly-supervised Learning of Mid-level Features for Pedestrian Attribute Recognition and Localization
- Structurally, the main contributions are multi-level features + FSPP (Flexible Spatial Pyramid Pooling) + a weighted CE loss
- Level 1 convolves over the whole image; level 2 pools features with 9 preset bins
- Wi is the relative frequency of label i in the dataset, used to rebalance the data
- Proposes a new evaluation metric, IoP, which few have adopted
- Performs well on RAP, but the method is remarkably non-robust for hair attributes
AWMT
- paper Adaptively weighted multi-task deep network for person attribute classification
- git https://github.com/qiexing/adaptive_weighted_attribute
- Structurally just a regular ResNet-50; both train and val data are fed in, and the trick lives in the loss
- The objective is to minimize Θ, where λ are updatable hyper-weights and Lkj is a threshold
- The main algorithm has two parts (a hedged sketch follows this section)
- Forward both train and val batches; train_loss * λ is the loss that gets backpropagated, and the weights λ are refreshed whenever the update period is reached
- How λ is updated
- So val data is still being used, and it even participates indirectly in the backward pass
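- A minimal sketch of adaptively weighted multi-task training in the spirit of AWMT; the normalization used to update λ from validation losses is an assumption (the paper's exact rule differs in detail). The intent: tasks that do worse on validation get larger weights and thus more gradient.

```python
import torch

num_tasks = 10                               # assumed number of attribute tasks
lambdas = torch.ones(num_tasks)              # per-task weights, updated periodically

def total_train_loss(train_losses):
    # train_losses: list of per-task losses on the training batch
    return sum(w * l for w, l in zip(lambdas, train_losses))

def update_lambdas(val_losses):
    # val_losses: per-task losses on a validation batch; harder tasks
    # (higher val loss) receive larger weights
    v = torch.stack([l.detach() for l in val_losses])
    lambdas.copy_(num_tasks * v / v.sum())   # keep the weights summing to num_tasks
```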
Part-based Method
Poselets
- paper Describing People: A Poselet-Based Approach to Attribute Classification
- First uses poselets to decompose the body into several patches
- poselet paper http://www.cs.utexas.edu/~cv-fall2012/slides/dinesh-paper.pdf
- We use the method of Bourdev et al. [3] to train 1200 poselets using images from the training and validation sets. Instead of all poselets having the same aspect ratios, we used four aspect ratios: 96x64, 64x64, 64x96 and 64x128 and trained 300 poselets of each. For each poselet, during training, we build a soft mask for the probability of each body component (such as hair, face, upper clothes, lower clothes, etc) at each location within the normalized poselet patch (Figure 5) using body component annotations on the H3D dataset [4].
- Followed by 3 SVMs for classification
RAD
PANDA
- paper PANDA: Pose Aligned Networks for Deep Attribute Modeling
- Very similar to Poselets, but with dedicated CNN support
- With the dedicated CNNs, performance improves considerably
MLCNN
- paper Multi-label CNN Based Pedestrian Attribute Learning for Soft Biometrics
- Directly splits an image into 15 overlapping patches, convolves each, and predicts each label from specific patches; e.g. the mid-hair label uses patches 1, 2, and 3
AAWP
- paper Actions and Attributes from Wholes and Parts
- This paper brings in R-CNN detection to locate the whole body, head, upper body, and lower body in the image, then crops them out for attribute recognition
- For detection, Gaussian-pyramid-processed images are the input, improving results through multiple scales; already very close to FPN in spirit
- The classifier is still the plain CNN + SVM recipe of the era
ARAP
- paper Attribute Recognition from Adaptive Parts
- Derives parts from keypoints, then predicts each part's attributes
- Keypoint detection and attribute recognition share a backbone: keypoints are detected first, giving part regions of interest; the feature maps are upsampled and aligned to the part regions; finally FC classification gives the result
- Notably, the authors add not only a regression loss on the keypoints but also a loss constraint on the bbox outputs; the form is a bit odd, but it does improve performance
- Unsurprisingly, precise localization boosts attribute recognition, and the ratio loss's correction of the bboxes in particular makes the method more complete
DeepCAMP
- paper Deepcamp: Deep convolutional action & attribute mid-level patterns
- The blotchy colors in this illustration are cluster assignments; read on
- Unlike the later attention-based unsupervised mechanisms, this paper applies unsupervised clustering to the per-patch features
- The pseudocode is clear and concise: extract features, update the clusterer, score, and output. Notably, LDA is used as the dimensionality-reduction tool before clustering
- The Mid-level Deep Patterns Network is shown above; not marked in the figure is that the authors use Fast R-CNN with an RoI-pooling layer after conv4, RoI-pooling the two patches of interest plus the whole person and concatenating them for the result
- Substantially outperforms PANDA
PGDM
- paper Pose Guided Deep Model for Pedestrian Attribute Recognition in Surveillance Scenarios
- The overall idea matches 2016's ARAP, with improvements in the network design; the combined objective is sketched after this section
- Lm: main-net classification loss weighted by the heat map
- Lr: smooth-L1 regression loss on the keypoints
- Lp: classification loss after fusion
- Shows across-the-board wins over HP-Net on PETA and RAP
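- A hedged sketch of combining PGDM's three loss terms; alpha and beta are assumed trade-off weights, and the per-term forms (especially the heat weighting) follow the paper only loosely.

```python
import torch.nn.functional as F

def pgdm_loss(main_logits, fused_logits, targets,
              pred_kpts, gt_kpts, heat_weight, alpha=1.0, beta=1.0):
    # Lm: main-branch classification loss scaled by heat-map weights
    l_m = (heat_weight * F.binary_cross_entropy_with_logits(
        main_logits, targets, reduction='none')).mean()
    l_r = F.smooth_l1_loss(pred_kpts, gt_kpts)                        # Lr: keypoints
    l_p = F.binary_cross_entropy_with_logits(fused_logits, targets)   # Lp: fused
    return l_m + alpha * l_r + beta * l_p
```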
DHC
- paper Human Attribute Recognition by Deep Hierarchical Contexts
- Released the WIDER Attribute dataset, a multi-person, multi-attribute dataset
- An input image is processed through a Gaussian pyramid and fed into the network, yielding 1. the whole target person 2. detections of the target's parts 3. multi-scale features of each part's neighborhood 4. the full image
- The probabilities are multiplied together for the final result
LGNet
- paper Localization guided learning for pedestrian attribute recognition
- git https://github.com/lpzjerry/Pedestrian-Attribute-LGNet
- A good taxonomy: a. original image b. fixed-template grid partitioning c. keypoint-based d. activation-map methods (attention-like) e. localization-guided methods
- 1. The top branch is the global branch, which outputs attributes
- 2.1 Convolving the feature maps gives activation maps; combined with Edge Boxes proposals, the maps are expanded over multiple scales and aspect ratios into classification features; 2.2 conv features + Edge Boxes proposals are RoI-pooled into proposal-based features; 2.3 the Edge Boxes features and the classification features combine into the final attribute results
- 3. The global branch and the Edge Boxes activation branch fuse for the final result
- Edge Boxes
Sequential Prediction based Method
CNN-RNN
- paper Cnn-rnn: A unified framework for multi-label image classification
- Proposes learning label relations in a joint embedding space: "The framework learns a joint embedding space to characterize the image-label relationship as well as label dependency"
- Mainly strengthens results through relations between label latent vectors; not a pedestrian-attribute-specific paper, so it reports no corresponding results
JRL
- paper Attribute recognition by joint recurrent learning of context and correlation
- On one side, an LSTM improves the contextual links between brute-force image slices; on the other, an attribute-relation LSTM in the spirit of CNN-RNN is used; the loss is plain CE
- The results show a clear advantage over the various CNN-RNN variants
GRL
- paper Grouping attribute recognition for pedestrian with joint recurrent learning
- git https://github.com/slf12/GRLModel
- One branch uses an FCN to produce skeleton keypoints, and a region proposal layer then yields head / upper-body / lower-body regions and their features; the other branch feeds the raw image through the backbone for whole-body features, followed by several FC branches. An LSTM relates the features to each other to produce the final result
- The loss carries positive-term weights to counter label imbalance
- The single GRL model beats the JRL ensemble; the main edge is presumably the more precise semantics from skeleton keypoints compared with JRL's brute-force slicing
JCM
- paper Sequence-based person attribute recognition with joint ctc-attention model
- A very brief overview and structure; the main contributions are the CTC loss and the attention model
- The CTC loss is a joint conditional-probability loss: the idea is to raise label accuracy through label relations
- The attention model gets no dedicated figure, only this one paragraph of description, which is not very clear
- Performs well not only on PETA but also on re-id, and the experiments confirm that the re-id and attribute recognition tasks reinforce each other
- The paper is instructive overall, but the lack of a detailed attention-model description and of open-source code is a pity
RCRA
- paper Recurrent attention model for pedestrian attribute recognition
- Starts by introducing ConvLSTM, whose main difference from LSTM is replacing Linear with Conv to preserve spatial information (a minimal cell sketch follows this section)
- The first proposed network is called RC, a regular recurrent structure, designed to mine label correlations from mid-level convolutional feature maps ("A Recurrent Convolutional (RC) framework is proposed to mine the attribute correlations from mid-level convolutional feature maps of attribute groups.")
- The second network, RA, is very similar to RC; it is designed so that attention both within and across attribute groups helps recognition ("And a Recurrent Attention (RA) framework is formulated to recognise pedestrian attributes by group step by step in order to pay attention to both the intra-group and inter-group attention relationship.")
- Both work well; RC and RA each win under different metrics
- This figure is about the order of label prediction: experiments show a global-to-local prediction order beats a random order
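- A minimal ConvLSTM cell sketch: identical to a standard LSTM cell except that the input/state transforms are convolutions, so the hidden state keeps its spatial layout (the cell actually used in RCRA may differ in details).

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # one convolution produces all four gates at once
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size, padding=pad)

    def forward(self, x, state):
        h, c = state                           # hidden/cell maps, shape (B, C, H, W)
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)          # same update as LSTM, but convolutional
        h = o * torch.tanh(c)
        return h, c
```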
Dataset
overview
- Ranked by popularity in the literature
- PETA, RAP, RAP 2.0 (released in 2019; newer papers use it), PA-100K
- Market-1501 and DukeMTMC (mainly used jointly with re-id)
- Others
PETA
- home http://mmlab.ie.cuhk.edu.hk/projects/PETA.html (released by CUHK's Department of Information Engineering in 2014)
- paper http://mmlab.ie.cuhk.edu.hk/projects/PETA_files/Pedestrian%20Attribute%20Recognition%20At%20Far%20Distance.pdf
- download (Dropbox; may require a proxy) https://www.dropbox.com/s/52ylx522hwbdxz6/PETA.zip?dl=0
- The capability of recognizing pedestrian attributes, such as gender and clothing style, at far distance, is of practical interest in far-view video surveillance scenarios where face and body close-shots are hardly available. We make two contributions in this paper. First, we release a new pedestrian attribute dataset, which is by far the largest and most diverse of its kind. We show that the large-scale dataset facilitates the learning of robust attribute detectors with good generalization performance. Second, we present the benchmark performance by SVM-based method and propose an alternative approach that exploits context of neighboring pedestrian images for improved attribute inference.
RAP Richly Annotated Pedestrian
- home (currently unreachable) http://rap.idealtest.org/
- git https://github.com/dangweili/RAP
- 1.0-paper https://arxiv.org/pdf/1603.07054.pdf
- dataset application PDF: RAP V1.0 Database License Agreement.pdf
- application e-mail: jiajian2018@ia.ac.cn
- RAP has in total 41,585 pedestrian samples, each of which is annotated with 72 attributes as well as viewpoints, occlusions, body parts information.
- 2.0-paper https://ieeexplore.ieee.org/document/8510891
- dataset application PDF: RAP V2.0 Database License Agreement.pdf
- application e-mail: jiajian2018@ia.ac.cn
- RAP is a large-scale dataset which contains 84928 images with 72 types of attributes and additional tags of viewpoint, occlusion, body parts, and 2589 person identities.
PA-100K
- from SenseTime
- paper https://arxiv.org/pdf/1709.09930.pdf
- git https://github.com/xh-liu/HydraPlus-Net
- download-baiduyun https://pan.baidu.com/s/1l-5a__OTwZVkhm_A16HraQ#list/path=%2F
- download-googledrive https://drive.google.com/drive/folders/0B5_Ra3JsEOyOUlhKM0VPZ1ZWR2M
- we construct a new large-scale pedestrian attribute (PA) dataset named as PA-100K with 100,000 pedestrian images from 598 scenes, and therefore offer a superiorly comprehensive dataset for pedestrian attribute recognition. To our best knowledge, it is to-date the largest dataset for pedestrian attribute recognition.
Market-1501 Attribute & DukeMTMC-attribute
- paper https://arxiv.org/pdf/1703.07220.pdf
- We have manually labeled a set of pedestrian attributes for the Market-1501 dataset and the DukeMTMC-reID dataset.
- download Market-1501 https://drive.google.com/file/d/1kbDAPetylhb350LX3EINoEtFsXeXB0uW/view
- download-annotation-git https://github.com/vana77/Market-1501_Attribute (The annotations are contained in the file market_attribute.mat. “gallery_market.mat” is one prediction example. Then download the code “evaluate_market_attribute.m” in this repository, change the image path and run it to evaluate.)
attribute | representation in file | label |
---|---|---|
gender | gender | male(1), female(2) |
hair length | hair | short hair(1), long hair(2) |
sleeve length | up | long sleeve(1), short sleeve(2) |
length of lower-body clothing | down | long lower body clothing(1), short(2) |
type of lower-body clothing | clothes | dress(1), pants(2) |
wearing hat | hat | no(1), yes(2) |
carrying backpack | backpack | no(1), yes(2) |
carrying bag | bag | no(1), yes(2) |
carrying handbag | handbag | no(1), yes(2) |
age | age | young(1), teenager(2), adult(3), old(4) |
8 colors of upper-body clothing | upblack, upwhite, upred, uppurple, upyellow, upgray, upblue, upgreen | no(1), yes(2) |
9 colors of lower-body clothing | downblack, downwhite, downpink, downpurple, downyellow, downgray, downblue, downgreen, downbrown | no(1), yes(2) |
- download DukeMTMC baiduyun https://pan.baidu.com/s/1jS0XM7Var5nQGcbf9xUztw (password: bhbh)
- download DukeMTMC Google Drive https://drive.google.com/open?id=1jjE85dRCMOgRtvJ5RQV9-Afs-2_5dY3O
- download-annotation-git https://github.com/vana77/DukeMTMC-attribute (The annotations are contained in the file duke_attribute.mat.)
attribute | representation in file | label |
---|---|---|
gender | gender | male(1), female(2) |
length of upper-body clothing | top | short upper body clothing(1), long(2) |
wearing boots | boots | no(1), yes(2) |
wearing hat | hat | no(1), yes(2) |
carrying backpack | backpack | no(1), yes(2) |
carrying bag | bag | no(1), yes(2) |
carrying handbag | handbag | no(1), yes(2) |
color of shoes | shoes | dark(1), light(2) |
8 colors of upper-body clothing | upblack, upwhite, upred, uppurple, upgray, upblue, upgreen, upbrown | no(1), yes(2) |
7 colors of lower-body clothing | downblack, downwhite, downred, downgray, downblue, downgreen, downbrown | no(1), yes(2) |
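- Both annotation files are MATLAB structs; a hedged sketch for inspecting them with scipy (the field layout inside the struct is not documented here, so print it first):

```python
from scipy.io import loadmat

# Load with struct_as_record=False so nested structs become attribute-style objects
mat = loadmat('market_attribute.mat', squeeze_me=True, struct_as_record=False)
print([k for k in mat.keys() if not k.startswith('__')])   # top-level variable names
anno = mat['market_attribute']        # assumption: the variable is named like the file
print(anno._fieldnames)               # e.g. the train/test splits and attribute arrays
```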
WIDER
- home http://mmlab.ie.cuhk.edu.hk/projects/WIDERAttribute.html
- paper http://personal.ie.cuhk.edu.hk/~ccloy/files/eccv_2016_human.pdf
- download-images https://drive.google.com/file/d/0B-PXtfvNMLanWEVCaHZnR0RHSlE/view
- download-annotation http://mmlab.ie.cuhk.edu.hk/projects/WIDERAttribute_files/wider_attribute_annotation.zip
- WIDER Attribute is a large-scale human attribute dataset. It contains 13789 images belonging to 30 scene categories, and 57524 human bounding boxes each annotated with 14 binary attributes.