一.问题描述.
1. 标题:
【专项测试】【T卡】【>10%】将SD卡格式化为内部存储后选择立即迁移数据,再安装应用,然后数据迁移回内部存储,最后将SD卡格式化为便携式存储,出现手机重启.
2. 前提条件: 插入Kingston 256G SD卡,不插SIM卡
3.【测试步骤】:
(1)插SD卡,格式化为内部存储后立即迁移数据;
(2)连接电脑,选择文件传输模式,将apk文件拖到SD卡中;
(3)进入设置-存储-SD卡-系统自带的文件管理器,点击拖到的3个apk进行安装;
(4)在待机界面分别进入安装的应用后,概览界面清除后台;
(5)进入设置-存储-设备,选择迁移数据;
(6)设置-存储-SD卡,选择将SD卡格式化为便携式存储
4.预期结果:可以成功格式化为便携式存储
5.实际结果:格式化进度在20%时卡住,出现手机自动重启现象.
二.日志分析:
发生了watchdog ,阻塞在ActivityManagerService.monitor()函数. Watchdog 监听android.fg 进程,当长时间阻塞时,会被watchdog 杀掉,因为是system server 进程,所以手机就发生了重启.
08-18 15:21:18.726 8517 8541 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on display thread (android.display)
08-18 15:21:18.734 8517 8541 W Watchdog: android.fg annotated stack trace:
08-18 15:21:18.770 0 0 I [T313686] kworker/3: 1: [name:ufs_mediatek_mod&]ufsshpb_resume:3873 info: ufsshpb_lu 2 resume. do_task_work 0 retry 0
08-18 15:21:18.770 0 0 I [T313686] kworker/3: 1: [name:ufs_mediatek_mod&]ufstw_resume:686 info: ufstw resume start.
08-18 15:21:18.734 8517 8541 W Watchdog: at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:16273)
08-18 15:21:18.734 8517 8541 W Watchdog: - waiting to lock <0x0d7e51e3> (a com.android.server.am.ActivityManagerService)
08-18 15:21:18.734 8517 8541 W Watchdog: at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:354)
08-18 15:21:18.734 8517 8541 W Watchdog: at android.os.Handler.handleCallback(Handler.java:942)
08-18 15:21:18.735 8517 8541 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
08-18 15:21:18.735 8517 8541 W Watchdog: at android.os.Looper.loopOnce(Looper.java:210)
08-18 15:21:18.735 8517 8541 W Watchdog: at android.os.Looper.loop(Looper.java:297)
08-18 15:21:18.735 8517 8541 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:67)
08-18 15:21:18.735 8517 8541 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
08-18 15:21:18.735 8517 8541 W Watchdog: main annotated stack trace:
08-18 15:21:18.735 8517 8541 W Watchdog: at com.android.server.wm.ActivityTaskManagerService$LocalService.isBaseOfLockedTask(ActivityTaskManagerService.java:6659)
08-18 15:21:18.735 8517 8541 W Watchdog: - waiting to lock <0x0c86a295> (a com.android.server.wm.WindowManagerGlobalLock)
08-18 15:21:18.735 8517 8541 W Watchdog: at com.android.server.pm.DeletePackageHelper.deletePackageVersionedInternal(DeletePackageHelper.java:682)
08-18 15:21:18.735 8517 8541 W Watchdog: at com.android.server.pm.PackageManagerService.deletePackageVersioned(PackageManagerService.java:3261)
08-18 15:21:18.735 8517 8541 W Watchdog: at com.android.server.pm.StorageEventHelper.onVolumeForgotten(StorageEventHelper.java:117)
08-18 15:21:18.735 8517 8541 W Watchdog: - locked <0x093b1eea> (a com.android.server.pm.PackageManagerTracedLock)
08-18 15:21:18.735 8517 8541 W Watchdog: at android.os.storage.StorageManager$StorageEventListenerDelegate.lambda$onVolumeForgotten$4$android-os-storage-StorageManager$StorageEventListenerDelegate(StorageManager.java:379)
08-18 15:21:18.735 8517 8541 W Watchdog: at android.os.storage.StorageManager$StorageEventListenerDelegate$$ExternalSyntheticLambda1.run(Unknown Source:4)
08-18 15:21:18.736 8517 8541 W Watchdog: at android.os.Handler.handleCallback(Handler.java:942)
08-18 15:21:18.736 8517 8541 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
08-18 15:21:18.736 8517 8541 W Watchdog: at android.os.Looper.loopOnce(Looper.java:210)
08-18 15:21:18.736 8517 8541 W Watchdog: at android.os.Looper.loop(Looper.java:297)
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.server.SystemServer.run(SystemServer.java:981)
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.server.SystemServer.main(SystemServer.java:657)
08-18 15:21:18.736 8517 8541 W Watchdog: at java.lang.reflect.Method.invoke(Native Method)
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:569)
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:997)
08-18 15:21:18.736 8517 8541 W Watchdog: android.display annotated stack trace:
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.server.wm.WindowManagerService.reportFocusChanged(WindowManagerService.java:5247)
08-18 15:21:18.736 8517 8541 W Watchdog: - waiting to lock <0x0c86a295> (a com.android.server.wm.WindowManagerGlobalLock)
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.server.wm.InputManagerCallback$$ExternalSyntheticLambda0.accept(Unknown Source:6)
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.internal.util.function.pooled.PooledLambdaImpl.doInvoke(PooledLambdaImpl.java:281)
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.internal.util.function.pooled.PooledLambdaImpl.invoke(PooledLambdaImpl.java:204)
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.internal.util.function.pooled.OmniFunction.run(OmniFunction.java:97)
08-18 15:21:18.736 8517 8541 W Watchdog: at android.os.Handler.handleCallback(Handler.java:942)
08-18 15:21:18.736 8517 8541 W Watchdog: at android.os.Handler.dispatchMessage(Handler.java:99)
08-18 15:21:18.736 8517 8541 W Watchdog: at android.os.Looper.loopOnce(Looper.java:210)
08-18 15:21:18.736 8517 8541 W Watchdog: at android.os.Looper.loop(Looper.java:297)
08-18 15:21:18.736 8517 8541 W Watchdog: at android.os.HandlerThread.run(HandlerThread.java:67)
08-18 15:21:18.736 8517 8541 W Watchdog: at com.android.server.ServiceThread.run(ServiceThread.java:44)
08-18 15:21:18.736 8517 8541 W Watchdog: *** GOODBYE!
08-18 15:21:18.743 8517 8541 D AES : AEEIOCTL_RT_MON_Kick IOCTL,cmd= 2147774474, lParam=330.
三. 代码分析:
发现ActivityManagerService.java的monitor 函数,synchronized 里的this对象阻塞.
androidt/frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java
16271 /** In this method we try to acquire our lock to make sure that we have not deadlocked */
16272 public void monitor() {
16273 synchronized (this) { }
16274 }
四. 查看watchdog 发生的堆栈打印:
1. ActivityManagerService.monitor 是在android.fg 线程,目前状态是Blocked. 而且是被thread 198 的线程阻塞.
"android.fg" prio=5 tid=13 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x142c3ac0 self=0xb400007bf6e691b0
| sysTid=1432 nice=0 cgrp=foreground sched=1073741824/0 handle=0x79faf65cb0
| state=S schedstat=( 3468895480 3136054587 9924 ) utm=288 stm=58 core=4 HZ=100
| stack=0x79fae62000-0x79fae64000 stackSize=1039KB
| held mutexes=
at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:16263)
- waiting to lock <0x07c644c5> (a com.android.server.am.ActivityManagerService) held by thread 198
at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:354)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loopOnce(Looper.java:208)
at android.os.Looper.loop(Looper.java:295)
at android.os.HandlerThread.run(HandlerThread.java:67)
at com.android.server.ServiceThread.run(ServiceThread.java:44)
2. 查看thread 198 的打印栈. thread 198 的线程名称是binder:1412_1C, 而且属于Blocked 状态. 被阻塞的线程是thread 1.
"binder:1412_1C" prio=5 tid=198 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x142e2e08 self=0xb400007bf704c590
| sysTid=6253 nice=-2 cgrp=foreground sched=1073741824/0 handle=0x79abb1fcb0
| state=S schedstat=( 2014008775 1150188497 9002 ) utm=146 stm=55 core=2 HZ=100
| stack=0x79aba28000-0x79aba2a000 stackSize=991KB
| held mutexes=
at com.android.server.pm.PackageManagerService$PackageManagerInternalImpl.notifyPackageUse(PackageManagerService.java:6489)
- waiting to lock <0x0eed63f5> (a com.android.server.pm.PackageManagerTracedLock) held by thread 1
at com.android.server.am.ActivityManagerService.notifyPackageUse(ActivityManagerService.java:2849)
at com.android.server.am.ActiveServices.realStartServiceLocked(ActiveServices.java:4396)
at com.android.server.am.ActiveServices.bringUpServiceLocked(ActiveServices.java:4233)
at com.android.server.am.ActiveServices.bindServiceLocked(ActiveServices.java:2956)
at com.android.server.am.ActivityManagerService.bindServiceInstance(ActivityManagerService.java:12789)
- locked <0x07c644c5> (a com.android.server.am.ActivityManagerService)
at com.android.server.am.ActivityManagerService.bindServiceInstance(ActivityManagerService.java:12745)
at android.app.ContextImpl.bindServiceCommon(ContextImpl.java:2035)
at android.app.ContextImpl.bindServiceAsUser(ContextImpl.java:1974)
at com.android.server.job.JobServiceContext.executeRunnableJob(JobServiceContext.java:319)
- locked <0x02bbd241> (a java.lang.Object)
at com.android.server.job.JobConcurrencyManager.startJobLocked(JobConcurrencyManager.java:1118)
at com.android.server.job.JobConcurrencyManager.assignJobsToContextsInternalLocked(JobConcurrencyManager.java:898)
at com.android.server.job.JobConcurrencyManager.assignJobsToContextsLocked(JobConcurrencyManager.java:620)
at com.android.server.job.JobSchedulerService.maybeRunPendingJobsLocked(JobSchedulerService.java:2637)
at com.android.server.job.JobSchedulerService.scheduleAsPackage(JobSchedulerService.java:1259)
- locked <0x02bbd241> (a java.lang.Object)
at com.android.server.job.JobSchedulerService$JobSchedulerStub.enqueue(JobSchedulerService.java:3114)
at android.app.job.IJobScheduler$Stub.onTransact(IJobScheduler.java:165)
at android.os.Binder.execTransactInternal(Binder.java:1280)
at android.os.Binder.execTransact(Binder.java:1244)
3. 查看thread 1. thread 1的名称是main, 也是主线程. 目前状态是Blocked. 等待的线程是thread 175.
"main" prio=5 tid=1 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x7230aaa8 self=0xb400007bf6e33380
| sysTid=1412 nice=-2 cgrp=foreground sched=1073741824/0 handle=0x7d42e1e4f8
| state=S schedstat=( 7442442904 3638959461 25618 ) utm=436 stm=307 core=4 HZ=100
| stack=0x7fe25cf000-0x7fe25d1000 stackSize=8188KB
| held mutexes=
at com.android.server.wm.ActivityTaskManagerService$LocalService.isBaseOfLockedTask(ActivityTaskManagerService.java:6659)
- waiting to lock <0x06ff71a4> (a com.android.server.wm.WindowManagerGlobalLock) held by thread 175
at com.android.server.pm.DeletePackageHelper.deletePackageVersionedInternal(DeletePackageHelper.java:682)
at com.android.server.pm.PackageManagerService.deletePackageVersioned(PackageManagerService.java:3261)
at com.android.server.pm.StorageEventHelper.onVolumeForgotten(StorageEventHelper.java:117)
- locked <0x0eed63f5> (a com.android.server.pm.PackageManagerTracedLock)
at android.os.storage.StorageManager$StorageEventListenerDelegate.lambda$onVolumeForgotten$4$android-os-storage-StorageManager$StorageEventListenerDelegate(StorageManager.java:379)
at android.os.storage.StorageManager$StorageEventListenerDelegate$$ExternalSyntheticLambda1.run(unavailable:4)
at android.os.Handler.handleCallback(Handler.java:942)
at android.os.Handler.dispatchMessage(Handler.java:99)
4. 查看线程 175. thread 175 的名称是binder:1412_13,也是处于Blocked 状态. 发现thread 175 被 thread 1 阻塞了. 从这里就看到了一个死锁的问题. thread 175 等待 thread 1 ,而thread 1 又在等待 thread 175.
"binder:1412_13" prio=5 tid=175 Blocked
| group="main" sCount=1 ucsCount=0 flags=1 obj=0x142e29d0 self=0xb400007bf6fe40d0
| sysTid=5336 nice=-10 cgrp=foreground sched=1073741824/0 handle=0x7967815cb0
| state=S schedstat=( 3461362720 2761431048 14469 ) utm=247 stm=98 core=2 HZ=100
| stack=0x796771e000-0x7967720000 stackSize=991KB
| held mutexes=
at com.android.server.pm.UserManagerService.isQuietModeEnabled(UserManagerService.java:1283)
- waiting to lock <0x0eed63f5> (a com.android.server.pm.PackageManagerTracedLock) held by thread 1
at android.os.UserManager.isQuietModeEnabled(UserManager.java:4567)
at com.android.server.wm.ActivityStartInterceptor.interceptQuietProfileIfNeeded(ActivityStartInterceptor.java:232)
at com.android.server.wm.ActivityStartInterceptor.intercept(ActivityStartInterceptor.java:166)
at com.android.server.wm.ActivityStarter.executeRequest(ActivityStarter.java:1059)
at com.android.server.wm.ActivityStarter.execute(ActivityStarter.java:702)
- locked <0x06ff71a4> (a com.android.server.wm.WindowManagerGlobalLock)
at com.android.server.wm.ActivityTaskManagerService.startActivityAsUser(ActivityTaskManagerService.java:1250)
at com.android.server.wm.ActivityTaskManagerService.startActivityAsUser(ActivityTaskManagerService.java:1213)
at com.android.server.wm.ActivityTaskManagerService.startActivity(ActivityTaskManagerService.java:1188)
at android.app.IActivityTaskManager$Stub.onTransact(IActivityTaskManager.java:893)
at com.android.server.wm.ActivityTaskManagerService.onTransact(ActivityTaskManagerService.java:5241)
at android.os.Binder.execTransactInternal(Binder.java:1280)
at android.os.Binder.execTransact(Binder.java:1244)
5. 整理watchdog 发生的原因:
简单示意图,死锁的原因是tid=1 和 tid=175 相互等待导致的,只要解此锁即可.
"android.fg"(tid=13)
|
|
等待
|
|
V
"binder:1412_1C"(tid=198)
|
|
等待
|
|
V
"main"(tid=1)<-------------------
| |
| |
等待 等待
| |
| |
V |
"binder:1412_13"(tid=175)-----
五. 查看代码.
1. androidt/frameworks/base/services/core/java/com/android/server/pm/StorageEventHelper.java
此时在做sdcard 格式化,调用了deletePackageVersioned 函数.
103 @Override
104 public void onVolumeForgotten(String fsUuid) {
105 if (TextUtils.isEmpty(fsUuid)) {
106 Slog.e(TAG, "Forgetting internal storage is probably a mistake; ignoring");
107 return;
108 }
109
110 // Remove any apps installed on the forgotten volume
111 synchronized (mPm.mLock) {
112 final List<? extends PackageStateInternal> packages =
113 mPm.mSettings.getVolumePackagesLPr(fsUuid);
114 for (PackageStateInternal ps : packages) {
115 Slog.d(TAG, "Destroying " + ps.getPackageName()
116 + " because volume was forgotten");
117 mPm.deletePackageVersioned(new VersionedPackage(ps.getPackageName(),
118 PackageManager.VERSION_CODE_HIGHEST),
119 new PackageManager.LegacyPackageDeleteObserver(null).getBinder(),
120 UserHandle.USER_SYSTEM, PackageManager.DELETE_ALL_USERS);
121 // Try very hard to release any references to this package
122 // so we don't risk the system server being killed due to
123 // open FDs
124 AttributeCache.instance().removePackage(ps.getPackageName());
125 }
126
127 mPm.mSettings.onVolumeForgotten(fsUuid);
128 mPm.writeSettingsLPrTEMP();
129 }
130 }
2. androidt/frameworks/base/services/core/java/com/android/server/wm/ActivityTaskManagerService.java
6657 @Override
6658 public boolean isBaseOfLockedTask(String packageName) {
6659 synchronized (mGlobalLock) {
6660 return getLockTaskController().isBaseOfLockedTask(packageName);
6661 }
6662 }
3. androidt/frameworks/base/services/core/java/com/android/server/wm/ActivityStarter.java
685 int res;
686 synchronized (mService.mGlobalLock) {
687 final boolean globalConfigWillChange = mRequest.globalConfig != null
688 && mService.getGlobalConfiguration().diff(mRequest.globalConfig) != 0;
689 final Task rootTask = mRootWindowContainer.getTopDisplayFocusedRootTask();
690 if (rootTask != null) {
691 rootTask.mConfigWillChange = globalConfigWillChange;
692 }
693 ProtoLog.v(WM_DEBUG_CONFIGURATION, "Starting activity when config "
694 + "will change = %b", globalConfigWillChange);
695
696 final long origId = Binder.clearCallingIdentity();
697
698 res = resolveToHeavyWeightSwitcherIfNeeded();
699 if (res != START_SUCCESS) {
700 return res;
701 }
702 res = executeRequest(mRequest);
703
704 Binder.restoreCallingIdentity(origId);
4.androidt/frameworks/base/services/core/java/com/android/server/pm/UserManagerService.java
从堆栈中可以看出.mPackagesLock 和mPm.mLock 是同一个对象,都是0x0eed63f5.
1281 @Override
1282 public boolean isQuietModeEnabled(@UserIdInt int userId) {
1283 synchronized (mPackagesLock) {
1284 UserInfo info;
1285 synchronized (mUsersLock) {
1286 info = getUserInfoLU(userId);
1287 }
1288 if (info == null || !info.isManagedProfile()) {
1289 return false;
1290 }
1291 return info.isQuietModeEnabled();
1292 }
1293 }
五. 发现.
从堆栈和代码中发现,"main"(tid=1) 是在做sdcard 格式化,而,"binder:1412_13"(tid=175) 里面有startActivityAsUser 函数,可以推测出来,是在做启动activity.
也就是说格式化sdcard 时,又activity 会造成死锁.
为了验证这种可能性,进行了验证.
在格式化sdcard 时,开启camera,使用命令:
adb shell am start -a "android.media.action.STILL_IMAGE_CAMERA"
测试结果,果然复现.
六. 方案:
临时方案:在格式化手机,禁止activity 启动.这个方案有可能涵盖不全.
1. androidt/frameworks/base/services/core/java/com/android/server/wm/ActivityStarter.java
import com.android.server.pm.StorageEventHelper;
int execute() {
....
int res;
synchronized (mService.mGlobalLock) {
final boolean globalConfigWillChange = mRequest.globalConfig != null
&& mService.getGlobalConfiguration().diff(mRequest.globalConfig) != 0;
final Task rootTask = mRootWindowContainer.getTopDisplayFocusedRootTask();
if (rootTask != null) {
rootTask.mConfigWillChange = globalConfigWillChange;
}
ProtoLog.v(WM_DEBUG_CONFIGURATION, "Starting activity when config "
+ "will change = %b", globalConfigWillChange);
final long origId = Binder.clearCallingIdentity();
res = resolveToHeavyWeightSwitcherIfNeeded();
if (res != START_SUCCESS) {
return res;
}
+++ //XIBIN BEGIN
+++ if(StorageEventHelper.volumeForgotten){
+++ Slog.e(TAG,"the volume is forgetting,interrupt activity start.");
+++ return START_SUCCESS;
+++ }
+++ //XIBIN END
res = executeRequest(mRequest);
2. androidt/frameworks/base/services/core/java/com/android/server/pm/StorageEventHelper.java
//XIBIN BEGIN
public static boolean volumeForgotten = false;
//XIBIN END
@Override
public void onVolumeForgotten(String fsUuid) {
if (TextUtils.isEmpty(fsUuid)) {
Slog.e(TAG, "Forgetting internal storage is probably a mistake; ignoring");
return;
}
// Remove any apps installed on the forgotten volume
+++ //XIBIN BEGIN
+++ try{
+++ volumeForgotten = true;
synchronized (mPm.mLock) {
final List<? extends PackageStateInternal> packages =
mPm.mSettings.getVolumePackagesLPr(fsUuid);
for (PackageStateInternal ps : packages) {
Slog.d(TAG, "Destroying " + ps.getPackageName()
+ " because volume was forgotten");
mPm.deletePackageVersioned(new VersionedPackage(ps.getPackageName(),
PackageManager.VERSION_CODE_HIGHEST),
new PackageManager.LegacyPackageDeleteObserver(null).getBinder(),
UserHandle.USER_SYSTEM, PackageManager.DELETE_ALL_USERS);
// Try very hard to release any references to this package
// so we don't risk the system server being killed due to
// open FDs
AttributeCache.instance().removePackage(ps.getPackageName());
}
mPm.mSettings.onVolumeForgotten(fsUuid);
mPm.writeSettingsLPrTEMP();
}
+++ }finally{
+++ volumeForgotten = false;
+++ }
+++ //XIBIN END